CN105513610A - Voice analysis method and device - Google Patents
Voice analysis method and device
- Publication number
- CN105513610A (application number CN201510819750.6A)
- Authority
- CN
- China
- Prior art keywords
- neural network
- different compression
- module
- compression algorithms
- signal
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/27—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
- G10L25/30—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique using neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Landscapes
- Engineering & Computer Science (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Computation (AREA)
- Computational Linguistics (AREA)
- Signal Processing (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Electrophonic Musical Instruments (AREA)
Abstract
本发明实施例公开了一种声音分析方法及装置,涉及声音识别技术领域,能够以较低的成本提高音频文件的来源设备识别的准确率。本发明的方法包括:将采集的声音信号,通过不同的压缩算法以相同采样率和比特率根据所采集的声音信号得到分别对应不同的压缩算法的音频文件;从对应不同的压缩算法的音频文件中提取无声段,并根据所提取的无声段得到语音特征信号;利用所述语音特征信号作为训练数据训练BP神经网络,并通过完成训练的BP神经网络分析测试信号,识别生成所述测试信号的录音设备本发明适用于识别音频文件的来源设备。
The embodiments of the present invention disclose a sound analysis method and device, relate to the field of sound recognition technology, and can improve the accuracy of identifying the source device of an audio file at relatively low cost. The method of the present invention comprises: encoding a collected sound signal with different compression algorithms at the same sampling rate and bit rate to obtain audio files respectively corresponding to the different compression algorithms; extracting silent segments from the audio files corresponding to the different compression algorithms, and deriving speech feature signals from the extracted silent segments; and using the speech feature signals as training data to train a BP neural network, analyzing a test signal with the trained BP neural network, and identifying the recording device that generated the test signal. The present invention is applicable to identifying the source device of an audio file.
Description
技术领域 technical field
本发明涉及声音识别技术领域,尤其涉及一种声音分析方法及装置。 The invention relates to the technical field of voice recognition, in particular to a voice analysis method and device.
背景技术 Background technique
随着各类电子设备的普及，录音设备已广泛应用至领域。尤其是在司法、执法的实践中，音频文件的采集成为调查取证的一种重要手段。但是，又由于音频文件的易伪造，案件场景还原能力低等问题，使得音频文件在很多时候只能作为参考。 With the popularization of all kinds of electronic devices, recording equipment has come into wide use across many fields. In judicial and law-enforcement practice in particular, the collection of audio files has become an important means of investigation and evidence gathering. However, because audio files are easy to forge and offer limited ability to reconstruct the scene of a case, they can often serve only as a reference.
音频文件由何种设备录制在一定程度上反映了录音场合和情景，对于判断音频文件是否可以作为有效证据十分重要。但是，目前针对音频文件进行录音设备的有效判别，主要还是通过办案人员的经验进行判定，准确率难以保证，而专业的声纹分析设备的成本又很高昂，进行声音鉴定分析的费用居高不下。由此可见，目前对于音频文件的来源设备的识别，难度高且准确率较低，并且专业的声纹分析鉴定的成本很高，难以在基层执法、司法方面大量普及。 Which device recorded an audio file reflects, to a certain extent, the occasion and circumstances of the recording, and is therefore important for judging whether the file can serve as valid evidence. At present, however, identifying the recording device of an audio file relies mainly on the experience of case investigators, so the accuracy is hard to guarantee, while professional voiceprint analysis equipment is expensive and keeps the cost of forensic voice analysis high. It can thus be seen that identifying the source device of an audio file is currently difficult and inaccurate, and that professional voiceprint analysis is too costly to be widely adopted in grassroots law enforcement and judicial work.
发明内容 Contents of the invention
本发明的实施例提供一种声音分析方法及装置,能够以较低的成本提高音频文件的来源设备识别的准确率。 Embodiments of the present invention provide a sound analysis method and device, which can improve the accuracy of identifying the source device of an audio file at a relatively low cost.
为达到上述目的,本发明的实施例采用如下技术方案: In order to achieve the above object, embodiments of the present invention adopt the following technical solutions:
第一方面,本发明的实施例提供一种声音分析方法,包括: In a first aspect, embodiments of the present invention provide a sound analysis method, including:
将采集的声音信号，通过不同的压缩算法以相同采样率和比特率根据所采集的声音信号得到分别对应不同的压缩算法的音频文件； encoding the collected sound signal with different compression algorithms at the same sampling rate and bit rate to obtain audio files respectively corresponding to the different compression algorithms;
从对应不同的压缩算法的音频文件中提取无声段,并根据所提取的无声段得到语音特征信号; Extracting silent segments from audio files corresponding to different compression algorithms, and obtaining speech feature signals according to the extracted silent segments;
利用所述语音特征信号作为训练数据训练BP(BackPropagation,多层前馈)神经网络,并通过完成训练的BP神经网络分析测试信号,识别生成所述测试信号的录音设备。 Using the speech feature signal as training data to train a BP (BackPropagation, multi-layer feed-forward) neural network, and analyzing the test signal through the trained BP neural network to identify the recording device that generated the test signal.
第二方面,本发明的实施例提供一种声音分析装置,包括:相互之间通过总线连接的系统主控模块、语音录放模块、TFT触摸屏模块、压缩算法实现模块、存储模块和上位机模块; In a second aspect, an embodiment of the present invention provides a sound analysis device, including: a system main control module, a voice recording and playback module, a TFT touch screen module, a compression algorithm realization module, a storage module and a host computer module connected to each other through a bus;
所述语音录放模块,用于播放声音信号; The voice recording and playback module is used to play sound signals;
所述压缩算法实现模块,用于通过不同的压缩算法以相同采样率和比特率根据所采集的声音信号得到分别对应不同的压缩算法的音频文件; The compression algorithm implementation module is used to obtain audio files respectively corresponding to different compression algorithms according to the collected sound signals with the same sampling rate and bit rate through different compression algorithms;
所述存储模块,用于存储所述对应不同的压缩算法的音频文件; The storage module is used to store the audio files corresponding to different compression algorithms;
所述上位机模块，用于从对应不同的压缩算法的音频文件中提取无声段，并根据所提取的无声段得到语音特征信号；并利用所述语音特征信号作为训练数据训练BP神经网络，并通过完成训练的BP神经网络分析测试信号，识别生成所述测试信号的录音设备。 The host computer module is used to extract silent segments from the audio files corresponding to the different compression algorithms, obtain speech feature signals from the extracted silent segments, use the speech feature signals as training data to train the BP neural network, analyze a test signal with the trained BP neural network, and identify the recording device that generated the test signal.
本发明实施例提供的声音分析方法及装置，针对采用不同的压缩算法以相同采样率和比特率根据所采集的声音信号，提取录音无声段并分别对其求改进的MFCC参数，将不同波特率的音频文件输入Matlab中得到对应的MFCC特征参数，再利用MFCC特征参数对BP神经网络进行训练，用训练好的BP神经网络分类语音特征信号，根据分类结果识别录音设备，由于STM32以及Matlab等本发明所用的设备成本低廉，因此实现了以较低的成本提高音频文件的来源设备识别的准确率。 In the sound analysis method and device provided by the embodiments of the present invention, the collected sound signal is compressed with different algorithms at the same sampling rate and bit rate, the silent segments of the recordings are extracted, and improved MFCC parameters are computed for each of them; the audio files produced by the different compression algorithms are fed into Matlab to obtain the corresponding MFCC feature parameters, which are then used to train a BP neural network; the trained BP neural network classifies the speech feature signals, and the recording device is identified from the classification result. Because the equipment used in the present invention, such as the STM32 and Matlab, is inexpensive, the accuracy of identifying the source device of an audio file is improved at relatively low cost.
附图说明 Description of drawings
为了更清楚地说明本发明实施例中的技术方案，下面将对实施例中所需要使用的附图作简单地介绍，显而易见地，下面描述中的附图仅仅是本发明的一些实施例，对于本领域普通技术人员来讲，在不付出创造性劳动的前提下，还可以根据这些附图获得其它的附图。 In order to illustrate the technical solutions in the embodiments of the present invention more clearly, the accompanying drawings used in the embodiments are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention, and those of ordinary skill in the art can derive other drawings from them without creative effort.
图1为本发明实施例提供的声音分析方法的流程图; Fig. 1 is the flowchart of the sound analysis method that the embodiment of the present invention provides;
图2为执行本发明实施例提供的声音分析方法的具体装置示意图; Fig. 2 is a schematic diagram of a specific device for implementing the sound analysis method provided by the embodiment of the present invention;
图3为本发明实施例提供的无声段提取方案的流程示意图; 3 is a schematic flow diagram of a silent segment extraction solution provided by an embodiment of the present invention;
图4为本发明实施例提供的改进MFCC参数提取方案的流程示意图; Fig. 4 is the schematic flow chart of the improved MFCC parameter extraction scheme that the embodiment of the present invention provides;
图5为本发明实施例提供的基于BP神经网络的语音特征信号分类算法的流程示意图; Fig. 5 is the schematic flow chart of the speech characteristic signal classification algorithm based on BP neural network that the embodiment of the present invention provides;
图6为本发明实施例提供的录音设备识别方案的流程示意图; FIG. 6 is a schematic flowchart of a recording device identification solution provided by an embodiment of the present invention;
图7为本发明实施例提供的声音分析装置的结构示意图。 Fig. 7 is a schematic structural diagram of a sound analysis device provided by an embodiment of the present invention.
具体实施方式 detailed description
为使本领域技术人员更好地理解本发明的技术方案,下面结合附图和具体实施方式对本发明作进一步详细描述。下文中将详细描述本发明的实施方式,所述实施方式的示例在附图中示出,其中自始至终相同或类似的标号表示相同或类似的元件或具有相同或类似功能的元件。下面通过参考附图描述的实施方式是示例性的,仅用于解释本发明,而不能解释为对本发明的限制。 In order to enable those skilled in the art to better understand the technical solutions of the present invention, the present invention will be further described in detail below in conjunction with the accompanying drawings and specific embodiments. Hereinafter, embodiments of the present invention will be described in detail, examples of which are shown in the accompanying drawings, wherein the same or similar reference numerals denote the same or similar elements or elements having the same or similar functions throughout. The embodiments described below by referring to the figures are exemplary only for explaining the present invention and should not be construed as limiting the present invention.
本技术领域技术人员可以理解,除非特意声明,这里使用的单数形式“一”、“一个”、“所述”和“该”也可包括复数形式。应该进一步理解的是,本发明的说明书中使用的措辞“包括”是指存在所述特征、整数、步骤、操作、元件和/或组件,但是并不排除存在或添加一个或多个其他特征、整数、步骤、操作、元件、组件和/或它们的组。应该理解,当我们称元件被“连接”或“耦接”到另一元件时,它可以直接连接或耦接到其他元件,或者也可以存在中间元件。此外,这里使用的“连接”或“耦接”可以包括无线连接或耦接。这里使用的措辞“和/或”包括一个或更多个相关联的列出项的任一单元和全部组合。 Those skilled in the art will understand that unless otherwise stated, the singular forms "a", "an", "said" and "the" used herein may also include plural forms. It should be further understood that the word "comprising" used in the description of the present invention refers to the presence of said features, integers, steps, operations, elements and/or components, but does not exclude the presence or addition of one or more other features, Integers, steps, operations, elements, components, and/or groups thereof. It will be understood that when an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element or intervening elements may also be present. Additionally, "connected" or "coupled" as used herein may include wirelessly connected or coupled. As used herein, the term "and/or" includes any and all combinations of one or more of the associated listed items.
本技术领域技术人员可以理解，除非另外定义，这里使用的所有术语(包括技术术语和科学术语)具有与本发明所属领域中的普通技术人员的一般理解相同的意义。还应该理解的是，诸如通用字典中定义的那些术语应该被理解为具有与现有技术的上下文中的意义一致的意义，并且除非像这里一样定义，不会用理想化或过于正式的含义来解释。 Those skilled in the art will understand that, unless otherwise defined, all terms used herein (including technical and scientific terms) have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. It should also be understood that terms such as those defined in general dictionaries should be interpreted as having meanings consistent with their meaning in the context of the prior art and, unless defined as herein, should not be interpreted in an idealized or overly formal sense.
本发明实施例提供一种声音分析方法,如图1所示,包括: The embodiment of the present invention provides a sound analysis method, as shown in Figure 1, including:
101，将采集的声音信号，通过不同的压缩算法以相同采样率和比特率根据所采集的声音信号得到分别对应不同的压缩算法的音频文件。 101. Encode the collected sound signal with different compression algorithms at the same sampling rate and bit rate to obtain audio files respectively corresponding to the different compression algorithms.
在本实施例中，声音分析方法的具体执行流程可以基于如图2所示架构的装置，具体选取STM32增强型系列F103VET6作为系统主控解决方案；存储模块包括CH376U盘存储电路、SD卡存储模块；压缩算法实现模块包括MP3、AMR、AAC、WMA四种音频压缩算法模块。协调语音录放模块、存储模块、TFT(ThinFilmTransistor,是薄膜晶体管)触摸屏模块、压缩算法实现模块、串口等其他接口工作。语音录放模块包括ISD4004模块、LM386功放电路、滤波偏置模块。 In this embodiment, the sound analysis method can be executed on a device with the architecture shown in Fig. 2. Specifically, the STM32 enhanced-series F103VET6 is chosen as the system master-control solution; the storage module includes a CH376 USB-drive storage circuit and an SD card storage module; the compression algorithm implementation module includes four audio compression algorithm modules: MP3, AMR, AAC and WMA. The master controller coordinates the voice recording and playback module, the storage module, the TFT (Thin Film Transistor) touch screen module, the compression algorithm implementation module, the serial port and the other interfaces. The voice recording and playback module includes an ISD4004 module, an LM386 power amplifier circuit and a filter/bias module.
当装置上电后，可录取一段语音，按停止键结束录音，并经过不同的四种压缩算法，然后将所录的相同采样率和比特率的四段语音存到U盘或SD卡中。其中，SD卡采用的microSD卡，采用SDIO(SecureDigitalInputandOutputCard,安全数字输入输出卡)方式与STM32主控模块相连，最大支持8GSD卡；U盘存储模块是以CH376T为核心，采用USBA型接口连接U盘，最大支持8GU盘。电源具体是5V电源适配器，3.3V电压由AMS1117芯片提供。 After the device is powered on, a piece of speech can be recorded; pressing the stop key ends the recording. The recording is then processed by the four different compression algorithms, and the four resulting speech files, all at the same sampling rate and bit rate, are stored on a USB drive or an SD card. The SD card is a microSD card connected to the STM32 master-control module via SDIO (Secure Digital Input and Output Card), supporting cards of up to 8 GB; the USB-drive storage module is built around the CH376T and connects to the USB drive through a USB type-A interface, supporting drives of up to 8 GB. The power supply is a 5 V power adapter, and the 3.3 V rail is provided by an AMS1117 chip.
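For illustration only, and not as the patent's implementation (the device described above performs this step with dedicated compression modules on the STM32), the following Python sketch shows how one captured recording could be re-encoded into the four formats at a nominal common sampling rate and bit rate on an ordinary computer with ffmpeg. The encoder names are assumptions about a typical ffmpeg build, and each encoder rounds the requested bit rate to the nearest mode it supports.

```python
# Minimal sketch (assumption, not the patent's STM32 implementation): re-encode
# one captured WAV file into MP3, AMR, AAC and WMA at a nominal common sampling
# rate and bit rate by calling ffmpeg. Encoder names assume a typical ffmpeg
# build; each encoder rounds the requested bit rate to its nearest supported mode.
import subprocess
from pathlib import Path

SAMPLE_RATE = 8000        # AMR-NB accepts only 8 kHz mono input
BIT_RATE = "12.2k"        # nominal target shared by all four outputs

CODECS = {
    "mp3": "libmp3lame",
    "amr": "libopencore_amrnb",
    "aac": "aac",
    "wma": "wmav2",
}

def encode_all(wav_path: str, out_dir: str = "encoded") -> list[Path]:
    """Produce one compressed file per algorithm from the same recording."""
    out = Path(out_dir)
    out.mkdir(exist_ok=True)
    results = []
    for ext, codec in CODECS.items():
        target = out / f"{Path(wav_path).stem}.{ext}"
        subprocess.run(
            ["ffmpeg", "-y", "-i", wav_path,
             "-ar", str(SAMPLE_RATE), "-ac", "1",   # common sampling rate, mono
             "-c:a", codec, "-b:a", BIT_RATE,       # codec and common bit rate
             str(target)],
            check=True,
        )
        results.append(target)
    return results
```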
102,从对应不同的压缩算法的音频文件中提取无声段,并根据所提取的无声段得到语音特征信号。 102. Extract silent segments from audio files corresponding to different compression algorithms, and obtain speech feature signals according to the extracted silent segments.
具体可以在上位机上实现,首先提取无声段,无声段的提取流程如图3所示。 Specifically, it can be implemented on the host computer. First, the silent segment is extracted. The process of extracting the silent segment is shown in FIG. 3 .
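The exact procedure of Fig. 3 is not reproduced here; the following Python sketch is only an assumed, simplified variant in which frames whose short-time energy falls below a relative threshold are treated as silent, and runs of consecutive silent frames are returned as the silent segments.

```python
# Simplified silence detector (an assumption; the patent's exact Fig. 3 procedure
# is not reproduced): frames whose short-time energy stays below a relative
# threshold are marked silent, and runs of consecutive silent frames are returned.
import numpy as np

def extract_silent_segments(signal, frame_len=256, hop=128, rel_threshold=0.02):
    """Return a list of 1-D arrays, one per detected silent segment."""
    n_frames = 1 + max(0, (len(signal) - frame_len) // hop)
    frames = np.stack([signal[i * hop: i * hop + frame_len] for i in range(n_frames)])
    energy = np.mean(frames.astype(float) ** 2, axis=1)
    silent = energy < rel_threshold * energy.max()

    segments, start = [], None
    for i, flag in enumerate(silent):
        if flag and start is None:            # silence begins
            start = i
        elif not flag and start is not None:  # silence ends
            segments.append(signal[start * hop: i * hop + frame_len])
            start = None
    if start is not None:                     # signal ends while still silent
        segments.append(signal[start * hop:])
    return segments
```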
在本实施例中，所述压缩算法包括4种不同的压缩算法，包括MP3、AMR、WMA和AAC。所述根据所提取的无声段得到语音特征信号，具体包括：通过如图4所示的求取改进MFCC(MelFrequencyCepstrumCoefficient,Mel频率倒谱系数)参数的流程，及针对每段无声段，采用倒谱系数法提取500组24维语音特征信号。 In this embodiment, four different compression algorithms are used: MP3, AMR, WMA and AAC. Obtaining the speech feature signals from the extracted silent segments specifically includes: following the procedure for computing the improved MFCC (Mel Frequency Cepstrum Coefficient) parameters shown in Fig. 4, and, for each silent segment, extracting 500 groups of 24-dimensional speech feature signals with the cepstral coefficient method.
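Since the "improved" MFCC computation is defined only by reference to Fig. 4, the following sketch shows just the standard MFCC pipeline (windowing, FFT, Mel filterbank, log energy, DCT) yielding 24 coefficients per frame; the sampling rate, FFT size and number of Mel filters used here are assumptions, not values taken from the patent.

```python
# Plain 24-dimensional MFCC sketch (the "improved" variant of Fig. 4 is not
# specified here, so only the standard pipeline is shown: windowing, FFT,
# Mel filterbank, log energy, DCT). Sampling rate, FFT size and the number of
# Mel filters are assumptions.
import numpy as np
from scipy.fftpack import dct

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sr):
    """Triangular Mel filterbank applied to the one-sided power spectrum."""
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        fb[i - 1, left:center] = (np.arange(left, center) - left) / max(center - left, 1)
        fb[i - 1, center:right] = (right - np.arange(center, right)) / max(right - center, 1)
    return fb

def mfcc_24(frame, sr=8000, n_fft=256, n_filters=26):
    """24 cepstral coefficients for one frame taken from a silent segment."""
    spectrum = np.abs(np.fft.rfft(frame * np.hamming(len(frame)), n_fft)) ** 2
    mel_energy = mel_filterbank(n_filters, n_fft, sr) @ spectrum
    return dct(np.log(mel_energy + 1e-10), type=2, norm="ortho")[:24]
```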
103,利用所述语音特征信号作为训练数据训练BP神经网络,并通过完成训练的BP神经网络分析测试信号,识别生成所述测试信号的录音设备。 103. Use the speech feature signal as training data to train a BP neural network, and analyze a test signal through the trained BP neural network, and identify a recording device that generates the test signal.
在本实施例中,所述BP神经网络的结构包括:输入层设置24个节点,隐含层设置25个节点,输出层设置4个节点。 In this embodiment, the structure of the BP neural network includes: 24 nodes are set in the input layer, 25 nodes are set in the hidden layer, and 4 nodes are set in the output layer.
例如：如图5所示的。通过开发工具MATLAB2014a构建BP神经网络，从而通过编程提取无声段，在语音无声段中提取特征参数，避免了话音信号的干扰，最后确定了录音设备识别系统的识别模型BP神经网络。 For example, as shown in Fig. 5, the BP neural network is built with the development tool MATLAB 2014a; the silent segments are extracted programmatically and the feature parameters are computed within them, which avoids interference from the speech itself, and the BP neural network is finally established as the recognition model of the recording device identification system.
具体的，BP神经网络构建根据系统输入输出数据特点确定BP神经网络的结构，由于语音特征输入信号有24维，待分类的语音信号共有四类，所以BP神经网络的结构为24-25-4即输入层有24个节点，隐含层有25个节点，输出层有4个节点。 Specifically, the structure of the BP neural network is determined from the characteristics of the system's input and output data. Since the speech feature input signal has 24 dimensions and there are four classes of speech signals to be classified, the structure of the BP neural network is 24-25-4, that is, 24 nodes in the input layer, 25 nodes in the hidden layer and 4 nodes in the output layer.
在训练阶段，BP神经网络训练用训练数据训练BP神经网络，比如：共有2000组语音特征信号，从中随机选择1500组数据作为训练数据训练网络，500组数据作为测试数据测试网络分类能力。 In the training phase, the BP neural network is trained with the training data. For example, out of 2000 groups of speech feature signals, 1500 groups are randomly selected as training data to train the network and the remaining 500 groups are used as test data to evaluate its classification ability.
在训练完毕后的测试阶段，BP神经网络分类用训练好的神经网络对测试数据所属语音类别进行分类。从而实现如图6所示的总体流程，即针对采集到的声音信号，获得不同音频格式的四段语音，然后在上位机处理完毕后，输入一段语音能识别出其音频格式从而确定由哪种录音设备所录。 In the test phase after training, the trained BP neural network classifies the speech class to which the test data belongs. This realizes the overall flow shown in Fig. 6: four speech files in different audio formats are obtained from the collected sound signal, and after processing on the host computer, the audio format of an input speech segment can be recognized, thereby determining which recording device produced it.
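As an illustration of the 24-25-4 topology and the 1500/500 split described above, the following sketch substitutes scikit-learn's MLPClassifier for the MATLAB BP network; the feature matrix X of shape (2000, 24) and the integer labels y in {0, 1, 2, 3} for the four formats are assumed to come from the preceding steps.

```python
# Sketch of the 24-25-4 classifier and the 1500/500 split, using scikit-learn's
# MLPClassifier in place of the MATLAB BP network (an assumption; X is the
# (2000, 24) feature matrix and y holds integer labels 0..3 for MP3/AMR/AAC/WMA
# produced by the preceding steps).
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

def train_and_evaluate(X: np.ndarray, y: np.ndarray) -> MLPClassifier:
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, train_size=1500, test_size=500, random_state=0, stratify=y)

    # one hidden layer of 25 units; the 4-node output layer follows from y
    net = MLPClassifier(hidden_layer_sizes=(25,), activation="logistic",
                        solver="sgd", learning_rate_init=0.1,
                        max_iter=2000, random_state=0)
    net.fit(X_train, y_train)
    print("test accuracy:", net.score(X_test, y_test))
    return net

# classifying a new recording then reduces to:
#     label = net.predict(features_of_new_recording.reshape(1, -1))
```

A logistic hidden activation and plain stochastic gradient descent are chosen here only to stay close to a classical back-propagation network; any comparable configuration would serve the illustration equally well.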
本发明实施例提供的声音分析方法，针对采用不同的压缩算法以相同采样率和比特率根据所采集的声音信号，提取录音无声段并分别对其求改进的MFCC参数，将不同波特率的音频文件输入Matlab中得到对应的MFCC特征参数，再利用MFCC特征参数对BP神经网络进行训练，用训练好的BP神经网络分类语音特征信号，根据分类结果识别录音设备，由于STM32以及Matlab等本发明所用的设备成本低廉，因此实现了以较低的成本提高音频文件的来源设备识别的准确率。 In the sound analysis method provided by the embodiment of the present invention, the collected sound signal is compressed with different algorithms at the same sampling rate and bit rate, the silent segments of the recordings are extracted, and improved MFCC parameters are computed for each of them; the audio files produced by the different compression algorithms are fed into Matlab to obtain the corresponding MFCC feature parameters, which are then used to train a BP neural network; the trained BP neural network classifies the speech feature signals, and the recording device is identified from the classification result. Because the equipment used in the present invention, such as the STM32 and Matlab, is inexpensive, the accuracy of identifying the source device of an audio file is improved at relatively low cost.
进一步的，本发明实施例提供一种声音分析装置，如图7所示，包括：相互之间通过总线连接的系统主控模块、语音录放模块、TFT触摸屏模块、压缩算法实现模块、存储模块和上位机模块。 Furthermore, an embodiment of the present invention provides a sound analysis device, as shown in Fig. 7, including: a system master-control module, a voice recording and playback module, a TFT touch screen module, a compression algorithm implementation module, a storage module and a host computer module connected to one another through a bus.
所述语音录放模块,用于播放声音信号。 The voice recording and playback module is used for playing sound signals.
所述压缩算法实现模块,用于通过不同的压缩算法以相同采样率和比特率根据所采集的声音信号得到分别对应不同的压缩算法的音频文件。 The compression algorithm implementation module is used to obtain audio files respectively corresponding to different compression algorithms according to the collected sound signals through different compression algorithms at the same sampling rate and bit rate.
所述存储模块,用于存储所述对应不同的压缩算法的音频文件。 The storage module is used for storing the audio files corresponding to different compression algorithms.
所述上位机模块，用于从对应不同的压缩算法的音频文件中提取无声段，并根据所提取的无声段得到语音特征信号。并利用所述语音特征信号作为训练数据训练BP神经网络，并通过完成训练的BP神经网络分析测试信号，识别生成所述测试信号的录音设备。 The host computer module is used to extract silent segments from the audio files corresponding to the different compression algorithms, obtain speech feature signals from the extracted silent segments, use the speech feature signals as training data to train the BP neural network, analyze a test signal with the trained BP neural network, and identify the recording device that generated the test signal.
具体的，所述系统主控模块为STM32增强型F103VET6芯片，该芯片是一款32位增强型MCU，采用ARM公司的cortex-M3内核，拥有512KFlash、64KRAM、3个SPI口、一个SDIO口、5个USART、最高达72M的主频。所述语音录放模块为ISD4004，通过LM386集成音频功放电路执行音频放大，录音时间设定为8-16分钟，由于要对语音信号进行采集合成，多次采集量化会造成一定的量化误差，采用ISD4004进行录音，通过多电平直接模拟量存储技术，每个采样值直接存贮在片内闪烁存贮器中，因此能够非常真实、自然地再现语音。其中，音频放大选择LM386集成音频功放电路，稳压选择AMS1117-3.3。所述上位机模块具体通过MATLAB2014a从对应不同的压缩算法的音频文件中提取无声段，并根据所提取的无声段得到语音特征信号，并利用所述语音特征信号作为训练数据训练BP神经网络，并通过完成训练的BP神经网络分析测试信号，识别生成所述测试信号的录音设备。 Specifically, the system master-control module is an STM32 enhanced F103VET6 chip, a 32-bit enhanced MCU based on the ARM Cortex-M3 core with 512 KB of Flash, 64 KB of RAM, three SPI ports, one SDIO port, five USARTs and a clock frequency of up to 72 MHz. The voice recording and playback module is an ISD4004, with audio amplification performed by an LM386 integrated audio power amplifier circuit, and the recording time is set to 8-16 minutes. Because the speech signal must be collected and synthesized, repeated acquisition and quantization would introduce a certain quantization error; the ISD4004 is therefore used for recording, and with its multi-level direct analog storage technology each sample value is stored directly in the on-chip flash memory, so the speech can be reproduced very faithfully and naturally. The LM386 integrated audio power amplifier circuit is chosen for audio amplification, and the AMS1117-3.3 for voltage regulation. The host computer module uses MATLAB 2014a to extract silent segments from the audio files corresponding to the different compression algorithms, obtain speech feature signals from the extracted silent segments, use the speech feature signals as training data to train the BP neural network, analyze the test signal with the trained BP neural network, and identify the recording device that generated the test signal.
本发明实施例提供的声音分析装置，针对采用不同的压缩算法以相同采样率和比特率根据所采集的声音信号，提取录音无声段并分别对其求改进的MFCC参数，将不同波特率的音频文件输入Matlab中得到对应的MFCC特征参数，再利用MFCC特征参数对BP神经网络进行训练，用训练好的BP神经网络分类语音特征信号，根据分类结果识别录音设备，由于STM32以及Matlab等本发明所用的设备成本低廉，因此实现了以较低的成本提高音频文件的来源设备识别的准确率。 In the sound analysis device provided by the embodiment of the present invention, the collected sound signal is compressed with different algorithms at the same sampling rate and bit rate, the silent segments of the recordings are extracted, and improved MFCC parameters are computed for each of them; the audio files produced by the different compression algorithms are fed into Matlab to obtain the corresponding MFCC feature parameters, which are then used to train a BP neural network; the trained BP neural network classifies the speech feature signals, and the recording device is identified from the classification result. Because the equipment used in the present invention, such as the STM32 and Matlab, is inexpensive, the accuracy of identifying the source device of an audio file is improved at relatively low cost.
本说明书中的各个实施例均采用递进的方式描述,各个实施例之间相同相似的部分互相参见即可,每个实施例重点说明的都是与其他实施例的不同之处。尤其,对于设备实施例而言,由于其基本相似于方法实施例,所以描述得比较简单,相关之处参见方法实施例的部分说明即可。 Each embodiment in this specification is described in a progressive manner, the same and similar parts of each embodiment can be referred to each other, and each embodiment focuses on the differences from other embodiments. In particular, for the device embodiment, since it is basically similar to the method embodiment, the description is relatively simple, and for relevant parts, please refer to part of the description of the method embodiment.
以上所述,仅为本发明的具体实施方式,但本发明的保护范围并不局限于此,任何熟悉本技术领域的技术人员在本发明揭露的技术范围内,可轻易想到的变化或替换,都应涵盖在本发明的保护范围之内。因此,本发明的保护范围应该以权利要求的保护范围为准。 The above is only a specific embodiment of the present invention, but the scope of protection of the present invention is not limited thereto. Anyone skilled in the art can easily think of changes or substitutions within the technical scope disclosed in the present invention. All should be covered within the protection scope of the present invention. Therefore, the protection scope of the present invention should be determined by the protection scope of the claims.
Claims (5)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510819750.6A CN105513610A (en) | 2015-11-23 | 2015-11-23 | Voice analysis method and device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510819750.6A CN105513610A (en) | 2015-11-23 | 2015-11-23 | Voice analysis method and device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN105513610A true CN105513610A (en) | 2016-04-20 |
Family
ID=55721536
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201510819750.6A Pending CN105513610A (en) | 2015-11-23 | 2015-11-23 | Voice analysis method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN105513610A (en) |
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2013149123A1 (en) * | 2012-03-30 | 2013-10-03 | The Ohio State University | Monaural speech filter |
CN103426438A (en) * | 2012-05-25 | 2013-12-04 | 洪荣昭 | Method and system for analyzing baby crying |
US20140019390A1 (en) * | 2012-07-13 | 2014-01-16 | Umami, Co. | Apparatus and method for audio fingerprinting |
CN103325382A (en) * | 2013-06-07 | 2013-09-25 | 大连民族学院 | Method for automatically identifying Chinese national minority traditional instrument audio data |
CN104732977A (en) * | 2015-03-09 | 2015-06-24 | 广东外语外贸大学 | On-line spoken language pronunciation quality evaluation method and system |
Non-Patent Citations (1)
Title |
---|
贺前华等: "基于改进PNCC特征和两步区分性训练的录音设备识别方法", 《电子学报》 * |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107516527A (en) * | 2016-06-17 | 2017-12-26 | 中兴通讯股份有限公司 | A kind of encoding and decoding speech method and terminal |
CN106231357A (en) * | 2016-08-31 | 2016-12-14 | 浙江华治数聚科技股份有限公司 | A kind of Forecasting Methodology of television broadcast media audio, video data chip time |
CN106331741A (en) * | 2016-08-31 | 2017-01-11 | 浙江华治数聚科技股份有限公司 | Television and broadcast media audio and video data compression method |
CN106331741B (en) * | 2016-08-31 | 2019-03-08 | 徐州视达坦诚文化发展有限公司 | A kind of compression method of television broadcast media audio, video data |
CN106997767A (en) * | 2017-03-24 | 2017-08-01 | 百度在线网络技术(北京)有限公司 | Method of speech processing and device based on artificial intelligence |
CN110728991A (en) * | 2019-09-06 | 2020-01-24 | 南京工程学院 | Improved recording equipment identification algorithm |
CN110728991B (en) * | 2019-09-06 | 2022-03-01 | 南京工程学院 | An Improved Recording Device Recognition Algorithm |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN105513610A (en) | Voice analysis method and device | |
CN107274916B (en) | Method and device for operating audio/video files based on voiceprint information | |
WO2019148586A1 (en) | Method and device for speaker recognition during multi-person speech | |
CN113724712A (en) | Bird sound identification method based on multi-feature fusion and combination model | |
CN114255783B (en) | Method for constructing sound classification model, sound classification method and system | |
CN110047512A (en) | A kind of ambient sound classification method, system and relevant apparatus | |
CN111540346A (en) | Far-field sound classification method and device | |
CN115273904A (en) | Angry emotion recognition method and device based on multi-feature fusion | |
CN114783424A (en) | Text corpus screening method, device, equipment and storage medium | |
CN111782861A (en) | Noise detection method and device, and storage medium | |
CN116758911A (en) | Campus violence monitoring method and system based on voice signal processing | |
CN102170528B (en) | Segmentation method of news program | |
KR101382356B1 (en) | Apparatus for forgery detection of audio file | |
CN114927125A (en) | Audio classification method and device, terminal equipment and storage medium | |
CN115881145A (en) | Voice processing and training method and electronic equipment | |
CN113691382A (en) | Conference recording method, device, computer equipment and medium | |
CN2922038Y (en) | Digital sound effect device with recording function | |
CN104102834A (en) | Method for identifying sound recording locations | |
Fathan et al. | An Ensemble Approach for the Diagnosis of COVID-19 from Speech and Cough Sounds | |
CN100375084C (en) | Computer with language re-reading function and its realizing method | |
KR101551968B1 (en) | Music source information provide method by media of vehicle | |
CN113889081A (en) | Speech recognition method, medium, apparatus and computing device | |
CN115700880A (en) | Behavior monitoring method and device, electronic equipment and storage medium | |
CN113314123B (en) | Voice processing method, electronic equipment and storage device | |
CN204989828U (en) | Pronunciation collection code storage device based on STM32 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
RJ01 | Rejection of invention patent application after publication |
Application publication date: 20160420 |