CN115840877B - Distributed stream processing method, system, storage medium and computer for MFCC extraction - Google Patents


Publication number
CN115840877B
Authority
CN
China
Prior art keywords
data stream
function
data
parallel
source
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202211558715.XA
Other languages
Chinese (zh)
Other versions
CN115840877A (en)
Inventor
施建明
李鹏
王功
王伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Technology and Engineering Center for Space Utilization of CAS
Original Assignee
Technology and Engineering Center for Space Utilization of CAS
Application filed by Technology and Engineering Center for Space Utilization of CAS
Priority to CN202211558715.XA
Publication of CN115840877A
Application granted
Publication of CN115840877B

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00: Reducing energy consumption in communication networks
    • Y02D30/70: Reducing energy consumption in communication networks in wireless communication networks

Abstract

The invention relates to a distributed stream processing method, system, storage medium and computer for MFCC extraction. The method comprises the following steps: obtaining original data streams of multi-source signals in parallel, wherein the data type of the original data stream of the multi-source signal is String data; performing parallel flat mapping on the original data stream of the multi-source signal to obtain a multi-source discrete signal data stream; performing a data stream windowing operation on the multi-source discrete signal data stream to obtain parallel continuous sliding windows; and extracting Mel frequency cepstrum coefficients from the parallel continuous sliding windows by using a parallel window processing function to obtain the Mel frequency cepstrum coefficient data stream corresponding to the multi-source signal. By applying flat mapping and windowing to the data stream, the invention allows the Mel frequency cepstrum coefficient extraction for multi-source data streams to be executed in parallel, improves the efficiency and timeliness of the extraction, and avoids both the lag of offline processing of large amounts of data and the processing pressure caused by a huge data volume.

Description

Distributed stream processing method, system, storage medium and computer for MFCC extraction
Technical Field
The invention relates to the field of industrial big data and signal processing and is used for real-time processing of vibration signals and sound signals of mechanical equipment; in particular, it relates to a distributed stream processing method, system, storage medium and computer for MFCC extraction.
Background
MFCC stands for Mel Frequency Cepstrum Coefficient. It is commonly used in the processing of sound signals and is also used in vibration signal processing. In the operation and maintenance of mechanical equipment, vibration monitoring and sound monitoring are common technical means, and the MFCC extracted from these signals is used for anomaly detection and fault diagnosis of the equipment.
Known MFCC extraction methods all perform offline analysis after the acquisition of vibration data or sound data is complete, and the MFCC extraction algorithm needs to frame the acquired signals. As the number of monitored devices and the number of vibration and sound sensors grow, framing the signals produces a huge matrix, which puts great pressure on offline feature extraction, and the lag in feature extraction makes it difficult to support MFCC-based online fault diagnosis.
Disclosure of Invention
When the number of monitored devices is large and the number of vibration and sound sensors keeps growing, framing the signals produces a huge matrix, which puts great pressure on offline feature extraction, and the lag in feature extraction makes it difficult to support MFCC-based online fault diagnosis. To solve these technical problems, the invention provides a distributed stream processing method, system, storage medium and computer for MFCC extraction.
The technical scheme for solving the technical problems is as follows:
a distributed stream processing method for MFCC extraction, comprising the steps of:
obtaining original data streams of the multi-source signals in parallel; wherein, the data type of the original data stream of the multi-source signal is String data;
parallel flat mapping is carried out on the original data stream of the multi-source signal to obtain a multi-source discrete signal data stream;
performing data stream windowing operation on the multi-source discrete signal data stream to obtain parallel continuous sliding windows;
and extracting the Mel frequency cepstrum coefficient from the parallel continuous sliding window by using a parallel window processing function to obtain a Mel frequency cepstrum coefficient data stream corresponding to the multi-source signal.
The beneficial effects of the invention are as follows: vibration, sound and similar signals generate many data points every millisecond. If the data points were sent one by one, a point that occurred earlier could well arrive at the processing system later, so the data received by the processing system would be out of order. The invention therefore packs the data collected over several milliseconds into one segment in String format for transmission. The signal data points within a segment are arranged in their original order of occurrence, so their order is preserved; and because consecutive segments are generated several milliseconds apart, the order between segments is also preserved when network transmission is normal. When the original data stream has multiple data sources, flat mapping and partitioned windowing of the data stream allow the Mel frequency cepstrum coefficient extraction for the multi-source data stream to be executed in parallel, which improves the efficiency and timeliness of the extraction and avoids both the lag of offline processing of large amounts of data and the processing pressure caused by a huge data volume.
On the basis of the technical scheme, the invention can be improved as follows.
Further, the String data at least comprises a time stamp, a sensor ID, a signal value and a separator, wherein the sensor ID is a sensor number corresponding to the original data stream.
Further, performing parallel flat mapping on the multi-source original data stream to obtain a multi-source discrete signal data stream comprises the following step: performing parallel flat mapping on the multi-source original data stream by using a Flink stream processing method to obtain the multi-source discrete signal data stream.
The advantage of this further scheme is that Flink stream processing is distributed and offers high throughput and low latency, which improves data processing efficiency.
Further, performing the data stream windowing operation on the multi-source discrete signal data stream to obtain parallel continuous sliding windows comprises the following steps:
performing a keyBy operation on the multi-source discrete signal data stream according to the sensor ID to obtain a keyed data stream; the keyBy operation specifically comprises sending the multi-source discrete signal data with the same sensor ID to a designated partition;
and performing the data stream windowing operation on the keyed data stream in each partition to obtain the parallel continuous sliding windows corresponding to each sensor.
Further, extracting mel frequency cepstrum coefficients from the continuous sliding window by using a parallel window processing function to obtain a mel frequency cepstrum coefficient data stream, comprising the following steps:
storing the data in each sliding window in a corresponding double-precision array by utilizing the parallel window processing function;
and calling a Mel frequency cepstral coefficient extraction function for each double-precision array to obtain the Mel frequency cepstral coefficient data stream.
Further, the mel frequency cepstrum coefficient extraction function comprises a main function and a plurality of sub-functions, wherein the plurality of sub-functions are respectively a mel filter bank function, a discrete cosine transform function, a fast fourier transform function and a hamming window function;
and inputting the double-precision array into the main function, and calculating the double-precision array by the main function by calling a plurality of sub-functions to obtain a Mel frequency cepstrum coefficient.
In order to solve the technical problems, the invention also provides a multi-source signal mel-frequency cepstrum coefficient extraction distributed stream processing system, which comprises the following specific technical contents:
a distributed stream processing system for MFCC extraction, comprising:
the data acquisition module is used for acquiring the original data stream of the multi-source signal in parallel; wherein, the data type of the original data stream of the multi-source signal is String data;
the data processing module is used for carrying out parallel flat mapping on the original data stream of the multi-source signal to obtain a multi-source discrete signal data stream; performing data stream windowing operation on the multi-source discrete signal data stream to obtain parallel continuous sliding windows; and extracting the Mel frequency cepstrum coefficient from the continuous sliding window by using a parallel window processing function to obtain Mel frequency cepstrum coefficient data stream corresponding to the multi-source signal.
The invention also provides a storage medium based on the distributed stream processing method of the multi-source signal Mel frequency cepstrum coefficient extraction, which has the following technical contents:
a storage medium storing a computer program which, when executed by a processor of a computer, implements the above multi-source signal mel-frequency cepstral coefficient extraction distributed stream processing method.
The invention also provides a computer based on the distributed stream processing method for multi-source signal Mel frequency cepstrum coefficient extraction, which has the following technical contents:
a computer comprising a memory and a processor, the memory storing a computer program that, when executed by the processor, implements the distributed stream processing method of MFCC extraction described above.
Drawings
FIG. 1 is a flow chart of the distributed stream processing method for MFCC extraction in embodiment 1 of the present invention;
FIG. 2 is a schematic diagram of the signal stream flat mapping process in embodiment 1 of the present invention;
FIG. 3 is a schematic diagram of the structure of the MFCC extraction function set in embodiment 1 of the present invention;
FIG. 4 is a graph of the original vibration signal in embodiment 3 of the present invention;
FIG. 5 is a diagram of the vibration signal data stream records in embodiment 3 of the present invention;
FIG. 6 is a block diagram of the Flink MFCC extraction task program in embodiment 3 of the present invention.
Detailed Description
The principles and features of the present invention are described below with reference to the drawings. The examples are given only to illustrate the invention and are not to be construed as limiting its scope.
Example 1
As shown in fig. 1, the present embodiment provides a distributed stream processing method for MFCC extraction, including the following steps:
S1, acquiring original data streams of multi-source signals in parallel; wherein the data type of the original data stream is String data; the String data at least comprises a time stamp, a sensor ID, signal values and a separator, wherein the sensor ID is the sensor number corresponding to the original data stream. Vibration and sound signal sources are often sampled at a high frequency. At an acquisition frequency of 20 kHz, for example, 20 points are generated every millisecond. To avoid out-of-order arrival when the signal is transmitted over the network, the Kafka producer repeatedly packs the points collected over several milliseconds into one message record and sends it. Kafka is an open-source stream processing platform developed by the Apache Software Foundation; it is a high-throughput distributed publish-subscribe messaging system written in Scala and Java.
S2, carrying out parallel flat mapping on the original data stream of the multi-source signal to obtain a multi-source discrete signal data stream; specifically, parallel flat mapping is carried out on the original data stream of the multi-source signal by using a Flink stream processing method to obtain the multi-source discrete signal data stream; in big data scenarios, the Kafka producer serves as the data source, and the Flink program pulls and consumes data from Kafka.
S3, carrying out a data stream windowing operation on the multi-source discrete signal data stream to obtain parallel continuous sliding windows; the specific steps are as follows: performing a keyBy operation on the multi-source discrete signal data stream according to the sensor ID to obtain a keyed data stream; the keyBy operation specifically comprises sending the multi-source discrete signal data with the same sensor ID to the same designated partition;
and carrying out the data stream windowing operation on the keyed data stream in each designated partition to obtain parallel continuous sliding windows.
S4, extracting the Mel frequency cepstrum coefficient from the parallel continuous sliding window by using a parallel window processing function to obtain a Mel frequency cepstrum coefficient data stream corresponding to the multi-source signal. The method comprises the following specific steps:
storing the data in each sliding window in a corresponding double-precision array by using a parallel window processing function;
calling a mel frequency cepstrum coefficient extraction function for each double-precision array to obtain the mel frequency cepstrum coefficient data stream;
the mel frequency cepstrum coefficient extraction function comprises a main function and a plurality of sub-functions, wherein the sub-functions are respectively a mel filter bank function, a discrete cosine transform function, a fast Fourier transform function and a Hamming window function;
and inputting the double-precision array into the main function, which computes the Mel frequency cepstrum coefficients from the double-precision array by calling the sub-functions. The main function first calls the Mel filter bank generating function to construct a Mel filter bank; it then calls the DCT coefficient matrix generating function to obtain the DCT coefficients; next, it performs a fast Fourier transform on the double-precision array to obtain the spectrum; finally, it filters the spectrum, i.e. multiplies the spectrum by the Mel filter bank, takes the logarithm, and multiplies the result by the DCT coefficients to obtain the Mel frequency cepstrum coefficients.
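The record format described in S1 can be illustrated with a minimal sketch. The field order (time stamp, sensor ID, then signal values), the comma separator, and the names `packRecord` and `pointsPerRecord` are assumptions made for illustration; the text only states that a record contains a time stamp, a sensor ID, signal values and a separator. The sketch uses plain Scala and does not call the Kafka client.

```scala
object RecordPacking {
  // Assumed layout: "timestamp,sensorId,v1,v2,...,vn" with "," as the separator.
  def packRecord(ts: Long, sensorId: Int, values: Seq[Double], sep: String = ","): String =
    (Seq(ts.toString, sensorId.toString) ++ values.map(_.toString)).mkString(sep)

  // Number of sample points packed into one record for a given
  // sampling rate (Hz) and packing interval (ms).
  def pointsPerRecord(sampleRateHz: Int, intervalMs: Int): Int =
    sampleRateHz / 1000 * intervalMs

  def main(args: Array[String]): Unit = {
    // 20 kHz sampling packed every 8 ms gives 160 points per record.
    val n = pointsPerRecord(20000, 8)
    val values = Seq.tabulate(n)(i => math.sin(i * 0.1)) // synthetic signal, not real data
    val record = packRecord(1670000000000L, 1, values)
    println(s"points per record: $n, record length: ${record.length} characters")
  }
}
```

Because the points inside one record keep their acquisition order, only the order between records depends on the network, which is the property the packing is meant to provide.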
As shown in fig. 2, the original data of the multi-source signal is sent through the corresponding Kafka partitions, so the Flink stream processing program can read data from the data source in parallel. The original data stream formed after reading consists of records in String format, which then need to be flat-mapped. The flatMap operator of Flink is used for this conversion, and its core is a custom FlatMapFunction. The main steps of the flatMap operator are as follows: split the String record into a string array Array[String] according to the separator; loop over Array[String] starting from the 3rd element, which is the 1st signal value, and in each iteration emit one data element packed in the following sample class (case class) format:
case class VibElement(ts:Long,sensorId:Int,acc:Double)
where ts is the timestamp of the original record, sensorId is the sensor ID of the original record, and acc is the discrete signal value.
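The splitting step described above can be sketched in plain Scala as follows. This is only an illustration of the per-record logic: the comma separator, the assumption that the time stamp comes before the sensor ID, and the name `flatten` are hypothetical, and in a Flink job this logic would sit inside the custom FlatMapFunction rather than return a list.

```scala
object FlatMapSketch {
  case class VibElement(ts: Long, sensorId: Int, acc: Double)

  // Assumed record layout: element 0 = time stamp, element 1 = sensor ID,
  // elements 2.. = signal values (so the 3rd element is the 1st signal value).
  def flatten(record: String, sep: String = ","): List[VibElement] = {
    val parts = record.split(sep)
    val ts = parts(0).trim.toLong
    val sensorId = parts(1).trim.toInt
    // Every signal value becomes one element carrying the record's ts and sensor ID.
    parts.drop(2).toList.map(v => VibElement(ts, sensorId, v.trim.toDouble))
  }

  def main(args: Array[String]): Unit =
    flatten("100,3,0.5,-1.25,2.0").foreach(println)
}
```

One String record with n signal values therefore yields n `VibElement` items, which is what turns the packed record stream into a discrete signal stream.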
As shown in fig. 3, the data stream obtained after the flat mapping conversion is subjected to a keyBy operation according to the sensor ID to form a keyed data stream, so that the subsequent windowing and window processing are performed on the different sensor data streams in parallel. The main purpose of keyBy is to send data with the same key to the same partition: the data are originally distributed across different slots, i.e. partitions, and keyBy pulls the data of the same key into the same slot.
The windowing operation uses a count-based sliding window, whose two parameters are the window length (the number of data elements) and the sliding step. In signal MFCC extraction applications, the window length is typically set to 256 and the sliding step to 128.
When a window has collected all of its data, the data is stored in an array of Double type, and the MFCC extraction function is then called on this array to obtain the MFCC as a DenseVector[Double]. To identify the sensor ID to which the MFCC data belongs and the time stamps of the signal window, the final extraction result is packed into a sample class of the following type for output:
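The effect of keying by sensor ID and then applying a count-based sliding window can be sketched on an in-memory collection. This is an illustration with the Scala standard library, not the Flink operator: `groupBy` stands in for keyBy, `sliding` stands in for the count window, and the choice to keep only full windows is an assumption.

```scala
object CountSlidingWindowSketch {
  case class VibElement(ts: Long, sensorId: Int, acc: Double)

  // Group elements by sensor ID, then cut each sensor's sequence into
  // count-based sliding windows. Only full windows are kept (assumption).
  def windows(stream: Seq[VibElement], length: Int = 256, step: Int = 128): Map[Int, List[List[VibElement]]] =
    stream.groupBy(_.sensorId).map { case (id, elems) =>
      id -> elems.toList.sliding(length, step).filter(_.size == length).toList
    }

  def main(args: Array[String]): Unit = {
    val stream = (0 until 512).map(i => VibElement(i.toLong, 1, i.toDouble))
    val w = windows(stream)
    println(s"sensor 1: ${w(1).size} windows, second window starts at ts ${w(1)(1).head.ts}")
  }
}
```

With a step of half the window length, consecutive windows overlap by 50%, so each sample after the first 128 contributes to two windows.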
case class MfccResult(startts:Long,endts:Long,
sensorId:Int,mfcc:DenseVector[Double])
where startts is the time stamp of the 1st data element in the window, i.e. the window start time stamp; endts is the time stamp of the last data element in the window, i.e. the window end time stamp; and sensorId is the key carried by the keyed data stream. All time stamps here are event times, i.e. the acquisition times of the sensor signals.
As shown in fig. 3, Scala is a development language commonly used in big data technology. So that the Flink main program for signal stream feature extraction can call it directly, an MFCC extraction function set developed in Scala is provided, which comprises a main function, a Mel filter bank generating function, a DCT function, an FFT function and a Hamming window function. The DCT function generates the discrete cosine transform coefficient matrix, the FFT function performs the fast Fourier transform, and the Mel filter bank generating function constructs the Mel filter bank.
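The packaging of one window into a result can be sketched as below. To keep the example free of external dependencies, `Array[Double]` stands in for `DenseVector[Double]` (the text does not name the library that provides it), and the `extract` parameter and the name `process` are hypothetical placeholders for the MFCC extraction function and the window processing function.

```scala
object MfccResultSketch {
  case class VibElement(ts: Long, sensorId: Int, acc: Double)
  // Array[Double] replaces DenseVector[Double] here (assumption, avoids a library dependency).
  case class MfccResult(startts: Long, endts: Long, sensorId: Int, mfcc: Array[Double])

  // Copy the window's signal values into a Double array, call the extraction
  // function, and attach the window's first and last event time stamps.
  def process(sensorId: Int, window: Seq[VibElement],
              extract: Array[Double] => Array[Double]): MfccResult = {
    val samples = window.map(_.acc).toArray
    MfccResult(window.head.ts, window.last.ts, sensorId, extract(samples))
  }

  def main(args: Array[String]): Unit = {
    val window = (0 until 256).map(i => VibElement(1000L + i / 160 * 8, 1, i.toDouble))
    // Placeholder extractor that returns the first 12 samples; not an MFCC.
    val r = process(1, window, x => x.take(12))
    println(s"sensor ${r.sensorId}: window ${r.startts}..${r.endts}, ${r.mfcc.length} values")
  }
}
```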
The flow, inputs and outputs, and function call relationships of the main function are as follows:
(1) Main function:
The input of the main function is an array x of type Array[Double], together with the sampling rate fs, the Mel filter order p and the FFT length N; the data type of these three parameters is Int, i.e. the integer data type, and p is generally set to 24. FFT denotes the fast Fourier transform, and the array x is the double-precision array described above.
The output of the main function is the MFCC, of type DenseVector[Double] and length p/2.
The main flow of the main function is: first, call the Mel filter bank generating function to construct a Mel filter bank; then, call the DCT coefficient matrix generating function to obtain the DCT coefficients; next, perform a fast Fourier transform on the array x to obtain the spectrum; finally, filter the spectrum, i.e. multiply the spectrum by the filter bank, take the logarithm, and multiply the result by the DCT coefficients to obtain the MFCC.
(2) Mel filter bank generating function: the three parameters fs, p and N input to the main function are passed to this function, and a conventional Mel filter bank generation method is used to obtain a Mel filter bank of type DenseMatrix[Double], whose matrix size is:
p×(N/2+1)
(3) DCT coefficient matrix generating function: the parameter p input to the main function is taken as the input of this function, and a conventional DCT coefficient generation method is used to obtain a DCT coefficient matrix of type DenseMatrix[Double], whose size is p/2 × p. The DCT coefficients are discrete cosine transform coefficients.
(4) FFT function: the array x input to the main function is taken as the input of this function, and a conventional FFT method is used to obtain an output of type Array[Double]; within the FFT function, Hamming window filtering is first applied to x.
(5) Hamming window filtering function: the array x passed in from the FFT function is taken as the input of this function, and a conventional Hamming window construction method is used to obtain an array of the same length as the filtered result.
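The call structure of the function set described above can be sketched in plain Scala. This is a minimal sketch under stated assumptions, not the patented implementation: nested arrays replace `DenseMatrix`/`DenseVector`, a naive O(N²) DFT stands in for the FFT, the power spectrum is used, the zeroth cepstral coefficient is skipped in the DCT matrix, the Mel scale uses the common 2595·log10(1 + f/700) formula, and a small constant guards the logarithm. All function names are hypothetical.

```scala
object MfccSketch {
  def hz2mel(f: Double): Double = 2595.0 * math.log10(1.0 + f / 700.0)
  def mel2hz(m: Double): Double = 700.0 * (math.pow(10.0, m / 2595.0) - 1.0)

  // Hamming window applied to x; the output has the same length as x.
  def hamming(x: Array[Double]): Array[Double] = {
    val n = x.length
    Array.tabulate(n)(i => x(i) * (0.54 - 0.46 * math.cos(2.0 * math.Pi * i / (n - 1))))
  }

  // Power spectrum (bins 0..N/2) of the Hamming-windowed frame.
  // A naive DFT is used here in place of an FFT to keep the sketch short.
  def powerSpectrum(x: Array[Double], nfft: Int): Array[Double] = {
    val w = hamming(x).padTo(nfft, 0.0)
    Array.tabulate(nfft / 2 + 1) { k =>
      var re = 0.0
      var im = 0.0
      var t = 0
      while (t < nfft) {
        val a = -2.0 * math.Pi * k * t / nfft
        re += w(t) * math.cos(a)
        im += w(t) * math.sin(a)
        t += 1
      }
      re * re + im * im
    }
  }

  // Triangular Mel filter bank of size p x (N/2 + 1).
  def melBank(fs: Int, p: Int, nfft: Int): Array[Array[Double]] = {
    val melMax = hz2mel(fs / 2.0)
    // p + 2 edge frequencies, equally spaced in Mel, expressed as fractional FFT bins.
    val edges = Array.tabulate(p + 2)(i => mel2hz(melMax * i / (p + 1)) / fs * nfft)
    Array.tabulate(p, nfft / 2 + 1) { (m, k) =>
      val (lo, mid, hi) = (edges(m), edges(m + 1), edges(m + 2))
      if (k <= lo || k >= hi) 0.0
      else if (k <= mid) (k - lo) / (mid - lo)
      else (hi - k) / (hi - mid)
    }
  }

  // DCT coefficient matrix of size p/2 x p (cepstral indices 1..p/2, assumption).
  def dctMatrix(p: Int): Array[Array[Double]] =
    Array.tabulate(p / 2, p)((i, j) => math.cos(math.Pi * (i + 1) * (j + 0.5) / p))

  // Main function: filter bank, DCT matrix, spectrum, then filter, log, DCT.
  def mfcc(x: Array[Double], fs: Int, p: Int, nfft: Int): Array[Double] = {
    val bank = melBank(fs, p, nfft)
    val dct = dctMatrix(p)
    val spec = powerSpectrum(x, nfft)
    val logE = bank.map(row => math.log(row.zip(spec).map { case (a, b) => a * b }.sum + 1e-12))
    dct.map(row => row.zip(logE).map { case (a, b) => a * b }.sum)
  }

  def main(args: Array[String]): Unit = {
    // Synthetic 1 kHz tone sampled at 20 kHz, one 256-point window.
    val x = Array.tabulate(256)(i => math.sin(2 * math.Pi * 1000 * i / 20000.0))
    println(mfcc(x, 20000, 24, 256).map(v => f"$v%.3f").mkString(", "))
  }
}
```

With p = 24 and N = 256 the filter bank has 24 × 129 entries and the output has 12 coefficients, matching the sizes p × (N/2+1) and p/2 stated in the text.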
In the embodiment of the invention, the data of the original data stream is String data. Vibration, sound and similar signals generate many data points every millisecond; if the data points were sent one by one, a point that occurred earlier could well arrive at the processing system later, so the data received by the processing system would be out of order. The invention therefore packs the data collected over several milliseconds into one segment in String format for transmission. The signal data points within a segment are arranged in their original order of occurrence, so their order is preserved; and because consecutive segments are generated several milliseconds apart, the order between segments is also preserved when network transmission is normal. When the original data stream has multiple data sources, flat mapping and windowing of the data stream allow the Mel frequency cepstrum coefficient extraction for the multi-source data stream to be executed in parallel, which improves the efficiency and timeliness of the extraction and avoids both the lag of offline processing of large amounts of data and the processing pressure caused by a huge data volume. Flink stream processing is distributed and offers high throughput and low latency, which improves data processing efficiency. Unlike the offline Mel frequency cepstrum coefficient feature extraction of the prior art, the embodiment of the invention extracts the Mel frequency cepstrum coefficients under the stream processing paradigm, and no framing operation is needed in the extraction process.
Example 2
Based on embodiment 1, this embodiment provides a distributed stream processing system for MFCC extraction, including a data acquisition module and a data processing module;
the data acquisition module is used for acquiring the original data stream of the multi-source signal in parallel; wherein, the data type of the original data stream is String data; the specific data acquisition mode is as follows: the Kafka producer is used as a data source, a multi-source signal original data stream in a String format is continuously transmitted, and the Flink program pulls and consumes data from the Kafka.
The data processing module is used for carrying out parallel flat mapping on the original data stream of the multi-source signal to obtain a multi-source discrete signal data stream; performing data stream windowing operation on the multi-source discrete signal data stream to obtain parallel continuous sliding windows; and extracting the Mel frequency cepstrum coefficient from the parallel continuous sliding window by using a parallel window processing function to obtain a Mel frequency cepstrum coefficient data stream corresponding to the multi-source signal.
The original data stream is subjected to parallel flat mapping to obtain a multi-source discrete signal data stream; specifically, a flatMap operator processed by a Flink stream is utilized to carry out parallel flat mapping on the original data stream, so as to obtain a multi-source discrete signal data stream.
The data stream windowing operation is performed on the multi-source discrete signal data stream to obtain parallel continuous sliding windows; the specific steps are as follows: performing a keyBy operation on the multi-source discrete signal data stream according to the sensor ID to obtain a keyed data stream; the keyBy operation specifically comprises sending the multi-source discrete signal data with the same sensor ID to the same designated partition;
and carrying out the data stream windowing operation on the keyed data stream in each designated partition to obtain parallel continuous sliding windows.
And extracting the Mel frequency cepstrum coefficient from the parallel continuous sliding window by using a parallel window processing function to obtain a Mel frequency cepstrum coefficient data stream corresponding to the multi-source signal. The method comprises the following specific steps:
storing the data in each sliding window in a corresponding double-precision array by utilizing a window function;
calling a mel frequency cepstrum coefficient extraction function for each double-precision array to obtain the mel frequency cepstrum coefficient data stream;
the mel frequency cepstrum coefficient extraction function comprises a main function and a plurality of sub-functions, wherein the sub-functions are respectively a mel filter bank function, a discrete cosine transform function, a fast Fourier transform function and a Hamming window function;
and inputting the double-precision array into the main function, and calculating the double-precision array by the main function by calling a plurality of sub-functions to obtain a Mel frequency cepstrum coefficient.
Example 3
Based on embodiment 1, this embodiment provides an experimental verification process and verification results for the distributed stream processing method of MFCC extraction and the distributed stream processing system of MFCC extraction. The specific experimental process and verification results are as follows:
the method and the system provided by the invention are verified by taking acceleration signals acquired by vibration sensors as an example. In this example, a total of 4 vibration sensors are arranged to acquire vibration signals of the device in real time, the sampling frequency of the signals is 20 kHz, and fig. 4 is a graph of the vibration signal within 1 s.
The Kafka producer publishes the signal streams of the 4 sensors in parallel; each sensor produces 1 record every 8 ms, containing 160 sample points, resulting in the data stream records shown in fig. 5.
Test items and test methods:
the MFCC extraction stream processing method and system are tested with 4 vibration signal data sources, and the main test items and test methods are shown in table 1:
Table 1 Test items and test methods
(Table 1 is provided as an image in the original publication: Figure BDA0003983668700000111)
Test process and results:
the MFCC extraction distributed stream processing function test is as follows:
run the Kafka producer program to transmit in parallel the vibration signal data acquired by the 4 vibration sensors. Each sensor transmits 160 sampling points every 8 ms, which form 1 record; 20000 records are transmitted continuously, so the data stream lasts about 2.7 minutes. During this period, the test checks whether the MFCC extraction result data stream is extracted and output normally.
With a counting window of length 256 and sliding step 128, and 3200000 vibration sampling points per sensor, the accumulated number of extracted MFCC results is as follows:
(The accumulated counts are provided as an image in the original publication: Figure BDA0003983668700000112)
the test results show that:
data stream generation and MFCC feature extraction stay synchronized, and the MFCC feature extraction task is triggered and completed as soon as a signal is sent to the processing system; repeated tests show that the feature extraction results are fully consistent with the offline processing results, indicating that the program design and operation are correct; the number of MFCC results extracted per sensor is 24999, demonstrating 100% data processing integrity.
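The figures reported in this test can be cross-checked with a short calculation. The sketch below assumes that only full windows are counted; the function name `fullWindows` is introduced here for illustration.

```scala
object WindowCountCheck {
  // Number of full count-based sliding windows over n elements.
  def fullWindows(n: Long, length: Int, step: Int): Long =
    if (n < length) 0L else (n - length) / step + 1

  def main(args: Array[String]): Unit = {
    val records = 20000L                      // records sent per sensor
    val pointsPerSensor = records * 160       // 160 points per record -> 3,200,000
    val perSensor = fullWindows(pointsPerSensor, 256, 128)
    val durationMinutes = records * 8 / 1000.0 / 60.0 // one record every 8 ms
    println(s"points per sensor: $pointsPerSensor")
    println(s"windows per sensor: $perSensor, for 4 sensors: ${perSensor * 4}")
    println(f"stream duration: $durationMinutes%.2f minutes")
  }
}
```

The calculation gives (3200000 − 256) / 128 + 1 = 24999 windows per sensor and 99996 for the four sensors, in line with the counts stated in the test results, and a stream duration of 160 s, i.e. about 2.7 minutes.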
MFCC feature extraction delay time test:
to test the feature extraction delay time, the Kafka producer program and the Flink feature extraction stream processing main program are run on the same host. The window end event time is subtracted from the computer system time at the instant the MFCC feature processing completes, giving the delay time of each MFCC generation. In total, 99996 delay time samples are obtained for the feature extraction of the four sensors, and the relevant statistics are shown in table 2.
Table 2 Delay time test results
(Table 2 is provided as an image in the original publication: Figure BDA0003983668700000121)
The test data is first transmitted from the Kafka producer program running on the local host to the Kafka cluster, and is then pulled from the Kafka cluster by the Flink main program running on the local host. Actual measurement shows that the feature extraction itself takes very little time per window, less than 1 ms; the delay is therefore mostly introduced by network transmission. Even so, an overall delay averaging more than 30 milliseconds is sufficient to show that extracting MFCC features by stream processing is highly efficient, i.e. the delay from the generation of the multi-source signal original data to the output of the corresponding MFCC features is very short.
Test of feature extraction program in computer cluster:
As shown in FIG. 6, the Flink MFCC extraction program and its third-party dependencies are packaged and deployed to the Hadoop cluster. A YARN session for executing the Flink task is opened with the "./bin/yarn-session.sh -nm mfccflinktest -d" command. The Flink task is then submitted with the "./bin/flink run -c org.atcsu.mfcc.vibmfccrealtem /opt/program/flinkmfcc-1.0.0.jar" command. The Kafka producer program is run to generate the multi-source vibration signal raw data. In cluster mode, the program runs normally and the MFCC results are extracted correctly in real time.
Experiments in this embodiment verify that the cluster test results are consistent with the local single-machine test results. This shows that, in the computer cluster processing mode, Mel frequency cepstrum coefficient (MFCC) features are extracted correctly from the original data stream of the multi-source signal, processing completes as the data is generated, and the data processing integrity rate is 100%. The invention therefore allows the Mel frequency cepstrum coefficient extraction of multi-source data streams to be executed in real time and in parallel, improves the efficiency and timeliness of the extraction, and avoids both the lag of offline processing of large amounts of data and the processing pressure caused by the huge volume of data to be processed.
Example 4
Based on embodiment 1, this embodiment provides a storage medium storing a computer program that, when executed by a processor of a computer, implements the above-described distributed stream processing method for MFCC extraction. A storage medium is a carrier that stores data, such as a floppy disk, optical disc, DVD, hard disk, flash memory, USB flash drive, CF card, SD card, MMC card, SM card, Memory Stick, or xD card. The storage medium may also be NAND-flash based, such as a USB flash drive, CF card, SD card, SDHC card, MMC card, SM card, Memory Stick, or xD card.
By storing the program in the storage medium and having the processor execute it, the invention implements the distributed stream processing method for MFCC extraction and thereby improves data processing efficiency.
Example 5
Based on embodiment 1, this embodiment provides a computer including a memory and a processor, where the memory stores a computer program that, when executed by the processor, implements the above-described distributed stream processing method for MFCC extraction. By running or executing the software programs and/or modules stored in the memory and invoking the data stored in the memory, the processor performs the various functions of the terminal and processes data, thereby monitoring the terminal as a whole, for example by implementing the above-described distributed stream processing method for MFCC extraction. There may be one or more processors, and a processor may also be implemented as a combination of computing devices.
This embodiment of the invention improves data processing efficiency by using a computer to run the program or application module corresponding to the distributed stream processing method for MFCC extraction.
The foregoing describes only preferred embodiments of the invention and is not intended to limit the invention. Any modifications, equivalents, and alternatives falling within the spirit and principles of the invention are intended to be included within the scope of the invention.

Claims (7)

  1. A distributed stream processing method for MFCC extraction, comprising the steps of:
    obtaining original data streams of the multi-source signals in parallel; wherein, the data type of the original data stream of the multi-source signal is String data;
    parallel flat mapping is carried out on the original data stream of the multi-source signal to obtain a multi-source discrete signal data stream;
    performing data stream windowing operation on the multi-source discrete signal data stream to obtain parallel continuous sliding windows;
    extracting Mel frequency cepstrum coefficients from the continuous sliding windows in parallel by using a parallel window processing function to obtain a Mel frequency cepstrum coefficient data stream corresponding to the multi-source signal;
    wherein extracting the Mel frequency cepstrum coefficients from the continuous sliding windows by using the parallel window processing function to obtain the Mel frequency cepstrum coefficient data stream comprises the following steps:
    storing the data in each sliding window in a corresponding double-precision array by utilizing the parallel window processing function;
    calling a mel frequency cepstrum coefficient extraction function for each double-precision array to obtain the mel frequency cepstrum coefficient data stream;
    the mel frequency cepstrum coefficient extraction function comprises a main function and a plurality of sub-functions, wherein the plurality of sub-functions are a mel filter bank function, a discrete cosine transform (DCT) function, a fast Fourier transform (FFT) function and a Hamming window function respectively;
    and inputting the double-precision array into the main function, and calculating the double-precision array by the main function by calling a plurality of sub-functions to obtain a Mel frequency cepstrum coefficient.
  2. The method of claim 1, wherein the String data comprises at least a timestamp, a sensor ID, a plurality of signal values, and a separator, wherein the sensor ID is a sensor number corresponding to the original data stream of the multi-source signal.
  3. The distributed stream processing method for MFCC extraction according to claim 2, wherein performing parallel flat mapping on the original data stream of the multi-source signal to obtain a multi-source discrete signal data stream comprises the following step: performing a parallel flat mapping operation on the original data stream by using a Flink stream processing method to obtain the multi-source discrete signal data stream.
  4. The distributed stream processing method for MFCC extraction according to claim 3, wherein performing the data stream windowing operation on the multi-source discrete signal data stream to obtain parallel continuous sliding windows comprises the following steps:
    performing a keyBy operation on the multi-source discrete signal data stream according to the sensor ID to obtain a keyed data stream; the keyBy operation specifically sends the discrete signal data streams with the same sensor ID to a designated partition;
    and carrying out data stream windowing operation on the keyed data stream in each partition to obtain parallel continuous sliding windows corresponding to each sensor.
  5. A distributed stream processing system for MFCC extraction, comprising:
    the data acquisition module is used for acquiring the original data stream of the multi-source signal in parallel; wherein, the data type of the original data stream of the multi-source signal is String data;
    the data processing module is used for carrying out parallel flat mapping on the original data stream of the multi-source signal to obtain a multi-source discrete signal data stream; performing data stream windowing operation on the multi-source discrete signal data stream to obtain a continuous sliding window; extracting Mel frequency cepstrum coefficients from the continuous sliding window by using a parallel window processing function to obtain Mel frequency cepstrum coefficient data stream;
    wherein extracting the Mel frequency cepstrum coefficients from the continuous sliding window by using the parallel window processing function to obtain the Mel frequency cepstrum coefficient data stream comprises the following steps:
    storing the data in each sliding window in a corresponding double-precision array by utilizing the parallel window processing function;
    calling a mel frequency cepstrum coefficient extraction function for each double-precision array to obtain the mel frequency cepstrum coefficient data stream;
    the mel frequency cepstrum coefficient extraction function comprises a main function and a plurality of sub-functions, wherein the plurality of sub-functions are a mel filter bank function, a discrete cosine transform (DCT) function, a fast Fourier transform (FFT) function and a Hamming window function respectively;
    and inputting the double-precision array into the main function, and calculating the double-precision array by the main function by calling a plurality of sub-functions to obtain a Mel frequency cepstrum coefficient.
  6. A storage medium storing a computer program which, when executed by a processor of a computer, implements the distributed stream processing method for MFCC extraction according to any one of claims 1 to 4.
  7. A computer comprising a memory and a processor, the memory storing a computer program that, when executed by the processor, implements the distributed stream processing method for MFCC extraction according to any one of claims 1 to 4.
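The flat mapping of claims 2 and 3 and the function structure of claim 1 (a main function calling Hamming window, FFT, mel filter bank and DCT sub-functions) can be illustrated with a minimal pure-Python sketch. It is not the claimed Flink implementation. The record layout `<timestamp>,<sensor_id>,<v1> <v2> ...`, the sample rate of 16000 Hz, the 26 filters and the 13 output coefficients are all assumptions made for illustration, and every function name is hypothetical.

```python
import cmath
import math


def flat_map_record(record, field_sep=",", value_sep=" "):
    """Flat-map one String record into (sensor_id, timestamp, value) samples.

    Assumed layout (hypothetical): "<timestamp>,<sensor_id>,<v1> <v2> ...".
    """
    timestamp, sensor_id, values = record.split(field_sep, 2)
    return [(sensor_id, int(timestamp), float(v)) for v in values.split(value_sep) if v]


def hamming_window(n):
    """Sub-function: Hamming window coefficients of length n."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def fft(x):
    """Sub-function: recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k], out[k + n // 2] = even[k] + t, even[k] - t
    return out


def mel_filter_bank(n_filters, n_fft, sample_rate):
    """Sub-function: triangular filters spaced evenly on the mel scale."""
    to_mel = lambda f: 2595.0 * math.log10(1.0 + f / 700.0)
    to_hz = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    mel_max = to_mel(sample_rate / 2.0)
    # n_filters + 2 edge frequencies, mapped to FFT bin indices.
    edges = [to_hz(mel_max * i / (n_filters + 1)) for i in range(n_filters + 2)]
    bins = [int((n_fft + 1) * f / sample_rate) for f in edges]
    bank = [[0.0] * (n_fft // 2 + 1) for _ in range(n_filters)]
    for m in range(1, n_filters + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):  # rising edge
            bank[m - 1][k] = (k - left) / (center - left)
        for k in range(center, right):  # falling edge
            bank[m - 1][k] = (right - k) / (right - center)
    return bank


def dct(values, n_coeffs):
    """Sub-function: first n_coeffs terms of the unnormalized DCT-II."""
    n = len(values)
    return [
        sum(v * math.cos(math.pi * k * (2 * i + 1) / (2 * n)) for i, v in enumerate(values))
        for k in range(n_coeffs)
    ]


def mfcc(window, sample_rate=16000, n_filters=26, n_coeffs=13):
    """Main function: Hamming -> FFT power spectrum -> mel energies -> log -> DCT."""
    n = len(window)
    framed = [s * w for s, w in zip(window, hamming_window(n))]
    power = [abs(c) ** 2 / n for c in fft(framed)[: n // 2 + 1]]
    bank = mel_filter_bank(n_filters, n, sample_rate)
    energies = [sum(f * p for f, p in zip(row, power)) for row in bank]
    # Small offset avoids log(0) when a filter collects no energy.
    return dct([math.log(e + 1e-12) for e in energies], n_coeffs)
```

Calling `mfcc` on one 256-sample window returns a list of 13 coefficients under these assumed parameters; in a stream setting, the window processing function would call it once per sliding window of each sensor.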
CN202211558715.XA 2022-12-06 2022-12-06 Distributed stream processing method, system, storage medium and computer for MFCC extraction Active CN115840877B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211558715.XA CN115840877B (en) 2022-12-06 2022-12-06 Distributed stream processing method, system, storage medium and computer for MFCC extraction


Publications (2)

Publication Number Publication Date
CN115840877A CN115840877A (en) 2023-03-24
CN115840877B CN115840877B (en) 2023-07-07

Family

ID=85578169


Country Status (1)

Country Link
CN (1) CN115840877B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant