CN117746905B - Human activity influence assessment method and system based on time-frequency persistence analysis - Google Patents
- Publication number
- CN117746905B (CN202410179329.2A)
- Authority
- CN
- China
- Prior art keywords
- signal
- signal frame
- energy
- frequency
- effective
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Landscapes
- Measurement Of Mechanical Vibrations Or Ultrasonic Waves (AREA)
Abstract
The invention relates to the technical field of voice analysis and processing, and in particular to a human activity influence assessment method and system based on time-frequency persistence analysis. The method comprises the following steps: collecting audio signals and framing them to obtain signal frames, and constructing an amplitude characteristic group for each signal period; calculating the active significant factor of each signal frame; obtaining the effective steady-state factor of each signal frame from its active significant factor, the information entropy of its amplitudes and the amplitude characteristic groups of its signal periods; constructing the effective confidence weight of each signal frame, and calculating the high-energy effective factor of each signal frame in combination with the signal attenuation rates of its frequency components in the frequency domain; forming an effective two-dimensional vector from the short-time energy and the high-energy effective factor; and calculating the energy-efficient persistence coefficient of each signal frame to extract the active frames, on the basis of which the human activity influence is evaluated. The human activity influence is thereby evaluated accurately, and the difficulty that the traditional VAD endpoint detection algorithm has in accurately identifying active frames is avoided.
Description
Technical Field
The invention relates to the technical field of voice analysis and processing, in particular to a human activity influence assessment method and system based on time-frequency persistence analysis.
Background
With the deepening of the sustainable development concept and the continuous progress of science and technology, the rational use and protection of natural resources have been widely promoted. Human activities have a profound and wide-ranging influence on the natural environment and on ecosystems. The purpose of human activity influence assessment is to measure the degree to which human activities change an ecosystem, so that the relevant personnel can subsequently take corresponding management and protection measures. Illegal logging refers to the illegal felling and unauthorized harvesting of forest resources, which severely damages the balance and sustainable use of the ecosystem. By monitoring the sound signals in an ecosystem, illegal logging in the environment can be monitored and assessed, promoting the sustainable development and protection of the ecological environment.
Because the ecosystem environment is complex and contains a mixture of environmental sounds and noise, and the traditional VAD (Voice Activity Detection) endpoint detection algorithm determines the active-frame threshold only from the short-time energy and the zero crossing rate, active frames are difficult to judge accurately when natural ecosystem sound is analyzed, and the interference of silent segments and non-target sound segments cannot be eliminated, which degrades the subsequent assessment of human activity influence.
Disclosure of Invention
In order to solve the technical problems, the invention aims to provide a human activity influence assessment method and a system based on time-frequency persistence analysis, and the adopted technical scheme is as follows:
In a first aspect, an embodiment of the present invention provides a method for evaluating the influence of human activity based on time-frequency persistence analysis, the method comprising the steps of:
Collecting an audio signal of a natural ecosystem and carrying out framing treatment to obtain each signal frame; drawing a corresponding time domain waveform diagram of the sound signal corresponding to each signal frame;
Constructing an amplitude characteristic group of the signal period according to the maximum value and the minimum value of the amplitude of each period in the time domain waveform diagram of the signal frame; obtaining active significant factors of each signal frame according to the main frequency energy corresponding to each frequency component of each signal frame in the frequency domain and the zero crossing rate of the signal frame; obtaining effective steady-state factors of the signal frames according to the active significant factors of the signal frames, the information entropy of the corresponding amplitude and the cosine similarity among the amplitude characteristic groups of the signal periods; obtaining effective confidence weights of the signal frames according to the effective steady-state factors of the signal frames and the 3dB bandwidths of the frequency components of the signal frames in the frequency domain; calculating the signal attenuation rate of each frequency component of the signal frame in the frequency domain, and obtaining the high-energy effective factor of each signal frame according to the effective confidence weight of each signal frame and the distribution condition of the signal attenuation rate of each frequency component in the frequency domain; the short-time energy and the high-energy effective factor of each signal frame are formed into an effective two-dimensional vector of each signal frame; constructing high-energy-efficiency continuous coefficients of each signal frame according to the short-time energy and high-energy effective factors of each signal frame and the pearson correlation coefficients between the effective two-dimensional vectors of each signal frame and other signal frames;
and combining the high-energy-efficiency continuous coefficients of the signal frames in the monitoring interval with the Otsu method to obtain the active frames, and evaluating the human activity influence according to the active frames.
Further, the constructing an amplitude feature set of the signal period according to the maximum value and the minimum value of the amplitude of each period in the time domain waveform diagram of the signal frame includes: the maximum and minimum values of the amplitude are combined into amplitude characteristic groups corresponding to the signal period.
Further, the obtaining the active significant factor of each signal frame according to the main frequency energy corresponding to each frequency component of each signal frame in the frequency domain and the zero crossing rate of the signal frame includes:
taking the zero crossing rate of the signal frame as an index of an exponential function based on a natural constant;
counting the maximum main frequency energy value of all frequency components of the signal frame in the frequency domain, and calculating the sum of the differences between the maximum main frequency energy value and the main frequency energy of each frequency component;
And taking the result of the product of the inverse of the sum value and the exponential function as an active significant factor of a signal frame.
Further, the obtaining the effective steady-state factor of each signal frame according to the active significant factor of each signal frame, the information entropy of the corresponding amplitude and the cosine similarity between the amplitude characteristic groups of each signal period includes:
Calculating the information entropy of all amplitudes contained in the signal frame, and obtaining the ratio of the active significant factor of the signal frame to the information entropy; calculating the sum of the cosine similarities between the amplitude characteristic groups of every two signal periods of the signal frame, and taking the product of the ratio and the sum as the effective steady-state factor of the signal frame.
Further, the obtaining the effective confidence weight of each signal frame according to the effective steady-state factor of each signal frame and the 3dB bandwidth of each frequency component of the signal frame in the frequency domain comprises the following steps:
Calculating the average value of the 3dB bandwidths of all the frequency components of the signal frame, obtaining the absolute value of the difference value between the 3dB bandwidths of all the frequency components of the signal frame in the frequency domain and the average value, and obtaining the reciprocal of the sum value of the absolute value of the difference values of all the frequency components of the signal frame in the frequency domain;
Taking the product of the effective steady-state factor of the signal frame and the reciprocal as the effective confidence weight of the signal frame.
Further, the calculating the signal attenuation rate of each frequency component of the signal frame in the frequency domain includes: the ratio of the main frequency energy of each frequency component to the 3dB bandwidth of each frequency component is taken as the signal attenuation rate of each frequency component.
Further, the obtaining the high-energy effective factor of each signal frame according to the effective confidence weight of each signal frame and the distribution condition of the signal attenuation rate of each frequency component in the frequency domain includes:
And calculating the average value of the signal attenuation rates of all frequency components in the signal frame, obtaining the sum value of the signal attenuation rates of all frequency components in the frequency domain of the signal frame and the absolute value of the difference value of the average value, and taking the product of the effective confidence weight of the signal frame and the reciprocal of the sum value as the high-energy effective factor of the signal frame.
Further, the constructing the energy-efficient persistence coefficient of each signal frame according to the short-time energy, the high-energy effective factor of each signal frame and the pearson correlation coefficient between the effective two-dimensional vectors of each signal frame and other signal frames specifically includes:
for each signal frame, calculating the sum of pearson correlation coefficients between effective two-dimensional vectors of the signal frame and other signal frames, and taking the sum as an index of an exponential function based on a natural constant;
And calculating the difference absolute value of the high-energy effective factors of the signal frames and the average value of the high-energy effective factors of all the signal frames, obtaining the ratio of the short-time energy of the signal frames to the difference absolute value, and taking the product of the ratio and the calculation result of the exponential function as the high-energy efficiency continuous coefficient of the signal frames.
Further, obtaining each active frame by combining the high-energy-efficiency continuous coefficients of the signal frames in the monitoring interval with the Otsu method, and evaluating the human activity influence according to the active frames, comprises the following steps:
presetting the duration of a monitoring interval, calculating the segmentation threshold of the high-energy-efficiency continuous coefficients of all signal frames in the monitoring interval by the Otsu method, and judging a signal frame to be an active frame when its high-energy-efficiency continuous coefficient is greater than or equal to the segmentation threshold;
Taking all active frames in the monitoring interval as input, and splicing them into a voice segment by using the PSOLA (pitch synchronous overlap-add) algorithm; when the duration of the voice segment of the ecological area in the monitoring interval is greater than or equal to a preset time threshold, human activity has an influence on the natural ecosystem; otherwise, it has no influence.
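The thresholding step above can be illustrated with a minimal sketch. It assumes the coefficients of one monitoring interval are available as a non-empty list of numbers; the function names `otsu_threshold` and `active_frame_indices` are hypothetical, and the search over sorted values is one simple way to maximize the between-class variance, not necessarily the histogram-based variant a practitioner would choose.

```python
def otsu_threshold(values):
    """Return the threshold t that maximises the between-class variance
    of the two classes {v < t} and {v >= t} (exact search over sorted values)."""
    ordered = sorted(values)
    n = len(ordered)
    total = sum(ordered)
    best_t, best_var = ordered[0], -1.0
    left_sum = 0.0
    for i in range(1, n):
        left_sum += ordered[i - 1]
        if ordered[i] == ordered[i - 1]:
            continue  # a split must fall between two distinct values
        w0, w1 = i / n, (n - i) / n              # class weights
        m0 = left_sum / i                         # mean of the lower class
        m1 = (total - left_sum) / (n - i)         # mean of the upper class
        var = w0 * w1 * (m0 - m1) ** 2
        if var > best_var:
            best_var, best_t = var, ordered[i]
    return best_t


def active_frame_indices(coefficients):
    """Indices of frames whose coefficient is >= the segmentation threshold."""
    t = otsu_threshold(coefficients)
    return [j for j, c in enumerate(coefficients) if c >= t]


coeffs = [0.1, 0.2, 0.15, 5.0, 6.0, 0.05]
print(otsu_threshold(coeffs), active_frame_indices(coeffs))
```

If all coefficients are equal, this sketch returns that common value as the threshold, so every frame is treated as active; a practitioner may prefer to handle that case separately.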
In a second aspect, embodiments of the present invention also provide a human activity impact assessment system based on time-frequency persistence analysis, comprising a memory, a processor and a computer program stored in the memory and running on the processor, the processor implementing the steps of any one of the methods described above when executing the computer program.
The invention has at least the following beneficial effects:
according to the method, the effective confidence weight of the signal frame is built according to the signal characteristics in the monitoring interval, the time domain, the frequency domain characteristics and the signal activity stability characteristics of signals in each signal frame in the monitoring interval are comprehensively considered, the high-energy-efficiency continuous coefficient is obtained by combining the noise interference degree of the sound signals in the signal frame and the signal continuous rule characteristics, the effective signal degree of the continuous rule contained in the signal frame can be accurately measured, the high-energy-efficiency continuous coefficient sequence is built according to the high-energy-efficiency continuous coefficient to obtain the segmentation threshold, whether the signal frame belongs to the active frame is judged through the segmentation threshold, the defect that the traditional VAD endpoint detection algorithm is difficult to accurately judge the active frame is avoided, meanwhile, the active frame judged according to the method can better reflect the human activity condition in the ecological region in the monitoring interval, the human activity influence is estimated through the total duration of the active frame voice segment, and the accurate estimation of the human activity influence combined with sound time-frequency continuous analysis is realized.
Drawings
In order to more clearly illustrate the embodiments of the invention or the technical solutions and advantages of the prior art, the following description will briefly explain the drawings used in the embodiments or the description of the prior art, and it is obvious that the drawings in the following description are only some embodiments of the invention, and other drawings can be obtained according to the drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of the steps of a human activity impact assessment method based on time-frequency persistence analysis according to an embodiment of the present invention;
FIG. 2 is a schematic representation of a time domain waveform of an acoustic signal in a natural ecosystem;
FIG. 3 is a schematic diagram of a time-domain spectrogram of an acoustic signal in a natural ecosystem;
Fig. 4 is a schematic representation of the spectral amplitude of an acoustic signal in a natural ecosystem.
Detailed Description
In order to further describe the technical means and effects adopted by the present invention to achieve the preset purpose, the following detailed description refers to the specific implementation, structure, characteristics and effects of the human activity influence assessment method and system based on time-frequency persistence analysis according to the present invention, which are described in detail below with reference to the accompanying drawings and the preferred embodiments. In the following description, different "one embodiment" or "another embodiment" means that the embodiments are not necessarily the same. Furthermore, the particular features, structures, or characteristics of one or more embodiments may be combined in any suitable manner.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.
The following specifically describes a specific scheme of the human activity influence evaluation method and system based on time-frequency persistence analysis provided by the invention with reference to the accompanying drawings.
Referring to fig. 1, a flowchart illustrating a method for evaluating human activity impact based on time-frequency persistence analysis according to an embodiment of the invention is shown, the method includes the following steps:
Step S001, collecting and acquiring sound signals in the natural ecosystem, and preprocessing the acquired audio signals.
The sound signals in the natural ecosystem are continuously collected by mono sound sensors. The effective monitoring range of each sound sensor used in the invention is a 20 m range centered on the sensor. The ecosystem is divided according to the placement positions and effective monitoring ranges of all the sound sensors (that is, the monitoring range of each sensor is taken as one ecological area). The signal sampling rate of the sound sensors is 22.05 kHz, and every 10 min is set as one monitoring interval. In this embodiment the number of sound sensors is 625; the practitioner can set this number according to the actual situation.
To facilitate the subsequent characteristic analysis of the sound signals in each ecological area, the sound signals also need to be framed. In the framing operation of the invention, the frame length is set to 20 ms, and the frame shift is chosen so that the previous frame and the current frame overlap by 50%. To analyze the frequency-domain characteristics of the sound signals more accurately in the subsequent steps, a Hamming window is applied to each frame. Framing is a known technique and is not described further in this embodiment.
Thus, each signal frame of each ecological area in the monitoring interval can be obtained.
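As an illustration of step S001, the following minimal sketch frames a list of samples with the parameters given above (22.05 kHz sampling rate, 20 ms frames, 50% overlap, Hamming window). The function name `frame_signal` is hypothetical, and the frame shift is derived here from the stated overlap, which is an assumption of this sketch.

```python
import math


def frame_signal(samples, sample_rate=22050, frame_ms=20, overlap=0.5):
    """Split samples into Hamming-windowed frames with the given overlap."""
    frame_len = int(round(sample_rate * frame_ms / 1000))  # 441 samples at 22.05 kHz
    hop = max(1, int(frame_len * (1.0 - overlap)))         # frame shift in samples
    # Standard Hamming window: 0.54 - 0.46 * cos(2*pi*n / (N - 1))
    window = [0.54 - 0.46 * math.cos(2 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]
    frames = []
    # Trailing samples that do not fill a whole frame are dropped.
    for start in range(0, len(samples) - frame_len + 1, hop):
        chunk = samples[start:start + frame_len]
        frames.append([s * w for s, w in zip(chunk, window)])
    return frames


one_second = [1.0] * 22050
frames = frame_signal(one_second)
print(len(frames), len(frames[0]))
```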
Step S002, effective confidence weights of the signal frames are constructed according to the signal characteristics in the monitoring interval, high-energy-efficiency continuous coefficients are obtained through the combination of the effective confidence weights and the noise interference degree of the signal frames and the signal continuous rule characteristics, high-energy-efficiency continuous coefficient sequences are constructed according to the high-energy-efficiency continuous coefficients of the signal frames, whether the signal frames belong to the active frames is judged through the segmentation threshold, and the endpoint detection of sound signals in all ecological areas in the monitoring interval is achieved.
Illegal logging involves the use of mechanical equipment such as electric saws and other tools, which generate vibration and high-frequency noise when operating, so the resulting sound signal has a high zero crossing rate and high energy. Because illegal logging lasts for a certain duration, the sound signal it generates also has strong regularity. Natural environmental sounds in an ecological area (including but not limited to wind and water sounds) have wide spectral characteristics, and have a low zero crossing rate and low signal energy. Animal sounds usually belong to high-frequency information and have high energy and a high zero crossing rate, but owing to the characteristics of animal sound signals their stability is poorer than that of illegal logging sound signals. The time domain waveform diagram of the sound signal in the natural ecosystem is shown in fig. 2, where the abscissa is time and the ordinate is the amplitude of the waveform; the time domain spectrogram of the sound signal in the natural ecosystem is shown in fig. 3, where the abscissa is frequency and the ordinate is amplitude; the spectrum amplitude diagram of the sound signal in the natural ecosystem is shown in fig. 4, where the abscissa is discrete frequency and the ordinate is amplitude.
Taking one ecological area as an example, a time domain waveform diagram is drawn from the amplitudes of the sound signal corresponding to each signal frame, and the amplitude sequence of the signal frame is constructed from those amplitudes in time order. The sound signal in each signal frame is converted to the frequency domain by Fourier transformation, the fundamental frequency envelope in the spectrum of the signal frame is determined, and the frequency corresponding to the maximum energy value in the fundamental frequency envelope is taken as the fundamental frequency. The reciprocal of the fundamental frequency gives the signal period length, and the signal frame is divided into signal periods of that length. It should be noted that many methods exist for obtaining the signal period; the practitioner may use other existing techniques, and this embodiment is not limited to a specific one. The amplitude feature set of the t-th signal period in the j-th signal frame is constructed from the maximum and minimum amplitude of that period and is recorded as $A_{j,t} = (a_{j,t}^{\max}, a_{j,t}^{\min})$, wherein $a_{j,t}^{\max}$ and $a_{j,t}^{\min}$ are respectively the maximum and minimum values of the amplitude in the t-th signal period of the j-th signal frame.
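The construction of the amplitude feature sets can be sketched as follows, assuming the fundamental frequency of the frame has already been estimated by some method. The function names are hypothetical, and rounding the period to a whole number of samples and dropping an incomplete trailing period are simplifications of this sketch.

```python
def period_length_in_samples(fundamental_hz, sample_rate=22050):
    """Signal period = 1 / fundamental frequency, expressed in whole samples."""
    return max(1, int(round(sample_rate / fundamental_hz)))


def amplitude_feature_groups(frame, period_len):
    """Return [(max amplitude, min amplitude), ...], one pair per signal period."""
    if period_len <= 0:
        raise ValueError("period_len must be positive")
    groups = []
    # An incomplete trailing period is ignored.
    for start in range(0, len(frame) - period_len + 1, period_len):
        period = frame[start:start + period_len]
        groups.append((max(period), min(period)))
    return groups


frame = [0.0, 1.0, 0.0, -1.0] * 3
print(amplitude_feature_groups(frame, 4))
```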
Taking the maximum energy value of each frequency component (namely each signal envelope) of the signal frame in the frequency domain as the main frequency energy of the frequency component, constructing an energy extremum sequence of the signal frame according to the sequence of the frequency components in the spectrogram by the main frequency energy of all the frequency components of the signal frame in the frequency domain, and simultaneously calculating the 3dB bandwidth of each frequency component in the signal frame to be Wb, wherein the 3dB bandwidth calculation of the frequency components is a known technology, and will not be repeated here.
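A crude sketch of this step is given below. It assumes the power spectrum of a frame is available as a list, treats every local maximum as one frequency component (a simplification of the envelope-based definition above), and measures the 3 dB bandwidth as the contiguous run of bins around the peak whose power is at least half the peak power. The name `spectral_components` and the parameter `bin_hz` are hypothetical, and the resolution is limited to whole bins.

```python
def spectral_components(power, bin_hz=1.0, min_power=0.0):
    """Return [(main frequency energy, 3 dB bandwidth in Hz), ...] per local maximum."""
    components = []
    for i in range(1, len(power) - 1):
        is_peak = power[i] > power[i - 1] and power[i] >= power[i + 1]
        if not is_peak or power[i] <= min_power:
            continue
        half = power[i] / 2.0  # half power corresponds to the -3 dB point
        left = i
        while left > 0 and power[left - 1] >= half:
            left -= 1
        right = i
        while right < len(power) - 1 and power[right + 1] >= half:
            right += 1
        components.append((power[i], (right - left + 1) * bin_hz))
    return components


spectrum = [0, 1, 4, 10, 6, 1, 0, 0, 3, 8, 3, 0]
print(spectral_components(spectrum, bin_hz=50.0))
```

Closely spaced peaks whose valley stays above half power would overlap in this sketch; a practitioner would likely interpolate the half-power crossings or use an established peak-width routine.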
Based on the above analysis, the sound signals of illegal logging present in the sound segment corresponding to a signal frame are recorded as effective signals, and an effective confidence weight is constructed for each signal frame to represent the possibility that the corresponding sound segment contains effective signals. The calculation process is as follows:
$$H_j = \frac{e^{Z_j}}{\sum_{k=1}^{K_j}\left(P_j^{\max} - P_{j,k}\right) + \mu}$$
wherein $H_j$ is the active significant factor of the j-th signal frame; $K_j$ is the total number of frequency components of the j-th signal frame in the frequency domain; $P_j^{\max}$ and $P_{j,k}$ are respectively the maximum main frequency energy value of all frequency components of the j-th signal frame in the frequency domain and the main frequency energy of the k-th frequency component in the frequency domain of the j-th signal frame; $Z_j$ is the zero crossing rate of the sound signal in the j-th signal frame, whose calculation is a known technique and is not repeated here; $\mu$ is an adjustment parameter with an empirical value of 1, which prevents the denominator from being 0 in the calculation, and its value can be set by the practitioner.
When the sum of the differences between the maximum main frequency energy value in the j-th signal frame and the main frequency energy of each frequency component in the signal frame is smaller, i.e. the smaller $\sum_{k=1}^{K_j}\left(P_j^{\max} - P_{j,k}\right)$ is, the more concentrated the energy distribution among the frequency components in the corresponding frequency domain, i.e. the more pronounced the sound component in the signal frame. At the same time, when the zero crossing rate of the sound signal in the signal frame is larger, i.e. the larger $Z_j$ is, the faster the waveform of the sound signal in the signal frame changes and the more active the sound component in the signal frame, i.e. the larger the active significant factor $H_j$.
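The active significant factor described above can be sketched as follows, assuming the main frequency energies of the frame's frequency components are already available. The function names are hypothetical, and the zero crossing rate is normalized here as the fraction of adjacent sample pairs that change sign, which is one common convention rather than one fixed by this embodiment.

```python
import math


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    if len(frame) < 2:
        return 0.0
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def active_significant_factor(main_energies, zcr, mu=1.0):
    """exp(zcr) divided by (sum of (max energy - energy_k) + mu)."""
    peak = max(main_energies)
    spread = sum(peak - e for e in main_energies)
    return math.exp(zcr) / (spread + mu)


zcr = zero_crossing_rate([1.0, -1.0, 1.0, -1.0])
print(active_significant_factor([10.0, 8.0, 6.0], zcr))
```

With equal energies in all components the denominator reduces to the adjustment parameter, which gives the largest value for a given zero crossing rate.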
$$S_j = \frac{H_j}{I_j}\sum_{t=1}^{T_j}\sum_{\substack{v=1\\ v\neq t}}^{T_j}\cos\left(A_{j,t}, A_{j,v}\right)$$
wherein $S_j$ is the effective steady-state factor of the j-th signal frame; $H_j$ is the active significant factor of the j-th signal frame; $I_j$ is the information entropy of the amplitude sequence corresponding to the j-th signal frame, namely the information entropy of all the amplitudes contained in the j-th signal frame; $T_j$ is the total number of signal periods in the j-th signal frame; $A_{j,t}$ and $A_{j,v}$ are respectively the amplitude feature sets of the t-th and v-th signal periods in the j-th signal frame; $\cos\left(A_{j,t}, A_{j,v}\right)$ is the cosine similarity between the amplitude feature sets $A_{j,t}$ and $A_{j,v}$.
When the active significant factor of the j-th signal frame is larger, i.e. the larger $H_j$ is, the more active the sound component in the signal frame. When the information entropy of the amplitude sequence corresponding to the j-th signal frame is smaller, i.e. the smaller $I_j$ is, the less chaotic the fluctuation of the corresponding amplitude sequence. When the sum of the cosine similarities between the amplitude feature sets of all signal periods in the j-th signal frame is larger, the more similar the amplitude feature sets of the signal periods in the signal frame are to one another. The sound component in the signal frame is then more stable as well as more active, i.e. the effective steady-state factor $S_j$ is larger.
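A minimal sketch of the effective steady-state factor follows. The information entropy requires the amplitudes to be discretized; the uniform quantization into `n_bins` levels used here is an assumption, as are the function names and the choice to sum the cosine similarity over all ordered pairs of distinct signal periods.

```python
import math
from collections import Counter


def amplitude_entropy(frame, n_bins=16):
    """Shannon entropy (bits) of the amplitudes after uniform quantisation."""
    lo, hi = min(frame), max(frame)
    if hi == lo:
        return 0.0
    width = (hi - lo) / n_bins
    counts = Counter(min(int((x - lo) / width), n_bins - 1) for x in frame)
    total = len(frame)
    return -sum(c / total * math.log2(c / total) for c in counts.values())


def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.hypot(*u) * math.hypot(*v)
    return dot / norm if norm else 0.0


def effective_steady_state_factor(active_factor, entropy, groups):
    """(active factor / entropy) * sum of cosine similarity over pairs t != v."""
    if entropy <= 0:
        raise ValueError("entropy must be positive (constant frames are not handled)")
    total = sum(cosine_similarity(groups[t], groups[v])
                for t in range(len(groups))
                for v in range(len(groups)) if v != t)
    return active_factor / entropy * total


groups = [(1.0, -1.0)] * 3
print(effective_steady_state_factor(2.0, 1.0, groups))
```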
$$\omega_j = \frac{S_j}{\sum_{n=1}^{K_j}\left|Wb_{j,n} - \overline{Wb}_j\right| + \mu}$$
wherein $\omega_j$ is the effective confidence weight of the j-th signal frame; $S_j$ is the effective steady-state factor of the j-th signal frame; $K_j$ is the total number of frequency components of the j-th signal frame in the frequency domain; $Wb_{j,n}$ and $\overline{Wb}_j$ are respectively the 3 dB bandwidth of the n-th frequency component in the frequency domain of the j-th signal frame and the mean of the 3 dB bandwidths of all frequency components in the j-th signal frame; the positive adjustment term $\mu$ is added to the denominator to avoid the case where the denominator is zero.
When the effective steady-state factor of the j-th signal frame is larger, i.e. the larger $S_j$ is, the more stable the sound component in the signal frame is on top of being more active. When the sum of the absolute differences between the 3 dB bandwidth of each frequency component in the frequency domain of the j-th signal frame and the mean 3 dB bandwidth of all frequency components in the signal frame is smaller, i.e. the smaller $\sum_{n=1}^{K_j}\left|Wb_{j,n} - \overline{Wb}_j\right|$ is, the smaller the frequency intervals between the different frequency components in the signal frame and the more concentrated the frequency distribution. The probability that effective signals exist in the sound segment corresponding to the signal frame is then higher, i.e. the effective confidence weight $\omega_j$ is larger.
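The effective confidence weight can be sketched as below. The function name is hypothetical, and the default value of 1 for the term added to the denominator is an assumption borrowed from the adjustment parameter of the active significant factor; the text does not state its value for this formula.

```python
def effective_confidence_weight(steady_factor, bandwidths, mu=1.0):
    """steady factor / (sum of |bandwidth_n - mean bandwidth| + mu)."""
    mean_bw = sum(bandwidths) / len(bandwidths)
    deviation = sum(abs(b - mean_bw) for b in bandwidths)
    return steady_factor / (deviation + mu)


# Identical bandwidths leave only the adjustment term in the denominator.
print(effective_confidence_weight(6.0, [100.0, 100.0, 100.0]))
print(effective_confidence_weight(6.0, [50.0, 100.0, 150.0]))
```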
By the above method, the effective confidence weight of each signal frame in the ecological area can be obtained. If the effective confidence weight were used directly as the criterion for active frames, neither the degree to which noise interferes with the effective signal in the signal frame nor the degree of persistent regularity of the effective signal would be considered, so active frames could be missed in the subsequent judgment and the final assessment result would be affected.
Based on the analysis, the invention constructs the high-energy-efficiency persistence coefficient of each signal frame, which is used for representing the effective signal degree of the persistence rule contained in the signal frame:
$$G_j = \frac{\omega_j}{\sum_{k=1}^{K_j}\left|R_{j,k} - \bar{R}_j\right|}, \qquad R_{j,k} = \frac{P_{j,k}}{Wb_{j,k}}$$
wherein $G_j$ is the high-energy effective factor of the j-th signal frame; $K_j$ is the total number of frequency components of the j-th signal frame in the frequency domain; $R_{j,k}$ and $\bar{R}_j$ are respectively the signal attenuation rate of the k-th frequency component in the frequency domain of the j-th signal frame and the mean signal attenuation rate of all frequency components in the frequency domain of the j-th signal frame, the signal attenuation rate being calculated as the ratio of the main frequency energy $P_{j,k}$ of the frequency component to the 3 dB bandwidth $Wb_{j,k}$ of the frequency component; $\omega_j$ is the effective confidence weight of the j-th signal frame.
When the sum of the absolute differences between the signal attenuation rate of each frequency component and the mean signal attenuation rate of all frequency components in the frequency domain of the j-th signal frame is smaller, i.e. the smaller $\sum_{k=1}^{K_j}\left|R_{j,k} - \bar{R}_j\right|$ is, the less the sound signal corresponding to the signal frame is disturbed by noise. At the same time, when the effective confidence weight $\omega_j$ of the j-th signal frame is larger, the probability that effective signals exist in the signal frame is higher, i.e. the high-energy effective factor $G_j$ is larger.
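A minimal sketch of the signal attenuation rate and the high-energy effective factor follows. The function names are hypothetical. Since the description adds no adjustment term to this denominator, the sketch raises an error when all attenuation rates are equal; how that case should be handled is left to the practitioner.

```python
def signal_attenuation_rates(main_energies, bandwidths):
    """Attenuation rate of each component = main frequency energy / 3 dB bandwidth."""
    return [e / b for e, b in zip(main_energies, bandwidths)]


def high_energy_effective_factor(confidence_weight, rates):
    """confidence weight / sum of |rate_k - mean rate|."""
    mean_rate = sum(rates) / len(rates)
    deviation = sum(abs(r - mean_rate) for r in rates)
    if deviation == 0:
        raise ValueError("all attenuation rates are equal; the factor is undefined")
    return confidence_weight / deviation


rates = signal_attenuation_rates([10.0, 8.0], [2.0, 1.0])
print(rates, high_energy_effective_factor(6.0, rates))
```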
The short-time energy of the jth signal frame is obtained and denoted $E_j$; in this embodiment it is computed as the sum of the squared amplitudes of all time-domain samples in the signal frame. An effective two-dimensional vector of the signal frame is constructed from its short-time energy and high-energy effective factor and denoted $V_j = (E_j, G_j)$. The energy-efficient persistence coefficient of the signal frame is then obtained as:

$$C_j = \frac{E_j}{\left| G_j - \bar{G} \right| + \varepsilon} \cdot \exp\left( \sum_{z=1,\, z \neq j}^{N} \rho(V_j, V_z) \right)$$

where $C_j$ is the energy-efficient persistence coefficient of the jth signal frame; $E_j$ is the short-time energy of the jth signal frame (short-time energy is a known technique, so its calculation is not described in detail in this embodiment); $G_j$ and $\bar{G}$ are, respectively, the high-energy effective factor of the jth signal frame and the mean high-energy effective factor of all signal frames in the monitoring interval containing the jth signal frame; $N$ is the total number of signal frames in the monitoring interval; $V_j$ and $V_z$ are the effective two-dimensional vectors of the jth and zth signal frames; $\rho(V_j, V_z)$ is the Pearson correlation coefficient between $V_j$ and $V_z$; and $\varepsilon$ is an adjustment parameter with an empirical value of 1 that prevents the denominator from being 0, whose specific value may also be set by the implementer.

The larger the short-time energy $E_j$ of the jth signal frame, the more pronounced the sound components contained in the signal frame. The smaller the absolute difference $\left| G_j - \bar{G} \right|$ between the high-energy effective factor of the jth signal frame and the mean high-energy effective factor of all signal frames in the monitoring interval, the closer the frame's degree of noise interference and its likelihood of containing an effective signal are to the average interference and average likelihood over all signal frames in the monitoring interval. The larger the sum of the Pearson correlation coefficients between the effective two-dimensional vector of the jth signal frame and those of all other signal frames in the monitoring interval, the more strongly the short-time energy and high-energy effective factor of the frame are positively correlated with those of the other frames, the more pronounced the persistence of the effective signal in the frame within the monitoring interval, and the more the frame should be judged an active frame. In each case the energy-efficient persistence coefficient $C_j$ is larger.
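The calculation described above can be sketched as follows. This is an illustrative reading of the text, not the authoritative implementation: the function names are hypothetical, the adjustment parameter defaults to the empirical value 1 mentioned in the embodiment, and returning 0.0 when a Pearson correlation is undefined (one of the vectors has zero variance) is an assumption of the sketch. Note that the Pearson correlation of two two-element vectors, when defined, is always +1 or -1.

```python
import math


def pearson(u, v):
    """Pearson correlation of two equal-length vectors.

    Returns 0.0 when either vector has zero variance; that fallback is an
    assumption of this sketch, since the correlation is undefined there.
    """
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    if su == 0.0 or sv == 0.0:
        return 0.0
    return cov / (su * sv)


def persistence_coefficients(energies, factors, eps=1.0):
    """Energy-efficient persistence coefficient of every frame in one interval.

    energies: short-time energy of each frame.
    factors: high-energy effective factor of each frame.
    eps: adjustment parameter preventing a zero denominator (empirical value 1).
    """
    n = len(energies)
    mean_factor = sum(factors) / n
    vectors = list(zip(energies, factors))  # effective two-dimensional vectors
    coeffs = []
    for j in range(n):
        # Sum of correlations with every other frame in the monitoring interval.
        corr_sum = sum(pearson(vectors[j], vectors[z]) for z in range(n) if z != j)
        ratio = energies[j] / (abs(factors[j] - mean_factor) + eps)
        coeffs.append(ratio * math.exp(corr_sum))
    return coeffs


coeffs = persistence_coefficients([4.0, 9.0, 1.0], [1.0, 2.0, 3.0])
```

In the example, the second frame has the largest short-time energy and a high-energy effective factor equal to the interval mean, so it receives the largest coefficient.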
Thus the energy-efficient persistence coefficient of each signal frame in the monitoring interval can be obtained. The coefficients of all signal frames in the monitoring interval are arranged in time order to form an energy-efficient persistence sequence, and a segmentation threshold is obtained from this sequence by the Otsu (OTSU) method. The input to the Otsu method is the energy-efficient persistence sequence of the monitoring interval, and its output is the segmentation threshold.
When the energy-efficient persistence coefficient of a signal frame is greater than or equal to the segmentation threshold, the signal frame is determined to be an active frame; otherwise, it is determined to be an inactive frame. All active frames in the monitoring interval are then taken as input, and the PSOLA (Pitch Synchronous Overlap Add) algorithm is used to splice them into voice segments. PSOLA is a known technique, so the specific procedure is not repeated here.
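The thresholding step can be illustrated with a small sketch. Otsu's method is usually applied to a histogram; the version below instead searches the sorted coefficient values directly for the split that maximizes the between-class variance, which is an assumption made to keep the example short and dependency-free. The function names are hypothetical, and the input sequence is assumed to be non-empty.

```python
def otsu_threshold(values):
    """Threshold maximizing between-class variance over a 1-D sequence.

    Classes are {x < t} and {x >= t}, matching the rule that a frame is
    active when its coefficient is greater than or equal to the threshold.
    If all values are equal, the smallest value is returned, so every
    frame passes the threshold.
    """
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    best_t, best_var = xs[0], -1.0
    low_sum = 0.0
    for i in range(1, n):
        low_sum += xs[i - 1]
        if xs[i] == xs[i - 1]:
            continue  # a threshold cannot separate equal values
        w0, w1 = i / n, (n - i) / n
        mu0 = low_sum / i
        mu1 = (total - low_sum) / (n - i)
        between_var = w0 * w1 * (mu0 - mu1) ** 2
        if between_var > best_var:
            best_var, best_t = between_var, xs[i]
    return best_t


def classify_active(values):
    """Flag each frame as active (True) or inactive (False)."""
    t = otsu_threshold(values)
    return [v >= t for v in values]


flags = classify_active([0.1, 0.2, 0.15, 5.0, 5.2, 4.9])
```

For the example sequence the threshold falls at 4.9, so the last three frames are flagged active and the first three inactive.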
Thus the voice segments formed by splicing all adjacent active frames in the monitoring interval are obtained. The start point and end point of each voice segment are recorded and used as the output of the VAD endpoint detection algorithm, which completes endpoint detection of the sound signal of the ecological area within the monitoring interval. Because the Otsu method and the VAD endpoint detection algorithm are both known techniques, they are not described further here.
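Recording the start and end points of each run of adjacent active frames can be sketched as below. The sketch only groups frame indices and converts them to times; it does not perform PSOLA splicing. The function name and the assumption that frames are contiguous, non-overlapping and of equal duration are illustrative choices, not details from the patent text.

```python
def active_segments(active_flags, frame_duration):
    """Return (start_time, end_time) for each run of adjacent active frames.

    active_flags: one boolean per frame, in time order.
    frame_duration: duration of one frame in seconds; frames are assumed to be
    contiguous and non-overlapping (an assumption of this sketch).
    """
    segments = []
    start = None
    for i, flag in enumerate(active_flags):
        if flag and start is None:
            start = i  # a new run of active frames begins
        elif not flag and start is not None:
            segments.append((start * frame_duration, i * frame_duration))
            start = None
    if start is not None:
        # The last run extends to the end of the monitoring interval.
        segments.append((start * frame_duration, len(active_flags) * frame_duration))
    return segments


segments = active_segments([False, True, True, False, True], 0.5)
```

With a 0.5 s frame duration, the example yields two segments: one from 0.5 s to 1.5 s and one from 2.0 s to 2.5 s.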
In this way, endpoint detection of the sound signals in the ecological area is achieved, and its result is used to evaluate the influence of human activities on the natural ecosystem.
Step S003: evaluate the influence of human activities from the total duration of the active-frame voice segments of each ecological area within the monitoring interval.
In the above manner, the total duration of the voice segments spliced from active frames in each ecological area within the monitoring interval can be obtained. When the total duration of all active-frame voice segments in the ecological area within the monitoring interval is greater than or equal to a preset time threshold, it is judged that human activity in the ecological area has an influence on the natural ecosystem during the monitoring interval; otherwise, when the total duration is less than the preset time threshold, it is judged that human activity in the ecological area has no influence on the natural ecosystem during the monitoring interval. The preset time threshold may be set by the implementer and is not specifically limited in this embodiment; in this embodiment it is set in terms of the monitoring interval duration.
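The final decision rule reduces to comparing a summed duration with a threshold, as in the sketch below. The function name is hypothetical, and the threshold is passed in as a parameter because its value is left to the implementer.

```python
def human_activity_has_impact(segments, time_threshold):
    """Judge impact from the total duration of active-frame voice segments.

    segments: list of (start_time, end_time) pairs within one monitoring interval.
    time_threshold: preset time threshold, in the same unit as the segment times.
    """
    total_duration = sum(end - start for start, end in segments)
    return total_duration >= time_threshold


# Two segments lasting 1.0 s and 0.5 s give a total duration of 1.5 s.
impact = human_activity_has_impact([(0.5, 1.5), (2.0, 2.5)], 1.5)
no_impact = human_activity_has_impact([(0.5, 1.5), (2.0, 2.5)], 2.0)
```

A total duration equal to the threshold counts as an impact, since the comparison in the text is "greater than or equal to".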
The human activity influence assessment method based on time-frequency persistence analysis is thus implemented in the manner described above.
Based on the same inventive concept as the above method, the embodiment of the present invention further provides a human activity impact assessment system based on time-frequency persistence analysis, which comprises a memory, a processor and a computer program stored in the memory and running on the processor, wherein the processor implements the steps of any one of the above human activity impact assessment methods based on time-frequency persistence analysis when executing the computer program.
In summary, the embodiment of the invention constructs an effective confidence weight for each signal frame from the signal characteristics within the monitoring interval, jointly considering the time-domain characteristics, frequency-domain characteristics and activity-stability characteristics of the signal in each frame. It then combines the degree of noise interference in the sound signal of each frame with the persistence characteristics of the signal to obtain the energy-efficient persistence coefficient, which measures more accurately the degree to which the frame contains an effective signal with a persistent, regular pattern. An energy-efficient persistence sequence is constructed from these coefficients to obtain a segmentation threshold, and the threshold is used to decide whether each signal frame is an active frame. This avoids the difficulty that conventional VAD endpoint detection algorithms have in accurately identifying active frames. The active frames identified in this way also better reflect human activity in the ecological area during the monitoring interval, so that the influence of human activity can subsequently be evaluated accurately from the total duration of the active-frame voice segments.
It should be noted that: the sequence of the embodiments of the present invention is only for description, and does not represent the advantages and disadvantages of the embodiments. And the foregoing description has been directed to specific embodiments of this specification. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.
In this specification, each embodiment is described in a progressive manner, and identical and similar parts of each embodiment are all referred to each other, and each embodiment mainly describes differences from other embodiments.
The foregoing description of the preferred embodiments of the present invention is not intended to be limiting, but rather, any modifications, equivalents, improvements, etc. that fall within the principles of the present invention are intended to be included within the scope of the present invention.
Claims (6)
1. A method for evaluating the impact of human activity based on time-frequency persistence analysis, the method comprising the steps of:
Collecting an audio signal of a natural ecosystem and carrying out framing treatment to obtain each signal frame; drawing a corresponding time domain waveform diagram of the sound signal corresponding to each signal frame;
Constructing an amplitude feature group for each signal period from the maximum value and the minimum value of the amplitude of each period in the time-domain waveform diagram of the signal frame; obtaining an active significant factor of each signal frame from the main-frequency energy of each frequency component of the signal frame in the frequency domain and the zero-crossing rate of the signal frame; obtaining an effective steady-state factor of each signal frame from the active significant factor of the signal frame, the information entropy of its amplitudes and the cosine similarity between the amplitude feature groups of its signal periods; obtaining an effective confidence weight of each signal frame from the effective steady-state factor of the signal frame and the 3dB bandwidth of each frequency component of the signal frame in the frequency domain; calculating the signal attenuation rate of each frequency component of the signal frame in the frequency domain, and obtaining a high-energy effective factor of each signal frame from the effective confidence weight of the signal frame and the distribution of the signal attenuation rates of the frequency components in the frequency domain; forming an effective two-dimensional vector of each signal frame from the short-time energy and the high-energy effective factor of the signal frame; constructing an energy-efficient persistence coefficient of each signal frame from the short-time energy and high-energy effective factor of the signal frame and the Pearson correlation coefficients between the effective two-dimensional vectors of the signal frame and the other signal frames;
Combining the energy-efficient persistence coefficients of all signal frames in the monitoring interval with the Otsu method to obtain the active frames, and evaluating the influence of human activities based on the active frames;
The calculating the signal attenuation rate of each frequency component of the signal frame in the frequency domain comprises the following steps: taking the ratio of the main frequency energy of each frequency component to the 3dB bandwidth of each frequency component as the signal attenuation rate of each frequency component;
The obtaining the high-energy effective factor of each signal frame according to the effective confidence weight of each signal frame and the distribution condition of the signal attenuation rate of each frequency component in the frequency domain comprises the following steps:
Calculating the average value of the signal attenuation rates of all frequency components in the signal frame, obtaining the sum of the absolute values of the differences between the signal attenuation rate of each frequency component of the signal frame in the frequency domain and the average value, and taking the product of the effective confidence weight of the signal frame and the reciprocal of the sum as the high-energy effective factor of the signal frame;
The constructing of the energy-efficient persistence coefficient of each signal frame from the short-time energy and high-energy effective factor of the signal frame and the Pearson correlation coefficients between the effective two-dimensional vectors of the signal frame and the other signal frames specifically comprises:
for each signal frame, calculating the sum of the Pearson correlation coefficients between the effective two-dimensional vectors of the signal frame and the other signal frames, and taking the sum as the exponent of an exponential function with the natural constant as its base;
calculating the absolute value of the difference between the high-energy effective factor of the signal frame and the average value of the high-energy effective factors of all signal frames, obtaining the ratio of the short-time energy of the signal frame to the absolute value of the difference, and taking the product of the ratio and the result of the exponential function as the energy-efficient persistence coefficient of the signal frame;
The combining of the energy-efficient persistence coefficients of all signal frames in the monitoring interval with the Otsu method to obtain the active frames, and the evaluating of the influence of human activities based on the active frames, comprises:
presetting the duration of the monitoring interval, calculating a segmentation threshold for the energy-efficient persistence coefficients of all signal frames in the monitoring interval by the Otsu method, and determining a signal frame to be an active frame when its energy-efficient persistence coefficient is greater than or equal to the segmentation threshold;
taking all active frames in the monitoring interval as input, and splicing all active frames in the monitoring interval into a voice segment using the PSOLA pitch synchronous overlap-add algorithm; when the duration of the voice segment of the ecological area in the monitoring interval is greater than or equal to a preset time threshold, human activity has an influence on the natural ecosystem; otherwise, human activity has no influence.
2. The human activity impact assessment method based on time-frequency persistence analysis according to claim 1, wherein the constructing the amplitude feature set of the signal period according to the maximum value and the minimum value of the amplitude of each period in the time domain waveform of the signal frame comprises: the maximum and minimum values of the amplitude are combined into amplitude characteristic groups corresponding to the signal period.
3. The human activity impact assessment method based on time-frequency persistence analysis according to claim 1, wherein the obtaining the activity significance factor of each signal frame according to the main frequency energy corresponding to each frequency component of each signal frame in the frequency domain and the zero crossing rate of the signal frame comprises:
taking the zero crossing rate of the signal frame as an index of an exponential function based on a natural constant;
determining the maximum main-frequency energy among all frequency components of the signal frame in the frequency domain, and calculating the sum of the differences between the maximum main-frequency energy and the main-frequency energy of each frequency component;
And taking the result of the product of the inverse of the sum value and the exponential function as an active significant factor of a signal frame.
4. The method for evaluating the influence of human activity based on time-frequency persistence analysis according to claim 2, wherein the obtaining the effective steady-state factor of each signal frame according to the activity significance factor of each signal frame, the information entropy of the corresponding amplitude, and the cosine similarity between the amplitude feature sets of each signal period comprises:
Calculating the information entropy of all amplitudes contained in the signal frame, and obtaining the ratio of the active significant factor of the signal frame to the information entropy; and calculating the sum of the cosine similarities between the amplitude feature groups of every pair of signal periods of the signal frame, and taking the product of the ratio and the sum as the effective steady-state factor of the signal frame.
5. The method for evaluating the influence of human activity based on time-frequency persistence analysis according to claim 1, wherein the obtaining the effective confidence weight of each signal frame from the effective steady-state factor of each signal frame and the 3dB bandwidth of each frequency component of the signal frame in the frequency domain comprises:
Calculating the average value of the 3dB bandwidths of all the frequency components of the signal frame, obtaining the absolute value of the difference value between the 3dB bandwidths of all the frequency components of the signal frame in the frequency domain and the average value, and obtaining the reciprocal of the sum value of the absolute value of the difference values of all the frequency components of the signal frame in the frequency domain;
Taking the product of the effective steady-state factor of the signal frame and the reciprocal as the effective confidence weight of the signal frame.
6. Human activity impact assessment system based on time-frequency persistence analysis, comprising a memory, a processor and a computer program stored in the memory and running on the processor, characterized in that the processor implements the steps of the method according to any of claims 1-5 when executing the computer program.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410179329.2A CN117746905B (en) | 2024-02-18 | 2024-02-18 | Human activity influence assessment method and system based on time-frequency persistence analysis |
Publications (2)
Publication Number | Publication Date |
---|---|
CN117746905A (en) | 2024-03-22 |
CN117746905B (en) | 2024-04-19 |
Family
ID=90278080
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202410179329.2A Active CN117746905B (en) | 2024-02-18 | 2024-02-18 | Human activity influence assessment method and system based on time-frequency persistence analysis |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117746905B (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPS5876899A (en) * | 1981-10-31 | 1983-05-10 | 株式会社東芝 | Voice segment detector |
WO2015139452A1 (en) * | 2014-03-17 | 2015-09-24 | 华为技术有限公司 | Method and apparatus for processing speech signal according to frequency domain energy |
WO2018011794A1 (en) * | 2016-07-10 | 2018-01-18 | B.G. Negev Technologies And Applications Ltd., At Ben-Gurion University | Methods and systems for estimation of obstructive sleep apnea severity in wake subjects by multiple speech analyses |
CN116884431A (en) * | 2023-08-03 | 2023-10-13 | 西华大学 | CFCC feature-based robust audio copy-paste tamper detection method and device |
CN116992219A (en) * | 2023-09-07 | 2023-11-03 | 博睿康科技(常州)股份有限公司 | Signal quality characterization unit and noise source positioning method based on noise detection index |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR100586893B1 (en) * | 2004-06-28 | 2006-06-08 | 삼성전자주식회사 | System and method for estimating speaker localization in non-stationary noise environment |
US10783764B2 (en) * | 2016-04-19 | 2020-09-22 | University Of South Carolina | Impact force estimation and event localization |
JP7230545B2 (en) * | 2019-02-04 | 2023-03-01 | 富士通株式会社 | Speech processing program, speech processing method and speech processing device |
Non-Patent Citations (1)
Title |
---|
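Research and Implementation of a Real-Time Voice Activity Detection Algorithm in Multimedia Live Streaming Scenarios; Hao Xue; China Master's Theses Full-text Database, Information Science and Technology Series; 20240115 (No. 01); full text *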
Also Published As
Publication number | Publication date |
---|---|
CN117746905A (en) | 2024-03-22 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||