CN117746905B - Human activity influence assessment method and system based on time-frequency persistence analysis - Google Patents

Human activity influence assessment method and system based on time-frequency persistence analysis

Info

Publication number
CN117746905B
CN117746905B CN202410179329.2A CN202410179329A CN117746905B CN 117746905 B CN117746905 B CN 117746905B CN 202410179329 A CN202410179329 A CN 202410179329A CN 117746905 B CN117746905 B CN 117746905B
Authority
CN
China
Prior art keywords
signal
signal frame
energy
frequency
effective
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202410179329.2A
Other languages
Chinese (zh)
Other versions
CN117746905A (en
Inventor
滕兵
张晓东
蔡宪文
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Bainiao Data Technology Beijing Co ltd
Original Assignee
Bainiao Data Technology Beijing Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Bainiao Data Technology Beijing Co ltd filed Critical Bainiao Data Technology Beijing Co ltd
Priority to CN202410179329.2A priority Critical patent/CN117746905B/en
Publication of CN117746905A publication Critical patent/CN117746905A/en
Application granted granted Critical
Publication of CN117746905B publication Critical patent/CN117746905B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Measurement Of Mechanical Vibrations Or Ultrasonic Waves (AREA)

Abstract

The invention relates to the technical field of voice analysis and processing, and in particular to a human activity influence assessment method and system based on time-frequency persistence analysis. The method comprises: collecting audio signals and framing them to obtain signal frames, and constructing an amplitude feature group for each signal period; calculating the active significant factor of each signal frame; obtaining the effective steady-state factor of each signal frame from its active significant factor, the information entropy of its amplitudes and the amplitude feature groups of its signal periods; constructing the effective confidence weight of each signal frame and, in combination with the signal attenuation rates of the frequency components in the frequency domain, calculating the high-energy effective factor of each signal frame; forming an effective two-dimensional vector from the short-time energy and the high-energy effective factor; and calculating the energy-efficient persistence coefficient of each signal frame to extract the active frames, on the basis of which the human activity influence is evaluated. The human activity influence is thereby evaluated accurately, and the difficulty that the traditional VAD endpoint detection algorithm has in accurately identifying active frames is avoided.

Description

Human activity influence assessment method and system based on time-frequency persistence analysis
Technical Field
The invention relates to the technical field of voice analysis and processing, in particular to a human activity influence assessment method and system based on time-frequency persistence analysis.
Background
With the spread of sustainable development concepts and the continuous progress of science and technology, the rational use and protection of natural resources has been widely promoted. Human activities have a profound and wide-ranging influence on the natural environment and on ecosystems. The purpose of human activity influence assessment is to measure the degree to which human activities change an ecosystem, so that the relevant personnel can subsequently take appropriate management and protection measures. Illegal logging is the human activity of felling and harvesting forest resources unlawfully and without authorization, and it severely damages the balance and sustainable use of the ecosystem. Monitoring the sound signals in an ecosystem makes it possible to monitor and assess illegal logging in that environment, and so promotes the sustainable development and protection of the ecological environment.
Because the ecosystem environment is complex and mixed with various environmental sounds and noise, the traditional VAD (Voice Activity Detection) endpoint detection algorithm, which determines the active-frame threshold only from short-time energy and zero crossing rate, has difficulty identifying active frames accurately when analyzing natural ecosystem sound. It cannot eliminate the interference of silent segments and non-target sound segments, which degrades the subsequent assessment of human activity influence.
Disclosure of Invention
In order to solve the technical problems, the invention aims to provide a human activity influence assessment method and a system based on time-frequency persistence analysis, and the adopted technical scheme is as follows:
In a first aspect, an embodiment of the present invention provides a method for evaluating the influence of human activity based on time-frequency persistence analysis, the method comprising the steps of:
Collecting an audio signal of a natural ecosystem and carrying out framing treatment to obtain each signal frame; drawing a corresponding time domain waveform diagram of the sound signal corresponding to each signal frame;
Constructing an amplitude characteristic group for each signal period according to the maximum value and the minimum value of the amplitude of each period in the time domain waveform diagram of the signal frame; obtaining the active significant factor of each signal frame according to the main frequency energy corresponding to each frequency component of the signal frame in the frequency domain and the zero crossing rate of the signal frame; obtaining the effective steady-state factor of each signal frame according to the active significant factor of the signal frame, the information entropy of the corresponding amplitudes and the cosine similarity between the amplitude characteristic groups of the signal periods; obtaining the effective confidence weight of each signal frame according to the effective steady-state factor of the signal frame and the 3dB bandwidths of the frequency components of the signal frame in the frequency domain; calculating the signal attenuation rate of each frequency component of the signal frame in the frequency domain, and obtaining the high-energy effective factor of each signal frame according to the effective confidence weight of the signal frame and the distribution of the signal attenuation rates of the frequency components in the frequency domain; forming the effective two-dimensional vector of each signal frame from its short-time energy and its high-energy effective factor; constructing the energy-efficient persistence coefficient of each signal frame according to the short-time energy and the high-energy effective factor of the signal frame and the Pearson correlation coefficients between the effective two-dimensional vectors of the signal frame and the other signal frames;
Obtaining the active frames by combining the energy-efficient persistence coefficients of the signal frames in the monitoring interval with the Otsu method, and evaluating the human activity influence according to the active frames.
Further, the constructing an amplitude feature set of the signal period according to the maximum value and the minimum value of the amplitude of each period in the time domain waveform diagram of the signal frame includes: the maximum and minimum values of the amplitude are combined into amplitude characteristic groups corresponding to the signal period.
Further, the obtaining the active significant factor of each signal frame according to the main frequency energy corresponding to each frequency component of each signal frame in the frequency domain and the zero crossing rate of the signal frame includes:
taking the zero crossing rate of the signal frame as the exponent of an exponential function with the natural constant as its base;
finding the maximum main frequency energy among all frequency components of the signal frame in the frequency domain, and calculating the sum of the differences between this maximum and the main frequency energy of each frequency component;
and taking the product of the reciprocal of the sum and the exponential function as the active significant factor of the signal frame.
Further, the obtaining the effective steady-state factor of each signal frame according to the active significant factor of each signal frame, the information entropy of the corresponding amplitude and the cosine similarity between the amplitude characteristic groups of each signal period includes:
Calculating the information entropy of all amplitudes contained in the signal frame, and obtaining the ratio of the active significant factor of the signal frame to the information entropy; calculating the sum of the cosine similarities between the amplitude characteristic groups of every pair of signal periods in the signal frame, and taking the product of the ratio and the sum as the effective steady-state factor of the signal frame.
Further, the obtaining the effective confidence weight of each signal frame according to the effective steady-state factor of each signal frame and the 3dB bandwidth of each frequency component of the signal frame in the frequency domain comprises the following steps:
Calculating the mean of the 3dB bandwidths of all frequency components of the signal frame in the frequency domain, obtaining for each frequency component the absolute value of the difference between its 3dB bandwidth and the mean, and obtaining the reciprocal of the sum of these absolute differences over all frequency components of the signal frame;
Taking the product of the effective steady-state factor of the signal frame and the reciprocal as the effective confidence weight of the signal frame.
Further, the calculating the signal attenuation rate of each frequency component of the signal frame in the frequency domain includes: the ratio of the main frequency energy of each frequency component to the 3dB bandwidth of each frequency component is taken as the signal attenuation rate of each frequency component.
Further, the obtaining the high-energy effective factor of each signal frame according to the effective confidence weight of each signal frame and the distribution condition of the signal attenuation rate of each frequency component in the frequency domain includes:
Calculating the mean of the signal attenuation rates of all frequency components in the signal frame, obtaining the sum, over all frequency components of the signal frame in the frequency domain, of the absolute value of the difference between each signal attenuation rate and the mean, and taking the product of the effective confidence weight of the signal frame and the reciprocal of the sum as the high-energy effective factor of the signal frame.
Further, the constructing the energy-efficient persistence coefficient of each signal frame according to the short-time energy, the high-energy effective factor of each signal frame and the pearson correlation coefficient between the effective two-dimensional vectors of each signal frame and other signal frames specifically includes:
for each signal frame, calculating the sum of the Pearson correlation coefficients between the effective two-dimensional vector of the signal frame and those of the other signal frames, and taking the sum as the exponent of an exponential function with the natural constant as its base;
calculating the absolute value of the difference between the high-energy effective factor of the signal frame and the mean of the high-energy effective factors of all signal frames, obtaining the ratio of the short-time energy of the signal frame to this absolute difference, and taking the product of the ratio and the result of the exponential function as the energy-efficient persistence coefficient of the signal frame.
Further, obtaining the active frames by combining the energy-efficient persistence coefficients of the signal frames in the monitoring interval with the Otsu method, and evaluating the human activity influence according to the active frames, comprises:
presetting the duration of the monitoring interval, calculating the segmentation threshold of the energy-efficient persistence coefficients of all signal frames in the monitoring interval using the Otsu method, and judging a signal frame to be an active frame when its energy-efficient persistence coefficient is greater than or equal to the segmentation threshold;
taking all active frames in the monitoring interval as input, and splicing them into a voice segment using the PSOLA (pitch synchronous overlap-add) algorithm; when the duration of the voice segment of the ecological area in the monitoring interval is greater than or equal to a preset time threshold, human activity has an influence on the natural ecosystem, and otherwise human activity has no influence.
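The thresholding step described above can be sketched in Python as follows. This is a minimal illustration and not the patented implementation: the function names, the 256-bin histogram and the 10 ms frame shift (taken from the 20 ms frame length with 50% overlap described in the embodiment) are assumptions, and PSOLA splicing is not implemented, so the voice-segment duration is only approximated by the number of active frames multiplied by the frame shift.

```python
import numpy as np

def otsu_threshold(values, bins=256):
    """Histogram-based Otsu threshold for a 1-D sequence (bins is an assumption)."""
    values = np.asarray(values, dtype=float)
    counts, edges = np.histogram(values, bins=bins)
    centers = (edges[:-1] + edges[1:]) / 2.0
    w0 = np.cumsum(counts)                  # number of samples in the lower class
    w1 = counts.sum() - w0                  # number of samples in the upper class
    s0 = np.cumsum(counts * centers)
    mu0 = s0 / np.maximum(w0, 1)            # lower-class mean
    mu1 = (s0[-1] - s0) / np.maximum(w1, 1) # upper-class mean
    between = w0 * w1 * (mu0 - mu1) ** 2    # between-class variance (unnormalized)
    return edges[np.argmax(between) + 1]    # upper edge of the best split bin

def active_frame_mask(coeffs):
    """A frame is active when its coefficient is >= the segmentation threshold."""
    coeffs = np.asarray(coeffs, dtype=float)
    return coeffs >= otsu_threshold(coeffs)

def approx_active_duration(mask, frame_shift_s=0.01):
    """Rough voice-segment duration: active-frame count times the assumed frame shift."""
    return float(np.sum(mask)) * frame_shift_s

coeffs = [0.1, 0.2, 0.15, 5.0, 5.2, 4.9]   # hypothetical persistence coefficients
mask = active_frame_mask(coeffs)
duration = approx_active_duration(mask)
```

The resulting duration would then be compared with the preset time threshold to decide whether human activity has influenced the ecological area.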
In a second aspect, embodiments of the present invention also provide a human activity impact assessment system based on time-frequency persistence analysis, comprising a memory, a processor and a computer program stored in the memory and running on the processor, the processor implementing the steps of any one of the methods described above when executing the computer program.
The invention has at least the following beneficial effects:
In the invention, the effective confidence weight of each signal frame is constructed from the signal characteristics within the monitoring interval, taking into account the time-domain characteristics, the frequency-domain characteristics and the activity and stability characteristics of the signal in each signal frame. The energy-efficient persistence coefficient is then obtained by combining the degree of noise interference in the signal frame with the persistence regularity of the signal, so that the extent to which a signal frame contains a persistent, regular effective signal can be measured accurately. An energy-efficient persistence coefficient sequence is constructed from these coefficients to obtain a segmentation threshold, and the segmentation threshold is used to judge whether each signal frame is an active frame, which avoids the difficulty that the traditional VAD endpoint detection algorithm has in accurately identifying active frames. The active frames identified in this way better reflect the human activity in the ecological area during the monitoring interval; the human activity influence is evaluated from the total duration of the voice segment formed by the active frames, thereby achieving an accurate assessment of human activity influence based on time-frequency persistence analysis of sound.
Drawings
In order to more clearly illustrate the embodiments of the invention or the technical solutions and advantages of the prior art, the following description will briefly explain the drawings used in the embodiments or the description of the prior art, and it is obvious that the drawings in the following description are only some embodiments of the invention, and other drawings can be obtained according to the drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of the steps of a human activity impact assessment method based on time-frequency persistence analysis according to an embodiment of the present invention;
FIG. 2 is a schematic representation of a time domain waveform of an acoustic signal in a natural ecosystem;
FIG. 3 is a schematic diagram of a time-domain spectrogram of an acoustic signal in a natural ecosystem;
Fig. 4 is a schematic representation of the spectral amplitude of an acoustic signal in a natural ecosystem.
Detailed Description
In order to further describe the technical means and effects adopted by the present invention to achieve the preset purpose, the following detailed description refers to the specific implementation, structure, characteristics and effects of the human activity influence assessment method and system based on time-frequency persistence analysis according to the present invention, which are described in detail below with reference to the accompanying drawings and the preferred embodiments. In the following description, different "one embodiment" or "another embodiment" means that the embodiments are not necessarily the same. Furthermore, the particular features, structures, or characteristics of one or more embodiments may be combined in any suitable manner.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.
The following specifically describes a specific scheme of the human activity influence evaluation method and system based on time-frequency persistence analysis provided by the invention with reference to the accompanying drawings.
Referring to fig. 1, a flowchart illustrating a method for evaluating human activity impact based on time-frequency persistence analysis according to an embodiment of the invention is shown, the method includes the following steps:
Step S001, collecting and acquiring sound signals in the natural ecosystem, and preprocessing the acquired audio signals.
The sound signals in the natural ecosystem are continuously collected using mono sound sensors. The effective monitoring range of each sound sensor used in the invention is a 20 m range centered on the sensor. The ecosystem is divided according to the placement positions of the sound sensors and their effective monitoring ranges, i.e. the monitoring range of each sensor is taken as one ecological area. The signal sampling rate of the sound sensors is 22.05 kHz, and every 10 min is set as one monitoring interval. In this embodiment the number of sound sensors is 625; the implementer may set this according to the actual situation.
To facilitate the subsequent feature analysis of the sound signals in each ecological area, the sound signals also need to be framed. In the framing operation of the invention, the frame length is set to 20 ms, and the frame shift is chosen so that the previous frame and the current frame overlap by 50%. So that the frequency-domain characteristics of the sound signals in the natural ecosystem can subsequently be analyzed more accurately, a Hamming window is applied to each frame. Framing is a known technique and is not described further in this embodiment.
Thus, each signal frame of each ecological area in the monitoring interval can be obtained.
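The framing step can be sketched in Python as follows, assuming NumPy and a mono signal held in a one-dimensional array. The function name `frame_signal` and its parameters are hypothetical; the 22.05 kHz sampling rate, 20 ms frame length, 50% overlap and Hamming window are taken from the embodiment.

```python
import numpy as np

def frame_signal(x, sample_rate=22050, frame_ms=20.0, overlap=0.5):
    """Split a mono signal into Hamming-windowed frames with the given overlap."""
    x = np.asarray(x, dtype=float)
    frame_len = int(round(sample_rate * frame_ms / 1000.0))  # 441 samples at 22.05 kHz
    hop = max(1, int(round(frame_len * (1.0 - overlap))))    # about half a frame
    if len(x) < frame_len:
        return np.empty((0, frame_len))
    n_frames = 1 + (len(x) - frame_len) // hop
    window = np.hamming(frame_len)
    frames = np.stack([x[i * hop:i * hop + frame_len] for i in range(n_frames)])
    return frames * window   # the window is broadcast over every frame

# Example: one second of a 440 Hz tone sampled at 22.05 kHz
t = np.arange(22050) / 22050.0
frames = frame_signal(np.sin(2 * np.pi * 440.0 * t))
```

Trailing samples that do not fill a whole frame are dropped in this sketch; padding the last frame would be an equally reasonable choice.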
Step S002: construct the effective confidence weight of each signal frame according to the signal characteristics in the monitoring interval; obtain the energy-efficient persistence coefficient by combining the effective confidence weight with the degree of noise interference in the signal frame and the persistence regularity of the signal; construct an energy-efficient persistence coefficient sequence from the coefficients of the signal frames and obtain a segmentation threshold from it; and judge by the segmentation threshold whether each signal frame is an active frame, thereby achieving endpoint detection of the sound signals in all ecological areas within the monitoring interval.
Illegal logging involves mechanical equipment such as electric saws and other tools, which generate vibration and high-frequency noise when operating, so the resulting sound signal has a higher zero crossing rate and higher energy. Illegal logging also lasts for a certain duration, so the sound signal it generates has strong regularity. Natural environmental sounds in an ecological area (including but not limited to wind and water sounds) have broader spectral characteristics and therefore a lower zero crossing rate and lower signal energy. Animal sounds usually belong to high-frequency information and have higher energy and a higher zero crossing rate, but because of the characteristics of animal sound signals, their stability is poorer than that of illegal logging sound signals. The time domain waveform diagram of the sound signal in the natural ecosystem is shown in fig. 2, in which the abscissa is time and the ordinate is the amplitude of the waveform; the time domain spectrogram of the sound signal in the natural ecosystem is shown in fig. 3, in which the abscissa is frequency and the ordinate is amplitude; the spectrum amplitude diagram of the sound signal in the natural ecosystem is shown in fig. 4, in which the abscissa is discrete frequency and the ordinate is amplitude.
One ecological area is taken as an example for analysis. A time domain waveform diagram is drawn from the amplitudes of the sound signal corresponding to each signal frame, and the amplitude sequence of the signal frame is constructed from those amplitudes in chronological order. The sound signal in each signal frame is converted into the frequency domain by Fourier transform, the fundamental frequency envelope in the spectrum of the signal frame is determined, the frequency corresponding to the maximum energy value in the fundamental frequency envelope is taken as the fundamental frequency, the reciprocal of the fundamental frequency gives the signal period length, and the signal frame is divided into signal periods of this length. It should be noted that the implementer may use other existing techniques to obtain the period of the signal frame; there are many methods for obtaining the signal period, and this embodiment is not limited to a specific one. The amplitude feature set of the t-th signal period in the j-th signal frame is constructed from the maximum value and the minimum value of the amplitude in that signal period and is recorded as $F_{j,t} = (a^{\max}_{j,t}, a^{\min}_{j,t})$, where $a^{\max}_{j,t}$ and $a^{\min}_{j,t}$ are respectively the maximum and minimum values of the amplitude in the t-th signal period of the j-th signal frame.
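A minimal Python sketch of building the amplitude feature groups is given below. It simplifies the description above by taking the strongest non-DC FFT bin as the fundamental frequency instead of locating a fundamental frequency envelope; that simplification, the function name and the default sampling rate argument are assumptions made for illustration.

```python
import numpy as np

def amplitude_feature_groups(frame, sample_rate=22050):
    """Return one (max, min) amplitude pair per signal period of the frame."""
    frame = np.asarray(frame, dtype=float)
    spectrum = np.abs(np.fft.rfft(frame))
    spectrum[0] = 0.0                               # ignore the DC bin
    peak_bin = int(np.argmax(spectrum))
    if peak_bin == 0:                               # no usable spectral peak
        return np.empty((0, 2))
    f0 = peak_bin * sample_rate / len(frame)        # assumed fundamental frequency, Hz
    period = max(1, int(round(sample_rate / f0)))   # samples per signal period
    n_periods = len(frame) // period
    segments = np.split(frame[:n_periods * period], n_periods)
    return np.array([(seg.max(), seg.min()) for seg in segments])
```

Samples left over after the last whole period are ignored here, which is one of several possible conventions.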
Taking the maximum energy value of each frequency component (namely each signal envelope) of the signal frame in the frequency domain as the main frequency energy of the frequency component, constructing an energy extremum sequence of the signal frame according to the sequence of the frequency components in the spectrogram by the main frequency energy of all the frequency components of the signal frame in the frequency domain, and simultaneously calculating the 3dB bandwidth of each frequency component in the signal frame to be Wb, wherein the 3dB bandwidth calculation of the frequency components is a known technology, and will not be repeated here.
Based on the above analysis, the sound signals of illegal logging present in the sound segment corresponding to a signal frame are recorded as effective signals. An effective confidence weight is then constructed for each signal frame to represent the likelihood that the sound segment corresponding to the signal frame contains an effective signal. The calculation proceeds as follows:
$$A_j = \frac{e^{Z_j}}{\sum_{k=1}^{K_j}\left(P_j^{\max} - P_{j,k}\right) + \varepsilon}$$
where $A_j$ is the active significant factor of the j-th signal frame; $K_j$ is the total number of frequency components of the j-th signal frame in the frequency domain; $P_j^{\max}$ and $P_{j,k}$ are respectively the maximum main frequency energy over all frequency components of the j-th signal frame in the frequency domain and the main frequency energy of the k-th frequency component of the j-th signal frame; $Z_j$ is the zero crossing rate of the sound signal in the j-th signal frame, whose calculation is a known technique and is not repeated here; and $\varepsilon$ is an adjustment parameter with an empirical value of 1, which prevents the denominator from being 0 and whose value may be set by the implementer.
When the sum of the differences between the maximum main frequency energy in the j-th signal frame and the main frequency energy of each frequency component in the signal frame is smaller, the energy distribution between the fundamental frequency and the frequency components in the corresponding frequency domain is more concentrated, i.e. the sound component in the signal frame is more pronounced. At the same time, when the zero crossing rate of the sound signal in the signal frame is larger, the waveform of the sound signal changes faster and the sound component in the signal frame is more active. In both cases the active significant factor of the signal frame is larger.
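The active significant factor can be sketched in Python as follows. The function names are hypothetical, the zero crossing rate is computed as the fraction of adjacent sample pairs whose signs differ (one common convention, assumed here), and the main frequency energies are assumed to have been extracted beforehand.

```python
import numpy as np

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    signs = np.sign(np.asarray(frame, dtype=float))
    signs[signs == 0] = 1          # treat exact zeros as positive
    return float(np.mean(signs[1:] != signs[:-1]))

def active_significant_factor(main_energies, zcr, eps=1.0):
    """exp(zero crossing rate) divided by the summed energy gaps plus eps."""
    main_energies = np.asarray(main_energies, dtype=float)
    gap_sum = np.sum(main_energies.max() - main_energies)
    return float(np.exp(zcr) / (gap_sum + eps))
```

With `eps` at its empirical value of 1, a frame whose frequency components all carry the same main frequency energy and whose zero crossing rate is 0 receives a factor of exactly 1.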
$$S_j = \frac{A_j}{H_j}\sum_{t=1}^{T_j}\sum_{v \neq t}\cos\left(F_{j,t}, F_{j,v}\right)$$
where $S_j$ is the effective steady-state factor of the j-th signal frame; $A_j$ is the active significant factor of the j-th signal frame; $H_j$ is the information entropy of the amplitude sequence corresponding to the j-th signal frame, i.e. the information entropy of all the amplitudes contained in the j-th signal frame; $T_j$ is the total number of signal periods in the j-th signal frame; $F_{j,t}$ and $F_{j,v}$ are respectively the amplitude feature sets of the t-th and v-th signal periods in the j-th signal frame; and $\cos(F_{j,t}, F_{j,v})$ is the cosine similarity between the amplitude feature sets $F_{j,t}$ and $F_{j,v}$.
When the active significant factor of the j-th signal frame is larger, the sound component in the signal frame is more active. When the information entropy of the amplitude sequence corresponding to the j-th signal frame is smaller, the amplitude sequence fluctuates less chaotically. When the sum of the cosine similarities between the amplitude feature sets of all signal periods in the j-th signal frame is larger, the amplitude feature sets of the signal periods are more similar to one another. Under these conditions the sound component in the signal frame is both more active and more stable, i.e. the effective steady-state factor of the signal frame is larger.
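A minimal Python sketch of the effective steady-state factor follows. The histogram-based entropy estimate with 32 bins, the summation over ordered pairs of distinct periods and all function names are assumptions; the source does not state how the amplitudes are discretized for the entropy.

```python
import numpy as np

def amplitude_entropy(frame, bins=32):
    """Shannon entropy (bits) of the amplitude histogram; bins is an assumption."""
    counts, _ = np.histogram(np.asarray(frame, dtype=float), bins=bins)
    p = counts[counts > 0] / counts.sum()
    return float(-np.sum(p * np.log2(p)))

def cosine_similarity_sum(groups):
    """Sum of cosine similarities over all ordered pairs of distinct periods."""
    g = np.asarray(groups, dtype=float)
    norms = np.linalg.norm(g, axis=1)
    norms[norms == 0] = 1.0                     # avoid dividing by zero
    unit = g / norms[:, None]
    sim = unit @ unit.T
    return float(np.sum(sim) - np.trace(sim))   # drop each period's self-similarity

def effective_steady_state_factor(active_factor, entropy, groups):
    """(active factor / entropy) * summed cosine similarity; entropy must be > 0."""
    return active_factor / entropy * cosine_similarity_sum(groups)
```

A frame with constant amplitude has zero entropy under this estimate, so a caller would need to handle that case separately.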
$$W_j = \frac{S_j}{\sum_{n=1}^{K_j}\left|Wb_{j,n} - \overline{Wb}_j\right| + \varepsilon}$$
where $W_j$ is the effective confidence weight of the j-th signal frame; $S_j$ is the effective steady-state factor of the j-th signal frame; $K_j$ is the total number of frequency components of the j-th signal frame in the frequency domain; $Wb_{j,n}$ and $\overline{Wb}_j$ are respectively the 3dB bandwidth of the n-th frequency component of the j-th signal frame in the frequency domain and the mean of the 3dB bandwidths of all frequency components in the j-th signal frame; and $\varepsilon$ is added to the denominator to avoid the case where the denominator is zero.
When the effective steady-state factor of the j-th signal frame is larger, the sound component in the signal frame is more stable as well as more active. When the sum of the absolute differences between the 3dB bandwidth of each frequency component of the j-th signal frame in the frequency domain and the mean 3dB bandwidth of all frequency components in the signal frame is smaller, the frequency intervals between the different frequency components in the signal frame are smaller and the frequency distribution is more concentrated. The probability that an effective signal exists in the sound segment corresponding to the signal frame is then higher, i.e. the effective confidence weight of the signal frame is larger.
In this way the effective confidence weight of each signal frame in the ecological area can be obtained. If the effective confidence weight were used directly as the criterion for judging active frames, neither the degree to which noise interferes with the effective signal in a signal frame nor the persistence regularity of the effective signal would be considered, so active frames could be missed and the final evaluation result would be affected.
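The effective confidence weight can be sketched in Python as follows. The function name is hypothetical, the 3dB bandwidths are assumed to have been measured beforehand, and the value 1.0 for the term added to the denominator is an assumption, since the source only states that a term is added to avoid a zero denominator.

```python
import numpy as np

def effective_confidence_weight(steady_factor, bandwidths_3db, eps=1.0):
    """Steady-state factor divided by the summed bandwidth deviations plus eps."""
    bw = np.asarray(bandwidths_3db, dtype=float)
    deviation_sum = np.sum(np.abs(bw - bw.mean()))
    return float(steady_factor / (deviation_sum + eps))
```

For example, with a steady-state factor of 3 and 3dB bandwidths of 50 Hz and 150 Hz, the deviations sum to 100 and the weight is 3/101; with identical bandwidths the weight equals the steady-state factor.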
Based on the above analysis, the invention constructs an energy-efficient persistence coefficient for each signal frame, which represents the extent to which the signal frame contains a persistent, regular effective signal:
$$G_j = \frac{W_j}{\sum_{k=1}^{K_j}\left|r_{j,k} - \bar{r}_j\right|}$$
where $G_j$ is the high-energy effective factor of the j-th signal frame; $K_j$ is the total number of frequency components of the j-th signal frame in the frequency domain; $r_{j,k}$ and $\bar{r}_j$ are respectively the signal attenuation rate of the k-th frequency component of the j-th signal frame in the frequency domain and the mean signal attenuation rate of all frequency components of the j-th signal frame, the signal attenuation rate of a frequency component being the ratio of its main frequency energy to its 3dB bandwidth; and $W_j$ is the effective confidence weight of the j-th signal frame.
When the sum of the absolute differences between the signal attenuation rate of each frequency component and the mean signal attenuation rate of all frequency components of the j-th signal frame in the frequency domain is smaller, the sound signal corresponding to the signal frame is less disturbed by noise. When the effective confidence weight of the j-th signal frame is larger, the probability that an effective signal exists in the signal frame is higher. In both cases the high-energy effective factor of the signal frame is larger.
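A minimal Python sketch of the signal attenuation rate and the high-energy effective factor is given below. The function names are hypothetical, and the small floor `tiny` on the denominator is an assumption added only so the sketch does not divide by zero when all attenuation rates are equal; the source describes no such guard.

```python
import numpy as np

def signal_attenuation_rates(main_energies, bandwidths_3db):
    """Main frequency energy divided by 3dB bandwidth, per frequency component."""
    return np.asarray(main_energies, dtype=float) / np.asarray(bandwidths_3db, dtype=float)

def high_energy_effective_factor(confidence_weight, main_energies, bandwidths_3db,
                                 tiny=1e-12):
    """Confidence weight divided by the summed deviation of the attenuation rates."""
    rates = signal_attenuation_rates(main_energies, bandwidths_3db)
    deviation_sum = np.sum(np.abs(rates - rates.mean()))
    return float(confidence_weight / max(deviation_sum, tiny))  # tiny: assumed guard
```

For example, main frequency energies of 2 and 6 with unit bandwidths give attenuation rates of 2 and 6, a deviation sum of 4, and a factor of 2 for a confidence weight of 8.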
The short-time energy of the j-th signal frame is obtained and recorded as $E_j$; in this embodiment it is obtained by summing the squares of the amplitudes of all time domain samples in the signal frame. The effective two-dimensional vector of the signal frame is constructed from its short-time energy and its high-energy effective factor and is recorded as $V_j = (E_j, G_j)$. Further, the energy-efficient persistence coefficient of the signal frame is obtained, with the expression:
$$C_j = \frac{E_j}{\left|G_j - \bar{G}\right| + \varepsilon}\exp\left(\sum_{z=1,\, z \neq j}^{M}\rho\left(V_j, V_z\right)\right)$$
where $C_j$ is the energy-efficient persistence coefficient of the j-th signal frame; $E_j$ is the short-time energy of the j-th signal frame, whose calculation is a known technique and is not described in detail in this embodiment; $G_j$ and $\bar{G}$ are respectively the high-energy effective factor of the j-th signal frame and the mean of the high-energy effective factors of all signal frames in the monitoring interval containing the j-th signal frame; $M$ is the total number of signal frames in the monitoring interval; $V_j$ and $V_z$ are respectively the effective two-dimensional vectors of the j-th and z-th signal frames; $\rho(V_j, V_z)$ is the Pearson correlation coefficient between the effective two-dimensional vectors $V_j$ and $V_z$; and $\varepsilon$ is an adjustment parameter with an empirical value of 1, which prevents the denominator from being 0 and whose value may also be set by the implementer.
When the short-time energy of the jth signal frame is larger, i.eThe larger the signal frame, the more obvious the sound component is contained in the signal frame; meanwhile, when the absolute value of the difference between the high-energy effective factors of the jth information frame and the average value of the high-energy effective factors of all the information frames in the monitoring interval is smaller, namely/>The smaller the effective signal is, the less the effective signal in the signal frame is interfered by noise and the difference between the possibility of existence of the effective signal and the average interfered condition and the average possibility of existence of the effective signal of all the signal frames in the monitoring interval of the signal frame is smaller; at the same time, when the sum of the pearson correlation coefficients between the jth signal frame and the effective two-dimensional vectors corresponding to all the other signal frames in the monitoring interval is larger, namelyThe larger the positive correlation between the short-time energy and the high-energy effective factor in the signal frame and the rest of the signal frames in the monitoring interval, the more remarkable the effective signal duration in the signal frame in the monitoring interval, the more the signal frame should be judged as an active frame, namely the high-energy effective duration coefficient/>The larger.
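The persistence coefficient described above can be sketched as follows. This is a hedged reading based on the description and claim 1: short-time energy divided by the absolute deviation of the high-energy effective factor from its interval mean (plus the adjusting parameter, empirical value 1), multiplied by the natural exponential of the summed Pearson correlations with all other frames. All function and variable names are hypothetical. Note that a Pearson correlation between two 2-element vectors can only be +1, -1 or undefined; the sketch returns 0.0 in the undefined case, which is an assumption not found in the source.

```python
import math
from statistics import fmean


def pearson(u, v):
    """Pearson correlation of two equal-length vectors; 0.0 if undefined (assumption)."""
    mu, mv = fmean(u), fmean(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    var_u = sum((a - mu) ** 2 for a in u)
    var_v = sum((b - mv) ** 2 for b in v)
    if var_u == 0 or var_v == 0:
        return 0.0
    return cov / math.sqrt(var_u * var_v)


def persistence_coefficients(energies, factors, eps=1.0):
    """One coefficient per frame of a monitoring interval.

    energies: short-time energy per frame; factors: high-energy effective
    factor per frame; eps: adjusting parameter (empirical value 1).
    For long intervals the exponent can grow large, and math.exp raises
    OverflowError for arguments above roughly 709.
    """
    vectors = list(zip(energies, factors))
    mean_factor = fmean(factors)
    coeffs = []
    for j, (e_j, g_j) in enumerate(vectors):
        corr_sum = sum(pearson(vectors[j], vectors[z])
                       for z in range(len(vectors)) if z != j)
        coeffs.append(e_j / (abs(g_j - mean_factor) + eps) * math.exp(corr_sum))
    return coeffs


coeffs = persistence_coefficients([4.0, 9.0, 1.0], [1.0, 2.0, 3.0])
```

In this toy input the second frame has the highest energy and a factor equal to the interval mean, so it receives the largest coefficient.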
Thus, the energy-efficient persistence coefficient of each signal frame in the monitoring interval can be obtained. The energy-efficient persistence coefficients of all signal frames in the monitoring interval are arranged in time order to form an energy-efficient persistence sequence, and a segmentation threshold for this sequence is obtained by Otsu's method (OTSU): the input of the algorithm is the energy-efficient persistence sequence of the monitoring interval, and its output is the segmentation threshold.
When the energy-efficient persistence coefficient of a signal frame is greater than or equal to the segmentation threshold, the signal frame is determined to be an active frame; otherwise, the signal frame is determined to be an inactive frame. All active frames in the monitoring interval are then taken as input, and the PSOLA (Pitch Synchronous Overlap Add) algorithm is used to obtain the voice segments formed by splicing the active frames in the monitoring interval. The PSOLA algorithm is a known technique, and the specific process is not repeated here.
Thus, the voice segments formed by splicing all adjacent active frames in the monitoring interval can be obtained. The starting point and ending point of each voice segment are recorded and used as the output of the VAD endpoint detection algorithm, so that endpoint detection of the sound signal of the ecological region in the monitoring interval is realized. Because Otsu's method and the VAD endpoint detection algorithm are both known techniques, they are not described again here.
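The thresholding and endpoint steps can be illustrated with a small sketch. It is not the patent's implementation: instead of the usual histogram-based Otsu's method it searches every distinct coefficient value as a candidate threshold and keeps the one maximizing the between-class variance, which is the same criterion applied directly to a one-dimensional sequence. The PSOLA splicing step is not reproduced; segments are simply runs of adjacent active frames, and non-overlapping frames of equal length (`frame_seconds`) are assumed. All names are hypothetical.

```python
from statistics import fmean


def otsu_threshold(values):
    """Threshold maximizing between-class variance; frames >= threshold are 'high'."""
    candidates = sorted(set(values))[1:]   # skip the minimum so both classes are non-empty
    if not candidates:
        raise ValueError("need at least two distinct values")
    n = len(values)
    best_t, best_score = None, -1.0
    for t in candidates:
        low = [v for v in values if v < t]
        high = [v for v in values if v >= t]
        score = (len(low) / n) * (len(high) / n) * (fmean(low) - fmean(high)) ** 2
        if score > best_score:
            best_t, best_score = t, score
    return best_t


def active_segments(coeffs, threshold, frame_seconds):
    """(start, end) times in seconds of each run of consecutive active frames."""
    segments, start = [], None
    for i, c in enumerate(coeffs):
        if c >= threshold and start is None:
            start = i                                   # a run of active frames begins
        elif c < threshold and start is not None:
            segments.append((start * frame_seconds, i * frame_seconds))
            start = None                                # the run ended at frame i
    if start is not None:                               # run extends to the last frame
        segments.append((start * frame_seconds, len(coeffs) * frame_seconds))
    return segments


sequence = [0.1, 0.2, 5.0, 6.0, 0.3, 7.0]               # hypothetical coefficients
threshold = otsu_threshold(sequence)
segments = active_segments(sequence, threshold, frame_seconds=0.5)
```

The recorded start and end times play the role of the endpoint-detection output described in the text.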
According to the method, the endpoint detection of the sound signals in the ecological area can be realized and used for evaluating the influence of human activities on the natural ecological system.
Step S003: the influence of human activity is evaluated from the total duration of the active-frame voice segments of each ecological region within the monitoring interval.
In the above manner, the total duration of the voice segments spliced from active frames in each ecological region within the monitoring interval can be obtained. When the total duration of all active-frame voice segments of an ecological region within the monitoring interval is greater than or equal to a preset time threshold, it is determined that human activity in that ecological region has an influence on the natural ecosystem during the monitoring interval; otherwise, when the total duration is smaller than the preset time threshold, it is determined that human activity in that ecological region has no influence on the natural ecosystem during the monitoring interval. The preset time threshold may be set by the implementer and is not specifically limited in this embodiment; in this embodiment the preset time threshold is set as the monitoring interval duration
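The decision rule of step S003 reduces to a comparison of a summed duration against a threshold. The sketch below assumes each region's voice segments are available as (start, end) pairs in seconds; the region names, the helper name and the threshold value used in the example are hypothetical.

```python
def assess_regions(region_segments, time_threshold_seconds):
    """Map each ecological region to True if human activity is judged to have an influence."""
    result = {}
    for region, segments in region_segments.items():
        total = sum(end - start for start, end in segments)   # total active duration
        result[region] = total >= time_threshold_seconds
    return result


verdicts = assess_regions(
    {"region_a": [(1.0, 2.0), (2.5, 3.0)], "region_b": [(0.0, 0.5)]},
    time_threshold_seconds=1.5,
)
```

Because the comparison is "greater than or equal to", a region whose total duration exactly equals the threshold is judged as influenced.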
This completes the human activity influence assessment method based on time-frequency persistence analysis.
Based on the same inventive concept as the above method, the embodiment of the present invention further provides a human activity impact assessment system based on time-frequency persistence analysis, which comprises a memory, a processor and a computer program stored in the memory and running on the processor, wherein the processor implements the steps of any one of the above human activity impact assessment methods based on time-frequency persistence analysis when executing the computer program.
In summary, the embodiment of the invention constructs the effective confidence weight of each signal frame from the signal characteristics in the monitoring interval, comprehensively considering the time-domain characteristics, frequency-domain characteristics and activity stability of the signal in each signal frame. It then combines the degree of noise interference of the sound signal in the signal frame with the persistence pattern of the signal to obtain the energy-efficient persistence coefficient, which measures more accurately how much persistent effective signal the signal frame contains. An energy-efficient persistence sequence is constructed from these coefficients to obtain a segmentation threshold, and whether a signal frame is an active frame is judged against that threshold, which avoids the defect that the traditional VAD endpoint detection algorithm has difficulty judging active frames accurately. The active frames determined in this way better reflect the human activity in the ecological region within the monitoring interval, and the influence of human activity is then evaluated accurately from the total duration of the active-frame voice segments.
It should be noted that: the sequence of the embodiments of the present invention is only for description, and does not represent the advantages and disadvantages of the embodiments. And the foregoing description has been directed to specific embodiments of this specification. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.
In this specification, the embodiments are described in a progressive manner; identical or similar parts of the embodiments may be understood by cross-reference, and each embodiment focuses on its differences from the other embodiments.
The foregoing description of the preferred embodiments of the present invention is not intended to be limiting, but rather, any modifications, equivalents, improvements, etc. that fall within the principles of the present invention are intended to be included within the scope of the present invention.

Claims (6)

1. A method for evaluating the impact of human activity based on time-frequency persistence analysis, the method comprising the steps of:
Collecting an audio signal of a natural ecosystem and carrying out framing treatment to obtain each signal frame; drawing a corresponding time domain waveform diagram of the sound signal corresponding to each signal frame;
Constructing an amplitude characteristic group of each signal period according to the maximum value and the minimum value of the amplitude of each period in the time domain waveform diagram of the signal frame; obtaining the active significant factor of each signal frame according to the main frequency energy corresponding to each frequency component of the signal frame in the frequency domain and the zero crossing rate of the signal frame; obtaining the effective steady-state factor of each signal frame according to the active significant factor of the signal frame, the information entropy of the corresponding amplitudes and the cosine similarity among the amplitude characteristic groups of the signal periods; obtaining the effective confidence weight of each signal frame according to the effective steady-state factor of the signal frame and the 3dB bandwidths of the frequency components of the signal frame in the frequency domain; calculating the signal attenuation rate of each frequency component of the signal frame in the frequency domain, and obtaining the high-energy effective factor of each signal frame according to the effective confidence weight of the signal frame and the distribution of the signal attenuation rates of the frequency components in the frequency domain; forming an effective two-dimensional vector of each signal frame from the short-time energy and the high-energy effective factor of the signal frame; constructing the energy-efficient persistence coefficient of each signal frame according to the short-time energy and the high-energy effective factor of the signal frame and the Pearson correlation coefficients between the effective two-dimensional vectors of the signal frame and of the other signal frames;
Combining the energy-efficient persistence coefficients of all signal frames in the monitoring interval with the Otsu method to obtain the active frames, and evaluating the influence of human activities based on the active frames;
The calculating the signal attenuation rate of each frequency component of the signal frame in the frequency domain comprises the following steps: taking the ratio of the main frequency energy of each frequency component to the 3dB bandwidth of each frequency component as the signal attenuation rate of each frequency component;
The obtaining the high-energy effective factor of each signal frame according to the effective confidence weight of each signal frame and the distribution condition of the signal attenuation rate of each frequency component in the frequency domain comprises the following steps:
Calculating the average value of the signal attenuation rates of all frequency components in the signal frame, obtaining the sum of the absolute values of the differences between the signal attenuation rate of each frequency component of the signal frame in the frequency domain and the average value, and taking the product of the effective confidence weight of the signal frame and the reciprocal of the sum as the high-energy effective factor of the signal frame;
The constructing of the energy-efficient persistence coefficient of each signal frame according to the short-time energy and the high-energy effective factor of the signal frame and the Pearson correlation coefficients between the effective two-dimensional vectors of the signal frame and of the other signal frames specifically comprises the following steps:
for each signal frame, calculating the sum of the Pearson correlation coefficients between the effective two-dimensional vectors of the signal frame and of the other signal frames, and taking the sum as the exponent of an exponential function with the natural constant as its base;
Calculating the absolute value of the difference between the high-energy effective factor of the signal frame and the average value of the high-energy effective factors of all signal frames, obtaining the ratio of the short-time energy of the signal frame to the absolute value of the difference, and taking the product of the ratio and the result of the exponential function as the energy-efficient persistence coefficient of the signal frame;
the combining of the energy-efficient persistence coefficients of all signal frames in the monitoring interval with the Otsu method to obtain the active frames, and the evaluating of the influence of human activities based on the active frames, comprises the following steps:
presetting the duration of a monitoring interval, calculating the segmentation threshold of the energy-efficient persistence coefficients of all signal frames in the monitoring interval by the Otsu method, and determining a signal frame to be an active frame when its energy-efficient persistence coefficient is greater than or equal to the segmentation threshold;
Taking all active frames in the monitoring interval as input, and splicing the active frames in the monitoring interval into voice segments by using the PSOLA pitch synchronous overlap-add algorithm; when the duration of the voice segments of the ecological region in the monitoring interval is greater than or equal to a preset time threshold, the human activity has an influence on the natural ecosystem, and otherwise the human activity has no influence.
2. The human activity impact assessment method based on time-frequency persistence analysis according to claim 1, wherein the constructing the amplitude feature set of the signal period according to the maximum value and the minimum value of the amplitude of each period in the time domain waveform of the signal frame comprises: the maximum and minimum values of the amplitude are combined into amplitude characteristic groups corresponding to the signal period.
3. The human activity impact assessment method based on time-frequency persistence analysis according to claim 1, wherein the obtaining the activity significance factor of each signal frame according to the main frequency energy corresponding to each frequency component of each signal frame in the frequency domain and the zero crossing rate of the signal frame comprises:
taking the zero crossing rate of the signal frame as an index of an exponential function based on a natural constant;
determining the maximum main frequency energy among all frequency components of the signal frame in the frequency domain, and calculating the sum of the differences between the maximum main frequency energy and the main frequency energy of each frequency component;
And taking the product of the reciprocal of the sum and the result of the exponential function as the active significant factor of the signal frame.
4. The method for evaluating the influence of human activity based on time-frequency persistence analysis according to claim 2, wherein the obtaining the effective steady-state factor of each signal frame according to the activity significance factor of each signal frame, the information entropy of the corresponding amplitude, and the cosine similarity between the amplitude feature sets of each signal period comprises:
Calculating the information entropy of all amplitudes contained in the signal frame, and obtaining the ratio of the active significant factor of the signal frame to the information entropy; calculating the sum of the cosine similarities between the amplitude characteristic groups of every pair of signal periods of the signal frame, and taking the product of the ratio and the sum as the effective steady-state factor of the signal frame.
5. The method for evaluating the influence of human activity based on time-frequency persistence analysis according to claim 1, wherein the obtaining the effective confidence weight of each signal frame from the effective steady-state factor of each signal frame and the 3dB bandwidth of each frequency component of the signal frame in the frequency domain comprises:
Calculating the average value of the 3dB bandwidths of all frequency components of the signal frame, obtaining the absolute value of the difference between the 3dB bandwidth of each frequency component of the signal frame in the frequency domain and the average value, and obtaining the reciprocal of the sum of these absolute values over all frequency components of the signal frame;
Taking the product of the effective steady-state factor of the signal frame and the reciprocal as the effective confidence weight of the signal frame.
6. Human activity impact assessment system based on time-frequency persistence analysis, comprising a memory, a processor and a computer program stored in the memory and running on the processor, characterized in that the processor implements the steps of the method according to any of claims 1-5 when executing the computer program.
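The feature chain recited in claims 3 to 5 can be sketched as below. This is an illustrative reading only, under several assumptions that the claims do not state: the information entropy is taken as the base-2 Shannon entropy of the empirical distribution of (quantized) amplitude values, each amplitude characteristic group is a (maximum, minimum) pair per signal period, and a small `EPS` is added to denominators to avoid division by zero. All names are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations
from statistics import fmean

EPS = 1e-12  # assumed guard against zero denominators; not part of the claims


def active_significance_factor(zero_crossing_rate, peak_energies):
    """exp(zero crossing rate) times the reciprocal of the summed gaps to the maximum energy."""
    max_e = max(peak_energies)
    gap_sum = sum(max_e - e for e in peak_energies)
    return math.exp(zero_crossing_rate) / (gap_sum + EPS)


def amplitude_entropy(samples):
    """Shannon entropy (bits) of the empirical amplitude distribution (assumed definition)."""
    n = len(samples)
    return -sum(c / n * math.log2(c / n) for c in Counter(samples).values())


def cosine(u, v):
    """Cosine similarity; 0.0 when either vector has zero length (assumption)."""
    norm = math.hypot(*u) * math.hypot(*v)
    return sum(a * b for a, b in zip(u, v)) / norm if norm else 0.0


def effective_steady_state_factor(active_factor, samples, period_groups):
    """(active factor / entropy) times the summed cosine similarity over all period pairs."""
    sim_sum = sum(cosine(u, v) for u, v in combinations(period_groups, 2))
    return active_factor / (amplitude_entropy(samples) + EPS) * sim_sum


def effective_confidence_weight(steady_factor, bandwidths_3db):
    """Steady-state factor times the reciprocal of the summed 3dB-bandwidth deviations."""
    mean_bw = fmean(bandwidths_3db)
    spread = sum(abs(b - mean_bw) for b in bandwidths_3db)
    return steady_factor / (spread + EPS)


active = active_significance_factor(0.0, [4.0, 2.0])
steady = effective_steady_state_factor(active, [1, 1, -1, -1], [(1.0, -1.0), (2.0, -2.0)])
weight = effective_confidence_weight(steady, [1.0, 3.0])
```

Each function consumes the output of the previous one, mirroring the dependency order of the claims: active significant factor, then effective steady-state factor, then effective confidence weight.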
CN202410179329.2A 2024-02-18 2024-02-18 Human activity influence assessment method and system based on time-frequency persistence analysis Active CN117746905B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410179329.2A CN117746905B (en) 2024-02-18 2024-02-18 Human activity influence assessment method and system based on time-frequency persistence analysis

Publications (2)

Publication Number Publication Date
CN117746905A (en) 2024-03-22
CN117746905B (en) 2024-04-19

Family

ID=90278080

Country Status (1)

Country Link
CN (1) CN117746905B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPS5876899A (en) * 1981-10-31 1983-05-10 株式会社東芝 Voice segment detector
WO2015139452A1 (en) * 2014-03-17 2015-09-24 华为技术有限公司 Method and apparatus for processing speech signal according to frequency domain energy
WO2018011794A1 (en) * 2016-07-10 2018-01-18 B.G. Negev Technologies And Applications Ltd., At Ben-Gurion University Methods and systems for estimation of obstructive sleep apnea severity in wake subjects by multiple speech analyses
CN116884431A (en) * 2023-08-03 2023-10-13 西华大学 CFCC feature-based robust audio copy-paste tamper detection method and device
CN116992219A (en) * 2023-09-07 2023-11-03 博睿康科技(常州)股份有限公司 Signal quality characterization unit and noise source positioning method based on noise detection index

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR100586893B1 (en) * 2004-06-28 2006-06-08 삼성전자주식회사 System and method for estimating speaker localization in non-stationary noise environment
US10783764B2 (en) * 2016-04-19 2020-09-22 University Of South Carolina Impact force estimation and event localization
JP7230545B2 (en) * 2019-02-04 2023-03-01 富士通株式会社 Speech processing program, speech processing method and speech processing device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
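Research and Implementation of a Real-Time Voice Activity Detection Algorithm for Multimedia Live-Streaming Scenarios; 郝雪 (Hao Xue); China Master's Theses Full-text Database, Information Science and Technology; 2024-01-15 (No. 01); full text *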

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant