CN115985349A - Audio processing method, apparatus, device and medium based on parameter equalizer - Google Patents


Info

Publication number
CN115985349A
Authority
CN
China
Prior art keywords: audio, frequency, segment, background sound, processed
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211565856.4A
Other languages
Chinese (zh)
Inventor
戚成杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Wondershare Software Co Ltd
Original Assignee
Shenzhen Wondershare Software Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Wondershare Software Co Ltd filed Critical Shenzhen Wondershare Software Co Ltd
Priority to CN202211565856.4A priority Critical patent/CN115985349A/en
Publication of CN115985349A publication Critical patent/CN115985349A/en
Pending legal-status Critical Current

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 30/00: Reducing energy consumption in communication networks
    • Y02D 30/70: Reducing energy consumption in communication networks in wireless communication networks

Landscapes

  • Tone Control, Compression And Expansion, Limiting Amplitude (AREA)

Abstract

The embodiment of the invention discloses an audio processing method, apparatus, device and medium based on a parametric equalizer, and relates to the technical field of audio processing. The method comprises the following steps: smoothing the output loudness of the acquired audio to be processed, and taking the segments whose loudness exceeds the average loudness as a first background sound segment; performing a Fourier transform on the audio to be processed to obtain a time-frequency audio, and processing the time-frequency audio with a similarity method to obtain a second background sound segment; intersecting and expanding the first background sound segment and the second background sound segment to obtain a refrain (chorus) time segment; scanning the human voice frequency and the background sound frequency in the audio to be processed to obtain a frequency width and a center frequency; and equalizing the audio to be processed with a parametric equalizer according to the frequency width, the center frequency and the refrain time segment. The embodiment of the application solves the problem that the human voice and the background sound in audio mask each other, thereby improving the sound quality of the audio.

Description

Audio processing method, apparatus, device and medium based on parameter equalizer
Technical Field
Embodiments of the present invention relate to the field of audio processing technologies, and in particular, to an audio processing method, apparatus, device, and medium based on a parameter equalizer.
Background
Multitrack music often arises during master-tape processing. Unlike the ordinary two-channel audio we hear daily, multitrack music carries richer instrument accompaniment tracks and an a cappella vocal track, and processing each track independently can produce different audio effects. In the prior art, voice enhancement is mainly based on fixed-band equalizers, which boost the frequency energy of fixed intervals to change the pitch of the voice and improve the overall sense of sound quality. The fixed-band equalizer method has the following defects: 1. because the fixed frequencies are not adjustable, the frequencies at which the voice can be modified are relatively fixed, and even continual subdivision of the fixed bands cannot cover the full audio frequency range; 2. the optimized human voice and background sound easily mask each other's frequencies, degrading the overall sound quality.
Disclosure of Invention
The embodiment of the invention provides an audio processing method, device, equipment and medium based on a parameter equalizer, aiming at solving the problem of poor tone quality effect caused by mutual masking of human voice and background voice in the existing audio.
In a first aspect, an embodiment of the present invention provides an audio processing method based on a parametric equalizer, which includes: acquiring audio to be processed, and smoothing its output loudness to find the segments above the average loudness as a first background sound segment; performing a Fourier transform on the audio to be processed to obtain a time-frequency audio, and processing the time-frequency audio with a similarity method to obtain a second background sound segment; intersecting and expanding the first background sound segment and the second background sound segment to obtain a refrain time segment; scanning the human voice frequency and the background sound frequency in the audio to be processed to obtain a frequency width and a center frequency; and equalizing the audio to be processed with a parametric equalizer according to the frequency width, the center frequency and the refrain time segment.
Further, calculating the audio to be processed through a loudness calculation formula to obtain output loudness; smoothing the output loudness based on a moving median method to obtain a target output loudness; and calculating the average loudness of the target output loudness, and taking the section which is larger than the average loudness in the target output loudness as a first background sound section.
Further, sequentially acquiring audio segments with preset lengths from the initial position in the time-frequency audio as a first segment; taking the audio segments except the first segment in the time-frequency audio as residual segments, and sequentially acquiring audio segments with the same length as the first segment from the residual segments as second segments; calculating a similarity between the first segment and the second segment by a pearson coefficient to construct a self-similarity matrix; and performing line detection on the self-similarity matrix by using image processing to obtain a second background sound fragment.
Further, the first background sound segment and the second background sound segment are intersected to obtain an initial refrain segment; and the initial refrain segment is expanded by a long-rise and long-fall method to obtain the refrain time segment.
Further, acquiring a human voice frequency and a background voice frequency in the audio to be processed, and calculating a difference between the human voice frequency and the background voice frequency to obtain a frequency difference value; if the frequency difference value is smaller than a preset frequency difference value, storing the sampling frequency and the sampling time point corresponding to the frequency difference value, and determining the central frequency and the frequency width according to the sampling frequency and the sampling time point.
Further, adjusting the background loudness gain in the audio to be processed according to the refrain time slice to obtain a target loudness gain; and obtaining a background sound target frequency response value through a frequency response value calculation formula corresponding to a parameter equalizer according to the frequency width, the center frequency and the target loudness gain, and carrying out equalization processing on the audio to be processed according to the background sound target frequency response value.
Further, whether the background sound in the audio to be processed is a refrain segment is judged according to the refrain start and end times corresponding to the refrain time segment; if the background sound is a refrain segment, the background sound loudness gain in the audio to be processed is adjusted down; and if the background sound is not a refrain segment, the background sound loudness gain is adjusted up.
In a second aspect, an embodiment of the present invention further provides an audio processing apparatus based on a parametric equalizer, including: a searching unit, configured to acquire the audio to be processed and smooth its output loudness to find the segments above the average loudness as a first background sound segment; a processing unit, configured to perform a Fourier transform on the audio to be processed to obtain a time-frequency audio, and process the time-frequency audio with a similarity method to obtain a second background sound segment; an expansion unit, configured to intersect and expand the first background sound segment and the second background sound segment to obtain a refrain time segment; a monitoring unit, configured to scan the human voice frequency and the background sound frequency in the audio to be processed to obtain a frequency width and a center frequency; and an equalizing unit, configured to equalize the audio to be processed with a parametric equalizer according to the frequency width, the center frequency and the refrain time segment.
In a third aspect, an embodiment of the present invention further provides a computer device, which includes a memory and a processor, where the memory stores a computer program, and the processor implements the above method when executing the computer program.
In a fourth aspect, the present invention also provides a computer-readable storage medium, which stores a computer program, and the computer program can implement the above method when being executed by a processor.
The embodiment of the invention provides an audio processing method, apparatus, device and medium based on a parametric equalizer. The method comprises the following steps: acquiring audio to be processed, and smoothing its output loudness to find the segments above the average loudness as a first background sound segment; performing a Fourier transform on the audio to be processed to obtain a time-frequency audio, and processing the time-frequency audio with a similarity method to obtain a second background sound segment; intersecting and expanding the first background sound segment and the second background sound segment to obtain a refrain time segment; scanning the human voice frequency and the background sound frequency in the audio to be processed to obtain a frequency width and a center frequency; and equalizing the audio to be processed with a parametric equalizer according to the frequency width, the center frequency and the refrain time segment. According to the technical scheme of the embodiment of the invention, the audio to be processed is first put through a series of processing steps to obtain the refrain time segment, then the human voice frequency and the background sound frequency in the audio are scanned to obtain the frequency width and the center frequency, and finally the audio is equalized by the parametric equalizer according to the refrain time segment, the frequency width and the center frequency. This solves the problem that the human voice and the background sound mask each other, and thus improves the sound quality of the audio.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
Fig. 1 is a schematic flowchart of an audio processing method based on a parametric equalizer according to an embodiment of the present invention;
fig. 2 is a schematic sub-flow diagram of an audio processing method based on a parametric equalizer according to an embodiment of the present invention;
fig. 3 is a schematic sub-flow diagram of an audio processing method based on a parametric equalizer according to an embodiment of the present invention;
fig. 4 is a schematic sub-flow diagram of an audio processing method based on a parametric equalizer according to an embodiment of the present invention;
fig. 5 is a schematic block diagram of an audio processing apparatus based on a parametric equalizer according to an embodiment of the present invention; and
fig. 6 is a schematic block diagram of a computer device according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It will be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It is also to be understood that the terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in the specification of the present invention and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should be further understood that the term "and/or" as used in this specification and the appended claims refers to and includes any and all possible combinations of one or more of the associated listed items.
As used in this specification and the appended claims, the term "if" may be interpreted contextually as "when", "upon" or "in response to a determination" or "in response to a detection". Similarly, the phrase "if it is determined" or "if a [ described condition or event ] is detected" may be interpreted contextually to mean "upon determining" or "in response to determining" or "upon detecting [ described condition or event ]" or "in response to detecting [ described condition or event ]".
Referring to fig. 1, fig. 1 is a flowchart illustrating an audio processing method based on a parametric equalizer according to an embodiment of the present invention. The audio processing method based on the parametric equalizer will be described in detail below. As shown in fig. 1, the method includes the following steps S100-S140.
S100, acquiring audio to be processed, and smoothing the output loudness of the audio to be processed to find the segments above the average loudness as a first background sound segment.
In the embodiment of the invention, a user uploads the audio to be processed to audio and video editing software; after the upload finishes, the software acquires the audio and smooths its output loudness to find the segments above the average loudness as a first background sound segment. It should be noted that, in this embodiment, the audio to be processed contains a background sound and a human voice, where the background sound includes a refrain (chorus), a prelude, an ending, and the like. Understandably, the refrain is the main component of a song; it is usually where both the human voice and the accompaniment reach their highest loudness at the same time, and it can be located in a long audio recording according to the similarity of frequency and loudness.
In some embodiments, such as the present embodiment, the step S100 may include steps S101-S103, as shown in fig. 2.
S101, calculating the audio to be processed through a loudness calculation formula to obtain the output loudness;
S102, smoothing the output loudness based on a moving median method to obtain a target output loudness;
S103, calculating the average loudness of the target output loudness, and taking the segments of the target output loudness that exceed the average loudness as the first background sound segment.
In the embodiment of the invention, the output loudness is obtained by evaluating the loudness calculation formula on the audio to be processed, as shown in formula (1), where data is the audio to be processed and X_db is its output loudness; understandably, in this embodiment the number of input sampling points equals the number of output sampling points. The output loudness is then smoothed with a moving median to obtain the target output loudness; the average of the target output loudness is calculated, and the segments exceeding that average are recorded as the first background sound segment. A moving median is used for smoothing because a conventional median collapses an entire window (e.g. 5 values) into a single number, whereas a moving median slides the window point by point, reusing the information of neighbouring points at each step and yielding a smoother result: for the sequence 1 2 3 4 5, the conventional median is the single value 3, while the moving median produces the sequence of values 2, 2.5, 3.
X_db = 20 * log10(abs(data))    (1)
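As an illustrative sketch (not part of the original disclosure), steps S101-S103 could be implemented as follows, assuming NumPy; the small floor inside the logarithm is a hypothetical numerical guard against log10(0), and the window size is an assumed parameter:

```python
import numpy as np

def output_loudness(data):
    # Formula (1): X_db = 20 * log10(abs(data)); a tiny floor avoids log10(0).
    return 20 * np.log10(np.maximum(np.abs(data), 1e-12))

def moving_median(x, win=5):
    # Slide a window one sample at a time and take the median of each window,
    # so neighbouring points are reused and the curve comes out smoother
    # than a single block median would be.
    pad = win // 2
    padded = np.pad(x, pad, mode="edge")
    return np.array([np.median(padded[i:i + win]) for i in range(len(x))])

def first_background_segment(data, win=5):
    # Mark the samples whose smoothed loudness exceeds the average loudness;
    # those samples form the first background sound segment (boolean mask).
    smoothed = moving_median(output_loudness(data), win)
    return smoothed > smoothed.mean()
```

The boolean mask can then be converted into (start, end) sample ranges for the later intersection step.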
S110, carrying out Fourier transform on the audio to be processed to obtain a time-frequency audio, and processing the time-frequency audio through a similarity method to obtain a second background sound segment.
In the embodiment of the present invention, after the first background sound segment is obtained, a Fourier transform is performed on the audio to be processed to obtain a time-frequency audio, where the audio signal to be processed is a one-dimensional signal and the time-frequency audio is a two-dimensional time-frequency analysis matrix. The time-frequency audio is then processed with a similarity method to obtain a second background sound segment, where the similarity method calculates, through the Pearson coefficient, the similarity between an audio segment of preset length in the time-frequency audio and equally long audio segments taken from the remaining segments. It should be noted that, in this embodiment, the amplitude values of the background sound are calculated through the Fourier transform because the amplitude values represent the energy of the audio in the frequency domain. Understandably, the total number of sampling points of the audio is the duration multiplied by the sampling rate; the sampling rate states how many sampling points one second of audio contains, and the sampling points carry the time-domain information of the audio.
In some embodiments, such as this embodiment, as shown in FIG. 3, the step S110 may include steps S111-S114.
S111, sequentially acquiring audio segments of a preset length, starting from the initial position in the time-frequency audio, as first segments;
S112, taking the audio segments other than the first segment in the time-frequency audio as the remaining segments, and sequentially acquiring audio segments of the same length as the first segment from the remaining segments as second segments;
S113, calculating the similarity between the first segment and the second segment through the Pearson coefficient to construct a self-similarity matrix;
S114, performing line detection on the self-similarity matrix using image processing to obtain the second background sound segment.
In the embodiment of the invention, audio segments of a preset length (3-5 s) are acquired in turn from the start of the time-frequency audio as first segments. The audio segments other than the current first segment form the remaining segments, from which audio segments of the same length as the first segment are acquired in turn as second segments. The similarity between the first segment and the second segment is calculated through the Pearson coefficient to construct a self-similarity matrix, and line detection is performed on this matrix using a convolution operation from image processing to obtain the second background sound segment. For ease of understanding, suppose the time-frequency audio is 20 s long. The first time around, the first segment is taken from the start of the time-frequency audio, i.e. the 0-4 s audio segment, so the remaining segments cover 4-20 s. Second segments of equal length are taken from them in turn: the 4-8 s segment gives similarity value SIM11 with the first segment, the 8-12 s segment gives SIM12, and so on for SIM13 and SIM14. The second time around, the first segment starts where the previous first segment ended, i.e. it is the 4-8 s segment; the remaining segments are 0-4 s and 8-20 s, and taking the 0-4 s, 8-12 s, 12-16 s and 16-20 s segments in turn as second segments gives SIM21, SIM22, SIM23 and SIM24. Repeating these steps yields the self-similarity matrix. It should be noted that in other embodiments the second segments for a given first segment may overlap: for example, when the second segment corresponding to SIM11 is the 4-8 s segment, the second segment corresponding to SIM12 may be the 7-11 s segment, i.e. overlapping by 1 s. It should further be noted that the closer a SIM value is to 1, the higher the similarity, indicating that the two audio segments have the same spectral content and belong to the same part of the background sound, for example the same chorus, prelude or ending.
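The self-similarity construction of steps S111-S113 can be sketched as follows (an illustration, not the original implementation); the segment length in frames and the non-overlapping tiling are assumptions for simplicity:

```python
import numpy as np

def self_similarity_matrix(spec, seg_len):
    # spec: 2-D time-frequency matrix (frequency bins x time frames) produced
    # by the Fourier transform; seg_len: segment length in frames (the 3-5 s
    # preset length expressed in frames).
    n = spec.shape[1] // seg_len
    segs = [spec[:, i * seg_len:(i + 1) * seg_len].ravel() for i in range(n)]
    sim = np.eye(n)
    for i in range(n):
        for j in range(i + 1, n):
            # Pearson coefficient between the flattened first/second segments;
            # values near 1 mark segments with the same spectral content.
            r = np.corrcoef(segs[i], segs[j])[0, 1]
            sim[i, j] = sim[j, i] = r
    return sim
```

Diagonal lines of high values in this matrix (found by line detection, e.g. a diagonal convolution kernel) mark repeated sections such as the refrain.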
S120, intersecting and expanding the first background sound segment and the second background sound segment to obtain the refrain time segment.
In the embodiment of the invention, the first background sound segment and the second background sound segment are intersected to obtain an initial refrain segment, which is then expanded by a long-rise and long-fall method to obtain the refrain time segment. The initial refrain segment is extended in this way because the start of a typical refrain is accompanied by a rise in loudness that listeners already perceive as part of the refrain. A smoothing curve filters out transient loudness spikes, the first loudness point of the refrain is found, and over the following period the loudness keeps rising with the sampling points; the long-fall logic is symmetric, finding the lowest loudness point at the end of the refrain. This yields the refrain portion of the audio to be processed.
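A minimal sketch of step S120 under stated assumptions: segments are (start, end) pairs, and the long-rise/long-fall expansion is approximated by walking an already-smoothed loudness curve outward from the segment boundaries (the exact stopping rule in the patent is not specified, so the monotonic walk here is a hypothetical simplification):

```python
def intersect_segments(segs_a, segs_b):
    # Intersection of two lists of (start, end) segments, e.g. in seconds
    # or sample indices; empty overlaps are dropped.
    out = []
    for a0, a1 in segs_a:
        for b0, b1 in segs_b:
            lo, hi = max(a0, b0), min(a1, b1)
            if lo < hi:
                out.append((lo, hi))
    return out

def expand_refrain(seg, loudness):
    # Long-rise / long-fall expansion on a smoothed loudness curve:
    # move the start backwards while loudness is still rising into the
    # segment, and the end forwards while loudness is still falling out.
    s, e = seg
    while s > 0 and loudness[s - 1] < loudness[s]:
        s -= 1
    while e < len(loudness) - 1 and loudness[e + 1] < loudness[e]:
        e += 1
    return s, e
```

Applying expand_refrain to each intersected segment yields the refrain time segments used in S140.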
S130, scanning and monitoring the human voice frequency and the background voice frequency in the audio to be processed to obtain the frequency width and the central frequency.
In the embodiment of the invention, the human voice frequency and the background sound frequency in the audio to be processed are acquired, and their difference is calculated to obtain a frequency difference value. Whether the frequency difference value is smaller than a preset frequency difference value is then judged; if it is, the human voice frequency is close to the background sound frequency, so the sampling frequency and sampling time point corresponding to that difference are stored, and the center frequency and frequency width are determined from the stored sampling frequencies and time points. Understandably, the frequency width spans from the first stored sampling point to the last stored sampling point, and the center frequency is the frequency at the midpoint of that width. It should be noted that the human voice frequency and background sound frequency are scanned in order to detect whether masking exists between the background sound and the human voice; if masking exists, parts of the audio become hard to hear clearly, degrading the user experience.
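Step S130 might be sketched as below, assuming one dominant frequency per analysis frame for each track; the 50 Hz threshold stands in for the unspecified "preset frequency difference value" and is purely illustrative:

```python
import numpy as np

def center_and_width(vocal_freq, background_freq, max_diff=50.0):
    # vocal_freq / background_freq: dominant frequency (Hz) per analysis
    # frame. Frames where the two frequencies differ by less than max_diff Hz
    # are stored as potential masking points.
    vocal = np.asarray(vocal_freq, dtype=float)
    diff = np.abs(vocal - np.asarray(background_freq, dtype=float))
    stored = vocal[diff < max_diff]
    if stored.size == 0:
        return None, None  # no masking detected
    # The frequency width spans the first to last stored frequency, and the
    # center frequency is the midpoint of that span.
    width = stored.max() - stored.min()
    center = stored.min() + width / 2
    return center, width
```

The returned center frequency and width map directly onto the f0 and Q parameters of the equalizer in S140.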
And S140, equalizing the audio to be processed with a parametric equalizer according to the frequency width, the center frequency and the refrain time segment.
In the embodiment of the invention, the background sound loudness gain in the audio to be processed is adjusted according to the refrain time segment to obtain the target loudness gain, and the audio is then equalized with a parametric equalizer according to the frequency width, the center frequency and the target loudness gain. The difference between a parametric equalizer and a graphic equalizer is that the frequencies and bands a graphic equalizer can adjust are fixed, whereas a parametric equalizer can place the frequency to be adjusted (i.e. the center frequency point) anywhere in the full audible range and can also set the adjusted bandwidth. The core of the parametric equalizer is a second-order filter bank parameterized mainly by a center frequency f0, a bandwidth factor Q and a loudness factor dBgain, where Q determines the width of the band to be boosted or attenuated, i.e. the range of influence of the gain around the center frequency: the larger the Q value, the narrower the band; conversely, the smaller the Q value, the wider the band.
In some embodiments, such as this embodiment, as shown in FIG. 4, the step S140 may include steps S141-S144.
S141, judging whether the background sound in the audio to be processed is a refrain segment according to the refrain start and end times corresponding to the refrain time segment; if it is, executing step S142, otherwise executing step S143;
S142, adjusting down the background sound loudness gain in the audio to be processed;
S143, adjusting up the background sound loudness gain;
S144, obtaining the background sound target frequency response value through the frequency response calculation formula of the parametric equalizer according to the frequency width, the center frequency and the target loudness gain, and equalizing the audio to be processed according to that value.
In the embodiment of the invention, whether the background sound in the audio to be processed is a refrain segment is judged from the refrain start time and refrain end time of the refrain time segment. Specifically, if the time corresponding to the background sound lies between the refrain start time and the refrain end time, the background sound is judged to be a refrain segment; otherwise it is not. If the background sound is a refrain segment, the background sound loudness gain in the audio to be processed is adjusted down to highlight the human voice; if it is not a refrain segment, for example a prelude or ending part, the background sound loudness gain is adjusted up to highlight the background sound. The background sound target frequency response value is then obtained through the frequency response calculation formula of the parametric equalizer, formula (2), according to the frequency width, the center frequency and the target loudness gain, and the audio to be processed is equalized according to that value. In formula (2), H(z) is the background sound target frequency response; a0, a1, a2, b0, b1 and b2 are the coefficients of the second-order filter; Q is the frequency width; dBgain is the loudness gain factor; f0 is the center frequency of the audio; Fs is the sampling frequency of the audio; A is the loudness gain; w0 is the angular frequency in the audio frequency domain; and S is a shelf slope used to balance the center frequency gain, redistributing the frequency-domain energy so that the human voice and the background sound no longer mask each other.
H(z) = (b0 + b1*z^-1 + b2*z^-2) / (a0 + a1*z^-1 + a2*z^-2)    (2)
where:
b0=sin(w0)/2=Q*alpha
b1=0
b2=-sin(w0)/2=-Q*alpha
a0=1+alpha
a1=-2*cos(w0)
a2=1-alpha
A=sqrt(10^(dBgain/20))=10^(dBgain/40)  (for peaking and shelving EQ filters only)
w0=2*pi*f0/Fs
alpha=sin(w0)/2*sqrt((A+1/A)*(1/S-1)+2)
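The coefficient expressions above (b0 = sin(w0)/2 = Q*alpha, etc.) match the band-pass form of the widely circulated Audio EQ Cookbook biquad formulas. As a minimal sketch, not the patent's own implementation, the following Python computes those coefficients and applies the resulting second-order filter with the Direct Form I difference equation; the function names and the Q value are illustrative:

```python
import math

def bandpass_biquad_coeffs(f0, fs, q):
    """Band-pass biquad coefficients (constant-skirt-gain form),
    matching the b0..a2 expressions listed above."""
    w0 = 2 * math.pi * f0 / fs       # normalized angular frequency
    alpha = math.sin(w0) / (2 * q)   # band-pass alpha, so Q*alpha = sin(w0)/2
    b0 = q * alpha                   # = sin(w0)/2
    b1 = 0.0
    b2 = -q * alpha                  # = -sin(w0)/2
    a0 = 1 + alpha
    a1 = -2 * math.cos(w0)
    a2 = 1 - alpha
    # normalize so the leading denominator coefficient is 1
    return [b0 / a0, b1 / a0, b2 / a0], [1.0, a1 / a0, a2 / a0]

def biquad_filter(x, b, a):
    """Direct Form I: y[n] = b0*x[n] + b1*x[n-1] + b2*x[n-2]
                              - a1*y[n-1] - a2*y[n-2]."""
    y, x1, x2, y1, y2 = [], 0.0, 0.0, 0.0, 0.0
    for xn in x:
        yn = b[0] * xn + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        x2, x1 = x1, xn
        y2, y1 = y1, yn
        y.append(yn)
    return y
```

Because the band-pass numerator sums to zero, a constant (DC) input is fully rejected once the transient decays, which is an easy sanity check on the coefficients.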
fig. 5 is a schematic block diagram of an audio processing apparatus 200 based on a parametric equalizer according to an embodiment of the present invention. As shown in fig. 5, the present invention also provides an audio processing apparatus 200 based on a parametric equalizer, corresponding to the above audio processing method based on a parametric equalizer. The parametric equalizer-based audio processing device 200 comprises means for performing the above-described parametric equalizer-based audio processing method. Specifically, referring to fig. 5, the audio processing apparatus 200 based on a parametric equalizer includes a searching unit 201, a processing unit 202, an expanding unit 203, a monitoring unit 204, and an equalizing unit 205.
The searching unit 201 is configured to obtain the audio to be processed, and to smooth the output loudness of the audio to be processed to find the segments higher than the average loudness as a first background sound segment; the processing unit 202 is configured to perform Fourier transform on the audio to be processed to obtain a time-frequency audio, and to process the time-frequency audio by a similarity method to obtain a second background sound segment; the expansion unit 203 is configured to take the intersection of, and expand, the first background sound segment and the second background sound segment to obtain a refrain time segment; the monitoring unit 204 is configured to scan and monitor the human voice frequency and the background sound frequency in the audio to be processed to obtain a frequency width and a center frequency; the equalizing unit 205 is configured to perform equalization processing on the audio to be processed through a parametric equalizer according to the frequency width, the center frequency, and the refrain time segment.
In some embodiments, such as this embodiment, the lookup unit 201 includes a first computing unit, a smoothing unit, and a second computing unit.
The first calculating unit is used for calculating the audio to be processed through a loudness calculating formula to obtain output loudness; the smoothing processing unit is used for smoothing the output loudness based on a moving median method to obtain a target output loudness; the second calculating unit is used for calculating the average loudness of the target output loudness, and taking the section which is larger than the average loudness in the target output loudness as a first background sound section.
In some embodiments, such as this embodiment, the processing unit 202 includes a first obtaining unit, a second obtaining unit, a third calculating unit, and a detecting unit.
The first obtaining unit is used for sequentially obtaining audio segments with preset lengths from the initial position in the time-frequency audio as a first segment; the second obtaining unit is configured to take the audio segment of the time-frequency audio from which the first segment is removed as a remaining segment, and sequentially obtain audio segments with the same length as the first segment from the remaining segment as second segments; the third calculation unit is used for calculating the similarity between the first segment and the second segment through a Pearson coefficient so as to construct a self-similarity matrix; the detection unit is used for performing line detection on the self-similarity matrix by utilizing image processing to obtain a second background sound segment.
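A self-similarity matrix built from Pearson correlations between fixed-length windows can be sketched as follows. This simplified sketch correlates every window against every other window of a 1-D feature sequence; the patent operates on time-frequency audio and restricts the second segments to the remaining portion, so treat this as illustrative only.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0

def self_similarity_matrix(frames, seg_len):
    """Similarity of each fixed-length window against every window,
    forming the matrix on which line detection is later performed."""
    starts = range(0, len(frames) - seg_len + 1)
    return [[pearson(frames[i:i + seg_len], frames[j:j + seg_len])
             for j in starts] for i in starts]
```

Repeated material (such as a refrain) shows up as off-diagonal stripes of high correlation, which is what the subsequent image-based line detection looks for.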
In some embodiments, such as the present embodiment, the expansion unit 203 includes a merging unit and an expansion subunit.
The merging unit is used for merging the first background sound segment and the second background sound segment and taking an intersection to obtain an initial refrain segment; and the expansion subunit is used for expanding the initial refrain segment by the long-rise and long-fall methods to obtain a refrain time segment.
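The intersection-then-expansion step can be sketched as below. The `grow` amount and the symmetric outward expansion are assumptions standing in for the long-rise/long-fall methods, whose exact definition the patent does not spell out here.

```python
def intersect_segments(a_segs, b_segs):
    """Intersect two lists of (start, end) time segments (seconds)."""
    out = []
    for a0, a1 in a_segs:
        for b0, b1 in b_segs:
            lo, hi = max(a0, b0), min(a1, b1)
            if lo < hi:
                out.append((lo, hi))
    return out

def expand_segment(seg, grow, total):
    """Grow an initial refrain segment outward, clamped to the
    audio duration, to obtain the refrain time segment."""
    start, end = seg
    return (max(0.0, start - grow), min(total, end + grow))
```

For example, intersecting a loudness-based segment (0 s to 10 s) with a similarity-based segment (5 s to 15 s) keeps 5 s to 10 s, which is then widened before use.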
In some embodiments, such as this embodiment, the monitoring unit 204 includes a fourth calculating unit and a storage determining unit.
The fourth calculating unit is configured to obtain a human voice frequency and a background voice frequency in the audio to be processed, and calculate a difference between the human voice frequency and the background voice frequency to obtain a frequency difference value; the storage determining unit is used for storing the sampling frequency and the sampling time point corresponding to the frequency difference value if the frequency difference value is smaller than a preset frequency difference value, and determining the central frequency and the frequency width according to the sampling frequency and the sampling time point.
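The frequency-difference scan above can be sketched as follows. The per-frame vocal and background frequency tracks, and the 50 Hz threshold, are illustrative assumptions; the patent only requires that the difference be below a preset value.

```python
def masked_points(vocal_freqs, bg_freqs, times, max_diff=50.0):
    """Keep the (background frequency, time) samples where the vocal and
    background frequencies are closer than the preset difference, i.e.
    where masking between them is likely."""
    return [(f_bg, t) for f_v, f_bg, t in zip(vocal_freqs, bg_freqs, times)
            if abs(f_v - f_bg) < max_diff]

def center_and_width(hits):
    """Derive a center frequency and frequency width from the saved samples
    (here simply the midpoint and span of the retained frequencies)."""
    freqs = [f for f, _ in hits]
    lo, hi = min(freqs), max(freqs)
    return (lo + hi) / 2.0, hi - lo
```

The resulting center frequency and width are exactly the f0 and Q-related inputs consumed by the parametric equalizer stage.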
In some embodiments, such as the present embodiment, the equalization unit 205 includes an adjustment unit and an equalization subunit.
The adjusting unit is used for adjusting the background sound loudness gain in the audio to be processed according to the refraining time segment to obtain a target loudness gain; and the equalizing subunit is used for obtaining a background sound target frequency response value through a frequency response value calculation formula corresponding to the parameter equalizer according to the frequency width, the central frequency and the target loudness gain, and equalizing the audio to be processed according to the background sound target frequency response value.
In some embodiments, for example, in this embodiment, the adjusting unit includes a determining unit, a first adjusting subunit, and a second adjusting subunit.
The judging unit is used for judging whether the background sound in the audio to be processed is the refrain segment according to the refrain starting time corresponding to the refrain time segment; the first adjusting subunit is configured to, if the background sound is the refrain segment, down-adjust a background sound loudness gain in the audio to be processed; the second adjusting subunit is configured to, if the background sound is not the refrain segment, adjust up the background sound loudness gain.
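The judging and adjusting logic reduces to a small time-window test. The 3 dB step is an assumed amount, not specified by the patent, which only states the direction of each adjustment.

```python
def target_gain_db(t, refrain_start, refrain_end,
                   base_gain_db=0.0, step_db=3.0):
    """Lower the background gain inside the refrain (to highlight the
    vocal); raise it elsewhere, e.g. in a prelude or ending."""
    if refrain_start <= t <= refrain_end:
        return base_gain_db - step_db   # down-adjust: refrain segment
    return base_gain_db + step_db       # up-adjust: non-refrain segment
```

The returned value is the target loudness gain (dBgain) fed into the frequency response formula of the parametric equalizer.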
The specific implementation manner of the audio processing apparatus 200 based on the parameter equalizer according to the embodiment of the present invention corresponds to the above-mentioned audio processing method based on the parameter equalizer, and is not described herein again.
The above-described audio processing apparatus based on a parametric equalizer may be implemented in the form of a computer program that can be run on a computer device as shown in fig. 6.
Referring to fig. 6, fig. 6 is a schematic block diagram of a computer device according to an embodiment of the present application. The computer device 300 is a terminal, and the terminal may be an electronic device with a communication function, such as a smart phone, a desktop computer, a laptop computer, or a tablet computer.
Referring to fig. 6, the computer device 300 includes a processor 302, a memory, which may include a storage medium 303 and an internal memory 304, and a network interface 305 connected by a system bus 301.
The storage medium 303 may store an operating system 3031 and computer programs 3032. The computer program 3032, when executed, may cause the processor 302 to perform a method of audio processing based on a parametric equalizer.
The processor 302 is used to provide computing and control capabilities to support the operation of the overall computer device 300.
The internal memory 304 provides an environment for the execution of a computer program 3032 in the storage medium 303, which computer program 3032, when executed by the processor 302, causes the processor 302 to perform a method for audio processing based on a parametric equalizer.
The network interface 305 is used for network communication with other devices. Those skilled in the art will appreciate that the configuration shown in fig. 6 is a block diagram of only a portion of the configuration associated with the present application and does not constitute a limitation of the computer device 300 to which the present application is applied, and that a particular computer device 300 may include more or fewer components than those shown, combine certain components, or have a different arrangement of components.
Wherein the processor 302 is configured to run a computer program 3032 stored in the memory to implement the following steps: acquiring the audio to be processed, and smoothing the output loudness of the audio to be processed to find the segments higher than the average loudness as a first background sound segment; performing Fourier transform on the audio to be processed to obtain a time-frequency audio, and processing the time-frequency audio by a similarity method to obtain a second background sound segment; taking an intersection of and expanding the first background sound segment and the second background sound segment to obtain a refrain time segment; scanning and monitoring the human voice frequency and the background sound frequency in the audio to be processed to obtain a frequency width and a center frequency; and performing equalization processing on the audio to be processed through a parametric equalizer according to the frequency width, the center frequency, and the refrain time segment.
In some embodiments, for example, in this embodiment, when the step of smoothing the output loudness of the audio to be processed to find a segment row higher than the average loudness as the first background sound segment is implemented, the processor 302 specifically implements the following steps: calculating the audio to be processed through a loudness calculation formula to obtain output loudness; smoothing the output loudness based on a moving median method to obtain a target output loudness; and calculating the average loudness of the target output loudness, and taking the segment which is larger than the average loudness in the target output loudness as a first background sound segment.
In some embodiments, for example, in this embodiment, when the processor 302 implements the step of processing the time-frequency audio by the similarity method to obtain the second background sound segment, the following steps are specifically implemented: sequentially acquiring audio segments with preset lengths from the initial position in the time-frequency audio as a first segment; taking the audio segments except the first segment in the time-frequency audio as residual segments, and sequentially acquiring audio segments with the same length as the first segment from the residual segments as second segments; calculating a similarity between the first segment and the second segment by a pearson coefficient to construct a self-similarity matrix; and performing line detection on the self-similarity matrix by using image processing to obtain a second background sound segment.
In some embodiments, for example, in this embodiment, when the step of obtaining the intersection and the extension of the first background sound segment and the second background sound segment to obtain the refraining time segment is implemented by the processor 302, the following steps are specifically implemented: combining the first background sound segment and the second background sound segment, and taking an intersection to obtain an initial refrain segment; and expanding the initial refrain segment by a long-rise method and a long-fall method to obtain a refrain time segment.
In some embodiments, for example, in this embodiment, when the processor 302 implements the step of scanning and monitoring the human voice frequency and the background voice frequency in the audio to be processed to obtain the frequency width and the center frequency, the following steps are specifically implemented: acquiring human voice frequency and background voice frequency in the audio to be processed, and calculating the difference between the human voice frequency and the background voice frequency to obtain a frequency difference value; if the frequency difference value is smaller than a preset frequency difference value, storing the sampling frequency and the sampling time point corresponding to the frequency difference value, and determining the central frequency and the frequency width according to the sampling frequency and the sampling time point.
In some embodiments, for example, in this embodiment, when implementing the step of performing equalization processing on the audio to be processed through a parameter equalizer according to the frequency width, the center frequency, and the refrain time segment, the processor 302 specifically implements the following steps: judging whether the background sound in the audio to be processed is a refrain segment according to the refrain start and end times corresponding to the refrain time segment; if the background sound is the refrain segment, down-adjusting the background sound loudness gain in the audio to be processed; if the background sound is not the refrain segment, up-adjusting the background sound loudness gain, thereby obtaining a target loudness gain; and obtaining a background sound target frequency response value through the frequency response value calculation formula corresponding to the parameter equalizer according to the frequency width, the center frequency, and the target loudness gain, and performing equalization processing on the audio to be processed according to the background sound target frequency response value.
It should be understood that, in the embodiment of the present application, the processor 302 may be a Central Processing Unit (CPU); the processor 302 may also be another general-purpose processor, a Digital Signal Processor (DSP), an Application-Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, and the like. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
It will be understood by those skilled in the art that all or part of the flow of the method implementing the above embodiments may be implemented by a computer program instructing associated hardware. The computer program may be stored in a storage medium, which is a computer-readable storage medium. The computer program is executed by at least one processor in the computer system to implement the flow steps of the embodiments of the method described above.
Accordingly, the present invention also provides a storage medium. The storage medium may be a computer-readable storage medium. The storage medium stores a computer program. The computer program is executed by at least one processor in the computer system to implement the flow steps of an embodiment of the above-described parametric equalizer-based audio processing method.
The storage medium may be a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a magnetic disk, an optical disk, or any other computer-readable storage medium that can store program code.
Those of ordinary skill in the art will appreciate that the various illustrative components and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the components and steps of the various examples have been described above generally in terms of their functionality in order to clearly illustrate this interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
In the several embodiments provided in the present invention, it should be understood that the disclosed apparatus and method may be implemented in other manners. For example, the above-described apparatus embodiments are merely illustrative. For example, the division of each unit is only one logic function division, and there may be another division manner in actual implementation. For example, various elements or components may be combined or may be integrated into another system, or some features may be omitted, or not implemented.
The steps in the method of the embodiment of the invention can be sequentially adjusted, combined and deleted according to actual needs. The units in the device of the embodiment of the invention can be combined, divided and deleted according to actual needs. In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a storage medium. Based on such understanding, the technical solution of the present invention essentially or partially contributes to the prior art, or all or part of the technical solution can be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a terminal, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention.
In the foregoing embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to the related descriptions of other embodiments.
While the invention has been described with reference to specific embodiments, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (10)

1. A method for audio processing based on a parametric equalizer, comprising:
acquiring audio to be processed, and smoothing the output loudness of the audio to be processed to find the segments higher than the average loudness as a first background sound segment;
carrying out Fourier transform on the audio to be processed to obtain a time-frequency audio, and processing the time-frequency audio by a similarity method to obtain a second background sound segment;
taking an intersection of and expanding the first background sound segment and the second background sound segment to obtain a refrain time segment;
scanning and monitoring the human voice frequency and the background voice frequency in the audio to be processed to obtain the frequency width and the central frequency;
and carrying out equalization processing on the audio to be processed through a parameter equalizer according to the frequency width, the center frequency and the refrain time segment.
2. The method of claim 1, wherein the step of smoothing the output loudness of the audio to be processed to find a segment row above average loudness as the first background sound segment comprises:
calculating the audio to be processed through a loudness calculation formula to obtain output loudness;
smoothing the output loudness based on a moving median method to obtain a target output loudness;
and calculating the average loudness of the target output loudness, and taking the section which is larger than the average loudness in the target output loudness as a first background sound section.
3. The method as claimed in claim 1, wherein the step of processing the time-frequency audio by similarity method to obtain the second background sound segment comprises:
sequentially acquiring audio segments with preset lengths from the initial position in the time-frequency audio as a first segment;
taking the audio segments except the first segment in the time-frequency audio as residual segments, and sequentially acquiring audio segments with the same length as the first segment from the residual segments as second segments;
calculating a similarity between the first segment and the second segment by a pearson coefficient to construct a self-similarity matrix;
and performing line detection on the self-similarity matrix by using image processing to obtain a second background sound segment.
4. The method of claim 1, wherein the step of intersecting and expanding the first background sound segment and the second background sound segment to obtain a refraining time segment comprises:
combining the first background sound segment and the second background sound segment, and taking an intersection to obtain an initial refrain segment;
and expanding the initial refrain segment by a long-rise method and a long-fall method to obtain a refrain time segment.
5. The audio processing method according to claim 1, wherein the step of scanning and monitoring the human voice frequency and the background voice frequency in the audio to be processed to obtain the frequency width and the center frequency comprises:
acquiring human voice frequency and background voice frequency in the audio to be processed, and calculating the difference between the human voice frequency and the background voice frequency to obtain a frequency difference value;
if the frequency difference value is smaller than a preset frequency difference value, storing the sampling frequency and the sampling time point corresponding to the frequency difference value, and determining the central frequency and the frequency width according to the sampling frequency and the sampling time point.
6. The audio processing method according to claim 1, wherein the step of equalizing the audio to be processed by a parametric equalizer according to the frequency width, the center frequency and the refrain time slice comprises:
adjusting the background sound loudness gain in the audio to be processed according to the refrain time segment to obtain a target loudness gain;
and obtaining a background sound target frequency response value through a frequency response value calculation formula corresponding to a parameter equalizer according to the frequency width, the central frequency and the target loudness gain, and carrying out equalization processing on the audio to be processed according to the background sound target frequency response value.
7. The audio processing method based on the parametric equalizer according to claim 6, wherein the step of adjusting the background loudness gain in the audio to be processed according to the refrain time slice to obtain the target loudness gain comprises:
judging whether the background sound in the audio to be processed is a refrain segment according to the refrain starting time corresponding to the refrain time segment;
if the background sound is the refrain segment, down-adjusting the background sound loudness gain in the audio to be processed;
and if the background sound is not the refrain segment, the loudness gain of the background sound is adjusted upwards.
8. An audio processing apparatus based on a parametric equalizer, comprising:
the searching unit is used for acquiring audio to be processed and smoothing the output loudness of the audio to be processed to find the segments higher than the average loudness as a first background sound segment;
the processing unit is used for carrying out Fourier transform on the audio to be processed to obtain a time-frequency audio, and processing the time-frequency audio through a similarity method to obtain a second background sound segment;
the expansion unit is used for taking intersection and expanding the first background sound segment and the second background sound segment to obtain a refrain time segment;
the monitoring unit is used for scanning and monitoring the human voice frequency and the background voice frequency in the audio to be processed to obtain the frequency width and the central frequency;
and the equalizing unit is used for equalizing the audio to be processed through a parameter equalizer according to the frequency width, the center frequency and the refrain time segment.
9. A computer device, characterized in that the computer device comprises a memory, on which a computer program is stored, and a processor, which when executing the computer program, implements the method for parametric equalizer based audio processing according to any of claims 1-7.
10. A computer-readable storage medium, characterized in that the storage medium stores a computer program which, when executed by a processor, implements the method for parametric equalizer-based audio processing according to any of claims 1-7.
CN202211565856.4A 2022-12-07 2022-12-07 Audio processing method, apparatus, device and medium based on parameter equalizer Pending CN115985349A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211565856.4A CN115985349A (en) 2022-12-07 2022-12-07 Audio processing method, apparatus, device and medium based on parameter equalizer


Publications (1)

Publication Number Publication Date
CN115985349A true CN115985349A (en) 2023-04-18

Family

ID=85974977



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination