CN117457013A - Voice correction method, device, equipment and computer readable storage medium - Google Patents

Voice correction method, device, equipment and computer readable storage medium

Info

Publication number
CN117457013A
CN117457013A (application CN202311626468.7A)
Authority
CN
China
Prior art keywords
response curve
frequency response
voice
frequency
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311626468.7A
Other languages
Chinese (zh)
Inventor
钱忠根
吴劼
Current Assignee
Goertek Techology Co Ltd
Original Assignee
Goertek Techology Co Ltd
Priority date
Filing date
Publication date
Application filed by Goertek Techology Co Ltd filed Critical Goertek Techology Co Ltd
Priority to CN202311626468.7A
Publication of CN117457013A
Legal status: Pending


Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/003 Changing voice quality, e.g. pitch or formants
    • G10L21/007 Changing voice quality, e.g. pitch or formants, characterised by the process used
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00
    • G10L25/03 Speech or voice analysis techniques characterised by the type of extracted parameters
    • G10L25/18 Speech or voice analysis techniques in which the extracted parameters are spectral information of each sub-band

Abstract

The invention discloses a voice correction method, apparatus, device and computer readable storage medium. The method comprises the following steps: acquiring a target voice signal collected by a microphone, and converting it to obtain a first voice frequency spectrum and a first frequency response curve; acquiring a second voice frequency spectrum; comparing the first voice frequency spectrum with the second voice frequency spectrum, and determining a target frequency band in the first voice frequency spectrum in which the difference between the two spectra is larger than a preset threshold; and acquiring a reference frequency response curve, calculating, from the first frequency response curve and the reference frequency response curve, gain values that compensate the curve values of all frequency points of the first frequency response curve within the target frequency band to the level of the reference frequency response curve, and correcting the target voice signal with the gain values before outputting it. The invention thus provides a voice correction scheme that corrects a user's voice signal to improve speech clarity.

Description

Voice correction method, device, equipment and computer readable storage medium
Technical Field
The present invention relates to the field of speech processing technologies, and in particular, to a method, apparatus, device, and computer readable storage medium for correcting speech.
Background
At present, a call on an electronic product works as follows: the microphone picks up the local speech together with ambient sound, an algorithm reduces the environmental noise, and the processed signal is sent to the other party. In daily life, however, a user's speech can be unclear or muffled, making the content hard to understand. For example, a cold can cause heavy nasality and a thick, muffled voice that sounds dull to the listener, so the user has to repeat themselves several times before being understood; likewise, a dry mouth or emotional tension can make articulation indistinct or the sibilance excessive, further increasing the listener's difficulty.
Disclosure of Invention
The main object of the present invention is to provide a voice correction method, apparatus, device and computer readable storage medium, aiming to provide a voice correction scheme that corrects the user's voice when it differs from the user's voice in a normal, healthy state, thereby improving speech clarity.
In order to achieve the above object, the present invention provides a voice correction method, the method comprising the steps of:
acquiring a target voice signal acquired by a microphone, converting the target voice signal to obtain a first voice frequency spectrum, and converting the target voice signal to obtain a first frequency response curve;
acquiring a second voice frequency spectrum, wherein the second voice frequency spectrum is a frequency spectrum obtained by converting a voice signal acquired when a user is in a health state;
comparing the first voice frequency spectrum with the second voice frequency spectrum, and determining a target frequency band in the first voice frequency spectrum, wherein the difference between the target frequency band and the second voice frequency spectrum is larger than a preset threshold value;
and acquiring a reference frequency response curve, calculating according to the first frequency response curve and the reference frequency response curve to obtain a gain value which compensates the curve value of each frequency point of the first frequency response curve in the target frequency band to the level of the reference frequency response curve, and correcting the target voice signal by adopting the gain value and then outputting the corrected target voice signal.
Optionally, the step of converting the target voice signal to obtain a first voice spectrum, and converting the target voice signal to obtain a first frequency response curve includes:
preprocessing the target voice signal, converting the preprocessed target voice signal to obtain a first voice frequency spectrum, and converting the preprocessed target voice signal to obtain a first frequency response curve, wherein the preprocessing comprises noise reduction processing and/or removing signals of a preset frequency band.
Optionally, the step of acquiring the reference frequency response curve includes:
acquiring a personalized frequency response curve and a standard frequency response curve, wherein the personalized frequency response curve is a frequency response curve obtained by converting user voice signals collected in historical calls, and the standard frequency response curve is a frequency response curve obtained by testing multiple people;
and carrying out weighted summation on the personalized frequency response curve and the standard frequency response curve to obtain the reference frequency response curve.
Optionally, the voice correction method further includes:
converting the collected voice signals of the user in the conversation process to obtain a second frequency response curve;
and performing weighted summation on the second frequency response curve obtained during the current call and the current personalized frequency response curve to update the personalized frequency response curve, wherein the personalized frequency response curve for the first call is the standard frequency response curve.
Optionally, the step of performing weighted summation on the personalized frequency response curve and the standard frequency response curve to obtain the reference frequency response curve includes:
determining weights corresponding to the personalized frequency response curve and the standard frequency response curve according to the number of historical updates of the personalized frequency response curve, wherein the more historical updates there are, the higher the weight of the personalized frequency response curve and the lower the weight of the standard frequency response curve;
and carrying out weighted summation on the personalized frequency response curve and the standard frequency response curve according to the weight to obtain the reference frequency response curve.
Optionally, the step of converting the voice signal of the user collected during the call to obtain the second frequency response curve includes:
detecting whether a voice signal of a user collected in a conversation process is a voice signal in a health state or not;
if yes, the collected voice signals of the user are converted to obtain a second frequency response curve.
Optionally, the step of acquiring the second speech spectrum includes:
acquiring target identity information of a user corresponding to the target voice signal;
and matching the frequency spectrum corresponding to the target identity information from the frequency spectrums corresponding to the preset various identity information as a second voice frequency spectrum.
In order to achieve the above object, the present invention also provides a voice correction apparatus, the apparatus comprising:
the first acquisition module is used for acquiring a target voice signal acquired by a microphone, converting the target voice signal into a first voice frequency spectrum and converting the target voice signal into a first frequency response curve;
the second acquisition module is used for acquiring a second voice frequency spectrum, wherein the second voice frequency spectrum is a frequency spectrum obtained by converting a voice signal acquired when a user is in a health state;
the comparison module is used for comparing the first voice frequency spectrum with the second voice frequency spectrum and determining a target frequency band, in which the difference between the first voice frequency spectrum and the second voice frequency spectrum is larger than a preset threshold value, in the first voice frequency spectrum;
the correction module is used for acquiring a reference frequency response curve, calculating, according to the first frequency response curve and the reference frequency response curve, gain values that compensate the curve value of each frequency point of the first frequency response curve in the target frequency band to the level of the reference frequency response curve, correcting the target voice signal with the gain values, and outputting the corrected target voice signal.
To achieve the above object, the present invention also provides a voice correction device, including: a memory, a processor, and a voice correction program stored on the memory and executable on the processor, wherein the voice correction program, when executed by the processor, implements the steps of the voice correction method described above.
In addition, in order to achieve the above object, the present invention also proposes a computer-readable storage medium having stored thereon a voice correction program which, when executed by a processor, implements the steps of the voice correction method as described above.
In the embodiment of the invention, the target voice signal collected by the microphone is acquired and converted to obtain a first voice frequency spectrum and a first frequency response curve, yielding the basic information for voice correction. Then a second voice frequency spectrum, obtained by converting voice signals collected when the user is in a healthy state, is acquired; the first voice frequency spectrum is compared with the second, and the target frequency band in which their difference exceeds a preset threshold is determined, thereby first locking onto the frequency band in which pronunciation is unclear due to the user's poor physical state. Next, a reference frequency response curve is acquired, gain values that compensate the curve values of each frequency point of the first frequency response curve within the target frequency band to the level of the reference frequency response curve are calculated from the two curves, and the target voice signal is corrected with these gain values and then output. The voice signal is thus corrected at each frequency point within the band where pronunciation is unclear, so that when the user is in poor physical condition and speaks unclearly, the pronunciation can be corrected and speech clarity improved, allowing the other end of the call to better understand what the user says.
Drawings
FIG. 1 is a flow chart of an embodiment of a voice correction method according to the present invention;
FIG. 2 is a schematic diagram of the functional modules of a voice correction apparatus according to a preferred embodiment of the present invention;
FIG. 3 is a schematic structural diagram of a hardware running environment according to an embodiment of the present invention.
The achievement of the objects, functional features and advantages of the present invention will be further described with reference to the accompanying drawings, in conjunction with the embodiments.
Detailed Description
It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
Referring to fig. 1, fig. 1 is a flowchart of an embodiment of a voice correction method according to the present invention.
The embodiments of the present invention provide embodiments of a voice correction method. It should be noted that, although a logical order is shown in the flowchart, in some cases the steps may be performed in a different order than shown or described here. In this embodiment, the execution subject of the voice correction method may be a voice correction device, such as a mobile phone, an earphone, or a personal computer; for convenience, references to the execution subject are omitted in the description of each embodiment below. In this embodiment, the voice correction method includes the following steps S10 to S40:
step S10, a target voice signal acquired by a microphone is acquired, the target voice signal is converted to obtain a first voice frequency spectrum, and the target voice signal is converted to obtain a first frequency response curve.
In a scenario where the user's voice needs to be recorded, such as a call, a recording session, or an intercom session, a microphone can be used to collect the voice signal generated when the user speaks (hereinafter referred to as the target voice signal for distinction). In a specific embodiment, the voice signal collected in real time may be split into frames, and voice correction may be performed frame by frame in the order of collection; the target voice signal may refer to one frame or multiple consecutive frames.
The target voice signal collected by the microphone is acquired and converted into a frequency spectrum (hereinafter referred to as the first voice frequency spectrum for distinction); the manner of converting the voice signal into a frequency spectrum is not limited in this embodiment. The target voice signal is also converted into a frequency response curve (hereinafter referred to as the first frequency response curve for distinction), for example by means of a Fourier transform.
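The embodiment does not fix a particular transform; as an illustrative sketch, a windowed FFT (one common choice) can produce the magnitude spectrum of a single frame. The function name, FFT size, and Hann window here are assumptions, not taken from the patent.

```python
import numpy as np

def to_spectrum(frame, sample_rate, n_fft=1024):
    """Convert one frame of speech to a magnitude spectrum via a
    Hann-windowed FFT. Returns (bin frequencies in Hz, magnitudes)."""
    n = min(len(frame), n_fft)
    windowed = frame[:n] * np.hanning(n)       # taper frame edges
    spectrum = np.abs(np.fft.rfft(windowed, n_fft))
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sample_rate)
    return freqs, spectrum
```

The same machinery yields a per-frame frequency response estimate when averaged over frames.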
In a possible implementation manner, the step of converting the target voice signal to obtain a first voice spectrum in step S10, and the step of converting the target voice signal to obtain a first frequency response curve includes:
step S101, preprocessing the target voice signal, converting the preprocessed target voice signal to obtain a first voice spectrum, and converting the preprocessed target voice signal to obtain a first frequency response curve, where the preprocessing includes noise reduction processing and/or removing signals in a preset frequency band.
To improve the clarity of the subsequent, corrected voice signal, in this embodiment the target voice signal may be preprocessed after it is obtained. The preprocessing may include noise reduction and/or removal of signals in a preset frequency band, which raises the signal-to-noise ratio of the preprocessed target voice signal. There are various ways of performing noise reduction, and this embodiment does not limit them. The preset frequency band may be a pre-configured band to be filtered out, for example a band that contains no human voice; removing signals in this band reduces the complexity of the subsequent voice correction process and improves the clarity of the corrected voice signal.
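A minimal sketch of the band-removal part of the preprocessing (the noise-reduction part is omitted): the preset band can be zeroed in the FFT domain. The 0-80 Hz default is an assumed example of a band containing no human voice, not a value from the patent.

```python
import numpy as np

def remove_band(signal, sample_rate, band=(0.0, 80.0)):
    """Zero out a preset frequency band (e.g. sub-speech rumble) in
    the FFT domain and transform back to the time domain."""
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    mask = (freqs >= band[0]) & (freqs <= band[1])
    spectrum[mask] = 0.0                        # discard the unwanted band
    return np.fft.irfft(spectrum, n=len(signal))
```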
Step S20, a second voice frequency spectrum is obtained, wherein the second voice frequency spectrum is a frequency spectrum obtained by converting a voice signal acquired when a user is in a health state.
The spectrum obtained by converting the voice signal collected when the user is in a healthy state is called a second voice spectrum to show distinction. The second speech spectrum reflects the pronunciation characteristics of the user in a healthy state. The second speech spectrum may be pre-acquired and stored in the device, or may be acquired from another device, which is not limited in this embodiment.
For example, in a possible implementation, when the device is powered on for the first time, it outputs a prompt asking the user to record a voice signal while in a healthy state. Upon receiving the user's operation, the device enters a recording mode, records the user's voice signal, converts it into a frequency spectrum, and stores the spectrum for later use in voice correction.
In a possible embodiment, the step of acquiring the second speech spectrum in the step S20 includes steps S201 to S202:
step S201, obtaining target identity information of a user corresponding to the target voice signal.
The identity information of the user corresponding to the target voice signal is called target identity information for distinction. The target identity information characterizes the identity of the user so as to distinguish them from other users; it may be, for example, a user code, the user's name, or a nickname, which is not limited in this embodiment. There are likewise various ways of obtaining the target identity information. For example, in one possible implementation, the correspondence between voiceprints and user codes may be stored in the device, and each time a new voiceprint is identified it is assigned a code representing a new user; voiceprint recognition may then be performed on the user's voice at the start of recording, for example just after a call begins, and if the voiceprint matches a pre-stored one, the user code associated with the matched voiceprint is taken as the target identity information. As another example, in one possible implementation, an ID or nickname logged into the application may be used as the target identity information.
Step S202, matching a frequency spectrum corresponding to the target identity information from frequency spectrums corresponding to the preset various identity information as a second voice frequency spectrum.
The frequency spectrums corresponding to different users can be preset in the device and are stored in association with the identity information of each user, and the frequency spectrums corresponding to the users are obtained by converting the collected user voice signals when the users are in a health state. The spectrum corresponding to the target identity information can be obtained by matching the target identity information with each identity information and searching the spectrum as a second voice spectrum.
Step S30, comparing the first voice frequency spectrum with the second voice frequency spectrum, and determining a target frequency band in the first voice frequency spectrum, wherein the difference between the target frequency band and the second voice frequency spectrum is larger than a preset threshold value.
After the first and second voice frequency spectra are acquired, the two spectra are compared, and the frequency band in which their difference is larger than a preset threshold (called the target frequency band for distinction) is determined. The preset threshold may be set as required and is not limited in this embodiment. In a specific embodiment, the difference between the two spectra can be calculated in various ways, for example by energy and autocorrelation analysis. In a possible implementation, several frequency bands may be divided in advance, for example a low, a mid-low, and a mid-high band (the endpoints of the three bands are not limited in this embodiment), where the low band covers nasal sounds, vowels, and plosive consonants, the mid-low band covers vowels, and the mid-high band covers voiceless consonants. The difference between the first and second voice frequency spectra is then compared in each band separately, and any band in which the difference exceeds the preset threshold is determined to be a target frequency band.
By comparing the spectrum of the target voice signal with the spectrum converted from the voice signal collected while the user was healthy, the frequency band in which the user's pronunciation is unclear can first be locked. It can be understood that the target frequency band is the band in which pronunciation is unclear because of the user's poor physical state.
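The per-band comparison of step S30 can be sketched as follows. The band edges and the relative-energy threshold are placeholders, since the embodiment leaves both open; per-band energy is one of the comparison methods the description mentions.

```python
import numpy as np

# Illustrative band edges in Hz; the embodiment does not fix the endpoints.
BANDS = {"low": (0, 500), "mid-low": (500, 2000), "mid-high": (2000, 8000)}

def find_target_bands(freqs, spec_now, spec_healthy, threshold=0.3):
    """Return names of bands whose relative energy difference between the
    current spectrum and the healthy-state spectrum exceeds the threshold
    (the threshold value is a placeholder, not taken from the patent)."""
    targets = []
    for name, (lo, hi) in BANDS.items():
        mask = (freqs >= lo) & (freqs < hi)
        e_now = float(np.sum(spec_now[mask] ** 2))
        e_ref = float(np.sum(spec_healthy[mask] ** 2))
        if e_ref > 0 and abs(e_now - e_ref) / e_ref > threshold:
            targets.append(name)
    return targets
```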
Step S40, a reference frequency response curve is obtained, a gain value for compensating the curve value of each frequency point of the first frequency response curve to the level of the reference frequency response curve in the target frequency band is obtained through calculation according to the first frequency response curve and the reference frequency response curve, and the target voice signal is corrected by adopting the gain value and then output.
The reference frequency response curve is a frequency response curve characterizing the voice signal of a person in a healthy state; its source is not limited in this embodiment. For example, in a specific embodiment it may be obtained by converting a voice signal collected from the user in a healthy state, or it may be a frequency response curve obtained in advance by testing multiple people in a laboratory.
After the reference frequency response curve and the target frequency band are obtained, gain values that compensate the curve values (i.e., loudness values) of each frequency point of the first frequency response curve within the target frequency band to the level of the reference frequency response curve can be calculated from the two curves. Specifically, the curve value of any frequency point in the target frequency band on the reference frequency response curve may be divided by the curve value of that frequency point on the first frequency response curve to obtain the gain value for that frequency point. The target voice signal is then processed with the gain values of all frequency points in the target frequency band to correct it, and the corrected voice signal is output. For example, when the device is a Bluetooth headset, the corrected voice signal may be sent through the Bluetooth module to the connected handset, which forwards it to the other end of the call.
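The division rule above (reference-curve value over first-curve value at each frequency point, applied only inside the target band) can be sketched as follows. Applying the per-bin gains in the FFT domain, and the example band, are assumptions; the embodiment does not specify how the gains are applied to the time signal.

```python
import numpy as np

def correct_signal(signal, sample_rate, first_curve, reference_curve,
                   target_band=(0.0, 500.0)):
    """Scale each FFT bin inside the target band by
    gain = reference_curve / first_curve, then transform back.
    Both curves are magnitude arrays indexed like np.fft.rfftfreq."""
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    gains = np.ones_like(freqs)
    in_band = (freqs >= target_band[0]) & (freqs <= target_band[1])
    valid = in_band & (first_curve > 0)          # avoid division by zero
    gains[valid] = reference_curve[valid] / first_curve[valid]
    return np.fft.irfft(spectrum * gains, n=len(signal))
```

For instance, if the reference curve sits 6 dB above the first curve at some frequency point, that bin is boosted by a factor of two.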
In this embodiment, the target voice signal collected by the microphone is acquired and converted to obtain a first voice frequency spectrum and a first frequency response curve, yielding the basic information for voice correction. A second voice frequency spectrum, obtained by converting voice signals collected when the user is in a healthy state, is then acquired; the first voice frequency spectrum is compared with the second, and the target frequency band in which their difference exceeds a preset threshold is determined, thereby first locking onto the frequency band in which pronunciation is unclear due to the user's poor physical state. A reference frequency response curve is then acquired, gain values compensating the curve values of each frequency point of the first frequency response curve within the target frequency band to the level of the reference frequency response curve are calculated from the two curves, and the target voice signal is corrected with these gain values and output. The voice signal is thus corrected at every frequency point in the band where pronunciation is unclear, so that when the user is in poor physical condition and speaks unclearly or with a muffled voice, the pronunciation can be corrected, speech clarity improves, and the other end of the call can better understand the user.
Based on the above-mentioned first embodiment, a second embodiment of the voice correction method of the present invention is provided, in this embodiment, the step of acquiring the reference frequency response curve in the step S40 includes steps S401 to S402:
step S401, acquiring an individualized frequency response curve and a standard frequency response curve, wherein the individualized frequency response curve is a frequency response curve obtained by converting user voice signals acquired in a history call, and the standard frequency response curve is a frequency response curve obtained by testing multiple people.
In this embodiment, the reference frequency response curve may be obtained by weighted summation of the user's personalized frequency response curve and the standard frequency response curve, so that the reference curve is closer to the user's individual voice characteristics and the corrected voice signal sounds more faithful and natural.
The personalized frequency response curve may be obtained by converting the user voice signals collected during the user's historical calls and can represent the user's individual voice characteristics. In a specific embodiment, it may be calculated from the historical voice signals collected over one or more calls. The standard frequency response curve is obtained by testing multiple people with standard, normal pronunciation: voice signals are collected from each person, converted into frequency response curves, and the curves are averaged. The standard curve may be measured in a laboratory and then configured into the device; it reflects the characteristics of standard, normal human speech.
And step S402, carrying out weighted summation on the personalized frequency response curve and the standard frequency response curve to obtain the reference frequency response curve.
The personalized frequency response curve and the standard frequency response curve are weighted and summed to obtain the reference frequency response curve. The weights may be preset as required: for example, when clarity matters more, the weight of the standard curve may be set higher; when fidelity to the user's own voice matters more, the weight of the personalized curve may be set higher; and when the user's speech has a defect, the weight of the standard curve may again be set higher.
In a possible implementation manner, the voice correction method further includes steps S50 to S60:
step S50, converting the collected voice signals of the user in the conversation process to obtain a second frequency response curve.
During a call, one or more segments of the user's voice signal may be collected and converted to obtain a frequency response curve (hereinafter referred to as the second frequency response curve for distinction), which represents the characteristics of the user's voice during the current call.
Step S60, performing weighted summation on the second frequency response curve obtained during the current call and the current personalized frequency response curve to update the personalized frequency response curve, wherein the personalized frequency response curve for the first call is the standard frequency response curve.
During a call, the second frequency response curve obtained in the current call and the current personalized frequency response curve may be weighted and summed, and the result used as the new personalized frequency response curve. For the first call, the standard frequency response curve is used as the personalized frequency response curve: at that point the device has not yet collected the user's voice and cannot derive a personalized curve, so the standard curve serves as the initial one. In subsequent calls, the personalized curve is updated from the second frequency response curve converted from the collected user voice signal, so that it increasingly resembles the user's pronunciation characteristics. Since the user's vocal state and environment may differ from call to call, continuously updating the curve by weighted summation lets the resulting personalized curve represent the user's pronunciation characteristics without being dominated by any single environment or physical state, and weighted summation with the standard curve then yields a corrected voice signal with higher fidelity. The weights used when summing the second frequency response curve and the personalized frequency response curve may be preset as required and are not limited in this embodiment.
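A minimal sketch of steps S50-S60, assuming a fixed blending weight `alpha` (the embodiment leaves the weights open) and representing curves as NumPy arrays:

```python
import numpy as np

class PersonalCurve:
    """Maintains the personalized frequency response curve: the first
    call starts from the standard curve, and each later call blends in
    the second frequency response curve measured during that call."""

    def __init__(self, standard_curve, alpha=0.8):
        self.curve = np.asarray(standard_curve, dtype=float)
        self.updates = 0      # historical update count
        self.alpha = alpha    # weight kept by the stored curve (assumed value)

    def update(self, second_curve):
        """Weighted summation of the stored curve and the new one (S60)."""
        second_curve = np.asarray(second_curve, dtype=float)
        self.curve = self.alpha * self.curve + (1.0 - self.alpha) * second_curve
        self.updates += 1
```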
In a possible embodiment, the step S402 includes steps S4021 to S4022:
step S4021, determining the weights corresponding to the personalized frequency response curve and the standard frequency response curve according to the number of historical updates of the personalized frequency response curve, wherein the larger the number of historical updates, the higher the weight corresponding to the personalized frequency response curve and the lower the weight corresponding to the standard frequency response curve.
Each time the personalized frequency response curve is updated with a second frequency response curve obtained during a call, its recorded number of historical updates is incremented by 1. When the reference frequency response curve is calculated from the personalized frequency response curve and the standard frequency response curve, the weights corresponding to the two curves can be determined from the most recently recorded number of historical updates of the personalized curve. In this embodiment, the weights are determined such that the larger the number of historical updates, the higher the weight corresponding to the personalized frequency response curve and the lower the weight corresponding to the standard frequency response curve; the specific method is not limited and may, for example, use a lookup table or a calculation formula.
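One possible weight rule satisfying steps S4021 and S4022 is sketched below. The linear ramp and its saturation point are assumptions; the patent only requires the monotonic behaviour (more updates gives the personalized curve a heavier weight) and allows a lookup table or any formula.

```python
import numpy as np

def curve_weights(update_count, saturation=20):
    # Map the recorded number of historical updates to the two weights:
    # a linear ramp that saturates after `saturation` updates. The ramp
    # shape and the saturation point of 20 are assumed example choices.
    w_personal = min(update_count, saturation) / saturation
    return w_personal, 1.0 - w_personal

def reference_curve(personalized, standard, update_count):
    # Step S4022: weighted summation of the personalized and standard
    # frequency response curves using the weights from step S4021.
    w_p, w_s = curve_weights(update_count)
    return w_p * np.asarray(personalized) + w_s * np.asarray(standard)
```

With zero updates the reference curve equals the standard curve; after enough calls it is dominated by the personalized curve, which matches the behaviour the embodiment describes.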
And step S4022, carrying out weighted summation on the personalized frequency response curve and the standard frequency response curve according to the weight to obtain the reference frequency response curve.
After the weights of the personalized frequency response curve and the standard frequency response curve are determined, the two curves are weighted and summed according to those weights to obtain the reference frequency response curve. Because the weights are determined by the number of historical updates, and a larger number of updates gives the personalized frequency response curve a higher weight, the weight of the personalized curve grows as the user makes more calls. The calculated reference frequency response curve therefore represents the user's individual speaking characteristics more and more accurately, and the finally corrected voice signal has a higher degree of restoration.
In a possible embodiment, the step S50 includes steps S501 to S502:
in step S501, it is detected whether the voice signal of the user collected during the call is a voice signal in a healthy state.
During a call, it can first be detected whether the collected user voice signal is a voice signal in a healthy state. There are various detection methods, which are not limited in this embodiment. For example, in one possible implementation, the spectrum obtained by converting the user voice signal may be compared with the second voice spectrum to check whether any frequency band differs between the two by more than a preset threshold; if no such band exists, the user is determined to be in a healthy state.
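The spectrum-comparison check described above can be sketched as follows. The 6 dB threshold is an assumed example value, not taken from the patent, which leaves the threshold as a preset parameter.

```python
import numpy as np

def is_healthy_voice(current_spectrum, healthy_spectrum, threshold_db=6.0):
    # Compare the spectrum converted from the captured voice signal
    # against the second (healthy-state) voice spectrum band by band.
    # If no band deviates by more than the preset threshold, the user
    # is judged to be in a healthy state. The 6 dB default is an
    # assumed illustration value.
    diff = np.abs(np.asarray(current_spectrum, dtype=float)
                  - np.asarray(healthy_spectrum, dtype=float))
    return bool(np.all(diff <= threshold_db))
```

Only signals that pass this check feed the update in step S502, which keeps sick-voice recordings from contaminating the personalized curve.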
Step S502, if yes, converting the collected voice signals of the user to obtain a second frequency response curve.
If the user voice signal collected during the call is determined to be a voice signal in a healthy state, it can be converted to obtain the second frequency response curve, which is then used to update the personalized frequency response curve. In this way the personalized frequency response curve represents the user's personal pronunciation characteristics in a healthy state more accurately, and the finally corrected voice signal has higher clarity and a higher degree of restoration.
In addition, an embodiment of the present invention further provides a device for correcting voice, referring to fig. 2, where the device includes:
a first obtaining module 10, configured to obtain a target voice signal collected by a microphone, convert the target voice signal to obtain a first voice spectrum, and convert the target voice signal to obtain a first frequency response curve;
the second obtaining module 20 is configured to obtain a second voice spectrum, where the second voice spectrum is a spectrum obtained by converting a voice signal collected when the user is in a healthy state;
a comparison module 30, configured to compare the first voice spectrum with the second voice spectrum, and determine a target frequency band in the first voice spectrum, where a difference between the target frequency band and the second voice spectrum is greater than a preset threshold;
and the correction module 40 is configured to obtain a reference frequency response curve, calculate, from the first frequency response curve and the reference frequency response curve, a gain value that compensates the curve value of each frequency point of the first frequency response curve within the target frequency band up to the level of the reference frequency response curve, and correct the target voice signal with the gain value before outputting it.
In a possible embodiment, the first obtaining module 10 is further configured to:
preprocessing the target voice signal, converting the preprocessed target voice signal to obtain a first voice frequency spectrum, and converting the preprocessed target voice signal to obtain a first frequency response curve, wherein the preprocessing comprises noise reduction processing and/or removing signals of a preset frequency band.
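The "removing signals of a preset frequency band" preprocessing can be sketched with a simple FFT-bin-zeroing approach. This is an illustrative assumption only; a production implementation would use a proper digital filter, and the patent does not specify the technique.

```python
import numpy as np

def remove_band(signal, sample_rate, low_hz, high_hz):
    # Remove a preset frequency band from the target voice signal by
    # zeroing the corresponding FFT bins and inverting the transform.
    # FFT zeroing is a minimal sketch, not the patent's stated method.
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    spectrum[(freqs >= low_hz) & (freqs <= high_hz)] = 0.0
    return np.fft.irfft(spectrum, n=len(signal))

sr = 8000
t = np.arange(sr) / sr
# 100 Hz component (kept) plus 3000 Hz component (inside removed band)
sig = np.sin(2 * np.pi * 100 * t) + np.sin(2 * np.pi * 3000 * t)
cleaned = remove_band(sig, sr, 2000.0, 4000.0)
```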
In a possible embodiment, the correction module 40 is further configured to:
acquiring a personalized frequency response curve and a standard frequency response curve, wherein the personalized frequency response curve is a frequency response curve obtained by converting user voice signals collected in historical calls, and the standard frequency response curve is a frequency response curve obtained from tests on multiple people;
and carrying out weighted summation on the personalized frequency response curve and the standard frequency response curve to obtain the reference frequency response curve.
In a possible embodiment, the voice correction device further includes:
the conversion module is used for converting the collected voice signals of the user in the conversation process to obtain a second frequency response curve;
and the updating module is configured to perform weighted summation of the second frequency response curve obtained during the current call and the personalized frequency response curve in effect during that call, so as to update the personalized frequency response curve, wherein the personalized frequency response curve for the first call is the standard frequency response curve.
In a possible embodiment, the correction module 40 is further configured to:
determining weights corresponding to the personalized frequency response curve and the standard frequency response curve according to the number of historical updates of the personalized frequency response curve, wherein the larger the number of historical updates, the higher the weight corresponding to the personalized frequency response curve and the lower the weight corresponding to the standard frequency response curve;
and carrying out weighted summation on the personalized frequency response curve and the standard frequency response curve according to the weight to obtain the reference frequency response curve.
In a possible embodiment, the conversion module is further configured to:
detecting whether a voice signal of a user collected in a conversation process is a voice signal in a health state or not;
if yes, the collected voice signals of the user are converted to obtain a second frequency response curve.
In a possible embodiment, the second obtaining module 20 is further configured to:
acquiring target identity information of a user corresponding to the target voice signal;
and matching the frequency spectrum corresponding to the target identity information from the frequency spectrums corresponding to the preset various identity information as a second voice frequency spectrum.
In addition, an embodiment of the present invention further provides a voice correction device, as shown in fig. 3, which is a schematic structural diagram of the hardware operating environment involved in the embodiment of the present invention. It should be noted that the voice correction device according to the embodiment of the present invention may be a device such as an earphone, a smart phone, a personal computer, or a server, which is not limited herein.
As shown in fig. 3, the voice correction device may include: a processor 1001 (such as a CPU), a network interface 1004, a user interface 1003, a memory 1005, and a communication bus 1002, wherein the communication bus 1002 is used to enable communication between these components. The user interface 1003 may include a display and an input unit such as a keyboard, and optionally may further include a standard wired interface and a wireless interface. The network interface 1004 may optionally include a standard wired interface and a wireless interface (e.g., a WI-FI interface). The memory 1005 may be a high-speed RAM memory or a non-volatile memory, such as a disk memory. The memory 1005 may also optionally be a storage device separate from the processor 1001.
It will be appreciated by those skilled in the art that the device structure shown in fig. 3 does not limit the voice correction device, which may include more or fewer components than shown, combine certain components, or arrange the components differently.
As shown in fig. 3, the memory 1005, which is a computer storage medium, may include an operating system, a network communication module, a user interface module, and a voice correction program. The operating system is a program that manages and controls the hardware and software resources of the device, supporting the execution of the voice correction program and other software. In the device shown in fig. 3, the user interface 1003 is mainly used for data communication with the client; the network interface 1004 is mainly used for establishing a communication connection with a server; and the processor 1001 may be configured to call the voice correction program stored in the memory 1005 and perform the following operations:
acquiring a target voice signal acquired by a microphone, converting the target voice signal to obtain a first voice frequency spectrum, and converting the target voice signal to obtain a first frequency response curve;
acquiring a second voice frequency spectrum, wherein the second voice frequency spectrum is a frequency spectrum obtained by converting a voice signal acquired when a user is in a health state;
comparing the first voice frequency spectrum with the second voice frequency spectrum, and determining a target frequency band in the first voice frequency spectrum, wherein the difference between the target frequency band and the second voice frequency spectrum is larger than a preset threshold value;
and acquiring a reference frequency response curve, calculating according to the first frequency response curve and the reference frequency response curve to obtain a gain value which compensates the curve value of each frequency point of the first frequency response curve in the target frequency band to the level of the reference frequency response curve, and correcting the target voice signal by adopting the gain value and then outputting the corrected target voice signal.
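The gain-and-correction step described above can be sketched as follows. The sketch assumes that the frequency response curves are expressed in dB and the voice spectrum in linear magnitude; the patent does not fix the units, so this convention is an assumption.

```python
import numpy as np

def correct_target_band(first_curve, reference_curve, band_mask, spectrum):
    # For each frequency point inside the target band, compute the gain
    # that compensates the first frequency response curve up to the
    # level of the reference frequency response curve, then apply it to
    # the voice spectrum. Points outside the target band get 0 dB gain.
    first_curve = np.asarray(first_curve, dtype=float)
    reference_curve = np.asarray(reference_curve, dtype=float)
    gain_db = np.where(band_mask, reference_curve - first_curve, 0.0)
    return np.asarray(spectrum, dtype=float) * 10.0 ** (gain_db / 20.0)

# Hypothetical two-point example: only the second frequency point lies
# in the target band and sits 6 dB below the reference level.
out = correct_target_band([-10.0, -20.0], [-10.0, -14.0],
                          [False, True], [1.0, 1.0])
```

Restricting the gain to the target band (the bands where the first voice spectrum deviated from the healthy-state spectrum by more than the threshold) means only the degraded portions of the voice are boosted, while the rest of the signal is passed through unchanged.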
In a possible implementation manner, the converting the target voice signal to obtain a first voice spectrum, and the converting the target voice signal to obtain a first frequency response curve include:
preprocessing the target voice signal, converting the preprocessed target voice signal to obtain a first voice frequency spectrum, and converting the preprocessed target voice signal to obtain a first frequency response curve, wherein the preprocessing comprises noise reduction processing and/or removing signals of a preset frequency band.
In a possible implementation manner, the operation of obtaining the reference frequency response curve includes:
acquiring a personalized frequency response curve and a standard frequency response curve, wherein the personalized frequency response curve is a frequency response curve obtained by converting user voice signals collected in historical calls, and the standard frequency response curve is a frequency response curve obtained from tests on multiple people;
and carrying out weighted summation on the personalized frequency response curve and the standard frequency response curve to obtain the reference frequency response curve.
In a possible implementation, the processor 1001 may be further configured to invoke a voice modification program stored in the memory 1005 to perform the following operations:
converting the collected voice signals of the user in the conversation process to obtain a second frequency response curve;
and carrying out weighted summation of the second frequency response curve obtained during the current call and the personalized frequency response curve in effect during that call, so as to update the personalized frequency response curve, wherein the personalized frequency response curve for the first call is the standard frequency response curve.
In a possible implementation manner, the operation of performing weighted summation on the personalized frequency response curve and the standard frequency response curve to obtain the reference frequency response curve includes:
determining weights corresponding to the personalized frequency response curve and the standard frequency response curve according to the number of historical updates of the personalized frequency response curve, wherein the larger the number of historical updates, the higher the weight corresponding to the personalized frequency response curve and the lower the weight corresponding to the standard frequency response curve;
and carrying out weighted summation on the personalized frequency response curve and the standard frequency response curve according to the weight to obtain the reference frequency response curve.
In a possible implementation manner, the converting the collected voice signal of the user to obtain the second frequency response curve during the call includes:
detecting whether a voice signal of a user collected in a conversation process is a voice signal in a health state or not;
if yes, the collected voice signals of the user are converted to obtain a second frequency response curve.
In a possible implementation manner, the operation of obtaining the second voice spectrum includes:
acquiring target identity information of a user corresponding to the target voice signal;
and matching the frequency spectrum corresponding to the target identity information from the frequency spectrums corresponding to the preset various identity information as a second voice frequency spectrum.
In addition, an embodiment of the present invention further provides a computer-readable storage medium, wherein the storage medium stores a voice correction program, and the voice correction program, when executed by a processor, implements the steps of the voice correction method described above.
For embodiments of the voice correction apparatus and the computer-readable storage medium of the present invention, reference may be made to the embodiments of the voice correction method of the present invention, which are not repeated here.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
The foregoing embodiment numbers of the present invention are merely for the purpose of description, and do not represent the advantages or disadvantages of the embodiments.
From the above description of the embodiments, it will be clear to those skilled in the art that the method of the above embodiments may be implemented by means of software plus a necessary general-purpose hardware platform, or of course by hardware, although in many cases the former is the preferred implementation. Based on this understanding, the technical solution of the present invention, in essence or in the part contributing to the prior art, may be embodied in the form of a software product stored in a storage medium (e.g. ROM/RAM, magnetic disk, optical disk) comprising instructions for causing a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, a network device, etc.) to perform the method according to the embodiments of the present invention.
The foregoing description is only of the preferred embodiments of the present invention, and is not intended to limit the scope of the invention, but rather is intended to cover any equivalents of the structures or equivalent processes disclosed herein or in the alternative, which may be employed directly or indirectly in other related arts.

Claims (10)

1. A method for modifying speech, the method comprising the steps of:
acquiring a target voice signal acquired by a microphone, converting the target voice signal to obtain a first voice frequency spectrum, and converting the target voice signal to obtain a first frequency response curve;
acquiring a second voice frequency spectrum, wherein the second voice frequency spectrum is a frequency spectrum obtained by converting a voice signal acquired when a user is in a health state;
comparing the first voice frequency spectrum with the second voice frequency spectrum, and determining a target frequency band in the first voice frequency spectrum, wherein the difference between the target frequency band and the second voice frequency spectrum is larger than a preset threshold value;
and acquiring a reference frequency response curve, calculating according to the first frequency response curve and the reference frequency response curve to obtain a gain value which compensates the curve value of each frequency point of the first frequency response curve in the target frequency band to the level of the reference frequency response curve, and correcting the target voice signal by adopting the gain value and then outputting the corrected target voice signal.
2. The method of claim 1, wherein the step of converting the target speech signal to a first speech spectrum and converting the target speech signal to a first frequency response curve comprises:
preprocessing the target voice signal, converting the preprocessed target voice signal to obtain a first voice frequency spectrum, and converting the preprocessed target voice signal to obtain a first frequency response curve, wherein the preprocessing comprises noise reduction processing and/or removing signals of a preset frequency band.
3. The method of claim 1, wherein the step of obtaining a reference frequency response curve comprises:
acquiring a personalized frequency response curve and a standard frequency response curve, wherein the personalized frequency response curve is a frequency response curve obtained by converting user voice signals collected in historical calls, and the standard frequency response curve is a frequency response curve obtained from tests on multiple people;
and carrying out weighted summation on the personalized frequency response curve and the standard frequency response curve to obtain the reference frequency response curve.
4. The voice correction method as claimed in claim 3, wherein the voice correction method further comprises:
converting the collected voice signals of the user in the conversation process to obtain a second frequency response curve;
and carrying out weighted summation of the second frequency response curve obtained during the current call and the personalized frequency response curve in effect during that call so as to update the personalized frequency response curve, wherein the personalized frequency response curve for the first call is the standard frequency response curve.
5. The method of claim 3 wherein said step of weighting and summing said personalized frequency response curve and said standard frequency response curve to obtain said reference frequency response curve comprises:
determining weights corresponding to the personalized frequency response curve and the standard frequency response curve according to the number of historical updates of the personalized frequency response curve, wherein the larger the number of historical updates, the higher the weight corresponding to the personalized frequency response curve and the lower the weight corresponding to the standard frequency response curve;
and carrying out weighted summation on the personalized frequency response curve and the standard frequency response curve according to the weight to obtain the reference frequency response curve.
6. The voice modification method of claim 4, wherein the step of converting the collected voice signal of the user during the call to obtain the second frequency response curve comprises:
detecting whether a voice signal of a user collected in a conversation process is a voice signal in a health state or not;
if yes, the collected voice signals of the user are converted to obtain a second frequency response curve.
7. The voice correction method according to any one of claims 1 to 6, characterized in that the step of acquiring the second voice spectrum includes:
acquiring target identity information of a user corresponding to the target voice signal;
and matching the frequency spectrum corresponding to the target identity information from the frequency spectrums corresponding to the preset various identity information as a second voice frequency spectrum.
8. A voice modification apparatus, the apparatus comprising:
the first acquisition module is used for acquiring a target voice signal acquired by a microphone, converting the target voice signal into a first voice frequency spectrum and converting the target voice signal into a first frequency response curve;
the second acquisition module is used for acquiring a second voice frequency spectrum, wherein the second voice frequency spectrum is a frequency spectrum obtained by converting a voice signal acquired when a user is in a health state;
the comparison module is used for comparing the first voice frequency spectrum with the second voice frequency spectrum and determining a target frequency band, in which the difference between the first voice frequency spectrum and the second voice frequency spectrum is larger than a preset threshold value, in the first voice frequency spectrum;
the correction module is used for obtaining a reference frequency response curve, calculating, according to the first frequency response curve and the reference frequency response curve, a gain value that compensates the curve value of each frequency point of the first frequency response curve in the target frequency band to the level of the reference frequency response curve, correcting the target voice signal by adopting the gain value, and outputting the corrected target voice signal.
9. A voice correction apparatus, characterized in that the voice correction apparatus comprises: memory, a processor and a speech modification program stored on the memory and executable on the processor, which when executed by the processor implements the steps of the speech modification method according to any one of claims 1 to 7.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium has stored thereon a speech modification program which, when executed by a processor, implements the steps of the speech modification method according to any one of claims 1 to 7.
CN202311626468.7A 2023-11-29 2023-11-29 Voice correction method, device, equipment and computer readable storage medium Pending CN117457013A (en)

Publications (1)

Publication Number Publication Date
CN117457013A true CN117457013A (en) 2024-01-26

Family

ID=89596942


Similar Documents

Publication Publication Date Title
JP5834449B2 (en) Utterance state detection device, utterance state detection program, and utterance state detection method
US7627470B2 (en) Speaking period detection device, voice recognition processing device, transmission system, signal level control device and speaking period detection method
JP5664480B2 (en) Abnormal state detection device, telephone, abnormal state detection method, and program
JP4836720B2 (en) Noise suppressor
US20090018826A1 (en) Methods, Systems and Devices for Speech Transduction
US20130246059A1 (en) System and method for producing an audio signal
JP2017148431A (en) Cognitive function evaluation system, cognitive function evaluation method, and program
JP5803125B2 (en) Suppression state detection device and program by voice
JP6098149B2 (en) Audio processing apparatus, audio processing method, and audio processing program
JP6695057B2 (en) Cognitive function evaluation device, cognitive function evaluation method, and program
JP6268916B2 (en) Abnormal conversation detection apparatus, abnormal conversation detection method, and abnormal conversation detection computer program
US9078071B2 (en) Mobile electronic device and control method
CN111653281A (en) Method for individualized signal processing of an audio signal of a hearing aid
JP2003319497A (en) Test center system, terminal device, audition compensation method, audition compensation method program recording medium, and program for audition compensation method
JP6468258B2 (en) Voice dialogue apparatus and voice dialogue method
CN117457013A (en) Voice correction method, device, equipment and computer readable storage medium
JP2004279768A (en) Device and method for estimating air-conducted sound
JP4785563B2 (en) Audio processing apparatus and audio processing method
JP6233867B2 (en) Dictionary registration system for speech recognition, speech recognition system, speech recognition service system, method and program
JP6197367B2 (en) Communication device and masking sound generation program
CN113411715B (en) Prompting method for speaking sound volume, earphone and readable storage medium
CN110795996B (en) Method, device, equipment and storage medium for classifying heart sound signals
JP2002258899A (en) Method and device for suppressing noise
JP2014106247A (en) Signal processing device, signal processing method, and signal processing program
WO2020039597A1 (en) Signal processing device, voice communication terminal, signal processing method, and signal processing program

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination