CN110176242A - Timbre recognition method and apparatus, computer device, and storage medium - Google Patents

Timbre recognition method and apparatus, computer device, and storage medium

Publication number
CN110176242A
Authority
CN
China
Prior art keywords
frequency
signal
energy
frequency point
range
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910621995.6A
Other languages
Chinese (zh)
Inventor
沈俊聪
李泽隆
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Li Zhi Network Technology Co Ltd
Original Assignee
Guangzhou Li Zhi Network Technology Co Ltd

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/02: Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis, using spectral analysis, e.g. transform vocoders or subband vocoders
    • G10L25/18: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00, characterised by the type of extracted parameters, the extracted parameters being spectral information of each sub-band
    • G10L25/51: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00, specially adapted for particular use, for comparison or discrimination
    • G10L25/78: Detection of presence or absence of voice signals

Abstract

Embodiments of the present invention provide a timbre recognition method and apparatus, a computer device, and a storage medium. The method comprises: determining a voice signal; converting the voice signal into a spectrum signal; calculating the energy of the frequency points in the spectrum signal; identifying the frequency of a pitch signal according to the energy of the frequency points; and determining the timbre of the voice signal according to the frequency of the pitch signal. Because the pitch is consistent with or matches the vibration frequency of the vocal cords, recognizing timbre from the pitch ensures the accuracy of the timbre; in addition, pitch detection is relatively simple, which reduces the amount of computation and improves processing efficiency.

Description

Timbre recognition method and apparatus, computer device, and storage medium
Technical field
Embodiments of the present invention relate to audio processing technology, and in particular to a timbre recognition method and apparatus, a computer device, and a storage medium.
Background
Timbre is a perceptual attribute that lets a listener judge that two sounds with the same loudness and pitch are different.
At present, timbre recognition for voice signals usually involves labeling the timbre of voice signals and extracting features of those signals, such as MFCCs (Mel Frequency Cepstral Coefficients). These serve as training samples for a machine learning model, such as an SVM (Support Vector Machine), and the trained model is then used to recognize the timbre of other voice signals.
However, this approach is complicated to operate, computationally expensive, and inefficient.
Summary of the invention
Embodiments of the present invention provide a timbre recognition method and apparatus, a computer device, and a storage medium, to solve the problem that recognizing timbre by extracting features of a voice signal and training a machine learning model is complicated to operate, computationally expensive, and inefficient.
In a first aspect, an embodiment of the present invention provides a timbre recognition method, comprising:
Determining a voice signal;
Converting the voice signal into a spectrum signal;
Calculating the energy of the frequency points in the spectrum signal;
Identifying the frequency of a pitch signal according to the energy of the frequency points;
Determining the timbre of the voice signal according to the frequency of the pitch signal.
Optionally, determining the voice signal comprises:
Receiving an audio file;
Splitting the audio file into multiple frames of audio signal;
Applying a window function to the audio signal;
Performing voice activity detection on the audio signal to identify the voice signal.
Optionally, converting the voice signal into a spectrum signal comprises:
Performing a Fourier transform on the voice signal to obtain the spectrum signal, wherein each frequency point in the spectrum signal is represented as a complex number;
Calculating the energy of the frequency points in the spectrum signal comprises:
Extracting the real part and the imaginary part of the complex number;
Calculating the sum of the square of the real part and the square of the imaginary part;
Taking the square root of the sum to obtain the energy of the frequency point.
Optionally, identifying the frequency of the pitch signal according to the energy of the frequency points comprises:
Searching for a frequency point whose energy meets a preset pitch energy condition, as the pitch frequency point of the pitch signal;
Converting the pitch frequency point of the pitch signal into the frequency of the pitch signal.
Optionally, the pitch energy condition includes at least one of the following:
The energy of the next frequency point is greater than a preset energy threshold;
The energy of adjacent frequency points shows a rising trend;
The frequency converted from the frequency point to which the energy belongs falls within a preset voice frequency range.
Optionally, searching for a frequency point whose energy meets the preset pitch energy condition, as the pitch frequency point, comprises:
Searching for frequency points whose energy meets the preset pitch energy condition, as candidate frequency points;
Calculating the average of the candidate frequency points, as the pitch frequency point.
Optionally, converting the pitch frequency point into the frequency of the pitch signal comprises:
Determining the sampling frequency and the number of frequency points;
Calculating the ratio of the sampling frequency to the number, as a candidate frequency;
Multiplying the pitch frequency point by the candidate frequency to obtain the frequency of the pitch signal.
Optionally, determining the timbre of the voice signal according to the frequency of the pitch signal comprises:
Determining the frequency range to which the frequency of the pitch signal belongs;
Determining the timbre of the voice signal based on the frequency range.
Optionally, determining the timbre of the voice signal based on the frequency range comprises:
If the frequency range is a first range, determining that the timbre of the voice signal is the loli voice;
If the frequency range is a second range, determining that the timbre of the voice signal is the maiden voice, wherein the first range is greater than the second range;
If the frequency range is a third range, determining that the timbre of the voice signal is the mature-lady (yujie) voice, wherein the second range is greater than the third range;
If the frequency range is a fourth range, determining that the timbre of the voice signal is the queen voice, wherein the third range is greater than the fourth range;
If the frequency range is a fifth range, determining that the timbre of the voice signal is the teen-boy voice, wherein the fourth range is greater than the fifth range;
If the frequency range is a sixth range, determining that the timbre of the voice signal is the young-boy (shota) voice, wherein the fifth range is greater than the sixth range;
If the frequency range is a seventh range, determining that the timbre of the voice signal is the young-man voice, wherein the sixth range is greater than the seventh range;
If the frequency range is an eighth range, determining that the timbre of the voice signal is the uncle voice, wherein the seventh range is greater than the eighth range.
In a second aspect, an embodiment of the present invention further provides a timbre recognition apparatus, comprising:
A voice signal determining module, configured to determine a voice signal;
A spectrum signal conversion module, configured to convert the voice signal into a spectrum signal;
An energy calculation module, configured to calculate the energy of the frequency points in the spectrum signal;
A frequency identification module, configured to identify the frequency of a pitch signal according to the energy of the frequency points;
A timbre determining module, configured to determine the timbre of the voice signal according to the frequency of the pitch signal.
Optionally, the voice signal determining module includes:
An audio file receiving submodule, configured to receive an audio file;
An audio signal splitting submodule, configured to split the audio file into multiple frames of audio signal;
An audio signal windowing submodule, configured to apply a window function to the audio signal;
A voice activity detection submodule, configured to perform voice activity detection on the audio signal to identify the voice signal.
Optionally, the spectrum signal conversion module includes:
A Fourier transform submodule, configured to perform a Fourier transform on the voice signal to obtain the spectrum signal, wherein each frequency point in the spectrum signal is represented as a complex number;
The energy calculation module includes:
A complex number extraction submodule, configured to extract the real part and the imaginary part of the complex number;
A sum calculation submodule, configured to calculate the sum of the square of the real part and the square of the imaginary part;
A square root submodule, configured to take the square root of the sum to obtain the energy of the frequency point.
Optionally, the frequency identification module includes:
A pitch frequency point search submodule, configured to search for a frequency point whose energy meets a preset pitch energy condition, as the pitch frequency point of the pitch signal;
A pitch frequency point conversion submodule, configured to convert the pitch frequency point of the pitch signal into the frequency of the pitch signal.
Optionally, the pitch energy condition includes at least one of the following:
The energy of the next frequency point is greater than a preset energy threshold;
The energy of adjacent frequency points shows a rising trend;
The frequency converted from the frequency point to which the energy belongs falls within a preset voice frequency range.
Optionally, the pitch frequency point search submodule includes:
A candidate frequency point search unit, configured to search for frequency points whose energy meets the preset pitch energy condition, as candidate frequency points;
An average calculation unit, configured to calculate the average of the candidate frequency points, as the pitch frequency point.
Optionally, the pitch frequency point conversion submodule includes:
A parameter determination unit, configured to determine the sampling frequency and the number of frequency points;
A ratio calculation unit, configured to calculate the ratio of the sampling frequency to the number, as a candidate frequency;
A frequency obtaining unit, configured to multiply the pitch frequency point by the candidate frequency to obtain the frequency of the pitch signal.
Optionally, the timbre determining module includes:
A frequency range attribution submodule, configured to determine the frequency range to which the frequency of the pitch signal belongs;
A frequency range determination submodule, configured to determine the timbre of the voice signal based on the frequency range.
Optionally, the frequency range determination submodule includes:
A first range determination unit, configured to determine that the timbre of the voice signal is the loli voice if the frequency range is a first range;
A second range determination unit, configured to determine that the timbre of the voice signal is the maiden voice if the frequency range is a second range, wherein the first range is greater than the second range;
A third range determination unit, configured to determine that the timbre of the voice signal is the mature-lady (yujie) voice if the frequency range is a third range, wherein the second range is greater than the third range;
A fourth range determination unit, configured to determine that the timbre of the voice signal is the queen voice if the frequency range is a fourth range, wherein the third range is greater than the fourth range;
A fifth range determination unit, configured to determine that the timbre of the voice signal is the teen-boy voice if the frequency range is a fifth range, wherein the fourth range is greater than the fifth range;
A sixth range determination unit, configured to determine that the timbre of the voice signal is the young-boy (shota) voice if the frequency range is a sixth range, wherein the fifth range is greater than the sixth range;
A seventh range determination unit, configured to determine that the timbre of the voice signal is the young-man voice if the frequency range is a seventh range, wherein the sixth range is greater than the seventh range;
An eighth range determination unit, configured to determine that the timbre of the voice signal is the uncle voice if the frequency range is an eighth range, wherein the seventh range is greater than the eighth range.
In a third aspect, an embodiment of the present invention further provides a computer device, comprising a processor, a memory, and a computer program stored in the memory and executable on the processor, wherein the computer program, when executed by the processor, implements the steps of the timbre recognition method according to any implementation of the first aspect.
In a fourth aspect, an embodiment of the present invention further provides a computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the steps of the timbre recognition method according to any implementation of the first aspect.
In embodiments of the present invention, a voice signal is determined and converted into a spectrum signal, the energy of the frequency points in the spectrum signal is calculated, the frequency of the pitch signal is identified according to the energy of the frequency points, and the timbre of the voice signal is determined according to the frequency of the pitch signal. Because the pitch is consistent with or matches the vibration frequency of the vocal cords, recognizing timbre from the pitch ensures the accuracy of the timbre; in addition, pitch detection is relatively simple, which reduces the amount of computation and improves processing efficiency.
Brief description of the drawings
Fig. 1 is a flowchart of a timbre recognition method provided by Embodiment 1 of the present invention;
Fig. 2 is a schematic structural diagram of a timbre recognition apparatus provided by Embodiment 2 of the present invention;
Fig. 3 is a schematic structural diagram of a computer device provided by an embodiment of the present invention;
Fig. 4 is a schematic structural diagram of another computer device provided by an embodiment of the present invention.
Detailed description of the embodiments
The present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are intended only to explain the present invention, not to limit it. It should also be noted that, for ease of description, the drawings show only the parts related to the present invention rather than the entire structure.
Embodiment 1
Fig. 1 is a flowchart of a timbre recognition method provided by Embodiment 1 of the present invention. This embodiment is applicable to recognizing timbre from the pitch. More specifically, when a sounding body produces sound by vibrating, the sound can generally be decomposed into many simple sine waves; that is, all natural sounds are essentially composed of many sine waves of different frequencies, of which the sine wave with the lowest frequency is the fundamental tone (pitch) and the sine waves with higher frequencies are overtones.
The method may be executed by a timbre recognition apparatus, which may be implemented in software and/or hardware and configured in a computer device, for example a server, a workstation, a mobile terminal (such as a mobile phone, a tablet computer, or a personal digital assistant), or a smart wearable device (such as a smartwatch or smart glasses).
As shown in Fig. 1, the method specifically includes the following steps:
Step 101: determine a voice signal.
Sound is a wave. Sound audible to the human ear has a vibration frequency between 20 Hz and 20 kHz. Voice is one kind of sound: it is produced by the human vocal organs and carries a certain grammar and meaning, and its vibration frequency reaches up to about 15 kHz.
A voice signal may refer to a digital signal that carries voice.
In a specific implementation, an audio file may be received, such as voice data or a short video recorded by a user.
The audio file is divided into frames, that is, split into multiple frames of audio signal; each frame may be about 50 ms long.
The voice signal is subsequently converted into a spectrum signal. That spectrum is an analysis spectrum, which is an approximation of the actual spectrum. If the sampling is improper, the signal energy at a given frequency spreads to adjacent frequency points, which is the phenomenon of spectral leakage.
To reduce spectral leakage, a window function may be applied to the audio signal, for example a triangular window, a Hanning window, a Hamming window, or a Gaussian window.
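For reference, two of the windows named above have simple closed forms. The definitions below are the standard textbook ones for a frame of $N$ samples indexed $n = 0, \dots, N-1$; the source does not state which variant or which parameters it uses, so this is only an illustration.

$$w_{\text{Hann}}(n) = 0.5\left(1 - \cos\frac{2\pi n}{N-1}\right), \qquad w_{\text{Hamming}}(n) = 0.54 - 0.46\cos\frac{2\pi n}{N-1}$$

Each sample of the frame is multiplied by the corresponding window value, which tapers the frame toward its edges before the Fourier transform.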
Voice activity detection (VAD) is performed on the audio signal to identify the voice signal.
If a frame of the audio signal is a voice signal, pitch detection is performed on that frame; if a frame is not a voice signal, the frame is skipped.
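The sub-steps of step 101 (framing, windowing, and voice activity detection) can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function names, the non-overlapping framing, the choice of a Hann window, and the RMS threshold used as a stand-in for VAD are all assumptions, since the source does not specify a particular VAD algorithm.

```python
import math

def split_into_frames(samples, sample_rate, frame_ms=50):
    """Cut a list of samples into consecutive, non-overlapping frames (assumed layout)."""
    frame_len = int(sample_rate * frame_ms / 1000)
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, frame_len)]

def hann_window(n):
    """Hann window coefficients for a frame of n samples (n >= 2)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]

def apply_window(frame):
    """Multiply each sample by the matching window coefficient."""
    return [s * w for s, w in zip(frame, hann_window(len(frame)))]

def is_speech(frame, rms_threshold=0.01):
    """Hypothetical VAD stand-in: treat a frame as voice if its RMS exceeds a threshold."""
    rms = math.sqrt(sum(s * s for s in frame) / len(frame))
    return rms > rms_threshold

def determine_speech_frames(samples, sample_rate):
    """Frame the audio, window each frame, and keep only frames flagged as voice."""
    frames = split_into_frames(samples, sample_rate)
    windowed = [apply_window(f) for f in frames]
    return [f for f in windowed if is_speech(f)]
```

A production system would normally replace `is_speech` with a dedicated VAD; the threshold here only makes the control flow (keep voiced frames, skip the rest) concrete.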
Step 102: convert the voice signal into a spectrum signal.
In embodiments of the present invention, the audio signal samples represented in the time domain are converted into a spectrum signal represented in the frequency domain.
In a specific implementation, a Fourier transform may be performed on the voice signal to obtain the spectrum signal, in which each frequency point is represented as a complex number.
The Fourier transform may be, for example, the FT (Fourier Transform) or the FFT (Fast Fourier Transform).
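To make the "one complex number per frequency point" idea concrete, here is a small sketch of a discrete Fourier transform written directly from its definition. It is an illustration only: the function name `dft` is hypothetical, the naive O(n²) loop is chosen for readability rather than speed (an FFT would be used in practice), and indices are 0-based whereas the text counts frequency points from 1.

```python
import cmath
import math

def dft(frame):
    """Naive discrete Fourier transform: returns one complex value per frequency point."""
    n = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

# Usage: a cosine that completes exactly 3 cycles in a 16-sample frame.
frame = [math.cos(2 * math.pi * 3 * t / 16) for t in range(16)]
spectrum = dft(frame)
# The non-negligible values sit at index 3 and its mirror image, index 13.
```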
Step 103: calculate the energy of the frequency points in the spectrum signal.
In a specific implementation, because each frequency point in the spectrum signal is represented as a complex number, the real part and the imaginary part of the complex number can be extracted, the sum of the square of the real part and the square of the imaginary part can be calculated, and the square root of that sum taken to obtain the energy of the frequency point.
More specifically, the energy of a frequency point in the spectrum signal can be calculated by the following formula:
$$E_k = \sqrt{a_k^2 + b_k^2}$$
where $E_k$ is the energy of the k-th frequency point, which is represented as a complex number after the Fourier transform, $a_k$ is the real part of that complex number, and $b_k$ is its imaginary part.
Assuming that each frame of the voice signal has L frequency points, k takes values from 1 to L.
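The calculation described in step 103 (square the real and imaginary parts, add them, take the square root) can be written in a few lines. The function name is an assumption, and the input is taken to be a list of Python complex numbers such as a Fourier transform would produce.

```python
import math

def frequency_point_energy(spectrum):
    """Energy of each frequency point: sqrt(real^2 + imag^2) of its complex value."""
    return [math.sqrt(c.real ** 2 + c.imag ** 2) for c in spectrum]

# Usage with three hand-picked complex values.
energies = frequency_point_energy([3 + 4j, 0j, -5j])
```

Note that this quantity is the magnitude of the complex value; the text refers to it as the energy of the frequency point, and the sketch follows that naming.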
Step 104: identify the frequency of the pitch signal according to the energy of the frequency points.
In a specific implementation, pitch detection is performed on the energy of the frequency points to identify the frequency of the pitch signal.
In a preferred embodiment of the present invention, step 104 may include the following steps:
S11: search for a frequency point whose energy meets a preset pitch energy condition, as the pitch frequency point of the pitch signal.
The energies $E_k$ of the frequency points are stored in an array, and the array is traversed against the pitch energy conditions to obtain the pitch frequency point of the pitch signal.
In a specific implementation, the pitch energy condition includes at least one of the following:
1. The energy of the next frequency point is greater than a preset energy threshold.
For example, $E_{k+1} > 30000$ indicates that the position of a pitch frequency point may begin at this frequency point.
2. The energy of adjacent frequency points shows a rising trend.
For example, the energy increases from one frequency point to the next, which indicates that the energy at this frequency point is rising.
3. The frequency converted from the frequency point to which the energy belongs falls within a preset voice frequency range.
For example, $80\,\text{Hz} < k \cdot f_s / L < 600\,\text{Hz}$, where $f_s$ is the sampling frequency, L is the number of frequency points in the voice signal, 80 Hz is the lower frequency limit of a normal human voice, and 600 Hz is the upper frequency limit of a normal human voice.
If the above pitch energy conditions are met, k can be confirmed as the pitch frequency point of the pitch signal.
Of course, the above pitch energy conditions are only examples. When implementing embodiments of the present invention, other pitch energy conditions may be set according to the actual situation, and those skilled in the art may also adopt other pitch energy conditions according to actual needs; embodiments of the present invention place no limitation on this.
In addition, to further confirm the pitch signal, the pitch energy conditions may be traversed over the entire audio file, which can yield multiple frequency points that meet the conditions. In this case, the frequency points whose energy meets the preset pitch energy conditions are found and saved in an array as candidate frequency points.
The average of the candidate frequency points is calculated and used as the pitch frequency point.
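Step S11 can be sketched as a single pass over the energy array followed by an average. This is a hedged illustration rather than the patent's code: the function and parameter names are hypothetical, all three example conditions are required together (the text only requires at least one), and the "rising trend" test is assumed to mean that the next frequency point has more energy than the current one, because the source does not reproduce its exact inequality.

```python
def find_candidate_points(energy, sample_rate, energy_threshold=30000.0,
                          min_hz=80.0, max_hz=600.0):
    """Return the indices k whose neighbourhood satisfies the three example conditions."""
    n = len(energy)  # number of frequency points, L in the text
    candidates = []
    for k in range(1, n - 1):
        next_above_threshold = energy[k + 1] > energy_threshold
        rising = energy[k + 1] > energy[k]  # assumed form of the rising-trend condition
        in_voice_range = min_hz < k * sample_rate / n < max_hz
        if next_above_threshold and rising and in_voice_range:
            candidates.append(k)
    return candidates

def pitch_frequency_point(candidates):
    """Average the candidate frequency points; None if there are no candidates."""
    if not candidates:
        return None
    return sum(candidates) / len(candidates)
```

The default threshold of 30000 and the 80 Hz to 600 Hz band are the example values given in the text; the threshold in particular depends on the sample scale and frame length, so it would need tuning for other inputs.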
S12: convert the pitch frequency point of the pitch signal into the frequency of the pitch signal.
In a specific implementation, the sampling frequency and the number of frequency points can be determined, the ratio of the sampling frequency to that number calculated as a candidate frequency, and the pitch frequency point multiplied by the candidate frequency to obtain the frequency of the pitch signal.
More specifically, the frequency of the pitch signal can be calculated by the following formula:
$$f = k \cdot f_s / L$$
where $f$ is the frequency of the pitch signal, $k$ is the pitch frequency point (the index value of the frequency point), $f_s$ is the sampling frequency, and $L$ is the number of frequency points.
It should be noted that L is the same for every frame; that is, each frame of the voice signal has the same number of frequency points.
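As a worked example of the conversion in S12, the sketch below applies $f = k \cdot f_s / L$ with illustrative numbers. The function name and the specific values (a 16 kHz sampling frequency and 800 frequency points, which corresponds to a 50 ms frame) are assumptions chosen for the example, not values taken from the source.

```python
def frequency_point_to_hz(k, sample_rate, num_points):
    """Convert a frequency point index k to hertz using f = k * fs / L."""
    candidate_frequency = sample_rate / num_points  # spacing between adjacent frequency points
    return k * candidate_frequency

# Usage: 16000 Hz sampling and 800 frequency points give a 20 Hz spacing,
# so frequency point 10 corresponds to 200 Hz.
f = frequency_point_to_hz(10, 16000, 800)
```

Because the pitch frequency point may be an average of several candidates, `k` does not have to be an integer.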
Step 105: determine the timbre of the voice signal according to the frequency of the pitch signal.
Different pitch signal frequencies correspond to different timbres, so the timbre of the voice signal can be determined from the frequency of the pitch signal.
In a specific implementation, a mapping between frequency ranges and timbres can be preset. The frequency range to which the frequency of the pitch signal belongs is then determined, and the timbre of the voice signal is determined based on that frequency range.
In general, timbres include male timbres (such as the uncle voice, young-man voice, teen-boy voice, and young-boy (shota) voice) and female timbres (such as the loli voice, mature-lady (yujie) voice, maiden voice, and queen voice); the frequency ranges mapped to male timbres are lower than those mapped to female timbres.
In one example, timbres can be divided into the following classes: uncle voice, young-man voice, teen-boy voice, young-boy (shota) voice, loli voice, mature-lady (yujie) voice, maiden voice, and queen voice.
In this example, if the frequency range is the first range, such as f > 400 Hz, the timbre of the voice signal is determined to be the loli voice;
If the frequency range is the second range, such as 320 Hz < f < 400 Hz, the timbre of the voice signal is determined to be the maiden voice, wherein the first range is greater than the second range;
If the frequency range is the third range, such as 250 Hz < f < 320 Hz, the timbre of the voice signal is determined to be the mature-lady (yujie) voice, wherein the second range is greater than the third range;
If the frequency range is the fourth range, such as 180 Hz < f < 250 Hz, the timbre of the voice signal is determined to be the queen voice, wherein the third range is greater than the fourth range;
If the frequency range is the fifth range, such as 150 Hz < f < 180 Hz, the timbre of the voice signal is determined to be the teen-boy voice, wherein the fourth range is greater than the fifth range;
If the frequency range is the sixth range, such as 130 Hz < f < 150 Hz, the timbre of the voice signal is determined to be the young-boy (shota) voice, wherein the fifth range is greater than the sixth range;
If the frequency range is the seventh range, such as 110 Hz < f < 130 Hz, the timbre of the voice signal is determined to be the young-man voice, wherein the sixth range is greater than the seventh range;
If the frequency range is the eighth range, such as 80 Hz < f < 110 Hz, the timbre of the voice signal is determined to be the uncle voice, wherein the seventh range is greater than the eighth range.
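The eight example ranges above amount to a lookup table, which can be sketched as follows. The table and function names are hypothetical, and the label strings are illustrative English renderings of the timbre classes. The sketch keeps the strict inequalities exactly as the example states them, so a frequency that lands exactly on a boundary (for instance 320 Hz) or outside 80 Hz to 400+ Hz matches no range and returns `None`; how such values should be handled is not specified in the source.

```python
# (lower bound, upper bound, label), bounds in hertz and exclusive on both sides.
TIMBRE_RANGES = [
    (400.0, float("inf"), "loli voice"),
    (320.0, 400.0, "maiden voice"),
    (250.0, 320.0, "mature-lady (yujie) voice"),
    (180.0, 250.0, "queen voice"),
    (150.0, 180.0, "teen-boy voice"),
    (130.0, 150.0, "young-boy (shota) voice"),
    (110.0, 130.0, "young-man voice"),
    (80.0, 110.0, "uncle voice"),
]

def classify_timbre(f):
    """Return the timbre label whose range contains f, or None if no range matches."""
    for low, high, label in TIMBRE_RANGES:
        if low < f < high:
            return label
    return None
```

Keeping the ranges in a data table rather than in chained conditionals makes it straightforward to substitute other ranges and labels, which the text explicitly allows.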
It should be noted that everyone's voice has its own timbre attribute. Timbre can indicate age with some probability, but it does not determine age.
Of course, the above frequency ranges and their timbres are only examples. When implementing embodiments of the present invention, other frequency ranges and timbres may be set according to the actual situation, and those skilled in the art may also adopt other frequency ranges and timbres according to actual needs; embodiments of the present invention place no limitation on this.
In embodiments of the present invention, a voice signal is determined and converted into a spectrum signal, the energy of the frequency points in the spectrum signal is calculated, the frequency of the pitch signal is identified according to the energy of the frequency points, and the timbre of the voice signal is determined according to the frequency of the pitch signal. Because the pitch is consistent with or matches the vibration frequency of the vocal cords, recognizing timbre from the pitch ensures the accuracy of the timbre; in addition, pitch detection is relatively simple, which reduces the amount of computation and improves processing efficiency.
It should be noted that, for simplicity of description, the method embodiments are expressed as a series of combined actions. Those skilled in the art should understand, however, that embodiments of the present invention are not limited by the described order of actions, because according to embodiments of the present invention some steps may be performed in another order or simultaneously. Those skilled in the art should also understand that the embodiments described in the specification are all preferred embodiments, and the actions involved are not necessarily required by embodiments of the present invention.
Embodiment 2
Fig. 2 is a schematic structural diagram of a timbre recognition apparatus provided by Embodiment 2 of the present invention. The apparatus may specifically include the following modules:
A voice signal determining module 201, configured to determine a voice signal;
A spectrum signal conversion module 202, configured to convert the voice signal into a spectrum signal;
An energy calculation module 203, configured to calculate the energy of the frequency points in the spectrum signal;
A frequency identification module 204, configured to identify the frequency of a pitch signal according to the energy of the frequency points;
A timbre determining module 205, configured to determine the timbre of the voice signal according to the frequency of the pitch signal.
In one preferred embodiment of the invention, the voice signal determining module 201 includes:
Audio file receiving submodule, for receiving audio file;
Audio signal cutting submodule, for being multiframe audio signal by the audio file cutting;
Audio signal adds submodule, for adding window function to the audio signal;
Voice activity detection submodule, for carrying out voice activity detection to the audio signal, with recognition of speech signals.
In a preferred embodiment of the present invention, the spectrum signal conversion module 202 includes:
Fourier transform submodule, for performing a Fourier transform on the voice signal to obtain the spectrum signal, wherein each frequency point in the spectrum signal is represented by a complex number;
The energy computation module 203 includes:
Complex number extraction submodule, for extracting the real part and the imaginary part of the complex number;
Sum calculation submodule, for calculating the sum of the square of the real part and the square of the imaginary part;
Square root submodule, for taking the square root of the sum to obtain the energy of the frequency point.
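The energy computation described above amounts to taking the magnitude of each complex frequency point, that is, the square root of the sum of the squared real part and the squared imaginary part. A minimal sketch follows; the naive discrete Fourier transform and the function names `dft` and `bin_energy` are illustrative assumptions, and a real system would use a fast Fourier transform instead.

```python
import cmath
import math

def dft(frame):
    """Naive O(N^2) discrete Fourier transform; each frequency point is a complex number."""
    n = len(frame)
    return [sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(frame))
            for k in range(n)]

def bin_energy(spectrum):
    """Energy of each frequency point: sqrt(real^2 + imag^2)."""
    return [math.sqrt(c.real ** 2 + c.imag ** 2) for c in spectrum]
```

For a frame holding exactly three cycles of a cosine over 16 samples, the largest energy in the lower half of the spectrum falls on frequency point 3.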
In a preferred embodiment of the present invention, the frequency identification module 204 includes:
Pitch frequency point search submodule, for searching for a frequency point whose energy meets a preset pitch energy condition, as the pitch frequency point of the pitch signal;
Pitch frequency point conversion submodule, for converting the pitch frequency point of the pitch signal into the frequency of the pitch signal.
In a preferred example of an embodiment of the present invention, the pitch energy condition includes at least one of the following:
The energy of the next frequency point is greater than a preset energy threshold;
The energy of adjacent frequency points shows a rising trend;
The frequency point to which the energy belongs converts to a frequency within a preset human voice frequency range.
In a preferred embodiment of the present invention, the pitch frequency point search submodule includes:
Candidate frequency point search unit, for searching for frequency points whose energy meets the preset pitch energy condition, as candidate frequency points;
Average calculation unit, for calculating the average value of the candidate frequency points, as the pitch frequency point.
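The candidate search and averaging can be sketched as below. This is one possible reading of the pitch energy condition, not a definitive one: here a frequency point qualifies when its own energy exceeds the threshold, its energy is higher than that of the previous frequency point, and its converted frequency lies in the voice range. The text requires only at least one of the conditions, so requiring all three is an assumption, as are the function and parameter names.

```python
def candidate_bins(energy, sample_rate, n_points, threshold, f_lo, f_hi):
    """Return indices of frequency points that satisfy the assumed pitch energy condition."""
    candidates = []
    for k in range(1, n_points // 2):
        freq = k * sample_rate / n_points       # frequency this point converts to
        above = energy[k] > threshold           # energy threshold condition
        rising = energy[k] > energy[k - 1]      # rising trend versus the previous point
        in_voice_range = f_lo <= freq <= f_hi   # preset voice frequency range
        if above and rising and in_voice_range:
            candidates.append(k)
    return candidates

def pitch_bin(candidates):
    """Average the candidate frequency points; None when nothing qualifies."""
    if not candidates:
        return None
    return sum(candidates) / len(candidates)
```

Because the average of several indices need not be an integer, the pitch frequency point returned here may be fractional.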
In a preferred embodiment of the present invention, the pitch frequency point conversion submodule includes:
Parameter determination unit, for determining the sampling frequency and the number of frequency points;
Ratio calculation unit, for calculating the ratio between the sampling frequency and the number, as a candidate frequency;
Frequency obtaining unit, for multiplying the pitch frequency point by the candidate frequency to obtain the frequency of the pitch signal.
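The conversion above is the usual mapping from a frequency point index to hertz: the ratio of the sampling frequency to the number of frequency points gives the spacing between points, and multiplying by the index gives the frequency. A minimal sketch, with the function and parameter names chosen here for illustration:

```python
def bin_to_frequency(pitch_bin, sample_rate, n_points):
    """Convert a (possibly fractional) pitch frequency point index to a frequency in Hz."""
    candidate_frequency = sample_rate / n_points  # ratio of sampling frequency to point count
    return pitch_bin * candidate_frequency
```

For example, with a 16000 Hz sampling frequency and 512 frequency points, the spacing is 31.25 Hz, so frequency point 8 corresponds to 250 Hz.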
In a preferred embodiment of the present invention, the tone color determining module 205 includes:
Frequency range attribution submodule, for determining the frequency range to which the frequency of the pitch signal belongs;
Frequency range determination submodule, for determining the tone color of the voice signal based on the frequency range.
In a preferred example of an embodiment of the present invention, the frequency range determination submodule includes:
First range determination unit, for determining that the tone color of the voice signal is a Loli voice if the frequency range is the first range;
Second range determination unit, for determining that the tone color of the voice signal is a maiden voice if the frequency range is the second range, wherein the first range is greater than the second range;
Third range determination unit, for determining that the tone color of the voice signal is a yujie (mature lady) voice if the frequency range is the third range, wherein the second range is greater than the third range;
Fourth range determination unit, for determining that the tone color of the voice signal is a queen voice if the frequency range is the fourth range, wherein the third range is greater than the fourth range;
Fifth range determination unit, for determining that the tone color of the voice signal is a teenage boy voice if the frequency range is the fifth range, wherein the fourth range is greater than the fifth range;
Sixth range determination unit, for determining that the tone color of the voice signal is a shota (zhengtai) voice if the frequency range is the sixth range, wherein the fifth range is greater than the sixth range;
Seventh range determination unit, for determining that the tone color of the voice signal is a young man voice if the frequency range is the seventh range, wherein the sixth range is greater than the seventh range;
Eighth range determination unit, for determining that the tone color of the voice signal is an uncle voice if the frequency range is the eighth range, wherein the seventh range is greater than the eighth range.
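The eight range determination units form a lookup from pitch frequency to tone color label over ranges ordered from highest to lowest. The sketch below shows only that structure. The boundary values in hertz are hypothetical placeholders invented for illustration and are not taken from this passage, which states only the ordering of the ranges; the names `RANGES` and `classify_tone_color` are likewise assumptions.

```python
# (lower bound in Hz, label), ordered from the first (highest) range to the eighth (lowest).
# All numeric bounds are HYPOTHETICAL placeholders, not values from the patent text.
RANGES = [
    (350.0, "Loli voice"),
    (300.0, "maiden voice"),
    (250.0, "yujie voice"),
    (200.0, "queen voice"),
    (170.0, "teenage boy voice"),
    (140.0, "shota voice"),
    (110.0, "young man voice"),
    (80.0, "uncle voice"),
]

def classify_tone_color(pitch_hz):
    """Return the label of the first range whose lower bound the pitch reaches, else None."""
    for lower_bound, label in RANGES:
        if pitch_hz >= lower_bound:
            return label
    return None
```

Keeping the boundaries in a single ordered table makes the "first range is greater than the second range" ordering explicit and lets the thresholds be tuned without touching the lookup logic.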
Since the device embodiment is basically similar to the method embodiment, it is described relatively briefly; for related details, refer to the corresponding description of the method embodiment.
In the embodiments of the present invention, a voice signal is determined, the voice signal is converted into a spectrum signal, the energy of the frequency points in the spectrum signal is calculated, the frequency of the pitch signal is identified according to the energy of the frequency points, and the tone color of the voice signal is determined according to the frequency of the pitch signal. Because the pitch is consistent with, or matches, the vibration frequency of the vocal cords, identifying the tone color by the pitch helps ensure the accuracy of the tone color. In addition, pitch detection is a relatively simple operation, which reduces the amount of computation and improves processing efficiency.
Fig. 3 is a structural schematic diagram of a computer device provided by an embodiment of the present invention. The computer device 300 includes a server, a workstation, etc., and may vary considerably depending on configuration or performance. It may include one or more central processing units (CPUs) 322 (for example, one or more processors), a memory 332, and one or more storage media 330 (such as one or more mass storage devices) storing application programs 342 or data 344. The memory 332 and the storage medium 330 may provide transient or persistent storage. The program stored in the storage medium 330 may include one or more modules (not marked in the figure), and each module may include a series of instruction operations for the server. Further, the central processing unit 322 may be configured to communicate with the storage medium 330 and to execute, on the server 300, the series of instruction operations in the storage medium 330.
The server 300 may also include one or more power supplies 326, one or more wired or wireless network interfaces 350, one or more input/output interfaces 358, one or more keyboards 356, and/or one or more operating systems 341, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, etc.
Fig. 4 is a structural schematic diagram of another computer device provided by an embodiment of the present invention.
The computer device 400 includes but is not limited to: a radio frequency unit 401, a network module 402, an audio output unit 403, an input unit 404, a sensor 405, a display unit 406, a user input unit 407, an interface unit 408, a memory 409, a processor 410, a power supply 411, and other components. Those skilled in the art will understand that the computer device structure shown in Fig. 4 does not constitute a limitation on the computer device; the computer device may include more or fewer components than illustrated, combine certain components, or use a different component layout. In the embodiments of the present invention, the computer device includes but is not limited to a mobile phone, a tablet computer, a notebook computer, a palmtop computer, a vehicle-mounted terminal, a wearable device, a pedometer, and the like.
It should be understood that, in the embodiments of the present invention, the radio frequency unit 401 may be used to receive and send signals while sending and receiving information or during a call. Specifically, it receives downlink data from a base station and passes it to the processor 410 for processing, and it sends uplink data to the base station. In general, the radio frequency unit 401 includes but is not limited to an antenna, at least one amplifier, a transceiver, a coupler, a low-noise amplifier, a duplexer, and the like. In addition, the radio frequency unit 401 may also communicate with a network and other devices through a wireless communication system.
The computer device provides the user with wireless broadband Internet access through the network module 402, for example helping the user send and receive e-mail, browse web pages, and access streaming media.
The audio output unit 403 may convert audio data received by the radio frequency unit 401 or the network module 402, or stored in the memory 409, into an audio signal and output it as sound. Moreover, the audio output unit 403 may also provide audio output related to a specific function performed by the computer device 400 (for example, a call signal reception sound, a message reception sound, and so on). The audio output unit 403 includes a loudspeaker, a buzzer, a receiver, and the like.
The input unit 404 is used to receive audio or video signals. The input unit 404 may include a graphics processing unit (GPU) 4041 and a microphone 4042. The graphics processor 4041 processes image data of static pictures or video obtained by an image capture device (such as a camera) in video capture mode or image capture mode. The processed image frames may be displayed on the display unit 406. The image frames processed by the graphics processor 4041 may be stored in the memory 409 (or another storage medium) or sent via the radio frequency unit 401 or the network module 402. The microphone 4042 can receive sound and process it into audio data. In telephone call mode, the processed audio data can be converted into a format that can be sent to a mobile communication base station via the radio frequency unit 401 and then output.
The computer device 400 further includes at least one sensor 405, such as an optical sensor, a motion sensor, and other sensors. Specifically, the optical sensor includes an ambient light sensor and a proximity sensor. The ambient light sensor can adjust the brightness of the display panel 4061 according to the brightness of the ambient light, and the proximity sensor can turn off the display panel 4061 and/or the backlight when the computer device 400 is moved to the ear. As one kind of motion sensor, an accelerometer sensor can detect the magnitude of acceleration in all directions (generally three axes), can detect the magnitude and direction of gravity when stationary, and can be used to identify the posture of the computer device (such as switching between landscape and portrait, related games, and magnetometer posture calibration) and for vibration identification functions (such as a pedometer or tapping). The sensor 405 may also include a fingerprint sensor, a pressure sensor, an iris sensor, a molecular sensor, a gyroscope, a barometer, a hygrometer, a thermometer, an infrared sensor, and the like, which are not described in detail here.
The display unit 406 is used to display information input by the user or information provided to the user. The display unit 406 may include a display panel 4061, which may be configured in the form of a liquid crystal display (LCD), an organic light-emitting diode (OLED), or the like.
The user input unit 407 may be used to receive input numeric or character information and to generate key signal input related to user settings and function control of the computer device. Specifically, the user input unit 407 includes a touch panel 4071 and other input devices 4072. The touch panel 4071, also called a touch screen, collects the user's touch operations on or near it (for example, operations performed by the user on or near the touch panel 4071 with a finger, a stylus, or any other suitable object or accessory). The touch panel 4071 may include two parts, a touch detection device and a touch controller. The touch detection device detects the user's touch position, detects the signal produced by the touch operation, and transmits the signal to the touch controller. The touch controller receives touch information from the touch detection device, converts it into contact coordinates, sends them to the processor 410, and receives and executes commands sent by the processor 410. The touch panel 4071 may be implemented in various types, such as resistive, capacitive, infrared, and surface acoustic wave. In addition to the touch panel 4071, the user input unit 407 may also include other input devices 4072. Specifically, the other input devices 4072 may include but are not limited to a physical keyboard, function keys (such as volume control keys and switch keys), a trackball, a mouse, and a joystick, which are not described in detail here.
Further, the touch panel 4071 may cover the display panel 4061. When the touch panel 4071 detects a touch operation on or near it, it sends the operation to the processor 410 to determine the type of the touch event, and the processor 410 then provides a corresponding visual output on the display panel 4061 according to the type of the touch event. Although in Fig. 4 the touch panel 4071 and the display panel 4061 are two independent components that implement the input and output functions of the computer device, in some embodiments the touch panel 4071 and the display panel 4061 may be integrated to implement the input and output functions of the computer device, which is not limited here.
The interface unit 408 is an interface through which an external device is connected to the computer device 400. For example, the external device may include a wired or wireless headphone port, an external power supply (or battery charger) port, a wired or wireless data port, a memory card port, a port for connecting a device with an identification module, an audio input/output (I/O) port, a video I/O port, an earphone port, and the like. The interface unit 408 may be used to receive input (for example, data information or power) from an external device and transfer the received input to one or more elements in the computer device 400, or may be used to transmit data between the computer device 400 and the external device.
The memory 409 may be used to store software programs and various data. The memory 409 may mainly include a program storage area and a data storage area. The program storage area may store the operating system, application programs required by at least one function (such as a sound playing function or an image playing function), and the like; the data storage area may store data created through use of the mobile phone (such as audio data and a phone book). In addition, the memory 409 may include high-speed random access memory and may also include non-volatile memory, for example at least one magnetic disk storage device, a flash memory device, or other volatile solid-state storage devices.
The processor 410 is the control center of the computer device. It connects the various parts of the entire computer device using various interfaces and lines, and performs the various functions of the computer device and processes data by running or executing software programs and/or modules stored in the memory 409 and calling data stored in the memory 409, thereby monitoring the computer device as a whole. The processor 410 may include one or more processing units. Preferably, the processor 410 may integrate an application processor and a modem processor, wherein the application processor mainly handles the operating system, the user interface, application programs, and the like, and the modem processor mainly handles wireless communication. It can be understood that the modem processor may also not be integrated into the processor 410.
The computer device 400 may also include a power supply 411 (such as a battery) that supplies power to all components. Preferably, the power supply 411 may be logically connected to the processor 410 through a power management system, so that functions such as charging management, discharging management, and power consumption management are implemented through the power management system.
In addition, the computer device 400 includes some functional modules that are not shown, which are not described in detail here.
Preferably, an embodiment of the present invention also provides a computer device, including a processor, a memory, and a computer program stored in the memory and executable on the processor. When executed by the processor, the computer program implements each process of the above tone color identification method embodiment and can achieve the same technical effect; to avoid repetition, details are not repeated here.
An embodiment of the present invention also provides a computer readable storage medium on which a computer program is stored. When executed by a processor, the computer program implements each process of the above tone color identification method embodiment and can achieve the same technical effect; to avoid repetition, details are not repeated here. The computer readable storage medium is, for example, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disc, or the like.
It should be noted that, in this document, the terms "include", "comprise", or any other variant thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of other identical elements in the process, method, article, or device that includes the element.
From the above description of the embodiments, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by means of software plus a necessary general hardware platform, and of course also by hardware, although in many cases the former is the better implementation. Based on this understanding, the technical solution of the present invention, in essence or in the part that contributes to the prior art, can be embodied in the form of a software product. The computer software product is stored in a storage medium (such as ROM/RAM, a magnetic disk, or an optical disc) and includes several instructions that cause a terminal (which may be a mobile phone, a computer, a server, an air conditioner, a network device, or the like) to execute the methods described in the embodiments of the present invention.
The embodiments of the present invention have been described above with reference to the accompanying drawings, but the present invention is not limited to the above specific embodiments, which are merely illustrative rather than restrictive. Under the inspiration of the present invention, and without departing from the purpose of the present invention and the scope protected by the claims, those skilled in the art can also devise many other forms, all of which fall within the protection of the present invention.

Claims (10)

1. A tone color identification method, characterized by comprising:
Determining a voice signal;
Converting the voice signal into a spectrum signal;
Calculating the energy of the frequency points in the spectrum signal;
Identifying the frequency of the pitch signal according to the energy of the frequency points;
Determining the tone color of the voice signal according to the frequency of the pitch signal.
2. The method according to claim 1, characterized in that determining the voice signal comprises:
Receiving an audio file;
Segmenting the audio file into multiple frames of audio signal;
Applying a window function to the audio signal;
Performing voice activity detection on the audio signal to identify the voice signal.
3. The method according to claim 1, characterized in that
Converting the voice signal into a spectrum signal comprises:
Performing a Fourier transform on the voice signal to obtain the spectrum signal, wherein each frequency point in the spectrum signal is represented by a complex number;
Calculating the energy of the frequency points in the spectrum signal comprises:
Extracting the real part and the imaginary part of the complex number;
Calculating the sum of the square of the real part and the square of the imaginary part;
Taking the square root of the sum to obtain the energy of the frequency point.
4. The method according to claim 1, 2, or 3, characterized in that identifying the frequency of the pitch signal according to the energy of the frequency points comprises:
Searching for a frequency point whose energy meets a preset pitch energy condition, as the pitch frequency point of the pitch signal;
Converting the pitch frequency point of the pitch signal into the frequency of the pitch signal.
5. The method according to claim 4, characterized in that the pitch energy condition includes at least one of the following:
The energy of the next frequency point is greater than a preset energy threshold;
The energy of adjacent frequency points shows a rising trend;
The frequency point to which the energy belongs converts to a frequency within a preset human voice frequency range.
6. The method according to claim 4, characterized in that searching for a frequency point whose energy meets a preset pitch energy condition, as the pitch frequency point, comprises:
Searching for frequency points whose energy meets the preset pitch energy condition, as candidate frequency points;
Calculating the average value of the candidate frequency points, as the pitch frequency point.
7. The method according to claim 4, characterized in that converting the pitch frequency point into the frequency of the pitch signal comprises:
Determining the sampling frequency and the number of frequency points;
Calculating the ratio between the sampling frequency and the number, as a candidate frequency;
Multiplying the pitch frequency point by the candidate frequency to obtain the frequency of the pitch signal.
8. A tone color identification device, characterized by comprising:
Voice signal determining module, for determining a voice signal;
Spectrum signal conversion module, for converting the voice signal into a spectrum signal;
Energy computation module, for calculating the energy of the frequency points in the spectrum signal;
Frequency identification module, for identifying the frequency of the pitch signal according to the energy of the frequency points;
Tone color determining module, for determining the tone color of the voice signal according to the frequency of the pitch signal.
9. A computer device, characterized by comprising a processor, a memory, and a computer program stored in the memory and executable on the processor, wherein the computer program, when executed by the processor, implements the steps of the tone color identification method according to any one of claims 1 to 7.
10. A computer readable storage medium, characterized in that a computer program is stored on the computer readable storage medium, and the computer program, when executed by a processor, implements the steps of the tone color identification method according to any one of claims 1 to 7.
CN201910621995.6A 2019-07-10 2019-07-10 A kind of recognition methods of tone color, device, computer equipment and storage medium Pending CN110176242A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910621995.6A CN110176242A (en) 2019-07-10 2019-07-10 A kind of recognition methods of tone color, device, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
CN110176242A true CN110176242A (en) 2019-08-27

Family

ID=67699937

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910621995.6A Pending CN110176242A (en) 2019-07-10 2019-07-10 A kind of recognition methods of tone color, device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN110176242A (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102842305A (en) * 2011-06-22 2012-12-26 华为技术有限公司 Method and device for detecting keynote
WO2013168200A1 (en) * 2012-05-11 2013-11-14 パイオニア株式会社 Audio processing device, playback device, audio processing method, and program
CN105575393A (en) * 2015-12-02 2016-05-11 中国传媒大学 Personalized song recommendation method based on voice timbre
CN107170457A (en) * 2017-06-29 2017-09-15 深圳市泰衡诺科技有限公司 Age recognition methods, device and terminal
CN107833581A (en) * 2017-10-20 2018-03-23 广州酷狗计算机科技有限公司 A kind of method, apparatus and readable storage medium storing program for executing of the fundamental frequency for extracting sound
CN107958672A (en) * 2017-12-12 2018-04-24 广州酷狗计算机科技有限公司 The method and apparatus for obtaining pitch waveform data
CN109360583A (en) * 2018-11-13 2019-02-19 无锡冰河计算机科技发展有限公司 A kind of tone color assessment method and device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
傅柏忻: "《演技教程 表演心理学 最新修订版》", 31 October 2018 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110826515A (en) * 2019-11-13 2020-02-21 三峡大学 Closed idiosyncrasy singing tone detection device
CN113113052A (en) * 2021-04-08 2021-07-13 深圳市品索科技有限公司 Voice fundamental tone recognition device of discrete points and computer storage medium
CN113113052B (en) * 2021-04-08 2024-04-05 深圳市品索科技有限公司 Discrete point voice fundamental tone recognition device and computer storage medium

Similar Documents

Publication Publication Date Title
CN110544488B (en) Method and device for separating multi-person voice
CN103578474B (en) A kind of sound control method, device and equipment
CN108735209A (en) Wake up word binding method, smart machine and storage medium
CN109558512A (en) A kind of personalized recommendation method based on audio, device and mobile terminal
CN110096580B (en) FAQ conversation method and device and electronic equipment
CN108511002B (en) Method for recognizing sound signal of dangerous event, terminal and computer readable storage medium
CN110335620A (en) A kind of noise suppressing method, device and mobile terminal
CN107799125A (en) A kind of audio recognition method, mobile terminal and computer-readable recording medium
CN111524501B (en) Voice playing method, device, computer equipment and computer readable storage medium
CN109065060B (en) Voice awakening method and terminal
CN109308178A (en) A kind of voice drafting method and its terminal device
CN107798107A (en) The method and mobile device of song recommendations
CN109754823A (en) A kind of voice activity detection method, mobile terminal
CN111177180A (en) Data query method and device and electronic equipment
CN108989558A (en) The method and device of terminal call
CN110012172A (en) A kind of processing incoming call and terminal equipment
CN110176242A (en) A kind of recognition methods of tone color, device, computer equipment and storage medium
CN110728993A (en) Voice change identification method and electronic equipment
CN109992753A (en) A kind of translation processing method and terminal device
CN109949809A (en) A kind of sound control method and terminal device
CN111292727B (en) Voice recognition method and electronic equipment
CN108520760A (en) A kind of audio signal processing method and terminal
CN110111795B (en) Voice processing method and terminal equipment
CN111145734A (en) Voice recognition method and electronic equipment
CN108494949B (en) A kind of image classification method and mobile terminal

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20190827