CN108847217A - Speech segmentation method and apparatus, computer device and storage medium - Google Patents
Speech segmentation method and apparatus, computer device and storage medium
- Publication number
- CN108847217A (application number CN201810548508.3A)
- Authority
- CN
- China
- Prior art keywords
- frame
- audio data
- mute
- voice
- speech
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/04—Segmentation; Word boundary detection
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/063—Training
Abstract
The invention discloses a speech segmentation method and apparatus, a computer device, and a storage medium. The method includes: obtaining a voice file to be segmented and pre-processing it to obtain audio data; normalizing the audio data and dividing it into frames; computing the frame energy of every speech frame; if the frame energy of a speech frame is below a preset frame-energy threshold, marking that speech frame as a silent frame; if the number of consecutive silent frames is detected to exceed a preset silent-frame count threshold, marking those frames as a silent segment; determining the cut frames of the voice file from the silent segments, and cutting the voice file at the cut frames to obtain the target file. By using frame energy as the segmentation criterion, the technical solution of the present invention requires no manual intervention, has low complexity, and can accurately identify silence and pauses within sentences, so that the voice file is segmented accurately while segmentation efficiency is effectively improved.
Description
Technical field
The present invention relates to the field of speech processing, and more particularly to a speech segmentation method and apparatus, a computer device, and a storage medium.
Background technique
In the field of speech processing, segmenting a voice file is a key problem, because a long voice file consumes substantial system resources during speech recognition and conversion, and recognition accuracy suffers. After the voice file is split, the computation required for speech recognition is reduced and the recognition accuracy of the speech recognition system improves. At the same time, the accuracy of speech segmentation directly affects the speech recognition result: if the segmentation is wrong, recognition of the speech signal may deviate greatly or even become impossible.
At present, however, when a voice file needs to be cut sentence by sentence, the cutting is mostly done manually, so the segmentation accuracy is low, or it must be handled by complex algorithms, so the segmentation efficiency is low.
Summary of the invention
Embodiments of the present invention provide a speech segmentation method and apparatus, a computer device, and a storage medium, to solve the current problems of low segmentation efficiency and low segmentation accuracy for voice files.
A speech segmentation method, including:
obtaining a voice file to be segmented;
pre-processing the voice file to obtain audio data, wherein the audio data includes the sampled values of n sampling points, and n is a positive integer;
normalizing the audio data to obtain standard data corresponding to the audio data, wherein the standard data includes a standard value corresponding to each sampled value;
framing the audio data according to a preset frame length and a preset step size to obtain K speech frames, wherein K is a positive integer;
computing the frame energy of each speech frame from the standard data;
for each speech frame, if the frame energy of the speech frame is below a preset frame-energy threshold, marking the speech frame as a silent frame;
if the number of consecutive silent frames is detected to exceed a preset silent-frame count threshold, marking those consecutive silent frames as a silent segment;
determining the cut frames of the voice file from the silent segments, and cutting the voice file at the cut frames to obtain the target file.
A speech segmentation apparatus, including:
a voice file acquisition module, for obtaining a voice file to be segmented;
a voice file pre-processing module, for pre-processing the voice file to obtain audio data, wherein the audio data includes the sampled values of n sampling points, and n is a positive integer;
an audio data processing module, for normalizing the audio data to obtain standard data corresponding to the audio data, wherein the standard data includes a standard value corresponding to each sampled value;
an audio data framing module, for framing the audio data according to a preset frame length and a preset step size to obtain K speech frames, wherein K is a positive integer;
a frame energy computation module, for computing the frame energy of each speech frame from the standard data;
a silent frame marking module, for marking each speech frame whose frame energy is below a preset frame-energy threshold as a silent frame;
a silent segment marking module, for marking consecutive silent frames as a silent segment when their number is detected to exceed a preset silent-frame count threshold;
a target file acquisition module, for determining the cut frames of the voice file from the silent segments and cutting the voice file at the cut frames to obtain the target file.
A computer device, including a memory, a processor, and a computer program stored in the memory and runnable on the processor, wherein the processor implements the steps of the above speech segmentation method when executing the computer program.
A computer-readable storage medium storing a computer program, wherein the computer program implements the steps of the above speech segmentation method when executed by a processor.
In the above speech segmentation method and apparatus, computer device, and storage medium, framing the voice file and normalizing the audio data improves the processing efficiency of the voice data. The frame energy of each speech frame is then computed to judge the short-time power of the frame, and the silent segments of the audio data are determined from the frame energies, so that silence and pauses within sentences can be identified accurately. Cutting the voice file at the determined cut frames achieves correct sentence-level segmentation, avoids damaging sentence integrity, and improves the segmentation accuracy. At the same time, using frame energy as the segmentation criterion requires no manual intervention and has low complexity, so the voice file is segmented accurately while segmentation efficiency is effectively improved.
Detailed description of the invention
To illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings required in the description of the embodiments are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention; those of ordinary skill in the art can obtain other drawings from them without creative labor.
Fig. 1 is a schematic diagram of an application environment of the speech segmentation method in an embodiment of the present invention;
Fig. 2 is a flow chart of the speech segmentation method in an embodiment of the present invention;
Fig. 3 is a detailed flow chart of step S2 in Fig. 2;
Fig. 4 is a functional block diagram of the speech segmentation apparatus in an embodiment of the present invention;
Fig. 5 is a schematic diagram of a computer device in an embodiment of the present invention.
Detailed description of embodiments
The technical solutions in the embodiments of the present invention will now be described clearly and completely with reference to the drawings. Obviously, the described embodiments are only some of the embodiments of the present invention, not all of them. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative effort fall within the protection scope of the present invention.
Referring to Fig. 1, Fig. 1 shows the application environment of the speech segmentation method provided by an embodiment of the present invention. The speech segmentation method is applied in a speech recognition system for training speech recognition models. The speech recognition system includes a server side and a client connected over a network; the user inputs speech through the client. The client may specifically be, but is not limited to, a personal computer, a laptop, a smart phone, a tablet computer, or a portable wearable device; the server side may be implemented as an independent server or as a server cluster composed of multiple servers. The speech segmentation method provided by the embodiment of the present invention is applied at the server side.
In one of the embodiments, Fig. 2 shows a flow chart of the speech segmentation method. As shown in Fig. 2, the speech segmentation method includes steps S1 to S8, described in detail below:
S1: Obtain the voice file to be segmented.
In the embodiment of the present invention, the voice file to be segmented may be obtained from a corpus used for speech training, or collected through a third-party tool such as WeChat, or obtained from a public sound library; no restriction is imposed here.
Further, the server side detects whether the audio format of the obtained voice file is the wav format, that is, whether the file has the extension ".wav". Wav is a lossless audio file format that fully preserves the data of the speech. Specifically, if the audio format of the audio file is wav, the audio file is used directly for identification and segmentation; otherwise, the audio file is converted into the wav format with an audio format converter, and the converted audio file is taken as the voice file to be segmented.
S2: Pre-process the voice file to obtain audio data, wherein the audio data includes the sampled values of n sampling points, and n is a positive integer.
Specifically, after the voice file to be segmented is obtained, it is encoded by pulse code modulation (PCM): the analog signal of the voice file is sampled at one sampling point per preset time interval to discretize it. The interval is determined by the sampling frequency of the PCM encoding, which can be set from historical experience; for example, a sampling frequency of 8000 Hz means that 8000 sampled signals are acquired per second. It can also be configured according to the practical application; no restriction is imposed here.
Further, the sampled signals of the n sampling points are quantized, and the quantized digital signal is output as binary code groups, obtaining a voice signal corresponding to the voice file, wherein the number of sampling points n is the product of the duration of the voice file and the sampling frequency.
Further, pre-emphasis is applied to the voice signal to enhance its high-frequency components, avoiding excessive attenuation of the high-frequency components during transmission, which would make the resulting speech unclear and affect the accuracy of speech recognition. Specifically, time-domain or frequency-domain techniques can be used to pre-emphasize the voice signal and strengthen the speech energy of the voiced segments, obtaining the audio data.
S3: Normalize the audio data to obtain standard data corresponding to the audio data, wherein the standard data includes a standard value corresponding to each sampled value.
In the embodiment of the present invention, the audio data obtained in step S2 is normalized. The normalization may divide the sampled value of each sampling point by the maximum of the sampled values of the audio data, or by the mean of those sampled values, converging the data into a specific interval and facilitating data processing.
Specifically, after normalization each sampled value in the audio data is converted into a corresponding standard value, yielding the standard data corresponding to the audio data.
S4: Frame the audio data according to a preset frame length and a preset step size to obtain K speech frames, wherein K is a positive integer.
In the embodiment of the present invention, the audio data is divided into non-overlapping frames according to the preset frame length and step size, where the frame length is the length of each speech frame and the step size is the time interval between successive frames. When the frame length equals the step size, the speech frames obtained after framing do not overlap; K is then the quotient of the duration of the voice file divided by the duration of a speech frame. This improves data processing efficiency without affecting the frame-energy computation of the speech frames.
Specifically, the frame length is usually set in the range of 0.01 s to 0.03 s, over which the voice signal is relatively stationary; it can also be configured according to the needs of the practical application, and no restriction is imposed here.
For example, if the frame length is set to 0.01 s, the step size to 0.01 s, and the sampling frequency to 8000 Hz (8000 sampled signals acquired per second), then every 80 sampled values of the audio data are taken as one speech frame. If the last speech frame has fewer than 80 sampled values, data with sampled value 0 is appended to it, so that the last speech frame also contains 80 sampled values.
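As an illustrative sketch (not part of the patent text, with `frame_signal` being a hypothetical helper name), the framing step just described — fixed frame length, step size equal to the frame length, and zero-padding of the final short frame — could look like this:

```python
def frame_signal(samples, frame_len, step):
    """Split a sample sequence into frames of frame_len samples, advancing
    by step samples.  With step == frame_len the frames do not overlap,
    matching the non-overlapping framing described above; the final short
    frame is padded with zero-valued samples."""
    frames = []
    for start in range(0, len(samples), step):
        frame = list(samples[start:start + frame_len])
        frame += [0] * (frame_len - len(frame))  # zero-pad the last frame
        frames.append(frame)
    return frames

# 8000 Hz audio with 0.01 s frames -> 80 samples per frame, as in the example
frames = frame_signal(list(range(200)), frame_len=80, step=80)
```

With 200 samples and 80 samples per frame this yields three frames, the last one zero-padded, mirroring the worked example above.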
S5: Compute the frame energy of each speech frame from the standard data.
In the embodiment of the present invention, the frame energy is the short-time energy of the voice signal; it reflects the amount of speech information in a frame and can be used to judge whether the speech frame is a sentence frame or a silent frame.
Further, since the standard values converge well, computing the frame energy of each speech frame from the standard data improves data processing efficiency.
S6: For each speech frame, if the frame energy of the speech frame is below the preset frame-energy threshold, mark the speech frame as a silent frame.
In the embodiment of the present invention, the frame-energy threshold is a preset parameter: if the computed frame energy is below the frame-energy threshold, the corresponding speech frame is marked as a silent frame. The threshold can be set from historical experience, for example 0.5, or set by concrete analysis of the frame energies computed for the speech frames; no restriction is imposed here.
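The threshold test of step S6 is a simple comparison; as an illustrative sketch (not from the patent, with the function name being an assumption), each frame can be flagged as silent or not:

```python
def mark_silent_frames(energies, energy_threshold=0.5):
    """Mark each speech frame whose frame energy falls below the
    frame-energy threshold as silent (True); 0.5 is only the example
    threshold value mentioned in the text."""
    return [e < energy_threshold for e in energies]
```

The resulting boolean list preserves the frame order, so runs of consecutive silent frames can be located afterwards.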
S7: If the number of consecutive silent frames is detected to exceed the preset silent-frame count threshold, mark those consecutive silent frames as a silent segment.
In the embodiment of the present invention, the silent-frame count threshold is a preset parameter: if a run of consecutive silent frames whose number exceeds the threshold is detected, those consecutive silent frames are marked as a silent segment. The threshold can be set from historical experience, for example 5, or set by concrete analysis of the frame energies computed for the speech frames; no restriction is imposed here.
S8: Determine the cut frames of the voice file from the silent segments, and cut the voice file at the cut frames to obtain the target file.
In the embodiment of the present invention, to ensure that no voiced segment is cut through and that a certain duration of silence remains before and after each voiced segment, the middle frame of each silent segment's run of consecutive frame numbers is taken as the separation point; if the number of consecutive frames is even, the smaller of the two middle frame numbers is marked as the cut frame. No restriction is imposed here.
For example, with a frame-energy threshold of 0.5 and a silent-frame count threshold of 5, suppose screening finds that the frame energies Ene1, Ene2, Ene8, Ene13, Ene14, Ene15, Ene16, Ene17, and Ene18 are below 0.5. The frame numbers of these speech frames are marked as silent frames. Runs of more than 5 consecutive silent frames are then filtered out, so the frames corresponding to Ene13 through Ene18 are marked as a silent segment; the smaller middle frame number of this run is obtained, and the 15th speech frame is marked as the cut frame.
According to the marked cut frames, the audio data is cut at each cut frame, the frames between adjacent cut points are merged into an independent voiced segment, and a target file containing the voice files after cutting is obtained.
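Steps S7 and S8 can be sketched end to end as follows. This is an illustrative sketch, not the patent's implementation; all function names are assumptions, and dropping the cut frame itself when splitting is an interpretation, since the text names only the cut frame:

```python
def find_silent_segments(silent_flags, min_count=5):
    """Runs of more than min_count consecutive silent frames become
    silent segments, returned as (start, end) frame-index pairs."""
    segments, run_start = [], None
    for i, silent in enumerate(list(silent_flags) + [False]):  # sentinel closes a trailing run
        if silent:
            if run_start is None:
                run_start = i
        else:
            if run_start is not None and i - run_start > min_count:
                segments.append((run_start, i - 1))
            run_start = None
    return segments

def choose_cut_frames(segments):
    """The cut frame is the middle frame of each silent segment; for an
    even-length run the smaller of the two central frame numbers is used."""
    return [(start + end) // 2 for start, end in segments]

def split_at_cut_frames(frames, cut_frames):
    """Cut the frame list at each cut frame, merging the frames between
    adjacent cut points into independent segments (the cut frame itself
    is dropped here, an assumption)."""
    out, prev = [], 0
    for c in cut_frames:
        out.append(frames[prev:c])
        prev = c + 1
    out.append(frames[prev:])
    return out

# Worked example from the text: frame indices 1, 2, 8 and 13-18 are silent.
flags = [i in (1, 2, 8, 13, 14, 15, 16, 17, 18) for i in range(20)]
```

Running the example reproduces the text: only the run 13-18 exceeds 5 frames, and its smaller middle frame, frame 15, becomes the cut frame.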
In one of the embodiments, pre-processing the voice file yields audio data, converting the voice file into a data format directly supported by the sound card. Normalizing the audio data and dividing it into multiple speech frames improves data processing efficiency. The frame energy of each speech frame is computed from the standard data corresponding to the audio data; if the frame energy of a speech frame is below the preset frame-energy threshold, the speech frame is marked as a silent frame. Further, if the number of consecutive silent frames detected exceeds the preset silent-frame count threshold, those consecutive silent frames are marked as a silent segment, and the frame number of the cut frame is determined. Finally the voice file is cut at the cut frames to obtain the target file. By framing the voice file, computing the frame energy of each speech frame, and determining the silent segments of the audio data from the frame energies, silence and pauses within sentences can be identified accurately, correct sentence-level segmentation is achieved, sentence integrity is preserved, and segmentation accuracy improves. At the same time, using frame energy as the segmentation criterion requires no manual intervention and has low complexity, so the voice file is segmented accurately while segmentation efficiency is effectively improved.
In one of the embodiments, a specific implementation of pre-processing the voice file to obtain audio data is described in detail.
Referring to Fig. 3, Fig. 3 shows a detailed flow chart of step S2, described as follows:
S21: Encode the voice file by pulse code modulation to obtain a voice signal.
In the embodiment of the present invention, encoding the voice file by pulse code modulation includes sampling, quantization, and encoding: the continuous-time analog signal is converted into a discrete-amplitude digital signal, and the quantized digital signal is output as binary code groups, obtaining a voice signal corresponding to the voice file.
Specifically, if the voice file is a monaural voice file, the sampled value of each sampling point is an 8-bit unsigned integer; if the voice file is a two-channel (stereo) voice file, the sampled value of each sampling point is a 16-bit signed integer. The value ranges of the 8-bit and 16-bit PCM waveform encoding data formats are shown in Table 1.
Table 1
Sample type | Data format | Minimum value | Maximum value |
Monaural voice file | unsigned int (8-bit) | 0 | 255 |
Two-channel voice file | int (16-bit) | -32767 | 32767 |
For example, take a monaural wav file with a sampling rate of 8000 Hz encoded with PCM: the audio file's encoding format is (8000 Hz, 8 bit, unsigned), i.e., the sampling frequency is 8000 times per second and each sampled value is represented by an 8-bit unsigned integer in the range [0, 255]. The speech volume is proportional to the PCM-encoded sampled value: the larger the speech volume, the higher the level acquired by sampling, and the larger the quantized unsigned integer.
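The two PCM layouts in Table 1 can be decoded with Python's standard `struct` module; this is an illustrative sketch (the function name is an assumption, and little-endian byte order is assumed, as is standard for wav PCM):

```python
import struct

def decode_pcm(raw, sample_width):
    """Decode raw PCM bytes into integer sampled values.  8-bit PCM is
    stored as unsigned integers (0..255); 16-bit PCM as signed
    little-endian integers, matching the value ranges in Table 1."""
    if sample_width == 1:
        return list(struct.unpack("%dB" % len(raw), raw))
    if sample_width == 2:
        return list(struct.unpack("<%dh" % (len(raw) // 2), raw))
    raise ValueError("unsupported sample width")
```

For real wav files the standard-library `wave` module exposes the sample width and raw frames, which could feed such a decoder.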
S22: Pre-emphasize the voice signal to obtain audio data.
In the embodiment of the present invention, the energy of the low-frequency band of a voice signal is large while that of the high-frequency band is markedly smaller, so the signal-to-noise ratio of the voice signal is high at low frequencies and low at high frequencies; the high-frequency components are therefore weak and difficult to transmit.
Specifically, a high-pass filter is used to emphasize the high-frequency band of the voice file, obtaining the audio data. This enhances the high-frequency energy of the voice signal and increases the amplitude of the rising and falling edges of the high-band voice signal, thereby increasing the high-frequency signal-to-noise ratio and improving the quality of the voice signal.
In one of the embodiments, the voice file is encoded by pulse code modulation, converting the continuous-time analog signal into a discrete-amplitude digital signal and outputting the quantized digital signal as binary code groups to obtain the voice signal, which facilitates identifying and processing the information in the voice file. The voice signal is then pre-emphasized to enhance its high-frequency energy, obtaining the audio data and improving the quality of the voice signal.
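A common form of the high-pass pre-emphasis described above is a first-order filter; as an illustrative sketch (the patent names no filter form or coefficient, so both the function name and the coefficient value are assumptions):

```python
def pre_emphasis(samples, alpha=0.97):
    """First-order high-pass pre-emphasis: y[n] = x[n] - alpha * x[n-1].
    Boosts the high-frequency band relative to the low band; alpha around
    0.95-0.97 is a conventional choice."""
    out = [float(samples[0])]
    for n in range(1, len(samples)):
        out.append(samples[n] - alpha * samples[n - 1])
    return out
```

Note how a constant (purely low-frequency) input is strongly attenuated after the first sample, which is exactly the high-pass behaviour the step relies on.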
In one of the embodiments, the normalization of the audio data in step S3, which obtains the standard data corresponding to the audio data, can specifically be realized as follows:
The standard data is calculated according to formula (1):
X = Y / max(Y)    formula (1)
where Y is the audio data, X is the standard data, and max(Y) is the amplitude of the audio data.
It should be noted that the amplitude of the audio data is the maximum of the sampled values of the sampling points in the audio data.
In the embodiment of the present invention, the standard data corresponding to the audio data can be calculated accurately by formula (1), converging the data into a specific interval and improving the efficiency of data processing.
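Formula (1) can be sketched directly; note one interpretation made here (not stated in the patent): the amplitude is taken as the largest absolute sampled value, so signed audio lands in [-1, 1], and the all-zero case is guarded:

```python
def normalize(samples):
    """Normalize the audio data by its amplitude, as in formula (1):
    X = Y / max(Y).  The amplitude is taken as the largest absolute
    sampled value (an interpretation of max(Y) for signed data)."""
    amplitude = max(abs(s) for s in samples)
    if amplitude == 0:
        return [0.0 for _ in samples]
    return [s / amplitude for s in samples]
```

Each standard value then corresponds one-to-one to a sampled value, as required by step S3.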
In one of the embodiments, the computation in step S5 of the frame energy of each speech frame from the standard data can specifically be realized as follows:
The frame energy of each speech frame is calculated according to formula (2):
Ene[i] = A × sum(Xi²)    formula (2)
where Ene[i] is the frame energy of the i-th speech frame, A is a preset regulating factor, and sum(Xi²) is the sum of the squares of the standard values of the sampling points contained in the i-th speech frame.
It should be noted that A is a regulating factor preset according to the characteristics of the voice file. It prevents the volume of sentences in the voice file being too small, or the ambient noise being too large, from making sentences insufficiently distinguishable from silence, which would affect the accuracy of speech segmentation.
In the embodiment of the present invention, the frame energy of each speech frame can be calculated quickly by formula (2). It reflects the amount of speech information in each speech frame, improves the accuracy of speech segmentation, and can further be used to judge whether the speech frame is a sentence frame or a silent frame.
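Formula (2) translates to a one-line sketch (the default value of the regulating factor here is only a placeholder; the patent presets A per voice file):

```python
def frame_energy(frame, a=1.0):
    """Frame energy per formula (2): Ene[i] = A * sum(Xi^2), the regulating
    factor A times the sum of squared standard values in the frame.
    a=1.0 is only a placeholder default for the preset factor."""
    return a * sum(x * x for x in frame)
```

Since the standard values are already normalized, these energies can be compared directly against a fixed frame-energy threshold such as the 0.5 used in the example.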
In one of the embodiments, after the cut frames of the voice file are determined from the silent segments in step S8 and the voice file is cut at the cut frames to obtain the target file, the target file can further be used for speech recognition model training; that is, the speech segmentation method further includes:
carrying out speech recognition model training using the target file.
Specifically, the model training module of the speech recognition system receives the target file obtained in step S8 and, in batches, assigns the corresponding identification information to the voice files after cutting in the target file, for the model training of the speech recognition system.
It should be noted that if the voice file used is long, effects such as automatic alignment by the system degrade the training during the training of the speech recognition model, so that the recognition rate for that voice file is not high; moreover, speech recognition transcription of a long voice file consumes substantial system resources.
In the embodiment of the present invention, using the target file for speech recognition model training ensures that the sentences of the voice files used in the model training of the speech recognition system are all short sentences, improving the efficiency and accuracy of the model training.
It should be understood that the sequence numbers of the steps in the above embodiments do not imply an order of execution; the execution order of the processes should be determined by their functions and internal logic, and does not constitute any limitation on the implementation process of the embodiments of the present invention.
In one embodiment, Fig. 4 shows a speech segmentation apparatus corresponding one-to-one to the speech segmentation method in the above embodiments. For ease of description, only the parts related to the embodiments of the present invention are shown.
As shown in Fig. 4, the speech segmentation apparatus includes a voice file acquisition module 31, a voice file pre-processing module 32, an audio data processing module 33, an audio data framing module 34, a frame energy computation module 35, a silent frame marking module 36, a silent segment marking module 37, and a target file acquisition module 38. Each functional module is described in detail as follows:
The voice file obtaining module 31 is configured to obtain a voice file to be segmented.
The voice file preprocessing module 32 is configured to preprocess the voice file to obtain audio data, where the audio data includes the sampled values of n sampling points, and n is a positive integer.
The audio data processing module 33 is configured to normalize the audio data to obtain standard data corresponding to the audio data, where the standard data includes a standard value corresponding to each sampled value.
The audio data framing module 34 is configured to perform framing on the audio data according to a preset frame length and a preset step size to obtain K speech frames, where K is a positive integer.
The frame energy computation module 35 is configured to calculate the frame energy of each speech frame according to the standard data.
The mute frame marking module 36 is configured to, for each speech frame, mark the speech frame as a mute frame if its frame energy is less than a preset frame energy threshold.
The mute segment marking module 37 is configured to mark consecutive mute frames as a mute segment if the number of consecutive mute frames is detected to be greater than a preset mute frame quantity threshold.
The target file obtaining module 38 is configured to determine cutting frames of the voice file according to the mute segments, and to segment the voice file using the cutting frames to obtain target files.
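As an illustration only (not part of the patent disclosure), the cooperation of modules 34 to 38 can be sketched in Python. The frame length, step size, both thresholds, the adjustment factor A, and the choice of a mute segment's middle frame as the cutting frame are all assumptions made for the sketch:

```python
import numpy as np

def find_cut_points(standard_data, frame_len=400, step=160,
                    energy_thresh=1e-4, min_mute_frames=20, a=1.0):
    """Sketch of modules 34-38: frame the normalized samples, compute
    frame energies, mark mute frames and mute segments, pick cut points."""
    # Module 34: split the standard data into K overlapping frames.
    starts = range(0, len(standard_data) - frame_len + 1, step)
    # Module 35: frame energy Ene[i] = A * sum(Xi^2).
    energies = [a * np.sum(standard_data[s:s + frame_len] ** 2) for s in starts]
    # Module 36: a frame is mute if its energy is below the threshold.
    is_mute = [e < energy_thresh for e in energies]
    # Module 37: runs of mute frames longer than the threshold form mute segments.
    cuts, run_start = [], None
    for i, mute in enumerate(is_mute + [False]):  # sentinel flushes a trailing run
        if mute and run_start is None:
            run_start = i
        elif not mute and run_start is not None:
            if i - run_start > min_mute_frames:
                # Module 38 (assumed policy): cut at the middle frame of the segment.
                cuts.append((run_start + i - 1) // 2)
            run_start = None
    return cuts
```

For example, a signal consisting of speech, half a second of silence, and more speech would yield a single cut point inside the silent stretch; the actual file splitting would then translate that frame index back to a sample offset.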
Further, the voice file preprocessing module 32 includes:
an encoding submodule 321, configured to encode the voice file using a pulse code modulation scheme to obtain a voice signal; and
a preemphasis submodule 322, configured to perform preemphasis processing on the voice signal to obtain the audio data.
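The patent does not specify the preemphasis filter or its coefficient; as an illustrative assumption, the common first-order high-pass form y[t] = x[t] − α·x[t−1] with α = 0.97 could look like this:

```python
import numpy as np

def preemphasis(signal, alpha=0.97):
    """First-order preemphasis: y[t] = x[t] - alpha * x[t-1].

    The filter form and the coefficient alpha are assumptions; the
    patent only states that preemphasis processing is performed.
    """
    signal = np.asarray(signal, dtype=float)
    # Keep the first sample unchanged; subtract the scaled previous sample elsewhere.
    return np.append(signal[0], signal[1:] - alpha * signal[:-1])
```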
Further, the audio data processing module 33 includes:
a standard data computation submodule 331, configured to calculate the standard data according to formula (1):
X = Y / max(Y)    formula (1)
where Y is the audio data, X is the standard data, and max(Y) is the maximum amplitude of the audio data.
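As a sketch of formula (1), reading "amplitude" as the maximum absolute sample value (an assumption, since the patent does not define it further), the normalization maps the audio data into [−1, 1]:

```python
import numpy as np

def normalize(audio):
    """Formula (1): X = Y / max(Y).

    'max(Y)' is interpreted here as the maximum absolute sample value,
    which is an assumption; the guard avoids division by zero on silence.
    """
    audio = np.asarray(audio, dtype=float)
    peak = np.max(np.abs(audio))
    return audio / peak if peak > 0 else audio
```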
Further, the frame energy computation module 35 includes:
a frame energy computation submodule 351, configured to calculate the frame energy of each speech frame according to formula (2):
Ene[i] = A × sum(Xi²)    formula (2)
where Ene[i] is the frame energy of the i-th speech frame, A is a preset adjustment factor, and sum(Xi²) is the sum of the squares of the standard values of the sampling points included in the i-th speech frame.
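Formula (2) reduces to a scaled sum of squares over one frame's standard values; a minimal sketch (the default A = 1.0 is an assumption, since the patent leaves the adjustment factor as a preset):

```python
import numpy as np

def frame_energy(frame, a=1.0):
    """Formula (2): Ene[i] = A * sum(Xi^2).

    `frame` holds the standard (normalized) values of the i-th speech
    frame; `a` is the preset adjustment factor A (default assumed).
    """
    frame = np.asarray(frame, dtype=float)
    return a * np.sum(frame ** 2)
```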
Further, the phonetic segmentation device further includes:
a model training module 39, configured to perform speech recognition model training using the target files.
For the specific process by which each module of the phonetic segmentation device provided in this embodiment realizes its function, refer to the description of the above method embodiment; details are not repeated here.
This embodiment provides a computer-readable storage medium on which a computer program is stored. When the computer program is executed by a processor, the phonetic segmentation method in the above embodiment is realized; alternatively, when the computer program is executed by a processor, the functions of the modules of the phonetic segmentation device in the above embodiment are realized. To avoid repetition, details are not described here again.
It is to be appreciated that the computer-readable storage medium may include any entity or device capable of carrying the computer program code, a recording medium, a USB flash disk, a removable hard disk, a magnetic disk, an optical disc, a computer memory, a read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), an electric carrier signal, a telecommunication signal, and the like.
Fig. 5 is a schematic diagram of a computer equipment provided by an embodiment of the present invention. As shown in Fig. 5, the computer equipment 5 of this embodiment includes a processor 51, a memory 52, and a computer program 53 stored in the memory 52 and executable on the processor 51. When executing the computer program 53, the processor 51 realizes the steps in each of the above phonetic segmentation method embodiments, such as steps S1 to S8 shown in Fig. 2; alternatively, when executing the computer program 53, the processor 51 realizes the functions of the modules of the phonetic segmentation device in the above embodiment, such as the functions of modules 31 to 38 shown in Fig. 4.
Illustratively, the computer program 53 may be divided into one or more modules, which are stored in the memory 52 and executed by the processor 51 to complete the present invention. The one or more modules may be a series of computer program instruction segments capable of completing specific functions, the instruction segments being used to describe the execution process of the computer program 53 in the computer equipment 5. For example, the computer program 53 may be divided into a voice file obtaining module, a voice file preprocessing module, an audio data processing module, an audio data framing module, a frame energy computation module, a mute frame marking module, a mute segment marking module and a target file obtaining module, the specific functions of each module being as shown in the above device embodiment; to avoid repetition, they are not described one by one here.
The computer equipment 5 may be a computing device such as a desktop computer, a notebook, a palmtop computer or a cloud server. The computer equipment 5 may include, but is not limited to, the processor 51 and the memory 52. Those skilled in the art will understand that Fig. 5 is only an example of the computer equipment 5 and does not constitute a limitation on it; the computer equipment 5 may include more or fewer components than illustrated, combine certain components, or have different components, and may, for example, also include input/output devices, network access devices, a bus, and the like.
The processor 51 may be a central processing unit (Central Processing Unit, CPU), or another general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
The memory 52 may be an internal storage unit of the computer equipment 5, such as a hard disk or memory of the computer equipment 5. The memory 52 may also be an external storage device of the computer equipment 5, such as a plug-in hard disk, a smart media card (Smart Media Card, SMC), a secure digital (Secure Digital, SD) card or a flash card equipped on the computer equipment 5. Further, the memory 52 may include both the internal storage unit and the external storage device of the computer equipment 5. The memory 52 is used to store the computer program and other programs and data needed by the computer equipment 5, and may also be used to temporarily store data that has been or will be output.
It is apparent to those skilled in the art that, for convenience and brevity of description, the division into the above functional units and modules is used only as an example; in practical applications, the above functions may be allocated to different functional units and modules as needed, that is, the internal structure of the device may be divided into different functional units or modules to complete all or part of the functions described above.
The embodiments described above are merely illustrative of the technical solutions of the present invention and are not intended to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that they may still modify the technical solutions recorded in the foregoing embodiments or make equivalent replacements of some of the technical features; such modifications or replacements, insofar as they do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention, shall all be included within the protection scope of the present invention.
Claims (10)
1. A phonetic segmentation method, characterized in that the phonetic segmentation method comprises:
obtaining a voice file to be segmented;
preprocessing the voice file to obtain audio data, wherein the audio data comprises the sampled values of n sampling points, and n is a positive integer;
normalizing the audio data to obtain standard data corresponding to the audio data, wherein the standard data comprises a standard value corresponding to each sampled value;
framing the audio data according to a preset frame length and a preset step size to obtain K speech frames, wherein K is a positive integer;
calculating the frame energy of each speech frame according to the standard data;
for each speech frame, marking the speech frame as a mute frame if the frame energy of the speech frame is less than a preset frame energy threshold;
if the number of consecutive mute frames is detected to be greater than a preset mute frame quantity threshold, marking the consecutive mute frames as a mute segment; and
determining cutting frames of the voice file according to the mute segments, and segmenting the voice file using the cutting frames to obtain target files.
2. The phonetic segmentation method according to claim 1, characterized in that the preprocessing the voice file to obtain audio data comprises:
encoding the voice file using a pulse code modulation scheme to obtain a voice signal; and
performing preemphasis processing on the voice signal to obtain the audio data.
3. The phonetic segmentation method according to claim 1, characterized in that the normalizing the audio data to obtain the standard data corresponding to the audio data comprises:
calculating the standard data according to the following formula:
X = Y / max(Y)
wherein Y is the audio data, X is the standard data, and max(Y) is the maximum amplitude of the audio data.
4. The phonetic segmentation method according to claim 1, characterized in that the calculating the frame energy of each speech frame according to the standard data comprises:
calculating the frame energy of each speech frame according to the following formula:
Ene[i] = A × sum(Xi²)
wherein Ene[i] is the frame energy of the i-th speech frame, A is a preset adjustment factor, and sum(Xi²) is the sum of the squares of the standard values of the sampling points included in the i-th speech frame.
5. The phonetic segmentation method according to claim 1, characterized in that, after the determining cutting frames of the voice file according to the mute segments and segmenting the voice file using the cutting frames to obtain target files, the phonetic segmentation method further comprises:
performing speech recognition model training using the target files.
6. A phonetic segmentation device, characterized in that the phonetic segmentation device comprises:
a voice file obtaining module, configured to obtain a voice file to be segmented;
a voice file preprocessing module, configured to preprocess the voice file to obtain audio data, wherein the audio data comprises the sampled values of n sampling points, and n is a positive integer;
an audio data processing module, configured to normalize the audio data to obtain standard data corresponding to the audio data, wherein the standard data comprises a standard value corresponding to each sampled value;
an audio data framing module, configured to frame the audio data according to a preset frame length and a preset step size to obtain K speech frames, wherein K is a positive integer;
a frame energy computation module, configured to calculate the frame energy of each speech frame according to the standard data;
a mute frame marking module, configured to, for each speech frame, mark the speech frame as a mute frame if the frame energy of the speech frame is less than a preset frame energy threshold;
a mute segment marking module, configured to mark consecutive mute frames as a mute segment if the number of consecutive mute frames is detected to be greater than a preset mute frame quantity threshold; and
a target file obtaining module, configured to determine cutting frames of the voice file according to the mute segments, and to segment the voice file using the cutting frames to obtain target files.
7. The phonetic segmentation device according to claim 6, characterized in that the voice file preprocessing module comprises:
an encoding submodule, configured to encode the voice file using a pulse code modulation scheme to obtain a voice signal; and
a preemphasis submodule, configured to perform preemphasis processing on the voice signal to obtain the audio data.
8. The phonetic segmentation device according to claim 6, characterized in that the frame energy computation module comprises:
a frame energy computation submodule, configured to calculate the frame energy of each speech frame according to the following formula:
Ene[i] = A × sum(Xi²)
wherein Ene[i] is the frame energy of the i-th speech frame, A is a preset adjustment factor, and sum(Xi²) is the sum of the squares of the standard values of the sampling points included in the i-th speech frame.
9. A computer equipment, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor, when executing the computer program, realizes the steps of the phonetic segmentation method according to any one of claims 1 to 5.
10. A computer-readable storage medium storing a computer program, characterized in that the computer program, when executed by a processor, realizes the steps of the phonetic segmentation method according to any one of claims 1 to 5.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810548508.3A CN108847217A (en) | 2018-05-31 | 2018-05-31 | A kind of phonetic segmentation method, apparatus, computer equipment and storage medium |
PCT/CN2018/092566 WO2019227547A1 (en) | 2018-05-31 | 2018-06-25 | Voice segmenting method and apparatus, and computer device and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810548508.3A CN108847217A (en) | 2018-05-31 | 2018-05-31 | A kind of phonetic segmentation method, apparatus, computer equipment and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108847217A true CN108847217A (en) | 2018-11-20 |
Family
ID=64210253
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810548508.3A Pending CN108847217A (en) | 2018-05-31 | 2018-05-31 | A kind of phonetic segmentation method, apparatus, computer equipment and storage medium |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN108847217A (en) |
WO (1) | WO2019227547A1 (en) |
Cited By (27)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109495496A (en) * | 2018-12-11 | 2019-03-19 | 泰康保险集团股份有限公司 | Method of speech processing, device, electronic equipment and computer-readable medium |
CN109840052A (en) * | 2019-01-31 | 2019-06-04 | 成都超有爱科技有限公司 | A kind of audio-frequency processing method, device, electronic equipment and storage medium |
CN109948124A (en) * | 2019-03-15 | 2019-06-28 | 腾讯科技(深圳)有限公司 | Voice document cutting method, device and computer equipment |
CN110457002A (en) * | 2019-07-03 | 2019-11-15 | 平安科技(深圳)有限公司 | A kind of multimedia file processing method, device and computer storage medium |
CN110491370A (en) * | 2019-07-15 | 2019-11-22 | 北京大米科技有限公司 | A kind of voice stream recognition method, device, storage medium and server |
WO2019227547A1 (en) * | 2018-05-31 | 2019-12-05 | 平安科技(深圳)有限公司 | Voice segmenting method and apparatus, and computer device and storage medium |
CN110602302A (en) * | 2019-08-15 | 2019-12-20 | 厦门快商通科技股份有限公司 | Voice intercepting method and device for telephone robot and storage medium |
CN110992989A (en) * | 2019-12-06 | 2020-04-10 | 广州国音智能科技有限公司 | Voice acquisition method and device and computer readable storage medium |
CN111108553A (en) * | 2019-12-24 | 2020-05-05 | 广州国音智能科技有限公司 | Voiceprint detection method, device and equipment for sound collection object |
CN111213205A (en) * | 2019-12-30 | 2020-05-29 | 深圳市优必选科技股份有限公司 | Streaming voice conversion method and device, computer equipment and storage medium |
CN111312219A (en) * | 2020-01-16 | 2020-06-19 | 上海携程国际旅行社有限公司 | Telephone recording marking method, system, storage medium and electronic equipment |
CN111326172A (en) * | 2018-12-17 | 2020-06-23 | 北京嘀嘀无限科技发展有限公司 | Conflict detection method and device, electronic equipment and readable storage medium |
CN111627453A (en) * | 2020-05-13 | 2020-09-04 | 广州国音智能科技有限公司 | Public security voice information management method, device, equipment and computer storage medium |
CN111696526A (en) * | 2020-06-22 | 2020-09-22 | 北京达佳互联信息技术有限公司 | Method for generating voice recognition model, voice recognition method and device |
CN111710332A (en) * | 2020-06-30 | 2020-09-25 | 北京达佳互联信息技术有限公司 | Voice processing method and device, electronic equipment and storage medium |
CN112185390A (en) * | 2020-09-27 | 2021-01-05 | 中国商用飞机有限责任公司北京民用飞机技术研究中心 | Onboard information assisting method and device |
CN112185424A (en) * | 2020-09-29 | 2021-01-05 | 国家计算机网络与信息安全管理中心 | Voice file cutting and restoring method, device, equipment and storage medium |
CN112331188A (en) * | 2019-07-31 | 2021-02-05 | 武汉Tcl集团工业研究院有限公司 | Voice data processing method, system and terminal equipment |
CN112614515A (en) * | 2020-12-18 | 2021-04-06 | 广州虎牙科技有限公司 | Audio processing method and device, electronic equipment and storage medium |
CN112615869A (en) * | 2020-12-22 | 2021-04-06 | 平安银行股份有限公司 | Audio data processing method, device, equipment and storage medium |
CN112712791A (en) * | 2020-12-08 | 2021-04-27 | 深圳市优必选科技股份有限公司 | Mute voice detection method, device, terminal equipment and storage medium |
CN112750453A (en) * | 2020-12-24 | 2021-05-04 | 北京猿力未来科技有限公司 | Audio signal screening method, device, equipment and storage medium |
CN112767920A (en) * | 2020-12-31 | 2021-05-07 | 深圳市珍爱捷云信息技术有限公司 | Method, device, equipment and storage medium for recognizing call voice |
CN113593528A (en) * | 2021-06-30 | 2021-11-02 | 北京百度网讯科技有限公司 | Training method and device of voice segmentation model, electronic equipment and storage medium |
CN113823277A (en) * | 2021-11-23 | 2021-12-21 | 北京百瑞互联技术有限公司 | Keyword recognition method, system, medium, and apparatus based on deep learning |
CN114283840A (en) * | 2021-12-22 | 2022-04-05 | 天翼爱音乐文化科技有限公司 | Instruction audio generation method, system, device and storage medium |
CN116847245A (en) * | 2023-06-30 | 2023-10-03 | 杭州雄迈集成电路技术股份有限公司 | Digital audio automatic gain method, system and computer storage medium |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101308653A (en) * | 2008-07-17 | 2008-11-19 | 安徽科大讯飞信息科技股份有限公司 | End-point detecting method applied to speech identification system |
CN103366739A (en) * | 2012-03-28 | 2013-10-23 | 郑州市科学技术情报研究所 | Self-adaptive endpoint detection method and self-adaptive endpoint detection system for isolate word speech recognition |
CN106887241A (en) * | 2016-10-12 | 2017-06-23 | 阿里巴巴集团控股有限公司 | A kind of voice signal detection method and device |
CN107170464A (en) * | 2017-05-25 | 2017-09-15 | 厦门美图之家科技有限公司 | A kind of changing speed of sound method and computing device based on music rhythm |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN100505040C (en) * | 2005-07-26 | 2009-06-24 | 浙江大学 | Audio frequency splitting method for changing detection based on decision tree and speaking person |
CN101221762A (en) * | 2007-12-06 | 2008-07-16 | 上海大学 | MP3 compression field audio partitioning method |
CN101685446A (en) * | 2008-09-25 | 2010-03-31 | 索尼(中国)有限公司 | Device and method for analyzing audio data |
CN103345922B (en) * | 2013-07-05 | 2016-07-06 | 张巍 | A kind of large-length voice full-automatic segmentation method |
CN108847217A (en) * | 2018-05-31 | 2018-11-20 | 平安科技(深圳)有限公司 | A kind of phonetic segmentation method, apparatus, computer equipment and storage medium |
Also Published As
Publication number | Publication date |
---|---|
WO2019227547A1 (en) | 2019-12-05 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108847217A (en) | A kind of phonetic segmentation method, apparatus, computer equipment and storage medium | |
US10565983B2 (en) | Artificial intelligence-based acoustic model training method and apparatus, device and storage medium | |
CN102446504B (en) | Voice/Music identifying method and equipment | |
WO2021128741A1 (en) | Voice emotion fluctuation analysis method and apparatus, and computer device and storage medium | |
KR102128926B1 (en) | Method and device for processing audio information | |
CN106653056B (en) | Fundamental frequency extraction model and training method based on LSTM recurrent neural network | |
US9805712B2 (en) | Method and device for recognizing voice | |
EP3989220B1 (en) | Time delay estimation method and device | |
CN104966517A (en) | Voice frequency signal enhancement method and device | |
CN112786029B (en) | Method and apparatus for training VAD using weakly supervised data | |
WO2021196475A1 (en) | Intelligent language fluency recognition method and apparatus, computer device, and storage medium | |
US10147443B2 (en) | Matching device, judgment device, and method, program, and recording medium therefor | |
CN104952449A (en) | Method and device for identifying environmental noise sources | |
CN102376306B (en) | Method and device for acquiring level of speech frame | |
CN112331188A (en) | Voice data processing method, system and terminal equipment | |
CN110111811A (en) | Audio signal detection method, device and storage medium | |
KR101140896B1 (en) | Method and apparatus for speech segmentation | |
CN114023342B (en) | Voice conversion method, device, storage medium and electronic equipment | |
CN114267342A (en) | Recognition model training method, recognition method, electronic device and storage medium | |
CN103474067B (en) | speech signal transmission method and system | |
US10276186B2 (en) | Parameter determination device, method, program and recording medium for determining a parameter indicating a characteristic of sound signal | |
CN112820305A (en) | Encoding device, encoding method, encoding program, and recording medium | |
CN113314134B (en) | Bone conduction signal compensation method and device | |
CN113613159B (en) | Microphone blowing signal detection method, device and system | |
Hsieh et al. | Energy-based VAD with grey magnitude spectral subtraction |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | | |
SE01 | Entry into force of request for substantive examination | | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20181120 |