CN109036386A - Speech processing method and device - Google Patents
Speech processing method and device Download PDF Info
- Publication number
- CN109036386A CN109036386A CN201811076321.4A CN201811076321A CN109036386A CN 109036386 A CN109036386 A CN 109036386A CN 201811076321 A CN201811076321 A CN 201811076321A CN 109036386 A CN109036386 A CN 109036386A
- Authority
- CN
- China
- Prior art keywords
- sound
- bic
- detection
- speech segment
- segments
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/04—Segmentation; Word boundary detection
- G10L15/05—Word boundary detection
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/04—Segmentation; Word boundary detection
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
- G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
Landscapes
- Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Telephone Function (AREA)
- Telephonic Communication Services (AREA)
Abstract
The present invention provides a speech processing method and device. The method comprises: dividing mixed speech into N speech segments by endpoint detection, where N is a natural number greater than or equal to 2; performing Bayesian information criterion (BIC) detection on any two adjacent segments among the N speech segments, and discarding the segments for which the BIC detection reports an anomaly, to obtain the valid speech segments of the target object. The invention solves the problem in the related art that a target speaker's speech cannot be quickly and effectively separated from mixed speech that consists mainly of that speaker, and achieves fast separation of the target speaker's speech from mixed speech.
Description
Technical field
The present invention relates to the field of communications, and in particular to a speech processing method and device.
Background technique
The original schemes that perform speaker turning-point detection based on the Bayesian information criterion (BIC) aim at separation, usually with the ultimate goal of separating the mixed speech of several speakers. Technically they make no assumption about the position of a turning point, and they generally try to retain the speech data of every speaker as far as possible. In addition, such methods are rarely used alone; they are typically combined with, for example, computing distances between different data distributions and clustering. For occasions where one particular speaker dominates the speech duration, the speech of other people or noise is comparatively short, the speech content matters little, and the speaker characteristics matter more, separation-oriented schemes have been proposed. For this kind of problem, the current solutions are complex, their results are unsatisfactory, and no mature solution exists.
For the problem in the related art that a target speaker's speech cannot be quickly and effectively separated from mixed speech that consists mainly of that speaker, no solution has yet been proposed.
Summary of the invention
Embodiments of the present invention provide a speech processing method and device, so as at least to solve the problem in the related art that a target speaker's speech cannot be quickly and effectively separated from mixed speech that consists mainly of that speaker.
According to an embodiment of the present invention, a speech processing method is provided, comprising:
dividing mixed speech into N speech segments by endpoint detection, where N is a natural number greater than or equal to 2;
performing Bayesian information criterion (BIC) detection on any two adjacent segments among the N speech segments, and discarding the segments for which the BIC detection reports an anomaly, to obtain the valid speech segments of the target object.
Optionally, performing BIC detection on any two adjacent segments among the N speech segments, and discarding the segments for which the BIC detection reports an anomaly, comprises: performing BIC detection on pairs of adjacent segments among the N speech segments; judging whether the two segments under BIC detection are anomalous; if the judgment result is yes, discarding the two anomalous segments; and repeating BIC detection on adjacent pairs among the remaining N-2 segments, discarding anomalous pairs, until no remaining pair of adjacent segments is anomalous.
Optionally, judging whether the two segments under BIC detection are anomalous comprises: judging whether the BIC value between the two segments is greater than a predetermined threshold; if the judgment result is yes, determining that the two segments are anomalous; if the judgment result is no, determining that the two segments are normal.
Optionally, performing BIC detection on two segments among the N speech segments, and discarding the segments for which the BIC detection reports an anomaly, comprises: performing BIC detection on segment pairs in the N speech segments, where a segment pair is two segments among the N speech segments; judging whether each segment pair under BIC detection is anomalous, to obtain a detection result; and discarding the segment pairs whose detection result is anomalous.
Optionally, judging whether a segment pair under BIC detection is anomalous comprises: judging whether the BIC value of the segment pair is greater than a predetermined threshold; if the judgment result is yes, determining that the pair is anomalous; if the judgment result is no, determining that the pair is normal.
Optionally, performing BIC detection on two segments among the N speech segments comprises: calculating the BIC value between the two segments; and normalizing the BIC value.
Optionally, dividing mixed speech into N speech segments by endpoint detection comprises: obtaining the silent sections in the mixed speech; removing the silent sections; splitting the mixed speech at the silent sections to obtain long speech segments; and dividing the long speech segments into the N speech segments by endpoint detection.
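A minimal sketch of the silence-based splitting described above, using frame energy as the voice-activity cue. This is illustrative only: the frame length, the energy-ratio threshold, the minimum silence run, and the names (`split_on_silence`, `energy_ratio`, `min_silence_frames`) are all assumptions, not the patent's implementation.

```python
import numpy as np

def split_on_silence(signal, rate, frame_ms=20, energy_ratio=0.1,
                     min_silence_frames=5):
    """Split a 1-D signal into voiced chunks at silent stretches.

    A frame counts as silent when its RMS energy falls below
    energy_ratio times the mean frame energy; a run of at least
    min_silence_frames silent frames ends the current chunk.
    """
    frame_len = int(rate * frame_ms / 1000)
    n_frames = len(signal) // frame_len
    frames = signal[:n_frames * frame_len].reshape(n_frames, frame_len)
    energy = np.sqrt((frames ** 2).mean(axis=1))
    silent = energy < energy_ratio * energy.mean()

    segments, start, run = [], None, 0
    for i, is_silent in enumerate(silent):
        if is_silent:
            run += 1
            if run >= min_silence_frames and start is not None:
                # Close the chunk at the first frame of this silent run.
                end = (i - run + 1) * frame_len
                if end > start:
                    segments.append(signal[start:end])
                start = None
        else:
            if start is None:
                start = i * frame_len
            run = 0
    if start is not None:
        segments.append(signal[start:n_frames * frame_len])
    return segments
```

On a toy ones-zeros-ones signal this returns the two voiced chunks with the silent stretch removed.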
According to another embodiment of the present invention, a speech processing device is further provided, comprising: a dividing module, configured to divide mixed speech into N speech segments by endpoint detection, where N is a natural number greater than or equal to 2; and a detection module, configured to perform Bayesian information criterion (BIC) detection on any two adjacent segments among the N speech segments and discard the segments for which the BIC detection reports an anomaly, to obtain the valid speech segments of the target object.
Optionally, the detection module comprises: a detection unit, configured to perform BIC detection on pairs of adjacent segments among the N speech segments; a judging unit, configured to judge whether the two segments under BIC detection are anomalous; a discarding unit, configured to discard the two anomalous segments if the judgment result is yes; and a repeat-detection unit, configured to repeat BIC detection on adjacent pairs among the remaining N-2 segments and discard anomalous pairs, until no remaining pair of adjacent segments is anomalous.
Optionally, the judging unit is further configured to: judge whether the BIC value between the two segments is greater than a predetermined threshold; if the judgment result is yes, determine that the two segments are anomalous; if the judgment result is no, determine that the two segments are normal.
Optionally, the detection module comprises: a computing unit, configured to calculate the BIC value between two segments; and a processing unit, configured to normalize the BIC value.
Optionally, the dividing module comprises: an obtaining unit, configured to obtain the silent sections in the mixed speech; a removing unit, configured to remove the silent sections; a first splitting unit, configured to split the mixed speech at the silent sections to obtain long speech segments; and a second splitting unit, configured to divide the long speech segments into the N speech segments by endpoint detection.
According to yet another embodiment of the present invention, a storage medium is further provided, the storage medium storing a computer program, wherein the computer program is configured to perform, when run, the steps in any of the above method embodiments.
According to yet another embodiment of the present invention, an electronic device is further provided, comprising a memory and a processor, the memory storing a computer program, the processor being configured to run the computer program so as to perform the steps in any of the above method embodiments.
Through the invention, mixed speech is divided into N speech segments by endpoint detection, where N is a natural number greater than or equal to 2; BIC detection is performed on any two adjacent segments among the N speech segments, and the segments for which the BIC detection reports an anomaly are discarded, yielding the valid speech segments of the target object. This solves the problem in the related art that a target speaker's speech cannot be quickly and effectively separated from mixed speech that consists mainly of that speaker, and achieves fast separation of the target speaker's speech from mixed speech.
Detailed description of the invention
The drawings described herein are provided for a further understanding of the present invention and constitute a part of this application; the illustrative embodiments of the present invention and their descriptions serve to explain the present invention and do not improperly limit it. In the drawings:
Fig. 1 is a hardware block diagram of a mobile terminal running a speech processing method according to an embodiment of the present invention;
Fig. 2 is a flowchart of a speech processing method according to an embodiment of the present invention;
Fig. 3 is a block diagram of a speech processing device according to an embodiment of the present invention;
Fig. 4 is a first block diagram of a speech processing device according to a preferred embodiment of the present invention;
Fig. 5 is a second block diagram of a speech processing device according to a preferred embodiment of the present invention.
Specific embodiment
Hereinafter, the present invention is described in detail with reference to the drawings and in combination with the embodiments. It should be noted that, where no conflict arises, the embodiments of the present application and the features therein may be combined with each other. It should also be noted that the terms "first", "second", etc. in the description, claims and drawings are used to distinguish similar objects, not to describe a particular order or sequence.
Embodiment 1
The method embodiment provided in Embodiment 1 of the present application may be executed in a mobile terminal, a computer terminal or a similar computing device. Taking a mobile terminal as an example, Fig. 1 is a hardware block diagram of a mobile terminal running a speech processing method according to an embodiment of the present invention. As shown in Fig. 1, the mobile terminal 10 may comprise one or more processors 102 (only one is shown in Fig. 1; the processor 102 may include, but is not limited to, a processing unit such as a microcontroller (MCU) or a programmable logic device such as an FPGA) and a memory 104 for storing data; optionally, the mobile terminal may further comprise a transmission device 106 for communication functions and an input/output device 108. Those skilled in the art will appreciate that the structure shown in Fig. 1 is merely illustrative and does not limit the structure of the mobile terminal; for example, the mobile terminal 10 may comprise more or fewer components than shown in Fig. 1, or have a configuration different from that shown in Fig. 1.
The memory 104 may be used to store computer programs, for example the software programs and modules of application software, such as the computer program corresponding to the speech processing method in the embodiments of the present invention. By running the computer program stored in the memory 104, the processor 102 executes various function applications and data processing, i.e., implements the above method. The memory 104 may comprise high-speed random access memory, and may also comprise non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 104 may further comprise memory located remotely relative to the processor 102, and such remote memory may be connected to the mobile terminal 10 via a network. Examples of the network include, but are not limited to, the Internet, an intranet, a local area network, a mobile communication network, and combinations thereof.
The transmission device 106 is used to receive or send data via a network. A specific example of the network may include a wireless network provided by the communication provider of the mobile terminal 10. In one example, the transmission device 106 comprises a network interface controller (NIC), which can be connected to other network devices through a base station so as to communicate with the Internet. In another example, the transmission device 106 may be a radio frequency (RF) module, which is used to communicate with the Internet wirelessly.
This embodiment provides a speech processing method running on the above mobile terminal or network architecture. Fig. 2 is a flowchart of a speech processing method according to an embodiment of the present invention. As shown in Fig. 2, the flow comprises the following steps:
Step S202: dividing mixed speech into N speech segments by endpoint detection, where N is a natural number greater than or equal to 2;
Step S204: performing Bayesian information criterion (BIC) detection on any two adjacent segments among the N speech segments, and discarding the segments for which the BIC detection reports an anomaly, to obtain the valid speech segments of the target object.
Through the above steps, mixed speech is divided into N speech segments by endpoint detection, where N is a natural number greater than or equal to 2; BIC detection is performed on any two adjacent segments among the N speech segments, and the segments for which the BIC detection reports an anomaly are discarded, yielding the valid speech segments of the target object. This solves the problem in the related art that a target speaker's speech cannot be quickly and effectively separated from mixed speech that consists mainly of that speaker, and achieves fast separation of the target speaker's speech from mixed speech.
In the embodiments of the present invention, there are various ways to perform BIC detection on two adjacent segments among the N speech segments and discard the anomalous segments: any two of the N segments may be detected, or every two may be detected in turn in the chronological order of the speech. In an optional embodiment, the process specifically comprises: performing BIC detection on pairs of adjacent segments among the N speech segments; judging whether the two segments under BIC detection are anomalous; if the judgment result is yes, discarding the two anomalous segments; and repeating BIC detection on adjacent pairs among the remaining N-2 segments, discarding anomalous pairs, until no remaining pair of adjacent segments is anomalous. Further, judging whether the two segments under BIC detection are anomalous may comprise: judging whether the BIC value between the two segments is greater than a predetermined threshold; if yes, determining that the two segments are anomalous; if no, determining that the two segments are normal.
In another optional embodiment, performing BIC detection on two segments among the N speech segments and discarding the anomalous segments comprises: performing BIC detection on segment pairs in the N speech segments, where a segment pair is any two segments among the N speech segments; judging whether each segment pair under BIC detection is anomalous, to obtain a detection result; and discarding the segment pairs whose detection result is anomalous. Further, judging whether a segment pair under BIC detection is anomalous may comprise: judging whether the BIC value of the segment pair is greater than a predetermined threshold; if yes, determining that the pair is anomalous; if no, determining that the pair is normal.
In the embodiments of the present invention, performing BIC detection on two segments among the N speech segments may specifically comprise: calculating the BIC value between the two segments, and normalizing the BIC value.
In the embodiments of the present invention, dividing mixed speech into N speech segments by endpoint detection may specifically comprise: obtaining the silent sections in the mixed speech; removing the silent sections; splitting the mixed speech at the silent sections to obtain long speech segments; and dividing the long speech segments into the N speech segments by endpoint detection. Endpoint detection is a basic link of speech recognition and speech processing, and a hot field of speech recognition research. Its main purpose is to distinguish speech from non-speech in the input signal, so that the silent portions can be removed and the valid speech in the input can be obtained.
The embodiments of the present invention target a specific kind of mixed speech: one particular speaker dominates the speech duration, while the speech of other people or noise is comparatively short. For occasions where the speech content matters little and the speaker characteristics matter more, a scheme aimed at detection is proposed; by weakening the objective, the algorithm is improved, the result is promoted, and a relatively clean recording of the specific speaker is guaranteed. The specific method is as follows.
An assumption is made about where speaker turning points occur: a turning point is assumed to appear only on the boundary of a speech segment produced by endpoint detection. The benefit: if a piece of mixed speech has 100,000 sample points and yields 100 speech segments after endpoint detection, the original BIC detection would have to consider all 100,000 points as possible turning points, whereas in the improved scheme of the embodiments of the present invention turning points can only lie at the head and tail of the 100 segments, which greatly improves computational efficiency.
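The arithmetic behind the efficiency claim above, using the passage's own figures; counting only the 99 internal boundaries between 100 segments is an assumption (counting every segment's head and tail changes the constant, not the conclusion):

```python
n_samples = 100_000              # sample points in the mixed speech
n_segments = 100                 # speech segments after endpoint detection

sample_level_candidates = n_samples       # original BIC: any sample may be a turning point
boundary_candidates = n_segments - 1      # improved scheme: internal segment boundaries only

print(sample_level_candidates // boundary_candidates)  # → 1010
```

Even with this coarse count, the candidate set shrinks by roughly three orders of magnitude.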
Any two segments found anomalous by BIC detection are directly discarded, and BIC is then calculated pairwise among the remaining segments, until every remaining segment has had at least one BIC value calculated.
If anomalies appear during the current round of BIC calculation, the above steps are repeated until a round produces no BIC anomaly. At that point the remaining segments are recombined, which is exactly the specific speaker's speech that has passed BIC detection; this speech contains neither noise nor the speech of non-target speakers.
To reduce the influence of differing segment lengths and distributions on the BIC calculation, the BIC value is normalized, specifically by the maximum number of sample points and the current BIC value. In the standard single-Gaussian form consistent with the quantities defined here, the BIC between two fragments is
ΔBIC = (N/2)·ln σ − (N1/2)·ln σ1 − (N2/2)·ln σ2 − λ·ln N
where N1 is the length (number of sample points) of fragment 1, N2 is the length of fragment 2, N is the length after fragments 1 and 2 are merged, σ1 is the variance of the sample points of fragment 1, σ2 is the variance of the sample points of fragment 2, σ is the variance of the sample points after fragments 1 and 2 are merged, and λ is a coefficient. The BIC above is normalized to ΔBIC/F, where F is a function of the sample-point variance and length, determined from real data and experience.
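A sketch over the quantities just defined, for 1-D sample arrays. The delta-BIC form is the standard single-Gaussian one consistent with the symbols above; dividing by the merged length in `normalized_bic` is only one plausible reading of the normalization, since the patent's exact normalizing function F is not reproduced here, and both function names are illustrative.

```python
import numpy as np

def delta_bic(x, y, lam=1.0):
    """Single-Gaussian delta-BIC between two 1-D sample arrays.

    Large positive values suggest the two segments are better
    modelled by separate Gaussians (e.g. different speakers).
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    merged = np.concatenate([x, y])
    v, v1, v2 = merged.var(), x.var(), y.var()
    eps = 1e-12  # guard against zero variance on constant segments
    return (0.5 * n * np.log(v + eps)
            - 0.5 * n1 * np.log(v1 + eps)
            - 0.5 * n2 * np.log(v2 + eps)
            - lam * np.log(n))

def normalized_bic(x, y, lam=1.0):
    """Scale delta-BIC by the merged length so that thresholds stay
    comparable across segment sizes (normalization choice assumed)."""
    return delta_bic(x, y, lam) / (len(x) + len(y))
```

Two segments drawn from the same distribution score low; segments with clearly different statistics score high, which is what the predetermined threshold separates.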
The threshold chosen for the above endpoint detection has a large influence on the BIC, reflected in the sharpness of the BIC signal and in false alarms, and mainly in the signal sharpness. The present invention chooses a suitable endpoint-detection threshold experimentally, so as to improve the sharpness of the BIC signal.
The difficulty of applying traditional BIC in practice is that a BIC anomaly is defined relative to a threshold, and this threshold is usually specific to each pair of speech segments whose BIC value is calculated; it has no universality, and a global BIC threshold is hard to obtain. It is worth pointing out that, with the above improvement, the BIC signal is normalized, and since the endpoint-detection threshold is controlled, the BIC signal is comparatively accurate and effective. The BIC threshold of the embodiments of the present invention is therefore essentially global and no longer needs to be chosen per call from the data, which guarantees the practical effect of the embodiments of the present invention.
Embodiment 2
This embodiment further provides a speech processing device, which is used to implement the above embodiments and preferred implementations; what has already been described is not repeated. As used below, the term "module" may be a combination of software and/or hardware that implements a predetermined function. Although the devices described in the following embodiments are preferably implemented in software, an implementation in hardware, or in a combination of software and hardware, is also possible and conceivable.
Fig. 3 is a block diagram of a speech processing device according to an embodiment of the present invention. As shown in Fig. 3, the device comprises: a dividing module 32, configured to divide mixed speech into N speech segments by endpoint detection, where N is a natural number greater than or equal to 2; and a detection module 34, configured to perform Bayesian information criterion (BIC) detection on any two adjacent segments among the N speech segments and discard the segments for which the BIC detection reports an anomaly, to obtain the valid speech segments of the target object.
Fig. 4 is a first block diagram of a speech processing device according to a preferred embodiment of the present invention. As shown in Fig. 4, the detection module 34 comprises: a detection unit 42, configured to perform BIC detection on segment pairs among the N speech segments; a judging unit 44, configured to judge whether the two segments under BIC detection are anomalous; a discarding unit 46, configured to discard the two anomalous segments if the judgment result is yes; and a repeat-detection unit 48, configured to repeat BIC detection on segment pairs among the remaining N-2 segments and discard anomalous pairs, until no remaining pair of segments is anomalous.
Optionally, the judging unit 44 is further configured to: judge whether the BIC value between the two segments is greater than a predetermined threshold; if the judgment result is yes, determine that the two segments are anomalous; if the judgment result is no, determine that the two segments are normal.
Fig. 5 is a second block diagram of a speech processing device according to a preferred embodiment of the present invention. As shown in Fig. 5, the detection module 34 comprises: a computing unit 52, configured to calculate the BIC value between any two segments; and a processing unit 54, configured to normalize the BIC value.
Optionally, the dividing module 32 comprises: an obtaining unit, configured to obtain the silent sections in the mixed speech; a removing unit, configured to remove the silent sections; a first splitting unit, configured to split the mixed speech at the silent sections to obtain long speech segments; and a second splitting unit, configured to divide the long speech segments into the N speech segments by endpoint detection.
It should be noted that the above modules may be implemented in software or hardware; in the latter case, this may be achieved, without limitation, as follows: the above modules are all located in the same processor, or the above modules are located in different processors in arbitrary combinations.
Embodiment 3
The embodiments of the present invention further provide a storage medium storing a computer program, wherein the computer program is configured to perform, when run, the steps in any of the above method embodiments. Optionally, in this embodiment, the storage medium may be configured to store a computer program for performing the following steps:
S11: dividing mixed speech into N speech segments by endpoint detection, where N is a natural number greater than or equal to 2;
S12: performing Bayesian information criterion (BIC) detection on any two adjacent segments among the N speech segments, and discarding the segments for which the BIC detection reports an anomaly, to obtain the valid speech segments of the target object.
Optionally, in this embodiment, the storage medium may include, but is not limited to: a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, an optical disc, and other media capable of storing a computer program.
Embodiment 4
The embodiments of the present invention further provide an electronic device comprising a memory and a processor, the memory storing a computer program, the processor being configured to run the computer program to perform the steps in any of the above method embodiments. Optionally, the electronic device may further comprise a transmission device and an input/output device, both connected to the processor.
Optionally, in this embodiment, the processor may be configured to perform the following steps by means of the computer program:
S11: dividing mixed speech into N speech segments by endpoint detection, where N is a natural number greater than or equal to 2;
S12: performing Bayesian information criterion (BIC) detection on any two adjacent segments among the N speech segments, and discarding the segments for which the BIC detection reports an anomaly, to obtain the valid speech segments of the target object.
Optionally, for specific examples in this embodiment, reference may be made to the examples described in the above embodiments and optional implementations, which are not repeated here.
Obviously, those skilled in the art should understand that the above modules or steps of the present invention may be implemented with a general-purpose computing device; they may be concentrated on a single computing device or distributed over a network formed by multiple computing devices. Optionally, they may be implemented with program code executable by a computing device, so that they may be stored in a storage device and executed by the computing device; in some cases, the steps shown or described may be performed in a different order than here, or they may be made into individual integrated circuit modules, or multiple of the modules or steps may be made into a single integrated circuit module. Thus, the present invention is not limited to any specific combination of hardware and software.
The foregoing is only preferred embodiments of the present invention and is not intended to limit the present invention. For those skilled in the art, the present invention may have various modifications and variations. Any modification, equivalent replacement, improvement, and the like made within the principle of the present invention shall be included in the protection scope of the present invention.
Claims (10)
1. A speech processing method, comprising:
dividing mixed speech into N speech segments by endpoint detection, wherein N is a natural number greater than or equal to 2;
performing Bayesian information criterion (BIC) detection on any two adjacent speech segments among the N speech segments, and discarding speech segments for which the BIC detection indicates an abnormality, to obtain valid speech segments of a target object.
2. The method according to claim 1, wherein performing BIC detection on any two adjacent speech segments among the N speech segments and discarding speech segments for which the BIC detection indicates an abnormality comprises:
performing BIC detection on two adjacent speech segments among the N speech segments;
judging whether the BIC detection indicates that the two speech segments are abnormal;
if the judgment result is yes, discarding the two speech segments indicated as abnormal;
repeatedly performing BIC detection on two adjacent speech segments among the remaining N-2 speech segments, and discarding the two speech segments indicated as abnormal, until no abnormality occurs between any remaining adjacent speech segments.
3. The method according to claim 2, wherein judging whether the BIC detection indicates that the two speech segments are abnormal comprises:
judging whether the BIC value between the two speech segments is greater than a predetermined threshold;
if the judgment result is yes, determining that the two speech segments are abnormal;
if the judgment result is no, determining that the two speech segments are normal.
4. The method according to any one of claims 1 to 3, wherein performing BIC detection on two speech segments among the N speech segments comprises:
calculating the BIC value between the two speech segments;
normalizing the BIC value.
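Claim 4's normalization step is not spelled out in the specification. One common convention is to divide the ΔBIC value by the number of frames involved, so that the single predetermined threshold of claim 3 applies uniformly to segment pairs of different lengths. The sketch below assumes that convention; the function names and the default threshold are illustrative, not from the patent.

```python
def normalized_bic(delta_bic, n_frames):
    """Length-normalize a delta-BIC value so one threshold works for
    segment pairs of different durations (an assumed convention)."""
    return delta_bic / max(n_frames, 1)

def is_abnormal(delta_bic, n_frames, threshold=0.5):
    """Claims 3-4 style check: normalize the BIC value, then compare it
    with a predetermined threshold; True means the pair is abnormal."""
    return normalized_bic(delta_bic, n_frames) > threshold
```

Under this reading, a raw ΔBIC of 100 over 100 frames normalizes to 1.0 and exceeds a threshold of 0.5, while the same raw value over 1000 frames normalizes to 0.1 and does not.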
5. The method according to any one of claims 1 to 3, wherein dividing the mixed speech into N speech segments by endpoint detection comprises:
obtaining silent sections in the mixed speech;
removing the silent sections;
splitting the mixed speech according to the silent sections to obtain long speech segments;
dividing the long speech segments into the N speech segments by endpoint detection.
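Claim 5's silence detection is likewise unspecified; a frame-energy criterion is the simplest plausible reading, in which frames whose mean energy falls below a threshold are treated as silence and removed, and each run of consecutive speech frames becomes one segment. The sketch below assumes that reading; the frame length, threshold, and function name are illustrative.

```python
import numpy as np

def endpoint_split(samples, frame_len=160, energy_thresh=1e-3):
    """Energy-based endpoint detection sketch: classify fixed-length
    frames as speech/silence, drop the silent frames, and return each
    run of consecutive speech frames as one segment (claim 5 reading)."""
    n_frames = len(samples) // frame_len
    frames = samples[:n_frames * frame_len].reshape(n_frames, frame_len)
    energy = (frames ** 2).mean(axis=1)
    speech = energy > energy_thresh

    segments, start = [], None
    for i, is_speech in enumerate(speech):
        if is_speech and start is None:
            start = i                      # a speech run begins
        elif not is_speech and start is not None:
            segments.append(frames[start:i].ravel())  # run ends at silence
            start = None
    if start is not None:                  # run extends to the end
        segments.append(frames[start:].ravel())
    return segments
```

On a signal alternating between silence and activity, this returns one array per contiguous active region, which is the segment list that the BIC stage of claim 1 then consumes.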
6. A speech processing apparatus, comprising:
a dividing module, configured to divide mixed speech into N speech segments by endpoint detection, wherein N is a natural number greater than or equal to 2;
a detection module, configured to perform Bayesian information criterion (BIC) detection on any two adjacent speech segments among the N speech segments, and discard speech segments for which the BIC detection indicates an abnormality, to obtain valid speech segments of a target object.
7. The apparatus according to claim 6, wherein the detection module comprises:
a detection unit, configured to perform BIC detection on two adjacent speech segments among the N speech segments;
a judging unit, configured to judge whether the BIC detection indicates that the two speech segments are abnormal;
a discarding unit, configured to discard the two speech segments indicated as abnormal if the judgment result is yes;
a repetition detection unit, configured to repeatedly perform BIC detection on two adjacent speech segments among the remaining N-2 speech segments, and discard the two speech segments indicated as abnormal, until no abnormality occurs between any remaining adjacent speech segments.
8. The apparatus according to claim 7, wherein the judging unit is further configured to:
judge whether the BIC value between the two speech segments is greater than a predetermined threshold;
if the judgment result is yes, determine that the two speech segments are abnormal;
if the judgment result is no, determine that the two speech segments are normal.
9. A storage medium, wherein a computer program is stored in the storage medium, and the computer program is configured to perform the method according to any one of claims 1 to 5 when run.
10. An electronic device, comprising a memory and a processor, wherein a computer program is stored in the memory, and the processor is configured to run the computer program to perform the method according to any one of claims 1 to 5.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811076321.4A CN109036386B (en) | 2018-09-14 | 2018-09-14 | Voice processing method and device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811076321.4A CN109036386B (en) | 2018-09-14 | 2018-09-14 | Voice processing method and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109036386A true CN109036386A (en) | 2018-12-18 |
CN109036386B CN109036386B (en) | 2021-03-16 |
Family
ID=64622220
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811076321.4A Active CN109036386B (en) | 2018-09-14 | 2018-09-14 | Voice processing method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109036386B (en) |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110390946A (en) * | 2019-07-26 | 2019-10-29 | 龙马智芯(珠海横琴)科技有限公司 | A kind of audio signal processing method, device, electronic equipment and storage medium |
CN111343344A (en) * | 2020-03-13 | 2020-06-26 | Oppo(重庆)智能科技有限公司 | Voice abnormity detection method and device, storage medium and electronic equipment |
CN111613249A (en) * | 2020-05-22 | 2020-09-01 | 云知声智能科技股份有限公司 | Voice analysis method and equipment |
CN111883159A (en) * | 2020-08-05 | 2020-11-03 | 龙马智芯(珠海横琴)科技有限公司 | Voice processing method and device |
CN112562635A (en) * | 2020-12-03 | 2021-03-26 | 云知声智能科技股份有限公司 | Method, device and system for solving pulse signal generation at splicing position in voice synthesis |
CN112735470A (en) * | 2020-12-28 | 2021-04-30 | 携程旅游网络技术(上海)有限公司 | Audio cutting method, system, device and medium based on time delay neural network |
CN112951212A (en) * | 2021-04-19 | 2021-06-11 | 中国科学院声学研究所 | Voice turning point detection method and device for multiple speakers |
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1716380A (en) * | 2005-07-26 | 2006-01-04 | 浙江大学 | Audio frequency splitting method for changing detection based on decision tree and speaking person |
CN101315771A (en) * | 2008-06-04 | 2008-12-03 | 哈尔滨工业大学 | Compensation method for different speech coding influence in speaker recognition |
CN102034472A (en) * | 2009-09-28 | 2011-04-27 | 戴红霞 | Speaker recognition method based on Gaussian mixture model embedded with time delay neural network |
CN102682760A (en) * | 2011-03-07 | 2012-09-19 | 株式会社理光 | Overlapped voice detection method and system |
CN103559882A (en) * | 2013-10-14 | 2014-02-05 | 华南理工大学 | Meeting presenter voice extracting method based on speaker division |
CN104021785A (en) * | 2014-05-28 | 2014-09-03 | 华南理工大学 | Method of extracting speech of most important guest in meeting |
CN105304082A (en) * | 2015-09-08 | 2016-02-03 | 北京云知声信息技术有限公司 | Voice output method and voice output device |
US20160241346A1 (en) * | 2015-02-17 | 2016-08-18 | Adobe Systems Incorporated | Source separation using nonnegative matrix factorization with an automatically determined number of bases |
CN107393527A (en) * | 2017-07-17 | 2017-11-24 | 广东讯飞启明科技发展有限公司 | The determination methods of speaker's number |
Patent Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1716380A (en) * | 2005-07-26 | 2006-01-04 | 浙江大学 | Audio frequency splitting method for changing detection based on decision tree and speaking person |
CN101315771A (en) * | 2008-06-04 | 2008-12-03 | 哈尔滨工业大学 | Compensation method for different speech coding influence in speaker recognition |
CN102034472A (en) * | 2009-09-28 | 2011-04-27 | 戴红霞 | Speaker recognition method based on Gaussian mixture model embedded with time delay neural network |
CN102682760A (en) * | 2011-03-07 | 2012-09-19 | 株式会社理光 | Overlapped voice detection method and system |
CN103559882A (en) * | 2013-10-14 | 2014-02-05 | 华南理工大学 | Meeting presenter voice extracting method based on speaker division |
CN104021785A (en) * | 2014-05-28 | 2014-09-03 | 华南理工大学 | Method of extracting speech of most important guest in meeting |
US20160241346A1 (en) * | 2015-02-17 | 2016-08-18 | Adobe Systems Incorporated | Source separation using nonnegative matrix factorization with an automatically determined number of bases |
CN105304082A (en) * | 2015-09-08 | 2016-02-03 | 北京云知声信息技术有限公司 | Voice output method and voice output device |
CN107393527A (en) * | 2017-07-17 | 2017-11-24 | 广东讯飞启明科技发展有限公司 | The determination methods of speaker's number |
Non-Patent Citations (3)
Title |
---|
PATRICK KENNY: "Joint Factor Analysis Versus Eigenchannels", 《AUDIO,SPEECH,AND LANGUAGE PROCESSING》 * |
YANG, Dengzhou et al.: "Speaker Change Detection Based on Computational Auditory Scene Analysis", Computer Engineering * |
LAI, Songxuan: "Initial Cluster Generation Method for Speaker Clustering", Computer Engineering and Applications * |
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110390946A (en) * | 2019-07-26 | 2019-10-29 | 龙马智芯(珠海横琴)科技有限公司 | A kind of audio signal processing method, device, electronic equipment and storage medium |
CN111343344A (en) * | 2020-03-13 | 2020-06-26 | Oppo(重庆)智能科技有限公司 | Voice abnormity detection method and device, storage medium and electronic equipment |
CN111613249A (en) * | 2020-05-22 | 2020-09-01 | 云知声智能科技股份有限公司 | Voice analysis method and equipment |
CN111883159A (en) * | 2020-08-05 | 2020-11-03 | 龙马智芯(珠海横琴)科技有限公司 | Voice processing method and device |
CN112562635A (en) * | 2020-12-03 | 2021-03-26 | 云知声智能科技股份有限公司 | Method, device and system for solving pulse signal generation at splicing position in voice synthesis |
CN112562635B (en) * | 2020-12-03 | 2024-04-09 | 云知声智能科技股份有限公司 | Method, device and system for solving generation of pulse signals at splicing position in speech synthesis |
CN112735470A (en) * | 2020-12-28 | 2021-04-30 | 携程旅游网络技术(上海)有限公司 | Audio cutting method, system, device and medium based on time delay neural network |
CN112735470B (en) * | 2020-12-28 | 2024-01-23 | 携程旅游网络技术(上海)有限公司 | Audio cutting method, system, equipment and medium based on time delay neural network |
CN112951212A (en) * | 2021-04-19 | 2021-06-11 | 中国科学院声学研究所 | Voice turning point detection method and device for multiple speakers |
Also Published As
Publication number | Publication date |
---|---|
CN109036386B (en) | 2021-03-16 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109036386A (en) | A kind of method of speech processing and device | |
CN109584876B (en) | Voice data processing method and device and voice air conditioner | |
CN103440867B (en) | Audio recognition method and system | |
CN110517667A (en) | A kind of method of speech processing, device, electronic equipment and storage medium | |
CN104781862B (en) | Real-time traffic is detected | |
CN112400310A (en) | Voice-based call quality detector | |
CN110390946A (en) | A kind of audio signal processing method, device, electronic equipment and storage medium | |
CN108877783A (en) | The method and apparatus for determining the audio types of audio data | |
CN109065051A (en) | Voice recognition processing method and device | |
CN111816216A (en) | Voice activity detection method and device | |
CN112581938B (en) | Speech breakpoint detection method, device and equipment based on artificial intelligence | |
CN109740530A (en) | Extracting method, device, equipment and the computer readable storage medium of video-frequency band | |
CN113129876A (en) | Network searching method and device, electronic equipment and storage medium | |
CN113329372B (en) | Method, device, equipment, medium and product for vehicle-mounted call | |
CN107196979A (en) | Pre- system for prompting of calling out the numbers based on speech recognition | |
CN112562727A (en) | Audio scene classification method, device and equipment applied to audio monitoring | |
JP2000163098A (en) | Voice recognition device | |
CN114038487A (en) | Audio extraction method, device, equipment and readable storage medium | |
US7571093B1 (en) | Method of identifying duplicate voice recording | |
CN114049898A (en) | Audio extraction method, device, equipment and storage medium | |
CN112053686B (en) | Audio interruption method, device and computer readable storage medium | |
US11322137B2 (en) | Video camera | |
EP3309777A1 (en) | Device and method for audio frame processing | |
CN114005436A (en) | Method, device and storage medium for determining voice endpoint | |
EP3171360B1 (en) | Speech recognition with determination of noise suppression processing mode |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant | ||
TR01 | Transfer of patent right |
Effective date of registration: 20230105 Address after: Room 502 and Room 504, Jiayuan Office Building, No. 369, Yuelu Avenue, Xianjiahu Street, Yuelu District, Changsha City, Hunan Province 410205 Patentee after: Hunan Huawei Jin'an Enterprise Management Co.,Ltd. Address before: 100080 370m south of Huandao, Yanfu Road, Yancun Town, Fangshan District, Beijing Patentee before: BEIJING WANGZHONG GONGCHUANG TECHNOLOGY CO.,LTD. |
TR01 | Transfer of patent right |