CN109036386A - Speech processing method and device - Google Patents

Speech processing method and device

Info

Publication number
CN109036386A
Authority
CN
China
Prior art keywords
speech
BIC
detection
speech segment
segments
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201811076321.4A
Other languages
Chinese (zh)
Other versions
CN109036386B (en)
Inventor
邹新生
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hunan Huawei Jin'an Enterprise Management Co ltd
Original Assignee
Beijing Net Co Creation Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Net Co Creation Technology Co Ltd filed Critical Beijing Net Co Creation Technology Co Ltd
Priority to CN201811076321.4A priority Critical patent/CN109036386B/en
Publication of CN109036386A publication Critical patent/CN109036386A/en
Application granted granted Critical
Publication of CN109036386B publication Critical patent/CN109036386B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00: Speech recognition
    • G10L15/04: Segmentation; Word boundary detection
    • G10L15/05: Word boundary detection
    • G10L25/00: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Telephone Function (AREA)
  • Telephonic Communication Services (AREA)

Abstract

The present invention provides a speech processing method and device. The method comprises: dividing mixed speech into N speech segments by endpoint detection, where N is a natural number greater than or equal to 2; performing Bayesian Information Criterion (BIC) detection on any two adjacent speech segments among the N speech segments, and discarding speech segments for which the BIC detection indicates an anomaly, to obtain the effective speech segments of the target object. The invention solves the problem in the related art that a specific target's speech cannot be quickly and effectively separated from mixed speech that consists mainly of the specific target speaking, achieving rapid separation of the specific target's speech from mixed speech.

Description

Speech processing method and device
Technical field
The present invention relates to the field of communications, and in particular to a speech processing method and device.
Background
Existing schemes perform speaker turning-point detection based on the Bayesian Information Criterion (BIC) with separation as the goal, generally in order to finally separate the mixed speech of multiple speakers. Technically, no assumption is made about the position of a turning point, and the speech data of the different speakers is generally preserved as far as possible. Moreover, such a method is generally not used alone; it is combined, for example, with computing distances between different data distributions, clustering, and so on. For occasions in which the speech duration of a certain specific speaker dominates while the speech duration of other people or noise is relatively low, and in which the speech content matters less than the speaker characteristics, schemes that take separation as the goal have been proposed. For this kind of problem, the current solutions are highly complex, their effect is unsatisfactory, and mature solutions are lacking.
For the problem in the related art that a specific target's speech cannot be quickly and effectively separated from mixed speech that consists mainly of the specific target speaking, no solution has yet been proposed.
Summary of the invention
Embodiments of the present invention provide a speech processing method and device, at least to solve the problem in the related art that a specific target's speech cannot be quickly and effectively separated from mixed speech that consists mainly of the specific target speaking.
According to one embodiment of the present invention, a speech processing method is provided, comprising:
dividing mixed speech into N speech segments by endpoint detection, where N is a natural number greater than or equal to 2;
performing Bayesian Information Criterion (BIC) detection on any two adjacent speech segments among the N speech segments, and discarding speech segments for which the BIC detection indicates an anomaly, to obtain the effective speech segments of the target object.
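As a minimal, non-authoritative sketch of these two steps, the Python below strings together an endpoint detector and an adjacent-pair BIC filter; extract_target_speech, endpoint_detection and discard_anomalous_adjacent are assumed helper names (possible forms of the two helpers are sketched after the corresponding optional clauses below), not functions defined by this application.

    import numpy as np

    def extract_target_speech(mixed_speech, bic_threshold):
        """Return the effective speech of the target (dominant) speaker from mixed speech."""
        # Step 1: endpoint detection divides the mixed speech into N (>= 2) segments.
        segments = endpoint_detection(mixed_speech)
        # Step 2: BIC detection on adjacent segment pairs; anomalous segments are discarded.
        kept = discard_anomalous_adjacent(segments, bic_threshold)
        # The recombined remaining segments are taken as the target object's effective speech.
        return np.concatenate(kept) if kept else np.zeros(0)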
Optionally, performing Bayesian Information Criterion (BIC) detection on any two adjacent speech segments among the N speech segments, and discarding speech segments for which the BIC detection indicates an anomaly, includes:
performing BIC detection on pairs of adjacent speech segments among the N speech segments;
judging whether the two speech segments under BIC detection are anomalous;
if the judgment result is yes, discarding the two anomalous speech segments;
repeating BIC detection on adjacent pairs among the remaining N-2 speech segments and discarding anomalous pairs, until no remaining pair of adjacent speech segments is anomalous.
Optionally, judging whether the two speech segments under BIC detection are anomalous includes:
judging whether the BIC value between the two speech segments is greater than a predetermined threshold;
if the judgment result is yes, determining that the two speech segments are anomalous;
if the judgment result is no, determining that the two speech segments are normal.
Optionally, performing Bayesian Information Criterion (BIC) detection on two speech segments among the N speech segments, and discarding speech segments for which the BIC detection indicates an anomaly, includes:
performing BIC detection on speech segment pairs in the N speech segments, where a speech segment pair is two speech segments among the N speech segments;
judging whether a speech segment pair under BIC detection is anomalous, and obtaining a detection result;
discarding speech segment pairs whose detection result is anomalous.
Optionally, judging whether a speech segment pair under BIC detection is anomalous includes:
judging whether the BIC value of the speech segment pair is greater than a predetermined threshold;
if the judgment result is yes, determining that the speech segment pair is anomalous;
if the judgment result is no, determining that the speech segment pair is normal.
Optionally, performing BIC detection on two speech segments among the N speech segments includes:
calculating the BIC value between the two speech segments;
normalizing the BIC value.
Optionally, dividing the mixed speech into N speech segments by endpoint detection includes:
obtaining silent segments in the mixed speech;
removing the silent segments;
splitting the mixed speech according to the silent segments to obtain long speech segments;
dividing the long speech segments into the N speech segments by endpoint detection.
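A minimal sketch of the silence-removal and splitting just described, assuming a simple frame-energy detector: the two-stage split (first into long segments by silence, then into the N segments by endpoint detection) is collapsed into a single pass here, and the frame length, hop and energy threshold are illustrative values rather than parameters taken from this application.

    import numpy as np

    def endpoint_detection(signal, frame_len=400, hop=160, energy_threshold=1e-4):
        """Split a 1-D waveform into speech segments by dropping low-energy (silent) frames."""
        signal = np.asarray(signal, dtype=float)
        n_frames = max(0, 1 + (len(signal) - frame_len) // hop)
        energies = np.array([np.mean(signal[i * hop:i * hop + frame_len] ** 2)
                             for i in range(n_frames)])
        voiced = energies > energy_threshold  # frames treated as speech

        segments, start = [], None
        for i, v in enumerate(voiced):
            if v and start is None:
                start = i                      # a speech run begins
            elif not v and start is not None:
                segments.append(signal[start * hop:i * hop + frame_len])
                start = None                   # a silent frame closes the run
        if start is not None:
            segments.append(signal[start * hop:])
        return segments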
According to another embodiment of the present invention, a speech processing device is also provided, comprising:
a segmentation module, configured to divide mixed speech into N speech segments by endpoint detection, where N is a natural number greater than or equal to 2;
a detection module, configured to perform Bayesian Information Criterion (BIC) detection on any two adjacent speech segments among the N speech segments, and to discard speech segments for which the BIC detection indicates an anomaly, to obtain the effective speech segments of the target object.
Optionally, the detection module includes:
a detection unit, configured to perform BIC detection on pairs of adjacent speech segments among the N speech segments;
a judging unit, configured to judge whether the two speech segments under BIC detection are anomalous;
a discarding unit, configured to discard, if the judgment result is yes, the two anomalous speech segments;
a repeat-detection unit, configured to repeat BIC detection on adjacent pairs among the remaining N-2 speech segments and to discard anomalous pairs, until no remaining pair of adjacent speech segments is anomalous.
Optionally, the judging unit is further configured to:
judge whether the BIC value between the two speech segments is greater than a predetermined threshold;
if the judgment result is yes, determine that the two speech segments are anomalous;
if the judgment result is no, determine that the two speech segments are normal.
Optionally, the detection module includes:
a calculation unit, configured to calculate the BIC value between two speech segments;
a processing unit, configured to normalize the BIC value.
Optionally, the segmentation module includes:
an acquisition unit, configured to obtain silent segments in the mixed speech;
a removal unit, configured to remove the silent segments;
a first splitting unit, configured to split the mixed speech according to the silent segments to obtain long speech segments;
a second splitting unit, configured to divide the long speech segments into the N speech segments by endpoint detection.
According to yet another embodiment of the present invention, a storage medium is also provided, in which a computer program is stored, where the computer program is configured to perform, when run, the steps in any of the above method embodiments.
According to yet another embodiment of the present invention, an electronic device is also provided, comprising a memory and a processor, where a computer program is stored in the memory and the processor is configured to run the computer program to perform the steps in any of the above method embodiments.
Through the present invention, mixed speech is divided into N speech segments by endpoint detection, where N is a natural number greater than or equal to 2; Bayesian Information Criterion (BIC) detection is performed on any two adjacent speech segments among the N speech segments, and speech segments for which the BIC detection indicates an anomaly are discarded, obtaining the effective speech segments of the target object. The invention can therefore solve the problem in the related art that a specific target's speech cannot be quickly and effectively separated from mixed speech that consists mainly of the specific target speaking, achieving rapid separation of the specific target's speech from mixed speech.
Brief description of the drawings
The drawings described herein are provided for a further understanding of the present invention and constitute a part of this application. The illustrative embodiments of the present invention and their descriptions are used to explain the present invention and do not constitute an improper limitation of the present invention. In the drawings:
Fig. 1 is a hardware block diagram of a mobile terminal running a speech processing method according to an embodiment of the present invention;
Fig. 2 is a flowchart of a speech processing method according to an embodiment of the present invention;
Fig. 3 is a block diagram of a speech processing device according to an embodiment of the present invention;
Fig. 4 is a first block diagram of a speech processing device according to a preferred embodiment of the present invention;
Fig. 5 is a second block diagram of a speech processing device according to a preferred embodiment of the present invention.
Detailed description of the embodiments
Hereinafter, the present invention will be described in detail with reference to the drawings and in combination with the embodiments. It should be noted that, provided there is no conflict, the embodiments of this application and the features in the embodiments may be combined with each other.
It should be noted that the terms "first", "second", and the like in the description, the claims, and the above drawings are used to distinguish similar objects, and are not necessarily used to describe a particular order or sequence.
Embodiment 1
The method embodiments provided in Embodiment 1 of this application may be executed on a mobile terminal, a computer terminal, or a similar computing device. Taking execution on a mobile terminal as an example, Fig. 1 is a hardware block diagram of a mobile terminal running a speech processing method according to an embodiment of the present invention. As shown in Fig. 1, the mobile terminal 10 may include one or more processors 102 (only one is shown in Fig. 1; the processor 102 may include, but is not limited to, a processing unit such as a microprocessor MCU or a programmable logic device FPGA) and a memory 104 for storing data. Optionally, the mobile terminal may further include a transmission device 106 for communication functions and an input/output device 108. A person of ordinary skill in the art will understand that the structure shown in Fig. 1 is merely illustrative and does not limit the structure of the above mobile terminal; for example, the mobile terminal 10 may include more or fewer components than shown in Fig. 1, or have a configuration different from that shown in Fig. 1.
The memory 104 may be used to store computer programs, for example software programs and modules of application software, such as the computer program corresponding to the message receiving method in the embodiments of the present invention. By running the computer program stored in the memory 104, the processor 102 executes various functional applications and data processing, that is, implements the above method. The memory 104 may include high-speed random access memory and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 104 may further include memory located remotely from the processor 102, and such remote memory may be connected to the mobile terminal 10 through a network. Examples of the above network include, but are not limited to, the Internet, an intranet, a local area network, a mobile communication network, and combinations thereof.
The transmission device 106 is used to receive or send data via a network. Specific examples of the above network may include a wireless network provided by the communication provider of the mobile terminal 10. In one example, the transmission device 106 includes a network interface controller (NIC), which can be connected to other network devices through a base station so as to communicate with the Internet. In one example, the transmission device 106 may be a radio frequency (RF) module, which is used to communicate with the Internet wirelessly.
This embodiment provides a speech processing method running on the above mobile terminal or network architecture. Fig. 2 is a flowchart of a speech processing method according to an embodiment of the present invention. As shown in Fig. 2, the flow includes the following steps:
Step S202: dividing mixed speech into N speech segments by endpoint detection, where N is a natural number greater than or equal to 2;
Step S204: performing Bayesian Information Criterion (BIC) detection on any two adjacent speech segments among the N speech segments, and discarding speech segments for which the BIC detection indicates an anomaly, to obtain the effective speech segments of the target object.
Through the above steps, mixed speech is divided into N speech segments by endpoint detection, where N is a natural number greater than or equal to 2; BIC detection is performed on any two adjacent speech segments among the N speech segments, and speech segments for which the BIC detection indicates an anomaly are discarded, obtaining the effective speech segments of the target object. The problem in the related art that a specific target's speech cannot be quickly and effectively separated from mixed speech that consists mainly of the specific target speaking can therefore be solved, achieving rapid separation of the specific target's speech from mixed speech.
In the embodiments of the present invention, BIC detection of any two adjacent speech segments among the N speech segments, with discarding of the anomalous segments, can be carried out in various ways: detection may be performed on any two of the N speech segments, or on every two segments in turn following the temporal order of the speech. In an optional embodiment, the process specifically includes: performing BIC detection on pairs of adjacent speech segments among the N speech segments; judging whether the two speech segments under BIC detection are anomalous; if the judgment result is yes, discarding the two anomalous speech segments; and repeating BIC detection on adjacent pairs among the remaining N-2 speech segments, discarding anomalous pairs, until no remaining pair of adjacent speech segments is anomalous.
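The following sketch is one possible reading of this optional embodiment: it scans adjacent pairs, drops both segments of any pair whose BIC value is anomalous, and rescans the remaining segments until a full pass finds no anomaly. The bic_value helper (sketched later, next to the normalization discussion) and the threshold convention are assumptions, not a normative implementation.

    def discard_anomalous_adjacent(segments, threshold):
        """Drop both members of any adjacent pair whose BIC value exceeds the threshold."""
        segments = list(segments)
        changed = True
        while changed and len(segments) >= 2:
            changed = False
            for i in range(len(segments) - 1):
                # A BIC value above the threshold marks the pair as anomalous,
                # e.g. a speaker change or noise; both segments are discarded.
                if bic_value(segments[i], segments[i + 1]) > threshold:
                    del segments[i:i + 2]
                    changed = True
                    break  # rescan the remaining segments from the start
        return segments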
Further, judging whether the two speech segments under BIC detection are anomalous may include: judging whether the BIC value between the two speech segments is greater than a predetermined threshold; if the judgment result is yes, determining that the two speech segments are anomalous; if the judgment result is no, determining that the two speech segments are normal.
In another optional embodiment, performing BIC detection on two speech segments among the N speech segments and discarding anomalous segments includes: performing BIC detection on speech segment pairs in the N speech segments, where a speech segment pair is any two speech segments among the N speech segments; judging whether a speech segment pair under BIC detection is anomalous, obtaining a detection result; and discarding speech segment pairs whose detection result is anomalous.
Further, judging whether a speech segment pair under BIC detection is anomalous may include: judging whether the BIC value of the speech segment pair is greater than a predetermined threshold; if the judgment result is yes, determining that the speech segment pair is anomalous; if the judgment result is no, determining that the speech segment pair is normal.
In the embodiments of the present invention, performing BIC detection on two speech segments among the N speech segments may specifically include: calculating the BIC value between the two speech segments, and normalizing the BIC value.
In the embodiments of the present invention, dividing the mixed speech into N speech segments by endpoint detection may specifically include: obtaining silent segments in the mixed speech; removing the silent segments; splitting the mixed speech according to the silent segments to obtain long speech segments; and dividing the long speech segments into the N speech segments by endpoint detection. Endpoint detection is a basic step of speech recognition and speech processing and an active research area in speech recognition. Its main purpose is to distinguish speech from non-speech in the input audio, so that the silent portions can be removed and the effective speech in the input can be obtained.
The embodiments of the present invention target mixed speech in which the speech duration of a certain specific speaker dominates while the speech duration of other people or of noise is relatively low, and in which the speech content matters less than the speaker characteristics. For such occasions a detection-oriented scheme is proposed: by weakening the target, the algorithm is improved and its effect is enhanced, ensuring relatively clean speech of the specific speaker. The specific method is as follows:
An assumption is made about where a speaker turning point can occur: a turning point is assumed to appear only on the boundaries of the speech segments obtained by endpoint detection. The benefit is as follows: if a piece of mixed speech has 100,000 sample points and yields 100 speech segments after endpoint detection, a turning point in the original BIC detection could appear at any of the 100,000 points, whereas with the improvement of the embodiments of the present invention it can only appear at the beginnings and ends of the 100 speech segments, which greatly improves computational efficiency.
The two speech segments for which BIC detection is anomalous are discarded directly, and the BIC between every two of the remaining speech segments is calculated, until at least one BIC has been calculated for every speech segment.
If an anomalous BIC detection occurs during the current round of BIC calculation, the above steps are repeated until a round in which no BIC anomaly occurs, and the process then ends. The remaining speech segments, recombined, constitute the specific speaker's speech that has passed BIC detection; the speech at this point contains neither noise nor the speech of non-specific speakers.
To reduce the influence of differing speech-segment lengths and distributions on the BIC calculation, the BIC is normalized, specifically by the number of sample points and the maximum of the current BIC value. In this normalization, N1 is the length (number of sample points) of segment 1, N2 is the length of segment 2, N is the length after segment 1 and segment 2 are merged, σ1 is the variance of the sample points of segment 1, σ2 is the variance of the sample points of segment 2, σ is the variance of the sample points after segment 1 and segment 2 are merged, and λ is a coefficient. The BIC defined above is normalized by F, a function of the sample-point variance and length that is determined empirically from real data.
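As a sketch of how such a BIC value and its normalization might be computed, the Python below uses the standard single-Gaussian delta-BIC form in the symbols defined above (N1, N2, N, σ1, σ2, σ, λ) together with a placeholder normalization; the exact penalty term and the concrete form of f_norm are assumptions for illustration only, since the application specifies F only as a function of sample-point variance and length determined empirically from real data.

    import numpy as np

    def bic_value(seg1, seg2, lam=1.0):
        """Delta-BIC between two segments under single-Gaussian models, then normalized."""
        seg1, seg2 = np.asarray(seg1, float), np.asarray(seg2, float)
        n1, n2 = len(seg1), len(seg2)
        n = n1 + n2
        eps = 1e-12                                       # guard against zero variance
        var1 = np.var(seg1) + eps                         # sigma_1: variance of segment 1
        var2 = np.var(seg2) + eps                         # sigma_2: variance of segment 2
        var = np.var(np.concatenate([seg1, seg2])) + eps  # sigma: variance of the merged segment
        # Larger values indicate the pair is better modelled by two separate Gaussians
        # (a likely speaker change or noise) than by a single shared one.
        delta_bic = n * np.log(var) - n1 * np.log(var1) - n2 * np.log(var2) - lam * np.log(n)
        return delta_bic / f_norm(n, var)

    def f_norm(n, var):
        """Placeholder for the empirically determined normalization function F of length and variance."""
        return n * max(np.log(var), 1.0)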
The choice of the endpoint-detection threshold has a large influence on the BIC: it affects the sharpness of the BIC signal and the false-alarm rate, mainly the sharpness. The present invention chooses a suitable endpoint-detection threshold experimentally to improve the sharpness of the BIC signal.
A difficulty in the practical application of traditional BIC is that detecting a BIC anomaly depends on a threshold, and this threshold is usually specific to each pair of speech segments whose BIC is calculated; it is not universal, so a global BIC threshold is hard to obtain. It is worth pointing out that, with the above improvements, the BIC signal is normalized, and because the endpoint-detection threshold is controlled, the BIC signal is relatively accurate and effective. The BIC threshold of the embodiments of the present invention therefore has an essentially global meaning, and the threshold no longer needs to be determined from the data of each call, which guarantees the practical effect of the embodiments of the present invention.
Embodiment 2
This embodiment also provides a speech processing device, which is used to implement the above embodiments and preferred implementations; what has already been described will not be repeated. As used below, the term "module" may be a combination of software and/or hardware that implements a predetermined function. Although the devices described in the following embodiments are preferably implemented in software, implementations in hardware, or in a combination of software and hardware, are also possible and conceivable.
Fig. 3 is a block diagram of a speech processing device according to an embodiment of the present invention. As shown in Fig. 3, the device comprises:
a segmentation module 32, configured to divide mixed speech into N speech segments by endpoint detection, where N is a natural number greater than or equal to 2;
a detection module 34, configured to perform Bayesian Information Criterion (BIC) detection on any two adjacent speech segments among the N speech segments, and to discard speech segments for which the BIC detection indicates an anomaly, to obtain the effective speech segments of the target object.
Fig. 4 is a first block diagram of a speech processing device according to a preferred embodiment of the present invention. As shown in Fig. 4, the detection module 34 includes:
a detection unit 42, configured to perform BIC detection on pairs of speech segments among the N speech segments;
a judging unit 44, configured to judge whether the two speech segments under BIC detection are anomalous;
a discarding unit 46, configured to discard, if the judgment result is yes, the two anomalous speech segments;
a repeat-detection unit 48, configured to repeat BIC detection on pairs of speech segments among the remaining N-2 speech segments and to discard anomalous pairs, until no remaining pair of speech segments is anomalous.
Optionally, the judging unit 44 is further configured to:
judge whether the BIC value between the two speech segments is greater than a predetermined threshold;
if the judgment result is yes, determine that the two speech segments are anomalous;
if the judgment result is no, determine that the two speech segments are normal.
Fig. 5 is a second block diagram of a speech processing device according to a preferred embodiment of the present invention. As shown in Fig. 5, the detection module 34 includes:
a calculation unit 52, configured to calculate the BIC value between any two speech segments;
a processing unit 54, configured to normalize the BIC value.
Optionally, the segmentation module 32 includes:
an acquisition unit, configured to obtain silent segments in the mixed speech;
a removal unit, configured to remove the silent segments;
a first splitting unit, configured to split the mixed speech according to the silent segments to obtain long speech segments;
a second splitting unit, configured to divide the long speech segments into the N speech segments by endpoint detection.
It should be noted that the above modules may be implemented in software or in hardware. In the latter case, this may be achieved in, but is not limited to, the following ways: the above modules are all located in the same processor, or the above modules, in any combination, are located in different processors.
Embodiment 3
An embodiment of the present invention also provides a storage medium in which a computer program is stored, where the computer program is configured to perform, when run, the steps in any of the above method embodiments.
Optionally, in this embodiment, the above storage medium may be configured to store a computer program for performing the following steps:
S11: dividing mixed speech into N speech segments by endpoint detection, where N is a natural number greater than or equal to 2;
S12: performing Bayesian Information Criterion (BIC) detection on any two adjacent speech segments among the N speech segments, and discarding speech segments for which the BIC detection indicates an anomaly, to obtain the effective speech segments of the target object.
Optionally, in this embodiment, the above storage medium may include, but is not limited to: a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, an optical disc, or any other medium that can store a computer program.
Embodiment 4
An embodiment of the present invention also provides an electronic device comprising a memory and a processor, where a computer program is stored in the memory and the processor is configured to run the computer program to perform the steps in any of the above method embodiments.
Optionally, the above electronic device may further include a transmission device and an input/output device, where the transmission device is connected to the above processor and the input/output device is connected to the above processor.
Optionally, in this embodiment, the above processor may be configured to perform the following steps by means of a computer program:
S11: dividing mixed speech into N speech segments by endpoint detection, where N is a natural number greater than or equal to 2;
S12: performing Bayesian Information Criterion (BIC) detection on any two adjacent speech segments among the N speech segments, and discarding speech segments for which the BIC detection indicates an anomaly, to obtain the effective speech segments of the target object.
Optionally, for specific examples in this embodiment, reference may be made to the examples described in the above embodiments and optional implementations; details are not repeated here.
Obviously, a person skilled in the art should understand that the above modules or steps of the present invention may be implemented by a general-purpose computing device; they may be concentrated on a single computing device or distributed over a network formed by multiple computing devices; optionally, they may be implemented by program code executable by a computing device, so that they may be stored in a storage device and executed by the computing device; in some cases, the steps shown or described may be performed in an order different from the one here, or they may be made into individual integrated-circuit modules, or several of them may be made into a single integrated-circuit module. The present invention is thus not limited to any specific combination of hardware and software.
The above are only preferred embodiments of the present invention and are not intended to limit the present invention; for those skilled in the art, the present invention may have various modifications and variations. Any modification, equivalent replacement, improvement, and the like made within the principle of the present invention shall be included in the protection scope of the present invention.

Claims (10)

1. A speech processing method, characterized by comprising:
dividing mixed speech into N speech segments by endpoint detection, where N is a natural number greater than or equal to 2;
performing Bayesian Information Criterion (BIC) detection on any two adjacent speech segments among the N speech segments, and discarding speech segments for which the BIC detection indicates an anomaly, to obtain the effective speech segments of the target object.
2. The method according to claim 1, characterized in that performing BIC detection on any two adjacent speech segments among the N speech segments, and discarding speech segments for which the BIC detection indicates an anomaly, includes:
performing BIC detection on pairs of adjacent speech segments among the N speech segments;
judging whether the two speech segments under BIC detection are anomalous;
if the judgment result is yes, discarding the two anomalous speech segments;
repeating BIC detection on adjacent pairs among the remaining N-2 speech segments and discarding anomalous pairs, until no remaining pair of adjacent speech segments is anomalous.
3. The method according to claim 2, characterized in that judging whether the two speech segments under BIC detection are anomalous includes:
judging whether the BIC value between the two speech segments is greater than a predetermined threshold;
if the judgment result is yes, determining that the two speech segments are anomalous;
if the judgment result is no, determining that the two speech segments are normal.
4. The method according to any one of claims 1 to 3, characterized in that performing BIC detection on two speech segments among the N speech segments includes:
calculating the BIC value between the two speech segments;
normalizing the BIC value.
5. The method according to any one of claims 1 to 3, characterized in that dividing the mixed speech into N speech segments by endpoint detection includes:
obtaining silent segments in the mixed speech;
removing the silent segments;
splitting the mixed speech according to the silent segments to obtain long speech segments;
dividing the long speech segments into the N speech segments by endpoint detection.
6. A speech processing device, characterized by comprising:
a segmentation module, configured to divide mixed speech into N speech segments by endpoint detection, where N is a natural number greater than or equal to 2;
a detection module, configured to perform Bayesian Information Criterion (BIC) detection on any two adjacent speech segments among the N speech segments, and to discard speech segments for which the BIC detection indicates an anomaly, to obtain the effective speech segments of the target object.
7. The device according to claim 6, characterized in that the detection module includes:
a detection unit, configured to perform BIC detection on pairs of adjacent speech segments among the N speech segments;
a judging unit, configured to judge whether the two speech segments under BIC detection are anomalous;
a discarding unit, configured to discard, if the judgment result is yes, the two anomalous speech segments;
a repeat-detection unit, configured to repeat BIC detection on adjacent pairs among the remaining N-2 speech segments and to discard anomalous pairs, until no remaining pair of adjacent speech segments is anomalous.
8. The device according to claim 7, characterized in that the judging unit is further configured to:
judge whether the BIC value between the two speech segments is greater than a predetermined threshold;
if the judgment result is yes, determine that the two speech segments are anomalous;
if the judgment result is no, determine that the two speech segments are normal.
9. A storage medium, characterized in that a computer program is stored in the storage medium, where the computer program is configured to perform, when run, the method according to any one of claims 1 to 5.
10. An electronic device comprising a memory and a processor, characterized in that a computer program is stored in the memory and the processor is configured to run the computer program to perform the method according to any one of claims 1 to 5.
CN201811076321.4A 2018-09-14 2018-09-14 Voice processing method and device Active CN109036386B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811076321.4A CN109036386B (en) 2018-09-14 2018-09-14 Voice processing method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811076321.4A CN109036386B (en) 2018-09-14 2018-09-14 Voice processing method and device

Publications (2)

Publication Number Publication Date
CN109036386A true CN109036386A (en) 2018-12-18
CN109036386B CN109036386B (en) 2021-03-16

Family

ID=64622220

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811076321.4A Active CN109036386B (en) 2018-09-14 2018-09-14 Voice processing method and device

Country Status (1)

Country Link
CN (1) CN109036386B (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110390946A (en) * 2019-07-26 2019-10-29 龙马智芯(珠海横琴)科技有限公司 A kind of audio signal processing method, device, electronic equipment and storage medium
CN111343344A (en) * 2020-03-13 2020-06-26 Oppo(重庆)智能科技有限公司 Voice abnormity detection method and device, storage medium and electronic equipment
CN111613249A (en) * 2020-05-22 2020-09-01 云知声智能科技股份有限公司 Voice analysis method and equipment
CN111883159A (en) * 2020-08-05 2020-11-03 龙马智芯(珠海横琴)科技有限公司 Voice processing method and device
CN112562635A (en) * 2020-12-03 2021-03-26 云知声智能科技股份有限公司 Method, device and system for solving pulse signal generation at splicing position in voice synthesis
CN112735470A (en) * 2020-12-28 2021-04-30 携程旅游网络技术(上海)有限公司 Audio cutting method, system, device and medium based on time delay neural network
CN112951212A (en) * 2021-04-19 2021-06-11 中国科学院声学研究所 Voice turning point detection method and device for multiple speakers

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1716380A (en) * 2005-07-26 2006-01-04 浙江大学 Audio frequency splitting method for changing detection based on decision tree and speaking person
CN101315771A (en) * 2008-06-04 2008-12-03 哈尔滨工业大学 Compensation method for different speech coding influence in speaker recognition
CN102034472A (en) * 2009-09-28 2011-04-27 戴红霞 Speaker recognition method based on Gaussian mixture model embedded with time delay neural network
CN102682760A (en) * 2011-03-07 2012-09-19 株式会社理光 Overlapped voice detection method and system
CN103559882A (en) * 2013-10-14 2014-02-05 华南理工大学 Meeting presenter voice extracting method based on speaker division
CN104021785A (en) * 2014-05-28 2014-09-03 华南理工大学 Method of extracting speech of most important guest in meeting
CN105304082A (en) * 2015-09-08 2016-02-03 北京云知声信息技术有限公司 Voice output method and voice output device
US20160241346A1 (en) * 2015-02-17 2016-08-18 Adobe Systems Incorporated Source separation using nonnegative matrix factorization with an automatically determined number of bases
CN107393527A (en) * 2017-07-17 2017-11-24 广东讯飞启明科技发展有限公司 The determination methods of speaker's number

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1716380A (en) * 2005-07-26 2006-01-04 浙江大学 Audio frequency splitting method for changing detection based on decision tree and speaking person
CN101315771A (en) * 2008-06-04 2008-12-03 哈尔滨工业大学 Compensation method for different speech coding influence in speaker recognition
CN102034472A (en) * 2009-09-28 2011-04-27 戴红霞 Speaker recognition method based on Gaussian mixture model embedded with time delay neural network
CN102682760A (en) * 2011-03-07 2012-09-19 株式会社理光 Overlapped voice detection method and system
CN103559882A (en) * 2013-10-14 2014-02-05 华南理工大学 Meeting presenter voice extracting method based on speaker division
CN104021785A (en) * 2014-05-28 2014-09-03 华南理工大学 Method of extracting speech of most important guest in meeting
US20160241346A1 (en) * 2015-02-17 2016-08-18 Adobe Systems Incorporated Source separation using nonnegative matrix factorization with an automatically determined number of bases
CN105304082A (en) * 2015-09-08 2016-02-03 北京云知声信息技术有限公司 Voice output method and voice output device
CN107393527A (en) * 2017-07-17 2017-11-24 广东讯飞启明科技发展有限公司 The determination methods of speaker's number

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
PATRICK KENNY: "Joint Factor Analysis Versus Eigenchannels", 《AUDIO,SPEECH,AND LANGUAGE PROCESSING》 *
杨登舟等: "基于计算听觉场景分析的说话人转换检测", 《计算机工程》 *
赖松轩: "说话人聚类的初始类生成方法", 《计算机工程与应用》 *

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110390946A (en) * 2019-07-26 2019-10-29 龙马智芯(珠海横琴)科技有限公司 A kind of audio signal processing method, device, electronic equipment and storage medium
CN111343344A (en) * 2020-03-13 2020-06-26 Oppo(重庆)智能科技有限公司 Voice abnormity detection method and device, storage medium and electronic equipment
CN111613249A (en) * 2020-05-22 2020-09-01 云知声智能科技股份有限公司 Voice analysis method and equipment
CN111883159A (en) * 2020-08-05 2020-11-03 龙马智芯(珠海横琴)科技有限公司 Voice processing method and device
CN112562635A (en) * 2020-12-03 2021-03-26 云知声智能科技股份有限公司 Method, device and system for solving pulse signal generation at splicing position in voice synthesis
CN112562635B (en) * 2020-12-03 2024-04-09 云知声智能科技股份有限公司 Method, device and system for solving generation of pulse signals at splicing position in speech synthesis
CN112735470A (en) * 2020-12-28 2021-04-30 携程旅游网络技术(上海)有限公司 Audio cutting method, system, device and medium based on time delay neural network
CN112735470B (en) * 2020-12-28 2024-01-23 携程旅游网络技术(上海)有限公司 Audio cutting method, system, equipment and medium based on time delay neural network
CN112951212A (en) * 2021-04-19 2021-06-11 中国科学院声学研究所 Voice turning point detection method and device for multiple speakers

Also Published As

Publication number Publication date
CN109036386B (en) 2021-03-16

Similar Documents

Publication Publication Date Title
CN109036386A (en) A kind of method of speech processing and device
CN109584876B (en) Voice data processing method and device and voice air conditioner
CN103440867B (en) Audio recognition method and system
CN110517667A (en) A kind of method of speech processing, device, electronic equipment and storage medium
CN104781862B (en) Real-time traffic is detected
CN112400310A (en) Voice-based call quality detector
CN110390946A (en) A kind of audio signal processing method, device, electronic equipment and storage medium
CN108877783A (en) The method and apparatus for determining the audio types of audio data
CN109065051A (en) Voice recognition processing method and device
CN111816216A (en) Voice activity detection method and device
CN112581938B (en) Speech breakpoint detection method, device and equipment based on artificial intelligence
CN109740530A (en) Extracting method, device, equipment and the computer readable storage medium of video-frequency band
CN113129876A (en) Network searching method and device, electronic equipment and storage medium
CN113329372B (en) Method, device, equipment, medium and product for vehicle-mounted call
CN107196979A (en) Pre- system for prompting of calling out the numbers based on speech recognition
CN112562727A (en) Audio scene classification method, device and equipment applied to audio monitoring
JP2000163098A (en) Voice recognition device
CN114038487A (en) Audio extraction method, device, equipment and readable storage medium
US7571093B1 (en) Method of identifying duplicate voice recording
CN114049898A (en) Audio extraction method, device, equipment and storage medium
CN112053686B (en) Audio interruption method, device and computer readable storage medium
US11322137B2 (en) Video camera
EP3309777A1 (en) Device and method for audio frame processing
CN114005436A (en) Method, device and storage medium for determining voice endpoint
EP3171360B1 (en) Speech recognition with determination of noise suppression processing mode

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20230105

Address after: Room 502 and Room 504, Jiayuan Office Building, No. 369, Yuelu Avenue, Xianjiahu Street, Yuelu District, Changsha City, Hunan Province 410205

Patentee after: Hunan Huawei Jin'an Enterprise Management Co.,Ltd.

Address before: 100080 370m south of Huandao, Yanfu Road, Yancun Town, Fangshan District, Beijing

Patentee before: BEIJING WANGZHONG GONGCHUANG TECHNOLOGY CO.,LTD.

TR01 Transfer of patent right