CN108198570A - Method and device for speech separation during an interrogation - Google Patents


Info

Publication number: CN108198570A (application CN201810106940.7A; granted as CN108198570B)
Authority: CN (China)
Prior art keywords: interrogation, voice data, voice, matrix, data
Legal status: Granted; currently Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Other languages: Chinese (zh)
Other versions: CN108198570B (en)
Inventors: 马金龙 (Ma Jinlong), 关海欣 (Guan Haixin)
Current assignee: Unisound Intelligent Technology Co Ltd (the listed assignees may be inaccurate; Google has not performed a legal analysis)
Original assignee: Beijing Yunzhisheng Information Technology Co Ltd
Application: CN201810106940.7A, filed by Beijing Yunzhisheng Information Technology Co Ltd
Publications: CN108198570A (application); CN108198570B (grant)

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00: Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02: Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0272: Voice signal separating
    • G10L17/00: Speaker identification or verification
    • G10L17/02: Preprocessing operations, e.g. segment selection; pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; feature selection or extraction

Abstract

The present invention provides a method and device for speech separation during an interrogation. The method includes: obtaining first voice data captured by a first audio collection device and second voice data captured by a second audio collection device, where the first audio collection device is a device pointed at the interrogator and the second audio collection device is a device pointed at the interrogated person; filtering the first voice data to determine the interrogator voice data corresponding to the interrogator; and, using the interrogator voice data as a reference signal, removing the interrogator voice data from the second voice data and determining the interrogated person's voice data in the second voice data. By working with two sets of voice data, the method effectively reduces interference from the interrogator's channel, so that the speech signals of the interrogator and the interrogated person are cleanly separated. Speech recognition can then correctly identify the speech of both parties, allowing the interrogation record to be generated automatically, which improves interrogation efficiency and saves labor costs.

Description

Method and device for speech separation during an interrogation
Technical field
The present invention relates to the technical field of speech separation, and in particular to a method and device for speech separation during an interrogation.
Background
At present, the record of a judicial interrogation (for example, a criminal interrogation) is produced in written-notes form. This is inefficient, must be done manually, and wastes human and material resources. Moreover, because of the inherent constraints of the interrogation scene, the signal picked up by a microphone often contains multiple speakers, so directly recognizing the collected speech signal cannot effectively distinguish them. In addition, the interrogated person often speaks much more quietly than the interrogator, and the two levels can differ greatly, which is why most interrogations are currently recorded by hand.
Summary of the invention
The present invention provides a method and device for speech separation during an interrogation, to overcome the low efficiency of the existing manual way of recording an interrogation.
An embodiment of the present invention provides a method for speech separation during an interrogation, including:
obtaining first voice data captured by a first audio collection device and second voice data captured by a second audio collection device, where the first audio collection device is a device pointed at the interrogator and the second audio collection device is a device pointed at the interrogated person;
filtering the first voice data to determine the interrogator voice data corresponding to the interrogator;
using the interrogator voice data as a reference signal, removing the interrogator voice data from the second voice data, and determining the interrogated person's voice data in the second voice data.
In a possible implementation, the method further includes:
recognizing the interrogator voice data and the interrogated person's voice data respectively, and determining the corresponding interrogator text and interrogated-person text.
In a possible implementation, determining the corresponding interrogator text and interrogated-person text includes:
determining timestamps for the interrogator voice data and the interrogated person's voice data, and adding the corresponding timestamps to the interrogator text and the interrogated-person text respectively, the timestamps including start timestamps and end timestamps;
determining, according to the timestamps, the overlapping part of the interrogator text and the interrogated-person text, and highlighting the text corresponding to the overlapping part.
In a possible implementation, using the interrogator voice data as a reference signal and removing the interrogator voice data from the second voice data includes:
determining a signal delay according to the distance between the first audio collection device and the interrogator and the distance between the second audio collection device and the interrogator;
delaying the interrogator voice data by the signal delay, using the delayed interrogator voice data as the reference signal, and removing the interrogator voice data from the second voice data.
In a possible implementation, using the interrogator voice data as a reference signal and removing the interrogator voice data from the second voice data includes:
preprocessing the interrogator voice data and the second voice data respectively, and determining a first voice matrix G1 corresponding to the interrogator voice data and a second voice matrix G2 corresponding to the second voice data;
determining a third voice matrix G3 from the first voice matrix G1 and the second voice matrix G2 as G3 = G2 - λ·G1, where λ is a weight coefficient;
performing dimension reduction on the third voice matrix G3 to convert it into a discrete voice array Xs, restoring the discrete voice array Xs to continuous voice data according to the preset sampling period, and using the restored voice data as the interrogated person's voice data;
wherein the preprocessing includes:
discretely sampling the voice data according to a preset sampling period to obtain a discrete voice array X = [x1, x2, ..., xj, ..., xn] represented in array form, where the voice data is the interrogator voice data or the second voice data, xj is the sampled value of the voice data at the j-th sampling point, and n is the total number of samples;
extending the discrete voice array X row-wise, with X as one row, into an m×n discrete voice matrix M, where m is an odd number, the middle row satisfies m_{(m+1)/2,j} = xj, the value of m_{i,j} is negatively correlated with the distance |i - (m+1)/2| from the middle row, and m_{i,j} denotes the element in row i, column j of the discrete voice matrix M;
determining a k×k reference matrix H, where k is an odd number greater than 1 and the element in row i, column j of the reference matrix H has the Gaussian form H_{i,j} = exp(-((i - (k+1)/2)^2 + (j - (k+1)/2)^2)/(2σ^2)), where σ is an adjustment coefficient;
performing difference-reduction processing on the discrete voice matrix M according to the reference matrix H to determine the difference-reduced voice matrix G, the element in row x, column y of the voice matrix G being g_{x,y} = Σ_{a=1..k} Σ_{b=1..k} H_{a,b}·m_{x+a-(k+1)/2, y+b-(k+1)/2};
wherein, when the voice data is the interrogator voice data, the determined voice matrix G is the first voice matrix G1; when the voice data is the second voice data, the determined voice matrix G is the second voice matrix G2.
Based on the same inventive concept, an embodiment of the present invention also provides a device for speech separation during an interrogation, including:
an acquisition module, configured to obtain first voice data captured by a first audio collection device and second voice data captured by a second audio collection device, the first audio collection device being a device pointed at the interrogator and the second audio collection device being a device pointed at the interrogated person;
a first processing module, configured to filter the first voice data and determine the interrogator voice data corresponding to the interrogator;
a second processing module, configured to use the interrogator voice data as a reference signal, remove the interrogator voice data from the second voice data, and determine the interrogated person's voice data in the second voice data.
In a possible implementation, the device further includes:
a recognition module, configured to recognize the interrogator voice data and the interrogated person's voice data respectively and determine the corresponding interrogator text and interrogated-person text.
In a possible implementation, the recognition module is further configured to:
determine timestamps for the interrogator voice data and the interrogated person's voice data, and add the corresponding timestamps to the interrogator text and the interrogated-person text respectively, the timestamps including start timestamps and end timestamps;
determine, according to the timestamps, the overlapping part of the interrogator text and the interrogated-person text, and highlight the text corresponding to the overlapping part.
In a possible implementation, the second processing module is configured to:
determine a signal delay according to the distance between the first audio collection device and the interrogator and the distance between the second audio collection device and the interrogator;
delay the interrogator voice data by the signal delay, use the delayed interrogator voice data as the reference signal, and remove the interrogator voice data from the second voice data.
In a possible implementation, the second processing module is configured to:
preprocess the interrogator voice data and the second voice data respectively, and determine a first voice matrix G1 corresponding to the interrogator voice data and a second voice matrix G2 corresponding to the second voice data;
determine a third voice matrix G3 from the first voice matrix G1 and the second voice matrix G2 as G3 = G2 - λ·G1, where λ is a weight coefficient;
perform dimension reduction on the third voice matrix G3 to convert it into a discrete voice array Xs, restore the discrete voice array Xs to continuous voice data according to the preset sampling period, and use the restored voice data as the interrogated person's voice data;
wherein the preprocessing includes:
discretely sampling the voice data according to a preset sampling period to obtain a discrete voice array X = [x1, x2, ..., xj, ..., xn] represented in array form, where the voice data is the interrogator voice data or the second voice data, xj is the sampled value of the voice data at the j-th sampling point, and n is the total number of samples;
extending the discrete voice array X row-wise, with X as one row, into an m×n discrete voice matrix M, where m is an odd number, the middle row satisfies m_{(m+1)/2,j} = xj, the value of m_{i,j} is negatively correlated with the distance |i - (m+1)/2| from the middle row, and m_{i,j} denotes the element in row i, column j of the discrete voice matrix M;
determining a k×k reference matrix H, where k is an odd number greater than 1 and the element in row i, column j of the reference matrix H has the Gaussian form H_{i,j} = exp(-((i - (k+1)/2)^2 + (j - (k+1)/2)^2)/(2σ^2)), where σ is an adjustment coefficient;
performing difference-reduction processing on the discrete voice matrix M according to the reference matrix H to determine the difference-reduced voice matrix G, the element in row x, column y of the voice matrix G being g_{x,y} = Σ_{a=1..k} Σ_{b=1..k} H_{a,b}·m_{x+a-(k+1)/2, y+b-(k+1)/2};
wherein, when the voice data is the interrogator voice data, the determined voice matrix G is the first voice matrix G1; when the voice data is the second voice data, the determined voice matrix G is the second voice matrix G2.
According to the method and device for speech separation during an interrogation provided by the embodiments of the present invention, two audio collection devices capture two sets of voice data, and one set is used as a reference signal for removal processing on the other, thereby achieving speech separation. The two sets of voice data effectively reduce interference from the interrogator's channel, so that the speech signals of the interrogator and the interrogated person are cleanly separated; speech recognition can then correctly identify the speech of both parties, the interrogation record can be generated automatically, interrogation efficiency is improved, and labor costs are saved. Highlighting the text of the overlapping part lets the user quickly locate the positions where speech recognition may be wrong and conveniently verify whether the recognized text is accurate. The delay processing makes the removal of the interrogator voice data from the second voice data more accurate, yielding more accurate interrogated-person voice data. Expanding the array-form voice data into matrix form makes it convenient to process; the third voice matrix corresponding to the interrogated person's voice data is determined and then reduced in dimension to obtain the interrogated person's voice data. Removing the interrogator voice data from the second voice data in this higher-dimensional form effectively reduces the distortion introduced by the removal.
Other features and advantages of the present invention will be set forth in the following description and will in part become apparent from it, or be understood by practicing the invention. The objects and other advantages of the invention may be realized and obtained by the structures particularly pointed out in the written description, the claims, and the accompanying drawings.
The technical solution of the present invention is described in further detail below through the drawings and embodiments.
Description of the drawings
The drawings are provided for a further understanding of the present invention and form a part of the specification; together with the embodiments they serve to explain the invention, and they do not limit it. In the drawings:
Fig. 1 is a flowchart of the method for speech separation during an interrogation in an embodiment of the present invention;
Fig. 2 is a schematic diagram of determining the overlapping text in an embodiment of the present invention;
Fig. 3 is a first structural diagram of the device for speech separation during an interrogation in an embodiment of the present invention;
Fig. 4 is a second structural diagram of the device for speech separation during an interrogation in an embodiment of the present invention.
Detailed description of the embodiments
Preferred embodiments of the present invention are described below with reference to the drawings. It should be understood that the preferred embodiments described here are intended only to illustrate and explain the invention, not to limit it.
An embodiment of the present invention provides a method for speech separation during an interrogation that achieves speech separation with two audio collection devices (for example, microphones). Specifically, as shown in Fig. 1, the method includes steps 101-103:
Step 101: obtain first voice data captured by a first audio collection device and second voice data captured by a second audio collection device, where the first audio collection device is a device pointed at the interrogator and the second audio collection device is a device pointed at the interrogated person.
In this embodiment, two audio collection devices are provided: a first audio collection device and a second audio collection device. The first audio collection device is pointed at the interrogator to capture the interrogator's speech. Because the first audio collection device can be placed close to the interrogator and far from the interrogated person, the data it captures is essentially all the interrogator's voice data.
The second audio collection device is pointed at the interrogated person. For safety reasons, however, it cannot be placed too close to the interrogated person; its distance to the interrogator may even be smaller than its distance to the interrogated person. So although the second audio collection device is pointed at the interrogated person, it still picks up a mixture of the interrogator's and the interrogated person's speech. Note that in this embodiment, "pointed at the interrogator" means that the sound-pickup direction of the first audio collection device (for example, a microphone) faces the interrogator, and that the first audio collection device is closer to the interrogator than the second audio collection device is.
Step 102: filter the first voice data to determine the interrogator voice data corresponding to the interrogator.
In this embodiment, as noted above, the first audio collection device can be close to the interrogator and far from the interrogated person, so the first voice data it captures is essentially all the interrogator's voice data. The interrogator voice data can therefore be determined by simply applying filtering and difference-reduction processing to the first voice data.
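The filtering applied to the first channel is described only as a simple step, so the concrete method is open. As an illustration only, the following sketch uses an energy gate that keeps loud frames (the nearby interrogator) and zeroes quiet residue; the frame length and threshold are invented for the example.

```python
import numpy as np

def gate_interrogator_channel(x, frame_len=160, threshold=0.01):
    """Crude stand-in for the unspecified filtering step: zero out
    low-energy frames so that only the loud, close-talking
    interrogator's speech survives in the first channel."""
    y = np.asarray(x, dtype=float).copy()
    for start in range(0, len(y), frame_len):
        frame = y[start:start + frame_len]
        if np.mean(frame ** 2) < threshold:   # frame energy below the gate
            y[start:start + frame_len] = 0.0
    return y

# A loud segment followed by near-silence: the quiet tail is gated out.
signal = np.concatenate([0.5 * np.ones(160), 0.001 * np.ones(160)])
gated = gate_interrogator_channel(signal)
```

In practice the threshold would be tuned to the noise floor of the interrogation room; any standard denoising filter could take this step's place.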
Step 103: use the interrogator voice data as a reference signal, remove the interrogator voice data from the second voice data, and determine the interrogated person's voice data in the second voice data.
The second voice data captured by the second audio collection device contains the speech of both the interrogator and the interrogated person, so speech separation is needed. In this embodiment, the interrogator voice data captured by the first audio collection device is used as a reference signal to remove the interrogator's speech component from the second voice data, after which the interrogated person's voice data in the second voice data can be determined.
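How the reference is subtracted at this step is left open here (a matrix-based scheme is detailed in a later embodiment). One standard technique for removing a known reference from a mixture is a normalized LMS (NLMS) adaptive filter; the sketch below is illustrative rather than the patent's mandated method, and the tap count and step size are assumptions.

```python
import numpy as np

def nlms_cancel(reference, mixture, taps=8, mu=0.1, eps=1e-8):
    """Subtract the reference (interrogator) signal from the mixture
    with a normalized-LMS adaptive filter; the residual approximates
    the interrogated person's speech."""
    w = np.zeros(taps)
    out = np.zeros(len(mixture))
    for n in range(len(mixture)):
        x = reference[max(0, n - taps + 1):n + 1][::-1]   # recent reference taps
        x = np.pad(x, (0, taps - len(x)))                 # zero-pad at start-up
        e = mixture[n] - w @ x            # residual = mixture minus estimate
        w += mu * e * x / (x @ x + eps)   # NLMS weight update
        out[n] = e
    return out

rng = np.random.default_rng(0)
ref = rng.standard_normal(4000)           # interrogator speech at the reference mic
target = np.sin(np.arange(4000) * 0.05)   # interrogated person's speech
mixture = 0.8 * ref + target              # what the second mic picks up
residual = nlms_cancel(ref, mixture)      # converges toward target
```

After a short adaptation period the residual tracks the interrogated person's signal far more closely than the raw mixture does.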
According to the method for speech separation during an interrogation provided by this embodiment, two audio collection devices capture two sets of voice data, and one set is used as a reference signal for removal processing on the other, thereby achieving speech separation. The two sets of voice data effectively reduce interference from the interrogator's channel, so that the speech signals of the interrogator and the interrogated person are cleanly separated; speech recognition can then correctly identify the speech of both parties, the interrogation record can be generated automatically, interrogation efficiency is improved, and labor costs are saved.
In a possible implementation, the method further includes: recognizing the interrogator voice data and the interrogated person's voice data respectively, and determining the corresponding interrogator text and interrogated-person text.
In this embodiment, the interrogator voice data contains only the interrogator's speech, so running speech recognition on it yields the corresponding interrogator text; likewise, the interrogated person's voice data contains only the interrogated person's speech, and recognizing it yields the corresponding interrogated-person text. The interrogation record can then be generated from the interrogator text and the interrogated-person text.
In a possible implementation, because the interrogator and the interrogated person may speak at the same time, or one party may interrupt the other, the recognized text may contain errors. In this embodiment, determining the corresponding interrogator text and interrogated-person text therefore includes steps A1-A2:
Step A1: determine timestamps for the interrogator voice data and the interrogated person's voice data, and add the corresponding timestamps to the interrogator text and the interrogated-person text respectively.
Step A2: determine, according to the timestamps, the overlapping part of the interrogator text and the interrogated-person text, and highlight the text corresponding to the overlapping part.
In this embodiment, the timestamps of the two texts are determined from the timestamps of the voice data. A timestamp in a text indicates the time of a passage, sentence, or word; start timestamps and end timestamps divide the whole text into segments: text exists between a start timestamp and its end timestamp, while the span from an end timestamp to the next start timestamp is blank. The text between a start timestamp and an end timestamp may also carry ordinary timestamps, which simply indicate the time point of the corresponding text. In other words, the timestamps show the period during which any given passage of the interrogator text or interrogated-person text was collected. If the timestamps show that the interrogator text and the interrogated-person text overlap, both texts were collected during the overlapping period; in that case the interrogated person's voice data obtained with the interrogator voice data as the reference signal may suffer considerable interference, and highlighting the overlap lets the user verify whether the recognition result there is accurate.
As shown in Fig. 2, a rectangular wave indicates whether text is present: the waveform is 1 where text exists and 0 where it does not. Specifically, for the interrogator text, T1, T3, and T5 are start timestamps and T2, T4, and T6 are end timestamps, so passages of the interrogator text exist between T1 and T2, between T3 and T4, and between T5 and T6. Similarly, for the interrogated-person text, T7 and T9 are start timestamps and T8 and T10 are end timestamps, so passages exist between T7 and T8 and between T9 and T10. As Fig. 2 shows, the overlapping part of the two texts is the segment T9-T4: the interrogator starts speaking at T3 and does not stop until T4, while the interrogated person cuts in at T9 before the interrogator has finished, so during the period T9-T4 both the interrogator text and the interrogated-person text are collected, and the text corresponding to T9-T4 needs to be highlighted (for example, shown in a highlight color). Optionally, since the interrogator voice data is unlikely to be disturbed by the interrogated person, the interrogator text can be taken as correct, and only the overlapping part of the interrogated-person text needs to be highlighted.
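The timestamp comparison described above amounts to intersecting the (start, end) intervals of the two texts. A small illustrative sketch follows; the interval values loosely mirror the T1-T10 example of Fig. 2, but the concrete numbers are invented.

```python
def overlaps(a_segments, b_segments):
    """Given (start, end) timestamp pairs for the interrogator text and
    the interrogated-person text, return the intervals where both were
    speaking, i.e. the spans to highlight for manual review."""
    out = []
    for a_start, a_end in a_segments:
        for b_start, b_end in b_segments:
            lo, hi = max(a_start, b_start), min(a_end, b_end)
            if lo < hi:                       # non-empty intersection
                out.append((lo, hi))
    return sorted(out)

# The interrogated person starts at 3.5 s before the interrogator
# stops at 4.0 s, so [3.5, 4.0] is the overlap to highlight.
interrogator = [(1.0, 2.0), (3.0, 4.0), (5.0, 6.0)]
interrogated = [(3.5, 4.5), (6.5, 7.0)]
print(overlaps(interrogator, interrogated))   # [(3.5, 4.0)]
```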
Another embodiment of the present invention provides a method for speech separation during an interrogation that includes steps 101-103 of the above embodiment; for its principle and technical effects, see the embodiment corresponding to Fig. 1. In this embodiment, using the interrogator voice data as a reference signal and removing the interrogator voice data from the second voice data in step 103 includes steps B1-B2:
Step B1: determine a signal delay according to the distance between the first audio collection device and the interrogator and the distance between the second audio collection device and the interrogator.
Step B2: delay the interrogator voice data by the signal delay, use the delayed interrogator voice data as the reference signal, and remove the interrogator voice data from the second voice data.
Because the two audio collection devices are at different distances from the interrogator, there is in fact a delay between the interrogator's speech as captured by the two devices. In this embodiment, the signal delay is determined from the distance between the first audio collection device and the interrogator and the distance between the second audio collection device and the interrogator; an existing delay-estimation algorithm may be used to determine it. After the interrogator voice data is obtained, it is delayed by this signal delay, which eliminates the delay between the signals captured by the two devices, makes the removal of the interrogator voice data from the second voice data more accurate, and thus yields more accurate interrogated-person voice data.
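For a fixed geometry, the signal delay can be approximated as the path-length difference divided by the speed of sound, then rounded to whole samples. A minimal sketch under that assumption (the sampling rate and distances are example values; the text only says an existing delay-estimation algorithm may be used):

```python
def delay_samples(d_first_mic, d_second_mic, fs=16000, speed_of_sound=343.0):
    """Path-length difference between the two microphones and the
    interrogator, converted to a whole number of samples of delay."""
    return round((d_second_mic - d_first_mic) / speed_of_sound * fs)

def apply_delay(x, n):
    """Delay x by n samples, zero-padding the front."""
    return [0.0] * n + list(x[:len(x) - n]) if n > 0 else list(x)

# Example: the interrogator sits 0.3 m from mic 1 and 2.5 m from mic 2.
n = delay_samples(0.3, 2.5)   # about 103 samples at 16 kHz
```

In practice a cross-correlation based estimator (e.g. GCC-PHAT) is usually preferred over measured distances, since it also absorbs hardware latency.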
On the basis of the above embodiments, "using the interrogator voice data as a reference signal and removing the interrogator voice data from the second voice data" in step 103 includes steps C1-C3:
Step C1: preprocess the interrogator voice data and the second voice data respectively, and determine a first voice matrix G1 corresponding to the interrogator voice data and a second voice matrix G2 corresponding to the second voice data.
Step C2: determine a third voice matrix G3 from the first voice matrix G1 and the second voice matrix G2 as G3 = G2 - λ·G1, where λ is a weight coefficient.
Step C3: perform dimension reduction on the third voice matrix G3 to convert it into a discrete voice array Xs, restore the discrete voice array Xs to continuous voice data according to the preset sampling period, and use the restored voice data as the interrogated person's voice data.
The preprocessing in step C1 specifically includes steps C11-C14:
Step C11: discretely sample the voice data according to a preset sampling period to obtain a discrete voice array X = [x1, x2, ..., xj, ..., xn] represented in array form, where the voice data is the interrogator voice data or the second voice data, xj is the sampled value of the voice data at the j-th sampling point, and n is the total number of samples.
In this embodiment, the voice data (the interrogator voice data or the second voice data) is continuous audio; sampling it discretely makes it possible to determine the speech features it contains, and recording it as a discrete voice array in array form is convenient for subsequent processing. The preset sampling period here is the same as the preset sampling period in step C3 above; that is, sampling the voice data with this period yields n sampled values.
Step C12: extend the discrete voice array X row-wise, with X as one row, into an m×n discrete voice matrix M, where m is an odd number, the value of m_{i,j} is negatively correlated with the distance |i - (m+1)/2| from the middle row, and m_{i,j} denotes the element in row i, column j of the discrete voice matrix M.
In this embodiment, the discrete voice array X can also be regarded as a 1×n matrix, and the row extension expands it into an m×n discrete voice matrix M. The middle row equals X, i.e. m_{(m+1)/2,j} = xj; for the other elements, the value of m_{i,j} is negatively correlated with the distance |i - (m+1)/2|, that is, the farther an element is from the middle row, the smaller its value.
Optionally, m_{i,j} may, for example, decay linearly or exponentially with the distance |i - (m+1)/2|; other ways of determining m_{i,j} can also be used, and this embodiment does not limit them.
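The row extension of step C12 can be sketched as follows. The description fixes only that the middle row is X and that elements shrink with distance from the middle row, so the 1/(d+1) decay used here is purely illustrative.

```python
import numpy as np

def row_extend(x, m=5):
    """Expand the 1-D sample array X into an m-by-n matrix M whose
    middle row is X itself and whose other rows shrink as they move
    away from the middle (decay law chosen for illustration only)."""
    assert m % 2 == 1, "m must be odd"
    x = np.asarray(x, dtype=float)
    mid = m // 2                          # zero-based index of row (m+1)/2
    rows = [x / (abs(i - mid) + 1) for i in range(m)]
    return np.vstack(rows)

M = row_extend([2.0, 4.0, 6.0], m=3)
# middle row is X; the two outer rows are halved copies of X
```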
Step C13: determine a k×k reference matrix H, where k is an odd number greater than 1 and the element in row i, column j of the reference matrix H has the Gaussian form H_{i,j} = exp(-((i - (k+1)/2)^2 + (j - (k+1)/2)^2)/(2σ^2)), where σ is an adjustment coefficient.
Extending the originally acquired voice data into the discrete voice matrix M introduces errors, so difference-reduction processing is needed to reduce or even eliminate the errors introduced. In this embodiment, a k-order reference matrix H is built to remove the errors in the discrete voice matrix M. The larger the adjustment coefficient σ, the more pronounced the difference-reduction effect, but the more easily the discrete voice matrix M is distorted; σ is therefore generally chosen according to the actual situation, for example σ = 0.8.
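A reference matrix of the exp/σ kind described in step C13 can be built as a standard 2-D Gaussian kernel. Normalizing it to unit sum is an assumption of this sketch (it keeps the smoothing from changing the overall signal level); the exact constant in the original formula did not survive extraction.

```python
import numpy as np

def reference_matrix(k=5, sigma=0.8):
    """k-by-k Gaussian-style reference matrix H: largest at the centre
    and decaying with distance, normalized so its elements sum to 1."""
    assert k % 2 == 1 and k > 1, "k must be an odd number greater than 1"
    c = k // 2                                    # centre index
    i, j = np.mgrid[0:k, 0:k]
    H = np.exp(-((i - c) ** 2 + (j - c) ** 2) / (2 * sigma ** 2))
    return H / H.sum()

H = reference_matrix(3, 0.8)    # peak at H[1, 1], symmetric corners
```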
Step C14: perform difference-reduction processing on the discrete voice matrix M according to the reference matrix H to determine the difference-reduced voice matrix G, the element in row x, column y of the voice matrix G being g_{x,y} = Σ_{a=1..k} Σ_{b=1..k} H_{a,b}·m_{x+a-(k+1)/2, y+b-(k+1)/2}.
When the voice data is the interrogator voice data, the determined voice matrix G is the first voice matrix G1; when the voice data is the second voice data, the determined voice matrix G is the second voice matrix G2.
In this embodiment, difference reduction is applied to an element m_{x,y} wherever the k×k window fits, i.e. (k+1)/2 ≤ x ≤ m - (k-1)/2 and (k+1)/2 ≤ y ≤ n - (k-1)/2; for such elements, the discrete voice matrix M is processed according to the reference matrix H. The peripheral elements m_{x,y} of the discrete voice matrix M (those with x or y outside that range) are not difference-reduced, but since the original discrete voice array X lies in the middle row of the discrete voice matrix M, skipping the difference reduction of the peripheral elements has no effect on the discrete voice array X and can be ignored.
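The difference reduction of step C14, applied only where the k×k window fits and leaving the peripheral elements untouched, can be sketched as a windowed weighted sum:

```python
import numpy as np

def reduce_difference(M, H):
    """Smooth the interior of M with the reference matrix H, leaving
    the border rows/columns untouched, matching the note that the
    peripheral elements are not processed."""
    k = H.shape[0]
    r = k // 2                                   # window radius
    G = np.asarray(M, dtype=float).copy()
    for x in range(r, M.shape[0] - r):
        for y in range(r, M.shape[1] - r):
            G[x, y] = np.sum(H * M[x - r:x + r + 1, y - r:y + r + 1])
    return G

M = np.ones((5, 6))
H = np.full((3, 3), 1.0 / 9.0)   # uniform kernel summing to 1
G = reduce_difference(M, H)      # an all-ones input stays all ones
```

With a unit-sum kernel this is exactly a local averaging, which is why larger σ (a flatter kernel) smooths more strongly but also distorts M more.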
After the first voice matrix G1 and the second voice matrix G2 are obtained, the interrogated-person voice data can be determined according to steps C2 and C3 above. The weight coefficient λ is a real number between 0 and 1, i.e. λ is at most 1; it can be determined from the distance between the two audio acquisition devices, with λ decreasing as that distance grows. Meanwhile, the third voice matrix G3 is converted into the discrete voice array Xs: the elements of the middle row of G3 may be used directly as Xs, or each element of Xs may be determined from all the elements of the corresponding column of G3.
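Steps C2 and C3, the weighted subtraction of G1 from G2 and the reduction of G3 back to an array, can be sketched as below. Taking the middle row is one of the two reduction options the text mentions; the column-wise alternative is not shown.

```python
import numpy as np

def separate(G1, G2, lam):
    """Remove the lam-weighted interrogator matrix G1 from the mixed
    matrix G2 (G3 = G2 - lam * G1), then reduce G3 to a 1-D sample
    array by taking its middle row, one of the reduction options
    described. lam lies in (0, 1] and, per the text, should shrink
    as the distance between the two microphones grows.
    """
    G3 = G2 - lam * G1                # third voice matrix
    mid = G3.shape[0] // 2
    return G3[mid]                    # discrete voice array Xs

# toy check: a mixture of "suspect + interrogator" minus the
# interrogator matrix leaves the suspect samples on the middle row
xs = separate(np.ones((3, 4)), np.full((3, 4), 2.0), lam=1.0)
```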
In this embodiment, dimension-raising processing is first applied to the voice data: the voice data represented in array form is extended into matrix form, which is convenient to process. After the third voice matrix corresponding to the interrogated-person voice data is determined, dimension reduction is performed to obtain the interrogated-person voice data. This scheme removes the interrogator voice data from the second voice data in a higher dimension, which effectively reduces the distortion caused by the removal.
In the method of separating voice during interrogation provided by this embodiment of the present invention, two groups of voice data are acquired by two audio acquisition devices, and one group is used as a reference signal to cancel the corresponding component from the other group, thereby achieving voice separation. Using two groups of voice data effectively reduces cross-channel interference from the interrogator, so that the interrogator's speech and the interrogated person's speech are properly separated; speech recognition can then correctly identify both voices, an interrogation record can be generated automatically, interrogation efficiency is improved, and labour cost is saved. Highlighting the text corresponding to overlapping portions lets the user quickly locate positions where speech recognition may be wrong, so the user can quickly audit whether the recognised text is accurate. Delay processing makes the removal of the interrogator voice data from the second voice data more accurate, so more accurate interrogated-person voice data can be obtained. Extending the voice data from array form to matrix form makes it convenient to process; performing dimension reduction after determining the third voice matrix corresponding to the interrogated-person voice data yields the interrogated-person voice data, removing the interrogator component in a higher dimension and effectively reducing the distortion caused by the removal.
The method flow of separating voice during interrogation has been described in detail above. The method can also be implemented by a corresponding device; the structure and functions of the device are described below.
An embodiment of the present invention provides a device for separating voice during interrogation, shown in Figure 3, including:
an acquisition module 31, configured to obtain the first voice data acquired by the first audio acquisition device and the second voice data acquired by the second audio acquisition device, the first audio acquisition device being directed at the interrogator and the second audio acquisition device being directed at the interrogated person;
a first processing module 32, configured to filter the first voice data and determine the interrogator voice data corresponding to the interrogator;
a second processing module 33, configured to take the interrogator voice data as a reference signal, remove the interrogator voice data from the second voice data, and determine the interrogated-person voice data in the second voice data.
In one possible implementation, shown in Figure 4, the device further includes:
an identification module 34, configured to recognise the interrogator voice data and the interrogated-person voice data respectively, and determine the corresponding interrogator text and interrogated-person text.
In one possible implementation, the identification module 34 is further configured to:
determine the timestamps of the interrogator voice data and the interrogated-person voice data, and add the corresponding timestamps to the interrogator text and the interrogated-person text respectively, each timestamp including a start timestamp and an end timestamp;
determine the overlapping portions of the interrogator text and the interrogated-person text according to the timestamps, and highlight the text corresponding to the overlapping portions.
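The overlap detection that drives the highlighting can be derived from the (start, end) timestamps alone. The function below is an illustrative helper, not part of the patent; it returns the time spans where a segment from one speaker overlaps a segment from the other, i.e. the spans whose text would be highlighted for review.

```python
def overlaps(seg_a, seg_b):
    """Given two lists of (start, end) timestamps, one list per
    recognised speaker, return the intervals where a segment from
    seg_a and a segment from seg_b overlap in time. Text recognised
    inside these intervals is the candidate for highlighting.
    """
    out = []
    for a_start, a_end in seg_a:
        for b_start, b_end in seg_b:
            start = max(a_start, b_start)
            end = min(a_end, b_end)
            if start < end:           # non-empty overlap only
                out.append((start, end))
    return out

# e.g. an interrogator segment at 0-5 s and a suspect segment at
# 4-9 s overlap on (4, 5); text in that span may be unreliable
spans = overlaps([(0, 5)], [(4, 9)])
```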
In one possible implementation, the second processing module 33 is specifically configured to:
determine the signal delay according to the distance between the first audio acquisition device and the interrogator and the distance between the second audio acquisition device and the interrogator;
perform delay processing on the interrogator voice data according to the signal delay, take the delayed interrogator voice data as the reference signal, and remove the interrogator voice data from the second voice data.
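A sketch of the delay computation follows. The text only says the delay is determined from the two device-to-interrogator distances, so dividing the path difference by the speed of sound and scaling by the sample rate is an assumed (though standard) concretisation; the speed of sound `c = 343 m/s` is likewise an assumption.

```python
def delay_samples(d1, d2, fs, c=343.0):
    """Signal delay between the two microphones, in whole samples.

    d1: distance (m) from the interrogator-facing microphone to the
    interrogator; d2: distance (m) from the suspect-facing microphone
    to the interrogator; fs: sample rate in Hz. Dividing the path
    difference by the speed of sound c and scaling by fs is the
    standard acoustic-delay formula, assumed here since the patent
    does not state its exact computation.
    """
    return round((d2 - d1) / c * fs)

# e.g. a 0.686 m path difference at 16 kHz is about 32 samples
```

The reference signal would then be the interrogator channel shifted by this many samples before subtraction.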
In one possible implementation, the second processing module 33 is specifically configured to:
pre-process the interrogator voice data and the second voice data respectively, and determine the first voice matrix G1 corresponding to the interrogator voice data and the second voice matrix G2 corresponding to the second voice data;
determine the third voice matrix G3 from the first voice matrix G1 and the second voice matrix G2, where G3 = G2 − λG1 and λ is a weight coefficient;
perform dimension reduction on the third voice matrix G3 to convert it into the discrete voice array Xs, restore the discrete voice array Xs to continuous voice data according to the preset sampling period, and take the restored voice data as the interrogated-person voice data;
wherein the pre-processing includes:
performing discrete sampling on the voice data according to the preset sampling period to obtain the discrete voice array X represented in array form, the voice data being the interrogator voice data or the second voice data; X = [x_1, x_2, … x_j, …, x_n], where x_j denotes the sampled value of the voice data at the j-th sampling point and n is the total number of samples;
extending the discrete voice array X row-wise as one row into the m×n discrete voice matrix M, where m is an odd number, the middle row of M equals X, the value of m_{i,j} is negatively correlated with its distance from the middle row, and m_{i,j} denotes the element in row i, column j of M;
determining the k×k reference matrix H, where k is an odd number greater than 1 and the element in row i, column j of H is given by an expression parameterised by the adjustment coefficient σ;
performing error-reduction processing on the discrete voice matrix M according to the reference matrix H to determine the processed voice matrix G, whose element in row x, column y is g_{x,y};
wherein, when the voice data is the interrogator voice data, the determined voice matrix G is the first voice matrix G1, and when the voice data is the second voice data, the determined voice matrix G is the second voice matrix G2.
In the device for separating voice during interrogation provided by this embodiment of the present invention, two groups of voice data are acquired by the two audio acquisition devices, and one group is used as a reference signal to cancel the corresponding component from the other group, thereby achieving voice separation. Using two groups of voice data effectively reduces cross-channel interference from the interrogator, so that the interrogator's speech and the interrogated person's speech are properly separated; speech recognition can then correctly identify both voices, an interrogation record can be generated automatically, interrogation efficiency is improved, and labour cost is saved. Highlighting the text corresponding to overlapping portions lets the user quickly locate positions where speech recognition may be wrong, so the user can quickly audit whether the recognised text is accurate. Delay processing makes the removal of the interrogator voice data from the second voice data more accurate, so more accurate interrogated-person voice data can be obtained. Extending the voice data from array form to matrix form makes it convenient to process; performing dimension reduction after determining the third voice matrix corresponding to the interrogated-person voice data yields the interrogated-person voice data, removing the interrogator component in a higher dimension and effectively reducing the distortion caused by the removal.
Obviously, those skilled in the art can make various changes and modifications to the present invention without departing from its spirit and scope. If these modifications and variations fall within the scope of the claims of the present invention and their technical equivalents, the present invention is also intended to include them.

Claims (10)

1. A method of separating voice during interrogation, characterized by comprising:
obtaining first voice data acquired by a first audio acquisition device and second voice data acquired by a second audio acquisition device, the first audio acquisition device being directed at an interrogator and the second audio acquisition device being directed at an interrogated person;
filtering the first voice data to determine interrogator voice data corresponding to the interrogator;
taking the interrogator voice data as a reference signal, removing the interrogator voice data from the second voice data, and determining interrogated-person voice data in the second voice data.
2. The method according to claim 1, characterized by further comprising:
recognising the interrogator voice data and the interrogated-person voice data respectively, and determining corresponding interrogator text and interrogated-person text.
3. The method according to claim 2, characterized in that determining the corresponding interrogator text and interrogated-person text comprises:
determining timestamps of the interrogator voice data and the interrogated-person voice data, and adding the corresponding timestamps to the interrogator text and the interrogated-person text respectively according to the timestamps, each timestamp comprising a start timestamp and an end timestamp;
determining overlapping portions of the interrogator text and the interrogated-person text according to the timestamps, and highlighting the text corresponding to the overlapping portions.
4. The method according to claim 1, characterized in that taking the interrogator voice data as the reference signal and removing the interrogator voice data from the second voice data comprises:
determining a signal delay according to the distance between the first audio acquisition device and the interrogator and the distance between the second audio acquisition device and the interrogator;
performing delay processing on the interrogator voice data according to the signal delay, taking the delayed interrogator voice data as the reference signal, and removing the interrogator voice data from the second voice data.
5. The method according to claim 1, characterized in that taking the interrogator voice data as the reference signal and removing the interrogator voice data from the second voice data comprises:
pre-processing the interrogator voice data and the second voice data respectively, and determining a first voice matrix G1 corresponding to the interrogator voice data and a second voice matrix G2 corresponding to the second voice data;
determining a third voice matrix G3 from the first voice matrix G1 and the second voice matrix G2, where G3 = G2 − λG1 and λ is a weight coefficient;
performing dimension reduction on the third voice matrix G3 to convert it into a discrete voice array Xs, restoring the discrete voice array Xs to continuous voice data according to a preset sampling period, and taking the restored voice data as the interrogated-person voice data;
wherein the pre-processing comprises:
performing discrete sampling on the voice data according to the preset sampling period to obtain the discrete voice array X represented in array form, the voice data being the interrogator voice data or the second voice data; X = [x_1, x_2, … x_j, …, x_n], where x_j denotes the sampled value of the voice data at the j-th sampling point and n is the total number of samples;
extending the discrete voice array X row-wise as one row into the m×n discrete voice matrix M, where m is an odd number, the middle row of M equals X, the value of m_{i,j} is negatively correlated with its distance from the middle row, and m_{i,j} denotes the element in row i, column j of M;
determining the k×k reference matrix H, where k is an odd number greater than 1 and the element in row i, column j of H is given by an expression parameterised by the adjustment coefficient σ;
performing error-reduction processing on the discrete voice matrix M according to the reference matrix H to determine the processed voice matrix G, whose element in row x, column y is g_{x,y};
wherein, when the voice data is the interrogator voice data, the determined voice matrix G is the first voice matrix G1, and when the voice data is the second voice data, the determined voice matrix G is the second voice matrix G2.
6. A device for separating voice during interrogation, characterized by comprising:
an acquisition module, configured to obtain first voice data acquired by a first audio acquisition device and second voice data acquired by a second audio acquisition device, the first audio acquisition device being directed at an interrogator and the second audio acquisition device being directed at an interrogated person;
a first processing module, configured to filter the first voice data and determine interrogator voice data corresponding to the interrogator;
a second processing module, configured to take the interrogator voice data as a reference signal, remove the interrogator voice data from the second voice data, and determine interrogated-person voice data in the second voice data.
7. The device according to claim 6, characterized by further comprising:
an identification module, configured to recognise the interrogator voice data and the interrogated-person voice data respectively, and determine corresponding interrogator text and interrogated-person text.
8. The device according to claim 7, characterized in that the identification module is further configured to:
determine timestamps of the interrogator voice data and the interrogated-person voice data, and add the corresponding timestamps to the interrogator text and the interrogated-person text respectively according to the timestamps, each timestamp comprising a start timestamp and an end timestamp;
determine overlapping portions of the interrogator text and the interrogated-person text according to the timestamps, and highlight the text corresponding to the overlapping portions.
9. The device according to claim 6, characterized in that the second processing module is configured to:
determine a signal delay according to the distance between the first audio acquisition device and the interrogator and the distance between the second audio acquisition device and the interrogator;
perform delay processing on the interrogator voice data according to the signal delay, take the delayed interrogator voice data as the reference signal, and remove the interrogator voice data from the second voice data.
10. The device according to claim 6, characterized in that the second processing module is configured to:
pre-process the interrogator voice data and the second voice data respectively, and determine a first voice matrix G1 corresponding to the interrogator voice data and a second voice matrix G2 corresponding to the second voice data;
determine a third voice matrix G3 from the first voice matrix G1 and the second voice matrix G2, where G3 = G2 − λG1 and λ is a weight coefficient;
perform dimension reduction on the third voice matrix G3 to convert it into a discrete voice array Xs, restore the discrete voice array Xs to continuous voice data according to a preset sampling period, and take the restored voice data as the interrogated-person voice data;
wherein the pre-processing comprises:
performing discrete sampling on the voice data according to the preset sampling period to obtain the discrete voice array X represented in array form, the voice data being the interrogator voice data or the second voice data; X = [x_1, x_2, … x_j, …, x_n], where x_j denotes the sampled value of the voice data at the j-th sampling point and n is the total number of samples;
extending the discrete voice array X row-wise as one row into the m×n discrete voice matrix M, where m is an odd number, the middle row of M equals X, the value of m_{i,j} is negatively correlated with its distance from the middle row, and m_{i,j} denotes the element in row i, column j of M;
determining the k×k reference matrix H, where k is an odd number greater than 1 and the element in row i, column j of H is given by an expression parameterised by the adjustment coefficient σ;
performing error-reduction processing on the discrete voice matrix M according to the reference matrix H to determine the processed voice matrix G, whose element in row x, column y is g_{x,y};
wherein, when the voice data is the interrogator voice data, the determined voice matrix G is the first voice matrix G1, and when the voice data is the second voice data, the determined voice matrix G is the second voice matrix G2.
CN201810106940.7A 2018-02-02 2018-02-02 Method and device for separating voice during interrogation Active CN108198570B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810106940.7A CN108198570B (en) 2018-02-02 2018-02-02 Method and device for separating voice during interrogation


Publications (2)

Publication Number Publication Date
CN108198570A true CN108198570A (en) 2018-06-22
CN108198570B CN108198570B (en) 2020-10-23

Family

ID=62592089

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810106940.7A Active CN108198570B (en) 2018-02-02 2018-02-02 Method and device for separating voice during interrogation

Country Status (1)

Country Link
CN (1) CN108198570B (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109065023A (en) * 2018-08-23 2018-12-21 广州势必可赢网络科技有限公司 A kind of voice identification method, device, equipment and computer readable storage medium
CN109785855A (en) * 2019-01-31 2019-05-21 秒针信息技术有限公司 Method of speech processing and device, storage medium, processor
CN110689900A (en) * 2019-09-29 2020-01-14 北京地平线机器人技术研发有限公司 Signal enhancement method and device, computer readable storage medium and electronic equipment
CN111128212A (en) * 2019-12-09 2020-05-08 秒针信息技术有限公司 Mixed voice separation method and device
CN111145774A (en) * 2019-12-09 2020-05-12 秒针信息技术有限公司 Voice separation method and device
WO2023242841A1 (en) * 2022-06-13 2023-12-21 Orcam Technologies Ltd. Processing and utilizing audio signals

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO1998004100A1 (en) * 1996-07-19 1998-01-29 David Griesinger Multichannel active matrix sound reproduction with maximum lateral separation
JP3695324B2 (en) * 2000-12-12 2005-09-14 日本電気株式会社 Information system using TV broadcasting
US20070133811A1 (en) * 2005-12-08 2007-06-14 Kabushiki Kaisha Kobe Seiko Sho Sound source separation apparatus and sound source separation method
CN101964192A (en) * 2009-07-22 2011-02-02 索尼公司 Sound processing device, sound processing method, and program
KR20110127783A (en) * 2010-05-20 2011-11-28 충북대학교 산학협력단 Apparatus for separating voice and method for separating voice of single channel using the same
CN103106903A (en) * 2013-01-11 2013-05-15 太原科技大学 Single channel blind source separation method
CN103247295A (en) * 2008-05-29 2013-08-14 高通股份有限公司 Systems, methods, apparatus, and computer program products for spectral contrast enhancement
CN104408042A (en) * 2014-10-17 2015-03-11 广州三星通信技术研究有限公司 Method and device for displaying a text corresponding to voice of a dialogue in a terminal
CN104505099A (en) * 2014-12-08 2015-04-08 北京云知声信息技术有限公司 Method and equipment for removing known interference in voice signal
CN106448722A (en) * 2016-09-14 2017-02-22 科大讯飞股份有限公司 Sound recording method, device and system
CN106887238A (en) * 2017-03-01 2017-06-23 中国科学院上海微系统与信息技术研究所 A kind of acoustical signal blind separating method based on improvement Independent Vector Analysis algorithm
CN107093438A (en) * 2012-06-18 2017-08-25 谷歌公司 System and method for recording selective removal audio content from mixed audio


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
SCOTT C. DOUGLAS, HIROSHI SAWADA, SHOJI MAKINO: "Natural Gradient Multichannel Blind Deconvolution and Speech Separation Using Causal FIR Filters", IEEE International Conference on Acoustics *
ZHANG Hua, ZUO Jiancun, DAI Hong, GUI Lin: "Convolutive Blind Separation of Multichannel Speech Signals Based on Givens-Hyperbolic Double Rotation", Journal of Shanghai Second Polytechnic University *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109065023A (en) * 2018-08-23 2018-12-21 广州势必可赢网络科技有限公司 A kind of voice identification method, device, equipment and computer readable storage medium
CN109785855A (en) * 2019-01-31 2019-05-21 秒针信息技术有限公司 Method of speech processing and device, storage medium, processor
CN109785855B (en) * 2019-01-31 2022-01-28 秒针信息技术有限公司 Voice processing method and device, storage medium and processor
CN110689900A (en) * 2019-09-29 2020-01-14 北京地平线机器人技术研发有限公司 Signal enhancement method and device, computer readable storage medium and electronic equipment
CN111128212A (en) * 2019-12-09 2020-05-08 秒针信息技术有限公司 Mixed voice separation method and device
CN111145774A (en) * 2019-12-09 2020-05-12 秒针信息技术有限公司 Voice separation method and device
WO2023242841A1 (en) * 2022-06-13 2023-12-21 Orcam Technologies Ltd. Processing and utilizing audio signals

Also Published As

Publication number Publication date
CN108198570B (en) 2020-10-23

Similar Documents

Publication Publication Date Title
CN108198570A (en) The method and device of speech Separation during hearing
CN108922538B (en) Conference information recording method, conference information recording device, computer equipment and storage medium
CN103957359B (en) Camera head and focusing method thereof
Ortega-García et al. Overview of speech enhancement techniques for automatic speaker recognition
JP6462651B2 (en) Speech translation apparatus, speech translation method and program
DE60004331T2 (en) SPEAKER RECOGNITION
EP0825586A2 (en) Lexical tree pre-filtering in speech recognition
WO2002080139A3 (en) Method and apparatus for voice dictation and document production
JP4408490B2 (en) Method and apparatus for executing a database query
CN104778948B (en) A kind of anti-noise audio recognition method based on bending cepstrum feature
CN103730112A (en) Multi-channel voice simulation and acquisition method
JP5374427B2 (en) Sound source separation device, sound source separation method and program therefor, video camera device using the same, and mobile phone device with camera
CN105845143A (en) Speaker confirmation method and speaker confirmation system based on support vector machine
US6751588B1 (en) Method for performing microphone conversions in a speech recognition system
WO2000077772A2 (en) Speech and voice signal preprocessing
CN112017658A (en) Operation control system based on intelligent human-computer interaction
CN110718229A (en) Detection method for record playback attack and training method corresponding to detection model
CN113035225B (en) Visual voiceprint assisted voice separation method and device
JPS60158498A (en) Pattern collation system
CN113409804A (en) Multichannel frequency domain speech enhancement algorithm based on variable-span generalized subspace
US6212499B1 (en) Audible language recognition by successive vocabulary reduction
CN111028857B (en) Method and system for reducing noise of multichannel audio-video conference based on deep learning
CN109087651B (en) Voiceprint identification method, system and equipment based on video and spectrogram
CN210606618U (en) System for realizing voice and character recording
DE102010018877A1 (en) Method for voice-controlling of hearing aid i.e. behind-the-ear-hearing aid, involves interacting speech recognition and distinct voice detection, such that voice command spoken by wearer of hearing aid is used for voice-controlling aid

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
CP03 Change of name, title or address
CP03 Change of name, title or address

Address after: Room 101, 1st floor, building 1, Xisanqi building materials City, Haidian District, Beijing 100096

Patentee after: Yunzhisheng Intelligent Technology Co.,Ltd.

Address before: 12 / F, Guanjie building, building 1, No. 16, Taiyanggong Middle Road, Chaoyang District, Beijing

Patentee before: BEIJING UNISOUND INFORMATION TECHNOLOGY Co.,Ltd.