CN108198570B - Method and device for separating voice during interrogation - Google Patents


Info

Publication number
CN108198570B
Authority
CN
China
Prior art keywords
voice data
trial
voice
matrix
determining
Prior art date
Legal status
Active
Application number
CN201810106940.7A
Other languages
Chinese (zh)
Other versions
CN108198570A (en)
Inventor
马金龙
关海欣
Current Assignee
Unisound Intelligent Technology Co Ltd
Original Assignee
Beijing Yunzhisheng Information Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Yunzhisheng Information Technology Co Ltd
Priority to CN201810106940.7A
Publication of CN108198570A
Application granted
Publication of CN108198570B
Legal status: Active

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00: Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02: Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0272: Voice signal separating
    • G10L17/00: Speaker identification or verification techniques
    • G10L17/02: Preprocessing operations, e.g. segment selection; Pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; Feature selection or extraction

Landscapes

  • Engineering & Computer Science (AREA)
  • Acoustics & Sound (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computational Linguistics (AREA)
  • Quality & Reliability (AREA)
  • Signal Processing (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Electrically Operated Instructional Devices (AREA)

Abstract

The invention provides a method and a device for separating voice during interrogation. The method comprises: acquiring first voice data collected by a first audio acquisition device and second voice data collected by a second audio acquisition device, wherein the first audio acquisition device is pointed at the interrogator and the second audio acquisition device is pointed at the interrogated person; filtering the first voice data to determine the interrogator voice data corresponding to the interrogator; and, taking the interrogator voice data as a reference signal, removing the interrogator voice data from the second voice data to determine the interrogated person's voice data in the second voice data. By using the two sets of voice data, the method effectively reduces cross-channel interference from the interrogator, so that the speech signals of the interrogator and the interrogated person are correctly separated; speech recognition can then correctly transcribe both speakers, the interrogation record can be generated automatically, interrogation efficiency is improved, and labor cost is saved.

Description

Method and device for separating voice during interrogation
Technical Field
The invention relates to the technical field of voice separation, and in particular to a method and a device for voice separation during interrogation.
Background
At present, interrogation records in judicial settings (for example, criminal interrogations) are generally produced as written transcripts, which is inefficient, requires manual work, and wastes manpower and material resources. Meanwhile, owing to the inherent constraints of interrogation scenes, the signal captured by a microphone often contains several speakers, so directly recognizing the captured speech cannot effectively distinguish the speakers; moreover, the interrogated person's voice is often too quiet or captured from far away. For these reasons, most interrogations today are still recorded by hand.
Disclosure of Invention
The invention provides a method and a device for separating voice during interrogation, to address the inefficiency of recording interrogation transcripts manually.
An embodiment of the invention provides a method for separating voice during interrogation, comprising:
acquiring first voice data collected by a first audio acquisition device and second voice data collected by a second audio acquisition device, wherein the first audio acquisition device is pointed at the interrogator and the second audio acquisition device is pointed at the interrogated person;
filtering the first voice data to determine the interrogator voice data corresponding to the interrogator;
and, taking the interrogator voice data as a reference signal, removing the interrogator voice data from the second voice data to determine the interrogated person's voice data in the second voice data.
In one possible implementation, the method further includes:
recognizing the interrogator voice data and the interrogated person's voice data respectively, and determining the corresponding interrogator text and interrogated-person text.
In one possible implementation, determining the corresponding interrogator text and interrogated-person text comprises:
determining timestamps of the interrogator voice data and the interrogated person's voice data, and adding corresponding timestamps to the interrogator text and the interrogated-person text respectively, wherein the timestamps comprise a start timestamp and an end timestamp;
and determining the overlapping part of the interrogator text and the interrogated-person text according to the timestamps, and highlighting the text corresponding to the overlapping part.
In a possible implementation, removing the interrogator voice data from the second voice data by taking it as a reference signal comprises:
determining the signal delay according to the distance between the first audio acquisition device and the interrogator and the distance between the second audio acquisition device and the interrogator;
and delaying the interrogator voice data by the signal delay, taking the delayed interrogator voice data as the reference signal, and removing the interrogator voice data from the second voice data.
In a possible implementation, removing the interrogator voice data from the second voice data by taking it as a reference signal comprises:
preprocessing the interrogator voice data and the second voice data respectively, and determining a first voice matrix G1 corresponding to the interrogator voice data and a second voice matrix G2 corresponding to the second voice data;
determining a third voice matrix G3 from the first voice matrix G1 and the second voice matrix G2, wherein G3 = G2 - λG1 and λ is a weight coefficient;
performing dimensionality reduction on the third voice matrix G3 to convert it into a discrete voice array Xs, restoring the discrete voice array Xs into continuous voice data according to a preset sampling period, and taking the restored voice data as the interrogated person's voice data;
wherein the preprocessing comprises:
performing discrete sampling on the voice data according to the preset sampling period to obtain a discrete voice array X = [x1, x2, …, xj, …, xn] represented in array form, wherein the voice data is the interrogator voice data or the second voice data, xj represents the sampling value of the voice data at the jth sampling point, and n is the total number of samples;
performing row expansion by taking the discrete voice array X as a row, and determining an m × n discrete voice matrix M after row expansion, wherein m is an odd number, the middle row of M (row (m+1)/2) is the array X itself, and the value of m_{i,j} is negatively correlated with the distance |i - (m+1)/2| from the middle row, m_{i,j} denoting the element in the ith row and jth column of the discrete voice matrix M;
determining a k × k reference matrix H, wherein k is an odd number greater than 1 and the element in the ith row and jth column of H is given by a formula (presented as an image in the original) parameterized by an adjustment coefficient σ;
performing difference-reduction processing on the discrete voice matrix M according to the reference matrix H and determining the difference-reduced voice matrix G, the element g_{x,y} in the xth row and yth column of G being computed from the elements of M in the k × k window around m_{x,y}, weighted by H (the formula is presented as an image in the original);
when the voice data is the interrogator voice data, the determined voice matrix G is the first voice matrix G1; and when the voice data is the second voice data, the determined voice matrix G is the second voice matrix G2.
Based on the same inventive concept, an embodiment of the present invention further provides a device for separating voice during interrogation, comprising:
an acquisition module, configured to acquire first voice data collected by a first audio acquisition device and second voice data collected by a second audio acquisition device, wherein the first audio acquisition device is pointed at the interrogator and the second audio acquisition device is pointed at the interrogated person;
a first processing module, configured to filter the first voice data and determine the interrogator voice data corresponding to the interrogator;
and a second processing module, configured to take the interrogator voice data as a reference signal, remove the interrogator voice data from the second voice data, and determine the interrogated person's voice data in the second voice data.
In one possible implementation, the apparatus further includes:
and a recognition module, configured to recognize the interrogator voice data and the interrogated person's voice data respectively and determine the corresponding interrogator text and interrogated-person text.
In one possible implementation, the identification module is further configured to:
determine timestamps of the interrogator voice data and the interrogated person's voice data, and add corresponding timestamps to the interrogator text and the interrogated-person text respectively, wherein the timestamps comprise a start timestamp and an end timestamp;
and determine the overlapping part of the interrogator text and the interrogated-person text according to the timestamps, and highlight the text corresponding to the overlapping part.
In one possible implementation manner, the second processing module is configured to:
determine the signal delay according to the distance between the first audio acquisition device and the interrogator and the distance between the second audio acquisition device and the interrogator;
and delay the interrogator voice data by the signal delay, take the delayed interrogator voice data as the reference signal, and remove the interrogator voice data from the second voice data.
In one possible implementation manner, the second processing module is configured to:
preprocess the interrogator voice data and the second voice data respectively, and determine a first voice matrix G1 corresponding to the interrogator voice data and a second voice matrix G2 corresponding to the second voice data;
determine a third voice matrix G3 from the first voice matrix G1 and the second voice matrix G2, wherein G3 = G2 - λG1 and λ is a weight coefficient;
perform dimensionality reduction on the third voice matrix G3 to convert it into a discrete voice array Xs, restore the discrete voice array Xs into continuous voice data according to a preset sampling period, and take the restored voice data as the interrogated person's voice data;
wherein the preprocessing comprises:
performing discrete sampling on the voice data according to the preset sampling period to obtain a discrete voice array X = [x1, x2, …, xj, …, xn] represented in array form, wherein the voice data is the interrogator voice data or the second voice data, xj represents the sampling value of the voice data at the jth sampling point, and n is the total number of samples;
performing row expansion by taking the discrete voice array X as a row, and determining an m × n discrete voice matrix M after row expansion, wherein m is an odd number, the middle row of M is the array X itself, and the value of m_{i,j} is negatively correlated with the distance |i - (m+1)/2| from the middle row, m_{i,j} denoting the element in the ith row and jth column of the discrete voice matrix M;
determining a k × k reference matrix H, wherein k is an odd number greater than 1 and the element in the ith row and jth column of H is given by a formula (presented as an image in the original) parameterized by an adjustment coefficient σ;
performing difference-reduction processing on the discrete voice matrix M according to the reference matrix H and determining the difference-reduced voice matrix G, the element g_{x,y} in the xth row and yth column of G being computed from the elements of M in the k × k window around m_{x,y}, weighted by H (the formula is presented as an image in the original);
when the voice data is the interrogator voice data, the determined voice matrix G is the first voice matrix G1; and when the voice data is the second voice data, the determined voice matrix G is the second voice matrix G2.
The embodiments of the invention provide a method and a device for separating voice during interrogation: two sets of voice data are obtained from two audio acquisition devices, and one set is used as a reference signal to cancel its contribution to the other, thereby achieving voice separation. Using two sets of voice data effectively reduces cross-channel interference from the interrogator, so the speech signals of the interrogator and the interrogated person are correctly separated; speech recognition can then correctly transcribe both speakers, the interrogation record can be generated automatically, interrogation efficiency is improved, and labor cost is saved. Highlighting the text corresponding to the overlapping part lets the user quickly locate positions where speech recognition may have erred and conveniently verify whether the recognized text is accurate. The delay processing allows the interrogator voice data to be removed from the second voice data more accurately, yielding more accurate voice data for the interrogated person. Expanding the voice data from array form into matrix form makes it easier to process; after the third voice matrix corresponding to the interrogated person's voice data is determined, dimensionality reduction yields that voice data. Removing the interrogator voice data from the second voice data in this higher-dimensional form effectively reduces the distortion that the removal would otherwise cause.
Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.
The technical solution of the present invention is further described in detail by the accompanying drawings and embodiments.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention and not to limit the invention. In the drawings:
FIG. 1 is a flow chart of a method for speech separation during interrogation according to an embodiment of the present invention;
FIG. 2 is a diagram illustrating the determination of overlapping text in an embodiment of the present invention;
FIG. 3 is a first block diagram of an apparatus for speech separation during interrogation in accordance with an embodiment of the present invention;
FIG. 4 is a second block diagram of an apparatus for speech separation during interrogation according to an embodiment of the present invention.
Detailed Description
The preferred embodiments of the present invention are described below in conjunction with the accompanying drawings; it should be understood that they serve only to illustrate and explain the invention, not to limit it.
The voice separation method provided by the embodiment of the invention achieves voice separation based on two audio acquisition devices (for example, microphones). Specifically, referring to FIG. 1, the method includes steps 101-103:
step 101: the method comprises the steps of acquiring first voice data acquired by a first audio acquisition device and second voice data acquired by a second audio acquisition device, wherein the first audio acquisition device is a device pointing to an auditor, and the second audio acquisition device is a device pointing to a polled person.
In the embodiment of the invention, two audio acquisition devices are provided: a first audio acquisition device and a second audio acquisition device. The first audio acquisition device is pointed at the interrogator and collects the interrogator's voice; because it can be placed close to the interrogator and far from the interrogated person, the data it collects is essentially the interrogator's voice data.
The second audio acquisition device is pointed at the interrogated person. For safety reasons it cannot be placed too close to the interrogated person; its distance to the interrogated person may even exceed its distance to the interrogator. Consequently, although the second audio acquisition device is pointed at the interrogated person, it still collects mixed voice data of both the interrogator and the interrogated person. It should be noted that, in the embodiment of the invention, "the first audio acquisition device is pointed at the interrogator" means that its voice capture direction (e.g., of a microphone) faces the interrogator and that its distance to the interrogator is smaller than the second audio acquisition device's distance to the interrogator.
Step 102: and carrying out filtering processing on the first voice data, and determining the trial voice data corresponding to the trial person.
In the embodiment of the present invention, as described above, since the first audio acquisition device can be close to the auditor and far from the auditor, the first voice data acquired by the first audio acquisition device is basically the voice data of the auditor, and at this time, the auditor voice data of the auditor can be determined by simply performing filtering and difference reduction processing on the first voice data.
Step 103: and taking the trial voice data as a reference signal, removing the trial voice data in the second voice data, and determining the trial voice data in the second voice data.
The second voice data collected by the second audio collecting device comprises voice data of the auditor and the audited person, and at the moment, voice separation is needed to be carried out on the second voice data. In the embodiment of the invention, the auditor voice data acquired by the first audio acquisition device is used as the reference signal, so that the voice part of the auditor in the second voice data can be removed, and the auditor voice data in the second voice data can be determined.
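The description at this point leaves the concrete cancellation algorithm open (a matrix-based scheme is detailed in a later embodiment). As an illustration only, a normalized LMS (NLMS) adaptive filter is one standard way to subtract a known reference signal from a mixed channel; the function names and parameter values below are illustrative, not taken from the patent.

```python
# Illustrative sketch, not the patented method: cancel the interrogator
# reference channel from the mixed second channel with an NLMS filter.

def nlms_cancel(mixture, reference, taps=4, mu=0.5, eps=1e-8):
    """Subtract the best linear estimate of `reference` from `mixture`."""
    w = [0.0] * taps                      # adaptive filter weights
    out = []
    for n in range(len(mixture)):
        # current window of the reference signal (zero-padded at the start)
        x = [reference[n - k] if n - k >= 0 else 0.0 for k in range(taps)]
        y = sum(wk * xk for wk, xk in zip(w, x))    # estimated leakage
        e = mixture[n] - y                          # residual = separated voice
        norm = sum(xk * xk for xk in x) + eps
        w = [wk + mu * e * xk / norm for wk, xk in zip(w, x)]
        out.append(e)
    return out

# Sanity check: if the mixture IS the reference, the residual converges to zero.
ref = [1.0, -1.0] * 50
residual = nlms_cancel(ref, ref)
print(abs(residual[-1]) < 0.1)
```

In a real deployment the residual would be the interrogated person's voice plus whatever the filter could not model; the later matrix-based embodiment is the patent's own removal scheme.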
In the method for separating voice during interrogation provided by the embodiment of the invention, two sets of voice data are obtained from the two audio acquisition devices, and one set is used as a reference signal to cancel its contribution to the other, thereby achieving voice separation. Using two sets of voice data effectively reduces cross-channel interference from the interrogator, so the speech signals of the interrogator and the interrogated person are correctly separated; speech recognition can then correctly transcribe both speakers, the interrogation record can be generated automatically, interrogation efficiency is improved, and labor cost is saved.
In one possible implementation, the method further includes: recognizing the interrogator voice data and the interrogated person's voice data respectively, and determining the corresponding interrogator text and interrogated-person text.
In the embodiment of the invention, the interrogator voice data contains only the interrogator's voice, and the corresponding interrogator text can be determined by speech recognition; similarly, the interrogated person's voice data contains only the interrogated person's voice, and the corresponding interrogated-person text can likewise be determined by speech recognition. The interrogation record can then be generated from the interrogator text and the interrogated-person text.
In a possible implementation, because the interrogator and the interrogated person may speak simultaneously, or one party may interrupt the other, the recognized text content may be problematic. In an embodiment of the invention, determining the corresponding interrogator text and interrogated-person text therefore includes steps A1-A2:
Step A1: determining timestamps of the interrogator voice data and the interrogated person's voice data, and adding corresponding timestamps to the interrogator text and the interrogated-person text respectively.
Step A2: determining the overlapping part of the interrogator text and the interrogated-person text according to the timestamps, and highlighting the text corresponding to the overlapping part.
In the embodiment of the invention, the timestamps of the two texts are determined from the timestamps of the voice data. A timestamp in a text indicates the time of a passage, a sentence, or a word. The whole text is divided by start and end timestamps: text lies between a start timestamp and the following end timestamp, and the interval from an end timestamp to the next start timestamp is blank. Text between a start timestamp and an end timestamp may also carry ordinary timestamps, which merely mark the time points of that text. In other words, the timestamps make it possible to determine during which time period any piece of the interrogator text or the interrogated-person text was collected. When the timestamps show that the interrogator text and the interrogated-person text overlap, both were collected during the same period; in that case the interrogated person's voice data obtained by using the interrogator voice data as a reference signal may suffer significant interference, so the corresponding text is highlighted to let the user check whether the recognition result is accurate.
As shown in FIG. 2, a rectangular wave indicates the presence or absence of text: a level of 1 means text is present and 0 means it is absent. Specifically, for the interrogator text, T1, T3, and T5 are start timestamps and T2, T4, and T6 are end timestamps; one text segment of the interrogator text lies in each of the intervals T1-T2, T3-T4, and T5-T6. Similarly, for the interrogated-person text, T7 and T9 are start timestamps and T8 and T10 are end timestamps, with one text segment in each of the intervals T7-T8 and T9-T10. As FIG. 2 shows, the overlap between the two texts is the segment T9-T4: the interrogator starts speaking at T3, and before finishing (at T4) the interrogated person interjects at T9, so both texts are collected during the period T9-T4, and the text corresponding to that period must be highlighted (for example, rendered with a highlight color). Optionally, because the interrogator voice data is less likely to be disturbed by the interrogated person, the interrogator text may be taken as correct and only the overlapping portion of the interrogated-person text need be highlighted.
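The FIG. 2 logic can be sketched as a simple interval-intersection check, assuming each text segment is stored as a (start, end) timestamp pair; the function and variable names are illustrative, not from the patent.

```python
# Sketch of the overlap detection of steps A1-A2: find every interval during
# which an interrogator segment and an interrogated-person segment coincide.

def overlapping_spans(interrogator_segments, interrogated_segments):
    """Return (start, end) intervals where segments from both speakers overlap."""
    overlaps = []
    for a_start, a_end in interrogator_segments:
        for b_start, b_end in interrogated_segments:
            start, end = max(a_start, b_start), min(a_end, b_end)
            if start < end:  # non-empty intersection => simultaneous speech
                overlaps.append((start, end))
    return overlaps

# Mirroring FIG. 2: the interrogator speaks over [3, 4] and the interrogated
# person interjects at 3.5, so the span [3.5, 4] would be highlighted.
interrogator = [(1, 2), (3, 4), (5, 6)]
interrogated = [(3.5, 4.5), (9, 10)]
print(overlapping_spans(interrogator, interrogated))  # [(3.5, 4)]
```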
Another embodiment of the present invention provides a method for separating voice during interrogation that includes steps 101-103 of the above embodiment; its implementation principle and technical effects are as in the embodiment corresponding to FIG. 1. In addition, in this embodiment, removing the interrogator voice data from the second voice data by taking it as a reference signal in step 103 includes steps B1-B2:
Step B1: determining the signal delay according to the distance between the first audio acquisition device and the interrogator and the distance between the second audio acquisition device and the interrogator.
Step B2: delaying the interrogator voice data by the signal delay, taking the delayed interrogator voice data as the reference signal, and removing the interrogator voice data from the second voice data.
Because the two audio acquisition devices are at different distances from the interrogator, the interrogator's voice actually reaches them with a relative delay. In the embodiment of the invention, the signal delay is determined from the distance between the first audio acquisition device and the interrogator and the distance between the second audio acquisition device and the interrogator; it can also be determined with an existing delay estimation algorithm. After the interrogator voice data is obtained, it is delayed by the signal delay, eliminating the offset between the signals collected by the two devices, so the interrogator voice data can be removed from the second voice data more accurately and more accurate voice data for the interrogated person obtained.
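Steps B1-B2 can be sketched as follows, under the assumption (not stated in the patent, which only says the delay follows from the two distances) that the delay is the path-length difference divided by the speed of sound, applied as a whole-sample shift; all names and parameter values are illustrative.

```python
# Sketch of delay computation (step B1) and delay processing (step B2).

def delay_in_samples(d1_m, d2_m, fs_hz=16000, c_mps=343.0):
    """Delay between the two capture paths, rounded to whole samples."""
    return round(fs_hz * (d2_m - d1_m) / c_mps)

def align_reference(reference, delay_samples):
    """Delay the reference signal by prepending zeros (positive delay only)."""
    return [0.0] * delay_samples + reference[:len(reference) - delay_samples]

d = delay_in_samples(0.3, 2.0)            # mic 1 at 0.3 m, mic 2 at 2.0 m
print(d)                                  # 79 samples at 16 kHz
print(align_reference([1.0, 2.0, 3.0, 4.0], 2))  # [0.0, 0.0, 1.0, 2.0]
```

The aligned reference would then be fed to whatever removal step follows (e.g., the matrix scheme of steps C1-C3).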
On the basis of the above embodiment, "taking the interrogator voice data as the reference signal and removing it from the second voice data" in step 103 includes steps C1-C3:
Step C1: preprocessing the interrogator voice data and the second voice data respectively, and determining a first voice matrix G1 corresponding to the interrogator voice data and a second voice matrix G2 corresponding to the second voice data.
Step C2: determining a third voice matrix G3 from the first voice matrix G1 and the second voice matrix G2, where G3 = G2 - λG1 and λ is a weight coefficient.
Step C3: performing dimensionality reduction on the third voice matrix G3 to convert it into a discrete voice array Xs, restoring Xs into continuous voice data according to the preset sampling period, and taking the restored voice data as the interrogated person's voice data.
The preprocessing in step C1 specifically comprises steps C11-C14:
Step C11: performing discrete sampling on the voice data according to the preset sampling period to obtain a discrete voice array X = [x1, x2, …, xj, …, xn] represented in array form, where the voice data is the interrogator voice data or the second voice data, xj is the sampling value of the voice data at the jth sampling point, and n is the total number of samples.
In the embodiment of the invention, the voice data (the interrogator voice data or the second voice data) is continuous audio data; discrete sampling determines the voice features it contains, and recording it as a discrete voice array in array form facilitates subsequent processing. The preset sampling period here is the same as that in step C3; that is, sampling the voice data with the preset sampling period yields n sampling values.
Step C12: performing row expansion by taking the discrete voice array X as a row, and determining an M multiplied by n discrete voice matrix M after row expansion; wherein m is an odd number,
Figure BDA0001568016200000101
and m isi,jValue of and
Figure BDA0001568016200000102
m is a negative correlation between themi,jRepresenting the elements in the ith row and jth column of the discrete speech matrix M.
In the embodiment of the present invention, the discrete voice array X may also be regarded as a 1 × n matrix, and row expansion expands it into an m × n discrete voice matrix M whose middle row is X itself, i.e. m_{(m+1)/2, j} = x_j. For the other elements, the value of m_{i,j} is negatively correlated with the distance |i - (m+1)/2| from the middle row: the farther an element lies from the middle row, the smaller its value. Alternative concrete formulas for m_{i,j} are given as images in the original; m_{i,j} may also be determined in other ways, and this embodiment is not limited in this respect.
Step C13: determining a k × k reference matrix H; k is an odd number greater than 1, and the element in the ith row and the jth column of the reference matrix H is:
Figure BDA0001568016200000117
where σ is an adjustment coefficient.
Because errors are introduced when the originally collected voice data is expanded into the discrete voice matrix M, these errors need to be reduced or even eliminated at this stage. In the embodiment of the invention, the error in the discrete voice matrix M is removed by establishing the k-order reference matrix H. The larger the adjustment coefficient σ, the more obvious the error-reduction effect, but the more easily the discrete voice matrix M is distorted; therefore the adjustment coefficient σ is generally selected according to the actual situation, for example σ = 0.8.
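As the patent's element formula for H survives only as an image, the sketch below assumes a Gaussian kernel in the distance from the centre element, normalised to sum to 1; both the Gaussian form and the normalisation are assumptions:

```python
import numpy as np

def reference_matrix(k, sigma):
    """Build a k x k reference matrix H centred on row/column (k+1)/2.
    A Gaussian kernel in the distance from the centre, normalised to
    sum to 1, is assumed here as one plausible reading."""
    assert k % 2 == 1 and k > 1
    c = (k + 1) / 2                        # 1-based centre index
    i, j = np.mgrid[1:k + 1, 1:k + 1]      # 1-based row/column index grids
    H = np.exp(-((i - c) ** 2 + (j - c) ** 2) / (2 * sigma ** 2))
    return H / H.sum()                     # keep the overall scale of M

H = reference_matrix(3, 0.8)               # e.g. sigma = 0.8 as in the text
```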
Step C14: performing difference reduction processing on the discrete voice matrix M according to the reference matrix H, and determining the voice matrix G after the difference reduction processing; the element g(x,y) in the xth row and yth column of the voice matrix G is given by a formula (available only as an image in the original) that combines the reference matrix H with the elements of the discrete voice matrix M.
when the voice data is trial voice data, the determined voice matrix G is a first voice matrix G1; when the voice data is the second voice data, the determined voice matrix G is the second voice matrix G2.
In the embodiment of the invention, the difference reduction processing is performed on an element m(x,y) according to the reference matrix H when the k × k window centred on m(x,y) fits entirely inside the discrete voice matrix M (the two boundary conditions on x and y are given as formula images in the original). For the border elements m(x,y) of the discrete voice matrix M, for which the window does not fit, the difference reduction processing is not performed; however, since the original discrete voice array X is located in the middle row of the discrete voice matrix M, the influence of these peripheral elements on the discrete voice array X can be ignored even without the difference reduction processing.
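The interior-only application of H described above resembles a windowed correlation; a sketch, with the pass-through border rule and the element-wise combination of H and M as assumptions:

```python
import numpy as np

def difference_reduction(M, H):
    """Apply the k x k reference matrix H to every interior element of M
    (where the window fits inside the matrix), leaving border elements
    unchanged -- a windowed-correlation sketch of the patent's step."""
    m, n = M.shape
    k = H.shape[0]
    r = (k - 1) // 2                      # half-width of the k x k window
    G = M.copy()                          # border elements pass through
    for x in range(r, m - r):             # interior rows only
        for y in range(r, n - r):         # interior columns only
            G[x, y] = np.sum(H * M[x - r:x + r + 1, y - r:y + r + 1])
    return G

# With a delta kernel (1 at the centre, 0 elsewhere) the matrix is unchanged
M = np.arange(20, dtype=float).reshape(4, 5)
delta = np.zeros((3, 3))
delta[1, 1] = 1.0
G = difference_reduction(M, delta)
```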
After the first voice matrix G1 and the second voice matrix G2 are obtained, the audited voice data can be determined according to the above steps C2 and C3. The weight coefficient λ is a real number between 0 and 1, i.e. λ is at most 1; specifically, λ may be determined according to the distance between the two audio acquisition devices: the larger the distance between the two audio acquisition devices, the smaller λ is. Meanwhile, the third voice matrix G3 is converted into the discrete voice array Xs; specifically, the elements of the middle row of G3 may be used directly as the discrete voice array Xs, or each element of the discrete voice array Xs may be determined from all the elements of the corresponding column of the third voice matrix G3.
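Steps C2 and C3 referenced above (G3 = G2 - λ·G1, then taking the middle row of G3 as Xs) can be sketched as follows, with toy matrices standing in for the two preprocessed recordings:

```python
import numpy as np

def remove_reference(G1, G2, lam):
    """Subtract the weighted trial-voice matrix G1 from the second-voice
    matrix G2 (G3 = G2 - lam * G1, 0 <= lam <= 1) and take the middle
    row of G3 as the discrete array Xs of the audited voice data."""
    G3 = G2 - lam * G1                    # the weighted subtraction step
    mid = G3.shape[0] // 2                # 0-based index of the middle row
    return G3[mid]

# Toy 3 x 4 matrices standing in for the preprocessed recordings
G1 = np.ones((3, 4))
G2 = np.full((3, 4), 2.0)
Xs = remove_reference(G1, G2, 0.5)        # every element: 2 - 0.5*1 = 1.5
```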
In the embodiment of the invention, the voice data is first subjected to dimension-increasing processing, expanding the voice data represented in array form into matrix form, which facilitates its processing; after the third voice matrix corresponding to the audited voice data is determined, dimension-reduction processing is performed to obtain the audited voice data. In this way the trial voice data is removed from the second voice data in a higher-dimensional representation, which can effectively reduce the distortion caused by the removal.
In the method for separating voice during interrogation provided by the embodiment of the invention, two groups of voice data are respectively obtained through the two audio acquisition devices, and one group of voice data is used as a reference signal to cancel it from the other group, thereby realizing voice separation. The two groups of voice data effectively reduce the interference of the auditor's channel, so that the conversation signals of the auditor and the audited person are correctly separated; the voices of both can then be correctly recognized by voice recognition, the trial record can be generated automatically, trial efficiency is improved, and labor cost is saved. By highlighting the text corresponding to the overlapped part, the user can quickly locate positions where the voice recognition may be wrong, making it convenient to check and confirm whether the recognized text is accurate. Through the delay processing, the trial voice data can be removed from the second voice data more accurately, so that more accurate audited voice data is obtained. Expanding the voice data represented in array form into matrix form facilitates its processing; after the third voice matrix corresponding to the audited voice data is determined, dimension-reduction processing is performed to obtain the audited voice data. This removes the trial voice data from the second voice data in a higher-dimensional representation, which can effectively reduce the distortion caused by the removal.
The method flow of voice separation during interrogation has been described in detail above; the method can also be implemented by a corresponding device, whose structure and function are described in detail below.
An apparatus for separating speech during interrogation according to an embodiment of the present invention is shown in fig. 3, and includes:
the acquisition module 31 is used for acquiring first voice data acquired by a first audio acquisition device and second voice data acquired by a second audio acquisition device, wherein the first audio acquisition device is a device pointing to an auditor, and the second audio acquisition device is a device pointing to a person to be audited;
the first processing module 32 is configured to perform filtering processing on the first voice data, and determine interrogation voice data corresponding to an interrogation person;
the second processing module 33 is configured to remove the trial voice data from the second voice data by using the trial voice data as a reference signal, and determine the audited voice data in the second voice data.
In one possible implementation, referring to fig. 4, the apparatus further includes:
and the recognition module 34 is configured to recognize the trial voice data and the audited voice data respectively, and determine the corresponding trial text and audited text.
In one possible implementation, the identification module 34 is further configured to:
determining time stamps of the trial voice data and the audited voice data, and adding corresponding time stamps to the trial text and the audited text respectively according to the time stamps, wherein the time stamps comprise a start time stamp and an end time stamp;

and determining the overlapping part of the trial text and the audited text according to the time stamps, and highlighting the text corresponding to the overlapping part.
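The overlap test behind the highlighting step can be sketched as follows; the (start, end) tuple representation and the units (seconds) are illustrative:

```python
def overlap(seg_a, seg_b):
    """Overlapping interval of two (start, end) timestamp pairs, or None.
    This is the interval test behind highlighting overlapped text."""
    start = max(seg_a[0], seg_b[0])
    end = min(seg_a[1], seg_b[1])
    return (start, end) if start < end else None

# Trial speech 0-5 s, audited speech 4-9 s: seconds 4-5 would be highlighted
both = overlap((0.0, 5.0), (4.0, 9.0))
```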
In a possible implementation manner, the second processing module 33 is specifically configured to:
determining signal time delay according to the distance between the first audio acquisition device and the auditor and the distance between the second audio acquisition device and the auditor;
and carrying out time delay processing on the trial voice data according to the signal time delay, taking the trial voice data after the time delay processing as a reference signal, and removing the trial voice data in the second voice data.
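The signal time delay can be estimated from the extra acoustic path between the two devices; the sketch below assumes sound travelling at roughly 343 m/s, and all parameter names are illustrative:

```python
def delay_samples(d_first, d_second, sample_rate, speed_of_sound=343.0):
    """Signal time delay, in samples, between the two audio acquisition
    devices: the extra acoustic path (d_second - d_first, in metres)
    divided by the speed of sound, then scaled by the sampling rate."""
    delay_seconds = (d_second - d_first) / speed_of_sound
    return round(delay_seconds * sample_rate)

# Second device 1.715 m farther from the auditor, 16 kHz sampling
n_delay = delay_samples(0.5, 2.215, 16000)   # 0.005 s -> 80 samples
```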
In a possible implementation manner, the second processing module 33 is specifically configured to:
preprocessing the trial voice data and the second voice data respectively, and determining a first voice matrix G1 corresponding to the trial voice data and a second voice matrix G2 corresponding to the second voice data;
determining a third voice matrix G3 according to the first voice matrix G1 and the second voice matrix G2, wherein G3 = G2 - λ·G1, and λ is a weight coefficient;
performing dimension-reduction processing on the third voice matrix G3, converting the third voice matrix G3 into a discrete voice array Xs, restoring the discrete voice array Xs into continuous voice data according to the preset sampling period, and taking the restored voice data as the audited voice data;
wherein, the pretreatment process comprises the following steps:
performing discrete sampling processing on the voice data according to a preset sampling period to obtain a discrete voice array X represented in array form, wherein the voice data is the trial voice data or the second voice data; X = [x1, x2, …, xj, …, xn], wherein xj represents the sampling value of the voice data corresponding to the jth sampling point, and n is the total number of samples;
performing row expansion by taking the discrete voice array X as a row, and determining an m × n discrete voice matrix M after row expansion; wherein m is an odd number, the middle row of M is the discrete voice array X itself (the formula is given only as an image in the original), the value of m(i,j) is negatively correlated with the distance of row i from the middle row, and m(i,j) represents the element in the ith row and jth column of the discrete voice matrix M;
determining a k × k reference matrix H; k is an odd number greater than 1, and the element in the ith row and jth column of the reference matrix H is given by a formula (available only as an image in the original) in which σ is an adjustment coefficient;
performing difference reduction processing on the discrete voice matrix M according to the reference matrix H, and determining the voice matrix G after the difference reduction processing; the element g(x,y) in the xth row and yth column of the voice matrix G is given by a formula available only as an image in the original;
when the voice data is trial voice data, the determined voice matrix G is a first voice matrix G1; when the voice data is the second voice data, the determined voice matrix G is the second voice matrix G2.
In the device for separating voice during interrogation provided by the embodiment of the invention, two groups of voice data are respectively obtained through the two audio acquisition devices, and one group of voice data is used as a reference signal to cancel it from the other group, thereby realizing voice separation. The two groups of voice data effectively reduce the interference of the auditor's channel, so that the conversation signals of the auditor and the audited person are correctly separated; the voices of both can then be correctly recognized by voice recognition, the trial record can be generated automatically, trial efficiency is improved, and labor cost is saved. By highlighting the text corresponding to the overlapped part, the user can quickly locate positions where the voice recognition may be wrong, making it convenient to check and confirm whether the recognized text is accurate. Through the delay processing, the trial voice data can be removed from the second voice data more accurately, so that more accurate audited voice data is obtained. Expanding the voice data represented in array form into matrix form facilitates its processing; after the third voice matrix corresponding to the audited voice data is determined, dimension-reduction processing is performed to obtain the audited voice data. This removes the trial voice data from the second voice data in a higher-dimensional representation, which can effectively reduce the distortion caused by the removal.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.

Claims (8)

1. A method for speech separation during interrogation, comprising:
acquiring first voice data collected by a first audio acquisition device and second voice data collected by a second audio acquisition device, wherein the first audio acquisition device is a device pointing to the auditor, and the second audio acquisition device is a device pointing to the audited person;

performing filtering processing on the first voice data, and determining the trial voice data corresponding to the auditor;

removing the trial voice data from the second voice data by taking the trial voice data as a reference signal, and determining the audited voice data in the second voice data;
the removing the trial voice data in the second voice data by using the trial voice data as a reference signal includes:
preprocessing the trial voice data and the second voice data respectively, and determining a first voice matrix G1 corresponding to the trial voice data and a second voice matrix G2 corresponding to the second voice data;
determining a third voice matrix G3 according to the first voice matrix G1 and the second voice matrix G2, wherein G3 = G2 - λ·G1, and λ is a weight coefficient;
performing dimensionality reduction processing on the third voice matrix G3, converting the third voice matrix G3 into a discrete voice array Xs, reducing the discrete voice array Xs into continuous voice data according to a preset sampling period, and taking the reduced voice data as audited voice data;
wherein the preprocessing process comprises the following steps:
performing discrete sampling processing on the voice data according to the preset sampling period to obtain a discrete voice array X represented in array form, wherein the voice data is the trial voice data or the second voice data; X = [x1, x2, …, xj, …, xn], wherein xj represents the sampling value of the voice data corresponding to the jth sampling point, and n is the total number of samples;
performing row expansion by taking the discrete voice array X as a row, and determining an m × n discrete voice matrix M after row expansion; wherein m is an odd number, the middle row of M is the discrete voice array X itself (the formula is given only as an image in the original), the value of m(i,j) is negatively correlated with the distance of row i from the middle row, and m(i,j) represents the element in the ith row and jth column of the discrete voice matrix M;
determining a k × k reference matrix H; k is an odd number greater than 1, and the element in the ith row and jth column of the reference matrix H is given by a formula (available only as an image in the original) in which σ is an adjustment coefficient;
performing difference reduction processing on the discrete voice matrix M according to the reference matrix H, and determining the voice matrix G after the difference reduction processing; the element g(x,y) in the xth row and yth column of the voice matrix G is given by a formula available only as an image in the original;
when the voice data is the interrogation voice data, the determined voice matrix G is a first voice matrix G1; and when the voice data is the second voice data, the determined voice matrix G is a second voice matrix G2.
2. The method of claim 1, further comprising:
and recognizing the trial voice data and the audited voice data respectively, and determining the corresponding trial text and audited text.
3. The method of claim 2, wherein determining the corresponding trial text and the audited text comprises:

determining time stamps of the trial voice data and the audited voice data, and adding corresponding time stamps to the trial text and the audited text respectively according to the time stamps, wherein the time stamps comprise a start time stamp and an end time stamp;

and determining the overlapping part of the trial text and the audited text according to the time stamps, and highlighting the text corresponding to the overlapping part.
4. The method of claim 1, wherein removing the trial voice data from the second voice data using the trial voice data as a reference signal comprises:
determining signal time delay according to the distance between the first audio acquisition device and the auditor and the distance between the second audio acquisition device and the auditor;
and carrying out time delay processing on the trial voice data according to the signal time delay, taking the trial voice data after the time delay processing as a reference signal, and removing the trial voice data in the second voice data.
5. An apparatus for audio separation during interrogation, comprising:
the system comprises an acquisition module, a processing module and a display module, wherein the acquisition module is used for acquiring first voice data acquired by a first audio acquisition device and second voice data acquired by a second audio acquisition device, the first audio acquisition device is a device pointing to an auditor, and the second audio acquisition device is a device pointing to a polled person;
the first processing module is used for carrying out filtering processing on the first voice data and determining the audition voice data corresponding to the audition person;
the second processing module is used for removing the trial voice data in the second voice data by taking the trial voice data as a reference signal and determining the trial voice data in the second voice data;
the second processing module is configured to:
preprocessing the trial voice data and the second voice data respectively, and determining a first voice matrix G1 corresponding to the trial voice data and a second voice matrix G2 corresponding to the second voice data;
determining a third voice matrix G3 according to the first voice matrix G1 and the second voice matrix G2, wherein G3 = G2 - λ·G1, and λ is a weight coefficient;
performing dimensionality reduction processing on the third voice matrix G3, converting the third voice matrix G3 into a discrete voice array Xs, reducing the discrete voice array Xs into continuous voice data according to a preset sampling period, and taking the reduced voice data as audited voice data;
wherein the preprocessing process comprises the following steps:
performing discrete sampling processing on the voice data according to the preset sampling period to obtain a discrete voice array X represented in array form, wherein the voice data is the trial voice data or the second voice data; X = [x1, x2, …, xj, …, xn], wherein xj represents the sampling value of the voice data corresponding to the jth sampling point, and n is the total number of samples;
performing row expansion by taking the discrete voice array X as a row, and determining an m × n discrete voice matrix M after row expansion; wherein m is an odd number, the middle row of M is the discrete voice array X itself (the formula is given only as an image in the original), the value of m(i,j) is negatively correlated with the distance of row i from the middle row, and m(i,j) represents the element in the ith row and jth column of the discrete voice matrix M;
determining a k × k reference matrix H; k is an odd number greater than 1, and the element in the ith row and jth column of the reference matrix H is given by a formula (available only as an image in the original) in which σ is an adjustment coefficient;
performing difference reduction processing on the discrete voice matrix M according to the reference matrix H, and determining the voice matrix G after the difference reduction processing; the element g(x,y) in the xth row and yth column of the voice matrix G is given by a formula available only as an image in the original;
when the voice data is the interrogation voice data, the determined voice matrix G is a first voice matrix G1; and when the voice data is the second voice data, the determined voice matrix G is a second voice matrix G2.
6. The apparatus of claim 5, further comprising:
and a recognition module, configured to recognize the trial voice data and the audited voice data respectively, and determine the corresponding trial text and audited text.
7. The apparatus of claim 6, wherein the identification module is further configured to:
determining time stamps of the trial voice data and the audited voice data, and adding corresponding time stamps to the trial text and the audited text respectively according to the time stamps, wherein the time stamps comprise a start time stamp and an end time stamp;

and determining the overlapping part of the trial text and the audited text according to the time stamps, and highlighting the text corresponding to the overlapping part.
8. The apparatus of claim 5, wherein the second processing module is configured to:
determining signal time delay according to the distance between the first audio acquisition device and the auditor and the distance between the second audio acquisition device and the auditor;
and carrying out time delay processing on the trial voice data according to the signal time delay, taking the trial voice data after the time delay processing as a reference signal, and removing the trial voice data in the second voice data.
CN201810106940.7A 2018-02-02 2018-02-02 Method and device for separating voice during interrogation Active CN108198570B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810106940.7A CN108198570B (en) 2018-02-02 2018-02-02 Method and device for separating voice during interrogation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810106940.7A CN108198570B (en) 2018-02-02 2018-02-02 Method and device for separating voice during interrogation

Publications (2)

Publication Number Publication Date
CN108198570A CN108198570A (en) 2018-06-22
CN108198570B true CN108198570B (en) 2020-10-23

Family

ID=62592089

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810106940.7A Active CN108198570B (en) 2018-02-02 2018-02-02 Method and device for separating voice during interrogation

Country Status (1)

Country Link
CN (1) CN108198570B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109065023A (en) * 2018-08-23 2018-12-21 广州势必可赢网络科技有限公司 A kind of voice identification method, device, equipment and computer readable storage medium
CN109785855B (en) * 2019-01-31 2022-01-28 秒针信息技术有限公司 Voice processing method and device, storage medium and processor
CN110689900B (en) * 2019-09-29 2022-05-13 北京地平线机器人技术研发有限公司 Signal enhancement method and device, computer readable storage medium and electronic equipment
CN111128212A (en) * 2019-12-09 2020-05-08 秒针信息技术有限公司 Mixed voice separation method and device
CN111145774A (en) * 2019-12-09 2020-05-12 秒针信息技术有限公司 Voice separation method and device
EP4344449A1 (en) * 2022-06-13 2024-04-03 Orcam Technologies Ltd. Processing and utilizing audio signals

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO1998004100A1 (en) * 1996-07-19 1998-01-29 David Griesinger Multichannel active matrix sound reproduction with maximum lateral separation
JP3695324B2 (en) * 2000-12-12 2005-09-14 日本電気株式会社 Information system using TV broadcasting
US20070133811A1 (en) * 2005-12-08 2007-06-14 Kabushiki Kaisha Kobe Seiko Sho Sound source separation apparatus and sound source separation method
CN101964192A (en) * 2009-07-22 2011-02-02 索尼公司 Sound processing device, sound processing method, and program
CN103106903A (en) * 2013-01-11 2013-05-15 太原科技大学 Single channel blind source separation method
CN103247295A (en) * 2008-05-29 2013-08-14 高通股份有限公司 Systems, methods, apparatus, and computer program products for spectral contrast enhancement
CN104408042A (en) * 2014-10-17 2015-03-11 广州三星通信技术研究有限公司 Method and device for displaying a text corresponding to voice of a dialogue in a terminal
CN104505099A (en) * 2014-12-08 2015-04-08 北京云知声信息技术有限公司 Method and equipment for removing known interference in voice signal
CN106448722A (en) * 2016-09-14 2017-02-22 科大讯飞股份有限公司 Sound recording method, device and system
CN106887238A (en) * 2017-03-01 2017-06-23 中国科学院上海微系统与信息技术研究所 A kind of acoustical signal blind separating method based on improvement Independent Vector Analysis algorithm
CN107093438A (en) * 2012-06-18 2017-08-25 谷歌公司 System and method for recording selective removal audio content from mixed audio

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101096091B1 (en) * 2010-05-20 2011-12-19 충북대학교 산학협력단 Apparatus for Separating Voice and Method for Separating Voice of Single Channel Using the Same


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Natural Gradient Multichannel Blind Deconvolution and Speech Separation Using Causal FIR Filters;Scott C. Douglas,Hiroshi Sawada,Shoji Makino;《IEEE International Conference on Acoustics IEEE》;20041220;全文 *
Convolutive Blind Separation of Multichannel Speech Signals Based on Givens-Hyperbolic Double Rotations; Zhang Hua, Zuo Jiancun, Dai Hong, Gui Lin; Journal of Shanghai Polytechnic University (《上海第二工业大学学报》); 20160615; Vol. 33, No. 2; full text *

Also Published As

Publication number Publication date
CN108198570A (en) 2018-06-22

Similar Documents

Publication Publication Date Title
CN108198570B (en) Method and device for separating voice during interrogation
JP6535706B2 (en) Method for creating a ternary bitmap of a data set
Tom et al. End-To-End Audio Replay Attack Detection Using Deep Convolutional Networks with Attention.
CN108630193B (en) Voice recognition method and device
US5583961A (en) Speaker recognition using spectral coefficients normalized with respect to unequal frequency bands
US8078463B2 (en) Method and apparatus for speaker spotting
US20090177466A1 (en) Detection of speech spectral peaks and speech recognition method and system
CN110033756B (en) Language identification method and device, electronic equipment and storage medium
CN111243619B (en) Training method and device for speech signal segmentation model and computer equipment
KR20010005685A (en) Speech analysis system
Pan et al. USEV: Universal speaker extraction with visual cue
CN113870892A (en) Conference recording method, device, equipment and storage medium based on voice recognition
CN108399913B (en) High-robustness audio fingerprint identification method and system
CN110265000A (en) A method of realizing Rapid Speech writing record
CN113744742A (en) Role identification method, device and system in conversation scene
CN113709313A (en) Intelligent quality inspection method, device, equipment and medium for customer service call data
CN113035225B (en) Visual voiceprint assisted voice separation method and device
KR20200140235A (en) Method and device for building a target speaker's speech model
CN114996489A (en) Method, device and equipment for detecting violation of news data and storage medium
CN210606618U (en) System for realizing voice and character recording
JPS6129518B2 (en)
CN112151070B (en) Voice detection method and device and electronic equipment
CN117831537A (en) Conference transfer method and system based on multi-stage memristor array and electronic equipment
CN116129901A (en) Speech recognition method, device, electronic equipment and readable storage medium
CN112687273A (en) Voice transcription method and device

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
CP03 Change of name, title or address

Address after: Room 101, 1st floor, building 1, Xisanqi building materials City, Haidian District, Beijing 100096

Patentee after: Yunzhisheng Intelligent Technology Co.,Ltd.

Address before: 12 / F, Guanjie building, building 1, No. 16, Taiyanggong Middle Road, Chaoyang District, Beijing

Patentee before: BEIJING UNISOUND INFORMATION TECHNOLOGY Co.,Ltd.

CP03 Change of name, title or address