CN114337908B - Method and device for generating interference signal of target voice signal
- Publication number: CN114337908B (application CN202210011028.XA)
- Authority: CN (China)
- Prior art keywords
- signal
- time domain
- voice
- frame
- inversion signal
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Abstract
The application discloses a method and a device for generating an interference signal of a target voice signal. The method comprises the following steps: acquiring a target voice signal to be interfered with; framing the target voice signal to obtain at least one voice frame; processing each voice frame, including performing first processing, second processing and/or third processing on the voice frame to obtain a frequency domain envelope inversion signal, a time domain inversion signal and/or a time domain envelope inversion signal; and determining the interference signal of the target voice signal according to the frequency domain envelope inversion signal, the time domain inversion signal and/or the time domain envelope inversion signal and the preset weight coefficients respectively corresponding to them. Three signals related to the target voice signal are constructed, namely the frequency domain envelope inversion signal, the time domain inversion signal and the time domain envelope inversion signal, and the interference signal of the target voice signal is obtained from these three signals and their corresponding preset weight coefficients, which further improves the interference effect of the interference signal on the target voice signal.
Description
Technical Field
The invention relates to the technical field of voice signal processing, and more particularly to a method and an apparatus for generating an interference signal of a target voice signal.
Background
With the rapid development of the mobile internet, the recording-capable devices around us have grown explosively in both variety and number. While these ubiquitous devices bring convenience, voice privacy leakage is becoming an increasingly serious problem that threatens national security and people's daily lives.
The main goal of voice privacy protection technology is to reduce the intelligibility of user voice signals picked up by recording devices. Early research focused on using physical isolation to protect voice privacy. In recent years, techniques that actively interfere with potential eavesdropping devices have received more attention. The masking that reduces speech intelligibility under interference falls into two broad categories: energy masking and information masking. Energy masking mainly masks the speech with uncorrelated noise. With the development of speech enhancement technology, energy-masked speech is easily recovered. Unlike energy masking, the masking signals used for information masking are all derived from the target voice signal and are strongly correlated with it. However, it is currently difficult to generate an interference signal (i.e., a masking signal) for information masking of a target voice signal in real time.
Disclosure of Invention
In view of the above problems with existing methods, the application provides a method and a device for generating an interference signal of a target voice signal.
In a first aspect, the present application proposes an interference signal generating method for a target voice signal, including:
acquiring a target voice signal to be interfered;
carrying out frame division processing on the target voice signal to obtain at least one voice frame, wherein the frame length of each voice frame in the at least one voice frame is a random value;
processing each of the at least one speech frame, the processing comprising:
performing first processing on each voice frame in the at least one voice frame to obtain a frequency domain envelope inversion signal;
performing second processing on each voice frame in the at least one voice frame to obtain a time domain inversion signal; and/or
performing third processing on each voice frame in the at least one voice frame to obtain a time domain envelope inversion signal;
and determining the interference signal of the target voice signal according to the frequency domain envelope inversion signal, the time domain inversion signal and/or the time domain envelope inversion signal and preset weight coefficients respectively corresponding to the frequency domain envelope inversion signal and the time domain envelope inversion signal.
In one possible implementation, the first processing of each of the at least one speech frame to obtain a frequency domain envelope inversion signal includes:
performing Fourier transform on each voice frame in the at least one voice frame to obtain a frequency spectrum of each voice frame in the at least one voice frame;
determining a spectral envelope of each of the at least one speech frame from a spectrum of each speech frame;
determining a first fine structure corresponding to each voice frame according to the spectrum envelope of the voice frame;
determining, according to the spectrum envelope of each voice frame, a degree-N polynomial and/or exponential function fitting curve for the spectrum envelope of that voice frame, wherein N is an integer greater than or equal to 1;
and determining the frequency domain envelope inversion signal according to the frequency spectrum envelope of each voice frame, the respective N-degree polynomial or exponential function fitting curve of the frequency spectrum envelope of each voice frame and the first fine structure.
In one possible implementation, the performing a second processing on each of the at least one speech frame to obtain a time domain inverted signal includes:
and carrying out time domain inversion on each voice frame in the at least one voice frame to obtain a time domain inversion signal.
In one possible implementation, the performing third processing on each of the at least one speech frame to obtain a time-domain envelope inversion signal includes:
carrying out frequency band division on each voice frame in the at least one voice frame to obtain partial target voice signals corresponding to each frequency band;
determining the time domain envelope of the target voice signals according to the partial target voice signals corresponding to the frequency bands;
determining a second fine structure corresponding to the time domain envelope of the partial target voice signal corresponding to each frequency band according to the time domain envelope;
and determining a time domain envelope inversion signal according to the time domain envelope of the part of the target voice signals corresponding to each frequency band and the second fine structure.
In one possible implementation, the determining the interference signal of the target speech signal according to the frequency domain envelope inversion signal, the time domain inversion signal, and/or the time domain envelope inversion signal and the preset weight coefficients corresponding to the frequency domain envelope inversion signal and the time domain envelope inversion signal, respectively, includes:
determining a weighting signal according to the frequency domain envelope inversion signal, the time domain inversion signal and/or the time domain envelope inversion signal and preset weighting coefficients respectively corresponding to the frequency domain envelope inversion signal and the time domain envelope inversion signal;
and determining an interference signal of the target voice signal according to the target voice signal and the weighted signal.
In a possible implementation, the determining the weighted signal according to the frequency domain envelope inversion signal, the time domain inversion signal, and/or the time domain envelope inversion signal and the preset weight coefficients corresponding to the frequency domain envelope inversion signal and the time domain envelope inversion signal respectively includes:
multiplying the frequency domain envelope inversion signal, the time domain inversion signal and the time domain envelope inversion signal with preset weight coefficients respectively corresponding to the frequency domain envelope inversion signal and the time domain envelope inversion signal to obtain at least one multiplication result;
and accumulating the at least one multiplication result to obtain a weighted signal.
In one possible implementation, the determining the interference signal of the target speech signal according to the target speech signal and the weighted signal includes:
performing low-pass filtering on the weighted signal with a low-pass filter to obtain a low-pass interference signal, wherein the cut-off frequency of the low-pass filter is set randomly in order to reduce the possibility that the target voice signal, once interfered with by the generated interference signal, can be restored;
and replacing the part of the low-pass interference signal corresponding to any one frequency band with the part of the target voice signal corresponding to that frequency band to obtain the interference signal of the target voice signal.
In a second aspect, the present application proposes an interference signal generating device for a target voice signal, including:
the transceiver unit is used for acquiring a target voice signal to be interfered with;
the processing unit is used for carrying out frame division processing on the target voice signal to obtain at least one voice frame, and the frame length of each voice frame in the at least one voice frame is a random value;
the processing unit is configured to process each of the at least one speech frame, where the processing includes:
performing first processing on each voice frame in the at least one voice frame to obtain a frequency domain envelope inversion signal;
performing second processing on each voice frame in the at least one voice frame to obtain a time domain inversion signal; and/or
performing third processing on each voice frame in the at least one voice frame to obtain a time domain envelope inversion signal;
the processing unit is used for determining the interference signal of the target voice signal according to the frequency domain envelope inversion signal, the time domain inversion signal and/or the time domain envelope inversion signal and preset weight coefficients respectively corresponding to the frequency domain envelope inversion signal, the time domain inversion signal and/or the time domain envelope inversion signal.
In a possible implementation, the processing unit is specifically configured to perform fourier transform on each of the at least one speech frame to obtain a frequency spectrum of each of the at least one speech frame; determining a spectral envelope of each of the at least one speech frame from a spectrum of each speech frame; determining a first fine structure corresponding to each voice frame according to the spectrum envelope of the voice frame; according to the spectrum envelope of each voice frame, determining N times of polynomial and/or exponential function fitting curves of the spectrum envelope of each voice frame, wherein N is an integer greater than or equal to 1; and determining the frequency domain envelope inversion signal according to the frequency spectrum envelope of each voice frame, the respective N-degree polynomial or exponential function fitting curve of the frequency spectrum envelope of each voice frame and the first fine structure.
In a possible implementation, the processing unit is specifically configured to perform time domain inversion on each of the at least one speech frame to obtain a time domain inverted signal.
In one possible implementation, the processing unit is specifically configured to divide a frequency band for each of the at least one speech frame to obtain a portion of the target speech signal corresponding to each frequency band; determining the time domain envelope of the target voice signals according to the partial target voice signals corresponding to the frequency bands; determining a second fine structure corresponding to the time domain envelope of the partial target voice signal corresponding to each frequency band according to the time domain envelope; and determining a time domain envelope inversion signal according to the time domain envelope of the part of the target voice signals corresponding to each frequency band and the second fine structure.
In a possible implementation, the processing unit is specifically configured to determine a weighted signal according to the frequency domain envelope inversion signal, the time domain inversion signal, and/or the time domain envelope inversion signal and preset weight coefficients corresponding to the frequency domain envelope inversion signal, the time domain inversion signal, and/or the time domain envelope inversion signal, respectively; and determining an interference signal of the target voice signal according to the target voice signal and the weighted signal.
In a possible implementation, the processing unit is specifically configured to multiply the frequency domain envelope inversion signal, the time domain inversion signal, and/or the time domain envelope inversion signal with preset weight coefficients corresponding to the frequency domain envelope inversion signal and the time domain envelope inversion signal, respectively, to obtain at least one multiplication result; and accumulating the at least one multiplication result to obtain a weighted signal.
In a possible implementation, the processing unit is specifically configured to perform low-pass filtering on the weighted signal to obtain a low-pass interference signal; and replacing a part of low-pass interference signals corresponding to any frequency band with a part of target voice signals corresponding to the any frequency band to obtain interference signals of the target voice signals.
In a third aspect, the present application also proposes an interfering signal generating device of a target speech signal, comprising at least one processor for executing a program stored in a memory, which when executed causes the device to perform the steps as in the first aspect and in various possible implementations.
In a fourth aspect, the present application also proposes a non-transitory computer readable storage medium, having stored thereon a computer program which, when executed by a processor, implements the steps as in the first aspect and in various possible implementations.
According to the above technical solution, three signals related to the target voice signal are constructed, namely the frequency domain envelope inversion signal, the time domain inversion signal and the time domain envelope inversion signal, and the three signals are dynamically weighted to obtain a weighted signal. The weighted signal interferes with the target voice signal more strongly than any of the three signals alone, which improves the interference effect. The weighted signal is then low-pass filtered to obtain a low-pass interference signal, and the part of the low-pass interference signal corresponding to any one frequency band is replaced with the part of the target voice signal corresponding to that frequency band to obtain the interference signal of the target voice signal, which further improves the interference effect on the target voice signal. In addition, the generation of the three signals is randomized, which further reduces the possibility that the target voice signal can be restored after being interfered with by the interference signal.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a flow chart of an interference signal generating method of a target voice signal according to an embodiment of the present application;
Fig. 2 is a schematic flow chart of the first processing performed on each of at least one voice frame according to an embodiment of the present application;
Fig. 3 is a schematic flow chart of the third processing performed on each of at least one voice frame according to an embodiment of the present application;
Fig. 4 is a schematic flow chart of determining an interference signal of a target voice signal according to a frequency domain envelope inversion signal, a time domain inversion signal and/or a time domain envelope inversion signal and the preset weight coefficients respectively corresponding to them, according to an embodiment of the present application;
Fig. 5 is a schematic structural diagram of an interference signal generating device of a target voice signal according to an embodiment of the present application;
Fig. 6 is another schematic structural diagram of an interference signal generating device for a target voice signal according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present invention will be described below with reference to the accompanying drawings in the embodiments of the present invention. The following examples are only for more clearly illustrating the technical aspects of the present invention, and are not intended to limit the scope of the present invention.
It should be noted that, in this application, the term "and/or" merely describes an association between associated objects and indicates that three relationships may exist. For example, A and/or B may indicate: A exists alone, A and B exist together, or B exists alone. The terms "first", "second", "third" and the like in the description and claims of the embodiments of the present application are used to distinguish between different objects and not to describe a particular order of the objects. For example, the first processing, the second processing and the third processing are named to distinguish between different processing operations, not to describe a specific order. In the embodiments of the present application, words such as "exemplary," "for example," or "such as" are used to mean serving as an example, instance, or illustration. Any embodiment or design described herein as "exemplary," "by way of example," or "such as" is not necessarily to be construed as advantageous over other embodiments or designs. Rather, the use of such words is intended to present related concepts in a concrete fashion. In the description of the embodiments of the present application, unless otherwise indicated, "a plurality" means two or more.
To prevent voice privacy leakage, for example a person outside a conference room overhearing what is said inside it, the embodiments of the application provide a method and a device for generating an interference signal of a target voice signal. The method generates an interference signal of the target voice signal, and interfering with the target voice signal using this interference signal can effectively prevent eavesdropping. For example, when the interference signal is played outside the conference room, a person outside cannot accurately recognize the target voice signal, which achieves the purpose of preventing eavesdropping.
Fig. 1 is a flow chart of the interference signal generating method of a target voice signal provided in the present application. The flow comprises S101-S104, specifically:
S101, acquiring a target voice signal to be interfered with.
In the embodiment of the application, a target voice signal to be interfered is obtained.
S102, carrying out framing processing on the target voice signal to obtain at least one voice frame.
In the embodiment of the application, framing processing is performed on the target voice signal to obtain at least one voice frame. In order to reduce the possibility that the target voice signal after being interfered by the generated interference signal is restored, the frame length of each voice frame in the at least one voice frame is a random value. For example, the target speech signal is subjected to frame division processing to obtain three speech frames, wherein the frame length of the first speech frame is 3 frames, the frame length of the second speech frame is 6 frames, and the frame length of the third speech frame is 8 frames.
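The framing step above can be sketched as follows. This is a minimal illustration only: the function name `split_random_frames`, the bounds `min_len` and `max_len` (in samples) and the uniform distribution are assumptions, since the text states only that each frame length is a random value.

```python
import numpy as np

def split_random_frames(signal, min_len, max_len, rng):
    """Split a 1-D signal into consecutive frames of randomly drawn length.

    min_len and max_len (in samples) are illustrative assumptions.
    """
    frames = []
    start = 0
    while start < len(signal):
        # Draw this frame's length uniformly from [min_len, max_len].
        length = int(rng.integers(min_len, max_len + 1))
        # The final frame may be shorter than min_len if the signal runs out.
        frames.append(signal[start:start + length])
        start += length
    return frames

rng = np.random.default_rng(0)
# A sine tone stands in for the target voice signal.
target = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)
frames = split_random_frames(target, min_len=160, max_len=640, rng=rng)
```

Concatenating the frames in order reproduces the original signal, so the split itself loses nothing; only the frame boundaries are unpredictable.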
S103, processing each voice frame in the at least one voice frame, wherein the processing comprises the following steps: performing first processing on each voice frame in the at least one voice frame to obtain a frequency domain envelope inversion signal; performing second processing on each voice frame in the at least one voice frame to obtain a time domain inversion signal; and/or performing third processing on each voice frame in the at least one voice frame to obtain a time domain envelope inversion signal.
In an embodiment of the present application, a first process is performed on each of at least one speech frame. The first processing procedure comprises S201-S205, and specifically comprises the following steps:
S201, performing Fourier transform on each voice frame in the at least one voice frame to obtain a frequency spectrum of each voice frame in the at least one voice frame.
S202, determining the spectrum envelope of each voice frame according to the spectrum of each voice frame in the at least one voice frame.
S203, determining a first fine structure corresponding to each voice frame according to the spectrum envelope of the voice frame.
S204, determining, according to the spectrum envelope of each voice frame, a degree-N polynomial and/or exponential function fitting curve for the spectrum envelope of that voice frame.
In this embodiment of the present application, a least squares method may be used to determine, according to the spectrum envelope of each voice frame, the degree-N polynomial and/or exponential function fitting curve of that spectrum envelope, where N is an integer greater than or equal to 1.
S205, determining the frequency domain envelope inversion signal according to the spectrum envelope of each voice frame, the degree-N polynomial or exponential function fitting curve of the spectrum envelope of each voice frame, and the first fine structure.
In the embodiment of the application, the spectrum envelope of each voice frame is inverted (mirrored) using its degree-N polynomial or exponential function fitting curve as the axis of symmetry. The spectrum of each voice frame of the frequency domain envelope inversion signal is then determined from the inverted spectrum envelope and the first fine structure, and an inverse Fourier transform of that spectrum yields the frequency domain envelope inversion signal. At this point, a first voice intelligibility interference signal related to the target voice signal has been generated.
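One possible reading of steps S201-S205 is sketched below. Several choices are assumptions rather than details from the text: the envelope is estimated by a moving average of the log-magnitude spectrum, the fine structure is taken as the residual after removing that envelope, the inversion is done in the log domain as `2 * fitted - envelope`, and the original phase is reused. The names `freq_envelope_inversion`, `degree` and `smooth_bins` are hypothetical.

```python
import numpy as np

def freq_envelope_inversion(frame, degree=2, smooth_bins=9):
    """Mirror a frame's log-spectral envelope about its least-squares fit.

    smooth_bins is assumed odd so the smoothed envelope keeps its length.
    """
    spectrum = np.fft.rfft(frame)                      # S201: Fourier transform
    log_mag = np.log(np.abs(spectrum) + 1e-12)

    # S202 (assumed estimator): moving average of the log-magnitude spectrum.
    kernel = np.ones(smooth_bins) / smooth_bins
    padded = np.pad(log_mag, smooth_bins // 2, mode="edge")
    envelope = np.convolve(padded, kernel, mode="valid")

    # S203: first fine structure, what remains once the envelope is removed.
    fine = log_mag - envelope

    # S204: degree-N least-squares polynomial fit over normalised frequency.
    x = np.linspace(0.0, 1.0, len(envelope))
    fitted = np.polyval(np.polyfit(x, envelope, degree), x)

    # S205: mirror the envelope across the fitted curve, restore the fine
    # structure and the original phase, and return to the time domain.
    inverted_env = 2.0 * fitted - envelope
    new_spectrum = np.exp(inverted_env + fine) * np.exp(1j * np.angle(spectrum))
    return np.fft.irfft(new_spectrum, n=len(frame))

rng = np.random.default_rng(0)
frame = rng.standard_normal(256)   # stand-in for one voice frame
inverted = freq_envelope_inversion(frame)
```

Mirroring in the log domain keeps the resulting magnitudes positive; a linear-domain mirror would need extra clipping, which is why this sketch chose the log domain.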
In the embodiment of the present application, to obtain the time domain inversion signal, second processing is performed on each of the at least one voice frame. Specifically, the second processing may be time domain inversion of each of the at least one voice frame. At this point, a second voice intelligibility interference signal related to the target voice signal has been generated.
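As a small illustration, time domain inversion of each frame amounts to reversing the sample order inside the frame while keeping the frames in their original sequence. The function name `time_inversion` is a hypothetical label for this sketch.

```python
import numpy as np

def time_inversion(frames):
    """Reverse the sample order within each frame; frame order is unchanged."""
    return [frame[::-1] for frame in frames]

frames = [np.array([1.0, 2.0, 3.0]), np.array([4.0, 5.0])]
inverted = time_inversion(frames)
```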
In an embodiment of the present application, third processing is performed on each of the at least one voice frame. The third processing comprises S301-S304, specifically:
S301, carrying out frequency band division on each voice frame in the at least one voice frame to obtain partial target voice signals corresponding to each frequency band.
In the embodiment of the application, a band-pass filter is used to divide the frequency band of each voice frame in at least one voice frame, so as to obtain a part of target voice signals corresponding to each frequency band.
S302, determining the time domain envelope of the target voice signals according to the partial target voice signals corresponding to the frequency bands.
S303, determining a second fine structure corresponding to the time domain envelope of the partial target voice signal corresponding to each frequency band.
S304, determining a time domain envelope inversion signal according to the time domain envelope and the second fine structure of the part of the target voice signal corresponding to each frequency band.
In the embodiment of the application, the time domain envelope of the partial target voice signal corresponding to each frequency band is reversed in time. A time domain envelope inversion signal is then determined according to the reversed time domain envelope and the second fine structure. At this point, a third voice intelligibility interference signal related to the target voice signal has been generated.
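Steps S301-S304 might be sketched as follows. The text does not say how the envelope or fine structure is computed, so this sketch assumes ideal FFT-mask band-pass filters, a Hilbert (analytic signal) envelope, a fine structure equal to the band signal divided by its envelope, and a sum over bands at the end. All function names and the band edges are hypothetical.

```python
import numpy as np

def analytic_signal(x):
    """Analytic signal via the FFT (standard Hilbert-transform construction)."""
    n = len(x)
    gain = np.zeros(n)
    gain[0] = 1.0
    if n % 2 == 0:
        gain[n // 2] = 1.0
        gain[1:n // 2] = 2.0
    else:
        gain[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(np.fft.fft(x) * gain)

def band_split(frame, fs, edges):
    """S301: split a frame into bands with ideal FFT-mask band-pass filters."""
    spectrum = np.fft.rfft(frame)
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / fs)
    bands = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (freqs >= lo) & (freqs < hi)
        bands.append(np.fft.irfft(spectrum * mask, n=len(frame)))
    return bands

def time_envelope_inversion(frame, fs, edges):
    out = np.zeros(len(frame))
    for band in band_split(frame, fs, edges):
        envelope = np.abs(analytic_signal(band))   # S302: time domain envelope
        fine = band / (envelope + 1e-12)           # S303: second fine structure
        out += envelope[::-1] * fine               # S304: reversed envelope
    return out

rng = np.random.default_rng(1)
frame = rng.standard_normal(512)                   # stand-in for one voice frame
edges = [0, 500, 1000, 2000, 4000, 8001]           # Hz; last edge just above Nyquist
result = time_envelope_inversion(frame, fs=16000, edges=edges)
```

Only the envelope is reversed in time; the fine structure keeps its original order, which is what distinguishes this signal from the plain time domain inversion signal.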
S104, determining the interference signal of the target voice signal according to the frequency domain envelope inversion signal, the time domain inversion signal and/or the time domain envelope inversion signal and preset weight coefficients respectively corresponding to the frequency domain envelope inversion signal and the time domain envelope inversion signal.
In this embodiment of the present application, determining the interference signal of the target voice signal according to the frequency domain envelope inversion signal, the time domain inversion signal and/or the time domain envelope inversion signal and the preset weight coefficients respectively corresponding to them comprises S401-S402, specifically:
S401, determining a weighted signal according to the frequency domain envelope inversion signal, the time domain inversion signal and/or the time domain envelope inversion signal and preset weight coefficients corresponding to the frequency domain envelope inversion signal, the time domain inversion signal and/or the time domain envelope inversion signal respectively.
In the embodiment of the application, the frequency domain envelope inversion signal, the time domain inversion signal and/or the time domain envelope inversion signal are multiplied by preset weight coefficients respectively corresponding to the frequency domain envelope inversion signal, the time domain inversion signal and/or the time domain envelope inversion signal to obtain at least one multiplication result. And accumulating at least one multiplication result to obtain a weighted signal.
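The multiply-and-accumulate step can be illustrated with a short sketch. The function name `weighted_signal` and the weight values are placeholders chosen for the example; the text only says the weight coefficients are preset.

```python
import numpy as np

def weighted_signal(signals, weights):
    """Multiply each inversion signal by its preset weight and accumulate."""
    if len(signals) != len(weights):
        raise ValueError("one weight per signal is required")
    total = np.zeros_like(signals[0], dtype=float)
    for signal, weight in zip(signals, weights):
        total += weight * signal
    return total

# Toy three-sample signals standing in for the three inversion signals.
freq_env_inv = np.array([1.0, 0.0, -1.0])
time_inv = np.array([0.5, 0.5, 0.5])
time_env_inv = np.array([0.0, 2.0, 0.0])
w = weighted_signal([freq_env_inv, time_inv, time_env_inv], [0.5, 0.2, 0.3])
```

Because the function accepts any number of signals, it also covers the "and/or" cases in which only one or two of the inversion signals are produced.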
S402, determining an interference signal of the target voice signal according to the target voice signal and the weighted signal.
In the embodiment of the application, since the components of the weighted signal above 4 kHz interfere only weakly with the target voice signal, the weighted signal is low-pass filtered with a low-pass filter to obtain a low-pass interference signal. The cut-off frequency of the low-pass filter is set randomly in order to reduce the possibility that the target voice signal, once interfered with by the generated interference signal, can be restored. Then, the part of the low-pass interference signal corresponding to any one frequency band is replaced with the part of the target voice signal corresponding to that frequency band to obtain the interference signal of the target voice signal. The interference signal is strongly correlated with the target voice signal, and the possibility of restoring a target voice signal that has been interfered with by it is very small, so that the target voice signal is protected from eavesdropping.
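A hedged sketch of step S402 follows. It assumes an ideal FFT-domain low-pass filter, a cut-off drawn uniformly from an illustrative 3000-4000 Hz range (the text says only that the cut-off is random), and band replacement done bin by bin in the frequency domain. The function name `make_interference` and its parameters are hypothetical.

```python
import numpy as np

def make_interference(target, weighted, fs, band, rng,
                      cutoff_range=(3000.0, 4000.0)):
    """Low-pass the weighted signal, then swap one band for the target's."""
    freqs = np.fft.rfftfreq(len(target), d=1.0 / fs)
    # Random cut-off frequency; the range is an illustrative assumption.
    cutoff = rng.uniform(*cutoff_range)
    low_passed = np.fft.rfft(weighted) * (freqs <= cutoff)
    # Replace the chosen band of the low-pass interference signal with the
    # same band of the target voice signal.
    in_band = (freqs >= band[0]) & (freqs < band[1])
    mixed = np.where(in_band, np.fft.rfft(target), low_passed)
    return np.fft.irfft(mixed, n=len(target)), cutoff

rng = np.random.default_rng(2)
fs = 16000
target = rng.standard_normal(1024)     # stand-in for the target voice signal
weighted = rng.standard_normal(1024)   # stand-in for the weighted signal
interference, cutoff = make_interference(target, weighted, fs, (500.0, 1000.0), rng)
```

Returning the drawn cut-off alongside the signal is a convenience for inspection in this sketch; a deployed system would presumably not expose it.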
According to the method and the device, three signals related to the target voice signal are constructed, namely the frequency domain envelope inversion signal, the time domain inversion signal and the time domain envelope inversion signal, which serve as the first, second and third voice intelligibility interference signals respectively. The three voice intelligibility interference signals are dynamically weighted to obtain a weighted signal, which interferes with the target voice signal more strongly than any of the three signals alone, so that the interference effect on the target voice signal is improved. The weighted signal is then low-pass filtered to obtain a low-pass interference signal, and the part of the low-pass interference signal corresponding to any one frequency band is replaced with the part of the target voice signal corresponding to that frequency band to obtain the interference signal of the target voice signal, which further improves the interference effect on the target voice signal. In addition, the generation of the three voice intelligibility interference signals is randomized, which further reduces the possibility that a target voice signal interfered with by the interference signal can be restored.
Fig. 5 is a schematic structural diagram of an interference signal generating apparatus 500 for a target voice signal provided in the present application, where the apparatus 500 includes: a transceiver unit 501 and a processing unit 502;
the transceiver unit 501 is configured to obtain a target voice signal to be interfered;
the processing unit 502 is configured to perform frame segmentation processing on the target voice signal to obtain at least one voice frame, where a frame length of each voice frame in the at least one voice frame is a random value;
the processing unit 502 is configured to process each of the at least one speech frame, where the processing includes:
performing first processing on each voice frame in the at least one voice frame to obtain a frequency domain envelope inversion signal;
performing second processing on each voice frame in the at least one voice frame to obtain a time domain inversion signal; and/or
performing third processing on each voice frame in the at least one voice frame to obtain a time domain envelope inversion signal;
the processing unit 502 is configured to determine an interference signal of the target speech signal according to the frequency domain envelope inversion signal, the time domain inversion signal, and/or the time domain envelope inversion signal and preset weight coefficients corresponding to the frequency domain envelope inversion signal, the time domain inversion signal, and/or the time domain envelope inversion signal, respectively.
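The framing step, in which every frame length is a random value, can be sketched as follows. This is a minimal sketch; the length bounds `min_len` and `max_len`, the function name, and the seeded generator are assumptions made for the example, since the text only states that the frame length is random.

```python
import random

def split_into_random_frames(signal, min_len, max_len, rng=None):
    """Split `signal` into consecutive frames whose lengths are drawn at random."""
    rng = rng or random.Random()
    frames = []
    start = 0
    while start < len(signal):
        length = rng.randint(min_len, max_len)        # bounds are inclusive
        frames.append(signal[start:start + length])   # the last frame may be shorter
        start += length
    return frames

signal = list(range(1000))                            # stand-in for speech samples
frames = split_into_random_frames(signal, min_len=80, max_len=320,
                                  rng=random.Random(0))
```

Concatenating the frames in order reproduces the original signal, so the later per-frame processing covers every sample exactly once.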
In a possible implementation, the processing unit 502 is specifically configured to: perform a Fourier transform on each of the at least one speech frame to obtain a spectrum of each speech frame; determine a spectral envelope of each speech frame from the spectrum of that speech frame; determine a first fine structure corresponding to each speech frame according to the spectral envelope of that speech frame; determine, according to the spectral envelope of each speech frame, an N-th degree polynomial and/or exponential function fitting curve of the spectral envelope, where N is an integer greater than or equal to 1; and determine the frequency domain envelope inversion signal according to the spectral envelope of each speech frame, the N-th degree polynomial or exponential function fitting curve of that spectral envelope, and the first fine structure.
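One possible reading of this first processing is sketched below. The passage does not state how the envelope, the fitting curve, and the fine structure are combined, so everything beyond the listed steps is an assumption: the envelope is estimated by a circular moving average of the magnitude spectrum, the fit is a first-degree polynomial (N = 1) of the log envelope, which corresponds to an exponential curve in the linear domain, and the "inversion" mirrors the log envelope about that fitted line. All function names are hypothetical.

```python
import cmath
import math

def dft(x):
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(spec):
    n = len(spec)
    return [sum(spec[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)).real / n
            for t in range(n)]

def spectral_envelope(mag, half_width=2, eps=1e-12):
    """Assumed envelope estimator: circular moving average of the magnitudes."""
    n = len(mag)
    width = 2 * half_width + 1
    return [sum(mag[(k + d) % n] for d in range(-half_width, half_width + 1)) / width + eps
            for k in range(n)]

def mirror_about_log_fit(env):
    """Fit a line to the log envelope and reflect the log envelope about it."""
    n = len(env)
    log_env = [math.log(e) for e in env]
    ks = list(range(n // 2 + 1))                 # fit on non-negative frequencies
    mean_k = sum(ks) / len(ks)
    mean_y = sum(log_env[k] for k in ks) / len(ks)
    slope = (sum((k - mean_k) * (log_env[k] - mean_y) for k in ks)
             / sum((k - mean_k) ** 2 for k in ks))
    intercept = mean_y - slope * mean_k
    # min(k, n - k) keeps the result symmetric, so the output stays real-valued
    return [math.exp(2 * (slope * min(k, n - k) + intercept) - log_env[k])
            for k in range(n)]

def frequency_envelope_inversion(frame):
    spec = dft(frame)
    env = spectral_envelope([abs(v) for v in spec])
    fine = [v / e for v, e in zip(spec, env)]    # first fine structure
    inverted_env = mirror_about_log_fit(env)
    return idft([e * f for e, f in zip(inverted_env, fine)])

frame = [0.9 ** t for t in range(32)]            # toy frame with a smooth spectrum
inverted = frequency_envelope_inversion(frame)
```

Keeping the fine structure and changing only the envelope leaves the phase of every bin untouched, which is why the result remains closely related to the input frame.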
In a possible implementation, the processing unit 502 is specifically configured to perform time domain inversion on each of the at least one speech frame to obtain a time domain inverted signal.
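Time domain inversion of each frame amounts to reversing the order of its samples. A minimal sketch follows, assuming frames are plain Python lists; the function name is hypothetical.

```python
def time_reverse_frames(frames):
    """Reverse the sample order inside each frame; frame order is unchanged."""
    return [frame[::-1] for frame in frames]

frames = [[1, 2, 3], [4, 5], [6, 7, 8, 9]]
reversed_frames = time_reverse_frames(frames)
```

Because the frames have random lengths, the reversal boundaries fall at unpredictable positions in the signal.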
In a possible implementation, the processing unit 502 is specifically configured to: divide each of the at least one speech frame into frequency bands to obtain the part of the target voice signal corresponding to each frequency band; determine the time domain envelope of the part of the target voice signal corresponding to each frequency band; determine, according to the time domain envelope, a second fine structure corresponding to the time domain envelope of the part of the target voice signal in each frequency band; and determine the time domain envelope inversion signal according to the time domain envelope of the part of the target voice signal corresponding to each frequency band and the second fine structure.
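The third processing can be sketched per band as follows. The sketch assumes the band splitting has already been done, so `band_signals` holds one time-domain signal per frequency band. The envelope estimator (moving average of the rectified signal) and the inversion rule (reflecting the envelope between its minimum and maximum) are assumptions, because the passage does not specify them, and all names are hypothetical.

```python
import math

def moving_average(x, half_width):
    """Moving average with a window truncated at the signal edges."""
    n = len(x)
    out = []
    for i in range(n):
        lo, hi = max(0, i - half_width), min(n, i + half_width + 1)
        out.append(sum(x[lo:hi]) / (hi - lo))
    return out

def invert_band_envelope(band, half_width=8, eps=1e-9):
    env = [e + eps for e in moving_average([abs(v) for v in band], half_width)]
    fine = [v / e for v, e in zip(band, env)]          # second fine structure
    top, bottom = max(env), min(env)
    inverted_env = [top + bottom - e for e in env]     # loud parts become quiet
    return [e * f for e, f in zip(inverted_env, fine)]

def time_envelope_inversion(band_signals):
    """Invert the envelope of every band and sum the bands back together."""
    per_band = [invert_band_envelope(band) for band in band_signals]
    return [sum(samples) for samples in zip(*per_band)]

n = 200
ramp = [(t + 1) / n for t in range(n)]                 # rising loudness
band_signals = [
    [r * math.sin(2 * math.pi * 0.10 * t) for t, r in enumerate(ramp)],
    [r * math.sin(2 * math.pi * 0.25 * t) for t, r in enumerate(ramp)],
]
inverted = time_envelope_inversion(band_signals)
```

In this toy input the loudness rises over the frame; after inversion most of the energy sits in the first half of the frame instead.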
In a possible implementation, the processing unit 502 is specifically configured to determine a weighted signal according to the frequency domain envelope inversion signal, the time domain inversion signal, and/or the time domain envelope inversion signal and preset weight coefficients corresponding to the frequency domain envelope inversion signal, the time domain inversion signal, and/or the time domain envelope inversion signal, respectively; and determining an interference signal of the target voice signal according to the target voice signal and the weighted signal.
In a possible implementation, the processing unit 502 is specifically configured to multiply the frequency domain envelope inversion signal, the time domain inversion signal, and/or the time domain envelope inversion signal with preset weight coefficients corresponding to the frequency domain envelope inversion signal and the time domain envelope inversion signal, respectively, to obtain at least one multiplication result; and accumulating the at least one multiplication result to obtain a weighted signal.
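The multiply-and-accumulate step can be sketched as follows. The weight values and sample values are made up for illustration, and the three input signals are assumed to have already been brought to the same length.

```python
def weighted_sum(signals, weights):
    """Multiply each signal by its weight and accumulate the products."""
    if len(signals) != len(weights):
        raise ValueError("one weight is required per signal")
    n = len(signals[0])
    if any(len(s) != n for s in signals):
        raise ValueError("all signals must have the same length")
    out = [0.0] * n
    for signal, weight in zip(signals, weights):
        for i, sample in enumerate(signal):
            out[i] += weight * sample
    return out

freq_env_inv = [1.0, 2.0, 3.0]      # frequency domain envelope inversion signal
time_inv = [3.0, 2.0, 1.0]          # time domain inversion signal
time_env_inv = [0.0, 1.0, 0.0]      # time domain envelope inversion signal
weighted = weighted_sum([freq_env_inv, time_inv, time_env_inv],
                        [0.5, 0.25, 0.25])   # hypothetical preset weights
```

Passing only one or two signals, with matching weights, covers the "and/or" variants in which not all three inversion signals are generated.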
In a possible implementation, the processing unit 502 is specifically configured to perform low-pass filtering on the weighted signal to obtain a low-pass interference signal; and replacing a part of low-pass interference signals corresponding to any frequency band with a part of target voice signals corresponding to the any frequency band to obtain interference signals of the target voice signals.
Fig. 6 is a schematic structural diagram of an interference signal generating apparatus 600 for a target voice signal according to an embodiment of the present application. The apparatus 600 may be a chip system. In the embodiment of the application, the chip system may consist of a chip, or may include a chip and other discrete devices. The apparatus 600 includes at least one processor 610 for implementing the methods provided by embodiments of the present application. The apparatus 600 may also include a communication interface 620. In the present embodiment, the communication interface 620 may be a transceiver, a circuit, a bus, a module, or another type of communication interface for communicating with other devices over a transmission medium.
Processor 610 may perform functions performed by processing unit 502 in apparatus 500; the communication interface 620 may be used to perform functions performed by the transceiving unit 501 in the apparatus 500.
When the apparatus 600 is configured to perform the above method, the communication interface 620 is configured to acquire a target voice signal to be interfered; the processor 610 is configured to perform frame segmentation processing on the target voice signal to obtain at least one voice frame, where a frame length of each voice frame in the at least one voice frame is a random value; processing each of the at least one speech frame, the processing comprising: performing first processing on each voice frame in the at least one voice frame to obtain a frequency domain envelope inversion signal; performing second processing on each voice frame in the at least one voice frame to obtain a time domain inversion signal; and/or performing third processing on each voice frame in the at least one voice frame to obtain a time domain envelope inversion signal; and determining the interference signal of the target voice signal according to the frequency domain envelope inversion signal, the time domain inversion signal and/or the time domain envelope inversion signal and preset weight coefficients respectively corresponding to the frequency domain envelope inversion signal and the time domain envelope inversion signal.
The communication interface 620 may also be configured to perform the other steps or operations performed by the transceiver unit 501 in the above method embodiments, and the processor 610 may also be configured to perform the other steps or operations performed by the processing unit 502 in the above method embodiments, which are not described in detail here.
The apparatus 600 may also include at least one memory 630 for storing program instructions and/or data. The memory 630 is coupled to the processor 610. Coupling in the embodiments of the present application means an indirect coupling or communication connection between devices, units, or modules, which may be electrical, mechanical, or in another form, and is used for information interaction between the devices, units, or modules. The processor 610 may operate in conjunction with the memory 630 and may execute program instructions stored in the memory 630. In one possible implementation, at least one of the at least one memory may be integrated with the processor. In another possible implementation, the memory 630 is located outside the apparatus 600.
The specific connection medium between the communication interface 620, the processor 610, and the memory 630 is not limited in this embodiment. In the embodiment of the present application, the memory 630, the processor 610, and the communication interface 620 are connected by a bus 640 in Fig. 6, where the bus is indicated by a thick line; the connection manner between other components is only schematically illustrated and is not limiting. Buses may be classified as address buses, data buses, control buses, and so on. For ease of illustration, only one thick line is shown in Fig. 6, but this does not mean that there is only one bus or only one type of bus.
By way of example, the processor 610 may be one or more central processing units (CPUs); where the processor 610 is a CPU, the CPU may be a single-core CPU or a multi-core CPU. The processor 610 may be a general purpose processor, a digital signal processor, an application specific integrated circuit, a field programmable gate array or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and may implement or perform the methods, steps, and logic blocks disclosed in the embodiments of the present application. The general purpose processor may be a microprocessor or any conventional processor or the like. The steps of a method disclosed in connection with the embodiments of the present application may be embodied directly in a hardware processor for execution, or in a combination of hardware and software modules in the processor for execution.
By way of example, the memory 630 may include, but is not limited to, non-volatile memory such as a hard disk drive (HDD) or solid-state drive (SSD), random access memory (RAM), erasable programmable read-only memory (EPROM), read-only memory (ROM), or compact disc read-only memory (CD-ROM). The memory may also be any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto. The memory in the embodiments of the present application may also be a circuit or any other device capable of implementing a storage function, and is used for storing program instructions and/or data.
The embodiments of the present application provide a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of:
acquiring a target voice signal to be interfered;
carrying out frame division processing on the target voice signal to obtain at least one voice frame, wherein the frame length of each voice frame in the at least one voice frame is a random value;
processing each of the at least one speech frame, the processing comprising:
performing first processing on each voice frame in the at least one voice frame to obtain a frequency domain envelope inversion signal;
performing second processing on each voice frame in the at least one voice frame to obtain a time domain inversion signal; and/or
performing third processing on each voice frame in the at least one voice frame to obtain a time domain envelope inversion signal;
and determining the interference signal of the target voice signal according to the frequency domain envelope inversion signal, the time domain inversion signal and/or the time domain envelope inversion signal and preset weight coefficients respectively corresponding to the frequency domain envelope inversion signal and the time domain envelope inversion signal.
The apparatus embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement the solution without creative effort.
From the above description of the embodiments, it will be apparent to those skilled in the art that the embodiments may be implemented by means of software plus necessary general hardware platforms, or of course may be implemented by means of hardware. Based on this understanding, the foregoing technical solution may be embodied essentially or in a part contributing to the prior art in the form of a software product, which may be stored in a computer readable storage medium, such as ROM/RAM, a magnetic disk, an optical disk, etc., including several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method described in the respective embodiments or some parts of the embodiments.
It should be noted that: the above embodiments are only for illustrating the technical solution of the present application, and are not limiting thereof; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the corresponding technical solutions.
Claims (8)
1. A method for generating an interference signal of a target speech signal, comprising:
acquiring a target voice signal to be interfered;
carrying out frame division processing on the target voice signal to obtain at least one voice frame, wherein the frame length of each voice frame in the at least one voice frame is a random value;
processing each of the at least one speech frame, the processing comprising:
performing first processing on each voice frame in the at least one voice frame to obtain a frequency domain envelope inversion signal;
performing second processing on each voice frame in the at least one voice frame to obtain a time domain inversion signal; and/or
performing third processing on each voice frame in the at least one voice frame to obtain a time domain envelope inversion signal;
determining an interference signal of the target voice signal according to the frequency domain envelope inversion signal, the time domain inversion signal and/or the time domain envelope inversion signal and preset weight coefficients respectively corresponding to the frequency domain envelope inversion signal and the time domain envelope inversion signal;
wherein the performing a first process on each of the at least one speech frame to obtain a frequency domain envelope inversion signal includes:
performing Fourier transform on each voice frame in the at least one voice frame to obtain a frequency spectrum of each voice frame in the at least one voice frame;
determining a spectral envelope of each of the at least one speech frame from a spectrum of each speech frame;
determining a first fine structure corresponding to each voice frame according to the spectrum envelope of the voice frame;
determining, according to the spectral envelope of each voice frame, an N-th degree polynomial and/or exponential function fitting curve of the spectral envelope of each voice frame, wherein N is an integer greater than or equal to 1;
determining the frequency domain envelope inversion signal according to the spectral envelope of each voice frame, the respective polynomial or exponential function fitting curve of the spectral envelope of each voice frame and the first fine structure;
the third processing is performed on each voice frame in the at least one voice frame to obtain a time domain envelope inversion signal, including:
carrying out frequency band division on each voice frame in the at least one voice frame to obtain partial target voice signals corresponding to each frequency band;
determining the time domain envelope of the target voice signals according to the partial target voice signals corresponding to the frequency bands;
determining a second fine structure corresponding to the time domain envelope of the partial target voice signal corresponding to each frequency band according to the time domain envelope;
and determining a time domain envelope inversion signal according to the time domain envelope of the part of the target voice signals corresponding to each frequency band and the second fine structure.
2. The method of claim 1, wherein said second processing each of said at least one speech frame to obtain a time domain inverted signal comprises:
and carrying out time domain inversion on each voice frame in the at least one voice frame to obtain a time domain inversion signal.
3. The method according to claim 1, wherein the determining the interference signal of the target speech signal according to the frequency domain envelope inversion signal, the time domain inversion signal and/or the time domain envelope inversion signal and the preset weight coefficients corresponding thereto, respectively, comprises:
determining a weighting signal according to the frequency domain envelope inversion signal, the time domain inversion signal and/or the time domain envelope inversion signal and preset weighting coefficients respectively corresponding to the frequency domain envelope inversion signal and the time domain envelope inversion signal;
and determining an interference signal of the target voice signal according to the target voice signal and the weighted signal.
4. A method according to claim 3, wherein said determining a weighted signal from said frequency domain envelope inversion signal, said time domain inversion signal and/or said time domain envelope inversion signal and their respective corresponding preset weight coefficients comprises:
multiplying the frequency domain envelope inversion signal, the time domain inversion signal and/or the time domain envelope inversion signal with preset weight coefficients respectively corresponding to the frequency domain envelope inversion signal and the time domain envelope inversion signal to obtain at least one multiplication result;
and accumulating the at least one multiplication result to obtain a weighted signal.
5. A method according to claim 3, wherein said determining an interference signal of said target speech signal from said target speech signal and said weighted signal comprises:
carrying out low-pass filtering on the weighted signals to obtain low-pass interference signals;
and replacing a part of low-pass interference signals corresponding to any frequency band with a part of target voice signals corresponding to the any frequency band to obtain interference signals of the target voice signals.
6. An interference signal generating apparatus for a target speech signal, comprising:
the receiving and transmitting unit is used for acquiring a target voice signal to be interfered;
the processing unit is used for carrying out frame division processing on the target voice signal to obtain at least one voice frame, and the frame length of each voice frame in the at least one voice frame is a random value;
the processing unit is configured to process each of the at least one speech frame, and the processing includes:
performing first processing on each voice frame in the at least one voice frame to obtain a frequency domain envelope inversion signal;
performing second processing on each voice frame in the at least one voice frame to obtain a time domain inversion signal; and/or
performing third processing on each voice frame in the at least one voice frame to obtain a time domain envelope inversion signal;
the processing unit is used for determining an interference signal of the target voice signal according to the frequency domain envelope inversion signal, the time domain inversion signal and/or the time domain envelope inversion signal and preset weight coefficients respectively corresponding to the frequency domain envelope inversion signal, the time domain inversion signal and/or the time domain envelope inversion signal;
the processing unit is specifically configured to perform Fourier transform on each of the at least one speech frame, so as to obtain a frequency spectrum of each of the at least one speech frame; determining a spectral envelope of each of the at least one speech frame from a spectrum of each speech frame; determining a first fine structure corresponding to each voice frame according to the spectral envelope of the voice frame; determining, according to the spectral envelope of each voice frame, an N-th degree polynomial and/or exponential function fitting curve of the spectral envelope of each voice frame, wherein N is an integer greater than or equal to 1; determining the frequency domain envelope inversion signal according to the spectral envelope of each voice frame, the respective polynomial or exponential function fitting curve of the spectral envelope of each voice frame and the first fine structure;
the processing unit is specifically configured to divide a frequency band for each voice frame in the at least one voice frame, so as to obtain a part of target voice signals corresponding to each frequency band; determining the time domain envelope of the target voice signals according to the partial target voice signals corresponding to the frequency bands; determining a second fine structure corresponding to the time domain envelope of the partial target voice signal corresponding to each frequency band according to the time domain envelope; and determining a time domain envelope inversion signal according to the time domain envelope of the part of the target voice signals corresponding to each frequency band and the second fine structure.
7. An interfering signal generating device for a target speech signal, comprising at least one processor for executing a program stored in a memory, which when executed, causes the device to perform:
the method of any one of claims 1-5.
8. A non-transitory computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when executed by a processor, implements the method according to any of claims 1-5.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210011028.XA CN114337908B (en) | 2022-01-05 | 2022-01-05 | Method and device for generating interference signal of target voice signal |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210011028.XA CN114337908B (en) | 2022-01-05 | 2022-01-05 | Method and device for generating interference signal of target voice signal |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114337908A (en) | 2022-04-12 |
CN114337908B (en) | 2024-04-12 |
Family
ID=81025026
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210011028.XA Active CN114337908B (en) | 2022-01-05 | 2022-01-05 | Method and device for generating interference signal of target voice signal |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114337908B (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102110441A (en) * | 2010-12-22 | 2011-06-29 | 中国科学院声学研究所 | Method for generating sound masking signal based on time reversal |
CA3099805A1 (en) * | 2018-06-14 | 2019-12-19 | Pindrop Security, Inc. | Deep neural network based speech enhancement |
CN112863535A (en) * | 2021-01-05 | 2021-05-28 | 中国科学院声学研究所 | Residual echo and noise elimination method and device |
CN113030862A (en) * | 2021-03-12 | 2021-06-25 | 中国科学院声学研究所 | Multi-channel speech enhancement method and device |
Non-Patent Citations (3)
Title |
---|
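Interference Reduction in Reverberant Speech Separation With Visual Voice Activity Detection; Qingju Liu et al.; IEEE; 20141031; full text *
Speech enhancement method based on beam ratio decision for microphone arrays; Cao Zhanzhong et al.; Acta Acustica (声学学报); 20170731; full text *
Speech enhancement and interference suppression algorithm based on microphone arrays; Wang Yiyuan; Zhang Xiwen; Zhou Yineng; Huang Jiyan; Audio Engineering (电声技术); 20180205 (No. 02); full text *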
Also Published As
Publication number | Publication date |
---|---|
CN114337908A (en) | 2022-04-12 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111341336B (en) | Echo cancellation method, device, terminal equipment and medium | |
CN105472189B (en) | Echo cancellation detector, method of cancelling echoes and comparison generator | |
EP3682624B1 (en) | Methods and systems for operating a signal filter device | |
CN111951819A (en) | Echo cancellation method, device and storage medium | |
CN110782914B (en) | Signal processing method and device, terminal equipment and storage medium | |
CN106165015B (en) | Apparatus and method for facilitating watermarking-based echo management | |
ES2706512T3 (en) | Hiding frame errors | |
US20220084535A1 (en) | Reduced latency streaming dynamic noise suppression using convolutional neural networks | |
CN113539285A (en) | Audio signal noise reduction method, electronic device, and storage medium | |
WO2012100557A1 (en) | Bandwidth expansion method and apparatus | |
WO2022143522A1 (en) | Audio signal processing method and apparatus, and electronic device | |
CN114337908B (en) | Method and device for generating interference signal of target voice signal | |
JP2011527160A (en) | Dynamic filtering for adjacent channel interference suppression | |
CN112785998B (en) | Signal processing method, equipment and device | |
CN107967919A (en) | Eliminate the method, device and mobile terminal of TDD noises | |
US8543526B2 (en) | Systems and methods using neural networks to reduce noise in audio signals | |
US10021241B2 (en) | System, apparatus, and method for proximity detection | |
CN112532276A (en) | Narrow-band interference signal processing method and device and storage medium | |
CN117711434B (en) | Audio processing method and device, electronic equipment and computer readable storage medium | |
US11955132B2 (en) | Identifying method of sound watermark and sound watermark identifying apparatus | |
CN111585932B (en) | Dynamic narrowband interference avoidance method, device, storage medium and terminal suitable for broadband OFDM system | |
CN105761724B (en) | Voice frequency signal processing method and device | |
CN111193522B (en) | Signal receiving method, signal receiving device, storage medium and electronic equipment | |
CN116887129A (en) | Audio processing method, device, chip, module equipment and storage medium | |
CN118525332A (en) | Audio processing apparatus and method for suppressing noise |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||