CN117153186A - Sound signal processing method, device, electronic equipment and storage medium - Google Patents

Sound signal processing method, device, electronic equipment and storage medium

Info

Publication number
CN117153186A
Authority
CN
China
Prior art keywords
sound
sound source
candidate
signal
processed
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210944168.2A
Other languages
Chinese (zh)
Inventor
陈俊彬
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen TCL New Technology Co Ltd
Original Assignee
Shenzhen TCL New Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen TCL New Technology Co Ltd filed Critical Shenzhen TCL New Technology Co Ltd
Priority to CN202210944168.2A (published as CN117153186A)
Priority to PCT/CN2023/092372 (published as WO2024027246A1)
Publication of CN117153186A
Legal status: Pending

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0272 Voice signal separating
    • G10L21/028 Voice signal separating using properties of sound source
    • G10L21/0308 Voice signal separating characterised by the type of parameter measurement, e.g. correlation techniques, zero crossing techniques or predictive techniques
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G10L25/24 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being the cepstrum
    • G10L25/27 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
    • G10L25/30 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique using neural networks
    • G10L25/48 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • G10L25/60 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for measuring the quality of voice signals

Landscapes

  • Engineering & Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Quality & Reliability (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Circuit For Audible Band Transducer (AREA)

Abstract

The invention discloses a sound signal processing method, a device, electronic equipment and a storage medium that add a signal quality evaluation mechanism to signal separation. Sound source separation processing is performed on the sound source data to be processed to obtain the candidate sound sources corresponding to that data and the sound signal belonging to each candidate sound source; quality evaluation is performed on the sound signal of each candidate sound source to determine its evaluation value; a target sound source is determined from the plurality of candidate sound sources according to the evaluation value of each candidate sound source's signal; and the sound signal of the target sound source is processed. Because every candidate sound source's signal is scored and the final target sound source is chosen from these evaluation values, the accuracy of the separated sound source signals is improved and the problem of low stability in signal separation is alleviated.

Description

Sound signal processing method, device, electronic equipment and storage medium
Technical Field
The present invention relates to the field of signal processing technologies, and in particular, to a method and apparatus for processing a sound signal, an electronic device, and a storage medium.
Background
When the electronic device collects sound signals through a plurality of microphone channels, the collected signals often contain ambient noise, correlated interference source signals, reflections in the environment, and other interference signals not belonging to the sound source. Because the signals are distorted in complex ways during transmission, the interference signals and the sound source signals are mixed together, making the sound source signals difficult to extract. Blind source separation is an effective solution to this problem; its aim is to extract the source signals from the complex mixed signal.
Although existing blind source separation methods can separate sound source signals from complex mixed sound signals, they cannot distinguish whether a separated sound source signal is valid or whether its quality meets requirements. As a result, the accuracy of the separated sound source signals is not high, and the stability of blind source separation is low.
Disclosure of Invention
The embodiment of the invention provides a sound signal processing method, a sound signal processing device, electronic equipment and a storage medium, which can improve the stability of signal separation.
The embodiment of the invention provides a sound signal processing method, which comprises the following steps:
performing sound source separation processing on sound source data to be processed to obtain candidate sound sources corresponding to the sound source data to be processed and sound signals belonging to the candidate sound sources in the sound source data to be processed;
performing quality evaluation on the sound signals of each candidate sound source, and determining an evaluation value of the sound signals of each candidate sound source;
determining a target sound source from the plurality of candidate sound sources according to the evaluation value of the sound signal of each candidate sound source;
and processing the sound signal of the target sound source.
Correspondingly, the embodiment of the invention also provides a sound signal processing device, which comprises:
the separation module is used for carrying out sound source separation processing on the sound source data to be processed to obtain candidate sound sources corresponding to the sound source data to be processed and sound signals belonging to the candidate sound sources in the sound source data to be processed;
the evaluation module is used for carrying out quality evaluation on the sound signals of each candidate sound source and determining an evaluation value of the sound signals of each candidate sound source;
the selecting module is used for determining a target sound source from the plurality of candidate sound sources according to the evaluation value corresponding to the sound signal of each candidate sound source;
And the processing module is used for processing the sound signal of the target sound source.
Correspondingly, the embodiment of the invention also provides electronic equipment, which comprises:
a memory and a processor; the memory stores a computer program, and the processor is configured to execute the computer program in the memory to perform operations in the sound signal processing method.
In addition, the embodiment of the invention also provides a storage medium, wherein the storage medium stores a plurality of instructions, and the instructions are suitable for being loaded by a processor to execute the steps in the sound signal processing method.
According to the embodiment of the invention, a signal quality evaluation mechanism is added to signal separation. Sound source separation processing is performed on the sound source data to be processed to obtain the candidate sound sources corresponding to that data and the sound signals belonging to each candidate sound source; quality evaluation is performed on the sound signal of each candidate sound source to determine its evaluation value; a target sound source is determined from the plurality of candidate sound sources according to the evaluation value of each candidate sound source's signal; and the sound signal of the target sound source is processed. Because the signal quality evaluation mechanism scores the sound signal of every candidate sound source, an effective target sound source can be selected according to the evaluation values, which improves the accuracy of the separated sound source signals and alleviates the problem of low stability in signal separation.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are needed in the description of the embodiments will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a schematic flow chart of a sound signal processing method according to an embodiment of the present invention;
fig. 2 is a schematic flow chart of a sound source separation process in the sound signal processing method according to the embodiment of the invention;
fig. 3 is a schematic flow chart of a method for estimating a candidate sound source in a sound signal processing method according to an embodiment of the present invention;
fig. 4 is a schematic flow chart of another sound source separation process in the sound signal processing method according to the embodiment of the present invention;
fig. 5 is a schematic structural diagram of an audio signal processing apparatus according to an embodiment of the present invention;
fig. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to fall within the scope of the invention.
As described in the background, in the prior art, when blind source separation is performed using AuxICA (Auxiliary-Function-Based Independent Component Analysis) or AuxIVA (Auxiliary-Function-Based Independent Vector Analysis), the separated target sound source is not evaluated or screened for sound quality. The accuracy of the target sound source therefore cannot be guaranteed, the quality of the separated sound signal is reduced, and the stability of blind source separation is low.
Based on the above, in order to improve the stability of the blind source separation output and achieve a noise reduction effect, the embodiment of the invention provides a sound signal processing method that determines the final target sound source through the evaluation value of each candidate sound source. This improves the accuracy of the separated sound source signals, alleviates the problem of low stability in signal separation, and improves the audio-visual effect.
Referring to fig. 1, fig. 1 is a schematic flow chart of a sound signal processing method according to an embodiment of the present invention. The sound signal processing method may be applied to an electronic device. In some embodiments of the present invention, the electronic device may be a mobile terminal, such as a mobile phone, a tablet computer, a computer, or a television; in other embodiments, the electronic device may be a voice device, such as a Bluetooth speaker, a smart speaker, a microphone, or a smart home device.
The illustrated sound signal processing method includes steps 101 to 104:
101, performing sound source separation processing on the sound source data to be processed to obtain candidate sound sources in the sound source data to be processed and sound signals belonging to each candidate sound source in the sound source data to be processed.
The sound source data to be processed refers to a voice signal in the current environment collected by the electronic equipment, and the voice signal comprises a voice signal of a sound source and noise existing in the environment. The candidate sound source refers to a sound source that may exist in the current environment estimated from the sound source data to be processed, which includes a target sound source.
In some embodiments of the present invention, the sound source data to be processed may be a voice signal in the current environment collected in real time; or may be a voice signal in the current environment collected during a preset period of time.
The electronic device is provided with a microphone array, through which it collects voice signals in its current environment. Because the distance between a sound source and each microphone channel in the array differs, each channel may receive the sound source's voice signal together with room reverberation, interference from other sound sources, environmental noise, and noise internal to the device, all of which inevitably reduce the quality and intelligibility of the voice signal. Current speech recognition technology does not match human hearing in sensitivity and robustness; it cannot fully distinguish between different sound sources and eliminate interference. The interference therefore introduces noise into the sound source data to be processed; if that data were output directly, the audio-visual effect would suffer and the performance of voice interaction with the electronic device would degrade. The sound source data to be processed therefore needs sound source separation processing to determine the sound sources in the environment. The microphone array may be a circular microphone array, a linear microphone array, or a distributed microphone array, and includes at least one microphone channel.
In some embodiments of the present invention, there are various ways to perform sound source separation processing on sound source data to be processed, which illustratively include:
(1) The features of different sound sources can be separated from the sound source data to be processed by a sound source separation method based on a deep neural network, and the candidate sound sources in the sound source data to be processed and the sound signal of each candidate sound source are obtained according to the separated features. Sound source separation methods based on deep neural networks include, but are not limited to, deep-clustering-based sound source separation, sound source separation based on permutation invariant training, and end-to-end sound source separation.
(2) The candidate sound sources in the sound source data to be processed and the sound signals of each candidate sound source can be obtained by blind source separation of the sound source data to be processed by a separation method based on independent subspace analysis.
(3) The candidate sound sources in the sound source data to be processed and the sound signals of each candidate sound source can be obtained by blind source separation of the sound source data to be processed by a separation method based on nonnegative matrix factorization.
(4) The candidate sound sources in the sound source data to be processed and the sound signal of each candidate sound source can be obtained by blind source separation of the sound source data to be processed by a clustering-based separation method; for example, the sound source data to be processed may be clustered with a Gaussian mixture model to obtain the candidate sound sources and the sound signal of each candidate sound source.
(5) The sound source data to be processed can be subjected to blind source separation through principal component analysis, and candidate sound sources in the sound source data to be processed and sound signals of each candidate sound source are obtained.
(6) The separation method based on independent component analysis can be used for carrying out blind source separation on the sound source data to be processed by analyzing the mutually independent statistical characteristics among the signals in the sound source data to be processed, so as to obtain candidate sound sources in the sound source data to be processed and sound signals of each candidate sound source.
(7) The separation method based on independent vector analysis can perform blind source separation on the sound source data to be processed to obtain candidate sound sources in the sound source data to be processed and sound signals of each candidate sound source.
(8) The candidate sound sources in the sound source data to be processed and the sound signals of each candidate sound source can be obtained by blind source separation of the sound source data to be processed through a separation method based on independent vector analysis of auxiliary function optimization.
It should be noted that the above sound source separation processing methods are only exemplary illustrations and do not constitute a limitation of the sound signal processing method provided in the embodiment of the present invention. For example, sound source separation processing may also be performed on the sound source data to be processed by a separation method based on overdetermined independent vector analysis optimized by an auxiliary function, so as to obtain the candidate sound sources in the sound source data to be processed and the sound signal of each candidate sound source.
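As an illustrative sketch of option (6), independent component analysis can recover candidate source signals from a multichannel mixture by exploiting their statistical independence. The example below uses scikit-learn's FastICA on a synthetic two-microphone mixture; the library choice, the mixing matrix, and the synthetic sources are assumptions for demonstration, not part of the patent:

```python
import numpy as np
from sklearn.decomposition import FastICA

# Two independent sources observed through a 2-microphone mixing matrix.
t = np.linspace(0.0, 1.0, 8000)
s1 = np.sin(2 * np.pi * 440 * t)             # tonal source
s2 = np.sign(np.sin(2 * np.pi * 3 * t))      # square-wave interferer
S = np.c_[s1, s2]
A = np.array([[1.0, 0.5],                    # assumed mixing matrix
              [0.5, 1.0]])
X = S @ A.T                                  # "sound source data to be processed"

# Blind source separation: each column of `estimated` is the sound signal
# of one candidate sound source.
ica = FastICA(n_components=2, random_state=0)
estimated = ica.fit_transform(X)
```

Each recovered column would then be passed to the quality-evaluation step; note that ICA leaves the order and scale of the candidate sources ambiguous, which is precisely why a downstream evaluation value is useful for picking the target source.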
102, performing quality evaluation on the sound signal of each candidate sound source, and determining an evaluation value of the sound signal of each candidate sound source.
The evaluation value characterizes the sound quality of the sound signal of each candidate sound source and is used to quantify the probability that each candidate sound source is the target sound source.
In some embodiments of the present invention, there are a number of ways to evaluate the quality of the sound signal of each candidate sound source, including, illustratively:
(1) The evaluation value of the sound signal of each candidate sound source may be determined by evaluating the quality of the sound signal of each candidate sound source by calculating the signal-to-interference ratio of the sound signal of each candidate sound source.
(2) The evaluation value of the sound signal of each candidate sound source may be determined by evaluating the quality of the sound signal of each candidate sound source by calculating the signal distortion ratio of the sound signal of each candidate sound source.
(3) The evaluation value of the sound signal of each candidate sound source may be determined by evaluating the quality of the sound signal of each candidate sound source by calculating the maximum likelihood ratio of the sound signal of each candidate sound source.
(4) The evaluation value of the sound signal of each candidate sound source may be determined by evaluating the quality of the sound signal of each candidate sound source by calculating a cepstral distance measure of the sound signal of each candidate sound source.
(5) The evaluation value of the sound signal of each candidate sound source may be determined by evaluating the quality of the sound signal of each candidate sound source by calculating the frequency-weighted segmented signal-to-noise ratio of the sound signal of each candidate sound source.
(6) The evaluation value of the sound signal of each candidate sound source may be determined by evaluating the quality of the sound signal of each candidate sound source by calculating a perceptual evaluation of speech quality (PESQ) score of the sound signal of each candidate sound source.
(7) The evaluation value of the sound signal of each candidate sound source may be determined by evaluating the quality of the sound signal of each candidate sound source by calculating the kurtosis value of the sound signal of each candidate sound source.
(8) The evaluation value of the sound signal of each candidate sound source may be determined by evaluating the quality of the sound signal of each candidate sound source by calculating a probability score corresponding to the speech feature vector of the sound signal of each candidate sound source. Wherein the probability score is used to characterize the probability that the sound signal of each candidate sound source is the speech signal of the target sound source.
It should be noted that the above methods for evaluating the quality of the sound signal of each candidate sound source are merely exemplary illustrations and do not limit the sound signal processing method provided by the embodiment of the present invention. In practical applications, the evaluation method can be chosen according to the computational capacity of the electronic device in the actual application scenario.
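A minimal sketch of option (7), the kurtosis-based evaluation: speech is typically super-Gaussian, so a higher sample kurtosis can serve as a higher evaluation value for a separated signal. The function name and the Laplace/Gaussian stand-ins for a speech-like and a noise-like candidate are illustrative assumptions:

```python
import numpy as np
from scipy.stats import kurtosis

def evaluate_candidates(signals):
    """Return an evaluation value (excess kurtosis) per candidate signal.

    Super-Gaussian signals such as speech score high; Gaussian-like
    residual noise scores near zero.
    """
    return [float(kurtosis(s, fisher=True)) for s in signals]

rng = np.random.default_rng(1)
speech_like = rng.laplace(size=16000)   # super-Gaussian stand-in for speech
noise_like = rng.normal(size=16000)     # Gaussian stand-in for noise
scores = evaluate_candidates([speech_like, noise_like])
```

In this sketch the speech-like candidate receives the larger evaluation value, so it would be preferred as the target sound source in step 103.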
And 103, determining a target sound source from the plurality of candidate sound sources according to the evaluation value of the sound signal of each candidate sound source.
In some embodiments of the present invention, step 103 includes: and selecting a candidate sound source corresponding to the maximum evaluation value according to the evaluation value of the sound signal of each candidate sound source, and setting the candidate sound source corresponding to the selected maximum evaluation value as a target sound source.
In some embodiments of the present invention, step 103 includes: according to the evaluation value of the sound signal of each candidate sound source, the statistical characteristics of the evaluation value are obtained, and according to the statistical characteristics of the evaluation value, the target sound source is determined and obtained from a plurality of candidate target sound sources, wherein the statistical characteristics comprise the median or mode of the evaluation value. Specifically, the median or mode of the evaluation value is obtained from the evaluation value of the sound signal of each candidate sound source, the evaluation value greater than the median or the evaluation value greater than the mode is selected from the plurality of candidate target sound sources according to the median or the mode of the evaluation value, and the candidate sound source corresponding to the selected evaluation value greater than the median is set as the target sound source or the evaluation value greater than the mode is set as the target sound source.
104, processing the sound signal of the target sound source.
Wherein processing of the sound signal of the target sound source includes, but is not limited to, speech output, speech recognition, speech transmission, speech storage, etc.
In some embodiments of the present invention, when the electronic device is a voice-interactive electronic device, step 104 includes: obtaining the sound signal corresponding to the target sound source, analyzing the semantics of that sound signal to obtain the voice information it carries, and having the electronic device respond to the instruction corresponding to the voice information by executing the corresponding operation, such as dialogue interaction, a query operation, or a music playing operation.
In some embodiments of the present invention, when the electronic device is a voice-gathering device, step 104 includes: obtaining the voice signal corresponding to the target sound source and storing it, for example in a sound-recording device; alternatively, the voice signal may be transmitted to a server communicatively connected to the electronic device.
The embodiment of the invention provides a sound signal processing method, which determines a final target sound source through the evaluation values of candidate sound sources, improves the accuracy of separated sound source signals and solves the problem of low stability of signal separation.
Existing blind source separation methods cannot use the position information of a sound source: after a sound source moves, they cannot accurately detect the change in its position, so the target sound source they separate carries a degree of uncertainty and the separated sound signal is unstable. To further improve the certainty of the blind-source-separation output and reduce the noise in the output voice signal, the embodiment of the invention introduces sound source prior information into the separation process. The sound source prior information refers to the position information of candidate sound sources that may exist in the environment of the electronic device collecting the sound source data to be processed. The position information of a candidate sound source may be its spatial coordinates, or its pitch angle and azimuth angle in the space of that environment. It should be noted that the method of establishing the spatial coordinate system is not limited in the embodiments of the present invention; for example, the coordinate system may be established with the geometric center of the electronic device as the origin.
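As a small illustration of the position-information formats just mentioned, a candidate source's azimuth and pitch (elevation) angles can be converted to spatial coordinates with the origin at the device's geometric center. The angle convention below is an assumption, since the patent leaves the coordinate system open:

```python
import math

def direction_to_cartesian(azimuth_deg, pitch_deg, distance=1.0):
    """Convert azimuth/pitch angles of a candidate source to (x, y, z).

    Assumed convention: azimuth is measured in the horizontal plane from
    the x-axis, pitch is measured upward from that plane, and the origin
    is the device's geometric center.
    """
    az = math.radians(azimuth_deg)
    el = math.radians(pitch_deg)
    x = distance * math.cos(el) * math.cos(az)
    y = distance * math.cos(el) * math.sin(az)
    z = distance * math.sin(el)
    return (x, y, z)
```

Under this convention, an azimuth of 90 degrees at zero pitch maps to a point on the y-axis, and a pitch of 90 degrees maps to a point directly above the device.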
As shown in fig. 2, fig. 2 is a schematic flow chart of a sound source separation process in a sound signal processing method according to an embodiment of the present invention, where the sound source separation processing method includes steps 201 to 203:
and 201, estimating the sound source position of the sound source data to be processed, and determining and obtaining the candidate sound sources corresponding to the sound source data to be processed and the position information of each candidate sound source.
In some embodiments of the present invention, the candidate sound sources corresponding to the sound source data to be processed and the position information of each candidate sound source may be obtained by performing sound source estimation on the sound source data to be processed with the SRP (Steered Response Power) method. Specifically, this includes: estimating the spatial power spectrum distribution of the sound source data to be processed by the SRP method, and determining the candidate sound sources corresponding to the sound source data to be processed and the position information of each candidate sound source according to that power spectrum distribution.
In some embodiments of the present invention, the position with the maximum power may be determined according to the power spectrum distribution, and the selected position of maximum power may be set as the position information of a candidate sound source. In some embodiments of the present invention, a plurality of positions whose power is greater than or equal to a preset power value may be determined according to the power spectrum distribution, and the selected positions may be set as the position information of the candidate sound sources. The preset power value may be the average power value of the power spectrum distribution, or may be the power value ranked S-th after sorting the powers of all spatial positions in descending order according to the power spectrum distribution. Here S is an integer greater than 0, and its value may be set according to the actual application scenario; for example, S may be 2, 3, 4, 5, etc.
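The two selection variants above (top-S peaks and mean-power threshold) can be sketched as follows; this is an editorial illustration with a toy power map, not the patented implementation, and the function names are hypothetical:

```python
import numpy as np

def select_candidates(power_map, s=3):
    # indices of the S highest-power grid positions (descending power)
    return np.argsort(power_map)[::-1][:s]

def select_by_threshold(power_map, threshold=None):
    # positions whose power reaches the preset power value; the default
    # threshold is the mean power, one of the variants described above
    if threshold is None:
        threshold = power_map.mean()
    return np.flatnonzero(power_map >= threshold)

# toy spatial power distribution over 8 grid positions
p = np.array([0.1, 0.9, 0.2, 0.8, 0.05, 0.3, 0.7, 0.1])
top = select_candidates(p, s=3)
```

With this toy map both variants select the same three positions, since exactly three powers exceed the mean.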
In some embodiments of the present invention, considering that aliasing is easily generated in the high-frequency portion when the power spectrum of the sound source data to be processed is estimated by the SRP method, which affects the accuracy of estimating the position information of the candidate sound sources, the sound source position estimation may be performed in two stages on the low-frequency and high-frequency signals. Specifically, the sound source position estimation includes steps a1 to a4:
Step a1: perform frequency-domain conversion on the sound source data to be processed to obtain the frequency-domain signal of the sound source data to be processed.
Step a2: filter the frequency-domain signal of the sound source data to be processed to obtain the low-frequency signal and the high-frequency signal of the frequency-domain signal. The filter may be a low-pass filter or a high-pass filter.
Step a3: according to the low-frequency signal of the frequency-domain signal, perform time delay estimation on the low-frequency signal of each microphone channel in the microphone array arranged in the electronic equipment by the SRP method to obtain the steered response power function value of the microphone array in each preset area, select the preset areas whose steered response power function value is greater than or equal to a preset function value threshold, and set the selected preset areas as the estimation areas where the candidate sound sources are located.
Step a4: according to the high-frequency signal of the frequency-domain signal, perform time delay estimation on the high-frequency signal of each microphone channel in the microphone array by the SRP method to obtain the steered response power function value of the microphone array in each estimation area, select the estimation areas whose steered response power function value is greater than or equal to the preset function value threshold, set these estimation areas as the positions of the candidate sound sources, and set their position information as the position information of the candidate sound sources.
In some embodiments of the present invention, steps a3 to a4 include: dividing the spatial coordinate system into a plurality of first grid areas, wherein each first grid area corresponds to position information formed by a pitch angle and an azimuth angle; performing time delay estimation on the low-frequency signal of each microphone channel in the microphone array arranged in the electronic equipment by the SRP method to obtain the first steered response power function value of the microphone array in each first grid area; selecting the first grid area with the largest first steered response power function value and setting it as the estimation area where the candidate sound sources are located; dividing the estimation area into a plurality of second grid areas, wherein each second grid area corresponds to position information formed by a pitch angle and an azimuth angle, and the angle difference between adjacent second grid areas is smaller than that between adjacent first grid areas; performing time delay estimation on the high-frequency signal of each microphone channel by the SRP method to obtain the second steered response power function value of the microphone array in each second grid area; selecting the second grid areas whose second steered response power function value is greater than or equal to the preset function value threshold, setting these second grid areas as the positions of the candidate sound sources, and setting their position information as the position information of the candidate sound sources.
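The coarse-to-fine grid search of steps a3 to a4 can be sketched as follows. This is an editorial illustration over a one-dimensional azimuth grid with a synthetic SRP functional peaked at 40 degrees; the real method evaluates the array's steered response power over pitch/azimuth grid areas:

```python
import numpy as np

def srp_value(angle_deg):
    # stand-in for the SRP functional of the microphone array in a given
    # direction; here a synthetic response peaked at 40 degrees
    return np.exp(-((angle_deg - 40.0) / 5.0) ** 2)

def coarse_to_fine(step_coarse=10.0, step_fine=1.0, width=10.0):
    # coarse pass on a wide grid (low-frequency stage in the text)
    coarse = np.arange(0.0, 180.0, step_coarse)
    best = coarse[np.argmax([srp_value(a) for a in coarse])]
    # fine pass on a denser grid restricted to the estimated region
    # (high-frequency stage in the text)
    fine = np.arange(best - width, best + width, step_fine)
    return fine[np.argmax([srp_value(a) for a in fine])]

est = coarse_to_fine()
```

The fine grid's step is smaller than the coarse grid's, mirroring the requirement that adjacent second grid areas differ by a smaller angle than adjacent first grid areas.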
Step 202: determine the position guiding information of each candidate sound source according to the position of each sound channel collecting the sound source data to be processed and the position information of each candidate sound source.
The sound channel position for collecting the sound source data to be processed refers to position information of each microphone channel in a microphone array arranged in the electronic equipment for collecting the sound source data to be processed in a preset space coordinate system.
The position guidance information is used to determine a position vector between the position information of each candidate sound source and the position of each sound channel where the sound source data to be processed is acquired.
In some embodiments of the present invention, since the existing blind source separation method does not consider the sound source position, noise exists in the separated sound signals. To improve the stability of blind source separation, the position vectors between the position information of each candidate sound source and each sound channel position collecting the sound source data to be processed are determined according to the position information estimated for each candidate sound source. During blind source separation, the sound source data to be processed is then separated through these position vectors and the sound signal components of each candidate sound source in the sound signals at each sound channel position, so as to obtain the sound signal of each candidate sound source.
In some embodiments of the present invention, the distance between the position information of each candidate sound source and each sound channel position collecting the sound source data to be processed may be obtained from the two sets of position information, the angle information between them may be obtained from this distance, and the position guiding information of each candidate sound source may be obtained from the angle information. For example, the position guiding information may be obtained from the angle information θ through a far-field steering vector of the form a(θ)=[1, e^(-j2πfd·sinθ/c), …, e^(-j2πf(M-1)d·sinθ/c)]^T, where θ represents the angle information between the position information of each candidate sound source and each sound channel position, M is the number of sound channels collecting the sound source data to be processed, f is the frequency, d is the spacing between adjacent sound channels, and c is the speed of sound.
In some embodiments of the present invention, the distance between the position information of each candidate sound source and each sound channel position collecting the sound source data to be processed may be obtained from the position information of each candidate sound source and of each sound channel, the time information required for the sound signal to travel from each candidate sound source to each sound channel position may be obtained from this distance, and the position guiding information of each candidate sound source may be obtained from this time information, for example through a steering vector of the form a=[e^(-j2πfτ_1), …, e^(-j2πfτ_M)]^T, where τ_m represents the time required for the sound signal to reach the m-th sound channel position from the position of the candidate sound source, and j is the imaginary unit.
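The delay-based position guiding information can be sketched as follows: distances from the candidate sound source to each microphone give propagation times, and the per-channel phases form the steering vector. This is an editorial illustration; the speed of sound and the geometry are assumptions, not values from the patent:

```python
import numpy as np

C_SOUND = 343.0  # speed of sound in m/s (assumed)

def steering_vector(src_pos, mic_positions, freq_hz):
    # per-channel phase e^{-j*2*pi*f*tau_m}, where tau_m is the
    # propagation time from the candidate source to microphone m
    src = np.asarray(src_pos, dtype=float)
    mics = np.asarray(mic_positions, dtype=float)
    dists = np.linalg.norm(mics - src, axis=1)   # metres
    tau = dists / C_SOUND                        # seconds
    return np.exp(-1j * 2.0 * np.pi * freq_hz * tau)

# one candidate source 1 m away, two microphones 5 cm apart
a = steering_vector([1.0, 0.0, 0.0],
                    [[0.0, 0.0, 0.0], [0.05, 0.0, 0.0]], 1000.0)
```

Each element has unit magnitude, so the vector only encodes the relative delays between channels.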
Step 203: perform sound source separation on the sound source data to be processed according to the position guiding information of each candidate sound source to obtain the sound signal of each candidate sound source.
In some embodiments of the present invention, according to the position guiding information of each candidate sound source, the sound source data to be processed may be separated by the separation method in step 101 to obtain the sound signal of each candidate sound source. Illustratively, the separation method of overdetermined independent vector analysis based on auxiliary function optimization is described below as an example.
When the sound source data to be processed is separated by the separation method of overdetermined independent vector analysis based on auxiliary function optimization, the received sound source data is assumed to consist of N source signals s_1, s_2, …, s_N in the environment, mixed through the transfer functions h_mn into the mixed signals x_1, x_2, …, x_M received by the M sound channels, namely x_m = Σ_{n=1}^{N} h_mn * s_n, m=1, …, M. The sound source data is converted into the frequency domain through the short-time Fourier transform to obtain the frequency-domain signal X(l,k)=H(k)S(l,k), l=1, …, L, where L is the number of frames of the short-time Fourier transform, S(l,k)=[S_1(l,k), …, S_N(l,k)]^T represents the source signals at frequency point k, X(l,k)=[X_1(l,k), …, X_M(l,k)]^T is the frequency-domain signal of the received mixed signal, and H(k) is the mixing matrix. According to this model, the separated signal is Y(l,k)=W(k)X(l,k), where Y(l,k)=[Y_1(l,k), …, Y_N(l,k)]^T is the separated signal at frequency point k, which approximates the N source signals, and W(k) is the separation parameter at frequency point k. The separation parameter of each frame of the frequency-domain signal at each frequency point is solved through optimization based on the auxiliary function, the frequency signals of the candidate sound sources are separated from the frequency signals at each frequency point through the separation parameter, and the sound signals of the candidate sound sources are obtained through the inverse short-time Fourier transform.
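The frequency-domain mixing and demixing model above can be sketched numerically: with a known mixing matrix H(k), the ideal separation parameter W(k)=H(k)^(-1) recovers the source signals at that bin. This editorial sketch uses the determined case N=M=2 for clarity:

```python
import numpy as np

rng = np.random.default_rng(0)

N = M = 2                       # two sources, two channels (determined case)
S = rng.standard_normal((N, 64)) + 1j * rng.standard_normal((N, 64))
H = np.array([[1.0, 0.5],       # mixing matrix H(k) at one frequency bin
              [0.3, 1.0]], dtype=complex)

X = H @ S                       # observed mixture X(l,k) = H(k) S(l,k)
W = np.linalg.inv(H)            # ideal separation parameter W(k)
Y = W @ X                       # separated signal Y(l,k) = W(k) X(l,k)
```

In blind separation H(k) is unknown, so W(k) must instead be estimated (up to permutation and scaling) by the auxiliary-function optimization described in the following paragraphs.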
In some embodiments of the present invention, in order to improve the reliability of the separated sound signals of the candidate sound sources and reduce the noise in them, when performing sound source separation with the separation method of overdetermined independent vector analysis based on auxiliary function optimization, the separation parameter of each frame of the frequency-domain signal at each frequency point is solved by combining the position guiding information with the auxiliary-function-based optimization, and the sound signals of the candidate sound sources are separated from the frequency signals at each frequency point through the separation parameter. Specifically, the separation method includes steps b1 to b2:
Step b1: determine the separation parameters of the sound source data to be processed according to the position guiding information of each candidate sound source.
Wherein, the sound source data to be processed may be a current frame frequency domain signal among frequency domain signals of the sound source data.
In some embodiments of the invention, step b1 comprises: after obtaining the auxiliary parameters corresponding to the sound source data to be processed, correcting them according to the position guiding information of each candidate sound source to obtain the corrected auxiliary parameters, and solving the separation parameters of the sound source data to be processed through optimization based on the corrected auxiliary parameters. The auxiliary parameters include the auxiliary parameters of the frequency signals at each frequency point of each frame of the frequency-domain signal of the sound source data to be processed. Specifically, the method for determining the separation parameters of the sound source data to be processed according to the position guiding information comprises the following steps:
(1) Acquire the historical separation parameters of the historical sound source data and the auxiliary parameters corresponding to the sound source data to be processed.
(2) Correct the auxiliary parameters according to the position guiding information of each candidate sound source to obtain the corrected auxiliary parameters.
(3) Obtain the separation parameters of the sound source data to be processed according to the corrected auxiliary parameters and the historical separation parameters.
The historical separation parameter of the historical sound source data refers to the separation parameter of the previous frame of sound source data. Because the sound signals of the sound sources are correlated in time, when the sound source data to be processed is separated by the separation method of overdetermined independent vector analysis based on auxiliary function optimization, the separation is performed by alternately updating the separation parameters and the auxiliary parameters: the auxiliary parameters are updated from the separation parameters of the previous frame and the frequency-domain signal of the sound source data to be processed, and the separation parameters are updated from the separation parameters of the previous frame and the auxiliary parameters of the frequency-domain signal of the sound source data to be processed.
In some embodiments of the present invention, the step of obtaining the auxiliary parameters corresponding to the sound source data to be processed includes: acquiring the historical separation parameters and the historical auxiliary parameters of the historical sound source data, obtaining the energy of each candidate sound source output by the previous frame according to the historical separation parameters and the sound signal of the sound source data to be processed, and obtaining the auxiliary parameters corresponding to the sound source data to be processed according to the historical auxiliary parameters, the sound signal of the sound source data to be processed and this energy. The historical auxiliary parameter refers to the auxiliary parameter of the previous frame of sound source data. For example, when the number S of candidate sound sources to be separated is less than or equal to the number M of sound channels of the sound source data to be processed, the auxiliary parameters V(l,k)=[V_1(l,k), V_2(l,k), …, V_s(l,k), …, V_S(l,k)] may be obtained through an update of the form V_s(l,k)=αV_s(l-1,k)+(1-α)X(l,k)X^H(l,k)/r_s(l), where α is a forgetting factor with α∈[0,1], l is the frame number of the sound source data to be processed, V_s(l-1,k) is the auxiliary parameter of the k-th frequency point in the previous frame, namely the historical auxiliary parameter, r_s(l) is the output energy of each candidate sound source, W_s(l-1,k) is the separation parameter of the k-th frequency point in the previous frame, namely the historical separation parameter, and (·)^H represents the conjugate transpose. The auxiliary parameter V_s(1,k) of the first frame is a preset matrix whose diagonal elements are 1 and whose other elements are zero.
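The forgetting-factor update of one auxiliary matrix can be sketched as follows. This is an editorial illustration of an update of the stated form (the exact weighting in the patent's lost formula may differ); the function name and toy values are assumptions:

```python
import numpy as np

def update_aux(V_prev, x, r, alpha=0.95):
    # one online update of the auxiliary (weighted covariance) matrix:
    # V_s(l,k) = alpha * V_s(l-1,k) + (1-alpha) * x x^H / r,
    # where r is the current energy estimate of candidate source s
    outer = np.outer(x, x.conj())
    return alpha * V_prev + (1.0 - alpha) * outer / r

M = 2
V0 = np.eye(M, dtype=complex)        # first-frame initialisation: identity
x = np.array([1.0 + 0.0j, 0.5 - 0.5j])
V1 = update_aux(V0, x, r=1.0)
```

The update keeps the matrix Hermitian, which the later matrix-inverse steps rely on.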
In some embodiments of the invention, correcting the auxiliary parameters according to the position guiding information of each candidate sound source comprises: calculating, according to the position guiding information of each candidate sound source, the correction parameter β corresponding to each candidate sound source auxiliary parameter V_s(l,k), and correcting each V_s(l,k) with its correction parameter to obtain the corrected auxiliary parameters. For example, according to the position guiding vector a_s(k) of each candidate sound source, the correction parameter may be obtained as a term of the form β=λ_s a_s(k)a_s^H(k), where λ_s is a preset constant. After determining the correction parameter β corresponding to each V_s(l,k), the corrected auxiliary parameter D_s(l,k) of each candidate sound source is obtained through V_s(l,k)+β, and the corrected auxiliary parameters are gathered as D(l,k)=[D_1(l,k), D_2(l,k), …, D_s(l,k), …, D_S(l,k)].
In some embodiments of the present invention, the separation parameter W (l, k) of the sound source data to be processed may be obtained through D (l, k) W (l-1, k) according to the corrected auxiliary parameter and the historical separation parameter.
In some embodiments of the invention, according to the corrected auxiliary parameter and the historical separation parameter, the s-th intermediate parameter P_s(l,k) may be obtained through (W(l-1,k)D_s(l,k))^(-1), and the separation parameter W(l,k) is then obtained from the intermediate parameters, for example by taking the s-th column and normalising it as W_s(l,k)=P_s(l,k)/√(P_s^H(l,k)D_s(l,k)P_s(l,k)).
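An iterative-projection style row update of this kind can be sketched as follows; this is an editorial illustration of the normalised-inverse step under the stated assumptions, not the patent's exact (lost) formula:

```python
import numpy as np

def ip_update(W_prev, D_s, s):
    # p = (W D_s)^{-1} e_s, then normalise by sqrt(p^H D_s p)
    M = W_prev.shape[0]
    e_s = np.zeros(M)
    e_s[s] = 1.0
    p = np.linalg.solve(W_prev @ D_s, e_s)
    return p / np.sqrt(np.real(p.conj() @ D_s @ p))

W = np.eye(2, dtype=complex)               # previous-frame separation parameter
D = np.diag([4.0, 1.0]).astype(complex)    # corrected auxiliary matrix (toy)
w0 = ip_update(W, D, 0)
```

After the update the row satisfies w^H D w = 1, the usual normalisation in auxiliary-function IVA updates.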
In some embodiments of the present invention, in order to increase the accuracy of the separation result and address the ambiguity problem of blind source separation, the separation parameter may be determined from the corrected auxiliary parameter and the historical separation parameter as follows: obtain the s-th first intermediate parameter P_s(l,k) through (W(l-1,k)V_s(l,k))^(-1); obtain the s-th modified first intermediate parameter Q_s(l,k) from the corrected auxiliary parameter; obtain the s-th second intermediate parameter ψ_s(l,k) through P_s^H(l,k)D_s(l,k)P_s(l,k); obtain the s-th third intermediate parameter Φ_s(l,k) through P_s^H(l,k)D_s(l,k)Q_s(l,k); obtain the separation parameter W_s(l,k) of the s-th element from the s-th first intermediate parameter, the s-th modified first intermediate parameter, the s-th second intermediate parameter and the s-th third intermediate parameter; and gather the separation parameters of all elements to obtain the separation parameter W(l,k).
Specifically, for the s-th element W_s(l,k) in the separation parameter W(l,k), the value of the third intermediate parameter Φ_s(l,k) of this element may be compared with a preset value. If Φ_s(l,k) is equal to the preset value, W_s(l,k) is obtained according to the second intermediate parameter ψ_s(l,k), the first intermediate parameter P_s(l,k) and the modified first intermediate parameter Q_s(l,k); if Φ_s(l,k) is not equal to the preset value, W_s(l,k) is obtained according to the second intermediate parameter ψ_s(l,k), the third intermediate parameter Φ_s(l,k), the first intermediate parameter P_s(l,k) and the modified first intermediate parameter Q_s(l,k). The preset value may be 0.
It should be noted that the above determination method of the separation parameter is merely an exemplary illustration of determining the separation parameter in the separation method of overdetermined independent vector analysis based on auxiliary function optimization; in practical application, the manner of determining the separation parameter may be adjusted according to the separation method employed.
Step b2: perform sound source separation on the sound signals in the sound source data to be processed according to the separation parameters, and determine the sound signal of each candidate sound source.
In some embodiments of the invention, step b2 comprises: after the separation parameter is obtained, the separation signal is separated from the sound source data to be processed by calculating the product of the separation parameter and the frequency domain signal of the sound signal in the sound source data to be processed, wherein each element in the separation signal represents the frequency domain signal of each candidate sound source, and the frequency domain signal of each candidate sound source is separated to perform inverse short time Fourier transform to obtain the sound signal of each candidate sound source.
In some embodiments of the invention, step b2 comprises: after the separation parameters are obtained, obtaining the noise separation parameters according to the separation parameters, obtaining the total separation parameters of the sound source data to be processed from the noise separation parameters and the separation parameters, separating the separation signal from the sound source data to be processed by calculating the product of the total separation parameters and the frequency-domain signal of the sound signal in the sound source data to be processed, wherein each element in the separation signal represents the frequency-domain signal of one candidate sound source, and performing the inverse short-time Fourier transform on the frequency-domain signal of each candidate sound source to obtain the sound signal of each candidate sound source.
In some embodiments of the present invention, a method for determining the noise separation parameter includes: according to the separation parameters, calculating the noise subspace J(l,k) through (A_2 C(l,k)W^H(l,k))(A_1 C(l,k)W^H(l,k))^(-1), and obtaining the noise separation parameter U(l,k) through [J(l,k), -I_(M-S)], where A_1 and A_2 are constant matrices with A_1=[I_S, O_(S×(M-S))] and A_2=[O_((M-S)×S), I_(M-S)], I is an identity matrix, O_(*×*) is a zero matrix, and C(l,k) is an M×M noise parameter matrix. In some embodiments of the invention, C(l,k) may be obtained from the noise parameter matrix C(l-1,k) of the previous frame of sound source data and the sound signal of the sound source data to be processed through αC(l-1,k)+(1-α)X(l,k)X^H(l,k), where α is the forgetting factor used in the auxiliary parameter calculation, set in the same way as there, for example to 0.95. In some embodiments of the invention, the noise parameter matrix C(1,k) of the first frame is a zero matrix.
In some embodiments of the invention, after the noise separation parameter U(l,k) is obtained, the total separation parameter may be obtained, for example, by stacking the separation parameter and the noise separation parameter as W̃(l,k)=[W(l,k); U(l,k)]. By calculating the product of the total separation parameter and the frequency-domain signal X(l,k) of the sound signal in the sound source data to be processed, the separation signal Y(l,k) is separated from the sound source data to be processed, where Y(l,k) is a column vector with S elements, each element Y_s(l,k), s=1, 2, …, S, represents the frequency-domain signal of one candidate sound source, and the frequency-domain signal of each candidate sound source is subjected to the inverse short-time Fourier transform to obtain the sound signal y_s(l) of each candidate sound source.
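The stacking of source and noise separation parameters into a total separation parameter, and the extraction of the S candidate-source outputs, can be sketched as follows. Shapes and random values are editorial assumptions for illustration:

```python
import numpy as np

M, S = 3, 2                              # 3 sound channels, 2 candidate sources
rng = np.random.default_rng(1)
W = rng.standard_normal((S, M))          # source separation parameter (S x M)
U = rng.standard_normal((M - S, M))      # noise separation parameter ((M-S) x M)

W_total = np.vstack([W, U])              # total separation parameter (M x M)
X = rng.standard_normal((M, 16))         # one frequency bin of the mixture, 16 frames

Y_all = W_total @ X                      # all M outputs
Y = Y_all[:S]                            # first S rows: candidate-source signals
```

The last M-S rows of the product are the noise outputs, which are discarded; only the first S rows go on to the inverse short-time Fourier transform.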
In some embodiments of the present invention, in order to increase the stability of the separation signal and the accuracy of the separation result and address the ambiguity problem of blind source separation, in step b2, after the separation parameter is obtained and the noise separation parameter is obtained according to it, the total separation parameter W̃(l,k) of the sound source data to be processed is obtained; a first transformation matrix of the total separation parameter is obtained, for example as A(l,k)W̃(l,k); the elements of the first to the S-th rows of the first transformation matrix are extracted to obtain a second transformation matrix W_bp(l,k); and the separation signal Y(l,k) is separated from the sound source data to be processed through W_bp(l,k)X(l,k). Here A(l,k) is an M×M diagonal matrix whose diagonal elements are the diagonal elements of the inverse of the total separation parameter W̃(l,k), and (·)^H represents the conjugate transpose.
The above-mentioned manner of separating the sound signals of the candidate sound sources from the sound source data to be processed by the separation parameter is merely an exemplary illustration of the separation method based on the overdetermined independent vector analysis of the auxiliary function optimization, and in practical application, the manner of separating the sound signals of the candidate sound sources from the sound source data to be processed by the separation parameter may be adjusted according to the employed separation method.
In some embodiments of the present invention, in step 201, an initial sound source area may be selected from the sound source space where the electronic equipment collecting the sound source data to be processed is located, the initial sound source area is uniformly divided according to a preset azimuth angle to obtain a plurality of direction vectors, each direction vector is set as an initial sound source position, the power value of the sound source data to be processed at each initial sound source position is calculated through SRP (Steered Response Power), candidate sound source positions are selected from the initial sound source positions according to these power values, and the candidate sound sources and their position information are obtained from the selected candidate sound source positions. Specifically, as shown in fig. 3, fig. 3 is a schematic flow chart of a method for estimating candidate sound sources in a sound signal processing method according to an embodiment of the present invention, where the method includes steps 301 to 304:
Step 301: determine a plurality of initial sound source positions according to a preset azimuth angle.
In some embodiments of the present invention, a spatial coordinate system may be established with the geometric center of the microphone array in the electronic equipment as the origin. With the origin as the center and a preset distance as the radius, an initial sound source area within a preset angle range is selected clockwise or counterclockwise, in which at least one initial sound source exists. The initial sound source area is then traversed clockwise or counterclockwise, one position being selected every preset azimuth angle, so that a plurality of positions is obtained. Each selected position is set as an initial sound source position, the azimuth angle of each selected position and the pitch angle formed by each selected position and the origin are obtained, and the resulting direction vector of each selected position is set as the position information of that initial sound source position.
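Sampling direction vectors every preset azimuth angle at a fixed radius and pitch can be sketched as follows; radius, step and pitch values are editorial assumptions:

```python
import numpy as np

def initial_positions(radius=1.0, az_step_deg=30.0, pitch_deg=0.0):
    # direction vectors sampled every az_step_deg around the array origin,
    # at a fixed radius and pitch angle (spherical to Cartesian)
    az = np.deg2rad(np.arange(0.0, 360.0, az_step_deg))
    el = np.deg2rad(pitch_deg)
    x = radius * np.cos(el) * np.cos(az)
    y = radius * np.cos(el) * np.sin(az)
    z = radius * np.sin(el) * np.ones_like(az)
    return np.stack([x, y, z], axis=1)

pos = initial_positions(az_step_deg=30.0)  # 360/30 = 12 initial positions
```

Each row is one initial sound source position expressed as a direction vector of fixed length from the origin.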
Step 302: determine the distance between each initial sound source position and each sound channel position according to each initial sound source position.
In some embodiments of the present invention, step 302 includes: determining, in the spatial coordinate system, the position coordinates of each microphone channel of the microphone array in the electronic equipment, setting these coordinates as the sound channel positions collecting the sound source data to be processed, determining the position coordinates of each initial sound source position according to its position information, and, for each initial sound source position, obtaining the distance between the initial sound source position and each sound channel position from their position coordinates. In some embodiments of the present invention, this distance may be obtained by calculating the 2-norm between the position coordinates of the initial sound source position and each sound channel position; it may also be obtained by calculating the Euclidean distance or the Mahalanobis distance between them.
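The 2-norm distance computation can be sketched as a one-liner over the coordinate arrays; the coordinates below are toy values:

```python
import numpy as np

def source_mic_distances(src_pos, mic_positions):
    # 2-norm distance between an initial sound source position and
    # each sound channel (microphone) position
    src = np.asarray(src_pos, dtype=float)
    mics = np.asarray(mic_positions, dtype=float)
    return np.linalg.norm(mics - src, axis=1)

d = source_mic_distances([3.0, 4.0, 0.0],
                         [[0.0, 0.0, 0.0], [3.0, 0.0, 0.0]])
```

For these coordinates the distances are 5 m and 4 m respectively (a 3-4-5 triangle for the first microphone).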
303, determining the power of the sound signal at each initial sound source position according to the distance between each initial sound source position and each sound channel position.
In some embodiments of the present invention, step 303 comprises: for each initial sound source position, obtaining the distance difference between the initial sound source position and every two adjacent sound channel positions from the distances determined in step 302; obtaining, from each distance difference, the time difference with which the signal from the initial sound source position is received at the two adjacent sound channel positions; determining the sound signal at each of the two adjacent sound channel positions from the sound source data to be processed; obtaining, from the time difference and the sound signals at the two adjacent sound channel positions, the power of the signal from the initial sound source position at the preceding one of the two adjacent sound channel positions; and summing the powers of the signal from the initial sound source position over the sound channel positions to obtain the power of the sound signal at the initial sound source position. The preceding one of two adjacent sound channel positions may be the sound channel position whose position coordinates are smaller than those of the other sound channel.
In some embodiments of the present invention, step 303 may also calculate the power of the sound signal at each initial sound source position according to steps a1 to a3.
304, determining and obtaining the candidate sound source and the position information of the candidate sound source according to the power of the sound signal at each initial sound source position.
In some embodiments of the present invention, step 304 includes: sorting the initial sound source positions in descending order of the power of the sound signal at each position; selecting a preset number of target initial sound source positions from the sorted initial sound source positions; setting the selected target initial sound source positions as candidate sound sources; and setting the position information of each target initial sound source position as the position information of the corresponding candidate sound source. It should be noted that the embodiment of the present invention does not limit the preset number, that is, the number of candidate sound sources is not limited; for example, to reduce the amount of calculation in the sound source separation processing, the number of candidate sound sources may be set to be less than or equal to the number of sound channels used to collect the sound source data to be processed.
In some embodiments of the present invention, step 304 includes: comparing the power of the sound signal at each initial sound source position with a power threshold in turn; selecting the initial sound source positions whose power is greater than or equal to the power threshold; setting the selected initial sound source positions as candidate sound sources; and setting the position information of each selected initial sound source position as the position information of the corresponding candidate sound source. In some embodiments of the present invention, the power threshold may be preset; it may be determined from the average value, mode or median of the powers of the sound signals at the initial sound source positions; or the powers may be sorted in descending order and the power value at a preset rank in the sorted powers set as the power threshold.
In some embodiments of the present invention, step 304 includes: determining the maximum power among the powers of the sound signals at the initial sound source positions; calculating the power difference between the power of the sound signal at each initial sound source position and the maximum power; setting each initial sound source position whose power difference is smaller than or equal to a preset power difference threshold as a candidate sound source; and setting the position information of each such initial sound source position as the position information of the corresponding candidate sound source.
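The three selection embodiments of step 304 above can be sketched side by side; the function name `select_candidates`, the mode strings, and the example powers are assumptions of this illustration:

```python
import numpy as np

def select_candidates(powers, mode, k=None, threshold=None, max_diff=None):
    """Three ways to pick candidate sound sources from per-position powers:
    keep the k largest powers, keep powers at or above a threshold, or keep
    powers within max_diff of the maximum. Returns indices of retained
    initial sound source positions."""
    p = np.asarray(powers, dtype=float)
    if mode == "topk":
        return np.sort(np.argsort(p)[::-1][:k])
    if mode == "threshold":
        return np.flatnonzero(p >= threshold)
    if mode == "maxdiff":
        return np.flatnonzero(p.max() - p <= max_diff)
    raise ValueError(mode)

powers = [0.2, 0.9, 0.85, 0.1]
top2 = select_candidates(powers, "topk", k=2)            # positions 1 and 2
near = select_candidates(powers, "maxdiff", max_diff=0.1)  # within 0.1 of peak
```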
In some embodiments of the present invention, in step 303, for each initial sound source position, time information of arrival of a signal of the initial sound source position at each sound channel position can be obtained through the initial sound source position and each sound channel position, and power of a sound signal at the initial sound source position is determined according to the time information of arrival of the signal of the initial sound source position at each sound channel position, specifically, the power calculation method of the initial sound source position includes steps c1 to c3:
Step c1, determining, for each initial sound source position, the time information of arrival of the signal from the initial sound source position at each sound channel position according to the distance between the initial sound source position and each sound channel position.
In some embodiments of the present invention, for each initial sound source position, a distance between the initial sound source position and each sound channel position may be obtained by the initial sound source position and each sound channel position where the sound source data to be processed is collected, and time information of arrival of a signal of the initial sound source position at each sound channel position may be obtained according to a propagation speed of sound and a distance between the initial sound source position and each sound channel position.
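The arrival-time computation of step c1 can be sketched as follows; the constant `SPEED_OF_SOUND` and the function name `arrival_times` are assumptions of this illustration:

```python
import numpy as np

SPEED_OF_SOUND = 343.0   # m/s, an assumed room-temperature value

def arrival_times(distances):
    """Step c1: time for the signal from an initial sound source position
    to reach each sound channel position, from the step-302 distances."""
    return np.asarray(distances, dtype=float) / SPEED_OF_SOUND

t = arrival_times([0.9, 1.1])   # travel time to each of two channels
tau_12 = t[1] - t[0]            # time difference for the adjacent pair
```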
And c2, determining the power of the sound signal of each sound channel position according to the time information of the signal of the initial sound source position reaching each sound channel position.
In some embodiments of the present invention, delay estimation may be performed according to time information of the signal of the initial sound source position reaching each sound channel position, to obtain a controllable response power function value of each sound channel, and the controllable response power function value of each sound channel is set as the power of the sound signal of each sound channel position. The controllable response power function value can be obtained by performing time delay estimation according to the time information of the signal of the initial sound source position reaching each sound channel position through a generalized cross-correlation function based on phase transformation weighting. Specifically, the method for determining the power of the sound signal from each sound channel position according to the generalized cross-correlation function based on phase transformation weighting includes:
(1) For each sound channel position, first time information of arrival of the signal from the initial sound source position at the sound channel position is determined, and second time information of arrival of the signal from the initial sound source position at the next sound channel position adjacent to the sound channel position is determined.
(2) A time difference between the first time information and the second time information is determined.
(3) And determining the power of the sound channel position according to the time difference, the sound signal of the sound channel position and the sound signal of the next sound source channel position adjacent to the sound channel position.
In some embodiments of the present invention, the time difference τ_ij(d_n) between the arrival of the signal from the initial sound source position at the sound channel position and its arrival at the next adjacent sound channel position may be obtained by subtracting the first time information of arrival at the sound channel position from the second time information of arrival at the next adjacent sound channel position, where d_n represents the initial sound source position, i represents the i-th sound source channel, j represents the j-th sound source channel, and j = i + 1.
In some embodiments of the invention, after the time difference τ_ij(d_n) is obtained, the controllable response power function value R_ij[τ_ij(d_n)] of the sound channel position is obtained from the frequency domain signal X_i(k) of each frequency point k of the sound signal at the sound channel position and the frequency domain signal X_j(k) of each frequency point k of the sound signal at the next adjacent sound source channel position by

R_ij[τ_ij(d_n)] = Σ_{k=1}^{K} ( X_i(k) X_j*(k) / |X_i(k) X_j*(k)| ) e^{−j2πkF_sτ_ij(d_n)/K},

and the controllable response power function value of the sound channel position is set as the power of the sound signal of the sound channel position, where (·)* represents conjugation, F_s is the sampling frequency of the sound signal of the sound source data to be processed, and K is the number of frequency points of the short-time Fourier transform.
And c3, determining and obtaining the power of the sound signal at the initial sound source position according to the power of each sound channel position.
In some embodiments of the invention, after the power of the sound signal is obtained for each sound channel position, the power of the sound signal at the initial sound source position is obtained by F(d_n) = Σ_i R_ij[τ_ij(d_n)], with j = i + 1, the sum running over the sound channel positions.
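The steered power computation of steps c1 to c3 can be sketched directly; the function name `srp_phat_power`, the sign convention of the steering exponent, and the synthetic two-channel test frame are assumptions of this illustration, not part of the embodiment:

```python
import numpy as np

def srp_phat_power(X, taus, fs):
    """Power F(d_n) of one candidate position: sum over adjacent channel
    pairs (i, j = i + 1) of the phase-transform-weighted cross-spectrum,
    steered by the pairwise delay tau_ij = taus[j] - taus[i].
    X: (M, K) complex frequency domain frame; taus: (M,) arrival times."""
    M, K = X.shape
    k = np.arange(K)
    F = 0.0
    for i in range(M - 1):
        j = i + 1
        cross = X[i] * np.conj(X[j])
        phat = cross / np.maximum(np.abs(cross), 1e-12)  # phase transform
        tau = taus[j] - taus[i]
        F += np.real(np.sum(phat * np.exp(-2j * np.pi * k * fs * tau / K)))
    return F

# Demonstration: channel 2 receives channel 1 delayed by 3 samples, so the
# power steered at the true delay should dominate a wrongly steered one.
rng = np.random.default_rng(0)
K, fs = 64, 8000.0
X0 = rng.standard_normal(K) + 1j * rng.standard_normal(K)
tau0 = 3.0 / fs
X1 = X0 * np.exp(-2j * np.pi * np.arange(K) * fs * tau0 / K)
good = srp_phat_power(np.stack([X0, X1]), np.array([0.0, tau0]), fs)
bad = srp_phat_power(np.stack([X0, X1]), np.array([0.0, -tau0]), fs)
```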
In some embodiments of the invention, when the power of the sound signal of each sound channel position is determined from the generalized cross-correlation function based on phase transformation weighting, it should be considered that different sound channel positions in the electronic device receive the signal from the initial sound source position with different quality. If this quality is not taken into account and the power F(d_n) is obtained only by summing R_ij[τ_ij(d_n)] over the sound channel positions, the accuracy of the subsequent sound source estimation may be reduced. In practice, the quality of the signal received at a pair of sound channel positions may be characterized by the maximum value of the controllable response power function of that pair of sound channel positions.
Based on this, in step c2, the initial power of each sound channel position and the initial power of the next adjacent sound channel position are obtained from the time differences with which the signal from the initial sound source position reaches each sound channel position and the next adjacent sound channel position. The power weight of each sound channel position is then obtained from the maximum of the initial power of that sound channel position and the initial power of the next adjacent sound channel position, and the power of the sound signal of each sound channel position is obtained from the initial power of that sound channel position and its power weight. Specifically, the method for determining the power of the sound signal based on the weight comprises:
(1) And determining the initial power of the sound signal of each sound channel position according to the time information of the signal of the initial sound source position reaching each sound channel position.
(2) And determining and obtaining target power in the initial power corresponding to each two adjacent sound channel positions according to the initial power corresponding to each sound channel position, wherein the target power represents a larger value in the initial power corresponding to each two adjacent sound channel positions.
(3) And determining the power weight of each sound channel position according to the initial power corresponding to the sound channel position, the initial power corresponding to the next sound channel position adjacent to the sound channel position and each target power.
(4) And determining and obtaining the power of the sound channel position according to the initial power corresponding to the sound channel position and the power weight of the sound channel position.
For the initial sound source position, the initial power of the sound signal of each sound channel position can be obtained according to the method of determining the power of the sound signal of each sound channel position based on the generalized cross-correlation function weighted by phase transformation.
In some embodiments of the present invention, for each sound channel position where the sound source data to be processed is collected, the larger of the initial power of the sound signal at that sound channel position and the initial power of the sound signal at the next adjacent sound channel position is determined as the maximum initial power R_max[τ_ij(d_n)], and this maximum initial power is set as the target power. After each target power is obtained, the target powers are accumulated over the sound channel positions collecting the sound source data for the initial sound source position to obtain the target power sum Σ R_max[τ_ij(d_n)]. For each sound channel position, the ratio of its maximum initial power R_max[τ_ij(d_n)] to the target power sum is then taken; this normalizes the maximum initial power and yields the power weight ω_{i,j} of the sound signal of the sound channel position.
In some embodiments of the invention, after the power weight ω_{i,j} of the sound signal of the sound channel position and the initial power R_ij[τ_ij(d_n)] of the sound signal of the sound channel position are obtained, the power of the sound channel position is obtained as ω_{i,j} R_ij[τ_ij(d_n)].
In some embodiments of the present invention, after the power ω_{i,j} R_ij[τ_ij(d_n)] of each sound channel position is obtained, the power of the sound signal at the initial sound source position is obtained by F(d_n) = Σ_i ω_{i,j} R_ij[τ_ij(d_n)], with j = i + 1.
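One possible reading of the weighting described above can be sketched as follows; the pairing of neighbours, the function name `weighted_pair_sum`, and the example powers are assumptions of this illustration, since the original formula images are not available:

```python
import numpy as np

def weighted_pair_sum(initial_powers):
    """Each adjacent channel pair takes as target power the larger of its
    own initial power and its neighbouring pair's; the target powers are
    normalized into weights omega_ij, and F(d_n) is the weighted sum of
    the initial powers R_ij[tau_ij(d_n)]."""
    R = np.asarray(initial_powers, dtype=float)
    nxt = np.append(R[1:], R[-1])   # neighbouring pair's initial power
    target = np.maximum(R, nxt)     # target power of each pair
    w = target / target.sum()       # normalized power weights omega_ij
    return float(np.sum(w * R))

F = weighted_pair_sum([4.0, 2.0, 2.0])   # weights 0.5, 0.25, 0.25
```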
In some embodiments of the present invention, in step 202, the time information with which the signal of each candidate sound source reaches each sound channel position of the sound source data to be processed may be obtained from the position information of the candidate sound source and each sound channel position, and the position guiding information of each candidate sound source may then be obtained from that time information. The method for determining the position guiding information specifically comprises the following steps d1 and d2:
And d1, determining, for each candidate sound source, the time information of arrival of the signal of the candidate sound source at each sound source channel position according to the position information of the candidate sound source.
And d2, obtaining the position guiding information of the candidate sound source according to the time information of arrival of the signal of the candidate sound source at each sound source channel position.
In some embodiments of the invention, step d1 comprises: obtaining, in the established spatial coordinate system, the position vector of each sound source channel that collects the sound source data to be processed from the position information of that channel and the position of the coordinate origin; for each candidate sound source, obtaining the propagation distance from the candidate sound source position to each sound source channel position from the inner product between the direction vector of the position information of the candidate sound source and the position vector of each sound source channel; and obtaining the time information of arrival of the signal of the candidate sound source at each sound source channel position from that propagation distance and the propagation speed of sound.
In some embodiments of the invention, step d2 comprises: inputting the time information of arrival of the signal of the candidate sound source at each sound source channel position into a preset vector model a(ω) = [e^{−jωτ_1}, e^{−jωτ_2}, ..., e^{−jωτ_M}]^T to obtain the position guiding information a(ω) of the candidate sound source, where a(ω) represents the position guiding information of the candidate sound source, ω is a preselected analog angular frequency, and τ_m, m = 1, 2, ..., M, is the time information of arrival of the signal of the candidate sound source at each of the M sound source channel positions where the sound source data to be processed is acquired.
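The vector model of step d2 can be sketched in a few lines; the function name `steering_vector` and the example frequency and delays are assumptions of this illustration:

```python
import numpy as np

def steering_vector(taus, omega):
    """Preset vector model a(omega) = [e^{-j omega tau_1}, ...,
    e^{-j omega tau_M}]^T: position guiding information built from the
    arrival times tau_m of a candidate sound source at the M channels."""
    return np.exp(-1j * omega * np.asarray(taus, dtype=float))

# 500 Hz analog angular frequency; the second channel lags by 1 ms,
# i.e. exactly half a period, so its entry is -1.
a = steering_vector([0.0, 1e-3], omega=2 * np.pi * 500.0)
```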
In some embodiments of the present invention, after obtaining the position guidance information of each candidate sound source, according to step 203, sound source separation is performed on the sound source data to be processed according to the position guidance information of each candidate sound source, so as to obtain the sound signal of each candidate sound source.
In the embodiment of the invention, in the sound source separation processing, the sound source data to be processed is first subjected to sound source localization to obtain the number of candidate sound sources and the position information of each candidate sound source; the position information of each candidate sound source is then used to obtain its position guiding information; and the sound source data to be processed is subjected to sound source separation by a separation method combining the position guiding information with overdetermined independent vector analysis, so that the sound signal of each candidate sound source is separated from the sound source data to be processed.
In some embodiments of the present invention, considering that the number of candidate sound sources is estimated by the sound source separation processing method shown in 201 to 203, which increases the calculation amount of the sound signal processing method, in order to reduce the calculation amount of the sound signal processing method, the embodiments of the present invention provide a sound source separation processing method without sound source estimation, specifically, as shown in fig. 4, fig. 4 is a schematic flow diagram of another sound source separation processing in the sound signal processing method provided in the embodiment of the present invention, where the shown sound source separation processing method includes steps 401 to 403:
and 401, performing sound source separation processing on the sound source data to be processed to obtain a plurality of predicted sound sources and sound signals of each primary predicted sound source.
Considering that an existing separation method based on independent vector analysis needs to perform sound source estimation to determine the number of candidate sound sources when performing sound source separation processing, or needs to know the number of candidate sound sources to be separated in advance, which increases the amount of calculation of the sound signal processing method, and in order to solve the problem that the number of candidate sound sources must be estimated when blind source separation is carried out by a separation method based on independent vector analysis, the embodiment of the invention establishes an initial separation parameter W_{M×M}(l) from the number M of sound channels of the sound source data to be processed when sound source separation is performed by the separation method based on independent vector analysis. By iteratively updating the initial separation parameter W_{M×M}(l), the sound signals of M predicted sound sources are separated from the sound source data to be processed; redundant signals in the separated sound signals of the M predicted sound sources are then detected and eliminated, and the sound signals of the candidate sound sources are extracted from the sound signals of the M predicted sound sources.
In some embodiments of the present invention, in step 401, the initial separation parameter W_{M×M}(l) may be established from the number M of sound channels of the sound source data to be processed and iterated through an iterative model based on the constant-variation (equivariant) adaptive decomposition algorithm, W_{M×M}(l+1) = W_{M×M}(l) + α(l){I − E[g(y(l)) y(l)^T]} W_{M×M}(l). At each iteration, the separated signal of that iteration is separated from the sound source data to be processed, and when the number of iterations reaches the preset number of iterations, the final separated signals are obtained, separating the sound signals of M predicted sound sources from the sound source data to be processed. Here I is the M×M identity matrix, l represents the iteration step number, α(l) represents the iteration step size, E represents the expectation, g is a nonlinear function related to the probability density function of the sound signal of the sound source data to be processed, y(l) represents the separated signal obtained at the l-th iteration, and T represents the transpose.
In some embodiments of the present invention, in step 401, the initial separation parameter W_{M×M}(l) may also be established from the number M of sound channels of the sound source data to be processed and iterated through an iterative model based on the natural gradient method. At each iteration, the separated signal of that iteration is separated from the sound source data to be processed, and when the number of iterations reaches the preset number of iterations, the final separated signals are obtained, separating the sound signals of M predicted sound sources from the sound source data to be processed. The parameters of the iterative model based on the natural gradient method have the same meanings as those of the iterative model based on the constant-variation adaptive decomposition algorithm, and the description is not repeated here.
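A natural-gradient-style update of this general family can be sketched as follows; the choice g = tanh, the step size, and the toy two-channel mixture are assumptions of this illustration, not the patent's exact iterative model:

```python
import numpy as np

def natural_gradient_step(W, X, alpha=0.05):
    """One update of a natural-gradient separation iteration:
    W <- W + alpha * (I - E[g(y) y^T]) W, with g = tanh standing in for
    the score function of the source probability density.
    W: (M, M) separation matrix; X: (M, T) mixed observations."""
    Y = W @ X                          # separated signals at this step
    G = np.tanh(Y)
    E = (G @ Y.T) / X.shape[1]         # sample estimate of E[g(y) y^T]
    return W + alpha * (np.eye(W.shape[0]) - E) @ W

# Tiny demonstration: M = 2 channels, a fixed mixing matrix, 50 iterations.
rng = np.random.default_rng(1)
S = np.vstack([np.sign(rng.standard_normal(2000)),
               rng.uniform(-1.0, 1.0, 2000)])
X = np.array([[1.0, 0.5], [0.3, 1.0]]) @ S
W = np.eye(2)
for _ in range(50):
    W = natural_gradient_step(W, X)
```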
In some embodiments of the present invention, in step 401, the time domain signals x = {x_1; x_2; ...; x_M} of the M sound channels of the sound source data to be processed may also be subjected to a short-time Fourier transform to obtain the frequency domain signals X(k) = {X_1(k); ...; X_M(k)} of the M sound channels, k = 1, 2, ..., K, where K is the number of points of the short-time Fourier transform. Sound source separation is then performed on the frequency domain signal X(k) of each frequency point according to a separation method based on independent vector analysis with auxiliary function optimization, and the sound signals of M predicted sound sources are separated from the sound source data to be processed.
And 402, calculating the cross-correlation coefficient between the sound signals of each predicted sound source to obtain a correlation coefficient matrix.
In some embodiments of the present invention, when the number of candidate sound sources is unknown, there are S independent components among the sound signals of the M predicted sound sources obtained by separation, and the remaining M − S components are copies of one or more of the independent components or zero signals; the S independent components are the sound signals of the S candidate sound sources. Because the correlation among the S independent components is relatively low, while the M − S components composed of copies or zero signals are correlated with the independent components they duplicate, the sound signals of the S candidate sound sources can be extracted from the sound signals of the M predicted sound sources through the cross-correlation coefficients between the sound signals of the predicted sound sources.
Specifically, step 402 includes: for each predicted sound source, calculating the autocorrelation coefficient between the sound signal of the predicted sound source and itself, and the cross-correlation coefficients between the sound signal of the predicted sound source and the sound signals of every other predicted sound source; and establishing the correlation coefficient matrix of the sound signals of the predicted sound sources from the correlation coefficients of the sound signals of each predicted sound source.
403, determining and obtaining a candidate sound source and a sound signal of the candidate sound source from each predicted sound source according to the correlation coefficient matrix.
In some embodiments of the present invention, in the correlation coefficient matrix, the diagonal elements represent the autocorrelation coefficients of the sound signals of the predicted sound sources, which are all necessarily 1, while the other elements represent the cross-correlation coefficients between the sound signals of any two predicted sound sources. The value of each off-diagonal element in each column or row of the correlation coefficient matrix is compared with a preset coefficient; if there is a target element whose absolute difference from the preset coefficient is smaller than or equal to a preset threshold value, this indicates that among the predicted sound sources there is a redundant signal identical or similar to the sound signal of the predicted sound source corresponding to the diagonal element, and the predicted sound source corresponding to the target element is eliminated. By removing redundant signals through the correlation coefficient matrix, the plurality of predicted sound sources are cleaned, yielding the candidate sound sources and the sound signals of the candidate sound sources.
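The redundancy removal of steps 402 and 403 can be sketched with `numpy.corrcoef`; the function name `prune_redundant`, the greedy keep-first policy, and the synthetic signals are assumptions of this illustration:

```python
import numpy as np

def prune_redundant(signals, coeff=1.0, tol=0.05):
    """Drop predicted sources whose sound signal is a (near-)copy of an
    already kept one: build the correlation coefficient matrix and discard
    any source whose cross-correlation against a kept source differs from
    the preset coefficient (1, a perfect copy) by at most tol."""
    Y = np.asarray(signals, dtype=float)       # (M, T) predicted sources
    C = np.corrcoef(Y)                          # correlation coefficient matrix
    keep = []
    for m in range(Y.shape[0]):
        if all(abs(abs(C[m, k]) - coeff) > tol for k in keep):
            keep.append(m)
    return keep

s = np.sin(np.linspace(0.0, 6.28, 200))
n = np.cos(np.linspace(0.0, 12.56, 200))
kept = prune_redundant([s, n, 0.99 * s])   # third source copies the first
```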
The sound signals of the separated candidate sound sources include both the sound signal of the target sound source and sound signals of non-target sound sources. The sound signal of the target sound source has few aliased signals or noise components and good voice quality, whereas the voice quality of the sound signals of non-target sound sources is worse because of aliased noise or other signals; the sound signal of the target sound source can therefore be selected from the separated candidate sound sources by evaluating the voice quality of each candidate sound source. Accordingly, after the candidate sound sources in the sound source data to be processed and the sound signal of each candidate sound source are obtained, in order to further remove noise among the candidate sound sources, the embodiment of the invention evaluates the voice quality of the sound signal of each candidate sound source to obtain an evaluation value of each sound signal, screens the candidate sound sources according to the evaluation values, and selects a target sound source from the plurality of candidate sound sources.
In some embodiments of the present invention, the kurtosis value of the sound signal of each candidate sound source may be calculated to obtain an evaluation value corresponding to the sound signal of each candidate sound source, where the kurtosis value is used to describe the speech characteristics of the sound signal, and the greater the kurtosis value of the sound signal, the higher the speech quality of the sound signal. Specifically, the kurtosis value-based sound signal evaluation method includes:
(1) And performing frequency-to-time domain conversion on the sound signal of each candidate sound source to obtain the time domain signal of the sound signal of each candidate sound source.
(2) And determining a kurtosis value corresponding to the time domain signal of the sound signal of each candidate sound source, and setting the kurtosis value as an evaluation value corresponding to the sound signal of the candidate sound source.
In some embodiments of the present invention, the frequency domain signals Y(l, k) = [Y_1(l,k), Y_2(l,k), ..., Y_S(l,k)] of the sound signals of the plurality of separated candidate sound sources are converted by the inverse short-time Fourier transform into the time domain signals y(l) = [y_1(l), y_2(l), ..., y_S(l)]. For each candidate sound source, based on the time domain signal y_s(l), s = 1, ..., S, of its sound signal, the kurtosis value K(y_s(l)) = E[y_s^4(l)] / (E[y_s^2(l)])^2 − 3 is obtained, and the kurtosis value K(y_s(l)) of the sound signal of the candidate sound source is set as the evaluation value corresponding to the sound signal of the candidate sound source.
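The kurtosis evaluation can be sketched as follows; the function name `kurtosis_score` and the use of Laplace samples as a stand-in for a sparse, speech-like signal are assumptions of this illustration:

```python
import numpy as np

def kurtosis_score(y):
    """Evaluation value K(y) = E[y^4] / (E[y^2])^2 - 3: the excess
    kurtosis of a time domain signal, larger for sparse, speech-like
    signals than for Gaussian noise."""
    y = np.asarray(y, dtype=float)
    m2 = np.mean(y ** 2)
    return float(np.mean(y ** 4) / (m2 ** 2) - 3.0)

rng = np.random.default_rng(2)
speechy = kurtosis_score(rng.laplace(size=20000))   # sparse signal, near 3
gaussian = kurtosis_score(rng.normal(size=20000))   # Gaussian noise, near 0
```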
In some embodiments of the present invention, when the electronic device is a voice-interactive electronic device, the wake-up word score in the sound signal of each candidate sound source may be further determined according to the voice feature of the sound signal of each candidate sound source, and the wake-up word score in the sound signal of each candidate sound source is set as the evaluation value corresponding to the sound signal of each candidate sound source, that is, the target sound source is selected from the plurality of candidate sound sources through the wake-up word score in the sound signal of each candidate sound source. The wake-up word score is used to quantify the voice quality in the voice signal of each candidate sound source, and in some embodiments of the present invention, the wake-up word score in the voice signal of each candidate sound source may be determined by determining the probability that the voice feature corresponding to the voice signal of each candidate sound source is the voice feature of the wake-up word. Specifically, the sound signal evaluation method based on the wake word score includes:
(1) A speech feature vector of the sound signal of each candidate sound source is acquired.
(2) And determining the probability score corresponding to the voice characteristic vector of the voice signal of each candidate sound source.
(3) And determining an evaluation value corresponding to the sound signal of each candidate sound source according to the probability score corresponding to the voice characteristic vector of the sound signal of each candidate sound source.
Wherein the probability score characterizes the probability that the speech feature vector is the speech feature vector corresponding to the wake-up word.
In some embodiments of the present invention, the separated frequency domain signals Y(l, k) = [Y1(l, k), Y2(l, k), ..., YS(l, k)] of the sound signals of the plurality of candidate sound sources are converted into the time domain signals y(l) = [y1(l), y2(l), ..., yS(l)] by inverse short-time Fourier transform. For each candidate sound source s = 1, ..., S, mel-frequency cepstral coefficients are derived from the frequency spectrum of the time domain signal ys(l), so that the key characteristic parameters reflecting the characteristics of the voice signal are extracted from ys(l) to form a characteristic vector sequence, and the characteristic vector sequence is set as the voice feature vector of the sound signal of the candidate sound source.
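A grossly simplified stand-in for the feature-extraction step can show the shape of the computation. Real MFCC extraction needs an FFT, a mel filterbank, and a DCT; here each frame is reduced to just two simple features (log energy and zero-crossing rate), which is an assumption for illustration, only to show how a time-domain signal ys(l) becomes a per-frame feature vector sequence. Frame length, hop size, and sampling rate are likewise assumed values.

```python
import math

def frame_features(y, frame_len=160, hop=80):
    """Turn a time-domain signal into a sequence of per-frame feature vectors.
    Stand-in features: (log energy, zero-crossing rate); real systems use MFCCs."""
    frames = []
    for start in range(0, len(y) - frame_len + 1, hop):
        frame = y[start:start + frame_len]
        energy = sum(x * x for x in frame) / frame_len
        log_energy = math.log(energy + 1e-12)
        zcr = sum(1 for a, b in zip(frame, frame[1:]) if a * b < 0) / (frame_len - 1)
        frames.append((log_energy, zcr))
    return frames  # feature vector sequence for one candidate source

# 50 ms of a 440 Hz tone at an assumed 16 kHz sampling rate.
y_s = [math.sin(2 * math.pi * 440 * n / 16000) for n in range(800)]
feats = frame_features(y_s)
```

Each frame of 160 samples with hop 80 yields one feature vector, so 800 samples produce 9 overlapping frames.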
In some embodiments of the present invention, the semantic feature of the sound signal of each candidate sound source may be obtained according to the speech feature vector of the sound signal of each candidate sound source, the semantic feature of the sound signal of each candidate sound source is compared with a preset semantic feature to obtain the similarity degree of the semantic feature of the sound signal of each candidate sound source and the preset semantic feature, and the similarity degree of the semantic feature of the sound signal of each candidate sound source and the preset semantic feature is set as the probability score corresponding to the speech feature vector of the sound signal of each candidate sound source. Wherein, the semantic features of the sound signal of each candidate sound source can be obtained according to the semantic feature extraction method in step 102.
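One hedged reading of the comparison step is to compute the similarity degree between a candidate's semantic feature and the preset semantic feature as a cosine similarity, and use that similarity as the probability score. The feature vectors and the choice of cosine similarity below are assumptions for illustration; the patent does not specify the similarity measure.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors (assumed measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

preset = [0.9, 0.1, 0.4]          # preset wake-word semantic feature (made up)
candidates = {
    "s1": [0.8, 0.2, 0.5],        # close to the preset feature
    "s2": [-0.3, 0.9, -0.1],      # far from the preset feature
}
probability_scores = {s: cosine_similarity(v, preset) for s, v in candidates.items()}
```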
In some embodiments of the present invention, for each candidate sound source, a probability score corresponding to a speech feature vector of a sound signal of the candidate sound source may be set as an evaluation value corresponding to the sound signal of the candidate sound source.
In some embodiments of the present invention, for each candidate sound source, a probability score corresponding to a speech feature vector of a sound signal of the candidate sound source may be compared with a preset probability threshold, and if the probability score corresponding to the speech feature vector of the sound signal of the candidate sound source is greater than the preset probability threshold, an evaluation value corresponding to the sound signal of the candidate sound source is set as a first preset value; and if the probability score corresponding to the voice feature vector of the sound signal of the candidate sound source is smaller than or equal to a preset probability threshold value, setting the evaluation value corresponding to the sound signal of the candidate sound source as a second preset value. Wherein the first preset value may be 1 and the second preset value may be 0; the first preset value may also be 100 and the second preset value may also be 0.
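A minimal sketch of this thresholding rule follows; the threshold 0.5 and the preset values 1 and 0 are chosen arbitrarily for illustration (the text also allows 100 and 0).

```python
# Scores strictly above the preset probability threshold map to the first
# preset value; all others map to the second preset value.
FIRST_PRESET, SECOND_PRESET, THRESHOLD = 1, 0, 0.5

def evaluate(prob_score, threshold=THRESHOLD):
    return FIRST_PRESET if prob_score > threshold else SECOND_PRESET
```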
In some embodiments of the present invention, for each candidate sound source, a probability score corresponding to a speech feature vector of a sound signal of the candidate sound source may be queried for pre-stored evaluation data, a probability interval in which the probability score corresponding to the speech feature vector of the sound signal of the candidate sound source is located and an evaluation score corresponding to the probability interval may be determined, and the evaluation score corresponding to the probability interval may be set as an evaluation value corresponding to the sound signal of the candidate sound source. The pre-stored evaluation data comprise a plurality of probability intervals and evaluation scores corresponding to the probability intervals.
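The interval lookup can be sketched as below; the probability intervals and their evaluation scores are made-up illustrative values, since the patent only states that the pre-stored evaluation data comprise intervals and corresponding scores.

```python
# Pre-stored evaluation data: (lower bound, upper bound, evaluation score).
EVALUATION_DATA = [
    (0.0, 0.25, 10),
    (0.25, 0.5, 40),
    (0.5, 0.75, 70),
    (0.75, 1.0, 100),
]

def interval_score(prob, data=EVALUATION_DATA):
    """Map a probability score to the score of the interval containing it."""
    for lo, hi, score in data:
        if lo <= prob < hi or (hi == 1.0 and prob == 1.0):
            return score
    raise ValueError("probability outside [0, 1]")
```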
In some embodiments of the present invention, after obtaining the evaluation value corresponding to the sound signal of each candidate sound source, the candidate sound source corresponding to the maximum evaluation value may be determined according to the evaluation value corresponding to the sound signal of each candidate sound source, the candidate sound source corresponding to the maximum evaluation value may be set as the target sound source, and the sound signal of the candidate sound source corresponding to the maximum evaluation value may be set as the sound signal of the target sound source.
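The final selection step reduces to an argmax over the evaluation values. The candidate names and scores below are made up for illustration:

```python
# Pick the candidate sound source with the maximum evaluation value as the
# target sound source.
evaluations = {"s1": 0.4, "s2": 2.7, "s3": 1.1}
target_source = max(evaluations, key=evaluations.get)
target_value = evaluations[target_source]
```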
According to the sound signal processing method provided by the embodiment of the present invention, the final target sound source is determined by evaluating the sound signal of each candidate sound source obtained by blind source separation, which solves the problem of the low stability of blind source separation and improves the accuracy of the target sound source, thereby enabling noise reduction and improving the audio-visual effect.
In order to better implement the sound signal processing method provided by the embodiment of the present invention, on the basis of the embodiment of the sound signal processing method, the embodiment of the present invention further provides a sound signal processing device, as shown in fig. 5, fig. 5 is a schematic structural diagram of the sound signal processing device provided by the embodiment of the present invention, where the sound signal processing device includes:
the separation module 501 is configured to perform sound source separation processing on sound source data to be processed, so as to obtain a candidate sound source corresponding to the sound source data to be processed and sound signals belonging to each candidate sound source in the sound source data to be processed;
An evaluation module 502, configured to perform quality evaluation on the sound signal of each candidate sound source, and determine an evaluation value of the sound signal of each candidate sound source;
a selecting module 503, configured to determine, according to an evaluation value corresponding to the sound signal of each candidate sound source, a target sound source from a plurality of candidate target sound sources;
and the processing module 504 is used for processing the sound signal of the target sound source.
In some embodiments of the invention, separation module 501 comprises:
the sound source estimation unit is used for carrying out sound source position estimation on the sound source data to be processed and determining and obtaining candidate sound sources corresponding to the sound source data to be processed and position information of each candidate sound source;
the vector determining unit is used for determining and obtaining the position guiding information of each candidate sound source according to the position of each sound channel for collecting the sound source data to be processed and the position information of each candidate sound source;
and the separation unit is used for carrying out sound source separation on the sound source data to be processed according to the position guiding information of each candidate sound source to obtain the sound signal of each candidate sound source.
In some embodiments of the invention, the separation unit comprises:
a separation parameter subunit, configured to determine a separation parameter according to the position guiding information of each candidate sound source;
And the separation subunit is used for carrying out sound source separation on the sound signals in the sound source data to be processed according to the separation parameters, and determining to obtain the sound signals of each candidate sound source.
In some embodiments of the invention, the separation parameter subunit is configured to:
acquiring historical separation parameters of historical sound source data and auxiliary parameters corresponding to the sound source data to be processed;
correcting the auxiliary parameters according to the position guide information of each candidate sound source to obtain corrected auxiliary parameters;
and obtaining the separation parameters of the sound source data to be processed according to the corrected auxiliary parameters and the historical separation parameters.
In some embodiments of the invention, the sound source estimating unit is configured to:
determining and obtaining a plurality of initial sound source positions according to a preset azimuth angle;
determining the distance between each initial sound source position and each sound channel position for collecting the sound source data to be processed according to each initial sound source position;
determining and obtaining the power of the sound signal at each initial sound source position according to the distance between each initial sound source position and each sound channel position;
and determining and obtaining the candidate sound source and the position information of the candidate sound source according to the power of the sound signal at each initial sound source position.
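The azimuth scan described by the sound source estimation unit can be sketched with a delay-and-sum formulation. This is one plausible reading, not the patent's exact estimator: the linear 3-microphone geometry, far-field delay model, sampling rate, and the specific power computation are all assumptions. The azimuth whose steered output power is maximal is taken as the candidate sound source position.

```python
import math

FS = 16000                   # sampling rate in Hz (assumed)
C = 343.0                    # speed of sound in m/s
MIC_X = [0.0, 0.05, 0.10]    # assumed linear 3-microphone array (metres)

def delays_for_azimuth(az_deg):
    """Far-field delay of each channel relative to channel 0, in samples."""
    return [round(x * math.cos(math.radians(az_deg)) / C * FS) for x in MIC_X]

def steered_power(channels, az_deg):
    """Average power of the delay-and-sum output steered towards az_deg."""
    d = delays_for_azimuth(az_deg)
    lo = max(0, -min(d))
    hi = len(channels[0]) - max(0, max(d))
    acc = 0.0
    for i in range(lo, hi):
        s = sum(ch[i + dd] for ch, dd in zip(channels, d))
        acc += s * s
    return acc / (hi - lo)

# Simulate a 500 Hz tone arriving from azimuth 0 by delaying one source
# signal per channel according to the 0-degree delays.
src = [math.sin(2 * math.pi * 500 * n / FS) for n in range(2000)]
true_d = delays_for_azimuth(0.0)
channels = [[src[i - dd] if 0 <= i - dd < len(src) else 0.0
             for i in range(len(src))] for dd in true_d]

powers = {az: steered_power(channels, az) for az in (0.0, 90.0, 180.0)}
best_az = max(powers, key=powers.get)  # azimuth with maximal steered power
```

Steering towards the true azimuth aligns the channels coherently, so the summed power peaks there.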
In some embodiments of the invention, the sound source estimating unit is configured to:
for each initial sound source position, determining time information of the signal of the initial sound source position reaching each sound channel position according to the distance between the initial sound source position and each sound channel position;
determining the power of the sound signals of the sound channel positions according to the time information of the signals of the initial sound source positions reaching the sound channel positions;
the power of the sound signal at the initial sound source position is determined based on the power of the sound signal at each sound channel position.
In some embodiments of the invention, the sound source estimating unit is configured to:
determining, for each sound channel position, first time information for the signal of the initial sound source position to reach the sound channel position, and second time information for the signal of the initial sound source position to reach a next sound channel position adjacent to the sound channel position;
determining a time difference between the first time information and the second time information;
and determining the power of the sound signal of the sound channel position according to the time difference, the sound signal of the sound channel position and the sound signal of the next sound channel position adjacent to the sound channel position.
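One hedged reading of computing a power from the time difference between adjacent channels is a lag-compensated cross-correlation: shift the neighbouring channel by the time difference and average the product. The function name and the toy signals are assumptions for illustration.

```python
def lagged_cross_power(x, y, lag):
    """Average of x[i] * y[i + lag] over the overlapping region."""
    lo = max(0, -lag)
    hi = min(len(x), len(y) - lag)
    return sum(x[i] * y[i + lag] for i in range(lo, hi)) / (hi - lo)

x = [0.0, 1.0, 2.0, 1.0, 0.0, 0.0]
y = [0.0, 0.0, 0.0, 1.0, 2.0, 1.0]  # the same pulse, delayed by two samples
best_lag = max(range(-3, 4), key=lambda l: lagged_cross_power(x, y, l))
```

The cross power peaks when the lag cancels the true inter-channel delay, which is how a time difference translates into a power value.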
In some embodiments of the invention, the sound source estimating unit is configured to:
determining initial power of the sound signals of the sound channel positions according to the time information of the signals of the initial sound source positions reaching the sound channel positions;
determining and obtaining target power in initial power corresponding to each two adjacent sound channel positions according to initial power corresponding to each sound channel position, wherein the target power represents a larger value in initial power corresponding to each two adjacent sound channel positions;
for each sound channel position, determining and obtaining the power weight of the sound channel position according to the initial power corresponding to the sound channel position, the initial power corresponding to the next adjacent sound channel position of the sound channel position and each target power;
and determining and obtaining the power of the sound channel position according to the initial power corresponding to the sound channel position and the power weight of the sound channel position.
In some embodiments of the invention, the vector determination unit is configured to:
for each candidate sound source, determining and obtaining time information of the signal of the candidate sound source reaching each sound channel position according to the position information of the candidate sound source;
and obtaining the position guiding information of the candidate sound source according to the time information of the signal of the candidate sound source reaching each sound channel position.
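Position guiding information built from per-channel arrival times is commonly realised as a frequency-domain steering vector, as sketched below. This is an illustrative form, not necessarily the patent's: the geometry, frequency, and phase convention are assumptions.

```python
import cmath
import math

FS = 16000   # sampling rate in Hz (assumed, unused here but typical context)
C = 343.0    # speed of sound in m/s

def steering_vector(source_xy, mic_xys, freq_hz):
    """Phase-only steering vector from each channel's time of arrival."""
    taus = [math.dist(source_xy, m) / C for m in mic_xys]  # arrival times (s)
    t0 = min(taus)  # reference: the first channel the wavefront reaches
    return [cmath.exp(-2j * math.pi * freq_hz * (t - t0)) for t in taus]

mics = [(0.0, 0.0), (0.05, 0.0)]
v = steering_vector((1.0, 0.0), mics, 1000.0)
```

Each entry has unit magnitude; only the relative phases, fixed by the arrival-time differences, carry the position information.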
In some embodiments of the invention, separation module 501 comprises:
the initial separation unit is used for carrying out sound source separation on the sound source data to be processed to obtain a predicted sound source corresponding to the sound source data to be processed and sound signals belonging to each predicted sound source in the sound source data to be processed;
the correlation calculation unit is used for calculating the cross correlation coefficient between the sound signals of each predicted sound source to obtain a correlation coefficient matrix;
and the screening unit is used for determining and obtaining the candidate sound sources and the sound signals of the candidate sound sources from all the predicted sound sources according to the correlation coefficient matrix.
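The screening by cross-correlation coefficients can be sketched as follows. The threshold and the greedy keep-or-drop rule are assumptions: predicted sources whose signals correlate strongly with an already kept source are treated as duplicates or separation leakage and dropped, and the remainder become the candidate sound sources.

```python
import math

def corrcoef(a, b):
    """Pearson correlation coefficient between two equal-length signals."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = math.sqrt(sum((x - ma) ** 2 for x in a))
    vb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (va * vb)

def screen(sources, threshold=0.9):
    """Keep a predicted source only if it is weakly correlated with all kept ones."""
    kept = []
    for name, sig in sources:
        if all(abs(corrcoef(sig, ksig)) < threshold for _, ksig in kept):
            kept.append((name, sig))
    return [name for name, _ in kept]

s1 = [math.sin(0.1 * n) for n in range(200)]
s2 = [0.5 * x for x in s1]                          # scaled copy of s1
s3 = [math.sin(0.37 * n + 1.0) for n in range(200)]  # independent source
candidates = screen([("p1", s1), ("p2", s2), ("p3", s3)])
```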
In some embodiments of the present invention, the evaluation module 502 is configured to:
performing time-frequency domain conversion on the sound signals of each candidate sound source to obtain time domain signals of the sound signals of each candidate sound source;
and determining a kurtosis value corresponding to the time domain signal of the sound signal of each candidate sound source, and setting the kurtosis value as an evaluation value corresponding to the sound signal of the candidate sound source.
In some embodiments of the present invention, the evaluation module 502 is configured to:
acquiring a voice characteristic vector of a sound signal of each candidate sound source;
Determining probability scores corresponding to the voice feature vectors of the voice signals of each candidate sound source; the probability score represents the probability that the voice feature vector is the voice feature vector corresponding to the wake-up word;
and determining an evaluation value corresponding to the sound signal of each candidate sound source according to the probability score corresponding to the voice characteristic vector of the sound signal of each candidate sound source.
In some embodiments of the present invention, the selecting module 503 is configured to:
according to the evaluation value corresponding to the sound signal of each candidate sound source, determining the candidate sound source corresponding to the maximum evaluation value;
and setting the candidate sound source corresponding to the maximum evaluation value as a target sound source.
According to the sound signal processing device provided by the embodiment of the present invention, the final target sound source is determined by evaluating the sound signal of each candidate sound source obtained by blind source separation, which solves the problem of the low stability of blind source separation and improves the accuracy of the target sound source, thereby enabling noise reduction and improving the audio-visual effect.
Accordingly, an embodiment of the present invention also provides an electronic device, as shown in fig. 6, where the electronic device may include a Radio Frequency (RF) circuit 601, a memory 602 including one or more computer readable storage media, an input unit 603, a display unit 604, a sensor 605, an audio circuit 606, a wireless fidelity (WiFi, wireless Fidelity) module 607, a processor 608 including one or more processing cores, and a power supply 609. It will be appreciated by those skilled in the art that the electronic device structure shown in fig. 6 is not limiting of the electronic device and may include more or fewer components than shown, or may combine certain components, or a different arrangement of components. Wherein:
The RF circuit 601 may be used for receiving and transmitting signals during a message or a call, and in particular, after receiving downlink information of a base station, the downlink information is processed by one or more processors 608; in addition, data relating to uplink is transmitted to the base station. Typically, RF circuitry 601 includes, but is not limited to, an antenna, at least one amplifier, a tuner, one or more oscillators, a subscriber identity module (SIM, subscriber Identity Module) card, a transceiver, a coupler, a low noise amplifier (LNA, low Noise Amplifier), a duplexer, and the like. In addition, the RF circuitry 601 may also communicate with networks and other devices through wireless communications. The wireless communication may use any communication standard or protocol including, but not limited to, global system for mobile communications (GSM, global System of Mobile communication), general packet radio service (GPRS, general Packet Radio Service), code division multiple access (CDMA, code Division Multiple Access), wideband code division multiple access (WCDMA, wideband Code Division Multiple Access), long term evolution (LTE, long Term Evolution), email, short message service (SMS, short Messaging Service), and the like.
The memory 602 may be used to store software programs and modules that are stored in the memory 602 for execution by the processor 608 to perform various functional applications and data processing. The memory 602 may mainly include a stored program area and a stored data area, wherein the stored program area may store an operating system, a computer program (such as a sound playing function, an image playing function, etc.) required for at least one function, and the like; the storage data area may store data created according to the use of the electronic device (such as audio data, phonebooks, etc.), and the like. In addition, the memory 602 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other volatile solid-state storage device. Accordingly, the memory 602 may also include a memory controller to provide access to the memory 602 by the processor 608 and the input unit 603.
The input unit 603 may be used to receive input numeric or character information and to generate keyboard, mouse, joystick, optical or trackball signal inputs related to user settings and function control. In particular, in one particular embodiment, the input unit 603 may include a touch-sensitive surface, as well as other input devices. The touch-sensitive surface, also referred to as a touch display screen or a touch pad, may collect touch operations thereon or thereabout by a user (e.g., operations thereon or thereabout by a user using any suitable object or accessory such as a finger, stylus, etc.), and actuate the corresponding connection means according to a predetermined program. Alternatively, the touch-sensitive surface may comprise two parts, a touch detection device and a touch controller. The touch detection device detects the touch azimuth of a user, detects a signal brought by touch operation and transmits the signal to the touch controller; the touch controller receives touch information from the touch detection device and converts it into touch point coordinates, which are then sent to the processor 608, and can receive commands from the processor 608 and execute them. In addition, touch sensitive surfaces may be implemented in a variety of types, such as resistive, capacitive, infrared, and surface acoustic waves. The input unit 603 may comprise other input devices in addition to a touch sensitive surface. In particular, other input devices may include, but are not limited to, one or more of a physical keyboard, function keys (such as volume control keys, switch keys, etc.), a trackball, mouse, joystick, etc.
The display unit 604 may be used to display information entered by a user or provided to a user as well as various graphical user interfaces of the electronic device, which may be composed of graphics, text, icons, video, and any combination thereof. The display unit 604 may include a display panel, which may be optionally configured in the form of a liquid crystal display (LCD, liquid Crystal Display), an Organic Light-Emitting Diode (OLED), or the like. Further, the touch-sensitive surface may overlay a display panel, and upon detection of a touch operation thereon or thereabout, the touch-sensitive surface is passed to the processor 608 to determine the type of touch event, and the processor 608 then provides a corresponding visual output on the display panel based on the type of touch event. Although in fig. 6 the touch sensitive surface and the display panel are implemented as two separate components for input and output functions, in some embodiments the touch sensitive surface may be integrated with the display panel to implement the input and output functions.
The electronic device may also include at least one sensor 605, such as a light sensor, a motion sensor, and other sensors. In particular, the light sensor may include an ambient light sensor that may adjust the brightness of the display panel according to the brightness of ambient light, and a proximity sensor that may turn off the display panel and/or backlight when the electronic device is moved to the ear. As one of the motion sensors, the gravity acceleration sensor can detect the acceleration in all directions (generally three axes), and can detect the gravity and the direction when the mobile phone is stationary, and can be used for applications of recognizing the gesture of the mobile phone (such as horizontal and vertical screen switching, related games, magnetometer gesture calibration), vibration recognition related functions (such as pedometer and knocking), and the like; other sensors such as gyroscopes, barometers, hygrometers, thermometers, infrared sensors, etc. that may also be configured with the electronic device are not described in detail herein.
Audio circuitry 606, speakers, and a microphone may provide an audio interface between the user and the electronic device. The audio circuit 606 may transmit the received electrical signal after audio data conversion to a speaker, where the electrical signal is converted to a sound signal for output; on the other hand, the microphone converts the collected sound signals into electrical signals, which are received by the audio circuit 606 and converted into audio data, which are processed by the audio data output processor 608 for transmission via the RF circuit 601 to, for example, another electronic device, or which are output to the memory 602 for further processing. The audio circuit 606 may also include an ear bud jack to provide communication of the peripheral ear bud with the electronic device.
WiFi belongs to short-distance wireless transmission technology, and the electronic device can help a user to send and receive emails, browse webpages, access streaming media and the like through the WiFi module 607, providing wireless broadband Internet access for the user. Although fig. 6 shows a WiFi module 607, it can be understood that it is not an essential component of the electronic device and can be omitted entirely as needed without changing the essence of the invention.
The processor 608 is the control center of the electronic device; it uses various interfaces and lines to connect the various parts of the entire electronic device, and performs the various functions of the electronic device and processes data by running or executing the software programs and/or modules stored in the memory 602 and invoking the data stored in the memory 602, thereby monitoring the electronic device as a whole. Optionally, the processor 608 may include one or more processing cores; preferably, the processor 608 may integrate an application processor, which mainly handles the operating system, user interfaces, computer programs and the like, with a modem processor, which mainly handles wireless communications. It will be appreciated that the modem processor described above may also not be integrated into the processor 608.
The electronic device also includes a power supply 609 (e.g., a battery) for powering the various components, which may be logically connected to the processor 608 via a power management system so as to perform functions such as managing charge, discharge, and power consumption via the power management system. The power supply 609 may also include one or more of any components, such as a direct current or alternating current power supply, a recharging system, a power failure detection circuit, a power converter or inverter, a power status indicator, and the like.
Although not shown, the electronic device may further include a camera, a bluetooth module, etc., which will not be described herein. In particular, in this embodiment, the processor 608 in the electronic device loads executable files corresponding to the processes of one or more computer programs into the memory 602 according to the following instructions, and the processor 608 executes the computer programs stored in the memory 602, so as to implement various functions:
performing sound source separation processing on the sound source data to be processed to obtain candidate sound sources corresponding to the sound source data to be processed and sound signals belonging to each candidate sound source in the sound source data to be processed;
performing quality evaluation on the sound signals of each candidate sound source, and determining an evaluation value of the sound signals of each candidate sound source;
Determining a target sound source from a plurality of candidate target sound sources according to the evaluation value of the sound signal of each candidate sound source;
the sound signal of the target sound source is processed.
Those of ordinary skill in the art will appreciate that all or a portion of the steps of the various methods of the above embodiments may be performed by instructions, or by instructions controlling associated hardware, which may be stored in a computer-readable storage medium and loaded and executed by a processor.
To this end, an embodiment of the present invention provides a storage medium having stored therein a plurality of instructions capable of being loaded by a processor to perform the steps of any one of the sound signal processing methods provided by the embodiment of the present invention. For example, the computer program may perform the steps of:
performing sound source separation processing on the sound source data to be processed to obtain candidate sound sources corresponding to the sound source data to be processed and sound signals belonging to each candidate sound source in the sound source data to be processed;
performing quality evaluation on the sound signals of each candidate sound source, and determining an evaluation value of the sound signals of each candidate sound source;
determining a target sound source from a plurality of candidate target sound sources according to the evaluation value of the sound signal of each candidate sound source;
The sound signal of the target sound source is processed.
The specific implementation of each operation above may be referred to the previous embodiments, and will not be described herein.
Wherein the storage medium may include: read Only Memory (ROM), random access Memory (RAM, random Access Memory), magnetic or optical disk, and the like.
The steps of any sound signal processing method provided by the embodiment of the present invention can be executed by the computer program stored in the storage medium, so that the beneficial effects of any sound signal processing method provided by the embodiment of the present invention can be achieved, which are detailed in the previous embodiments and are not described herein.
The foregoing describes in detail a sound signal processing method, apparatus, electronic device and storage medium provided by the embodiments of the present invention, and specific examples are applied to illustrate the principles and embodiments of the present invention, where the foregoing examples are only used to help understand the method and core idea of the present invention; meanwhile, as those skilled in the art will have variations in the specific embodiments and application scope in light of the ideas of the present invention, the present description should not be construed as limiting the present invention.

Claims (16)

1. A sound signal processing method, the method comprising:
performing sound source separation processing on sound source data to be processed to obtain candidate sound sources corresponding to the sound source data to be processed and sound signals belonging to the candidate sound sources in the sound source data to be processed;
performing quality evaluation on the sound signals of each candidate sound source, and determining an evaluation value of the sound signals of each candidate sound source;
determining a target sound source from a plurality of candidate target sound sources according to the evaluation value of the sound signal of each candidate sound source;
and processing the sound signal of the target sound source.
2. The sound signal processing method as claimed in claim 1, wherein the performing the sound source separation processing on the sound source data to be processed to obtain the candidate sound source corresponding to the sound source data to be processed and the sound signals belonging to each of the candidate sound sources in the sound source data to be processed includes:
performing sound source position estimation on sound source data to be processed, and determining and obtaining candidate sound sources corresponding to the sound source data to be processed and position information of each candidate sound source;
determining and obtaining the position guide information of each candidate sound source according to the position of each sound channel for collecting the sound source data to be processed and the position information of each candidate sound source;
And carrying out sound source separation on the sound source data to be processed according to the position guiding information of each candidate sound source to obtain the sound signal of each candidate sound source.
3. The sound signal processing method as claimed in claim 2, wherein said performing sound source separation on said sound source data to be processed based on the position guide information of each of said candidate sound sources to obtain the sound signal of each of said candidate sound sources, comprises:
determining and obtaining separation parameters according to the position guide information of each candidate sound source;
and carrying out sound source separation on the sound signals in the sound source data to be processed according to the separation parameters, and determining to obtain the sound signals of each candidate sound source.
4. A sound signal processing method according to claim 3, wherein said determining a separation parameter based on position guidance information of each of said candidate sound sources comprises:
acquiring historical separation parameters of historical sound source data and auxiliary parameters corresponding to the sound source data to be processed;
correcting the auxiliary parameters according to the position guide information of each candidate sound source to obtain corrected auxiliary parameters;
and obtaining the separation parameter of the sound source data to be processed according to the corrected auxiliary parameter and the historical separation parameter.
5. The sound signal processing method as claimed in claim 2, wherein said performing sound source position estimation on the sound source data to be processed to determine the candidate sound sources corresponding to the sound source data to be processed and the position information of each candidate sound source comprises:
determining a plurality of initial sound source positions according to preset azimuth angles;
determining, for each initial sound source position, the distance between the initial sound source position and each sound channel position used to collect the sound source data to be processed;
determining the power of the sound signal at each initial sound source position according to the distance between each initial sound source position and each sound channel position; and
determining a candidate sound source and the position information of the candidate sound source according to the power of the sound signal at each initial sound source position.
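Claim 5 describes a steered-power scan over preset azimuths. A minimal single-source sketch, assuming a far-field source and a linear microphone array (the delay-and-sum beamformer and the argmax pick are illustrative choices, not mandated by the claim):

```python
import numpy as np

def estimate_candidate_azimuth(signals, mic_x, fs, c=343.0, n_az=37):
    """Sketch of claim 5: scan preset azimuths, compute the power of the
    sound signal at each initial position, and keep the strongest one as
    the candidate sound source position (single-source case).

    signals : (n_ch, n_samp) multichannel recording
    mic_x   : (n_ch,) microphone positions along a line, in metres
    """
    azimuths = np.linspace(0.0, np.pi, n_az)
    powers = np.empty(n_az)
    for i, az in enumerate(azimuths):
        delays = mic_x * np.cos(az) / c          # far-field delay per channel
        shifts = np.round(delays * fs).astype(int)
        aligned = [np.roll(ch, -s) for ch, s in zip(signals, shifts)]
        beam = np.mean(aligned, axis=0)          # delay-and-sum beamformer
        powers[i] = np.mean(beam ** 2)           # claim 5: power at this position
    best = int(np.argmax(powers))                # strongest steered direction
    return azimuths[best], powers
```

With identical signals on both microphones (a broadside source), the scan peaks near 90 degrees.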
6. The sound signal processing method as claimed in claim 5, wherein said determining the power of the sound signal at each of the initial sound source positions based on the distance between each of the initial sound source positions and each of the sound channel positions comprises:
determining, for each initial sound source position, the time at which the signal from the initial sound source position arrives at each sound channel position according to the distance between the initial sound source position and each sound channel position;
determining the power of the sound signal at each sound channel position according to the arrival time of the signal at each sound channel position; and
determining the power of the sound signal at the initial sound source position according to the power of the sound signal at each sound channel position.
7. The sound signal processing method as claimed in claim 6, wherein said determining the power of the sound signal at each of said sound channel positions based on the time at which the signal from the initial sound source position arrives at each of said sound channel positions comprises:
determining, for each sound channel position, first time information at which the signal from the initial sound source position arrives at the sound channel position, and second time information at which the signal arrives at the next adjacent sound channel position;
determining the time difference between the first time information and the second time information; and
determining the power of the sound signal at the sound channel position according to the time difference, the sound signal at the sound channel position, and the sound signal at the next adjacent sound channel position.
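One plausible reading of claim 7 — align the adjacent channel by the arrival-time difference and take the cross-power — can be sketched as follows (the shifted cross-power form is an assumption; the claim only names the inputs):

```python
import numpy as np

def pair_power(sig_a, sig_b, tdiff, fs):
    """Sketch of claim 7: power at one sound channel position from the
    time difference between the arrival times at this channel and the
    adjacent one, plus the two channels' signals."""
    lag = int(round(tdiff * fs))            # time difference in samples
    shifted = np.roll(sig_b, -lag)          # align the adjacent channel
    return float(np.mean(sig_a * shifted))  # cross-power after alignment
```

When the hypothesised time difference matches the true inter-channel delay, the aligned cross-power is maximised, which is what makes this quantity useful as a per-position power measure.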
8. The sound signal processing method as claimed in claim 6, wherein said determining the power of the sound signal at each of said sound channel positions based on the time at which the signal from the initial sound source position arrives at each of said sound channel positions comprises:
determining the initial power of the sound signal at each sound channel position according to the time at which the signal from the initial sound source position arrives at that position;
determining, for every two adjacent sound channel positions, a target power, wherein the target power is the larger of the initial powers corresponding to the two adjacent sound channel positions;
determining, for each sound channel position, a power weight according to the initial power corresponding to the sound channel position, the initial power corresponding to the next adjacent sound channel position, and each target power; and
determining the power at the sound channel position according to the initial power corresponding to the sound channel position and its power weight.
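Claim 8 leaves the exact weight formula open; the normalisation below is an assumption made only to illustrate the data flow from initial powers through adjacent-pair target powers to weighted channel powers:

```python
import numpy as np

def weighted_channel_powers(init_powers):
    """Sketch of claim 8 (one plausible reading): for each adjacent channel
    pair take the larger initial power as the target power, weight each
    channel by its larger-of-pair power relative to the summed target
    powers, then scale the initial power by that weight."""
    p = np.asarray(init_powers, dtype=float)
    targets = np.maximum(p[:-1], p[1:])       # claim 8: larger of each adjacent pair
    denom = targets.sum() or 1.0              # normaliser (assumed form)
    weights = np.empty_like(p)
    for i in range(len(p)):
        nxt = p[min(i + 1, len(p) - 1)]       # next adjacent channel, clamped at the end
        weights[i] = max(p[i], nxt) / denom   # weight from own, next, and target powers
    return p * weights                        # claim 8: power = initial power x weight
```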
9. The sound signal processing method as claimed in claim 2, wherein said determining the position guide information of each of said candidate sound sources based on the position of each sound channel used to collect said sound source data to be processed and the position information of each of said candidate sound sources comprises:
determining, for each candidate sound source, the time at which the signal from the candidate sound source arrives at each sound channel position according to the position information of the candidate sound source; and
obtaining the position guide information of the candidate sound source according to the arrival time of its signal at each sound channel position.
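Claim 9 turns a candidate position into per-channel arrival times and then into position guidance. One common concrete form of such guidance is a frequency-domain steering vector, sketched here (the steering-vector representation is an assumption; the claim does not fix the form of the guide information):

```python
import numpy as np

def position_guide(src_pos, mic_pos, freqs, c=343.0):
    """Sketch of claim 9: from a candidate source position and the sound
    channel positions, compute the arrival time at each channel and turn
    those times into a steering vector per frequency bin."""
    src = np.asarray(src_pos, dtype=float)
    mics = np.asarray(mic_pos, dtype=float)          # (n_ch, n_dim)
    toa = np.linalg.norm(mics - src, axis=1) / c     # claim 9: arrival time per channel
    tdoa = toa - toa[0]                              # relative to the first channel
    # steering vector: the phase shift each channel sees at each frequency
    return np.exp(-2j * np.pi * np.outer(freqs, tdoa))
```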
10. The sound signal processing method as claimed in claim 1, wherein said performing sound source separation on the sound source data to be processed to obtain the candidate sound sources corresponding to the sound source data to be processed and the sound signals belonging to each of the candidate sound sources comprises:
performing sound source separation on the sound source data to be processed to obtain predicted sound sources corresponding to the sound source data to be processed and the sound signals belonging to each predicted sound source;
calculating the cross-correlation coefficients between the sound signals of the predicted sound sources to obtain a correlation coefficient matrix; and
determining the candidate sound sources and their sound signals from the predicted sound sources according to the correlation coefficient matrix.
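Claim 10's correlation-based screening of predicted sources could look like the following sketch (the 0.8 threshold and the keep-first policy are assumptions; the claim only says candidates are chosen from the correlation coefficient matrix):

```python
import numpy as np

def prune_correlated_sources(separated, threshold=0.8):
    """Sketch of claim 10: build the correlation coefficient matrix of the
    separated (predicted) signals and keep only one representative from
    each highly correlated group."""
    corr = np.abs(np.corrcoef(separated))   # claim 10: correlation coefficient matrix
    keep = []
    for i in range(corr.shape[0]):
        # drop source i if it strongly correlates with an already-kept one
        if all(corr[i, j] < threshold for j in keep):
            keep.append(i)
    return keep, corr
```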
11. The sound signal processing method as claimed in claim 1, wherein said performing quality evaluation on the sound signal of each of said candidate sound sources to determine an evaluation value corresponding to the sound signal of each of said candidate sound sources comprises:
converting the sound signal of each candidate sound source from the time-frequency domain to obtain a time-domain signal of the sound signal of each candidate sound source; and
determining the kurtosis value of the time-domain signal of each candidate sound source, and setting the kurtosis value as the evaluation value corresponding to the sound signal of that candidate sound source.
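Claim 11's kurtosis score on a time-domain signal can be computed directly. Excess kurtosis is used in the sketch below (an assumption; the claim does not say which kurtosis definition applies): speech is super-Gaussian, so a cleanly separated speech source tends to score higher than noise.

```python
import numpy as np

def kurtosis_score(x):
    """Sketch of claim 11: excess kurtosis of a time-domain signal as the
    quality evaluation value."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    var = np.mean(x ** 2)
    return float(np.mean(x ** 4) / (var ** 2) - 3.0)  # 0 for Gaussian noise
```

Scoring every candidate this way and keeping the largest value matches the maximum-evaluation selection of claim 13.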
12. The sound signal processing method as claimed in claim 1, wherein said performing quality evaluation on the sound signal of each of said candidate sound sources to determine an evaluation value corresponding to the sound signal of each of said candidate sound sources comprises:
acquiring the speech feature vectors of the sound signal of each candidate sound source;
determining a probability score for each speech feature vector of the sound signal of each candidate sound source, wherein the probability score represents the probability that the speech feature vector corresponds to a wake-up word; and
determining the evaluation value corresponding to the sound signal of each candidate sound source according to the probability scores of its speech feature vectors.
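Claim 12's wake-word scoring depends on a trained keyword model, which the claim does not specify. In the sketch below, `score_frame` is a hypothetical stand-in for such a model, and taking the maximum frame probability as the evaluation value is an assumption:

```python
import numpy as np

def wake_word_score(feature_vecs, score_frame):
    """Sketch of claim 12: map each speech feature vector to the
    probability that it corresponds to the wake-up word (via the
    hypothetical `score_frame` model) and reduce the per-frame
    probabilities to one evaluation value."""
    probs = [score_frame(f) for f in feature_vecs]  # claim 12: per-frame probability
    return max(probs)                               # assumed reduction
```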
13. The sound signal processing method according to any one of claims 1 to 12, wherein said determining a target sound source from among the plurality of candidate sound sources based on the evaluation value corresponding to the sound signal of each of said candidate sound sources comprises:
determining, from the evaluation values corresponding to the sound signals of the candidate sound sources, the candidate sound source with the maximum evaluation value; and
setting the candidate sound source with the maximum evaluation value as the target sound source.
14. A sound signal processing apparatus, the apparatus comprising:
a separation module, configured to perform sound source separation on sound source data to be processed to obtain candidate sound sources corresponding to the sound source data to be processed and the sound signals belonging to each candidate sound source;
an evaluation module, configured to perform quality evaluation on the sound signal of each candidate sound source and determine an evaluation value for the sound signal of each candidate sound source;
a selection module, configured to determine a target sound source from the plurality of candidate sound sources according to the evaluation value corresponding to the sound signal of each candidate sound source; and
a processing module, configured to process the sound signal of the target sound source.
15. An electronic device comprising a memory and a processor; the memory stores a computer program, and the processor is configured to execute the computer program in the memory to perform the operations in the sound signal processing method according to any one of claims 1 to 13.
16. A storage medium storing a plurality of instructions adapted to be loaded by a processor to perform the steps of the sound signal processing method of any one of claims 1 to 13.
CN202210944168.2A 2022-08-05 2022-08-05 Sound signal processing method, device, electronic equipment and storage medium Pending CN117153186A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202210944168.2A CN117153186A (en) 2022-08-05 2022-08-05 Sound signal processing method, device, electronic equipment and storage medium
PCT/CN2023/092372 WO2024027246A1 (en) 2022-08-05 2023-05-05 Sound signal processing method and apparatus, and electronic device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210944168.2A CN117153186A (en) 2022-08-05 2022-08-05 Sound signal processing method, device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN117153186A (en)

Family

ID=88904825

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210944168.2A Pending CN117153186A (en) 2022-08-05 2022-08-05 Sound signal processing method, device, electronic equipment and storage medium

Country Status (2)

Country Link
CN (1) CN117153186A (en)
WO (1) WO2024027246A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118010165A (en) * 2024-04-08 2024-05-10 宁波泰利电器有限公司 Automatic induction temperature early warning method and system for hair straightening comb

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100070274A1 (en) * 2008-09-12 2010-03-18 Electronics And Telecommunications Research Institute Apparatus and method for speech recognition based on sound source separation and sound source identification
JP2016045225A (en) * 2014-08-19 2016-04-04 日本電信電話株式会社 Number of sound sources estimation device, number of sound sources estimation method, and number of sound sources estimation program
CN106797413A (en) * 2014-09-30 2017-05-31 惠普发展公司,有限责任合伙企业 Sound is adjusted
US20180299527A1 (en) * 2015-12-22 2018-10-18 Huawei Technologies Duesseldorf Gmbh Localization algorithm for sound sources with known statistics
CN113327624A (en) * 2021-05-25 2021-08-31 西北工业大学 Method for intelligently monitoring environmental noise by adopting end-to-end time domain sound source separation system
CN113096684A (en) * 2021-06-07 2021-07-09 成都启英泰伦科技有限公司 Target voice extraction method based on double-microphone array
CN113889138A (en) * 2021-06-07 2022-01-04 成都启英泰伦科技有限公司 Target voice extraction method based on double-microphone array
CN114220454A (en) * 2022-01-25 2022-03-22 荣耀终端有限公司 Audio noise reduction method, medium and electronic equipment

Also Published As

Publication number Publication date
WO2024027246A1 (en) 2024-02-08


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination