CN110942779A - Noise processing method, device and system - Google Patents

Noise processing method, device and system

Info

Publication number
CN110942779A
Authority
CN
China
Prior art keywords
voice information
target user
information
voice
audio information
Prior art date
Legal status
Pending
Application number
CN201911106466.9A
Other languages
Chinese (zh)
Inventor
吴科苇
刘兵兵
刘如意
王峰
车洋
Current Assignee
Suning Cloud Computing Co Ltd
Original Assignee
Suning Cloud Computing Co Ltd
Priority date
Filing date
Publication date
Application filed by Suning Cloud Computing Co Ltd filed Critical Suning Cloud Computing Co Ltd
Priority to CN201911106466.9A priority Critical patent/CN110942779A/en
Publication of CN110942779A publication Critical patent/CN110942779A/en
Priority to CA3160740A priority patent/CA3160740A1/en
Priority to PCT/CN2020/105992 priority patent/WO2021093380A1/en

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00: Speech recognition
    • G10L15/22: Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L21/00: Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02: Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208: Noise filtering
    • G10L2021/02082: Noise filtering, the noise being echo or reverberation of the speech

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Quality & Reliability (AREA)
  • Signal Processing (AREA)
  • User Interface Of Digital Computer (AREA)
  • Circuit For Audible Band Transducer (AREA)

Abstract

The embodiments of the present application disclose a noise processing method, device and system. The method comprises: detecting collected audio information; when voice information is detected, filtering the voice information according to pre-stored audio information of a target user; judging whether voice information remains after the filtering; and if so, recognizing the filtered voice information and providing corresponding feedback according to the recognition result. The application uses previously acquired speech of the target user as prior information. When a non-target user issues an instruction, that instruction can be suppressed according to the prior information; when the target user issues an instruction amid other voice interference or environmental noise, interfering voices and noise from the same direction, nearby directions and distant positions can likewise be suppressed. An instruction free of other voices and environmental noise is thus obtained, improving the clarity of the target user's voice and the interaction experience.

Description

Noise processing method, device and system
Technical Field
The invention belongs to the field of acoustics, and particularly relates to a noise processing method, device and system.
Background
With the development of artificial intelligence, more and more living environments, such as vehicle-mounted, home, classroom and meeting-room environments, are becoming intelligent. Among the various smart devices deployed in these environments, intelligent voice interaction devices play an important role. They enable voice interaction between people and devices, so that a device can perform operations and controls on a person's behalf according to the person's intent, freeing the user's hands as much as possible; such devices will be indispensable in the future.
Since real living environments are often complicated, many noises and interfering sounds are present in addition to the target person's voice. These unwanted sounds can severely interfere with a person's interaction with a speech device and reduce the interaction experience. To avoid such interference, a microphone array is usually used to perform beamforming or blind source separation, enhancing sound from a specific direction while suppressing sound from other directions, or separating the voice of a specific target person.
However, conventional beamforming or blind source separation cannot effectively suppress interference or separate the target voice in every environment. When the interfering sound is also a human voice and comes from the same or a nearby direction as the target voice, or from far away, the performance of these methods drops sharply.
Disclosure of Invention
In order to solve the problems in the prior art, the invention provides a noise processing method, device and system. The method can suppress not only the interference of environmental noise but also interfering human voices from the same direction, nearby directions and distant positions, improving the interaction experience between people and devices.
The embodiment of the invention provides the following specific technical scheme:
in a first aspect, the present invention provides a method for processing noise, the method comprising:
detecting the collected audio information;
when voice information is detected, filtering the voice information according to pre-stored audio information of a target user;
judging whether voice information exists after filtering processing;
and if so, recognizing the voice information after the filtering processing and performing corresponding feedback according to a recognition result.
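The four steps above can be sketched as a minimal processing pipeline; every function name here is an illustrative placeholder injected by the caller, not an API defined by this application:

```python
from typing import Callable, Optional

def process_audio(
    audio: bytes,
    detect_voice: Callable[[bytes], Optional[bytes]],
    filter_with_prior: Callable[[bytes], Optional[bytes]],
    recognize: Callable[[bytes], str],
    respond: Callable[[str], str],
) -> Optional[str]:
    voice = detect_voice(audio)          # step 1: detect voice in the collected audio
    if voice is None:
        return None                      # no voice information detected
    filtered = filter_with_prior(voice)  # step 2: filter using the target user's prior audio
    if filtered is None:
        return None                      # step 3: nothing remains, so it was non-target speech
    return respond(recognize(filtered))  # step 4: recognize and give corresponding feedback
```

The callbacks make the control flow explicit: only audio that survives both the detection and the prior-based filtering reaches the recognizer.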
Preferably, the filtering the voice information according to the pre-stored audio information of the target user specifically includes:
constructing an acoustic model; the acoustic model is a Gaussian mixture model, and the variable of the acoustic model is the voice information, and the initial value of the parameter is a covariance matrix obtained by calculating the audio information of the target user;
modifying parameters of the acoustic model according to an EM algorithm;
judging whether the iteration times of the EM algorithm reach a preset value or not;
when the preset value is reached, acquiring an output result of the acoustic model;
and filtering the voice information according to the output result.
Preferably, when the voice information is detected, the method further comprises:
and carrying out echo cancellation on the voice information.
Preferably, the method further comprises:
sending an operation instruction to the target user according to the received request sent by the target user;
receiving audio information sent by the target user according to the operation instruction;
and storing the audio information sent by the target user according to the operation instruction.
Preferably, the algorithm for detecting the collected audio information includes any one of a pitch detection algorithm, a double-threshold method, and an a posteriori SNR frequency-domain iterative algorithm.
In a second aspect, the present invention provides a noise processing apparatus, the apparatus comprising:
the detection module is used for detecting the acquired audio information;
the analysis module is used for carrying out filtering processing on the voice information according to pre-stored audio information of a target user when the voice information is detected;
the judging module is used for judging whether voice information exists after filtering processing;
and the recognition module is used for recognizing the filtered voice information and performing corresponding feedback according to a recognition result when the voice information exists.
Preferably, the analysis module specifically includes:
the construction module is used for constructing an acoustic model; the acoustic model is a Gaussian mixture model, and the variable of the acoustic model is the voice information, and the initial value of the parameter is a covariance matrix obtained by calculating the audio information of the target user;
the correction module is used for correcting the parameters of the acoustic model according to an EM algorithm;
the processing module is used for judging whether the iteration times of the EM algorithm reach a preset value or not; when the preset value is reached, acquiring an output result of the acoustic model; and filtering the voice information according to the output result.
Preferably, the analysis module further comprises:
and the echo cancellation module is used for performing echo cancellation on the voice information when the voice information is detected.
Preferably, the apparatus further comprises a storage module configured to:
sending an operation instruction to the target user according to the received request sent by the target user;
receiving audio information sent by the target user according to the operation instruction;
and storing the audio information sent by the target user according to the operation instruction.
In a third aspect, the present invention provides a computer system comprising:
one or more processors; and
a memory associated with the one or more processors for storing program instructions that, when read and executed by the one or more processors, perform operations comprising:
detecting the collected audio information;
when voice information is detected, filtering the voice information according to pre-stored audio information of a target user;
judging whether voice information exists after filtering processing;
and if so, recognizing the voice information after the filtering processing and performing corresponding feedback according to a recognition result.
The embodiment of the invention has the following beneficial effects:
according to the invention, the voice of the target person is firstly acquired as the prior information, so that when the non-target person sends an instruction, the instruction of the non-target person can be inhibited according to the prior information, and when other voice interference and environmental noise exist while the target person sends the instruction, the voice interference and the environmental noise in the same direction, the similar direction and the distant position can be inhibited according to the prior information, so that the instruction without other voice and environmental noise is obtained, the definition of the voice of the target person is improved, and the interactive experience is improved.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed to be used in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
Fig. 1 is an application environment diagram of a noise processing method according to an embodiment of the present application;
fig. 2 is a flowchart of a noise processing method according to an embodiment of the present application;
fig. 3 is a schematic structural diagram of a noise processing apparatus according to a second embodiment of the present application;
fig. 4 is a schematic position diagram of a noise processing apparatus and an experimental user according to a second embodiment of the present application;
fig. 5 is a diagram of a computer system architecture according to a third embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The present application provides a noise processing method, which can be applied to the application environment shown in fig. 1. The server 12 communicates with the database 11 and the terminal 13 via a network. The terminal 13 may be, but is not limited to, various personal computers, notebook computers, smart phones, tablet computers, and portable wearable devices, and the server 12 may be implemented by an independent server or a server cluster formed by a plurality of servers.
Example one
As shown in fig. 2, the present application provides a noise processing method, which specifically includes the following steps:
and S21, detecting the collected audio information.
The detection algorithm may include any one of a pitch detection algorithm, a double-threshold method, and an a posteriori SNR frequency-domain iterative algorithm.
In addition, any other algorithm capable of voice endpoint detection may be used; this scheme does not limit the choice of algorithm.
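As a rough sketch of one of the listed options, a double-threshold endpoint detector can combine short-time energy with the zero-crossing rate. The frame length and thresholds below are arbitrary example values, not parameters specified by this application:

```python
import numpy as np

def frame_signal(x, frame_len):
    # Split a 1-D signal into non-overlapping frames, dropping the remainder
    n = len(x) // frame_len
    return x[: n * frame_len].reshape(n, frame_len)

def double_threshold_vad(x, frame_len=160, energy_thresh=0.01, zcr_thresh=0.3):
    frames = frame_signal(x, frame_len)
    energy = np.mean(frames ** 2, axis=1)  # short-time energy per frame
    # zero-crossing rate: fraction of adjacent sample pairs that change sign
    zcr = np.mean(np.abs(np.diff(np.sign(frames), axis=1)) > 0, axis=1)
    # voiced speech: energetic frames with a moderate zero-crossing rate
    return (energy > energy_thresh) & (zcr < zcr_thresh)
```

A real detector would also smooth the frame decisions and use hysteresis between a high and a low energy threshold, which is where the method gets its name.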
And S22, when the voice information is detected, filtering the voice information according to the pre-stored audio information of the target user.
When voice information is detected, the method further comprises the following steps:
and carrying out echo cancellation on the voice information.
The echo in this embodiment refers to an acoustic echo, and when performing echo cancellation, the echo can be implemented by an acoustic echo cancellation method commonly used in the art, such as an echo suppression algorithm or an acoustic echo cancellation algorithm, which is not limited in this invention.
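The embodiment leaves the choice of echo canceller open; one common acoustic echo cancellation approach is a normalized LMS (NLMS) adaptive filter that estimates the loudspeaker echo contained in the microphone signal and subtracts it. A minimal sketch (tap count and step size are illustrative):

```python
import numpy as np

def nlms_echo_cancel(far_end, mic, taps=64, mu=0.5, eps=1e-8):
    """Adaptively cancel the far-end (loudspeaker) echo from the mic signal."""
    w = np.zeros(taps)        # adaptive filter estimating the echo path
    buf = np.zeros(taps)      # most recent far-end samples, newest first
    out = np.zeros(len(mic))
    for n in range(len(mic)):
        buf = np.roll(buf, 1)
        buf[0] = far_end[n]
        echo_est = w @ buf            # predicted echo at time n
        e = mic[n] - echo_est         # error = echo-cancelled output sample
        w += mu * e * buf / (buf @ buf + eps)  # normalized LMS update
        out[n] = e
    return out
```

With a stationary echo path and no near-end speech, the residual converges toward zero; production cancellers add double-talk detection on top of this core loop.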
Wherein, the detected voice information comprises environmental noise and/or human voice interference noise.
The filtering processing of the voice information according to the pre-stored audio information of the target user specifically comprises the following steps:
1. constructing an acoustic model;
the acoustic model is a Gaussian mixture model, the variable of the acoustic model is voice information, and the initial value of the parameter is a covariance matrix obtained by calculating the audio information of the target user;
the Gaussian Mixture Model (GMM) can be represented by the following equation:
Figure BDA0002271463260000061
wherein x is voice information, N (x | mu)kΣ k) is the component of the kth in the model; pikIs the mixing coefficient, i.e. the weight of each component; pik、μkΣ k is a parameter of the gaussian mixture model, and the initial value thereof is a covariance matrix obtained by calculating the audio information of the target user;
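For a concrete illustration of this mixture density, the one-dimensional case is sketched below (scalar variances stand in for the covariance matrices, purely for clarity):

```python
import numpy as np

def gauss(x, mu, var):
    # N(x | mu, var): univariate Gaussian density
    return np.exp(-0.5 * (x - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)

def gmm_pdf(x, pi, mu, var):
    # p(x) = sum_k pi_k * N(x | mu_k, var_k)
    return sum(p * gauss(x, m, v) for p, m, v in zip(pi, mu, var))
```

Because the mixing coefficients sum to one, the mixture is itself a valid probability density.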
2. modifying parameters of the acoustic model according to an EM algorithm;
wherein the EM algorithm is a maximum expectation algorithm.
The step 2 specifically includes the following two substeps:
a. calculating posterior probability according to the initial value of the current parameter;
b. and correcting the parameters according to the posterior probability.
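Substeps a and b correspond to the E-step and M-step of the EM algorithm. A minimal scalar sketch (scalar variances replace the covariance matrices, only for brevity):

```python
import numpy as np

def gauss(x, mu, var):
    # N(x | mu, var): univariate Gaussian density
    return np.exp(-0.5 * (x - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)

def em_step(x, pi, mu, var, eps=1e-12):
    # Substep a (E-step): posterior probability of each component per sample
    dens = np.array([p * gauss(x, m, v) for p, m, v in zip(pi, mu, var)])  # (K, N)
    gamma = dens / (dens.sum(axis=0) + eps)
    # Substep b (M-step): correct the parameters using the posteriors
    nk = gamma.sum(axis=1)
    pi_new = nk / len(x)
    mu_new = (gamma @ x) / nk
    var_new = (gamma * (x[None, :] - mu_new[:, None]) ** 2).sum(axis=1) / nk
    return pi_new, mu_new, var_new
```

Repeating `em_step` for the preset number of iterations (step 3 below) yields the corrected model parameters.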
3. Judging whether the iteration times of the EM algorithm reach a preset value or not;
in the scheme, the iteration times are set according to an empirical value, and when the execution times of the EM algorithm (the execution times of the steps a and b) reach a preset value, the iteration is ended.
4. When the preset value is reached, obtaining an output result of the acoustic model;
the output result is the posterior probability calculated according to the parameters in the last iteration.
5. And filtering the voice information according to the output result.
In this way, environmental noise and interfering voices from the same direction, nearby directions and distant positions can be effectively suppressed.
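One simple way to apply the model's output is a hard mask over the signal frames: keep frames whose posterior probability under the target user's model exceeds a threshold and zero out the rest. The application does not fix a particular masking rule, so the threshold here is an illustrative assumption:

```python
import numpy as np

def filter_frames(frames, target_posterior, threshold=0.5):
    # frames: (num_frames, frame_len); target_posterior: (num_frames,)
    # Zero out frames that are unlikely to come from the target user.
    mask = target_posterior > threshold
    return frames * mask[:, None]
```

Softer variants multiply each frame by the posterior itself instead of a binary mask, trading suppression strength for fewer artifacts.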
And S23, judging whether voice information exists after the filtering processing.
If no voice information remains, the detected voice was uttered by a non-target user; if voice information remains, the detected voice information includes speech uttered by the target user.
And S24, if the voice information exists, recognizing the voice information after the filtering processing and feeding back the voice information correspondingly according to the recognition result.
Specifically, the filtered voice information is converted into text content, the user's intent is determined by means of word segmentation and similar techniques, corresponding feedback is given, and at the same time an evaluation index is output for evaluating the accuracy of the speech recognition process.
The evaluation index may be a Sentence Error Rate (SER), a sentence accuracy rate (s.corr), a word error rate (WER/CER), or the like.
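As a reference point for these indices, the word error rate (WER) is the Levenshtein (edit) distance between the reference and recognized word sequences divided by the number of reference words:

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference length."""
    r, h = reference.split(), hypothesis.split()
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i                                # delete all reference words
    for j in range(len(h) + 1):
        d[0][j] = j                                # insert all hypothesis words
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            cost = 0 if r[i - 1] == h[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
    return d[len(r)][len(h)] / max(len(r), 1)
```

Sentence error rate is coarser (a sentence is either entirely right or wrong), which is why WER is the index used in the experiments of Example two.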
In addition, the acquisition of the pre-stored audio information of the target user comprises the following steps:
1. sending an operation instruction to a target user according to a received request sent by the target user;
the method can be applied to intelligent voice interaction equipment, so that the request sent by the target user can be a reset request for the equipment. According to the request sent by the target user, sending an operation instruction to the target user, such as:
voice prompt: please adjust the sitting posture, please say that the head is Biu small Biu, please incline the head to the left by about 10cm, then say that the head is Biu small Biu, please incline the head to the right by about 10cm, then say that the head is Biu small Biu, please incline the body forward by about 10cm, then say that the head is Biu small Biu, etc.
2. Receiving audio information sent by a target user according to an operation instruction;
the target user sends corresponding audio information according to the operation instruction, for example, when receiving the operation instruction of 'please adjust sitting posture', replying: "sitting posture adjusted"; when an operation command of 'please say Xiao Biu Xiao Biu', replying to 'Xiao Biu Xiao Biu'; when an operation command of 'please incline the head to the left by about 10cm and then say a small Biu small Biu' is received, corresponding actions are continuously carried out according to the command and the reply is carried out.
When there are a plurality of operation commands, the operation commands are transmitted at set time intervals.
Such as: and sending an operation instruction every 2 s.
3. And storing the audio information sent by the target user according to the operation instruction.
For example: the replies returned by the target user, such as "sitting posture adjusted" and "Xiao Biu Xiao Biu", are stored.
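The enrollment procedure above can be sketched as a prompt-record-store loop. `record_fn` and `store_fn` are hypothetical injected callbacks, and the prompts and 2 s interval follow the examples in this description:

```python
import time

# Example prompts following the operation instructions described above
PROMPTS = [
    "Please adjust your sitting posture",
    "Please say 'Xiao Biu Xiao Biu'",
    "Please tilt your head about 10 cm to the left, then say 'Xiao Biu Xiao Biu'",
]

def enroll_target_user(record_fn, store_fn, interval_s=2.0):
    """Issue each operation instruction, record the target user's reply,
    and store it; instructions are spaced by the set interval."""
    for prompt in PROMPTS:
        print(prompt)              # on a real device this would be TTS playback
        clip = record_fn(prompt)   # record the user's audio reply
        store_fn(prompt, clip)     # persist the enrollment audio
        time.sleep(interval_s)
```

The stored clips are what later seed the covariance matrix used to initialize the Gaussian mixture model.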
According to the invention, the target user's voice is first acquired as prior information. When a non-target user issues an instruction, that instruction can be suppressed according to the prior information; when the target user issues an instruction amid other voice interference and environmental noise, interfering voices and noise from the same direction, nearby directions and distant positions can likewise be suppressed. An instruction free of other voices and environmental noise is thus obtained, improving the clarity of the target user's voice and the interaction experience.
Example two
As shown in fig. 3, the present application provides a noise processing apparatus, which specifically includes:
the detection module 31 is used for detecting the acquired audio information;
the analysis module 32 is configured to, when voice information is detected, perform filtering processing on the voice information according to pre-stored audio information of a target user;
a judging module 33, configured to judge whether there is voice information after the filtering processing;
and the recognition module 34 is configured to, when the voice information exists, recognize the filtered voice information and perform corresponding feedback according to a recognition result.
Preferably, the analysis module 32 specifically includes:
a construction module 321, configured to construct an acoustic model; the acoustic model is a Gaussian mixture model, the variable of the acoustic model is voice information, and the initial value of the parameter is a covariance matrix obtained by calculating the audio information of the target user;
a modification module 322, configured to modify parameters of the acoustic model according to an EM algorithm;
the processing module 323 is used for judging whether the iteration times of the EM algorithm reach a preset value; when the preset value is reached, obtaining an output result of the acoustic model; and filtering the voice information according to the output result.
Preferably, the analysis module 32 further includes:
and an echo cancellation module 324, configured to perform echo cancellation on the voice information when the voice information is detected.
The apparatus further comprises a storage module 35 configured to:
sending an operation instruction to a target user according to a received request sent by the target user;
receiving audio information sent by a target user according to an operation instruction;
and storing the audio information sent by the target user according to the operation instruction.
Preferably, the algorithm for detecting the collected audio information includes any one of a pitch detection algorithm, a double-threshold method, and an a posteriori SNR frequency-domain iterative algorithm.
When the noise processing apparatus is an intelligent interactive device, the intelligent interactive device includes a voice interactive system and a voice recognition system, wherein the voice interactive system includes the detection module 31, the analysis module 32, the determination module 33, and the storage module 35, and the voice recognition system includes the recognition module 34.
An interaction experiment is carried out with the intelligent interactive device, with the users arranged at preset positions.
Referring to fig. 4, fig. 4 includes 5 users, which are user number 1, user number 2, user number 3, user number 4, and user number 5, respectively.
The experimental process is as follows:
1. Users No. 1 and No. 2 speak simultaneously; No. 1 is the target user;
2. Users No. 1 and No. 3 speak simultaneously; No. 1 is the target user;
3. Users No. 1 and No. 4 speak simultaneously; No. 1 is the target user;
4. Users No. 1 and No. 5 speak simultaneously; No. 1 is the target user;
5. Users No. 1, No. 2 and No. 3 speak simultaneously; No. 1 is the target user;
6. Users No. 1, No. 3 and No. 4 speak simultaneously; No. 1 is the target user;
7. Users No. 1, No. 4 and No. 5 speak simultaneously; No. 1 is the target user;
8. All users speak simultaneously; No. 1 is the target user.
The recognition module 34 in the speech recognition system is configured to recognize the filtered speech information and perform corresponding feedback according to a recognition result; and the method is also used for outputting an evaluation index for evaluating the accuracy of the voice recognition process.
In this embodiment, the evaluation index is WER (word error rate).
The results of the above experiment are shown in table 1 below:
TABLE 1
[Table 1 is provided as an image in the original publication and reports the word error rate for each of the eight experiments; its cell values are not reproducible here.]
In addition, the Gaussian mixture model uses the EM algorithm to correct its parameters, and the optimal parameters are obtained through an adaptive algorithm during the correction.
According to the experimental result, the audio information of the target user is used as the prior information, so that the subsequent voice recognition effect can be improved, and the interactive experience is improved.
EXAMPLE III
As shown in fig. 5, a third embodiment of the present application provides a computer system, including:
one or more processors; and
a memory associated with the one or more processors, the memory for storing program instructions that, when read and executed by the one or more processors, perform the following:
detecting the collected audio information;
when the voice information is detected, filtering the voice information according to pre-stored audio information of a target user;
judging whether voice information exists after filtering processing;
and if so, recognizing the voice information after the filtering processing and performing corresponding feedback according to a recognition result.
Fig. 5 illustrates an architecture of a computer system that may include, in particular, a processor 52, a video display adapter 54, a disk drive 56, an input/output interface 58, a network interface 510, and a memory 512. The processor 52, video display adapter 54, disk drive 56, input/output interface 58, network interface 510, and memory 512 may be communicatively coupled via a communication bus 514.
The processor 52 may be implemented by a general-purpose CPU (Central Processing Unit), a microprocessor, an Application Specific Integrated Circuit (ASIC), or one or more Integrated circuits, and is configured to execute related programs to implement the technical solution provided in the present Application.
The memory 512 may be implemented in the form of a ROM (Read-Only Memory), a RAM (Random Access Memory), a static storage device, a dynamic storage device, or the like. The memory 512 may store an operating system 516 for controlling the operation of the computer system 50 and a Basic Input/Output System (BIOS) 518 for controlling low-level operations of the computer system. In addition, a web browser 520, a data storage management system 522, and the like may also be stored. When the technical solution provided by the present application is implemented in software or firmware, the relevant program code is stored in the memory 512 and invoked and executed by the processor 52.
The input/output interface 58 is used for connecting an input/output module to realize information input and output. The i/o module may be configured as a component in a device (not shown) or may be external to the device to provide a corresponding function. The input devices may include a keyboard, a mouse, a touch screen, a microphone, various sensors, etc., and the output devices may include a display, a speaker, a vibrator, an indicator light, etc.
The network interface 510 is used for connecting a communication module (not shown in the figure) to realize communication interaction between the device and other devices. The communication module can realize communication in a wired mode (such as USB, network cable and the like) and also can realize communication in a wireless mode (such as mobile network, WIFI, Bluetooth and the like).
The communication bus 514 includes a path to transfer information between the various components of the device, such as the processor 52, the video display adapter 54, the disk drive 56, the input/output interface 58, the network interface 510, and the memory 512.
It should be noted that although the above-described device only shows the processor 52, the video display adapter 54, the disk drive 56, the input/output interface 58, the network interface 510, the memory 512, the communication bus 514, etc., in a specific implementation, the device may also include other components necessary for proper operation. Furthermore, it will be understood by those skilled in the art that the apparatus described above may also include only the components necessary to implement the solution of the present application, and not necessarily all of the components shown in the figures.
From the above description of the embodiments, it is clear to those skilled in the art that the present application can be implemented by software plus necessary general hardware platform. Based on such understanding, the technical solutions of the present application may be embodied in the form of a software product, which may be stored in a storage medium, such as a ROM/RAM, a magnetic disk, an optical disk, or the like, and includes several instructions for enabling a computer device (which may be a personal computer, a cloud server, or a network device) to execute the method according to the embodiments or some parts of the embodiments of the present application.
While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all such alterations and modifications as fall within the scope of the embodiments of the invention. In addition, the noise processing apparatus, the computer system and the noise processing method provided by the above embodiments belong to the same concept, and specific implementation processes thereof are detailed in the method embodiments and are not described herein again.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.

Claims (10)

1. A method of noise processing, the method comprising:
detecting the collected audio information;
when voice information is detected, filtering the voice information according to pre-stored audio information of a target user;
judging whether voice information exists after filtering processing;
and if so, recognizing the voice information after the filtering processing and performing corresponding feedback according to a recognition result.
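As an illustrative sketch only, not a statement of the claimed implementation, the four steps of claim 1 can be arranged as a simple pipeline. The functions `detect_speech`, `filter_by_target_profile`, and `recognize` are hypothetical placeholders invented here: a real system would use the detection algorithms of claim 5, the acoustic-model filtering of claim 2, and an actual speech recognizer.

```python
# Illustrative pipeline for the method of claim 1. The three stage
# functions are hypothetical stand-ins for the detection, filtering,
# and recognition components described in the claims.

def detect_speech(audio, energy_threshold=0.01):
    """Step 1: detect whether the collected audio contains voice
    information (here: a trivial short-term-energy check)."""
    energy = sum(x * x for x in audio) / max(len(audio), 1)
    return energy > energy_threshold

def filter_by_target_profile(audio, target_profile):
    """Step 2: filter the voice information according to the pre-stored
    target-user profile (here: a trivial mean-level subtraction stand-in)."""
    return [x - target_profile["mean"] for x in audio]

def recognize(audio):
    """Step 4: placeholder recognizer returning a feedback string."""
    return "recognized %d samples" % len(audio)

def process(audio, target_profile, energy_threshold=0.01):
    if not detect_speech(audio, energy_threshold):
        return None                       # no voice information detected
    residual = filter_by_target_profile(audio, target_profile)
    if not detect_speech(residual, energy_threshold):
        return None                       # no voice information remains after filtering
    return recognize(residual)            # recognize and feed back
```

A caller would invoke `process(audio, profile)` for each captured buffer; a `None` result means either that no voice was detected or that no voice information remained after the filtering step.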
2. The method according to claim 1, wherein the filtering the speech information according to the pre-stored audio information of the target user specifically comprises:
constructing an acoustic model, wherein the acoustic model is a Gaussian mixture model, the variable of the acoustic model is the voice information, and the initial values of the parameters are a covariance matrix calculated from the audio information of the target user;
modifying parameters of the acoustic model according to an EM algorithm;
judging whether the number of iterations of the EM algorithm reaches a preset value;
when the preset value is reached, acquiring an output result of the acoustic model;
and filtering the voice information according to the output result.
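Claim 2 describes fitting a Gaussian mixture model for a preset number of EM iterations and filtering by the model's output. The sketch below is a minimal one-dimensional illustration under stated assumptions: it takes initial means, variances, and weights directly (the claim instead derives the initial parameters from a covariance matrix computed on the target user's enrolled audio), and it "filters" by dropping samples whose posterior assigns them to a designated target component. The names `em_gmm_1d` and `filter_by_posterior` are invented here for illustration.

```python
import numpy as np

def em_gmm_1d(x, means, variances, weights, preset_iters=20):
    """Fit a 1-D Gaussian mixture to samples x with the EM algorithm,
    stopping when the preset iteration count is reached (claim 2)."""
    x = np.asarray(x, dtype=float)
    means, variances, weights = map(np.array, (means, variances, weights))
    for _ in range(preset_iters):          # iterate until the preset value is reached
        # E-step: responsibility of each component for each sample
        dens = np.exp(-0.5 * (x[:, None] - means) ** 2 / variances) \
               / np.sqrt(2 * np.pi * variances)
        resp = weights * dens
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: correct (re-estimate) the model parameters
        nk = resp.sum(axis=0)
        means = (resp * x[:, None]).sum(axis=0) / nk
        variances = (resp * (x[:, None] - means) ** 2).sum(axis=0) / nk
        weights = nk / len(x)
    return means, variances, weights

def filter_by_posterior(x, means, variances, weights, target_component=0):
    """Output step: drop samples the fitted model attributes to the
    designated target component, keeping the rest."""
    x = np.asarray(x, dtype=float)
    dens = np.exp(-0.5 * (x[:, None] - means) ** 2 / variances) \
           / np.sqrt(2 * np.pi * variances)
    labels = (weights * dens).argmax(axis=1)
    return x[labels != target_component]
```

In practice the observations would be multi-dimensional acoustic feature vectors (hence the claim's covariance matrix) rather than scalar samples.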
3. The method of claim 1, wherein when voice information is detected, the method further comprises:
and carrying out echo cancellation on the voice information.
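Claim 3 adds echo cancellation to the detected voice information. A common approach, though not necessarily the one intended by the patent, is a normalized LMS (NLMS) adaptive filter that estimates the loudspeaker echo present in the microphone signal and subtracts it. The sketch below assumes access to the far-end (playback) signal and equal-length signals; all names are invented here.

```python
import numpy as np

def nlms_echo_cancel(far_end, mic, taps=16, mu=0.5, eps=1e-8):
    """Normalized LMS adaptive filter: estimate the echo of the far-end
    (loudspeaker) signal in the microphone signal and subtract it,
    returning the echo-reduced near-end signal. Assumes far_end and
    mic have equal length."""
    w = np.zeros(taps)                     # adaptive filter weights
    buf = np.zeros(taps)                   # most recent far-end samples
    out = np.zeros(len(mic))
    for n in range(len(mic)):
        buf = np.roll(buf, 1)
        buf[0] = far_end[n]                # buf[k] = far_end[n - k]
        echo_est = w @ buf                 # predicted echo component
        e = mic[n] - echo_est              # error = near-end signal estimate
        w += mu * e * buf / (buf @ buf + eps)   # NLMS weight update
        out[n] = e
    return out
```

Production systems typically use a frequency-domain or double-talk-aware canceller, but the adaptation principle is the same.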
4. The method according to any one of claims 1 to 3, further comprising:
sending an operation instruction to the target user according to the received request sent by the target user;
receiving audio information sent by the target user according to the operation instruction;
and storing the audio information sent by the target user according to the operation instruction.
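The enrollment flow of claim 4 (request, operation instruction, audio upload, storage) might be sketched as below. `EnrollmentStore`, its method names, and the instruction text are all invented here for illustration; the claim does not prescribe a storage layout.

```python
class EnrollmentStore:
    """Sketch of the enrollment flow in claim 4: on a user request,
    send back an operation instruction (e.g. a phrase to read), then
    receive and store the audio the target user records in response."""

    def __init__(self):
        self._profiles = {}                # user id -> list of recordings

    def handle_request(self, user_id):
        # Step 1: respond to the target user's request with an instruction.
        return {"user": user_id, "instruction": "please read the enrollment phrase"}

    def receive_audio(self, user_id, audio):
        # Steps 2-3: receive and persist the audio sent per the instruction;
        # returns the number of recordings now stored for the user.
        self._profiles.setdefault(user_id, []).append(list(audio))
        return len(self._profiles[user_id])

    def audio_for(self, user_id):
        return self._profiles.get(user_id, [])
```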
5. The method according to any one of claims 1 to 3, wherein the algorithm for detecting the collected audio information comprises any one of a pitch detection algorithm, a double-threshold method, and an a posteriori SNR frequency-domain iterative algorithm.
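Of the detection algorithms listed in claim 5, the double-threshold method is the simplest to illustrate. Classical double-threshold endpoint detection combines short-term energy with zero-crossing rate; the sketch below is a simplified, energy-only variant operating on per-frame energies, with hypothetical `high` and `low` thresholds.

```python
def double_threshold_vad(frame_energies, high, low):
    """Double-threshold endpoint detection: a speech segment is triggered
    when a frame's energy crosses the high threshold, then extended
    backwards and forwards over frames whose energy stays above the low
    threshold. Returns (start, end) frame-index pairs, inclusive."""
    segments, n = [], len(frame_energies)
    i = 0
    while i < n:
        if frame_energies[i] > high:
            start = i
            while start > 0 and frame_energies[start - 1] > low:
                start -= 1                 # extend backwards over the low-energy onset
            end = i
            while end + 1 < n and frame_energies[end + 1] > low:
                end += 1                   # extend forwards until energy drops
            segments.append((start, end))
            i = end + 1
        else:
            i += 1
    return segments
```

Using two thresholds keeps weak segment edges (onsets and trailing sounds) that a single high threshold would clip, while still rejecting low-level background noise that never crosses the high threshold.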
6. A noise processing apparatus, characterized in that the apparatus comprises:
the detection module is used for detecting the acquired audio information;
the analysis module is used for carrying out filtering processing on the voice information according to pre-stored audio information of a target user when the voice information is detected;
the judging module is used for judging whether voice information exists after filtering processing;
and the recognition module is used for recognizing the filtered voice information and performing corresponding feedback according to a recognition result when the voice information exists.
7. The apparatus according to claim 6, wherein the analysis module specifically comprises:
the construction module is used for constructing an acoustic model, wherein the acoustic model is a Gaussian mixture model, the variable of the acoustic model is the voice information, and the initial values of the parameters are a covariance matrix calculated from the audio information of the target user;
the correction module is used for correcting the parameters of the acoustic model according to an EM algorithm;
the processing module is used for judging whether the number of iterations of the EM algorithm reaches a preset value, acquiring an output result of the acoustic model when the preset value is reached, and filtering the voice information according to the output result.
8. The apparatus of claim 6, wherein the analysis module further comprises:
and the echo cancellation module is used for performing echo cancellation on the voice information when the voice information is detected.
9. The apparatus according to any one of claims 6 to 8, further comprising a storage module configured to:
sending an operation instruction to the target user according to the received request sent by the target user;
receiving audio information sent by the target user according to the operation instruction;
and storing the audio information sent by the target user according to the operation instruction.
10. A computer system, comprising:
one or more processors; and
a memory associated with the one or more processors for storing program instructions that, when read and executed by the one or more processors, perform operations comprising:
detecting the collected audio information;
when voice information is detected, filtering the voice information according to pre-stored audio information of a target user;
judging whether voice information exists after filtering processing;
and if so, recognizing the voice information after the filtering processing and performing corresponding feedback according to a recognition result.
CN201911106466.9A 2019-11-13 2019-11-13 Noise processing method, device and system Pending CN110942779A (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
CN201911106466.9A CN110942779A (en) 2019-11-13 2019-11-13 Noise processing method, device and system
CA3160740A CA3160740A1 (en) 2019-11-13 2020-07-30 Noise processing method, device, and system
PCT/CN2020/105992 WO2021093380A1 (en) 2019-11-13 2020-07-30 Noise processing method and apparatus, and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911106466.9A CN110942779A (en) 2019-11-13 2019-11-13 Noise processing method, device and system

Publications (1)

Publication Number Publication Date
CN110942779A true CN110942779A (en) 2020-03-31

Family

ID=69907610

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911106466.9A Pending CN110942779A (en) 2019-11-13 2019-11-13 Noise processing method, device and system

Country Status (3)

Country Link
CN (1) CN110942779A (en)
CA (1) CA3160740A1 (en)
WO (1) WO2021093380A1 (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111508473A (en) * 2020-06-12 2020-08-07 佛山科学技术学院 Speech recognition rate analysis device, method and storage medium
CN112202653A (en) * 2020-09-21 2021-01-08 海尔优家智能科技(北京)有限公司 Voice information output method, storage medium and electronic equipment
CN112700771A (en) * 2020-12-02 2021-04-23 珠海格力电器股份有限公司 Air conditioner, three-dimensional voice control identification method, computer equipment, storage medium and terminal
WO2021093380A1 (en) * 2019-11-13 2021-05-20 苏宁云计算有限公司 Noise processing method and apparatus, and system
CN112927691A (en) * 2021-02-23 2021-06-08 中国人民解放军陆军装甲兵学院 Voice recognition control device and method
CN114598922A (en) * 2022-03-07 2022-06-07 深圳创维-Rgb电子有限公司 Voice message interaction method, device, equipment and storage medium

Families Citing this family (1)

Publication number Priority date Publication date Assignee Title
CN116229987B (en) * 2022-12-13 2023-11-21 广东保伦电子股份有限公司 Campus voice recognition method, device and storage medium

Citations (6)

Publication number Priority date Publication date Assignee Title
CN101552004A (en) * 2009-05-13 2009-10-07 哈尔滨工业大学 Method for recognizing in-set speaker
CN104966517A (en) * 2015-06-02 2015-10-07 华为技术有限公司 Voice frequency signal enhancement method and device
CN105280183A (en) * 2015-09-10 2016-01-27 百度在线网络技术(北京)有限公司 Voice interaction method and system
CN105355202A (en) * 2014-08-22 2016-02-24 现代自动车株式会社 Voice recognition apparatus, vehicle having the same, and method of controlling the vehicle
CN108922515A (en) * 2018-05-31 2018-11-30 平安科技(深圳)有限公司 Speech model training method, audio recognition method, device, equipment and medium
CN109473102A (en) * 2017-09-07 2019-03-15 上海新同惠自动化系统有限公司 A kind of robot secretary intelligent meeting recording method and system

Family Cites Families (7)

Publication number Priority date Publication date Assignee Title
US8639516B2 (en) * 2010-06-04 2014-01-28 Apple Inc. User-specific noise suppression for voice quality improvements
CN102592607A (en) * 2012-03-30 2012-07-18 北京交通大学 Voice converting system and method using blind voice separation
US9837102B2 (en) * 2014-07-02 2017-12-05 Microsoft Technology Licensing, Llc User environment aware acoustic noise reduction
CN104637494A (en) * 2015-02-02 2015-05-20 哈尔滨工程大学 Double-microphone mobile equipment voice signal enhancing method based on blind source separation
EP3217399B1 (en) * 2016-03-11 2018-11-21 GN Hearing A/S Kalman filtering based speech enhancement using a codebook based approach
CN108198569B (en) * 2017-12-28 2021-07-16 北京搜狗科技发展有限公司 Audio processing method, device and equipment and readable storage medium
CN110942779A (en) * 2019-11-13 2020-03-31 苏宁云计算有限公司 Noise processing method, device and system

Also Published As

Publication number Publication date
WO2021093380A1 (en) 2021-05-20
CA3160740A1 (en) 2021-05-20

Similar Documents

Publication Publication Date Title
CN110942779A (en) Noise processing method, device and system
US11887582B2 (en) Training and testing utterance-based frameworks
EP3819903B1 (en) Audio data processing method and apparatus, device and storage medium
US20210005198A1 (en) Detecting Self-Generated Wake Expressions
CN110310623B (en) Sample generation method, model training method, device, medium, and electronic apparatus
US10777193B2 (en) System and device for selecting speech recognition model
CN107644638B (en) Audio recognition method, device, terminal and computer readable storage medium
US8306819B2 (en) Enhanced automatic speech recognition using mapping between unsupervised and supervised speech model parameters trained on same acoustic training data
JP2021086154A (en) Method, device, apparatus, and computer-readable storage medium for speech recognition
US8719019B2 (en) Speaker identification
CN107507621B (en) Noise suppression method and mobile terminal
CN111627432B (en) Active outbound intelligent voice robot multilingual interaction method and device
WO2020043162A1 (en) System and method for performing multi-model automatic speech recognition in challenging acoustic environments
CN113643693B (en) Acoustic model conditioned on sound characteristics
CN112289299A (en) Training method and device of speech synthesis model, storage medium and electronic equipment
US11468892B2 (en) Electronic apparatus and method for controlling electronic apparatus
CN112309365A (en) Training method and device of speech synthesis model, storage medium and electronic equipment
KR20160138837A (en) System, method and computer program for speech recognition and translation
CN112562723B (en) Pronunciation accuracy determination method and device, storage medium and electronic equipment
CN112687286A (en) Method and device for adjusting noise reduction model of audio equipment
CN111883135A (en) Voice transcription method and device and electronic equipment
CN113889091A (en) Voice recognition method and device, computer readable storage medium and electronic equipment
US20170206898A1 (en) Systems and methods for assisting automatic speech recognition
Panek et al. Challenges in adopting speech control for assistive robots
CN114694667A (en) Voice output method, device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200331