CN106098078B

CN106098078B - Voice recognition method and system capable of filtering loudspeaker noise

Info

Publication number: CN106098078B
Application number: CN201610413367.5A
Authority: CN
Inventors: 齐东京; 方国宽
Original assignee: Huizhou TCL Mobile Communication Co Ltd
Current assignee: Huizhou TCL Mobile Communication Co Ltd
Priority date: 2016-06-14
Filing date: 2016-06-14
Publication date: 2020-06-02
Anticipated expiration: 2036-06-14
Also published as: CN106098078A

Abstract

The invention provides a voice recognition method and system capable of filtering loudspeaker noise. The method comprises: when it is detected that a user voice is being input through the microphone and that the loudspeaker is playing an audio file stored in the intelligent terminal, acquiring the synthesized sound formed by the user voice and the loudspeaker sound; calculating a second frequency and a second amplitude of the user voice from a first frequency and a first amplitude of the loudspeaker sound sampled in the intelligent terminal and from the frequency and amplitude of the synthesized sound; filtering out the timbre of the loudspeaker sound from the synthesized sound, and restoring the user voice from its timbre together with the second frequency and the second amplitude; and converting the user voice into text according to a voice database. With the invention, when a user uses voice recognition software while the loudspeaker is playing audio, the processor in the terminal analyzes the composition of the captured sound and filters out the loudspeaker sound, so that the user voice received by the back end contains less ambient noise and can be recognized efficiently.

Description

Voice recognition method and system capable of filtering loudspeaker noise

Technical Field

The invention relates to the technical field of voice recognition, in particular to a voice recognition method and a voice recognition system capable of filtering speaker noise.

Background

Voice recognition is gradually becoming a key human-machine interface technology in information technology, and combined with speech synthesis it allows people to dispense with the keyboard and operate devices by voice command. The mobile internet is becoming the most important application environment for voice recognition; for example, Apple's Siri and iFlytek's software in China can efficiently recognize a user's voice. Similar software can now be installed on intelligent terminals to convert the user voice into text by matching it against a back-end database, display the text, and even execute control directly. To recognize speech efficiently, ambient noise must be avoided as far as possible while the user is inputting speech.

However, when the intelligent terminal is playing music and the user speaks into the microphone, the music from the loudspeaker is picked up together with the voice, which greatly reduces recognition efficiency.

Thus, there is still a need for improvement and development of the prior art.

Disclosure of Invention

In view of the above defects of the prior art, the present invention aims to provide a speech recognition method and system capable of filtering loudspeaker noise, so as to solve the problem that, when the user speaks into the microphone while the intelligent terminal is playing music, the loudspeaker music picked up along with the voice greatly reduces recognition efficiency.

In order to achieve the purpose, the invention adopts the following technical scheme:

A speech recognition method capable of filtering loudspeaker noise, comprising the steps of:

A. when it is detected that a user voice is being input through the microphone and that the loudspeaker is playing an audio file stored in the intelligent terminal, acquiring the synthesized sound of the user voice and the loudspeaker sound;

B. calculating a second frequency and a second amplitude of the user voice from a first frequency and a first amplitude of the loudspeaker sound sampled in the intelligent terminal and from the synthesized sound frequency and synthesized sound amplitude;

C. filtering out the timbre of the loudspeaker sound from the synthesized sound, and restoring the user voice from its timbre together with the second frequency and the second amplitude;

D. converting the user voice into text according to a voice database.

In the above method, step B specifically comprises:

b1, calculating the second frequency from the synthesized sound frequency and the first frequency, using the least-common-multiple relationship that links the synthesized sound frequency to the first and second frequencies;

b2, calculating the second amplitude as the difference between the synthesized sound amplitude and the first amplitude.

In the above method, step C specifically comprises:

c1, after the audio encoder performs analog-to-digital conversion of the synthesized sound, encoding the synthesized sound, with its frequency, amplitude and timbre, and sending it to the processor;

c2, the processor filtering out the timbre of the loudspeaker sound from the synthesized sound while keeping the timbre of the user voice;

c3, the audio decoder converting the second frequency and the second amplitude of the user voice into a partial voice, and restoring the user voice from the partial voice and the timbre of the user voice.

In the above method, step D specifically comprises:

d1, uploading the user voice to a cloud voice database;

d2, matching the user voice in the voice database to obtain text;

d3, sending the text to the intelligent terminal for display.

The method further comprises the processor obtaining, from the audio encoder, the loudspeaker sound code of each frame of the loudspeaker sound.

A speech recognition system capable of filtering loudspeaker noise, comprising:

a detection and acquisition module, configured to acquire the synthesized sound of the user voice and the loudspeaker sound when it is detected that the user voice is being input through the microphone and that the loudspeaker is playing an audio file stored in the intelligent terminal;

a computing module, configured to calculate a second frequency and a second amplitude of the user voice from a first frequency and a first amplitude of the loudspeaker sound sampled in the intelligent terminal and from the synthesized sound frequency and synthesized sound amplitude;

a filtering and restoring module, configured to filter out the timbre of the loudspeaker sound from the synthesized sound and restore the user voice from its timbre together with the second frequency and the second amplitude;

and a conversion module, configured to convert the user voice into text according to a voice database.

In the above system, the computing module specifically comprises:

a frequency calculation unit, configured to calculate the second frequency using the least-common-multiple relationship that links the synthesized sound frequency to the first and second frequencies;

and an amplitude calculation unit, configured to calculate the second amplitude as the difference between the synthesized sound amplitude and the first amplitude.

In the above system, the filtering and restoring module specifically comprises:

an encoding and sending unit, configured to encode the synthesized sound, with its frequency, amplitude and timbre, and send it to the processor after the audio encoder performs analog-to-digital conversion of the synthesized sound;

a filtering unit, configured for the processor to filter out the timbre of the loudspeaker sound from the synthesized sound while keeping the timbre of the user voice;

and the audio decoder, which converts the second frequency and the second amplitude of the user voice into a partial voice and restores the user voice from the partial voice and the timbre of the user voice.

In the above system, the conversion module specifically comprises:

an uploading unit, configured to upload the user voice to a cloud voice database;

a matching unit, configured to match the user voice in the voice database to obtain text;

and a sending and display unit, configured to send the text to the intelligent terminal for display.

In the above system, the detection and acquisition module is further configured for the processor to obtain, from the audio encoder, the loudspeaker sound code of each frame of the loudspeaker sound.

The invention thus provides a voice recognition method and system capable of filtering loudspeaker noise. The method comprises: when it is detected that a user voice is being input through the microphone and that the loudspeaker is playing an audio file stored in the intelligent terminal, acquiring the synthesized sound of the user voice and the loudspeaker sound; calculating a second frequency and a second amplitude of the user voice from the first frequency and first amplitude of the loudspeaker sound sampled in the intelligent terminal and from the synthesized sound frequency and synthesized sound amplitude; filtering out the timbre of the loudspeaker sound from the synthesized sound and restoring the user voice from its timbre together with the second frequency and the second amplitude; and converting the user voice into text according to a voice database. When a user uses voice recognition software while the loudspeaker is playing audio, the processor in the terminal analyzes the composition of the captured sound and filters out the loudspeaker sound, so that the user voice received by the back end contains less ambient noise and can be recognized efficiently.

Drawings

FIG. 1 is a flow chart of a voice recognition method for filtering speaker noise according to a preferred embodiment of the present invention.

FIG. 2 is a flowchart of obtaining the second frequency and the second amplitude of the user voice in the voice recognition method capable of filtering loudspeaker noise according to the present invention.

FIG. 3 is a flowchart of filtering out the loudspeaker timbre and restoring the user voice in the voice recognition method capable of filtering loudspeaker noise according to the present invention.

FIG. 4 is a flowchart illustrating text conversion according to a preferred embodiment of the speech recognition method for filtering speaker noise according to the present invention.

FIG. 5 is a block diagram of a preferred embodiment of a speech recognition system for filtering speaker noise according to the present invention.

Detailed Description

The present invention provides a speech recognition method and system capable of filtering loudspeaker noise, which are described in further detail below with reference to the accompanying drawings and examples in order to make the objects, technical solutions and effects of the present invention clearer. It should be understood that the specific embodiments described herein merely illustrate the invention and are not intended to limit it.

Please refer to fig. 1, which is a flowchart illustrating a voice recognition method capable of filtering speaker noise according to a preferred embodiment of the present invention. As shown in fig. 1, the voice recognition method capable of filtering speaker noise includes the following steps:

Step S100, when it is detected that the user voice is being input through the microphone and that an audio file stored in the intelligent terminal is being played through the loudspeaker, acquiring the synthesized sound of the user voice and the loudspeaker sound.

In this embodiment, when the user opens the player in the intelligent terminal, the background speech recognition process can be started at the same time, so that the intelligent terminal detects in real time whether the user is inputting speech while music is playing. Once it is detected that the player is playing an audio file on the intelligent terminal and that the user voice is being recorded, the synthesized sound of the user voice and the loudspeaker sound is acquired. Without further processing, the user voice and the loudspeaker sound cannot be distinguished in this synthesized sound, so they are separated in the subsequent steps.
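A minimal sketch of the two-condition trigger in step S100 follows. The names `frame_rms` and `should_capture_mixture` and the fixed RMS threshold are hypothetical; a real terminal would take the playback state from its media player and use a proper voice activity detector. While playback is active the microphone also picks up the loudspeaker, so this level check alone cannot tell speech from music; it only illustrates gating capture on both conditions.

```python
import math

# Hypothetical level threshold for "the microphone is receiving input";
# samples are assumed to be floats normalized to [-1.0, 1.0].
SPEECH_RMS_THRESHOLD = 0.02

def frame_rms(samples: list[float]) -> float:
    """Root-mean-square level of one microphone frame."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def should_capture_mixture(mic_frame: list[float], playback_active: bool,
                           threshold: float = SPEECH_RMS_THRESHOLD) -> bool:
    """Capture the synthesized sound only if audio is playing AND the mic has input.

    While playback is active the mic also hears the loudspeaker, so this level
    check cannot separate speech from music; it only gates on both conditions.
    """
    return playback_active and frame_rms(mic_frame) > threshold
```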

Step S200, calculating a second frequency and a second amplitude of the user voice from the first frequency and first amplitude of the loudspeaker sound sampled in the intelligent terminal and from the synthesized sound frequency and synthesized sound amplitude.

In this embodiment, because the material and structure of the loudspeaker are fixed, the loudspeaker timbre is known to the processor in the intelligent terminal. Likewise, while the player is playing the audio file, the processor obtains from the audio encoder the loudspeaker sound code of each frame of the loudspeaker sound; that is, it obtains the first frequency and the first amplitude of each frame of sound data in the loudspeaker sound.
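To illustrate what a per-frame "first frequency and first amplitude" could look like, the sketch below estimates the dominant tone of one frame of the known playback signal with a naive discrete Fourier transform. It assumes the decoded PCM samples are available to the processor (the text instead reads them from the audio encoder's sound code), and the name `dominant_tone` is hypothetical. The amplitude estimate is exact only for a sinusoid whose frequency falls exactly on a DFT bin.

```python
import cmath
import math

def dominant_tone(frame: list[float], sample_rate: float) -> tuple[float, float]:
    """Return (frequency_hz, amplitude) of the strongest DFT bin in one frame.

    Uses a naive O(n^2) DFT for readability and searches bins 1 .. n/2 - 1
    (DC and Nyquist are skipped). For a sinusoid that lands exactly on a bin,
    2*|X[k]|/n equals its peak amplitude.
    """
    n = len(frame)
    best_k, best_mag = 0, 0.0
    for k in range(1, n // 2):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame))
        if abs(coeff) > best_mag:
            best_k, best_mag = k, abs(coeff)
    return best_k * sample_rate / n, 2.0 * best_mag / n
```

For a 400-sample frame at 8000 Hz the bin spacing is 20 Hz, so a 440 Hz playback tone is reported at 440 Hz with its peak amplitude; tones between bins would need windowing or interpolation.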

Since the first frequency and first amplitude of the loudspeaker sound and the frequency and amplitude of the synthesized sound are known, the second frequency can be found from the least-common-multiple relationship between the synthesized sound frequency and the first and second frequencies, and the second amplitude can be found from the synthesized sound amplitude, which is the sum of the first amplitude and the second amplitude. In this way, the processor obtains the second frequency and the second amplitude of the user voice with simple calculations.

Step S300, filtering out the timbre of the loudspeaker sound from the synthesized sound, and restoring the user voice from its timbre together with the second frequency and the second amplitude.

After the second frequency and the second amplitude of the user voice are obtained, the loudspeaker timbre (known to the processor in the intelligent terminal because the material and structure of the loudspeaker are fixed) can be selectively filtered out while only the timbre of the user voice is retained, and the user voice is then restored from its timbre, the second frequency and the second amplitude. The loudspeaker component of the synthesized sound is thus removed and only the user voice remains, which filters the loudspeaker noise out of the input to voice recognition.
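The text filters the loudspeaker by its known timbre. As a simpler stand-in that relies on the same underlying fact, that the playback signal is known to the terminal, the sketch below subtracts a least-squares-scaled copy of that reference from the mixture. This is plain reference subtraction, not the patent's timbre filtering; it assumes the loudspeaker-to-microphone path is a single gain with no delay or echo, and `subtract_known_playback` is a hypothetical name.

```python
def subtract_known_playback(mixture: list[float],
                            reference: list[float]) -> tuple[list[float], float]:
    """Remove a scaled copy of the known playback signal from the mixture.

    Assumes the loudspeaker-to-microphone path is one gain g with no delay or
    echo, so mixture ~= g * reference + voice. The least-squares estimate is
    g = <mixture, reference> / <reference, reference>, which is unbiased when
    the voice is uncorrelated with the reference over the frame.
    Returns (residual, g); the residual is the estimate of the user voice.
    """
    if len(mixture) != len(reference):
        raise ValueError("mixture and reference must have the same length")
    energy = sum(r * r for r in reference)
    if energy == 0.0:
        return list(mixture), 0.0  # nothing is playing, nothing to remove
    gain = sum(m * r for m, r in zip(mixture, reference)) / energy
    residual = [m - gain * r for m, r in zip(mixture, reference)]
    return residual, gain
```

Real acoustic paths add delay and reverberation, which is why practical echo cancellers use adaptive multi-tap filters rather than a single gain.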

Step S400, converting the user voice into text according to the voice database.

Once the user voice is matched in the voice database, it is converted into the corresponding text, and the intelligent terminal performs the operation given by the instruction that corresponds to the text. For example, while the player is playing music, the background speech recognition process detects that the user says "fast forward for 10 seconds"; after steps S100 to S400 this speech is converted into the text "fast forward for 10 seconds", and the player fast-forwards the currently playing audio file by 10 seconds according to the fast-forward control instruction corresponding to that text. The user voice is thus recognized accurately despite the background sound.
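The final dispatch from recognized text to a player action could be sketched as follows. The regular expression covers only the example phrase from the text, and `apply_command` and its parameters are hypothetical; clamping the new position to the file length is a design choice of this sketch.

```python
import re

# Hypothetical pattern covering only the example phrase from the text.
FAST_FORWARD = re.compile(r"fast forward (?:for )?(\d+) seconds?", re.IGNORECASE)

def apply_command(text: str, position_s: int, duration_s: int) -> int:
    """Return the playback position after acting on recognized text."""
    match = FAST_FORWARD.search(text)
    if match is None:
        return position_s  # not a known control instruction; leave playback alone
    return min(position_s + int(match.group(1)), duration_s)  # never seek past the end
```

For example, `apply_command("fast forward for 10 seconds", 30, 240)` returns 40, while unrecognized text leaves the position unchanged.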

Further, as shown in fig. 2, in the speech recognition method capable of filtering speaker noise, the step S200 specifically includes:

Step S201, calculating the second frequency from the synthesized sound frequency and the first frequency, using the least-common-multiple relationship between the synthesized sound frequency and the first and second frequencies.

After the loudspeaker sound and the user voice form the synthesized sound, the processor samples the frequency and amplitude of the synthesized sound. Writing $f_s$, $f_1$ and $f_2$ for the synthesized sound frequency, the first frequency and the second frequency, the least-common-multiple relationship states that the period of the synthesized sound is the least common multiple of the periods of its two components: $\frac{1}{f_s} = N_1 \cdot \frac{1}{f_1} = N_2 \cdot \frac{1}{f_2}$, where $N_1$ and $N_2$ are the smallest positive integers for which this holds. The second frequency can be solved from this relationship.
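For tones with integer frequencies in hertz, the period relationship can be made concrete: the period of the mixture is the least common multiple of $1/f_1$ and $1/f_2$, which equals $1/\gcd(f_1, f_2)$. The sketch below, with hypothetical function names and the assumption of pure tones, computes that period and lists the integer second frequencies consistent with an observed $f_s$ and a known $f_1$. It also shows that, in this simplified model, several values of $f_2$ can fit the same $f_s$ and $f_1$.

```python
from fractions import Fraction
from math import gcd

def mixture_period(f1_hz: int, f2_hz: int) -> Fraction:
    """Smallest T > 0 such that T*f1 and T*f2 are both whole numbers of cycles.

    This is lcm(1/f1, 1/f2), which for integer frequencies equals 1/gcd(f1, f2).
    """
    return Fraction(1, gcd(f1_hz, f2_hz))

def candidate_second_frequencies(fs_hz: int, f1_hz: int, max_hz: int) -> list[int]:
    """Integer f2 values up to max_hz whose mixture with f1 has fundamental fs."""
    # f2 must be a multiple of fs, and gcd(f1, f2) must equal fs exactly.
    return [f2 for f2 in range(fs_hz, max_hz + 1, fs_hz) if gcd(f1_hz, f2) == fs_hz]
```

For instance, `mixture_period(300, 200)` is `Fraction(1, 100)` (a 100 Hz fundamental), and `candidate_second_frequencies(100, 300, 700)` returns `[100, 200, 400, 500, 700]`.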

Step S202, calculating the second amplitude as the difference between the synthesized sound amplitude and the first amplitude.
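Under the additive amplitude model of step S202, the calculation is a subtraction. The sketch below (hypothetical names) also builds a test mixture to expose the assumption it relies on: the peak of the mixture equals $A_1 + A_2$ only when both components reach their peaks at the same instant, as two cosines starting in phase do; otherwise the measured peak is smaller and the difference underestimates $A_2$.

```python
import math

def second_amplitude(mixture_amplitude: float, first_amplitude: float) -> float:
    """Step S202 under the additive model: A2 = A_mix - A1."""
    return mixture_amplitude - first_amplitude

def peak_amplitude(samples: list[float]) -> float:
    """Largest absolute sample value in a frame."""
    return max(abs(s) for s in samples)

def cosine_mixture(a1: float, f1: float, a2: float, f2: float,
                   sample_rate: float, n: int) -> list[float]:
    """Two in-phase cosines; both peak at t = 0, so the mixture peak is a1 + a2."""
    return [a1 * math.cos(2 * math.pi * f1 * t / sample_rate)
            + a2 * math.cos(2 * math.pi * f2 * t / sample_rate) for t in range(n)]
```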

Further, as shown in fig. 3, in the speech recognition method capable of filtering speaker noise, the step S300 specifically includes:

Step S301, after the audio encoder performs analog-to-digital conversion of the synthesized sound, encoding the synthesized sound, with its frequency, amplitude and timbre, and sending it to the processor;

Step S302, the processor filtering out the timbre of the loudspeaker sound from the synthesized sound while retaining the timbre of the user voice;

Step S303, the audio decoder converting the second frequency and the second amplitude of the user voice into a partial voice, and restoring the user voice from the partial voice and the timbre of the user voice.
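Step S303 can be pictured as resynthesis: a tone at the second frequency and second amplitude, shaped by the retained timbre. The sketch below assumes a hypothetical timbre representation, a list of relative weights for harmonics 1, 2, 3, ... of $f_2$; the text does not specify how the timbre is stored, and `restore_voice` is an illustrative name, not the decoder's interface.

```python
import math

def restore_voice(f2_hz: float, a2: float, timbre: list[float],
                  sample_rate: float, n_samples: int) -> list[float]:
    """Resynthesize a frame from the second frequency, second amplitude and a timbre.

    `timbre` (hypothetical format) holds relative weights of harmonics 1, 2, 3, ...
    of f2; the first weight scales the fundamental. Harmonics at or above the
    Nyquist frequency are dropped to avoid aliasing.
    """
    out = [0.0] * n_samples
    for h, weight in enumerate(timbre, start=1):
        freq = h * f2_hz
        if freq >= sample_rate / 2:
            break  # this and all higher harmonics would alias
        for t in range(n_samples):
            out[t] += a2 * weight * math.sin(2 * math.pi * freq * t / sample_rate)
    return out
```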

Further, as shown in fig. 4, in the speech recognition method capable of filtering speaker noise, the step S400 specifically includes:

Step S401, uploading the user voice to a cloud voice database;

Step S402, matching the user voice in the voice database to obtain text;

Step S403, sending the text to the intelligent terminal and displaying it.
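Steps S401 to S403 depend on a cloud service that the text does not specify. As a toy stand-in for the matching in step S402 only, the sketch below picks the database text whose stored feature template is most similar, by cosine similarity, to the user voice features. The feature vectors, the in-memory dictionary standing in for the cloud database, and the function names are all hypothetical; this nearest-template matcher is chosen purely for illustration and is not the recognizer the text refers to.

```python
import math
from typing import Optional

def cosine_similarity(u: list[float], v: list[float]) -> float:
    """Cosine of the angle between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def match_text(features: list[float], database: dict[str, list[float]]) -> Optional[str]:
    """Step S402 stand-in: return the text whose template best matches `features`."""
    best_text, best_score = None, -math.inf
    for text, template in database.items():
        score = cosine_similarity(features, template)
        if score > best_score:
            best_text, best_score = text, score
    return best_text  # None when the database is empty
```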

Thus, with the invention, when a user uses voice recognition software while the loudspeaker is playing audio, the processor in the terminal analyzes the composition of the captured sound and filters out the loudspeaker sound, so that the user voice received by the back end contains less ambient noise and can be recognized efficiently.

Based on the method embodiment, the invention also provides a voice recognition system capable of filtering the noise of the loudspeaker. As shown in fig. 5, the voice recognition system capable of filtering speaker noise includes:

the detection and acquisition module 100, configured to acquire the synthesized sound of the user voice and the loudspeaker sound when it is detected that the user voice is being input through the microphone and that the loudspeaker is playing an audio file stored in the intelligent terminal;

the computing module 200, configured to calculate a second frequency and a second amplitude of the user voice from a first frequency and a first amplitude of the loudspeaker sound sampled in the intelligent terminal and from the synthesized sound frequency and synthesized sound amplitude;

the filtering and restoring module 300, configured to filter out the timbre of the loudspeaker sound from the synthesized sound and restore the user voice from its timbre together with the second frequency and the second amplitude;

and the conversion module 400, configured to convert the user voice into text according to the voice database.

Further, in the speech recognition system capable of filtering loudspeaker noise, the computing module 200 specifically includes:

a frequency calculation unit, configured to calculate the second frequency using the least-common-multiple relationship between the synthesized sound frequency and the first and second frequencies;

Further, in the voice recognition system capable of filtering loudspeaker noise, the filtering and restoring module 300 specifically includes:

Further, in the speech recognition system capable of filtering speaker noise, the conversion module 400 specifically includes:

Further, in the voice recognition system capable of filtering loudspeaker noise, the detection and acquisition module 100 is also configured for the processor to obtain, from the audio encoder, the loudspeaker sound code of each frame of the loudspeaker sound.

In summary, the speech recognition method and system capable of filtering loudspeaker noise according to the present invention comprise: when it is detected that a user voice is being input through the microphone and that the loudspeaker is playing an audio file stored in the intelligent terminal, acquiring the synthesized sound of the user voice and the loudspeaker sound; calculating a second frequency and a second amplitude of the user voice from the first frequency and first amplitude of the loudspeaker sound sampled in the intelligent terminal and from the synthesized sound frequency and synthesized sound amplitude; filtering out the timbre of the loudspeaker sound from the synthesized sound and restoring the user voice from its timbre together with the second frequency and the second amplitude; and converting the user voice into text according to a voice database. When a user uses voice recognition software while the loudspeaker is playing audio, the processor in the terminal analyzes the composition of the captured sound and filters out the loudspeaker sound, so that the user voice received by the back end contains less ambient noise and can be recognized efficiently.

It should be understood that those skilled in the art may make equivalent substitutions or changes according to the technical solutions and concepts of the present invention, and all such changes or substitutions shall fall within the protection scope of the appended claims.

Claims

1. A speech recognition method for filtering speaker noise, the method comprising the steps of:

D. converting the user voice into a text according to the voice database;

the step B specifically comprises the following steps:

b2, calculating a second amplitude according to the difference between the amplitude of the synthesized sound and the first amplitude;

wherein, when voice recognition software is in use and the loudspeaker is playing audio, the processor in the terminal analyzes the captured sound according to its composition and filters out the loudspeaker sound.

2. The speech recognition method of claim 1, wherein step C specifically comprises:

3. The speech recognition method of claim 1, wherein step D specifically comprises:

d1, uploading the voice of the user to a voice database at the cloud end;

d2, matching the user voice in a voice database to obtain a text;

d3, sending the text to the intelligent terminal and displaying.

4. The method of claim 1, wherein step A further comprises the processor obtaining, from the audio encoder, the loudspeaker sound code of each frame of the loudspeaker sound.

5. A speech recognition system capable of filtering speaker noise, comprising:

the conversion module is used for converting the user voice into a text according to the voice database;

the calculation module specifically includes:

an amplitude calculation unit for calculating a second amplitude from a difference between the synthesized sound amplitude and the first amplitude;

6. The speech recognition system of claim 5, wherein the filtering and restoring module comprises:

7. The speech recognition system of claim 5, wherein the conversion module comprises:

8. The speech recognition system of claim 5, wherein the detection and acquisition module is further configured to acquire the loudspeaker sound code of each frame of the loudspeaker sound in the audio encoder.