CN111128192A

CN111128192A - Voice recognition noise reduction method, system, mobile terminal and storage medium

Info

Publication number: CN111128192A
Application number: CN201911424022.XA
Authority: CN
Inventors: 夏严辉; 熊友军
Original assignee: Ubtech Robotics Corp
Current assignee: Ubtech Robotics Corp
Priority date: 2019-12-31
Filing date: 2019-12-31
Publication date: 2020-05-08

Abstract

The invention provides a voice recognition noise reduction method, a system, a mobile terminal and a storage medium. The method comprises the following steps: when a voice instruction sent by a user is received, acquiring audio information containing the voice instruction and environment audio information of the current environment; acquiring operation information of the steering engine, and querying noise data in a noise database according to the operation information and the environment audio information to obtain target noise data matched with the operation information and the environment audio information; comparing the target noise data with the voice audio information; and performing noise reduction processing on the voice audio information according to the comparison result, and performing voice recognition on the voice audio information subjected to noise reduction. By analyzing the target audio data and removing the data that has the characteristics of the target noise data, relatively clean audio data is obtained, so that interference from environmental noise or from the noise of the steering engine is reduced and the accuracy of voice recognition is improved.

Description

Voice recognition noise reduction method, system, mobile terminal and storage medium

Technical Field

The invention belongs to the technical field of voice recognition, and particularly relates to a voice recognition noise reduction method, a voice recognition noise reduction system, a mobile terminal and a storage medium.

Background

With the development of human-computer interaction technology, robots are expected to have human-like perception capabilities and to cooperate with humans. To achieve this aim, some researchers use voice technology to enable the robot to understand human language, so that the robot can be controlled directly by voice to execute the corresponding operation, which simplifies control of the robot and improves control efficiency.

Existing robots inevitably generate noise while in motion, for example noise produced by a steering engine, a fan or a motor. Because the sound pickup is closer to the robot than to the user, these noises are picked up more easily than the user's speech, which lowers the speech recognition accuracy of the robot.

Disclosure of Invention

The embodiment of the invention aims to provide a voice recognition noise reduction method, a voice recognition noise reduction system, a mobile terminal and a storage medium, and aims to solve the problem that the voice recognition accuracy is low due to noise emitted by a robot in the motion process in the use process of the existing robot.

The embodiment of the invention is realized in such a way that a speech recognition noise reduction method comprises the following steps:

when a voice instruction sent by a user is received, audio information containing the voice instruction and environment audio information of the current environment are obtained;

acquiring operation information of a steering engine, and inquiring noise data in a noise database according to the operation information and the environmental audio information to obtain target noise data matched with the operation information and the environmental audio information;

comparing the target noise data with the voice audio information stored in the audio information;

and performing noise reduction processing on the voice audio information according to the comparison result, and performing voice recognition on the voice audio information subjected to noise reduction.

Further, before the step of receiving the voice command sent by the user, the method further includes:

according to a locally pre-stored sounding state and an operation instruction, sequentially controlling the steering engine to execute different operations, and acquiring audio data generated when the steering engine executes various operations to obtain sample audio;

and performing deep learning according to the sample audio to obtain the noise database.

Furthermore, the sequentially controlling the steering engine to execute different operations according to the locally pre-stored sounding state and operation instruction includes:

acquiring the volume value in the sounding state, and acquiring the operation number, the operation action and the rotation angle of the steering engine for the operation action in the operation instruction;

and controlling a corresponding sounding unit to sound according to the volume value, and controlling the corresponding steering engine to execute the operation action according to the operation number.

Further, the querying noise data in a noise database according to the operation information and the environmental audio information includes:

acquiring sounding numbers and operation actions stored in the operation information and rotation angles of steering engines of the operation actions, and acquiring volume values stored in the environment audio information;

and matching the sounding numbers, the operation actions, the rotation angles of steering engines of the operation actions and the volume values with the noise database to obtain the target noise data.

Furthermore, the comparing the target noise data with the voice audio information stored in the audio information includes:

and carrying out frequency spectrum comparison, power spectrum comparison and cepstrum comparison on the target noise data and the voice audio information to obtain a comparison result.

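Further, the acquiring audio information containing the voice instruction and environment audio information of the current environment includes:

when a control instruction sent by a user is received, sound collection is carried out on the current environment within a first preset time to obtain the environment audio information, wherein the control instruction is used for triggering voice control of the robot;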

and sending a voice acquisition prompt after the sound acquisition of the current environment is finished, and continuously performing voice acquisition until the voice acquisition state meets the stop condition, and stopping the voice acquisition to obtain the voice audio information.

Further, after the step of continuously performing voice acquisition, the method further comprises:

when the voice acquisition time is judged to be greater than a time threshold, judging that the voice acquisition state meets the stop condition; or

when an acquisition stopping instruction is received, judging that the voice acquisition state meets the stop condition; or

when the volume of the current environment is continuously smaller than a volume threshold within a second preset time, judging that the voice acquisition state meets the stop condition.

Another object of an embodiment of the present invention is to provide a speech recognition noise reduction system, including:

the information acquisition module is used for acquiring audio information containing a voice instruction and environmental audio information of the current environment when the voice instruction sent by a user is received;

the noise query module is used for acquiring the operation information of the steering engine, and querying noise data in a noise database according to the operation information and the environmental audio information to obtain target noise data matched with the operation information and the environmental audio information;

the audio comparison module is used for comparing the target noise data with the voice audio information stored in the audio information;

and the audio denoising module is used for performing denoising processing on the voice audio information according to the comparison result and performing voice recognition on the denoised voice audio information.

Another object of an embodiment of the present invention is to provide a mobile terminal, including a storage device and a processor, where the storage device is used to store a computer program, and the processor runs the computer program to make the mobile terminal execute the above-mentioned speech recognition noise reduction method.

Another object of an embodiment of the present invention is to provide a storage medium, which stores a computer program used in the above-mentioned mobile terminal, wherein the computer program, when executed by a processor, implements the steps of the above-mentioned speech recognition noise reduction method.

According to the embodiment of the invention, the target audio data is analyzed and the data that has the characteristics of the target noise data is removed to obtain relatively clean audio data, so that interference from environmental noise or from the noise of the steering engine is reduced and the accuracy of voice recognition is improved.

Drawings

FIG. 1 is a flow chart of a method for noise reduction in speech recognition according to a first embodiment of the present invention;

FIG. 2 is a flowchart of a method for noise reduction in speech recognition according to a second embodiment of the present invention;

FIG. 3 is a schematic structural diagram of a speech recognition noise reduction system according to a third embodiment of the present invention;

FIG. 4 is a schematic structural diagram of a mobile terminal according to a fourth embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.

The embodiment of the invention aims to solve the problem that the existing robot has low voice recognition accuracy due to the noise generated in the motion process.

In order to explain the technical means of the present invention, the following description will be given by way of specific examples.

Example one

Referring to fig. 1, a flowchart of a speech recognition noise reduction method according to a first embodiment of the present invention is shown, which includes the steps of:

step S10, when receiving a voice instruction sent by a user, acquiring audio information containing the voice instruction and environmental audio information of the current environment;

The voice audio information is obtained after the current mobile device collects the voice sent by the user.

Specifically, the environmental audio information in this step may be obtained based on the usage status of the robot or the collection of the current environmental volume value, for example:

when the robot is judged to be in an unvoiced state (all sounding units on the robot are in the unvoiced state) and the volume value of the current environment is smaller than a first preset volume value, judging that the environment audio information is in a quiet environment;

when the robot is judged to be in a sounding state (any sounding unit on the robot is in the sounding state) and the volume value of the current environment is smaller than a first preset volume value, judging that the environment audio information is an audio environment played by the robot;

when the robot is judged to be in a sounding state (any sounding unit on the robot is in the sounding state) and the volume value of the current environment is greater than a first preset volume value, the environment audio information is judged to be the human-computer interaction environment.
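The three cases above can be sketched as a small classifier. This is only an illustrative sketch: the names `Environment` and `classify_environment` and the `UNSPECIFIED` fallback are assumptions made for the example, since the text does not say what happens when the robot is silent and the environment is loud, or when the volume equals the preset value.

```python
from enum import Enum


class Environment(Enum):
    QUIET = "quiet environment"
    ROBOT_PLAYBACK = "audio environment played by the robot"
    HUMAN_COMPUTER_INTERACTION = "human-computer interaction environment"
    UNSPECIFIED = "not covered by the three described cases"  # assumption


def classify_environment(robot_sounding: bool,
                         volume_db: float,
                         first_preset_volume_db: float) -> Environment:
    """Map the robot's sounding state and the ambient volume to a label."""
    below = volume_db < first_preset_volume_db
    above = volume_db > first_preset_volume_db
    if not robot_sounding and below:
        return Environment.QUIET
    if robot_sounding and below:
        return Environment.ROBOT_PLAYBACK
    if robot_sounding and above:
        return Environment.HUMAN_COMPUTER_INTERACTION
    # Combinations the description does not define.
    return Environment.UNSPECIFIED
```

For example, `classify_environment(True, 30.0, 45.0)` yields the robot playback case, because a sounding unit is active while the ambient volume stays below the first preset volume value.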

Step S20, acquiring operation information of the steering engine, and inquiring noise data in a noise database according to the operation information and the environmental audio information to obtain target noise data matched with the operation information and the environmental audio information;

The operation information is obtained by sequentially querying the operation state of each steering engine on the robot through a locally pre-stored steering engine code table. The installation position and the current operation state of each steering engine are stored in sequence in the operation information, and the operation state includes states such as ascending, descending, bending or contracting.

Preferably, in this embodiment, the operation information may be further divided into actions such as stretching hands, stretching legs, raising head, nodding head, or twisting waist according to the operation state of the steering engine on the robot.

Specifically, in the step, the acquired environmental audio information and the operating states of all the steering engines are matched with the noise database to query target noise data, wherein the target noise data are noise data obtained by overlapping the current environmental noise and the steering engine operating noise.
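One possible way to picture this matching is a keyed lookup. The sketch below assumes, purely for illustration, that the noise database is a dictionary keyed by the environment volume value and the set of steering engine states; the description itself obtains the database by deep learning, so the key layout and the names `make_key` and `query_target_noise` are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical record: (operation number, operation action, rotation angle).
SteeringEngineState = Tuple[str, str, float]
# Hypothetical key: (environment volume in dB, sorted steering engine states).
NoiseKey = Tuple[int, Tuple[SteeringEngineState, ...]]


def make_key(volume_db: int, states: List[SteeringEngineState]) -> NoiseKey:
    """Build a lookup key that does not depend on the order of the states."""
    return (volume_db, tuple(sorted(states)))


def query_target_noise(noise_db: Dict[NoiseKey, List[float]],
                       volume_db: int,
                       states: List[SteeringEngineState]) -> Optional[List[float]]:
    """Return the stored noise samples for these conditions, or None."""
    return noise_db.get(make_key(volume_db, states))
```

Sorting the states before building the key means that the same combination of running steering engines matches the same entry regardless of the order in which the code table was queried.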

Step S30, comparing the target noise data with the voice audio information stored in the audio information;

The audio content shared by the target noise data and the voice audio information can be found by spectrum comparison, power spectrum comparison or cepstrum comparison. Comparing the target noise data with the voice audio information stored in the audio information in this way effectively improves the accuracy of the subsequent noise reduction processing.
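The three comparison modes can be illustrated with a small, dependency-free sketch. It assumes equal-length frames of real samples and uses cosine similarity as the comparison score; both the score and the function names are assumptions for the example, as the description does not specify how the comparison result is computed. A direct O(n^2) DFT is used only to keep the sketch self-contained.

```python
import cmath
import math
from typing import Dict, List


def dft(x: List[float]) -> List[complex]:
    """Direct discrete Fourier transform (O(n^2), fine for short frames)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def magnitude_spectrum(x: List[float]) -> List[float]:
    return [abs(c) for c in dft(x)]


def power_spectrum(x: List[float]) -> List[float]:
    n = len(x)
    return [abs(c) ** 2 / n for c in dft(x)]


def real_cepstrum(x: List[float], eps: float = 1e-12) -> List[float]:
    """Inverse DFT of the log magnitude spectrum (real cepstrum)."""
    n = len(x)
    log_mag = [math.log(abs(c) + eps) for c in dft(x)]
    return [sum(log_mag[k] * cmath.exp(2j * math.pi * k * t / n)
                for k in range(n)).real / n
            for t in range(n)]


def cosine_similarity(a: List[float], b: List[float]) -> float:
    dot = sum(p * q for p, q in zip(a, b))
    norm = math.sqrt(sum(p * p for p in a)) * math.sqrt(sum(q * q for q in b))
    return dot / norm if norm > 0.0 else 0.0


def compare(noise: List[float], speech: List[float]) -> Dict[str, float]:
    """Score the similarity of two frames in the three representations."""
    return {
        "spectrum": cosine_similarity(magnitude_spectrum(noise),
                                      magnitude_spectrum(speech)),
        "power_spectrum": cosine_similarity(power_spectrum(noise),
                                            power_spectrum(speech)),
        "cepstrum": cosine_similarity(real_cepstrum(noise),
                                      real_cepstrum(speech)),
    }
```

A score close to 1 in a given representation suggests that the frame of voice audio information is dominated by the target noise in that representation, while a score close to 0 suggests little overlap.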

Step S40, carrying out noise reduction processing on the voice audio information according to the comparison result, and carrying out voice recognition on the voice audio information after noise reduction;

specifically, when the target noise data is removed by the passive noise reduction method, the noise reduction circuit may be used to perform noise reduction, that is, the noise reduction circuit is subjected to parameter setting by using the target noise data, and the voice audio information is input to the noise reduction circuit after the parameter setting, so as to remove the target noise data.

Preferably, in the step, when the target noise data is removed by active noise reduction, the target noise data is removed by performing parameter setting on a filter according to the target noise data and inputting the voice audio information to the filter after the parameter setting.
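The description does not state which filter is used. As one hedged illustration of removing a known noise estimate from a recording, the sketch below uses magnitude spectral subtraction, a common technique that is swapped in here for the example and is not confirmed as the method of this document. The function names are hypothetical, and the direct O(n^2) transforms are used only to avoid external dependencies.

```python
import cmath
import math
from typing import List


def dft(x: List[float]) -> List[complex]:
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spec: List[complex]) -> List[complex]:
    n = len(spec)
    return [sum(spec[k] * cmath.exp(2j * math.pi * k * t / n)
                for k in range(n)) / n
            for t in range(n)]


def spectral_subtract(noisy: List[float],
                      noise_estimate: List[float],
                      floor: float = 0.0) -> List[float]:
    """Subtract the noise magnitude per frequency bin, keeping the noisy phase."""
    if len(noisy) != len(noise_estimate):
        raise ValueError("frames must have the same length")
    cleaned_spec = []
    for noisy_bin, noise_bin in zip(dft(noisy), dft(noise_estimate)):
        # Clamp at the floor so the magnitude never becomes negative.
        magnitude = max(abs(noisy_bin) - abs(noise_bin), floor)
        cleaned_spec.append(cmath.rect(magnitude, cmath.phase(noisy_bin)))
    return [c.real for c in idft(cleaned_spec)]
```

In this sketch the target noise data plays the role of `noise_estimate`, which corresponds to setting the filter parameters from the target noise data before the voice audio information is passed through the filter.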

In this embodiment, querying the target noise data according to the operation information and the environment audio information yields the noise data caused by the current environmental factors and by the operation of the steering engine. The target audio data is then analyzed and the data that has the characteristics of the target noise data is removed to obtain relatively clean audio data, so that interference from environmental noise or from the noise of the steering engine is reduced and the accuracy of voice recognition is improved.

Example two

Referring to fig. 2, a flowchart of a speech recognition noise reduction method according to a second embodiment of the present invention is shown, which includes the steps of:

step S11, sequentially controlling the steering engine to execute different operations according to a locally pre-stored sound production state and an operation instruction, and collecting audio data generated when the steering engine executes various operations to obtain sample audio;

specifically, in this step, according to the locally pre-stored sound production state and the operation instruction, the step of sequentially controlling the steering engine to execute different operations includes:

step S111, obtaining the volume value in the sounding state, and obtaining the running number and the operation action in the operation command and the rotation angle of the steering engine of the operation action;

The volume value, the operation number and the operation action corresponding to the number can all be set according to requirements. For example, the volume value can be set to 0 decibels, 40 decibels, 50 decibels or 60 decibels, and the volume value is used for controlling the sounding volume of the sounding unit on the robot.

Preferably, in this step, the operation action may be a state of ascending, descending, bending or contracting. In this embodiment, the operation action may further be divided into actions such as stretching a hand, stretching a leg, raising the head, nodding or twisting the waist according to the operation state of the steering engine on the robot. The operation number is a unique number corresponding to each steering engine and may be stored as a numeric or character number, for example No. 1 steering engine, No. 2 steering engine, No. 3 steering engine, left-hand steering engine, right-hand steering engine, left-leg steering engine or head steering engine.

Step S112, controlling a corresponding sound production unit to produce sound according to the volume value, and controlling a corresponding steering engine to execute the operation action according to the operation number;

By controlling the corresponding sounding unit to sound according to the volume value and controlling the corresponding steering engine to execute the operation action according to the operation number, noise data generated by environmental factors and by steering engine operation in different environments and different states are collected, which effectively improves the diversity of noise collection and the accuracy of subsequent voice noise reduction.
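The collection procedure of steps S111 and S112 can be pictured as a loop over every combination of volume value and operation instruction. In the sketch below the hardware access is passed in as callbacks; `set_volume`, `run_steering_engine` and `record` are hypothetical stand-ins for robot interfaces that the description does not name.

```python
from dataclasses import dataclass
from itertools import product
from typing import Callable, List, Sequence


@dataclass(frozen=True)
class OperationInstruction:
    operation_number: str   # e.g. "left-hand steering engine"
    operation_action: str   # e.g. "ascending"
    rotation_angle: float   # rotation angle for the action


@dataclass
class Sample:
    volume_db: int
    instruction: OperationInstruction
    audio: List[float]


def collect_samples(volumes_db: Sequence[int],
                    instructions: Sequence[OperationInstruction],
                    set_volume: Callable[[int], None],
                    run_steering_engine: Callable[[OperationInstruction], None],
                    record: Callable[[], List[float]]) -> List[Sample]:
    """Record one labeled audio sample per (volume, instruction) combination."""
    samples = []
    for volume_db, instruction in product(volumes_db, instructions):
        set_volume(volume_db)             # sounding unit plays at this volume
        run_steering_engine(instruction)  # steering engine performs the action
        samples.append(Sample(volume_db, instruction, record()))
    return samples
```

With the four example volume values (0, 40, 50 and 60 decibels) and, say, three instructions, the loop would produce twelve labeled samples.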

Step S21, performing deep learning according to the sample audio to obtain the noise database;

Audio features in the sample audio are extracted by a deep learning method, a neural network model is trained based on the audio features, and the neural network model is defined as noise to obtain the noise database.

Specifically, in this step, deep learning training may be performed by using a convolutional neural network model, a deep belief network model or a stacked autoencoder network model, and iteration during model training may be implemented by using back propagation, stochastic gradient descent or learning rate decay.
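To make the training terms concrete, the following sketch shows stochastic gradient descent with exponential learning rate decay on a single-layer logistic model that labels a feature vector as noise or not noise. This is a deliberately small stand-in and not the convolutional, deep belief or stacked autoencoder networks named above; the feature vectors, hyperparameters and function names are all assumptions for illustration.

```python
import math
import random
from typing import List, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_sgd(features: List[List[float]],
              labels: List[int],
              epochs: int = 50,
              initial_lr: float = 0.5,
              decay: float = 0.95,
              seed: int = 0) -> Tuple[List[float], float]:
    """Train weights and bias with per-sample updates and a decaying step size."""
    rng = random.Random(seed)
    weights = [0.0] * len(features[0])
    bias = 0.0
    order = list(range(len(features)))
    for epoch in range(epochs):
        lr = initial_lr * decay ** epoch   # learning rate decay
        rng.shuffle(order)                 # stochastic sample order
        for i in order:
            x, y = features[i], labels[i]
            p = sigmoid(sum(w * v for w, v in zip(weights, x)) + bias)
            error = p - y                  # cross-entropy gradient w.r.t. the logit
            weights = [w - lr * error * v for w, v in zip(weights, x)]
            bias -= lr * error
    return weights, bias


def is_noise(weights: List[float], bias: float, x: List[float]) -> bool:
    return sigmoid(sum(w * v for w, v in zip(weights, x)) + bias) >= 0.5
```

In a multi-layer network the `error` term would be propagated backwards through each layer, which is the role of back propagation; with a single layer that propagation reduces to the one update shown.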

Step S31, when a control instruction sent by a user is received, sound collection is carried out on the current environment within a first preset time so as to obtain the environment audio information;

The control instruction is used for triggering voice control of the robot; that is, when the control instruction is received, it is determined that the user needs to control the robot by voice. Preferably, the first preset time may be set according to requirements, for example 0.5 s, 1 s or 2 s. Specifically, in this step, when the control instruction is received, the sound pickup arranged on the robot is controlled to collect the sound volume so as to obtain the environment audio information.

Step S41, when sound collection of the current environment is finished, sending out a voice collection prompt, and continuously collecting voice until the voice collection state meets the stop condition, stopping voice collection to obtain the voice audio information;

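wherein the judgment parameters in the stop condition can be set according to requirements; specifically, in this step, after the step of continuously performing voice acquisition, the method further includes:

when the voice acquisition time is judged to be greater than a time threshold, judging that the voice acquisition state meets the stop condition; or

when an acquisition stopping instruction is received, judging that the voice acquisition state meets the stop condition; or

when the volume of the current environment is continuously smaller than a volume threshold within a second preset time, judging that the voice acquisition state meets the stop condition;

wherein the time threshold and the second preset time can both be set according to requirements.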
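The stop conditions described in this document (acquisition time above a time threshold, an acquisition stopping instruction, or volume continuously below a volume threshold for the second preset time) can be sketched as a single check. The sketch assumes that volume is measured once per fixed-length frame; the frame-based bookkeeping and the names `StopConfig` and `should_stop` are assumptions for the example.

```python
import math
from dataclasses import dataclass
from typing import Sequence


@dataclass
class StopConfig:
    time_threshold_s: float      # maximum voice acquisition time
    volume_threshold_db: float   # volume below this counts as silence
    second_preset_time_s: float  # required duration of continuous silence


def should_stop(elapsed_s: float,
                stop_requested: bool,
                frame_volumes_db: Sequence[float],
                frame_s: float,
                cfg: StopConfig) -> bool:
    """Return True when any of the three stop conditions is met."""
    if elapsed_s > cfg.time_threshold_s:
        return True
    if stop_requested:
        return True
    # Number of most recent frames that must all be below the threshold.
    frames_needed = max(1, math.ceil(cfg.second_preset_time_s / frame_s))
    recent = list(frame_volumes_db)[-frames_needed:]
    return (len(recent) == frames_needed
            and all(v < cfg.volume_threshold_db for v in recent))
```

Calling this check after each recorded frame stops the acquisition as soon as one condition holds, so the silence rule ends a recording early when the user has finished speaking and the time threshold acts as an upper bound.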

Step S51, acquiring operation information of the steering engine, and inquiring noise data in a noise database according to the operation information and the environmental audio information to obtain target noise data matched with the operation information and the environmental audio information;

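specifically, in this step, the querying noise data in a noise database according to the operation information and the environmental audio information includes:

acquiring the sounding numbers and operation actions stored in the operation information and the rotation angles of the steering engines for the operation actions, and acquiring the volume values stored in the environment audio information;

matching the sounding numbers, the operation actions, the rotation angles of the steering engines for the operation actions and the volume values with the noise database to obtain the target noise data;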

Step S61, carrying out frequency spectrum comparison, power spectrum comparison and cepstrum comparison on the target noise data and the voice audio information to obtain a comparison result;

step S71, carrying out noise reduction processing on the voice audio information according to the comparison result, and carrying out voice recognition on the voice audio information after noise reduction;

In this embodiment, querying the target noise data according to the operation information and the environment audio information yields the noise data currently caused by environmental factors and by the operation of the steering engine, and comparing the target noise data with the voice audio information stored in the audio information allows the noise in the voice audio information to be removed accurately, so that interference of environmental noise or steering engine operation noise with voice recognition is prevented and the accuracy of voice recognition is improved.

Example three

Referring to fig. 3, a schematic structural diagram of a speech recognition noise reduction system 100 according to a third embodiment of the present invention is shown, including: the device comprises an information acquisition module 10, a noise query module 11, an audio comparison module 12 and an audio denoising module 13, wherein:

the information obtaining module 10 is configured to, when a voice instruction sent by a user is received, obtain audio information including the voice instruction and environmental audio information of a current environment where the user is located.

Wherein, the information obtaining module 10 is further configured to: when a control instruction sent by a user is received, perform sound collection on the current environment within a first preset time to obtain the environment audio information, wherein the control instruction is used for triggering voice control of the robot; and send a voice acquisition prompt after the sound collection of the current environment is finished, and continuously perform voice acquisition until the voice acquisition state meets the stop condition, then stop the voice acquisition to obtain the voice audio information.

Further, the information obtaining module 10 is further configured to: judge that the voice acquisition state meets the stop condition when the voice acquisition time is judged to be greater than a time threshold; or judge that the voice acquisition state meets the stop condition when an acquisition stopping instruction is received; or judge that the voice acquisition state meets the stop condition when the volume of the current environment is continuously smaller than a volume threshold within a second preset time.

And the noise query module 11 is used for acquiring the operation information of the steering engine, and querying noise data in a noise database according to the operation information and the environment audio information to obtain target noise data matched with the operation information and the environment audio information.

Wherein the noise querying module 11 is further configured to: acquiring sounding numbers and operation actions stored in the operation information and rotation angles of steering engines of the operation actions, and acquiring volume values stored in the environment audio information; and matching the sounding numbers, the operation actions, the rotation angles of steering engines of the operation actions and the volume values with the noise database to obtain the target noise data.

And the audio comparison module 12 is configured to compare the target noise data with the voice audio information stored in the audio information.

Wherein, the audio comparison module 12 is further configured to: and carrying out frequency spectrum comparison, power spectrum comparison and cepstrum comparison on the target noise data and the voice audio information to obtain a comparison result.

And the audio denoising module 13 is configured to perform denoising processing on the voice audio information according to the comparison result, and perform voice recognition on the denoised voice audio information.

In this embodiment, the speech recognition noise reduction system 100 further includes:

the audio acquisition module 14 is used for sequentially controlling the steering engine to execute different operations according to a locally pre-stored sound production state and an operation instruction, and acquiring audio data generated when the steering engine executes various operations to obtain a sample audio; and performing deep learning according to the sample audio to obtain the noise database.

Preferably, the audio acquisition module 14 is further configured to: acquire the volume value in the sounding state, and acquire the operation number, the operation action and the rotation angle of the steering engine for the operation action in the operation instruction; and control the corresponding sounding unit to sound according to the volume value, and control the corresponding steering engine to execute the operation action according to the operation number. The volume value, the operation number and the operation action corresponding to the number can all be set according to requirements. For example, the volume value can be set to 0 decibels, 40 decibels, 50 decibels or 60 decibels, and the volume value is used for controlling the sounding volume of the sounding unit on the robot.

Preferably, in this module, the operation action may be a state of ascending, descending, bending or contracting. In this embodiment, the operation action may further be divided into actions such as stretching a hand, stretching a leg, raising the head, nodding or twisting the waist according to the operation state of the steering engine on the robot. The operation number is a unique number corresponding to each steering engine and may be stored as a numeric or character number, for example No. 1 steering engine, No. 2 steering engine, No. 3 steering engine, left-hand steering engine, right-hand steering engine, left-leg steering engine or head steering engine.

Example four

Referring to fig. 4, a mobile terminal 101 according to a fourth embodiment of the present invention includes a storage device and a processor, where the storage device is used to store a computer program, and the processor runs the computer program to make the mobile terminal 101 execute the above-mentioned voice recognition noise reduction method, and the mobile terminal 101 may be a robot.

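The present embodiment also provides a storage medium on which a computer program used in the above-mentioned mobile terminal 101 is stored, and the program, when executed, includes the steps of:

when a voice instruction sent by a user is received, acquiring audio information containing the voice instruction and environment audio information of the current environment;

acquiring operation information of the steering engine, and querying noise data in a noise database according to the operation information and the environment audio information to obtain target noise data matched with the operation information and the environment audio information;

comparing the target noise data with the voice audio information stored in the audio information;

and performing noise reduction processing on the voice audio information according to the comparison result, and performing voice recognition on the voice audio information subjected to noise reduction. The storage medium is, for example, a ROM/RAM, a magnetic disk or an optical disk.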

It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-mentioned division of the functional units and modules is used as an example, in practical applications, the above-mentioned function distribution may be performed by different functional units or modules according to needs, that is, the internal structure of the storage device is divided into different functional units or modules to perform all or part of the above-mentioned functions. Each functional unit and module in the embodiments may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit, and the integrated unit may be implemented in a form of hardware, or may be implemented in a form of software functional unit. In addition, specific names of the functional units and modules are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present application.

Those skilled in the art will appreciate that the component structures shown in fig. 3 are not intended to be limiting of the speech recognition noise reduction system of the present invention and may include more or fewer components than those shown, or some components in combination, or a different arrangement of components, and that the speech recognition noise reduction methods of fig. 1-2 may be implemented using more or fewer components than those shown in fig. 3, or some components in combination, or a different arrangement of components. The units, modules, etc. referred to herein are a series of computer programs that can be executed by a processor (not shown) of the current speech recognition noise reduction system and that can perform specific functions, and all of them can be stored in a storage device (not shown) of the current speech recognition noise reduction system.

The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents and improvements made within the spirit and principle of the present invention are intended to be included within the scope of the present invention.

Claims

1. A method for speech recognition noise reduction, the method comprising:

2. The speech recognition noise reduction method of claim 1, wherein the step of receiving a speech instruction sent by a user is preceded by the method further comprising:

3. The method of speech recognition noise reduction according to claim 2, wherein the sequentially controlling the steering engine to perform different operations comprises:

4. The method of speech recognition noise reduction according to claim 1, wherein the querying noise data in a noise database based on the operational information and the environmental audio information comprises:

5. The method of reducing noise in speech recognition according to claim 1, wherein comparing the target noise data with speech audio information stored in the audio information comprises:

6. The speech recognition noise reduction method of claim 1, wherein the step of receiving a speech instruction sent by a user is preceded by the method further comprising:

7. The speech recognition noise reduction method of claim 6, wherein after the step of continuing speech acquisition, the method further comprises:

And when the volume of the current environment is continuously smaller than a volume threshold value within a second preset time, judging that the voice acquisition state meets the stop condition.

8. A speech recognition noise reduction system, the system comprising:

9. A mobile terminal, characterized by comprising a storage device for storing a computer program and a processor for executing the computer program to make the mobile terminal execute the speech recognition noise reduction method according to any one of claims 1 to 7.

10. A storage medium, characterized in that it stores a computer program for use in a mobile terminal according to claim 9, which computer program, when executed by a processor, implements the steps of the speech recognition noise reduction method according to any one of claims 1 to 7.