CN111640447A - Method for reducing noise of audio signal and terminal equipment - Google Patents

Publication number
CN111640447A
Authority
CN
China
Prior art keywords
audio signal
spoken language
user
preset
voice
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010458428.6A
Other languages
Chinese (zh)
Other versions
CN111640447B (en
Inventor
周林
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Genius Technology Co Ltd
Original Assignee
Guangdong Genius Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong Genius Technology Co Ltd filed Critical Guangdong Genius Technology Co Ltd
Priority to CN202010458428.6A priority Critical patent/CN111640447B/en
Publication of CN111640447A publication Critical patent/CN111640447A/en
Application granted granted Critical
Publication of CN111640447B publication Critical patent/CN111640447B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • G10L25/60 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for measuring the quality of voice signals

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Quality & Reliability (AREA)
  • Soundproofing, Sound Blocking, And Sound Damping (AREA)

Abstract

The embodiment of the invention discloses a method for reducing noise of an audio signal and a terminal device, which are applied to the technical field of terminals and can solve the problem that the noise in an audio signal is high and an effective audio signal cannot be obtained. The method comprises the following steps: collecting a first audio signal; judging whether an interference audio signal that does not match the pre-stored user sound characteristics exists in the first audio signal, wherein the pre-stored user sound characteristics comprise at least one of a timbre parameter of the user's voice, a pitch parameter of the user's voice, and a loudness parameter of the user's voice; and, if such an interference audio signal exists, filtering the interference audio signal from the first audio signal to obtain a second audio signal, wherein the second audio signal is the audio signal of the user's voice. The method is applied to scenes with a noisy sound environment.

Description

Method for reducing noise of audio signal and terminal equipment
Technical Field
The embodiment of the invention relates to the technical field of terminals, in particular to a method for reducing noise of an audio signal and terminal equipment.
Background
At present, most home education devices on the market have a spoken language evaluation function: the device receives the user's spoken pronunciation and evaluates it. When the user performs a spoken language evaluation with such a device in a noisy environment, the audio signal received by the device may include audio other than the user's voice. The noise in the audio signal is then high and an effective audio signal cannot be obtained, which leads to inaccurate spoken language evaluation results.
Disclosure of Invention
The embodiment of the invention provides a method for reducing noise of an audio signal and terminal equipment, which are used for solving the problems that in the prior art, the noise of the audio signal is high and an effective audio signal cannot be obtained. In order to solve the above technical problem, the embodiment of the present invention is implemented as follows:
in a first aspect, a method for reducing noise of an audio signal is provided, which is applied in a noisy sound environment, and the method includes: collecting a first audio signal;
judging whether an interference audio signal which does not accord with a pre-stored user sound characteristic exists in the first audio signal, wherein the pre-stored user sound characteristic comprises: at least one of a timbre parameter of the user sound, a pitch parameter of the user sound, and a loudness parameter of the user sound;
and if the interference audio signal which does not accord with the pre-stored user sound characteristic exists, filtering the interference audio signal from the first audio signal to obtain a second audio signal, and taking the second audio signal as the audio signal of the voice of the user.
Optionally, after obtaining the audio signal of the voice of the user, the method further includes:
identifying speech content in an audio signal of the user's speech;
determining the matching degree of the voice content and the preset spoken language evaluation content;
and generating the spoken language evaluation score of the user according to the matching degree, and outputting the spoken language evaluation score.
Optionally, the outputting the spoken language evaluation score includes displaying the spoken language evaluation score,
after generating the spoken language evaluation score of the user according to the matching degree, the method further comprises the following steps:
displaying identification information corresponding to the spoken language evaluation score;
the identification information is at least one of animation, expression or character evaluation.
Optionally, after the spoken language evaluation score is output, the method includes:
judging whether the spoken language evaluation score is less than or equal to a preset score;
if the spoken language evaluation score is less than or equal to the preset score, playing the audio of the spoken language training at a first preset volume, wherein the first preset volume is greater than the standard volume;
or,
if the spoken language evaluation score is less than or equal to the preset score, judging whether an earphone is connected;
if the earphone is connected, playing the audio of the spoken language training at a second preset volume, wherein the second preset volume is less than the standard volume; if the earphone is not connected, outputting prompt information at the first preset volume to prompt the user to wear the earphone to listen to the audio of the spoken language training.
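The two alternatives above can be sketched as a single decision function. This is only an illustrative sketch: the volume values, the `Action` type and the function name `follow_up` are hypothetical, and the second branch of the earphone alternative is read as the case in which no earphone is connected.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical volumes on a 0-100 scale; the text only fixes their ordering.
STANDARD_VOLUME = 50
FIRST_PRESET_VOLUME = 70    # greater than the standard volume
SECOND_PRESET_VOLUME = 35   # less than the standard volume


@dataclass
class Action:
    kind: str    # "none", "play_training_audio" or "prompt_wear_earphone"
    volume: int  # output volume; 0 when nothing is output


def follow_up(score: float, preset_score: float,
              earphone_connected: Optional[bool] = None) -> Action:
    """Decide what to do after the spoken language evaluation score is output.

    earphone_connected=None selects the first alternative (no earphone check);
    True/False selects the second alternative.
    """
    if score > preset_score:
        return Action("none", 0)
    if earphone_connected is None:
        return Action("play_training_audio", FIRST_PRESET_VOLUME)
    if earphone_connected:
        return Action("play_training_audio", SECOND_PRESET_VOLUME)
    # Assumed reading: no earphone connected, so prompt the user to wear one.
    return Action("prompt_wear_earphone", FIRST_PRESET_VOLUME)
```

For instance, `follow_up(40, 60, earphone_connected=True)` selects playback at the lower, earphone-friendly volume.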
Optionally, after obtaining the audio signal of the voice of the user, the method further includes:
identifying speech content and sound features in an audio signal of the user's speech;
determining the matching degree of the voice content and the preset spoken language evaluation content;
judging whether the matching degree is greater than or equal to a preset matching degree;
if the matching degree is greater than or equal to the preset matching degree, determining the similarity between the sound features and the sound features in the preset spoken language evaluation content;
and generating the evaluation score of the spoken language imitation ability of the user according to the similarity, and outputting the evaluation score of the spoken language imitation ability.
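As a rough sketch of this optional flow, the matching degree can act as a gate before any similarity is computed. The text does not say how similarity is measured; cosine similarity over numeric feature vectors, the 0 to 100 scale and all names below are assumptions made for illustration.

```python
import math
from typing import Optional, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length feature vectors (assumed metric)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def imitation_score(matching_degree: float,
                    preset_matching_degree: float,
                    user_features: Sequence[float],
                    reference_features: Sequence[float]) -> Optional[float]:
    """Return an imitation ability score, or None if the content gate fails."""
    if matching_degree < preset_matching_degree:
        return None  # content does not match well enough; no imitation score
    similarity = cosine_similarity(user_features, reference_features)
    return round(100 * max(0.0, similarity), 1)
```

Gating on the matching degree first means the sound features are only compared when the user actually said the expected content.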
In a second aspect, a terminal device is provided, which includes: the acquisition module is used for acquiring a first audio signal;
a determining module, configured to determine whether an interfering audio signal that does not conform to a pre-stored user sound characteristic exists in the first audio signal, where the pre-stored user sound characteristic includes: at least one of a timbre parameter of the user sound, a pitch parameter of the user sound, and a loudness parameter of the user sound;
and the processing module is used for filtering the interference audio signal from the first audio signal to obtain a second audio signal if the interference audio signal which does not accord with the pre-stored user sound characteristics exists, and taking the second audio signal as the voice signal of the user.
Optionally, the processing module is further configured to identify a voice content in an audio signal of the voice of the user;
determining the matching degree of the voice content and the preset spoken language evaluation content;
and generating the spoken language evaluation score of the user according to the matching degree, and outputting the spoken language evaluation score.
Optionally, the outputting the spoken language evaluation score includes displaying the spoken language evaluation score,
the processing module is specifically used for generating the spoken language evaluation score of the user according to the matching degree and displaying identification information corresponding to the spoken language evaluation score;
the identification information is at least one of animation, expression or character evaluation.
Optionally, the determining module is further configured to determine whether the spoken language evaluation score is less than or equal to a preset score after the processing module outputs the spoken language evaluation score; if the spoken language evaluation score is less than or equal to the preset score, playing the audio of the spoken language training at a first preset volume, wherein the first preset volume is greater than the standard volume;
or,
if the spoken language evaluation score is less than or equal to the preset score, judging whether an earphone is connected;
if the earphone is connected, playing the audio of the spoken language training at a second preset volume, wherein the second preset volume is less than the standard volume; if the earphone is not connected, outputting prompt information at the first preset volume to prompt the user to wear the earphone to listen to the audio of the spoken language training.
Optionally, the processing module is further configured to, after obtaining the audio signal of the voice of the user, identify voice content and voice features in the audio signal of the voice of the user;
determining the matching degree of the voice content and the preset spoken language evaluation content;
the determining module is further configured to determine whether the matching degree is greater than or equal to a preset matching degree; if the matching degree is greater than or equal to the preset matching degree, determining the similarity between the sound features and the sound features in the preset spoken language evaluation content;
and the processing module is also used for generating the evaluation score of the spoken language imitation ability of the user according to the similarity and outputting the evaluation score of the spoken language imitation ability.
In a third aspect, a terminal device is provided, including:
a memory storing executable program code;
a processor coupled with the memory;
the processor calls the executable program code stored in the memory to execute the method for reducing the noise of the audio signal in the first aspect of the embodiment of the present invention.
In a fourth aspect, a computer-readable storage medium is provided, which stores a computer program, the computer program causing a computer to execute the method for reducing noise of an audio signal in the first aspect of the embodiments of the present invention. The computer readable storage medium includes a ROM/RAM, a magnetic or optical disk, or the like.
In a fifth aspect, there is provided a computer program product for causing a computer to perform some or all of the steps of any one of the methods of the first aspect when the computer program product is run on the computer.
A sixth aspect provides an application publishing platform for publishing a computer program product, wherein the computer program product, when run on a computer, causes the computer to perform some or all of the steps of any one of the methods of the first aspect.
Compared with the prior art, the embodiment of the invention has the following beneficial effects:
in the embodiment of the invention, the terminal device can compare the pre-stored user sound characteristics with the first audio signal it collects, determine from the first audio signal the interference audio signal that does not match the pre-stored user sound characteristics, and filter out the interference audio signal to obtain the second audio signal, which is used as the audio signal of the user's voice. With this scheme, when the terminal device is used in a noisy environment, the noise can be filtered out to obtain the audio signal of the user's voice, so that the noise in the audio signal is eliminated and an effective audio signal is obtained.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed to be used in the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art that other drawings can be obtained according to these drawings without creative efforts.
FIG. 1 is a first flowchart illustrating a method for reducing noise in an audio signal according to an embodiment of the present invention;
FIG. 2 is a second flowchart illustrating a method for reducing noise in an audio signal according to an embodiment of the present invention;
FIG. 3 is a third flowchart illustrating a method for reducing noise in an audio signal according to an embodiment of the present invention;
FIG. 4 is a fourth flowchart illustrating a method for reducing noise in an audio signal according to an embodiment of the present invention;
FIG. 5 is a first schematic structural diagram of a terminal device according to an embodiment of the present invention;
FIG. 6 is a second schematic structural diagram of a terminal device according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The terms "first" and "second," and the like, in the description and in the claims of the present invention are used for distinguishing between different objects and not for describing a particular order of the objects. For example, the first audio signal and the second audio signal, etc. are for distinguishing different audio signals, rather than for describing a particular order of the audio signals.
The terms "comprises," "comprising," and any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
It should be noted that, in the embodiments of the present invention, words such as "exemplary" or "for example" are used to indicate examples, illustrations or explanations. Any embodiment or design described as "exemplary" or "for example" in the embodiments of the present invention should not be construed as preferred or advantageous over other embodiments or designs. Rather, use of the words "exemplary" or "for example" is intended to present related concepts in a concrete fashion.
The embodiment of the invention provides a method for reducing noise of an audio signal and a terminal device. When the terminal device is used in a noisy environment, the noise can be filtered out to obtain the audio signal of the user's voice, so that the noise in the audio signal is eliminated and an effective audio signal is obtained.
The terminal device according to the embodiment of the present invention may be an electronic device such as a Mobile phone, a tablet Computer, a notebook Computer, a palmtop Computer, a vehicle-mounted terminal device, a wearable device, an Ultra-Mobile Personal Computer (UMPC), a netbook, or a Personal Digital Assistant (PDA). The wearable device may be a smart watch, a smart bracelet, a watch phone, a smart foot ring, a smart earring, a smart necklace, a smart headset, or the like, and the embodiment of the present invention is not limited.
The execution subject of the method for reducing noise of an audio signal provided in the embodiment of the present invention may be the terminal device, or may also be a functional module and/or a functional entity capable of implementing the method for reducing noise of an audio signal in the terminal device, which may be specifically determined according to actual use requirements, and the embodiment of the present invention is not limited. The following takes a terminal device as an example to exemplarily describe the method for reducing the noise of the audio signal according to the embodiment of the present invention.
The method for reducing the noise of the audio signal provided by the embodiment of the invention can be applied to scenes with a noisy sound environment, and is particularly suitable for language training in a noisy environment.
Example one
As shown in fig. 1, an embodiment of the present invention provides a method for reducing noise of an audio signal, which may include the steps of:
101. A first audio signal is acquired.
Optionally, a function control corresponding to the method provided by the embodiment of the present invention may be set in the terminal device, and the terminal device may be triggered to turn on/off a function (hereinafter referred to as a target function) corresponding to the method provided by the embodiment of the present invention through the function control, that is, after the function is triggered to be turned on, the method provided by the embodiment of the present invention may be adopted to reduce the noise of the audio signal.
In an optional implementation, before 101, the method may further include: determining from positioning information whether the place where the terminal device is located is a public place; if it is, reminder information can be output to remind the user to enable the target function in the terminal device and reduce the noise of the audio signal.
In this optional implementation, positioning information is used to judge whether the terminal device is in a public place, so that the user can be reminded to enable the target function in noisy public places, which makes the handling of audio signals more intelligent and user-friendly.
Further optionally, if the place where the terminal device is located is determined to be a public place, it may be further detected whether the user has previously enabled the target function at that place. If so, the target function is enabled directly without outputting the reminder information; if not, the reminder information can be output to remind the user to enable the target function in the terminal device and reduce the noise of the audio signal.
Further judging whether the public place is one where the user has previously enabled the target function associates the user's habits with the place where the terminal device is located: the target function is enabled automatically when the user returns to such a public place, without manual action, and the user is prompted to enable it on entering a public place where it has not been enabled before.
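The location check described above amounts to a small decision rule. The sketch below assumes that a place can be identified by some string identifier and that the device keeps a set of places where the user has enabled the target function before; the function name and return values are hypothetical.

```python
from typing import Set


def on_location_update(place_id: str, is_public: bool,
                       places_where_enabled: Set[str]) -> str:
    """Decide how to handle the target function for the current place.

    place_id and places_where_enabled are illustrative stand-ins for whatever
    the positioning information and usage history actually provide.
    """
    if not is_public:
        return "no_action"
    if place_id in places_where_enabled:
        # The user enabled the function here before: enable it without a reminder.
        return "enable_target_function"
    # Public place with no history: remind the user to enable the function.
    return "remind_user"
```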
102. It is judged whether an interference audio signal that does not match the pre-stored user sound characteristics exists in the first audio signal.
In the embodiment of the present invention, the pre-stored user voice characteristics include: at least one of a timbre parameter of the user's voice, a pitch parameter of the user's voice, and a loudness parameter of the user's voice.
Generally, a sound has three properties: pitch, loudness and timbre. The sound corresponds to a sound wave, which likewise has three properties: frequency, amplitude and waveform.
In the embodiment of the present invention, the pitch parameter of the user sound may be determined according to the frequency of the sound wave, the loudness parameter of the user sound may be determined according to the amplitude of the sound wave, and the timbre parameter of the user sound may be determined by the waveform of the sound wave.
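To make the mapping from wave properties to sound parameters concrete, the following sketch derives one number for each parameter from raw samples. The specific estimators are assumptions chosen for simplicity and are not stated in the text: pitch from the rate of upward zero crossings, loudness from the root-mean-square amplitude, and a crude waveform descriptor (the crest factor, peak divided by RMS) standing in for timbre.

```python
import math
from typing import Dict, Sequence


def loudness_rms(samples: Sequence[float]) -> float:
    """Loudness proxy: root-mean-square amplitude of the samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def pitch_hz(samples: Sequence[float], sample_rate: int) -> float:
    """Pitch proxy: upward zero crossings per second (works for simple tones)."""
    crossings = sum(1 for a, b in zip(samples, samples[1:]) if a < 0 <= b)
    return crossings * sample_rate / len(samples)


def crest_factor(samples: Sequence[float]) -> float:
    """Waveform-shape proxy for timbre: peak amplitude divided by RMS."""
    rms = loudness_rms(samples)
    return max(abs(s) for s in samples) / rms if rms else 0.0


def sound_characteristics(samples: Sequence[float], sample_rate: int) -> Dict[str, float]:
    """Bundle the three parameters, e.g. for pre-storing a user profile."""
    return {
        "pitch_hz": pitch_hz(samples, sample_rate),
        "loudness": loudness_rms(samples),
        "timbre": crest_factor(samples),
    }
```

Real speech would need more robust estimators (for example autocorrelation for pitch and spectral features for timbre); the point here is only that each parameter is computed from a different property of the wave.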
If there is an interfering audio signal that does not match the pre-stored user sound characteristics, the following 103 and 104 may be performed; if there is no interfering audio signal that does not match the pre-stored user sound characteristics, the following 105 can be performed.
103. The interfering audio signal is filtered from the first audio signal to obtain a second audio signal.
104. The second audio signal is taken as the audio signal of the user's voice.
In the embodiment of the invention, the audio signal of the voice of the user is obtained after the interference signal which does not accord with the pre-stored sound characteristic of the user is removed.
105. The first audio signal is taken as the audio signal of the user's voice.
If no interference audio signal that does not match the pre-stored user sound characteristics exists in the first audio signal, the first audio signal can be regarded as the audio signal of the user's voice.
In the embodiment of the invention, the terminal device can compare the pre-stored user sound characteristics with the first audio signal it collects, determine from the first audio signal the interference audio signal that does not match the pre-stored user sound characteristics, and filter out the interference audio signal to obtain the second audio signal, which is used as the audio signal of the user's voice. With this scheme, when the terminal device is used in a noisy environment, the noise can be filtered out to obtain the audio signal of the user's voice, so that the noise in the audio signal is eliminated and an effective audio signal is obtained.
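Steps 101 to 105 can be sketched end to end as follows. The text does not specify how the comparison or the filtering is carried out, so everything below is an assumption for illustration: the signal is cut into fixed-length frames, each frame is reduced to a pitch and a loudness value, a frame is treated as interference when either value deviates from the stored profile by more than a relative tolerance, and filtering simply drops those frames. Interference that overlaps the user's voice in time would need a different technique.

```python
import math
from typing import Dict, List, Sequence


def frame_features(frame: Sequence[float], sample_rate: int) -> Dict[str, float]:
    """Pitch (upward zero crossings per second) and loudness (RMS) of one frame."""
    rms = math.sqrt(sum(s * s for s in frame) / len(frame))
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if a < 0 <= b)
    return {"pitch_hz": crossings * sample_rate / len(frame), "loudness": rms}


def matches_profile(features: Dict[str, float], profile: Dict[str, float],
                    tolerance: float = 0.25) -> bool:
    """True if every profiled parameter is within the relative tolerance."""
    return all(abs(features[k] - profile[k]) <= tolerance * profile[k]
               for k in profile)


def reduce_noise(first_signal: List[float], sample_rate: int,
                 profile: Dict[str, float], frame_len: int = 400) -> List[float]:
    """Return the audio signal of the user's voice (steps 102 to 105)."""
    frames = [first_signal[i:i + frame_len]
              for i in range(0, len(first_signal), frame_len)]
    kept = [f for f in frames
            if matches_profile(frame_features(f, sample_rate), profile)]
    if len(kept) == len(frames):
        return first_signal  # step 105: no interference found
    # Steps 103 and 104: drop interfering frames; the rest is the second signal.
    return [s for f in kept for s in f]
```

With a stored profile such as `{"pitch_hz": 200.0, "loudness": 0.35}`, a recording in which a 200 Hz tone is followed by a 1000 Hz tone would come back with only the 200 Hz part.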
As shown in fig. 2, an embodiment of the present invention provides a method for reducing noise of an audio signal, which may include the steps of:
201. A first audio signal is acquired.
202. It is judged whether an interference audio signal that does not match the pre-stored user sound characteristics exists in the first audio signal.
If there is an interfering audio signal that does not match the pre-stored user sound characteristics, the following 203 and 204 may be performed; if there is no interfering audio signal that does not conform to the pre-stored user voice characteristics, the following 205 may be performed.
203. The interfering audio signal is filtered from the first audio signal to obtain a second audio signal.
204. The second audio signal is taken as the audio signal of the user's voice.
205. The first audio signal is taken as the audio signal of the user's voice.
The descriptions of 201 to 205 may refer to the descriptions of 101 to 105 in the above embodiments, and are not repeated here.
In the embodiment of the invention, the terminal device can compare the pre-stored user sound characteristics with the first audio signal it collects, determine from the first audio signal the interference audio signal that does not match the pre-stored user sound characteristics, and filter out the interference audio signal to obtain the second audio signal, which is used as the audio signal of the user's voice. With this scheme, when the terminal device is used in a noisy environment, the noise can be filtered out to obtain the audio signal of the user's voice, so that the noise in the audio signal is eliminated and an effective audio signal is obtained.
206. Speech content in an audio signal of a user's speech is identified.
Optionally, after 205, step 206 may instead recognize the speech content in the first audio signal, after which 207 and 208 below are carried out.
207. The matching degree between the voice content and the preset spoken language evaluation content is determined.
For example, assuming that the second audio signal is the audio of the user reading a certain English word, a certain English sentence, or a certain piece of poetry, the preset spoken language evaluation content may be pre-stored standard audio of that English word, English sentence, or piece of poetry.
Taking the case where the second audio signal is the user reading a certain English word as an example, the second audio signal may be recognized to obtain the user's pronunciation of the English word, and the matching degree between the user's pronunciation and the pronunciation in the preset spoken language evaluation content (the standard audio of that English word) is determined.
208. A spoken language evaluation score of the user is generated according to the matching degree, and the spoken language evaluation score is output.
According to the embodiment of the invention, the spoken language evaluation score can be generated and output according to the matching degree between the user's voice content and the preset spoken language evaluation content, which helps the user understand their own learning progress and results and improves learning efficiency.
Optionally, outputting the spoken language evaluation score includes displaying the spoken language evaluation score,
after generating the spoken language evaluation score of the user according to the matching degree, the method further comprises the following steps:
displaying identification information corresponding to the spoken language evaluation score;
the identification information is at least one of animation, expression or character evaluation.
Furthermore, the corresponding identification information can be displayed together with the spoken language evaluation score, which makes the interface display more engaging.
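Steps 207 and 208, together with the optional identification information, might look like the sketch below. It substitutes a text-level comparison for the pronunciation comparison described above: the recognized words are compared with the reference words using `difflib.SequenceMatcher` from the Python standard library. The 0 to 100 scale, the thresholds and the feedback strings are all hypothetical.

```python
from difflib import SequenceMatcher


def matching_degree(recognized: str, reference: str) -> float:
    """Word-level similarity in [0, 1] between recognized and reference text."""
    return SequenceMatcher(None, recognized.lower().split(),
                           reference.lower().split()).ratio()


def evaluation_score(degree: float) -> int:
    """Map a matching degree in [0, 1] to a score from 0 to 100."""
    return round(100 * degree)


def identification_info(score: int) -> str:
    """Pick a short text evaluation to display next to the score."""
    if score >= 90:
        return "Excellent"
    if score >= 60:
        return "Good, keep practicing"
    return "Try again"
```

An animation or expression could be selected from the same score bands in place of the text evaluation.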
As shown in fig. 3, an embodiment of the present invention provides a method for reducing noise of an audio signal, which may include the steps of:
301. A first audio signal is acquired.
302. It is judged whether an interference audio signal that does not match the pre-stored user sound characteristics exists in the first audio signal.
Wherein the pre-stored user voice characteristics include: at least one of a timbre parameter of the user's voice, a pitch parameter of the user's voice, and a loudness parameter of the user's voice.
If there is an interfering audio signal that does not match the pre-stored user sound characteristics, the following 303 and 304 may be performed; if there is no interfering audio signal that does not conform to the pre-stored user sound characteristics, the following 305 may be performed.
303. The interfering audio signal is filtered from the first audio signal to obtain a second audio signal.
304. The second audio signal is taken as the audio signal of the user's voice.
305. The first audio signal is taken as the audio signal of the user's voice.
In the embodiment of the invention, the terminal equipment can compare the pre-stored user sound characteristics with the first audio signals collected by the terminal equipment to determine the interference audio signals which do not accord with the pre-stored user sound characteristics from the first audio signals, and filter the interference audio signals to obtain the second audio signals, and the second audio signals are used as the audio signals of the voice of the user. Through the scheme, when the terminal equipment is applied to the noisy environment of the sound environment, the noise can be filtered to obtain the audio signal of the voice of the user, so that the noise in the audio signal can be eliminated, and the effective audio signal can be obtained.
306. Speech content in an audio signal of a user's speech is identified.
307. The matching degree between the voice content and the preset spoken language evaluation content is determined.
308. A spoken language evaluation score of the user is generated according to the matching degree, and the spoken language evaluation score is output.
According to this embodiment of the invention, the spoken language evaluation score can be generated and output according to the matching degree between the user's voice content and the preset spoken language evaluation content. This helps the user understand his or her own learning progress and results, which improves learning efficiency.
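Steps 307 and 308 can be illustrated with a small sketch. The patent does not define how the matching degree is computed; here it is assumed, purely for illustration, to be a word-level sequence similarity between the recognized text and the preset content, computed with Python's standard `difflib.SequenceMatcher`. The function names and the 0 to 100 scale are hypothetical.

```python
import re
from difflib import SequenceMatcher
from typing import List


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into words, dropping punctuation."""
    return re.findall(r"[a-z']+", text.lower())


def matching_degree(recognized: str, preset: str) -> float:
    """Assumed metric: word-sequence similarity in the range [0.0, 1.0]."""
    return SequenceMatcher(None, tokenize(recognized), tokenize(preset)).ratio()


def spoken_score(recognized: str, preset: str) -> int:
    """Map the matching degree onto a hypothetical 0-100 evaluation score."""
    return round(100 * matching_degree(recognized, preset))
```

A production system would more likely score at the phoneme level from the recognizer's output; text similarity is used here only because it is self-contained.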
For details of steps 301 to 308, reference may be made to the description of steps 201 to 208 in the second embodiment; they are not repeated here.
309. It is determined whether the spoken language evaluation score is less than or equal to a preset score.
If the spoken language evaluation score is less than or equal to the preset score, step 310 below is executed; if the spoken language evaluation score is greater than the preset score, the flow of this embodiment ends directly.
310. The spoken language training audio is played at a first preset volume.
The spoken language training audio may be standard audio recorded in advance for particular spoken language training content, for example, the standard British or American pronunciation of a word.
The first preset volume is greater than the standard volume. In the embodiment of the present invention, the standard volume refers to the default volume set by the terminal device or the user when no earphone is connected.
In this embodiment, when the spoken language evaluation score is less than or equal to the preset score, the spoken language training audio may be played for the user to listen to, helping the user master the standard pronunciation or the correct spoken language evaluation content and providing timely learning guidance.
Furthermore, when the method provided by the embodiment of the invention is used in a noisy environment, the user may be unable to hear the played training audio clearly because of the surroundings. Playing the audio at the first preset volume, which is greater than the standard volume, adapts the volume to the environment and makes playback more intelligent and flexible.
Optionally, in another implementation, if the spoken language evaluation score is less than or equal to the preset score, it is determined whether an earphone is connected. If the earphone is connected, the spoken language training audio is played at a second preset volume, where the second preset volume is smaller than the standard volume; if the earphone is not connected, prompt information is output at the first preset volume to prompt the user to wear the earphone to listen to the spoken language training audio.
In this implementation, the method further determines whether the terminal device is connected to an earphone. When an earphone is connected, the audio is played at the second preset volume, which is smaller than the standard volume, so that the user does not find the sound too loud through the earphone and have to readjust the volume, as could happen at the standard volume. When no earphone is connected, the prompt information is output at the larger first preset volume to prompt the user to wear the earphone to listen to the spoken language training audio, which prevents the user from missing the prompt in a noisy environment and improves human-computer interaction.
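The earphone-aware variant of steps 309 and 310 reduces to a small decision function, sketched below. The action names and the `boost` and `cut` factors are hypothetical; the patent only requires that the first preset volume be greater than, and the second preset volume smaller than, the standard volume.

```python
from typing import Optional, Tuple


def plan_feedback(score: float,
                  preset_score: float,
                  earphone_connected: bool,
                  standard_volume: float,
                  boost: float = 1.5,
                  cut: float = 0.5) -> Optional[Tuple[str, float]]:
    """Return (action, volume), or None when no training audio is needed."""
    if score > preset_score:
        # Score is above the preset score: the flow simply ends.
        return None
    first_preset_volume = standard_volume * boost   # greater than standard
    second_preset_volume = standard_volume * cut    # smaller than standard
    if earphone_connected:
        # Play quietly, because the sound goes straight into the earphone.
        return ("play_training_audio", second_preset_volume)
    # Prompt loudly so the message is not lost in a noisy environment.
    return ("prompt_wear_earphone", first_preset_volume)
```

For example, `plan_feedback(40, 60, False, 10.0)` yields a loud prompt to wear the earphone, while the same score with an earphone connected yields quiet playback of the training audio.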
In an alternative implementation, in a scenario where multiple students perform spoken language training at the same time in a spoken language classroom, the teacher's terminal device may be associated with the terminal device of each student.
Wherein, the teacher can determine the content of the spoken training through the terminal device thereof and transmit the content of the spoken training to the terminal device of each student (for example, a certain english sentence for the spoken training can be determined and the content of the english sentence can be transmitted to the terminal device of each student, wherein the content of the sentence can include the standard pronunciation of the english sentence stored in advance).
Furthermore, when the students perform spoken language training, each student's terminal device collects an audio signal, and the collected signal may also contain the audio of other students performing spoken language training. The collected audio signal can therefore be processed by the method in the embodiment of the present invention to obtain the audio signal of the current student's spoken language training, which is then sent to the teacher's terminal device. The teacher's terminal device can analyze the received audio signals to obtain each student's spoken language training score and can group students with the same kind of problem together, for example, one group for students who misread the rising and falling intonation of English sentences and another for students who have problems with linking (connected speech) in English sentences. It then displays each student's score and the classification result, so that the teacher can grasp each student's spoken language training situation from the displayed content and teach in a targeted manner, which can improve teaching efficiency and pertinence and assist actual teaching.
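The classification performed on the teacher's terminal device can be sketched as a simple grouping step. The `TrainingResult` record and the problem labels are hypothetical; the patent does not describe how a student's problem type is detected, so the sketch assumes that label is already available.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class TrainingResult(NamedTuple):
    student: str
    score: int
    problem: str  # hypothetical label such as "intonation"; "" means no problem


def group_by_problem(results: List[TrainingResult]) -> Dict[str, List[str]]:
    """Group student names by the kind of problem found in their speech."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for result in results:
        if result.problem:
            groups[result.problem].append(result.student)
    return dict(groups)
```

The teacher's device could then display each student's score next to the group the student falls into.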
As shown in fig. 4, an embodiment of the present invention provides a method for reducing noise of an audio signal, the method including:
401. A first audio signal is acquired.
402. It is determined whether the first audio signal contains an interfering audio signal that does not conform to the pre-stored user sound characteristics.
The pre-stored user sound characteristics include at least one of a timbre parameter of the user's voice, a pitch parameter of the user's voice, and a loudness parameter of the user's voice.
If such an interfering audio signal exists, steps 403 and 404 below are executed; if no such interfering audio signal exists, step 405 below is executed.
403. The interfering audio signal is filtered from the first audio signal to obtain a second audio signal.
404. The second audio signal is taken as the audio signal of the user's voice.
405. The first audio signal is taken as the audio signal of the user's voice.
For details of steps 401 to 405, reference may be made to the description of steps 101 to 105 in the first embodiment; they are not repeated here.
In this embodiment of the invention, the terminal device can compare the pre-stored user sound characteristics with the first audio signal it has collected, identify any interfering audio signal in the first audio signal that does not conform to those characteristics, and filter it out to obtain the second audio signal, which is then used as the audio signal of the user's voice. With this scheme, when the terminal device is used in an acoustically noisy environment, the noise can be filtered out so that an effective audio signal of the user's voice is obtained.
406. Speech content and sound features in an audio signal of a user's speech are identified.
407. The matching degree between the voice content and the preset spoken language evaluation content is determined.
408. It is determined whether the matching degree is greater than or equal to a preset matching degree.
If the matching degree is greater than or equal to the preset matching degree, steps 409 and 410 below are executed; if the matching degree is less than the preset matching degree, the flow returns to step 401 above.
409. The similarity between the sound features and the sound features in the preset spoken language evaluation is determined.
410. An evaluation score of the user's spoken language imitation ability is generated according to the similarity, and the evaluation score of the spoken language imitation ability is output.
The method provided by this embodiment of the invention can be applied to a voice-imitation game or competition. It recognizes not only the voice content in the audio signal of the user's voice but also the sound features in that signal. Only when the matching degree between the voice content and the preset spoken language evaluation content is sufficiently high (greater than or equal to the preset matching degree) is the similarity with the sound features in the preset spoken language evaluation determined and an evaluation score of the spoken language imitation ability given, so the method can be applied to more scenarios to evaluate spoken language imitation ability.
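The gating in steps 408 to 410 can be sketched as follows. The patent does not specify the similarity measure; cosine similarity over numeric feature vectors (for example timbre, pitch, and loudness values) is an assumption made for illustration, as are the function names and the 0 to 100 scale.

```python
import math
from typing import Optional, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two feature vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def imitation_score(matching_degree: float,
                    preset_matching_degree: float,
                    user_features: Sequence[float],
                    reference_features: Sequence[float]) -> Optional[int]:
    """Return a 0-100 imitation score, or None if the content gate fails."""
    if matching_degree < preset_matching_degree:
        # Step 408 failed: the caller would return to step 401 and re-acquire audio.
        return None
    # Step 409: similarity between the user's and the reference sound features.
    similarity = cosine_similarity(user_features, reference_features)
    # Step 410: clamp to [0, 1] and map onto the hypothetical score scale.
    return round(100 * max(0.0, min(1.0, similarity)))
```

Checking the content first means that a speaker who sounds like the reference but says the wrong words is not given an imitation score.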
EXAMPLE five
As shown in fig. 5, an embodiment of the present invention provides a terminal device, where the terminal device includes:
an acquisition module 501, configured to acquire a first audio signal;
a determining module 502, configured to determine whether there is an interfering audio signal that does not conform to a pre-stored user sound characteristic in the first audio signal, where the pre-stored user sound characteristic includes: at least one of a timbre parameter of the user's voice, a pitch parameter of the user's voice, and a loudness parameter of the user's voice;
the processing module 503 is configured to, if an interfering audio signal that does not conform to the pre-stored user sound characteristics exists, filter the interfering audio signal from the first audio signal to obtain a second audio signal, and use the second audio signal as the audio signal of the user's voice.
Optionally, the processing module 503 is further configured to recognize a voice content in the signal of the voice of the user;
determining the matching degree of the voice content and the preset spoken language evaluation content;
and generating a spoken language evaluation score of the user according to the matching degree, and outputting the spoken language evaluation score.
Optionally, outputting the spoken language evaluation score includes displaying the spoken language evaluation score,
the processing module 503 is specifically configured to generate a spoken language evaluation score of the user according to the matching degree, and display identification information corresponding to the spoken language evaluation score;
the identification information is at least one of an animation, an emoticon, or a text comment.
Optionally, the determining module 502 is further configured to determine, after the processing module 503 outputs the spoken language evaluation score, whether the spoken language evaluation score is less than or equal to a preset score; if the spoken language evaluation score is less than or equal to the preset score, the spoken language training audio is played at a first preset volume, where the first preset volume is greater than the standard volume;
or,
if the spoken language evaluation score is less than or equal to the preset score, it is determined whether an earphone is connected;
if the earphone is connected, the spoken language training audio is played at a second preset volume, where the second preset volume is smaller than the standard volume; if the earphone is not connected, prompt information is output at the first preset volume to prompt the user to wear the earphone to listen to the spoken language training audio.
Optionally, the processing module 503 is further configured to, after obtaining the signal of the voice of the user, identify voice content and voice features in the signal of the voice of the user;
determining the matching degree of the voice content and the preset spoken language evaluation content;
the determining module 502 is further configured to determine whether the matching degree is greater than or equal to a preset matching degree; if the matching degree is greater than or equal to the preset matching degree, determine the similarity between the sound features and the sound features in the preset spoken language evaluation;
the processing module 503 is further configured to generate an evaluation score of the user's spoken language imitation ability according to the similarity, and output the evaluation score of the spoken language imitation ability.
As shown in fig. 6, an embodiment of the present invention further provides a terminal device, where the terminal device may include:
a memory 601 in which executable program code is stored;
a processor 602 coupled to a memory 601;
the processor 602 calls the executable program code stored in the memory 601 to execute the method for reducing the noise of the audio signal executed by the terminal device in the above embodiments of the methods.
It should be noted that the terminal device shown in fig. 6 may further include components, which are not shown, such as a battery, an input key, a speaker, a microphone, a screen, an RF circuit, a Wi-Fi module, a bluetooth module, and a sensor, which are not described in detail in this embodiment.
Embodiments of the present invention provide a computer-readable storage medium storing a computer program, wherein the computer program causes a computer to execute some or all of the steps of the method as in the above method embodiments.
Embodiments of the present invention also provide a computer program product, wherein the computer program product, when run on a computer, causes the computer to perform some or all of the steps of the method as in the above method embodiments.
Embodiments of the present invention further provide an application publishing platform, where the application publishing platform is configured to publish a computer program product, where the computer program product, when running on a computer, causes the computer to perform some or all of the steps of the method in the above method embodiments.
It should be appreciated that reference throughout this specification to "one embodiment" or "an embodiment" means that a particular feature, structure or characteristic described in connection with the embodiment is included in at least one embodiment of the present invention. Thus, the appearances of the phrases "in one embodiment" or "in an embodiment" in various places throughout this specification are not necessarily all referring to the same embodiment. Furthermore, the particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. Those skilled in the art should also appreciate that the embodiments described in this specification are all optional embodiments, and that the acts and modules involved are not necessarily required to practice the invention.
The terminal device, the computer-readable storage medium, the computer program product, and the application distribution platform provided in the embodiments of the present invention can implement each process shown in the above method embodiments, and implement similar technical effects, and are not described herein again to avoid repetition.
In various embodiments of the present invention, it should be understood that the sequence numbers of the above-mentioned processes do not imply an inevitable order of execution, and the execution order of the processes should be determined by their functions and inherent logic, and should not constitute any limitation on the implementation process of the embodiments of the present invention.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented as a software functional unit and sold or used as a stand-alone product, may be stored in a computer-accessible memory. Based on this understanding, the technical solution of the present invention in essence, or the part of it that contributes to the prior art, or all or part of the technical solution, can be embodied in the form of a software product. The software product is stored in a memory and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like, and may specifically be a processor in the computer device) to execute some or all of the steps of the methods of the embodiments of the present invention.
It will be understood by those skilled in the art that all or part of the steps in the methods of the embodiments described above may be implemented by a program instructing related hardware. The program may be stored in a computer-readable storage medium, where the storage medium includes Read-Only Memory (ROM), Random Access Memory (RAM), Programmable Read-Only Memory (PROM), Erasable Programmable Read-Only Memory (EPROM), One-time Programmable Read-Only Memory (OTPROM), Electrically Erasable Programmable Read-Only Memory (EEPROM), Compact Disc Read-Only Memory (CD-ROM) or other memory, magnetic disk, magnetic tape, or any other medium that can be used to carry or store data and that can be read by a computer.

Claims (11)

1. A method of reducing noise in an audio signal, for use in an acoustically noisy environment, the method comprising:
collecting a first audio signal;
judging whether an interference audio signal which does not accord with a pre-stored user sound characteristic exists in the first audio signal, wherein the pre-stored user sound characteristic comprises: at least one of a timbre parameter of the user sound, a pitch parameter of the user sound, and a loudness parameter of the user sound;
and if the interference audio signal which does not accord with the pre-stored user sound characteristic exists, filtering the interference audio signal from the first audio signal to obtain a second audio signal, and taking the second audio signal as the audio signal of the voice of the user.
2. The method of claim 1, wherein, after the second audio signal is obtained, the method further comprises:
identifying speech content in an audio signal of the user's speech;
determining the matching degree of the voice content and the preset spoken language evaluation content;
and generating the spoken language evaluation score of the user according to the matching degree, and outputting the spoken language evaluation score.
3. The method of claim 2, wherein said outputting said spoken language evaluation score comprises displaying said spoken language evaluation score,
after generating the spoken language evaluation score of the user according to the matching degree, the method further comprises the following steps:
displaying identification information corresponding to the spoken language evaluation score;
the identification information is at least one of an animation, an emoticon, or a text comment.
4. The method of claim 2, wherein, after said outputting said spoken language evaluation score, the method further comprises:
judging whether the spoken language evaluation score is less than or equal to a preset score;
if the spoken language evaluation score is less than or equal to the preset score, playing the spoken language training audio at a first preset volume, wherein the first preset volume is greater than the standard volume;
or,
if the spoken language evaluation score is less than or equal to the preset score, judging whether an earphone is connected;
if the earphone is connected, playing the spoken language training audio at a second preset volume, wherein the second preset volume is smaller than the standard volume; if the earphone is not connected, outputting prompt information at the first preset volume to prompt the user to wear the earphone to listen to the spoken language training audio.
5. The method of any of claims 1 to 4, wherein, after the second audio signal is obtained, the method further comprises:
identifying speech content and sound features in an audio signal of the user's speech;
determining the matching degree of the voice content and the preset spoken language evaluation content;
judging whether the matching degree is greater than or equal to a preset matching degree;
if the matching degree is greater than or equal to the preset matching degree, determining the similarity between the sound features and the sound features in the preset spoken language evaluation;
and generating the evaluation score of the spoken language imitation ability of the user according to the similarity, and outputting the evaluation score of the spoken language imitation ability.
6. A terminal device, comprising:
the acquisition module is used for acquiring a first audio signal;
a determining module, configured to determine whether an interfering audio signal that does not conform to a pre-stored user sound characteristic exists in the first audio signal, where the pre-stored user sound characteristic includes: at least one of a timbre parameter of the user sound, a pitch parameter of the user sound, and a loudness parameter of the user sound;
and the processing module is used for filtering the interference audio signal from the first audio signal to obtain a second audio signal if the interference audio signal which does not accord with the pre-stored user sound characteristics exists, and taking the second audio signal as the audio signal of the voice of the user.
7. The terminal device of claim 6,
the processing module is further used for identifying voice content in the audio signal of the voice of the user;
determining the matching degree of the voice content and the preset spoken language evaluation content;
and generating the spoken language evaluation score of the user according to the matching degree, and outputting the spoken language evaluation score.
8. The terminal device of claim 7, wherein the outputting the spoken language evaluation score comprises displaying the spoken language evaluation score,
the processing module is specifically used for generating the spoken language evaluation score of the user and displaying identification information corresponding to the spoken language evaluation score according to the matching degree;
the identification information is at least one of an animation, an emoticon, or a text comment.
9. The terminal device of claim 7,
the judging module is further used for judging, after the processing module outputs the spoken language evaluation score, whether the spoken language evaluation score is less than or equal to a preset score; if the spoken language evaluation score is less than or equal to the preset score, playing the spoken language training audio at a first preset volume, wherein the first preset volume is greater than the standard volume;
or,
if the spoken language evaluation score is less than or equal to the preset score, judging whether an earphone is connected;
if the earphone is connected, playing the spoken language training audio at a second preset volume, wherein the second preset volume is smaller than the standard volume; if the earphone is not connected, outputting prompt information at the first preset volume to prompt the user to wear the earphone to listen to the spoken language training audio.
10. The terminal device according to any of claims 6 to 9,
the processing module is further configured to, after obtaining the audio signal of the voice of the user, identify voice content and voice features in the audio signal of the voice of the user;
determining the matching degree of the voice content and the preset spoken language evaluation content;
the judging module is also used for judging whether the matching degree is greater than or equal to a preset matching degree; if the matching degree is greater than or equal to the preset matching degree, determining the similarity between the sound features and the sound features in the preset spoken language evaluation;
and the processing module is also used for generating the evaluation score of the spoken language imitation ability of the user according to the similarity and outputting the evaluation score of the spoken language imitation ability.
11. A computer storage medium characterized by storing a computer program that causes a computer to execute the method of reducing noise of an audio signal according to the first aspect of the embodiment of the present invention. The computer readable storage medium includes a ROM/RAM, a magnetic or optical disk, or the like.
CN202010458428.6A 2020-05-26 2020-05-26 Method for reducing noise of audio signal and terminal equipment Active CN111640447B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010458428.6A CN111640447B (en) 2020-05-26 2020-05-26 Method for reducing noise of audio signal and terminal equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010458428.6A CN111640447B (en) 2020-05-26 2020-05-26 Method for reducing noise of audio signal and terminal equipment

Publications (2)

Publication Number Publication Date
CN111640447A true CN111640447A (en) 2020-09-08
CN111640447B CN111640447B (en) 2023-03-21

Family

ID=72329843

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010458428.6A Active CN111640447B (en) 2020-05-26 2020-05-26 Method for reducing noise of audio signal and terminal equipment

Country Status (1)

Country Link
CN (1) CN111640447B (en)

Also Published As

Publication number Publication date
CN111640447B (en) 2023-03-21

Similar Documents

Publication Publication Date Title
Englund et al. Infant directed speech in natural interaction—Norwegian vowel quantity and quality
CN110706536A (en) Voice answering method and device
CN107909995B (en) Voice interaction method and device
CN103080991A (en) Music-based language-learning method, and learning device using same
CN106297790A (en) The voiceprint service system of robot and service control method thereof
CN109658917A (en) E-book chants method, apparatus, computer equipment and storage medium
WO2018038235A1 (en) Auditory training device, auditory training method, and program
CN108806686B (en) Starting control method of voice question searching application and family education equipment
CN109191349A (en) A kind of methods of exhibiting and system of English learning content
CN114121006A (en) Image output method, device, equipment and storage medium of virtual character
Stemberger et al. Phonetic transcription for speech-language pathology in the 21st century
Watt The identification of the individual through speech
KR101779358B1 (en) voice recognition application controlling method based on smartphone
CN110853624A (en) Speech rehabilitation training system
Pucher et al. Influence of speaker familiarity on blind and visually impaired children’s and young adults’ perception of synthetic voices
CN112966090A (en) Dialogue audio data processing method, electronic device, and computer-readable storage medium
CN111640447B (en) Method for reducing noise of audio signal and terminal equipment
KR102134990B1 (en) Voice training system by analyzing section of frequency
Eriksson That voice sounds familiar: Factors in speaker recognition
JP6032584B2 (en) Hearing ability evaluation method, answer sheet and hearing ability evaluation system used therefor.
Lacerda On the emergence of early linguistic functions: A biological and interactional perspective
Gorman A framework for speechreading acquisition tools
CN111739527B (en) Speech recognition method, electronic device, and computer-readable storage medium
CN111026837B (en) Learning content searching method and family education equipment
CN108984229B (en) Application program starting control method and family education equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant