CN110782891A - Audio processing method and device, computing equipment and storage medium - Google Patents


Info

Publication number: CN110782891A (application CN201910958253.2A; granted as CN110782891B)
Authority: CN (China)
Legal status: Granted; Active
Prior art keywords: preset, coefficient, voice, information, determining
Other languages: Chinese (zh)
Inventors: 张健, 马颖江, 张轶
Original and current assignee: Gree Electric Appliances Inc of Zhuhai
Application filed by Gree Electric Appliances Inc of Zhuhai


Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00: Speech recognition
    • G10L 15/02: Feature extraction for speech recognition; Selection of recognition unit
    • G10L 15/08: Speech classification or search
    • G10L 15/22: Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L 2015/088: Word spotting
    • G10L 2015/223: Execution procedure of a spoken command

Abstract

The application discloses an audio processing method and apparatus, a computing device, and a storage medium, which improve the voice-recognition wake-up rate of smart devices for user speech in various environments and improve the user experience. The method comprises the following steps: after audio information containing user voice information that includes a preset keyword sentence is obtained, a speech recognition coefficient for the current environment is determined from the obtained audio information; whether the determined speech recognition coefficient and a preset speech recognition coefficient satisfy a preset condition is then judged; and when the preset condition is satisfied, the corresponding voice control is performed according to the preset keyword sentence, so that the user can conveniently control the corresponding smart device by voice.

Description

Audio processing method and device, computing equipment and storage medium
Technical Field
The present application relates to the field of voice control technologies, and in particular, to an audio processing method and apparatus, a computing device, and a storage medium.
Background
A voice-controlled smart device can receive a user's speech, analyze it to obtain a voice command, and then execute the corresponding function according to that command. As artificial intelligence theory and technology have matured, voice interaction has become a widely used mode of human-computer interaction.
At present, various smart devices, such as smart voice speakers, smart voice televisions, and smart voice air conditioners, have become indispensable in daily life, and users can control them by voice. However, voice control of a smart device is affected by environmental factors: in different environments, the speech collected by the device may be mixed with other noise, or the distance between the user and the device to be controlled may differ, either of which can reduce the device's voice-recognition wake-up rate for the user's speech.
In summary, how to effectively improve the voice-recognition wake-up rate of smart devices in different environments has become a technical problem in urgent need of a solution.
Disclosure of Invention
Embodiments of the present application provide an audio processing method and apparatus, a computing device, and a storage medium, which improve the voice-recognition wake-up rate of smart devices for user speech in different environments and thereby improve the user experience.
In a first aspect, an audio processing method is provided, the method comprising:
acquiring audio information, wherein the audio information comprises voice information containing preset keyword sentences;
determining a speech recognition coefficient according to the audio information, wherein the speech recognition coefficient is used for indicating the speech quality of the speech information in the environment for generating the audio information;
determining whether the voice recognition coefficient and a preset voice recognition coefficient meet a preset condition;
and if the preset condition is met, performing voice control according to the preset keyword sentence.
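The four steps of the first aspect can be sketched as follows. This is an illustrative assumption, not the patent's implementation: the keyword sentence, the preset coefficient value 0.7, and the energy-ratio scoring in `compute_recognition_coefficient` are all hypothetical stand-ins.

```python
PRESET_KEYWORD = "turn the air conditioner on to 26 degrees"  # hypothetical keyword sentence
PRESET_COEFFICIENT = 0.7  # hypothetical preset speech recognition coefficient

def compute_recognition_coefficient(audio_info):
    """Assumed scoring: ratio of keyword-speech energy to total captured
    energy, clamped to [0, 1] so it behaves like a quality coefficient."""
    snr = audio_info["speech_energy"] / max(audio_info["total_energy"], 1e-9)
    return min(max(snr, 0.0), 1.0)

def process_audio(audio_info):
    """Acquire -> determine coefficient -> check preset condition -> control."""
    if PRESET_KEYWORD not in audio_info["transcript"]:
        return False  # no preset keyword sentence detected, nothing to do
    coeff = compute_recognition_coefficient(audio_info)
    return coeff >= PRESET_COEFFICIENT  # one possible preset condition

# a noisy capture (much interfering audio) vs. a quiet one
noisy = {"transcript": "turn the air conditioner on to 26 degrees",
         "speech_energy": 2.0, "total_energy": 10.0}
quiet = {"transcript": "turn the air conditioner on to 26 degrees",
         "speech_energy": 9.0, "total_energy": 10.0}
```

Under this sketch the noisy capture fails the preset condition (0.2 < 0.7) while the quiet one passes (0.9 >= 0.7).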
In one possible design, determining speech recognition coefficients from the audio information includes:
determining a coefficient determination mode for determining the preset voice recognition coefficient;
determining audio characteristic data corresponding to the audio information, wherein the audio characteristic data comprise voice characteristic data corresponding to the preset keyword sentence, and the audio characteristic data comprise at least one preset audio characteristic and a characteristic value corresponding to each preset audio characteristic;
and determining the voice recognition coefficient according to the audio characteristic data and the coefficient determination mode.
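As a minimal sketch of a "coefficient determination mode", the preset audio features could be combined as a weighted sum; the feature names and weight values below are illustrative assumptions, not values from the patent.

```python
# assumed preset audio features and their weights in the determination mode
PRESET_FEATURES = {"loudness": 0.5, "frequency": 0.3, "timbre": 0.2}

def determine_coefficient(feature_values, weights=PRESET_FEATURES):
    """Weighted sum over the preset audio features; using the same mode
    that produced the preset coefficient keeps the two comparable."""
    return sum(weights[name] * feature_values[name] for name in weights)

coeff = determine_coefficient({"loudness": 0.8, "frequency": 0.6, "timbre": 0.5})
# coeff is approximately 0.68 with these illustrative feature values
```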
In one possible design, after determining whether the speech recognition coefficient and a preset speech recognition coefficient satisfy a preset condition, the method further includes:
if the preset condition is not met, outputting prompt information, wherein the prompt information is used for prompting a user whether to enter voice control;
and obtaining user feedback information aiming at the prompt information, and confirming whether voice control is carried out by the preset keyword sentence or not based on the user feedback information.
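The fallback path above can be sketched as follows; `ask_user` is a hypothetical stand-in for whatever prompt channel the device uses (on-screen text, a spoken question, an indicator light), and the comparison used as the preset condition is an assumption.

```python
def decide(coeff, preset, ask_user):
    """If the preset condition holds, voice control proceeds directly;
    otherwise the user is prompted and the feedback decides the outcome."""
    if coeff >= preset:  # assumed form of the preset condition
        return "voice_control"
    answer = ask_user("Recognition quality is low. Enter voice control anyway?")
    return "voice_control" if answer else "ignore"
```

For example, `decide(0.5, 0.7, ask)` performs voice control only when the user's feedback `ask(...)` is affirmative.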
In one possible design, before determining whether the speech recognition coefficients and preset speech recognition coefficients satisfy a preset condition, the method further includes:
obtaining environmental characteristic information of the environment;
and determining a voice recognition coefficient matched with the environmental characteristic information from a plurality of stored voice recognition coefficients to be called as the preset voice recognition coefficient.
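A sketch of matching a stored coefficient to the environment: the 0.6/0.7/0.8 values echo the thresholds given later in the detailed description, while the dB cutoffs and the label-based classifier are invented for illustration.

```python
STORED_COEFFICIENTS = {"noisy": 0.6, "normal": 0.7, "quiet": 0.8}  # coefficients to be called

def classify_environment(ambient_level_db):
    """Toy classifier from an ambient sound level (dB) to an environment
    label; the numeric thresholds are hypothetical."""
    if ambient_level_db > 70:
        return "noisy"
    if ambient_level_db > 40:
        return "normal"
    return "quiet"

def preset_for_environment(ambient_level_db):
    """Pick the stored coefficient matched to the environment features."""
    return STORED_COEFFICIENTS[classify_environment(ambient_level_db)]
```

A busy street (say 80 dB) would select the lenient 0.6 threshold; a quiet room (30 dB) the strict 0.8.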
In one possible design, the method further includes:
determining that a coefficient adjustment triggering condition is satisfied;
determining a target voice recognition coefficient to be adjusted and corresponding coefficient adjustment information according to the coefficient adjustment triggering condition;
adjusting the target voice recognition coefficient according to the coefficient adjustment information;
and updating the target voice recognition coefficient by the adjusted target voice recognition coefficient.
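The adjustment-and-update steps above can be sketched as follows; representing the coefficient adjustment information as a signed delta, and clamping to [0, 1], are assumptions for illustration.

```python
def adjust_coefficient(target, adjustment, lo=0.0, hi=1.0):
    """Apply the coefficient adjustment information (here a signed delta)
    and clamp to a valid range; the caller then overwrites the stored
    target coefficient with the returned value."""
    return min(max(target + adjustment, lo), hi)

stored = {"noisy": 0.6}                                       # hypothetical stored target
stored["noisy"] = adjust_coefficient(stored["noisy"], -0.05)  # update with the adjusted value
```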
In one possible design, determining that the coefficient adjustment trigger condition is satisfied includes:
acquiring the wake-up duration of each wake-up process in which wake-up control is performed with the target voice recognition coefficient within a first preset time period, wherein one wake-up process is a process of performing the corresponding voice control according to the user's voice;
determining, for all the wake-up processes in the first preset time period, a first number of wake-up processes whose wake-up duration is longer than a preset wake-up duration;
and if the first quantity meets a first constraint condition, determining that the coefficient adjustment triggering condition is met.
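A minimal sketch of this first trigger condition, with the window of wake-up durations, the preset duration, and the first constraint all passed in as hypothetical parameters:

```python
def duration_trigger(wake_durations, preset_duration, first_constraint):
    """Count wake-up processes in the first preset time period whose
    duration exceeds the preset duration; trigger when that count
    satisfies the first constraint condition."""
    first_number = sum(1 for d in wake_durations if d > preset_duration)
    return first_constraint(first_number)

# hypothetical window: four wake-ups, two of which took longer than 3 s
triggered = duration_trigger([1.2, 3.5, 4.0, 0.8],
                             preset_duration=3.0,
                             first_constraint=lambda n: n >= 2)
```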
In one possible design, determining that the coefficient adjustment trigger condition is satisfied includes:
obtaining the number of times the user inputs control voice in each wake-up process in which wake-up control is performed with the target voice recognition coefficient within a second preset time period;
determining, for all the wake-up processes in the second preset time period, a second number of wake-up processes in which the number of times the user inputs control voice is greater than a preset number of times;
and if the second quantity meets a second constraint condition, determining that the coefficient adjustment triggering condition is met.
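The second trigger condition has the same shape, counting wake-ups in which the user had to repeat the control voice; the window, preset number of times, and constraint are hypothetical parameters.

```python
def retry_trigger(input_counts, preset_times, second_constraint):
    """Count wake-up processes in the second preset time period where the
    user input the control voice more than the preset number of times;
    trigger when that count satisfies the second constraint condition."""
    second_number = sum(1 for c in input_counts if c > preset_times)
    return second_constraint(second_number)

# hypothetical window: in two of four wake-ups the user repeated the
# command more than 3 times
triggered = retry_trigger([1, 4, 5, 2], preset_times=3,
                          second_constraint=lambda n: n >= 2)
```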
In a second aspect, an audio processing apparatus is provided, the apparatus comprising:
an obtaining module, configured to obtain audio information, where the audio information comprises voice information containing a preset keyword sentence;
a first determining module, configured to determine a speech recognition coefficient according to the audio information, where the speech recognition coefficient is used to indicate speech quality of the speech information in an environment where the audio information is generated;
the second determining module is used for determining whether the voice recognition coefficient and a preset voice recognition coefficient meet a preset condition or not;
and the control module is used for carrying out voice control according to the preset keyword sentence if the preset condition is met.
In one possible design, the first determining module is configured to:
determining a coefficient determination mode for determining the preset voice recognition coefficient;
determining audio characteristic data corresponding to the audio information, wherein the audio characteristic data comprise voice characteristic data corresponding to the preset keyword sentence, and the audio characteristic data comprise at least one preset audio characteristic and a characteristic value corresponding to each preset audio characteristic;
and determining the voice recognition coefficient according to the audio characteristic data and the coefficient determination mode.
In one possible design, the apparatus further includes a prompt module to:
after the second determining module determines whether the voice recognition coefficient and a preset voice recognition coefficient meet a preset condition, if the preset condition is not met, outputting prompt information, wherein the prompt information is used for prompting a user whether to enter voice control;
and the control module is used for obtaining user feedback information aiming at the prompt information and confirming whether voice control is carried out by the preset keyword sentence or not based on the user feedback information.
In one possible design, the apparatus further includes a third determining module configured to:
obtaining environmental feature information of the environment before the second determining module determines whether the voice recognition coefficient and a preset voice recognition coefficient meet a preset condition;
the first determining module is configured to determine, from the stored multiple speech recognition coefficients to be called, a speech recognition coefficient matched with the environment feature information as the preset speech recognition coefficient.
In one possible design, the apparatus further includes an adjustment module to:
determining that a coefficient adjustment triggering condition is satisfied;
determining a target voice recognition coefficient to be adjusted and corresponding coefficient adjustment information according to the coefficient adjustment triggering condition;
adjusting the target voice recognition coefficient according to the coefficient adjustment information;
and updating the target voice recognition coefficient by the adjusted target voice recognition coefficient.
In one possible design, the adjustment module is configured to:
acquiring the wake-up duration of each wake-up process in which wake-up control is performed with the target voice recognition coefficient within a first preset time period, wherein one wake-up process is a process of performing the corresponding voice control according to the user's voice;
determining, for all the wake-up processes in the first preset time period, a first number of wake-up processes whose wake-up duration is longer than a preset wake-up duration;
and if the first quantity meets a first constraint condition, determining that the coefficient adjustment triggering condition is met.
In one possible design, the adjustment module is configured to:
obtaining the number of times the user inputs control voice in each wake-up process in which wake-up control is performed with the target voice recognition coefficient within a second preset time period;
determining, for all the wake-up processes in the second preset time period, a second number of wake-up processes in which the number of times the user inputs control voice is greater than a preset number of times;
and if the second quantity meets a second constraint condition, determining that the coefficient adjustment triggering condition is met.
In a third aspect, a computing device is provided, the computing device comprising:
a memory for storing program instructions;
a processor for calling the program instructions stored in the memory and executing the steps included in any of the methods of the first aspect according to the obtained program instructions.
In a fourth aspect, there is provided a storage medium storing computer-executable instructions for causing a computer to perform the steps included in any of the methods of the first aspect.
In a fifth aspect, a computer program product containing instructions is provided, which when run on a computer causes the computer to perform the audio processing method described in the various possible implementations described above.
In the embodiments of the present application, after audio information containing user voice information that includes a preset keyword sentence is obtained, a speech recognition coefficient is determined from the obtained audio information, and it is judged whether the determined coefficient and a preset speech recognition coefficient satisfy a preset condition. When they do, the corresponding voice control is performed according to the preset keyword sentence. The speech recognition coefficient indicates the speech quality of the user's voice information in the environment in which the audio information is generated, and the preset speech recognition coefficient is a judgment basis obtained in advance for targeted comparison with that coefficient. That is to say, when the user controls a device by voice to execute a corresponding function, the device can obtain the speech recognition coefficient corresponding to the current environment, and then judge whether to respond to the currently collected user control voice by comparing that coefficient with the preset speech recognition coefficient corresponding to that environment.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present application, and those skilled in the art can obtain other drawings from the provided drawings without creative effort.
Fig. 1 is a schematic diagram of an application scenario provided in an embodiment of the present application;
fig. 2 is a schematic diagram of another application scenario provided in the embodiment of the present application;
fig. 3 is a schematic flowchart of an audio processing method according to an embodiment of the present application;
fig. 4a is a block diagram of an audio processing apparatus according to an embodiment of the present application;
fig. 4b is a block diagram of another structure of an audio processing apparatus according to an embodiment of the present application;
fig. 5 is a schematic structural diagram of a computing device according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions, and advantages of the present application clearer, the technical solutions in the embodiments of the present application are described below clearly and completely with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by a person skilled in the art from the embodiments given herein without creative effort fall within the protection scope of the present application. In the present application, the embodiments and the features of the embodiments may be combined with each other in any manner in the absence of conflict. Also, although a logical order is shown in the flowcharts, in some cases the steps shown or described may be performed in an order different from that given here.
The terms "first" and "second" in the description and claims of the present application and the above-described drawings are used to distinguish between different objects, not to describe a particular order. Furthermore, the term "comprises" and any variations thereof are intended to cover a non-exclusive inclusion. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those steps or elements listed, but may include other steps or elements not listed or inherent to such process, method, article, or apparatus. "Plurality" in the present application may mean at least two, for example two, three, or more, and the embodiments of the present application are not limited in this respect.
In addition, the term "and/or" herein merely describes an association between objects and indicates that three relationships are possible; for example, "A and/or B" may mean: A alone, both A and B, or B alone. The character "/" in this document generally indicates that the related objects before and after it are in an "or" relationship unless otherwise specified.
To facilitate understanding of the technical solutions provided in the embodiments of the present application, the application scenarios in which they are used are briefly described below. It should be noted that these scenarios only illustrate the embodiments of the present application and are not limiting; in a specific implementation, the technical solutions provided in the embodiments of the present application can be applied flexibly according to actual needs.
Referring to fig. 1, fig. 1 shows an application scenario to which the technical solution of the embodiments of the present application can be applied. The scenario includes a plurality of users (e.g., user 1, user 2, and user 3) and a plurality of smart devices (e.g., an air conditioner, a treadmill, and a washing machine). Because the weather is hot, user 1 wants to adjust the room temperature and speaks the control voice "turn the air conditioner on to 26 degrees". When user 1 issues this voice control instruction while other voices and noises exist in the environment, the air conditioner collects user 1's control voice together with all the other audio information. In this case, all audio other than user 1's control voice can be understood as noise interfering with the user's voice control of the air conditioner. For example, the sound generated by the treadmill while user 2 uses it is noise 1, user 2 saying "so tired" while running is noise 2, a child saying "I want to watch cartoons" is noise 3, and the sound of the washing machine in operation is noise 4. When the air conditioner detects the keyword sentence "turn the air conditioner on to 26 degrees" spoken by user 1, it determines the speech recognition coefficient for the current environment from all the collected audio information. If that coefficient exceeds a certain value, user 1's voice control information is considered to have been accurately received, and the air conditioner then adjusts the temperature to 26 degrees according to the voice control information issued by user 1.
Referring to fig. 2, fig. 2 shows another application scenario to which the technical solution of the embodiments of the present application can be applied. Unlike the scenario shown in fig. 1, where much other audio exists in the surrounding environment when user 1 issues the voice control information for the air conditioner, in fig. 2 only user 1 speaks. In this case the environment is quite quiet, and the air conditioner can quickly respond to the user's voice control.
The technical scheme provided by the embodiment of the application is described in the following with the accompanying drawings of the specification.
Referring to fig. 3, fig. 3 is a flowchart of an audio processing method according to an embodiment of the present application; the method may be executed by the air conditioner in the application scenario shown in fig. 1 or fig. 2. In a specific implementation, the method may also be executed by other smart devices, for example a mobile phone, a tablet, a computer, a smart band, or other voice-controllable smart home devices (e.g., a refrigerator, a smart fan, a humidifier). The flow of the audio processing method in fig. 3 is described as follows.
Step 301: audio information is obtained.
The audio information comprises user voice information containing a preset keyword sentence. A preset keyword sentence is the voice content with which the user performs voice control of a particular smart device, so that the device executes the corresponding function when it detects the exact preset keyword sentence uttered by the user. In addition to the voice information output by the user, the audio information includes all other sound in the current environment: for example, voice information of people other than the user who utters the preset keyword sentence, sounds of animals, device audio emitted by various appliances during operation, or environmental audio such as the sound of rain or car horns; the embodiments of the present application are not limited in this respect.
Step 302: and determining a voice recognition coefficient according to the obtained audio information.
The speech recognition coefficient indicates the speech quality of the voice information output by the user in the environment in which the audio information is generated, where that voice information contains the preset keyword sentence. In other words, the speech recognition coefficient reflects the quality of the control speech the user utters when performing voice control of the smart device. For example, when the voice information output by the user suffers heavy interference, the current speech quality is considered poor; conversely, when the user is in a relatively quiet environment, there is little noise interfering with the user's speech, and the speech quality is considered high.
In a possible implementation, after the smart device obtains the audio information, it may determine the audio feature data corresponding to that audio information. The audio feature data include the speech feature data of the user's utterance containing the preset keyword sentence, as well as the noise data in the environment; all audio data other than the speech data of the keyword sentence spoken by the user can be understood as noise data. The audio features, which characterize the audio quality (including the speech features of the user's control speech), may be, for example, timbre, pitch, loudness, duration, frequency, or the user's speaking rate. The preset audio features are the audio features required to determine the speech recognition coefficient: for example, if all audio features are acquired but only timbre, frequency, and loudness are actually needed to calculate the coefficient, those three can be designated as the preset audio features. After determining the preset audio features, the device also needs to determine the feature value corresponding to each preset audio feature so that it can be used when the speech recognition coefficient is determined.
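Two of the feature values mentioned above can be computed directly from raw samples. As an illustrative assumption (not the patent's method), this sketch uses RMS amplitude as a loudness feature and the zero-crossing rate as a rough frequency proxy.

```python
import math

def rms_loudness(samples):
    """Root-mean-square amplitude: a simple loudness feature value."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def zero_crossing_rate(samples):
    """Fraction of adjacent sample pairs whose sign differs; it rises
    with the dominant frequency (and with broadband noise)."""
    crossings = sum(1 for a, b in zip(samples, samples[1:]) if (a < 0) != (b < 0))
    return crossings / (len(samples) - 1)

# a clean 100 Hz tone sampled at 8 kHz, for illustration
tone = [math.sin(2 * math.pi * 100 * n / 8000) for n in range(8000)]
```

For this pure tone the RMS is about 0.707 and the zero-crossing rate about 0.025 (roughly 200 crossings over 7999 sample pairs).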
Determining the speech recognition coefficient also requires the coefficient determination mode used for the preset speech recognition coefficient, i.e., the specific calculation formula that is combined with the aforementioned audio feature data. The preset speech recognition coefficient is a coefficient set in advance for comparison with the speech recognition coefficients obtained in different environments. When dynamically determining a speech recognition coefficient for a given environment, the same calculation mode (i.e., the aforementioned coefficient determination mode) used to calculate the preset coefficient can be used, which ensures that the coefficient calculated in real time and the preset coefficient share the same calculation basis and standard. This keeps the comparison between the two consistent, improves its accuracy, and thereby improves the accuracy of wake-up control.
In the embodiments of the present application, the speech recognition coefficient in the current environment can be calculated through the determined coefficient determination mode and the required audio feature data, which ensures the accuracy of the comparison with the preset speech recognition coefficient in step 303.
Step 303: and judging whether the determined voice recognition coefficient and the preset voice recognition coefficient meet preset conditions.
The preset speech recognition coefficient is a coefficient set in advance for comparison with the speech recognition coefficients obtained in different environments, and the preset condition is the judgment condition applied to the speech recognition coefficient and the preset speech recognition coefficient. The preset condition may be, for example, that the speech recognition coefficient is greater than or equal to the preset coefficient, that it is greater than or equal to (or reaches) a specific percentage of the preset coefficient, or that the difference between the two is less than a predetermined value; the embodiments of the present application are not limited in this respect.
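Three of the preset-condition variants listed above can be written as simple predicates; the 90% percentage and the 0.1 difference bound are invented example values, not values from the patent.

```python
def meets_condition(coeff, preset, mode="ge"):
    """Example forms of the preset condition; constants are hypothetical."""
    if mode == "ge":        # coefficient >= preset coefficient
        return coeff >= preset
    if mode == "percent":   # coefficient reaches 90% of the preset
        return coeff >= 0.9 * preset
    if mode == "diff":      # difference below a predetermined value
        return abs(preset - coeff) < 0.1
    raise ValueError(mode)
```

For instance, with a preset of 0.7, a coefficient of 0.65 fails the "ge" form but passes the "percent" form.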
In a possible implementation, before judging whether the determined speech recognition coefficient and the preset speech recognition coefficient satisfy the preset condition, one of a plurality of pre-stored speech recognition coefficients that fits the current environment may be selected as the preset coefficient used in step 303. That is to say, a plurality of speech recognition coefficients, which may be called speech recognition coefficients to be called, can be stored in the smart device in advance. When voice control based on these coefficients is required, the environmental feature information of the current environment is determined first; this may be, for example, descriptive information that characterizes the current environment, such as environmental sound information or environmental image information. After the environmental feature information is determined, the coefficient to be called that fits the current environment is selected according to that information and used as the preset speech recognition coefficient in step 303.
In a specific embodiment, for example, when a user wakes up a mobile phone on a busy street, the phone's image capture device first captures an image of the surroundings, which may show many people besides the user and possibly many vehicles nearby; it can therefore be considered that the user's current environment is loud and that the interference noise affecting the wake-up of the phone is strong. Environmental sound information is then determined from the obtained audio information, including ambient sounds such as vehicle horns and musical instruments as well as the voices of people other than the user. When a large amount of environmental sound information is obtained, it can likewise be concluded that there is much interference noise in the user's current environment, and the phone can then determine the speech recognition coefficient for that environment according to the obtained environmental feature information.
In a specific implementation manner, again taking waking up a mobile phone as an example, suppose the stored speech recognition coefficients to be called are 0.6, 0.7, and 0.8; these coefficients can also be understood as thresholds. If the user is currently in a very noisy environment, the threshold 0.6 may be called; if the user is resting at home in a quiet room, the threshold 0.8 may be called. In a quiet environment the control speech collected by the phone suffers little interference, so the probability that a wake-up attempt succeeds is higher, and accordingly a larger threshold may be used, which effectively avoids false wake-ups.
In the embodiment of the application, the smart device can select the preset speech recognition coefficient specifically for the environment in which the user performs voice control. This improves the flexibility of scheduling the preset speech recognition coefficient: in different environments the device calls the coefficient adapted to the current environment for the comparison, which helps ensure a high voice recognition wake-up rate.
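The selection described above can be sketched as follows. This is a minimal illustrative sketch, not the patented implementation: the candidate thresholds 0.6/0.7/0.8 follow the example in the text, while the crude RMS-based noise classification and all function names are assumptions added for illustration.

```python
# Hypothetical sketch: choosing a wake-up threshold (the "preset speech
# recognition coefficient") that matches the current ambient noise level.
CANDIDATE_COEFFICIENTS = {
    "noisy": 0.6,   # e.g. a busy street: lower threshold so genuine
                    # wake-up speech is not rejected
    "normal": 0.7,
    "quiet": 0.8,   # e.g. a quiet room: higher threshold to avoid
                    # false wake-ups
}

def classify_environment(ambient_rms: float) -> str:
    """Map a crude ambient-loudness measure to a noise category (assumed cutoffs)."""
    if ambient_rms > 0.5:
        return "noisy"
    if ambient_rms > 0.1:
        return "normal"
    return "quiet"

def select_preset_coefficient(ambient_rms: float) -> float:
    """Return the stored coefficient adapted to the current environment."""
    return CANDIDATE_COEFFICIENTS[classify_environment(ambient_rms)]
```

In practice the environment feature information could combine audio and image cues; the single loudness value here only stands in for that richer description.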
In a possible implementation manner, when the smart device determines that the speech recognition coefficient determined in the current environment and the preset speech recognition coefficient do not satisfy the preset condition, the device may output prompt information to the user. The prompt asks whether the user wants the device to proceed with voice control; after receiving it, the user can perform a corresponding operation to control the device. The prompt information may, for example, be text displayed on the device's screen, a voice-response question asking whether to perform voice control, the flashing of a dedicated indicator light, or some other form of prompt that can alert the user; the embodiment of the present application is not limited in this respect. The user's response to the prompt gives the smart device feedback information, from which the device determines whether to perform voice control according to the preset keyword sentence.
Taking the application scenario shown in fig. 1 as an example: in the current environment, besides the control voice output by user 1 to control the air conditioner, there is considerable interference noise. Because the environment is noisy, the voice quality of user 1 may be poor, so the air conditioner determines that the condition for executing the temperature adjustment function is not currently satisfied and therefore outputs prompt information. For example, using a voice response, the air conditioner confirms the request by broadcasting "Should the temperature be adjusted to 26 degrees?"; the user replies "need", and the air conditioner then executes the corresponding function. In particular embodiments, the voice input by the user may also be "yes", "OK", or a similar affirmative word, any of which confirms that the user wants voice control to proceed.
In the embodiment of the application, when the preset condition is not met, the device can give prompt information, for example by voice response, asking the user whether voice control is needed. This not only ensures the accuracy of the smart device's voice control but also makes the device more intelligent.
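The confirmation flow above can be sketched as a small function. This is an assumed illustration only: the affirmative replies mirror the "need"/"yes" examples in the text, while the callback-based structure and all names are hypothetical.

```python
# Illustrative sketch of the confirmation step: when the computed
# coefficient fails the preset condition, prompt the user and act on
# the feedback instead of silently dropping the command.
AFFIRMATIVE = {"need", "yes", "ok"}   # assumed set of affirmative replies

def confirm_and_execute(coefficient: float, preset: float,
                        ask_user, execute) -> bool:
    """Return True if the voice command ends up being executed."""
    if coefficient >= preset:          # preset condition satisfied
        execute()
        return True
    # Condition not met: ask the user (e.g. by voice response).
    reply = ask_user("Should the temperature be adjusted to 26 degrees?")
    if reply.strip().lower() in AFFIRMATIVE:
        execute()                      # user confirmed: proceed anyway
        return True
    return False                       # no confirmation: do nothing
```

Here `ask_user` stands in for whatever prompt channel the device uses (screen text, voice broadcast, indicator light plus button), and `execute` for the keyword-sentence action.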
In this embodiment of the present application, a pre-stored speech recognition coefficient may be adaptively adjusted. The pre-stored coefficient that needs adjustment is referred to as the target speech recognition coefficient; it may be the preset speech recognition coefficient in step 303 or another speech recognition coefficient pre-stored in the smart device, and this embodiment of the present application is not limited in this respect. Before adjusting the target speech recognition coefficient, the smart device first determines whether a coefficient adjustment triggering condition is satisfied. This condition is what triggers the device to adjust the target speech recognition coefficient; when it is satisfied, some target speech recognition coefficient in the device needs to be adjusted.
In a specific implementation process, when a coefficient adjustment triggering condition is met, the smart device can determine, according to that condition, the target speech recognition coefficient to be adjusted and the corresponding coefficient adjustment information. The device can then adjust the target coefficient according to the adjustment information and replace the original target coefficient with the adjusted one. A user may also manually adjust the speech recognition coefficient of the smart device as needed; the embodiment of the present application is not limited in this respect.
In the embodiment of the application, when a certain condition is met, the smart device can adaptively adjust a pre-stored speech recognition coefficient, flexibly tuning the coefficient used as the judgment reference. This keeps the coefficient accurate and reasonable and, to a certain extent, improves the device's voice recognition wake-up rate for the user's speech.
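The adjust-and-update step can be sketched as below. The text does not specify the form of the coefficient adjustment information, so modeling it as a signed delta with clamping to [0, 1] is an assumption made purely for illustration.

```python
# Minimal sketch of the adaptive-adjustment step: once a trigger
# condition fires, the target coefficient is changed according to the
# adjustment information (modeled here as a signed delta) and the
# stored value is replaced by the adjusted one.
def adjust_coefficient(stored: dict, target_key: str, delta: float,
                       lo: float = 0.0, hi: float = 1.0) -> float:
    """Apply `delta` to the target coefficient, clamp it, and persist it."""
    adjusted = min(hi, max(lo, stored[target_key] + delta))
    stored[target_key] = adjusted      # update with the adjusted coefficient
    return adjusted
```

A manual adjustment by the user, as mentioned above, would simply call the same update path with a user-chosen value.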
In particular implementations, there are at least two ways of determining that the coefficient adjustment trigger condition is satisfied.
First determining method
The smart device may record the wake-up duration observed under the target speech recognition coefficient each time the user performs wake-up control, and then examine those durations over a fixed period (referred to here as the first preset time period). Since the target speech recognition coefficient does not need frequent adjustment, the period may be set to, say, 15 days or 1 month. Each instance in which the smart device performs corresponding voice control according to the user's speech is understood as one wake-up process. For all wake-up processes within the first preset time period, the device determines the number of processes (referred to as the first number) whose wake-up duration exceeds a preset wake-up duration; the preset wake-up duration may be defined by the user or set automatically by the smart device, and the embodiments of the present application are not limited in this respect. When the first number determined by the smart device satisfies a certain condition, referred to as the first constraint condition, it is determined that the coefficient adjustment trigger condition is satisfied. The first constraint condition may be that the first number reaches a predetermined count, that its proportion of the total number of wake-up processes reaches a predetermined ratio, or some other implementable condition; the embodiment of the present application is not limited.
In one possible embodiment, again taking waking up a mobile phone as an example, suppose the first preset time period is 15 days, during which the user wakes up the phone 100 times with the preset speech recognition coefficient 0.7, and the preset wake-up duration is 2 seconds. The first constraint condition is defined as being met when more than 80 of the wake-ups in those 15 days take longer than 2 seconds; here 80 is the predetermined count corresponding to the current first preset time period. If 85 wake-ups are found to exceed 2 seconds, that 85 is the first number, and since it exceeds the predetermined count, the speech recognition coefficient 0.7 needs to be adjusted. It should be noted that different first preset time periods may require different predetermined counts.
In another possible embodiment, continuing with the mobile-phone example, suppose the first preset time period is again 15 days, during which the user wakes up the phone 100 times with the preset speech recognition coefficient 0.7, and the preset wake-up duration is again 2 seconds. If 90 of those wake-ups take longer than 2 seconds, then 90 is the current first number, and its proportion of the total number of wake-up processes is 90%. If it is predetermined that the first constraint condition is satisfied when this proportion reaches 80%, then the speech recognition coefficient 0.7 needs to be adjusted. It should be noted that when the first preset time period changes, the predetermined proportion may be kept unchanged or adjusted as needed; the embodiment of the present application is not limited in this respect.
In the embodiment of the application, whether the speech recognition coefficient needs to be adjusted is determined from the wake-up durations observed while the user performs voice control of the smart device, which helps keep the device's voice recognition wake-up rate for the user's speech high.
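The first determining method can be sketched directly from the two worked examples above (80 wake-ups as an absolute count, or 80% as a proportion). The function below is an illustrative reading of that logic, not the patented implementation; parameter names are assumptions.

```python
# Sketch of the first trigger test: over a window of wake-up durations
# (in seconds), count how many exceeded the preset wake-up duration
# (the "first number") and compare it against either an absolute count
# or a proportion of all wake-up processes.
def first_trigger(durations, preset_duration=2.0,
                  min_count=None, min_ratio=None) -> bool:
    """Return True when the first constraint condition is met."""
    first_number = sum(1 for d in durations if d > preset_duration)
    if min_count is not None and first_number >= min_count:
        return True                    # absolute-count form of the constraint
    if min_ratio is not None and durations and \
            first_number / len(durations) >= min_ratio:
        return True                    # proportion form of the constraint
    return False
```

With 85 of 100 wake-ups over 2 seconds this fires against `min_count=80`, matching the first worked example; with 90 of 100 it fires against `min_ratio=0.8`, matching the second.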
Second determination method
The smart device may record, for each wake-up process performed under the target speech recognition coefficient, the number of times the user had to input the control speech, and then examine these counts over a fixed period (referred to here as the second preset time period), which may likewise be set to 15 days or 1 month. Because different voice control scenarios generally use different control speech, the control speech contains the keyword sentence corresponding to the scenario being controlled each time.
In a specific implementation manner, for all the wake-up processes in the second preset time period, the smart device determines the number of processes (referred to as the second number) in which the user input the control speech more times than a preset count; the preset count may be defined by the user or set automatically by the smart device from historical data, and the embodiment of the present application is likewise not limited. When the determined second number meets a specific condition, referred to as the second constraint condition, it may be determined that the coefficient adjustment triggering condition is met. The second constraint condition may be that the second number reaches a predetermined count, that its proportion of the total number of wake-up processes reaches a predetermined ratio, or some other implementable condition; this can be understood by analogy with the examples in the first determination method and is not repeated here.
In the embodiment of the application, whether the speech recognition coefficient needs to be adjusted is determined from the number of times the user must input the control speech during voice control of the smart device, which helps keep the device's voice recognition wake-up rate for the user's speech responsive.
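The second determining method mirrors the first, but counts repeated control-speech inputs per wake-up process instead of durations. The sketch below shows only the proportion form of the constraint; the threshold values are assumptions for illustration.

```python
# Sketch of the second trigger test: for each wake-up process, record
# how many times the user had to input the control speech; the
# "second number" is how many processes exceeded the preset count.
def second_trigger(inputs_per_wakeup, preset_inputs=1,
                   min_ratio=0.8) -> bool:
    """Return True when the second constraint condition (ratio form) is met."""
    if not inputs_per_wakeup:
        return False
    second_number = sum(1 for n in inputs_per_wakeup if n > preset_inputs)
    return second_number / len(inputs_per_wakeup) >= min_ratio
```

Needing more than one input per wake-up suggests the device failed to recognize the first attempt, which is why a high proportion of such processes is treated as a signal to adjust the coefficient.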
Step 304: and when the determined voice recognition coefficient and the preset voice recognition coefficient meet the preset condition, performing voice control according to the preset keyword sentence.
For example, taking the application scenario shown in fig. 1 as an example, after the user 1 sends a control voice of "air conditioner is turned on to 26 degrees" to the air conditioner, the air conditioner executes the function in the control voice to adjust the temperature to 26 degrees.
In the embodiment of the application, the speech recognition coefficient can be dynamically obtained for different environments and compared with the preset speech recognition coefficient; the smart device judges from the comparison result whether the current user's control speech should be acted on, and then executes the corresponding function.
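Putting steps 303 and 304 together, the decision can be sketched as below. This is an assumed end-to-end illustration only: the score stands in for the coefficient derived from the audio, and the handler table, command names, and fallback string are hypothetical.

```python
# End-to-end sketch: compare the coefficient determined from the audio
# against the environment-adapted preset coefficient, and execute the
# keyword-sentence command only when the preset condition holds;
# otherwise fall back to prompting the user.
def process_audio(score: float, preset: float, command: str,
                  handlers: dict) -> str:
    """Dispatch the recognized command, or signal that a prompt is needed."""
    if score >= preset and command in handlers:
        return handlers[command]()     # e.g. adjust the air conditioner
    return "prompt-user"               # preset condition not met

# Hypothetical command table for the fig. 1 scenario.
handlers = {"set-26": lambda: "temperature=26"}
```

In the fig. 1 scenario, "air conditioner is turned on to 26 degrees" would map to the `"set-26"` entry and run its handler when the condition is satisfied.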
Based on the same inventive concept, the embodiment of the present application provides an audio processing apparatus that can implement the functions of the foregoing audio processing method. The audio processing apparatus may be a hardware structure, a software module, or a combination of the two, and may be implemented by a chip system, which may consist of a chip alone or of a chip together with other discrete devices. Referring to fig. 4a, the audio processing apparatus includes an obtaining module 401, a first determining module 402, a second determining module 403, and a control module 404. Wherein:
an obtaining module 401, configured to obtain audio information, where the audio information includes voice information including a preset keyword sentence;
a first determining module 402, configured to determine a speech recognition coefficient according to the audio information, where the speech recognition coefficient is used to indicate speech quality of the speech information in an environment where the audio information is generated;
a second determining module 403, configured to determine whether the speech recognition coefficient and the preset speech recognition coefficient satisfy a preset condition;
and the control module 404 is configured to perform voice control according to a preset keyword sentence if a preset condition is met.
In one possible implementation, the first determining module 402 is configured to:
determining a coefficient determination mode for determining a preset voice recognition coefficient;
determining audio characteristic data corresponding to the audio information, wherein the audio characteristic data comprise voice characteristic data corresponding to the preset keyword sentence, and the audio characteristic data comprise at least one preset audio characteristic and a characteristic value corresponding to each preset audio characteristic;
and determining a voice recognition coefficient according to the audio characteristic data and the coefficient determination mode.
In a possible implementation manner, please refer to fig. 4b, the audio processing apparatus in the embodiment of the present application further includes a prompt module 405, configured to:
after the second determining module 403 determines whether the voice recognition coefficient and the preset voice recognition coefficient satisfy the preset condition, if the preset condition is not satisfied, outputting a prompt message, where the prompt message is used to prompt a user whether to enter voice control;
and the control module 404 is configured to obtain user feedback information for the prompt information, and determine whether to perform voice control with a preset keyword sentence based on the user feedback information.
In a possible implementation manner, please refer to fig. 4b, the audio processing apparatus in the embodiment of the present application further includes a third determining module 406, configured to:
before the second determining module 403 determines whether the speech recognition coefficient and the preset speech recognition coefficient satisfy the preset condition, obtaining environmental feature information of the environment;
a first determining module 402, configured to determine, from the stored multiple speech recognition coefficients to be invoked, a speech recognition coefficient that matches the environment feature information as a preset speech recognition coefficient.
In a possible implementation manner, please refer to fig. 4b, the audio processing apparatus in the embodiment of the present application further includes an adjusting module 407 for:
determining that a coefficient adjustment triggering condition is satisfied;
determining a target voice recognition coefficient to be adjusted and corresponding coefficient adjustment information according to the coefficient adjustment triggering condition;
adjusting a target voice recognition coefficient according to the coefficient adjustment information;
and updating the target voice recognition coefficient by the adjusted target voice recognition coefficient.
In a possible implementation, the adjusting module 407 is configured to:
acquiring the awakening duration of each awakening process for awakening control by using the target voice recognition coefficient in a first preset time period; wherein, the primary awakening process is a process of carrying out corresponding voice control according to the voice of the user;
determining a first number of the awakening processes with awakening duration longer than the preset awakening duration for all the awakening processes in a first preset time period;
and if the first quantity meets the first constraint condition, determining that a coefficient adjustment trigger condition is met.
In a possible implementation, the adjusting module 407 is configured to:
acquiring the times of control voice input by a user in each awakening process of awakening control by using the target voice recognition coefficient within a second preset time period;
determining a second number of the awakening processes with the times of the control voice input by the user being greater than the preset times for all the awakening processes in a second preset time period;
and if the second quantity meets the second constraint condition, determining that the coefficient adjustment triggering condition is met.
For all relevant details of the steps in the audio processing method embodiment, reference may be made to the functional description of the corresponding functional module of the audio processing apparatus in the embodiment of the present application; these details are not repeated here.
The division of the modules in the embodiments of the present application is schematic and represents only one logical division of function; in actual implementation, other divisions are possible. In addition, each functional module in each embodiment of the present application may be integrated into one processor, may exist alone physically, or two or more modules may be integrated into one module. An integrated module may be implemented in hardware or as a software functional module.
Based on the same inventive concept, the present application provides a computing device, which may be, for example, a smart device as mentioned in the foregoing embodiments, such as a mobile phone, a tablet, a computer, a smart band, or a voice-controllable smart home device (e.g., an air conditioner, a smart speaker, a refrigerator, a smart fan, or a humidifier). Referring to fig. 5, the computing device includes at least one processor 501 and a memory 502 connected to the at least one processor. This embodiment of the application does not limit the specific connection medium between the processor 501 and the memory 502; in fig. 5 they are connected by a bus 500, represented by a thick line, and the connections between other components are only schematically illustrated and not limiting. The bus 500 may be divided into an address bus, a data bus, a control bus, and so on; for ease of illustration it is shown with only a single thick line in fig. 5, but this does not mean there is only one bus or one type of bus.
The computing device in the embodiment of the present application may further include a communication interface 503, where the communication interface 503 is, for example, a network interface, and the computing device may receive data or transmit data through the communication interface 503.
In the embodiment of the present application, the memory 502 stores instructions executable by the at least one processor 501, and the at least one processor 501 may execute the steps included in the foregoing audio processing method by executing the instructions stored in the memory 502.
The processor 501 is a control center of the computing device, and may connect various parts of the entire device by using various interfaces and lines, and perform various functions and process data of the computing device by operating or executing instructions stored in the memory 502 and calling data stored in the memory 502, thereby performing overall monitoring on the computing device. Optionally, the processor 501 may include one or more processing units, and the processor 501 may integrate an application processor and a modem processor, wherein the application processor mainly handles operating systems, application programs, and the like, and the modem processor mainly handles wireless communication. It will be appreciated that the modem processor described above may not be integrated into the processor 501. In some embodiments, processor 501 and memory 502 may be implemented on the same chip, or in some embodiments, they may be implemented separately on separate chips.
The processor 501 may be a general-purpose processor, such as a Central Processing Unit (CPU), digital signal processor, application specific integrated circuit, field programmable gate array or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any combination thereof, that may implement or perform the methods, steps, and logic blocks disclosed in embodiments of the present application. A general purpose processor may be a microprocessor or any conventional processor or the like. The steps of the audio processing method disclosed in connection with the embodiments of the present application may be directly implemented by a hardware processor, or implemented by a combination of hardware and software modules in the processor.
Memory 502, as a non-volatile computer-readable storage medium, may be used to store non-volatile software programs, non-volatile computer-executable programs, and modules. The memory 502 may include at least one type of storage medium, for example a flash memory, a hard disk, a multimedia card, a card-type memory, a Random Access Memory (RAM), a Static Random Access Memory (SRAM), a Programmable Read Only Memory (PROM), a Read Only Memory (ROM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), a magnetic memory, a magnetic disk, or an optical disk. The memory 502 may also be, without limitation, any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer. The memory 502 in the embodiments of the present application may also be circuitry or any other device capable of performing a storage function, for storing program instructions and/or data.
By programming the processor 501, the code corresponding to the audio processing method described in the foregoing embodiment may be solidified in the chip, so that the chip can execute the steps of the audio processing method when running, and how to program the processor 501 is a technique known by those skilled in the art, and is not described herein again.
Based on the same inventive concept, the present application also provides a storage medium storing computer instructions, which when executed on a computer, cause the computer to perform the steps of the audio processing method as described above.
In some possible embodiments, the various aspects of the audio processing method provided in the present application may also be implemented in the form of a program product including program code; when the program product runs on a smart device, the program code causes the smart device to perform the steps in the audio processing method according to the various exemplary embodiments of the present application described above in this specification.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present application without departing from the spirit and scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims of the present application and their equivalents, the present application is intended to include such modifications and variations as well.

Claims (10)

1. A method of audio processing, the method comprising:
acquiring audio information, wherein the audio information comprises voice information containing preset keyword sentences;
determining a speech recognition coefficient according to the audio information, wherein the speech recognition coefficient is used for indicating the speech quality of the speech information in the environment for generating the audio information;
determining whether the voice recognition coefficient and a preset voice recognition coefficient meet a preset condition;
and if the preset condition is met, performing voice control according to the preset keyword sentence.
2. The method of claim 1, wherein determining speech recognition coefficients from the audio information comprises:
determining a coefficient determination mode for determining the preset voice recognition coefficient;
determining audio characteristic data corresponding to the audio information, wherein the audio characteristic data comprise voice characteristic data corresponding to the preset keyword sentence, and the audio characteristic data comprise at least one preset audio characteristic and a characteristic value corresponding to each preset audio characteristic;
and determining the voice recognition coefficient according to the audio characteristic data and the coefficient determination mode.
3. The method of claim 1 or 2, wherein after determining whether the speech recognition coefficient and a preset speech recognition coefficient satisfy a preset condition, the method further comprises:
if the preset condition is not met, outputting prompt information, wherein the prompt information is used for prompting a user whether to enter voice control;
and obtaining user feedback information aiming at the prompt information, and confirming whether voice control is carried out by the preset keyword sentence or not based on the user feedback information.
4. The method of claim 1 or 2, wherein prior to determining whether the speech recognition coefficients and preset speech recognition coefficients satisfy a preset condition, the method further comprises:
obtaining environmental characteristic information of the environment;
and determining a voice recognition coefficient matched with the environmental characteristic information from a plurality of stored voice recognition coefficients to be called as the preset voice recognition coefficient.
5. The method of claim 1 or 2, wherein the method further comprises:
determining that a coefficient adjustment triggering condition is satisfied;
determining a target voice recognition coefficient to be adjusted and corresponding coefficient adjustment information according to the coefficient adjustment triggering condition;
adjusting the target voice recognition coefficient according to the coefficient adjustment information;
and updating the target voice recognition coefficient by the adjusted target voice recognition coefficient.
6. The method of claim 5, wherein determining that a coefficient adjustment trigger condition is satisfied comprises:
acquiring the awakening duration of each awakening process in which awakening control is performed with the target speech recognition coefficient within a first preset time period; wherein one awakening process is a process of performing corresponding voice control according to the voice of the user;
determining, among all the awakening processes in the first preset time period, a first number of awakening processes whose awakening duration is longer than a preset awakening duration;
and if the first number satisfies a first constraint condition, determining that the coefficient adjustment triggering condition is satisfied.
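The counting logic of claim 6 can be sketched as follows; the 5-second preset duration and the threshold of 3 for the first constraint condition are illustrative assumptions:

```python
def trigger_by_duration(wake_durations_s, preset_duration_s=5.0, first_constraint=3):
    """Within the first preset time period, count the awakening
    processes whose duration exceeds the preset awakening duration;
    the trigger condition is met when that count (the "first number")
    reaches the first constraint. Thresholds are illustrative."""
    first_number = sum(1 for d in wake_durations_s if d > preset_duration_s)
    return first_number >= first_constraint

# Three of these four wake-ups exceed 5 s, so adjustment is triggered.
triggered = trigger_by_duration([6.2, 7.1, 3.0, 8.4])
```

Long wake-ups suggest the device is struggling to recognize the user, which is why they drive a coefficient adjustment.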
7. The method of claim 5, wherein determining that a coefficient adjustment trigger condition is satisfied comprises:
obtaining the number of times a user inputs control voice in each awakening process in which awakening control is performed with the target speech recognition coefficient within a second preset time period;
determining, among all the awakening processes in the second preset time period, a second number of awakening processes in which the number of times the user inputs control voice is greater than a preset number of times;
and if the second number satisfies a second constraint condition, determining that the coefficient adjustment triggering condition is satisfied.
8. An audio processing apparatus, characterized in that the apparatus comprises:
an obtaining module, configured to obtain audio information, wherein the audio information comprises voice information containing a preset keyword sentence;
a first determining module, configured to determine a speech recognition coefficient according to the audio information, where the speech recognition coefficient is used to indicate speech quality of the speech information in an environment where the audio information is generated;
a second determining module, configured to determine whether the speech recognition coefficient and a preset speech recognition coefficient satisfy a preset condition;
and a control module, configured to perform voice control according to the preset keyword sentence if the preset condition is satisfied.
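The four modules of claim 8 form a simple pipeline. A sketch as a class, where the coefficient computation (an SNR-based placeholder) and the dictionary-shaped audio information are assumptions, not the patent's actual method:

```python
class AudioProcessingApparatus:
    """Sketch of the obtaining, first determining, second determining,
    and control modules of claim 8."""

    def __init__(self, preset_coefficient: float, preset_keyword: str):
        self.preset_coefficient = preset_coefficient
        self.preset_keyword = preset_keyword

    def obtain(self, audio_info: dict) -> dict:
        # Obtaining module: receive the audio information.
        return audio_info

    def determine_coefficient(self, audio_info: dict) -> float:
        # First determining module. Placeholder: derive a quality
        # coefficient from an assumed signal-to-noise figure.
        return min(1.0, audio_info.get("snr_db", 0) / 30.0)

    def condition_met(self, coefficient: float) -> bool:
        # Second determining module: compare against the preset coefficient.
        return coefficient >= self.preset_coefficient

    def control(self, audio_info: dict):
        # Control module: act on the preset keyword sentence if allowed.
        c = self.determine_coefficient(self.obtain(audio_info))
        return f"execute: {self.preset_keyword}" if self.condition_met(c) else None
```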
9. A computing device, wherein the computing device comprises:
a memory for storing program instructions;
a processor, configured to call the program instructions stored in said memory and, in accordance with the obtained program instructions, perform the steps of the method of any one of claims 1 to 7.
10. A storage medium storing computer-executable instructions for causing a computer to perform the steps of the method of any one of claims 1 to 7.
CN201910958253.2A 2019-10-10 2019-10-10 Audio processing method and device, computing equipment and storage medium Active CN110782891B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910958253.2A CN110782891B (en) 2019-10-10 2019-10-10 Audio processing method and device, computing equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910958253.2A CN110782891B (en) 2019-10-10 2019-10-10 Audio processing method and device, computing equipment and storage medium

Publications (2)

Publication Number Publication Date
CN110782891A true CN110782891A (en) 2020-02-11
CN110782891B CN110782891B (en) 2022-02-18

Family

ID=69384949

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910958253.2A Active CN110782891B (en) 2019-10-10 2019-10-10 Audio processing method and device, computing equipment and storage medium

Country Status (1)

Country Link
CN (1) CN110782891B (en)


Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2011170266A (en) * 2010-02-22 2011-09-01 Secom Co Ltd Identification device and utterance detector
KR101698369B1 (en) * 2015-11-24 2017-01-20 주식회사 인텔로이드 Method and apparatus for information providing using user speech signal
CN106815507A (en) * 2015-11-30 2017-06-09 中兴通讯股份有限公司 Voice wakes up implementation method, device and terminal
CN107610695A (en) * 2017-08-08 2018-01-19 问众智能信息科技(北京)有限公司 Driver's voice wakes up the dynamic adjusting method of instruction word weight
CN107786709A (en) * 2017-11-09 2018-03-09 广东欧珀移动通信有限公司 Call noise-reduction method, device, terminal device and computer-readable recording medium
CN108847219A (en) * 2018-05-25 2018-11-20 四川斐讯全智信息技术有限公司 A kind of wake-up word presets confidence threshold value adjusting method and system
CN109074808A (en) * 2018-07-18 2018-12-21 深圳魔耳智能声学科技有限公司 Sound control method, control device and storage medium
CN109243431A (en) * 2017-07-04 2019-01-18 阿里巴巴集团控股有限公司 A kind of processing method, control method, recognition methods and its device and electronic equipment
CN109920418A (en) * 2019-02-20 2019-06-21 北京小米移动软件有限公司 Adjust the method and device of wakeup sensitivity


Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113722612A (en) * 2020-05-26 2021-11-30 上海惠芽信息技术有限公司 Friend-making method, server, user terminal and computer storage medium
CN112437339A (en) * 2020-11-10 2021-03-02 深圳Tcl新技术有限公司 Audio output control method and device, smart television and storage medium
CN112581960A (en) * 2020-12-18 2021-03-30 北京百度网讯科技有限公司 Voice wake-up method and device, electronic equipment and readable storage medium
CN115077161A (en) * 2021-03-10 2022-09-20 松下电器研究开发(苏州)有限公司 Refrigerator and intelligent refrigerator system
CN115077161B (en) * 2021-03-10 2024-04-23 松下电器研究开发(苏州)有限公司 Refrigerator and intelligent refrigerator system
WO2022222045A1 (en) * 2021-04-20 2022-10-27 华为技术有限公司 Speech information processing method, and device
CN116682414A (en) * 2023-06-06 2023-09-01 安徽迪科数金科技有限公司 Dialect voice recognition system based on big data
CN116682414B (en) * 2023-06-06 2024-01-30 安徽迪科数金科技有限公司 Dialect voice recognition system based on big data

Also Published As

Publication number Publication date
CN110782891B (en) 2022-02-18

Similar Documents

Publication Publication Date Title
CN110782891B (en) Audio processing method and device, computing equipment and storage medium
CN111223497B (en) Nearby wake-up method and device for terminal, computing equipment and storage medium
CN106782536B (en) Voice awakening method and device
CN111508474B (en) Voice interruption method, electronic equipment and storage device
CN111161728B (en) Awakening method, awakening device, awakening equipment and awakening medium of intelligent equipment
CN111968644B (en) Intelligent device awakening method and device and electronic device
CN109284080B (en) Sound effect adjusting method and device, electronic equipment and storage medium
CN108469966A (en) Voice broadcast control method and device, intelligent device and medium
CN108335700B (en) Voice adjusting method and device, voice interaction equipment and storage medium
CN108932944B (en) Decoding method and device
CN109218535A (en) Intelligence adjusts method, apparatus, storage medium and the terminal of volume
CN111312222A (en) Awakening and voice recognition model training method and device
CN114360527B (en) Vehicle-mounted voice interaction method, device, equipment and storage medium
CN111261151A (en) Voice processing method and device, electronic equipment and storage medium
CN110570850A (en) Voice control method, device, computer equipment and storage medium
CN108847243B (en) Voiceprint feature updating method and device, storage medium and electronic equipment
CN111343497A (en) Sound effect adjusting method of playing equipment, playing equipment and storage medium
CN111583933A (en) Voice information processing method, device, equipment and medium
CN112634897B (en) Equipment awakening method and device, storage medium and electronic device
CN112185382A (en) Method, device, equipment and medium for generating and updating wake-up model
EP3793275B1 (en) Location reminder method and apparatus, storage medium, and electronic device
CN112233676A (en) Intelligent device awakening method and device, electronic device and storage medium
CN108922523B (en) Position prompting method and device, storage medium and electronic equipment
CN111124512B (en) Awakening method, device, equipment and medium for intelligent equipment
CN112786047B (en) Voice processing method, device, equipment, storage medium and intelligent sound box

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant