CN115691497B - Voice control method, device, equipment and medium - Google Patents

Voice control method, device, equipment and medium

Info

Publication number
CN115691497B
Authority
CN
China
Prior art keywords
voice
state
keyword
vocabulary
content
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310005289.5A
Other languages
Chinese (zh)
Other versions
CN115691497A (en)
Inventor
陈建泽
Current Assignee
Shenzhen Dajing Photoelectric Technology Co ltd
Original Assignee
Shenzhen Dajing Photoelectric Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Shenzhen Dajing Photoelectric Technology Co ltd
Priority to CN202310005289.5A
Publication of CN115691497A
Application granted
Publication of CN115691497B
Legal status: Active

Abstract

The present application relates to the field of voice control technologies, and in particular, to a voice control method, apparatus, device, and medium. The method comprises the following steps: acquiring voice content; extracting a keyword from the voice content; acquiring the voice state of the keyword, wherein the voice state comprises the tone and pauses in the voice content; calculating a voice similarity from the voice state of the keyword and a historical voice state, and if the voice similarity is greater than a preset voice similarity threshold, determining the keyword to be a valid word, wherein a valid word is a word that can trigger play interruption or voice wake-up; acquiring the current state of the product, wherein the current state at least comprises a playing state and a dormant state; if the current state is the playing state, interrupting play of the product according to the valid word; and if the current state is the dormant state, performing voice wake-up on the product according to the valid word. The application has the following effect: the accuracy of controlling the product to perform voice wake-up or play interruption is improved.

Description

Voice control method, device, equipment and medium
Technical Field
The present application relates to the field of voice control technologies, and in particular, to a voice control method, apparatus, device, and medium.
Background
With the rapid development of Internet technology, intelligent voice products have become increasingly popular and have contributed greatly to improving people's living standards.
In the related art, an intelligent voice product in a dormant or playing state is controlled by keywords extracted from the user's voice content. However, a keyword may appear in the voice content while the user is merely chatting, with no intention of controlling the product; the product nonetheless responds indiscriminately to every extracted keyword, so control of the intelligent voice product is inaccurate.
Therefore, how to accurately control the intelligent voice product is a technical problem to be solved urgently by those skilled in the art.
Disclosure of Invention
In order to improve the accuracy of controlling a product to perform voice wakeup or play interruption, the application provides a voice control method, device, equipment and medium.
In a first aspect, the present application provides a voice control method, which adopts the following technical scheme:
a method of voice control, comprising:
acquiring voice content;
extracting key words in the voice content;
acquiring a voice state of the keyword, wherein the voice state comprises: the tone and pauses in the voice content;
calculating a voice similarity from the voice state of the keyword and a historical voice state, and if the voice similarity is greater than a preset voice similarity threshold, determining the keyword to be a valid word, wherein a valid word is a word that can trigger play interruption or voice wake-up;
acquiring the current state of a product, wherein the current state at least comprises: a playing state and a dormant state;
if the current state is the playing state, interrupting play of the product according to the valid word;
and if the current state is a dormant state, performing voice awakening on the product according to the effective word.
By adopting this technical scheme, the voice state of the keyword extracted from the voice content is acquired and compared with the historical voice state. If the voice similarity is greater than the preset voice similarity threshold, the keyword is determined to be a valid word. When the current state of the product is the playing state, play of the product is interrupted according to the valid word; when the current state is the dormant state, voice wake-up is performed according to the valid word. By judging whether the extracted keyword is a valid word and acting only on valid words, the product is prevented from responding indiscriminately to every extracted keyword, and the accuracy of controlling the product to perform voice wake-up or play interruption is improved.
The present application may be further configured in a preferred example to: the extracting of the keyword from the voice content comprises:
performing semantic analysis on the voice content to obtain a plurality of initial vocabularies;
judging whether any vocabulary identical to a preset keyword exists among the plurality of initial vocabularies;
and if so, taking the vocabulary identical to the preset keyword among the plurality of initial vocabularies as the keyword.
By adopting this technical scheme, semantic analysis is performed on the voice content to obtain a plurality of initial vocabularies, and it is judged whether any of them is identical to a preset keyword; if so, that vocabulary is taken as the keyword. Because the extracted initial vocabularies are checked against the preset keywords, the accuracy of keyword extraction from the voice content can be improved.
The present application may be further configured in a preferred example to: the current state further includes a video call state, and the voice content includes user voice and environmental noise; after acquiring the current state of the product, the method further includes:
if the current state of the product is a video call state, performing sound denoising processing on the voice content according to noise content to obtain processed sound;
and playing the processed sound.
By adopting this technical scheme, when the current state of the product is the video call state, noise cancellation is performed according to the acquired noise content and the user voice, and the processed sound is played, avoiding noise interference during the video call and improving the user experience.
The present application may be further configured in a preferred example to: the denoising processing is performed according to the noise content and the voice content to obtain a processed sound, and the denoising processing includes:
processing the voice content and the noise content by using an AP (access point) to obtain respective corresponding signal information;
superposing the signal information corresponding to the voice content and the signal information corresponding to the noise content to obtain processed signal information;
and converting the processed signal information to obtain processed sound.
By adopting this technical scheme, the voice content and the noise content are processed by the AP to obtain their signal information, the two pieces of signal information are superposed, and the resulting signal information is converted into the processed sound. Superposing the signal information improves the accuracy of noise cancellation.
The present application may be further configured in a preferred example to: after taking the vocabulary identical to the preset keyword among the plurality of initial vocabularies as the keyword, the method further comprises:
acquiring a previous vocabulary and a next vocabulary of the keyword;
judging whether the previous vocabulary of the keyword is an invalid vocabulary which is the same as any vocabulary in a preset invalid vocabulary set;
judging whether the latter vocabulary of the keyword is an invalid vocabulary or not;
and when the former vocabulary of the keyword and the latter vocabulary of the keyword are not invalid vocabularies, determining that the keyword is an effective keyword, wherein the effective keyword is a vocabulary which can be used for acquiring a voice state.
By adopting this technical scheme, it is judged whether the vocabulary before and the vocabulary after the keyword are invalid vocabularies. The keyword is determined to be a valid keyword only when neither neighboring vocabulary is invalid; if either neighboring vocabulary is invalid, the keyword is determined to be an invalid keyword. Judging whether the extracted keyword is a valid keyword prevents the product from being voice-woken or play-interrupted when the user mentions the keyword unintentionally in conversation, improving the accuracy of voice control.
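The neighbor-word check described above can be sketched as follows. This is an illustrative Python sketch, not the patent's implementation: the invalid-vocabulary set, the tokenization, and all names are assumptions made for the example.

```python
# Hypothetical preset invalid-vocabulary set; the patent leaves its contents open.
INVALID_VOCABULARY = {"called", "named", "about"}

def is_valid_keyword(words, keyword):
    """Return True if no occurrence of `keyword` in the word sequence is
    immediately preceded or followed by an invalid vocabulary."""
    for i, w in enumerate(words):
        if w != keyword:
            continue
        prev_word = words[i - 1] if i > 0 else None
        next_word = words[i + 1] if i + 1 < len(words) else None
        if prev_word in INVALID_VOCABULARY or next_word in INVALID_VOCABULARY:
            return False
    return True
```

Under this sketch, a keyword mentioned as e.g. "a product called <keyword>" is rejected, matching the intent of filtering unintentional mentions.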
The present application may be further configured in a preferred example to: if the current state is the dormant state, after the product is voice-awakened according to the valid word, the method further comprises the following steps:
when the keyword is an invalid keyword, acquiring time corresponding to the voice content;
determining predicted working state time according to the time corresponding to the voice content and a preset time interval;
acquiring a first current time;
and if the first current time reaches the predicted working state time, generating a wake-up instruction, wherein the wake-up instruction is used for voice wake-up.
By adopting this technical scheme, the predicted working-state time is determined from the time of the acquired voice content and a preset time interval, and when the current time reaches the predicted working-state time, a wake-up instruction is generated for voice wake-up. Predicting the working-state time can improve the user experience.
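The predicted working-state timing can be sketched as below. The interval length is an assumption for illustration; the patent only states that a preset interval is added to the time of the voice content and that a wake-up instruction is generated once the current time reaches the result.

```python
from datetime import datetime, timedelta

# Assumed preset time interval; the patent leaves its value to configuration.
PRESET_INTERVAL = timedelta(minutes=30)

def predicted_working_time(voice_time: datetime) -> datetime:
    """Predicted working-state time = time of the voice content + preset interval."""
    return voice_time + PRESET_INTERVAL

def should_wake(current_time: datetime, predicted: datetime) -> bool:
    """A wake-up instruction is generated once the current time reaches the predicted time."""
    return current_time >= predicted
```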
The present application may be further configured in a preferred example to: if the current state is the dormant state, after voice wake-up is performed on the product according to the valid word, the method further comprises:
acquiring voice wake-up time;
determining sleep time according to the voice wake-up time and a preset sleep time interval;
and acquiring second current time, generating a sleep instruction if the second current time reaches the sleep time, and reminding and controlling the product to sleep according to the sleep instruction.
By adopting this technical scheme, the sleep time is determined from the acquired voice wake-up time and a preset sleep time interval; when the current time reaches the sleep time, a sleep instruction is generated and the intelligent voice product is reminded and controlled to sleep, avoiding the product remaining in a working state for a long time and prolonging its service life.
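The sleep-timing logic above admits a minimal sketch. The class name, interval length, and the string instruction are illustrative assumptions; the patent only specifies that the sleep time is the wake-up time plus a preset sleep interval, and that a sleep instruction is issued once the current time reaches it.

```python
from datetime import datetime, timedelta

# Assumed preset sleep-time interval.
SLEEP_INTERVAL = timedelta(minutes=10)

class SleepScheduler:
    """Sketch of the sleep-timing logic: sleep time = voice wake-up time +
    preset interval; once the current time reaches it, a sleep instruction
    is issued so the product can be reminded and put to sleep."""

    def __init__(self, wake_time: datetime):
        self.sleep_time = wake_time + SLEEP_INTERVAL

    def poll(self, now: datetime):
        """Return a 'sleep' instruction once `now` reaches the sleep time, else None."""
        if now >= self.sleep_time:
            return "sleep"
        return None
```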
In a second aspect, the present application provides a voice control apparatus, which adopts the following technical solutions:
a voice control device comprises a voice control unit,
a first acquisition module: for obtaining voice content;
an extraction module: for extracting a keyword from the voice content;
a second obtaining module: for acquiring a voice state of the keyword, wherein the voice state comprises: the tone and pauses in the voice content;
an effective word determination module: for calculating a voice similarity from the voice state of the keyword and a historical voice state, and if the voice similarity is greater than a preset voice similarity threshold, determining the keyword to be a valid word, wherein a valid word is a word that can trigger play interruption or voice wake-up;
a third obtaining module: for obtaining the current state of the product, wherein the current state at least comprises: a playing state and a dormant state; when the current state is the playing state, the play interruption module is executed, and when the current state is the dormant state, the voice awakening module is executed;
a play interruption module: for interrupting play of the product according to the valid word;
a voice awakening module: for performing voice wake-up on the product according to the valid word.
By adopting this technical scheme, the voice state of the keyword extracted from the voice content is acquired and compared with the historical voice state. If the voice similarity is greater than the preset voice similarity threshold, the keyword is determined to be a valid word. When the current state of the product is the playing state, play of the product is interrupted according to the valid word; when the current state is the dormant state, voice wake-up is performed according to the valid word. By judging whether the extracted keyword is a valid word and acting only on valid words, the product is prevented from responding indiscriminately to every extracted keyword, and the accuracy of controlling the product to perform voice wake-up or play interruption is improved.
In a third aspect, the present application provides an electronic device, which adopts the following technical solutions:
at least one processor;
a memory;
at least one application, wherein the at least one application is stored in the memory and configured to be executed by the at least one processor, the at least one application being configured to perform the voice control method described above.
In a fourth aspect, the present application provides a computer-readable storage medium, which adopts the following technical solutions:
a computer-readable storage medium having stored thereon a computer program which, when executed in a computer, causes the computer to execute the voice control method described above.
In summary, the present application includes at least one of the following beneficial technical effects:
the method comprises the steps of obtaining the voice state of a keyword in extracted voice content, calculating the voice state and a historical voice state, determining the keyword to be an effective word if the voice similarity is larger than a preset voice similarity threshold, playing a terminal of a product according to the effective word when the current state of the product is the playing state, performing voice awakening according to the effective word when the current state of the product is the dormant state, and performing voice awakening according to the extracted keyword or not by judging whether the extracted keyword is the effective word or not and performing the playing terminal or voice awakening on the product according to the effective word when a keyword is the effective word, so that the condition that the product is indiscriminately controlled correspondingly according to the extracted keyword is avoided, and the accuracy rate of controlling the product to perform voice awakening or playing interruption is improved.
Drawings
Fig. 1 is a schematic flowchart of a voice control method according to an embodiment of the present application;
fig. 2 is a schematic structural diagram of a voice control apparatus according to an embodiment of the present application, where the voice control apparatus includes: a first obtaining module 201, an extracting module 202, a second obtaining module 203, an effective word determining module 204, a third obtaining module 205, a playing interruption module 206, and a voice awakening module 207;
fig. 3 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
The present application is described in further detail below with reference to fig. 1 to 3.
This embodiment is merely illustrative of the present application and does not limit it. After reading this specification, those skilled in the art may modify the embodiment as needed without inventive contribution, and all such modifications are protected by patent law within the scope of the claims of the present application.
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
In addition, the term "and/or" herein is only one kind of association relationship describing an associated object, and means that there may be three kinds of relationships, for example, a and/or B, which may mean: a exists alone, A and B exist simultaneously, and B exists alone. In addition, the character "/" herein generally indicates that the former and latter related objects are in an "or" relationship, unless otherwise specified.
The embodiments of the present application will be described in further detail with reference to the drawings.
In the related art, an intelligent voice product in a dormant or playing state is controlled by keywords extracted from the user's voice content. However, a keyword may appear in the voice content while the user is merely chatting, with no intention of controlling the product; the product nonetheless responds indiscriminately to every extracted keyword, so control of the intelligent voice product is inaccurate.
To solve the above technical problem, the present application provides a voice control method, apparatus, device, and medium. The voice state of the keyword extracted from the voice content is acquired and compared with the historical voice state; if the voice similarity is greater than a preset voice similarity threshold, the keyword is determined to be a valid word. When the current state of the product is the playing state, play of the product is interrupted according to the valid word; when the current state is the dormant state, voice wake-up is performed according to the valid word. By judging whether the extracted keyword is a valid word and acting only on valid words, the product is prevented from responding indiscriminately to every extracted keyword, and the accuracy of controlling the product to perform voice wake-up or play interruption is improved.
The embodiment of the present application provides a voice control method executed by an electronic device, which may be a server or a terminal device. The server may be an independent physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server providing cloud computing services. The terminal device may be a smartphone, tablet computer, notebook computer, desktop computer, or the like, but is not limited thereto. The terminal device and the server may be connected directly or indirectly through wired or wireless communication, which is not limited in the embodiment of the present application.
With reference to fig. 1, fig. 1 is a schematic flowchart of a voice control method according to an embodiment of the present application. As shown in fig. 1, the method includes step S101, step S102, step S103, step S104, step S105, step S106, and step S107, wherein:
step S101: and acquiring voice content.
The voice content is the user's voice and the environmental noise collected by a voice collection tool, such as a microphone.
Step S102: and extracting keywords in the voice content.
The keyword in the voice content may be extracted by semantic analysis. For example, if the voice content collected by the voice collection tool is "my family's A1A2A3A4 works well" (where A1, A2, A3, and A4 each represent a word), semantic analysis yields the vocabularies "my home", "of", "A1A2A3A4", "very", and "easy to use". The keyword is the vocabulary whose similarity to a preset keyword exceeds a preset similarity threshold: the preset keywords are stored in the electronic device in advance, and after the vocabularies are obtained by semantic analysis, a similarity value is calculated between each vocabulary and the preset keywords. A vocabulary whose similarity value is greater than the preset similarity threshold is taken as a keyword; a vocabulary whose similarity value is not greater than the threshold is not. The preset similarity threshold is not limited, and the user can customize it according to actual requirements.
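The vocabulary-to-preset-keyword matching described above can be sketched as follows. This is an illustrative Python sketch: the Jaccard similarity over character sets stands in for whichever similarity measure the product actually uses, and the preset keyword and threshold values are assumptions, not from the patent.

```python
# Hypothetical preset keyword and user-configurable threshold.
PRESET_KEYWORDS = {"hello device"}
SIMILARITY_THRESHOLD = 0.8

def jaccard(a: str, b: str) -> float:
    """Jaccard similarity over character sets (an illustrative stand-in measure)."""
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

def extract_keywords(vocabularies):
    """Keep every vocabulary whose similarity to some preset keyword
    exceeds the preset threshold, as in the matching step above."""
    return [v for v in vocabularies
            if any(jaccard(v, k) > SIMILARITY_THRESHOLD for k in PRESET_KEYWORDS)]
```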
Step S103: acquiring the voice state of the keyword, wherein the voice state comprises: the tone and pauses in the speech content.
A historical voice state is pre-stored in the electronic device. In general, during operation the product continuously collects the voice state of keywords in the user's voice content and stores it, forming the historical voice state.
Step S104: and calculating according to the voice state of the keyword and the historical voice state to obtain voice similarity, and if the voice similarity is greater than a preset voice similarity threshold value, determining the keyword as an effective word, wherein the effective word is a vocabulary capable of being played and interrupted or voice awakened.
After the keyword of the current voice content is extracted, similarity is calculated between the voice state of the keyword and the historical voice state, and whether the current keyword is a valid word is determined from the resulting similarity value. The similarity calculation method is not limited in the embodiment of the present application and may be any one of the Jaccard similarity coefficient, cosine similarity, and Pearson correlation coefficient algorithms.
For example, when the voice content is "the XXX of our house works well", the keyword is "XXX", but in this voice content "XXX" is not a valid word intended for voice wake-up or play interruption. Therefore, after the keyword is extracted, its voice state is judged: when the similarity value is greater than the preset similarity threshold, the keyword is determined to be a valid word, and voice wake-up or play interruption can be performed; when the similarity value is not greater than the threshold, the keyword is determined not to be a valid word, and neither is performed. The preset similarity threshold is not limited, and the user can customize it according to the actual situation.
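The valid-word decision can be sketched numerically as below, using cosine similarity, which is one of the measures the description names. Representing the voice state as a small feature vector (for example, tone pitch and pause length) and the threshold value are assumptions made for illustration only.

```python
import math

# Assumed preset voice-similarity threshold.
VOICE_SIMILARITY_THRESHOLD = 0.9

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def is_valid_word(state, historical_state):
    """The keyword counts as a valid word when its voice state is close
    enough to the stored historical voice state."""
    return cosine_similarity(state, historical_state) > VOICE_SIMILARITY_THRESHOLD
```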
Step S105: acquiring the current state of a product, wherein the current state at least comprises the following steps: a playing state and a dormant state.
The current state of the product is the current running state of the product, and generally, when the running state of the product changes, the configuration file of the software corresponding to the product also changes, so that the current state of the product can be obtained by reading the configuration file of the software corresponding to the product.
Step S106: and if the current state is the playing state, playing and interrupting the product according to the valid words.
When the current state of the product is the playing state, in order to receive the voice instruction of the subsequent user, the playing of the product can be interrupted, that is, the playing is suspended. When the electronic equipment detects that the current state of the product is the playing state, a playing interruption instruction is generated according to the extracted valid words, and the playing interruption instruction is used for interrupting playing of the product.
Further, when play of the product is interrupted, the current time is recorded, and the play-resume time is obtained by adding a preset time interval to the current time; when the time reaches the play-resume time, the product resumes the playing state.
Step S107: and if the current state is the dormant state, performing voice awakening on the product according to the effective words.
When the product is not operated for a long time, the product can automatically enter a dormant state, and the dormant state is an energy-saving state and aims to save electricity. Before voice interaction, the device needs to be awakened first and then enters a working state from a dormant state, so that the instruction of the user can be normally processed. When a device wakes up from a sleep state to enter an active state, it is called voice wake-up.
In the embodiment of the application, when the current state of the product is detected to be the dormant state, the product can be awakened by voice by using the valid word. And after the effective words are extracted, similarity calculation is carried out according to the effective words and preset effective words, and if the similarity calculation result is greater than a preset similarity threshold value, a voice awakening instruction is generated and used for carrying out voice awakening on the product.
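The state-based dispatch of steps S105 to S107 can be summarized in a short sketch. The state and action names are illustrative assumptions; the patent describes only the two branches (interrupt play when playing, voice-wake when dormant).

```python
def handle_valid_word(current_state: str) -> str:
    """Dispatch on the product's current state once a valid word is found:
    interrupt play in the playing state, voice-wake in the dormant state."""
    if current_state == "playing":
        return "play_interruption"
    if current_state == "dormant":
        return "voice_wakeup"
    return "no_action"  # other states (e.g. video call) are handled separately
```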
In the embodiment of the present application, the voice state of the keyword extracted from the voice content is acquired and compared with the historical voice state; if the voice similarity is greater than a preset voice similarity threshold, the keyword is determined to be a valid word. When the current state of the product is the playing state, play of the product is interrupted according to the valid word; when the current state is the dormant state, voice wake-up is performed according to the valid word. By judging whether the extracted keyword is a valid word and acting only on valid words, the product is prevented from responding indiscriminately to every extracted keyword, and the accuracy of controlling the product to perform voice wake-up or play interruption is improved.
In a possible implementation of the embodiment of the present application, extracting the keyword from the voice content includes:
performing semantic analysis on the voice content to obtain a plurality of initial vocabularies;
judging whether the vocabulary identical to the preset keyword exists in the plurality of initial vocabularies or not;
and if so, taking the vocabulary which is the same as the preset keyword in the plurality of initial vocabularies as the keyword.
After the voice content is obtained, the voice content is extracted by using a semantic analysis method to obtain a plurality of initial vocabularies, for example: the voice content is "open A1A2A3A4", and after extraction by semantic analysis, "open", "A1A2A3A4", "A1A2" and "A3A4" can be obtained, where "open", "A1A2A3A4", "A1A2" and "A3A4" are initial words.
Preset keywords are stored in the electronic device in advance. After the plurality of initial vocabularies are extracted, similarity is calculated between each initial vocabulary and the preset keywords to obtain a plurality of similarity results, and any initial vocabulary whose result is greater than the preset similarity value is recorded as identical to the preset keyword and used as the keyword. The similarity calculation method is not limited in the embodiment of the present application and may be any one of the Jaccard similarity coefficient, cosine similarity, and Pearson correlation coefficient algorithms.
The preset similarity value is saved in the electronic device in advance; it is not limited in the embodiment of the present application, and the user can customize it according to actual requirements.
In the embodiment of the present application, semantic analysis is performed on the voice content to obtain a plurality of initial vocabularies, and it is judged whether any of them is identical to a preset keyword; if so, that vocabulary is taken as the keyword. Performing similarity calculation on the initial vocabularies obtained by semantic analysis makes the obtained keywords more accurate.
In a possible implementation of the embodiment of the present application, the current state further includes a video call state, and the voice content includes the user's voice and the environmental noise. After acquiring the current state of the product, the method further includes:
if the current state of the product is a video call state, performing sound denoising processing on the voice content according to the noise content to obtain processed sound;
and playing the processed sound.
During a video call, to facilitate conversation between the users, the collected voice content is amplified for playback. However, when the user's voice and the ambient noise picked up by the microphone are amplified together by the remote loudspeaker and the local loudspeaker, what is heard is a mixture of the user's voice and the noise; the sound is messy and unclear, and noise cancellation is not achieved, so the played content is denoised.
Specifically, the embodiment of the present application does not limit the manner of acquiring the noise content, as long as the noise content can be acquired. After the noise content is acquired, the signal picked up by the microphone and the audio played by the loudspeaker are compared by the ADC; after the comparison, the data are transmitted to the AP through I2S (Inter-IC Sound) for processing. The signal picked up by the microphone is the effective signal and the sound played by the loudspeaker is the noise signal; the two signals are superposed to obtain a synthesized signal, which is the processed effective signal. After being processed by the AP, the synthesized signal is restored through the loudspeaker, so that the user hears sound without the noise. AP processing means processing the data with an AP algorithm.
In the embodiment of the present application, when the current state of the product is the video call state, noise cancellation is performed according to the acquired noise content and the user's voice, and the processed sound is played, avoiding the influence of noise during the video call and improving the user experience.
A possible implementation manner of the embodiment of the present application, performing noise cancellation processing on a sound according to noise content and voice content to obtain a processed sound, includes:
processing the voice content and the noise content by using the AP to obtain respective corresponding signal information;
performing superposition processing on signal information corresponding to the voice content and signal information corresponding to the noise content to obtain processed signal information;
and converting the processed signal information to obtain processed sound.
In the embodiment of the present application, AP processing means processing data with an AP algorithm, which is a kind of message-passing algorithm; through the AP algorithm, voice content can be converted into output signal information. The method of superposition processing is not limited in the embodiment of the present application and may be any one of superposing a single-frequency signal, generating full-frequency noise with a rand function, generating high-frequency noise with a high-pass filter, and generating noise with a band-pass filter.
In the embodiment of the present application, AP processing is performed on the voice content and the noise content to obtain signal information; the signal information is superposed, and the resulting signal information is converted to obtain the processed sound. Superposing the signal information improves the accuracy of noise cancellation.
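The superposition step can be illustrated with a minimal sketch. The patent does not fix the arithmetic of the superposition; subtracting the loudspeaker (noise reference) signal from the microphone signal sample by sample is one simple assumption used here for illustration:

```python
def cancel_noise(mic_signal, speaker_signal):
    """Superpose the effective signal picked up by the microphone (user
    voice mixed with noise) with the inverted noise reference played by
    the loudspeaker, sample by sample, to recover the user's voice."""
    return [m - s for m, s in zip(mic_signal, speaker_signal)]
```

If the microphone samples are exactly voice plus noise, the synthesized signal equals the user's voice; in practice the AP algorithm would also have to align and scale the noise reference before superposition.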
In a possible implementation manner of the embodiment of the present application, after the vocabulary identical to the preset keyword among the plurality of initial vocabularies is taken as the keyword, the method further includes:
acquiring a previous vocabulary and a next vocabulary of the keyword;
judging whether the previous vocabulary of the keyword is an invalid vocabulary which is the same as any vocabulary in a preset invalid vocabulary set;
judging whether the latter vocabulary of the keyword is an invalid vocabulary or not;
and when the previous vocabulary of the key word and the later vocabulary of the key word are not invalid vocabularies, determining the key word as a valid key word, wherein the valid key word is a vocabulary which can be used for acquiring the voice state.
When the voice content is analyzed, a plurality of initial vocabularies are obtained; after a keyword is determined among them, the previous vocabulary and the next vocabulary of the keyword are determined according to the keyword and the voice content, and it is judged whether they are invalid vocabularies.
The invalid vocabulary may be a modal auxiliary word, a noun, an adverb, an adjective, or the like, and may also be a word such as "obtained" or "determined". An invalid vocabulary set is pre-stored in the electronic device. After the previous vocabulary and the next vocabulary of the keyword are determined, similarity calculation is performed between the previous vocabulary (and likewise the next vocabulary) of the keyword and each invalid vocabulary in the preset invalid vocabulary set to obtain a plurality of vocabulary similarity values. If a vocabulary similarity value is greater than the preset vocabulary similarity threshold value, the previous vocabulary or the next vocabulary of the keyword is determined to be an invalid vocabulary, and correspondingly the keyword is an invalid keyword; if no vocabulary similarity value is greater than the preset vocabulary similarity threshold value, the previous vocabulary or the next vocabulary of the keyword is determined not to be an invalid vocabulary, and correspondingly the keyword is not an invalid keyword. For example, the voice content is "my A1A2A3A4 is good to use"; when keywords are extracted, the plurality of initial vocabularies include "my", "A1A2A3A4" and "good to use", where the keyword is "A1A2A3A4", its previous vocabulary is "my" and its next vocabulary is "good to use". Since the previous vocabulary is an auxiliary word and the next vocabulary is an adjective, both are invalid vocabularies, so the keyword "A1A2A3A4" is an invalid keyword.
In the embodiment of the present application, it is judged whether the previous vocabulary and the next vocabulary of the acquired keyword are invalid vocabularies. When neither is an invalid vocabulary, the keyword is determined to be a valid keyword; when either the previous vocabulary or the next vocabulary is an invalid vocabulary, the keyword is determined to be an invalid keyword. Judging whether the extracted keyword is a valid keyword prevents the product from being unintentionally woken by voice or interrupted during the user's conversation, improving the accuracy of voice control.
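The neighbor-word check can be sketched as follows. For simplicity this sketch uses exact membership in the invalid-vocabulary set instead of the per-word similarity-threshold comparison described above, and the contents of `INVALID_WORDS` are hypothetical:

```python
INVALID_WORDS = {"my", "the", "very", "good to use"}  # hypothetical invalid-vocabulary set

def is_valid_keyword(words, keyword, invalid_set=INVALID_WORDS):
    """A keyword is a valid keyword only when neither its previous word nor
    its next word in the analysed word list is an invalid vocabulary."""
    i = words.index(keyword)
    prev_word = words[i - 1] if i > 0 else None
    next_word = words[i + 1] if i + 1 < len(words) else None
    prev_invalid = prev_word is not None and prev_word in invalid_set
    next_invalid = next_word is not None and next_word in invalid_set
    return not (prev_invalid or next_invalid)
```

On the example from the text, `is_valid_keyword(["my", "A1A2A3A4", "good to use"], "A1A2A3A4")` returns `False`, while the same keyword spoken as a bare command remains valid.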
In a possible implementation manner of the embodiment of the present application, after performing voice wake-up on the product according to the valid word when the current state is the dormant state, the method further includes:
when the keyword is an invalid keyword, acquiring time corresponding to the voice content;
determining the predicted working state time according to the time corresponding to the voice content and a preset time interval;
acquiring a first current time;
and if the first current time reaches the predicted working state time, generating a wake-up instruction, wherein the wake-up instruction is used for voice wake-up.
When the keyword is determined to be an invalid keyword, the electronic device acquires the time corresponding to the voice content in order to predict the working-state time.
The method for acquiring the first current time is not limited in the embodiment of the present application; it can be acquired through a web crawler or through the BeiDou GPS system. The preset time interval is pre-stored in the electronic device; generally, it may be obtained by averaging the historical wake-up times, with the average value used as the preset time interval.
In the embodiment of the present application, the predicted working-state time is determined according to the time corresponding to the acquired voice content and the preset time interval; when the acquired first current time reaches the predicted working-state time, a wake-up instruction is generated to perform voice wake-up. Predicting the working-state time can improve the user experience.
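The prediction step can be sketched as below. Deriving the preset interval as the mean of historical values follows the text's suggestion, while the use of plain numeric timestamps and the function names are assumptions for illustration:

```python
def predicted_wake_time(voice_time: float, historical_wake_times: list) -> float:
    """Predicted working-state time: the time of the voice content that
    carried the invalid keyword, plus a preset interval derived as the
    mean of historical wake-up values."""
    preset_interval = sum(historical_wake_times) / len(historical_wake_times)
    return voice_time + preset_interval

def should_wake(first_current_time: float, predicted_time: float) -> bool:
    """A wake-up instruction is generated once the first current time
    reaches the predicted working-state time."""
    return first_current_time >= predicted_time
```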
In a possible implementation manner of the embodiment of the present application, after performing voice wake-up on the product according to the valid word when the current state is the dormant state, the method further includes:
acquiring voice wake-up time;
determining sleep time according to the voice wake-up time and a preset sleep time interval;
and acquiring a second current time; if the second current time reaches the sleep time, a sleep instruction is generated, and the product is controlled to sleep according to the sleep instruction.
When the intelligent voice product remains in the working state for a long time, its service life is consumed. Therefore, after the intelligent voice product is woken by voice, once the preset time interval is exceeded, a sleep instruction is automatically generated to control the intelligent voice product to sleep, improving the service life of the product.
When the intelligent voice product is woken by voice, the voice wake-up time is acquired and saved. After the voice wake-up, the electronic device acquires the voice wake-up time and determines the sleep time according to the preset sleep time interval. The preset sleep time interval is not limited in the embodiment of the present application, and the user can set it in a user-defined manner according to requirements.
The method for acquiring the second current time is not limited in the embodiment of the present application; it can be acquired through a web crawler or by connecting to the BeiDou GPS system.
When the sleep time is reached and no other instruction has been received, a sleep instruction is automatically generated; the sleep instruction is used to control the intelligent voice product to sleep.
In the embodiment of the present application, the sleep time is determined according to the acquired voice wake-up time and the preset sleep time interval; when the acquired current time reaches the sleep time, a sleep instruction is generated and the intelligent voice product is controlled to sleep. This prevents the product from remaining in the working state for a long time, which prolongs the service life of the product.
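The sleep scheduling above can be sketched in a few lines; the numeric timestamps, function names, and the `other_instruction_received` flag modelling "no other instruction is received" are illustrative assumptions:

```python
def compute_sleep_time(wake_time: float, preset_sleep_interval: float) -> float:
    """Sleep time = voice wake-up time + preset sleep time interval."""
    return wake_time + preset_sleep_interval

def should_sleep(second_current_time: float, sleep_at: float,
                 other_instruction_received: bool = False) -> bool:
    """A sleep instruction is generated when the second current time reaches
    the sleep time and no other instruction has been received meanwhile."""
    return second_current_time >= sleep_at and not other_instruction_received
```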
The foregoing embodiments describe a voice control method from the perspective of a method flow, and the following embodiments describe a voice control apparatus from the perspective of a virtual module or a virtual unit, which are described in detail in the following embodiments.
Fig. 2 shows a voice control apparatus 200 according to an embodiment of the present application, and fig. 2 is a schematic structural diagram of the voice control apparatus according to the embodiment of the present application. The voice control apparatus 200 may specifically include:
the first obtaining module 201: used for obtaining the voice content;
the extraction module 202: extracting keywords in the voice content;
the second obtaining module 203: acquiring the voice state of the keyword, wherein the voice state comprises: tones and spaces in speech content;
valid word determination module 204: calculating according to the voice state of the keyword and the historical voice state to obtain voice similarity, and if the voice similarity is larger than a preset voice similarity threshold value, determining the keyword as an effective word, wherein the effective word is a vocabulary capable of being played and interrupted or voice awakened;
the third obtaining module 205: the method comprises the steps of obtaining the current state of a product, wherein the current state at least comprises the following steps: a playing state and a sleeping state, wherein when the current state is the playing state, the playing interruption module 206 is executed, and when the current state is the sleeping state, the voice awakening module 207 is executed;
the play interruption module 206: the system is used for playing and interrupting the product according to the valid words;
the voice wake-up module 207: and the voice awakening is carried out on the product according to the valid words.
According to the embodiment of the present application, the voice state of the keyword extracted from the voice content is acquired, and the voice similarity between the voice state and the historical voice state is calculated. If the voice similarity is greater than a preset voice similarity threshold value, the keyword is determined to be a valid word. When the current state of the product is the playing state, play interruption is performed on the product according to the valid word; when the current state of the product is the dormant state, voice wake-up is performed according to the valid word. By judging whether the extracted keyword is a valid word and performing play interruption or voice wake-up only when it is, the situation of indiscriminately controlling the product according to any extracted keyword is avoided, and the accuracy of controlling the product to perform voice wake-up or play interruption is improved.
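The state-dependent dispatch performed by modules 205-207 can be summarized as a small routine; the state names and return labels are illustrative, not part of the disclosure:

```python
def dispatch(current_state: str, keyword_is_valid: bool) -> str:
    """Route a recognised keyword according to the product's current state:
    interrupt playback in the playing state, wake by voice in the dormant
    state, and ignore invalid keywords entirely."""
    if not keyword_is_valid:
        return "ignore"
    if current_state == "playing":
        return "interrupt_playback"
    if current_state == "dormant":
        return "voice_wakeup"
    return "ignore"  # e.g. video call state: no wake/interrupt action
```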
In a possible implementation manner of the embodiment of the present application, when the extracting module 202 executes extracting the keyword in the voice content, the extracting module is specifically configured to:
performing semantic analysis on the voice content to obtain a plurality of initial vocabularies;
judging whether the vocabulary identical to the preset keyword exists in the plurality of initial vocabularies or not;
and if so, taking the vocabulary which is the same as the preset keyword in the plurality of initial vocabularies as the keyword.
In a possible implementation manner of the embodiment of the present application, the current state further includes a video call state, and the voice content includes the user's voice and ambient noise; the voice control apparatus 200 further includes:
the sound denoising processing module: the voice denoising method comprises the steps of conducting voice denoising processing on voice content according to noise content if the current state of a product is a video call state, and obtaining processed voice;
and playing the processed sound.
In a possible implementation manner of the embodiment of the present application, the sound denoising processing module is specifically configured to, when performing sound denoising processing according to the noise content and the voice content to obtain a processed sound:
processing the voice content and the noise content by using the AP to obtain respective corresponding signal information;
superposing the signal information corresponding to the voice content and the signal information corresponding to the noise content to obtain processed signal information;
and converting the processed signal information to obtain processed sound.
A possible implementation manner of the embodiment of the present application further includes:
the effective vocabulary judging module: the vocabulary acquisition module is used for acquiring a previous vocabulary and a next vocabulary of the keyword;
judging whether the previous vocabulary of the keyword is an invalid vocabulary which is the same as any vocabulary in a preset invalid vocabulary set;
judging whether the latter vocabulary of the keyword is an invalid vocabulary or not;
and when the previous vocabulary of the keyword and the later vocabulary of the keyword are not invalid vocabularies, determining the keyword as a valid keyword, wherein the valid keyword is a vocabulary which can be used for acquiring the voice state.
A possible implementation manner of the embodiment of the present application further includes:
the working state time prediction module: the time acquisition module is used for acquiring time corresponding to the voice content when the keyword is an invalid keyword;
determining predicted working state time according to the time corresponding to the voice content and a preset time interval;
acquiring a first current time;
and if the first current time reaches the predicted working state time, generating a wake-up instruction, wherein the wake-up instruction is used for voice wake-up.
A possible implementation manner of the embodiment of the present application further includes:
a dormancy module: the voice wake-up time is acquired;
determining sleep time according to the voice wake-up time and a preset sleep time interval;
and acquiring second current time, if the second current time reaches the dormancy time, generating a dormancy instruction, and reminding and controlling the product to hibernate according to the dormancy instruction.
It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working process of the voice control apparatus 200 described above may refer to the corresponding process in the foregoing method embodiment, and is not described herein again.
An electronic device is provided in an embodiment of the present application, as shown in fig. 3, fig. 3 is a schematic structural diagram of the electronic device provided in the embodiment of the present application, and an electronic device 300 shown in fig. 3 includes: a processor 301 and a memory 303. Wherein the processor 301 is coupled to the memory 303, such as via bus 302. Optionally, the electronic device 300 may also include a transceiver 304. It should be noted that the transceiver 304 is not limited to one in practical applications, and the structure of the electronic device 300 is not limited to the embodiment of the present application.
The Processor 301 may be a CPU (Central Processing Unit), a general-purpose Processor, a DSP (Digital Signal Processor), an ASIC (Application Specific Integrated Circuit), an FPGA (Field Programmable Gate Array) or other Programmable logic device, a transistor logic device, a hardware component, or any combination thereof. Which may implement or execute the various illustrative logical blocks, modules, and circuits described in connection with the disclosure herein. The processor 301 may also be a combination of computing functions, e.g., comprising one or more microprocessors, a combination of a DSP and a microprocessor, or the like.
Bus 302 may include a path that transfers information between the above components. The bus 302 may be a PCI (Peripheral Component Interconnect) bus, an EISA (Extended Industry Standard Architecture) bus, or the like. The bus 302 may be divided into an address bus, a data bus, a control bus, and the like. For ease of illustration, only one thick line is shown in fig. 3, but this does not represent only one bus or one type of bus.
The memory 303 may be a ROM (Read-Only Memory) or other type of static storage device that can store static information and instructions, a RAM (Random Access Memory) or other type of dynamic storage device that can store information and instructions, an EEPROM (Electrically Erasable Programmable Read-Only Memory), a CD-ROM (Compact Disc Read-Only Memory) or other optical disc storage (including compact disc, laser disc, optical disc, digital versatile disc, Blu-ray disc, etc.), a magnetic disk storage medium or other magnetic storage device, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto.
The memory 303 is used for storing application program codes for executing the scheme of the application, and the processor 301 controls the execution. The processor 301 is configured to execute application program code stored in the memory 303 to implement the aspects illustrated in the foregoing method embodiments.
The electronic device includes, but is not limited to, mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable multimedia players) and in-vehicle terminals (e.g., in-vehicle navigation terminals), fixed terminals such as digital TVs and desktop computers, and servers. The electronic device shown in fig. 3 is only an example and should not limit the functions and scope of use of the embodiments of the present application.
The present application provides a computer-readable storage medium on which a computer program is stored; when running on a computer, the program enables the computer to execute the corresponding content in the foregoing method embodiments. Compared with the related art, the voice state of the keyword extracted from the voice content is acquired and compared with the historical voice state; if the voice similarity is greater than the preset voice similarity threshold value, the keyword is determined to be a valid word. When the current state of the product is the playing state, play interruption is performed on the product according to the valid word; when the current state of the product is the dormant state, voice wake-up is performed according to the valid word. By judging whether the extracted keyword is a valid word and performing play interruption or voice wake-up only when it is, the situation of indiscriminately controlling the product according to any extracted keyword is avoided, and the accuracy of controlling the product to perform voice wake-up or play interruption is improved.
It should be understood that, although the steps in the flowcharts of the figures are shown in an order indicated by the arrows, they are not necessarily performed in that order; unless explicitly stated herein, the steps may be performed in other orders. Moreover, at least a portion of the steps in the flowcharts may include multiple sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times, and their order of execution is not necessarily sequential; they may be performed alternately with other steps or with at least a portion of the sub-steps or stages of other steps.
The foregoing descriptions are only some embodiments of the present application. It should be noted that those skilled in the art can make various improvements and modifications without departing from the principle of the present application, and these improvements and modifications shall also fall within the protection scope of the present application.

Claims (8)

1. A voice control method, comprising:
acquiring voice content;
extracting key words in the voice content;
acquiring the voice state of the keyword, wherein the voice state comprises: tones and spaces in the speech content;
calculating according to the voice state of the keyword and a historical voice state to obtain voice similarity, and if the voice similarity is larger than a preset voice similarity threshold value, determining the keyword as an effective word, wherein the effective word is a vocabulary capable of being played and interrupted or voice awakened;
acquiring the current state of a product, wherein the current state at least comprises: a playing state and a dormant state;
if the current state is a playing state, playing interruption is carried out on the product according to the valid word;
if the current state is a dormant state, voice awakening is carried out on the product according to the effective word;
wherein, the extracting the keywords in the voice content comprises:
performing semantic analysis on the voice content to obtain a plurality of initial vocabularies;
judging whether vocabularies which are the same as preset keywords exist in the plurality of initial vocabularies or not;
if yes, taking the vocabulary which is the same as the preset keyword in the plurality of initial vocabularies as the keyword;
if yes, after the vocabulary which is the same as the preset keyword in the plurality of initial vocabularies is used as the keyword, the method further comprises the following steps:
acquiring a previous vocabulary and a next vocabulary of the keyword;
judging whether a previous vocabulary of the keyword is an invalid vocabulary, wherein the invalid vocabulary is the same as any vocabulary in a preset invalid vocabulary set;
judging whether the latter vocabulary of the keyword is an invalid vocabulary or not;
and when the former vocabulary of the keyword and the latter vocabulary of the keyword are not invalid vocabularies, determining that the keyword is an effective keyword, wherein the effective keyword is a vocabulary which can be used for acquiring a voice state.
2. The voice control method of claim 1, wherein the current state further comprises: a video call state, and the voice content comprises: a user's voice and ambient noise; after the acquiring the current state of the product, the method further comprises:
if the current state of the product is a video call state, performing sound denoising processing on the voice content according to noise content to obtain processed sound;
and playing the processed sound.
3. The voice control method according to claim 2, wherein the performing a voice denoising process based on the noise content and the voice content to obtain a processed voice comprises:
processing the voice content and the noise content by using an AP (access point) to obtain respective corresponding signal information;
superposing the signal information corresponding to the voice content and the signal information corresponding to the noise content to obtain processed signal information;
and converting the processed signal information to obtain processed sound.
4. The voice control method according to claim 1, wherein after the voice waking of the product according to the valid word if the current state is the dormant state, the method further comprises:
when the keyword is an invalid keyword, acquiring time corresponding to the voice content;
determining the predicted working state time according to the time corresponding to the voice content and a preset time interval;
acquiring a first current time;
and if the first current time reaches the predicted working state time, generating a wake-up instruction, wherein the wake-up instruction is used for voice wake-up.
5. The voice control method according to any one of claims 1 to 4, wherein after performing voice wakeup on the product according to the valid word if the current state is a sleep state, the method further includes:
acquiring voice wake-up time;
determining sleep time according to the voice wake-up time and a preset sleep time interval;
and acquiring second current time, generating a sleep instruction if the second current time reaches the sleep time, and reminding and controlling the product to sleep according to the sleep instruction.
6. A voice control apparatus, comprising:
a first acquisition module (201): used for obtaining the voice content;
an extraction module (202): extracting keywords in the voice content;
a second obtaining module (203): acquiring a voice state of the keyword, wherein the voice state comprises: tones and spaces in the speech content;
valid word determination module (204): calculating according to the voice state of the keyword and a historical voice state to obtain voice similarity, and if the voice similarity is larger than a preset voice similarity threshold value, determining the keyword as an effective word, wherein the effective word is a vocabulary capable of being played and interrupted or voice awakened;
a third obtaining module (205): for obtaining a current status of a product, wherein the current status comprises at least: a playing state and a dormant state, wherein when the current state is the playing state, a playing interruption module (206) is executed, and when the current state is the dormant state, a voice awakening module (207) is executed;
playback interruption module (206): the system is used for playing and interrupting the product according to the valid word;
voice wake-up module (207): the voice awakening device is used for carrying out voice awakening on the product according to the valid words;
wherein, when the extraction module (202) is used for extracting the keywords in the voice content, the extraction module is specifically configured to:
performing semantic analysis on the voice content to obtain a plurality of initial vocabularies;
judging whether the vocabularies identical to preset keywords exist in the plurality of initial vocabularies or not;
if yes, taking the vocabulary which is the same as the preset keyword in the plurality of initial vocabularies as the keyword;
the voice control device further includes:
the effective vocabulary judging module: the vocabulary acquisition module is used for acquiring a previous vocabulary and a next vocabulary of the keyword;
judging whether a previous vocabulary of the keyword is an invalid vocabulary, wherein the invalid vocabulary is the same as any vocabulary in a preset invalid vocabulary set;
judging whether the latter vocabulary of the keyword is an invalid vocabulary or not;
and when the former vocabulary of the keyword and the latter vocabulary of the keyword are not invalid vocabularies, determining that the keyword is an effective keyword, wherein the effective keyword is a vocabulary which can be used for acquiring a voice state.
7. An electronic device, comprising:
at least one processor;
a memory;
at least one application, wherein the at least one application is stored in the memory and configured to be executed by the at least one processor, the at least one application configured to: performing the method of any one of claims 1 to 5.
8. A computer-readable storage medium, having stored thereon a computer program which, when executed in a computer, causes the computer to perform the method of any one of claims 1 to 5.
CN202310005289.5A 2023-01-04 2023-01-04 Voice control method, device, equipment and medium Active CN115691497B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310005289.5A CN115691497B (en) 2023-01-04 2023-01-04 Voice control method, device, equipment and medium

Publications (2)

Publication Number Publication Date
CN115691497A CN115691497A (en) 2023-02-03
CN115691497B true CN115691497B (en) 2023-03-31



Also Published As

Publication number Publication date
CN115691497A (en) 2023-02-03

Similar Documents

Publication Publication Date Title
CN107360327B (en) Speech recognition method, apparatus and storage medium
KR20150121038A (en) Voice-controlled communication connections
US9549273B2 (en) Selective enabling of a component by a microphone circuit
CN108566634B (en) Method and device for reducing continuous awakening delay of Bluetooth sound box and Bluetooth sound box
CN111083678B (en) Playing control method and system of Bluetooth sound box and intelligent device
CN108831477B (en) Voice recognition method, device, equipment and storage medium
CN108108142A (en) Voice information processing method, device, terminal device and storage medium
CN110060685A (en) Voice awakening method and device
CN108810280B (en) Voice acquisition frequency processing method and device, storage medium and electronic equipment
CN108320751B (en) Voice interaction method, device, equipment and server
WO2010084410A1 (en) Method, apparatus and computer program product for providing compound models for speech recognition adaptation
CN111312235A (en) Voice interaction method, device and system
CN108509225B (en) Information processing method and electronic equipment
US20120053937A1 (en) Generalizing text content summary from speech content
CN112739507B (en) Interactive communication realization method, device and storage medium
CN111429902B (en) Method and apparatus for waking up a device
CN111292737A (en) Voice interaction and voice awakening detection method, device, equipment and storage medium
TW201928740A (en) Keyword confirmation method and apparatus
CN111326146A (en) Method and device for acquiring voice awakening template, electronic equipment and computer readable storage medium
CN110491419B (en) Playing control method, system and terminal equipment
CN109389977B (en) Voice interaction method and device
CN115691497B (en) Voice control method, device, equipment and medium
CN113160815A (en) Intelligent control method, device and equipment for voice awakening and storage medium
CN112259076A (en) Voice interaction method and device, electronic equipment and computer readable storage medium
CN112233676A (en) Intelligent device awakening method and device, electronic device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant