WO2020001165A1

WO2020001165A1 - Voice control method and apparatus, and storage medium and electronic device

Info

Publication number: WO2020001165A1
Application number: PCT/CN2019/085720
Authority: WO
Inventors: 陈岩
Original assignee: Oppo广东移动通信有限公司
Priority date: 2018-06-27
Filing date: 2019-05-06
Publication date: 2020-01-02
Also published as: CN108694947A; CN108694947B

Abstract

A voice control method, comprising: when an electronic device is in a voice monitoring state, updating a stored voice segment by using detected voice data, the voice segment being a segment of voice data detected within a latest preset time period (101); obtaining a current voice segment in the voice segment updating process (102); extracting a voiceprint feature and a keyword from the current voice segment (103); starting a voice recording function according to the voiceprint feature and the keyword, and obtaining a voice segment at the time of successful start as a target voice segment (104); and performing corresponding control on the electronic device according to recorded voice data and the target voice segment (105).

Description

Voice control method, device, storage medium and electronic equipment

This application claims priority to Chinese patent application No. 201810681095.6, filed with the Chinese Patent Office on June 27, 2018 and entitled "Voice Control Method, Device, Storage Medium, and Electronic Equipment", the entire contents of which are incorporated herein by reference.

Technical field

The present application relates to the field of computer technology, and in particular, to a voice control method, device, storage medium, and electronic device.

Background

With the widespread adoption of mobile terminals, voice assistants have become a commonly used function. Through the voice assistant, the user can interact by voice with the machine assistant, which then completes various operations on the mobile terminal under the user's voice control, including operations on applications installed on the terminal, such as setting up schedules, turning on alarms, creating to-do items, opening apps, and making calls.

The startup process of an existing voice assistant mainly includes two phases: a wake-up phase and a preparation phase. For example, the terminal monitors the user's voice in real time; when it detects that the user has spoken the wake-up word, the system performs the preparations needed to start the voice assistant. After speaking the wake-up word, the user must wait until the system is ready before issuing a voice command. The wait is long and the user's speech is interrupted, so voice continuity is poor.

Summary of the invention

The embodiments of the present application provide a voice control method, a device, a storage medium, and an electronic device, with which voice instructions can be issued without waiting for the system to be ready, so that voice continuity is better.

An embodiment of the present application provides a voice control method applied to an electronic device, including:

When the electronic device is in a voice monitoring state, updating a stored voice segment by using the monitored voice data, where the voice segment is a piece of voice data monitored within the most recent preset time period;

Obtaining the current voice segment during the update of the voice segment;

Extracting voiceprint features and keywords from the current voice segment;

Starting a voice recording function according to the voiceprint features and keywords, and obtaining the voice segment at the moment of successful startup as a target voice segment;

Controlling the electronic device according to the recorded voice data and the target voice segment.

An embodiment of the present application further provides a voice control device, which is applied to an electronic device and includes:

An update module, configured to update the stored voice segment by using the monitored voice data when the electronic device is in a voice monitoring state, where the voice segment is a piece of voice data monitored within the most recent preset duration;

An obtaining module, configured to obtain a current voice segment during an update process of the voice segment;

An extraction module for extracting voiceprint features and keywords from the current voice segment;

A startup module, configured to start a voice recording function according to the voiceprint characteristics and keywords, and obtain a voice segment at a successful startup time as a target voice segment;

A control module, configured to control the electronic device according to the recorded voice data and the target voice segment.

An embodiment of the present application further provides a computer-readable storage medium, where a plurality of instructions are stored in the storage medium, and the instructions are adapted to be loaded by a processor to execute any one of the foregoing voice control methods.

An embodiment of the present application further provides an electronic device including a processor and a memory, where the processor is electrically connected to the memory, the memory is used for storing instructions and data, and the processor is configured to perform the steps of any one of the foregoing voice control methods.

Brief description of the drawings

The following detailed description of specific embodiments of the present application will make the technical solutions and other beneficial effects of the present application obvious in conjunction with the accompanying drawings.

FIG. 1 is a schematic diagram of an application scenario of a voice control system according to an embodiment of the present application.

FIG. 2 is a schematic flowchart of a voice control method according to an embodiment of the present application.

FIG. 3 is another schematic flowchart of a voice control method according to an embodiment of the present application.

FIG. 4 is a schematic framework diagram of a voice control process according to an embodiment of the present application.

FIG. 5 is a schematic diagram of a server parsing process provided by an embodiment of the present application.

FIG. 6 is a schematic structural diagram of a voice control device according to an embodiment of the present application.

FIG. 7 is a schematic structural diagram of a control module according to an embodiment of the present application.

FIG. 8 is a schematic structural diagram of an electronic device according to an embodiment of the present application.

FIG. 9 is another schematic structural diagram of an electronic device according to an embodiment of the present application.

Detailed description

The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application. Obviously, the described embodiments are only a part of the embodiments of the present application, but not all of the embodiments. Based on the embodiments in the present application, all other embodiments obtained by those skilled in the art without creative work fall into the protection scope of the present application.

The embodiments of the present application provide a voice control method, a device, a storage medium, and an electronic device.

Please refer to FIG. 1. FIG. 1 provides a schematic diagram of an application scenario of a voice control system. The voice control system may include any voice control device provided in an embodiment of the present application. The voice control device may be integrated into an electronic device. The electronic device may include a touch-enabled device such as a smart phone or a tablet.

When the electronic device is in a voice monitoring state, it can use the monitored voice data to update the stored voice segment, which is a piece of voice data monitored within the most recent preset time period. During the update of the voice segment, the current voice segment is obtained, and voiceprint features and keywords are extracted from it. The voice recording function is started according to the voiceprint features and keywords, and the voice segment at the moment of successful startup is obtained as the target voice segment. The electronic device is then controlled according to the recorded voice data and the target voice segment.

For example, the preset time length may be manually set to 3 seconds. Usually, the preset time length is about the same as the system preparation time needed to start the voice recording function. In FIG. 1, the electronic device monitors the user's voice in real time and, during monitoring, keeps the last 3 seconds of voice data for voiceprint analysis and keyword matching. Suppose the user's voiceprint and keywords meet the conditions. For example, for user A's mobile phone, if user A says "small x small x, play xxx song in xx application", the mobile phone can begin starting the recording function when user A utters "small x small x" and obtain the voice segment at the moment of successful startup, which holds the beginning of the command (for example, "play xx application"). The phone splices this segment with the subsequently recorded remainder of the command to obtain the continuous voice "play xxx song in xx application", and controls the mobile phone accordingly.

A voice control method applied to an electronic device includes:

Obtaining the current voice segment during the update of the voice segment;

Extracting voiceprint features and keywords from the current voice segment;

In some embodiments, the starting a voice recording function according to the voiceprint characteristics and keywords includes:

Determining whether the voiceprint feature matches a preset feature, and whether the keyword matches a preset keyword;

If yes, start the voice recording function;

If not, return to performing the operation of acquiring the current voice segment.

In some embodiments, the controlling the electronic device according to the recorded voice data and a target voice segment includes:

Splicing the target voice segment and the recorded voice data to obtain a spliced voice;

Determining a control instruction according to the spliced voice;

Controlling the electronic device to execute the control instruction.

In some embodiments, the determining a control instruction according to the spliced voice includes:

Sending the spliced speech to a server, so that the server semantically parses the spliced speech and returns a corresponding target parsed word;

Determining a target application and an application event according to the returned target parsing word;

Generating a control instruction according to the target application and an application event;

The controlling the electronic device to execute the control instruction includes using the target application to execute the application event.

In some embodiments, the determining a target application according to the target parsing word includes:

Determining whether the target parsing word matches a stored parsing word set;

If so, determining the target application based on the successfully matched parsing word;

If not, determining whether the target parsing word matches a preset word set; when the match is successful, determining the target application based on the successfully matched preset word, adding the successfully matched preset word to the parsed word set, and simultaneously deleting it from the preset word set.

When the target parsing word is an application name, the application corresponding to the application name is used as the target application;

When the target parsing word is an application type, all applications in the electronic device that belong to the same application type are determined; and the target application is determined according to the determined applications.

In some embodiments, the determining a target application based on the determined application includes:

Using the determined application with the highest frequency of use as the target application, or

Generating a selection interface according to the determined applications and presenting it to the user, and determining the target application according to the user's selection operation on the selection interface.

As shown in FIG. 2, FIG. 2 is a schematic flowchart of a voice control method provided by an embodiment of the present application, which is applied to an electronic device. The specific process may be as follows:

101. When the electronic device is in a voice monitoring state, the stored voice segment is updated by using the monitored voice data, where the voice segment is a piece of voice data monitored within the most recent preset time period.

In this embodiment, voice can be monitored through a device such as a microphone. The preset time can be set manually, for example, 3 seconds, which is usually about the same as the system preparation time when the voice recording function is activated. Specifically, the electronic device can monitor the user's voice in a low power consumption state, and save the voice data detected in the most recent preset time period in real time.
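
One way to picture the rolling voice segment of step 101 is a fixed-capacity buffer that discards its oldest samples as new ones arrive. The sketch below is illustrative only: the class name `RollingVoiceSegment`, its methods, and the toy sample rate are assumptions made for this example, not details given in the application.

```python
from collections import deque


class RollingVoiceSegment:
    """Keeps only the samples heard during the most recent preset duration."""

    def __init__(self, preset_seconds: float, sample_rate: int) -> None:
        # A deque with maxlen silently drops the oldest items when it is full.
        self._samples = deque(maxlen=int(preset_seconds * sample_rate))

    def update(self, chunk) -> None:
        """Append newly monitored samples; older ones fall off the front."""
        self._samples.extend(chunk)

    def current(self) -> list:
        """Return a snapshot of the stored voice segment."""
        return list(self._samples)


# Toy numbers (hypothetical): 3 "seconds" at 4 samples per second,
# so at most 12 samples are retained.
segment = RollingVoiceSegment(preset_seconds=3, sample_rate=4)
segment.update(range(10))
segment.update(range(10, 20))
print(segment.current())  # [8, 9, 10, ..., 19]
```

Because the capacity is fixed, memory use stays constant no matter how long monitoring runs, which fits the low-power monitoring state described above.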

102. In the updating process of the voice segment, obtain a current voice segment.

In this embodiment, voice monitoring and voice segment updating are performed in real time, and in this process, the electronic device may also perform real time analysis on the voice segment.

103. Extract voiceprint features and keywords from the current voice segment.

In this embodiment, the voiceprint feature is mainly a frequency-spectrum feature, which may include frequency, amplitude, and other information, and usually reflects the loudness, pitch, and timbre of the sound. Different people generally have different voiceprint features. The voice segment is converted into spectral data by a Fourier transform, and the relevant information is then extracted from the spectral data as the corresponding voiceprint feature. The keyword may include at least one word or character, which may be in English or Chinese.
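
As a rough illustration of extracting spectral information with a Fourier transform, the following sketch computes a naive discrete Fourier transform of one frame and reports its dominant frequency and amplitude. The function names and the choice of these two quantities as a stand-in "voiceprint" are assumptions for illustration; the application does not specify which spectral features are used, and a practical system would likely use an FFT library and richer features.

```python
import cmath
import math


def magnitude_spectrum(frame):
    """Naive DFT; returns magnitudes of the non-negative frequency bins."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        total = sum(
            frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)
        )
        spectrum.append(abs(total))
    return spectrum


def spectral_features(frame, sample_rate):
    """Toy feature set: dominant frequency in Hz and its amplitude."""
    mags = magnitude_spectrum(frame)
    # Skip bin 0 (the constant offset) when looking for the peak.
    peak_bin = max(range(1, len(mags)), key=mags.__getitem__)
    return {
        "dominant_hz": peak_bin * sample_rate / len(frame),
        # The 2/n scaling recovers a sinusoid's amplitude for bins
        # strictly between 0 and the Nyquist bin.
        "peak_amplitude": 2 * mags[peak_bin] / len(frame),
    }


# Hypothetical test signal: an 8 Hz sine of amplitude 0.5 sampled at 64 Hz.
sample_rate = 64
frame = [0.5 * math.sin(2 * math.pi * 8 * t / sample_rate) for t in range(64)]
features = spectral_features(frame, sample_rate)
print(features)
```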

104. Start a voice recording function according to the voiceprint characteristics and keywords, and obtain a voice segment at a successful startup time as a target voice segment.

For example, the above step 104 may specifically include:

Judging whether the voiceprint feature matches a preset feature, and whether the keyword matches a preset keyword;

If yes, start the voice recording function;

If not, return to performing the operation of obtaining the current voice segment.

In this embodiment, the preset feature is mainly a voiceprint feature bound to the user, and it is usually set in advance. For example, the user may be required to record a voice sample in advance, and the voiceprint feature extracted from that sample is bound to the user as the preset feature. The preset keyword is mainly used to trigger the voice interaction function. It can be set by the system by default (that is, set by the manufacturer when the electronic device leaves the factory), or set by the user according to personal preference, for example by entering the corresponding setting window through the related settings interface.

It should be noted that while starting the voice recording function, the electronic device needs to perform a series of preparations, such as interrupting the foreground application and setting the call parameters of the voice recording component. During this preparation, the electronic device remains in the voice monitoring state and the voice segment continues to be updated. After the voice recording function is successfully started, updating of the voice segment can stop.
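
The decision in step 104 can be sketched as a conjunction of two checks. The application does not say how a voiceprint is compared with the preset feature; the cosine-similarity test, the 0.9 threshold, and all names below are hypothetical choices made only for this example.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def should_start_recording(voiceprint, heard_text, preset_voiceprint,
                           preset_keyword, threshold=0.9):
    """True only when both the speaker and the wake keyword match."""
    speaker_ok = cosine_similarity(voiceprint, preset_voiceprint) >= threshold
    keyword_ok = preset_keyword in heard_text
    return speaker_ok and keyword_ok


# Hypothetical enrolled feature vector and two candidate speakers.
enrolled = [0.2, 0.7, 0.1]
same_speaker = [0.21, 0.69, 0.12]
other_speaker = [0.9, 0.1, 0.3]

print(should_start_recording(same_speaker, "small x small x", enrolled, "small x"))
print(should_start_recording(other_speaker, "small x small x", enrolled, "small x"))
```

When the function returns False, the caller would simply fetch the next current voice segment and try again, matching the "if not, return" branch above.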

105. Perform corresponding control on the electronic device according to the recorded voice data and the target voice segment.

For example, the above step 105 may specifically include:

1-1. The target voice segment and the recorded voice data are spliced to obtain a spliced voice.

In this embodiment, the duration of the voice segment is similar to the system preparation duration. The voice segment saved at the moment the system is ready (that is, when the voice recording function is successfully started) can therefore be exactly the continuous speech uttered after the user spoke the preset keyword. It can be used directly as the beginning of the recorded speech and spliced with the subsequently recorded voice data to form one continuous speech.
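
The timing argument above can be checked with a small simulation in which integers stand in for audio frames. The function `capture_command` and its parameters are hypothetical; the sketch assumes, as the text does, that the segment length roughly equals the preparation time.

```python
from collections import deque


def capture_command(frames, buffer_len, detect_at, prep_frames):
    """Simulate wake-up, system preparation, and splicing.

    detect_at:   index of the frame after which the wake condition is met.
    prep_frames: number of frames that elapse while the recorder starts.
    """
    buffer = deque(maxlen=buffer_len)
    recording_start = detect_at + 1 + prep_frames
    # The rolling segment keeps updating through detection and preparation.
    for frame in frames[:recording_start]:
        buffer.append(frame)
    target_segment = list(buffer)               # segment when recording starts
    recorded = list(frames[recording_start:])   # captured by the recorder
    return target_segment + recorded            # spliced, gap-free speech


# Frames 0-2 carry the wake word, frames 3-11 carry the command.
frames = list(range(12))
spliced = capture_command(frames, buffer_len=3, detect_at=2, prep_frames=3)
print(spliced)  # [3, 4, 5, 6, 7, 8, 9, 10, 11]
```

In this toy run the spliced result contains every command frame and none of the wake-word frames, because the buffer length equals the preparation time. If preparation took longer than the buffer holds, the earliest command frames would be lost.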

1-2. Determine the control instruction according to the spliced voice.

For example, the above steps 1-2 may specifically include:

Sending the spliced speech to the server, so that the server semantically parses the spliced speech and returns the corresponding target parsed word;

Determining a target application and an application event according to the returned target parsed word;

Generating a control instruction according to the target application and the application event.

In this embodiment, the server is mainly used for semantic analysis. During voice recording, the electronic device can transmit the spliced voice to the server in real time, and the server can analyze it with a trained semantic analysis model. The semantic analysis model can be a deep learning model, which the server trains on different voice samples collected in advance.

Further, the above-mentioned step "determining a target application based on the returned target parsing word" may specifically include:

Determine whether the target parsing word matches the stored parsing word set;

If so, determine the target application based on the successfully matched parsing word;

If not, determine whether the target parsing word matches the preset word set; when the match is successful, determine the target application based on the successfully matched preset word, add the successfully matched preset word to the parsed word set, and delete it from the preset word set.

In this embodiment, the parsed word set and the preset word set mainly contain application-related words, such as application names or application types. The parsed word set mainly contains words parsed out when the electronic device requested semantic parsing from the server in the past. The preset word set is mainly set by the system by default; for example, each time an application is installed on the terminal, the application-related words of that application can be obtained. Specifically, when the target parsing word is an application name, the application corresponding to that name may be used directly as the target application. When the target parsing word is an application type, all applications in the electronic device that belong to that application type may be determined first; then either the most frequently used of them is taken as the target application, or a selection interface listing them is presented to the user and the target application is determined by the user's selection operation.

It should be noted that before the first parsing, the preset word set may include all words set by the system by default and the parsed word set may be empty. Each time the electronic device obtains a new parsing word, that word may be stored in the parsed word set and deleted from the preset word set, so that both sets are continuously updated. Because the target parsing word is first matched against the previous parsing records, the scope of matching is narrowed according to the user's interaction habits and the matching speed is improved.
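
The two-tier lookup described above might be sketched as follows. Representing each set as a dictionary from an application-related word to an application identifier is an assumption of this example (the application only speaks of word sets), as are the function name and the sample identifiers.

```python
def resolve_target(word, parsed_words, preset_words):
    """Look up a target parsing word, preferring words matched in the past.

    A first-time match is moved from preset_words to parsed_words, so the
    next lookup of the same word succeeds in the first tier.
    Returns None when neither set contains the word.
    """
    if word in parsed_words:
        return parsed_words[word]
    if word in preset_words:
        parsed_words[word] = preset_words.pop(word)
        return parsed_words[word]
    return None


# Hypothetical initial state: nothing parsed yet, defaults set by the system.
parsed_words = {}
preset_words = {
    "xx application": "app.xx",
    "music player application": "type.music",
}

first = resolve_target("xx application", parsed_words, preset_words)
second = resolve_target("xx application", parsed_words, preset_words)
missing = resolve_target("unknown word", parsed_words, preset_words)
print(first, second, missing)  # app.xx app.xx None
```

A None result corresponds to the case where the user would be prompted to speak again.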

1-3. Control the electronic device to execute the control instruction.

For example, the above steps 1-3 may specifically include:

Use the target application to execute the application event.

In this embodiment, if the target application is in a closed state, the target application may be started first, and then the target application is used to execute a corresponding application event.

It can be seen from the foregoing that the voice control method provided in this embodiment is applied to an electronic device. When the electronic device is in a voice monitoring state, the stored voice segment is updated by using the monitored voice data, where the voice segment is a piece of voice data monitored within the most recent preset duration. During the update of the voice segment, the current voice segment is obtained, and voiceprint features and keywords are extracted from it. The voice recording function is then started according to the voiceprint features and keywords, and the voice segment at the moment of successful startup is obtained as the target voice segment. Finally, the electronic device is controlled according to the recorded voice data and the target voice segment. The user can thus wake the device and input interactive instructions in one coherent utterance, without pausing because of a long system preparation time. The method is simple, effectively improves the efficiency of voice interaction, and gives a good voice interaction effect.

This embodiment is described from the perspective of a voice control device, taking as an example the case where the voice control device is integrated in an electronic device.

Referring to FIG. 3, the specific process of a voice control method can be as follows:

201. When in a voice monitoring state, the electronic device uses the monitored voice data to update a stored voice segment, which is a piece of voice data monitored within the most recent preset time period.

For example, the preset duration may be manually set to 3 seconds. As long as the electronic device is on, voice monitoring can run continuously, and of the monitored voice data the electronic device may keep only the last 3 seconds.

202. During the update of the voice segment, the electronic device obtains the current voice segment, and extracts voiceprint features and keywords from the current voice segment.

For example, referring to FIG. 4, when the user says "small x small x, play xxx song in xx application" to the electronic device, since voice monitoring and voice segment updating are performed in real time, the first 3 seconds of the user's speech, such as "small x small x", can be stored as the initial voice segment. The electronic device then performs voiceprint feature and keyword extraction on the stored voice segment, for example by using a Fourier transform to convert it into spectral data and extracting the relevant information from the spectral data as the corresponding voiceprint feature. At the same time, it recognizes the content of the voice segment to obtain keywords.

203. The electronic device determines whether the voiceprint feature matches a preset feature, and whether the keyword matches a preset keyword. If yes, execute step 204 below; if not, return to execute step 202 above.

For example, the preset feature may be a voiceprint feature entered by the user in advance, and the preset keyword may be a phrase set by the system by default, which may include at least two characters or words. Of course, to avoid the situation where a voice segment of the preset duration cannot contain the complete preset keyword, the preset keyword should be short, such as "small x".

204. The electronic device starts a voice recording function, and obtains a voice segment at a successful start time as a target voice segment.

For example, in FIG. 4, when analysis of the initial voice segment "small x small x" shows that the user's voice meets the conditions, the electronic device can immediately be notified to perform the related preparations, such as interrupting the foreground application and setting the call parameters of the voice recording component. During this preparation, the voice segment continues to be updated until the voice recording function is successfully started, and the voice segment at that moment may be exactly "play xx application".

205. The electronic device splices the target voice segment and the recorded voice data to obtain a spliced voice, and then sends the spliced voice to a server, so that the server semantically parses the spliced voice and returns a corresponding target parsed word.

For example, after the voice recording function is successfully started, the user's subsequent speech is saved by recording, so it no longer needs to be saved by updating the voice segment. At this point, the update operation of the voice segment can be stopped, and the voice segment "play xx application" is used as the beginning of the recorded voice and spliced with the subsequently recorded voice data into one continuous voice. For example, in the first 2 seconds of recording, the spliced voice can be "play xx application". Meanwhile, the spliced voice is transmitted to the server in real time for semantic analysis.

206. The electronic device determines whether the target parsing word matches the stored parsed word set. If yes, it determines the target application based on the successfully matched parsing word and determines an application event based on the target parsing word. If not, it performs step 207 below.

207. The electronic device determines whether the target parsing word matches a preset word set. If yes, it determines the target application based on the successfully matched preset word, determines an application event based on the target parsing word, adds the successfully matched preset word to the parsed word set, and deletes it from the preset word set. If not, the user is prompted to re-enter the voice.

For example, referring to FIG. 5, the electronic device first matches the target parsing word returned by the server against the previous parsing records, and matches it against the preset word set only when that fails. When the target parsing word is an application name, the application corresponding to that name can be used directly as the target application. For example, the target parsing word parsed from the voice "play xxx song in xx application" can be "xx application", and the application event can be: play xxx song. When the target parsing word is an application type, all applications in the electronic device that belong to that application type may be determined first, and the most frequently used of them may then be taken as the target application. For example, the target parsing word parsed from the voice "play xxx song" can be "music player application", and the application event can be: play xxx song. In that case, applications C1, C2, and C3, which belong to the music player type, can be found in the electronic device. If C1 is used most frequently, the target application is C1.
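
The choice between an application name and an application type can be sketched as below, covering only the frequency-of-use variant. The data structures, the function name `pick_target_application`, and the usage counts are invented for illustration.

```python
def pick_target_application(word, installed_apps, usage_counts):
    """Resolve a parsing word to one installed application.

    installed_apps maps application name -> application type.
    usage_counts maps application name -> number of past uses.
    """
    # Case 1: the word is itself an application name.
    if word in installed_apps:
        return word
    # Case 2: the word is an application type; collect every app of that type.
    candidates = [name for name, kind in installed_apps.items() if kind == word]
    if not candidates:
        return None
    # Take the most frequently used candidate as the target application.
    return max(candidates, key=lambda name: usage_counts.get(name, 0))


# Hypothetical device state matching the C1/C2/C3 example.
installed_apps = {
    "C1": "music player application",
    "C2": "music player application",
    "C3": "music player application",
}
usage_counts = {"C1": 42, "C2": 7, "C3": 19}

print(pick_target_application("music player application", installed_apps, usage_counts))  # C1
print(pick_target_application("C2", installed_apps, usage_counts))  # C2
```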

208. The electronic device executes the application event using the target application.

For example, for the voice "play xxx song in xx application", if the xx application is not open at this time, the xx application can be opened first and then used to find and play the xxx song.

It can be seen from the foregoing that the voice control method provided in this embodiment is applied to an electronic device. When in the voice monitoring state, the electronic device uses the monitored voice data to update the stored voice segment, which is a piece of voice data monitored within the most recent preset duration. During the update of the voice segment, the current voice segment is obtained, and voiceprint features and keywords are extracted from it. It is then determined whether the voiceprint feature matches a preset feature and whether the keyword matches a preset keyword. If so, the voice recording function is started, the voice segment at the moment of successful startup is obtained as the target voice segment, and the target voice segment and the recorded voice data are spliced to obtain the spliced voice. The user can thus wake the device and input interactive instructions in one coherent utterance, without pausing for the system preparation time, and the method is simple. Next, the spliced voice is sent to the server, so that the server semantically parses it and returns the corresponding target parsing word. It is then determined whether the target parsing word matches the stored parsed word set. If yes, the target application is determined according to the successfully matched parsing word, and the application event is determined according to the target parsing word. If not, it is determined whether the target parsing word matches the preset word set; if so, the target application is determined based on the successfully matched preset word, the application event is determined according to the target parsing word, the successfully matched preset word is added to the parsed word set, and it is deleted from the preset word set. Finally, the target application is used to execute the application event. Matching parsing words against the user's past voice interaction habits makes the matching more efficient, which effectively improves the efficiency of voice interaction and the user experience.

According to the method described in the foregoing embodiment, this embodiment will be further described from the perspective of a voice control device. The voice control device may be specifically implemented as an independent entity or integrated in an electronic device, such as a terminal. The terminal may include a mobile phone, a tablet computer, and the like.

A voice control device applied to electronic equipment includes:

In some embodiments, the startup module is specifically configured to:

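Determine whether the voiceprint feature matches a preset feature, and whether the keyword matches a preset keyword;

If yes, start the voice recording function;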

If not, trigger the acquisition module to perform the operation of acquiring the current voice segment.

In some embodiments, the control module specifically includes:

A splicing unit, configured to splice the target voice segment and the recorded voice data to obtain a spliced voice;

A determining unit, configured to determine a control instruction according to the spliced voice;

An execution unit is configured to control the electronic device to execute the control instruction.

In some embodiments, the determining unit is specifically configured to:

The execution unit is configured to execute the application event by using the target application.

In some embodiments, the determining unit is specifically configured to:

Determining whether the target parsing word matches a stored parsing word set;

If so, determine the target application based on the successfully matched parsing word;

In some embodiments, the determining unit is specifically configured to:

Please refer to FIG. 6, which shows a voice control device provided in an embodiment of the present application and applied to an electronic device. The voice control device may include an update module 10, an acquisition module 20, an extraction module 30, a startup module 40, and a control module 50, wherein:

(1) Update module 10

The update module 10 is configured to update the stored voice segment by using the monitored voice data when the electronic device is in a voice monitoring state, where the voice segment is a piece of voice data monitored within the most recent preset time period.

In this embodiment, voice can be monitored through a device such as a microphone. The preset time can be set manually, for example, 3 seconds, which is usually about the same as the system preparation time when the voice recording function is activated. Specifically, the electronic device can monitor the user's voice in a low power consumption state, and the update module 10 saves the voice data detected in the most recent preset time period in real time.
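The rolling update described above can be sketched as a fixed-capacity buffer. This is only an illustrative sketch, not the implementation used by the embodiment: the class name `RollingVoiceSegment`, the `sample_rate` parameter, and the toy numbers are assumptions introduced here.

```python
from collections import deque


class RollingVoiceSegment:
    """Keeps only the samples heard during the most recent preset duration."""

    def __init__(self, preset_seconds: float, sample_rate: int) -> None:
        # Capacity in samples; a deque with maxlen drops the oldest samples
        # automatically once it is full.
        self._samples = deque(maxlen=int(preset_seconds * sample_rate))

    def update(self, monitored_samples) -> None:
        """Append newly monitored samples to the stored voice segment."""
        self._samples.extend(monitored_samples)

    def current(self) -> list:
        """Return a snapshot of the current voice segment."""
        return list(self._samples)


# Toy numbers (assumed): a 3-second window at 4 samples per second holds 12 samples.
segment = RollingVoiceSegment(preset_seconds=3, sample_rate=4)
segment.update(range(10))
segment.update(range(10, 20))
snapshot = segment.current()
```

After the two updates, `snapshot` holds only the 12 most recent samples; everything older than the preset duration has been discarded.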

(2) Acquisition module 20

The acquisition module 20 is configured to acquire the current voice segment while the voice segment is being updated.

(3) Extraction module 30

The extraction module 30 is configured to extract voiceprint features and keywords from a current voice segment.

(4) Startup module 40

The startup module 40 is configured to start a voice recording function according to the voiceprint features and keywords, and to obtain the voice segment at the moment of successful startup as the target voice segment.

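For example, the startup module 40 may be specifically configured to:

Determine whether the voiceprint feature matches a preset feature, and whether the keyword matches a preset keyword;

If yes, start the voice recording function;

If not, trigger the acquisition module 20 to perform the operation of acquiring the current voice segment.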

It should be noted that, while the voice recording function is starting, the electronic device needs to perform a series of preparations, such as interrupting the foreground application and configuring the call parameters of the voice recording component. The electronic device remains in the voice monitoring state during this time, so the voice segment continues to be updated; once the voice recording function has started successfully, updating of the voice segment can stop.
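The interplay between the startup decision and the still-updating voice segment can be illustrated with a small simulation. It is a sketch under stated assumptions: audio frames are represented as words, `startup_frames` stands in for the system preparation time, and `toy_extract` is a hypothetical placeholder for real voiceprint and keyword extraction.

```python
from collections import deque


def should_start_recording(voiceprint, keyword, preset_voiceprint, preset_keyword):
    """Start only when both the voiceprint and the keyword match their presets."""
    return voiceprint == preset_voiceprint and keyword == preset_keyword


def monitor(frames, window, startup_frames, extract, preset_voiceprint, preset_keyword):
    """Return (target_segment, index_of_first_recorded_frame), or (None, None)."""
    segment = deque(maxlen=window)
    remaining = None  # frames left until the recorder is ready; None = not starting
    for i, frame in enumerate(frames):
        segment.append(frame)  # the segment keeps updating during startup
        if remaining is None:
            voiceprint, keyword = extract(list(segment))
            if should_start_recording(voiceprint, keyword,
                                      preset_voiceprint, preset_keyword):
                remaining = startup_frames
        else:
            remaining -= 1
        if remaining == 0:
            # Startup succeeded: freeze the segment as the target voice segment.
            return list(segment), i + 1
    return None, None


def toy_extract(segment):
    # Hypothetical extractor: every frame belongs to "owner", and the keyword
    # is found once the wake phrase appears in the segment.
    text = " ".join(segment)
    return "owner", ("hey assistant" if "hey assistant" in text else None)


frames = ["hey", "assistant", "open", "the", "music", "app"]
target_segment, record_from = monitor(
    frames, window=4, startup_frames=2, extract=toy_extract,
    preset_voiceprint="owner", preset_keyword="hey assistant",
)
```

In this run the words "open" and "the" are spoken while the recorder is still starting. They survive in `target_segment`, and recording proper begins at frame index `record_from`.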

(5) Control module 50

The control module 50 is configured to control the electronic device according to the recorded voice data and the target voice segment.

For example, referring to FIG. 7, the control module 50 may specifically include:

A splicing unit 51, configured to splice the target voice segment and the recorded voice data to obtain a spliced voice.

A determining unit 52, configured to determine a control instruction according to the spliced voice.

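For example, the determining unit 52 may be specifically configured to:

Send the spliced voice to a server, so that the server semantically parses the spliced voice and returns a corresponding target parsing word;

Determine a target application and an application event according to the returned target parsing word;

Generate a control instruction according to the target application and the application event.

Further, the determining unit 52 may be specifically configured to:

Determine whether the target parsing word matches the stored parsing word set;

If so, determine the target application based on the successfully matched parsing word;

If not, determine whether the target parsing word matches the preset word set; when the match is successful, determine the target application based on the successfully matched preset word, add the successfully matched preset word to the parsing word set, and delete it from the preset word set.

Further, the determining unit 52 may be specifically configured to:

When the target parsing word is an application name, use the application corresponding to the application name as the target application;

When the target parsing word is an application type, determine all applications in the electronic device that belong to that application type, and determine the target application from the determined applications.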

The execution unit 53 is configured to control the electronic device to execute the control instruction.

Further, the execution unit 53 may be specifically configured to:

Use the target application to execute the application event.

In this embodiment, if the target application is in a closed state, the execution unit 53 may start the target application first, and then use the target application to execute a corresponding application event.
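The determining unit's lookup against the stored parsing word set, together with the fallback to a preset word set recited in the claims, can be sketched as follows. The sketch assumes exact string matching and represents both sets as dictionaries from word to application name; the function name `resolve_target_app` and the sample words and application names are hypothetical.

```python
def resolve_target_app(target_word, parsed_words, preset_words):
    """Return the application mapped to target_word, or None if nothing matches.

    parsed_words: words that have matched before -> application name
    preset_words: candidate words not yet matched -> application name
    Both dictionaries are modified in place when a preset word is promoted.
    """
    if target_word in parsed_words:
        return parsed_words[target_word]
    if target_word in preset_words:
        app = preset_words.pop(target_word)  # delete from the preset word set
        parsed_words[target_word] = app      # add to the parsing word set
        return app
    return None


parsed = {"music": "MusicPlayer"}
preset = {"songs": "MusicPlayer", "photos": "Gallery"}

first = resolve_target_app("songs", parsed, preset)    # promoted from preset set
second = resolve_target_app("songs", parsed, preset)   # now found in parsed set
missing = resolve_target_app("weather", parsed, preset)
```

Promoting a matched preset word means that later requests using the same word are resolved by the first lookup alone.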

In specific implementation, the above units may be implemented as independent entities, or may be arbitrarily combined, and implemented as the same or several entities. For the specific implementation of the above units, refer to the foregoing method embodiments, and details are not described herein again.

It can be seen from the foregoing that the voice control device provided in this embodiment is applied to an electronic device. When the electronic device is in a voice monitoring state, the update module 10 updates the stored voice segment by using the monitored voice data, where the voice segment is a piece of voice data monitored within the most recent preset time period. The acquisition module 20 then acquires the current voice segment while the voice segment is being updated, and the extraction module 30 extracts voiceprint features and keywords from the current voice segment. Next, the startup module 40 starts the voice recording function according to the voiceprint features and keywords, and obtains the voice segment at the moment of successful startup as the target voice segment. Finally, the control module 50 controls the electronic device according to the recorded voice data and the target voice segment. The user can therefore wake up the device and input an interactive instruction in one continuous utterance, without pausing for the system preparation time. The method is simple, can effectively improve the efficiency of voice interaction, and provides a good voice interaction effect.
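As a rough end-to-end illustration of the control module, the sketch below splices a target voice segment with recorded data, derives a control instruction, and executes it, starting the target application first if it is closed. Audio is again represented as words, and `determine_instruction` is a simplistic stand-in for the server-side semantic parsing; all names here are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ControlInstruction:
    target_app: str
    app_event: str


@dataclass
class Device:
    running: set = field(default_factory=set)
    log: list = field(default_factory=list)

    def execute(self, instruction: ControlInstruction) -> None:
        # Start the target application first if it is closed.
        if instruction.target_app not in self.running:
            self.running.add(instruction.target_app)
            self.log.append(f"start {instruction.target_app}")
        self.log.append(f"{instruction.target_app}: {instruction.app_event}")


def splice(target_segment: list, recorded: list) -> list:
    """Join the buffered segment and the recording into one continuous utterance."""
    return target_segment + recorded


def determine_instruction(spliced: list, wake_words: int) -> ControlInstruction:
    # Stand-in for semantic parsing: drop the wake phrase, then read the
    # remaining words as "<event> ... <application>".
    command = spliced[wake_words:]
    return ControlInstruction(target_app=command[-1], app_event=command[0])


device = Device()
spliced = splice(["hey", "assistant", "play"], ["jazz", "in", "music"])
instruction = determine_instruction(spliced, wake_words=2)
device.execute(instruction)
```

Because the word "play" was captured in the target voice segment rather than in the recording, the instruction can only be recovered from the spliced voice, which is the point of the splicing step.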

In addition, the embodiment of the present application further provides an electronic device, and the electronic device may be a device such as a smart phone or a tablet computer. As shown in FIG. 8, the electronic device 400 includes a processor 401 and a memory 402. The processor 401 is electrically connected to the memory 402.

The processor 401 is the control center of the electronic device 400. It uses various interfaces and lines to connect the various parts of the entire electronic device, and performs the various functions of the electronic device and processes data by running or loading applications stored in the memory 402 and calling data stored in the memory 402, thereby monitoring the electronic device as a whole.

In this embodiment, the processor 401 in the electronic device 400 loads instructions corresponding to the processes of one or more applications into the memory 402 according to the following steps, and runs the applications stored in the memory 402 to implement various functions:

When the electronic device is in a voice monitoring state, the stored voice segment is updated by using the monitored voice data, where the voice segment is a piece of voice data monitored within the most recent preset time period;

During the update of the voice segment, obtain the current voice segment;

Extracting voiceprint features and keywords from the current voice segment;

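Start the voice recording function according to the voiceprint features and keywords, and obtain the voice segment at the moment of successful startup as the target voice segment;

Control the electronic device according to the recorded voice data and the target voice segment.

In some embodiments, when starting the voice recording function according to the voiceprint features and keywords, the processor 401 performs the following steps:

Determining whether the voiceprint feature matches a preset feature, and whether the keyword matches a preset keyword;

If yes, start the voice recording function;

If not, return to performing the operation of acquiring the current voice segment.

In some embodiments, when controlling the electronic device according to the recorded voice data and the target voice segment, the processor 401 performs the following steps:

Splicing the target voice segment and the recorded voice data to obtain a spliced voice;

Determining a control instruction according to the spliced voice;

Controlling the electronic device to execute the control instruction.

In some embodiments, when determining the control instruction according to the spliced voice, the processor 401 performs the following steps:

Sending the spliced voice to a server, so that the server semantically parses the spliced voice and returns a corresponding target parsing word;

Determining a target application and an application event according to the returned target parsing word;

Generating a control instruction according to the target application and the application event.

In some embodiments, when determining the target application according to the target parsing word, the processor 401 performs the following steps:

Determining whether the target parsing word matches a stored parsing word set;

If so, determine the target application based on the successfully matched parsing word;

If not, determine whether the target parsing word matches a preset word set; when the match is successful, determine the target application based on the successfully matched preset word, add the successfully matched preset word to the parsing word set, and delete it from the preset word set.

In some embodiments, when determining the target application according to the target parsing word, the processor 401 performs the following steps:

When the target parsing word is an application name, the application corresponding to the application name is used as the target application;

When the target parsing word is an application type, all applications in the electronic device that belong to that application type are determined, and the target application is determined from the determined applications.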
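Choosing a target application from an application name or an application type can be sketched as below. Following the option recited in the claims, ties within a type are broken by usage frequency; the alternative of showing the user a selection interface is not modelled. The function name `pick_target_app`, the application names, and the usage counts are hypothetical.

```python
def pick_target_app(target_word, installed, usage_count):
    """Pick the target application for a parsing word.

    installed:   application name -> application type
    usage_count: application name -> number of times the application was used
    """
    if target_word in installed:  # the word is an application name
        return target_word
    # Otherwise treat the word as an application type.
    candidates = [name for name, kind in installed.items() if kind == target_word]
    if not candidates:
        return None
    # Assumed tie-break: the most frequently used application of that type.
    return max(candidates, key=lambda name: usage_count.get(name, 0))


installed = {"TuneBox": "music", "WaveFM": "music", "SnapCam": "camera"}
usage_count = {"TuneBox": 3, "WaveFM": 11}

by_type = pick_target_app("music", installed, usage_count)
by_name = pick_target_app("SnapCam", installed, usage_count)
unknown = pick_target_app("maps", installed, usage_count)
```

Here `by_type` resolves to the more frequently used music application, `by_name` resolves directly to the named application, and `unknown` is `None` because no installed application matches.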

Please refer to FIG. 9, which is a schematic structural diagram of an electronic device according to an embodiment of the present application. The electronic device 500 may include a radio frequency (RF) circuit 501, a memory 502 including one or more computer-readable storage media, an input unit 503, a display unit 504, a sensor 505, an audio circuit 506, a wireless fidelity (WiFi) module 507, a processor 508 including one or more processing cores, a power supply 509, and other components. Those skilled in the art can understand that the structure of the electronic device shown in FIG. 9 does not constitute a limitation on the electronic device, which may include more or fewer components than shown in the figure, combine some components, or arrange the components differently.

The radio frequency circuit 501 can be used to send and receive information, or to receive and send signals during a call. In particular, after receiving downlink information from a base station, it passes the information to one or more processors 508 for processing; in addition, it sends uplink data to the base station. Generally, the radio frequency circuit 501 includes, but is not limited to, an antenna, at least one amplifier, a tuner, one or more oscillators, a subscriber identity module (SIM) card, a transceiver, a coupler, a low noise amplifier (LNA), a duplexer, and the like. In addition, the radio frequency circuit 501 can also communicate with networks and other devices through wireless communication. The wireless communication can use any communication standard or protocol, including but not limited to Global System for Mobile Communication (GSM), General Packet Radio Service (GPRS), Code Division Multiple Access (CDMA), Wideband Code Division Multiple Access (WCDMA), Long Term Evolution (LTE), e-mail, Short Messaging Service (SMS), and the like.

The memory 502 may be used to store application programs and data. The application programs stored in the memory 502 contain executable code and can be composed of various functional modules. The processor 508 executes various functional applications and data processing by running the application programs stored in the memory 502. The memory 502 may mainly include a program storage area and a data storage area, where the program storage area may store an operating system, at least one application required by a function (such as a sound playback function, an image playback function, etc.), and the like; the data storage area may store data created through use of the electronic device (such as audio data, a phone book, etc.). In addition, the memory 502 may include a high-speed random access memory, and may further include a non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or other volatile solid-state storage devices. Accordingly, the memory 502 may further include a memory controller to provide the processor 508 and the input unit 503 with access to the memory 502.

The input unit 503 can be used to receive input numbers, character information, or user characteristic information (such as fingerprints), and to generate keyboard, mouse, joystick, optical, or trackball signal inputs related to user settings and function control. Specifically, in one embodiment, the input unit 503 may include a touch-sensitive surface and other input devices. The touch-sensitive surface, also known as a touch display or touchpad, collects the user's touch operations on or near it (such as operations performed by the user on or near the touch-sensitive surface with a finger, a stylus, or any other suitable object or accessory), and drives the corresponding connection device according to a preset program. Optionally, the touch-sensitive surface may include two parts: a touch detection device and a touch controller. The touch detection device detects the user's touch position, detects the signal produced by the touch operation, and transmits the signal to the touch controller; the touch controller receives touch information from the touch detection device, converts it into contact coordinates, sends the coordinates to the processor 508, and can receive and execute commands sent by the processor 508.

The display unit 504 may be used to display information input by the user or information provided to the user, as well as the various graphical user interfaces of the electronic device. These graphical user interfaces may be composed of graphics, text, icons, videos, and any combination thereof. The display unit 504 may include a display panel. Optionally, the display panel may be configured as a liquid crystal display (LCD), an organic light-emitting diode (OLED) display, or the like. Further, the touch-sensitive surface may cover the display panel. When the touch-sensitive surface detects a touch operation on or near it, the operation is transmitted to the processor 508 to determine the type of the touch event, and the processor 508 then provides a corresponding visual output on the display panel according to the type of the touch event. Although in FIG. 9 the touch-sensitive surface and the display panel are implemented as two separate components to implement input and output functions, in some embodiments the touch-sensitive surface and the display panel may be integrated to implement the input and output functions.

The electronic device may further include at least one sensor 505, such as a light sensor, a motion sensor, and other sensors. Specifically, the light sensor may include an ambient light sensor and a proximity sensor, where the ambient light sensor may adjust the brightness of the display panel according to the brightness of the ambient light, and the proximity sensor may turn off the display panel and/or the backlight when the electronic device is moved to the ear. As one kind of motion sensor, a gravity acceleration sensor can detect the magnitude of acceleration in various directions (generally three axes), and can detect the magnitude and direction of gravity when stationary. It can be used in applications that identify the attitude of the mobile phone (such as switching between landscape and portrait orientation, related games, and magnetometer attitude calibration) and in vibration-recognition functions (such as a pedometer or tap detection). The electronic device may also be equipped with other sensors, such as a gyroscope, a barometer, a hygrometer, a thermometer, and an infrared sensor, which are not described in detail here.

The audio circuit 506 may provide an audio interface between the user and the electronic device through a speaker and a microphone. On the one hand, the audio circuit 506 can convert received audio data into an electrical signal and transmit it to the speaker, which converts the electrical signal into a sound signal for output. On the other hand, the microphone converts a collected sound signal into an electrical signal, which the audio circuit 506 receives and converts into audio data; the audio data is output to the processor 508 for processing and then sent, for example, to another electronic device via the radio frequency circuit 501, or output to the memory 502 for further processing. The audio circuit 506 may further include an earphone jack to provide communication between a peripheral headset and the electronic device.

Wireless fidelity (WiFi) is a short-range wireless transmission technology. Electronic devices can help users send and receive email, browse web pages, and access streaming media through the wireless fidelity module 507, which provides users with wireless broadband Internet access. Although FIG. 9 shows the wireless fidelity module 507, it can be understood that it does not belong to the necessary structure of the electronic device, and can be omitted as needed without changing the essence of the invention.

The processor 508 is the control center of the electronic device. It uses various interfaces and lines to connect the various parts of the entire electronic device, and performs the various functions of the electronic device and processes data by running or executing application programs stored in the memory 502 and calling data stored in the memory 502, thereby monitoring the electronic device as a whole. Optionally, the processor 508 may include one or more processing cores; preferably, the processor 508 may integrate an application processor and a modem processor, where the application processor mainly handles the operating system, the user interface, application programs, and the like, and the modem processor mainly handles wireless communication. It can be understood that the modem processor may also not be integrated into the processor 508.

The electronic device also includes a power source 509 (such as a battery) that powers various components. Preferably, the power supply can be logically connected to the processor 508 through a power management system, so that functions such as managing charging, discharging, and power consumption management can be implemented through the power management system. The power supply 509 may also include one or more DC or AC power supplies, a recharging system, a power failure detection circuit, a power converter or inverter, a power supply status indicator, and any other components.

Although not shown in FIG. 9, the electronic device may further include a camera, a Bluetooth module, and the like, and details are not described herein again.

In specific implementation, the above modules may be implemented as independent entities, or may be arbitrarily combined, and implemented as the same or several entities. For the specific implementation of the above modules, refer to the foregoing method embodiments, and details are not described herein again.

Those of ordinary skill in the art may understand that all or part of the steps in the various methods of the foregoing embodiments may be completed by instructions, or by instructions controlling related hardware, and that the instructions may be stored in a computer-readable storage medium and loaded and executed by a processor. To this end, an embodiment of the present invention provides a storage medium in which a plurality of instructions are stored, and the instructions can be loaded by a processor to execute the steps in any one of the voice control methods provided by the embodiments of the present invention.

The storage medium may include a read-only memory (ROM, Read Only Memory), a random access memory (RAM, Random Access Memory), a magnetic disk, or an optical disk.

Since the instructions stored in the storage medium can execute the steps in any one of the voice control methods provided by the embodiments of the present invention, they can achieve the beneficial effects achievable by any one of those voice control methods. For details, refer to the foregoing embodiments; they are not described herein again.

For specific implementation of the foregoing operations, refer to the foregoing embodiments, and details are not described herein again.

In summary, although the present application has been disclosed above with preferred embodiments, the above preferred embodiments are not intended to limit the present application. Those skilled in the art can make various changes and modifications without departing from the spirit and scope of the present application; therefore, the scope of protection of this application is subject to the scope defined by the claims.

Claims

A voice control method applied to an electronic device includes:

When the electronic device is in a voice monitoring state, using the monitored voice data to update a stored voice segment, where the voice segment is a piece of voice data monitored within the most recent preset time period;

Obtaining the current voice segment during the update of the voice segment;

Extracting voiceprint features and keywords from the current voice segment;

Start a voice recording function according to the voiceprint characteristics and keywords, and obtain a voice segment at a successful startup time as a target voice segment;

The electronic device is controlled according to the recorded voice data and the target voice segment.
The voice control method according to claim 1, wherein the activating a voice recording function according to the voiceprint characteristics and keywords comprises:

Determining whether the voiceprint feature matches a preset feature, and whether the keyword matches a preset keyword;

If yes, start the voice recording function;

If not, return to performing the operation of acquiring the current voice segment.
The voice control method according to claim 1, wherein the controlling the electronic device according to the recorded voice data and a target voice segment comprises:

Splicing the target voice segment and the recorded voice data to obtain a spliced voice;

Determining a control instruction according to the spliced voice;

Controlling the electronic device to execute the control instruction.
The voice control method according to claim 3, wherein the determining a control instruction according to the spliced voice comprises:

Sending the spliced speech to a server, so that the server semantically parses the spliced speech and returns a corresponding target parsed word;

Determining a target application and an application event according to the returned target parsing word;

Generating a control instruction according to the target application and an application event;

The controlling the electronic device to execute the control instruction includes using the target application to execute the application event.
The voice control method according to claim 4, wherein the determining a target application based on the target parsing word comprises:

Determining whether the target parsing word matches a stored parsing word set;

If so, determine the target application based on the successfully matched parsing word;

If not, determine whether the target parsing word matches a preset word set; when the match is successful, determine the target application based on the successfully matched preset word, add the successfully matched preset word to the parsing word set, and simultaneously delete the successfully matched preset word from the preset word set.
The voice control method according to claim 4, wherein the determining a target application based on the target parsing word comprises:

When the target parsing word is an application name, the application corresponding to the application name is used as the target application;

When the target parsing word is an application type, all applications in the electronic device that belong to the same application type are determined; and the target application is determined according to the determined applications.
The voice control method according to claim 6, wherein the determining a target application based on the determined application comprises:

Using the determined application with the highest usage frequency as the target application, or

Generate a selection interface according to the determined application, and provide the selection interface to a user; determine a target application according to a selection operation of the user on the selection interface.
A voice control device applied to an electronic device includes:

An update module, configured to update the stored voice segment by using the monitored voice data when the electronic device is in a voice monitoring state, where the voice segment is a piece of voice data monitored within the most recent preset time period;

An obtaining module, configured to obtain a current voice segment during an update process of the voice segment;

An extraction module for extracting voiceprint features and keywords from the current voice segment;

A startup module, configured to start a voice recording function according to the voiceprint characteristics and keywords, and obtain a voice segment at a successful startup time as a target voice segment;

A control module, configured to control the electronic device according to the recorded voice data and the target voice segment.
The voice control device according to claim 8, wherein the startup module is specifically configured to:

Determining whether the voiceprint feature matches a preset feature, and whether the keyword matches a preset keyword;

If yes, start the voice recording function;

If not, trigger the acquisition module to perform the operation of acquiring the current voice segment.
The voice control device according to claim 8, wherein the control module specifically comprises:

A splicing unit, configured to splice the target voice segment and the recorded voice data to obtain a spliced voice;

A determining unit, configured to determine a control instruction according to the spliced voice;

An execution unit is configured to control the electronic device to execute the control instruction.
The voice control device according to claim 10, wherein the determining unit is specifically configured to:

Sending the spliced speech to a server, so that the server semantically parses the spliced speech and returns a corresponding target parsed word;

Determining a target application and an application event according to the returned target parsing word;

Generating a control instruction according to the target application and an application event;

The execution unit is configured to execute the application event by using the target application.
The voice control device according to claim 11, wherein the determining unit is specifically configured to:

Determining whether the target parsing word matches a stored parsing word set;

If so, determine the target application based on the successfully matched parsing word;

If not, determine whether the target parsing word matches a preset word set; when the match is successful, determine the target application based on the successfully matched preset word, add the successfully matched preset word to the parsing word set, and simultaneously delete the successfully matched preset word from the preset word set.
The voice control device according to claim 11, wherein the determining unit is specifically configured to:

When the target parsing word is an application name, the application corresponding to the application name is used as the target application;

When the target parsing word is an application type, all applications in the electronic device that belong to the same application type are determined; and the target application is determined according to the determined applications.
The voice control device according to claim 13, wherein the determining unit is specifically configured to:

Using the determined application with the highest usage frequency as the target application, or

Generate a selection interface according to the determined application, and provide the selection interface to a user; determine a target application according to a selection operation of the user on the selection interface.
A computer-readable storage medium, wherein a plurality of instructions are stored in the storage medium, and the instructions are adapted to be loaded by a processor to execute the voice control method according to claim 1.
An electronic device comprising a processor and a memory, wherein the processor is electrically connected to the memory, the memory is used to store instructions and data, and the processor is used to execute the steps of the voice control method of claim 1.
The electronic device according to claim 16, wherein the activating a voice recording function according to the voiceprint characteristics and keywords comprises:

Determining whether the voiceprint feature matches a preset feature, and whether the keyword matches a preset keyword;

If yes, start the voice recording function;

If not, return to performing the operation of acquiring the current voice segment.
The electronic device according to claim 16, wherein the controlling the electronic device according to the recorded voice data and the target voice segment comprises:

Splicing the target voice segment and the recorded voice data to obtain a spliced voice;

Determining a control instruction according to the spliced voice;

Controlling the electronic device to execute the control instruction.
The electronic device according to claim 18, wherein the determining a control instruction according to the spliced voice comprises:

Sending the spliced speech to a server, so that the server semantically parses the spliced speech and returns a corresponding target parsed word;

Determining a target application and an application event according to the returned target parsing word;

Generating a control instruction according to the target application and an application event;

The controlling the electronic device to execute the control instruction includes using the target application to execute the application event.
The electronic device according to claim 19, wherein the determining a target application based on the target parsing word comprises:

Determining whether the target parsing word matches a stored parsing word set;

If so, determine the target application based on the successfully matched parsing word;

If not, determine whether the target parsing word matches a preset word set; when the match is successful, determine the target application based on the successfully matched preset word, add the successfully matched preset word to the parsing word set, and simultaneously delete the successfully matched preset word from the preset word set.