WO2020001165A1 - Voice control method and apparatus, and storage medium and electronic device - Google Patents

Publication number
WO2020001165A1
Authority
WO
WIPO (PCT)
Prior art keywords
voice
target
application
word
electronic device
Application number
PCT/CN2019/085720
Other languages
French (fr)
Chinese (zh)
Inventor
陈岩
Original Assignee
Oppo广东移动通信有限公司
Application filed by Oppo广东移动通信有限公司
Publication of WO2020001165A1

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/223 Execution procedure of a spoken command
    • G PHYSICS
    • G11 INFORMATION STORAGE
    • G11C STATIC STORES
    • G11C7/00 Arrangements for writing information into, or reading information out from, a digital store
    • G11C7/16 Storage of analogue signals in digital stores using an arrangement comprising analogue/digital [A/D] converters, digital memories and digital/analogue [D/A] converters

Definitions

  • the present application relates to the field of computer technology, and in particular, to a voice control method, device, storage medium, and electronic device.
  • voice assistants for mobile terminals have also become a commonly used function.
  • the user can use the voice assistant function of the mobile terminal to interact by voice with the machine assistant, so that the machine assistant can complete various operations on the mobile terminal under the user's voice control, including operations on applications on the mobile terminal, such as setting up schedules, turning on alarms, setting up to-do items, opening apps, and making calls.
  • the startup process of the existing voice assistant mainly includes two phases, the wake-up phase and the preparation phase.
  • the terminal can monitor the user's voice in real time.
  • the system performs related preparations to start the voice assistant.
  • the waiting time is long and the voice coherence is poor.
  • the embodiments of the present application provide a voice control method, a device, a storage medium, and an electronic device.
  • the voice instructions can be issued without waiting for the system to be ready, and the voice coherence is better.
  • An embodiment of the present application provides a voice control method applied to an electronic device, including:
  • when the electronic device is in a voice monitoring state, using the monitored voice data to update a stored voice segment, where the voice segment is the voice data monitored within the most recent preset time period;
  • the electronic device is controlled according to the recorded voice data and the target voice segment.
  • An embodiment of the present application further provides a voice control device, which is applied to an electronic device and includes:
  • An update module configured to update the stored voice segment by using the monitored voice data when the electronic device is in a voice monitoring state, where the voice segment is the voice data monitored within the most recent preset duration;
  • An obtaining module configured to obtain a current voice segment during an update process of the voice segment
  • An extraction module for extracting voiceprint features and keywords from the current voice segment
  • a startup module configured to start a voice recording function according to the voiceprint characteristics and keywords, and obtain a voice segment at a successful startup time as a target voice segment;
  • a control module configured to control the electronic device according to the recorded voice data and the target voice segment.
  • An embodiment of the present application further provides a computer-readable storage medium, where a plurality of instructions are stored in the storage medium, and the instructions are adapted to be loaded by a processor to execute any one of the foregoing voice control methods.
  • An embodiment of the present application further provides an electronic device including a processor and a memory, where the processor is electrically connected to the memory, the memory is used for storing instructions and data, and the processor is used for performing any one of the foregoing voice control methods.
  • FIG. 1 is a schematic diagram of an application scenario of a voice control system according to an embodiment of the present application.
  • FIG. 2 is a schematic flowchart of a voice control method according to an embodiment of the present application.
  • FIG. 3 is another schematic flowchart of a voice control method according to an embodiment of the present application.
  • FIG. 4 is a schematic framework diagram of a voice control process according to an embodiment of the present application.
  • FIG. 5 is a schematic diagram of a server parsing process provided by an embodiment of the present application.
  • FIG. 6 is a schematic structural diagram of a voice control device according to an embodiment of the present application.
  • FIG. 7 is a schematic structural diagram of a control module according to an embodiment of the present application.
  • FIG. 8 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
  • FIG. 9 is another schematic structural diagram of an electronic device according to an embodiment of the present application.
  • the embodiments of the present application provide a voice control method, a device, a storage medium, and an electronic device.
  • FIG. 1 provides a schematic diagram of an application scenario of a voice control system.
  • the voice control system may include any voice control device provided in an embodiment of the present application.
  • the voice control device may be integrated into an electronic device.
  • the electronic device may include a touch-enabled device such as a smart phone or a tablet.
  • the electronic device can use the monitored voice data to update the stored voice segment, which is the voice data monitored within the most recent preset time period;
  • during the update of the voice segment, the current voice segment is obtained, and voiceprint features and keywords are extracted from the current voice segment;
  • the voice recording function is started according to the voiceprint feature and keywords, and the voice segment at the moment of successful startup is obtained as the target voice segment;
  • the electronic device is controlled according to the recorded voice data and the target voice segment.
  • the preset time length may be set manually, for example to 3 seconds.
  • the preset time length is about the same as the system preparation time when the voice recording function is activated.
  • the electronic device can monitor the user's voice in real time, and save the last 3 seconds of voice data in real-time during the monitoring process for voiceprint analysis and keyword matching.
  • for example, the mobile phone can start the recording function when user A says "small x small x", obtain the voice segment stored at the moment of successful startup (the first part of the command that follows the wake phrase), and splice it with the subsequently recorded voice (the remainder of the command) to obtain the continuous voice "play xxx song in xx application"; the mobile phone is then controlled according to this continuous voice.
  • a voice control method applied to an electronic device includes:
  • when the electronic device is in a voice monitoring state, using the monitored voice data to update a stored voice segment, where the voice segment is the voice data monitored within the most recent preset time period;
  • the electronic device is controlled according to the recorded voice data and the target voice segment.
  • the starting a voice recording function according to the voiceprint characteristics and keywords includes:
  • the controlling the electronic device according to the recorded voice data and a target voice segment includes:
  • the determining a control instruction according to the spliced voice includes:
  • the controlling the electronic device to execute the control instruction includes using the target application to execute the application event.
  • the determining a target application according to the target parsing word includes:
  • the determining a target application according to the target parsing word includes:
  • when the target parsing word is an application name, the application corresponding to the application name is used as the target application;
  • when the target parsing word is an application type, all applications in the electronic device that belong to that application type are determined, and the target application is determined according to the determined applications.
  • the determining a target application based on the determined application includes:
  • FIG. 2 is a schematic flowchart of a voice control method provided by an embodiment of the present application, which is applied to an electronic device.
  • the specific process may be as follows:
  • the stored voice segment is updated by using the monitored voice data, and the voice segment is the voice data monitored within the most recent preset time period.
  • voice can be monitored through a device such as a microphone.
  • the preset time can be set manually, for example, 3 seconds, which is usually about the same as the system preparation time when the voice recording function is activated.
  • the electronic device can monitor the user's voice in a low power consumption state, and save the voice data detected in the most recent preset time period in real time.
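
The rolling update described above can be sketched as a fixed-capacity buffer. This is only an illustrative sketch, not the implementation in the application: the class name `RollingVoiceSegment`, the 0.5-second frame length, and the string placeholders standing in for audio frames are all assumptions made for the example.

```python
from collections import deque


class RollingVoiceSegment:
    """Keeps only the most recent `duration_s` seconds of monitored audio frames."""

    def __init__(self, duration_s: float = 3.0, frame_s: float = 0.5):
        # Number of fixed-length frames that fit in the preset duration.
        self.max_frames = int(round(duration_s / frame_s))
        self._frames = deque(maxlen=self.max_frames)

    def update(self, frame):
        """Store a newly monitored frame; the oldest frame is dropped when full."""
        self._frames.append(frame)

    def current(self):
        """Return the current voice segment, oldest frame first."""
        return list(self._frames)


segment = RollingVoiceSegment(duration_s=3.0, frame_s=0.5)
for i in range(10):          # 10 frames = 5 seconds of monitored audio
    segment.update(f"frame{i}")
latest = segment.current()   # only the last 6 frames (3 seconds) remain
```

A bounded `deque` is used here because appending to a full one discards the oldest item automatically, which matches the "most recent preset time period" behaviour without any explicit trimming.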
  • voice monitoring and voice segment updating are performed in real time, and in this process, the electronic device may also perform real time analysis on the voice segment.
  • the voiceprint feature is mainly a frequency spectrum feature, which may include frequency, amplitude, and other information, and usually reflects the characteristics of the sound's loudness, tone, and timbre. Different people generally have different voiceprint features.
  • the speech segments are converted into spectral data by Fourier transform, and then the relevant information is extracted from the spectral data as corresponding voiceprint features.
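
As a rough illustration of converting a voice segment into spectral data, the sketch below computes a naive discrete Fourier transform and reads off the strongest frequency. The application does not say which spectral quantities make up the voiceprint feature, so the dominant frequency used here is only a toy stand-in, and the function names, the 800 Hz sample rate, and the test tone are assumptions.

```python
import cmath
import math


def magnitude_spectrum(samples):
    """Naive discrete Fourier transform; returns magnitudes for bins 0..N/2."""
    n = len(samples)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(samples))
        spectrum.append(abs(acc))
    return spectrum


def dominant_frequency(samples, sample_rate):
    """Frequency in Hz of the strongest non-DC bin (toy spectral feature)."""
    mags = magnitude_spectrum(samples)
    peak_bin = max(range(1, len(mags)), key=lambda k: mags[k])
    return peak_bin * sample_rate / len(samples)


sample_rate = 800  # Hz, hypothetical value chosen so that 100 Hz falls on an exact bin
tone = [math.sin(2 * math.pi * 100 * t / sample_rate) for t in range(80)]
peak_hz = dominant_frequency(tone, sample_rate)
```

A practical system would use a fast Fourier transform over short overlapping windows rather than this quadratic-time loop; the naive form is kept only so that the example has no dependencies.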
  • the keyword may include at least one character or word, which may be in English or Chinese.
  • step 104 may specifically include:
  • the preset feature is mainly a voiceprint feature bound to the user, which is usually set in advance.
  • the user may be required to record a voice in advance, and the voiceprint feature extracted from that voice is used as the preset feature bound to this user.
  • the preset keyword is mainly used to trigger the activation of the voice interaction function; it can be set by the system by default (that is, set by the manufacturer when the electronic device leaves the factory), or set by the user according to personal preference, for example by entering the corresponding settings window from the relevant settings interface.
  • to start the voice recording function, the electronic device needs to perform a series of preparations, such as interrupting the operation of the foreground application and setting the call parameters of the voice recording component. During this preparation process, the electronic device remains in the voice monitoring state and the voice segment continues to be updated; after the voice recording function is successfully started, updating of the voice segment can be stopped.
  • step 105 may specifically include:
  • the target voice segment and the recorded voice data are spliced to obtain a spliced voice.
  • because the duration of the voice segment is similar to the system preparation duration, the voice segment saved at the moment the system is ready (that is, when the voice recording function is successfully started) can be exactly the continuous utterance spoken by the user after the preset keyword; it can therefore be used directly as the starting content of the recorded voice and spliced with the subsequently recorded voice data to form one continuous voice.
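
The timing argument above can be made concrete with a small simulation. This is a sketch under simplifying assumptions: each list item stands for one fixed-length audio frame, the wake phrase is assumed to be recognized exactly when its last frame arrives (`wake_len`), and preparation is assumed to take exactly `prep_frames` frames. The function name and parameters are hypothetical.

```python
from collections import deque


def simulate(utterance, wake_len, prep_frames):
    """Toy timeline of monitoring, preparation, recording, and splicing."""
    segment = deque(maxlen=prep_frames)   # segment length ~ preparation time
    recorded = []
    ready_at = None                       # frame index at which recording is ready
    target_segment = None
    for i, frame in enumerate(utterance):
        if ready_at is not None and i >= ready_at:
            if target_segment is None:
                target_segment = list(segment)   # snapshot at successful startup
            recorded.append(frame)               # recorder now captures audio
            continue
        segment.append(frame)                    # still monitoring and updating
        if ready_at is None and i + 1 == wake_len:
            ready_at = i + 1 + prep_frames       # wake phrase matched; preparation begins
    return (target_segment or []) + recorded


utterance = ["small", "x", "small", "x",
             "play", "xxx", "song", "in", "xx", "application"]
spliced = simulate(utterance, wake_len=4, prep_frames=3)
```

With these numbers the three frames spoken during preparation ("play", "xxx", "song") are recovered from the stored segment and joined to the three recorded frames, so nothing the user said after the wake phrase is lost.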
  • steps 1-2 may specifically include:
  • Control instructions are generated based on the target application and application event.
  • the server is mainly used for semantic analysis.
  • the electronic device can transmit the spliced voice to the server in real time, and the server can analyze it with a trained semantic analysis model.
  • the semantic analysis model can be a deep learning model.
  • the server can collect different voice samples in advance to train the deep learning model.
  • step "determining a target application based on the returned target parsing word” may specifically include:
  • the parsed word set and the preset word set are mainly application related words, such as an application name or an application type.
  • the parsed word set mainly consists of words that were parsed out when the electronic device requested the server to perform semantic parsing in the past.
  • the preset word set is mainly set by the system by default, for example, each time an application is installed on the terminal, an application related word of the application can be obtained.
  • when the target parsing word is an application name, the application corresponding to the application name may be directly used as the target application.
  • when the target parsing word is an application type, all applications in the electronic device that belong to that application type may be determined first; the most frequently used application can then be used as the target application, or a selection interface based on these applications can be provided to the user and the target application determined through the user's selection operation.
  • initially, the preset word set may include all words set by the system by default and the parsed word set may be empty; each time the electronic device obtains a new parsed word, that word may be stored in the parsed word set and deleted from the preset word set at the same time, so that the parsed word set and the preset word set are continuously updated.
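
The two-set bookkeeping described above can be sketched as follows. The function name `resolve_word` and the use of Python sets are assumptions for illustration; the application does not prescribe a data structure.

```python
def resolve_word(target_word, parsed_words, preset_words):
    """Look up a parsing word; returns the matched word or None.

    parsed_words: words matched in earlier interactions (checked first).
    preset_words: system-default words that have not been matched yet.
    Both sets are updated in place.
    """
    if target_word in parsed_words:
        return target_word
    if target_word in preset_words:
        preset_words.remove(target_word)   # move the word between the two sets
        parsed_words.add(target_word)
        return target_word
    return None                            # caller may prompt the user to retry


parsed = set()
preset = {"xx application", "music player application"}
first = resolve_word("xx application", parsed, preset)    # found in the preset set
second = resolve_word("xx application", parsed, preset)   # now found in the parsed set
missing = resolve_word("unknown", parsed, preset)
```

Because words the user has already used migrate into the smaller parsed set, later lookups for habitual commands succeed at the first check.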
  • steps 1-3 may specifically include:
  • the target application may be started first, and then the target application is used to execute a corresponding application event.
  • the voice control method provided in this embodiment is applied to an electronic device.
  • when the electronic device is in a voice monitoring state, the stored voice segment is updated by using the monitored voice data, the voice segment being the voice data monitored within the most recent preset duration; then, during the update of the voice segment, the current voice segment is obtained, and voiceprint features and keywords are extracted from the current voice segment.
  • the voice recording function is then started according to the voiceprint feature and keywords, and the voice segment at the moment of successful startup is obtained as the target voice segment; the electronic device is then controlled according to the recorded voice data and the target voice segment, so that the user can wake up the device and input interactive instructions in one coherent utterance, without a voice interruption caused by the long system preparation time. The method is simple, effectively improves the efficiency of voice interaction, and gives a good voice interaction effect.
  • the electronic device uses the monitored voice data to update a stored voice segment, which is the voice data monitored within the most recent preset time period.
  • for example, the preset duration may be manually set to 3 seconds. As long as the electronic device is on, voice monitoring can run continuously, and of the monitored voice data the electronic device may keep only the last 3 seconds.
  • the electronic device obtains the current voice segment, and extracts voiceprint features and keywords from the current voice segment.
  • for example, when the user says "small x small x, play xxx song in xx application" to the electronic device, because the voice monitoring operation and the voice segment update operation are performed in real time, the first 3 seconds of the user's speech, such as "small x small x", can be stored as the initial voice segment. The electronic device then performs voiceprint feature and keyword extraction on the stored voice segment, for example by converting it into spectrum data using the Fourier transform and extracting the relevant information from the spectrum data as the corresponding voiceprint features, while also recognizing the content of the voice segment to obtain the keywords.
  • the electronic device determines whether the voiceprint feature matches a preset feature, and whether the keyword matches a preset keyword. If yes, execute step 204 below; if not, return to execute step 202 above.
  • the preset feature may be a voiceprint feature input by the user in advance, and the preset keyword may be a phrase set by the system by default, which may include at least two characters or words.
  • the preset keywords should be short, such as "small x".
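
The two-condition check described above (voiceprint match and keyword match) might look like the sketch below. The application does not state how a voiceprint match is computed; cosine similarity against a 0.9 threshold is an assumption chosen for illustration, as are the function names and the three-element feature vectors.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def should_start_recording(voiceprint, keyword, preset_feature, preset_keyword,
                           threshold=0.9):
    """Start recording only if BOTH the voiceprint and the keyword match."""
    same_speaker = cosine_similarity(voiceprint, preset_feature) >= threshold
    same_keyword = keyword.strip().lower() == preset_keyword.strip().lower()
    return same_speaker and same_keyword


enrolled = [0.9, 0.1, 0.4]   # hypothetical voiceprint recorded by the user in advance
owner_wakes = should_start_recording([0.88, 0.12, 0.41], "small x small x",
                                     enrolled, "small x small x")
stranger_wakes = should_start_recording([0.1, 0.9, 0.2], "small x small x",
                                        enrolled, "small x small x")
wrong_phrase = should_start_recording([0.88, 0.12, 0.41], "hello",
                                      enrolled, "small x small x")
```

When either check fails, the flow returns to obtaining the next current voice segment rather than starting the recorder.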
  • the electronic device starts a voice recording function, and obtains a voice segment at a successful start time as a target voice segment.
  • the electronic device splices the target voice segment and the recorded voice data to obtain a spliced voice, and then sends the spliced voice to a server, so that the server semantically parses the spliced voice and returns a corresponding target parsed word.
  • after the voice recording function is successfully started, the user's subsequent words are saved by recording, so there is no need to keep saving them through voice segment updates. At this point the update operation of the voice segment can be stopped, and the voice segment "play xx application" is used as the starting content of the recorded voice and spliced with the subsequently recorded voice data into one continuous voice. For example, in the first 2 seconds of recording, the spliced voice can be "play xx application". At the same time, the spliced voice is transmitted to the server in real time for semantic analysis.
  • the electronic device determines whether the target parsing word matches the stored parsed word set. If yes, it determines the target application based on the successfully matched parsed word and determines an application event based on the target parsing word. If not, it performs step 207 below.
  • the electronic device determines whether the target parsing word matches a preset word set. If yes, it determines a target application based on the successfully matched preset word, determines an application event based on the target parsing word, and then adds the successfully matched preset word to the parsed word set and deletes it from the preset word set. If not, the user is prompted to re-enter the voice.
  • that is, the electronic device first matches the target parsing word against the previous parsing records, and matches it against the preset word set only when that matching is unsuccessful.
  • when the target parsing word is an application name, the application corresponding to the application name can be directly used as the target application. For example, the target parsing word parsed from the voice "play xxx song in xx application" can be "xx application", and the application event can be: play xxx song.
  • when the target parsing word is an application type, all applications belonging to that application type in the electronic device may be determined first, and then the application with the highest frequency of use may be used as the target application. For example, the target parsing word parsed from the voice "play xxx song" can be "music player application", and the application event can be: play xxx song. If applications C1, C2, and C3 belonging to the music player type are found in the electronic device and C1 is used most frequently, the target application is C1.
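
The name-versus-type selection can be sketched as below. The record layout (dictionaries with `name`, `type`, and `launch_count` keys), the function name, and the launch counts are hypothetical; the text only says that the most frequently used application of the matching type may be chosen.

```python
def pick_target_application(target_word, installed):
    """Resolve a parsing word to an application name, or None if nothing matches."""
    # Case 1: the parsing word is an application name.
    for app in installed:
        if app["name"] == target_word:
            return app["name"]
    # Case 2: the parsing word is an application type; pick the most-used app of that type.
    same_type = [app for app in installed if app["type"] == target_word]
    if same_type:
        return max(same_type, key=lambda app: app["launch_count"])["name"]
    return None


installed = [
    {"name": "C1", "type": "music player application", "launch_count": 42},
    {"name": "C2", "type": "music player application", "launch_count": 7},
    {"name": "C3", "type": "music player application", "launch_count": 15},
]
by_type = pick_target_application("music player application", installed)
by_name = pick_target_application("C2", installed)
```

The alternative mentioned earlier in the text, showing the user a selection interface listing the applications of that type, would replace the `max` call with a prompt.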
  • the electronic device executes the application event using the target application.
  • the voice control method provided in this embodiment is applied to an electronic device.
  • when in a voice monitoring state, the electronic device can use the monitored voice data to update the stored voice segment, the voice segment being the voice data monitored within the most recent preset duration; during the update of the voice segment, the current voice segment is obtained, and voiceprint features and keywords are extracted from the current voice segment.
  • it is then determined whether the voiceprint feature matches a preset feature and whether the keyword matches a preset keyword; if so, the voice recording function is started, the voice segment at the moment of successful startup is obtained as the target voice segment, and the target voice segment and the recorded voice data are spliced to obtain a spliced voice, so that the user can wake up the device and input interactive instructions in one coherent utterance, without interrupting the voice because of the system preparation time. The method is simple.
  • the spliced voice is sent to the server, so that the server semantically parses the spliced voice and returns the corresponding target parsing word.
  • it is then determined whether the target parsing word matches the stored parsed word set. If so, the target application is determined according to the successfully matched parsed word, and the application event is determined according to the target parsing word. If not, it is determined whether the target parsing word matches the preset word set; if so, the target application is determined based on the successfully matched preset word, the application event is determined according to the target parsing word, and the successfully matched preset word is added to the parsed word set and deleted from the preset word set. Finally, the target application is used to execute the application event. Matching against the user's past voice interaction habits makes parsing-word matching more efficient, which effectively improves the efficiency of voice interaction and the user experience.
  • the voice control device may be specifically implemented as an independent entity or integrated in an electronic device, such as a terminal.
  • the terminal may include a mobile phone, a tablet computer, and the like.
  • a voice control device applied to electronic equipment includes:
  • An update module configured to update the stored voice segment by using the monitored voice data when the electronic device is in a voice monitoring state, where the voice segment is the voice data monitored within the most recent preset duration;
  • An obtaining module configured to obtain a current voice segment during an update process of the voice segment
  • An extraction module for extracting voiceprint features and keywords from the current voice segment
  • a startup module configured to start a voice recording function according to the voiceprint characteristics and keywords, and obtain a voice segment at a successful startup time as a target voice segment;
  • a control module configured to control the electronic device according to the recorded voice data and the target voice segment.
  • the startup module is specifically configured to:
  • control module specifically includes:
  • a splicing unit configured to splice the target voice segment and the recorded voice data to obtain a spliced voice
  • a determining unit configured to determine a control instruction according to the spliced voice
  • An execution unit is configured to control the electronic device to execute the control instruction.
  • the determining unit is specifically configured to:
  • the execution unit is configured to execute the application event by using the target application.
  • the determining unit is specifically configured to:
  • the determining unit is specifically configured to:
  • when the target parsing word is an application name, the application corresponding to the application name is used as the target application;
  • when the target parsing word is an application type, all applications in the electronic device that belong to that application type are determined, and the target application is determined according to the determined applications.
  • the determining unit is specifically configured to:
  • FIG. 6 specifically describes a voice control device provided in an embodiment of the present application, which is applied to electronic equipment.
  • the voice control device may include: an update module 10, an acquisition module 20, an extraction module 30, a startup module 40, and a control module 50, of which:
  • the update module 10 is configured to update the stored voice segment by using the monitored voice data when the electronic device is in a voice monitoring state, where the voice segment is the voice data monitored within the most recent preset time period.
  • voice can be monitored through a device such as a microphone.
  • the preset time can be set manually, for example, 3 seconds, which is usually about the same as the system preparation time when the voice recording function is activated.
  • the electronic device can monitor the user's voice in a low power consumption state, and the update module 10 saves the voice data detected in the most recent preset time period in real time.
  • the obtaining module 20 is configured to obtain a current voice segment during an update process of the voice segment.
  • voice monitoring and voice segment updating are performed in real time, and in this process, the electronic device may also perform real time analysis on the voice segment.
  • the extraction module 30 is configured to extract voiceprint features and keywords from a current voice segment.
  • the voiceprint feature is mainly a frequency spectrum feature, which may include frequency, amplitude, and other information, and usually reflects the characteristics of the sound's loudness, tone, and timbre. Different people generally have different voiceprint features.
  • the speech segments are converted into spectral data by Fourier transform, and then the relevant information is extracted from the spectral data as corresponding voiceprint features.
  • the keyword may include at least one character or word, which may be in English or Chinese.
  • the starting module 40 is configured to start a voice recording function according to the voiceprint feature and keywords, and obtain a voice segment at a successful startup time as a target voice segment.
  • the startup module 40 may be specifically used for:
  • the acquisition module is triggered to perform the operation of acquiring the current voice segment.
  • the preset feature is mainly a voiceprint feature bound to the user, which is usually set in advance.
  • the user may be required to record a voice in advance, and the voiceprint feature extracted from that voice is used as the preset feature bound to this user.
  • the preset keyword is mainly used to trigger the activation of the voice interaction function; it can be set by the system by default (that is, set by the manufacturer when the electronic device leaves the factory), or set by the user according to personal preference, for example by entering the corresponding settings window from the relevant settings interface.
  • to start the voice recording function, the electronic device needs to perform a series of preparations, such as interrupting the operation of the foreground application and setting the call parameters of the voice recording component.
  • during this preparation the voice segment continues to be updated; after the voice recording function is successfully started, updating of the voice segment can be stopped.
  • the control module 50 is configured to control the electronic device according to the recorded voice data and the target voice segment.
  • control module 50 may specifically include:
  • the splicing unit 51 is configured to splice the target voice segment and the recorded voice data to obtain a spliced voice.
  • because the duration of the voice segment is similar to the system preparation duration, the voice segment saved at the moment the system is ready (that is, when the voice recording function is successfully started) can be exactly the continuous utterance spoken by the user after the preset keyword; it can therefore be used directly as the starting content of the recorded voice and spliced with the subsequently recorded voice data to form one continuous voice.
  • a determining unit 52 is configured to determine a control instruction according to the spliced voice.
  • the determining unit 52 may be specifically configured to:
  • the server is mainly used for semantic analysis.
  • the electronic device can transmit the spliced voice to the server in real time, and the server can analyze it with a trained semantic analysis model.
  • the semantic analysis model can be a deep learning model.
  • the server can collect different voice samples in advance to train the deep learning model.
  • determining unit 52 may be specifically configured to:
  • Control instructions are generated based on the target application and application event.
  • the parsed word set and the preset word set are mainly application related words, such as an application name or an application type.
  • the parsed word set mainly consists of words that were parsed out when the electronic device requested the server to perform semantic parsing in the past.
  • the preset word set is mainly set by the system by default, for example, each time an application is installed on the terminal, an application related word of the application can be obtained.
  • when the target parsing word is an application name, the application corresponding to the application name may be directly used as the target application.
  • when the target parsing word is an application type, all applications in the electronic device that belong to that application type may be determined first; the most frequently used application can then be used as the target application, or a selection interface based on these applications can be provided to the user and the target application determined through the user's selection operation.
  • initially, the preset word set may include all words set by the system by default and the parsed word set may be empty; each time the electronic device obtains a new parsed word, that word may be stored in the parsed word set and deleted from the preset word set at the same time, so that the parsed word set and the preset word set are continuously updated.
  • the execution unit 53 is configured to control the electronic device to execute the control instruction.
  • execution unit 53 may be specifically configured to:
  • the execution unit 53 may start the target application first, and then use the target application to execute a corresponding application event.
  • the above units may be implemented as independent entities, or may be arbitrarily combined, and implemented as the same or several entities.
  • the above units refer to the foregoing method embodiments, and details are not described herein again.
  • the voice control apparatus is applied to an electronic device.
  • the update module 10 updates the stored voice segment by using the monitored voice data, the voice segment being a piece of voice data detected within the most recent preset duration.
  • the acquisition module 20 acquires the current voice segment during the update of the voice segment, and the extraction module 30 extracts voiceprint features and keywords from the current voice segment.
  • the starting module 40 starts the voice recording function according to the voiceprint feature and keyword, and obtains the voice segment at the moment of successful startup as the target voice segment; the control module 50 then controls the electronic device based on the recorded voice data and the target voice segment. The user can therefore wake the device and issue interactive instructions in one continuous utterance, without pausing for the system preparation time.
  • the method is simple, can effectively improve the efficiency of voice interaction, and has a good voice interaction effect.
  • the embodiment of the present application further provides an electronic device, and the electronic device may be a device such as a smart phone or a tablet computer.
  • the electronic device 400 includes a processor 401 and a memory 402.
  • the processor 401 is electrically connected to the memory 402.
  • the processor 401 is the control center of the electronic device 400. It uses various interfaces and lines to connect various parts of the entire electronic device.
  • the processor 401 runs or loads applications stored in the memory 402 and calls the data stored in the memory 402 to execute the various functions of the electronic device and process data, thereby monitoring the electronic device as a whole.
  • the processor 401 in the electronic device 400 loads instructions corresponding to the processes of one or more applications into the memory 402 according to the following steps, and runs the applications stored in the memory 402 to implement various functions:
  • the stored voice segment is updated by using the monitored voice data, and the voice segment is a piece of voice data that has been monitored within a recently preset time period;
  • the electronic device is controlled according to the recorded voice data and the target voice segment.
  • the starting a voice recording function according to the voiceprint characteristics and keywords includes:
  • the controlling the electronic device according to the recorded voice data and a target voice segment includes:
  • the determining a control instruction according to the spliced voice includes:
  • the controlling the electronic device to execute the control instruction includes using the target application to execute the application event.
  • the determining a target application according to the target parsing word includes:
  • the determining a target application according to the target parsing word includes:
  • the target parsing word is an application name
  • the application corresponding to the application name is used as the target application
  • the target parsing word is an application type
  • all applications in the electronic device that belong to the same application type are determined; and the target application is determined according to the determined application.
  • the determining a target application based on the determined application includes:
  • the electronic device 500 may include a radio frequency (RF) circuit 501, a memory 502 including one or more computer-readable storage media, an input unit 503, a display unit 504, a sensor 505, an audio circuit 506, a wireless fidelity (WiFi) module 507, a processor 508 having one or more processing cores, a power supply 509, and other components.
  • the radio frequency circuit 501 can be used to send and receive information, or to receive and send signals during a call. In particular, downlink information received from the base station is handed to one or more processors 508 for processing, and uplink data is sent to the base station.
  • the radio frequency circuit 501 includes, but is not limited to, an antenna, at least one amplifier, a tuner, one or more oscillators, a subscriber identity module (SIM, Subscriber Identity Module) card, a transceiver, a coupler, and a low noise amplifier (LNA, Low Noise Amplifier), duplexer, etc.
  • the radio frequency circuit 501 can also communicate with a network and other devices through wireless communication.
  • This wireless communication can use any communication standard or protocol, including but not limited to Global System for Mobile Communications (GSM), General Packet Radio Service (GPRS), Code Division Multiple Access (CDMA), Wideband Code Division Multiple Access (WCDMA), Long Term Evolution (LTE), email, Short Messaging Service (SMS), etc.
  • the memory 502 may be used to store application programs and data.
  • the application program stored in the memory 502 contains executable code.
  • Applications can be composed of various functional modules.
  • the processor 508 executes various functional applications and data processing by running an application program stored in the memory 502.
  • the memory 502 may mainly include a storage program area and a storage data area, where the storage program area may store an operating system and at least one application required by a function (such as a sound playback function or an image playback function).
  • the storage data area may store data (such as audio data and a phone book) created through use of the electronic device.
  • the memory 502 may include a high-speed random access memory, and may further include a non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or other volatile solid-state storage devices. Accordingly, the memory 502 may further include a memory controller that gives the processor 508 and the input unit 503 access to the memory 502.
  • the input unit 503 can be used to receive inputted numbers, character information, or user characteristic information (such as fingerprints), and generate keyboard, mouse, joystick, optical, or trackball signal inputs related to user settings and function control.
  • the input unit 503 may include a touch-sensitive surface and other input devices.
  • a touch-sensitive surface, also known as a touch display or touchpad, collects the user's touch operations on or near it (such as operations performed with a finger, a stylus, or any other suitable object or accessory on or near the touch-sensitive surface) and drives the corresponding connected device according to a preset program.
  • the touch-sensitive surface may include two parts, a touch detection device and a touch controller.
  • the touch detection device detects the user's touch position and the signal produced by the touch operation, and transmits the signal to the touch controller; the touch controller receives touch information from the touch detection device, converts it into contact coordinates, sends them to the processor 508, and can receive and execute commands sent by the processor 508.
  • the display unit 504 may be used to display information input by the user or information provided to the user and various graphical user interfaces of the electronic device. These graphical user interfaces may be composed of graphics, text, icons, videos, and any combination thereof.
  • the display unit 504 may include a display panel.
  • the display panel may be configured using a liquid crystal display (LCD, Liquid Crystal Display), an organic light emitting diode (OLED, Organic Light-Emitting Diode), or the like.
  • the touch-sensitive surface may cover the display panel.
  • When the touch-sensitive surface detects a touch operation on or near it, the operation is transmitted to the processor 508 to determine the type of the touch event, and the processor 508 then provides the corresponding visual output on the display panel according to the type of the touch event.
  • Although the touch-sensitive surface and the display panel are implemented as two separate components to provide input and output functions, in some embodiments the touch-sensitive surface and the display panel may be integrated to provide input and output functions.
  • the electronic device may further include at least one sensor 505, such as a light sensor, a motion sensor, and other sensors.
  • the light sensor may include an ambient light sensor and a proximity sensor, wherein the ambient light sensor may adjust the brightness of the display panel according to the brightness of the ambient light, and the proximity sensor may turn off the display panel when the electronic device is moved to the ear.
  • the gravity acceleration sensor can detect the magnitude of acceleration in various directions (generally three axes), and can detect the magnitude and direction of gravity when stationary.
  • it can be used for applications that recognize the attitude of the mobile phone (such as switching between landscape and portrait, related games, and magnetometer attitude calibration) and for vibration-recognition functions (such as a pedometer or tap detection); the electronic device may also be configured with other sensors such as gyroscopes, barometers, hygrometers, thermometers, and infrared sensors, which are not described in detail here.
  • the audio circuit 506 may provide an audio interface between the user and the electronic device through a speaker or a microphone.
  • the audio circuit 506 can convert the received audio data into an electrical signal and transmit it to a speaker.
  • the speaker converts the audio signal into a sound signal and outputs it.
  • the microphone converts the collected sound signal into an electrical signal, which is converted by the audio circuit 506 into
  • the audio circuit 506 may further include an earphone jack to provide communication between the peripheral headset and the electronic device.
  • Wireless fidelity is a short-range wireless transmission technology. Electronic devices can help users send and receive email, browse web pages, and access streaming media through the wireless fidelity module 507, which provides users with wireless broadband Internet access.
  • Although FIG. 9 shows the wireless fidelity module 507, it can be understood that the module is not a necessary part of the electronic device and can be omitted as needed without changing the essence of the invention.
  • the processor 508 is a control center of an electronic device, and uses various interfaces and lines to connect various parts of the entire electronic device.
  • by running or executing an application program stored in the memory 502 and calling data stored in the memory 502, the processor 508 executes the various functions of the electronic device and processes data.
  • the processor 508 may include one or more processing cores; preferably, the processor 508 may integrate an application processor and a modem processor, wherein the application processor mainly processes an operating system, a user interface, and an application program, etc.
  • the modem processor mainly handles wireless communication. It can be understood that the foregoing modem processor may not be integrated into the processor 508.
  • the electronic device also includes a power source 509 (such as a battery) that powers various components.
  • the power supply can be logically connected to the processor 508 through a power management system, so that functions such as managing charging, discharging, and power consumption management can be implemented through the power management system.
  • the power supply 509 may also include one or more DC or AC power supplies, a recharging system, a power failure detection circuit, a power converter or inverter, a power supply status indicator, and any other components.
  • the electronic device may further include a camera, a Bluetooth module, and the like, and details are not described herein again.
  • the above modules may be implemented as independent entities, or may be arbitrarily combined, and implemented as the same or several entities.
  • the above modules refer to the foregoing method embodiments, and details are not described herein again.
  • an embodiment of the present invention provides a storage medium in which a plurality of instructions are stored, and the instructions can be loaded by a processor to execute steps in any one of the voice control methods provided by the embodiments of the present invention.
  • the storage medium may include a read-only memory (ROM, Read Only Memory), a random access memory (RAM, Random Access Memory), a magnetic disk, or an optical disk.

Abstract

A voice control method, comprising: when an electronic device is in a voice monitoring state, updating a stored voice segment by using detected voice data, the voice segment being a segment of voice data detected within a latest preset time period (101); obtaining a current voice segment in the voice segment updating process (102); extracting a voiceprint feature and a keyword from the current voice segment (103); starting a voice recording function according to the voiceprint feature and the keyword, and obtaining a voice segment at the time of successful start as a target voice segment (104); and performing corresponding control on the electronic device according to recorded voice data and the target voice segment (105).

Description

Voice control method and apparatus, storage medium, and electronic device
This application claims priority to Chinese patent application No. 201810681095.6, entitled "Voice Control Method, Apparatus, Storage Medium, and Electronic Device", filed with the Chinese Patent Office on June 27, 2018, the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the field of computer technology, and in particular to a voice control method and apparatus, a storage medium, and an electronic device.
Background
With the widespread use of mobile terminals, the voice assistant has become a commonly used function. Users can interact by voice with the machine assistant through the voice assistant function of the mobile terminal, so that the assistant can carry out various operations on the mobile terminal under the user's voice control, including operations on applications installed on the terminal, such as setting a schedule, turning on an alarm, setting to-do items, opening an application, or making a call.
The startup process of existing voice assistants mainly includes two phases: a wake-up phase and a preparation phase. For example, the terminal can monitor the user's voice in real time, and when it detects that the user has spoken the wake-up word, the system performs the preparation work needed to start the voice assistant. At present, after speaking the wake-up word, the user has to wait until the system is ready before issuing a voice command; the wait is long and the continuity of speech is poor.
Summary
Embodiments of the present application provide a voice control method and apparatus, a storage medium, and an electronic device, with which a voice command can be issued without waiting for the system to be ready, giving better continuity of speech.
An embodiment of the present application provides a voice control method, applied to an electronic device, including:
when the electronic device is in a voice monitoring state, updating a stored voice segment with the monitored voice data, the voice segment being the voice data monitored within the most recent preset duration;
obtaining the current voice segment while the voice segment is being updated;
extracting a voiceprint feature and a keyword from the current voice segment;
starting a voice recording function according to the voiceprint feature and the keyword, and obtaining the voice segment at the moment startup succeeds as a target voice segment;
controlling the electronic device accordingly based on the recorded voice data and the target voice segment.
An embodiment of the present application further provides a voice control apparatus, applied to an electronic device, including:
an update module, configured to update a stored voice segment with the monitored voice data when the electronic device is in a voice monitoring state, the voice segment being the voice data monitored within the most recent preset duration;
an obtaining module, configured to obtain the current voice segment while the voice segment is being updated;
an extraction module, configured to extract a voiceprint feature and a keyword from the current voice segment;
a startup module, configured to start a voice recording function according to the voiceprint feature and the keyword, and to obtain the voice segment at the moment startup succeeds as a target voice segment;
a control module, configured to control the electronic device accordingly based on the recorded voice data and the target voice segment.
An embodiment of the present application further provides a computer-readable storage medium storing a plurality of instructions, the instructions being adapted to be loaded by a processor to execute any one of the voice control methods described above.
An embodiment of the present application further provides an electronic device including a processor and a memory, the processor being electrically connected to the memory, the memory being used to store instructions and data, and the processor being configured to perform the steps of any one of the voice control methods described above.
Brief Description of the Drawings
The technical solutions and other beneficial effects of the present application will become apparent from the following detailed description of specific embodiments, taken in conjunction with the accompanying drawings.
FIG. 1 is a schematic diagram of an application scenario of a voice control system according to an embodiment of the present application.
FIG. 2 is a schematic flowchart of a voice control method according to an embodiment of the present application.
FIG. 3 is another schematic flowchart of a voice control method according to an embodiment of the present application.
FIG. 4 is a schematic framework diagram of a voice control process according to an embodiment of the present application.
FIG. 5 is a schematic diagram of a server parsing process according to an embodiment of the present application.
FIG. 6 is a schematic structural diagram of a voice control apparatus according to an embodiment of the present application.
FIG. 7 is a schematic structural diagram of a control module according to an embodiment of the present application.
FIG. 8 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
FIG. 9 is another schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application are described clearly and completely below with reference to the drawings in the embodiments. The described embodiments are only some of the embodiments of the present application, not all of them. All other embodiments obtained by those skilled in the art on the basis of the embodiments in the present application without creative effort fall within the scope of protection of the present application.
Embodiments of the present application provide a voice control method and apparatus, a storage medium, and an electronic device.
Referring to FIG. 1, FIG. 1 is a schematic diagram of an application scenario of a voice control system. The voice control system may include any voice control apparatus provided in the embodiments of the present application. The voice control apparatus may be integrated into an electronic device, and the electronic device may be a device with a touch function such as a smartphone or a tablet computer.
When the electronic device is in a voice monitoring state, it can update the stored voice segment with the monitored voice data, the voice segment being the voice data monitored within the most recent preset duration. While the voice segment is being updated, the device obtains the current voice segment and extracts a voiceprint feature and a keyword from it. It starts the voice recording function according to the voiceprint feature and the keyword, and obtains the voice segment at the moment startup succeeds as the target voice segment. It then controls the electronic device accordingly based on the recorded voice data and the target voice segment.
For example, the preset duration may be manually set to 3 seconds; it is usually about the same as the system preparation time needed to start the voice recording function. In FIG. 1, the electronic device can monitor the user's voice in real time and, while monitoring, keep the most recent 3 seconds of voice data for voiceprint analysis and keyword matching. Suppose the voiceprint and keyword meet the conditions: for user A's phone, user A says "Little x, little x, play the xxx song in the xx application". The phone can begin starting the recording function when user A utters "Little x, little x", obtain the voice segment at the moment startup succeeds (the opening part of the command), and splice it with the subsequently recorded remainder of the command to obtain the continuous speech "play the xxx song in the xx application". The phone is then controlled according to this continuous speech.
A voice control method, applied to an electronic device, includes:
when the electronic device is in a voice monitoring state, updating a stored voice segment with the monitored voice data, the voice segment being the voice data monitored within the most recent preset duration;
obtaining the current voice segment while the voice segment is being updated;
extracting a voiceprint feature and a keyword from the current voice segment;
starting a voice recording function according to the voiceprint feature and the keyword, and obtaining the voice segment at the moment startup succeeds as a target voice segment;
controlling the electronic device accordingly based on the recorded voice data and the target voice segment.
In some embodiments, starting the voice recording function according to the voiceprint feature and the keyword includes:
determining whether the voiceprint feature matches a preset feature and whether the keyword matches a preset keyword;
if so, starting the voice recording function;
if not, returning to the operation of obtaining the current voice segment.
In some embodiments, controlling the electronic device accordingly based on the recorded voice data and the target voice segment includes:
splicing the target voice segment and the recorded voice data to obtain a spliced voice;
determining a control instruction according to the spliced voice;
controlling the electronic device to execute the control instruction.
In some embodiments, determining the control instruction according to the spliced voice includes:
sending the spliced voice to a server, so that the server performs semantic parsing on the spliced voice and returns the corresponding target parsing word;
determining a target application and an application event according to the returned target parsing word;
generating the control instruction according to the target application and the application event;
and controlling the electronic device to execute the control instruction includes: using the target application to execute the application event.
In some embodiments, determining the target application according to the target parsing word includes:
determining whether the target parsing word matches the stored parsed word set;
if so, determining the target application according to the matched parsing word;
if not, determining whether the target parsing word matches the preset word set; when the match succeeds, determining the target application according to the matched preset word, adding the matched preset word to the parsed word set, and deleting it from the preset word set.
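The two-stage lookup above can be sketched in Python as follows. This is only an illustrative sketch: the function name `resolve_word` and the use of Python sets for the parsed word set and the preset word set are assumptions, not something the application specifies.

```python
def resolve_word(target_word, parsed_words, preset_words):
    """Return the matched word, or None if neither set contains it.

    Hypothetical helper: the parsed word set is checked first; on a hit in
    the preset word set, the word is moved into the parsed word set.
    """
    if target_word in parsed_words:
        return target_word
    if target_word in preset_words:
        # Migrate the word so both sets stay up to date.
        preset_words.remove(target_word)
        parsed_words.add(target_word)
        return target_word
    return None


# Toy data: the parsed word set starts empty, the preset word set holds defaults.
parsed = set()
preset = {"music", "maps", "xx player"}

first = resolve_word("music", parsed, preset)    # found in the preset word set
second = resolve_word("music", parsed, preset)   # now found in the parsed word set
missing = resolve_word("weather", parsed, preset)
print(first, second, missing, sorted(parsed), sorted(preset))
```

Checking the parsed word set first means words the user has already used are found without scanning the (typically larger) default set.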
In some embodiments, determining the target application according to the target parsing word includes:
when the target parsing word is an application name, taking the application corresponding to the application name as the target application;
when the target parsing word is an application type, determining all applications in the electronic device that belong to that application type, and determining the target application from the determined applications.
In some embodiments, determining the target application from the determined applications includes:
taking the most frequently used of the determined applications as the target application; or
generating a selection interface from the determined applications and presenting it to the user, and determining the target application according to the user's selection on the selection interface.
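The selection rules above can be sketched as follows. The `App` record, its `launch_count` field as a stand-in for "frequency of use", and the optional `ask_user` callback standing in for the selection interface are all hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass
class App:
    name: str
    app_type: str
    launch_count: int  # assumed proxy for how frequently the app is used


def choose_target_app(target_word, installed, ask_user=None):
    """Pick the target application for an application name or application type."""
    # Case 1: the target parsing word is an application name.
    for app in installed:
        if app.name == target_word:
            return app
    # Case 2: the target parsing word is an application type.
    candidates = [app for app in installed if app.app_type == target_word]
    if not candidates:
        return None
    if ask_user is not None:
        # Stand-in for showing a selection interface to the user.
        return ask_user(candidates)
    return max(candidates, key=lambda app: app.launch_count)


installed = [
    App("xx player", "music", 12),
    App("yy player", "music", 40),
    App("zz maps", "navigation", 7),
]

by_name = choose_target_app("xx player", installed)
by_type = choose_target_app("music", installed)
by_user = choose_target_app("music", installed, ask_user=lambda apps: apps[0])
print(by_name.name, by_type.name, by_user.name)
```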
FIG. 2 is a schematic flowchart of the voice control method provided by an embodiment of the present application. The method is applied to an electronic device, and the specific process may be as follows:
101. When the electronic device is in a voice monitoring state, update the stored voice segment with the monitored voice data, the voice segment being the voice data monitored within the most recent preset duration.
In this embodiment, voice can be monitored through a device such as a microphone. The preset duration can be set manually, for example to 3 seconds, and is usually about the same as the system preparation time needed to start the voice recording function. Specifically, the electronic device can monitor the user's voice in a low-power state and continuously save the voice data monitored within the most recent preset duration.
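A rolling buffer is one simple way to keep only the voice data from the most recent preset duration. The sketch below assumes audio arrives as chunks of samples and uses Python's `collections.deque` with `maxlen`; the class name `RollingVoiceSegment` and the toy sample rate are illustrative assumptions, not details from the application.

```python
from collections import deque


class RollingVoiceSegment:
    """Keeps only the most recent `preset_seconds` of monitored audio samples."""

    def __init__(self, preset_seconds, sample_rate):
        self.capacity = int(preset_seconds * sample_rate)
        # A deque with maxlen discards the oldest samples when it overflows.
        self._samples = deque(maxlen=self.capacity)

    def update(self, chunk):
        """Append newly monitored samples (step 101)."""
        self._samples.extend(chunk)

    def current(self):
        """Return a copy of the current voice segment (step 102)."""
        return list(self._samples)


# Toy demo: a 3-second window at 4 samples per second holds 12 samples.
segment = RollingVoiceSegment(preset_seconds=3, sample_rate=4)
segment.update(range(10))
segment.update(range(10, 20))
print(segment.current())  # the 12 most recent samples, 8 through 19
```

Because the buffer has a fixed capacity, memory use stays constant however long the device remains in the monitoring state.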
102. While the voice segment is being updated, obtain the current voice segment.
In this embodiment, voice monitoring and voice segment updating are performed in real time, and during this process the electronic device can also analyze the voice segment in real time.
103. Extract a voiceprint feature and a keyword from the current voice segment.
In this embodiment, the voiceprint feature is mainly a spectral feature, which may include information such as frequency and amplitude and usually reflects characteristics of the sound such as loudness, pitch, and timbre. Different people generally have different voiceprint features. Specifically, the voice segment can first be converted into spectral data by a Fourier transform, and the relevant information can then be extracted from the spectral data as the corresponding voiceprint feature. The keyword may include at least one character, which may be in English, Chinese, or another language.
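As a rough illustration of turning a voice frame into spectral data, the sketch below computes a naive discrete Fourier transform in pure Python and reports the strongest frequency and its magnitude. The function names are hypothetical, and a single dominant peak is far simpler than the voiceprint features a real system would use; it only shows where frequency and amplitude information comes from.

```python
import cmath
import math


def magnitude_spectrum(frame):
    """Naive discrete Fourier transform; returns magnitudes for bins 0..N/2."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        spectrum.append(abs(acc) / n)
    return spectrum


def dominant_component(frame, sample_rate):
    """Return (frequency in Hz, magnitude) of the strongest non-DC bin."""
    spectrum = magnitude_spectrum(frame)
    k = max(range(1, len(spectrum)), key=lambda i: spectrum[i])
    return k * sample_rate / len(frame), spectrum[k]


# Toy frame: a unit-amplitude 50 Hz sine sampled at 400 Hz for 64 samples,
# so 50 Hz falls exactly on bin 8 (bin width 400 / 64 = 6.25 Hz).
sample_rate = 400
frame = [math.sin(2 * math.pi * 50 * t / sample_rate) for t in range(64)]
frequency, magnitude = dominant_component(frame, sample_rate)
print(frequency, round(magnitude, 3))
```

The naive transform is quadratic in the frame length; a production implementation would use a fast Fourier transform instead.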
104. Start a voice recording function according to the voiceprint feature and the keyword, and obtain the voice segment at the moment the function starts successfully as a target voice segment.
For example, step 104 may specifically include:
determining whether the voiceprint feature matches a preset feature and whether the keyword matches a preset keyword;
if so, starting the voice recording function;
if not, returning to the operation of obtaining the current voice segment.
In this embodiment, the preset feature is mainly the voiceprint feature of a bound user and is usually set in advance. For example, the user may be asked to record a piece of speech in advance, and the voiceprint feature extracted from that speech is used as the preset feature and bound to the user. The preset keyword is mainly used to trigger the voice interaction function. It may be set by system default (that is, set by the manufacturer when the electronic device leaves the factory) or set by the user according to personal preference, for example by entering the corresponding setting window through the relevant entry of the settings interface.
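The two-condition check in step 104 might look like the following sketch. The application does not say how a "match" is computed, so cosine similarity with a 0.9 threshold for the voiceprint and a substring test for the keyword are assumptions chosen only for illustration, as are the function names.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def should_start_recording(voiceprint, keyword_text,
                           preset_voiceprint, preset_keyword,
                           threshold=0.9):
    """True only when BOTH the voiceprint and the keyword match (assumed criteria)."""
    voice_ok = cosine_similarity(voiceprint, preset_voiceprint) >= threshold
    keyword_ok = preset_keyword in keyword_text
    return voice_ok and keyword_ok
```

Requiring both conditions means that the wake word spoken by an unbound speaker, or unrelated speech from the bound user, leaves the device in the monitoring loop.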
It should be noted that, while starting the voice recording function, the electronic device needs to carry out a series of preparations, such as interrupting the running foreground application and setting the invocation parameters of the voice recording component. During this preparation the electronic device remains in the voice monitoring state and the voice segment continues to be updated. Once the voice recording function has started successfully, updating of the voice segment may stop.
105. Control the electronic device accordingly based on the recorded voice data and the target voice segment.
For example, step 105 may specifically include:
1-1. Splice the target voice segment and the recorded voice data to obtain a spliced voice.
In this embodiment, because the duration of the voice segment is about the same as the system preparation time, the voice segment saved at the moment the system is ready (that is, the moment the voice recording function starts successfully) can be exactly the speech the user continued with after saying the preset keyword. It can therefore be used directly as the beginning of the recording and spliced with the subsequently recorded voice data to form one continuous utterance.
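The timing argument above can be made concrete with a small simulation in which one word arrives per tick. Everything here is a simplification introduced for illustration: words stand in for audio, the wake phrase fills the whole rolling window, and `prep_ticks` (assumed to be at least 1) models the system preparation time.

```python
from collections import deque


def simulate_wake_and_splice(words, wake_phrase, prep_ticks):
    """Return the spliced command, or None if recording never became ready."""
    segment = deque(maxlen=len(wake_phrase))  # rolling voice segment
    ready_tick = None       # tick at which recording finishes starting up
    target_segment = None   # segment frozen when recording becomes ready
    recorded = []           # words captured by the recording function

    for tick, word in enumerate(words):
        if target_segment is not None:
            recorded.append(word)   # recording active; segment no longer updated
            continue
        segment.append(word)        # monitoring continues during preparation
        if ready_tick is None:
            if list(segment) == wake_phrase:
                ready_tick = tick + prep_ticks   # preparation begins
        elif tick == ready_tick:
            target_segment = list(segment)       # startup succeeded

    if target_segment is None:
        return None
    return target_segment + recorded             # splice: segment first, then recording
```

With a two-word wake phrase and a two-tick preparation time, the words spoken during preparation are exactly what the rolling segment holds when recording becomes ready, so the spliced result contains the whole command and none of the wake phrase.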
1-2. Determine a control instruction according to the spliced voice.
For example, step 1-2 may specifically include:
sending the spliced voice to a server, so that the server performs semantic parsing on the spliced voice and returns a corresponding target parsed word;
determining a target application and an application event according to the returned target parsed word;
generating the control instruction according to the target application and the application event.
In this embodiment, the server is mainly used for semantic analysis. During voice recording, the electronic device may transmit the spliced voice to the server in real time, and the server may parse it with a trained semantic analysis model. The semantic analysis model may be a deep learning model, which the server may train in advance on different collected voice samples.
Further, the step of "determining a target application according to the returned target parsed word" may specifically include:
determining whether the target parsed word matches a stored parsed word set;
if so, determining the target application according to the successfully matched parsed word;
if not, determining whether the target parsed word matches a preset word set; when the match succeeds, determining the target application according to the successfully matched preset word, adding that preset word to the parsed word set, and deleting it from the preset word set.
In this embodiment, the parsed word set and the preset word set mainly contain application-related words, such as application names or application types. The parsed word set mainly contains words that were parsed out when the electronic device requested semantic parsing from the server during a historical period. The preset word set is mainly set by system default; for example, each time an application is installed on the terminal, the application-related words of that application can be obtained. Specifically, when the target parsed word is an application name, the application corresponding to that name may be used directly as the target application. When the target parsed word is an application type, all applications on the electronic device that belong to that application type may first be determined; the most frequently used of them may then be taken as the target application, or a selection interface may be presented to the user based on these applications and the target application determined through the user's selection.
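The name-or-type resolution described above can be sketched as follows. The data layout (a dictionary from application name to application type, plus a dictionary of usage counts) and the function name are assumptions; only the most-frequently-used rule is shown, not the alternative of asking the user through a selection interface.

```python
def resolve_target_app(parsed_word, installed_apps, usage_counts):
    """Pick the target application for a parsed word.

    installed_apps: dict mapping application name -> application type (assumed layout)
    usage_counts:   dict mapping application name -> number of uses (assumed layout)
    """
    # Case 1: the parsed word is itself an application name.
    if parsed_word in installed_apps:
        return parsed_word
    # Case 2: the parsed word is an application type; collect every app of that type.
    candidates = [name for name, app_type in installed_apps.items()
                  if app_type == parsed_word]
    if not candidates:
        return None
    # Choose the most frequently used candidate.
    return max(candidates, key=lambda name: usage_counts.get(name, 0))
```

For instance, with three music player applications C1, C2, and C3 installed and C1 used most often, the parsed word "music player" resolves to C1, while the parsed word "C2" resolves directly to C2.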
It should be noted that, for the first parse, the preset word set may include all words set by system default and the parsed word set may be empty. Afterwards, each time the electronic device obtains a new parsed word, it may store the new parsed word in the parsed word set and delete it from the preset word set, so that both sets are continuously updated. Because the target parsed word is matched against previous parsing records first after each parse, the matching range can be narrowed according to the user's interaction habits and the matching speed improved.
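The two-tier lookup and the migration of words from the preset word set to the parsed word set can be sketched as below. Treating a "match" as exact string equality is an assumption, and the class name `WordSets` is hypothetical.

```python
class WordSets:
    """Parsed word set (history) consulted first, preset word set as fallback."""

    def __init__(self, preset_words):
        self.parsed = set()               # empty before the first parse
        self.preset = set(preset_words)   # system-default application-related words

    def match(self, target_word):
        """Return the matched word, or None if neither set contains it."""
        if target_word in self.parsed:
            return target_word
        if target_word in self.preset:
            # Migrate the word so that later lookups hit the history set first.
            self.preset.remove(target_word)
            self.parsed.add(target_word)
            return target_word
        return None
```

With hash sets both lookups are already cheap; the ordering would matter more if matching were a costlier comparison, such as fuzzy matching over a large vocabulary, which is outside this sketch.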
1-3. Control the electronic device to execute the control instruction.
For example, step 1-3 may specifically include:
executing the application event using the target application.
In this embodiment, if the target application is closed, it may be started first and then used to execute the corresponding application event.
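A minimal sketch of executing a control instruction, including the start-if-closed behavior, is shown below. The `App` and `ControlInstruction` classes are hypothetical stand-ins; a real device would go through its operating system's application launch and messaging facilities.

```python
from dataclasses import dataclass, field


@dataclass
class App:
    """Hypothetical stand-in for an installed application."""
    name: str
    running: bool = False
    log: list = field(default_factory=list)

    def launch(self):
        self.running = True
        self.log.append("launch")

    def handle(self, event):
        self.log.append(f"event:{event}")


@dataclass
class ControlInstruction:
    """A target application paired with the application event to execute."""
    target_app: App
    app_event: str


def execute(instruction):
    """Start the target application if it is closed, then run the event."""
    app = instruction.target_app
    if not app.running:
        app.launch()
    app.handle(instruction.app_event)
```

Calling `execute` twice on the same application launches it only once; the second call goes straight to the event.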
As can be seen from the above, the voice control method provided in this embodiment is applied to an electronic device. When the electronic device is in a voice monitoring state, the stored voice segment is updated with the monitored voice data, the voice segment being the voice data monitored within the most recent preset duration. During the updating of the voice segment, the current voice segment is obtained, and a voiceprint feature and a keyword are extracted from it. The voice recording function is then started according to the voiceprint feature and the keyword, and the voice segment at the moment of successful startup is obtained as the target voice segment. Finally, the electronic device is controlled accordingly based on the recorded voice data and the target voice segment. The user can thus wake the device and enter an interaction instruction in one continuous utterance, without pausing for the system preparation time. The method is simple and can effectively improve the efficiency and quality of voice interaction.
This embodiment is described from the perspective of a voice control apparatus, taking as a detailed example the case in which the voice control apparatus is integrated in an electronic device.
Referring to FIG. 3, a voice control method may proceed as follows:
201. When in a voice monitoring state, the electronic device updates the stored voice segment with the monitored voice data, the voice segment being the voice data monitored within the most recent preset duration.
For example, the preset duration may be manually set to 3 seconds. As long as the electronic device is powered on, it can monitor voice continuously, while keeping only the voice data from the most recent 3 seconds.
202. During the updating of the voice segment, the electronic device obtains the current voice segment and extracts a voiceprint feature and a keyword from it.
For example, referring to FIG. 4, suppose the user says "小x小x,播放xx应用中的xxx歌曲" ("Xiao x, Xiao x, play song xxx in the xx application") to the electronic device. Because the voice monitoring and voice segment updating operations run in real time, the first 3 seconds of speech, for example "小x小x", can be stored as the initial voice segment. The electronic device then extracts the voiceprint feature and the keyword from the stored voice segment: for example, it converts the segment into spectral data with a Fourier transform and extracts relevant information from the spectral data as the corresponding voiceprint feature, while also extracting the content of the voice segment to obtain the keyword.
203. The electronic device determines whether the voiceprint feature matches a preset feature and whether the keyword matches a preset keyword. If so, step 204 is performed; if not, the process returns to step 202.
For example, the preset feature may be a voiceprint feature entered by the user in advance, and the preset keyword may be a phrase set by system default, which may include at least two characters or words. To avoid the situation in which a voice segment of the preset duration cannot contain the complete preset keyword, the preset keyword should be fairly short, for example "小x" ("Xiao x").
204. The electronic device starts the voice recording function and obtains the voice segment at the moment of successful startup as the target voice segment.
For example, in FIG. 4, once analysis of the initial voice segment "小x小x" ("Xiao x, Xiao x") shows that the user's speech meets the conditions, the electronic device can be notified immediately to begin the relevant preparations, such as interrupting the running foreground application and setting the invocation parameters of the voice recording component. During this preparation the voice segment continues to be updated until the voice recording function has started successfully, at which point the voice segment may be exactly "播放xx应用" (the opening words of the command, "play ... xx application").
205. The electronic device splices the target voice segment and the recorded voice data to obtain a spliced voice, and then sends the spliced voice to a server, so that the server performs semantic parsing on it and returns a corresponding target parsed word.
For example, after the voice recording function has started successfully, whatever the user says next is saved by recording and does not need to be saved again through voice segment updating. Updating of the voice segment can therefore stop, and the voice segment at that moment, "播放xx应用" ("play ... xx application"), is used as the beginning of the recording and spliced with the subsequently recorded voice data into one continuous utterance. In the first 2 seconds of recording, for instance, the spliced voice may be "播放xx应用中的" ("play ... in the xx application"). Meanwhile, the spliced voice is transmitted to the server in real time for semantic parsing.
206. The electronic device determines whether the target parsed word matches the stored parsed word set. If so, it determines the target application according to the successfully matched parsed word and determines the application event according to the target parsed word; if not, step 207 is performed.
207. The electronic device determines whether the target parsed word matches the preset word set. If so, it determines the target application according to the successfully matched preset word, determines the application event according to the target parsed word, adds the successfully matched preset word to the parsed word set, and deletes it from the preset word set; if not, it prompts the user to enter the voice input again.
For example, referring to FIG. 5, the electronic device may first match the target parsed word returned by the server against previous parsing records, and match against the preset word set only when that fails. When the target parsed word is an application name, the application corresponding to that name may be used directly as the target application. For example, the target parsed word parsed from the utterance "play song xxx in the xx application" may be "xx application", and the application event may be: play song xxx. When the target parsed word is an application type, all applications on the electronic device belonging to that type may first be determined, and the most frequently used of them taken as the target application. For example, the target parsed word parsed from the utterance "play song xxx" may be "music player application", and the application event may be: play song xxx. The electronic device may then find applications C1, C2, and C3 that are music player applications; if C1 is used most frequently, the target application is C1.
208. The electronic device executes the application event using the target application.
For example, for the utterance "play song xxx in the xx application", if the xx application is not open at that moment, it may be opened first and then used to find and play song xxx.
As can be seen from the above, the voice control method provided in this embodiment is applied to an electronic device. When in a voice monitoring state, the electronic device may update the stored voice segment with the monitored voice data, the voice segment being the voice data monitored within the most recent preset duration. During the updating of the voice segment, it obtains the current voice segment and extracts a voiceprint feature and a keyword from it, and then determines whether the voiceprint feature matches a preset feature and whether the keyword matches a preset keyword. If so, it starts the voice recording function and obtains the voice segment at the moment of successful startup as the target voice segment, then splices the target voice segment and the recorded voice data to obtain a spliced voice. The user can thus wake the device and enter an interaction instruction in one continuous utterance, without pausing for the system preparation time, and the method is simple. The spliced voice is then sent to a server, so that the server performs semantic parsing on it and returns a corresponding target parsed word. The electronic device determines whether the target parsed word matches the stored parsed word set. If so, it determines the target application according to the successfully matched parsed word and the application event according to the target parsed word. If not, it determines whether the target parsed word matches the preset word set; if so, it determines the target application according to the successfully matched preset word and the application event according to the target parsed word, adds the successfully matched preset word to the parsed word set, and deletes it from the preset word set. Finally, the application event is executed using the target application. Parsed-word matching thus becomes more efficient by drawing on the user's past voice interaction habits, which effectively improves voice interaction efficiency and the user experience.
Based on the method described in the foregoing embodiments, this embodiment provides further description from the perspective of a voice control apparatus. The voice control apparatus may be implemented as an independent entity or integrated in an electronic device such as a terminal, which may include a mobile phone, a tablet computer, and the like.
A voice control apparatus, applied to an electronic device, includes:
an update module, configured to update the stored voice segment with the monitored voice data when the electronic device is in a voice monitoring state, the voice segment being the voice data monitored within the most recent preset duration;
an acquisition module, configured to obtain the current voice segment during the updating of the voice segment;
an extraction module, configured to extract a voiceprint feature and a keyword from the current voice segment;
a startup module, configured to start a voice recording function according to the voiceprint feature and the keyword, and to obtain the voice segment at the moment of successful startup as a target voice segment;
a control module, configured to control the electronic device accordingly based on the recorded voice data and the target voice segment.
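One way to picture how the five modules cooperate is as a small pipeline of callables. This is only a structural sketch under several assumptions: each module is reduced to a single function, audio is a list of samples, the startup module returns the target voice segment (or `None` when the conditions are not met), and the control module is assumed to gather the recorded voice data itself.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

Samples = List[float]


@dataclass
class VoiceControlApparatus:
    update: Callable[[Samples], None]                   # update module
    acquire: Callable[[], Samples]                      # acquisition module
    extract: Callable[[Samples], Tuple[Samples, str]]   # extraction module
    start: Callable[[Samples, str], Optional[Samples]]  # startup module
    control: Callable[[Samples], None]                  # control module

    def on_monitored_audio(self, monitored: Samples) -> bool:
        """Run one pass of the pipeline; return True if control was triggered."""
        self.update(monitored)
        segment = self.acquire()
        voiceprint, keyword = self.extract(segment)
        target_segment = self.start(voiceprint, keyword)
        if target_segment is None:
            return False   # conditions not met; keep monitoring
        self.control(target_segment)
        return True
```

Keeping the modules as injected callables mirrors the statement that the apparatus may be a standalone entity or integrated in a terminal, since either arrangement can supply its own implementations.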
In some embodiments, the startup module is specifically configured to:
determine whether the voiceprint feature matches a preset feature and whether the keyword matches a preset keyword;
if so, start the voice recording function;
if not, trigger the acquisition module to perform the operation of obtaining the current voice segment.
In some embodiments, the control module specifically includes:
a splicing unit, configured to splice the target voice segment and the recorded voice data to obtain a spliced voice;
a determining unit, configured to determine a control instruction according to the spliced voice;
an execution unit, configured to control the electronic device to execute the control instruction.
In some embodiments, the determining unit is specifically configured to:
send the spliced voice to a server, so that the server performs semantic parsing on the spliced voice and returns a corresponding target parsed word;
determine a target application and an application event according to the returned target parsed word;
generate the control instruction according to the target application and the application event;
and the execution unit is configured to execute the application event using the target application.
In some embodiments, the determining unit is specifically configured to:
determine whether the target parsed word matches a stored parsed word set;
if so, determine the target application according to the successfully matched parsed word;
if not, determine whether the target parsed word matches a preset word set; when the match succeeds, determine the target application according to the successfully matched preset word, add that preset word to the parsed word set, and delete it from the preset word set.
In some embodiments, the determining unit is specifically configured to:
when the target parsed word is an application name, use the application corresponding to the application name as the target application;
when the target parsed word is an application type, determine all applications on the electronic device that belong to that application type, and determine the target application from the determined applications.
In some embodiments, the determining unit is specifically configured to:
use the most frequently used of the determined applications as the target application, or
generate a selection interface according to the determined applications, present the selection interface to the user, and determine the target application according to the user's selection on the selection interface.
Referring to FIG. 6, which shows in detail the voice control apparatus provided in an embodiment of this application and applied to an electronic device, the voice control apparatus may include an update module 10, an acquisition module 20, an extraction module 30, a startup module 40, and a control module 50, where:
(1) Update module 10
The update module 10 is configured to update the stored voice segment with the monitored voice data when the electronic device is in a voice monitoring state, the voice segment being the voice data monitored within the most recent preset duration.
In this embodiment, voice may be monitored through a device such as a microphone. The preset duration may be set manually, for example to 3 seconds, and is usually about the same as the system preparation time needed to start the voice recording function. Specifically, the electronic device may monitor the user's voice in a low-power state, and the update module 10 saves, in real time, the voice data monitored within the most recent preset duration.
(2) Acquisition module 20
The acquisition module 20 is configured to obtain the current voice segment during the updating of the voice segment.
In this embodiment, voice monitoring and voice segment updating are performed in real time, and during this process the electronic device may also analyze the voice segment in real time.
(3) Extraction module 30
The extraction module 30 is configured to extract a voiceprint feature and a keyword from the current voice segment.
In this embodiment, the voiceprint feature is mainly a spectral feature. It may include information such as frequency and amplitude, and usually reflects characteristics such as the loudness, pitch, and timbre of the sound. Different people generally have different voiceprint features. Specifically, the voice segment may first be converted into spectral data by a Fourier transform, and relevant information may then be extracted from the spectral data as the corresponding voiceprint feature. The keyword may include at least one character or word, which may be in English, Chinese, or another language.
(4) Startup module 40
The startup module 40 is configured to start a voice recording function according to the voiceprint feature and the keyword, and to obtain the voice segment at the moment of successful startup as a target voice segment.
For example, the startup module 40 may be specifically configured to:
determine whether the voiceprint feature matches a preset feature and whether the keyword matches a preset keyword;
if so, start the voice recording function;
if not, trigger the acquisition module to perform the operation of obtaining the current voice segment.
In this embodiment, the preset feature is mainly the voiceprint feature of a bound user and is usually set in advance. For example, the user may be asked to record a piece of speech in advance, and the voiceprint feature extracted from that speech is used as the preset feature and bound to the user. The preset keyword is mainly used to trigger the voice interaction function. It may be set by system default (that is, set by the manufacturer when the electronic device leaves the factory) or set by the user according to personal preference, for example by entering the corresponding setting window through the relevant entry of the settings interface.
It should be noted that, while starting the voice recording function, the electronic device needs to carry out a series of preparations, such as interrupting the running foreground application and setting the invocation parameters of the voice recording component. During this preparation the electronic device remains in the voice monitoring state and the voice segment continues to be updated. Once the voice recording function has started successfully, updating of the voice segment may stop.
(5) Control module 50
The control module 50 is configured to control the electronic device accordingly based on the recorded voice data and the target voice segment.
For example, referring to FIG. 7, the control module 50 may specifically include:
a splicing unit 51, configured to splice the target voice segment and the recorded voice data to obtain a spliced voice.
In this embodiment, because the duration of the voice segment is about the same as the system preparation time, the voice segment saved at the moment the system is ready (that is, the moment the voice recording function starts successfully) can be exactly the speech the user continued with after saying the preset keyword. It can therefore be used directly as the beginning of the recording and spliced with the subsequently recorded voice data to form one continuous utterance.
a determining unit 52, configured to determine a control instruction according to the spliced voice.
For example, the determining unit 52 may be specifically configured to:
determine whether the target parsed word matches a stored parsed word set;
if so, determine the target application according to the successfully matched parsed word;
if not, determine whether the target parsed word matches a preset word set; when the match succeeds, determine the target application according to the successfully matched preset word, add that preset word to the parsed word set, and delete it from the preset word set.
In this embodiment, the server is mainly used for semantic analysis. During voice recording, the electronic device may transmit the spliced voice to the server in real time, and the server may parse it with a trained semantic analysis model. The semantic analysis model may be a deep learning model, which the server may train in advance on collected voice samples.
Further, the determining unit 52 may be specifically configured to:
send the spliced voice to the server, so that the server performs semantic parsing on the spliced voice and returns the corresponding target parsed word;
determine a target application and an application event according to the returned target parsed word;
generate a control instruction according to the target application and the application event.
In this embodiment, the parsed word set and the preset word set mainly contain application-related words, such as application names or application types. The parsed word set mainly holds words returned when the electronic device requested semantic parsing from the server during a historical period. The preset word set is mainly set by the system by default; for example, each time an application is installed on the terminal, the application-related words of that application can be obtained. Specifically, when the target parsed word is an application name, the application corresponding to that name may be used directly as the target application. When the target parsed word is an application type, all applications on the electronic device that belong to that application type may be determined first; then either the most frequently used of these applications is taken as the target application, or a selection interface listing these applications is presented to the user and the target application is determined by the user's selection.
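The name-or-type resolution described above can be sketched as follows. This is only an illustration under stated assumptions: the `App` record, its `launch_count` field (standing in for "usage frequency"), and `resolve_target_app` are hypothetical names, and the selection-interface alternative is not shown.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class App:
    name: str
    app_type: str
    launch_count: int  # hypothetical stand-in for usage frequency


def resolve_target_app(word: str, installed: List[App]) -> Optional[App]:
    """Resolve a target parsed word to an installed application."""
    # Case 1: the word is an application name.
    for app in installed:
        if app.name == word:
            return app
    # Case 2: the word is an application type; pick the most used app of that type.
    same_type = [app for app in installed if app.app_type == word]
    if not same_type:
        return None
    return max(same_type, key=lambda app: app.launch_count)
```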
It should be noted that, for the first parsing, the preset word set may include all words set by the system by default and the parsed word set may be empty. Each time the electronic device subsequently obtains a new parsed word, that word may be stored in the parsed word set and deleted from the preset word set, so that both sets are continuously updated. Matching the target parsed word against previous parsing records first narrows the matching range according to the user's interaction habits and speeds up matching.
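The two-set lookup can be sketched with ordinary Python sets. This is a minimal illustration, assuming words are compared by exact string equality; the function name `match_word` is hypothetical.

```python
def match_word(target: str, parsed_words: set, preset_words: set) -> bool:
    """Return True if the target parsed word matches either set.

    The parsed word set (history) is checked first. On a hit in the preset
    word set, the word is moved into the parsed word set so that later
    lookups find it in the smaller, user-specific set.
    """
    if target in parsed_words:
        return True
    if target in preset_words:
        preset_words.remove(target)
        parsed_words.add(target)
        return True
    return False
```

After the first successful match of a word, it lives only in the parsed word set, so the two sets stay disjoint.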
The execution unit 53 is configured to control the electronic device to execute the control instruction.
Further, the execution unit 53 may be specifically configured to:
execute the application event using the target application.
In this embodiment, if the target application is closed, the execution unit 53 may first start the target application and then use it to execute the corresponding application event.
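The start-then-execute behavior can be sketched as below. The `TargetApp` class and its `start` and `handle` methods are hypothetical placeholders for whatever mechanism the device uses to launch applications and dispatch events.

```python
class TargetApp:
    """Hypothetical stand-in for an application on the device."""

    def __init__(self, name: str):
        self.name = name
        self.running = False
        self.log = []

    def start(self) -> None:
        self.running = True
        self.log.append("start")

    def handle(self, event: str) -> None:
        if not self.running:
            raise RuntimeError("application is not running")
        self.log.append(event)


def execute_event(app: TargetApp, event: str) -> None:
    """Start the target application if it is closed, then run the event."""
    if not app.running:
        app.start()
    app.handle(event)
```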
In specific implementations, each of the above units may be implemented as an independent entity, or the units may be combined arbitrarily and implemented as one or several entities. For the specific implementation of each unit, refer to the foregoing method embodiments; details are not repeated here.
As can be seen from the above, the voice control apparatus provided in this embodiment is applied to an electronic device. When the electronic device is in a voice monitoring state, the update module 10 updates the stored voice segment with the monitored voice data, the voice segment being the voice data monitored within the most recent preset duration. While the voice segment is being updated, the acquisition module 20 acquires the current voice segment, and the extraction module 30 extracts a voiceprint feature and a keyword from it. The startup module 40 then starts the voice recording function according to the voiceprint feature and the keyword, and takes the voice segment at the moment of successful startup as the target voice segment. Finally, the control module 50 controls the electronic device according to the recorded voice data and the target voice segment. The user can therefore wake the device and issue an interactive instruction in one continuous utterance, without pausing for the system preparation time. The method is simple and effectively improves the efficiency and quality of voice interaction.
In addition, an embodiment of the present application further provides an electronic device, which may be a smartphone, a tablet computer, or a similar device. As shown in FIG. 8, the electronic device 400 includes a processor 401 and a memory 402. The processor 401 is electrically connected to the memory 402.
The processor 401 is the control center of the electronic device 400. It connects the parts of the electronic device through various interfaces and lines, and performs the functions of the electronic device and processes data by running or loading applications stored in the memory 402 and calling data stored in the memory 402, thereby monitoring the electronic device as a whole.
In this embodiment, the processor 401 in the electronic device 400 loads instructions corresponding to the processes of one or more applications into the memory 402 according to the following steps, and runs the applications stored in the memory 402 to implement various functions:
when the electronic device is in a voice monitoring state, updating the stored voice segment with the monitored voice data, the voice segment being the voice data monitored within the most recent preset duration;
acquiring the current voice segment while the voice segment is being updated;
extracting a voiceprint feature and a keyword from the current voice segment;
starting the voice recording function according to the voiceprint feature and the keyword, and acquiring the voice segment at the moment of successful startup as the target voice segment;
controlling the electronic device according to the recorded voice data and the target voice segment.
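The continuously updated voice segment in the first step behaves like a fixed-length rolling buffer. The sketch below is one way to model it, assuming audio arrives as equal-length frames; the class name `RollingVoiceSegment` and the frame-count capacity are assumptions for illustration, not details from the application.

```python
from collections import deque


class RollingVoiceSegment:
    """Keeps only the most recently monitored audio frames."""

    def __init__(self, max_frames: int):
        # A deque with maxlen silently drops the oldest frame when full.
        self._frames = deque(maxlen=max_frames)

    def update(self, frame: bytes) -> None:
        """Add a newly monitored frame, discarding the oldest if needed."""
        self._frames.append(frame)

    def snapshot(self) -> bytes:
        """Return the current voice segment as one contiguous buffer."""
        return b"".join(self._frames)
```

Calling `snapshot()` at the moment the recording function reports a successful start would yield the target voice segment in this model.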
In some embodiments, starting the voice recording function according to the voiceprint feature and the keyword includes:
determining whether the voiceprint feature matches a preset feature and whether the keyword matches a preset keyword;
if so, starting the voice recording function;
if not, returning to the operation of acquiring the current voice segment.
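The two-condition check above can be sketched as follows. The application does not say how voiceprints are compared; this sketch assumes, purely for illustration, that a voiceprint is a numeric feature vector compared by cosine similarity against a hypothetical threshold of 0.8, and that keywords are compared as exact strings.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def should_start_recording(voiceprint: Sequence[float], keyword: str,
                           preset_voiceprint: Sequence[float], preset_keyword: str,
                           threshold: float = 0.8) -> bool:
    """Both the voiceprint and the keyword must match to start recording."""
    voice_ok = cosine_similarity(voiceprint, preset_voiceprint) >= threshold
    keyword_ok = keyword == preset_keyword
    return voice_ok and keyword_ok
```

When the function returns False, the caller would go back to acquiring the current voice segment, as in the last step above.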
In some embodiments, controlling the electronic device according to the recorded voice data and the target voice segment includes:
splicing the target voice segment with the recorded voice data to obtain a spliced voice;
determining a control instruction according to the spliced voice;
controlling the electronic device to execute the control instruction.
In some embodiments, determining the control instruction according to the spliced voice includes:
sending the spliced voice to a server, so that the server performs semantic parsing on the spliced voice and returns the corresponding target parsed word;
determining a target application and an application event according to the returned target parsed word;
generating a control instruction according to the target application and the application event;
controlling the electronic device to execute the control instruction includes: executing the application event using the target application.
In some embodiments, determining the target application according to the target parsed word includes:
determining whether the target parsed word matches the stored parsed word set;
if so, determining the target application according to the matched parsed word;
if not, determining whether the target parsed word matches the preset word set; when the match succeeds, determining the target application according to the matched preset word, adding the matched preset word to the parsed word set, and deleting it from the preset word set.
In some embodiments, determining the target application according to the target parsed word includes:
when the target parsed word is an application name, using the application corresponding to the application name as the target application;
when the target parsed word is an application type, determining all applications on the electronic device that belong to that application type, and determining the target application from the determined applications.
In some embodiments, determining the target application from the determined applications includes:
using the most frequently used of the determined applications as the target application; or
generating a selection interface from the determined applications and presenting it to the user, and determining the target application according to the user's selection operation on the selection interface.
Refer to FIG. 9, which is a schematic structural diagram of an electronic device according to an embodiment of the present application. The electronic device 500 may include a radio frequency (RF) circuit 501, a memory 502 including one or more computer-readable storage media, an input unit 503, a display unit 504, a sensor 505, an audio circuit 506, a wireless fidelity (WiFi) module 507, a processor 508 including one or more processing cores, a power supply 509, and other components. Those skilled in the art will understand that the structure shown in FIG. 9 does not limit the electronic device, which may include more or fewer components than shown, combine certain components, or arrange the components differently.
The RF circuit 501 may be used to send and receive information, or to receive and send signals during a call. In particular, it receives downlink information from a base station and passes it to one or more processors 508 for processing, and it sends uplink data to the base station. Generally, the RF circuit 501 includes, but is not limited to, an antenna, at least one amplifier, a tuner, one or more oscillators, a subscriber identity module (SIM) card, a transceiver, a coupler, a low noise amplifier (LNA), a duplexer, and the like. In addition, the RF circuit 501 can communicate with networks and other devices through wireless communication. The wireless communication may use any communication standard or protocol, including but not limited to Global System for Mobile Communications (GSM), General Packet Radio Service (GPRS), Code Division Multiple Access (CDMA), Wideband Code Division Multiple Access (WCDMA), Long Term Evolution (LTE), email, and Short Message Service (SMS).
The memory 502 may be used to store applications and data. The applications stored in the memory 502 contain executable code and may constitute various functional modules. The processor 508 executes various functional applications and performs data processing by running the applications stored in the memory 502. The memory 502 may mainly include a program storage area and a data storage area. The program storage area may store an operating system and the applications required by at least one function (such as a sound playback function or an image playback function). The data storage area may store data created through use of the electronic device (such as audio data or a phone book). In addition, the memory 502 may include high-speed random access memory and may also include non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device. Accordingly, the memory 502 may further include a memory controller to provide the processor 508 and the input unit 503 with access to the memory 502.
The input unit 503 may be used to receive input digits, character information, or user characteristic information (such as a fingerprint), and to generate keyboard, mouse, joystick, optical, or trackball signal inputs related to user settings and function control. In a specific embodiment, the input unit 503 may include a touch-sensitive surface and other input devices. The touch-sensitive surface, also called a touch display screen or touchpad, collects the user's touch operations on or near it (such as operations performed on or near the touch-sensitive surface with a finger, a stylus, or any other suitable object or accessory) and drives the corresponding connection device according to a preset program. Optionally, the touch-sensitive surface may include two parts: a touch detection device and a touch controller. The touch detection device detects the position of the user's touch and the signal produced by the touch operation, and passes the signal to the touch controller. The touch controller receives the touch information from the touch detection device, converts it into contact coordinates, and sends them to the processor 508; it can also receive and execute commands sent by the processor 508.
The display unit 504 may be used to display information input by the user or provided to the user, as well as the various graphical user interfaces of the electronic device, which may be composed of graphics, text, icons, video, and any combination thereof. The display unit 504 may include a display panel. Optionally, the display panel may be configured as a liquid crystal display (LCD), an organic light-emitting diode (OLED) display, or the like. Further, the touch-sensitive surface may cover the display panel. When the touch-sensitive surface detects a touch operation on or near it, it passes the operation to the processor 508 to determine the type of touch event, and the processor 508 then provides the corresponding visual output on the display panel according to that type. Although in FIG. 9 the touch-sensitive surface and the display panel are shown as two separate components that implement the input and output functions, in some embodiments they may be integrated to implement the input and output functions.
The electronic device may further include at least one sensor 505, such as a light sensor, a motion sensor, or another sensor. Specifically, the light sensor may include an ambient light sensor and a proximity sensor. The ambient light sensor can adjust the brightness of the display panel according to the ambient light level, and the proximity sensor can turn off the display panel and/or the backlight when the electronic device is moved to the ear. As one kind of motion sensor, a gravity acceleration sensor can detect the magnitude of acceleration in each direction (generally three axes) and, when stationary, the magnitude and direction of gravity. It can be used in applications that recognize the attitude of the phone (such as switching between landscape and portrait, related games, and magnetometer attitude calibration) and in vibration-recognition functions (such as a pedometer or tap detection). Other sensors that the electronic device may also be equipped with, such as a gyroscope, barometer, hygrometer, thermometer, or infrared sensor, are not described further here.
The audio circuit 506 can provide an audio interface between the user and the electronic device through a speaker and a microphone. The audio circuit 506 can convert received audio data into an electrical signal and transmit it to the speaker, which converts it into a sound signal for output. Conversely, the microphone converts a collected sound signal into an electrical signal, which the audio circuit 506 receives and converts into audio data. The audio data is then output to the processor 508 for processing and sent via the RF circuit 501 to, for example, another electronic device, or output to the memory 502 for further processing. The audio circuit 506 may also include an earphone jack to provide communication between a peripheral headset and the electronic device.
Wireless fidelity (WiFi) is a short-range wireless transmission technology. Through the WiFi module 507, the electronic device can help the user send and receive email, browse web pages, and access streaming media, providing the user with wireless broadband Internet access. Although FIG. 9 shows the WiFi module 507, it is not an essential part of the electronic device and may be omitted as needed without changing the essence of the invention.
The processor 508 is the control center of the electronic device. It connects the parts of the electronic device through various interfaces and lines, and performs the functions of the electronic device and processes data by running or executing applications stored in the memory 502 and calling data stored in the memory 502, thereby monitoring the electronic device as a whole. Optionally, the processor 508 may include one or more processing cores. Preferably, the processor 508 may integrate an application processor and a modem processor, where the application processor mainly handles the operating system, user interface, and applications, and the modem processor mainly handles wireless communication. It can be understood that the modem processor need not be integrated into the processor 508.
The electronic device further includes a power supply 509 (such as a battery) that powers the components. Preferably, the power supply may be logically connected to the processor 508 through a power management system, so that charging, discharging, and power consumption are managed through the power management system. The power supply 509 may also include any of the following components: one or more DC or AC power sources, a recharging system, a power failure detection circuit, a power converter or inverter, and a power status indicator.
Although not shown in FIG. 9, the electronic device may further include a camera, a Bluetooth module, and the like, which are not described further here.
In specific implementations, each of the above modules may be implemented as an independent entity, or the modules may be combined arbitrarily and implemented as one or several entities. For the specific implementation of each module, refer to the foregoing method embodiments; details are not repeated here.
Those of ordinary skill in the art will understand that all or some of the steps in the methods of the foregoing embodiments may be completed by instructions, or by instructions controlling the relevant hardware. The instructions may be stored in a computer-readable storage medium and loaded and executed by a processor. To this end, an embodiment of the present invention provides a storage medium storing a plurality of instructions that can be loaded by a processor to perform the steps of any voice control method provided by the embodiments of the present invention.
The storage medium may include a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disc, or the like.
Because the instructions stored in the storage medium can perform the steps of any voice control method provided by the embodiments of the present invention, they can achieve the beneficial effects of any such method. See the foregoing embodiments for details, which are not repeated here.
For the specific implementation of each of the above operations, refer to the foregoing embodiments; details are not repeated here.
In summary, although the present application has been disclosed above by way of preferred embodiments, these embodiments are not intended to limit the present application. Those of ordinary skill in the art may make various changes and refinements without departing from the spirit and scope of the present application, so the scope of protection of the present application is defined by the claims.

Claims (20)

  1. A voice control method, applied to an electronic device, comprising:
    when the electronic device is in a voice monitoring state, updating a stored voice segment with monitored voice data, the voice segment being the voice data monitored within the most recent preset duration;
    acquiring the current voice segment while the voice segment is being updated;
    extracting a voiceprint feature and a keyword from the current voice segment;
    starting a voice recording function according to the voiceprint feature and the keyword, and acquiring the voice segment at the moment of successful startup as a target voice segment;
    controlling the electronic device according to the recorded voice data and the target voice segment.
  2. The voice control method according to claim 1, wherein starting the voice recording function according to the voiceprint feature and the keyword comprises:
    determining whether the voiceprint feature matches a preset feature and whether the keyword matches a preset keyword;
    if so, starting the voice recording function;
    if not, returning to the operation of acquiring the current voice segment.
  3. The voice control method according to claim 1, wherein controlling the electronic device according to the recorded voice data and the target voice segment comprises:
    splicing the target voice segment with the recorded voice data to obtain a spliced voice;
    determining a control instruction according to the spliced voice;
    controlling the electronic device to execute the control instruction.
  4. The voice control method according to claim 3, wherein determining the control instruction according to the spliced voice comprises:
    sending the spliced voice to a server, so that the server performs semantic parsing on the spliced voice and returns a corresponding target parsed word;
    determining a target application and an application event according to the returned target parsed word;
    generating the control instruction according to the target application and the application event;
    and wherein controlling the electronic device to execute the control instruction comprises: executing the application event using the target application.
  5. The voice control method according to claim 4, wherein determining the target application according to the target parsed word comprises:
    determining whether the target parsed word matches a stored parsed word set;
    if so, determining the target application according to the matched parsed word;
    if not, determining whether the target parsed word matches a preset word set; when the match succeeds, determining the target application according to the matched preset word, adding the matched preset word to the parsed word set, and deleting the matched preset word from the preset word set.
  6. The voice control method according to claim 4, wherein determining the target application according to the target parsed word comprises:
    when the target parsed word is an application name, using the application corresponding to the application name as the target application;
    when the target parsed word is an application type, determining all applications on the electronic device that belong to the application type, and determining the target application from the determined applications.
  7. The voice control method according to claim 6, wherein determining the target application from the determined applications comprises:
    using the most frequently used of the determined applications as the target application; or
    generating a selection interface from the determined applications and providing the selection interface to a user, and determining the target application according to the user's selection operation on the selection interface.
  8. 一种语音控制装置,应用于电子设备,其包括:A voice control device applied to an electronic device includes:
    更新模块,用于当所述电子设备处于语音监测状态时,利用监测的语音数据对已存储的 语音段进行更新,所述语音段为最近预设时长内监测到的一段语音数据;An update module, configured to update the stored voice segment by using the monitored voice data when the electronic device is in a voice monitoring state, where the voice segment is a piece of voice data that has been monitored within a recently preset duration;
    获取模块,用于在所述语音段的更新过程中,获取当前语音段;An obtaining module, configured to obtain a current voice segment during an update process of the voice segment;
    提取模块,用于从当前语音段中提取出声纹特征、以及关键词;An extraction module for extracting voiceprint features and keywords from the current voice segment;
    启动模块,用于根据所述声纹特征和关键词启动语音录制功能,并获取启动成功时刻的语音段作为目标语音段;A startup module, configured to start a voice recording function according to the voiceprint characteristics and keywords, and obtain a voice segment at a successful startup time as a target voice segment;
    控制模块,用于根据录制的语音数据和目标语音段对所述电子设备进行相应控制。A control module, configured to control the electronic device according to the recorded voice data and the target voice segment.
  9. The voice control apparatus according to claim 8, wherein the startup module is specifically configured to:
    determine whether the voiceprint feature matches a preset feature and whether the keyword matches a preset keyword;
    if so, start the voice recording function; and
    if not, trigger the acquisition module to perform the operation of acquiring the current voice segment.
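The double check in claim 9 (voiceprint and keyword must both match) can be sketched as follows. The claims do not say how a voiceprint is compared; cosine similarity against a threshold of 0.8 and exact keyword equality are assumptions made purely for illustration, as are all function and parameter names.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length feature vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def should_start_recording(
    voiceprint: Sequence[float],
    keyword: str,
    preset_voiceprint: Sequence[float],
    preset_keyword: str,
    threshold: float = 0.8,  # assumed value, not taken from the patent
) -> bool:
    """Return True only when both the voiceprint and the keyword match."""
    voiceprint_ok = cosine_similarity(voiceprint, preset_voiceprint) >= threshold
    keyword_ok = keyword == preset_keyword
    return voiceprint_ok and keyword_ok
```

When the function returns False, the caller would go back to acquiring the current voice segment, as the claim describes.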
  10. The voice control apparatus according to claim 8, wherein the control module specifically comprises:
    a splicing unit, configured to splice the target voice segment and the recorded voice data to obtain a spliced voice;
    a determining unit, configured to determine a control instruction according to the spliced voice; and
    an execution unit, configured to control the electronic device to execute the control instruction.
  11. The voice control apparatus according to claim 10, wherein the determining unit is specifically configured to:
    send the spliced voice to a server, so that the server performs semantic parsing on the spliced voice and returns a corresponding target parsing word;
    determine a target application and an application event according to the returned target parsing word; and
    generate the control instruction according to the target application and the application event;
    and the execution unit is configured to execute the application event by using the target application.
  12. The voice control apparatus according to claim 11, wherein the determining unit is specifically configured to:
    determine whether the target parsing word matches a stored parsing word set;
    if so, determine the target application according to the successfully matched parsing word; and
    if not, determine whether the target parsing word matches a preset word set; and when the match succeeds, determine the target application according to the successfully matched preset word, add the successfully matched preset word to the parsing word set, and delete the successfully matched preset word from the preset word set.
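The two-tier lookup in claim 12 can be sketched as below: the stored parsing word set is checked first, and a hit in the preset word set moves that word into the parsing word set. Modelling each set as a dictionary from word to application name, and treating "matching" as exact key lookup, are simplifying assumptions; `resolve_target_app` is a hypothetical name.

```python
from typing import Dict, Optional


def resolve_target_app(
    target_word: str,
    parsing_words: Dict[str, str],
    preset_words: Dict[str, str],
) -> Optional[str]:
    """Return the application for target_word, or None if neither set matches."""
    # First tier: the stored parsing word set.
    if target_word in parsing_words:
        return parsing_words[target_word]
    # Second tier: the preset word set. On a hit, the word is removed from
    # the preset set and added to the parsing word set.
    if target_word in preset_words:
        app = preset_words.pop(target_word)
        parsing_words[target_word] = app
        return app
    return None
```

After a second-tier hit, later lookups of the same word are answered by the first tier.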
  13. The voice control apparatus according to claim 11, wherein the determining unit is specifically configured to:
    when the target parsing word is an application name, take the application corresponding to the application name as the target application; and
    when the target parsing word is an application type, determine all applications in the electronic device that belong to that application type, and determine the target application according to the determined applications.
  14. The voice control apparatus according to claim 13, wherein the determining unit is specifically configured to:
    take the most frequently used application among the determined applications as the target application; or
    generate a selection interface according to the determined applications and provide the selection interface to a user; and determine the target application according to a selection operation performed by the user on the selection interface.
  15. A computer-readable storage medium, wherein a plurality of instructions are stored in the storage medium, and the instructions are adapted to be loaded by a processor to perform the voice control method according to claim 1.
  16. An electronic device, comprising a processor and a memory, wherein the processor is electrically connected to the memory, the memory is configured to store instructions and data, and the processor is configured to perform the steps of the voice control method according to claim 1.
  17. The electronic device according to claim 16, wherein starting the voice recording function according to the voiceprint feature and the keyword comprises:
    determining whether the voiceprint feature matches a preset feature and whether the keyword matches a preset keyword;
    if so, starting the voice recording function; and
    if not, returning to the operation of acquiring the current voice segment.
  18. The electronic device according to claim 16, wherein controlling the electronic device accordingly based on the recorded voice data and the target voice segment comprises:
    splicing the target voice segment and the recorded voice data to obtain a spliced voice;
    determining a control instruction according to the spliced voice; and
    controlling the electronic device to execute the control instruction.
  19. The electronic device according to claim 18, wherein determining the control instruction according to the spliced voice comprises:
    sending the spliced voice to a server, so that the server performs semantic parsing on the spliced voice and returns a corresponding target parsing word;
    determining a target application and an application event according to the returned target parsing word; and
    generating the control instruction according to the target application and the application event;
    and wherein controlling the electronic device to execute the control instruction comprises: executing the application event by using the target application.
  20. The electronic device according to claim 19, wherein determining the target application according to the target parsing word comprises:
    determining whether the target parsing word matches a stored parsing word set;
    if so, determining the target application according to the successfully matched parsing word; and
    if not, determining whether the target parsing word matches a preset word set; and when the match succeeds, determining the target application according to the successfully matched preset word, adding the successfully matched preset word to the parsing word set, and deleting the successfully matched preset word from the preset word set.
PCT/CN2019/085720 2018-06-27 2019-05-06 Voice control method and apparatus, and storage medium and electronic device WO2020001165A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201810681095.6 2018-06-27
CN201810681095.6A CN108694947B (en) 2018-06-27 2018-06-27 Voice control method, device, storage medium and electronic equipment

Publications (1)

Publication Number Publication Date
WO2020001165A1 (en) 2020-01-02

Family

ID=63849986

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/085720 WO2020001165A1 (en) 2018-06-27 2019-05-06 Voice control method and apparatus, and storage medium and electronic device

Country Status (2)

Country Link
CN (1) CN108694947B (en)
WO (1) WO2020001165A1 (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108694947B (en) * 2018-06-27 2020-06-19 Oppo广东移动通信有限公司 Voice control method, device, storage medium and electronic equipment
CN110060693A (en) * 2019-04-16 2019-07-26 Oppo广东移动通信有限公司 Model training method, device, electronic equipment and storage medium
CN112053696A (en) * 2019-06-05 2020-12-08 Tcl集团股份有限公司 Voice interaction method and device and terminal equipment
CN112397062A (en) 2019-08-15 2021-02-23 华为技术有限公司 Voice interaction method, device, terminal and storage medium
CN113129893B (en) * 2019-12-30 2022-09-02 Oppo(重庆)智能科技有限公司 Voice recognition method, device, equipment and storage medium
CN111583929A (en) * 2020-05-13 2020-08-25 军事科学院系统工程研究院后勤科学与技术研究所 Control method and device using offline voice and readable equipment
CN112581957B (en) * 2020-12-04 2023-04-11 浪潮电子信息产业股份有限公司 Computer voice control method, system and related device
CN112634916A (en) * 2020-12-21 2021-04-09 久心医疗科技(苏州)有限公司 Automatic voice adjusting method and device for defibrillator

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060155549A1 (en) * 2005-01-12 2006-07-13 Fuji Photo Film Co., Ltd. Imaging device and image output device
US20080256613A1 (en) * 2007-03-13 2008-10-16 Grover Noel J Voice print identification portal
CN102510426A (en) * 2011-11-29 2012-06-20 安徽科大讯飞信息科技股份有限公司 Personal assistant application access method and system
CN104575504A (en) * 2014-12-24 2015-04-29 上海师范大学 Method for personalized television voice wake-up by voiceprint and voice identification
CN106653021A (en) * 2016-12-27 2017-05-10 上海智臻智能网络科技股份有限公司 Voice wake-up control method and device and terminal
CN107147618A (en) * 2017-04-10 2017-09-08 北京猎户星空科技有限公司 A kind of user registering method, device and electronic equipment
CN107464557A (en) * 2017-09-11 2017-12-12 广东欧珀移动通信有限公司 Call recording method, device, mobile terminal and storage medium
CN108154882A (en) * 2017-12-13 2018-06-12 广东美的制冷设备有限公司 The control method and control device of remote control equipment, storage medium and remote control equipment
CN108694947A (en) * 2018-06-27 2018-10-23 Oppo广东移动通信有限公司 Sound control method, device, storage medium and electronic equipment

Also Published As

Publication number Publication date
CN108694947A (en) 2018-10-23
CN108694947B (en) 2020-06-19

Similar Documents

Publication Publication Date Title
WO2020001165A1 (en) Voice control method and apparatus, and storage medium and electronic device
CN108829235B (en) Voice data processing method and electronic device supporting the same
US11670302B2 (en) Voice processing method and electronic device supporting the same
US11435980B2 (en) System for processing user utterance and controlling method thereof
US11955124B2 (en) Electronic device for processing user speech and operating method therefor
US11145302B2 (en) System for processing user utterance and controlling method thereof
US11042703B2 (en) Method and device for generating natural language expression by using framework
KR20180117485A (en) Electronic device for processing user utterance and method for operation thereof
CN108804070B (en) Music playing method and device, storage medium and electronic equipment
CN110830368B (en) Instant messaging message sending method and electronic equipment
WO2015043200A1 (en) Method and apparatus for controlling applications and operations on a terminal
US11915700B2 (en) Device for processing user voice input
KR20190032026A (en) Method for providing natural language expression and electronic device supporting the same
US11194545B2 (en) Electronic device for performing operation according to user input after partial landing
KR20190113130A (en) The apparatus for processing user voice input
CN111580911A (en) Operation prompting method and device for terminal, storage medium and terminal
CN111522592A (en) Intelligent terminal awakening method and device based on artificial intelligence
CN108711428B (en) Instruction execution method and device, storage medium and electronic equipment
CN106933626B (en) Application association method and device
WO2015067116A1 (en) Method and apparatus for processing speech texts
CN111897916A (en) Voice instruction recognition method and device, terminal equipment and storage medium
CN111027406A (en) Picture identification method and device, storage medium and electronic equipment
CN104978168B (en) Prompting method and device for operation information
CN111142832A (en) Input identification method and device, storage medium and terminal
CN113506571A (en) Control method, mobile terminal and storage medium

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 19825379; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 EP: PCT application non-entry in European phase (Ref document number: 19825379; Country of ref document: EP; Kind code of ref document: A1)