WO2020001165A1 - Voice control method and apparatus, storage medium and electronic device - Google Patents


Info

Publication number
WO2020001165A1
WO2020001165A1 (application PCT/CN2019/085720; CN 2019085720 W)
Authority
WO
WIPO (PCT)
Prior art keywords
voice
target
application
word
electronic device
Prior art date
Application number
PCT/CN2019/085720
Other languages
English (en)
Chinese (zh)
Inventor
陈岩
Original Assignee
Oppo广东移动通信有限公司
Priority date
Filing date
Publication date
Application filed by Oppo广东移动通信有限公司
Publication of WO2020001165A1

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 - Speech recognition
    • G10L 15/22 - Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L 2015/223 - Execution procedure of a spoken command
    • G - PHYSICS
    • G11 - INFORMATION STORAGE
    • G11C - STATIC STORES
    • G11C 7/00 - Arrangements for writing information into, or reading information out from, a digital store
    • G11C 7/16 - Storage of analogue signals in digital stores using an arrangement comprising analogue/digital [A/D] converters, digital memories and digital/analogue [D/A] converters

Definitions

  • the present application relates to the field of computer technology, and in particular, to a voice control method, device, storage medium, and electronic device.
  • voice assistants for mobile terminals have also become a commonly used function.
  • the user can use the voice assistant function of the mobile terminal to interact with the machine assistant by voice, so that the machine assistant completes various operations on the mobile terminal under the user's voice control, including operations on applications, such as setting up schedules, turning on alarms, setting to-do items, opening apps, making calls, and more.
  • the startup process of the existing voice assistant mainly includes two phases, the wake-up phase and the preparation phase.
  • the terminal can monitor the user's voice in real time.
  • the system performs related preparations to start the voice assistant.
  • as a result, the waiting time is long and the continuity of the voice input is poor.
  • the embodiments of the present application provide a voice control method, a device, a storage medium, and an electronic device.
  • voice instructions can be issued without waiting for the system to be ready, and the continuity of the voice input is better.
  • An embodiment of the present application provides a voice control method applied to an electronic device, including:
  • when the electronic device is in a voice monitoring state, updating a stored voice segment with the monitored voice data, where the voice segment is the voice data monitored within a most recent preset duration;
  • obtaining a current voice segment during the update of the voice segment, and extracting voiceprint features and keywords from the current voice segment;
  • starting a voice recording function according to the voiceprint features and keywords, and taking the voice segment saved at the moment of successful startup as a target voice segment; and
  • controlling the electronic device according to the recorded voice data and the target voice segment.
  • An embodiment of the present application further provides a voice control device, which is applied to an electronic device and includes:
  • An update module configured to update the stored voice segment by using the monitored voice data when the electronic device is in a voice monitoring state, where the voice segment is a piece of voice data that has been monitored within a recently preset duration;
  • An obtaining module configured to obtain a current voice segment during an update process of the voice segment
  • An extraction module for extracting voiceprint features and keywords from the current voice segment
  • a startup module configured to start a voice recording function according to the voiceprint characteristics and keywords, and obtain a voice segment at a successful startup time as a target voice segment;
  • a control module configured to control the electronic device according to the recorded voice data and the target voice segment.
  • An embodiment of the present application further provides a computer-readable storage medium, where a plurality of instructions are stored in the storage medium, and the instructions are adapted to be loaded by a processor to execute any one of the foregoing voice control methods.
  • An embodiment of the present application further provides an electronic device including a processor and a memory, where the processor is electrically connected to the memory, the memory is used for storing instructions and data, and the processor is used for executing any one of the foregoing voice control methods.
  • FIG. 1 is a schematic diagram of an application scenario of a voice control system according to an embodiment of the present application.
  • FIG. 2 is a schematic flowchart of a voice control method according to an embodiment of the present application.
  • FIG. 3 is another schematic flowchart of a voice control method according to an embodiment of the present application.
  • FIG. 4 is a schematic framework diagram of a voice control process according to an embodiment of the present application.
  • FIG. 5 is a schematic diagram of a server parsing process provided by an embodiment of the present application.
  • FIG. 6 is a schematic structural diagram of a voice control device according to an embodiment of the present application.
  • FIG. 7 is a schematic structural diagram of a control module according to an embodiment of the present application.
  • FIG. 8 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
  • FIG. 9 is another schematic structural diagram of an electronic device according to an embodiment of the present application.
  • the embodiments of the present application provide a voice control method, a device, a storage medium, and an electronic device.
  • FIG. 1 provides a schematic diagram of an application scenario of a voice control system.
  • the voice control system may include any voice control device provided in an embodiment of the present application.
  • the voice control device may be integrated into an electronic device.
  • the electronic device may include a touch-enabled device such as a smart phone or a tablet.
  • the electronic device can update the stored voice segment with the monitored voice data, where the voice segment is the voice data monitored within a most recent preset duration; during the update of the voice segment, the current voice segment is obtained, and voiceprint features and keywords are extracted from the current voice segment;
  • the voice recording function is started according to the voiceprint feature and keywords, and the voice segment at the moment of successful startup is obtained as the target voice segment;
  • the electronic device is controlled according to the recorded voice data and the target voice segment.
  • the preset duration can be set manually, for example to 3 seconds, and is typically about the same as the system preparation time required to start the voice recording function.
  • the electronic device can monitor the user's voice in real time, and save the last 3 seconds of voice data in real-time during the monitoring process for voiceprint analysis and keyword matching.
  • for example, when user A says "small x small x, play xxx song in xx application", the mobile phone can start recording upon detecting "small x small x", take the voice segment saved at the moment of successful startup, splice it with the subsequently recorded voice to obtain the continuous voice "play xxx song in xx application", and control the mobile phone accordingly based on that continuous voice.
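The rolling "last 3 seconds" buffer described above can be sketched as follows; the sample rate, chunk layout, and class name are illustrative assumptions, not details from the application.

```python
from collections import deque

SAMPLE_RATE = 16_000   # assumed 16 kHz mono audio
BUFFER_SECONDS = 3     # the "preset duration" from the text

class RollingVoiceBuffer:
    """Keeps only the most recently monitored samples (illustrative sketch)."""

    def __init__(self, seconds=BUFFER_SECONDS, rate=SAMPLE_RATE):
        self._samples = deque(maxlen=seconds * rate)

    def update(self, chunk):
        """Called for each monitored chunk; older samples fall off the left."""
        self._samples.extend(chunk)

    def snapshot(self):
        """The current voice segment, e.g. at the moment startup succeeds."""
        return list(self._samples)

buf = RollingVoiceBuffer(seconds=1, rate=4)  # tiny buffer for demonstration
buf.update([1, 2, 3])
buf.update([4, 5, 6])      # pushes out the oldest samples
print(buf.snapshot())      # → [3, 4, 5, 6]
```

A `deque` with `maxlen` gives the "update while discarding the oldest data" behavior for free; a production implementation would store raw audio frames rather than integers.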
  • a voice control method applied to an electronic device includes:
  • when the electronic device is in a voice monitoring state, updating a stored voice segment with the monitored voice data, where the voice segment is the voice data monitored within a most recent preset duration;
  • the electronic device is controlled according to the recorded voice data and the target voice segment.
  • the starting a voice recording function according to the voiceprint characteristics and keywords includes:
  • the controlling the electronic device according to the recorded voice data and a target voice segment includes:
  • the determining a control instruction according to the spliced voice includes:
  • the controlling the electronic device to execute the control instruction includes using the target application to execute the application event.
  • the determining a target application according to the target parsing word includes:
  • the determining a target application according to the target parsing word includes:
  • when the target parsing word is an application name, the application corresponding to the application name is used as the target application;
  • when the target parsing word is an application type, all applications in the electronic device that belong to that application type are determined, and the target application is determined from the determined applications.
  • the determining a target application based on the determined application includes:
  • FIG. 2 is a schematic flowchart of a voice control method provided by an embodiment of the present application, which is applied to an electronic device.
  • the specific process may be as follows:
  • the stored voice segment is updated with the monitored voice data; the voice segment is the voice data monitored within a most recent preset duration.
  • voice can be monitored through a device such as a microphone.
  • the preset time can be set manually, for example, 3 seconds, which is usually about the same as the system preparation time when the voice recording function is activated.
  • the electronic device can monitor the user's voice in a low power consumption state, and save the voice data detected in the most recent preset time period in real time.
  • voice monitoring and voice segment updating are performed in real time, and in this process, the electronic device may also perform real time analysis on the voice segment.
  • the voiceprint feature is mainly a frequency spectrum feature, which may include frequency, amplitude, and other information, and usually reflects the characteristics of the sound's loudness, tone, and timbre. Different people generally have different voiceprint features.
  • the speech segments are converted into spectral data by Fourier transform, and then the relevant information is extracted from the spectral data as corresponding voiceprint features.
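As a rough illustration of this Fourier-transform step, the sketch below converts a signal into spectral data and summarizes per-band energy as a toy voiceprint. Real voiceprint systems use richer features (for example MFCCs); the band count and signals here are assumptions.

```python
import numpy as np

def voiceprint_features(segment, n_bands=4):
    """Toy voiceprint: split the magnitude spectrum into coarse frequency
    bands and average each band's energy (illustrative, not production-grade)."""
    spectrum = np.abs(np.fft.rfft(segment))           # spectral data
    bands = np.array_split(spectrum, n_bands)         # coarse frequency bands
    return np.array([band.mean() for band in bands])  # one value per band

# Two different "speakers": a low-frequency and a high-frequency tone.
t = np.linspace(0, 1, 1600, endpoint=False)
low = np.sin(2 * np.pi * 50 * t)    # energy concentrated at 50 Hz
high = np.sin(2 * np.pi * 500 * t)  # energy concentrated at 500 Hz

f_low, f_high = voiceprint_features(low), voiceprint_features(high)
print(f_low.argmax(), f_high.argmax())  # energy peaks land in different bands
```

Comparing such feature vectors (for example by a similarity threshold) is one simple way to decide whether two segments come from the same speaker.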
  • the keyword may include at least one text, and the text may be English or Chinese.
  • step 104 may specifically include:
  • the preset feature is mainly a voiceprint feature bound to the user, and is usually set in advance.
  • for example, the user may be asked to record a voice in advance; a voiceprint feature is then extracted from that voice as the preset feature and bound to the user.
  • the preset keyword is mainly used to trigger activation of the voice interaction function; it can be set by the system by default (that is, set by the manufacturer before the electronic device leaves the factory), or set by the user according to personal preference, for example by entering the corresponding settings window through the related settings interface.
  • before the voice recording function is successfully started, the electronic device needs to perform a series of preparations, such as interrupting the foreground application and setting the call parameters of the voice recording component. During this preparation process the electronic device remains in the voice monitoring state and the voice segment continues to be updated; after the voice recording function is successfully started, updating of the voice segment can stop.
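The startup condition (voiceprint match AND keyword match) can be gated as in the sketch below; the cosine-similarity metric and the 0.9 threshold are illustrative choices, not values specified by the application.

```python
import math

def cosine_similarity(a, b):
    """Similarity between two feature vectors, in [-1, 1]."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def should_start_recording(features, transcript,
                           preset_features, preset_keyword,
                           threshold=0.9):
    """Start recording only when BOTH the voiceprint and the keyword match
    (threshold and metric are illustrative assumptions)."""
    voiceprint_ok = cosine_similarity(features, preset_features) >= threshold
    keyword_ok = preset_keyword in transcript
    return voiceprint_ok and keyword_ok

enrolled = [0.9, 0.1, 0.4]  # hypothetical preset voiceprint of the bound user
print(should_start_recording([0.88, 0.12, 0.41], "small x small x play",
                             enrolled, "small x"))  # → True
print(should_start_recording([0.1, 0.9, 0.2], "small x small x play",
                             enrolled, "small x"))  # → False (other speaker)
```

Requiring both checks is what keeps the assistant from waking for other speakers (voiceprint) or for ordinary speech by the bound user (keyword).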
  • step 105 may specifically include:
  • the target voice segment and the recorded voice data are spliced to obtain a spliced voice.
  • because the duration of the voice segment is close to the system preparation duration, the voice segment saved at the moment the system is ready (that is, when the voice recording function is successfully started) is exactly the continuous utterance spoken after the preset keyword; it can therefore be used directly as the starting content of the recorded speech and spliced with the subsequently recorded voice data to form a continuous speech.
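Treating the segments as text for readability, the splicing step is plain concatenation of the startup-moment segment with whatever is recorded afterwards; the fragment strings below are hypothetical.

```python
def splice(target_segment, recorded_chunks):
    """The startup-moment segment becomes the start of the utterance;
    chunks recorded after startup are appended in order (sketch)."""
    parts = [target_segment, *recorded_chunks]
    return " ".join(p.strip() for p in parts if p.strip())

# Hypothetical fragments for the "play xxx song in xx application" example:
print(splice("play xxx song in", ["xx", "application"]))
# → play xxx song in xx application
```

Because the buffer already holds the words spoken during system preparation, nothing the user said is lost at the seam between buffered and recorded audio.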
  • steps 1-2 may specifically include:
  • Control instructions are generated based on the target application and application event.
  • the server is mainly used for semantic analysis.
  • the electronic device can transmit the spliced voice to the server in real time, and the server can parse it with a trained semantic analysis model.
  • the semantic analysis model may be a deep learning model.
  • the server can collect different voice samples in advance to train the deep learning model.
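The application does not disclose the model itself; as a stand-in for the server side, the regex parser below only shows the shape of the output (a target parsed word plus an application event). Every pattern and name here is an assumption.

```python
import re

def parse_command(spliced_voice):
    """Placeholder for the server's semantic parsing: returns a target parsed
    word and an application event. A real system would use a trained deep
    learning model; these regexes are purely illustrative."""
    m = re.match(r"play (?P<song>.+) in (?P<app>.+)", spliced_voice)
    if m:  # an explicit application name was spoken
        return {"target_word": m.group("app"), "event": f"play {m.group('song')}"}
    m = re.match(r"play (?P<song>.+)", spliced_voice)
    if m:  # no app named: fall back to an application-type word
        return {"target_word": "music player application", "event": f"play {m.group('song')}"}
    return None

print(parse_command("play xxx song in xx application"))
# → {'target_word': 'xx application', 'event': 'play xxx song'}
```

The two branches mirror the two cases discussed below: the target parsed word can be an application name or an application type.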
  • step "determining a target application based on the returned target parsing word” may specifically include:
  • the parsed word set and the preset word set mainly contain application-related words, such as application names or application types.
  • the parsed word set mainly contains words that were parsed out when the electronic device requested the server to perform semantic parsing in past sessions.
  • the preset word set is mainly set by the system by default; for example, each time an application is installed on the terminal, an application-related word for that application can be obtained.
  • when the target parsing word is an application name, the application corresponding to the application name may be directly used as the target application.
  • when the target parsing word is an application type, all applications in the electronic device that belong to that application type may be determined first; the most frequently used application can then be used as the target application, or a selection interface based on these applications can be presented to the user, and the target application determined from the user's selection operation.
  • initially, the preset word set may include all words set by the system by default and the parsed word set may be empty; each time the electronic device obtains a new parsed word, it can store that word in the parsed word set and delete it from the preset word set at the same time, so that the parsed word set and the preset word set are continuously updated.
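The two word sets and the migration rule just described can be sketched as follows; the class and method names are illustrative assumptions.

```python
class WordSets:
    """Tracks parsed words seen in past sessions (fast path) and system
    preset words (fallback); structure is an illustrative sketch."""

    def __init__(self, preset_words):
        self.parsed = set()              # words matched in past sessions
        self.preset = set(preset_words)  # system defaults, e.g. app names

    def match(self, target_word):
        """Return True if the word is known; migrate preset hits so the
        next lookup takes the fast path, as the text describes."""
        if target_word in self.parsed:   # past-habit match first
            return True
        if target_word in self.preset:   # fall back to the preset set
            self.preset.discard(target_word)
            self.parsed.add(target_word)  # continuously update both sets
            return True
        return False                      # e.g. prompt the user to re-enter

ws = WordSets({"xx application", "music player application"})
print(ws.match("xx application"))      # first hit comes from the preset set
print("xx application" in ws.parsed)   # migrated for future fast matching
```

Checking the (typically small) parsed word set first is what lets matching benefit from the user's past voice interaction habits.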
  • steps 1-3 may specifically include:
  • the target application may be started first, and then the target application is used to execute a corresponding application event.
  • the voice control method provided in this embodiment is applied to an electronic device.
  • when the electronic device is in a voice monitoring state, the stored voice segment is updated with the monitored voice data, where the voice segment is the voice data monitored within a most recent preset duration; then, during the update of the voice segment, the current voice segment is obtained, and voiceprint features and keywords are extracted from the current voice segment.
  • the voice recording function is then started according to the voiceprint features and keywords, and the voice segment at the moment of successful startup is taken as the target voice segment; the electronic device is then controlled according to the recorded voice data and the target voice segment. In this way, the device can be woken up and given interactive instructions in one coherent utterance, without the voice being interrupted by the long system preparation time; the method is simple, can effectively improve the efficiency of voice interaction, and gives a good voice interaction effect.
  • the electronic device uses the monitored voice data to update a stored voice segment, which is a segment of voice data that has been monitored within a recently preset time period.
  • the preset duration may be manually set, for example to 3 seconds. As long as the electronic device is powered on, voice monitoring can run continuously, and the electronic device keeps only the voice data from the last 3 seconds.
  • the electronic device obtains the current voice segment, and extracts voiceprint features and keywords from the current voice segment.
  • for example, when the user says "small x small x, play xxx song in xx application" to the electronic device, since the voice monitoring and voice segment update operations are performed in real time, the first 3 seconds of the user's speech, such as "small x small x", are stored as the initial voice segment. The electronic device then performs voiceprint feature and keyword extraction on the stored voice segment, for example converting it into spectrum data by Fourier transform and extracting the relevant information from the spectrum data as the corresponding voiceprint features, while at the same time extracting keywords from the content of the voice segment.
  • the electronic device determines whether the voiceprint feature matches a preset feature, and whether the keyword matches a preset keyword. If yes, execute step 204 below; if not, return to execute step 202 above.
  • the preset feature may be a voiceprint feature input by the user in advance
  • the preset keyword may be a phrase set by the system by default, which may include at least two characters or words.
  • the preset keywords should be short, such as "small x".
  • the electronic device starts a voice recording function, and obtains a voice segment at a successful start time as a target voice segment.
  • the electronic device splices the target voice segment and the recorded voice data to obtain a spliced voice, and then sends the spliced voice to a server, so that the server semantically parses the spliced voice and returns a corresponding target parsed word.
  • after the voice recording function is successfully started, the user's subsequent words are saved by the recording itself, so the voice segment no longer needs to be updated and the update operation can be stopped. The saved voice segment, such as "play xx application", is used as the starting content of the recorded voice and is spliced with the subsequently recorded voice data into a continuous voice; for example, within the first 2 seconds of recording, the spliced voice can be "play xx application". At the same time, the spliced speech is transmitted to the server in real time for semantic parsing.
  • the electronic device determines whether the target parsed word matches the stored parsed word set. If so, it determines the target application based on the successfully matched parsed word and determines an application event based on the target parsed word; if not, it performs the following step 207.
  • the electronic device determines whether the target parsed word matches the preset word set. If so, it determines the target application based on the successfully matched preset word, determines an application event based on the target parsed word, then adds the successfully matched preset word to the parsed word set and deletes it from the preset word set; if not, the user is prompted to re-enter the voice.
  • that is, the electronic device first matches the target parsed word against its previous parsing records; only when that match fails does it perform matching in the preset word set.
  • when the target parsed word is an application name, the application corresponding to that name can be directly used as the target application.
  • for example, the target parsed word parsed from the voice "play xxx song in xx application" can be "xx application", and the application event can be: play xxx song.
  • when the target parsed word is an application type, all applications in the electronic device that belong to that type may be determined first, and the most frequently used one taken as the target application.
  • for example, the target parsed word parsed from the voice "play xxx songs" can be "music player application", and the application event can be: play xxx songs.
  • if applications C1, C2, and C3 in the electronic device belong to the music player type and the C1 application is used most frequently, the target application is the C1 application.
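Combining the name and type branches, target-application selection might look like the sketch below; the dictionaries and usage counts are hypothetical.

```python
def choose_target_app(target_word, installed_apps, usage_counts):
    """Pick the target application: exact name match first, otherwise the
    most frequently used app of the matching type (illustrative sketch).

    installed_apps: {app_name: app_type}; usage_counts: {app_name: launches}.
    """
    if target_word in installed_apps:  # the parsed word is an app name
        return target_word
    same_type = [a for a, t in installed_apps.items() if t == target_word]
    if same_type:                      # the parsed word is an app type
        return max(same_type, key=lambda a: usage_counts.get(a, 0))
    return None                        # unknown word: no target application

apps = {"C1": "music player", "C2": "music player", "C3": "music player"}
usage = {"C1": 42, "C2": 7, "C3": 19}
print(choose_target_app("music player", apps, usage))  # → C1
```

The `None` branch corresponds to the case where matching fails and the user is prompted to re-enter the voice.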
  • the electronic device executes the application event using the target application.
  • the voice control method provided in this embodiment is applied to an electronic device.
  • in the voice monitoring state, the electronic device can update the stored voice segment with the monitored voice data, where the voice segment is the voice data monitored within a most recent preset duration; during the update of the voice segment, the current voice segment is obtained, and voiceprint features and keywords are extracted from the current voice segment.
  • the electronic device then determines whether the voiceprint features match a preset feature and whether the keyword matches a preset keyword; if so, it starts the voice recording function, takes the voice segment at the moment of successful startup as the target voice segment, and splices the target voice segment with the recorded voice data to obtain a spliced voice. In this way, the device can be woken up and given interactive instructions in one coherent utterance, without the voice being interrupted by the system preparation time; the method is simple.
  • the spliced speech is sent to the server, so that the server semantically parses the spliced speech, and returns the corresponding target parsing word.
  • if the target parsed word matches the parsed word set, the target application is determined according to the successfully matched parsed word and the application event is determined according to the target parsed word. If not, it is determined whether the target parsed word matches the preset word set; if so, the target application is determined based on the successfully matched preset word, the application event is determined according to the target parsed word, and the successfully matched preset word is added to the parsed word set and deleted from the preset word set. Finally, the target application is used to execute the application event. Matching parsed words according to the user's past voice interaction habits in this way can improve matching efficiency, effectively improve the efficiency of voice interaction, and improve the user experience.
  • the voice control device may be specifically implemented as an independent entity or integrated in an electronic device, such as a terminal.
  • the terminal may include a mobile phone, a tablet computer, and the like.
  • a voice control device applied to electronic equipment includes:
  • An update module configured to update the stored voice segment by using the monitored voice data when the electronic device is in a voice monitoring state, where the voice segment is a piece of voice data that has been monitored within a recently preset duration;
  • An obtaining module configured to obtain a current voice segment during an update process of the voice segment
  • An extraction module for extracting voiceprint features and keywords from the current voice segment
  • a startup module configured to start a voice recording function according to the voiceprint characteristics and keywords, and obtain a voice segment at a successful startup time as a target voice segment;
  • a control module configured to control the electronic device according to the recorded voice data and the target voice segment.
  • the startup module is specifically configured to:
  • control module specifically includes:
  • a splicing unit configured to splice the target voice segment and the recorded voice data to obtain a spliced voice
  • a determining unit configured to determine a control instruction according to the spliced voice
  • An execution unit is configured to control the electronic device to execute the control instruction.
  • the determining unit is specifically configured to:
  • the execution unit is configured to execute the application event by using the target application.
  • the determining unit is specifically configured to:
  • the determining unit is specifically configured to:
  • the target parsing word is an application name
  • the application corresponding to the application name is used as the target application
  • the target parsing word is an application type
  • all applications in the electronic device that belong to the same application type are determined; and the target application is determined according to the determined applications.
  • the determining unit is specifically configured to:
  • FIG. 6 specifically describes a voice control device provided in an embodiment of the present application, which is applied to electronic equipment.
  • the voice control device may include: an update module 10, an acquisition module 20, an extraction module 30, a startup module 40, and a control module 50, where:
  • the update module 10 is configured to update the stored voice segment by using the monitored voice data when the electronic device is in a voice monitoring state, where the voice segment is a piece of voice data that has been monitored within a recently preset time period.
  • voice can be monitored through a device such as a microphone.
  • the preset time can be set manually, for example, 3 seconds, which is usually about the same as the system preparation time when the voice recording function is activated.
  • the electronic device can monitor the user's voice in a low power consumption state, and the update module 10 saves the voice data detected in the most recent preset time period in real time.
  • the obtaining module 20 is configured to obtain a current voice segment during an update process of the voice segment.
  • voice monitoring and voice segment updating are performed in real time, and in this process, the electronic device may also perform real time analysis on the voice segment.
  • the extraction module 30 is configured to extract voiceprint features and keywords from a current voice segment.
  • the voiceprint feature is mainly a frequency spectrum feature, which may include frequency, amplitude, and other information, and usually reflects the characteristics of the sound's loudness, tone, and timbre. Different people generally have different voiceprint features.
  • the speech segments are converted into spectral data by Fourier transform, and then the relevant information is extracted from the spectral data as corresponding voiceprint features.
  • the keyword may include at least one text, and the text may be English or Chinese.
  • the starting module 40 is configured to start a voice recording function according to the voiceprint feature and keywords, and obtain a voice segment at a successful startup time as a target voice segment.
  • the startup module 40 may be specifically used for:
  • the acquisition module is triggered to perform the operation of acquiring the current voice segment.
  • the preset feature is mainly a voiceprint feature bound to the user, and is usually set in advance.
  • for example, the user may be asked to record a voice in advance; a voiceprint feature is then extracted from that voice as the preset feature and bound to the user.
  • the preset keyword is mainly used to trigger activation of the voice interaction function; it can be set by the system by default (that is, set by the manufacturer before the electronic device leaves the factory), or set by the user according to personal preference, for example by entering the corresponding settings window through the related settings interface.
  • before the voice recording function is successfully started, the electronic device needs to perform a series of preparations, such as interrupting the foreground application and setting the call parameters of the voice recording component.
  • during this preparation process the voice segment continues to be updated; after the voice recording function is successfully started, updating of the voice segment can stop.
  • the control module 50 is configured to control the electronic device according to the recorded voice data and the target voice segment.
  • control module 50 may specifically include:
  • the splicing unit 51 is configured to splice the target voice segment and the recorded voice data to obtain a spliced voice.
  • because the duration of the voice segment is close to the system preparation duration, the voice segment saved at the moment the system is ready (that is, when the voice recording function is successfully started) is exactly the continuous utterance spoken after the preset keyword; it can therefore be used directly as the starting content of the recorded speech and spliced with the subsequently recorded voice data to form a continuous speech.
  • a determining unit 52 is configured to determine a control instruction according to the stitching voice.
  • the determining unit 52 may be specifically configured to:
  • the server is mainly used for semantic analysis.
  • the electronic device can transmit the spliced voice to the server in real time, and the server can parse it with a trained semantic analysis model.
  • the semantic analysis model may be a deep learning model.
  • the server can collect different voice samples in advance to train the deep learning model.
  • determining unit 52 may be specifically configured to:
  • Control instructions are generated based on the target application and application event.
  • the parsed word set and the preset word set mainly contain application-related words, such as application names or application types.
  • the parsed word set mainly contains words that were parsed out when the electronic device requested the server to perform semantic parsing in past sessions.
  • the preset word set is mainly set by the system by default; for example, each time an application is installed on the terminal, an application-related word for that application can be obtained.
  • the target parsing word is an application name
  • the application corresponding to the application name may be directly used as the target application.
  • the parsing word is an application type
  • all applications in the electronic device that belong to the same application type may be determined first. , You can then use the most frequently used application as the target application, or provide users with a selection interface based on these applications, and determine the target application through the user's selection operation.
  • initially, the preset word set may include all words set by the system by default and the parsed word set may be empty; each time the electronic device obtains a new parsed word, it can store that word in the parsed word set and delete it from the preset word set at the same time, so that both sets are continuously updated.
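A minimal sketch of this name-or-type resolution, assuming a hypothetical table of installed applications with usage counts (none of these names come from the patent):

```python
# Hypothetical installed-application table; "use_count" stands in for
# whatever usage-frequency statistic the device tracks.
INSTALLED = {
    "MusicPlayer": {"type": "music", "use_count": 42},
    "CloudTunes":  {"type": "music", "use_count": 7},
    "Mail":        {"type": "email", "use_count": 19},
}

def resolve_target_app(word: str) -> str:
    if word in INSTALLED:
        # The target parsing word is an application name: use it directly.
        return word
    # Otherwise treat it as an application type: gather all applications
    # of that type and pick the most frequently used one.
    candidates = [name for name, info in INSTALLED.items()
                  if info["type"] == word]
    if not candidates:
        raise LookupError(f"no application matches {word!r}")
    return max(candidates, key=lambda name: INSTALLED[name]["use_count"])

app = resolve_target_app("music")
```

The alternative path described above — presenting a selection interface instead of picking by frequency — would replace the `max(...)` call with a prompt built from `candidates`.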
  • the execution unit 53 is configured to control the electronic device to execute the control instruction.
  • execution unit 53 may be specifically configured to:
  • the execution unit 53 may start the target application first, and then use the target application to execute a corresponding application event.
  • the above units may be implemented as independent entities, or may be arbitrarily combined, and implemented as the same or several entities.
  • for the specific implementation of the above units, refer to the foregoing method embodiments; details are not described herein again.
  • the voice control apparatus is applied to an electronic device.
  • the update module 10 updates the stored voice segment by using the monitored voice data, the voice segment being the piece of voice data detected within the most recent preset duration.
  • the acquisition module 20 acquires the current voice segment during the update of the voice segment, and the extraction module 30 extracts voiceprint features and keywords from the current voice segment.
  • the starting module 40 starts the voice recording function according to the voiceprint features and keywords and obtains the voice segment at the moment of successful startup as the target voice segment; the control module 50 then controls the electronic device according to the recorded voice data and the target voice segment. Therefore, the device can be woken up and given interactive instructions in one continuous utterance, without the voice being interrupted by the system preparation time.
  • the method is simple, can effectively improve the efficiency of voice interaction, and has a good voice interaction effect.
  • the embodiment of the present application further provides an electronic device, and the electronic device may be a device such as a smart phone or a tablet computer.
  • the electronic device 400 includes a processor 401 and a memory 402.
  • the processor 401 is electrically connected to the memory 402.
  • the processor 401 is the control center of the electronic device 400 and uses various interfaces and lines to connect the various parts of the entire electronic device.
  • the processor 401 runs or loads applications stored in the memory 402 and calls the data stored in the memory 402 to execute the various functions of the electronic device and process data, thereby monitoring the electronic device as a whole.
  • the processor 401 in the electronic device 400 loads instructions corresponding to the processes of one or more applications into the memory 402 according to the following steps, and the processor 401 runs the applications stored in the memory 402 in order to implement various functions:
  • the stored voice segment is updated by using the monitored voice data, the voice segment being a piece of voice data that has been monitored within the most recent preset time period;
  • the electronic device is controlled according to the recorded voice data and the target voice segment.
  • the starting a voice recording function according to the voiceprint characteristics and keywords includes:
  • the controlling the electronic device according to the recorded voice data and a target voice segment includes:
  • the determining a control instruction according to the spliced voice includes:
  • the controlling the electronic device to execute the control instruction includes using the target application to execute the application event.
  • the determining a target application according to the target parsing word includes:
  • the determining a target application according to the target parsing word includes:
  • when the target parsing word is an application name, the application corresponding to the application name is used as the target application;
  • when the target parsing word is an application type, all applications in the electronic device that belong to the same application type are determined, and the target application is determined according to the determined applications.
  • the determining a target application based on the determined application includes:
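The overall flow listed in the steps above — update the rolling segment, check the voiceprint and keyword, then splice the buffered segment with the subsequently recorded data — can be sketched as follows (all helper behavior is assumed; real voiceprint extraction and keyword spotting would operate on audio features, not strings):

```python
# Hypothetical end-to-end sketch: keep a rolling segment of monitored
# audio, check it for the preset keyword and an enrolled voiceprint,
# and on success use the buffered segment as the start of the command.
KEYWORD = "assistant"
ENROLLED_VOICEPRINT = "user-1"

def extract_features(segment):
    # Stand-ins for voiceprint extraction and keyword spotting (step 103).
    voiceprint = segment["speaker"]
    has_keyword = KEYWORD in segment["audio"]
    return voiceprint, has_keyword

def run_pipeline(segment, recorded_after_start):
    voiceprint, has_keyword = extract_features(segment)
    if voiceprint == ENROLLED_VOICEPRINT and has_keyword:
        # Start recording (step 104) and splice the buffered segment with
        # the data recorded afterwards into one continuous command (step 105).
        return segment["audio"] + recorded_after_start
    return None  # wrong speaker or no keyword: stay in the monitoring state

segment = {"speaker": "user-1", "audio": "assistant open"}
command = run_pipeline(segment, " the camera")
```

The spliced `command` is what would then be sent to the server for semantic parsing to determine the control instruction.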
  • the electronic device 500 may include a radio frequency (RF) circuit 501, a memory 502 including one or more computer-readable storage media, an input unit 503, a display unit 504, a sensor 505, an audio circuit 506, a wireless fidelity (WiFi) module 507, a processor 508 having one or more processing cores, a power supply 509, and other components.
  • the radio frequency circuit 501 can be used to send and receive information, or to receive and send signals during a call. In particular, after receiving downlink information from a base station, it hands the information to one or more processors 508 for processing; it also sends uplink data to the base station.
  • the radio frequency circuit 501 includes, but is not limited to, an antenna, at least one amplifier, a tuner, one or more oscillators, a subscriber identity module (SIM, Subscriber Identity Module) card, a transceiver, a coupler, and a low noise amplifier (LNA, Low Noise Amplifier), duplexer, etc.
  • the radio frequency circuit 501 can also communicate with a network and other devices through wireless communication.
  • This wireless communication can use any communication standard or protocol, including but not limited to Global System for Mobile Communication (GSM), General Packet Radio Service (GPRS), Code Division Multiple Access (CDMA, Code Division Multiple Access), Wideband Code Division Multiple Access (WCDMA), Long Term Evolution (LTE), Email, Short Messaging Service (SMS), etc.
  • the memory 502 may be used to store application programs and data.
  • the application program stored in the memory 502 contains executable code.
  • Applications can be composed of various functional modules.
  • the processor 508 executes various functional applications and data processing by running an application program stored in the memory 502.
  • the memory 502 may mainly include a storage program area and a storage data area, where the storage program area may store an operating system, at least one application required by a function (such as a sound playback function or an image playback function), and the like, and the storage data area may store data (such as audio data or a phone book) created through use of the electronic device.
  • the memory 502 may include a high-speed random access memory, and may further include a non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device. Accordingly, the memory 502 may further include a memory controller to provide the processor 508 and the input unit 503 with access to the memory 502.
  • the input unit 503 can be used to receive inputted numbers, character information, or user characteristic information (such as fingerprints), and generate keyboard, mouse, joystick, optical, or trackball signal inputs related to user settings and function control.
  • the input unit 503 may include a touch-sensitive surface and other input devices.
  • a touch-sensitive surface, also known as a touch display or touchpad, collects the user's touch operations on or near it (such as operations performed by the user with a finger, stylus, or any other suitable object or accessory on or near the touch-sensitive surface) and drives the corresponding connection device according to a preset program.
  • the touch-sensitive surface may include two parts, a touch detection device and a touch controller.
  • the touch detection device detects the user's touch position and the signal brought by the touch operation, and transmits the signal to the touch controller; the touch controller receives the touch information from the touch detection device, converts it into contact coordinates, and sends them to the processor 508; it can also receive commands sent by the processor 508 and execute them.
  • the display unit 504 may be used to display information input by the user or information provided to the user and various graphical user interfaces of the electronic device. These graphical user interfaces may be composed of graphics, text, icons, videos, and any combination thereof.
  • the display unit 504 may include a display panel.
  • the display panel may be configured using a liquid crystal display (LCD, Liquid Crystal Display), an organic light emitting diode (OLED, Organic Light-Emitting Diode), or the like.
  • the touch-sensitive surface may cover the display panel.
  • when the touch-sensitive surface detects a touch operation on or near it, the operation is transmitted to the processor 508 to determine the type of the touch event, after which the processor 508 provides the corresponding visual output on the display panel according to the type of the touch event.
  • although the touch-sensitive surface and the display panel may be implemented as two separate components to implement the input and output functions, in some embodiments the touch-sensitive surface and the display panel may be integrated to implement the input and output functions.
  • the electronic device may further include at least one sensor 505, such as a light sensor, a motion sensor, and other sensors.
  • the light sensor may include an ambient light sensor and a proximity sensor; the ambient light sensor may adjust the brightness of the display panel according to the brightness of the ambient light, and the proximity sensor may turn off the display panel and/or its backlight when the electronic device is moved to the ear.
  • as a kind of motion sensor, the gravity acceleration sensor can detect the magnitude of acceleration in each direction (generally three axes) and can detect the magnitude and direction of gravity when stationary; it can be used in applications that recognize the attitude of the mobile phone (such as horizontal/vertical screen switching, related games, and magnetometer attitude calibration) and in vibration-recognition related functions (such as a pedometer or tapping). Other sensors, such as a gyroscope, barometer, hygrometer, thermometer, and infrared sensor, may also be configured on the electronic device; details are not described here.
  • the audio circuit 506 may provide an audio interface between the user and the electronic device through a speaker or a microphone.
  • the audio circuit 506 can convert the received audio data into an electrical signal and transmit it to a speaker.
  • the speaker converts the audio signal into a sound signal and outputs it.
  • the microphone converts the collected sound signal into an electrical signal, which the audio circuit 506 receives and converts into audio data.
  • the audio circuit 506 may further include an earphone jack to provide communication between the peripheral headset and the electronic device.
  • Wireless fidelity is a short-range wireless transmission technology. Through the wireless fidelity module 507, the electronic device can help users send and receive e-mail, browse web pages, and access streaming media, providing users with wireless broadband Internet access.
  • although FIG. 9 shows the wireless fidelity module 507, it can be understood that it is not an essential part of the electronic device and can be omitted as needed without changing the essence of the invention.
  • the processor 508 is a control center of an electronic device, and uses various interfaces and lines to connect various parts of the entire electronic device.
  • the processor 508 executes the various functions of the electronic device by running or executing an application program stored in the memory 502 and calling data stored in the memory 502.
  • the processor 508 may include one or more processing cores; preferably, the processor 508 may integrate an application processor and a modem processor, wherein the application processor mainly processes an operating system, a user interface, and an application program, etc.
  • the modem processor mainly handles wireless communication. It can be understood that the foregoing modem processor may not be integrated into the processor 508.
  • the electronic device also includes a power source 509 (such as a battery) that powers various components.
  • the power supply can be logically connected to the processor 508 through a power management system, so that functions such as managing charging, discharging, and power consumption management can be implemented through the power management system.
  • the power supply 509 may also include one or more DC or AC power supplies, a recharging system, a power failure detection circuit, a power converter or inverter, a power supply status indicator, and any other components.
  • the electronic device may further include a camera, a Bluetooth module, and the like, and details are not described herein again.
  • the above modules may be implemented as independent entities, or may be arbitrarily combined, and implemented as the same or several entities.
  • for the specific implementation of the above modules, refer to the foregoing method embodiments; details are not described herein again.
  • an embodiment of the present invention provides a storage medium in which a plurality of instructions are stored, and the instructions can be loaded by a processor to execute steps in any one of the voice control methods provided by the embodiments of the present invention.
  • the storage medium may include a read-only memory (ROM, Read Only Memory), a random access memory (RAM, Random Access Memory), a magnetic disk, or an optical disk.

Abstract

A voice control method, comprising: when an electronic device is in a voice monitoring state, updating a stored voice segment with the monitored voice data, the voice segment being a piece of voice data detected within the most recent preset period (101); obtaining the current voice segment during the voice segment update process (102); extracting a voiceprint feature and a keyword from the current voice segment (103); starting a voice recording function according to the voiceprint feature and the keyword, and obtaining the voice segment at the moment of startup as a target voice segment (104); and performing corresponding control on the electronic device according to the recorded voice data and the target voice segment (105).
PCT/CN2019/085720 2018-06-27 2019-05-06 Procédé et appareil de commande vocale, et support de stockage et dispositif électronique WO2020001165A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201810681095.6 2018-06-27
CN201810681095.6A CN108694947B (zh) 2018-06-27 2018-06-27 语音控制方法、装置、存储介质及电子设备

Publications (1)

Publication Number Publication Date
WO2020001165A1 true WO2020001165A1 (fr) 2020-01-02

Family

ID=63849986

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/085720 WO2020001165A1 (fr) 2018-06-27 2019-05-06 Procédé et appareil de commande vocale, et support de stockage et dispositif électronique

Country Status (2)

Country Link
CN (1) CN108694947B (fr)
WO (1) WO2020001165A1 (fr)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108694947B (zh) * 2018-06-27 2020-06-19 Oppo广东移动通信有限公司 语音控制方法、装置、存储介质及电子设备
CN110060693A (zh) * 2019-04-16 2019-07-26 Oppo广东移动通信有限公司 模型训练方法、装置、电子设备及存储介质
CN112053696A (zh) * 2019-06-05 2020-12-08 Tcl集团股份有限公司 一种语音交互的方法、装置及终端设备
CN112397062A (zh) 2019-08-15 2021-02-23 华为技术有限公司 语音交互方法、装置、终端及存储介质
CN113129893B (zh) * 2019-12-30 2022-09-02 Oppo(重庆)智能科技有限公司 一种语音识别方法、装置、设备及存储介质
CN111583929A (zh) * 2020-05-13 2020-08-25 军事科学院系统工程研究院后勤科学与技术研究所 使用离线语音的控制方法、装置及可识读设备
CN112581957B (zh) * 2020-12-04 2023-04-11 浪潮电子信息产业股份有限公司 一种计算机语音控制方法、系统及相关装置
CN112634916A (zh) * 2020-12-21 2021-04-09 久心医疗科技(苏州)有限公司 一种除颤器语音自动调节方法及装置

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060155549A1 (en) * 2005-01-12 2006-07-13 Fuji Photo Film Co., Ltd. Imaging device and image output device
US20080256613A1 (en) * 2007-03-13 2008-10-16 Grover Noel J Voice print identification portal
CN102510426A (zh) * 2011-11-29 2012-06-20 安徽科大讯飞信息科技股份有限公司 个人助理应用访问方法及系统
CN104575504A (zh) * 2014-12-24 2015-04-29 上海师范大学 采用声纹和语音识别进行个性化电视语音唤醒的方法
CN106653021A (zh) * 2016-12-27 2017-05-10 上海智臻智能网络科技股份有限公司 语音唤醒的控制方法、装置及终端
CN107147618A (zh) * 2017-04-10 2017-09-08 北京猎户星空科技有限公司 一种用户注册方法、装置及电子设备
CN107464557A (zh) * 2017-09-11 2017-12-12 广东欧珀移动通信有限公司 通话录音方法、装置、移动终端及存储介质
CN108154882A (zh) * 2017-12-13 2018-06-12 广东美的制冷设备有限公司 遥控设备的控制方法及控制装置、存储介质及遥控设备
CN108694947A (zh) * 2018-06-27 2018-10-23 Oppo广东移动通信有限公司 语音控制方法、装置、存储介质及电子设备


Also Published As

Publication number Publication date
CN108694947B (zh) 2020-06-19
CN108694947A (zh) 2018-10-23

Similar Documents

Publication Publication Date Title
WO2020001165A1 (fr) Procédé et appareil de commande vocale, et support de stockage et dispositif électronique
CN108829235B (zh) 语音数据处理方法和支持该方法的电子设备
US11670302B2 (en) Voice processing method and electronic device supporting the same
US11435980B2 (en) System for processing user utterance and controlling method thereof
US11955124B2 (en) Electronic device for processing user speech and operating method therefor
US11145302B2 (en) System for processing user utterance and controlling method thereof
US11042703B2 (en) Method and device for generating natural language expression by using framework
KR20180117485A (ko) 사용자 발화를 처리하는 전자 장치 및 그 동작 방법
CN108804070B (zh) 音乐播放方法、装置、存储介质及电子设备
CN110830368B (zh) 即时通讯消息发送方法及电子设备
WO2015043200A1 (fr) Procédé et appareil pour commander des applications et des opérations sur un terminal
US11915700B2 (en) Device for processing user voice input
US11194545B2 (en) Electronic device for performing operation according to user input after partial landing
KR20190113130A (ko) 사용자 음성 입력을 처리하는 장치
CN111580911A (zh) 一种终端的操作提示方法、装置、存储介质及终端
CN108711428B (zh) 指令执行方法、装置、存储介质及电子设备
CN106933626B (zh) 应用关联方法及装置
WO2015067116A1 (fr) Procédé et appareil de traitement de textes vocaux
CN111897916A (zh) 语音指令识别方法、装置、终端设备及存储介质
CN111027406A (zh) 图片识别方法、装置、存储介质及电子设备
KR20190040164A (ko) 멀티 모달 입력을 처리하는 전자 장치, 멀티 모달 입력을 처리하는 방법 및 멀티 모달 입력을 처리하는 서버
CN104978168B (zh) 操作信息的提示方法及装置
CN111142832A (zh) 一种输入识别方法、装置、存储介质及终端
CN113506571A (zh) 控制方法、移动终端及存储介质

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19825379

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19825379

Country of ref document: EP

Kind code of ref document: A1