WO2023273321A1 - Voice control method and electronic device - Google Patents

Voice control method and electronic device

Info

Publication number
WO2023273321A1
WO2023273321A1 (PCT/CN2022/073135)
Authority
WO
WIPO (PCT)
Prior art keywords
electronic device
voice
preset
voice data
preset keyword
Prior art date
Application number
PCT/CN2022/073135
Other languages
English (en)
Chinese (zh)
Inventor
王志超
高欢
Original Assignee
荣耀终端有限公司
Priority date
Filing date
Publication date
Application filed by 荣耀终端有限公司 filed Critical 荣耀终端有限公司
Publication of WO2023273321A1


Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00: Speech recognition
    • G10L15/22: Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/223: Execution procedure of a spoken command
    • G10L17/00: Speaker identification or verification techniques

Definitions

  • the present application relates to the field of terminals and artificial intelligence, in particular to a voice control method and electronic equipment.
  • A typical way of implementing voice control is voice interaction.
  • More and more electronic devices have a voice interaction function and are equipped with applications capable of voice interaction with users, such as voice assistants. Users can interact with electronic devices through voice assistants to realize functions that previously required multiple manual operations, such as making calls or playing music. However, keeping the voice interaction function enabled wastes the power of the electronic device.
  • In a first aspect, the present application provides a voice control method and an electronic device.
  • In this way, the electronic device can receive a non-specific voice command input by the user and respond to it without enabling the voice interaction function.
  • This saves the power that is otherwise consumed both in turning on the voice interaction function and in keeping it on.
  • With reference to the first aspect, before the electronic device determines that it is currently in the first application scene, the method further includes: the electronic device displays a first user interface. That the electronic device, without enabling the voice interaction function, executes the operation corresponding to the first preset keyword in response to the first voice command specifically includes: the electronic device, without enabling the voice interaction function, responds to the first voice command, executes the operation corresponding to the first preset keyword, and displays a second user interface, where the second user interface is different from the first user interface.
  • In this way, the electronic device can change its user interface according to the voice command and thereby provide visual services for the user. For example, when the electronic device has a video application open, it can respond to the first voice command to provide the user with functions related to video playback, and the user can control the electronic device without manually touching its display screen.
  • In a possible implementation, determining that the electronic device is currently in the first application scene specifically includes: if the electronic device runs the first application program in the foreground, the electronic device determines that it is currently in the first application scenario.
  • In a possible implementation, if the electronic device runs the first application program in the background, the electronic device determines that it is currently in the first application scenario.
  • the electronic device can also determine that it is currently in a specific application scene, and the electronic device can be controlled by non-specific voice commands.
  • the electronic device can perform response operations in the background to realize voice control of the electronic device. For example, when the electronic device opens a music application, the first voice command is: "Play louder", and the electronic device can perform this operation in the background, which does not involve changes to the user interface and does not affect the user's use of the electronic device running in the foreground.
  • In a possible implementation, determining by the electronic device whether the first voice instruction includes a first preset keyword specifically includes: the electronic device loads all preset keywords corresponding to the first application scene; the electronic device determines, according to the first voice command and all preset keywords corresponding to the first application scene, whether the first voice command includes first voice data, where the first voice data includes at least the first preset keyword; if it determines that the first voice instruction includes the first voice data, the electronic device determines that the first voice instruction includes the first preset keyword; if it determines that the first voice instruction does not include the first voice data, the electronic device determines that the first voice instruction does not include the first preset keyword.
  • In this way, the electronic device first determines only whether the first voice command includes the first voice data. The algorithm involved in this step does not need to be complicated; as long as the first voice data can be detected, the electronic device can complete this step with a digital signal processor, which saves the computing resources of the electronic device.
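  • This first-stage check can be pictured with a short sketch. The following Kotlin fragment is only a minimal illustration, assuming the first voice command has already been decoded to text; the patent does not specify the matching algorithm, so the function name and the substring scan are assumptions.

```kotlin
// Illustrative sketch (not the patented algorithm): a coarse first-stage check
// cheap enough to run on a digital signal processor. It only reports whether
// any preset keyword of the current scene occurs in the command at all.
fun containsFirstVoiceData(firstVoiceCommand: String, sceneKeywords: List<String>): Boolean =
    sceneKeywords.any { keyword -> firstVoiceCommand.contains(keyword) }
```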
  • In a possible implementation, the method further includes: the electronic device loads all preset keywords corresponding to all specific application scenarios, where all specific application scenarios include the first application scenario and all preset keywords include the first preset keyword; the electronic device determines, according to the first voice data and all preset keywords corresponding to all specific application scenarios, a part of the first voice data as second voice data, where the second voice data only includes the first preset keyword. Executing the operation corresponding to the first preset keyword in response to the first voice command specifically includes: the electronic device generates the operation corresponding to the first preset keyword according to the first preset keyword in the second voice data, and executes that operation in response to the first voice instruction.
  • In this way, the electronic device can extract the preset keyword in the first voice instruction through a more accurate algorithm, use the preset keyword to generate the corresponding operation, and respond with that operation to implement voice control of the electronic device.
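  • A sketch of this second, more precise stage is given below, under the same assumption that voice data is represented as decoded text; extracting the keyword span by substring search is an illustrative stand-in for the more accurate recognition algorithm the patent leaves unspecified.

```kotlin
// Illustrative sketch: locate the part of the first voice data that contains
// only the preset keyword (the "second voice data"). Returns null when no
// preset keyword is present.
fun extractSecondVoiceData(firstVoiceData: String, allPresetKeywords: List<String>): String? {
    val keyword = allPresetKeywords.firstOrNull { firstVoiceData.contains(it) } ?: return null
    val start = firstVoiceData.indexOf(keyword)
    return firstVoiceData.substring(start, start + keyword.length) // the keyword span only
}
```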
  • In a possible implementation, the method further includes: the electronic device loads all preset keywords corresponding to all specific application scenarios, where all specific application scenarios include the first application scenario and all preset keywords include the first preset keyword; the electronic device determines, according to the first voice data and all preset keywords corresponding to all specific application scenarios, a part of the first voice data as second voice data, where the second voice data only includes the first preset keyword; the electronic device determines whether the voiceprint of the second voice data matches a preset voiceprint, where the preset voiceprint is the voiceprint identification of the user's voice data entered into the electronic device and is used to identify the user's identity. Executing the operation corresponding to the first preset keyword in response to the first voice command specifically includes: if it determines that the voiceprint of the second voice data matches the preset voiceprint, the electronic device generates the operation corresponding to the first preset keyword according to the first preset keyword in the second voice data and executes that operation in response to the first voice instruction; if it determines that the voiceprint of the second voice data does not match the preset voiceprint, the electronic device does not respond to the first voice instruction.
  • In this way, after the electronic device recognizes the preset keyword in the first voice command, it also judges whether the first voice command was input by the "owner" of the electronic device.
  • Generally, the "owner" of the electronic device will have entered his or her own biometric information, such as a voiceprint, into the electronic device. The electronic device can match the voiceprint of the first voice command against the voiceprint entered by the "owner" to determine whether the voice command was input by the "owner". Responding only to the "owner's" voice commands prevents arbitrary users from controlling the electronic device by voice, which increases the security of implementing the method.
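  • The voiceprint check could look like the following sketch, assuming voiceprints are compared as fixed-length embedding vectors under a cosine-similarity threshold; the patent only states that a match is determined, so the embedding representation and the threshold value are assumptions.

```kotlin
import kotlin.math.sqrt

// Illustrative sketch: compare the voiceprint of the second voice data against
// the voiceprint enrolled by the "owner" and report whether they match.
fun voiceprintMatches(sample: FloatArray, enrolled: FloatArray, threshold: Float = 0.8f): Boolean {
    require(sample.size == enrolled.size) { "embeddings must have the same length" }
    var dot = 0f; var normA = 0f; var normB = 0f
    for (i in sample.indices) {
        dot += sample[i] * enrolled[i]
        normA += sample[i] * sample[i]
        normB += enrolled[i] * enrolled[i]
    }
    // Respond to the first voice command only when the similarity is high enough.
    return dot / (sqrt(normA) * sqrt(normB)) >= threshold
}
```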
  • In a possible implementation, all preset keywords corresponding to a specific application scene are preset and stored in the electronic device, and any preset keyword corresponding to the specific application scene corresponds to an operation with the same meaning as that preset keyword.
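  • As a sketch, this stored association might be represented as a map from each preset keyword to an operation with the same meaning; the MusicPlayer interface and the keyword strings below are illustrative assumptions for a music scene.

```kotlin
// Illustrative sketch of keyword-to-operation associations for a music scene.
interface MusicPlayer {
    fun pause()
    fun playNext()
    fun playPrevious()
}

fun musicOperations(player: MusicPlayer): Map<String, () -> Unit> = mapOf(
    "pause" to { player.pause() },                    // "pause" -> pause music playback
    "play the next song" to { player.playNext() },
    "play the previous song" to { player.playPrevious() }
)
```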
  • In a second aspect, the present application provides an electronic device, which includes one or more processors and a memory; the memory is coupled to the one or more processors and is used to store computer program code, and the computer program code includes computer instructions.
  • The one or more processors call the computer instructions to make the electronic device execute: determining that it is currently in the first application scene; detecting a first voice instruction, where the first voice instruction is a non-specific voice command that is not used to enable the voice interaction function; determining whether the first voice command includes a first preset keyword, where the first preset keyword is any one of all preset keywords corresponding to the first application scene; if it is determined that the first voice instruction includes the first preset keyword, not enabling the voice interaction function and executing the operation corresponding to the first preset keyword in response to the first voice instruction; and if it is determined that the first voice command does not include the first preset keyword, not responding to the first voice command.
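  • Taken together, the claimed behavior amounts to the decision flow sketched below; the Scene type, the helper functions, and the keyword lists are illustrative assumptions, not the patented implementation.

```kotlin
enum class Scene { MUSIC, VIDEO, NAVIGATION, CALL }

// Placeholder lookup; a real device would read these from its stored keyword groups.
fun loadKeywords(scene: Scene): List<String> = when (scene) {
    Scene.MUSIC -> listOf("pause", "play the next song", "play the previous song")
    else -> emptyList()
}

fun executeOperation(scene: Scene, keyword: String) {
    println("scene=$scene -> executing the operation for '$keyword'")
}

fun onVoiceInput(command: String, currentScene: Scene?) {
    if (currentScene == null) return                 // not in a specific application scene
    val keywords = loadKeywords(currentScene)
    // No preset keyword: do not respond, and never enable the voice interaction function.
    val keyword = keywords.firstOrNull { command.contains(it) } ?: return
    // A preset keyword was found: respond without enabling the voice interaction function.
    executeOperation(currentScene, keyword)
}
```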
  • In this way, the electronic device can receive a non-specific voice command input by the user and respond to it without enabling the voice interaction function.
  • This saves the power that is otherwise consumed both in turning on the voice interaction function and in keeping it on.
  • In this way, the electronic device can change its user interface according to the voice command and thereby provide visual services for the user. For example, when the electronic device has a video application open, it can respond to the first voice command to provide the user with functions related to video playback, and the user can control the electronic device without manually touching its display screen.
  • In this way, when a specific application is running in the foreground, the electronic device can determine that it is currently in a specific application scene, and non-specific voice commands can be used to control the electronic device.
  • the electronic device can perform response operations in the foreground, realizing voice control of the electronic device. For example, when the electronic device opens a video application, the first voice instruction is: "play the next episode", and the electronic device can perform this operation in the foreground to play the next episode.
  • the one or more processors are specifically configured to call the computer instruction to make the electronic device execute: run the first application program in the background, then determine that it is currently in the first application scenario.
  • the electronic device can also determine that it is currently in a specific application scene, and the electronic device can be controlled by non-specific voice commands.
  • the electronic device can perform response operations in the background to realize voice control of the electronic device. For example, when the electronic device opens a music application, the first voice command is: "Play louder", and the electronic device can perform this operation in the background, which does not involve changes to the user interface and does not affect the user's use of the electronic device running in the foreground.
  • In a possible implementation, the one or more processors are specifically configured to call the computer instructions to make the electronic device execute: loading all preset keywords corresponding to the first application scene; determining, according to the first voice command and all preset keywords corresponding to the first application scene, whether the first voice command includes first voice data, where the first voice data includes at least the first preset keyword; if it is determined that the first voice command includes the first voice data, determining that the first voice command includes the first preset keyword; and if it is determined that the first voice command does not include the first voice data, determining that the first voice command does not include the first preset keyword.
  • In this way, the electronic device first determines only whether the first voice command includes the first voice data. The algorithm involved in this step does not need to be complicated; as long as the first voice data can be detected, the electronic device can complete this step with a digital signal processor, which saves the computing resources of the electronic device.
  • In a possible implementation, the one or more processors are further configured to call the computer instructions to make the electronic device execute: loading all preset keywords corresponding to all specific application scenarios, where all specific application scenarios include the first application scenario and all preset keywords include the first preset keyword; and determining, according to the first voice data and all preset keywords corresponding to all specific application scenarios, a part of the first voice data as second voice data, where the second voice data only includes the first preset keyword. The one or more processors are specifically configured to call the computer instructions to make the electronic device execute: generating the operation corresponding to the first preset keyword according to the first preset keyword in the second voice data; and executing that operation in response to the first voice instruction.
  • In this way, the electronic device can extract the preset keyword in the first voice instruction through a more accurate algorithm, use the preset keyword to generate the corresponding operation, and respond with that operation to implement voice control of the electronic device.
  • In a possible implementation, the one or more processors are further configured to call the computer instructions to make the electronic device execute: loading all preset keywords corresponding to all specific application scenarios, where all specific application scenarios include the first application scenario and all preset keywords include the first preset keyword; determining, according to the first voice data and all preset keywords corresponding to all specific application scenarios, a part of the first voice data as second voice data, where the second voice data only includes the first preset keyword; and determining whether the voiceprint of the second voice data matches the preset voiceprint, where the preset voiceprint is the voiceprint identification of the user's voice data entered into the electronic device and is used to identify the user's identity. The one or more processors are specifically configured to call the computer instructions to make the electronic device execute: if it is determined that the voiceprint of the second voice data matches the preset voiceprint, generating the operation corresponding to the first preset keyword according to the first preset keyword in the second voice data, and executing that operation in response to the first voice instruction; and if it is determined that the voiceprint of the second voice data does not match the preset voiceprint, not responding to the first voice instruction.
  • In this way, after the electronic device recognizes the preset keyword in the first voice command, it also judges whether the first voice command was input by the "owner" of the electronic device.
  • Generally, the "owner" of the electronic device will have entered his or her own biometric information, such as a voiceprint, into the electronic device. The electronic device can match the voiceprint of the first voice command against the voiceprint entered by the "owner" to determine whether the voice command was input by the "owner". Responding only to the "owner's" voice commands prevents arbitrary users from controlling the electronic device by voice, which increases the security of implementing the method.
  • In a third aspect, an embodiment of the present application provides an electronic device, where the electronic device includes one or more processors and one or more memories; the one or more memories are coupled to the one or more processors and are used to store computer program code, the computer program code includes computer instructions, and the one or more processors invoke the computer instructions so that the electronic device executes the method described in the first aspect or any implementation manner of the first aspect.
  • In this way, the electronic device can receive a non-specific voice command input by the user and respond to it without enabling the voice interaction function.
  • This saves the power that is otherwise consumed both in turning on the voice interaction function and in keeping it on.
  • In a fourth aspect, an embodiment of the present application provides a chip system, where the chip system includes one or more processors, and the processors are used to invoke computer instructions to make the electronic device execute the method described in the first aspect or any implementation manner of the first aspect.
  • In this way, the electronic device can receive a non-specific voice command input by the user and respond to it without enabling the voice interaction function.
  • This saves the power that is otherwise consumed both in turning on the voice interaction function and in keeping it on.
  • In a fifth aspect, an embodiment of the present application provides a computer program product containing instructions which, when run on an electronic device, cause the electronic device to execute the method described in the first aspect or any implementation manner of the first aspect.
  • In this way, the electronic device can receive a non-specific voice command input by the user and respond to it without enabling the voice interaction function.
  • This saves the power that is otherwise consumed both in turning on the voice interaction function and in keeping it on.
  • In a sixth aspect, an embodiment of the present application provides a computer-readable storage medium containing instructions which, when run on an electronic device, cause the electronic device to execute the method described in the first aspect or any implementation manner of the first aspect.
  • In this way, the electronic device can receive a non-specific voice command input by the user and respond to it without enabling the voice interaction function.
  • This saves the power that is otherwise consumed both in turning on the voice interaction function and in keeping it on.
  • Figures 1a-1d are a set of user interfaces for a user to control an electronic device through voice in one solution
  • Figures 2a-2d are a set of exemplary user interfaces for a user controlling an electronic device through voice provided in the present application;
  • FIG. 3 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.
  • Fig. 4 is a software structural block diagram of the electronic device provided by the embodiment of the present application.
  • FIG. 5 is a schematic diagram of an exemplary information flow of the voice control method involved in the present application.
  • FIG. 6 is a schematic flowchart of a voice control method provided by an embodiment of the present application.
  • FIG. 7 is a schematic diagram of determining the first voice data in the embodiment of the present application.
  • The terms "first" and "second" are used for descriptive purposes only, and cannot be understood as indicating or implying relative importance or implicitly specifying the quantity of the indicated technical features. Therefore, a feature defined as "first" or "second" may explicitly or implicitly include one or more of that feature. In the description of the embodiments of the present application, unless otherwise specified, "multiple" means two or more.
  • The term "user interface (UI)" in the following embodiments of this application is a medium interface for interaction and information exchange between an application program or an operating system and a user; it realizes the conversion between the internal form of information and a form acceptable to the user.
  • The user interface of an application program is source code written in a specific computer language, such as Java or extensible markup language (XML).
  • the source code of the interface is parsed and rendered on the electronic device, and finally presented as content that can be recognized by the user.
  • the commonly used form of user interface is the graphical user interface (graphic user interface, GUI), which refers to the user interface related to computer operation displayed in a graphical way. It may be text, icons, buttons, menus, tabs, text boxes, dialog boxes, status bars, navigation bars, Widgets, and other visible interface elements displayed on the display screen of the electronic device.
  • The preset keywords are preset and stored in the electronic device, and any preset keyword corresponds to an operation with the same meaning as the preset keyword. For example, when the specific application scene is a music scene and the preset keyword is "pause", the operation corresponding to the preset keyword is to pause music playback.
  • In this way, voice control of the electronic device can be implemented.
  • Specific application scenarios are preset. When the electronic device runs a certain application program, it is in the application scenario corresponding to that application program.
  • a specific application scenario is an application scenario in which voice control can be performed on an electronic device. It can be set that when the electronic device is running (including running in the background and running in the foreground) the first application program, the electronic device will enter the specific application scenario corresponding to the first application program.
  • the first application program may be a music application program, a video application program, a navigation application program, a call application program, and the like.
  • When the electronic device is running a music application program, it is in a music scene, and the corresponding preset keywords in the music scene may include commonly used keywords such as "play the next song", "play the previous song", "make the sound louder", "make the sound softer", and "pause".
  • When the electronic device is running a video application program, it is in a video scene, and the corresponding preset keywords in the video scene may include commonly used keywords such as "play the next episode", "play the previous episode", "make the sound louder", "make the sound softer", and "pause".
  • When the electronic device is running a navigation application program, it is in a navigation scene, and the corresponding preset keywords in the navigation scene may include commonly used keywords such as "navigate home", "navigate to work", "speak louder", and "speak softer".
  • When the electronic device is running a call application program, it is in a call scene, and the corresponding preset keywords in the call scene may include commonly used keywords such as "make the sound louder" and "make the sound softer".
  • this application may also include other specific application scenarios, and each specific application scenario may include more or less preset keywords than the above.
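  • The keyword groups enumerated above can be pictured as a simple table from scene to preset keywords; the Kotlin map below only restates the examples already listed, and the representation itself is an assumption.

```kotlin
// Illustrative restatement of the per-scene preset keyword groups listed above.
val presetKeywordGroups: Map<String, List<String>> = mapOf(
    "music" to listOf("play the next song", "play the previous song",
                      "make the sound louder", "make the sound softer", "pause"),
    "video" to listOf("play the next episode", "play the previous episode",
                      "make the sound louder", "make the sound softer", "pause"),
    "navigation" to listOf("navigate home", "navigate to work",
                           "speak louder", "speak softer"),
    "call" to listOf("make the sound louder", "make the sound softer")
)
```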
  • In one solution, the electronic device must first detect a user input for enabling the voice interaction function, and only after the voice interaction function has been enabled can the user control the electronic device through voice.
  • the input for turning on the voice interaction function may be a preset specific voice command, such as "YOYO, YOYO", or a long press of the power button.
  • the specific voice instruction may also be called a wake-up word or a wake-up instruction.
  • Figures 1a-1d show a set of user interfaces for the user to control the electronic device through voice in this solution.
  • the electronic device may display a user interface 11.
  • the user interface 11 may be a music playing interface of the electronic device, and the electronic device is currently playing "first music”.
  • the electronic device can detect a specific voice command for the user to turn on the voice interaction function: "YOYO, YOYO".
  • the electronic device can display a user interface 12 as shown in FIG. 1b.
  • The user interface 12 may include a voice prompt box 121, in which a prompt message 121A ("You said, I am listening") and a voice collection logo 121B may be displayed. The prompt message 121A and the voice collection logo 121B can be used to prompt the user that the electronic device can currently receive voice commands and that the user can control the electronic device by voice.
  • The user interface 13 may be a user interface displayed when the electronic device detects the first voice command input by the user while playing music. The first voice command is a non-specific voice command, where a non-specific voice command refers to any voice command other than the specific voice command.
  • the electronic device may display the detected first voice instruction in the voice prompt box 131 shown on the user interface 13 .
  • the first voice instruction input by the user may be: "play the next song”.
  • the electronic device can execute playing the next piece of music, and display the user interface 14 as shown in FIG. 1d.
  • If the electronic device always keeps the voice interaction function on to detect the user's voice commands, it wastes the power of the electronic device, so the electronic device is usually set to turn off the voice interaction function after responding to a non-specific voice command. The next time the user needs to perform voice interaction with the electronic device, the voice interaction function can be turned on again only if a specific voice command is detected again, and only then can the user control the electronic device by voice again.
  • After the electronic device responds to the first voice command, it turns off the voice interaction function. When the user then inputs a second voice command, "play the previous song", the electronic device does not respond to the second voice command, and the user interface 14 is still displayed.
  • the second voice instruction is a non-specific voice instruction.
  • the user needs to turn on the voice interaction function before each input of a non-specific voice command, so that the user can control the electronic device by voice.
  • That is, a specific voice command must be input first before the electronic device will respond to the first voice command and the user can control the electronic device by voice.
  • To solve this, an embodiment of the present application provides a voice control method. With the voice control method in the embodiment of the present application, in some specific application scenarios, the electronic device can receive a non-specific voice command input by the user and respond to it without enabling the voice interaction function.
  • For example, the application scenario in which the electronic device runs a music application program is the music scene, the application scenario in which it runs a video application program is the video scene, the application scenario in which it runs a navigation application program is the navigation scene, and the application scenario in which it runs a call application program is the call scene.
  • Non-specific instructions refer to other voice instructions that are different from specific voice instructions, and are not used to turn on the voice interaction function of electronic devices.
  • the electronic device can set some preset keywords for different specific application scenarios, and when the electronic device detects that the preset keyword is included in the voice command input by the user, the electronic device can respond to the voice command and execute the first operation.
  • Figures 2a-2d are a set of exemplary user interfaces provided in this application for a user to control an electronic device through voice.
  • the preset specific application scenarios include an application scenario in which an electronic device runs a music application.
  • the preset keywords may include: “play the previous song” and “play the next song”.
  • the user interface 21 may be a user interface when the electronic device detects that the user inputs a first voice command while playing music, and the electronic device plays "first music". At this time, the electronic device may display the detected first voice instruction in the voice prompt box 211 shown on the user interface 21, where the first voice instruction is a non-specific voice instruction.
  • the first voice instruction input by the user may be: "play the next song”. Since the first voice command includes a preset keyword, the electronic device may respond to the first voice command to perform an operation of playing the next piece of music, and display a user interface 22 as shown in FIG. 2b.
  • the electronic device has switched the currently playing music from "first music” to "second music”.
  • the electronic device can continuously respond to any voice command input by the user.
  • the user interface 23 may be a user interface when the electronic device detects that the user inputs a second voice command while playing music, and the electronic device plays "second music". After the electronic device responds to the user inputting the first voice command, the user inputs a second voice command, and the second voice command is a non-specific voice command.
  • the electronic device can detect the second voice command and display it in the voice prompt box 231 shown on the user interface 23 .
  • the second voice instruction may be: "play the previous song”. Since the second voice command includes preset keywords, the electronic device can respond to the second voice command to perform an operation of playing the previous music, and display the user interface 24 as shown in FIG. 2d.
  • the electronic device has switched the currently playing music from “second music” to “first music” again.
  • the electronic device can not only save power consumption, but also meet the needs of the user to control the electronic device by voice.
  • the user does not need to input a specific voice command to turn on the voice interaction function before inputting a non-specific voice command every time, so that the electronic device can be controlled by voice.
  • FIG. 3 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.
  • an electronic device is taken as an example to describe the embodiment in detail. It should be understood that an electronic device may have more or fewer components than shown in the figures, two or more components may be combined, or may have a different configuration of components.
  • the various components shown in the figures may be implemented in hardware, software, or a combination of hardware and software including one or more signal processing and/or application specific integrated circuits.
  • The electronic device may include: a processor 110, an external memory interface 120, an internal memory 121, a universal serial bus (USB) interface 130, a charging management module 140, a power management module 141, a battery 142, an antenna 1, an antenna 2, a mobile communication module 150, a wireless communication module 160, an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, an earphone jack 170D, a sensor module 180, a button 190, a motor 191, an indicator 192, a camera 193, a display screen 194, a subscriber identification module (SIM) card interface 195, and the like.
  • the structure shown in the embodiment of the present invention does not constitute a specific limitation on the electronic device.
  • the electronic device may include more or fewer components than shown in the illustrations, or combine certain components, or separate certain components, or arrange different components.
  • the illustrated components can be realized in hardware, software or a combination of software and hardware.
  • The processor 110 may include one or more processing units. For example, the processor 110 may include an application processor (AP), a modem processor, a graphics processing unit (GPU), an image signal processor (ISP), a controller, a memory, a video codec, a digital signal processor (DSP), a baseband processor, and/or a neural-network processing unit (NPU), etc. Different processing units may be independent devices, or may be integrated in one or more processors.
  • the controller may be the nerve center and command center of the electronic equipment.
  • the controller can generate an operation control signal according to the instruction opcode and timing signal, and complete the control of fetching and executing the instruction.
  • a memory may also be provided in the processor 110 for storing instructions and data.
  • the memory in processor 110 is a cache memory.
  • The memory may hold instructions or data that the processor 110 has just used or uses cyclically. If the processor 110 needs to use the instructions or data again, it can call them directly from the memory, which avoids repeated access and reduces the waiting time of the processor 110, thus improving the efficiency of the system.
  • processor 110 may include one or more interfaces.
  • The interface may include an inter-integrated circuit (I2C) interface, an inter-integrated circuit sound (I2S) interface, a pulse code modulation (PCM) interface, a universal asynchronous receiver/transmitter (UART) interface, a mobile industry processor interface (MIPI), a general-purpose input/output (GPIO) interface, a subscriber identity module (SIM) interface, and/or a universal serial bus (USB) interface, etc.
  • the interface connection relationship between the modules shown in the embodiment of the present invention is only a schematic illustration, and does not constitute a structural limitation of the electronic device.
  • In other embodiments, the electronic device may also adopt an interface connection method different from those in the above embodiments, or a combination of multiple interface connection methods.
  • the charging management module 140 is configured to receive a charging input from a charger.
  • the charger may be a wireless charger or a wired charger.
  • the power management module 141 is used for connecting the battery 142 , the charging management module 140 and the processor 110 .
  • the power management module 141 receives the input from the battery 142 and/or the charging management module 140 to provide power for the processor 110 , the internal memory 121 , the external memory, the display screen 194 , the camera 193 , and the wireless communication module 160 .
  • the wireless communication function of the electronic device can be realized by the antenna 1, the antenna 2, the mobile communication module 150, the wireless communication module 160, the modem processor and the baseband processor.
  • Antenna 1 and Antenna 2 are used to transmit and receive electromagnetic wave signals.
  • Each antenna in an electronic device can be used to cover a single or multiple communication frequency bands. Different antennas can also be multiplexed to improve the utilization of the antennas.
  • Antenna 1 can be multiplexed as a diversity antenna of a wireless local area network.
  • the antenna may be used in conjunction with a tuning switch.
  • The mobile communication module 150 can provide wireless communication solutions applied to the electronic device, including 2G/3G/4G/5G.
  • the mobile communication module 150 may include at least one filter, switch, power amplifier, low noise amplifier (low noise amplifier, LNA) and the like.
  • the mobile communication module 150 can receive electromagnetic waves through the antenna 1, filter and amplify the received electromagnetic waves, and send them to the modem processor for demodulation.
  • the mobile communication module 150 can also amplify the signals modulated by the modem processor, and convert them into electromagnetic waves through the antenna 1 for radiation.
  • a modem processor may include a modulator and a demodulator.
  • the modulator is used for modulating the low-frequency baseband signal to be transmitted into a medium-high frequency signal.
  • the demodulator is used to demodulate the received electromagnetic wave signal into a low frequency baseband signal. Then the demodulator sends the demodulated low-frequency baseband signal to the baseband processor for processing.
  • the low-frequency baseband signal is passed to the application processor after being processed by the baseband processor.
  • the application processor outputs sound signals through audio equipment (not limited to speaker 170A, receiver 170B, etc.), or displays images or videos through display screen 194 .
  • The wireless communication module 160 can provide wireless communication solutions applied to the electronic device, including wireless local area networks (WLAN) (such as wireless fidelity (Wi-Fi) networks), bluetooth (BT), global navigation satellite system (GNSS), frequency modulation (FM), near field communication (NFC), infrared (IR), and the like.
  • the wireless communication module 160 may be one or more devices integrating at least one communication processing module.
  • the wireless communication module 160 receives electromagnetic waves via the antenna 2 , frequency-modulates and filters the electromagnetic wave signals, and sends the processed signals to the processor 110 .
  • the wireless communication module 160 can also receive the signal to be sent from the processor 110 , frequency-modulate it, amplify it, and convert it into electromagnetic waves through the antenna 2 for radiation.
  • the antenna 1 of the electronic device is coupled to the mobile communication module 150, and the antenna 2 is coupled to the wireless communication module 160, so that the electronic device can communicate with the network and other devices through wireless communication technology.
  • the wireless communication technology may include global system for mobile communications (GSM), general packet radio service (general packet radio service, GPRS), code division multiple access (code division multiple access, CDMA) and the like.
  • the electronic device realizes the display function through the GPU, the display screen 194, and the application processor.
  • the GPU is a microprocessor for image processing, and is connected to the display screen 194 and the application processor. GPUs are used to perform mathematical and geometric calculations for graphics rendering.
  • Processor 110 may include one or more GPUs that execute program instructions to generate or change display information.
  • the display screen 194 is used to display images, videos and the like.
  • the display screen 194 includes a display panel.
  • The display panel can be a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), etc.
  • the electronic device can realize the shooting function through ISP, camera 193 , video codec, GPU, display screen 194 and application processor.
  • the ISP is used for processing the data fed back by the camera 193 .
  • the ISP may be located in the camera 193 .
  • Camera 193 is used to capture still images or video.
  • the object generates an optical image through the lens and projects it to the photosensitive element.
  • the photosensitive element may be a charge coupled device (CCD) or a complementary metal-oxide-semiconductor (CMOS) phototransistor.
  • the photosensitive element converts the light signal into an electrical signal, and then transmits the electrical signal to the ISP for conversion into a digital image signal.
  • Digital signal processors are used to process digital signals. In addition to digital image signals, they can also process other digital signals. For example, when an electronic device selects a frequency point, a digital signal processor is used to perform Fourier transform on the frequency point energy, etc.
  • Video codecs are used to compress or decompress digital video.
  • An electronic device may support one or more video codecs.
  • the electronic device can play or record video in multiple encoding formats, for example: moving picture experts group (moving picture experts group, MPEG) 1, MPEG2, MPEG3, MPEG4, etc.
  • the NPU is a neural-network (NN) computing processor.
  • Applications such as intelligent cognition of electronic devices can be realized through NPU, such as: image recognition, face recognition, speech recognition, text understanding, etc.
  • the external memory interface 120 can be used to connect an external memory card, such as a Micro SD card, to expand the storage capacity of the electronic device.
  • The external memory card communicates with the processor 110 through the external memory interface 120 to implement a data storage function, for example, saving music, video, and other files in the external memory card.
  • the internal memory 121 may be used to store computer-executable program codes including instructions.
  • the processor 110 executes various functional applications and data processing of the electronic device by executing instructions stored in the internal memory 121 .
  • the internal memory 121 may include an area for storing programs and an area for storing data.
  • The electronic device can implement audio functions, such as music playback and recording, through the audio module 170, the speaker 170A, the receiver 170B, the microphone 170C, the earphone interface 170D, and the application processor.
  • the audio module 170 is used to convert digital audio information into analog audio signal output, and is also used to convert analog audio input into digital audio signal.
  • the audio module 170 may also be used to encode and decode audio signals.
  • the audio module 170 may be set in the processor 110 , or some functional modules of the audio module 170 may be set in the processor 110 .
  • The speaker 170A, also referred to as a "horn", is used to convert audio electrical signals into sound signals.
  • the electronic device can listen to music through speaker 170A, or listen to hands-free calls.
  • The receiver 170B, also called an "earpiece", is used to convert audio electrical signals into sound signals.
  • When the electronic device receives a call or a voice message, the user can listen to the voice by placing the receiver 170B close to the ear.
  • The microphone 170C, also called a "mic" or "mouthpiece", is used to convert sound signals into electrical signals.
  • the user can put his mouth close to the microphone 170C to make a sound, and input the sound signal to the microphone 170C.
  • the earphone interface 170D is used for connecting wired earphones.
  • the earphone interface 170D can be a USB interface 130, or a 3.5mm open mobile terminal platform (OMTP) standard interface, or a cellular telecommunications industry association of the USA (CTIA) standard interface.
  • the pressure sensor 180A is used to sense the pressure signal and convert the pressure signal into an electrical signal.
  • pressure sensor 180A may be disposed on display screen 194 .
  • the gyro sensor 180B can be used to determine the motion posture of the electronic device. In some embodiments, the angular velocity of the electronic device about three axes (ie, x, y, and z axes) may be determined by the gyro sensor 180B. The gyro sensor 180B can be used for image stabilization.
  • the air pressure sensor 180C is used to measure air pressure.
  • the electronic device calculates the altitude through the air pressure value measured by the air pressure sensor 180C to assist in positioning and navigation.
  • the magnetic sensor 180D includes a Hall sensor.
  • the electronic device may detect opening and closing of the flip holster using the magnetic sensor 180D.
  • the acceleration sensor 180E can detect the acceleration of the electronic device in various directions (generally three axes).
  • the distance sensor 180F is used to measure the distance.
  • Electronic devices can measure distance via infrared or laser light. In some embodiments, when shooting a scene, the electronic device can use the distance sensor 180F to measure the distance to achieve fast focusing.
  • The proximity light sensor 180G may include, for example, a light emitting diode (LED) and a photodetector, such as a photodiode.
  • the ambient light sensor 180L is used for sensing ambient light brightness.
  • the electronic device can adaptively adjust the brightness of the display screen 194 according to the perceived ambient light brightness.
  • the ambient light sensor 180L can also be used to automatically adjust the white balance when taking pictures.
  • the ambient light sensor 180L can also cooperate with the proximity light sensor 180G to detect whether the electronic device is in the pocket to prevent accidental touch.
  • the fingerprint sensor 180H is used to collect fingerprints. Electronic devices can use the collected fingerprint features to unlock fingerprints, access application locks, take pictures with fingerprints, answer incoming calls with fingerprints, etc.
  • the temperature sensor 180J is used to detect temperature.
  • the electronic device uses the temperature detected by the temperature sensor 180J to implement a temperature treatment strategy.
  • The touch sensor 180K is also known as a "touch panel".
  • The touch sensor 180K can be disposed on the display screen 194, and the touch sensor 180K and the display screen 194 form a touch screen, also called a "touchscreen".
  • the keys 190 include a power key, a volume key and the like.
  • the key 190 may be a mechanical key. It can also be a touch button.
  • the electronic device can receive key input and generate key signal input related to user settings and function control of the electronic device.
  • the motor 191 can generate a vibrating reminder.
  • the motor 191 can be used for incoming call vibration prompts, and can also be used for touch vibration feedback.
  • the indicator 192 can be an indicator light, and can be used to indicate charging status, power change, and can also be used to indicate messages, missed calls, notifications, and the like.
  • the SIM card interface 195 is used for connecting a SIM card.
  • the SIM card can be inserted into the SIM card interface 195 or pulled out from the SIM card interface 195 to realize contact and separation with the electronic device.
  • Fig. 4 is a block diagram of a software structure of an electronic device provided by an embodiment of the present application.
  • the layered architecture divides the software into several layers, and each layer has a clear role and division of labor. Layers communicate through software interfaces. In some embodiments, the system is divided into five layers, which are application program layer, application program framework layer, hardware abstraction layer (hardware abstraction layer, HAL), digital signal processing layer and kernel layer from top to bottom.
  • the application layer can consist of a series of application packages.
  • the application package may include application programs (also called applications) such as SMS, gallery, camera, calendar, bluetooth, and map.
  • the voice assistant is the first system application, which can provide electronic equipment with a function of managing voice control.
  • the voice assistant can include an application scenario message sending module and a command word processing module.
  • The application scenario message sending module is used to monitor which scenario the electronic device is currently in and to determine the identifier of the application program corresponding to the application scenario. When it detects that the electronic device is in a specific application scene, it sends a message that the electronic device is in that specific application scene to the dynamic loading module described below, and sends the identifier of the application program corresponding to the scene to the command word processing module.
  • the specific application scenarios involved in the embodiments of the present application may at least include music scenes, video scenes, navigation scenes, and call scenarios, and may also include other application scenarios, which are not limited in the embodiments of the present application.
  • The command word processing module is used to generate an instruction corresponding to a preset keyword according to that preset keyword, and then, according to the identifier of the application program received from the application scenario message sending module, deliver the instruction to the application program corresponding to that identifier.
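  • A minimal sketch of this module is shown below; the Instruction type, the delivery callback, and the method names are assumptions used only to make the described message flow concrete.

```kotlin
// Illustrative sketch of the command word processing module: it remembers which
// application corresponds to the current scene and delivers generated
// instructions to that application.
data class Instruction(val action: String)

class CommandWordProcessor(private val deliver: (appId: String, instruction: Instruction) -> Unit) {
    private var targetAppId: String? = null

    // Receives the application identifier from the application scenario message sending module.
    fun onSceneApp(appId: String) { targetAppId = appId }

    // Generates the instruction corresponding to a recognized preset keyword and
    // delivers it to the application corresponding to the recorded identifier.
    fun onKeyword(keyword: String) {
        val appId = targetAppId ?: return
        deliver(appId, Instruction(action = keyword))
    }
}
```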
  • the application framework layer provides an application programming interface (application programming interface, API) and a programming framework for applications in the application layer.
  • the application framework layer includes some predefined functions.
  • the application framework layer can include window manager, content provider, view system, phone manager, resource manager, notification manager, etc.
  • a window manager is used to manage window programs.
  • the window manager can get the size of the display screen, determine whether there is a status bar, lock the screen, capture the screen, etc.
  • Content providers are used to store and retrieve data and make it accessible to applications.
  • Said data may include video, images, audio, calls made and received, browsing history and bookmarks, phonebook, etc.
  • the resource manager provides various resources for the application, such as localized strings, icons, pictures, layout files, video files, and so on.
  • the view system includes visual controls, such as controls for displaying text, controls for displaying pictures, and so on.
  • the view system can be used to build applications.
  • a display interface can consist of one or more views.
  • a display interface including a text message notification icon may include a view for displaying text and a view for displaying pictures.
  • the hardware abstraction layer is an interface layer between the operating system and the hardware of the electronic device (such as the microphone 170C), and its purpose is to abstract the hardware and provide a virtual hardware platform for the operating system.
  • The hardware abstraction layer may at least include a voiceprint recognition module and a second-level command word recognition module.
  • The second-level command word recognition module is used to receive voice data, recognize the part of the voice data that only includes preset keywords, and then send that part of the voice data to the voiceprint recognition module described below.
  • The second-level command word recognition module is also used to record all preset keywords corresponding to the specific application scenarios described below.
  • the voiceprint recognition module is used to record the preset voiceprint, which is the voiceprint identification of the user's voice data recorded by the electronic device, and is used to identify the user's identity.
  • the voiceprint recognition module is also used to perform voiceprint recognition on the part of the voice data that only includes preset keywords, so as to obtain the voiceprint of the voice data, and judge whether the voiceprint matches the preset voiceprint.
  • the digital signal processing layer is used to process digital signals.
  • the digital signal processing layer may at least include a specific application scenario module, a dynamic loading module, and a first-level command word recognition module.
  • the specific application scene module is used to record the preset keyword group corresponding to the specific application scene, and may include a music scene module, a video scene module, a navigation scene module and a call scene module.
  • the keyword group corresponding to a specific application scenario is the set of preset keywords corresponding to that scenario.
  • the music scene module is used for recording the corresponding preset keyword groups in the music scene.
  • the video scene module is used for recording the corresponding preset keyword groups in the video scene.
  • the navigation scene module is used to record the corresponding preset keyword groups in the navigation scene.
  • the call scene module is used to record the corresponding preset keyword groups in the call scene.
  • the dynamic loading module is used for loading the preset keywords related to a specific application scenario from the specific application scenario module after receiving the message indicating which specific application scenario the electronic device is in, and for sending the preset keywords involved in that scenario to the first-level command word recognition module.
  • the first-level command word recognition module is used to receive voice instructions sent by the kernel layer. It can also obtain the preset keywords involved in the specific application scenario recorded in the specific application scenario module, use those preset keywords to identify the part of the voice data in the voice instruction that includes preset keywords, and then send that part of the voice data to the above-mentioned second-level command word recognition module.
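  • To make the interplay between the dynamic loading module and the first-level command word recognition module concrete, here is a small sketch assuming hypothetical per-scenario keyword groups; the actual keyword groups and module interfaces are defined by the device, not by this sketch.

```python
# Hedged sketch of the dynamic loading flow: when told which specific
# application scenario the device is in, load that scenario's preset
# keyword group and deliver it to the first-level command word
# recognition module. The keyword lists below are illustrative only.

SCENE_KEYWORD_GROUPS = {
    "music": ["play the next song", "play the previous song", "pause",
              "sound louder"],
    "video": ["pause", "fast forward", "sound louder"],
    "navigation": ["zoom in", "exit navigation"],
    "call": ["answer", "hang up"],
}


class FirstLevelRecognizer:
    """Stand-in for the first-level command word recognition module."""

    def __init__(self):
        self.keywords = []

    def load_keywords(self, keywords):
        # Keywords delivered by the dynamic loading module.
        self.keywords = list(keywords)


def on_scene_message(scene, recognizer):
    """Dynamic loading module: load the preset keyword group for the
    reported scenario and send it to the first-level recognizer."""
    recognizer.load_keywords(SCENE_KEYWORD_GROUPS.get(scene, []))
```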
  • the kernel layer is the layer between hardware and software.
  • the kernel layer includes at least a display driver, a camera driver, an audio driver, and a sensor driver.
  • the audio driver is used to receive voice commands collected by the microphone 170C, and send the voice commands to the first-level command word recognition module.
  • Fig. 5 is a schematic diagram of an exemplary information flow of the voice control method involved in the present application.
  • FIG. 5 shows only some of the software modules in the electronic device, and should not limit the software architecture of the electronic device.
  • the exemplary information flow schematic diagram shown in FIG. 5 describes that in a music scene, after the electronic device detects the user's first voice command, the electronic device recognizes the first preset keyword in the first voice command and responds to the first voice command.
  • the first preset keyword is not a specific keyword for enabling the voice interaction function.
  • the application scenario message sending module 151A in the first system application 151 of the electronic device can detect that the electronic device is currently in a music scene, and determine the identifier of the music application 152 (a music-class application). It then sends the message that the electronic device is currently in the music scene to the dynamic loading module 155, and sends the identifier of the music application program to the command word processing module 151B.
  • the dynamic loading module 155 loads all preset keywords involved in the music scene from the music scene module 157A. Then, the dynamic loading module 155 can deliver all preset keywords involved in the music scene to the first-level command word recognition module 156.
  • the microphone 170C of the electronic device can collect voice commands, and transmit the voice commands to the audio driver 158 of the electronic device.
  • the audio driver 158 of the electronic device may deliver the voice command to the primary command word recognition module 156 of the electronic device.
  • the first-level command word recognition module 156 of the electronic device can judge, according to all preset keywords involved in the music scene, whether the voice instruction includes the first voice data, where the first voice data includes at least the first preset keyword.
  • the first preset keyword is one of the preset keywords involved in the music scene. If the voice instruction includes the first voice data, the module sends the first voice data to the second-level command word recognition module 153; if not, it sends no message to the second-level command word recognition module 153.
  • the second-level command word recognition module identifies the second voice data in the first voice data based on the first voice data and all preset keywords involved in the application scenario; the second voice data only includes the first preset keyword, and the module sends the second voice data to the voiceprint recognition module 154.
  • the voiceprint recognition module 154 is configured to perform voiceprint recognition on the second voice data. If the voiceprint of the second voice data matches the preset voiceprint, the second voice data is transmitted to the command word processing module 151B. If the voiceprint of the second voice data does not match the preset voiceprint, no message will be sent to the command word processing module 151B.
  • the command word processing module 151B is used to generate, from the first preset keyword in the second voice data, an instruction corresponding to the preset keyword, and then, according to the identifier of the music application program received from the above-mentioned application scenario message sending module, send the instruction to the music application program corresponding to that identifier.
  • the music application program may execute a corresponding operation in response to the instruction. For example, if the first preset keyword included in the voice instruction is "play the next song", the music application program of the electronic device plays the next track.
  • each module mentioned above may be a set of codes, a set of functions, hardware including one or more signal processing and/or application-specific integrated circuits, or a combination of one or more of them. This embodiment of the present application does not limit it.
  • since the DSP of the electronic device can process digital signals at low power consumption, the electronic device can store the dynamic loading module 155, the first-level command word recognition module 156, and the specific application scenario module 157 in the memory built into the DSP, and the DSP calls the instructions corresponding to these functional modules to realize their functions. In this way, these functional modules will not waste the power consumption of the electronic device during operation. Other functional modules, such as the second-level command word recognition module, can be stored in the internal memory 121 of the electronic device, and the processor 110 calls the instructions corresponding to these modules to realize their functions.
  • the electronic device sets some preset keywords for different specific application scenarios.
  • the electronic device may execute the first operation in response to the voice command.
  • the first operation is an operation corresponding to the first voice instruction.
  • FIG. 6 is a schematic flow chart of the voice control method provided by the embodiment of the present application.
  • the electronic device determines that it is currently in the first application scenario
  • the first application scenario is a specific application scenario.
  • the electronic device may not enable the voice interaction function, and may also implement voice control of the electronic device.
  • the first application scene may be a music scene.
  • FIG. 2a shows a music playing interface of an electronic device. Since the electronic device has opened a music application program, the electronic device can determine that it is currently in a music scene.
  • the electronic device can be set so that it enters the first application scenario when it starts running the first application program.
  • the first application program may be a music application program, or other application programs, such as a video application program, a navigation application program, etc., which is not limited in this embodiment of the present application.
  • an identifier corresponding to the application program may be obtained, and the identifier is used to uniquely represent an application program.
  • the electronic device can record the first identifier of the first application. When the electronic device starts an application, it obtains the second identifier corresponding to that application and checks whether the second identifier is the same as the first identifier; if they are the same, the electronic device considers that it has started to run the first application program and determines that it is currently in the first application scenario.
  • the electronic device can monitor the data output of the local device, and judge whether the local device has entered the first application scenario according to what data is output by the electronic device.
  • the data output refers to the data transmitted by the electronic device to the user, for example, video data, audio data and so on.
  • the electronic device can pre-set a correspondence between the type of data it outputs and the application scenario the device is in. For example, the electronic device can be set to be in a music scene when the data output by the device is audio data, and in a video scene when the electronic device outputs video data.
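  • The two detection strategies above (matching a launched application's identifier against a recorded one, and inferring the scenario from the type of output data) can be sketched as follows; the identifier string and the type-to-scene table are assumptions for illustration only.

```python
# Hedged sketch of the two scenario-detection strategies described above.
# FIRST_APP_ID and OUTPUT_TYPE_TO_SCENE are illustrative assumptions.

FIRST_APP_ID = "com.example.music"  # recorded first identifier

OUTPUT_TYPE_TO_SCENE = {
    "audio": "music",  # outputting audio data -> music scene
    "video": "video",  # outputting video data -> video scene
}


def entered_first_scenario(launched_app_id):
    """True when the second identifier (of the app just started) equals
    the recorded first identifier, i.e. the first application runs."""
    return launched_app_id == FIRST_APP_ID


def scene_from_output(output_data_type):
    """Map the type of data the device outputs to a preset scenario,
    or None when no scenario has been configured for that type."""
    return OUTPUT_TYPE_TO_SCENE.get(output_data_type)
```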
  • This step S101 can be completed by the application scenario message sending module 151A and the dynamic loading module 155 in FIG. 5 mentioned above.
  • the application scenario message sending module 151A may determine that the electronic device is currently in the first application scenario and send the message to the dynamic loading module 155 .
  • the electronic device loads a first preset keyword group corresponding to the first application scenario, where the first preset keyword group includes at least a first preset keyword;
  • the first preset keyword group is a set of preset keywords involved in the first application scenario, wherein any preset keyword in the first preset keyword group may be referred to as a first preset keyword.
  • the user may input a voice command including the first preset keyword to perform voice control on the electronic device.
  • the first preset keyword may be "play the next song”.
  • This step S102 can be completed by the dynamic loading module 155 and the specific application scenario module 157 in FIG. 5 .
  • the dynamic loading module 155 can load the first preset keyword group corresponding to the first application scenario from the specific application scenario module 157 and send it to the first-level command word recognition module 156.
  • the electronic device detects the first voice command
  • the user inputs a first voice command, for example "play the next song" or "play the previous song", and the electronic device can detect the first voice command.
  • the electronic device may detect voice information around the electronic device at a certain frequency, and the microphone of the electronic device may collect voice data around the electronic device, including the first voice command.
  • the first voice instruction may only include the first preset keyword, or may include other voice data.
  • the situations in which the first voice instruction also includes other voice data can be divided into the following three types (a small illustrative classifier follows the three cases below):
  • the first voice command detected by the electronic device may be a sentence, and the first voice command may include other voice data in addition to the first preset keyword.
  • the first voice command can be: "XXX, play the next song XXXX", then in the first voice command, "play the next song” is the first preset keyword, but other voice data is not preset Key words.
  • the first voice instruction may also include one or more other preset keywords, such as a second preset keyword, as well as other voice data.
  • the second preset keyword is a preset keyword corresponding to the first application scene, appears after the first preset keyword, and may or may not be the same as the first preset keyword.
  • the first voice command can be: "XXXX plays the next song and pauses XXXX", then in the first voice command, the first preset keyword can be "play the next song", and the second preset keyword can be is "play", but other voice data are not preset keywords.
  • the first voice instruction does not include any preset keywords.
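  • The three cases above can be told apart by spotting which preset keywords of the current scenario occur in the instruction. The sketch below does this over text for readability only; the actual recognition described in this application operates on voice data segments, and the function name is an assumption.

```python
# Illustrative classifier for the three cases described above. Real
# matching works on m-second voice segments; plain substring search is
# used here purely for readability.

def classify_instruction(instruction, keyword_group):
    """Return 1, 2 or 3: exactly one preset keyword found, more than
    one preset keyword found, or no preset keyword at all."""
    hits = [kw for kw in keyword_group if kw in instruction]
    if not hits:
        return 3   # case three: no preset keyword
    if len(hits) == 1:
        return 1   # case one: only the first preset keyword
    return 2       # case two: first plus further preset keywords
```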
  • This step S103 can be completed by the aforementioned audio driver 158 in FIG. 5 .
  • the audio driver 158 can detect the first voice command and send it to the first-level command word recognition module 156.
  • the electronic device determines whether the first voice command includes first voice data according to the first voice command and the first preset keyword group, and the first voice data includes at least the first preset keyword;
  • the first voice data is a part of the first voice instruction, which at least includes the first preset keyword.
  • the length of the first voice command is t seconds.
  • starting from second 0 of the first voice command, the electronic device sequentially acquires voice data segments with a length of m seconds, where m is smaller than t.
  • each voice data segment with a length of m seconds corresponds to one character. If the first preset keyword is included in the N most recently acquired consecutive m-second voice data segments, the electronic device does not acquire the next m-second segment, and directly determines those N most recently acquired consecutive m-second segments as the first voice data.
  • in some other embodiments, the electronic device may determine M consecutive m-second voice data segments as the first voice data, where M may be set to X times N, and X can generally be set to 1.5-2.5, such as 2.
  • there is an overlap of n seconds between two consecutive m-second voice data segments, where n is less than m; that is, the last n seconds of the previous m-second segment are the first n seconds of the next m-second segment.
  • the process for the electronic device to determine that the first preset keyword is included in N consecutive m-second voice data segments is as follows: first, the electronic device obtains the i-th m-second voice data segment from the first voice instruction.
  • the first m-second voice data segment is the voice data from second 0 to second m of the first voice instruction.
  • the electronic device judges whether the character corresponding to the i-th m-second voice data segment is the same as the j-th character in the first preset keyword.
  • initially, the electronic device sets j to 1, that is, it obtains the first character in the first preset keyword, and judges whether the character corresponding to the i-th m-second segment is the same as that first character. If the character corresponding to the i-th segment is different from the j-th character in the first preset keyword and j is equal to 1, the electronic device continues to obtain the (i+1)-th m-second segment, and then judges whether the character corresponding to the (i+1)-th segment is the same as the first character in the first preset keyword.
  • if the character corresponding to the i-th segment is the same as the j-th character in the first preset keyword, the electronic device continues to obtain the (i+1)-th m-second segment and the (j+1)-th character in the first preset keyword, and then judges whether the character corresponding to the (i+1)-th segment is the same as the (j+1)-th character in the first preset keyword.
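  • A minimal sketch of this segment-by-segment matching is given below, assuming each m-second segment has already been mapped to one character by some low-power acoustic model (that mapping is outside the sketch). The mismatch handling follows the simple reset described above rather than a full string-matching algorithm such as KMP.

```python
# Hedged sketch of the first-level matching loop described above.

def segments(samples, m, n):
    """Cut a command into m-second windows whose last n seconds overlap
    the first n seconds of the next window (stride m - n, with n < m)."""
    step = m - n
    return [samples[start:start + m]
            for start in range(0, len(samples) - m + 1, step)]


def find_first_voice_data(chars, keyword):
    """Scan segment characters against the keyword's characters.

    chars[i] is the character recognized from the i-th m-second segment.
    Returns the consecutive run of characters covering the keyword (the
    first voice data), or None when this simple scan finds no occurrence
    (keywords with repeating prefixes would need a full matcher)."""
    j = 0      # index into the keyword (the text above counts from 1)
    start = 0  # segment index where the current partial match began
    for i, ch in enumerate(chars):
        if ch == keyword[j]:
            if j == 0:
                start = i
            j += 1
        elif ch == keyword[0]:
            start, j = i, 1  # the current segment begins a new match
        else:
            j = 0            # simple reset described in the text
        if j == len(keyword):          # every character matched
            return chars[start:i + 1]  # the first voice data
    return None
```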
  • FIG. 7 is a schematic diagram of determining the first voice data.
  • in this example, the first voice data consists of six consecutive m-second voice data segments.
  • the electronic device obtains the character corresponding to the fifth ((i+1)-th) m-second voice data segment, "sound", and obtains the second ((j+1)-th) character in the first preset keyword, "sound".
  • the electronic device judges that the two characters are the same.
  • the electronic device then determines the six most recently acquired m-second voice data segments, "play a little louder", as the first voice data.
  • after the electronic device recognizes the first voice data according to the method described in step S104, it can continue to judge whether the voice data in the first voice instruction that has not yet been judged contains another first voice data.
  • if so, the electronic device can use it to execute the following steps S105-S108.
  • This step S104 can be completed by the above-mentioned primary command word recognition module 156 in FIG. 5 .
  • the first-level command word recognition module 156 can determine the first voice data according to the first voice command detected by the audio driver 158 and the first preset keyword group issued by the dynamic loading module 155, and send it to the second-level command word recognition module 153.
  • the algorithm involved in determining the first voice data in the first-level command word recognition module 156 is simple, so this process is completed by the DSP calling these function modules.
  • the electronic device determines the second voice data according to the first voice data and all the preset keywords corresponding to the specific application scenario, where the second voice data only includes the first preset keyword;
  • for the process in step S105 of determining the second voice data according to the first voice data and all preset keywords corresponding to the specific application scenario, reference may be made to the related description of the aforementioned step S104.
  • This step S105 can be completed by the aforementioned secondary command word recognition module 153 in FIG. 5 .
  • the second-level command word recognition module 153 can determine the second voice data according to the first voice data issued by the first-level command word recognition module 156 and all preset keywords corresponding to the specific application scenarios recorded in the module, and send it to the voiceprint recognition module 154.
  • the algorithm involved in determining the second voice data in the second-level command word recognition module 153 is more complex than the algorithm for determining the first voice data in the first-level command word recognition module 156, so this process is completed by the processor calling these function modules.
  • the electronic device judges whether the voiceprint of the second voice data matches the preset voiceprint
  • the preset voiceprint is the voiceprint identification of the user's voice data recorded by the electronic device, and is used to identify the user's identity.
  • the electronic device can match the voiceprint extracted from the second voice data against the preset voiceprint. If they are consistent, the electronic device can determine that the second voice data comes from a user who matches the preset voiceprint, can respond to the first voice command, and executes step S107.
  • if they are not consistent, the electronic device determines that the second voice data does not come from a user matching the preset voiceprint, may not respond to the first voice command, and executes step S108.
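  • A hedged sketch of this voiceprint check follows: an embedding extracted from the second voice data is compared against the enrolled preset voiceprint by cosine similarity. The embedding extraction, the vector form, and the 0.8 threshold are assumptions for illustration; the embodiments do not prescribe a particular voiceprint algorithm.

```python
# Hedged sketch of step S106: compare the voiceprint of the second voice
# data with the preset voiceprint. Embeddings and the threshold are
# illustrative assumptions.

import math


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def voiceprint_matches(second_voice_embedding, preset_voiceprint,
                       threshold=0.8):
    """True when the second voice data is judged to come from the user
    who recorded the preset voiceprint (proceed to step S107);
    otherwise the device does not respond (step S108)."""
    return cosine_similarity(second_voice_embedding,
                             preset_voiceprint) >= threshold
```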
  • This step S106 can be completed by the aforementioned voiceprint recognition module 154 in FIG. 5 .
  • the voiceprint recognition module 154 can determine whether the second voice data is from a user matching the preset voiceprint according to the second voice data issued by the secondary command word recognition module 153 and the preset voiceprint recorded therein.
  • this step S106 is optional, and the electronic device may directly perform step S107 without performing step S106 after performing step S105.
  • the electronic device responds to the first voice instruction, and performs an operation corresponding to the first preset keyword in the second voice data;
  • the user interface shown in FIG. 2b and the user interface shown in FIG. 2d are two examples of user interfaces displayed after the electronic device responds to the first voice instruction. It can be seen that the electronic device displays the user interface after performing the operation corresponding to the first preset keyword.
  • the electronic device generates an operation corresponding to the preset keyword according to the first preset keyword in the second voice data, and executes the operation corresponding to the first preset keyword in the second voice data in response to the first voice command.
  • This step S107 can be completed by the aforementioned command word processing module 151B in FIG. 5 .
  • the command word processing module 151B can convert the second voice data issued by the second-level command word recognition module 153 into the operation corresponding to the second voice data and deliver it to the application program involved in the first application scenario, and that application program performs the operation in response to the first voice instruction.
  • the electronic device does not respond to the first voice instruction
  • after the user inputs the first voice command, the electronic device does not display a user interface after performing the operation corresponding to the first preset keyword; that is, the electronic device does not respond to the first voice command.
  • when the electronic device is in the first application scenario, the displayed user interface may be referred to as the first user interface, and the user interface displayed after responding to the first voice instruction may be referred to as the second user interface.
  • the specific application scenario is preset by the electronic device.
  • for example, the electronic device is set to enter a video scene when the device opens a video application program.
  • the electronic device can set the video scene as a specific application scene.
  • the electronic device can respond to the non-specific voice command and perform the first operation.
  • the non-specific voice instruction may be "sound louder”.
  • the first operation is an operation corresponding to the non-specific voice instruction. For example, when the non-specific voice instruction is "sound louder", the electronic device may set the sound of the video to be played louder.
  • the internal memory 121 of the electronic device, or a storage device connected to the external storage interface 120, may pre-store the preset keywords, related instructions, and preset voiceprints involved in the voice control method of the embodiments of the present application.
  • the electronic device sets a memory in the DSP, and stores preset keywords involved in a specific application scene therein, so that the electronic device executes the voice control method in the embodiment of the present application.
  • the following takes step S101 to step S108 as an example to illustrate the workflow of the electronic device.
  • the electronic device determines that it is currently in the first application scenario
  • the touch sensor 180K of the electronic device receives a touch operation (triggered when the user touches a control), and a corresponding hardware interrupt is sent to the kernel layer.
  • the kernel layer processes the touch operation into a raw input event (including touch coordinates, the time stamp of the touch operation, and other information). Raw input events are stored at the kernel layer.
  • the application framework layer obtains the original input event from the kernel layer, and identifies the application of the control corresponding to the input event.
  • the above touch operation is a touch click operation
  • the control corresponding to the click operation is an icon of a music application program.
  • the electronic device calls the interface of the application framework layer to start the music application program, and then determines that it is currently in the music scene.
  • the electronic device loads a first preset keyword group corresponding to the first application scenario, where the first preset keyword group includes at least the first preset keyword.
  • the DSP processor of the electronic device may load the stored first preset keyword group corresponding to the first application scenario from the built-in memory of the DSP.
  • the electronic device detects the first voice instruction.
  • the electronic device can detect voice information around it at a certain frequency; the microphone of the electronic device can collect voice data around the electronic device, including the first voice command, and store the detected first voice instruction in the internal memory 121 or in a storage device connected to the external storage interface 120.
  • the electronic device determines the first voice data according to the first voice command and the first preset keyword group.
  • the electronic device can acquire the first voice instruction and the first preset keyword group from the memory through the DSP, call related computer instructions to determine the first voice data, and store the first voice data in the internal memory 121 or in a storage device connected to the external storage interface 120.
  • the electronic device determines the second voice data according to the first voice data and all preset keywords corresponding to the specific application scenario.
  • the electronic device can acquire, through the processor 110, the first voice data and all preset keywords corresponding to the specific application scenario from the internal memory 121, call related computer instructions to determine the second voice data, and store the second voice data in the internal memory 121 or in a storage device connected to the external storage interface 120.
  • the electronic device judges whether the voiceprint of the second voice data matches the preset voiceprint.
  • the electronic device can obtain the second voice data and the preset voiceprint from the internal memory 121 through the processor 110, and invoke relevant computer instructions to determine whether the voiceprint of the second voice data matches the preset voiceprint.
  • the electronic device responds to the first voice instruction, and executes an operation corresponding to the first preset keyword in the second voice data.
  • the electronic device may call the interface of the application framework layer to execute the operation corresponding to the first preset keyword in the second voice data. Furthermore, the display driver is started by invoking the kernel layer, and the user interface after performing the operation corresponding to the first preset keyword in the second voice data is displayed.
  • the electronic device can receive non-specific voice commands input by the user in some specific application scenarios without enabling the voice interaction function, and respond to the non-specific voice commands.
  • the term “when” may be interpreted to mean “if” or “after” or “in response to determining" or “in response to detecting" depending on the context.
  • the phrases “in determining” or “if detected (a stated condition or event)” may be interpreted to mean “if determining" or “in response to determining" or “on detecting (a stated condition or event)” or “in response to detecting (a stated condition or event)”.
  • all or part of them may be implemented by software, hardware, firmware or any combination thereof.
  • when implemented using software, it may be implemented in whole or in part in the form of a computer program product.
  • the computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on the computer, the processes or functions according to the embodiments of the present application will be generated in whole or in part.
  • the computer can be a general purpose computer, a special purpose computer, a computer network, or other programmable devices.
  • the computer instructions may be stored in or transmitted from one computer-readable storage medium to another computer-readable storage medium; for example, the computer instructions may be transmitted from a website, computer, server, or data center to another website, computer, server, or data center by wired (e.g., coaxial cable, optical fiber, DSL) or wireless (e.g., infrared, wireless, microwave) means.
  • the computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device such as a server or a data center integrated with one or more available media.
  • the available media may be magnetic media (e.g., floppy disk, hard disk, magnetic tape), optical media (e.g., DVD), or semiconductor media (e.g., solid-state disk), etc.
  • the processes can be implemented by a computer program instructing related hardware.
  • the program can be stored in a computer-readable storage medium.
  • when the program is executed, the processes of the foregoing method embodiments may be included.
  • the aforementioned storage medium includes: a ROM, a random access memory (RAM), a magnetic disk, an optical disk, or various other media that can store program code.

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Computational Linguistics (AREA)
  • User Interface Of Digital Computer (AREA)
  • Telephone Function (AREA)

Abstract

Voice control method and electronic device. The method comprises the following steps: an electronic device determines that it is currently in a first application scenario; the electronic device detects a first voice instruction, the first voice instruction being a non-specific voice instruction that is not used to enable a voice interaction function; the electronic device determines whether the first voice instruction includes a first preset keyword; when it is determined that the first voice instruction includes the first preset keyword, the electronic device, without enabling the voice interaction function, responds to the first voice instruction and performs an operation corresponding to the first preset keyword; when it is determined that the first voice instruction does not include the first preset keyword, the electronic device does not respond to the first voice instruction. According to the technical solution provided by the method, in some specific application scenarios, the electronic device can receive a non-specific voice instruction input by a user without enabling the voice interaction function, determine a preset keyword in the non-specific voice instruction, and the electronic device can respond to the non-specific voice instruction and perform an operation corresponding to the preset keyword.
PCT/CN2022/073135 2021-06-29 2022-01-21 Procédé de commande vocale et dispositif électronique WO2023273321A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110728832.5 2021-06-29
CN202110728832.5A CN113488042B (zh) 2021-06-29 2021-06-29 一种语音控制方法及电子设备

Publications (1)

Publication Number Publication Date
WO2023273321A1 true WO2023273321A1 (fr) 2023-01-05

Family

ID=77936552

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/073135 WO2023273321A1 (fr) 2021-06-29 2022-01-21 Procédé de commande vocale et dispositif électronique

Country Status (2)

Country Link
CN (1) CN113488042B (fr)
WO (1) WO2023273321A1 (fr)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117711410A (zh) * 2023-05-30 2024-03-15 荣耀终端有限公司 语音唤醒方法及相关设备

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113488042B (zh) * 2021-06-29 2022-12-13 荣耀终端有限公司 一种语音控制方法及电子设备
CN114120979A (zh) * 2022-01-25 2022-03-01 荣耀终端有限公司 语音识别模型的优化方法、训练方法、设备及介质

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140122078A1 (en) * 2012-11-01 2014-05-01 3iLogic-Designs Private Limited Low Power Mechanism for Keyword Based Hands-Free Wake Up in Always ON-Domain
CN108766427A (zh) * 2018-05-31 2018-11-06 北京小米移动软件有限公司 语音控制方法及装置
CN111161734A (zh) * 2019-12-31 2020-05-15 苏州思必驰信息科技有限公司 基于指定场景的语音交互方法及装置
CN111816192A (zh) * 2020-07-07 2020-10-23 云知声智能科技股份有限公司 语音设备及其控制方法、装置和设备
CN112201246A (zh) * 2020-11-19 2021-01-08 深圳市欧瑞博科技股份有限公司 基于语音的智能控制方法、装置、电子设备及存储介质
WO2021027267A1 (fr) * 2019-08-15 2021-02-18 华为技术有限公司 Procédé et appareil d'interaction parlée, terminal et support de stockage
CN113488042A (zh) * 2021-06-29 2021-10-08 荣耀终端有限公司 一种语音控制方法及电子设备

Family Cites Families (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140272823A1 (en) * 2013-03-15 2014-09-18 Phonics Mouth Positions + Plus Systems and methods for teaching phonics using mouth positions steps
CN110083444B (zh) * 2013-12-10 2024-06-11 华为终端有限公司 一种任务管理方法及设备
CN106373575B (zh) * 2015-07-23 2020-07-21 阿里巴巴集团控股有限公司 一种用户声纹模型构建方法、装置及系统
CN111131601B (zh) * 2018-10-31 2021-08-27 华为技术有限公司 一种音频控制方法、电子设备、芯片及计算机存储介质
CN110197662A (zh) * 2019-05-31 2019-09-03 努比亚技术有限公司 语音控制方法、可穿戴设备及计算机可读存储介质
CN110473556B (zh) * 2019-09-17 2022-06-21 深圳市万普拉斯科技有限公司 语音识别方法、装置和移动终端
CN110970026A (zh) * 2019-12-17 2020-04-07 用友网络科技股份有限公司 语音交互匹配方法、计算机设备以及计算机可读存储介质
CN111078017A (zh) * 2019-12-19 2020-04-28 珠海格力电器股份有限公司 构建虚拟场景的控制方法、装置、电子设备及存储介质
CN112230877A (zh) * 2020-10-16 2021-01-15 惠州Tcl移动通信有限公司 一种语音操作方法、装置、存储介质及电子设备
CN112802468B (zh) * 2020-12-24 2023-07-11 合创汽车科技有限公司 汽车智能终端的交互方法、装置、计算机设备和存储介质
CN112837159B (zh) * 2021-02-24 2024-04-02 中国工商银行股份有限公司 基于场景要素的交易引导方法、装置、电子设备及介质

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140122078A1 (en) * 2012-11-01 2014-05-01 3iLogic-Designs Private Limited Low Power Mechanism for Keyword Based Hands-Free Wake Up in Always ON-Domain
CN108766427A (zh) * 2018-05-31 2018-11-06 北京小米移动软件有限公司 语音控制方法及装置
WO2021027267A1 (fr) * 2019-08-15 2021-02-18 华为技术有限公司 Procédé et appareil d'interaction parlée, terminal et support de stockage
CN111161734A (zh) * 2019-12-31 2020-05-15 苏州思必驰信息科技有限公司 基于指定场景的语音交互方法及装置
CN111816192A (zh) * 2020-07-07 2020-10-23 云知声智能科技股份有限公司 语音设备及其控制方法、装置和设备
CN112201246A (zh) * 2020-11-19 2021-01-08 深圳市欧瑞博科技股份有限公司 基于语音的智能控制方法、装置、电子设备及存储介质
CN113488042A (zh) * 2021-06-29 2021-10-08 荣耀终端有限公司 一种语音控制方法及电子设备

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117711410A (zh) * 2023-05-30 2024-03-15 荣耀终端有限公司 语音唤醒方法及相关设备

Also Published As

Publication number Publication date
CN113488042A (zh) 2021-10-08
CN113488042B (zh) 2022-12-13

Similar Documents

Publication Publication Date Title
EP3872807B1 (fr) Procédé de commande vocale et dispositif électronique
US20220147228A1 (en) Display Method and Related Apparatus
WO2021052263A1 (fr) Procédé et dispositif d'affichage d'assistant vocal
EP4325840A1 (fr) Procédé d'affichage d'appel vidéo appliqué à un dispositif électronique et appareil associé
US11385857B2 (en) Method for displaying UI component and electronic device
CN112399390B (zh) 一种蓝牙回连的方法及相关装置
WO2023273321A1 (fr) Procédé de commande vocale et dispositif électronique
US20230216990A1 (en) Device Interaction Method and Electronic Device
EP3876506A1 (fr) Procédé de présentation d'une vidéo sur un dispositif électronique lors de l'arrivée d'un appel entrant et dispositif électronique
US20230351048A1 (en) Application Permission Management Method and Apparatus, and Electronic Device
WO2021052204A1 (fr) Procédé de découverte de dispositif basé sur un carnet d'adresses, procédé de communication audio et vidéo, et dispositif électronique
EP4174633A1 (fr) Système d'interaction d'affichage, procédé d'affichage, et dispositif
US11973895B2 (en) Call method and apparatus
CN114079893A (zh) 蓝牙通信方法、终端设备及计算机可读存储介质
WO2022042326A1 (fr) Procédé de commande d'affichage et appareil associé
WO2023088209A1 (fr) Procédé de transmission de données audio inter-dispositifs et dispositifs électroniques
CN113641271A (zh) 应用窗口的管理方法、终端设备及计算机可读存储介质
CN114125793A (zh) 一种蓝牙数据传输方法及相关装置
WO2022143258A1 (fr) Procédé de traitement d'interaction vocale et appareil associé
WO2022088964A1 (fr) Procédé et appareil de commande pour dispositif électronique
EP4354831A1 (fr) Procédé et appareil inter-dispositifs pour synchroniser une tâche de navigation, et dispositif et support de stockage
WO2023284555A1 (fr) Procédé pour appeler de manière sécurisée un service, et procédé et appareil pour enregistrer de manière sécurisée un service
CN113380240B (zh) 语音交互方法和电子设备
WO2021129453A1 (fr) Procédé de capture d'écran et dispositif associé
CN114115770B (zh) 显示控制的方法及相关装置

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22831155

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE