WO2020181988A1 - Voice control method and electronic device - Google Patents

Voice control method and electronic device

Info

Publication number
WO2020181988A1
WO2020181988A1 (PCT/CN2020/076689)
Authority
WO
WIPO (PCT)
Prior art keywords
interface
electronic device
voice
application
control signal
Prior art date
Application number
PCT/CN2020/076689
Other languages
English (en)
Chinese (zh)
Inventor
王守诚
吴思举
周轩
Original Assignee
华为技术有限公司
Priority date
Filing date
Publication date
Application filed by 华为技术有限公司 (Huawei Technologies Co., Ltd.)
Publication of WO2020181988A1

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00: Speech recognition
    • G10L 15/22: Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L 2015/223: Execution procedure of a spoken command
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04M: TELEPHONIC COMMUNICATION
    • H04M 1/00: Substation equipment, e.g. for use by subscribers
    • H04M 1/72: Mobile telephones; Cordless telephones, i.e. devices for establishing wireless links to base stations without route selection
    • H04M 1/724: User interfaces specially adapted for cordless or mobile telephones
    • H04M 1/72403: User interfaces with means for local support of applications that increase the functionality
    • H04M 1/7243: User interfaces with interactive means for internal management of messages
    • H04M 1/72433: User interfaces for voice messaging, e.g. dictaphones
    • H04M 2201/00: Electronic components, circuits, software, systems or apparatus used in telephone systems
    • H04M 2201/40: Telephone systems using speech recognition
    • H04M 2250/00: Details of telephonic subscriber devices
    • H04M 2250/22: Details of telephonic subscriber devices including a touch pad, a touch sensor or a touch detector
    • H04M 2250/74: Details of telephonic subscriber devices with voice recognition means

Definitions

  • This application relates to the field of terminal technology, and in particular, to a voice control method and an electronic device.
  • Currently, a mobile phone is pre-configured, before leaving the factory, with voice tasks that it can recognize and perform, such as a voice task for querying the weather and a voice task for booking air tickets.
  • For each voice task, the task type first needs to be configured on the background server corresponding to the voice assistant, and a dialogue flow is then designed according to the task type, so as to obtain the information corresponding to that task type.
  • When the user speaks, the voice assistant of the phone collects the voice control signal and sends it to the background server.
  • The back-end server first identifies the task type "book air ticket", and then extracts from the voice control signal the keywords matching the key information ("departure place", "destination", and "time") required by the pre-configured "air ticket booking" task type, so as to generate a voice user interface (VUI) task.
  • The background server then converts the VUI task into corresponding control instructions and sends them to the corresponding application.
  • The application responds with pre-customized code and outputs the query result. It can be seen that the prior art requires the background server to pre-configure task types and key information, which entails a large amount of task configuration; moreover, to adapt to voice tasks, developers also need to adaptively develop applications that support voice interaction.
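The prior-art server-side flow described above (pre-configured task type, required key information, keyword extraction, VUI task generation) can be sketched roughly as follows. The schema contents, the matching logic, and all names here are hypothetical illustrations, not the patent's or any real assistant's actual implementation:

```python
# Hypothetical sketch of the prior-art flow: a pre-configured task type
# lists the key information (slots) it needs, and the server extracts
# matching keywords from the recognized voice text to build a VUI task.
TASK_SCHEMAS = {
    "book air ticket": ["departure place", "destination", "time"],
}

def build_vui_task(task_type, recognized_text):
    """Extract the slots required by the task type from the voice text."""
    slots = {}
    for slot in TASK_SCHEMAS[task_type]:
        # Toy extraction: look for "<slot>: <value>" fragments in the text.
        marker = slot + ":"
        if marker in recognized_text:
            value = recognized_text.split(marker, 1)[1].split(";")[0].strip()
            slots[slot] = value
    return {"task": task_type, "slots": slots}

task = build_vui_task(
    "book air ticket",
    "departure place: Shenzhen; destination: Beijing; time: tomorrow",
)
print(task["slots"]["destination"])  # Beijing
```

Every new task type requires a new schema plus a dialogue flow on the server, which is the configuration burden this application aims to remove.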
  • In view of this, the present application provides a voice control method and an electronic device that support voice control in combination with a graphical user interface, improve the user's voice control experience, and require only a small development workload.
  • In a first aspect, an embodiment of the present application provides a voice control method applicable to an electronic device. The method includes: the electronic device displays a first interface of an application, where the first interface includes a control for updating the first interface; the electronic device then collects the user's voice control signal and, when the touch event corresponding to the voice control signal is determined, executes the corresponding touch event in response to the voice control signal; finally, the electronic device displays a second interface of the application, where the second interface is the interface obtained after the touch operation is performed on the control in the first interface.
  • That is, the electronic device determines the corresponding input event according to the collected voice control signal and then reuses the operating system's processing flow for that input event, so that the voice task can be completed without adaptive development of the application.
  • This method makes full use of the operational convenience of voice control, allows voice control when manual operation is inconvenient for the user, and combines it with a graphical user interface to improve the user's voice experience.
  • In a possible design, the electronic device first obtains the configuration file associated with the first interface, where the configuration file includes the correspondence between the control identifiers of the controls in the first interface and touch events. The electronic device may determine the target control identifier that matches the text information of the voice control signal, and then search the configuration file for the touch event corresponding to the target control identifier. In this way, when running an interface of the application, the electronic device can determine, according to the interface's configuration file, the touch event corresponding to the voice control signal input by the user, and then execute that touch event, thereby realizing the function of voice-controlling each control in the application interface.
  • In a possible design, the electronic device may also display an animation effect while the touch operation is performed on the control in the first interface. Displaying the animation effect reminds the user that the device is currently responding to voice control, improving the user's experience.
  • In a possible design, the electronic device may first start the voice application in the background in response to the wake-up signal input by the user, and then collect the user's voice control signal through the voice application. The voice application and the input events of the current operating system are combined to determine the input event corresponding to the collected voice control signal, and the operating system's processing flow for that input event is then reused, so that the voice task can be completed without adaptive development of the application.
  • the electronic device may also perform the touch operation.
  • an embodiment of the present application provides an electronic device including a processor and a memory.
  • the memory is used to store one or more computer programs; when the one or more computer programs stored in the memory are executed by the processor, the electronic device can implement any one of the possible design methods in any of the foregoing aspects.
  • an embodiment of the present application also provides a device, which includes a module/unit that executes any one of the possible design methods in any of the foregoing aspects.
  • modules/units can be realized by hardware, or by hardware executing corresponding software.
  • an embodiment of the present application also provides a computer-readable storage medium.
  • The computer-readable storage medium includes a computer program. When the computer program runs on an electronic device, the electronic device is caused to execute any one of the possible design methods in any of the foregoing aspects.
  • In another aspect, the embodiments of the present application also provide a computer program product. When the computer program product runs on an electronic device, the electronic device executes any one of the possible design methods in any of the foregoing aspects.
  • FIG. 1 is a schematic diagram of a voice control system provided by an embodiment of this application.
  • FIG. 2 is a schematic structural diagram of a mobile phone provided by an embodiment of the application.
  • FIG. 3 is a schematic diagram of the architecture of an operating system in an electronic device provided by an embodiment of the application.
  • FIG. 4 is a schematic diagram of an interface provided by an embodiment of the application.
  • FIG. 5 is a schematic diagram of a scene of a voice control method provided by an embodiment of this application.
  • FIG. 6 is a schematic diagram of a scene of another voice control method provided by an embodiment of the application.
  • Figure 7a is a schematic diagram of another interface provided by an embodiment of the application.
  • FIG. 7b to FIG. 7g are schematic diagrams of scenes of another voice control method provided by an embodiment of the application.
  • FIG. 8 is a schematic diagram of another voice control method provided by an embodiment of the application.
  • FIG. 9 is a schematic diagram of the interface of the voice assist function switch and the voice wake-up function switch provided by an embodiment of the application.
  • FIG. 10a and FIG. 10b are schematic diagrams of scenarios of another voice control method provided by an embodiment of the application.
  • FIG. 11 is a schematic flowchart of a voice control method provided by an embodiment of this application.
  • FIG. 12 is a schematic structural diagram of an electronic device provided by an embodiment of this application.
  • The voice control method provided by the embodiments of this application can be applied to mobile phones, tablet computers, desktop computers, laptop computers, notebook computers, ultra-mobile personal computers (UMPC), handheld computers, netbooks, personal digital assistants (PDA), wearable electronic devices, virtual reality devices, and other electronic devices; the embodiments of the present application do not impose any limitation on this.
  • Fig. 2 shows a schematic structural diagram of the mobile phone.
  • The mobile phone may include a processor 110, an external memory interface 120, an internal memory 121, a USB interface 130, a charging management module 140, a power management module 141, a battery 142, an antenna 1, an antenna 2, a mobile communication module 150, a wireless communication module 160, an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, an earphone interface 170D, a sensor module 180, buttons 190, a motor 191, an indicator 192, a camera 193, a display screen 194, a SIM card interface 195, and so on.
  • The sensor module 180 may include a gyroscope sensor 180A, an acceleration sensor 180B, a proximity light sensor 180G, a fingerprint sensor 180H, a touch sensor 180K, and a hinge sensor 180M. (Of course, the mobile phone 100 may also include other sensors, such as a temperature sensor, a pressure sensor, a distance sensor, a magnetic sensor, an ambient light sensor, an air pressure sensor, and a bone conduction sensor, not shown in the figure.)
  • the structure illustrated in the embodiment of the present invention does not constitute a specific limitation on the mobile phone 100.
  • the mobile phone 100 may include more or fewer components than shown, or combine certain components, or split certain components, or arrange different components.
  • the illustrated components can be implemented in hardware, software, or a combination of software and hardware.
  • the processor 110 may include one or more processing units.
  • For example, the processor 110 may include an application processor (AP), a modem processor, a graphics processing unit (GPU), an image signal processor (ISP), a controller, a memory, a video codec, a digital signal processor (DSP), a baseband processor, and/or a neural-network processing unit (NPU), etc.
  • the different processing units may be independent devices or integrated in one or more processors.
  • The controller may be the nerve center and command center of the mobile phone 100. The controller can generate operation control signals according to the instruction operation code and timing signals, completing the control of instruction fetching and execution.
  • a memory may also be provided in the processor 110 to store instructions and data.
  • the memory in the processor 110 is a cache memory.
  • The memory can store instructions or data that the processor 110 has just used or used cyclically. If the processor 110 needs to use the instruction or data again, it can be called directly from the memory. This avoids repeated accesses, reduces the waiting time of the processor 110, and improves system efficiency.
  • the processor 110 can run the voice control method provided by the embodiments of the present application.
  • The method converts the voice control signal into an existing touch event, thereby enabling the existing graphical user interface to support the voice interaction mode, reducing the development workload, and enhancing the voice interaction function of the electronic device.
  • When the processor 110 integrates different devices, such as an integrated CPU and GPU, the CPU and GPU can cooperate to execute the voice control method provided in the embodiments of the present application. For example, some of the algorithms in the method are executed by the CPU and the others by the GPU, to obtain faster processing.
  • the display screen 194 is used to display images, videos, etc.
  • the display screen 194 includes a display panel.
  • The display panel may adopt a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), a MiniLED, a MicroLED, a Micro-OLED, a quantum dot light-emitting diode (QLED), etc.
  • the mobile phone 100 may include one or N display screens 194, and N is a positive integer greater than one.
  • The display screen can accept the user's touch operation and display the graphical user interface. When receiving a voice control signal, the display screen can also display the animation effect of the touch event corresponding to the voice control signal, as well as the interface after the event is executed.
  • the camera 193 (front camera or rear camera) is used to capture still images or videos.
  • The camera 193 may include photosensitive elements such as a lens group and an image sensor, where the lens group includes a plurality of lenses (convex or concave) for collecting the light signals reflected by the object to be photographed and transmitting the collected light signals to the image sensor.
  • the image sensor generates an original image of the object to be photographed according to the light signal.
  • the internal memory 121 may be used to store computer executable program code, where the executable program code includes instructions.
  • the processor 110 executes various functional applications and data processing of the mobile phone 100 by running instructions stored in the internal memory 121.
  • the internal memory 121 may include a storage program area and a storage data area.
  • the storage program area can store operating system, application program (such as camera application, WeChat application, etc.) codes and so on.
  • the data storage area can store data created during the use of the mobile phone 100 (for example, images and videos collected by a camera application).
  • the internal memory 121 may include a high-speed random access memory, and may also include a non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, a universal flash storage (UFS), etc.
  • the functions of the sensor module 180 are described below.
  • The gyroscope sensor 180A can be used to determine the movement posture of the mobile phone 100. In some embodiments, the angular velocity of the mobile phone 100 around three axes (i.e., the x, y, and z axes) can be determined by the gyroscope sensor 180A.
  • the gyroscope sensor 180A can be used to detect the current movement state of the mobile phone 100, such as shaking or static.
  • The acceleration sensor 180B can detect the magnitude of the acceleration of the mobile phone 100 in various directions (generally along three axes); it can likewise be used to detect the current movement state of the mobile phone 100, such as shaking or static.
  • the proximity light sensor 180G may include, for example, a light emitting diode (LED) and a light detector such as a photodiode.
  • the light emitting diode may be an infrared light emitting diode.
  • The mobile phone emits infrared light through the light-emitting diode and uses the photodiode to detect infrared light reflected by nearby objects. When sufficient reflected light is detected, it can be determined that there is an object near the phone; when insufficient reflected light is detected, the phone can determine that there is no object near it.
  • the gyroscope sensor 180A (or acceleration sensor 180B) may send the detected motion state information (such as angular velocity) to the processor 110.
  • the processor 110 determines whether it is currently in a hand-held state or a tripod state based on the motion state information (for example, when the angular velocity is not 0, it means that the mobile phone 100 is in the hand-held state).
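The hand-held/tripod decision described here reduces to a threshold check on the reported angular velocity; in practice a small noise floor would be used rather than an exact comparison with 0. The function and the threshold value below are assumptions for illustration:

```python
# Hypothetical sketch: classify the phone as hand-held or on a tripod from
# the gyroscope's angular velocity (rad/s) around the x, y and z axes.
# The noise floor absorbs sensor jitter and is an assumed value.
def motion_state(angular_velocity, noise_floor=0.01):
    if all(abs(w) <= noise_floor for w in angular_velocity):
        return "tripod"    # effectively no rotation
    return "handheld"      # non-zero angular velocity implies a hand-held phone

print(motion_state((0.0, 0.0, 0.0)))    # tripod
print(motion_state((0.12, 0.03, 0.0)))  # handheld
```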
  • the fingerprint sensor 180H is used to collect fingerprints.
  • the mobile phone 100 can use the collected fingerprint characteristics to implement fingerprint unlocking, access application locks, fingerprint photographs, fingerprint answering calls, and so on.
  • The touch sensor 180K is also called a "touch panel".
  • the touch sensor 180K may be disposed on the display screen 194, and the touch screen is composed of the touch sensor 180K and the display screen 194, which is also called a “touch screen”.
  • the touch sensor 180K is used to detect touch operations acting on or near it.
  • the touch sensor can pass the detected touch operation to the application processor to determine the type of touch event.
  • the visual output related to the touch operation can be provided through the display screen 194.
  • the touch sensor 180K may also be disposed on the surface of the mobile phone 100, which is different from the position of the display screen 194.
  • the display screen 194 of the mobile phone 100 displays a main interface, and the main interface includes icons of multiple applications (such as a camera application, a WeChat application, etc.).
  • the display screen 194 displays an interface of the camera application, such as a viewfinder interface.
  • the wireless communication function of the mobile phone 100 can be realized by the antenna 1, the antenna 2, the mobile communication module 151, the wireless communication module 152, the modem processor, and the baseband processor.
  • the antenna 1 and the antenna 2 are used to transmit and receive electromagnetic wave signals.
  • Each antenna in the terminal device 100 can be used to cover a single communication frequency band or multiple communication frequency bands. Different antennas can also be multiplexed to improve antenna utilization.
  • antenna 1 can be multiplexed as a diversity antenna of a wireless local area network.
  • the antenna can be used in combination with a tuning switch.
  • the mobile communication module 151 can provide a wireless communication solution including 2G/3G/4G/5G and the like applied to the terminal device 100.
  • the mobile communication module 151 may include at least one filter, switch, power amplifier, low noise amplifier (LNA), etc.
  • the mobile communication module 151 can receive electromagnetic waves by the antenna 1, filter and amplify the received electromagnetic waves, and transmit them to the modem processor for demodulation.
  • the mobile communication module 150 can also amplify the signal modulated by the modem processor, and convert it into electromagnetic waves for radiation via the antenna 1.
  • at least part of the functional modules of the mobile communication module 151 may be provided in the processor 110.
  • at least part of the functional modules of the mobile communication module 151 and at least part of the modules of the processor 110 may be provided in the same device.
  • the modem processor may include a modulator and a demodulator.
  • the modulator is used to modulate the low frequency baseband signal to be sent into a medium and high frequency signal.
  • the demodulator is used to demodulate the received electromagnetic wave signal into a low-frequency baseband signal. Then the demodulator transmits the demodulated low-frequency baseband signal to the baseband processor for processing.
  • the low-frequency baseband signal is processed by the baseband processor and then passed to the application processor.
  • the application processor outputs a sound signal through an audio device (not limited to the speaker 170A, the receiver 170B, etc.), or displays an image or video through the display screen 194.
  • the modem processor may be an independent device.
  • the modem processor may be independent of the processor 110 and be provided in the same device as the mobile communication module 150 or other functional modules.
  • The wireless communication module 152 can provide wireless communication solutions applied to the terminal device 100, including wireless local area network (WLAN) (such as a wireless fidelity (Wi-Fi) network), Bluetooth (BT), global navigation satellite system (GNSS), frequency modulation (FM), near field communication (NFC), infrared (IR), etc.
  • the wireless communication module 152 may be one or more devices integrating at least one communication processing module.
  • the wireless communication module 152 receives electromagnetic waves via the antenna 2, frequency modulates and filters the electromagnetic wave signals, and sends the processed signals to the processor 110.
  • the wireless communication module 152 can also receive the signal to be sent from the processor 110, perform frequency modulation, amplify it, and convert it into electromagnetic wave radiation through the antenna 2.
  • the mobile phone 100 can implement audio functions through the audio module 170, the speaker 170A, the receiver 170B, the microphone 170C, the earphone interface 170D, and the application processor. For example, music playback, recording, etc.
  • The mobile phone 100 can receive input from the keys 190 and generate key signal input related to user settings and function control of the mobile phone 100.
  • the mobile phone 100 can use the motor 191 to generate a vibration notification (such as an incoming call vibration notification).
  • the indicator 192 in the mobile phone 100 can be an indicator light, which can be used to indicate the charging status, power change, and can also be used to indicate messages, missed calls, notifications, and so on.
  • the SIM card interface 195 in the mobile phone 100 is used to connect to the SIM card.
  • the SIM card can be connected to and separated from the mobile phone 100 by inserting into the SIM card interface 195 or pulling out from the SIM card interface 195.
  • In addition, the mobile phone 100 may include more or fewer components than those shown in FIG. 2, which is not limited in the embodiment of the present application.
  • the software system of the above electronic device 100 may adopt a layered architecture, an event-driven architecture, a microkernel architecture, a microservice architecture, or a cloud architecture.
  • the embodiment of the present application takes a layered Android system as an example to illustrate the software structure of the electronic device 100.
  • FIG. 3 is a block diagram of the software structure of the electronic device 100 according to an embodiment of the present application.
  • The layered architecture divides the software into several layers, and each layer has a clear role and division of labor. The layers communicate with each other through software interfaces.
  • the Android system is divided into four layers, from top to bottom, the application layer, the application framework layer, the Android runtime and system library, and the kernel layer.
  • the application layer can include a series of application packages. As shown in Figure 3, the application package can include applications such as camera, gallery, calendar, call, map, navigation, Bluetooth, music, video, short message, etc.
  • the application layer may also include a voice application with a voice recognition function.
  • The voice application can collect the voice control signal sent by the user and convert the voice control signal into text for semantic understanding. Based on the result of the semantic understanding, the voice control signal can be converted into a touch event of the application program to complete the voice task.
  • the voice application can communicate with the background server to complete the voice task.
  • Generally, a voice application consists of two parts: one part is a voice service running in the background, which is used to collect the voice signals input by the user and perform voice signal extraction, text conversion, voice recognition, and so on; the other part is the content displayed on the mobile phone screen, which is used to display the interface of the voice application, such as the content of the dialogue between the user and the voice application.
  • the mobile phone running a voice application in the background can be understood as the mobile phone running a voice service in the background.
  • Of course, the mobile phone can also display information such as the identification of the voice application in the form of a floating menu or the like; the embodiment of the present application does not impose any restriction on this.
  • the application framework layer provides application programming interfaces (application programming interface, API) and programming frameworks for applications in the application layer.
  • the application framework layer includes some predefined functions.
  • the application framework layer can include a window manager, a content provider, a view system, a phone manager, a resource manager, and a notification manager.
  • the window manager is used to manage window programs.
  • the window manager can obtain the size of the display, determine whether there is a status bar, lock the screen, take a screenshot, etc.
  • the content provider is used to store and retrieve data and make these data accessible to applications.
  • the data may include video, image, audio, phone calls made and received, browsing history and bookmarks, phone book, etc.
  • the view system includes visual controls, such as controls that display text and controls that display pictures.
  • the view system can be used to build applications.
  • the display interface can be composed of one or more views.
  • a display interface that includes a short message notification icon may include a view that displays text and a view that displays pictures.
  • the phone manager is used to provide the communication function of the electronic device 100. For example, the management of the call status (including connecting, hanging up, etc.).
  • the resource manager provides various resources for the application, such as localized strings, icons, pictures, layout files, video files, etc.
  • The notification manager enables the application to display notification information in the status bar. It can be used to convey notification-type messages and can disappear automatically after a short stay without user interaction.
  • the notification manager is used to notify the download completion, message reminder, etc.
  • The notification manager can also present notifications that appear in the status bar at the top of the system in the form of a chart or scroll-bar text, such as notifications of applications running in the background, or notifications that appear on the screen in the form of a dialog window. For example, a text message is prompted in the status bar, a prompt sound is played, the electronic device vibrates, or an indicator light flashes.
  • In the embodiment of the present application, the application framework layer also includes a voice user interface (VUI) manager.
  • the VUI manager can monitor the running status of voice applications, and can also be used as a bridge between voice applications and other applications, passing the voice tasks recognized by the voice applications to related applications for execution.
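The VUI manager's bridging role can be sketched as a small dispatcher that forwards recognized voice tasks to the application registered for them. The class and method names below are illustrative and are not part of the actual Android framework API:

```python
# Hypothetical sketch of the VUI manager as a bridge: the voice application
# reports a recognized voice task, and the manager forwards it to the
# application registered to handle tasks for the target interface.
class VuiManager:
    def __init__(self):
        self._handlers = {}   # application name -> task handler callback

    def register(self, app_name, handler):
        """An application registers a callback for voice tasks addressed to it."""
        self._handlers[app_name] = handler

    def on_voice_task(self, app_name, task):
        """Pass a voice task recognized by the voice application to the target app."""
        handler = self._handlers.get(app_name)
        if handler is None:
            return "no handler for " + app_name
        return handler(task)

manager = VuiManager()
manager.register("camera", lambda task: "camera executed: " + task)
print(manager.on_voice_task("camera", "take a photo"))  # camera executed: take a photo
```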
  • Android Runtime includes core libraries and virtual machines. Android runtime is responsible for the scheduling and management of the Android system.
  • The core library consists of two parts: one part is the functions that the Java language needs to call, and the other part is the core library of Android.
  • the application layer and the application framework layer run in a virtual machine.
  • the virtual machine executes the java files of the application layer and the application framework layer as binary files.
  • the virtual machine is used to perform functions such as object life cycle management, stack management, thread management, security and exception management, and garbage collection.
  • the system library can include multiple functional modules. For example: surface manager (surface manager), media library (Media Libraries), three-dimensional graphics processing library (for example: OpenGL ES), 2D graphics engine (for example: SGL), etc.
  • the surface manager is used to manage the display subsystem and provides a combination of 2D and 3D layers for multiple applications.
  • the media library supports playback and recording of a variety of commonly used audio and video formats, as well as still image files.
  • the media library can support multiple audio and video encoding formats, such as: MPEG4, H.264, MP3, AAC, AMR, JPG, PNG, etc.
  • the 3D graphics processing library is used to realize 3D graphics drawing, image rendering, synthesis, and layer processing.
  • the 2D graphics engine is a drawing engine for 2D drawing.
  • the kernel layer is the layer between hardware and software.
  • the kernel layer includes at least a display driver, a camera driver, an audio driver, a sensor driver, etc., which are not limited in the embodiment of the present application.
  • This application provides a voice control method that combines the voice application with the input events of the current operating system (such as virtual key input events, physical key input events, and screen touch events) to determine the input event corresponding to the collected voice control signal, and then reuses the operating system's existing input-event handling, so that the voice task can be completed without adaptive development of the application.
  • This method makes full use of the operational convenience of voice control, using voice control when manual operation is inconvenient for the user, and combines it with the graphical user interface to improve the user's voice experience.
  • the GUI (graphical user interface) displayed by the mobile phone generally includes one or more controls.
  • the elements presented in the GUI can be called controls, which can provide users with certain operations.
  • Figure 4 is a schematic diagram of the GUI for creating a new contact in the phone book application of a mobile phone. As can be seen from the figure, each input box has prompt text, such as "name", "work unit", "phone number", "email", and "remarks". Buttons also carry corresponding text, such as "Add another item". When the voice control function of the phone is turned on, the phone starts the voice application in the background.
  • the user can send a voice control signal to the mobile phone through the voice application; the mobile phone then determines the corresponding control and its type from the current interface according to the voice control signal, and performs the touch operation corresponding to that control type on the control.
  • When the mobile phone displays the interface of Figure 4, if the user utters the voice control signal "Name Zhang San", the voice application uses the microphone to collect the voice control signal input by the user and performs extraction, text conversion, or voice recognition on it to obtain the voice content "Name Zhang San". The mobile phone then finds, from the configuration file of the interface, that the touch event corresponding to "Name" is an input operation on the control 202, so it first places the focus on the input box and then calls the input method to set the voice content "Zhang San" as the input content of the input box, as shown in Figure 5.
  • the mobile phone can also display the animation effect of inputting "Zhang San” as the input content, which visually reminds the user that the mobile phone is responding to the user's input of "Zhang San”.
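The parsing step just described — splitting recognized speech such as "Name Zhang San" into a target control label and the text to type into it — can be sketched as follows. The function name and the longest-label-first matching strategy are assumptions for illustration, not the patent's actual implementation.

```python
def resolve_input_command(voice_text, control_labels):
    """Split recognized speech such as "Name Zhang San" into the target
    control label and the remaining text to type into that control.
    Longer labels are tried first so that e.g. "Phone number" wins over
    a hypothetical shorter label "Phone"."""
    for label in sorted(control_labels, key=len, reverse=True):
        if voice_text.startswith(label):
            return label, voice_text[len(label):].strip()
    return None, None  # no control prompt matches the speech


# Usage with the prompt texts from the new-contact interface in Figure 4.
labels = ["Name", "Work unit", "Phone number", "Email", "Remarks"]
control, content = resolve_input_command("Name Zhang San", labels)
```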
  • When the mobile phone displays the interface in Figure 4, if the user utters the voice control signal "add other items", the voice application uses the microphone to collect the voice control signal input by the user and performs extraction, text conversion, or voice recognition on it to obtain the voice content "add other items". Then, according to the voice content "add other items", the mobile phone finds from the configuration file that the corresponding touch event is a click operation on the control 203, so it performs the click operation on the button, as shown in FIG.
  • the voice application can use a preset voice recognition algorithm to convert the voice control signal input by the user into text and perform semantic understanding, so as to find the control based on the voice content after semantic understanding.
  • the phone can start a voice application in the background.
  • the icon 201 of the voice application may be displayed on the interface shown in FIG. 4.
  • the icon 201 is used to indicate that the voice application is running in the background of the mobile phone.
  • the mobile phone can still respond to the user's various touch operations in the interface; for example, the mobile phone responds when the user clicks the "add other item" button.
  • it can also be set by default that when the voice application is running in the background, the mobile phone does not respond to various touch operations of the user on the interface, which is not limited in the embodiment of the present application.
  • Figure 7a shows the interface of a ticketing application. If the mobile phone displays the interface shown in Figure 7a and the user utters the voice control signal "ticket", the voice application uses the microphone to collect the voice control signal input by the user and performs extraction, text conversion, or voice recognition on it to obtain the voice content "ticket". The mobile phone then finds that the touch event corresponding to the voice content "ticket" is a click operation on the control 204, so it performs the click operation on that control, as shown in FIG. 7b. The mobile phone then switches from the interface shown in Figure 7b to the interface shown in Figure 7c.
  • If the user utters "Departure Shanghai", the voice application uses the microphone to collect the voice control signal input by the user and performs extraction, text conversion, or voice recognition on it to obtain the voice content "Departure Shanghai". The mobile phone then finds that the touch event corresponding to "starting place" is an input operation in the input box for the "starting place", so it first places the focus on that input box and then calls the input method to set the voice content "Shanghai" as the input content of the input box, as shown in Figure 7d.
  • If the user utters "Destination Beijing", the voice application uses the microphone to collect the voice control signal input by the user and performs extraction, text conversion, or voice recognition on it to obtain the voice content "Destination Beijing". The mobile phone then finds that the touch event corresponding to "destination" is an input operation in the input box for the "destination", so it first places the focus on that input box and then calls the input method to set the voice content "Beijing" as the input content of the input box, as shown in Figure 7e.
  • If the user utters "time March 6th", the voice application uses the microphone to collect the voice control signal input by the user and performs extraction, text conversion, or voice recognition on it to obtain the voice content "time March 6th". The mobile phone then finds that the touch event corresponding to "time" is an input operation in the input box for the "time", so it first places the focus on that input box and then calls the input method to set the voice content "March 6th" as the input content of the input box, as shown in Figure 7f.
  • If the user utters "Search", the voice application uses the microphone to collect the voice control signal input by the user and performs extraction, text conversion, or voice recognition on it to obtain the voice content "Search". The mobile phone then finds that the touch event corresponding to the voice content "Search" is a click operation on the "Search" control, as shown in Figure 7g.
  • the embodiment of the present application combines the voice control function with the graphical user interface to realize that the existing graphical user interface supports voice control, improves the voice experience, and has a smaller development workload.
  • the background server extracts keywords based on the key information "departure”, “destination”, and “time” required by the pre-configured "ticket booking” task, and generates a VUI task.
  • the background server converts the VUI task into a corresponding control instruction and sends it to the corresponding application program.
  • the application program responds with a pre-customized code and displays the interface as shown in Figure 8b.
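The server-side step above — extracting the key information "departure", "destination", and "time" required by the pre-configured "ticket booking" task and assembling a VUI task — might be sketched as below. The slot patterns, task field names, and the English phrasing of the utterance are illustrative assumptions; the patent does not specify an extraction method.

```python
import re

# Slots required by the pre-configured "ticket booking" task (from the text);
# the regular-expression patterns themselves are illustrative assumptions.
BOOKING_SLOTS = {
    "departure": r"\bfrom\s+(\w+)",
    "destination": r"\bto\s+(\w+)",
    "time": r"\bon\s+([\w\s]+)",
}

def build_vui_task(utterance):
    """Extract the key information required by the booking task and
    assemble a VUI task the server can later convert into control
    instructions for the application."""
    task = {"type": "book_ticket"}
    for slot, pattern in BOOKING_SLOTS.items():
        m = re.search(pattern, utterance)
        task[slot] = m.group(1).strip() if m else None
    return task


task = build_vui_task("book a ticket from Shanghai to Beijing on March 6th")
```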
  • this embodiment of the application can provide a voice control method that combines the Talkback function and the voice control function; that is, the user turns on both the Talkback function switch and the voice wake-up function switch, as shown in Figure 9.
  • the voice application uses the microphone to collect the voice control signal input by the user and performs extraction, text conversion, or voice recognition on it.
  • the voice content "Zhang San" is obtained, and the input method is then called to set the voice content "Zhang San" as the input content of the input box, as shown in Figure 10b.
  • the mobile phone can also voice broadcast "Zhang San input completed” to remind the user that the operation was successful.
  • the above operation method is more convenient and efficient than the traditional voice assistance function, making it easier for blind and low-vision users to operate the mobile phone and further improving the user experience.
  • Step 301: The electronic device displays the first interface of the application.
  • the first interface includes one or more controls for updating the first interface.
  • the first interface displayed by the mobile phone is the interface shown in FIG. 4, and multiple controls such as a button "add other items" and an input box are provided in the interface.
  • the user can operate these controls to update the display content of the mobile phone, so that the mobile phone displays the updated second interface.
  • Step 302: The electronic device collects the user's voice control signal.
  • the mobile phone may set the microphone to be always on. Then, while the mobile phone displays an application interface (for example, the first interface), the microphone of the mobile phone also collects voice control signals at a certain working frequency.
  • the user can start the voice application of the mobile phone by issuing a wake-up signal, and the mobile phone then collects the user's voice control signal through the voice application and performs extraction, text conversion, or voice recognition on it. For example, after the user utters the sound signal "Xiaoyi Xiaoyi", the mobile phone can collect the sound signal through the microphone. If the sound signal is the preset wake-up signal, the mobile phone starts the voice application to collect the voice control signal.
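The wake-up gating just described — discarding sound until the preset wake word is heard, then treating subsequent speech as the voice control signal — can be sketched as a small state machine. Only the wake phrase "Xiaoyi Xiaoyi" comes from the text; the class shape and the assumption that frames arrive already recognized as text are illustrative.

```python
class WakeGate:
    """Sketch of always-on microphone handling: sound is ignored until the
    preset wake-up signal is heard, after which the voice application
    collects voice control signals."""

    def __init__(self, wake_word="xiaoyi xiaoyi"):
        self.wake_word = wake_word
        self.collecting = False   # has the voice application been started?
        self.collected = []       # voice control signals gathered so far

    def on_sound(self, recognized_text):
        if not self.collecting:
            # Idle: only the preset wake word starts the voice application.
            if recognized_text.lower() == self.wake_word:
                self.collecting = True
        else:
            # Awake: subsequent speech is treated as a voice control signal.
            self.collected.append(recognized_text)


gate = WakeGate()
gate.on_sound("hello")           # ignored: not the wake-up signal
gate.on_sound("Xiaoyi Xiaoyi")   # wake-up signal: start the voice application
gate.on_sound("Name Zhang San")  # collected as a voice control signal
```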
  • Step 303: The electronic device determines a touch event corresponding to the voice control signal.
  • a touch event refers to a touch operation performed on a control.
  • the electronic device may pre-store the configuration files of each application, for example, each application corresponds to one or more configuration files.
  • the configuration file records the correspondence between touch events and voice control signals in different interfaces of an application.
  • a configuration file can also only record the correspondence between touch events and voice control signals in one interface of an application.
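A configuration file of this kind might look like the following; the JSON layout and field names are assumptions for illustration, since the patent does not specify a file format.

```python
import json

# Illustrative configuration file for a single interface (the new-contact
# interface): it records which touch event each voice control signal maps to.
CONFIG_FILE_1 = json.loads("""
{
  "interface": "new_contact",
  "bindings": [
    {"voice_text": "name",           "control_id": "Name",           "event": "input"},
    {"voice_text": "add other item", "control_id": "add_other_item", "event": "click"}
  ]
}
""")

def find_touch_event(config, voice_text):
    """Look up the touch event the configuration file records for a
    voice control signal; None means the signal is not supported here."""
    for binding in config["bindings"]:
        if binding["voice_text"] == voice_text:
            return binding["control_id"], binding["event"]
    return None


hit = find_touch_event(CONFIG_FILE_1, "add other item")
```

An unsupported signal returning `None` corresponds to the fallback described later, where the signal is forwarded to the background server instead.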
  • In an Android-based interface, all controls are mounted under a DecorView node under the window of the current interface.
  • the Android software system can scan each control identifier from the DecorView and compare it with the text information of the voice control signal spoken by the user to determine the target control identifier corresponding to that text information, and then search the configuration file for the touch event corresponding to the target control identifier.
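The scan-and-compare step can be illustrated with a plain tree walk; the dictionary-based node layout below is an assumed stand-in for Android's real view hierarchy, and the case-insensitive comparison is one possible matching rule, not the patent's.

```python
def scan_controls(node, out=None):
    """Walk a control tree (analogous to scanning down from the DecorView)
    and collect every control identifier."""
    if out is None:
        out = []
    out.append(node["id"])
    for child in node.get("children", []):
        scan_controls(child, out)
    return out

def find_target_control(root, voice_text):
    """Compare scanned identifiers with the text information of the
    voice control signal to determine the target control identifier."""
    for control_id in scan_controls(root):
        if control_id.lower() == voice_text.lower():
            return control_id
    return None


# A toy stand-in for the window's view hierarchy.
decor_view = {
    "id": "DecorView",
    "children": [
        {"id": "Name"},
        {"id": "Phone number"},
        {"id": "Add another item"},
    ],
}
target = find_target_control(decor_view, "add another item")
```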
  • the developer can set the configuration file 1 of the new contact interface in the installation package of the phone book application.
  • the configuration file 1 records the corresponding relationship between each touch event and voice control signal in the new contact interface.
  • the input event of the "Name" input box corresponds to the control identifier "Name", and the control identifier "Name" corresponds to the text information "Name" of the voice control signal.
  • the click operation on "add other item" corresponds to the control identifier "add other item", and the control identifier "add other item" corresponds to the text information "add other item" of the voice control signal.
  • When the electronic device receives the voice control signal, it can find the touch event corresponding to the voice control signal from the configuration file. That is, the correspondence between the voice control signal and the touch event of clicking the first control in the first interface is recorded in configuration file 1.
  • when the mobile phone receives the voice control signal of the user inputting "name", it is equivalent to the mobile phone detecting that the user has clicked the "name" input box, and the focus falls on that input box.
  • the electronic device can directly install the configuration file locally.
  • the configuration file 1 provided in the phonebook application installation package can be stored in the memory of the mobile phone. In this way, the mobile phone can support the voice control function even if it is not connected to the Internet.
  • Step 304: In response to the voice control signal, the electronic device executes the touch event and displays a second interface of the application, where the second interface is the interface displayed after the touch operation has been performed on the first control in the first interface.
  • corresponding configuration files can be set for each interface in the application, and the configuration file records the voice control signal supported by the corresponding interface and the touch event corresponding to the voice control signal.
  • the electronic device can determine the touch event corresponding to the voice control signal input by the user according to the configuration file of the interface, and then execute the touch event, so as to realize the voice control of the application interface The function of each control.
  • the electronic device can implement the voice control function at the granularity of each interface of the application, realizing voice control of each operation button in the interface, thereby improving voice control efficiency and user experience.
  • if the electronic device determines that the voice control signal input by the user is not a voice control signal supported by the configuration file, it can send the voice control signal to the background server, and the background server determines the task type and extracts key information to generate a VUI task.
  • the background server converts the VUI task into a corresponding control instruction and sends it to the corresponding application program. For a specific example, see scenario three.
  • some controls in the interfaces of some applications of the electronic device may not display a name or text prompt, and the embodiment of the present application can provide text prompt information for this type of control in the interface.
  • such controls are configured with "android:contentDescription" information; this type of control can directly reuse the configured text description of android:contentDescription, that is, these text descriptions are displayed in the interface as text prompts for this type of control.
  • some touch events may be preset in the embodiment of the present application, such as: an upward movement operation corresponding to the voice control signal "up", a downward movement operation corresponding to the voice control signal "bottom", a left movement operation corresponding to the voice control signal "left", and a right movement operation corresponding to the voice control signal "right". These are used to simulate direction-stick operations or the up, down, left, and right key operations of a keyboard, handling movement of the focus of the current control.
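The preset directional commands can be sketched as a static mapping from voice text to a simulated key operation plus a focus update. The key-code names and the (row, column) focus model below are illustrative assumptions, not real Android constants.

```python
# Preset directional voice commands mapped to simulated key operations;
# the key-code names are illustrative, not real Android key constants.
DIRECTION_KEYS = {
    "up": "KEY_DPAD_UP",
    "bottom": "KEY_DPAD_DOWN",  # "bottom" is the wording used in the text
    "left": "KEY_DPAD_LEFT",
    "right": "KEY_DPAD_RIGHT",
}

def move_focus(position, voice_text):
    """Apply a preset directional voice command to a (row, col) focus
    position; non-directional speech leaves the focus unchanged."""
    deltas = {"up": (-1, 0), "bottom": (1, 0), "left": (0, -1), "right": (0, 1)}
    if voice_text not in deltas:
        return position  # not one of the preset directional commands
    dr, dc = deltas[voice_text]
    return (position[0] + dr, position[1] + dc)


focus = move_focus((1, 1), "right")  # focus moves one column to the right
```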
  • the core of the voice control method provided by the embodiments of this application is to determine the touch event according to the voice control signal, that is, to find the corresponding control and then simulate the corresponding touch event (such as a click or long press), input-method event (such as text input), or key operation (such as up, down, left, and right movement); the GUI of the application does not need adaptive development to implement specific voice control functions.
  • an embodiment of the present application discloses an electronic device, including: a touch screen 1201 (the touch screen 1201 includes a touch-sensitive surface 1206 and a display screen 1207); one or more processors 1202; a memory 1203; a communication module 1208; one or more application programs (not shown); and one or more computer programs 1204. The above devices may be connected through one or more communication buses 1205.
  • the one or more computer programs 1204 are stored in the memory 1203 and configured to be executed by the one or more processors 1202. The one or more computer programs 1204 include instructions that can be used to execute each step in the foregoing embodiments, for example, the steps shown in FIG. 11.
  • the embodiments of the present application also provide a computer-readable storage medium that stores computer instructions; when the computer instructions run on an electronic device, the electronic device executes the above related method steps to implement the method in the above embodiments.
  • the embodiments of the present application also provide a computer program product, which when the computer program product runs on a computer, causes the computer to execute the above-mentioned related steps to implement the method in the above-mentioned embodiment.
  • the embodiments of the present application also provide a device.
  • the device may specifically be a chip, component or module.
  • the device may include a connected processor and a memory; wherein the memory is used to store computer execution instructions.
  • the processor can execute the computer-executable instructions stored in the memory, so that the chip executes the methods in the foregoing method embodiments.
  • the electronic devices, computer storage media, computer program products, and chips provided in the embodiments of this application are all used to execute the corresponding methods provided above; therefore, for the beneficial effects they can achieve, refer to the beneficial effects of the corresponding methods provided above, which are not repeated here.
  • the disclosed device and method may be implemented in other ways.
  • the device embodiments described above are merely illustrative.
  • the division of modules or units is only a logical function division; in actual implementation there may be other division methods. For example, multiple units or components may be combined or integrated into another device, or some features may be ignored or not implemented.
  • the displayed or discussed mutual coupling or direct coupling or communication connection may be indirect coupling or communication connection through some interfaces, devices or units, and may be in electrical, mechanical or other forms.
  • the units described as separate components may or may not be physically separate, and the components displayed as units may be one physical unit or multiple physical units, that is, they may be located in one place, or they may be distributed to multiple different places. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
  • each unit in each embodiment of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
  • the above-mentioned integrated unit can be implemented in the form of hardware or software functional unit.
  • if the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a readable storage medium.
  • the technical solutions of the embodiments of the present application, in essence, or the part contributing to the prior art, or all or part of the technical solutions, may be embodied in the form of a software product stored in a storage medium, including several instructions to cause a device (which may be a single-chip microcomputer, a chip, etc.) or a processor to execute all or part of the steps of the methods in the embodiments of the present application.
  • the aforementioned storage medium includes: a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disc, or other media that can store program code.


Abstract

The invention relates to a voice control method and an electronic device in the technical field of communications, which can prompt a user to execute a voice task related to an application while the application is running, and improve the voice control efficiency of an electronic device and the user experience. The method comprises: displaying a first interface of an application, the first interface comprising a control for updating the first interface (301); collecting a voice control signal of a user (302); determining a touch event corresponding to the voice control signal, the touch event performing a touch operation on the control (303); and, in response to the voice control signal, having an electronic device execute the touch event and display a second interface of the application, the second interface being the interface after the touch operation has been performed on the control in the first interface (304).
PCT/CN2020/076689 2019-03-08 2020-02-26 Procédé de commande vocale et dispositif électronique WO2020181988A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910176543.1A CN110060672A (zh) 2019-03-08 2019-03-08 一种语音控制方法及电子设备
CN201910176543.1 2019-03-08

Publications (1)

Publication Number Publication Date
WO2020181988A1 true WO2020181988A1 (fr) 2020-09-17

Family

ID=67316741

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/076689 WO2020181988A1 (fr) 2019-03-08 2020-02-26 Procédé de commande vocale et dispositif électronique

Country Status (2)

Country Link
CN (1) CN110060672A (fr)
WO (1) WO2020181988A1 (fr)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114826805A (zh) * 2021-01-28 2022-07-29 星络家居云物联科技有限公司 计算机可读存储介质、移动终端、智能家居控制方法
CN115706749A (zh) * 2021-08-12 2023-02-17 华为技术有限公司 一种设置提醒的方法和电子设备

Families Citing this family (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110060672A (zh) * 2019-03-08 2019-07-26 华为技术有限公司 一种语音控制方法及电子设备
CN110493123B (zh) * 2019-09-16 2022-06-28 腾讯科技(深圳)有限公司 即时通讯方法、装置、设备及存储介质
CN110837334B (zh) * 2019-11-04 2022-03-22 北京字节跳动网络技术有限公司 用于交互控制的方法、装置、终端及存储介质
CN110968362B (zh) * 2019-11-18 2023-09-26 北京小米移动软件有限公司 应用运行方法、装置及存储介质
CN111443850A (zh) * 2020-03-10 2020-07-24 努比亚技术有限公司 一种终端操作方法、终端和存储介质
CN111475241B (zh) * 2020-04-02 2022-03-11 深圳创维-Rgb电子有限公司 一种界面的操作方法、装置、电子设备及可读存储介质
CN111599358A (zh) * 2020-04-09 2020-08-28 华为技术有限公司 语音交互方法及电子设备
CN111475216B (zh) * 2020-04-15 2024-03-08 亿咖通(湖北)技术有限公司 一种app的语音控制方法、计算机存储介质及电子设备
CN114007117B (zh) * 2020-07-28 2023-03-21 华为技术有限公司 一种控件显示方法和设备
CN112083843B (zh) * 2020-09-02 2022-05-27 珠海格力电器股份有限公司 应用图标的控制方法及装置
CN114527920A (zh) * 2020-10-30 2022-05-24 华为终端有限公司 一种人机交互方法及电子设备
CN112581957B (zh) * 2020-12-04 2023-04-11 浪潮电子信息产业股份有限公司 一种计算机语音控制方法、系统及相关装置
CN112863514B (zh) * 2021-03-15 2024-03-15 亿咖通(湖北)技术有限公司 一种语音应用的控制方法和电子设备

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107358953A (zh) * 2017-06-30 2017-11-17 努比亚技术有限公司 语音控制方法、移动终端及存储介质
CN107967055A (zh) * 2017-11-16 2018-04-27 深圳市金立通信设备有限公司 一种人机交互方法、终端及计算机可读介质
CN108108142A (zh) * 2017-12-14 2018-06-01 广东欧珀移动通信有限公司 语音信息处理方法、装置、终端设备及存储介质
CN108364644A (zh) * 2018-01-17 2018-08-03 深圳市金立通信设备有限公司 一种语音交互方法、终端及计算机可读介质
CN108538291A (zh) * 2018-04-11 2018-09-14 百度在线网络技术(北京)有限公司 语音控制方法、终端设备、云端服务器及系统
CN110060672A (zh) * 2019-03-08 2019-07-26 华为技术有限公司 一种语音控制方法及电子设备


Also Published As

Publication number Publication date
CN110060672A (zh) 2019-07-26

Similar Documents

Publication Publication Date Title
WO2020181988A1 (fr) Procédé de commande vocale et dispositif électronique
JP7142783B2 (ja) 音声制御方法及び電子装置
WO2021164313A1 (fr) Procédé, appareil et système de topologie d'interface
WO2021057868A1 (fr) Procédé de commutation d'interface et dispositif électronique
WO2021063343A1 (fr) Procédé et dispositif d'interaction vocale
WO2021037223A1 (fr) Procédé de commande tactile et dispositif électronique
WO2021110133A1 (fr) Procédé d'opération de commande et dispositif électronique
WO2022052776A1 (fr) Procédé d'interaction homme-ordinateur, ainsi que dispositif électronique et système
CN111316199A (zh) 一种信息处理方法及电子设备
WO2022100221A1 (fr) Procédé et appareil de traitement de récupération et support de stockage
CN112130714B (zh) 可进行学习的关键词搜索方法和电子设备
EP4280058A1 (fr) Procédé d'affichage d'informations et dispositif électronique
WO2021175272A1 (fr) Procédé d'affichage d'informations d'application et dispositif associé
WO2021151320A1 (fr) Procédé de détection de posture de maintien et dispositif électronique
CN112835495B (zh) 开启应用程序的方法、装置及终端设备
WO2021185174A1 (fr) Procédé et appareil de sélection de carte électronique, terminal, et support de stockage
CN112740148A (zh) 一种向输入框中输入信息的方法及电子设备
CN116028148B (zh) 一种界面处理方法、装置及电子设备
CN115421603A (zh) 一种笔迹处理方法、终端设备及芯片系统
WO2022001261A1 (fr) Procédé de suggestion et dispositif terminal
WO2023202444A1 (fr) Procédé et appareil d'entrée
WO2022052961A1 (fr) Procédé d'exécution d'une authentification biométrique lorsque de multiples interfaces d'applications sont affichées simultanément
US20240129619A1 (en) Method and Apparatus for Performing Control Operation, Storage Medium, and Control
WO2023179454A1 (fr) Procédé d'appel de service et dispositif électronique
CN114244951B (zh) 应用程序打开页面的方法及其介质和电子设备

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20769916

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20769916

Country of ref document: EP

Kind code of ref document: A1