CN112908329B - Voice control method and device, electronic equipment and medium

Info

Publication number: CN112908329B
Application number: CN202110221230.0A
Authority: CN
Inventors: 刘俊启
Original assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Current assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Priority date: 2021-02-26
Filing date: 2021-02-26
Publication date: 2023-07-18
Anticipated expiration: 2041-02-26
Also published as: CN112908329A

Abstract

The disclosure provides a voice control method and apparatus, an electronic device, and a medium, and relates to the field of intelligent control, in particular to the field of voice control. The implementation scheme is as follows: acquiring first motion information of a mobile terminal; performing at least one verification operation in response to the first motion information satisfying a preset activation condition; and switching a voice transmission mode of an application running on the mobile terminal in response to each of the at least one verification operation being verified successfully.

Description

Voice control method and device, electronic equipment and medium

Technical Field

The present disclosure relates to the field of intelligent control technology, and in particular, to a method, an apparatus, an electronic device, a computer readable storage medium, and a computer program product for voice control.

Background

With the development of electronic technology, users can realize remote voice communication through various mobile devices, wherein the mobile devices are not limited to mobile phones and mobile computers, and can also comprise various mobile devices capable of realizing remote communication, such as intelligent wearable devices, handheld game machines and the like. With the increasing variety of mobile devices, voice control methods for mobile devices are becoming increasingly diverse.

The approaches described in this section are not necessarily approaches that have been previously conceived or pursued. Unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section. Similarly, the problems mentioned in this section should not be considered as having been recognized in any prior art unless otherwise indicated.

Disclosure of Invention

The present disclosure provides a method, apparatus, electronic device, computer-readable storage medium, and computer program product for voice control.

According to an aspect of the present disclosure, there is provided a voice control method, including: acquiring first motion information of a mobile terminal; performing at least one verification operation in response to the first motion information satisfying a preset activation condition; and switching a voice transmission mode of an application running on the mobile terminal in response to each of the at least one verification operation being verified successfully.

According to another aspect of the present disclosure, there is provided a voice control apparatus, including: an acquisition unit configured to acquire first motion information of a mobile terminal; a verification unit configured to perform at least one verification operation in response to the first motion information satisfying a preset activation condition; and a switching unit configured to switch a voice transmission mode of an application running on the mobile terminal in response to each of the at least one verification operation being verified successfully.

According to another aspect of the present disclosure, there is provided an electronic device including: a memory, a processor, and a computer program stored on the memory, wherein the processor is configured to execute the computer program to implement the steps of the above-described method.

According to another aspect of the present disclosure, there is provided a non-transitory computer readable storage medium having stored thereon a computer program, wherein the computer program when executed by a processor implements the steps of the above-described method.

According to another aspect of the present disclosure, a computer program product is provided, comprising a computer program, wherein the computer program, when executed by a processor, implements the steps of the above-described method.

According to one or more embodiments of the present disclosure, control of a voice transmission mode of an application running on a mobile terminal may be conveniently performed, and misoperation is effectively avoided, so that accuracy of control of the voice transmission mode is improved, and user experience is improved.

It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the disclosure, nor is it intended to be used to limit the scope of the disclosure. Other features of the present disclosure will become apparent from the following specification.

Drawings

The accompanying drawings illustrate exemplary embodiments and, together with the description, serve to explain exemplary implementations of the embodiments. The illustrated embodiments are for exemplary purposes only and do not limit the scope of the claims. Throughout the drawings, identical reference numerals designate similar, but not necessarily identical, elements.

FIG. 1 illustrates a schematic diagram of an exemplary system in which various methods described herein may be implemented, in accordance with an embodiment of the present disclosure;

FIG. 2 illustrates a flow chart of a voice control method according to an embodiment of the present disclosure;

FIG. 3 illustrates a flow chart of another voice control method according to an embodiment of the present disclosure;

FIG. 4 illustrates a flow chart of another voice control method according to an embodiment of the present disclosure;

FIG. 5 illustrates a block diagram of a voice control apparatus according to an embodiment of the present disclosure;

FIG. 6 illustrates a block diagram of an exemplary electronic device that can be used to implement embodiments of the present disclosure.

Detailed Description

Exemplary embodiments of the present disclosure are described below in conjunction with the accompanying drawings, which include various details of the embodiments of the present disclosure to facilitate understanding, and should be considered as merely exemplary. Accordingly, one of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.

In the present disclosure, the use of the terms "first," "second," and the like to describe various elements is not intended to limit the positional relationship, timing relationship, or importance relationship of the elements, unless otherwise indicated, and such terms are merely used to distinguish one element from another. In some examples, a first element and a second element may refer to the same instance of the element, and in some cases, they may also refer to different instances based on the description of the context.

The terminology used in the description of the various examples in this disclosure is for the purpose of describing particular examples only and is not intended to be limiting. Unless the context clearly indicates otherwise, the elements may be one or more if the number of the elements is not specifically limited. Furthermore, the term "and/or" as used in this disclosure encompasses any and all possible combinations of the listed items.

In the related art, in order to control a voice transmission mode of an application running on a mobile terminal, a user is often required to operate a corresponding control key in an operation page of the application. Whenever the user needs to switch the voice transmission mode, the user therefore has to open the operation page of the application and operate the corresponding control key. This sometimes requires several operations by the user, is cumbersome, and results in a poor user experience.

Based on this, the present disclosure proposes a method, an apparatus, an electronic device, a computer-readable storage medium, and a computer program product for voice control. In the voice control method, at least one verification operation is performed when the first motion information of the mobile terminal satisfies the preset activation condition, and the voice transmission mode of the application running on the mobile terminal is switched according to the verification result. The voice transmission mode of the application can thus be controlled conveniently while misoperation is avoided, which improves both the accuracy of the control of the voice transmission mode and the user experience.

Embodiments of the present disclosure will be described in detail below with reference to the accompanying drawings.

Fig. 1 illustrates a schematic diagram of an exemplary system 100 in which various methods and apparatus described herein may be implemented, in accordance with an embodiment of the present disclosure. Referring to fig. 1, the system 100 includes one or more client devices 101, 102, 103, 104, 105, and 106, a server 120, and one or more communication networks 110 coupling the one or more client devices to the server 120. Client devices 101, 102, 103, 104, 105, and 106 may be configured to execute one or more applications.

In embodiments of the present disclosure, the server 120 may run one or more services or software applications that enable the execution of the method of voice control.

In some embodiments, server 120 may also provide other services or software applications that may include non-virtual environments and virtual environments. In some embodiments, these services may be provided as web-based services or cloud services, for example, provided to users of client devices 101, 102, 103, 104, 105, and/or 106 under a software as a service (SaaS) model.

In the configuration shown in fig. 1, server 120 may include one or more components that implement the functions performed by server 120. These components may include software components, hardware components, or a combination thereof that are executable by one or more processors. A user operating client devices 101, 102, 103, 104, 105, and/or 106 may in turn utilize one or more client applications to interact with server 120 to utilize the services provided by these components. It should be appreciated that a variety of different system configurations are possible, which may differ from system 100. Accordingly, FIG. 1 is one example of a system for implementing the various methods described herein and is not intended to be limiting.

A user may use the client devices 101, 102, 103, 104, 105, and/or 106 to acquire voice data and various sensing data, including motion information, gesture information, and image information of the mobile terminal, so as to implement voice control of the mobile terminal based on the sensing data. The client device may provide an interface that enables a user of the client device to interact with the client device. The client device may also output information to the user via the interface. Although fig. 1 depicts only six client devices, those skilled in the art will appreciate that the present disclosure may support any number of client devices.

Client devices 101, 102, 103, 104, 105, and/or 106 may include various types of computer devices, such as portable handheld devices, general purpose computers (such as personal computers and laptop computers), workstation computers, wearable devices, gaming systems, thin clients, various messaging devices, sensors or other sensing devices, and the like. These computer devices may run various types and versions of software applications and operating systems, such as Microsoft Windows, Apple iOS, UNIX-like operating systems, Linux, or Linux-like operating systems (e.g., Google Chrome OS); or include various mobile operating systems such as Microsoft Windows Mobile OS, iOS, Windows Phone, Android. Portable handheld devices may include cellular telephones, smart phones, tablet computers, Personal Digital Assistants (PDAs), and the like. Wearable devices may include head mounted displays and other devices. The gaming system may include various handheld gaming devices, internet-enabled gaming devices, and the like. The client device is capable of executing a variety of different applications, such as various Internet-related applications, communication applications (e.g., email applications), Short Message Service (SMS) applications, and may use a variety of communication protocols.

Network 110 may be any type of network known to those skilled in the art that may support data communications using any of a number of available protocols, including but not limited to TCP/IP, SNA, IPX, etc. For example only, the one or more networks 110 may be a Local Area Network (LAN), an Ethernet-based network, a token ring, a Wide Area Network (WAN), the Internet, a virtual network, a Virtual Private Network (VPN), an intranet, an extranet, a Public Switched Telephone Network (PSTN), an infrared network, a wireless network (e.g., Bluetooth, Wi-Fi), and/or any combination of these and/or other networks.

The server 120 may include one or more general purpose computers, special purpose server computers (e.g., PC (personal computer) servers, UNIX servers, mid-end servers), blade servers, mainframe computers, server clusters, or any other suitable arrangement and/or combination. The server 120 may include one or more virtual machines running a virtual operating system, or other computing architecture that involves virtualization (e.g., one or more flexible pools of logical storage devices that may be virtualized to maintain virtual storage devices of the server). In various embodiments, server 120 may run one or more services or software applications that provide the functionality described below.

The computing units in server 120 may run one or more operating systems including any of the operating systems described above as well as any commercially available server operating systems. Server 120 may also run any of a variety of additional server applications and/or middle tier applications, including HTTP servers, FTP servers, CGI servers, JAVA servers, database servers, etc.

In some implementations, server 120 may include one or more applications to analyze and consolidate data feeds and/or event updates received from users of client devices 101, 102, 103, 104, 105, and 106. Server 120 may also include one or more applications to display data feeds and/or real-time events via one or more display devices of client devices 101, 102, 103, 104, 105, and 106.

In some implementations, the server 120 may be a server of a distributed system or a server that incorporates a blockchain. The server 120 may also be a cloud server, or an intelligent cloud computing server or intelligent cloud host with artificial intelligence technology. A cloud server is a host product in a cloud computing service system that addresses the drawbacks of difficult management and poor service scalability found in traditional physical host and Virtual Private Server (VPS) services.

The system 100 may also include one or more databases 130. In some embodiments, these databases may be used to store data and other information. For example, one or more of databases 130 may be used to store information such as audio files and video files. The databases 130 may reside in a variety of locations. For example, the database used by the server 120 may be local to the server 120, or may be remote from the server 120 and may communicate with the server 120 via a network-based or dedicated connection. The databases 130 may be of different types. In some embodiments, the database used by server 120 may be, for example, a relational database. One or more of these databases may store, update, and retrieve data in response to commands.

In some embodiments, one or more of databases 130 may also be used by applications to store application data. The databases used by the application may be different types of databases, such as key value stores, object stores, or conventional stores supported by the file system.

The system 100 of fig. 1 may be configured and operated in various ways to enable application of the various methods and apparatus described in accordance with the present disclosure.

Fig. 2 is a flowchart illustrating a voice control method according to an exemplary embodiment of the present disclosure. The method may include: step S201, acquiring first motion information of a mobile terminal; step S202, performing at least one verification operation in response to the first motion information satisfying a preset activation condition; and step S203, switching the voice transmission mode of the application running on the mobile terminal in response to each of the at least one verification operation being verified successfully. The user can thereby conveniently control the voice transmission mode of the application running on the mobile terminal while misoperation is avoided, which improves both the accuracy of the control of the voice transmission mode and the user experience.
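The three steps can be sketched as a single function. This is only an illustrative sketch, not the implementation of the disclosure: the function name `voice_control_step` and its callback parameters are hypothetical, and the first motion information is reduced to a single number for brevity.

```python
from typing import Callable, Iterable


def voice_control_step(
    first_motion: float,
    activation_condition: Callable[[float], bool],
    verifications: Iterable[Callable[[], bool]],
    switch_mode: Callable[[], None],
) -> bool:
    """Run steps S201 to S203 once; return True if the mode was switched.

    All names here are hypothetical and chosen for illustration only.
    """
    checks = list(verifications)
    if not checks:
        # The method requires at least one verification operation.
        raise ValueError("at least one verification operation is required")

    # S202: verification is performed only if the activation condition is met.
    if not activation_condition(first_motion):
        return False

    # S203: the mode is switched only if every verification succeeds.
    if all(check() for check in checks):
        switch_mode()
        return True
    return False
```

Because `all` stops at the first failing check, later verification operations are not run once one has failed, which matches the requirement that every operation must succeed before the mode is switched.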

For step S201, the first motion information may include one or more of acceleration information, angular velocity information, and the like.

According to some embodiments, the mobile terminal may obtain the first motion information of the mobile terminal by calling its own acceleration sensor and/or angular velocity sensor.

For step S202, according to some embodiments, the first motion information may include an acceleration value, and the preset activation condition may include the acceleration value of the mobile terminal being greater than a preset acceleration threshold. Thus, whether the activation condition is satisfied or not can be conveniently judged according to the acquired acceleration information of the mobile terminal, and further the subsequent verification operation is executed.

For example, when the user moves the mobile terminal to the mouth by lifting the arm, the acceleration sensor in the mobile terminal can detect a corresponding acceleration value, and at this time, the acceleration value is greater than a preset acceleration threshold, and it can be determined that the activation condition is met, so that a specific application running in the mobile terminal enters an activated state.

It can be understood that, during normal use of the mobile terminal, the acceleration sensor may still detect some acceleration. However, as long as that acceleration value does not exceed the preset acceleration threshold required by the preset activation condition, the application does not enter the activated state, so misoperation by the user can be reduced to a certain extent.
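A minimal sketch of the threshold test follows. It assumes the acceleration value is the magnitude of a three-axis sample with gravity already removed, and the threshold value is an arbitrary placeholder; neither detail is specified in the disclosure.

```python
import math
from typing import Tuple

# Hypothetical threshold in m/s^2; the disclosure does not give a value.
ACCEL_THRESHOLD = 6.0


def meets_activation_condition(
    sample: Tuple[float, float, float],
    threshold: float = ACCEL_THRESHOLD,
) -> bool:
    """Return True if the acceleration magnitude exceeds the threshold.

    `sample` is assumed to be (ax, ay, az) with gravity removed.
    """
    ax, ay, az = sample
    magnitude = math.sqrt(ax * ax + ay * ay + az * az)
    # Strictly greater than, as stated in the activation condition.
    return magnitude > threshold
```

With these placeholder numbers, the small accelerations of normal handling stay below the threshold, while a deliberate lift of the arm exceeds it.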

According to some embodiments, the preset activation condition may further include a preset specific action. The user can control the mobile terminal to execute a specific action consistent with the preset activation condition so as to enable a specific application running in the mobile terminal to enter an activated state. For example, when the preset activation condition is that a circle is drawn clockwise, the user may put a specific application running in the mobile terminal into an activated state by waving the mobile terminal to draw a circle clockwise in space.

It will be appreciated that the preset activation conditions may also include other specific actions, and the present disclosure is not limited.

When an application running on the mobile terminal enters the activated state, whether to switch the voice transmission mode of the application can be determined by performing at least one verification operation.

It will be appreciated that the thread of the at least one verification operation is started only after the application running on the mobile terminal has entered the activated state. This improves the accuracy of voice control while avoiding excessive occupation of the sensors and computing resources of the mobile terminal.

According to some embodiments, the at least one verification operation may include a gesture verification operation, which may include: acquiring gesture information of the mobile terminal; and determining that the gesture verification operation is successful in response to the gesture information satisfying a first verification condition. The current intention of the user can thereby be judged accurately based on the gesture information of the mobile terminal, and misoperation of voice control is avoided.

According to some embodiments, the gesture information may include orientation information of the mobile terminal, and the first verification condition may include the front side of the mobile terminal facing upward.

It will be appreciated that when a user wishes to communicate by voice via a mobile terminal, the front side of the mobile terminal typically faces upward, because the microphone is usually provided on the front side of the mobile terminal to better receive the user's voice. Therefore, when the front side of the mobile terminal faces upward, it can be determined that the user wishes to conduct voice communication through the mobile terminal, and the voice communication of the mobile terminal is started accordingly; otherwise, it may be determined that the user does not currently wish to conduct voice communication through the mobile terminal, and the voice communication of the mobile terminal is not started.

According to some embodiments, when the microphone of the mobile terminal is disposed on the back or a side of the mobile terminal, the first verification condition may be set to the back or that side of the mobile terminal facing upward, respectively.

According to some embodiments, the orientation information of the mobile terminal may be detected by an acceleration sensor provided in the mobile terminal. When the acceleration sensor detects that the gravitational acceleration on the vertical coordinate axis is in the negative direction, it may be determined that the front side of the mobile terminal faces upward; conversely, it may be determined that the back side of the mobile terminal faces upward. It will be appreciated that detecting the orientation information with the acceleration sensor is only an exemplary embodiment. In practical applications, various gesture information of the mobile terminal, including the orientation information, may be detected by other methods to implement the gesture verification operation, which is not limited herein.
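The orientation test can be sketched as a sign check on the vertical-axis reading. The sketch follows the convention stated above (a negative reading means front side up) as its default, but exposes the sign as a parameter because sensor APIs differ in their sign conventions. The function name and the optional `min_magnitude` dead band are assumptions added for illustration.

```python
def is_front_facing_up(
    gravity_z: float,
    front_up_sign: int = -1,
    min_magnitude: float = 0.0,
) -> bool:
    """Return True if the vertical-axis reading indicates front side up.

    gravity_z:      reading on the vertical coordinate axis, in m/s^2.
    front_up_sign:  -1 if a negative reading means front side up (the
                    convention described in the text), +1 for the opposite.
    min_magnitude:  hypothetical dead band; readings weaker than this are
                    treated as not facing up (for example, device on edge).
    """
    if front_up_sign not in (-1, 1):
        raise ValueError("front_up_sign must be -1 or +1")
    return gravity_z * front_up_sign > min_magnitude
```

For a terminal whose microphone is on the back, the same function can be reused by flipping `front_up_sign`.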

According to some embodiments, the thread of the gesture verification operation and the corresponding sensor in the mobile terminal are started to acquire the gesture information only after the running application enters the activated state in response to the first motion information of the mobile terminal satisfying the preset activation condition. In other words, while the application has not entered the activated state, the thread of the gesture verification operation is not started and the sensor used to acquire the gesture information remains off. Excessive occupation of the relevant sensors and computing resources of the mobile terminal can thereby be avoided.
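One way to picture this deferred start is a wrapper that opens the sensor only when verification actually runs and closes it afterwards. The class `LazySensor` and the function `verify_gesture_if_activated` are hypothetical names; real sensor APIs are replaced by plain callbacks so that the sketch stays self-contained.

```python
from typing import Any, Callable


class LazySensor:
    """Hypothetical wrapper: the sensor is on only inside a `with` block."""

    def __init__(self, open_sensor: Callable[[], None],
                 close_sensor: Callable[[], None]) -> None:
        self._open = open_sensor
        self._close = close_sensor
        self.active = False

    def __enter__(self) -> "LazySensor":
        self._open()
        self.active = True
        return self

    def __exit__(self, exc_type, exc, tb) -> bool:
        # Always release the sensor, even if verification raised.
        self._close()
        self.active = False
        return False


def verify_gesture_if_activated(
    activated: bool,
    sensor: LazySensor,
    read_gesture: Callable[[], Any],
    first_condition: Callable[[Any], bool],
) -> bool:
    """Run gesture verification only in the activated state."""
    if not activated:
        # Not activated: the sensor is never started.
        return False
    with sensor:
        return first_condition(read_gesture())
```

The same pattern would apply to the camera used by the image verification operation.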

According to some embodiments, the at least one verification operation may include an image verification operation. Fig. 3 is a schematic diagram illustrating the image verification operation according to an exemplary embodiment of the present disclosure. The image verification operation includes: step S301, controlling the mobile terminal to capture image information; step S302, acquiring the image information captured by the mobile terminal; and step S303, determining that the image verification operation is successful in response to the image information satisfying a second verification condition. The current intention of the user can thereby be judged accurately based on the acquired image information, and misoperation of voice control is avoided.

The mobile terminal can execute image information acquisition by calling one or more cameras arranged at different positions.

According to some embodiments, the second verification condition includes the image information containing face information.

It will be appreciated that when a user wishes to communicate voice via the mobile terminal, the mobile terminal will be moved to a position close to the face. Therefore, when the camera of the mobile terminal can acquire the face information, the user can be judged to wish to carry out voice communication through the mobile terminal, and then the voice communication of the mobile terminal is correspondingly started; otherwise, it may be determined that the user does not wish to perform voice communication through the mobile terminal, and accordingly does not turn on voice communication of the mobile terminal.

The face information may be the whole face or a part of the face.

According to some embodiments, the face information may include specific organs in the face, such as the mouth. In particular, the face information may also include a specific state of a specific organ, for example, a mouth in an open state.
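The image verification operation can be sketched with the camera and the face detector passed in as callbacks, since the disclosure does not name a specific detector. The `Face` record, the `require_open_mouth` flag, and the function name are assumptions made for this sketch.

```python
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass
class Face:
    """Hypothetical detection result for a whole or partial face."""
    mouth_open: bool = False


def image_verification(
    capture_image: Callable[[], Any],
    detect_faces: Callable[[Any], List[Face]],
    require_open_mouth: bool = False,
) -> bool:
    """Sketch of steps S301 to S303.

    capture_image: stands in for the terminal's camera (S301/S302).
    detect_faces:  stands in for any face detector; returns found faces.
    """
    image = capture_image()
    faces = detect_faces(image)
    if require_open_mouth:
        # Stricter variant: a mouth in the open state must be present.
        return any(face.mouth_open for face in faces)
    # Basic second verification condition: the image contains a face.
    return len(faces) > 0
```

Passing the detector as a parameter keeps the verification logic independent of whichever vision library is used on the device.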

According to some embodiments, the thread of the image verification operation and the camera in the mobile terminal are started to acquire image information only after the application running in the mobile terminal enters the activated state in response to the first motion information of the mobile terminal satisfying the preset activation condition. In other words, while the application has not entered the activated state, the thread of the image verification operation is not started and the camera of the mobile terminal remains off. Waste of the relevant sensors and computing resources of the mobile terminal can thereby be avoided.

According to some embodiments, the at least one verification operation may include a state verification operation, fig. 4 is a diagram illustrating a state verification operation according to an exemplary embodiment of the present disclosure, the state verification operation including: step S401, obtaining second motion information of the mobile terminal in a first preset time range; step S402, determining the state of the mobile terminal according to second motion information in a first preset time range based on a preset rule; and step S403, in response to the state of the mobile terminal being a static state, determining that the state verification operation is successful. Therefore, the current intention of the user can be accurately judged based on whether the mobile terminal is in a static state or not, and misoperation of voice control is avoided.

It is understood that when the mobile terminal is in a motion state, for example, in a moving vehicle, or when the mobile terminal is carried by a user for outdoor exercises, the activation operation performed according to the first motion information of the mobile terminal does not conform to the actual intention of the user. In this case, by performing the state verification operation on the mobile terminal, the false activation caused by the mobile terminal being in a motion state can be avoided, and the accuracy of voice control can be improved.

The first preset time range may be a specific time period after the mobile terminal enters the active state. The second motion information of the mobile terminal in the first preset time range is time-varying motion information of the mobile terminal in the first preset time range, and may include acceleration information, angular velocity information, and the like.

According to some embodiments, the state verification operation may be performed based on a continuous plurality of first preset time ranges. Specifically, when it is determined that the state of the mobile terminal is not the stationary state according to the second motion information within the first preset time range, it may be determined whether the state of the mobile terminal is the stationary state within the next first preset time range after the first preset time range is ended, and so on until it is determined that the state of the mobile terminal is the stationary state or the state verification operation based on each of the first preset time ranges is completed.

According to some embodiments, determining the state of the mobile terminal according to the second motion information within the first preset time range based on the preset rule includes: determining that the state of the mobile terminal is the stationary state in response to the value of the second motion information remaining smaller than a preset threshold throughout the first preset time range. Whether the mobile terminal is in the stationary state can thereby be judged conveniently.
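The state verification operation, including the retry over consecutive first preset time ranges, can be sketched as follows. Each window is assumed to be a sequence of motion magnitudes sampled during one first preset time range; the function names and the treatment of an empty window as not stationary are assumptions.

```python
from typing import Iterable, Sequence


def is_stationary(window: Sequence[float], threshold: float) -> bool:
    """True if every motion value in the window stays below the threshold.

    An empty window is treated as not stationary, since there is no
    evidence either way (an assumption of this sketch).
    """
    return len(window) > 0 and all(abs(value) < threshold for value in window)


def state_verification(
    windows: Iterable[Sequence[float]],
    threshold: float,
) -> bool:
    """Check consecutive windows; succeed at the first stationary one.

    Returns False if no window was stationary by the time the
    supplied windows are exhausted.
    """
    for window in windows:
        if is_stationary(window, threshold):
            return True
    return False
```

Because `windows` may be a generator, the caller can produce each window as the sensor data arrives rather than collecting all of them first.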

According to some embodiments, the method further includes: for each of the at least one verification operation, determining that the verification operation has failed in response to the verification operation not succeeding within a second preset time range, wherein the second preset time range is not shorter than the first preset time range when the at least one verification operation includes the state verification operation. Therefore, when it is judged that the user's true intention is not voice control, the sensors and computing resources of the mobile terminal occupied by the verification operation can be released in time, avoiding waste of the mobile terminal's resources.

For step S203, the voice transmission mode of the application running on the mobile terminal is switched in response to each of the at least one verification operation being verified successfully.
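The time limit on a verification operation can be sketched as a polling loop with a deadline. The polling approach, the function names, and the injectable `clock` and `sleep` parameters (useful for testing without real delays) are all assumptions; the disclosure only states that an operation fails if it has not succeeded within the second preset time range.

```python
import time
from typing import Callable


def check_time_ranges(first_range_s: float, second_range_s: float) -> None:
    """Enforce that the second range is not shorter than the first."""
    if second_range_s < first_range_s:
        raise ValueError("second time range must not be shorter than the first")


def verify_within(
    check: Callable[[], bool],
    second_range_s: float,
    poll_interval_s: float = 0.05,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll `check` until it succeeds or the second time range elapses."""
    deadline = clock() + second_range_s
    while True:
        if check():
            return True
        if clock() >= deadline:
            # Timed out: the verification operation is deemed failed,
            # so the caller can release sensors and other resources.
            return False
        sleep(poll_interval_s)
```

`time.monotonic` is used as the default clock so that the deadline is not affected by changes to the system time.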

Switching the voice transmission mode of the application program running on the mobile terminal may include turning on voice transmission, turning off voice transmission, changing a voice transmission mode, or the like.

According to some embodiments, the application may run in the background of the mobile terminal. In this case, the user can directly switch the voice transmission mode of the application running in the background, without first bringing the application to the foreground and then switching its voice transmission mode, which makes operation more convenient for the user.

It will be appreciated that when an application runs in the background of the mobile terminal, only a reduced voice window may be displayed, and this window does not include a control key for switching the voice transmission mode of the application. The user can browse other content on the mobile terminal while conducting a voice call through the application running in the background. According to one or more embodiments of the present disclosure, the user may activate the application running in the background by making the mobile terminal perform the first motion and, once the verification operation succeeds, control the voice transmission mode of that application without tapping the reduced voice window and opening the operation interface of the application, which effectively improves the convenience of user operation.

According to some embodiments, the user may set the permissions of the application on the mobile terminal so that the application can continuously monitor the first motion information of the mobile terminal while running in the background and determine whether the first motion information satisfies the preset activation condition.

According to some embodiments, the user may set the permissions of the application on the mobile terminal so that, in the active state, the application can further invoke the relevant sensors and computing resources of the mobile terminal to perform the verification operation on the mobile terminal.

According to some embodiments, after entering the active state, the application may display a prompt on the display page of the mobile terminal asking the user to grant permission; after the user selects or confirms, the application may further invoke the relevant sensors and computing resources to perform the verification operation on the mobile terminal.

According to some embodiments, the application may comprise a teleconferencing application. Thus, the user can conveniently control the voice transmission mode of the teleconference in real time.

According to some embodiments, switching the voice transmission mode of the application in response to each of the at least one verification operation being verified successfully comprises: switching the voice transmission mode of the teleconferencing application from a muted mode to an unmuted mode.

It will be appreciated that switching the voice transmission mode of the application may correspondingly also include switching the voice transmission mode of the teleconferencing application from an unmuted mode to a muted mode, which is not limited herein.
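The two-way switch between the mode in which the participant's voice is not transmitted and the mode in which it is transmitted can be sketched as follows. This is an illustrative Python sketch only; the enum `VoiceTransmissionMode` and the function `switch_mode` are hypothetical names and do not correspond to any real teleconferencing API.

```python
from enum import Enum


class VoiceTransmissionMode(Enum):
    # Hypothetical labels for the two modes of a teleconferencing application.
    MUTED = "muted"
    UNMUTED = "unmuted"


def switch_mode(
    current: VoiceTransmissionMode, all_verifications_succeeded: bool
) -> VoiceTransmissionMode:
    """Toggle the mode only when every verification operation has succeeded."""
    if not all_verifications_succeeded:
        return current
    if current is VoiceTransmissionMode.MUTED:
        return VoiceTransmissionMode.UNMUTED
    return VoiceTransmissionMode.MUTED
```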

According to another aspect of the present disclosure, there is also disclosed a voice control apparatus 500, the apparatus 500 including: an acquiring unit 501 configured to acquire first motion information of a mobile terminal; a verification unit 502 configured to perform at least one verification operation in response to the first motion information satisfying a preset activation condition; and a switching unit 503 configured to switch a voice transmission mode of an application running on the mobile terminal in response to each of the at least one verification operation being verified successfully.
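The cooperation of the acquiring unit, the verification unit, and the switching unit can be sketched in Python as below. This is a minimal sketch under stated assumptions, not the apparatus itself: the class `VoiceControlApparatus`, its field names, and the use of plain callables in place of sensor and application interfaces are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class VoiceControlApparatus:
    # Acquiring unit: returns the first motion information (here, one value).
    acquire_first_motion: Callable[[], float]
    # Preset activation condition applied to the first motion information.
    activation_condition: Callable[[float], bool]
    # Verification unit: one callable per verification operation.
    verification_operations: Sequence[Callable[[], bool]]
    # Switching unit: switches the voice transmission mode of the application.
    switch_voice_mode: Callable[[], None]

    def __post_init__(self) -> None:
        if not self.verification_operations:
            raise ValueError("at least one verification operation is required")

    def step(self) -> bool:
        """Run one pass; return True only if the mode was switched."""
        motion = self.acquire_first_motion()
        if not self.activation_condition(motion):
            return False
        # Every verification operation must succeed before switching.
        if not all(op() for op in self.verification_operations):
            return False
        self.switch_voice_mode()
        return True
```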

According to some embodiments, the first motion information comprises an acceleration value and the preset activation condition comprises the acceleration value of the mobile terminal being greater than a preset acceleration threshold.
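A threshold check of this kind can be sketched in Python as follows. The disclosure refers only to an "acceleration value"; taking that value to be the magnitude of a three-axis accelerometer sample, and the threshold of 15.0 m/s², are assumptions made here purely for illustration.

```python
import math
from typing import Tuple

# Hypothetical threshold in m/s^2; the disclosure does not specify a value.
PRESET_ACCELERATION_THRESHOLD = 15.0


def acceleration_magnitude(sample: Tuple[float, float, float]) -> float:
    """Magnitude of one (x, y, z) accelerometer sample (an assumed encoding)."""
    x, y, z = sample
    return math.sqrt(x * x + y * y + z * z)


def meets_activation_condition(
    sample: Tuple[float, float, float],
    threshold: float = PRESET_ACCELERATION_THRESHOLD,
) -> bool:
    """True when the acceleration value is strictly greater than the threshold."""
    return acceleration_magnitude(sample) > threshold
```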

According to some embodiments, the at least one verification operation comprises a posture verification operation, and the verification unit comprises: a first sub-acquisition unit configured to acquire posture information of the mobile terminal; and a first sub-determination unit configured to determine that the posture verification operation is verified successfully in response to the posture information satisfying a first verification condition.
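One possible form of such a check, in which the first verification condition is that the front of the mobile terminal faces upward, can be sketched in Python as below. The sketch assumes a sensor convention in which the z axis points out of the screen, so that a terminal lying flat with its screen up reports a positive z reading; the function name and the 30-degree tolerance are hypothetical.

```python
import math
from typing import Tuple


def is_front_facing_up(
    gravity: Tuple[float, float, float], max_tilt_deg: float = 30.0
) -> bool:
    """Check whether the screen normal (+z, by assumption) points roughly upward."""
    x, y, z = gravity
    norm = math.sqrt(x * x + y * y + z * z)
    if norm == 0.0:
        # No usable reading: treat the posture verification as not satisfied.
        return False
    # Angle between the screen normal and the measured vector, clamped so that
    # rounding errors cannot push the cosine outside [-1, 1].
    cosine = max(-1.0, min(1.0, z / norm))
    tilt_deg = math.degrees(math.acos(cosine))
    return tilt_deg <= max_tilt_deg
```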

According to some embodiments, the at least one verification operation comprises an image verification operation, and the verification unit comprises: a first sub-control unit configured to control the mobile terminal to perform image information acquisition; a second sub-acquisition unit configured to acquire the image information captured by the mobile terminal; and a second sub-determination unit configured to determine that the image verification operation is verified successfully in response to the image information satisfying a second verification condition.
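The three sub-units can be sketched in Python as a single function that triggers a capture, obtains the image, and tests a second verification condition. Here the condition is assumed to be that the image contains face information. The `capture` and `detect_faces` callables are hypothetical stand-ins for a camera interface and a face detector; no real library API is implied.

```python
from typing import Callable, Sequence

# Placeholder image type for the sketch; a real system would use its own type.
Image = bytes


def image_verification(
    capture: Callable[[], Image],
    detect_faces: Callable[[Image], Sequence[object]],
) -> bool:
    """Capture an image and succeed only if at least one face is detected."""
    image = capture()  # control the terminal to capture, then acquire the image
    if not image:
        # Nothing was captured, so the condition cannot be satisfied.
        return False
    return len(detect_faces(image)) > 0
```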

According to some embodiments, the at least one verification operation comprises a state verification operation, and the verification unit comprises: a third sub-acquisition unit configured to acquire second motion information of the mobile terminal within a first preset time range; a third sub-determination unit configured to determine a state of the mobile terminal according to the second motion information within the first preset time range based on a preset rule; and a fourth sub-determination unit configured to determine that the state verification operation is verified successfully in response to the state of the mobile terminal being a stationary state.
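A minimal Python sketch of the state verification follows. It assumes one possible preset rule, namely that the terminal is stationary when every second-motion value sampled within the first preset time range stays below a preset threshold. The sample encoding as (timestamp, value) pairs and all names are assumptions for illustration.

```python
from typing import Iterable, Tuple


def is_stationary(
    samples: Iterable[Tuple[float, float]],
    window_start_s: float,
    first_preset_range_s: float,
    preset_threshold: float,
) -> bool:
    """Return True if all in-window motion values are below the threshold.

    Each sample is a hypothetical (timestamp_seconds, motion_value) pair.
    """
    window_end_s = window_start_s + first_preset_range_s
    in_window = [
        value
        for timestamp, value in samples
        if window_start_s <= timestamp <= window_end_s
    ]
    if not in_window:
        # Without any reading in the window the state cannot be confirmed.
        return False
    return all(abs(value) < preset_threshold for value in in_window)
```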

According to another aspect of the present disclosure, there is also disclosed an electronic device, including: a memory, a processor and a computer program stored on the memory, wherein the processor is configured to execute the computer program to implement the steps of the method described above.

According to another aspect of the present disclosure, there is also disclosed a non-transitory computer readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the steps of the above-described method.

According to another aspect of the present disclosure, a computer program product is also disclosed, comprising a computer program, wherein the computer program, when executed by a processor, implements the steps of the above-described method.

Referring to FIG. 6, a block diagram of an electronic device 600 will now be described. The electronic device 600 may be a server or a client of the present disclosure and is an example of a hardware device that may be applied to aspects of the present disclosure. Electronic devices are intended to represent various forms of digital electronic computer devices, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other suitable computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are meant to be exemplary only, and are not meant to limit implementations of the disclosure described and/or claimed herein.

As shown in FIG. 6, the device 600 includes a computing unit 601 that can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 602 or a computer program loaded from a storage unit 608 into a Random Access Memory (RAM) 603. The RAM 603 may also store various programs and data required for the operation of the device 600. The computing unit 601, the ROM 602, and the RAM 603 are connected to each other by a bus 604. An input/output (I/O) interface 605 is also connected to the bus 604.

Various components in the device 600 are connected to the I/O interface 605, including: an input unit 606, an output unit 607, a storage unit 608, and a communication unit 609. The input unit 606 may be any type of device capable of inputting information to the device 600. The input unit 606 may receive input numeric or character information and generate key signal inputs related to user settings and/or function control of the electronic device, and may include, but is not limited to, a mouse, a keyboard, a touch screen, a trackpad, a trackball, a joystick, a microphone, and/or a remote control. The output unit 607 may be any type of device capable of presenting information and may include, but is not limited to, a display, speakers, video/audio output terminals, vibrators, and/or printers. The storage unit 608 may include, but is not limited to, magnetic disks and optical disks. The communication unit 609 allows the device 600 to exchange information/data with other devices through a computer network, such as the Internet, and/or various telecommunications networks, and may include, but is not limited to, modems, network cards, infrared communication devices, wireless communication transceivers and/or chipsets, such as Bluetooth™ devices, 802.11 devices, WiFi devices, WiMax devices, cellular communication devices, and/or the like.

The computing unit 601 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of computing unit 601 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, etc. The computing unit 601 performs the respective methods and processes described above, such as a voice control method. For example, in some embodiments, the voice control method may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as storage unit 608. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 600 via the ROM 602 and/or the communication unit 609. When the computer program is loaded into the RAM 603 and executed by the computing unit 601, one or more steps of the voice control method described above may be performed. Alternatively, in other embodiments, the computing unit 601 may be configured to perform the voice control method by any other suitable means (e.g., by means of firmware).

Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuit systems, field programmable gate arrays (FPGAs), application specific integrated circuits (ASICs), application specific standard products (ASSPs), systems on chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor, and which may receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.

Program code for carrying out methods of the present disclosure may be written in any combination of one or more programming languages. The program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowchart and/or block diagram to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine, or entirely on the remote machine or server.

In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.

The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local area networks (LANs), wide area networks (WANs), and the Internet.

The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.

It should be appreciated that steps may be reordered, added, or deleted using the various forms of flow shown above. For example, the steps recited in the present disclosure may be performed in parallel, sequentially, or in a different order, provided that the desired results of the disclosed technical solutions are achieved, which is not limited herein.

Although embodiments or examples of the present disclosure have been described with reference to the accompanying drawings, it is to be understood that the foregoing methods, systems, and apparatus are merely exemplary embodiments or examples, and that the scope of the present invention is not limited by these embodiments or examples but only by the granted claims and their equivalents. Various elements of the embodiments or examples may be omitted or replaced with equivalent elements thereof. Furthermore, the steps may be performed in a different order than described in the present disclosure. Further, various elements of the embodiments or examples may be combined in various ways. Importantly, as technology evolves, many of the elements described herein may be replaced by equivalent elements that appear after the present disclosure.

Claims

1. A method of voice control, the method comprising:

acquiring first motion information of a mobile terminal;

in response to the first motion information satisfying a preset activation condition, performing at least one verification operation including a state verification operation and a first verification operation, the first verification operation including at least one of a posture verification operation and an image verification operation, the state verification operation including:

acquiring second motion information of the mobile terminal within a first preset time range;

based on a preset rule, determining the state of the mobile terminal according to the second motion information within the first preset time range; and

in response to the state of the mobile terminal being a stationary state, determining that the state verification operation is verified successfully; and

switching a voice transmission mode of an application running on the mobile terminal in response to each of the at least one verification operation being verified successfully.

2. The method of claim 1, wherein the first motion information comprises an acceleration value and the preset activation condition comprises the acceleration value of the mobile terminal being greater than a preset acceleration threshold.

3. The method of claim 1, wherein the at least one verification operation comprises a posture verification operation comprising:

acquiring posture information of the mobile terminal; and

determining that the posture verification operation is verified successfully in response to the posture information satisfying a first verification condition.

4. The method of claim 3, wherein the posture information comprises orientation information of the mobile terminal, and the first verification condition comprises the front of the mobile terminal facing upward.

5. The method of claim 1, wherein the at least one verification operation comprises an image verification operation comprising:

controlling the mobile terminal to perform image information acquisition;

acquiring the image information captured by the mobile terminal; and

determining that the image verification operation is verified successfully in response to the image information satisfying a second verification condition.

6. The method of claim 5, wherein the second verification condition includes the image information containing face information.

7. The method of claim 1, wherein the determining the state of the mobile terminal according to the second motion information within the first preset time range based on the preset rule comprises:

determining that the state of the mobile terminal is a stationary state in response to the value of the second motion information being smaller than a preset threshold within the first preset time range.

8. The method of any one of claims 1 to 7, further comprising:

for each of the at least one verification operation, in response to the verification operation not succeeding within a second preset time range, determining that the verification operation has failed,

wherein the second preset time range is not less than the first preset time range when the at least one verification operation includes a state verification operation.

9. The method of any of claims 1 to 7, wherein the application runs in the background of the mobile terminal.

10. The method of any of claims 1-7, wherein the application comprises a teleconferencing application.

11. The method of claim 10, wherein the switching the voice transmission mode of the application in response to each of the at least one verification operation being verified successfully comprises:

switching the voice transmission mode of the teleconferencing application from a muted mode to an unmuted mode.

12. A voice control apparatus, the apparatus comprising:

an acquisition unit configured to acquire first motion information of a mobile terminal;

a verification unit configured to perform at least one verification operation in response to the first motion information satisfying a preset activation condition, wherein the at least one verification operation includes a state verification operation and a first verification operation, the first verification operation including at least one of a posture verification operation and an image verification operation, the verification unit including:

a third sub-acquisition unit configured to acquire second motion information of the mobile terminal within a first preset time range;

a third sub-determination unit configured to determine a state of the mobile terminal according to the second motion information within the first preset time range based on a preset rule; and

a fourth sub-determination unit configured to determine that the state verification operation is verified successfully in response to the state of the mobile terminal being a stationary state; and

a switching unit configured to switch a voice transmission mode of an application running on the mobile terminal in response to each of the at least one verification operation being verified successfully.

13. The apparatus of claim 12, wherein the first motion information comprises an acceleration value and the preset activation condition comprises the acceleration value of the mobile terminal being greater than a preset acceleration threshold.

14. The apparatus of claim 12, wherein the at least one verification operation comprises a posture verification operation, the verification unit comprising:

a first sub-acquisition unit configured to acquire posture information of the mobile terminal; and

a first sub-determination unit configured to determine that the posture verification operation is verified successfully in response to the posture information satisfying a first verification condition.

15. The apparatus of claim 12, wherein the at least one verification operation comprises an image verification operation, the verification unit comprising:

a first sub-control unit configured to control the mobile terminal to perform image information acquisition;

a second sub-acquisition unit configured to acquire the image information captured by the mobile terminal; and

a second sub-determination unit configured to determine that the image verification operation is verified successfully in response to the image information satisfying a second verification condition.

16. An electronic device, comprising:

a memory, a processor and a computer program stored on the memory,

wherein the processor is configured to execute the computer program to implement the steps of the method of any one of claims 1-11.

17. A non-transitory computer readable storage medium having stored thereon a computer program, wherein the computer program when executed by a processor implements the steps of the method of any of claims 1-11.