CN116416987A

CN116416987A - Server, terminal equipment, voice awakening method and medium

Info

Publication number: CN116416987A
Application number: CN202310342883.3A
Authority: CN
Inventors: 张立泽; 王建君
Original assignee: Hisense Visual Technology Co Ltd
Current assignee: Hisense Visual Technology Co Ltd
Priority date: 2023-03-31
Filing date: 2023-03-31
Publication date: 2023-07-11

Abstract

The disclosure relates to a server, a terminal device, a voice wake-up method and a medium. The server stores a preset instruction weight library, which comprises a plurality of preset instructions and weights corresponding to the preset instructions. The server comprises a first controller configured to: receive a request instruction sent by a terminal device, and determine a target instruction corresponding to the request instruction from the plurality of preset instructions; and, if the weight corresponding to the target instruction is greater than or equal to a preset threshold, determine that the target instruction is an instruction that exempts the voice assistant of the terminal device from having to be woken up by the wake-up keyword, and send a wake-up instruction to the terminal device, wherein the wake-up instruction is used for indicating to wake up the voice assistant of the terminal device. This technical scheme solves the prior-art problem that voice interaction between the user and the terminal device is neither convenient nor fast, and thereby improves the user experience.

Description

Server, terminal equipment, voice awakening method and medium

Technical Field

The disclosure relates to the technical field of voice processing, and in particular relates to a server, terminal equipment, a voice awakening method and a medium.

Background

With the development of artificial intelligence, it has become very common for users to interact with terminal devices by voice, and a voice assistant is generally used to implement this voice interaction function. Specifically, during voice interaction, the user first needs to wake up the voice assistant of the terminal device, that is, wake up the terminal device, by means of a wake-up keyword; once it is determined that the terminal device has been woken up, the terminal device receives and executes a request instruction input by the user, thereby implementing the voice interaction function between the user and the terminal device.

However, in the prior art, the terminal device must be woken up by the wake-up keyword before it receives each request instruction input by the user. As a result, voice interaction between the user and the terminal device is neither convenient nor fast, which affects the user experience.

Disclosure of Invention

In order to solve the above technical problem, or at least partially solve it, the present disclosure provides a server, a terminal device, a voice wake-up method, and a medium. For a request instruction sent by the terminal device, a first controller of the server determines a target instruction corresponding to the request instruction in a preset instruction weight library stored on the server. When the weight corresponding to the target instruction is determined to be greater than or equal to a preset threshold, the target instruction is determined to be an instruction for waking up a voice assistant of the terminal device, and a wake-up instruction is sent to the terminal device so as to wake up its voice assistant. As a result, when the user inputs the next request instruction by voice, the voice assistant does not need to be woken up by a wake-up keyword carried in a wake-up request. This solves the prior-art problem that voice interaction between the user and the terminal device is neither convenient nor fast, and improves the user experience.

In a first aspect, the present disclosure provides a server storing a preset instruction weight library, the preset instruction weight library including a plurality of preset instructions and weights corresponding to the preset instructions; the server comprising:

a first controller configured to:

receiving a request instruction sent by a terminal device, and determining a target instruction corresponding to the request instruction from the plurality of preset instructions;

if the weight corresponding to the target instruction is greater than or equal to a preset threshold, determining that the target instruction is an instruction for waking up a voice assistant of the terminal device, and sending a wake-up instruction to the terminal device, wherein the wake-up instruction is used for indicating to wake up the voice assistant of the terminal device.

As an optional implementation of the embodiments of the present disclosure, the first controller is further configured to:

acquiring preset instructions corresponding to a plurality of history request instructions input by a user respectively;

dividing the plurality of preset instructions into a plurality of first preset instruction sets based on a preset time interval;

deleting target non-voice preset instructions in each first preset instruction set to obtain a plurality of second preset instruction sets, wherein a first preset instruction in each second preset instruction set is acquired according to user voice, and in each first preset instruction set, the ordering of the target non-voice preset instructions is before the first preset instructions;

deleting a non-voice preset instruction from each of the plurality of second preset instruction sets, and splitting the non-voice preset instruction to obtain a plurality of third preset instruction sets;

determining a target preset instruction set by combining the plurality of third preset instruction sets, wherein the plurality of preset instructions included in the target preset instruction set are acquired according to user voices;

determining the weight of the preset instruction corresponding to each request operation based on the occurrence frequency and the weight factor of the preset instruction corresponding to each request operation in the target preset instruction set so as to obtain the weight corresponding to each preset instruction;

and determining the preset instruction weight library according to the preset instructions and the weight of each preset instruction.
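By way of a non-limiting illustration, the set-construction steps above might be sketched in Python as follows. The record fields, the function names, and in particular the grouping rule (a new first set starts when the gap to the previous instruction exceeds the preset time interval) and the splitting rule (a set is cut at each deleted non-voice instruction) are assumptions introduced here for illustration; the text does not fix these details.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Record:
    name: str         # preset instruction label (hypothetical, e.g. "play_music")
    timestamp: float  # time of the history request, in seconds
    is_voice: bool    # True if the instruction was acquired from user voice


def group_by_interval(records: List[Record], max_gap: float) -> List[List[Record]]:
    """First sets: assumed rule, a gap larger than max_gap starts a new set."""
    groups: List[List[Record]] = []
    for rec in sorted(records, key=lambda r: r.timestamp):
        if groups and rec.timestamp - groups[-1][-1].timestamp <= max_gap:
            groups[-1].append(rec)
        else:
            groups.append([rec])
    return groups


def drop_leading_non_voice(group: List[Record]) -> List[Record]:
    """Second set: remove non-voice instructions ordered before the first voice one."""
    i = 0
    while i < len(group) and not group[i].is_voice:
        i += 1
    return group[i:]


def split_on_non_voice(group: List[Record]) -> List[List[Record]]:
    """Third sets: delete remaining non-voice instructions and cut the set there."""
    parts: List[List[Record]] = []
    current: List[Record] = []
    for rec in group:
        if rec.is_voice:
            current.append(rec)
        elif current:
            parts.append(current)
            current = []
    if current:
        parts.append(current)
    return parts


def build_target_set(records: List[Record], max_gap: float) -> List[Record]:
    """Target set: combination of all third sets; contains voice instructions only."""
    target: List[Record] = []
    for first_set in group_by_interval(records, max_gap):
        second_set = drop_leading_non_voice(first_set)
        for third_set in split_on_non_voice(second_set):
            target.extend(third_set)
    return target
```

In this sketch the target set is returned as one flat list; an implementation that needs the boundaries between third sets later could return the list of third sets instead.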

As an optional implementation manner of the embodiment of the disclosure, the first controller is specifically configured to:

for the preset instruction corresponding to each request operation in the target preset instruction set, acquiring a first total number of occurrences of the preset instruction in the target preset instruction set and a second total number of occurrences of the preset instruction among the preset instructions corresponding to the plurality of history request instructions, and calculating the quotient of the first total number and the second total number to obtain the occurrence frequency of the preset instruction;

acquiring a first time stamp corresponding to each preset instruction and a second time stamp corresponding to a next preset instruction adjacent to the preset instruction, and determining a weight factor of the preset instruction according to the first time stamp, the second time stamp, a preset mapping table and the first total number;

and performing product operation on the occurrence frequency and the weight factor to obtain a weight corresponding to the preset instruction.
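As a hedged illustration of the calculation above, the following Python sketch computes the occurrence frequency as the quotient of the two totals and multiplies it by a weight factor. The text does not specify how the time stamps, the preset mapping table and the first total number are combined into the weight factor; the rule used here (map each gap between the first and second time stamps to a score through the table, then divide the summed scores by the first total number) and the table values are assumptions made purely for illustration.

```python
from typing import List, Sequence, Tuple

# Hypothetical preset mapping table: (upper bound of the gap in seconds, score).
MAPPING_TABLE: Sequence[Tuple[float, float]] = ((10.0, 1.0), (30.0, 0.6), (60.0, 0.3))


def gap_score(gap: float, table: Sequence[Tuple[float, float]] = MAPPING_TABLE) -> float:
    """Score for the gap between an instruction and the next adjacent instruction."""
    for upper_bound, score in table:
        if gap <= upper_bound:
            return score
    return 0.0  # gaps beyond the table contribute nothing (assumed)


def occurrence_frequency(first_total: int, second_total: int) -> float:
    """Quotient of the count in the target set and the count in all history."""
    return first_total / second_total if second_total else 0.0


def weight_factor(gaps: List[float], first_total: int) -> float:
    """Assumed rule: summed gap scores divided by the first total number."""
    if first_total == 0:
        return 0.0
    return sum(gap_score(g) for g in gaps) / first_total


def instruction_weight(first_total: int, second_total: int, gaps: List[float]) -> float:
    """Weight = occurrence frequency x weight factor, as stated in the text."""
    return occurrence_frequency(first_total, second_total) * weight_factor(gaps, first_total)
```

For example, an instruction that appears 4 times in the target set and 5 times in the whole history, with gaps of 5 s, 20 s and 8 s to the following instructions, would under these assumptions have frequency 0.8 and weight factor 0.65.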

As an optional implementation manner of the embodiment of the disclosure, the first controller is specifically further configured to:

acquiring a plurality of history request instructions input by a user according to preset conditions;

determining a preset instruction corresponding to each history request instruction;

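wherein the preset conditions include at least one of a preset scene and a preset time period.

As an optional implementation of the embodiments of the present disclosure, the first controller is specifically configured to: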

determining a first instruction corresponding to a request instruction sent by terminal equipment;

according to the first instruction and the user identification information corresponding to the first instruction, matching a target instruction corresponding to the request instruction in the plurality of preset instructions, and determining a weight corresponding to the target instruction; or

matching a target instruction corresponding to the request instruction in the plurality of preset instructions according to the first instruction, the user identification information corresponding to the first instruction and the preset information corresponding to the first instruction, and determining a weight corresponding to the target instruction, wherein the preset information comprises at least one of a preset time period and a preset scene.
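The two matching alternatives above could be sketched, under stated assumptions, as a single lookup in which the preset information is optional. The `Entry` fields, the string labels for scene and time period, and the rule that an entry without preset information matches any context while a more specific entry is preferred, are all hypothetical choices made for this illustration.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Entry:
    instruction: str                   # preset instruction label (hypothetical)
    user_id: str                       # user identification information
    weight: float
    scene: Optional[str] = None        # preset scene, e.g. "song_playing"
    time_period: Optional[str] = None  # preset time period label, e.g. "evening"


def match_target(
    library: Iterable[Entry],
    first_instruction: str,
    user_id: str,
    scene: Optional[str] = None,
    time_period: Optional[str] = None,
) -> Optional[Entry]:
    """Return the best-matching entry, or None if nothing matches."""
    best: Optional[Entry] = None
    best_specificity = -1
    for entry in library:
        if entry.instruction != first_instruction or entry.user_id != user_id:
            continue
        # An entry that names a scene or period must agree with the request context.
        if entry.scene is not None and entry.scene != scene:
            continue
        if entry.time_period is not None and entry.time_period != time_period:
            continue
        specificity = (entry.scene is not None) + (entry.time_period is not None)
        if specificity > best_specificity:
            best, best_specificity = entry, specificity
    return best
```

Calling `match_target` with only the instruction and user identifier corresponds to the first alternative; passing a scene or time period as well corresponds to the second.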

As an optional implementation of the embodiments of the present disclosure, the first controller is further configured to:

if the weight corresponding to the target instruction is smaller than the preset threshold, determining that the target instruction is not an instruction for waking up a voice assistant, and sending a closing instruction to the terminal device, wherein the closing instruction is used for indicating to close a preset program of the terminal device.
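A minimal sketch of the threshold decision described for the first controller is given below. The message strings, the dictionary representation of the weight library, and the handling of an instruction that is absent from the library are assumptions for illustration only.

```python
from typing import Dict, Optional

WAKE_UP = "wake_up"  # hypothetical message: wake up the voice assistant
CLOSE = "close"      # hypothetical message: close the preset program


def decide(weights: Dict[str, float], target_instruction: str, threshold: float) -> Optional[str]:
    """Choose the instruction the server sends back to the terminal device."""
    weight = weights.get(target_instruction)
    if weight is None:
        # No matching preset instruction; the text does not say what happens here.
        return None
    # Greater than or equal to the threshold: keep the assistant awake.
    return WAKE_UP if weight >= threshold else CLOSE
```

With a library such as `{"play_music": 0.9, "what_time": 0.2}` and a threshold of 0.5, a "play_music" request would lead to a wake-up instruction and a "what_time" request to a closing instruction.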

In a second aspect, the present disclosure provides a terminal device, including:

a second controller configured to:

waking up a voice assistant of the terminal device in response to a wake-up request input by a user, wherein the wake-up request carries a wake-up keyword, and the wake-up keyword is used for waking up the voice assistant of the terminal device;

after the voice assistant of the terminal device is woken up, in response to a request instruction input by the user, sending the request instruction to a server;

waking up the voice assistant of the terminal device in response to a wake-up instruction sent by the server.
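The behaviour of the second controller can be pictured as a small state machine. The following Python sketch is only one possible reading: the class and method names are hypothetical, and it is an assumption that the voice assistant returns to the sleeping state after forwarding a request unless the server sends a wake-up instruction.

```python
from typing import Callable


class TerminalController:
    """Illustrative stand-in for the second controller of the terminal device."""

    def __init__(self, wake_keyword: str, send_to_server: Callable[[str], None]) -> None:
        self.wake_keyword = wake_keyword
        self.send_to_server = send_to_server
        self.awake = False

    def on_user_speech(self, text: str) -> None:
        if not self.awake:
            # While asleep, only a wake-up request carrying the keyword has an effect.
            if self.wake_keyword in text:
                self.awake = True
            return
        # While awake, forward the request instruction to the server.
        self.send_to_server(text)
        self.awake = False  # assumed: sleep again until re-woken

    def on_server_message(self, message: str) -> None:
        # A wake-up instruction from the server wakes the assistant without the keyword.
        if message == "wake_up":
            self.awake = True
```

In this reading, a user who says "XX wizard", then "I want to listen to music", can follow up with "help me turn up the sound" without repeating the keyword, provided the server answered the first request with a wake-up instruction.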

In a third aspect, the present disclosure provides a voice wake-up method, applied to a server, where the server stores a preset instruction weight library, where the preset instruction weight library includes: a plurality of preset instructions and weights corresponding to the preset instructions; comprising the following steps:

In a fourth aspect, the present disclosure provides a voice wake-up method, applied to a terminal device, including:

In a fifth aspect, the present disclosure provides a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the voice wake-up method of the third aspect or the fourth aspect.

Compared with the prior art, the technical scheme provided by the embodiment of the disclosure has the following advantages:

In the technical scheme provided by the embodiments of the present disclosure, a preset instruction weight library is stored on a server, and the preset instruction weight library comprises a plurality of preset instructions and weights corresponding to the preset instructions. A first controller of the server receives a request instruction sent by a terminal device, and determines a target instruction corresponding to the request instruction from the plurality of preset instructions; if the weight corresponding to the target instruction is greater than or equal to a preset threshold, the first controller determines that the target instruction is an instruction for waking up a voice assistant of the terminal device, and sends a wake-up instruction to the terminal device, wherein the wake-up instruction is used for indicating to wake up the voice assistant of the terminal device. In this way, the voice assistant of the terminal device is woken up by the server, so that when the user inputs the next request instruction by voice, the voice assistant does not need to be woken up by the wake-up keyword carried in a wake-up request. This solves the prior-art problem that voice interaction between the user and the terminal device is neither convenient nor fast, and improves the user experience.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the disclosure and together with the description, serve to explain the principles of the disclosure.

In order to more clearly illustrate the embodiments of the present disclosure or the solutions in the prior art, the drawings that are required for the description of the embodiments or the prior art will be briefly described below, and it will be obvious to those skilled in the art that other drawings can be obtained from these drawings without inventive effort.

Fig. 1 is a schematic diagram of a scenario architecture of a voice wake-up method according to an embodiment of the present disclosure;

Fig. 2 is a hardware configuration block diagram of a terminal device 200 according to one or more embodiments of the present disclosure;

Fig. 3 is a schematic diagram of a software configuration in a terminal device 200 according to one or more embodiments of the present disclosure;

Fig. 4 is a system framework diagram for voice wake-up according to one or more embodiments of the present disclosure;

Fig. 5 is a flowchart of a voice wake-up method according to an embodiment of the present disclosure;

Fig. 6 is an interaction schematic diagram of a voice wake-up method according to an embodiment of the present disclosure;

Fig. 7 is a flowchart of another voice wake-up method according to an embodiment of the present disclosure;

Fig. 8 is a flowchart of another voice wake-up method according to an embodiment of the present disclosure;

Fig. 9 is a flowchart of another voice wake-up method according to an embodiment of the present disclosure;

Fig. 10 is an interaction schematic diagram of another voice wake-up method according to an embodiment of the present disclosure;

Fig. 11 is a flowchart of yet another voice wake-up method according to an embodiment of the present disclosure;

Fig. 12 is an interaction schematic diagram of yet another voice wake-up method according to an embodiment of the present disclosure.

Detailed Description

In order that the above objects, features and advantages of the present disclosure may be more clearly understood, a further description of aspects of the present disclosure will be provided below. It should be noted that, without conflict, the embodiments of the present disclosure and features in the embodiments may be combined with each other.

In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present disclosure, but the present disclosure may be practiced otherwise than as described herein; it will be apparent that the embodiments in the specification are only some, but not all, embodiments of the disclosure.

The terms "first" and "second" and the like in this disclosure are used to distinguish between different objects and are not used to describe a particular order of objects. For example, the first processing result and the second processing result, etc., are used to distinguish between different processing results, not to describe a particular order of processing results.

At present, a user interacts by voice with terminal devices such as mobile phones and smart televisions through a voice assistant. Specifically, during voice interaction between the user and the terminal device, the voice assistant of the terminal device first needs to be woken up by a wake-up keyword, so that the terminal device is woken up; after being woken up, the terminal device receives and executes a request instruction input by the user, thereby implementing the voice interaction function between the user and the terminal device.

By way of example, Fig. 1 is a schematic diagram of a scenario architecture of a voice wake-up method provided by an embodiment of the present disclosure. The scenario architecture includes a server 100 and a terminal device 200. The terminal device 200 may take various forms, for example a smart speaker, a television, a mobile phone, a personal computer, a smart television, a display, an electronic whiteboard, an electronic desktop, and the like. The user wakes up the terminal device 200 with a wake-up keyword such as "XX wizard". After receiving the wake-up keyword "XX wizard", the voice assistant of the terminal device 200 replies to the user with "I'm here, master", which confirms that the voice assistant has been woken up and that the terminal device 200 is in a wake-up state. The terminal device 200 then receives and executes a request instruction input by the user's voice, such as "I want to listen to music", and plays music for the user. If the user wants to turn up the volume while listening to the music, the user must again wake up the voice assistant of the terminal device 200 with the wake-up keyword "XX wizard" before inputting the request instruction "help me turn up the sound" by voice.

However, because the user must wake up the terminal device with the wake-up keyword before the terminal device receives each request instruction, voice interaction between the user and the terminal device is neither convenient nor fast, which affects the user experience.

In order to solve the above-mentioned problems, an embodiment of the present disclosure provides a voice wake-up method applied to a server, where the server stores a preset instruction weight library, and the preset instruction weight library includes a plurality of preset instructions and weights corresponding to the preset instructions. A first controller of the server receives a request instruction sent by a terminal device, and determines a target instruction corresponding to the request instruction from the plurality of preset instructions; if the weight corresponding to the target instruction is greater than or equal to a preset threshold, the first controller determines that the target instruction is an instruction for waking up a voice assistant of the terminal device, and sends a wake-up instruction to the terminal device, wherein the wake-up instruction is used for indicating to wake up the voice assistant of the terminal device. In this way, the voice assistant of the terminal device is woken up by the server, so that when the user inputs the next request instruction by voice, the voice assistant does not need to be woken up by the wake-up keyword carried in a wake-up request. This solves the prior-art problem that voice interaction between the user and the terminal device is neither convenient nor fast, and improves the user experience.

In some embodiments, the terminal device 200 may be in data communication with the server 100 upon receiving a voice command from a user. The terminal device 200 may establish a communication connection with the server 100 through a Local Area Network (LAN) or a Wireless Local Area Network (WLAN).

The server 100 may be a server providing various services, such as a server providing support for audio data collected by the terminal device 200. The server may perform analysis and other processing on the received data such as audio, and feed back the processing result (e.g., endpoint information) to the terminal device. The server 100 may be a server cluster, or may be a plurality of server clusters, and may include one or more types of servers.

The voice wake-up method provided by the embodiment of the present disclosure may be executed by the server 100, may be executed by the terminal device 200, or may be executed by both the server 100 and the terminal device 200, which is not limited in this disclosure.

In some embodiments, the terminal device 200 may also be controlled by a control device. The control device may be a remote controller, and the communication between the remote controller and the terminal device 200 may include infrared protocol communication, Bluetooth protocol communication, or other wireless or wired modes, by which the terminal device 200 can be controlled. The user can control the terminal device 200 by inputting user instructions through keys on the remote controller, voice input, control panel input, and the like. For example, the user can input corresponding control instructions through the volume up and down keys, menu keys, power key and the like on the remote controller to control the terminal device 200.

Fig. 2 is a hardware configuration block diagram of a terminal device 200 according to one or more embodiments of the present disclosure. The terminal device 200 shown in Fig. 2 includes at least one of a modem 210, a communicator 220, a detector 230, an external device interface 240, a second controller 250, a display 260, an audio output interface 270, a memory, a power supply, and a user interface (i.e., a user input interface) 280. The second controller 250 includes a central processor, a video processor, an audio processor, a graphics processor, a RAM, a ROM, and first to nth interfaces for input/output. The display 260 may be at least one of a liquid crystal display, an OLED display, a touch display, and a projection display, and may also be a projection device and a projection screen. The modem 210 receives broadcast television signals in a wired or wireless manner, and demodulates audio and video signals, such as EPG data signals, from a plurality of wireless or wired broadcast television signals. The communicator 220 is a component for communicating with external devices or servers according to various communication protocol types. For example, the communicator may include at least one of a Wi-Fi module, a Bluetooth module, a wired Ethernet module, another network communication protocol chip or near field communication protocol chip, and an infrared receiver. The terminal device 200 may transmit and receive control signals and data signals to and from an external control device or the server 100 through the communicator 220. The detector 230 is used to collect signals from the external environment or from interaction with the outside. The second controller 250 and the modem 210 may be located in separate devices; that is, the modem 210 may also be located in a device external to the main device where the second controller 250 is located, such as an external set-top box. The user interface 280 may be used to receive control signals from a control device, such as an infrared remote controller.

In some embodiments, the second controller 250 controls the operation of the terminal device and responds to the user's operations through various software control programs stored on the memory. The second controller 250 controls the overall operation of the terminal device 200. The user may input a user command through a Graphical User Interface (GUI) displayed on the display 260, and the user input interface receives the user input command through the Graphical User Interface (GUI). Alternatively, the user may input the user command by inputting a specific sound or gesture, and the user input interface recognizes the sound or gesture through the sensor to receive the user input command.

In some embodiments, a "user interface" is a medium interface for interaction and exchange of information between an application or operating system and a user, which enables conversion between an internal form of information and a form acceptable to the user. A commonly used presentation form of the user interface is a graphical user interface (Graphical User Interface, GUI for short), which refers to a user interface related to computer operations that is displayed in a graphical manner. It may be an interface element such as an icon, a window, or a control displayed on a display screen of the electronic device, where the control may include at least one visual interface element such as an icon, a button, a menu, a tab, a text box, a dialog box, a status bar, a navigation bar, a widget, and the like.

Fig. 3 is a schematic diagram of a software configuration in a terminal device 200 according to one or more embodiments of the present disclosure. As shown in Fig. 3, the system is divided into four layers, which are, from top to bottom: an application layer, an application framework layer, an Android runtime and system library layer (system runtime layer), and a kernel layer.

In some embodiments, at least one application program is running in the application program layer, and these application programs may be a Window (Window) program of an operating system, a system setting program, a clock program, or the like; or may be an application developed by a third party developer. In particular implementations, applications in the application layer include, but are not limited to, the examples above.

In some embodiments, the system runtime layer provides support for the layer above it, namely the framework layer; when the framework layer is in use, the Android operating system runs the C/C++ libraries contained in the system runtime layer to implement the functions required by the framework layer.

In some embodiments, the kernel layer is a layer between hardware and software, containing at least one of the following drivers: an audio driver, a display driver, a Bluetooth driver, a camera driver, a Wi-Fi driver, a USB driver, an HDMI driver, sensor drivers (e.g., for a fingerprint sensor, a temperature sensor, a pressure sensor, etc.), a power supply driver, and the like.

In some embodiments, a preset instruction weight library is stored in the server 100, where the preset instruction weight library includes: a plurality of preset instructions and weights corresponding to the preset instructions; the server 100:

a first controller configured to:

In some embodiments, the first controller is further configured to:

In some embodiments, the first controller is specifically configured to:

In some embodiments, the first controller is specifically further configured to:

In some embodiments, the first controller is specifically configured to:

In some embodiments, the first controller is further configured to:

In summary, according to the present disclosure, the above voice wake-up method is executed on a server, and the server stores a preset instruction weight library, where the preset instruction weight library includes a plurality of preset instructions and weights corresponding to the preset instructions. A first controller of the server receives a request instruction sent by a terminal device, and determines a target instruction corresponding to the request instruction from the plurality of preset instructions; if the weight corresponding to the target instruction is greater than or equal to a preset threshold, the first controller determines that the target instruction is an instruction for waking up a voice assistant of the terminal device, that is, an instruction that avoids the need to wake up the voice assistant again, and sends a wake-up instruction to the terminal device, wherein the wake-up instruction is used for indicating to wake up the voice assistant of the terminal device. In this way, the voice assistant of the terminal device is woken up by the server, so that when the user inputs the next request instruction by voice, the voice assistant does not need to be woken up by the wake-up keyword carried in a wake-up request. This solves the prior-art problem that voice interaction between the user and the terminal device is neither convenient nor fast, and improves the user experience.

Fig. 4 is a system framework diagram for voice wake-up according to one or more embodiments of the present disclosure. As shown in Fig. 4, the system may include a target instruction determination module 401 and a wake-up module 402. The target instruction determination module 401 is configured to receive a request instruction sent by a terminal device, and determine a target instruction corresponding to the request instruction from a plurality of preset instructions. The wake-up module 402 is configured to, if the weight corresponding to the target instruction is greater than or equal to a preset threshold, determine that the target instruction is an instruction for waking up a voice assistant of the terminal device, that is, an instruction that avoids the need to wake up the voice assistant again, and send a wake-up instruction to the terminal device, where the wake-up instruction is used to instruct waking up the voice assistant of the terminal device. In this way, the voice assistant of the terminal device is woken up by the server, so that when the user inputs the next request instruction by voice, the voice assistant does not need to be woken up by the wake-up keyword carried in a wake-up request. This solves the prior-art problem that voice interaction between the user and the terminal device is neither convenient nor fast, and improves the user experience.

To describe the present solution in more detail, an example is given below with reference to fig. 5. It will be understood that, in actual implementation, the steps involved in fig. 5 may include more or fewer steps, and the order of the steps may differ, as long as the voice wake-up method provided in the embodiments of the present disclosure can be implemented; the present disclosure is not limited in this respect.

Fig. 5 is a flowchart of a voice wake-up method according to an embodiment of the present disclosure, and fig. 6 is an interaction schematic diagram of the voice wake-up method. This embodiment is applied to the server side, and the server stores a preset instruction weight library, where the preset instruction weight library includes a plurality of preset instructions and weights corresponding to the preset instructions. As shown in fig. 5, the voice wake-up method specifically includes the following steps:

S51, receiving a request instruction sent by the terminal device, and determining a target instruction corresponding to the request instruction from a plurality of preset instructions.

The request instruction is an instruction that the user inputs by voice when using the terminal device; after receiving it, the terminal device sends it to the server. By way of example, the request instruction may be "I want to listen to music", "turn up the play sound", etc., but is not limited thereto, and the present disclosure is not particularly limited.

The plurality of preset instructions are instructions stored in the preset instruction weight library. Each preset instruction corresponds to a weight in the preset instruction weight library, and the weights are determined according to the manner in which the user inputs request instructions when using the terminal device. Further, the preset instruction weight library also includes user identification information, a preset scene, a preset time period and location information. The user identification information includes, but is not limited to, a user voiceprint identifier; the preset scene may be, for example, a video search scene or a song playing scene; the preset time period may be, for example, 7 pm to 10 pm, a period during which the user frequently uses the terminal device; and the location information includes, but is not limited to, XX province XX region. The present disclosure is not particularly limited, and a person skilled in the art can set these according to practical situations.

Specifically, after receiving a request instruction input by a user, the terminal device sends the request instruction to the server, and after receiving the request instruction sent by the terminal device, the first controller of the server determines a target instruction corresponding to the request instruction from a plurality of preset instructions included in a preset instruction weight library stored in the server in advance.

Fig. 7 is a schematic flow chart of another voice wake-up method provided by an embodiment of the present disclosure, based on the embodiment shown in fig. 5. As shown in fig. 7, one possible implementation of S51 may be:

S71, for the request instruction sent by the terminal device, determining a first instruction corresponding to the request instruction.

Specifically, after receiving a request instruction sent by a terminal device, a first controller of the server identifies the request instruction and determines a first instruction corresponding to the request instruction.

S72a, matching, among the plurality of preset instructions, the target instruction corresponding to the request instruction according to the first instruction and the user identification information corresponding to the first instruction, and determining the weight corresponding to the target instruction.

The user identification information is unique identification information that identifies the user; for example, it may be user voiceprint identifier 1. One way to obtain the user identification information is for the request instruction to carry it when the terminal device sends the request instruction to the server, but the present disclosure is not limited thereto, and a person skilled in the art may set this according to actual situations.

Specifically, the first controller of the server matches among a plurality of preset instructions according to a first instruction corresponding to the request instruction and user identification information corresponding to the first instruction, determines a preset instruction consistent with the first instruction and the user identification information corresponding to the first instruction as a target instruction corresponding to the request instruction, and obtains a weight corresponding to the target instruction.

For example, continuing the foregoing embodiment, for the user's request instruction "I want to listen to music", the user identification information is user voiceprint identifier 1, and the identification module identifies the first instruction corresponding to the request instruction "I want to listen to music" as "music search". After the first instruction is determined to be "music search", matching is performed among the plurality of preset instructions according to the first instruction "music search" and user voiceprint identifier 1; the preset instruction consistent with both the first instruction and user voiceprint identifier 1 is determined to be the target instruction, and the weight corresponding to the target instruction is obtained as 0.6. The present disclosure is not limited thereto, and those skilled in the art can set this according to practical situations.
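The matching in S72a can be sketched as a lookup on the pair (first instruction, user identification information). This is a minimal illustration, not the disclosed implementation: the `PresetEntry` record, the `match_target` helper, the identifier strings and the second entry's weight of 0.2 are all hypothetical; only "music search", voiceprint identifier 1 and the weight 0.6 come from the example above.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class PresetEntry:
    """One row of a hypothetical preset instruction weight library."""
    instruction: str  # preset instruction, e.g. "music search"
    user_id: str      # user identification information, e.g. a voiceprint identifier
    weight: float


def match_target(library: Iterable[PresetEntry],
                 first_instruction: str,
                 user_id: str) -> Optional[PresetEntry]:
    """Return the entry consistent with both the first instruction and the user id."""
    for entry in library:
        if entry.instruction == first_instruction and entry.user_id == user_id:
            return entry
    return None  # no preset instruction matches


library = [
    PresetEntry("music search", "voiceprint-1", 0.6),
    PresetEntry("music search", "voiceprint-2", 0.2),  # illustrative value only
]
target = match_target(library, "music search", "voiceprint-1")
```

Here `target.weight` is the weight that S52 later compares against the preset threshold.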

Optionally, in some embodiments of the present disclosure, the user's behavior habits when interacting with the terminal device differ across preset scenes and preset time periods, and the preset instruction weight library is obtained from behavior data of the user interacting with the terminal device. Therefore, in order to determine more accurately whether the user's request instruction is a wake-free instruction for the voice assistant of the terminal device, the target instruction corresponding to the request instruction may also be determined from the plurality of preset instructions in another manner, as shown in fig. 7:

S72b, matching a target instruction corresponding to the request instruction in a plurality of preset instructions according to the first instruction, the user identification information corresponding to the first instruction and the preset information corresponding to the first instruction, and determining a weight corresponding to the target instruction.

The preset information comprises at least one of a preset time period and a preset scene.

Specifically, the first controller of the server matches among a plurality of preset instructions according to a first instruction corresponding to the request instruction, user identification information corresponding to the first instruction, and preset information corresponding to the first instruction, such as a preset time period or a preset scene, determines a target instruction corresponding to the request instruction, and obtains a weight corresponding to the target instruction.
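As a rough sketch of S72b, the match can additionally filter on the preset scene and the preset time period. The dictionary layout, the function name `match_with_preset_info` and the weights below are assumptions made for illustration; the 7 pm to 10 pm period and the scene names echo the examples given earlier in the description.

```python
from datetime import time

# Hypothetical library rows; weights are illustrative values only.
library = [
    {"instruction": "music search", "user_id": "voiceprint-1",
     "scene": "song playing", "period": (time(19, 0), time(22, 0)), "weight": 0.7},
    {"instruction": "music search", "user_id": "voiceprint-1",
     "scene": "video search", "period": (time(19, 0), time(22, 0)), "weight": 0.3},
]


def match_with_preset_info(library, instruction, user_id, scene=None, at=None):
    """Match on instruction and user id, plus scene and/or time of day if given."""
    for entry in library:
        if entry["instruction"] != instruction or entry["user_id"] != user_id:
            continue
        if scene is not None and entry["scene"] != scene:
            continue
        if at is not None:
            start, end = entry["period"]
            if not (start <= at <= end):
                continue
        return entry
    return None


hit = match_with_preset_info(library, "music search", "voiceprint-1",
                             scene="song playing", at=time(20, 30))
```

Passing only `scene` or only `at` corresponds to the preset information comprising "at least one of" a preset scene and a preset time period.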

In the technical scheme provided by the embodiment of the disclosure, the target instruction is determined among the plurality of preset instructions either according to the first instruction and its corresponding user identification information, or according to the first instruction, its corresponding user identification information and the preset information. This improves the accuracy of obtaining the target instruction, spares the user from repeatedly waking up the voice assistant of the terminal device with the wake-up keyword, and improves user experience.

S52, if the weight corresponding to the target instruction is greater than or equal to a preset threshold, determining that the target instruction is a wake-free instruction for the voice assistant of the terminal device, and sending a wake-up instruction to the terminal device.

The wake-up instruction is used for indicating to wake up the voice assistant of the terminal device. The preset threshold is a parameter set for determining whether the request instruction sent by the terminal device and received by the server is a wake-free instruction for the voice assistant of the terminal device. The preset threshold may be, for example, 0.5, but is not limited thereto; the present disclosure is not particularly limited, and it may be set by a person skilled in the art according to practical situations.

A wake-free instruction for the voice assistant of the terminal device means the following: if, after inputting a request instruction on the terminal device, the user would still need to wake up the voice assistant of the terminal device and input a next request instruction by voice, then the current request instruction input by the user is determined to be a wake-free instruction. In that case, so that the user does not have to wake up the voice assistant again with the wake-up keyword before inputting the next request instruction, the server sends a wake-up instruction to the terminal device to wake up the voice assistant of the terminal device.

For example, continuing the above embodiment, the user wakes up the voice assistant of the terminal device with the wake-up keyword "XX wizard" and inputs the request instruction "I want to listen to music" by voice. If the user would then still need to wake up the voice assistant again in order to input the next request instruction "turn up the play sound" by voice, the request instruction "I want to listen to music" is determined to be a wake-free instruction for the voice assistant of the terminal device. The present disclosure is not limited thereto, and those skilled in the art can set this according to practical situations.

Specifically, the first controller of the server judges whether the weight corresponding to the target instruction is greater than or equal to a preset threshold, and when the weight corresponding to the target instruction is determined to be greater than or equal to the preset threshold, the first controller of the server determines that the target instruction is an instruction for avoiding waking up the voice assistant of the terminal device, and sends a wake-up instruction for waking up the voice assistant of the terminal device to the terminal device so as to wake up the voice assistant of the terminal device.
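The check in S52 reduces to a single comparison. The sketch below uses the example threshold 0.5 and the example weight 0.6 from the text; the function and constant names are hypothetical.

```python
PRESET_THRESHOLD = 0.5  # example value from the description; configurable in practice


def is_wake_free(weight: float, threshold: float = PRESET_THRESHOLD) -> bool:
    """True when the target instruction counts as a wake-free instruction (S52)."""
    return weight >= threshold


# Weight 0.6 from the "music search" example: the server would send a wake-up instruction.
should_send_wake_up = is_wake_free(0.6)
```

Note that the comparison is inclusive, so a weight exactly equal to the threshold also triggers the wake-up instruction.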

Fig. 8 is a flowchart of another voice wake-up method according to an embodiment of the present disclosure, based on the embodiment shown in fig. 7. As shown in fig. 8, before S51 is executed, the method further includes:

S81, acquiring preset instructions corresponding to a plurality of history request instructions input by a user respectively.

The plurality of history request instructions may be input through a user's voice, may be input through a key on a control device such as a remote controller, or may be input through a mode of touching a virtual key on a display screen, but are not limited thereto, the disclosure is not particularly limited, and those skilled in the art may set according to actual situations.

Each preset instruction corresponds to a history request instruction input by the user. For example, continuing the above embodiment, if the history request instruction is "I want to listen to music", the corresponding preset instruction is "music search"; the history request instruction may also be, for example, "amplify sound" input while music is currently being played. The present disclosure is not limited thereto, and those skilled in the art can set the preset instructions according to practical situations.

Specifically, a first controller of the server acquires a preset instruction corresponding to each history request instruction in a plurality of history request instructions input by a user, so as to obtain a plurality of preset instructions.

Optionally, based on the foregoing embodiments, in some embodiments of the present disclosure, a user's habits of inputting request instructions differ across preset scenes, such as a video search scene and a music playing scene, and across preset time periods. Therefore, in order to improve the accuracy of the preset instruction weight library obtained from the request instructions input by the user, one implementation of S81 may be:

S811, acquiring a plurality of history request instructions input by a user according to preset conditions.

The preset conditions comprise: at least one of a preset scene and a preset time period.

S812, for each history request instruction, determining a preset instruction corresponding to the history request instruction.

Specifically, the first controller of the server obtains a plurality of history request instructions input by a user according to preset conditions such as a preset scene, a preset time period, or a preset scene and a preset time period, and identifies each history request instruction to determine a preset instruction corresponding to each history request instruction.

Each history request instruction may be identified, and its corresponding preset instruction determined, by a trained identification model; for the specific identification process, reference may be made to the prior art, and it is not repeated in this disclosure.

In the technical scheme provided by the embodiment of the disclosure, in the above-mentioned process, since the history request instruction is acquired according to the preset conditions such as the preset scene and/or the preset time period, and the preset instruction corresponding to the history request instruction is determined, the operation habit of the user for inputting the request instruction in different preset scenes and/or different preset time periods can be considered, so that the accuracy of respectively corresponding weights of a plurality of preset instructions in the preset weight library obtained later is improved.

S82, dividing the plurality of preset instructions into a plurality of first preset instruction sets based on a preset time interval.

The preset time interval is a parameter set for grouping a plurality of preset instructions, and the plurality of preset instructions are grouped according to the preset time interval to obtain a plurality of first preset instruction sets, where the preset instructions included in each first preset instruction set are continuous instructions, and the preset time interval may be, for example, 60 seconds or 120 seconds, but is not limited thereto, and the present disclosure is not particularly limited thereto, and a person skilled in the art may set according to actual situations.

For example, with a preset time interval of 60 seconds, the plurality of preset instructions are grouped to obtain a plurality of first preset instruction sets, for example: first preset instruction set 1, first preset instruction set 2, first preset instruction set 3 and first preset instruction set 4. Within one set, the preset instructions, for example preset instructions 1, 2 and 3, are considered to be continuous instructions; that is, the user inputs preset instruction 2 shortly after inputting preset instruction 1, and inputs preset instruction 3 shortly after inputting preset instruction 2. The present disclosure is not limited thereto, and those skilled in the art can set this according to practical situations.

It should be noted that each first preset instruction set also includes, for each preset instruction, the corresponding user identification information and the timestamp at which the user input the request instruction. For example, a preset instruction may be recorded as: {user identification information: user voiceprint identifier 1, preset instruction: music search, acquisition mode: user voice, timestamp: 1664767295189}, but is not limited thereto; the present disclosure is not particularly limited, and this may be set by those skilled in the art according to actual circumstances.
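One way to picture the grouping in S82 is the sketch below. It assumes that timestamps are in milliseconds (as the example value 1664767295189 suggests) and that a new set starts whenever the gap to the previous instruction exceeds the preset time interval; the description does not spell out the exact grouping rule, so this rule, the record fields and the function name are assumptions.

```python
def group_by_interval(records, interval_ms=60_000):
    """Split time-ordered records into sets of 'continuous' instructions.

    A record joins the current set when it arrives within `interval_ms`
    of the previous record; otherwise it starts a new set.
    """
    groups = []
    for rec in sorted(records, key=lambda r: r["timestamp"]):
        if groups and rec["timestamp"] - groups[-1][-1]["timestamp"] <= interval_ms:
            groups[-1].append(rec)
        else:
            groups.append([rec])
    return groups


records = [
    {"instruction": "music search", "mode": "voice", "timestamp": 1664767295189},
    {"instruction": "volume up", "mode": "voice", "timestamp": 1664767305189},     # 10 s later
    {"instruction": "video search", "mode": "voice", "timestamp": 1664767505189},  # 200 s later
]
first_sets = group_by_interval(records)
```

With a 60-second interval, the first two records land in one first preset instruction set and the third starts a new one.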

S83, deleting the target non-voice preset instructions in each first preset instruction set to obtain a plurality of second preset instruction sets.

The first preset instruction in each second preset instruction set is obtained from the user's voice; in each first preset instruction set, the target non-voice preset instructions are those ordered before the first preset instruction obtained from the user's voice.

Specifically, after grouping the preset instructions according to the preset time interval to obtain the plurality of first preset instruction sets, the first controller of the server deletes, in each first preset instruction set, the target non-voice preset instructions that precede the first preset instruction obtained by user voice, so as to obtain a plurality of second preset instruction sets, where the first preset instruction in each second preset instruction set is obtained from the user's voice.
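S83 amounts to trimming the non-voice prefix of each set. A minimal sketch, assuming each record carries a hypothetical `mode` field whose value is `"voice"` for voice input and something else (here `"remote key"`) otherwise:

```python
from itertools import dropwhile


def drop_leading_non_voice(first_set):
    """Remove the non-voice instructions that precede the first voice instruction (S83)."""
    return list(dropwhile(lambda rec: rec["mode"] != "voice", first_set))


first_set = [
    {"instruction": "open menu", "mode": "remote key"},
    {"instruction": "music search", "mode": "voice"},
    {"instruction": "volume up", "mode": "remote key"},
    {"instruction": "next song", "mode": "voice"},
]
second_set = drop_leading_non_voice(first_set)
```

Only the leading "open menu" entry is dropped here; the later remote-key entry stays in the second preset instruction set until S84.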

S84, deleting the non-voice preset instructions from each of the second preset instruction sets to obtain a plurality of third preset instruction sets.

S85, combining a plurality of third preset instruction sets to determine a target preset instruction set.

The target preset instruction set comprises a plurality of preset instructions which are acquired according to the voice of the user.

Specifically, since the voice assistant of the terminal device is woken up by the user speaking the wake-up keyword, the first controller of the server deletes the non-voice preset instructions in each second preset instruction set to obtain a plurality of third preset instruction sets, and combines the plurality of third preset instruction sets to obtain a target preset instruction set, where the plurality of preset instructions included in the target preset instruction set are all obtained by user voice.
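S84 and S85 can be sketched together as a filter followed by a merge. As before, the `mode` field and the function name are illustrative assumptions rather than part of the disclosure.

```python
def merge_voice_only(second_sets):
    """Drop non-voice instructions from every set (S84) and merge the rest (S85)."""
    third_sets = [[rec for rec in group if rec["mode"] == "voice"] for group in second_sets]
    return [rec for group in third_sets for rec in group]


second_sets = [
    [{"instruction": "music search", "mode": "voice"},
     {"instruction": "volume up", "mode": "remote key"},
     {"instruction": "next song", "mode": "voice"}],
    [{"instruction": "video search", "mode": "voice"}],
]
target_set = merge_voice_only(second_sets)
```

The resulting target preset instruction set keeps the original order and contains only instructions obtained by user voice.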

S86, determining the weight of the preset instruction corresponding to each request operation based on the occurrence frequency and the weight factor of the preset instruction corresponding to each request operation in the target preset instruction set, so as to obtain the weight corresponding to each preset instruction.

Alternatively, based on the above embodiments, in some embodiments of the disclosure, one implementation of S86 may be:

S861, for the preset instruction corresponding to each request operation in the target preset instruction set, acquiring a first total number of the preset instruction in the target preset instruction set and a second total number of the preset instruction among the preset instructions respectively corresponding to the plurality of history request instructions, and calculating the quotient of the first total number and the second total number to obtain the occurrence frequency of the preset instruction.

The request operation refers to the operation executed according to a preset instruction; for example, if the preset instruction is "music search", the request operation is the music search operation. Among the plurality of preset instructions included in the target preset instruction set, several may correspond to the same request operation. For example, the target preset instruction set may include: preset instruction 1, preset instruction 2, preset instruction 3, preset instruction 1, preset instruction 4, preset instruction 2, preset instruction 5, preset instruction 6.

Specifically, for the preset instruction corresponding to each request operation in the target preset instruction set, the first controller of the server counts the first total number of the preset instruction in the target preset instruction set and the second total number among the preset instructions respectively corresponding to the plurality of history request instructions, and calculates the quotient of the first total number and the second total number, thereby determining the occurrence frequency of the preset instruction corresponding to each request operation.

For example, continuing the above embodiment, if the first total number of preset instruction 1 in the target preset instruction set is N1, and the second total number, taken over the preset instructions corresponding to the plurality of history request instructions, is N2, the occurrence frequency of preset instruction 1 is determined to be T1 = N1/N2, but the present disclosure is not limited thereto, and those skilled in the art may set this according to practical situations.
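The quotient T1 = N1/N2 can be computed with two counters. The sketch assumes that N2 counts the occurrences of the same preset instruction among all history preset instructions; the text could also be read as N2 being the overall number of history preset instructions, in which case the denominator would simply be `len(history)`. It further assumes that every instruction in the target set also appears in the history, so the denominator is never zero. Names and sample data are hypothetical.

```python
from collections import Counter


def occurrence_frequency(target_set, history):
    """Return {preset instruction: N1 / N2} for every instruction in the target set."""
    n1 = Counter(target_set)  # first total number, per instruction
    n2 = Counter(history)     # second total number, per instruction (assumed reading)
    return {instr: n1[instr] / n2[instr] for instr in n1}


history = ["music search"] * 4 + ["volume up"] * 2
target_set = ["music search"] * 3 + ["volume up"]
frequency = occurrence_frequency(target_set, history)
```

In this sample, "music search" survives the cleanup in 3 of its 4 history occurrences and "volume up" in 1 of 2.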

S862, a first time stamp corresponding to each preset instruction and a second time stamp corresponding to the next preset instruction adjacent to the preset instruction are obtained, and a weight factor of the preset instruction is determined according to the first time stamp, the second time stamp, the preset mapping table and the first total number.

The preset mapping table is used for determining an initial weight corresponding to each preset instruction, and comprises a plurality of preset time difference ranges and the preset weight corresponding to each preset time difference range. For example, as shown in Table 1 below:

TABLE 1

Preset time difference range (seconds)	Preset weight
[0,3]	1
(3,5]	0.9
(5,10]	0.6
(10,15]	0.15
(15,20]	0.1
(20,+∞)	0.01

Specifically, a first controller of the server acquires a first timestamp corresponding to each preset instruction, acquires a second timestamp corresponding to a next preset instruction adjacent to each preset instruction, and determines a weight factor of the preset instruction according to the first timestamp, the second timestamp, a preset mapping table and a first total number of the preset instructions in a target preset instruction set after the first timestamp and the second timestamp are acquired.

Optionally, based on the foregoing embodiments, in some embodiments of the present disclosure, an implementation manner of determining the weight factor of the preset instruction according to the first timestamp, the second timestamp, the preset mapping table and the first total number may be as follows. First, the time difference between the second timestamp and the first timestamp is calculated. Next, the preset time difference range to which the time difference belongs is determined in the preset mapping table, which gives the initial weight corresponding to the preset instruction. The quotient of the initial weight and the first total number is then calculated. Finally, the initial weights corresponding to the plurality of preset instructions existing for each request operation are summed to obtain the weight factor of the preset instruction corresponding to each request operation.
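The weight factor computation can be sketched with a range lookup over Table 1. Two points are assumptions here: the final summation is read as being over the per-occurrence quotients (initial weight divided by the first total number), although the text could also be read as summing the raw initial weights; and the time differences are passed in directly in seconds, since the description does not say how an occurrence with no following instruction is handled. All names are hypothetical.

```python
from bisect import bisect_left

# Table 1: upper bounds of the closed-right ranges, and one weight per range.
UPPER_BOUNDS = [3, 5, 10, 15, 20]
PRESET_WEIGHTS = [1.0, 0.9, 0.6, 0.15, 0.1, 0.01]  # last entry covers differences above 20 s


def initial_weight(diff_seconds):
    """Look up the preset weight for a non-negative time difference in seconds."""
    return PRESET_WEIGHTS[bisect_left(UPPER_BOUNDS, diff_seconds)]


def weight_factor(diffs_seconds, first_total):
    """Sum initial_weight / first_total over all occurrences of one preset instruction."""
    return sum(initial_weight(d) / first_total for d in diffs_seconds)


# Two occurrences of one preset instruction, followed after 2 s and after 4 s.
factor = weight_factor([2, 4], first_total=2)
```

`bisect_left` places a value equal to a bound in the range that ends at that bound, which matches the closed-right intervals such as (3,5]. Under the stated reading, the weight factor is the average initial weight of the instruction's occurrences.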

S863, performing product operation on the occurrence frequency and the weight factor to obtain a weight corresponding to the preset instruction.

Specifically, the first controller of the server performs product operation on the occurrence frequency corresponding to the preset instruction and the weight factor, so as to obtain the weight corresponding to the preset instruction.

S87, determining a preset instruction weight library according to the preset instructions and the weight of each preset instruction.

Specifically, the first controller of the server stores a plurality of preset instructions and weights corresponding to each preset instruction in the plurality of preset instructions into a preset instruction weight library, so as to obtain the preset instruction weight library.
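S863 and S87 can be sketched as a product per preset instruction, collected into a dictionary that stands in for the preset instruction weight library. The input values and the function name are made up for illustration.

```python
def build_weight_library(frequency, factor):
    """Weight = occurrence frequency x weight factor (S863), stored per instruction (S87)."""
    return {instr: frequency[instr] * factor[instr] for instr in frequency}


weight_library = build_weight_library(
    frequency={"music search": 0.75, "volume up": 0.5},  # illustrative values
    factor={"music search": 0.8, "volume up": 0.4},      # illustrative values
)
```

With the example threshold of 0.5, "music search" (weight about 0.6) would be treated as a wake-free instruction here, while "volume up" (weight about 0.2) would not.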

In the technical scheme provided by the embodiment of the disclosure, the preset instruction weight library calculated from the user's history request instructions takes into account the user's habits of inputting request instructions, which improves the accuracy of the weights corresponding to the preset instructions in the library. Based on these weights, it can be determined more accurately whether a request instruction received by the server is a wake-free instruction for the voice assistant of the terminal device, and therefore whether a wake-up instruction needs to be sent to the terminal device to wake up its voice assistant. This solves the prior-art problem that voice interaction between the user and the terminal device is not convenient and quick enough, and improves user experience.

Fig. 9 is a schematic flow chart of another voice wake-up method provided by an embodiment of the present disclosure, based on the embodiment shown in fig. 5, and fig. 10 is an interaction schematic diagram of that method. As shown in fig. 10, the method further includes:

S91, if the weight corresponding to the target instruction is smaller than the preset threshold, determining that the target instruction is not a wake-free instruction for the voice assistant, and sending a closing instruction to the terminal device.

The closing instruction is used to instruct closing a preset program of the terminal device. By way of example, the closing instruction may instruct closing the built-in microphone used for receiving the user's voice, or the screen, but is not limited thereto; the present disclosure is not particularly limited, and those skilled in the art may set this according to actual situations.

Specifically, when the first controller of the server determines that the weight corresponding to the target instruction is smaller than the preset threshold, it determines that the target instruction is not a wake-free instruction for the voice assistant, and then sends a closing instruction to the terminal device to close the preset program of the terminal device.
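Taken together with the case where the weight reaches the threshold, the server's reply has two branches. The message dictionaries below are a hypothetical wire format invented for this sketch; the description only states that a wake-up instruction or a closing instruction is sent, and names the built-in microphone as one example of what may be closed.

```python
def server_response(weight, threshold=0.5):
    """Choose the instruction the server sends back for a matched target instruction."""
    if weight >= threshold:
        # Wake-free instruction: keep the voice assistant awake.
        return {"type": "wake_up"}
    # Not a wake-free instruction (S91): close a preset program, e.g. the microphone.
    return {"type": "close", "target": "microphone"}


reply_high = server_response(0.6)
reply_low = server_response(0.2)
```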

In the technical scheme provided by the embodiment of the disclosure, when the target instruction is determined, according to its corresponding weight, not to be a wake-free instruction for the voice assistant, a closing instruction is sent to the terminal device to close the preset program of the terminal device, so that the resources of the terminal device can be saved in a timely manner.

Fig. 11 is a flowchart of another voice wake-up method according to an embodiment of the disclosure. Fig. 12 is an interaction schematic diagram of another voice wake-up method provided in an embodiment of the present disclosure, where the embodiment is applied to a terminal device side. As shown in fig. 11, the method specifically includes the following steps:

S111, in response to a wake-up request input by a user, waking up a voice assistant of the terminal device.

The wake-up request carries a wake-up keyword, which is used to wake up the voice assistant of the terminal device, and the wake-up keyword may be, for example, an "XX puck", but is not limited thereto, and the disclosure is not particularly limited thereto, and those skilled in the art may set the wake-up keyword according to practical situations.

S112, after the voice assistant of the terminal equipment is awakened, responding to a request instruction input by a user, and sending the request instruction to the server.

S113, in response to the wake-up instruction sent by the server, waking up the voice assistant of the terminal equipment.

Specifically, the second controller of the terminal device responds to a wake-up request input by the user, where the wake-up request carries a wake-up keyword, and wakes up the voice assistant of the terminal device according to the wake-up keyword. After determining that the voice assistant has been woken up, the second controller receives a request instruction input by the user and sends it to the server, so that the server determines whether the request instruction is a wake-free instruction for the voice assistant of the terminal device. When the server determines that it is, the server sends a wake-up instruction to the terminal device, and the second controller wakes up the voice assistant of the terminal device in response to the wake-up instruction sent by the server.
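The terminal-side steps S111 to S113 can be pictured as a small state machine. This sketch assumes that the voice assistant returns to sleep after each request unless the server replies with a wake-up instruction, which the description implies but does not state outright. The `Terminal` class, the `send_to_server` callback and the message format are hypothetical.

```python
class Terminal:
    """Toy model of the terminal device's second controller."""

    def __init__(self, wake_keyword, send_to_server):
        self.wake_keyword = wake_keyword
        self.send_to_server = send_to_server  # callable: request text -> reply dict
        self.assistant_awake = False

    def on_voice(self, utterance):
        if not self.assistant_awake:
            # S111: only the wake-up keyword wakes the assistant.
            if utterance == self.wake_keyword:
                self.assistant_awake = True
            return
        # S112: forward the request instruction to the server.
        self.assistant_awake = False  # assumed default: sleep after each request
        reply = self.send_to_server(utterance)
        # S113: a wake-up instruction from the server re-wakes the assistant.
        if reply.get("type") == "wake_up":
            self.assistant_awake = True


def fake_server(request):
    # Stand-in for the server: treats one request as a wake-free instruction.
    return {"type": "wake_up"} if request == "I want to listen to music" else {}


terminal = Terminal("XX wizard", fake_server)
terminal.on_voice("XX wizard")
terminal.on_voice("I want to listen to music")
awake_after_music = terminal.assistant_awake
terminal.on_voice("turn up the play sound")
awake_after_volume = terminal.assistant_awake
```

In this run the second request is accepted without repeating the wake-up keyword, because the server's wake-up instruction kept the assistant awake after the first request.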

In the technical scheme provided by the embodiment of the disclosure, the second controller of the terminal device wakes up the voice assistant of the terminal device in response to a wake-up request input by the user, where the wake-up request carries a wake-up keyword used for waking up the voice assistant; after the voice assistant is determined to have been woken up, the second controller sends a request instruction to the server in response to the request instruction input by the user; and in response to the wake-up instruction sent by the server, the second controller wakes up the voice assistant of the terminal device. In other words, after the voice assistant has been woken up with the wake-up keyword carried by the wake-up request, the second controller receives the request instruction input by the user and sends it to the server. The server determines the target instruction corresponding to the request instruction in the stored preset instruction weight library and, when the weight corresponding to the target instruction is greater than or equal to the preset threshold, determines that the target instruction is a wake-free instruction for the voice assistant of the terminal device and sends the terminal device a wake-up instruction that wakes up its voice assistant. As a result, when the user inputs the next request instruction by voice, the user does not need to wake up the voice assistant again with the wake-up keyword. This solves the prior-art problem that voice interaction between the user and the terminal device is not convenient and quick enough, and improves user experience.

The embodiments of the present disclosure provide a computer readable storage medium, on which a computer program is stored, where the computer program when executed by a processor implements each process executed by the above-mentioned voice wake-up method, and the same technical effects can be achieved, and for avoiding repetition, a detailed description is omitted herein.

The computer readable storage medium may be a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or the like.

The present disclosure provides a computer program product which, when run on a computer, causes the computer to implement the voice wake-up method described above.

The foregoing description, for purposes of explanation, has been presented in conjunction with specific embodiments. However, the illustrative discussion above is not intended to be exhaustive or to limit the embodiments to the precise forms disclosed. Many modifications and variations are possible in light of the above teaching. The embodiments were chosen and described in order to best explain the principles and their practical application, thereby enabling others skilled in the art to best utilize the embodiments, with various modifications as are suited to the particular use contemplated.

Claims

1. A server, characterized in that the server stores a preset instruction weight library, the preset instruction weight library comprising a plurality of preset instructions and weights corresponding to the preset instructions; the server comprising:

a first controller configured to:

2. The server of claim 1, wherein the first controller is further configured to:

3. The server according to claim 2, wherein the first controller is specifically configured to:

4. The server according to claim 2, wherein the first controller is specifically further configured to:

5. The server according to claim 1, wherein the first controller is specifically configured to:

6. The server of claim 1, wherein the first controller is further configured to:

7. A terminal device, comprising:

a second controller configured to:

8. A voice wake-up method, characterized in that the method is applied to a server, the server stores a preset instruction weight library, and the preset instruction weight library comprises a plurality of preset instructions and weights corresponding to the preset instructions; the method comprising the following steps:

9. A voice wake-up method, characterized in that the method is applied to a terminal device and comprises the following steps:

10. A computer-readable storage medium on which a computer program is stored, characterized in that the program, when executed by a processor, implements the steps of the method according to any one of claims 8-9.