CN111916086A - Voice interaction control method and device, computer equipment and storage medium - Google Patents

Voice interaction control method and device, computer equipment and storage medium

Info

Publication number
CN111916086A
CN111916086A (application CN202010619579.5A; granted publication CN111916086B)
Authority
CN
China
Prior art keywords
target user
voice data
user
voiceprint
operation authority
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010619579.5A
Other languages
Chinese (zh)
Other versions
CN111916086B (en)
Inventor
唐德顺 (Tang Deshun)
阮亚华 (Ruan Yahua)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Wensi Haihui Jinxin Software Co ltd
Original Assignee
Beijing Wensi Haihui Jinxin Software Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Wensi Haihui Jinxin Software Co ltd filed Critical Beijing Wensi Haihui Jinxin Software Co ltd
Priority to CN202010619579.5A priority Critical patent/CN111916086B/en
Publication of CN111916086A publication Critical patent/CN111916086A/en
Application granted granted Critical
Publication of CN111916086B publication Critical patent/CN111916086B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/02 Feature extraction for speech recognition; Selection of recognition unit
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue

Abstract

The application relates to a voice interaction control method, a voice interaction control apparatus, a computer device, and a storage medium. The method comprises the following steps: receiving voice data input by a user in a current interactive scene; if the voice data is recognized to contain voice data of a non-target user, determining the operation authority of the non-target user according to that voice data; and if the operation authority of the non-target user includes the operation authority corresponding to the current interactive scene, executing the corresponding operation according to the semantic content of the voice data and the operation authority. The scheme can identify non-target users other than the target user and execute the corresponding operation after authority verification is completed, allowing a non-target user to assist the target user in completing interactive operations in specific scenes, which improves operating efficiency and convenience.

Description

Voice interaction control method and device, computer equipment and storage medium
Technical Field
The present application relates to the field of human-computer interaction technologies, and in particular, to a voice interaction control method and apparatus, a computer device, and a storage medium.
Background
With the development of artificial intelligence and speech recognition technology, the traditional human-computer interaction mode has changed greatly, and interaction modes based on voice control have emerged. Interactive voice operating systems on mobile terminal devices, computers, and household appliances of various types are replacing the graphical-user-interface interaction mode of existing operating systems.
The interactive voice operation mode mainly uses voice as the primary input and output channel, with other input and output modes as auxiliaries. For example, when remotely operating a terminal, a user can control it by voice: the terminal recognizes the operation instruction contained in the voice and executes the corresponding operation.
However, in existing interactive voice schemes, the operations a user performs on a terminal device are often relatively private, and the device generally only recognizes the voice instructions of the account owner. In some scenarios, if the account owner cannot conveniently operate the device and needs the assistance of a family member or another person, that person's voice commands are not accepted by the device, so the operation cannot be carried out. The conventional interactive voice operation scheme is therefore not very convenient.
Disclosure of Invention
In view of the above, it is desirable to provide a voice interaction control method, apparatus, computer device, and storage medium capable of improving the convenience of operation.
The embodiment of the invention provides a voice interaction control method, which comprises the following steps:
receiving voice data input by a user in a current interactive scene;
if the voice data is recognized to contain voice data of a non-target user, determining the operation authority of the non-target user according to the voice data of the non-target user;
and if the operation authority of the non-target user comprises the operation authority corresponding to the current interactive scene, executing corresponding operation according to the semantic content and the operation authority of the voice data.
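The three claimed steps can be sketched as a minimal control flow. This is an illustrative assumption rather than the patent's implementation; the permission table, identifiers, and function names are all hypothetical.

```python
# Hypothetical permission table: non-target user id -> scene -> allowed operations.
PERMISSIONS = {
    "contact_001": {"balance_inquiry": {"query_balance"}},
}

def handle_voice_input(user_id, is_target_user, scene, requested_op):
    """Return True if the requested operation may be executed in this scene."""
    if is_target_user:
        return True  # the account owner's voice commands are always honoured
    # Non-target user: check the operation authority for the current scene.
    allowed = PERMISSIONS.get(user_id, {}).get(scene, set())
    return requested_op in allowed
```

An unknown speaker, or a contact lacking the authority for the current scene, is simply refused, which mirrors the "ignore the voice data" branch described later in the disclosure.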
In one embodiment, the recognizing whether the voice data includes voice data of a non-target user specifically includes:
extracting voiceprint features in the voice data;
and determining whether the user corresponding to the voice data is the target user or not according to the extracted voiceprint features.
In one embodiment, determining whether the user corresponding to the voice data is the target user includes:
sending the extracted voiceprint features to a third-party voiceprint service platform, wherein the third-party voiceprint service platform prestores the voiceprint features of the target user;
receiving an identity recognition result returned by the third-party voiceprint service platform;
and determining whether the user corresponding to the voice data is the target user or not according to the identity recognition result.
In one embodiment, determining whether the user corresponding to the voice data is the target user according to the extracted voiceprint feature includes:
matching the extracted voiceprint features with the prestored voiceprint features of the target user to obtain a voiceprint matching result;
and determining whether the user corresponding to the voice data is the target user or not according to the voiceprint matching result.
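The voiceprint-matching step above can be illustrated with a cosine-similarity comparison between feature vectors. The metric and threshold are assumptions for illustration, since the patent does not specify how the match is computed.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def is_target_user(extracted, stored, threshold=0.8):
    """Treat the speaker as the target user when similarity clears the threshold."""
    return cosine_similarity(extracted, stored) >= threshold
```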
In one embodiment, determining the operation authority of the non-target user according to the voice data of the non-target user comprises:
matching the extracted voiceprint features in a preset voiceprint feature database;
if the pre-stored voiceprint features consistent with the extracted voiceprint features are matched, searching the identity data of the non-target user according to the pre-stored voiceprint features;
and acquiring the operation authority of the non-target user corresponding to the current interactive scene according to the identity identification data.
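The database-matching path above can be sketched as a scan over pre-stored voiceprints, with the match predicate injected; the data shapes here are hypothetical.

```python
def lookup_non_target_permissions(extracted, scene, db, permissions, match_fn):
    """Scan pre-stored voiceprints; on a match, return the contact's identity
    data and the operation authority preset for the current scene."""
    for stored_feature, identity_id in db:
        if match_fn(extracted, stored_feature):
            return identity_id, permissions.get(identity_id, {}).get(scene, set())
    return None, set()  # no pre-stored voiceprint matched
```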
In one embodiment, the identification result comprises user identification data;
determining the operation authority of the non-target user according to the voice data of the non-target user comprises the following steps:
acquiring identity identification data of a non-target user;
and inquiring the operation authority of the non-target user corresponding to the current interactive scene according to the identity identification data of the non-target user.
In one embodiment, if the operation right of the non-target user includes an operation right corresponding to the current interactive scene, executing corresponding operations according to the semantic content and the operation right of the voice data includes:
extracting operation item key words contained in semantic content of the voice data;
determining the interaction type of the current interaction scene according to the semantic content;
and if the operation item key words and the interaction types are consistent with the operation authority of the current interaction scene, executing corresponding operation according to the operation item key words.
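The keyword-and-type check above can be sketched with naive phrase spotting; the vocabulary, operation codes, and permission structure are illustrative assumptions, not part of the patent.

```python
# Hypothetical keyword vocabulary mapping phrases to operation codes.
KEYWORD_TO_OP = {
    "query current balance": "query_balance",
    "transfer": "transfer",
}

def authorize_operation(semantic_text, interaction_type, scene_permissions):
    """Extract the operation item keyword and check it against the authority
    granted for this interaction type; return the operation code or None."""
    op = next((code for phrase, code in KEYWORD_TO_OP.items()
               if phrase in semantic_text), None)
    if op is None:
        return None  # no recognisable operation item keyword
    allowed = scene_permissions.get(interaction_type, set())
    return op if op in allowed else None
```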
In one embodiment, the method further comprises the following steps:
and if the operation authority of the non-target user does not contain the operation authority corresponding to the current interactive scene, ignoring the voice data of the non-target user and returning to the step of receiving the voice data input by the user in the current interactive scene.
The embodiment of the invention provides a voice interaction control device, which comprises:
the voice data receiving module is used for receiving voice data input by a user in a current interactive scene;
the authority data acquisition module is used for confirming the operation authority of the non-target user according to the voice data of the non-target user if the voice data containing the voice data of the non-target user is identified;
and the data processing module is used for executing corresponding operation according to the semantic content and the operation authority of the voice data if the operation authority of the non-target user comprises the operation authority corresponding to the current interactive scene.
The embodiment of the invention provides computer equipment, which comprises a memory and a processor, wherein the memory stores a computer program, and the processor executes the computer program to realize the following steps:
receiving voice data input by a user in a current interactive scene;
if the voice data contains the voice data of the non-target user, determining the operation authority of the non-target user according to the voice data of the non-target user;
and if the operation authority of the non-target user comprises the operation authority corresponding to the current interactive scene, executing corresponding operation according to the semantic content and the operation authority of the voice data.
An embodiment of the present invention provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the following steps:
receiving voice data input by a user in a current interactive scene;
if the voice data contains the voice data of the non-target user, determining the operation authority of the non-target user according to the voice data of the non-target user;
and if the operation authority of the non-target user comprises the operation authority corresponding to the current interactive scene, executing corresponding operation according to the semantic content and the operation authority of the voice data.
The voice interaction control method, the voice interaction control device, the computer equipment and the storage medium receive voice data input by a user in a current interaction scene, identify the voice data input by the user, verify operation authority when the voice data is identified to contain voice data of a non-target user, and execute corresponding operation according to semantic content of the voice data and in combination with the operation authority if the non-target user has related operation authority so as to complete user interaction. According to the scheme, the identity of the non-target user except the target user can be identified, corresponding operation is executed after the authority verification is completed, the non-target user is allowed to assist the target user to complete interactive operation under a specific scene, and the operation efficiency and the convenience are improved.
Drawings
FIG. 1 is a diagram of an exemplary implementation of a voice interaction control method;
FIG. 2 is a flow chart illustrating a method for controlling voice interaction in one embodiment;
FIG. 3 is a flow chart illustrating a voice interaction control method according to another embodiment;
FIG. 4 is a schematic flow chart of the user identification step in one embodiment;
FIG. 5 is a block diagram showing the structure of a voice interaction control apparatus according to an embodiment;
FIG. 6 is a block diagram showing the structure of a voice interaction control apparatus according to another embodiment;
FIG. 7 is a diagram illustrating an internal structure of a computer device according to an embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
The voice interaction control method provided by the application can be applied to the application environment shown in fig. 1, in which the terminal 102 communicates with the server 104 via a network. Specifically, a user interacts with the terminal 102 by inputting voice in an interaction scene. The terminal 102 collects the voice data input by the user in real time and sends it to the server 104 for user identity recognition. The server 104 receives the voice data input by the user in the current interaction scene as sent by the terminal 102; if the voice data is recognized to contain voice data of a non-target user, the server determines the operation authority of the non-target user according to that voice data, and if the operation authority of the non-target user includes the operation authority corresponding to the current interaction scene, the server executes the corresponding operation according to the semantic content of the voice data and the operation authority. The terminal 102 may be, but is not limited to, various electronic devices, computers, notebook computers, smart phones, tablet computers, and portable wearable devices, and the server 104 may be implemented as an independent server or as a server cluster formed by a plurality of servers.
In one embodiment, as shown in fig. 2, a voice interaction control method is provided, which is described by taking the method as an example applied to the server in fig. 1, and includes the following steps:
step 202, receiving voice data input by a user in a current interactive scene.
The voice data is the speech input by a user in the current interactive scene; specifically, it may be captured by a microphone, an earphone, or another device with a recording function on the terminal device. In this embodiment, the interaction scenario is a home banking transaction scenario, in which a user logs in to an account and inputs voice data (voice instructions) in front of a transaction terminal (hereinafter referred to as the terminal) to operate it; further, the transaction terminal may be provided with a camera to collect the user's face image for identification or to ensure transaction security. During the interaction, people other than the account owner (i.e., the target user) corresponding to the current transaction account may be present. Even if their face images are not captured, utterances from such non-target users may be picked up, and each such utterance must be handled separately through identity recognition to judge whether it should be acted upon. Therefore, during the interaction, identity recognition is performed on each segment of received voice data to determine the relationship between the speaker of that segment and the target user of the current interaction scene.
And 204, if the voice data is identified to contain the voice data of the non-target user, determining the operation authority of the non-target user according to the voice data of the non-target user.
In a practical application scenario, considering use within a household, some interactions (or transactions) may allow non-target users to assist by voice. For example: a parent helps a young child complete an interaction, a child assists a parent, and assisted interaction may also be allowed between any two people who trust each other. Therefore, in a specific implementation, identity recognition is performed on each segment of received voice input to verify whether the current voice data was input by the account owner (target user) of the current interaction scenario. Specifically, the voice data may be identified using voiceprint recognition technology. If the user corresponding to the current voice data is identified as the target user, the related operation is performed according to the target user's voice data. If the user is identified as a non-target user, the non-target user's authority data must be acquired (or queried) to verify whether that user's voice data can be accepted. The authority settings may be freely authorized and configured by the user before the system is used, which is not limited herein. In actual operation, the target user can add one or more contacts (i.e., non-target users) to his or her own account and complete their identity information, role information, authority information, and so on.
The voiceprint features and identification information of each contact are pre-stored to facilitate identity recognition. To enable contacts to assist with transactions, the authority data that each contact holds for different interaction scenes is also preset. The identification data is associated with both the voiceprint features and the operation authority data; that is, the corresponding operation authority data and voiceprint features can be looked up from the identification data, and the identification data and operation authority data can likewise be found from the voiceprint features.
And step 206, if the operation authority data of the non-target user contains the operation authority aiming at the current interactive scene, executing corresponding operation according to the semantic content and the operation authority of the voice data.
In practical applications, the operation authority data may include, but is not limited to, the operation authority for a specific interaction scenario, the data operation authority within an interaction scenario, and the like. Taking the home banking transaction scenario as an example, the operation authority data may include, but is not limited to, authority over transaction types, such as inquiry, transfer, and deposit and withdrawal operations, and authority over transaction data, such as the limit of a single transaction and the number of transactions per day. If the non-target user does not have the corresponding operation authority for the current interaction scene, the voice data input by the non-target user is ignored, the utterance is treated as the start of a new interaction, and the process returns to step 202, where identity recognition continues to be performed on incoming voice data in real time. If the operation authority data of the non-target user contains an operation authority applicable to the current interactive scene, this shows that the non-target user is one of the contacts added in advance by the target user; the voice data from that contact is judged to be valid, and the corresponding operation is then executed according to the semantic content of the voice data and the operation authority. In a specific application, the voice data input by a user may include corresponding operation item keywords, such as "query current balance"; text recognition may be performed on the voice data to obtain its semantic content, and the corresponding operation is then executed according to the interactive operation keywords contained in the semantic content, in combination with the operation authority for the current interactive scene.
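The transaction-type and transaction-data authorities described above (single-transaction limit, daily transaction count) might be modelled as follows; the field names and limits are illustrative assumptions.

```python
from dataclasses import dataclass, field

@dataclass
class OperationPermission:
    operations: set = field(default_factory=set)  # e.g. {"inquiry", "transfer"}
    single_limit: float = 0.0                     # max amount per transaction
    daily_count: int = 0                          # max transactions per day

def check_transaction(perm, operation, amount, count_today):
    """Apply both the transaction-type and transaction-data authorities."""
    if operation not in perm.operations:
        return False  # no authority for this transaction type
    if amount > perm.single_limit or count_today >= perm.daily_count:
        return False  # exceeds the data limits attached to the authority
    return True
```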
The voice interaction control method receives voice data input by a user in a current interaction scene and performs identity recognition on it. When the voice data is recognized to contain voice data of a non-target user, operation authority verification is carried out; if the non-target user holds the relevant operation authority, the corresponding operation is executed according to the semantic content of the voice data in combination with that authority, so as to complete the user interaction. The scheme can identify non-target users other than the target user and execute the corresponding operation after authority verification is completed, allowing a non-target user to assist the target user in completing interactive operations in specific scenes, which improves operating efficiency and convenience.
In one embodiment, as shown in fig. 3, before step 204, the method further includes:
and 203, extracting the voiceprint features in the voice data, and determining whether the user corresponding to the voice data is the target user or not according to the extracted voiceprint features.
A voiceprint feature is a sound-wave spectrum carrying the speaker's voice information. In a specific implementation, the user identity is identified from the voice data: voiceprint recognition technology can be adopted to extract the voiceprint features in the voice data, and because voiceprint features are unique, they can determine the identity of the speaker of that voice data. In this embodiment, since voiceprint features are both specific and stable, using them for identification allows the user identity to be recognized quickly and accurately.
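Production systems derive voiceprint features with techniques such as MFCCs or i-vector embeddings; as a toy stand-in, the sketch below computes per-frame log energy and zero-crossing rate from raw samples. This is purely illustrative and is not the patent's feature set.

```python
import math

def crude_voice_features(samples, frame=160):
    """Per-frame (log energy, zero-crossing rate) pairs as a toy feature vector."""
    feats = []
    for i in range(0, len(samples) - frame + 1, frame):
        chunk = samples[i:i + frame]
        energy = sum(s * s for s in chunk) / frame
        # Fraction of adjacent sample pairs whose sign changes.
        zcr = sum(1 for a, b in zip(chunk, chunk[1:]) if a * b < 0) / frame
        feats.append((math.log(energy + 1e-9), zcr))
    return feats
```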
In one embodiment, as shown in fig. 4, determining whether the user corresponding to the voice data is the target user according to the extracted voiceprint feature includes:
step 223, sending the extracted voiceprint features to a third-party voiceprint service platform, wherein the third-party voiceprint service platform prestores the voiceprint features of the target user;
step 243, receiving the identity recognition result returned by the third-party voiceprint service platform;
and 263, determining whether the user corresponding to the voice data is the target user according to the identity recognition result.
A voiceprint service is a comprehensive technical service that takes a voiceprint recognition engine as its core and provides a series of capabilities such as speech recognition, voiceprint modeling, authentication, and identification. In practical applications, whether for face recognition or voiceprint recognition, the corresponding recognition service can be obtained by calling a third-party service platform interface. In this embodiment, an interface of the third-party voiceprint service platform may be called: the extracted voiceprint features are sent to the platform, in which the voiceprint features of the target user are pre-stored; the platform performs voiceprint recognition on the extracted features against the pre-stored features of the target user and returns a corresponding identity recognition result. Specifically, the third-party voiceprint service platform can use the i-vector technique together with PLDA (Probabilistic Linear Discriminant Analysis) to reduce the influence of channel differences when identifying voiceprint features, improving recognition performance. The identification result may include user identity information, such as identification data and role information (target user or non-target user), or may indicate that no corresponding identity was recognized. In this embodiment, calling the third-party voiceprint service platform saves the up-front work of building a voiceprint recognition framework, so the identity can be recognized and a voiceprint recognition result obtained quickly and conveniently.
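Calling a third-party voiceprint service typically means posting the extracted features to a remote identification endpoint. The URL, payload shape, and response fields below are hypothetical, and the injectable `opener` parameter merely allows exercising the function without a network.

```python
import json
from urllib import request

VOICEPRINT_API = "https://voiceprint.example.com/identify"  # hypothetical endpoint

def identify_via_platform(features, api_url=VOICEPRINT_API, opener=None):
    """POST the voiceprint features and return the platform's identity result."""
    payload = json.dumps({"features": features}).encode("utf-8")
    req = request.Request(api_url, data=payload,
                          headers={"Content-Type": "application/json"})
    opener = opener or request.urlopen
    with opener(req) as resp:
        return json.loads(resp.read())
```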
In one embodiment, determining whether the user corresponding to the voice data is the target user according to the extracted voiceprint feature includes: and matching the extracted voiceprint features with the prestored voiceprint features of the target user to obtain a voiceprint matching result, and determining whether the user corresponding to the voice data is the target user or not according to the voiceprint matching result.
Besides calling a third-party voiceprint service platform to identify the user, the voiceprint features of the target user can be pre-stored in a database, and at each recognition the extracted voiceprint features are matched directly against them: if the features match, the user is the target user; if they do not, the user is a non-target user. In this embodiment, because the extracted voiceprint features are matched directly against the pre-stored features of the target user without relying on a third-party voiceprint service platform, the privacy of user data can be ensured and the security of the interactive operation improved.
In one embodiment, the identification result comprises user identification data;
determining the operation authority of the non-target user according to the voice data of the non-target user comprises the following steps: acquiring identity identification data of a non-target user; and inquiring the operation authority of the non-target user corresponding to the current interactive scene according to the identity identification data of the non-target user.
As described in the foregoing embodiment, if the third-party voiceprint service platform identifies the user, the returned identity recognition result may include user identification data, which may be an identity card number, an account identifier, a mobile phone number, or other data that distinguishes the user's identity. If the returned result includes the identification data of a non-target user, the corresponding operation authority data can be found from it. In another embodiment, determining the operation authority of the non-target user according to the non-target user's voice data may further include: matching the extracted voiceprint features against a preset voiceprint feature database; if pre-stored voiceprint features consistent with the extracted features are matched, then, because the pre-stored features are associated with identification data, the preset identification data of the non-target user can be found from them, and the operation authority of the non-target user for the current interactive scene is obtained according to that identification data. In this embodiment, the operation authority data can be found quickly and accurately from the identification data.
In one embodiment, as shown in fig. 3, executing the corresponding operation according to the semantic content and interaction type, in combination with the operation authority for the current interaction scenario, includes: step 226, extracting the operation item keywords contained in the semantic content of the voice data, determining the interaction type of the current interaction scene according to the semantic content, and, if the operation item keywords and the interaction type are consistent with the operation authority for the current interaction scene, executing the corresponding operation according to the operation item keywords.
The interaction types of an interaction scenario may include, but are not limited to, query interactions, information update interactions, and function selection interactions. In a specific implementation, the operation item keywords may be extracted by performing semantic recognition (or text recognition) on the voice data to obtain its semantic content, and then applying keyword recognition technology to the recognized content. After the operation item keywords are extracted, it is judged whether they and the operation items of the interaction type conform to the operation authority that the non-target user holds for the current interaction scene; if so, the corresponding operation is executed. For example, if the operation item in the non-target user's semantic content is "check balance" and the non-target user has the operation authority for the current transaction, "check balance" can be taken as the next operation of the current transaction. If the operation items contained in the current transaction type and in the non-target user's speech do not conform to the non-target user's operation authority for the current interaction scene, the operation contained in the speech is ignored. For example, if the content of the non-target user's speech is "check balance" but the user does not have the operation authority for the current transaction, or lacks the "check balance" authority, then "check balance" is not taken as the next operation of the current transaction; the speech content is ignored, and the process returns to step 202.
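Putting the pieces together, the ignore-and-keep-listening behaviour (returning to step 202) can be sketched as a loop over incoming utterances; `identify`, `authorize`, and `execute` are injected hypothetical callbacks, not APIs from the patent.

```python
def interaction_loop(voice_stream, identify, authorize, execute):
    """Identify each utterance; unauthorized non-target speech is ignored and
    the system simply returns to receiving voice data (step 202)."""
    for utterance in voice_stream:
        user_id, is_target = identify(utterance)
        operation = authorize(user_id, is_target, utterance)
        if operation is None:
            continue  # ignore the voice data, keep listening
        execute(operation)
```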
It should be understood that although the steps in the flowcharts of figs. 2-4 are shown sequentially as indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated otherwise, there is no strict order restriction, and the steps may be performed in other orders. Moreover, at least some of the steps in figs. 2-4 may include multiple sub-steps or stages, which are not necessarily performed at the same moment but may be performed at different times, and whose order of execution is not necessarily sequential: they may be performed in turn or alternately with other steps or with at least part of the sub-steps or stages of other steps.
In one embodiment, as shown in fig. 5, there is provided a voice interaction control apparatus, including: a voice data receiving module 510, a permission data acquiring module 520 and a data processing module 530, wherein:
The voice data receiving module 510 is configured to receive voice data input by a user in the current interactive scene.
The permission data acquiring module 520 is configured to determine, if it is recognized that the voice data contains voice data of a non-target user, the operation authority of the non-target user according to the voice data of the non-target user.
The data processing module 530 is configured to, if the operation authority of the non-target user includes the operation authority corresponding to the current interaction scene, execute a corresponding operation according to the semantic content of the voice data and the operation authority.
In one embodiment, as shown in fig. 6, the apparatus further includes an identity recognition module 540, configured to extract a voiceprint feature in the voice data, and determine whether a user corresponding to the voice data is a target user according to the extracted voiceprint feature.
In one embodiment, the identity recognition module 540 is further configured to send the extracted voiceprint features to a third-party voiceprint service platform, where the voiceprint features of the target user are prestored in the third-party voiceprint service platform, receive an identity recognition result returned by the third-party voiceprint service platform, and determine whether the user corresponding to the voice data is the target user according to the identity recognition result.
In one embodiment, the identity recognition module 540 is further configured to match the extracted voiceprint features with a prestored voiceprint feature of the target user to obtain a voiceprint matching result, and determine whether the user corresponding to the voice data is the target user according to the voiceprint matching result.
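The 1:1 voiceprint verification of this embodiment (matching the extracted features against the target user's prestored features) might look like the following sketch. Cosine similarity and the 0.8 threshold are assumptions for illustration; the patent does not fix a particular metric or decision threshold.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two feature vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def is_target_user(extracted, prestored, threshold=0.8):
    """Voiceprint matching result: True if similarity meets the threshold."""
    return cosine_similarity(extracted, prestored) >= threshold

target = [0.9, 0.1, 0.4]  # prestored voiceprint features of the target user
print(is_target_user([0.88, 0.12, 0.41], target))  # True: very close vector
print(is_target_user([-0.5, 0.9, -0.1], target))   # False: dissimilar vector
```

Real voiceprint embeddings are far higher-dimensional, and production systems calibrate the threshold empirically; this sketch only shows the decision structure.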
In one embodiment, the permission data obtaining module 520 is further configured to obtain the identification data of the non-target user, and query the operation permission of the non-target user corresponding to the current interaction scenario according to the identification data of the non-target user.
In one embodiment, the permission data obtaining module 520 is further configured to match the extracted voiceprint features in a preset voiceprint feature database, and if a pre-stored voiceprint feature consistent with the extracted voiceprint features is matched, find out the preset identification data of the non-target user according to the pre-stored voiceprint feature, and obtain the operation permission of the non-target user corresponding to the current interaction scenario according to the identification data.
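Unlike the 1:1 check against the target user, this embodiment performs a 1:N lookup: the extracted voiceprint is matched in a preset database, the matched user's identity is recovered, and that identity is used to query the operation authority for the current scene. The database contents, similarity metric, threshold, and names below are illustrative assumptions.

```python
import math

# Hypothetical preset voiceprint feature database and authority store.
VOICEPRINT_DB = {
    "user_002": [0.2, 0.7, 0.3],
    "user_003": [0.9, 0.1, 0.5],
}
AUTHORITY_DB = {
    ("user_002", "account_query"): {"check balance"},
}

def _similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

def lookup_authority(extracted, scene, threshold=0.95):
    """Return (user_id, authority set) for a matched non-target user, else (None, set())."""
    for user_id, prestored in VOICEPRINT_DB.items():
        if _similarity(extracted, prestored) >= threshold:
            return user_id, AUTHORITY_DB.get((user_id, scene), set())
    return None, set()  # no consistent prestored voiceprint was matched

uid, authority = lookup_authority([0.21, 0.69, 0.31], "account_query")
print(uid, authority)  # user_002 {'check balance'}
```

If no prestored voiceprint is consistent with the extracted features, no identity (and hence no authority) is obtained, and the utterance would be ignored.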
In one embodiment, the data processing module 530 is further configured to extract an operation item keyword included in semantic content of the voice data, determine an interaction type of the current interaction scenario according to the semantic content, and if the operation item keyword and the interaction type are consistent with an operation authority of the current interaction scenario, execute a corresponding operation according to the operation item keyword.
In one embodiment, the data processing module 530 is further configured to, if the operation authority of the non-target user does not include the operation authority corresponding to the current interaction scenario, ignore the voice data of the non-target user, and control the voice data receiving module 510 to perform an operation of receiving the voice data input by the user in the current interaction scenario.
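The cooperation of the three modules of fig. 5 can be sketched as simple cooperating objects. The module numbering (510/520/530) follows the description above; the class names, method names, and hard-coded tables are illustrative assumptions, not the patent's implementation.

```python
class VoiceDataReceivingModule:  # module 510
    def receive(self):
        """Receive voice data input by a user in the current interactive scene (stubbed)."""
        return {"speaker": "user_002", "text": "check balance"}

class PermissionDataAcquiringModule:  # module 520
    def authority_of(self, speaker, scene):
        """Query the non-target user's operation authority for the scene."""
        table = {("user_002", "account_query"): {"check balance"}}
        return table.get((speaker, scene), set())

class DataProcessingModule:  # module 530
    def process(self, voice, authority):
        """Execute the operation if it falls within the acquired authority."""
        if voice["text"] in authority:
            return "execute:" + voice["text"]
        return None  # authority does not cover the scene: ignore the voice data

receiver = VoiceDataReceivingModule()
acquirer = PermissionDataAcquiringModule()
processor = DataProcessingModule()

voice = receiver.receive()
result = processor.process(voice, acquirer.authority_of(voice["speaker"], "account_query"))
print(result)  # execute:check balance
```

A `None` result from module 530 corresponds to the case where module 510 is instructed to resume receiving voice data.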
For specific limitations of the voice interaction control device, reference may be made to the above limitations of the voice interaction control method, which are not repeated here. Each module in the voice interaction control device may be implemented in whole or in part by software, hardware, or a combination thereof. The modules may be embedded in or independent of a processor in the computer device in hardware form, or stored in a memory of the computer device in software form, so that the processor can invoke and execute the operations corresponding to the modules.
In one embodiment, a computer device is provided, which may be a server, the internal structure of which may be as shown in fig. 7. The computer device includes a processor, a memory, and a network interface connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device comprises a nonvolatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for the operation of an operating system and computer programs in the non-volatile storage medium. The database of the computer device is used for storing voiceprint characteristic data, identity identification data and the like. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program is executed by a processor to implement a voice interaction control method.
Those skilled in the art will appreciate that the structure shown in fig. 7 is merely a block diagram of part of the structure related to the solution of the present application, and does not constitute a limitation on the computer device to which the solution is applied; a particular computer device may include more or fewer components than shown, combine certain components, or have a different arrangement of components.
In one embodiment, a computer device is provided, comprising a memory and a processor, the memory having a computer program stored therein, and the processor implementing the following steps when executing the computer program: receiving voice data input by a user in a current interactive scene; if it is recognized that the voice data contains voice data of a non-target user, determining the operation authority of the non-target user according to the voice data of the non-target user; and if the operation authority of the non-target user contains the operation authority for the current interactive scene, executing a corresponding operation according to the semantic content of the voice data and the operation authority.
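The three processor steps above can be condensed into one control-step sketch: receive voice data, and when a non-target user's voice is recognized, gate execution on that user's authority for the current scene. The speaker label standing in for voiceprint identification, the inline authority table, and all names are illustrative assumptions.

```python
def control_step(voice, scene, target_id="owner"):
    """One pass of the method; returns the action taken, or 'ignored'."""
    speaker = voice["speaker"]  # stand-in for voiceprint-based identification
    if speaker == target_id:
        return "execute:" + voice["text"]  # target user: handled normally
    # Non-target user: check the operation authority for the current scene.
    authority = {("guest", "query"): {"check balance"}}.get((speaker, scene), set())
    if voice["text"] in authority:
        return "execute:" + voice["text"]
    return "ignored"  # no authority: return to receiving voice data

print(control_step({"speaker": "guest", "text": "check balance"}, "query"))  # execute:check balance
print(control_step({"speaker": "guest", "text": "transfer"}, "query"))       # ignored
```

In a deployment this step would run in a loop, with "ignored" looping straight back to the voice-receiving step.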
In one embodiment, the processor, when executing the computer program, further performs the steps of: and extracting voiceprint features in the voice data, and determining whether the user corresponding to the voice data is a target user or not according to the extracted voiceprint features.
In one embodiment, the processor, when executing the computer program, further performs the steps of: and sending the extracted voiceprint features to a third-party voiceprint service platform, wherein the third-party voiceprint service platform prestores the voiceprint features of the target user, receiving an identity recognition result returned by the third-party voiceprint service platform, and determining whether the user corresponding to the voice data is the target user according to the identity recognition result.
In one embodiment, the processor, when executing the computer program, further performs the steps of: and matching the extracted voiceprint features with the prestored voiceprint features of the target user to obtain a voiceprint matching result, and determining whether the user corresponding to the voice data is the target user or not according to the voiceprint matching result.
In one embodiment, the processor, when executing the computer program, further performs the steps of: and acquiring the identity identification data of the non-target user, and inquiring the operation authority of the non-target user corresponding to the current interactive scene according to the identity identification data of the non-target user.
In one embodiment, the processor, when executing the computer program, further performs the steps of: extracting operation item keywords contained in the semantic content of the voice data, determining the interaction type of the current interaction scene according to the semantic content, and executing corresponding operation according to the operation item keywords if the operation item keywords and the interaction type are consistent with the operation authority of the current interaction scene.
In one embodiment, the processor, when executing the computer program, further performs the steps of: and matching the extracted voiceprint features in a preset voiceprint feature database, if a pre-stored voiceprint feature consistent with the extracted voiceprint features is matched, searching the identity data of a preset non-target user according to the pre-stored voiceprint feature, and acquiring the operation authority of the non-target user corresponding to the current interactive scene according to the identity data.
In one embodiment, the processor, when executing the computer program, further performs the steps of: and if the operation authority of the non-target user does not contain the operation authority corresponding to the current interactive scene, ignoring the voice data of the non-target user and returning to the step of receiving the voice data input by the user in the current interactive scene.
In one embodiment, a computer-readable storage medium is provided, on which a computer program is stored, the computer program implementing the following steps when executed by a processor: receiving voice data input by a user in a current interactive scene; if it is recognized that the voice data contains voice data of a non-target user, determining the operation authority of the non-target user according to the voice data of the non-target user; and if the operation authority of the non-target user contains the operation authority for the current interactive scene, executing a corresponding operation according to the semantic content of the voice data and the operation authority.
In one embodiment, the computer program when executed by the processor further performs the steps of: and extracting voiceprint features in the voice data, and determining whether the user corresponding to the voice data is a target user or not according to the extracted voiceprint features.
In one embodiment, the computer program when executed by the processor further performs the steps of: and sending the extracted voiceprint features to a third-party voiceprint service platform, wherein the third-party voiceprint service platform prestores the voiceprint features of the target user, receiving an identity recognition result returned by the third-party voiceprint service platform, and determining whether the user corresponding to the voice data is the target user according to the identity recognition result.
In one embodiment, the computer program when executed by the processor further performs the steps of: and matching the extracted voiceprint features with the prestored voiceprint features of the target user to obtain a voiceprint matching result, and determining whether the user corresponding to the voice data is the target user or not according to the voiceprint matching result.
In one embodiment, the computer program when executed by the processor further performs the steps of: and acquiring the identity identification data of the non-target user, and inquiring the operation authority of the non-target user corresponding to the current interactive scene according to the identity identification data of the non-target user.
In one embodiment, the computer program when executed by the processor further performs the steps of: extracting operation item keywords contained in the semantic content of the voice data, determining the interaction type of the current interaction scene according to the semantic content, and executing corresponding operation according to the operation item keywords if the operation item keywords and the interaction type are consistent with the operation authority of the current interaction scene.
In one embodiment, the computer program when executed by the processor further performs the steps of: and matching the extracted voiceprint features in a preset voiceprint feature database, if a pre-stored voiceprint feature consistent with the extracted voiceprint features is matched, searching the identity data of a preset non-target user according to the pre-stored voiceprint feature, and acquiring the operation authority of the non-target user corresponding to the current interactive scene according to the identity data.
In one embodiment, the computer program when executed by the processor further performs the steps of: and if the operation authority of the non-target user does not contain the operation authority corresponding to the current interactive scene, ignoring the voice data of the non-target user and returning to the step of receiving the voice data input by the user in the current interactive scene.
It will be understood by those skilled in the art that all or part of the processes of the methods of the above embodiments may be implemented by a computer program instructing relevant hardware; the computer program may be stored in a non-volatile computer-readable storage medium and, when executed, may include the processes of the embodiments of the methods described above. Any reference to memory, storage, a database or another medium used in the embodiments provided herein may include at least one of non-volatile and volatile memory. Non-volatile memory may include Read-Only Memory (ROM), magnetic tape, floppy disk, flash memory, optical storage, and the like. Volatile memory may include Random Access Memory (RAM) or external cache memory. By way of illustration and not limitation, RAM may take many forms, such as Static Random Access Memory (SRAM) or Dynamic Random Access Memory (DRAM).
The technical features of the above embodiments can be arbitrarily combined, and for the sake of brevity, all possible combinations of the technical features in the above embodiments are not described, but should be considered as the scope of the present specification as long as there is no contradiction between the combinations of the technical features.
The above embodiments express only several implementations of the present application, and their descriptions are specific and detailed, but they should not therefore be construed as limiting the scope of the invention. It should be noted that, for a person of ordinary skill in the art, several variations and improvements can be made without departing from the concept of the present application, and these all fall within the protection scope of the present application. Therefore, the protection scope of this patent shall be subject to the appended claims.

Claims (11)

1. A voice interaction control method, the method comprising:
receiving voice data input by a user in a current interactive scene;
if it is recognized that the voice data contains voice data of a non-target user, determining the operation authority of the non-target user according to the voice data of the non-target user;
and if the operation authority of the non-target user comprises the operation authority corresponding to the current interactive scene, executing corresponding operation according to the semantic content of the voice data and the operation authority.
2. The method of claim 1, wherein identifying whether the voice data includes voice data of a non-target user comprises:
extracting voiceprint features in the voice data;
and determining whether the user corresponding to the voice data is a target user or not according to the extracted voiceprint features.
3. The method of claim 2, wherein the determining whether the user corresponding to the voice data is a target user comprises:
sending the extracted voiceprint features to a third-party voiceprint service platform, wherein the third-party voiceprint service platform prestores the voiceprint features of the target user;
receiving an identity recognition result returned by the third-party voiceprint service platform;
and determining whether the user corresponding to the voice data is a target user or not according to the identity recognition result.
4. The method according to claim 2, wherein the determining whether the user corresponding to the voice data is the target user according to the extracted voiceprint feature comprises:
matching the extracted voiceprint features with prestored voiceprint features of the target user to obtain a voiceprint matching result;
and determining whether the user corresponding to the voice data is a target user or not according to the voiceprint matching result.
5. The method of claim 2, wherein determining the operation authority of the non-target user according to the voice data of the non-target user comprises:
matching the extracted voiceprint features in a preset voiceprint feature database;
if the pre-stored voiceprint features consistent with the extracted voiceprint features are matched, searching the identity data of the non-target user according to the pre-stored voiceprint features;
and acquiring the operation authority of the non-target user corresponding to the current interactive scene according to the identity identification data.
6. The method of claim 3, wherein the identification result comprises user identification data;
determining the operation authority of the non-target user according to the voice data of the non-target user comprises the following steps:
acquiring the identity data of the non-target user;
and inquiring the operation authority of the non-target user corresponding to the current interactive scene according to the identity identification data of the non-target user.
7. The method according to any one of claims 1 to 5, wherein if the operation right of the non-target user includes an operation right corresponding to a current interactive scene, executing a corresponding operation according to the semantic content of the voice data and the operation right comprises:
extracting operation item keywords contained in semantic content of the voice data;
determining the interaction type of the current interaction scene according to the semantic content;
and if the operation item keywords and the interaction type are consistent with the operation authority of the current interaction scene, executing a corresponding operation according to the operation item keywords.
8. The method of any one of claims 1 to 5, further comprising:
and if the operation authority of the non-target user does not contain the operation authority corresponding to the current interactive scene, ignoring the voice data of the non-target user and returning to the step of receiving the voice data input by the user in the current interactive scene.
9. A voice interaction control apparatus, the apparatus comprising:
the voice data receiving module is used for receiving voice data input by a user in a current interactive scene;
the authority data acquisition module is used for determining, if it is recognized that the voice data contains voice data of a non-target user, the operation authority of the non-target user according to the voice data of the non-target user;
and the data processing module is used for executing corresponding operation according to the semantic content of the voice data and the operation authority if the operation authority of the non-target user comprises the operation authority corresponding to the current interactive scene.
10. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor, when executing the computer program, implements the steps of the method of any of claims 1 to 8.
11. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 8.
CN202010619579.5A 2020-07-01 2020-07-01 Voice interaction control method, device, computer equipment and storage medium Active CN111916086B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010619579.5A CN111916086B (en) 2020-07-01 2020-07-01 Voice interaction control method, device, computer equipment and storage medium


Publications (2)

Publication Number Publication Date
CN111916086A true CN111916086A (en) 2020-11-10
CN111916086B CN111916086B (en) 2024-03-29

Family

ID=73227101

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010619579.5A Active CN111916086B (en) 2020-07-01 2020-07-01 Voice interaction control method, device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111916086B (en)


Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160012446A1 (en) * 2014-07-10 2016-01-14 Datalogic ADC, Inc. Authorization of transactions based on automated validation of customer speech
CN107767875A (en) * 2017-10-17 2018-03-06 深圳市沃特沃德股份有限公司 Sound control method, device and terminal device
CN109729400A (en) * 2018-06-27 2019-05-07 平安科技(深圳)有限公司 Apparatus control method, device, equipment and storage medium based on sound
CN109979443A (en) * 2017-12-27 2019-07-05 深圳市优必选科技有限公司 A kind of rights management control method and device for robot
CN110543129A (en) * 2019-09-30 2019-12-06 深圳市酷开网络科技有限公司 intelligent electric appliance control method, intelligent electric appliance control system and storage medium
US20200097965A1 (en) * 2018-09-25 2020-03-26 American Express Travel Related Services Company, Inc. Voice interface transaction system using audio signals
CN111028835A (en) * 2019-11-18 2020-04-17 北京小米移动软件有限公司 Resource replacement method, device, system and computer readable storage medium


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115482626A (en) * 2022-08-17 2022-12-16 宁波美喵科技有限公司 Method, device and equipment for voice broadcast of shared bicycle and storage medium
CN115482626B (en) * 2022-08-17 2023-08-29 宁波美喵科技有限公司 Method, device, equipment and storage medium for voice broadcasting of shared bicycle

Also Published As

Publication number Publication date
CN111916086B (en) 2024-03-29

Similar Documents

Publication Publication Date Title
CN106373575B (en) User voiceprint model construction method, device and system
US10135818B2 (en) User biological feature authentication method and system
CN107800672B (en) Information verification method, electronic equipment, server and information verification system
CN105323253B (en) Identity verification method and device
US20060136219A1 (en) User authentication by combining speaker verification and reverse turing test
CN110505201B (en) Conference information processing method, conference information processing device, computer equipment and storage medium
CN109657107B (en) Terminal matching method and device based on third-party application
TW201734874A (en) Identity registration method and device
EP2929479A1 (en) Method and apparatus of account login
CN107533598B (en) Input method and device of login password of application program and terminal
CN111883140A (en) Authentication method, device, equipment and medium based on knowledge graph and voiceprint recognition
CN103077341A (en) Application program unlocking method and device
WO2019174073A1 (en) Method and device for modifying client information in conversation, computer device and storage medium
CN110992053B (en) Secure payment system and method based on finger vein recognition and blockchain technology
CN111382252B (en) Method, device, equipment and medium for determining problem category based on user request
US10939291B1 (en) Systems and methods for photo recognition-based identity authentication
US20120330663A1 (en) Identity authentication system and method
JP2017102842A (en) Personal identification system, personal identification information output system, authentication server, personal identification method, personal identification information output method, and program
CN111916086B (en) Voice interaction control method, device, computer equipment and storage medium
WO2016124008A1 (en) Voice control method, apparatus and system
CN112417412A (en) Bank account balance inquiry method, device and system
KR101333006B1 (en) System and method of confirming a login
CN111475793A (en) Access control method, user registration method, user login method, device and equipment
JP6829698B2 (en) Authentication system and authentication method
CN110706388A (en) Access control management method and related product

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 100192 Room 401, building 4, area C, Dongsheng Science Park, 66 xixiaokou Road, Haidian District, Beijing

Applicant after: Zhongdian Jinxin Software Co.,Ltd.

Address before: 100192 Room 401, building 4, area C, Dongsheng Science Park, 66 xixiaokou Road, Haidian District, Beijing

Applicant before: Beijing Wensi Haihui Jinxin Software Co.,Ltd.

GR01 Patent grant