CN111143557A - Real-time voice interaction processing method and device, electronic equipment and storage medium - Google Patents

Real-time voice interaction processing method and device, electronic equipment and storage medium Download PDF

Info

Publication number
CN111143557A
CN111143557A (application number CN201911274649.1A)
Authority
CN
China
Prior art keywords
text
sensitive content
data
voice
sensitive
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911274649.1A
Other languages
Chinese (zh)
Inventor
赵群
宁洪珂
夏小强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Xiaomi Mobile Software Co Ltd
Original Assignee
Beijing Xiaomi Mobile Software Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Xiaomi Mobile Software Co Ltd filed Critical Beijing Xiaomi Mobile Software Co Ltd
Priority to CN201911274649.1A priority Critical patent/CN111143557A/en
Publication of CN111143557A publication Critical patent/CN111143557A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35Clustering; Classification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/3331Query processing
    • G06F16/334Query execution
    • G06F16/3344Query execution using natural language analysis
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/22Procedures used during a speech recognition process, e.g. man-machine dialogue
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/22Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/225Feedback of the input speech

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Machine Translation (AREA)

Abstract

The disclosure relates to a real-time voice interaction processing method and device, electronic equipment and a storage medium. The real-time voice interaction processing method includes: receiving voice session information, and determining whether preset sensitive content exists in the received voice session information; when the sensitive content does not exist in the voice session information, returning voice reply data matched with the voice session information; and when the sensitive content exists in the voice session information, returning voice reminding data matched with the sensitive content, where the voice reminding data is used to remind the user that the voice session information involves sensitive content. In this embodiment, voice reminding data can be returned when sensitive content exists in the voice session information, reminding the user in time so that the user can drop the sensitive topic promptly; this avoids the user repeating the question many times and improves the user experience of voice interaction.

Description

Real-time voice interaction processing method and device, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of speech processing technologies, and in particular, to a real-time speech interaction processing method and apparatus, an electronic device, and a storage medium.
Background
With the development of voice interaction technology, chat robots are increasingly widely used. For example, various chat robots installed in electronic equipment allow users to interact with them conveniently, with the robot helping users search for corresponding information.
In practical applications, the chat robot may face different users of varying character. For example, during a chat, some users may raise sensitive topics, such as political events, reactionary content, pornographic topics, or abusive language. If the chat robot feeds back information on such topics, it may violate regulations; if it gives no feedback at all, the user may think the robot is faulty or did not hear clearly and continue asking, which is not favorable for improving the user experience.
Disclosure of Invention
The present disclosure provides a real-time voice interaction processing method and apparatus, an electronic device, and a storage medium, to solve the deficiencies of the related art.
According to a first aspect of the embodiments of the present disclosure, a real-time voice interaction processing method is provided, including:
receiving voice session information, and determining whether preset sensitive content exists in the received voice session information;
when the sensitive content does not exist in the voice session information, returning voice reply data matched with the voice session information; and when the sensitive content exists in the voice session information, returning voice reminding data matched with the sensitive content, where the voice reminding data is used to remind that the voice session information involves the sensitive content.
Optionally, the determining whether the preset sensitive content exists in the acquired voice session information includes:
converting the voice conversation information into text conversation data;
and determining whether preset sensitive content exists in the text conversation data.
Optionally, the determining whether the preset sensitive content exists in the text conversation data includes:
acquiring sensitive contents in a database, and constructing a dictionary tree structure;
and matching the text conversation data based on the dictionary tree structure to obtain a matching result, wherein the matching result represents whether preset sensitive content exists in the text conversation data.
Optionally, the sensitive content in the database is obtained by:
acquiring a pre-trained text classifier;
inputting text session data to be classified into the text classifier, and obtaining the classification of each piece of text session data from the text classifier, where the classification is either a normal type or an abnormal type, the abnormal type indicating that the text session data contains sensitive content;
sensitive content is extracted from the text conversation data classified as the abnormal type;
and storing the extracted sensitive content in the database.
Optionally, the text classifier is trained by the steps comprising:
acquiring text session data containing different types of sensitive contents to obtain a plurality of text session training sets, wherein the text session data containing the same type of sensitive contents form a text session training set;
and training a preset text classifier by using the text session training set until the output value of the loss function of the text classifier is smaller than a set error threshold value.
Optionally, after the text classifier obtains the classification of each text session data, the method further includes:
displaying the text session data whose classification prediction value is greater than a prediction value threshold;
acquiring text session data selected by user triggering operation;
updating a text session training set with the selected text session data, the updated text session training set being used to retrain the text classifier.
Optionally, the sensitive content in the database is obtained by:
detecting keywords input in a web page interface of a manager;
and storing the keywords as sensitive contents into the database.
According to a second aspect of the embodiments of the present disclosure, there is provided a real-time voice interaction processing apparatus, including:
the sensitive content determining module is used for receiving the voice conversation information and determining whether preset sensitive content exists in the received voice conversation information;
the voice data reply module is used for returning voice reply data matched with the voice session information when the sensitive content does not exist in the voice session information; and returning, when the sensitive content exists in the voice session information, voice reminding data matched with the sensitive content, where the voice reminding data is used to remind that the voice session information involves the sensitive content.
Optionally, the sensitive content determining module includes:
a text data acquisition unit for converting the voice conversation information into text conversation data;
and the sensitive content determining unit is used for determining whether preset sensitive content exists in the text conversation data.
Optionally, the sensitive content determining unit includes:
the sensitive content acquisition subunit is used for acquiring sensitive content in the database and constructing a dictionary tree structure;
and the matching result obtaining subunit is used for matching the text conversation data based on the dictionary tree structure to obtain a matching result, and the matching result represents whether preset sensitive content exists in the text conversation data.
Optionally, the apparatus further includes a sensitive content extraction module, where the sensitive content extraction module includes:
the classifier obtaining unit is used for obtaining a pre-trained text classifier;
the text data classification unit is used for inputting the text session data to be classified into the text classifier, which obtains the classification of each piece of text session data; the classification is a normal type or an abnormal type, the abnormal type indicating that the text session data contains sensitive content;
the sensitive content extraction unit is used for extracting sensitive content from the text session data classified into the abnormal type;
and the sensitive content storage unit is used for storing the extracted sensitive content into the database.
Optionally, the apparatus further comprises a classifier training module, the classifier training module comprising:
the text data marking unit is used for acquiring text session data containing different types of sensitive contents to obtain a plurality of text session training sets, wherein the text session data containing the same type of sensitive contents form a text session training set;
and the classifier training unit is used for training a preset text classifier by using the text session training set until the output value of the loss function of the text classifier is smaller than a set error threshold value.
Optionally, the apparatus further comprises:
the text data display module is used for displaying the text session data whose classification prediction value is greater than a prediction value threshold;
the text data selection module is used for acquiring text session data selected by user trigger operation;
and the training set updating module is used for updating a text session training set by using the selected text session data, and the updated text session training set is used for retraining the text classifier.
Optionally, the apparatus further includes a sensitive content obtaining module, where the sensitive content obtaining module includes:
a keyword detection unit for detecting a keyword input in a web page interface of a manager;
and the keyword storage unit is used for storing the keywords into the database when the keywords are sensitive contents.
According to a third aspect of the embodiments of the present disclosure, there is provided an electronic apparatus including:
a processor;
a memory for storing the processor-executable instructions;
the processor is configured to execute executable instructions in the memory to implement the steps of the method of any one of the above.
According to a fourth aspect of embodiments of the present disclosure, there is provided a readable storage medium having stored thereon executable instructions which, when executed by a processor, implement the steps of the method according to any one of the above.
The technical scheme provided by the embodiment of the disclosure can have the following beneficial effects:
As can be seen from the foregoing embodiments, in the embodiments of the present disclosure, voice session information is received and it is determined whether preset sensitive content exists in it; when no sensitive content exists, voice reply data matched with the voice session information is returned; and when sensitive content exists, voice reminding data matched with the sensitive content is returned, the voice reminding data being used to remind the user that the voice session information involves sensitive content. Thus, in this embodiment, voice reminding data can be returned when the voice session information contains sensitive content, reminding the user in time so that the user can drop the sensitive topic promptly; this avoids repeated questioning by the user and improves the user experience of voice interaction.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and together with the description, serve to explain the principles of the disclosure.
FIG. 1 is a flow diagram illustrating a real-time voice interaction processing method according to an example embodiment.
FIG. 2 is a flow diagram illustrating a determination of sensitive content according to an example embodiment.
FIG. 3 is a flow diagram illustrating another process for determining sensitive content according to an example embodiment.
FIG. 4 is a flowchart illustrating training a text classifier in accordance with an exemplary embodiment.
FIG. 5 is a flowchart illustrating matching sensitive content according to an example embodiment.
Fig. 6 to 12 are block diagrams illustrating a real-time voice interaction processing apparatus according to an exemplary embodiment.
FIG. 13 is a block diagram illustrating an electronic device in accordance with an example embodiment.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The following exemplary described embodiments do not represent all embodiments consistent with the present disclosure. Rather, they are merely examples of devices consistent with certain aspects of the present disclosure as recited in the claims below.
The embodiment of the present disclosure provides a real-time voice interaction processing method, which may be applied to an electronic device (such as a smart speaker) provided with a chat robot, fig. 1 is a flowchart of a real-time voice interaction processing method according to an exemplary embodiment, and referring to fig. 1, the real-time voice interaction processing method includes steps 101 to 102, where:
in step 101, voice session information is received, and it is determined whether preset sensitive content exists in the received voice session information.
The electronic device may engage in voice interaction with the user; for example, the user speaks to the electronic device to form voice session information. The electronic device may receive the voice session information, understand its intent, and feed back matching data to the user.
In this embodiment, the electronic device may first determine whether preset sensitive content exists in the voice session information. Referring to fig. 2, the electronic device obtains the voice session information, which can be collected by an audio collection device of the electronic device. The electronic device may then convert the voice session information into text session data (corresponding to step 201), and determine whether preset sensitive content exists in the text session data (corresponding to step 202).
In practical applications, a database may be preset in the electronic device, and the database may store sensitive content, where the sensitive content may include at least one of the following: sensitive words and sensitive rules. A sensitive rule is a rule for recognizing at least one sensitive word. For example, "A-B-C" or "A*B*C" both conceal the sensitive word "ABC": a plain word search cannot find it because of the inserted "-" or "*" characters. Such a sensitive rule can be written as A()B()C, where "()" stands for any separator characters (various symbols or spaces) between A and B and between B and C; once the separators are removed, ABC forms the sensitive word. Of course, the technician may configure rules according to the specific scenario, which is not limited herein.
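As an illustrative sketch only (the patent text does not specify an implementation, and the `rule_to_pattern`/`rule_matches` names are hypothetical), a sensitive rule such as A()B()C can be compiled into a regular expression in which each "()" matches any run of separator characters:

```python
import re

def rule_to_pattern(rule: str) -> "re.Pattern[str]":
    """Compile a sensitive rule such as 'A()B()C' into a regex.

    Each '()' placeholder matches any run of separator characters
    (hyphens, asterisks, spaces, and other non-word symbols) that a
    user might insert to evade a plain keyword search.
    """
    parts = [re.escape(p) for p in rule.split("()")]
    # [\W_]* : zero or more separator (non-word) characters between parts
    return re.compile(r"[\W_]*".join(parts))

def rule_matches(rule: str, text: str) -> bool:
    """Return True if the text contains the concealed sensitive word."""
    return rule_to_pattern(rule).search(text) is not None
```

With this encoding, the rule A()B()C matches "A-B-C", "A * B * C", and the plain "ABC" alike.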
It should be noted that the sensitive content in the database can be obtained by:
in a first manner, a trained text classifier may be pre-stored in the electronic device, and when there is a need for classifying text conversation data, for example, the text conversation data is received, or after a conversation interaction process is completed, referring to fig. 3, the electronic device may obtain the text classifier (corresponding to step 301), where the text classifier may be a neural network model, for example, CNN or RNN, which is not limited herein. Inputting the text conversation data to be classified into a text classifier, and acquiring classification of each text conversation data by the text classifier; the classification may include a normal type and an abnormal type, where the abnormal type refers to that sensitive content is contained in the text session data (corresponding to step 302). In practical applications, the exception types may be further subdivided, such as politics, pornography, violence, etc., and are not limited herein. In this way, the electronic device may extract sensitive content from the text conversation data classified as the abnormal type (corresponding to step 303), and store the extracted sensitive content in the database (corresponding to step 304).
In one example, after the text classifier classifies each piece of text session data, the text session data whose classification prediction value is greater than the prediction value threshold, that is, the abnormal-type text session data, may be displayed to the user. When a piece of text session data contains sensitive content, the user selects it with a trigger operation; the selected text session data is then used to update the text session training set, and the updated training set can be used to retrain the text classifier. Repeated training in this way improves the classification accuracy of the text classifier.
The text classifier may be trained as follows. Referring to fig. 4, text session data containing different types of sensitive content is obtained to form a plurality of text session training sets, where text session data containing the same type of sensitive content forms one training set (corresponding to step 401). A preset text classifier (such as a CNN or RNN) is then trained with the text session training sets until the output value of its loss function is smaller than a set error threshold (corresponding to step 402), completing the training. It should be noted that the training process can be completed in the electronic device. It can also be completed outside the electronic device (i.e., offline training), after which the text classifier is migrated into the electronic device, so that training does not occupy the computing resources of the electronic device.
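The training loop above ("train until the loss output falls below a set error threshold") can be sketched, purely for illustration, with a minimal bag-of-words logistic classifier standing in for the CNN/RNN the text mentions; the function name, hyperparameters, and stopping logic below are hypothetical, not taken from the patent:

```python
import math
from collections import defaultdict

def train_text_classifier(train_set, error_threshold=0.1, lr=0.5, max_epochs=500):
    """Train a tiny bag-of-words logistic classifier.

    train_set: list of (text, label) pairs, where label 1 = abnormal
    (contains sensitive content) and 0 = normal.  Training stops once
    the mean log-loss drops below error_threshold, mirroring the
    stopping criterion described in the text.  A real system would use
    a CNN or RNN instead of this toy model.
    """
    weights = defaultdict(float)
    bias = 0.0
    for _ in range(max_epochs):
        total_loss = 0.0
        for text, label in train_set:
            words = text.split()
            z = bias + sum(weights[w] for w in words)
            p = 1.0 / (1.0 + math.exp(-z))          # sigmoid
            total_loss += -(label * math.log(p + 1e-12)
                            + (1 - label) * math.log(1 - p + 1e-12))
            grad = p - label                         # d(log-loss)/dz
            bias -= lr * grad
            for w in words:
                weights[w] -= lr * grad
        if total_loss / len(train_set) < error_threshold:
            break                                    # loss below error threshold

    def classify(text):
        """Return 1 (abnormal) or 0 (normal) for a piece of text."""
        z = bias + sum(weights[w] for w in text.split())
        return 1 if z > 0 else 0
    return classify
```

The returned `classify` function plays the role of the trained classifier that labels incoming text session data as normal or abnormal.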
In this embodiment, the electronic device can automatically update and enrich the sensitive content in the database by automatically acquiring the sensitive content in the text session data. Therefore, when the subsequent sensitive contents are matched, the accuracy of the matching result is improved.
In a second mode, the electronic device may be provided with an operation interface, for example, a manager web interface, and the user may input a keyword through the manager web interface, where the keyword is the sensitive content determined by the user. The electronic device may store the keywords as sensitive content in a database. In the method, the data volume of the sensitive content in the database can be enriched by manually adding the sensitive content, and the accuracy of the matching result is improved.
In this embodiment, referring to fig. 5, the electronic device may obtain the sensitive content in the database and construct a dictionary tree structure (corresponding to step 501). For example, a sensitive service may be invoked in the electronic device, and sensitive words and sensitive rules are read from a database by the sensitive service using java or python language, constructed into a dictionary tree structure, and stored in the memory. In practical application, the sensitive service may update the dictionary tree structure once according to a set period, for example, several minutes, so as to ensure that newly generated sensitive content can be added to the dictionary tree structure in time.
Then, the electronic device matches the text conversation data based on the dictionary tree structure, and a matching result can be obtained, wherein the matching result represents whether preset sensitive content exists in the text conversation data or not (corresponding to step 502).
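The dictionary tree construction and matching in steps 501-502 can be sketched as follows. This is an illustrative minimal trie, not the patent's implementation; a production sensitive service (the text mentions Java or Python) would also handle sensitive rules and periodic refresh of the tree:

```python
class Trie:
    """A minimal dictionary tree (trie) for sensitive-word matching."""

    def __init__(self, words=()):
        self.root = {}
        for w in words:
            self.insert(w)

    def insert(self, word):
        """Add one sensitive word to the dictionary tree."""
        node = self.root
        for ch in word:
            node = node.setdefault(ch, {})
        node["$"] = True  # end-of-word marker

    def find_in(self, text):
        """Return the first sensitive word found in text, or None.

        Walks the tree from every start position, so sensitive words
        are found even in the middle of the text session data.
        """
        for i in range(len(text)):
            node = self.root
            j = i
            while j < len(text) and text[j] in node:
                node = node[text[j]]
                j += 1
                if "$" in node:
                    return text[i:j]
        return None
```

A matching result is then simply whether `find_in` returns a word (sensitive content exists) or `None` (no preset sensitive content in the text session data).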
In step 102, when no sensitive content exists in the voice session information, voice reply data matched with the voice session information is returned; and when sensitive content exists in the voice session information, voice reminding data matched with the sensitive content is returned, the voice reminding data being used to remind the user that the voice session information involves sensitive content.
In this embodiment, some voice reply data or voice prompt data may be preset in the electronic device.
Taking the voice reply data as an example, it may cover frequently asked topics, such as where the capital of China is, or which snacks are recommended in Beijing. When no sensitive content exists in the voice session information, the electronic device can return the matched voice reply data based on semantic understanding of the text session data.
In practical applications, the electronic device may also face a niche topic, such as "what is outer space?". In this case, the electronic device can query the Internet for an answer based on the topic, form voice reply data from the answer, and feed it back to the user. Thus, rarely used voice reply data need not be stored in the electronic device, which reduces the occupation of storage resources and improves their utilization efficiency.
Taking the voice reminding data as an example, some voice reminding data may be preset in the electronic device, for example, different voice reminding data may be set for a certain type of sensitive content. When the voice conversation information has sensitive content, the voice reminding data corresponding to the sensitive content can be inquired and returned to the user. Therefore, the electronic equipment can remind the user to stop the sensitive topic in time by feeding back the voice reminding data in time, and the situation of repeatedly asking questions is avoided.
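The dispatch described in step 102 can be sketched as below. All names here are hypothetical illustrations: `find_sensitive` stands for any sensitive-content lookup (e.g. a dictionary-tree match), `reminders` maps sensitive words to reminder messages, and `answer_fn` stands for the normal reply pipeline:

```python
def handle_voice_session(text, find_sensitive, reminders, answer_fn):
    """Return a reminder when sensitive content is found, else a reply.

    find_sensitive(text) returns a matched sensitive word or None;
    reminders maps sensitive words to reminder messages (with a generic
    fallback); answer_fn produces the normal matched reply data.
    """
    hit = find_sensitive(text)
    if hit is not None:
        # Sensitive content exists: return matching voice reminding data.
        return reminders.get(hit, "Please note: this topic involves sensitive content.")
    # No sensitive content: return matched voice reply data.
    return answer_fn(text)
```

Per-category reminders follow the same shape: keying `reminders` by the sensitive word's category instead of the word itself would implement the "different voice reminding data per type of sensitive content" variant.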
To this end, in the embodiment of the present disclosure, voice session information is received and it is determined whether preset sensitive content exists in it; when no sensitive content exists, voice reply data matched with the voice session information is returned; and when sensitive content exists, voice reminding data matched with the sensitive content is returned, the voice reminding data being used to remind the user that the voice session information involves sensitive content. Thus, in this embodiment, voice reminding data can be returned when the voice session information contains sensitive content, reminding the user in time so that the user can drop the sensitive topic promptly; this avoids repeated questioning by the user and improves the user experience of voice interaction.
FIG. 6 is a block diagram illustrating a real-time voice interaction processing apparatus according to an example embodiment. Referring to fig. 6, a real-time voice interaction processing apparatus includes:
a sensitive content determining module 601, configured to receive voice session information, and determine whether preset sensitive content exists in the received voice session information;
a voice data reply module 602, configured to return voice reply data matched with the voice session information when the sensitive content does not exist in the voice session information; and to return, when the sensitive content exists in the voice session information, voice reminding data matched with the sensitive content, where the voice reminding data is used to remind that the voice session information involves the sensitive content.
In one embodiment, referring to fig. 7, the sensitive content determining module 601 includes:
a text data obtaining unit 701 configured to convert the voice session information into text session data;
a sensitive content determining unit 702, configured to determine whether there is a preset sensitive content in the text session data.
In one embodiment, referring to fig. 8, the sensitive content determining unit 702 includes:
a sensitive content acquiring subunit 801, configured to acquire sensitive content in the database and construct a dictionary tree structure;
a matching result obtaining subunit 802, configured to match the text session data based on the dictionary tree structure to obtain a matching result, where the matching result indicates whether preset sensitive content exists in the text session data.
In one embodiment, referring to fig. 9, the apparatus further includes a sensitive content extraction module, which includes:
a classifier obtaining unit 901 configured to obtain a pre-trained text classifier;
a text data classification unit 902, configured to input text session data to be classified into the text classifier, which obtains a classification of each piece of text session data; the classification is a normal type or an abnormal type, the abnormal type indicating that the text session data contains sensitive content;
a sensitive content extracting unit 903, configured to extract sensitive content from the text session data classified as the abnormal type;
and a sensitive content storage unit 904, configured to store the extracted sensitive content in the database.
In an embodiment, referring to fig. 10, the apparatus further comprises a classifier training module comprising:
a text data tagging unit 1001 configured to obtain text session data including different types of sensitive content to obtain a plurality of text session training sets, where the text session data including the same type of sensitive content form a text session training set;
a classifier training unit 1002, configured to train a preset text classifier by using the text session training set until an output value of a loss function of the text classifier is smaller than a set error threshold.
In one embodiment, referring to fig. 11, the apparatus further comprises:
a text data display module 1101, configured to display text session data whose classification prediction value is greater than a prediction value threshold;
a text data selection module 1102, configured to acquire text session data selected by a user through a trigger operation;
a training set updating module 1103, configured to update a text session training set using the selected text session data, where the updated text session training set is used to retrain the text classifier.
In one embodiment, referring to fig. 12, the apparatus further includes a sensitive content acquiring module, where the sensitive content acquiring module includes:
a keyword detection unit 1201 for detecting a keyword input in the administrator web page interface;
a keyword storage unit 1202, configured to store the keyword into the database when the keyword is sensitive content.
With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.
To this end, in the embodiment of the present disclosure, voice session information is received and it is determined whether preset sensitive content exists in it; when no sensitive content exists, voice reply data matched with the voice session information is returned; and when sensitive content exists, voice reminding data matched with the sensitive content is returned, the voice reminding data being used to remind the user that the voice session information involves sensitive content. Thus, in this embodiment, voice reminding data can be returned when the voice session information contains sensitive content, reminding the user in time so that the user can drop the sensitive topic promptly; this avoids repeated questioning by the user and improves the user experience of voice interaction.
FIG. 13 is a block diagram illustrating an electronic device in accordance with an example embodiment. For example, the electronic device 1300 may be a smartphone, a computer, a digital broadcast terminal, a tablet device, a medical device, a fitness device, a personal digital assistant, a smart speaker, and so on.
Referring to fig. 13, electronic device 1300 may include one or more of the following components: a processing component 1302, a memory 1304, a power component 1306, a multimedia component 1308, an audio component 1310, an input/output (I/O) interface 1312, a sensor component 1314, a communication component 1316, and an image acquisition component 1318.
The processing component 1302 generally controls the overall operation of the electronic device 1300, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing component 1302 may include one or more processors 1320 to execute instructions. Further, the processing component 1302 may include one or more modules that facilitate interaction between the processing component 1302 and other components. For example, the processing component 1302 may include a multimedia module to facilitate interaction between the multimedia component 1308 and the processing component 1302.
The memory 1304 is configured to store various types of data to support operation at the electronic device 1300. Examples of such data include instructions for any application or method operating on the electronic device 1300, contact data, phonebook data, messages, pictures, videos, and so forth. The memory 1304 may be implemented by any type or combination of volatile or non-volatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks.
The power supply component 1306 provides power to the various components of the electronic device 1300. Power components 1306 may include a power management system, one or more power sources, and other components associated with generating, managing, and distributing power for electronic device 1300.
The multimedia component 1308 includes a screen that provides an output interface between the electronic device 1300 and the target object. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, it may be implemented as a touch screen to receive input signals from the target object. The touch panel includes one or more touch sensors to sense touches, swipes, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or swipe action, but also detect the duration and pressure associated with the touch or swipe operation.
The audio component 1310 is configured to output and/or input audio signals. For example, the audio component 1310 includes a Microphone (MIC) configured to receive external audio signals when the electronic device 1300 is in an operational mode, such as a call mode, a recording mode, and a real-time voice interaction processing mode. The received audio signals may further be stored in the memory 1304 or transmitted via the communication component 1316. In some embodiments, the audio component 1310 also includes a speaker for outputting audio signals.
The I/O interface 1312 provides an interface between the processing component 1302 and peripheral interface modules, which may be keyboards, click wheels, buttons, etc.
The sensor assembly 1314 includes one or more sensors for providing various aspects of state assessment for the electronic device 1300. For example, the sensor assembly 1314 may detect the open/closed state of the electronic device 1300 and the relative positioning of components, such as the display and keypad of the electronic device 1300. The sensor assembly 1314 may also detect a change in the position of the electronic device 1300 or one of its components, the presence or absence of contact between a target object and the electronic device 1300, the orientation or acceleration/deceleration of the electronic device 1300, and a change in its temperature.
The communication component 1316 is configured to facilitate communications between the electronic device 1300 and other devices in a wired or wireless manner. The electronic device 1300 may access a wireless network based on a communication standard, such as WiFi, 2G, 3G, 4G, 5G, or a combination thereof. In an exemplary embodiment, the communication component 1316 receives broadcast signals or broadcast related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communications component 1316 also includes a Near Field Communications (NFC) module to facilitate short-range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, infrared data association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the electronic device 1300 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, micro-controllers, microprocessors, or other electronic components.
In an exemplary embodiment, a non-transitory readable storage medium including instructions, such as the memory 1304 including instructions, executable by the processor 1320 of the electronic device 1300 is also provided. For example, the non-transitory readable storage medium may be a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This disclosure is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims (16)

1. A real-time voice interaction processing method is characterized by comprising the following steps:
receiving voice session information, and determining whether preset sensitive content exists in the received voice session information;
when the sensitive content does not exist in the voice conversation information, returning voice reply data matched with the voice conversation information; and when the sensitive content exists in the voice conversation information, returning voice reminding data matched with the sensitive content, wherein the voice reminding data is used for reminding that the sensitive content is related to the voice conversation information.
2. The method of claim 1, wherein determining whether preset sensitive content exists in the received voice session information comprises:
converting the voice conversation information into text conversation data;
and determining whether preset sensitive content exists in the text conversation data.
3. The real-time voice interaction processing method of claim 2, wherein determining whether preset sensitive content exists in the text conversation data comprises:
acquiring sensitive contents in a database, and constructing a dictionary tree structure;
and matching the text conversation data based on the dictionary tree structure to obtain a matching result, wherein the matching result represents whether preset sensitive content exists in the text conversation data.
4. The real-time voice interaction processing method according to claim 3, wherein the sensitive content in the database is obtained by the following steps:
acquiring a pre-trained text classifier;
inputting text session data to be classified into the text classifier, and acquiring the classification of each piece of text session data from the text classifier; the classification is either a normal type or an abnormal type, wherein the abnormal type indicates that the text session data contains sensitive content;
sensitive content is extracted from the text conversation data classified as the abnormal type;
and storing the extracted sensitive content in the database.
5. The real-time voice interaction processing method according to claim 4, wherein the text classifier is trained by steps comprising:
acquiring text session data containing different types of sensitive contents to obtain a plurality of text session training sets, wherein the text session data containing the same type of sensitive contents form a text session training set;
and training a preset text classifier by using the text session training set until the output value of the loss function of the text classifier is smaller than a set error threshold value.
6. The real-time voice interaction processing method according to claim 4, wherein after the text classifier obtains the classification of each text session data, the method further comprises:
displaying the text session data whose classification prediction value is greater than a prediction value threshold;
acquiring text session data selected by user triggering operation;
updating a text session training set with the selected text session data, the updated text session training set being used to retrain the text classifier.
7. The real-time voice interaction processing method according to claim 3, wherein the sensitive content in the database is obtained by the following steps:
detecting keywords input in a web page interface of a manager;
and storing the keywords as sensitive contents into the database.
8. A real-time voice interaction processing apparatus, comprising:
the sensitive content determining module is used for receiving voice conversation information and determining whether preset sensitive content exists in the received voice conversation information;
the voice data reply module is used for returning voice reply data matched with the voice conversation information when the sensitive content does not exist in the voice conversation information; and when the sensitive content exists in the voice conversation information, returning voice reminding data matched with the sensitive content, wherein the voice reminding data is used for reminding that the sensitive content is related to the voice conversation information.
9. The real-time voice interaction processing device according to claim 8, wherein the sensitive content determining module comprises:
a text data acquisition unit for converting the voice conversation information into text conversation data;
and the sensitive content determining unit is used for determining whether preset sensitive content exists in the text conversation data.
10. The real-time voice interaction processing device according to claim 9, wherein the sensitive content determining unit comprises:
the sensitive content acquisition subunit is used for acquiring sensitive content in the database and constructing a dictionary tree structure;
and the matching result obtaining subunit is used for matching the text conversation data based on the dictionary tree structure to obtain a matching result, and the matching result represents whether preset sensitive content exists in the text conversation data.
11. The apparatus according to claim 10, further comprising a sensitive content extraction module, wherein the sensitive content extraction module comprises:
the classifier obtaining unit is used for obtaining a pre-trained text classifier;
the text data classification unit is used for inputting the text session data to be classified into the text classifier and acquiring the classification of each piece of text session data from the text classifier; the classification is either a normal type or an abnormal type, wherein the abnormal type indicates that the text session data contains sensitive content;
the sensitive content extraction unit is used for extracting sensitive content from the text session data classified into the abnormal type;
and the sensitive content storage unit is used for storing the extracted sensitive content into the database.
12. The apparatus according to claim 11, wherein the apparatus further comprises a classifier training module, the classifier training module comprising:
the text data marking unit is used for acquiring text session data containing different types of sensitive contents to obtain a plurality of text session training sets, wherein the text session data containing the same type of sensitive contents form a text session training set;
and the classifier training unit is used for training a preset text classifier by using the text session training set until the output value of the loss function of the text classifier is smaller than a set error threshold value.
13. The real-time voice interaction processing apparatus according to claim 10, further comprising:
the text data display module is used for displaying the text session data whose classification prediction value is greater than a prediction value threshold;
the text data selection module is used for acquiring text session data selected by user trigger operation;
and the training set updating module is used for updating a text session training set by using the selected text session data, and the updated text session training set is used for retraining the text classifier.
14. The apparatus according to claim 10, further comprising a sensitive content acquisition module, wherein the sensitive content acquisition module comprises:
a keyword detection unit for detecting a keyword input in a web page interface of a manager;
and the keyword storage unit is used for storing the keywords into the database when the keywords are sensitive contents.
15. An electronic device, comprising:
a processor;
a memory for storing the processor-executable instructions;
the processor is configured to execute executable instructions in the memory to implement the steps of the method of any of claims 1 to 7.
16. A readable storage medium having stored thereon executable instructions, wherein the executable instructions when executed by a processor implement the steps of the method of any one of claims 1 to 7.
CN201911274649.1A 2019-12-12 2019-12-12 Real-time voice interaction processing method and device, electronic equipment and storage medium Pending CN111143557A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911274649.1A CN111143557A (en) 2019-12-12 2019-12-12 Real-time voice interaction processing method and device, electronic equipment and storage medium


Publications (1)

Publication Number Publication Date
CN111143557A true CN111143557A (en) 2020-05-12


Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150281446A1 (en) * 2014-03-25 2015-10-01 Intellisist, Inc. Computer-Implemented System And Method For Protecting Sensitive Information Within A Call Center In Real Time
CN106603381A (en) * 2016-11-24 2017-04-26 北京小米移动软件有限公司 Chat information processing method and device
CN108647309A (en) * 2018-05-09 2018-10-12 达而观信息科技(上海)有限公司 Chat content checking method based on sensitive word and system
CN109618068A (en) * 2018-11-08 2019-04-12 上海航动科技有限公司 A kind of voice service method for pushing, device and system based on artificial intelligence



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination