CN111143557A - Real-time voice interaction processing method and device, electronic equipment and storage medium - Google Patents

Real-time voice interaction processing method and device, electronic equipment and storage medium Download PDF

Info

Publication number
CN111143557A
CN111143557A (application number CN201911274649.1A)
Authority
CN
China
Prior art keywords
text
sensitive content
data
voice
sensitive
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911274649.1A
Other languages
Chinese (zh)
Inventor
赵群
宁洪珂
夏小强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Xiaomi Mobile Software Co Ltd
Original Assignee
Beijing Xiaomi Mobile Software Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Xiaomi Mobile Software Co Ltd filed Critical Beijing Xiaomi Mobile Software Co Ltd
Priority to CN201911274649.1A priority Critical patent/CN111143557A/en
Publication of CN111143557A publication Critical patent/CN111143557A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35Clustering; Classification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/3331Query processing
    • G06F16/334Query execution
    • G06F16/3344Query execution using natural language analysis
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/22Procedures used during a speech recognition process, e.g. man-machine dialogue
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/22Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/225Feedback of the input speech

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Machine Translation (AREA)

Abstract

The disclosure relates to a real-time voice interaction processing method and device, electronic equipment and a storage medium. The real-time voice interaction processing method includes: receiving voice session information, and determining whether preset sensitive content exists in the received voice session information; when the sensitive content does not exist in the voice session information, returning voice reply data matched with the voice session information; and when the sensitive content exists in the voice session information, returning voice reminding data matched with the sensitive content, where the voice reminding data is used to remind the user that the voice session information involves sensitive content. In this embodiment, voice reminding data can be returned when sensitive content exists in the voice session information, reminding the user in time so that the user can drop the sensitive topic promptly; this avoids the user repeating the question many times and improves the user experience of voice interaction.

Description

Real-time voice interaction processing method and device, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of speech processing technologies, and in particular, to a real-time speech interaction processing method and apparatus, an electronic device, and a storage medium.
Background
With the development of voice interaction technology, chat robots are increasingly widely used. For example, various chat robots installed in electronic equipment allow users to interact with them conveniently, with the robot helping users search for corresponding information.
In practical applications, the chat robot may face different users of varying character. For example, during a chat, some users may raise sensitive topics, such as political events, reactionary content, pornographic topics, or abusive language. If the chat robot feeds back information on such topics, it may violate regulations; if it gives no feedback at all, the user may think the robot is faulty or did not hear clearly and continue asking, which is not favorable for improving the user experience.
Disclosure of Invention
The present disclosure provides a real-time voice interaction processing method and apparatus, an electronic device, and a storage medium, to solve the deficiencies of the related art.
According to a first aspect of the embodiments of the present disclosure, a real-time voice interaction processing method is provided, including:
receiving voice session information, and determining whether preset sensitive content exists in the received voice session information;
when the sensitive content does not exist in the voice session information, returning voice reply data matched with the voice session information; and when the sensitive content exists in the voice session information, returning voice reminding data matched with the sensitive content, where the voice reminding data is used to remind that the voice session information involves the sensitive content.
Optionally, the determining whether the preset sensitive content exists in the acquired voice session information includes:
converting the voice conversation information into text conversation data;
and determining whether preset sensitive content exists in the text conversation data.
Optionally, the determining whether the preset sensitive content exists in the text conversation data includes:
acquiring sensitive contents in a database, and constructing a dictionary tree structure;
and matching the text conversation data based on the dictionary tree structure to obtain a matching result, wherein the matching result represents whether preset sensitive content exists in the text conversation data.
Optionally, the sensitive content in the database is obtained by:
acquiring a pre-trained text classifier;
inputting text session data to be classified into the text classifier, and obtaining the classification of each piece of text session data from the text classifier, where the classification is either a normal type or an abnormal type, the abnormal type indicating that the text session data contains sensitive content;
sensitive content is extracted from the text conversation data classified as the abnormal type;
and storing the extracted sensitive content in the database.
Optionally, the text classifier is trained by the steps comprising:
acquiring text session data containing different types of sensitive contents to obtain a plurality of text session training sets, wherein the text session data containing the same type of sensitive contents form a text session training set;
and training a preset text classifier by using the text session training set until the output value of the loss function of the text classifier is smaller than a set error threshold value.
Optionally, after the text classifier obtains the classification of each text session data, the method further includes:
displaying the text session data whose classification prediction value is greater than a prediction value threshold;
acquiring text session data selected by user triggering operation;
updating a text session training set with the selected text session data, the updated text session training set being used to retrain the text classifier.
Optionally, the sensitive content in the database is obtained by:
detecting keywords input in a web page interface of a manager;
and storing the keywords as sensitive contents into the database.
According to a second aspect of the embodiments of the present disclosure, there is provided a real-time voice interaction processing apparatus, including:
the sensitive content determining module is used for receiving the voice conversation information and determining whether preset sensitive content exists in the received voice conversation information;
the voice data reply module is used for returning voice reply data matched with the voice session information when the sensitive content does not exist in the voice session information; and returning, when the sensitive content exists in the voice session information, voice reminding data matched with the sensitive content, where the voice reminding data is used to remind that the voice session information involves the sensitive content.
Optionally, the sensitive content determining module includes:
a text data acquisition unit for converting the voice conversation information into text conversation data;
and the sensitive content determining unit is used for determining whether preset sensitive content exists in the text conversation data.
Optionally, the sensitive content determining unit includes:
the sensitive content acquisition subunit is used for acquiring sensitive content in the database and constructing a dictionary tree structure;
and the matching result obtaining subunit is used for matching the text conversation data based on the dictionary tree structure to obtain a matching result, and the matching result represents whether preset sensitive content exists in the text conversation data.
Optionally, the apparatus further includes a sensitive content extraction module, where the sensitive content extraction module includes:
the classifier obtaining unit is used for obtaining a pre-trained text classifier;
the text data classification unit is used for inputting the text session data to be classified into the text classifier, which obtains the classification of each piece of text session data; the classification is a normal type or an abnormal type, the abnormal type indicating that the text session data contains sensitive content;
the sensitive content extraction unit is used for extracting sensitive content from the text session data classified into the abnormal type;
and the sensitive content storage unit is used for storing the extracted sensitive content into the database.
Optionally, the apparatus further comprises a classifier training module, the classifier training module comprising:
the text data marking unit is used for acquiring text session data containing different types of sensitive contents to obtain a plurality of text session training sets, wherein the text session data containing the same type of sensitive contents form a text session training set;
and the classifier training unit is used for training a preset text classifier by using the text session training set until the output value of the loss function of the text classifier is smaller than a set error threshold value.
Optionally, the apparatus further comprises:
the text data display module is used for displaying the text session data whose classification prediction value is greater than a prediction value threshold;
the text data selection module is used for acquiring text session data selected by user trigger operation;
and the training set updating module is used for updating a text session training set by using the selected text session data, and the updated text session training set is used for retraining the text classifier.
Optionally, the apparatus further includes a sensitive content obtaining module, where the sensitive content obtaining module includes:
a keyword detection unit for detecting a keyword input in a web page interface of a manager;
and the keyword storage unit is used for storing the keywords into the database when the keywords are sensitive contents.
According to a third aspect of the embodiments of the present disclosure, there is provided an electronic apparatus including:
a processor;
a memory for storing the processor-executable instructions;
the processor is configured to execute executable instructions in the memory to implement the steps of the method of any one of the above.
According to a fourth aspect of embodiments of the present disclosure, there is provided a readable storage medium having stored thereon executable instructions which, when executed by a processor, implement the steps of the method according to any one of the above.
The technical scheme provided by the embodiment of the disclosure can have the following beneficial effects:
As can be seen from the foregoing embodiments, in the embodiments of the present disclosure, voice session information is received and it is determined whether preset sensitive content exists in it; when no sensitive content exists, voice reply data matched with the voice session information is returned; and when sensitive content exists, voice reminding data matched with the sensitive content is returned, the voice reminding data being used to remind the user that the voice session information involves sensitive content. Thus, in this embodiment, voice reminding data can be returned when the voice session information contains sensitive content, reminding the user in time so that the user can drop the sensitive topic promptly; this avoids repeated questioning by the user and improves the user experience of voice interaction.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and together with the description, serve to explain the principles of the disclosure.
FIG. 1 is a flow diagram illustrating a real-time voice interaction processing method according to an example embodiment.
FIG. 2 is a flow diagram illustrating a determination of sensitive content according to an example embodiment.
FIG. 3 is a flow diagram illustrating another process for determining sensitive content according to an example embodiment.
FIG. 4 is a flowchart illustrating training a text classifier in accordance with an exemplary embodiment.
FIG. 5 is a flowchart illustrating matching sensitive content according to an example embodiment.
Fig. 6 to 12 are block diagrams illustrating a real-time voice interaction processing apparatus according to an exemplary embodiment.
FIG. 13 is a block diagram illustrating an electronic device in accordance with an example embodiment.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The following exemplary described embodiments do not represent all embodiments consistent with the present disclosure. Rather, they are merely examples of devices consistent with certain aspects of the present disclosure as recited in the claims below.
The embodiment of the present disclosure provides a real-time voice interaction processing method, which may be applied to an electronic device (such as a smart speaker) provided with a chat robot, fig. 1 is a flowchart of a real-time voice interaction processing method according to an exemplary embodiment, and referring to fig. 1, the real-time voice interaction processing method includes steps 101 to 102, where:
in step 101, voice session information is received, and it is determined whether preset sensitive content exists in the received voice session information.
The electronic device may engage in voice interaction with the user; for example, the user speaks to the electronic device to form voice session information. The electronic device may receive the voice session information, understand its intent, and feed back matching data to the user.
In this embodiment, the electronic device may first determine whether preset sensitive content exists in the voice session information. Referring to fig. 2, the electronic device obtains the voice session information, which can be collected by an audio collection device of the electronic device. The electronic device may then convert the voice session information into text session data (corresponding to step 201), and determine whether preset sensitive content exists in the text session data (corresponding to step 202).
In practical applications, a database may be preset in the electronic device, and the database may store sensitive content, where the sensitive content may include at least one of the following: sensitive words and sensitive rules. A sensitive rule is a rule for recognizing at least one sensitive word. For example, "A-B-C" or "A*B*C" both conceal the sensitive word "ABC": a plain word search cannot find it because of the inserted "-" or "*" characters. Such a sensitive rule can be written as A()B()C, where "()" stands for any separator characters (various symbols or spaces) between A and B and between B and C; once the separators are removed, ABC forms the sensitive word. Of course, the technician may configure rules according to the specific scenario, which is not limited herein.
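As an illustrative sketch only (the patent text does not specify an implementation, and the `rule_to_pattern`/`rule_matches` names are hypothetical), a sensitive rule such as A()B()C can be compiled into a regular expression in which each "()" matches any run of separator characters:

```python
import re

def rule_to_pattern(rule: str) -> "re.Pattern[str]":
    """Compile a sensitive rule such as 'A()B()C' into a regex.

    Each '()' placeholder matches any run of separator characters
    (hyphens, asterisks, spaces, and other non-word symbols) that a
    user might insert to evade a plain keyword search.
    """
    parts = [re.escape(p) for p in rule.split("()")]
    # [\W_]* : zero or more separator (non-word) characters between parts
    return re.compile(r"[\W_]*".join(parts))

def rule_matches(rule: str, text: str) -> bool:
    """Return True if the text contains the concealed sensitive word."""
    return rule_to_pattern(rule).search(text) is not None
```

With this encoding, the rule A()B()C matches "A-B-C", "A * B * C", and the plain "ABC" alike.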
It should be noted that the sensitive content in the database can be obtained by:
in a first manner, a trained text classifier may be pre-stored in the electronic device, and when there is a need for classifying text conversation data, for example, the text conversation data is received, or after a conversation interaction process is completed, referring to fig. 3, the electronic device may obtain the text classifier (corresponding to step 301), where the text classifier may be a neural network model, for example, CNN or RNN, which is not limited herein. Inputting the text conversation data to be classified into a text classifier, and acquiring classification of each text conversation data by the text classifier; the classification may include a normal type and an abnormal type, where the abnormal type refers to that sensitive content is contained in the text session data (corresponding to step 302). In practical applications, the exception types may be further subdivided, such as politics, pornography, violence, etc., and are not limited herein. In this way, the electronic device may extract sensitive content from the text conversation data classified as the abnormal type (corresponding to step 303), and store the extracted sensitive content in the database (corresponding to step 304).
In one example, after the text classifier classifies each piece of text session data, the text session data whose classification prediction value is greater than the prediction value threshold, that is, the abnormal-type text session data, may be displayed to the user. When a piece of text session data contains sensitive content, the user selects it with a trigger operation; the selected text session data is then used to update the text session training set, and the updated training set can be used to retrain the text classifier. Repeated training in this way improves the classification accuracy of the text classifier.
The text classifier may be trained as follows. Referring to fig. 4, text session data containing different types of sensitive content is obtained to form a plurality of text session training sets, where text session data containing the same type of sensitive content forms one training set (corresponding to step 401). A preset text classifier (such as a CNN or RNN) is then trained with the text session training sets until the output value of its loss function is smaller than a set error threshold (corresponding to step 402), completing the training. It should be noted that the training process can be completed in the electronic device. It can also be completed outside the electronic device (i.e., offline training), after which the text classifier is migrated into the electronic device, so that training does not occupy the computing resources of the electronic device.
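The training loop above ("train until the loss output falls below a set error threshold") can be sketched, purely for illustration, with a minimal bag-of-words logistic classifier standing in for the CNN/RNN the text mentions; the function name, hyperparameters, and stopping logic below are hypothetical, not taken from the patent:

```python
import math
from collections import defaultdict

def train_text_classifier(train_set, error_threshold=0.1, lr=0.5, max_epochs=500):
    """Train a tiny bag-of-words logistic classifier.

    train_set: list of (text, label) pairs, where label 1 = abnormal
    (contains sensitive content) and 0 = normal.  Training stops once
    the mean log-loss drops below error_threshold, mirroring the
    stopping criterion described in the text.  A real system would use
    a CNN or RNN instead of this toy model.
    """
    weights = defaultdict(float)
    bias = 0.0
    for _ in range(max_epochs):
        total_loss = 0.0
        for text, label in train_set:
            words = text.split()
            z = bias + sum(weights[w] for w in words)
            p = 1.0 / (1.0 + math.exp(-z))          # sigmoid
            total_loss += -(label * math.log(p + 1e-12)
                            + (1 - label) * math.log(1 - p + 1e-12))
            grad = p - label                         # d(log-loss)/dz
            bias -= lr * grad
            for w in words:
                weights[w] -= lr * grad
        if total_loss / len(train_set) < error_threshold:
            break                                    # loss below error threshold

    def classify(text):
        """Return 1 (abnormal) or 0 (normal) for a piece of text."""
        z = bias + sum(weights[w] for w in text.split())
        return 1 if z > 0 else 0
    return classify
```

The returned `classify` function plays the role of the trained classifier that labels incoming text session data as normal or abnormal.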
In this embodiment, the electronic device can automatically update and enrich the sensitive content in the database by automatically acquiring the sensitive content in the text session data. Therefore, when the subsequent sensitive contents are matched, the accuracy of the matching result is improved.
In a second mode, the electronic device may be provided with an operation interface, for example, a manager web interface, and the user may input a keyword through the manager web interface, where the keyword is the sensitive content determined by the user. The electronic device may store the keywords as sensitive content in a database. In the method, the data volume of the sensitive content in the database can be enriched by manually adding the sensitive content, and the accuracy of the matching result is improved.
In this embodiment, referring to fig. 5, the electronic device may obtain the sensitive content in the database and construct a dictionary tree structure (corresponding to step 501). For example, a sensitive service may be invoked in the electronic device, and sensitive words and sensitive rules are read from a database by the sensitive service using java or python language, constructed into a dictionary tree structure, and stored in the memory. In practical application, the sensitive service may update the dictionary tree structure once according to a set period, for example, several minutes, so as to ensure that newly generated sensitive content can be added to the dictionary tree structure in time.
Then, the electronic device matches the text conversation data based on the dictionary tree structure, and a matching result can be obtained, wherein the matching result represents whether preset sensitive content exists in the text conversation data or not (corresponding to step 502).
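The dictionary tree construction and matching in steps 501-502 can be sketched as follows. This is an illustrative minimal trie, not the patent's implementation; a production sensitive service (the text mentions Java or Python) would also handle sensitive rules and periodic refresh of the tree:

```python
class Trie:
    """A minimal dictionary tree (trie) for sensitive-word matching."""

    def __init__(self, words=()):
        self.root = {}
        for w in words:
            self.insert(w)

    def insert(self, word):
        """Add one sensitive word to the dictionary tree."""
        node = self.root
        for ch in word:
            node = node.setdefault(ch, {})
        node["$"] = True  # end-of-word marker

    def find_in(self, text):
        """Return the first sensitive word found in text, or None.

        Walks the tree from every start position, so sensitive words
        are found even in the middle of the text session data.
        """
        for i in range(len(text)):
            node = self.root
            j = i
            while j < len(text) and text[j] in node:
                node = node[text[j]]
                j += 1
                if "$" in node:
                    return text[i:j]
        return None
```

A matching result is then simply whether `find_in` returns a word (sensitive content exists) or `None` (no preset sensitive content in the text session data).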
In step 102, when no sensitive content exists in the voice session information, voice reply data matched with the voice session information is returned; and when sensitive content exists in the voice session information, voice reminding data matched with the sensitive content is returned, the voice reminding data being used to remind the user that the voice session information involves sensitive content.
In this embodiment, some voice reply data or voice prompt data may be preset in the electronic device.
Taking the voice reply data as an example, it may cover frequently asked topics, such as where the capital of China is, or which snacks are recommended in Beijing. When no sensitive content exists in the voice session information, the electronic device can return the matched voice reply data based on semantic understanding of the text session data.
In practical applications, the electronic device may also face a niche topic, such as "what is outer space?". In this case, the electronic device can query the Internet for an answer based on the topic, form voice reply data from the answer, and feed it back to the user. Thus, rarely used voice reply data need not be stored in the electronic device, which reduces the occupation of storage resources and improves their utilization efficiency.
Taking the voice reminding data as an example, some voice reminding data may be preset in the electronic device, for example, different voice reminding data may be set for a certain type of sensitive content. When the voice conversation information has sensitive content, the voice reminding data corresponding to the sensitive content can be inquired and returned to the user. Therefore, the electronic equipment can remind the user to stop the sensitive topic in time by feeding back the voice reminding data in time, and the situation of repeatedly asking questions is avoided.
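The dispatch described in step 102 can be sketched as below. All names here are hypothetical illustrations: `find_sensitive` stands for any sensitive-content lookup (e.g. a dictionary-tree match), `reminders` maps sensitive words to reminder messages, and `answer_fn` stands for the normal reply pipeline:

```python
def handle_voice_session(text, find_sensitive, reminders, answer_fn):
    """Return a reminder when sensitive content is found, else a reply.

    find_sensitive(text) returns a matched sensitive word or None;
    reminders maps sensitive words to reminder messages (with a generic
    fallback); answer_fn produces the normal matched reply data.
    """
    hit = find_sensitive(text)
    if hit is not None:
        # Sensitive content exists: return matching voice reminding data.
        return reminders.get(hit, "Please note: this topic involves sensitive content.")
    # No sensitive content: return matched voice reply data.
    return answer_fn(text)
```

Per-category reminders follow the same shape: keying `reminders` by the sensitive word's category instead of the word itself would implement the "different voice reminding data per type of sensitive content" variant.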
To this end, in the embodiment of the present disclosure, voice session information is received and it is determined whether preset sensitive content exists in it; when no sensitive content exists, voice reply data matched with the voice session information is returned; and when sensitive content exists, voice reminding data matched with the sensitive content is returned, the voice reminding data being used to remind the user that the voice session information involves sensitive content. Thus, in this embodiment, voice reminding data can be returned when the voice session information contains sensitive content, reminding the user in time so that the user can drop the sensitive topic promptly; this avoids repeated questioning by the user and improves the user experience of voice interaction.
FIG. 6 is a block diagram illustrating a real-time voice interaction processing apparatus according to an example embodiment. Referring to fig. 6, a real-time voice interaction processing apparatus includes:
a sensitive content determining module 601, configured to receive voice session information, and determine whether preset sensitive content exists in the received voice session information;
a voice data reply module 602, configured to return voice reply data matched with the voice session information when the sensitive content does not exist in the voice session information; and to return, when the sensitive content exists in the voice session information, voice reminding data matched with the sensitive content, where the voice reminding data is used to remind that the voice session information involves the sensitive content.
In one embodiment, referring to fig. 7, the sensitive content determining module 601 includes:
a text data obtaining unit 701 configured to convert the voice session information into text session data;
a sensitive content determining unit 702, configured to determine whether there is a preset sensitive content in the text session data.
In one embodiment, referring to fig. 8, the sensitive content determining unit 702 includes:
a sensitive content acquiring subunit 801, configured to acquire sensitive content in the database and construct a dictionary tree structure;
a matching result obtaining subunit 802, configured to match the text session data based on the dictionary tree structure to obtain a matching result, where the matching result indicates whether preset sensitive content exists in the text session data.
In one embodiment, referring to fig. 9, the apparatus further includes a sensitive content extraction module, which includes:
a classifier obtaining unit 901 configured to obtain a pre-trained text classifier;
a text data classification unit 902, configured to input text session data to be classified into the text classifier, which obtains a classification of each piece of text session data; the classification is a normal type or an abnormal type, the abnormal type indicating that the text session data contains sensitive content;
a sensitive content extracting unit 903, configured to extract sensitive content from the text session data classified as the abnormal type;
and a sensitive content storage unit 904, configured to store the extracted sensitive content in the database.
In an embodiment, referring to fig. 10, the apparatus further comprises a classifier training module comprising:
a text data tagging unit 1001 configured to obtain text session data including different types of sensitive content to obtain a plurality of text session training sets, where the text session data including the same type of sensitive content form a text session training set;
a classifier training unit 1002, configured to train a preset text classifier by using the text session training set until an output value of a loss function of the text classifier is smaller than a set error threshold.
In one embodiment, referring to fig. 11, the apparatus further comprises:
a text data display module 1101, configured to display text session data whose classification prediction value is greater than a prediction value threshold;
a text data selection module 1102, configured to acquire text session data selected by a user through a trigger operation;
a training set updating module 1103, configured to update a text session training set using the selected text session data, where the updated text session training set is used to retrain the text classifier.
In one embodiment, referring to fig. 12, the apparatus further includes a sensitive content acquiring module, where the sensitive content acquiring module includes:
a keyword detection unit 1201 for detecting a keyword input in the administrator web page interface;
a keyword storage unit 1202, configured to store the keyword into the database when the keyword is sensitive content.
With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.
To this end, in the embodiment of the present disclosure, voice session information is received and it is determined whether preset sensitive content exists in it; when no sensitive content exists, voice reply data matched with the voice session information is returned; and when sensitive content exists, voice reminding data matched with the sensitive content is returned, the voice reminding data being used to remind the user that the voice session information involves sensitive content. Thus, in this embodiment, voice reminding data can be returned when the voice session information contains sensitive content, reminding the user in time so that the user can drop the sensitive topic promptly; this avoids repeated questioning by the user and improves the user experience of voice interaction.
FIG. 13 is a block diagram illustrating an electronic device in accordance with an example embodiment. For example, the electronic device 1300 may be a smartphone, a computer, a digital broadcast terminal, a tablet device, a medical device, a fitness device, a personal digital assistant, a smart speaker, and so on.
Referring to fig. 13, electronic device 1300 may include one or more of the following components: a processing component 1302, a memory 1304, a power component 1306, a multimedia component 1308, an audio component 1310, an input/output (I/O) interface 1312, a sensor component 1314, a communication component 1316, and an image acquisition component 1318.
The processing component 1302 generally controls the overall operation of the electronic device 1300, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing component 1302 may include one or more processors 1320 to execute instructions. Further, the processing component 1302 may include one or more modules that facilitate interaction between the processing component 1302 and other components. For example, the processing component 1302 may include a multimedia module to facilitate interaction between the multimedia component 1308 and the processing component 1302.
The memory 1304 is configured to store various types of data to support operation at the electronic device 1300. Examples of such data include instructions for any application or method operating on the electronic device 1300, contact data, phonebook data, messages, pictures, videos, and so forth. The memory 1304 may be implemented by any type or combination of volatile or non-volatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks.
The power supply component 1306 provides power to the various components of the electronic device 1300. Power components 1306 may include a power management system, one or more power sources, and other components associated with generating, managing, and distributing power for electronic device 1300.
The multimedia component 1308 includes a screen that provides an output interface between the electronic device 1300 and the target object. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, it may be implemented as a touch screen to receive input signals from the target object. The touch panel includes one or more touch sensors to sense touches, swipes, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or swipe action, but also detect the duration and pressure associated with the touch or swipe operation.
The audio component 1310 is configured to output and/or input audio signals. For example, the audio component 1310 includes a Microphone (MIC) configured to receive external audio signals when the electronic device 1300 is in an operational mode, such as a call mode, a recording mode, and a real-time voice interaction processing mode. The received audio signals may further be stored in the memory 1304 or transmitted via the communication component 1316. In some embodiments, the audio component 1310 also includes a speaker for outputting audio signals.
The I/O interface 1312 provides an interface between the processing component 1302 and peripheral interface modules, which may be keyboards, click wheels, buttons, etc.
The sensor assembly 1314 includes one or more sensors for providing various aspects of state assessment for the electronic device 1300. For example, the sensor assembly 1314 may detect the open/closed state of the electronic device 1300 and the relative positioning of components, such as the display and keypad of the electronic device 1300. The sensor assembly 1314 may also detect a change in the position of the electronic device 1300 or one of its components, the presence or absence of contact between a target object and the electronic device 1300, the orientation or acceleration/deceleration of the electronic device 1300, and a change in its temperature.
The communication component 1316 is configured to facilitate communications between the electronic device 1300 and other devices in a wired or wireless manner. The electronic device 1300 may access a wireless network based on a communication standard, such as WiFi, 2G, 3G, 4G, 5G, or a combination thereof. In an exemplary embodiment, the communication component 1316 receives broadcast signals or broadcast related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communications component 1316 also includes a Near Field Communications (NFC) module to facilitate short-range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, infrared data association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the electronic device 1300 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, micro-controllers, microprocessors, or other electronic components.
In an exemplary embodiment, a non-transitory readable storage medium including instructions, such as the memory 1304 including instructions, executable by the processor 1320 of the electronic device 1300 is also provided. For example, the non-transitory readable storage medium may be a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This disclosure is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims (16)

1. A real-time voice interaction processing method is characterized by comprising the following steps:
receiving voice session information, and determining whether preset sensitive content exists in the received voice session information;
when the sensitive content does not exist in the voice conversation information, returning voice reply data matched with the voice conversation information; and when the sensitive content exists in the voice conversation information, returning voice reminding data matched with the sensitive content, wherein the voice reminding data is used for reminding that the sensitive content is related to the voice conversation information.
2. The method of claim 1, wherein determining whether preset sensitive content exists in the received voice session information comprises:
converting the voice conversation information into text conversation data;
and determining whether preset sensitive content exists in the text conversation data.
3. The real-time voice interaction processing method of claim 2, wherein determining whether preset sensitive content exists in the text conversation data comprises:
acquiring sensitive contents in a database, and constructing a dictionary tree structure;
and matching the text conversation data based on the dictionary tree structure to obtain a matching result, wherein the matching result represents whether preset sensitive content exists in the text conversation data.
4. The real-time voice interaction processing method according to claim 3, wherein the sensitive content in the database is obtained by the following steps:
acquiring a pre-trained text classifier;
inputting text session data to be classified into the text classifier, and acquiring the classification of each piece of text session data from the text classifier; the classification is either a normal type or an abnormal type, wherein the abnormal type indicates that the text session data contains sensitive content;
sensitive content is extracted from the text conversation data classified as the abnormal type;
and storing the extracted sensitive content in the database.
5. The real-time voice interaction processing method according to claim 4, wherein the text classifier is trained by steps comprising:
acquiring text session data containing different types of sensitive contents to obtain a plurality of text session training sets, wherein the text session data containing the same type of sensitive contents form a text session training set;
and training a preset text classifier by using the text session training set until the output value of the loss function of the text classifier is smaller than a set error threshold value.
6. The real-time voice interaction processing method according to claim 4, wherein after the text classifier obtains the classification of each text session data, the method further comprises:
displaying the text session data whose classification prediction value is greater than a prediction value threshold;
acquiring text session data selected by user triggering operation;
updating a text session training set with the selected text session data, the updated text session training set being used to retrain the text classifier.
7. The real-time voice interaction processing method according to claim 3, wherein the sensitive content in the database is obtained by the following steps:
detecting keywords input in a web page interface of a manager;
and storing the keywords as sensitive contents into the database.
8. A real-time voice interaction processing apparatus, comprising:
the sensitive content determining module is used for receiving voice conversation information and determining whether preset sensitive content exists in the received voice conversation information;
the voice data reply module is used for returning voice reply data matched with the voice conversation information when the sensitive content does not exist in the voice conversation information; and when the sensitive content exists in the voice conversation information, returning voice reminding data matched with the sensitive content, wherein the voice reminding data is used for reminding that the sensitive content is related to the voice conversation information.
9. The real-time voice interaction processing device according to claim 8, wherein the sensitive content determining module comprises:
a text data acquisition unit for converting the voice conversation information into text conversation data;
and the sensitive content determining unit is used for determining whether preset sensitive content exists in the text conversation data.
10. The real-time voice interaction processing device according to claim 9, wherein the sensitive content determining unit comprises:
the sensitive content acquisition subunit is used for acquiring sensitive content in the database and constructing a dictionary tree structure;
and the matching result obtaining subunit is used for matching the text conversation data based on the dictionary tree structure to obtain a matching result, and the matching result represents whether preset sensitive content exists in the text conversation data.
11. The apparatus according to claim 10, further comprising a sensitive content extraction module, wherein the sensitive content extraction module comprises:
the classifier obtaining unit is used for obtaining a pre-trained text classifier;
the text data classification unit is used for inputting the text session data to be classified into the text classifier and acquiring the classification of each piece of text session data from the text classifier; the classification is either a normal type or an abnormal type, wherein the abnormal type indicates that the text session data contains sensitive content;
the sensitive content extraction unit is used for extracting sensitive content from the text session data classified into the abnormal type;
and the sensitive content storage unit is used for storing the extracted sensitive content into the database.
12. The apparatus according to claim 11, wherein the apparatus further comprises a classifier training module, the classifier training module comprising:
the text data marking unit is used for acquiring text session data containing different types of sensitive contents to obtain a plurality of text session training sets, wherein the text session data containing the same type of sensitive contents form a text session training set;
and the classifier training unit is used for training a preset text classifier by using the text session training set until the output value of the loss function of the text classifier is smaller than a set error threshold value.
13. The real-time voice interaction processing apparatus according to claim 10, further comprising:
the text data display module is used for displaying the text session data whose classification prediction value is greater than a prediction value threshold;
the text data selection module is used for acquiring text session data selected by user trigger operation;
and the training set updating module is used for updating a text session training set by using the selected text session data, and the updated text session training set is used for retraining the text classifier.
14. The apparatus according to claim 10, further comprising a sensitive content acquisition module, wherein the sensitive content acquisition module comprises:
a keyword detection unit for detecting a keyword input in a web page interface of a manager;
and the keyword storage unit is used for storing the keywords into the database when the keywords are sensitive contents.
15. An electronic device, comprising:
a processor;
a memory for storing the processor-executable instructions;
the processor is configured to execute executable instructions in the memory to implement the steps of the method of any of claims 1 to 7.
16. A readable storage medium having stored thereon executable instructions, wherein the executable instructions when executed by a processor implement the steps of the method of any one of claims 1 to 7.
CN201911274649.1A 2019-12-12 2019-12-12 Real-time voice interaction processing method and device, electronic equipment and storage medium Pending CN111143557A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911274649.1A CN111143557A (en) 2019-12-12 2019-12-12 Real-time voice interaction processing method and device, electronic equipment and storage medium


Publications (1)

Publication Number Publication Date
CN111143557A true CN111143557A (en) 2020-05-12


Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150281446A1 (en) * 2014-03-25 2015-10-01 Intellisist, Inc. Computer-Implemented System And Method For Protecting Sensitive Information Within A Call Center In Real Time
CN106603381A (en) * 2016-11-24 2017-04-26 北京小米移动软件有限公司 Chat information processing method and device
CN108647309A (en) * 2018-05-09 2018-10-12 达而观信息科技(上海)有限公司 Chat content checking method based on sensitive word and system
CN109618068A (en) * 2018-11-08 2019-04-12 上海航动科技有限公司 A kind of voice service method for pushing, device and system based on artificial intelligence



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination