WO2020253128A1 - Voice recognition-based communication service method, apparatus, computer device, and storage medium
- Publication number: WO2020253128A1 (application PCT/CN2019/122167)
- Authority: WO — WIPO (PCT)
- Prior art keywords: call, data, caller, scene, audio
Classifications
- G10L15/063 — Speech recognition; creation of reference templates; training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/26 — Speech recognition; speech to text systems
- G10L25/63 — Speech or voice analysis techniques specially adapted for estimating an emotional state
- H04M1/72454 — User interfaces specially adapted for cordless or mobile telephones, with means for adapting the functionality of the device according to context-related or environment-related conditions
- H04M1/72484 — User interfaces specially adapted for cordless or mobile telephones wherein functions are triggered by incoming communication events
- G10L2015/0631 — Creating reference templates; clustering
- H04M2250/12 — Details of telephonic subscriber devices including a sensor for measuring a physical value, e.g. temperature or motion
Description
- This application relates to the field of data analysis technology, and in particular to a communication service method, device, computer equipment and storage medium based on voice recognition.
- The embodiments of the present application provide a communication service method, device, computer equipment, and storage medium based on voice recognition, which can realize timely and accurate intervention while the caller is on a call, so as to guide the caller to conduct the call better.
- In a first aspect, this application provides a communication service method based on voice recognition, the method including: if the call between a first call terminal and a second call terminal is connected, obtaining first call audio corresponding to the first call terminal and second call audio corresponding to the second call terminal; performing voice recognition on the first call audio and the second call audio to obtain dialogue text data; recognizing the dialogue text data based on a pre-built scene recognition model to obtain type data of the call scene; recognizing the first call audio and the second call audio based on a pre-built emotion recognition model to obtain emotion data of the first caller corresponding to the first call terminal and emotion data of the second caller corresponding to the second call terminal; generating and sending, according to the type data of the call scene and the emotion data of the first caller, first prompt information to the first call terminal for prompting the first caller to adjust emotions; and generating and sending, according to the type data of the call scene and the emotion data of the second caller, second prompt information to the first call terminal for prompting the first caller to adjust the dialogue strategy to deal with the second caller's emotions.
- In a second aspect, the present application provides a communication service device based on voice recognition, the device including:
- an audio acquisition module configured to obtain first call audio corresponding to the first call terminal and second call audio corresponding to the second call terminal if the call between the first call terminal and the second call terminal is connected;
- a voice recognition module configured to perform voice recognition on the first call audio and the second call audio to obtain dialogue text data;
- a scene recognition module configured to recognize the dialogue text data based on a pre-built scene recognition model to obtain type data of the call scene;
- an emotion recognition module configured to recognize at least one of the first call audio, the second call audio, and the dialogue text data based on a pre-built emotion recognition model, to obtain emotion data of the first caller corresponding to the first call terminal and emotion data of the second caller corresponding to the second call terminal;
- a first prompting module configured to generate and send, according to the type data of the call scene and the emotion data of the first caller, first prompt information to the first call terminal for prompting the first caller to adjust emotions;
- a second prompting module configured to generate and send, according to the type data of the call scene and the emotion data of the second caller, second prompt information to the first call terminal for prompting the first caller to adjust the dialogue strategy to deal with the second caller's emotions.
- In a third aspect, the present application provides a computer device that includes a memory and a processor; the memory is used to store a computer program; the processor is used to execute the computer program and, when executing it, implement the above-mentioned communication service method based on voice recognition.
- In a fourth aspect, this application provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the above-mentioned voice recognition-based communication service method.
- This application discloses a communication service method, device, equipment, and storage medium based on voice recognition. The corresponding audio is obtained during a call between a first call terminal and a second call terminal; dialogue text is then obtained through voice recognition, the call scene is recognized from the dialogue text, and the caller's emotion is recognized from the acquired audio. Corresponding prompts are then given to the caller according to the call scene and the caller's emotion, so as to intervene in the call in a timely and accurate manner and guide the caller to conduct the call better.
- FIG. 1 is a schematic diagram of an application scenario of a communication service method based on voice recognition according to an embodiment of this application;
- FIG. 2 is a schematic flowchart of a communication service method based on voice recognition according to an embodiment of this application;
- FIG. 3 is a schematic diagram of a sub-process of obtaining dialogue text data through voice recognition;
- FIG. 4 is a schematic flowchart of a communication service method based on voice recognition according to another embodiment of this application;
- FIG. 5 is a schematic diagram of a sub-process of obtaining type data of a call scene;
- FIG. 6 is a schematic diagram of a sub-process of extracting text features;
- FIG. 7 is a schematic diagram of a sub-process of extracting text features based on a bag-of-words model;
- FIG. 8 is a schematic diagram of a sub-process of obtaining emotion data of the first caller;
- FIG. 9 is a schematic diagram of a sub-process of recognizing and obtaining emotion data with the emotion recognition model;
- FIG. 10 is a schematic flowchart of a communication service method based on voice recognition according to still another embodiment of this application;
- FIG. 11 is a schematic flowchart of a communication service method based on voice recognition according to yet another embodiment of this application;
- FIG. 12 is a schematic structural diagram of a communication service device based on voice recognition provided by an embodiment of this application;
- FIG. 13 is a schematic structural diagram of a communication service device based on voice recognition provided by another embodiment of this application;
- FIG. 14 is a schematic structural diagram of a computer device provided by an embodiment of this application.
- the embodiments of the present application provide a voice recognition-based communication service method, device, computer equipment, and computer-readable storage medium.
- the communication service method can be applied to a terminal or a server, so as to intervene in the communication between the callers when needed.
- In one application scenario, the first call terminal and the second call terminal conduct a call directly, and the communication service method based on voice recognition is applied to at least one of the first call terminal and the second call terminal.
- In another application scenario, the first call terminal and the second call terminal conduct a call with a server providing support for the call between them; in this case, the voice recognition-based communication service method can be applied to the server.
- FIG. 1 is a schematic diagram of an application scenario of a communication service method based on voice recognition provided by an embodiment of the present application.
- the application scenario includes a server, a first call terminal, and a second call terminal.
- the call terminal can be a mobile phone, a tablet computer, a notebook computer, a desktop computer, a personal digital assistant, a wearable device, a smart speaker, and other electronic devices;
- the server can be an independent server or a server cluster.
- FIG. 2 is a schematic flowchart of a communication service method based on voice recognition provided by an embodiment of the present application.
- the communication service method based on voice recognition includes the following steps S110 to S160.
- Step S110 If the call between the first call terminal and the second call terminal is connected, obtain the first call audio corresponding to the first call terminal and the second call audio corresponding to the second call terminal.
- For example, the first caller uses the first call terminal to make a call to the second caller, and the second caller uses the second call terminal to answer it; the call between the first call terminal and the second call terminal is then connected.
- When the call between the first call terminal and the second call terminal is connected and the first caller is talking with the second caller, the server provides support for the call between the two terminals.
- The server collects the audio of the first caller, that is, the first call audio corresponding to the first call terminal, and sends the first call audio to the second call terminal so that the speaker of the second call terminal can play it for the second caller to hear. The server likewise collects the audio of the second caller, that is, the second call audio corresponding to the second call terminal, and sends the second call audio to the first call terminal so that the speaker of the first call terminal can play it for the first caller.
- Step S120 Perform voice recognition on the first call audio and the second call audio to obtain dialog text data.
- the server converts the first call audio and the second call audio into text by means of voice recognition to obtain dialog text data.
- In some embodiments, step S120 of performing voice recognition on the first call audio and the second call audio to obtain dialogue text data specifically includes steps S121 to S123.
- Step S121 Perform voice recognition on the first call audio to obtain a first text corresponding to the first caller.
- Specifically, when collecting the first call audio corresponding to the first call terminal, the server performs voice recognition on the collected first call audio and marks the recognized text as the first text.
- Step S122 Perform voice recognition on the second call audio to obtain a second text corresponding to the second caller.
- Specifically, when collecting the second call audio corresponding to the second call terminal, the server performs voice recognition on the collected second call audio and marks the recognized text as the second text.
- Step S123 Sort the first text and the second text according to a preset sorting rule to obtain dialogue text data.
- Illustratively, the first text and the second text are sorted in the order in which they were spoken to obtain the dialogue text data; the dialogue text data thus includes a plurality of first texts and second texts arranged in an interleaved manner, as sketched below.
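- As an illustrative sketch only, the interleaving can be implemented as follows; the utterance timestamps and the chronological sorting are assumptions for this example, since the patent leaves the preset sorting rule open:

```python
from dataclasses import dataclass

@dataclass
class Utterance:
    speaker: str       # "first_caller" or "second_caller"
    start_time: float  # seconds from the start of the call (assumed available)
    text: str

def build_dialogue_text(first_texts, second_texts):
    """Merge the two recognized text streams into dialogue text data.

    Sorting chronologically by start time is one plausible "preset sorting
    rule"; the patent does not specify which rule is used.
    """
    merged = sorted(first_texts + second_texts, key=lambda u: u.start_time)
    return [(u.speaker, u.text) for u in merged]
```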
- Step S130 Recognizing the dialogue text data based on a pre-built scene recognition model to obtain type data of the call scene.
- the scene recognition model stores or learns several scene recognition rules, and the scene recognition model recognizes the call scene corresponding to the dialogue text data based on the scene recognition rules.
- In some embodiments, step S130 of recognizing the dialogue text data based on a pre-built scene recognition model to obtain type data of the call scene includes step S131.
- Step S131 Based on the scene rule engine with built-in scene judgment rules, analyze the conversation text data to obtain type data of the call scene.
- the scene rule engine is a rule engine with built-in scene judgment rules, such as a drools rule engine.
- The rule engine originated from rule-based expert systems, and rule-based expert systems are a branch of expert systems. Expert systems belong to the category of artificial intelligence: they imitate human reasoning, reason with heuristic methods, and explain and justify their conclusions using terms understandable by humans.
- The rule engine is a core technical component designed to respond to and process complex business rules. By introducing a rule engine, scene judgment rules can be dynamically defined and adjusted in a timely manner through flexible configuration.
- the built-in scene judgment rule of the scene rule engine is specifically a rule set based on people's practical experience, and this embodiment does not limit the setting of the preset scene judgment rule. For example, if the dialog text data includes "Hello, Mr. Wang, I am XX", the scene recognition model recognizes the type of the conversation scene corresponding to the conversation text data as a stranger call based on a certain scene judgment rule.
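- For illustration only, such scene judgment rules might look like the following Python sketch; the rule names, matching keywords, and scene types are hypothetical, and a production system would express them in a rule engine such as drools rather than in code:

```python
# Hypothetical scene judgment rules expressed in Python; each rule pairs a
# predicate over the dialogue text with the resulting call scene type.
SCENE_RULES = [
    ("stranger_greeting",
     lambda text: "Hello, Mr." in text and "I am" in text,
     "stranger call"),
    ("family_call",
     lambda text: any(word in text for word in ("Mom", "Dad")),
     "family call"),
]

def judge_scene(dialogue_text: str) -> str:
    """Return the type data of the call scene for the first matching rule."""
    for _name, predicate, scene_type in SCENE_RULES:
        if predicate(dialogue_text):
            return scene_type
    return "unknown"
```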
- Illustratively, the construction of the scene rule engine includes: first obtaining a number of scene judgment rules matching a preset rule modification template; then precompiling and testing the scene judgment rules, and generating a script file from the scene judgment rules after the tests pass; and then storing the script file on the server and associating it with the rule calling interface of the scene rule engine, so that the scene rule engine can call the corresponding scene judgment rules.
- Illustratively, the rule modification template is a visual rule modification template. Visualizing the rule modification template makes it easier for relevant personnel to edit the template directly to generate scene judgment rules, so that personnel who understand the call scene judgment rules can modify them through the template without knowing the implementation behind it.
- This further lowers the threshold for using the rule engine and helps improve the accuracy of the scene rule engine in recognizing the call scene.
- the scene recognition model may be constructed in the following manner: the scene recognition model is obtained by learning from a set of scene training samples through a machine learning algorithm.
- In other embodiments, step S130 of recognizing the dialogue text data based on the pre-built scene recognition model to obtain the type data of the call scene includes steps S132 and S133.
- Step S132 Extract text features in the dialogue text data.
- Specifically, feature words are extracted from the dialogue text data and quantified to represent the text information; these are the text features of the dialogue text data. This realizes a scientific abstraction of the dialogue text data and establishes a mathematical model of it that can describe and stand in for the dialogue text data.
- Illustratively, text features are extracted from the dialogue text data based on a bag-of-words (BOW) model.
- In some embodiments, step S132 of extracting the text features in the dialogue text data includes steps S1321 and S1322.
- Step S1321 filter out noisy characters in the dialogue text data according to a preset filtering rule.
- the stop words in the dialogue text data are deleted or replaced with preset symbols.
- Illustratively, filler words and other noise characters and invalid words can be designated as stop words according to the call scene, so as to construct a stop word database that is saved in the form of a configuration file.
- The server calls the stop word database when needed.
- For each stop word in the stop word database, the dialogue text data is searched to determine whether the stop word appears; if it appears, the stop word is deleted from the dialogue text data. Alternatively, if a stop word appears, it is replaced in the dialogue text data with a preset symbol, such as a space, so as to preserve the structure of the dialogue text data to a certain extent.
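- A minimal sketch of this stop word filtering, assuming the stop word database has already been loaded from its configuration file; the example stop words are hypothetical:

```python
def filter_noise(dialogue_text: str, stop_words: set, delete: bool = False) -> str:
    """Remove stop words from the dialogue text data.

    With delete=False each stop word is replaced by a space, which preserves
    the structure of the text to a certain extent, as described above.
    """
    for word in stop_words:
        if word in dialogue_text:
            dialogue_text = dialogue_text.replace(word, "" if delete else " ")
    return dialogue_text

# Example with hypothetical filler words as the stop word database.
filtered = filter_noise("um hello uh Mr. Wang", {"um", "uh"})
```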
- Step S1322 based on the bag-of-words model, extract text features from the dialogue text data with noise characters filtered out.
- The bag-of-words (BOW) model is a representation of text that describes the occurrence of word elements in a document. It is a way of representing text data when modeling text with machine learning algorithms, and it involves two aspects: a collection of known words, and testing for the presence of those known words.
- Specifically, the bag-of-words model includes a dictionary, and the dictionary includes several words.
- The bag-of-words model divides the noise-filtered dialogue text data into words and, figuratively, puts all the words in a bag, ignoring word order, grammar, syntax, and other elements, treating the text as just a collection of words; the appearance of each word in the dialogue text data is independent and does not depend on whether other words appear.
- The text features that the bag-of-words model extracts from the noise-filtered dialogue text data include a bag-of-words feature vector.
- In some embodiments, step S1322 of extracting text features from the noise-filtered dialogue text data based on the bag-of-words model includes steps S1301 to S1303.
- Step S1301: Initialize an all-zero bag-of-words feature vector.
- the elements in the bag-of-words feature vector correspond one-to-one with words in the dictionary of the bag-of-words model.
- Step S1302 Count the number of occurrences of each word in the dictionary in the dialogue text data from which the noise character is filtered out.
- Step S1303 Assign a value to the corresponding element in the bag of words feature vector according to the number of times the word appears in the dialogue text data.
- For example, with a dictionary of ["Xiao Ming", "likes", "watching", "movies", "also", "playing", "football"], if the noise-filtered dialogue text data is "Xiao Ming likes watching movies", the bag-of-words feature vector is [1, 1, 1, 1, 0, 0, 0]; if the noise-filtered dialogue text data is "Xiao Ming likes watching movies, Xiao Ming also likes playing football", the bag-of-words feature vector is [2, 2, 1, 1, 1, 1, 1].
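- The following sketch reproduces steps S1301 to S1303 for this example; the seven-word dictionary and the pre-tokenized word lists are assumptions for illustration:

```python
def bag_of_words_vector(words, dictionary):
    """Steps S1301-S1303: start from an all-zero vector, count occurrences of
    each dictionary word, and assign the counts to the corresponding elements."""
    counts = [0] * len(dictionary)                    # S1301: all-zero feature vector
    index = {w: i for i, w in enumerate(dictionary)}
    for w in words:                                   # S1302/S1303: count and assign
        if w in index:
            counts[index[w]] += 1
    return counts

dictionary = ["Xiao Ming", "likes", "watching", "movies", "also", "playing", "football"]
text1 = ["Xiao Ming", "likes", "watching", "movies"]
text2 = text1 + ["Xiao Ming", "also", "likes", "playing", "football"]
assert bag_of_words_vector(text1, dictionary) == [1, 1, 1, 1, 0, 0, 0]
assert bag_of_words_vector(text2, dictionary) == [2, 2, 1, 1, 1, 1, 1]
```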
- Step S133 Based on the trained machine learning model, the type data of the call scene is identified according to the text features in the dialogue text data.
- the text features in the dialogue text data are used as the input of the trained machine learning model, and the output of the machine learning model is used as the type data of the identified call scene.
- the scene training sample set used to train the machine learning model includes several scene training samples.
- the scene training sample includes historical dialogue text data and scene type data corresponding to the historical dialogue text data. Text features can be extracted from historical dialogue text data.
- the scene type data is the annotation data of the historical dialogue text data.
- During model training, the text features corresponding to the historical dialogue text data are used as input data and the scene type data is used as output data; a selected machine learning model learns from a scene training sample set including a large number of scene training samples to obtain the trained machine learning model.
- The trained machine learning model can be set as a model that recognizes the call scene type for a single scene only; the type data of the call scene obtained by recognizing the dialogue text data with the pre-built scene recognition model then reflects whether the first caller and the second caller belong to that specific call scene.
- The trained machine learning model can also be set as a model that recognizes call scene types across multiple scenes; the type data of the call scene obtained by recognizing the dialogue text data with the pre-built scene recognition model then reflects the probability that the first caller and the second caller belong to each of several specific call scenes.
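- A minimal training sketch, in which scikit-learn's CountVectorizer (bag-of-words) and LogisticRegression stand in for the unnamed "selected machine learning model", and the two training dialogues and their scene labels are hypothetical:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

historical_dialogues = ["Hello, Mr. Wang, I am calling from ...",
                        "Mom, have you eaten yet?"]
scene_labels = ["stranger call", "family call"]   # annotation data

vectorizer = CountVectorizer()                    # bag-of-words text features
X = vectorizer.fit_transform(historical_dialogues)
scene_model = LogisticRegression().fit(X, scene_labels)

# Single-scene use: predict one scene type. Multi-scene use: predict_proba
# yields the probability that the dialogue belongs to each call scene type.
probs = scene_model.predict_proba(vectorizer.transform(["Hello, Mr. Li, I am ..."]))
```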
- Step S140: Recognize the first call audio and the second call audio based on the pre-built emotion recognition model to obtain the emotion data of the first caller corresponding to the first call terminal and the emotion data of the second caller corresponding to the second call terminal.
- Specifically, the server recognizes the first call audio based on a pre-built emotion recognition model to obtain the emotion data of the first caller, and recognizes the second call audio based on the pre-built emotion recognition model to obtain the emotion data of the second caller.
- a machine learning algorithm is used to obtain the emotion recognition model from a set of emotion training samples.
- the emotion training sample set includes several emotion training samples.
- the emotion training sample includes historical audio data and emotion type data corresponding to the historical audio data.
- From the historical audio data, feature data can be extracted, such as volume features, speech rate features, smooth features, and pause features.
- The emotion type data is the annotation data of the historical audio data and is used during model training.
- During training, the feature data corresponding to the historical audio data is used as input data, the emotion type data is used as output data, and a selected machine learning model learns from an emotion training sample set including several emotion training samples to obtain the emotion recognition model.
- Illustratively, the first call audio is first processed to obtain a smooth feature reflecting the smoothness of the first caller's voice and a pause feature reflecting the duration of pauses. Specifically, the smooth feature is identified by detecting and evaluating the voice jitter frequency of the first caller, and the pause feature is identified by starting a timer when the voices of the first caller and the second caller stop.
- the trained emotion recognition model can recognize the emotion data of the first caller based on smooth features, pause features, volume features, and/or speech rate features.
- the emotion recognition model can recognize the second call audio to obtain the emotion data of the second caller.
- For example, the emotion recognition model recognizes that the emotion data of the first caller corresponding to the first call terminal is "excited"; or the emotion recognition model recognizes that the emotion data of the first caller corresponding to the first call terminal is "nervous".
- In some embodiments, the emotion recognition model recognizes the dialogue text data to obtain text features, and can also identify the emotion data of the first caller or the second caller based on those text features. For example, if the second text in the dialogue text data includes the sentence "You need to be calm and not excited" spoken by the second caller, the emotion recognition model can recognize the emotion of the first caller as "excited"; if the second text in the dialogue text data includes the sentence "you this **" spoken by the second caller, the emotion recognition model can identify the emotion of the second caller as "excited" or "angry".
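- As a toy illustration of such text-based cues (the cue phrases and the mapping are hypothetical examples patterned on the sentences above):

```python
# Hypothetical keyword cues: phrases in the second caller's text that hint at
# one caller's emotion, as in the examples in the description.
TEXT_CUES = {
    "need to be calm": ("first_caller", "excited"),
    "not excited": ("first_caller", "excited"),
}

def emotion_from_text(second_text: str):
    """Scan the second caller's text for cues about a caller's emotion."""
    for phrase, (who, emotion) in TEXT_CUES.items():
        if phrase in second_text:
            return who, emotion
    return None
```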
- In some embodiments, step S140 of recognizing the first call audio and the second call audio based on a pre-built emotion recognition model to obtain the emotion data of the first caller corresponding to the first call terminal and the emotion data of the second caller corresponding to the second call terminal specifically includes steps S141 and S142.
- Step S141 Recognizing the first call audio and dialogue text data based on the pre-built emotion recognition model to obtain the emotion data of the first caller corresponding to the first call terminal.
- Specifically, the volume feature, speech rate feature, smooth feature, and/or pause feature extracted from the first call audio and the text features extracted from the dialogue text data are merged as the input of the emotion recognition model, which recognizes them to obtain the emotion data of the first caller; this further improves the accuracy of model recognition.
- Step S142 Recognizing the second call audio and dialogue text data based on the pre-built emotion recognition model to obtain the second caller's emotion data corresponding to the second call terminal.
- Specifically, the volume feature, speech rate feature, smooth feature, and/or pause feature extracted from the second call audio and the text features extracted from the dialogue text data are merged as the input of the emotion recognition model, which recognizes them to obtain the emotion data of the second caller; this further improves the accuracy of model recognition.
- In some embodiments, step S141 of recognizing the first call audio and dialogue text data based on a pre-built emotion recognition model to obtain the emotion data of the first caller corresponding to the first call terminal specifically includes steps S1411 to S1413.
- Step S1411 extract at least one of a volume feature, a speech rate feature, a smooth feature, and a pause feature from the first call audio.
- Specifically, the volume feature reflects the amplitude of the first call audio; the speech rate feature is obtained by calculating the rate of change of the energy envelope of the first call audio in the time domain; the smooth feature is obtained by detecting and evaluating the voice jitter frequency of the first caller; and the pause feature is obtained by starting a timer when the voices of the first caller and the second caller stop.
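- For illustration, the four audio features might be approximated as follows; the frame size, silence threshold, and each proxy formula are assumptions for this sketch, not values specified by the patent:

```python
import numpy as np

def audio_features(samples: np.ndarray, sample_rate: int) -> dict:
    """Illustrative offline approximations of the four audio features.

    Assumes the buffer holds at least a few frames of audio; a live system
    would compute the pause feature with a timer, as described above.
    """
    frame = int(0.02 * sample_rate)  # 20 ms analysis frames (assumed)
    usable = samples[: len(samples) // frame * frame]
    energy = np.sqrt((usable.reshape(-1, frame) ** 2).mean(axis=1))  # energy envelope

    volume = float(np.abs(samples).mean())               # amplitude-based volume
    speech_rate = float(np.abs(np.diff(energy)).mean())  # change rate of the envelope
    smooth = float(1.0 / (1.0 + energy.std()))           # crude jitter/smoothness proxy
    pause = float((energy < 0.01).mean() * len(samples) / sample_rate)  # silent time
    return {"volume": volume, "speech_rate": speech_rate,
            "smooth": smooth, "pause": pause}
```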
- Step S1412 extract text features from the dialogue text data.
- the text features of the dialog text data extracted in step S132 can be reused.
- Step S1413: Based on the pre-built emotion recognition model, process the text features together with at least one of the volume feature, speech rate feature, smooth feature, and pause feature to obtain the emotion data of the first caller corresponding to the first call terminal.
- Specifically, the text features and the volume, speech rate, smooth, and pause features are fused, for example by splicing (concatenation), and used as the input of the emotion recognition model; the emotion recognition model then recognizes the emotion data of the first caller, further improving the accuracy of model recognition.
- In this embodiment, the emotion training sample set includes several emotion training samples, and each emotion training sample includes historical audio data, corresponding dialogue text data, and corresponding emotion type data.
- From the historical audio data, volume features, speech rate features, smooth features, pause features, and the like can be extracted, and text features can be obtained from the corresponding dialogue text data; the emotion type data is the annotation data of the historical audio data and is used during model training.
- During training, the volume feature, speech rate feature, smooth feature, pause feature, and the like corresponding to the historical audio data, together with the text features, are used as input data, the emotion type data is used as output data, and a selected machine learning model learns from the emotion training sample set including several emotion training samples to obtain the emotion recognition model.
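- A sketch of the feature fusion and training described above; the training pairs and labels are hypothetical, and scikit-learn's logistic regression again stands in for the unnamed selected machine learning model:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def fuse_features(audio_feats: dict, text_vector: list) -> np.ndarray:
    """Splice (concatenate) the audio features and the bag-of-words text
    feature vector into a single input for the emotion recognition model."""
    audio_part = [audio_feats[k] for k in ("volume", "speech_rate", "smooth", "pause")]
    return np.array(audio_part + list(text_vector), dtype=float)

# Hypothetical training samples: (audio feature dict, text feature vector),
# with emotion type data as the annotation labels.
training_pairs = [
    ({"volume": 0.8, "speech_rate": 0.6, "smooth": 0.2, "pause": 0.1}, [2, 0, 1]),
    ({"volume": 0.2, "speech_rate": 0.2, "smooth": 0.9, "pause": 1.5}, [0, 1, 0]),
]
emotion_labels = ["excited", "calm"]

X = np.stack([fuse_features(a, t) for a, t in training_pairs])
emotion_model = LogisticRegression().fit(X, emotion_labels)
```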
- Step S150 Generate and send first prompt information for prompting the first caller to adjust emotions to the first call terminal according to the type data of the call scene and the emotion data of the first caller.
- For example, the first prompt information generated and sent to the first call terminal includes content such as "You are getting too excited".
- the first prompt information may be provided to the first caller using the first call terminal in a manner of display or sound.
- In some embodiments, step S150 of generating and sending, according to the type data of the call scene and the emotion data of the first caller, first prompt information to the first call terminal for prompting the first caller to adjust emotions includes step S151:
- Step S151: Based on a prompt rule engine with built-in prompt rules, analyze the type data of the call scene and the emotion data of the first caller to obtain the corresponding first prompt information, and send the first prompt information to the first call terminal to prompt the first caller to adjust emotions.
- the prompt rule engine is a rule engine with built-in prompt rules, such as a drools rule engine.
- For example, the prompt rule engine includes the prompt rule: if the type of the call scene is a call between father and son and the emotion data of the first caller is "excited", first prompt information reminding the first caller that his emotions are agitated is generated.
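- For illustration, such prompt rules can be sketched as a simple lookup table; the scene/emotion pairs and prompt wording follow the examples in this description, and the same pattern also applies to the second prompt rules of step S161 below:

```python
# Hypothetical prompt rules keyed on (call scene type, emotion data); a real
# deployment would express these in a rule engine such as drools.
PROMPT_RULES = {
    ("father-son call", "excited"): "Your emotions are agitated; please calm down.",
    ("conversation between lovers", "acting like a baby"): "Your girlfriend is acting like a baby.",
    ("call between friends", "angry"): "Your friend is angry.",
}

def generate_prompt(scene_type: str, emotion: str):
    """Return the prompt information for the matched rule, or None."""
    return PROMPT_RULES.get((scene_type, emotion))
```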
- In other embodiments, step S150 of generating and sending, according to the type data of the call scene and the emotion data of the first caller, first prompt information to the first call terminal for prompting the first caller to adjust emotions includes step S152:
- Step S152: Based on a pre-trained first prompt model, generate and send, according to the type data of the call scene, the emotion data of the first caller, and the dialogue text data, first prompt information to the first call terminal for prompting the first caller to adjust emotions.
- the first prompt model may be constructed in the following manner: a machine learning algorithm is used to obtain the first prompt model from the first prompt training sample set.
- the first prompt training sample set includes a plurality of first prompt training samples.
- Each first prompt training sample includes type data of the historical call scene, historical emotion data corresponding to the first caller, text features corresponding to the historical dialogue text data, and prompt information corresponding to the training sample.
- The prompt information is the annotation data of the training sample. During model training, the type data of the historical call scene, the historical emotion data corresponding to the first caller, and the text features corresponding to the historical dialogue text data are used as input data, the prompt information is used as output data, and a selected machine learning model learns from the first prompt training sample set including the first prompt training samples to obtain the first prompt model.
- In this way, the first prompt model can learn conversational patterns from the historical dialogue text data and can include suggested wording when generating prompt information.
- For example, the first prompt information generated and sent to the first call terminal includes content such as "You are excited; try talking about the weather".
- Step S160: Generate and send, according to the type data of the call scene and the emotion data of the second caller, second prompt information to the first call terminal for prompting the first caller to adjust the dialogue strategy to deal with the second caller's emotions.
- For example, if the type of the call scene is a call between mother and son and the emotion data of the second caller is "exhausted", a second prompt message including "Your mother has been exhausted recently" is generated and sent to the first call terminal; or if the type of the call scene is a conversation between lovers and the emotion data of the second caller is "acting like a baby", a second prompt message including "Your girlfriend is acting like a baby" is generated and sent to the first call terminal; or if the type of the call scene is a call between friends and the emotion data of the second caller is "angry", a second prompt message including "Your friend is angry" is generated and sent to the first call terminal.
- the second prompt information may be provided to the first caller using the first call terminal in a manner of display or sound.
- In some embodiments, step S160 of generating and sending, according to the type data of the call scene and the emotion data of the second caller, second prompt information to the first call terminal for prompting the first caller to adjust the dialogue strategy to deal with the second caller's emotions includes step S161:
- Step S161: Based on a prompt rule engine with built-in prompt rules, analyze the type data of the call scene and the emotion data of the second caller to obtain the corresponding second prompt information, and send the second prompt information to the first call terminal to prompt the first caller to adjust the dialogue strategy to deal with the second caller's emotions.
- the prompt rule engine is a rule engine with built-in prompt rules, such as a drools rule engine.
- For example, the prompt rule engine includes the prompt rule: if the type of the call scene is a conversation between lovers and the emotion data of the second caller is "acting like a baby", a second prompt message including "Your girlfriend is acting like a baby" is generated.
- In other embodiments, step S160 of generating and sending, according to the type data of the call scene and the emotion data of the second caller, second prompt information to the first call terminal for prompting the first caller to adjust the dialogue strategy to deal with the second caller's emotions includes step S162:
- Step S162: Based on a pre-trained second prompt model, generate and send, according to the type data of the call scene, the emotion data of the second caller, and the dialogue text data, second prompt information to the first call terminal for prompting the first caller to adjust the dialogue strategy to deal with the second caller's emotions.
- the second prompt model may be constructed in the following manner: the second prompt model is obtained by learning from the second prompt training sample set through a machine learning algorithm.
- the second prompt training sample set includes a plurality of second prompt training samples.
- Each second prompt training sample includes type data of the historical call scene, historical emotion data corresponding to the second caller, text features corresponding to the historical dialogue text data, and prompt information corresponding to the training sample.
- The prompt information is the annotation data of the training sample. During model training, the type data of the historical call scene, the historical emotion data corresponding to the second caller, and the text features corresponding to the historical dialogue text data are used as input data, the prompt information is used as output data, and a selected machine learning model learns from the second prompt training sample set including the second prompt training samples to obtain the second prompt model.
- In this way, the second prompt model can learn conversational patterns from the historical dialogue text data and can include suggested wording when generating prompt information.
- For example, if the type of the call scene is a call between mother and son and the emotion data of the second caller is "tired", the second prompt message generated and sent to the first call terminal includes "Your mother has been tired recently; ask about her daily life"; or if the type of the call scene is a conversation between lovers and the emotion data of the second caller is "acting like a baby", the second prompt message generated and sent to the first call terminal includes "Your girlfriend is acting like a baby; tenderly call her baby"; or if the type of the call scene is a call between friends and the emotion data of the second caller is "angry", the second prompt message generated and sent to the first call terminal includes "Your friend is angry; try talking about the weather".
- In some embodiments, the first prompt model in step S152 and the second prompt model in step S162 can be integrated into one prompt model. Specifically, a prompt object identifier can be used in the prompt training samples to indicate the intended recipient; the prompt model running on the server can then generate the corresponding prompt information, predict the prompt object for that prompt information, and send the prompt information to that prompt object, for example to the first call terminal or the second call terminal.
- In some embodiments, when the first prompt information for prompting the first caller to adjust emotions is sent to the first call terminal in step S150, sending of the first call audio corresponding to the first call terminal to the second call terminal is suspended, so as to shield the first prompt information from the second caller.
- Likewise, when the second prompt information for prompting the first caller to adjust the dialogue strategy in response to the second caller's emotions is sent to the first call terminal in step S160, sending of the first call audio corresponding to the first call terminal to the second call terminal is suspended, so as to shield the second prompt information from the second caller.
- Specifically, when the server sends the corresponding prompt information to the first call terminal, the first call terminal presents it to the first caller by means of a voice prompt. At this time, the server can pause collecting the audio obtained by the microphone of the first call terminal, that is, the first call audio, for example by switching the call mode of the first call terminal to mute mode, and thus stop sending first call audio containing the voice prompt to the second call terminal; in this way, the first prompt information and the second prompt information are not heard by the second caller.
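- A sketch of this shielding behavior under assumed server-side session handling; the class and method names here are hypothetical:

```python
# While the voice prompt plays on the first call terminal, the server stops
# forwarding the first call audio so the second caller cannot hear the prompt.
class CallSession:
    def __init__(self, first_terminal, second_terminal):
        self.first_terminal = first_terminal
        self.second_terminal = second_terminal
        self.forwarding = True  # whether first call audio is sent onward

    def send_prompt(self, prompt_text):
        self.forwarding = False                # pause sending first call audio
        self.first_terminal.play(prompt_text)  # voice prompt on the first terminal
        self.forwarding = True                 # resume once the prompt is done

    def on_first_call_audio(self, chunk):
        if self.forwarding:                       # otherwise the chunk is dropped,
            self.second_terminal.receive(chunk)   # shielding the prompt information
```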
- In the communication service method above, the corresponding audio is obtained during a call between the first call terminal and the second call terminal; the dialogue text is then obtained through voice recognition, the call scene is recognized from the dialogue text, and the caller's emotion is recognized from the acquired audio. Corresponding prompts are then given to the caller according to the call scene and the caller's emotions, so as to intervene in the call in a timely and accurate manner and guide the caller to conduct the call better.
- FIG. 12 is a schematic structural diagram of a voice recognition-based communication service device provided by an embodiment of the present application.
- The voice recognition-based communication service device may be configured in a server for performing the aforementioned voice recognition-based communication service method.
- the communication service device based on voice recognition includes: an audio acquisition module 110, a voice recognition module 120, a scene recognition module 130, an emotion recognition module 140, a first prompt module 150, and a second prompt module 160.
- the audio obtaining module 110 is configured to obtain the first call audio corresponding to the first call terminal and the second call audio corresponding to the second call terminal if the call between the first call terminal and the second call terminal is connected .
- the voice recognition module 120 is configured to perform voice recognition on the first call audio and the second call audio to obtain dialogue text data.
- the speech recognition module 120 includes:
- the first voice sub-module 121 is configured to perform voice recognition on the first call audio to obtain the first text corresponding to the first caller;
- the second voice submodule 122 is configured to perform voice recognition on the second call audio to obtain the second text corresponding to the second caller;
- the text sorting sub-module 123 is used to sort the first text and the second text according to a preset sorting rule to obtain dialogue text data.
- the scene recognition module 130 is configured to recognize the dialogue text data based on a pre-built scene recognition model to obtain type data of the call scene.
- the scene recognition module 130 includes:
- the scene rule sub-module 131 is used to analyze the conversation text data to obtain the type data of the call scene based on the scene rule engine of the built-in scene judgment rule
- the scene recognition module 130 includes:
- the feature extraction sub-module 132 is used to extract text features in the dialogue text data
- the scene recognition sub-module 133 is used to identify the type data of the call scene according to the text features in the conversation text data based on the trained machine learning model.
- The emotion recognition module 140 is configured to recognize the first call audio and the second call audio based on a pre-built emotion recognition model to obtain the emotion data of the first caller corresponding to the first call terminal and the emotion data of the second caller corresponding to the second call terminal.
- the emotion recognition module 140 includes:
- the first emotion recognition sub-module 141 is configured to recognize the first call audio and dialog text data based on a pre-built emotion recognition model to obtain the emotion data of the first caller corresponding to the first call terminal.
- the first emotion recognition sub-module 141 includes:
- An audio feature extraction sub-module for extracting at least one of a volume feature, a speech rate feature, a smooth feature, and a pause feature from the first call audio;
- an emotion data acquisition sub-module used to process, based on the pre-built emotion recognition model, the text features together with at least one of the volume feature, speech rate feature, smooth feature, and pause feature, to obtain the emotion data of the first caller corresponding to the first call terminal.
- the second emotion recognition sub-module 142 is configured to recognize the second call audio and conversation text data based on a pre-built emotion recognition model to obtain the second caller's emotion data corresponding to the second call terminal.
- The first prompt module 150 is configured to generate and send, according to the type data of the call scene and the emotion data of the first caller, first prompt information to the first call terminal for prompting the first caller to adjust emotions.
- the first prompting module 150 includes:
- The first prompt rule sub-module 151 is used to analyze, based on the prompt rule engine with built-in prompt rules, the type data of the call scene and the emotion data of the first caller to obtain the corresponding first prompt information, and to send the first prompt information to the first call terminal to prompt the first caller to adjust emotions.
- the first prompting module 150 includes:
- The first prompt generation sub-module 152 is configured to generate and send, based on the pre-trained first prompt model and according to the type data of the call scene, the emotion data of the first caller, and the dialogue text data, first prompt information to the first call terminal for prompting the first caller to adjust emotions.
- The second prompting module 160 is configured to generate and send, according to the type data of the call scene and the emotion data of the second caller, second prompt information to the first call terminal for prompting the first caller to adjust the dialogue strategy to deal with the second caller's emotions.
- the second prompting module 160 includes:
- The second prompt rule sub-module 161 is configured to analyze, based on the prompt rule engine with built-in prompt rules, the type data of the call scene and the emotion data of the second caller to obtain the corresponding second prompt information, and to send the second prompt information to the first call terminal to prompt the first caller to adjust the dialogue strategy to deal with the second caller's emotions.
- the second prompting module 160 includes:
- The second prompt generation sub-module 162 is configured to generate and send, based on the pre-trained second prompt model and according to the type data of the call scene, the emotion data of the second caller, and the dialogue text data, second prompt information to the first call terminal for prompting the first caller to adjust the dialogue strategy to deal with the second caller's emotions.
- the method and device of this application can be used in many general or special computing system environments or configurations.
- the above-mentioned method and apparatus may be implemented in the form of a computer program, and the computer program may run on the computer device as shown in FIG. 14.
- FIG. 14 is a schematic structural diagram of a computer device provided by an embodiment of the present application.
- the computer equipment can be a server or a terminal.
- the computer device includes a processor, a memory, and a network interface connected through a system bus, where the memory may include a non-volatile storage medium and an internal memory.
- the non-volatile storage medium can store an operating system and a computer program.
- the computer program includes program instructions, and when the program instructions are executed, the processor can execute any communication service method based on voice recognition.
- the processor is used to provide computing and control capabilities and support the operation of the entire computer equipment.
- the internal memory provides an environment for the operation of the computer program in the non-volatile storage medium.
- the processor can execute any communication service method based on voice recognition.
- the network interface is used for network communication, such as sending assigned tasks.
- The structure of the computer device shown is only a block diagram of part of the structure related to the solution of this application and does not constitute a limitation on the computer device to which the solution is applied; a specific computer device may include more or fewer components than shown in the figure, combine certain components, or have a different arrangement of components.
- The processor may be a central processing unit (CPU), another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like.
- the general-purpose processor may be a microprocessor or the processor may also be any conventional processor.
- Wherein the processor is used to run a computer program stored in the memory to implement the following steps:
- if the call between the first call terminal and the second call terminal is connected, obtaining the first call audio corresponding to the first call terminal and the second call audio corresponding to the second call terminal; performing voice recognition on the first call audio and the second call audio to obtain dialogue text data; recognizing the dialogue text data based on a pre-built scene recognition model to obtain type data of the call scene; recognizing the first call audio and the second call audio based on a pre-built emotion recognition model to obtain emotion data of the first caller corresponding to the first call terminal and emotion data of the second caller corresponding to the second call terminal; generating and sending, according to the type data of the call scene and the emotion data of the first caller, first prompt information to the first call terminal for prompting the first caller to adjust emotions; and generating and sending, according to the type data of the call scene and the emotion data of the second caller, second prompt information to the first call terminal for prompting the first caller to adjust the dialogue strategy to deal with the second caller's emotions.
- In one embodiment, when implementing voice recognition on the first call audio and the second call audio to obtain dialogue text data, the processor specifically implements: performing voice recognition on the first call audio to obtain the first text corresponding to the first caller; performing voice recognition on the second call audio to obtain the second text corresponding to the second caller; and sorting the first text and the second text according to a preset sorting rule to obtain the dialogue text data.
- In one embodiment, when recognizing the dialogue text data based on the pre-built scene recognition model to obtain the type data of the call scene, the processor specifically implements: analyzing the dialogue text data based on a scene rule engine with built-in scene judgment rules to obtain the type data of the call scene.
- In another embodiment, when recognizing the dialogue text data based on the pre-built scene recognition model to obtain the type data of the call scene, the processor specifically implements: extracting text features from the dialogue text data, and recognizing the type data of the call scene from those text features based on the trained machine learning model.
- In one embodiment, when recognizing the first call audio and the second call audio based on the pre-built emotion recognition model to obtain the emotion data of the first caller corresponding to the first call terminal and the emotion data of the second caller corresponding to the second call terminal, the processor specifically implements: recognizing the first call audio and the dialogue text data based on the pre-built emotion recognition model to obtain the emotion data of the first caller corresponding to the first call terminal; and recognizing the second call audio and the dialogue text data based on the pre-built emotion recognition model to obtain the emotion data of the second caller corresponding to the second call terminal.
- In one embodiment, the processor further specifically implements: extracting at least one of a volume feature, a speech rate feature, a smooth feature, and a pause feature from the first call audio; extracting text features from the dialogue text data; and processing, based on the pre-built emotion recognition model, the text features together with at least one of the volume feature, speech rate feature, smooth feature, and pause feature, to obtain the emotion data of the first caller corresponding to the first call terminal.
- In one embodiment, when generating and sending, according to the type data of the call scene and the emotion data of the first caller, the first prompt information to the first call terminal for prompting the first caller to adjust emotions, the processor specifically implements: analyzing, based on the prompt rule engine with built-in prompt rules, the type data of the call scene and the emotion data of the first caller to obtain the corresponding first prompt information, and sending the first prompt information to the first call terminal to prompt the first caller to adjust emotions; or specifically implements: generating and sending, based on the pre-trained first prompt model and according to the type data of the call scene, the emotion data of the first caller, and the dialogue text data, the first prompt information to the first call terminal for prompting the first caller to adjust emotions.
- In one embodiment, when sending the first prompt information or the second prompt information to the first call terminal, the processor also implements: suspending the sending of the first call audio corresponding to the first call terminal to the second call terminal, so as to shield the first prompt information or the second prompt information from the second caller.
- The embodiments of this application further provide a computer-readable storage medium storing a computer program; the computer program includes program instructions, and the processor executes the program instructions to implement any voice recognition-based communication service method provided in the embodiments of this application.
- the computer-readable storage medium may be the internal storage unit of the computer device described in the foregoing embodiment, such as the hard disk or memory of the computer device.
- The computer-readable storage medium may also be an external storage device of the computer device, such as a plug-in hard disk, a smart memory card (SMC), a secure digital (SD) card, or a flash card equipped on the computer device.
Claims (20)
- A voice recognition-based communication service method, comprising: if a call between a first call terminal and a second call terminal is connected, acquiring first call audio corresponding to the first call terminal and second call audio corresponding to the second call terminal; performing voice recognition on the first call audio and the second call audio to obtain dialogue text data; recognizing the dialogue text data based on a pre-built scene recognition model to obtain type data of the call scene; recognizing the first call audio and the second call audio based on a pre-built emotion recognition model to obtain emotion data of a first caller corresponding to the first call terminal and emotion data of a second caller corresponding to the second call terminal; generating, according to the type data of the call scene and the emotion data of the first caller, first prompt information for prompting the first caller to adjust emotions, and sending it to the first call terminal; and generating, according to the type data of the call scene and the emotion data of the second caller, second prompt information for prompting the first caller to adjust the dialogue strategy in response to the second caller's emotions, and sending it to the first call terminal.
- The communication service method according to claim 1, wherein performing voice recognition on the first call audio and the second call audio to obtain dialogue text data comprises: performing voice recognition on the first call audio to obtain first text corresponding to the first caller; performing voice recognition on the second call audio to obtain second text corresponding to the second caller; and sorting the first text and the second text according to a preset sorting rule to obtain the dialogue text data.
- The communication service method according to claim 1, wherein recognizing the dialogue text data based on the pre-built scene recognition model to obtain type data of the call scene comprises: analyzing the dialogue text data with a scene rule engine having built-in scene judgment rules to obtain the type data of the call scene; or extracting text features from the dialogue text data and, based on a trained machine learning model, identifying the type data of the call scene from the text features.
- The communication service method according to claim 1, wherein recognizing the first call audio and the second call audio based on the pre-built emotion recognition model to obtain the emotion data of the first caller corresponding to the first call terminal and the emotion data of the second caller corresponding to the second call terminal comprises: recognizing the first call audio and the dialogue text data based on a pre-built emotion recognition model to obtain the emotion data of the first caller corresponding to the first call terminal; and recognizing the second call audio and the dialogue text data based on a pre-built emotion recognition model to obtain the emotion data of the second caller corresponding to the second call terminal.
- The communication service method according to claim 4, wherein recognizing the first call audio and the dialogue text data based on the pre-built emotion recognition model to obtain the emotion data of the first caller corresponding to the first call terminal comprises: extracting at least one of a volume feature, a speech rate feature, a fluency feature, and a pause feature from the first call audio; extracting text features from the dialogue text data; and processing the text features and the at least one of the volume, speech rate, fluency, and pause features based on the pre-built emotion recognition model to obtain the emotion data of the first caller corresponding to the first call terminal.
- The communication service method according to claim 1, wherein generating, according to the type data of the call scene and the emotion data of the first caller, the first prompt information for prompting the first caller to adjust emotions and sending it to the first call terminal comprises: analyzing the type data of the call scene and the emotion data of the first caller with a prompt rule engine having built-in prompt rules to obtain the corresponding first prompt information, and sending the first prompt information to the first call terminal to prompt the first caller to adjust emotions; or generating, based on a pre-trained first prompt model, the first prompt information according to the type data of the call scene, the emotion data of the first caller, and the dialogue text data, and sending it to the first call terminal to prompt the first caller to adjust emotions.
- The communication service method according to claim 1, wherein, when the first prompt information or the second prompt information is sent to the first call terminal, sending of the first call audio corresponding to the first call terminal to the second call terminal is suspended, so as to shield the first prompt information or the second prompt information from the second caller.
- A voice recognition-based communication service apparatus, comprising: an audio acquisition module, configured to acquire, if a call between a first call terminal and a second call terminal is connected, first call audio corresponding to the first call terminal and second call audio corresponding to the second call terminal; a voice recognition module, configured to perform voice recognition on the first call audio and the second call audio to obtain dialogue text data; a scene recognition module, configured to recognize the dialogue text data based on a pre-built scene recognition model to obtain type data of the call scene; an emotion recognition module, configured to recognize the first call audio and the second call audio based on a pre-built emotion recognition model to obtain emotion data of a first caller corresponding to the first call terminal and emotion data of a second caller corresponding to the second call terminal; a first prompt module, configured to generate, according to the type data of the call scene and the emotion data of the first caller, first prompt information for prompting the first caller to adjust emotions, and send it to the first call terminal; and a second prompt module, configured to generate, according to the type data of the call scene and the emotion data of the second caller, second prompt information for prompting the first caller to adjust the dialogue strategy in response to the second caller's emotions, and send it to the first call terminal.
- A computer device, wherein the computer device comprises a memory and a processor; the memory is configured to store a computer program; and the processor is configured to execute the computer program and, when executing the computer program, implement the following steps: if a call between a first call terminal and a second call terminal is connected, acquiring first call audio corresponding to the first call terminal and second call audio corresponding to the second call terminal; performing voice recognition on the first call audio and the second call audio to obtain dialogue text data; recognizing the dialogue text data based on a pre-built scene recognition model to obtain type data of the call scene; recognizing the first call audio and the second call audio based on a pre-built emotion recognition model to obtain emotion data of a first caller corresponding to the first call terminal and emotion data of a second caller corresponding to the second call terminal; generating, according to the type data of the call scene and the emotion data of the first caller, first prompt information for prompting the first caller to adjust emotions, and sending it to the first call terminal; and generating, according to the type data of the call scene and the emotion data of the second caller, second prompt information for prompting the first caller to adjust the dialogue strategy in response to the second caller's emotions, and sending it to the first call terminal.
- The computer device according to claim 9, wherein, when implementing the performing of voice recognition on the first call audio and the second call audio to obtain dialogue text data, the processor is configured to implement the following steps: performing voice recognition on the first call audio to obtain first text corresponding to the first caller; performing voice recognition on the second call audio to obtain second text corresponding to the second caller; and sorting the first text and the second text according to a preset sorting rule to obtain the dialogue text data.
- The computer device according to claim 9, wherein, when implementing the recognizing of the dialogue text data based on the pre-built scene recognition model to obtain type data of the call scene, the processor is configured to implement the following steps: analyzing the dialogue text data with a scene rule engine having built-in scene judgment rules to obtain the type data of the call scene; or extracting text features from the dialogue text data and, based on a trained machine learning model, identifying the type data of the call scene from the text features.
- The computer device according to claim 9, wherein, when implementing the recognizing of the first call audio and the second call audio based on the pre-built emotion recognition model to obtain the emotion data of the first caller corresponding to the first call terminal and the emotion data of the second caller corresponding to the second call terminal, the processor is configured to implement the following steps: recognizing the first call audio and the dialogue text data based on a pre-built emotion recognition model to obtain the emotion data of the first caller corresponding to the first call terminal; and recognizing the second call audio and the dialogue text data based on a pre-built emotion recognition model to obtain the emotion data of the second caller corresponding to the second call terminal.
- The computer device according to claim 12, wherein, when implementing the recognizing of the first call audio and the dialogue text data based on the pre-built emotion recognition model to obtain the emotion data of the first caller corresponding to the first call terminal, the processor is configured to implement the following steps: extracting at least one of a volume feature, a speech rate feature, a fluency feature, and a pause feature from the first call audio; extracting text features from the dialogue text data; and processing the text features and the at least one of the volume, speech rate, fluency, and pause features based on the pre-built emotion recognition model to obtain the emotion data of the first caller corresponding to the first call terminal.
- The computer device according to claim 9, wherein, when implementing the generating, according to the type data of the call scene and the emotion data of the first caller, of the first prompt information for prompting the first caller to adjust emotions and the sending of it to the first call terminal, the processor is configured to implement the following steps: analyzing the type data of the call scene and the emotion data of the first caller with a prompt rule engine having built-in prompt rules to obtain the corresponding first prompt information, and sending the first prompt information to the first call terminal to prompt the first caller to adjust emotions; or generating, based on a pre-trained first prompt model, the first prompt information according to the type data of the call scene, the emotion data of the first caller, and the dialogue text data, and sending it to the first call terminal to prompt the first caller to adjust emotions.
- The computer device according to claim 9, wherein, when implementing the sending of the first prompt information or the second prompt information to the first call terminal, the processor is configured to implement the following step: suspending the sending of the first call audio corresponding to the first call terminal to the second call terminal, so as to shield the first prompt information or the second prompt information from the second caller.
- A computer-readable storage medium storing a computer program, wherein, when the computer program is executed by a processor, the following steps are implemented: if a call between a first call terminal and a second call terminal is connected, acquiring first call audio corresponding to the first call terminal and second call audio corresponding to the second call terminal; performing voice recognition on the first call audio and the second call audio to obtain dialogue text data; recognizing the dialogue text data based on a pre-built scene recognition model to obtain type data of the call scene; recognizing the first call audio and the second call audio based on a pre-built emotion recognition model to obtain emotion data of a first caller corresponding to the first call terminal and emotion data of a second caller corresponding to the second call terminal; generating, according to the type data of the call scene and the emotion data of the first caller, first prompt information for prompting the first caller to adjust emotions, and sending it to the first call terminal; and generating, according to the type data of the call scene and the emotion data of the second caller, second prompt information for prompting the first caller to adjust the dialogue strategy in response to the second caller's emotions, and sending it to the first call terminal.
- The storage medium according to claim 16, wherein, when implementing the performing of voice recognition on the first call audio and the second call audio to obtain dialogue text data, the processor is configured to implement the following steps: performing voice recognition on the first call audio to obtain first text corresponding to the first caller; performing voice recognition on the second call audio to obtain second text corresponding to the second caller; and sorting the first text and the second text according to a preset sorting rule to obtain the dialogue text data.
- The storage medium according to claim 16, wherein, when implementing the recognizing of the dialogue text data based on the pre-built scene recognition model to obtain type data of the call scene, the processor is configured to implement the following steps: analyzing the dialogue text data with a scene rule engine having built-in scene judgment rules to obtain the type data of the call scene; or extracting text features from the dialogue text data and, based on a trained machine learning model, identifying the type data of the call scene from the text features.
- The storage medium according to claim 16, wherein, when implementing the recognizing of the first call audio and the second call audio based on the pre-built emotion recognition model to obtain the emotion data of the first caller corresponding to the first call terminal and the emotion data of the second caller corresponding to the second call terminal, the processor is configured to implement the following steps: recognizing the first call audio and the dialogue text data based on a pre-built emotion recognition model to obtain the emotion data of the first caller corresponding to the first call terminal; and recognizing the second call audio and the dialogue text data based on a pre-built emotion recognition model to obtain the emotion data of the second caller corresponding to the second call terminal; and wherein, when implementing the recognizing of the first call audio and the dialogue text data based on the pre-built emotion recognition model to obtain the emotion data of the first caller corresponding to the first call terminal, the processor is configured to implement the following steps: extracting at least one of a volume feature, a speech rate feature, a fluency feature, and a pause feature from the first call audio; extracting text features from the dialogue text data; and processing the text features and the at least one of the volume, speech rate, fluency, and pause features based on the pre-built emotion recognition model to obtain the emotion data of the first caller corresponding to the first call terminal.
- The storage medium according to claim 16, wherein, when implementing the generating, according to the type data of the call scene and the emotion data of the first caller, of the first prompt information for prompting the first caller to adjust emotions and the sending of it to the first call terminal, the processor is configured to implement the following steps: analyzing the type data of the call scene and the emotion data of the first caller with a prompt rule engine having built-in prompt rules to obtain the corresponding first prompt information, and sending the first prompt information to the first call terminal to prompt the first caller to adjust emotions; or generating, based on a pre-trained first prompt model, the first prompt information according to the type data of the call scene, the emotion data of the first caller, and the dialogue text data, and sending it to the first call terminal to prompt the first caller to adjust emotions.
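To make claim 2 concrete: the "preset sorting rule" is most naturally read as time-ordering the two recognized transcripts. A minimal Python sketch follows, under the assumption (not stated in the claim, which leaves the rule open) that each recognized utterance carries a start timestamp:

```python
from typing import List, Tuple

Utterance = Tuple[float, str, str]   # (start time in seconds, speaker label, text)

def build_dialogue_text(first_text: List[Utterance],
                        second_text: List[Utterance]) -> str:
    """Interleave both callers' recognized utterances by start time to
    produce the dialogue text data of claim 2."""
    merged = sorted(first_text + second_text, key=lambda u: u[0])
    return "\n".join(f"{speaker}: {text}" for _, speaker, text in merged)

# build_dialogue_text([(0.0, "caller1", "Hello")], [(1.2, "caller2", "Hi there")])
# -> "caller1: Hello\ncaller2: Hi there"
```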
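Both branches of claim 3 also admit a short illustration: a keyword-based scene rule engine and a trained text classifier. The keywords and the TF-IDF plus logistic regression pipeline below are stand-ins chosen for the example, not mandated by the claim, which only requires built-in judgment rules or a trained machine learning model.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Branch 1: scene rule engine with built-in judgment rules (hypothetical keywords).
SCENE_RULES = {
    "collections": ["repayment", "overdue", "installment"],
    "sales":       ["discount", "quote", "order"],
}

def scene_by_rules(dialogue_text: str) -> str:
    """Return the call scene type for which a built-in keyword rule fires."""
    lowered = dialogue_text.lower()
    for scene, keywords in SCENE_RULES.items():
        if any(k in lowered for k in keywords):
            return scene
    return "general"

# Branch 2: trained machine learning model over text features.
def train_scene_model(dialogue_texts, scene_labels):
    """dialogue_texts: transcripts; scene_labels: call scene type per transcript."""
    model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
    model.fit(dialogue_texts, scene_labels)
    return model   # model.predict([new_dialogue]) yields the call scene type
```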
Applications Claiming Priority (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910523567 | 2019-06-17 | ||
CN201910523567.X | 2019-06-17 | ||
CN201910605732.6 | 2019-07-05 | ||
CN201910605732.6A CN110444229A (en) | 2019-06-17 | 2019-07-05 | Communication service method, device, computer equipment and storage medium based on speech recognition |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2020253128A1 true WO2020253128A1 (en) | 2020-12-24 |
Family
ID=68429455
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2019/122167 WO2020253128A1 (en) | 2019-06-17 | 2019-11-29 | Voice recognition-based communication service method, apparatus, computer device, and storage medium |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN110444229A (en) |
WO (1) | WO2020253128A1 (en) |
Families Citing this family (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110444229A (en) * | 2019-06-17 | 2019-11-12 | 深圳壹账通智能科技有限公司 | Communication service method, device, computer equipment and storage medium based on speech recognition |
CN111309715B (en) * | 2020-01-15 | 2023-04-18 | 腾讯科技(深圳)有限公司 | Call scene identification method and device |
CN113316041B (en) * | 2020-02-27 | 2023-08-01 | 阿里巴巴集团控股有限公司 | Remote health detection system, method, device and equipment |
CN111580773B (en) * | 2020-04-15 | 2023-11-14 | 北京小米松果电子有限公司 | Information processing method, device and storage medium |
CN112995422A (en) * | 2021-02-07 | 2021-06-18 | 成都薯片科技有限公司 | Call control method and device, electronic equipment and storage medium |
CN113037610B (en) * | 2021-02-25 | 2022-08-19 | 腾讯科技(深圳)有限公司 | Voice data processing method and device, computer equipment and storage medium |
CN115204127B (en) * | 2022-09-19 | 2023-01-06 | 深圳市北科瑞声科技股份有限公司 | Form filling method, device, equipment and medium based on remote flow adjustment |
CN116682414B (en) * | 2023-06-06 | 2024-01-30 | 安徽迪科数金科技有限公司 | Dialect voice recognition system based on big data |
CN116631451B (en) * | 2023-06-25 | 2024-02-06 | 安徽迪科数金科技有限公司 | Voice emotion recognition system and method |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20170270922A1 (en) * | 2015-11-18 | 2017-09-21 | Shenzhen Skyworth-Rgb Electronic Co., Ltd. | Smart home control method based on emotion recognition and the system thereof |
CN108536802A (en) * | 2018-03-30 | 2018-09-14 | 百度在线网络技术(北京)有限公司 | Exchange method based on children's mood and device |
CN108962219A (en) * | 2018-06-29 | 2018-12-07 | 百度在线网络技术(北京)有限公司 | Method and apparatus for handling text |
CN109587360A (en) * | 2018-11-12 | 2019-04-05 | 平安科技(深圳)有限公司 | Electronic device should talk with art recommended method and computer readable storage medium |
CN110444229A (en) * | 2019-06-17 | 2019-11-12 | 深圳壹账通智能科技有限公司 | Communication service method, device, computer equipment and storage medium based on speech recognition |
Family Cites Families (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105991849B (en) * | 2015-02-13 | 2019-03-01 | 华为技术有限公司 | One kind is attended a banquet method of servicing, apparatus and system |
US10158758B2 (en) * | 2016-11-02 | 2018-12-18 | International Business Machines Corporation | System and method for monitoring and visualizing emotions in call center dialogs at call centers |
CN107423364B (en) * | 2017-06-22 | 2024-01-26 | 百度在线网络技术(北京)有限公司 | Method, device and storage medium for answering operation broadcasting based on artificial intelligence |
CN108922564B (en) * | 2018-06-29 | 2021-05-07 | 北京百度网讯科技有限公司 | Emotion recognition method and device, computer equipment and storage medium |
CN108962255B (en) * | 2018-06-29 | 2020-12-08 | 北京百度网讯科技有限公司 | Emotion recognition method, emotion recognition device, server and storage medium for voice conversation |
CN109767791B (en) * | 2019-03-21 | 2021-03-30 | 中国—东盟信息港股份有限公司 | Voice emotion recognition and application system for call center calls |
- 2019-07-05 CN CN201910605732.6A patent/CN110444229A/en active Pending
- 2019-11-29 WO PCT/CN2019/122167 patent/WO2020253128A1/en active Application Filing
Also Published As
Publication number | Publication date |
---|---|
CN110444229A (en) | 2019-11-12 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2020253128A1 (en) | Voice recognition-based communication service method, apparatus, computer device, and storage medium | |
Schuller et al. | The INTERSPEECH 2021 computational paralinguistics challenge: COVID-19 cough, COVID-19 speech, escalation & primates | |
CN112804400B (en) | Customer service call voice quality inspection method and device, electronic equipment and storage medium | |
CN111164601B (en) | Emotion recognition method, intelligent device and computer readable storage medium | |
US11769492B2 (en) | Voice conversation analysis method and apparatus using artificial intelligence | |
US11475897B2 (en) | Method and apparatus for response using voice matching user category | |
CN109767765A (en) | Talk about art matching process and device, storage medium, computer equipment | |
WO2022005661A1 (en) | Detecting user identity in shared audio source contexts | |
EP3617946B1 (en) | Context acquisition method and device based on voice interaction | |
US10750018B2 (en) | Modeling voice calls to improve an outcome of a call between a representative and a customer | |
US10110743B2 (en) | Automatic pattern recognition in conversations | |
CN104538043A (en) | Real-time emotion reminder for call | |
CN112949708B (en) | Emotion recognition method, emotion recognition device, computer equipment and storage medium | |
CN110188361A (en) | Speech intention recognition methods and device in conjunction with text, voice and emotional characteristics | |
CN107316635B (en) | Voice recognition method and device, storage medium and electronic equipment | |
CN113096647B (en) | Voice model training method and device and electronic equipment | |
CN115083434B (en) | Emotion recognition method and device, computer equipment and storage medium | |
CN108920640A (en) | Context acquisition methods and equipment based on interactive voice | |
CN112581938B (en) | Speech breakpoint detection method, device and equipment based on artificial intelligence | |
CN107085717A (en) | A kind of family's monitoring method, service end and computer-readable recording medium | |
CN113129866B (en) | Voice processing method, device, storage medium and computer equipment | |
CN109961152B (en) | Personalized interaction method and system of virtual idol, terminal equipment and storage medium | |
CN112910761B (en) | Instant messaging method, device, equipment, storage medium and program product | |
US10282417B2 (en) | Conversational list management | |
CN110298150B (en) | Identity verification method and system based on voice recognition |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
|  | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 19933410; Country of ref document: EP; Kind code of ref document: A1 |
|  | NENP | Non-entry into the national phase | Ref country code: DE |
|  | 122 | Ep: pct application non-entry in european phase | Ref document number: 19933410; Country of ref document: EP; Kind code of ref document: A1 |
|  | 32PN | Ep: public notification in the ep bulletin as address of the addressee cannot be established | Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 19.08.2022) |