CN111210818B - Word acquisition method and device matched with emotion polarity and electronic equipment

Word acquisition method and device matched with emotion polarity and electronic equipment

Info

Publication number
CN111210818B
Authority
CN
China
Prior art keywords
user
emotion
facial expression
matched
voice
Prior art date
Legal status
Active
Application number
CN201911419689.0A
Other languages
Chinese (zh)
Other versions
CN111210818A (en)
Inventor
路璐
Current Assignee
Beijing Sankuai Online Technology Co Ltd
Original Assignee
Beijing Sankuai Online Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Sankuai Online Technology Co Ltd filed Critical Beijing Sankuai Online Technology Co Ltd
Priority to CN201911419689.0A priority Critical patent/CN111210818B/en
Publication of CN111210818A publication Critical patent/CN111210818A/en
Priority to PCT/CN2020/100549 priority patent/WO2021135140A1/en
Application granted granted Critical
Publication of CN111210818B publication Critical patent/CN111210818B/en

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 - Speech recognition
    • G10L15/24 - Speech recognition using non-acoustical features
    • G10L15/25 - Speech recognition using non-acoustical features using position of the lips, movement of the lips or face analysis
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 - Human faces, e.g. facial parts, sketches or expressions
    • G06V40/174 - Facial expression recognition
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 - Speech recognition
    • G10L15/22 - Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • G10L25/63 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for estimating an emotional state

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Computational Linguistics (AREA)
  • Acoustics & Sound (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Child & Adolescent Psychology (AREA)
  • Hospice & Palliative Care (AREA)
  • Psychiatry (AREA)
  • Signal Processing (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The application discloses a word acquisition method matched with emotion polarity, belonging to the technical field of data processing, which helps improve the efficiency of acquiring words based on emotion polarity. The method comprises the following steps: acquiring the voice of a first user and a facial image of a second user during a conversation between the first user and the second user; determining each facial expression of the second user occurring at different times during the conversation, and the utterance text of the first user at the corresponding times, by performing expression recognition on the facial image of the second user; and determining words matching a preset emotion polarity of the second user according to the text corresponding to each facial expression of the second user. By automatically acquiring the facial expression of one party and the voice of the other party during the conversation between two users, and based on what the other party said when a facial expression of the user occurred, the words that cause the user to produce positive emotions and negative emotions can be accurately determined.

Description

Word acquisition method and device matched with emotion polarity and electronic equipment
Technical Field
The embodiment of the application relates to the technical field of data processing, in particular to a word acquisition method and device for matching emotion polarities, electronic equipment and a computer-readable storage medium.
Background
In daily life, people commonly classify the emotion polarity of words as positive, negative, or neutral. In combination with application scenarios such as information push, the emotion polarity of a word can also be divided into positive emotion and negative emotion, for example words the user is interested in versus words the user is not interested in. Accurately determining words with different emotion polarities is important in many application scenarios. For example, in an information push application, by identifying the words of interest and the words of no interest to a user that are included in a piece of information, it can be determined which information should be pushed to the user. For another example, during an intelligent conversation, words the user is interested in can be output to the user, so as to improve the user experience. In the prior art, the words that are of interest or of no interest to users in different application scenarios are determined manually according to language-use experience. As words spread and application scenarios differ, the emotion polarity of a word can change, and different users may be interested in different words; the manual way of determining words matching different emotion polarities is not only inefficient, but its accuracy and timeliness also cannot keep up with the rapidly changing requirements of application scenarios and users.
Therefore, how to efficiently acquire words matching different emotion polarities is a problem to be solved.
Disclosure of Invention
The embodiment of the application provides a word acquisition method matched with emotion polarity, which is beneficial to improving the acquisition efficiency of words based on emotion polarity.
In order to solve the above problem, in a first aspect, an embodiment of the present application provides a word collecting method for matching emotion polarities, including:
step S1, acquiring the voice of a first user and the face image of a second user in the process of the dialogue between the first user and the second user;
step S2, determining each facial expression of the second user occurring at different time in the conversation process by performing expression recognition on the facial image of the second user;
step S3, matching each facial expression of the second user with a text obtained by converting the voice of the first user according to the occurrence time of each facial expression and the occurrence time of the voice, and determining the text corresponding to each facial expression;
step S4, determining words matching the preset emotion polarity of the second user according to the text corresponding to each facial expression of the second user.
In a second aspect, an embodiment of the present application provides a word collecting device for matching emotion polarities, including:
the voice and facial image acquisition module is used for acquiring the voice of a first user and the facial image of a second user in the conversation process of the first user and the second user;
the facial expression determining module is used for determining each facial expression of the second user at different time in the conversation process by performing expression recognition on the facial image of the second user;
the facial expression and voice matching module is used for matching each facial expression of the second user with a text obtained by voice conversion of the first user according to the occurrence time of each facial expression and the occurrence time of the voice, and determining the text corresponding to each facial expression;
and the word determining module is used for determining words matched with the preset emotion polarity of the second user according to the text corresponding to each facial expression of the second user.
In a third aspect, an embodiment of the present application further discloses an electronic device, which includes a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor implements the word collection method for matching emotion polarities according to the embodiment of the present application when executing the computer program.
In a fourth aspect, the present application provides a computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, the steps of the word acquisition method for matching emotion polarity disclosed in the embodiments of the present application are performed.
According to the word acquisition method for matching emotion polarities, in the conversation process of a first user and a second user, the voice of the first user and the facial image of the second user are acquired; determining each facial expression of the second user at different time in the conversation process by performing expression recognition on the facial image of the second user; matching each facial expression of the second user with a text obtained by voice conversion of the first user according to the occurrence time of each facial expression and the occurrence time of the voice, and determining the text corresponding to each facial expression; and determining words matched with the preset emotion polarity of the second user according to the text corresponding to each facial expression of the second user, so that the collection efficiency of words based on emotion polarity can be improved. According to the word acquisition method matched with the emotion polarity, the facial expression of one party and the voice of the other party in the conversation process of two users are automatically acquired, and words enabling the users to generate positive emotions and negative emotions can be accurately determined based on the words of the other party when the facial expression of the users occurs.
The foregoing description is only an overview of the technical solutions of the present application, and the present application can be implemented according to the content of the description in order to make the technical means of the present application more clearly understood, and the following detailed description of the present application is given in order to make the above and other objects, features, and advantages of the present application more clearly understandable.
Drawings
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
FIG. 1 is a flowchart of a word collection method for matching emotion polarities according to a first embodiment of the present application;
FIG. 2 is a flowchart of a word collection method for matching emotion polarities according to a second embodiment of the present application;
FIG. 3 is a schematic structural diagram of a word acquisition device for matching emotion polarities in an embodiment of the present application;
FIG. 4 is a second schematic structural diagram of a word acquisition device for matching emotion polarities according to a third embodiment of the present application;
FIG. 5 is a third schematic structural diagram of a word acquisition device for matching emotion polarities according to an embodiment of the present application;
FIG. 6 schematically shows a block diagram of an electronic device for performing a method according to the present application; and
fig. 7 schematically shows a storage unit for holding or carrying program code implementing a method according to the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some, but not all, embodiments of the present application. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
Example one
The embodiment of the application discloses a word acquisition method for matching emotion polarity, as shown in fig. 1, the method includes: step S1 to step S4.
Step S1, acquiring a voice of a first user and a facial image of a second user during a dialog between the first user and the second user.
In some embodiments of the present application, the voice of the first user is stored in a form of a voice file in a cloud server of the word capturing platform, and the facial image is stored in a form of an image file in the cloud server. In some embodiments of the present application, each face image has an acquisition time; each voice file has a collection time.
The facial image file can be a facial image file uploaded at a client registered in the word acquisition platform in advance, or a facial image file extracted by the word acquisition platform from a video file uploaded at a registered client of the word acquisition platform; the voice file can be a voice file uploaded by a registered client of the word acquisition platform, and can also be a voice file extracted by the word acquisition platform from a video file uploaded by the registered client of the word acquisition platform.
The word acquisition method for matching emotion polarity is suitable for scenarios in which the facial expressions and voices of both parties to a conversation can be collected. For example, in a video conversation scenario, the voices and video images of both participants are collected, and words matching the emotion polarity of one party are collected based on the matching relationship between the other party's voice and that party's expressions. For another example, in a conversation scenario between a salesperson and a customer in a store, the voice of the salesperson and the expressions of the customer are collected, and words matching the emotion polarity of the customer are collected based on the matching relationship between the salesperson's voice and the customer's expressions. For another example, in a scenario in which an intelligent robot converses with a real person, the voice of the intelligent robot and the facial expressions of the real person are collected, and words matching the emotion polarity of the real person are collected based on the matching relationship between the robot's voice and the person's facial expressions.
In the embodiment of the application, for different application scenes, corresponding technical means are adopted to collect the voice and the facial images of two conversation parties. The voice and the face image may be collected by the same device or may be collected by different devices.
Taking a video conversation scenario as an example, the voice file and the video file of the current party can be collected by the electronic device running the conversation application. For example, when a salesperson (i.e., the first user) and a customer (i.e., the second user) conduct an online video conversation through an application client on an electronic device, the voice of the salesperson is collected through the microphone of the salesperson's electronic device and a voice file is generated; at the same time, the video image stream of the customer is collected through the application client and a video file is generated.
In some embodiments of the present application, the voice file and the video file collected by the electronic device may be uploaded to the word collection platform in an associated manner, the word collection platform extracts facial images of the customer at different times from the video file, generates a plurality of facial image files with timestamps, and stores the voice file and the generated plurality of facial images in the cloud server in an associated manner. In other embodiments of the present application, the video file may be further subjected to image processing by a video image processing module disposed on the electronic device, so as to generate a plurality of facial image files with timestamps, and then the voice file and the plurality of facial image files with timestamps are uploaded to the word capturing platform in an associated manner. Wherein the timestamp indicates a time of acquisition of the face image in the respective face image file.
Taking as an example a conversation scenario between a salesperson (i.e., the first user) and a customer (i.e., the second user) in an offline store, the voice of the first user and the facial image of the second user may be acquired in the following ways.
First, a video file containing the voice of the salesperson, the voice of the customer, and the facial image of the customer may be captured by a camera with a microphone provided at the service desk or consultation desk. Then, the video file is processed: the audio stream in the video file is extracted and an audio file is generated, where the audio file carries a timestamp indicating the acquisition time of the audio stream; and the video image frames are extracted to obtain an image file for each frame, where each image file carries a timestamp indicating the acquisition time of that frame.
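As an illustrative sketch of this extraction step (not a required implementation), assuming OpenCV and the ffmpeg command-line tool are available, the timestamped frame and audio extraction might look as follows; the file names and the session start time are placeholders:

```python
# A minimal sketch of splitting a recorded dialogue video into a timestamped
# audio file and timestamped image files. OpenCV handles the frames; the audio
# stream is extracted with the ffmpeg CLI. Paths and session_start are illustrative.
import subprocess
import cv2

def split_video(video_path: str, session_start: float, out_dir: str):
    # Extract the audio stream into a WAV file; its timestamp is the session start time.
    subprocess.run(
        ["ffmpeg", "-y", "-i", video_path, "-vn", "-acodec", "pcm_s16le",
         f"{out_dir}/audio.wav"],
        check=True,
    )

    # Extract video frames; each frame's timestamp is derived from the
    # session start time, the frame index and the frame rate.
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 25.0
    frames = []
    index = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        timestamp = session_start + index / fps
        image_path = f"{out_dir}/frame_{index:06d}.jpg"
        cv2.imwrite(image_path, frame)
        frames.append((image_path, timestamp))
        index += 1
    cap.release()
    return f"{out_dir}/audio.wav", frames
```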
Since the audio file may contain the voices of both the salesperson and the customer, the audio file needs to be further processed to extract the voice of the salesperson (i.e., the first user). For example, by collecting the salesperson's voiceprint in advance and extracting the audio information matching that voiceprint from the generated audio file, the voice file of the salesperson in the conversation scenario can be generated.
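A minimal sketch of this voiceprint filtering, assuming a speaker-embedding model is available; embed_voice() and the similarity threshold are hypothetical placeholders rather than a specific library API:

```python
# Keep only the audio chunks whose voiceprint matches the pre-enrolled salesperson.
# embed_voice() is a hypothetical speaker-embedding function; the threshold is illustrative.
import numpy as np

def extract_first_user_speech(segments, enrolled_embedding, embed_voice, threshold=0.75):
    """segments: list of (audio_samples, start_time, end_time) chunks of the dialogue.
    Returns only the chunks attributed to the salesperson (the first user)."""
    kept = []
    for samples, start, end in segments:
        emb = embed_voice(samples)                       # hypothetical speaker embedding
        sim = np.dot(emb, enrolled_embedding) / (
            np.linalg.norm(emb) * np.linalg.norm(enrolled_embedding))
        if sim >= threshold:                             # cosine-similarity gate
            kept.append((samples, start, end))
    return kept
```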
Then, the generated image files can be used as facial image files and, together with the associated voice file, uploaded directly to the word acquisition platform.
In some embodiments of the application, in order to reduce network resources occupied by uploading facial image files and improve facial image transmission efficiency, face detection and positioning can be performed on image files generated by each frame of image, image interception is performed on each image file according to a face positioning result, and only facial image areas are reserved to generate corresponding facial image files. Each generated face image file has a timestamp that is a timestamp of the video image frame from which the face image file was generated. And then, uploading the facial image file only comprising the face area to the word acquisition platform in a related manner.
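A minimal sketch of this face-cropping step, assuming OpenCV's bundled Haar cascade detector; the detector parameters are illustrative:

```python
# Crop each extracted frame down to the face region before upload,
# so that only the facial image area is transmitted.
import cv2

_face_detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def crop_face(image_path: str, out_path: str) -> bool:
    image = cv2.imread(image_path)
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    faces = _face_detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(faces) == 0:
        return False                       # no face in this frame, nothing to upload
    x, y, w, h = faces[0]                  # keep only the detected face region
    cv2.imwrite(out_path, image[y:y + h, x:x + w])
    return True
```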
And the word acquisition platform stores the received voice file of the first user and a plurality of facial image files of the second user in a cloud server.
Second, a video file containing the voice of the salesperson and the voice of the customer and the facial image of the customer may be captured by a camera having a microphone provided at the service desk or the counseling desk. Uploading the video file to the word acquisition platform, carrying out image processing on the video file by the word acquisition platform, extracting an audio stream in the video file to generate an audio file, wherein the audio file is provided with a time stamp for indicating the acquisition time of the audio stream; and extracting the video image frames to obtain image files corresponding to each frame of image, wherein each image file is provided with a time stamp for indicating the acquisition time of the frame of image. And then, the word acquisition platform generates a voice file of the salesperson (namely the first user) according to the voice processing mode. And the word acquisition platform respectively performs face detection and positioning on the image files generated by each frame of image according to the above mode, performs image interception on each image file according to a face positioning result, and only reserves a face image area to generate a corresponding face image file. Each generated face image file has a timestamp that is a timestamp of the video image frame from which the face image file was generated. And the word acquisition platform stores the generated voice file of the first user and a plurality of facial image files of the second user in a cloud server.
Third, a video file containing the voice and facial image of the salesperson and the voice and facial image of the customer may be collected by a monitoring device provided in the store. Then, the video file is processed: the audio stream in the video file is extracted and an audio file is generated, where the audio file carries a timestamp indicating the acquisition time of the audio stream; and the video image frames are extracted to obtain an image file for each frame, where each image file carries a timestamp indicating the acquisition time of that frame.
Since the audio file may contain the voices of the salesperson and the customer at the same time, the audio file needs to be further processed to extract the voice of the salesperson (i.e., the first user) from the audio file and generate the voice file of the salesperson (i.e., the first user). The specific implementation of generating the voice file of the salesperson (i.e. the first user) from the audio file is described in the foregoing description, and will not be described herein again.
Because the image file generated from each frame may include both the facial image of the customer and the facial image of the salesperson, it is further necessary to perform face detection and positioning on each image file, determine the face regions included in each image, compare the images of the face regions with the face image of the salesperson collected in advance, and treat the face regions that are not recognized as the salesperson as the face regions of the customer (i.e., the second user). Finally, facial image files of the customer are generated from the customer's face regions in each image file.
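A sketch of separating the customer's face regions from the salesperson's, assuming the face_recognition package and a face image of the salesperson collected in advance; the tolerance value is illustrative:

```python
# Keep only the face regions that do NOT match the pre-collected salesperson face,
# treating them as the customer (the second user).
import face_recognition

def customer_face_regions(frame_path: str, salesperson_encoding):
    image = face_recognition.load_image_file(frame_path)
    locations = face_recognition.face_locations(image)
    encodings = face_recognition.face_encodings(image, locations)
    regions = []
    for loc, enc in zip(locations, encodings):
        is_salesperson = face_recognition.compare_faces(
            [salesperson_encoding], enc, tolerance=0.6)[0]
        if not is_salesperson:             # not the salesperson -> treat as the customer
            top, right, bottom, left = loc
            regions.append(image[top:bottom, left:right])
    return regions
```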
And then, uploading the generated facial image file of the customer and the voice file of the salesperson to the word acquisition platform in a correlated manner, and storing the facial image file and the voice file of the salesperson to a cloud server.
Fourth, a video file containing the voice and facial image of the salesperson and the voice and facial image of the customer may be collected by a monitoring device provided in the store, and the collected video file is uploaded to the word acquisition platform. The word acquisition platform processes the video file, extracts the audio file and the image files in the manner described for the third case, then performs voiceprint recognition on the audio file to generate the voice file of the salesperson, and performs face detection, positioning and recognition on the image files to generate a plurality of facial image files of the customer.
And then, the generated facial image file of the customer and the voice file of the salesperson are stored in a cloud server in an associated mode.
In some embodiments of the present application, the voice of one of the two parties of the conversation and the facial image of the other party may also be obtained in other manners, which is not illustrated in this embodiment.
Step S2, determining facial expressions of the second user occurring at different times during the dialog by performing expression recognition on the facial image of the second user.
Further, after a plurality of facial images of the second user are acquired, the facial expression of the second user in each facial image can be determined by performing expression recognition on each facial image. The facial expression of the second user recognized from each facial image includes, but is not limited to, any one of: smile, concentration, calm, dislike, anger. For the specific implementation of expression recognition on each facial image, reference may be made to the prior art, and details are not repeated in this embodiment. The embodiment of the present application does not limit the specific implementation of expression recognition on each facial image.
Further, for each facial image, the timestamp of each facial image is used as the occurrence time of the facial expression of the second user identified from the facial image.
According to the above method, the facial expressions of the second user at different times during the conversation with the first user can be obtained.
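A minimal sketch of step S2, assuming some expression classifier classify_expression() is available; the classifier itself is a placeholder, not a specific model:

```python
# Map each of the second user's timestamped face images to an expression label,
# using the image's timestamp as the occurrence time of that expression.
def recognize_expressions(face_files, classify_expression):
    """face_files: list of (image_path, timestamp) for the second user.
    Returns a list of (expression_label, timestamp) pairs."""
    expressions = []
    for image_path, timestamp in face_files:
        label = classify_expression(image_path)   # e.g. "smile", "calm", "dislike", "angry"
        expressions.append((label, timestamp))    # file timestamp = occurrence time
    return expressions
```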
Step S3, matching each facial expression of the second user with the text obtained by the voice conversion of the first user according to the occurrence time of each facial expression and the occurrence time of the voice, and determining the text corresponding to each facial expression.
Typically, during a conversation, the facial expression of one party reflects that party's real-time emotional reaction to what the other party is saying. For example, during a conversation between a first user and a second user, the facial expressions of the second user at different times reflect whether the second user is satisfied with, or dislikes, what the first user said at that point in time. Therefore, words matching different emotion polarities of the second user can be obtained by matching the text obtained by converting the voice of the first user with the facial expressions of the second user based on time.
In some embodiments of the application, matching each facial expression of the second user with the text obtained by converting the voice of the first user according to the occurrence time of each facial expression and the occurrence time of the voice, and determining the text corresponding to each facial expression, includes: for each facial expression, taking the voice segment of the first user that occurs within a preset time range of the occurrence time of that facial expression as the voice segment matched with the facial expression; and for each facial expression, taking the text obtained by converting the voice segment matched with the facial expression as the text matched with the facial expression.
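Before the concrete example below, this time-window matching can be sketched as follows, assuming the first user's voice has already been split into timestamped, transcribed segments; the half-window of 5 seconds mirrors the example:

```python
# Step S3 sketch: for each expression occurrence time, collect the first user's
# transcribed speech that overlaps a window around that time.
def match_expressions_to_speech(expressions, speech_segments, half_window=5.0):
    """expressions: list of (label, time); speech_segments: list of (text, start, end).
    Returns a list of (label, matched_text) pairs."""
    matches = []
    for label, t in expressions:
        window_start, window_end = t - half_window, t + half_window
        text = " ".join(
            seg_text for seg_text, start, end in speech_segments
            if start < window_end and end > window_start   # segment overlaps the window
        )
        if text:
            matches.append((label, text))
    return matches
```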
For example, during a conversation between a first user and a second user starting at time T, the acquired voice file of the first user is denoted voice.wav, and the acquired facial images of the second user are denoted picture{{p1, t1}, …, {pN, tN}}. Here voice.wav carries an acquisition time, pN denotes the N-th acquired facial image of the second user, tN denotes the acquisition time (i.e., the timestamp) of the facial image pN, and N is a natural number greater than 1. After expression recognition is performed on each of the facial images picture{{p1, t1}, …, {pN, tN}} of the second user, the facial expression of the second user in each facial image can be obtained. Taking 5 acquired facial images of the second user as an example, the expression recognition results are respectively: the facial expression of the second user in facial image p1 is "smile", in facial image p2 is "calm", in facial image p3 is "calm", in facial image p4 is "dislike", and in facial image p5 is "angry".
In some embodiments of the present application, the voice segment of the first user occurring within a preset time range (e.g., 10 seconds) around the occurrence time of each facial expression of the second user (i.e., the acquisition time of the facial image corresponding to that expression) may be used as the voice segment matching that facial expression. For example, the voice segment of the audio stream with timestamps within the time range (t1-5, t1+5) in the first user's voice file voice.wav is taken as the voice segment matching the second user's "smile" facial expression; the voice segment within the time range (t2-5, t2+5) is taken as the first voice segment matching the second user's "calm" facial expression; the voice segment within the time range (t3-5, t3+5) is taken as the second voice segment matching the second user's "calm" facial expression; the voice segment within the time range (t4-5, t4+5) is taken as the voice segment matching the second user's "dislike" facial expression; and the voice segment within the time range (t5-5, t5+5) is taken as the voice segment matching the second user's "angry" facial expression.
According to this method, the voice segment of the first user corresponding to each facial expression of the second user at different time points can be determined. The same facial expression may correspond to different voice segments, meaning that different utterances of the first user may elicit the same expression from the second user.
In some embodiments of the present application, the preset time range is set according to specific needs.
Step S4, determining words matching the preset emotion polarity of the second user according to the text corresponding to each facial expression of the second user.
In some embodiments of the present application, the preset emotion polarity includes: positive emotion and negative emotion. A positive emotion is the emotion polarity reflected by facial expressions of the user such as smiling, concentration, and calmness; a negative emotion is the emotion polarity reflected by facial expressions such as "dislike" and "anger".
In some embodiments of the application, determining, according to the text corresponding to each of the facial expressions of the second user, a word matching a preset emotion polarity of the second user includes: determining the emotion polarity matched with each facial expression according to the corresponding relation between the facial expressions and the emotion polarities; taking the emotion polarity matched with each facial expression as the emotion polarity matched with the text matched with the facial expression; and determining words matched with the emotional polarities of the second user according to the occurrence frequency of different words in the text matched with the same emotional polarity.
In some embodiments of the present application, the correspondence between facial expressions and emotion polarities may be established in advance according to expert knowledge. For example, the facial expressions of smiling, concentration, and calmness are defined as matching the emotion polarity of positive emotion, and the expressions of dislike and anger are defined as matching the emotion polarity of negative emotion. Then, according to this correspondence and the facial expressions of the second user at different time points recognized above, the emotion polarity of the second user toward the first user's words at different time points can be determined: the first user's utterance causes a positive emotion in the second user at time t1, a positive emotion at time t2, a positive emotion at time t3, a negative emotion at time t4, and a negative emotion at time t5.
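A sketch of this correspondence and of the polarity assignment in step S4; the label sets are taken from the examples in this document:

```python
# Map each expression label to a preset emotion polarity, then carry that
# polarity over to the text matched with the expression in step S3.
EXPRESSION_TO_POLARITY = {
    "smile": "positive", "concentration": "positive", "calm": "positive",
    "dislike": "negative", "angry": "negative",
}

def polarity_of_matches(matches):
    """matches: list of (expression_label, text) pairs from step S3.
    Returns a list of (polarity, text) pairs for step S4."""
    return [(EXPRESSION_TO_POLARITY[label], text)
            for label, text in matches if label in EXPRESSION_TO_POLARITY]
```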
After the correspondence between the voice segments of the first user and the facial expressions of the second user is determined, text conversion is performed on each voice segment to determine the utterance text of the first user in each voice segment, and thus the utterance text of the first user matched with the facial expressions of the second user at different time points. For the specific implementation of converting a voice segment into text, reference may be made to the prior art, and details are not repeated in the embodiments of the present application; the embodiment of the present application does not limit this specific implementation. Then, the text obtained by converting each voice segment is taken as the text matched with the facial expression of the second user that the voice segment matches. For example, for the facial expression "smile" of the second user at time t1, the corresponding utterance text of the first user during the (t1-5, t1+5) time period is "hello, very happy to serve you"; for the facial expression "angry" of the second user at time t5, the corresponding utterance text of the first user during the (t5-5, t5+5) time period is "you must do it as soon as possible, otherwise it will be too late".
As described above, the emotion polarities matched with the facial expressions of the second user at the five time points t1 to t5 have been determined according to the correspondence between facial expressions and emotion polarities, and the utterance text of the first user matched with those facial expressions during the conversation has been determined; it can therefore be further determined which emotion polarity of the second user the first user's utterance text at each time point matches. For example, at time t1, the first user's utterance text "hello, very happy to serve you" matches the positive emotion of the second user; at time t5, the first user's utterance text "you must do it as soon as possible, otherwise it will be too late" matches the negative emotion of the second user.
Finally, the words included in the portions of the first user's utterance text in the current conversation that match the positive emotion of the second user are added to a candidate word set for the positive emotion polarity; and the words included in the portions that match the negative emotion of the second user are added to a candidate word set for the negative emotion polarity.
Thus, from one conversation between the first user and the second user, a group of words matching the positive emotion polarity and a group of words matching the negative emotion polarity are determined. By acquiring multiple conversations between the first user and the second user, multiple groups of words matching the positive emotion and multiple groups of words matching the negative emotion are obtained.
Further, by analyzing the groups of words matched with different emotion polarities collected over the multiple conversations between the first user and the second user, the words whose occurrence frequency meets a preset condition (for example, the 5 most frequent words) among the groups of words matching the positive emotion polarity are determined as the words matching the positive emotion of the second user; and the words whose occurrence frequency meets the preset condition (for example, the 5 most frequent words) among the groups of words matching the negative emotion polarity are determined as the words matching the negative emotion of the second user.
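A sketch of selecting the words whose occurrence frequency meets the preset condition (here, the top 5 per polarity) over multiple conversations; tokenize() is a placeholder for any word segmentation routine:

```python
# Count word frequency per polarity across all collected (polarity, text) pairs
# and keep the most frequent words for each polarity.
from collections import Counter

def collect_polarity_words(polarity_texts, tokenize, top_n=5):
    """polarity_texts: list of (polarity, text) pairs gathered over multiple conversations.
    Returns {"positive": [...], "negative": [...]} with the top_n most frequent words."""
    counters = {"positive": Counter(), "negative": Counter()}
    for polarity, text in polarity_texts:
        counters[polarity].update(tokenize(text))
    return {polarity: [word for word, _ in counter.most_common(top_n)]
            for polarity, counter in counters.items()}
```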
At this point, the collection of words matching the different emotion polarities of the second user is complete.
The word acquisition method for matching emotion polarities, disclosed by the embodiment of the application, comprises the steps of acquiring a voice of a first user and a facial image of a second user in a conversation process of the first user and the second user; determining each facial expression of the second user at different time in the conversation process by performing expression recognition on the facial image of the second user; matching each facial expression of the second user with a text obtained by voice conversion of the first user according to the occurrence time of each facial expression and the occurrence time of the voice, and determining the text corresponding to each facial expression; and determining words matched with the preset emotion polarity of the second user according to the text corresponding to each facial expression of the second user, so that the collection efficiency of words based on emotion polarity can be improved. According to the word acquisition method matched with the emotion polarity, the facial expression of one party and the voice of the other party in the conversation process of two users are automatically acquired, and words enabling the users to generate positive emotions and negative emotions can be accurately determined based on the words of the other party when the facial expression of the users occurs.
The word acquisition method for matching emotion polarity disclosed in the embodiment of the application can be applied in many fields, for example the field of chat robots. First, the facial expressions of the real person conversing with the robot and the voice of the robot are collected, and the word acquisition platform recognizes and matches the robot's voice with the person's facial expressions based on acquisition time, thereby determining the words that cause different emotion polarities in the real person. After determining words matching a preset emotion polarity of the second user according to the text corresponding to each facial expression of the second user, the method further includes: establishing a positive emotion word bank of the second user according to the words matching the positive emotion polarity of the second user; and/or establishing a negative emotion word bank of the second user according to the words matching the negative emotion polarity of the second user. For example, after the words matching the positive emotion and the negative emotion of the real person are determined from multiple conversations between the robot and the real person, a positive emotion word bank is established from the words matching the positive emotion, a negative emotion word bank is established from the words matching the negative emotion, and both word banks are updated into the robot's preset word bank, so as to optimize the content of the robot's conversations with the real person and improve the person's chat experience.
For another example, in a customer service scenario, by collecting the facial images of a customer and the voices of customer service staff during the customer's conversations with multiple customer service staff, and performing the above steps S1 to S4 on the voice and facial images collected in each conversation, the positive emotion word bank and the negative emotion word bank of the customer can be determined, so that customer service staff can refer to these word banks when choosing what to say to the customer, improving the quality of service for the customer.
Example two
In other embodiments of the present application, the word collection method for matching emotion polarities described in the first embodiment may also be applied to establishment of a corpus lexicon in a preset session scene. For example, the dialog process of the first user and the second user is a dialog process in a preset dialog scenario, and as shown in fig. 2, after the step of determining, according to the text corresponding to each facial expression of the second user, a word matching a preset emotion polarity of the second user, the method further includes: step S5 and step S6.
Step S5, reselecting the first user and the second user, and repeatedly executing steps S1 to S4 until a word set output condition is satisfied;
step S6, outputting a word set matched with the preset emotion polarity in the conversation scene according to the words matched with the preset emotion polarity of all the selected second users.
The word set output condition includes, but is not limited to, any one of the following: steps S1 to S4 have been repeated a predetermined number of times (e.g., 10,000 times); the number of selected second users reaches a predetermined value (e.g., 1,000 persons); or the amount of acquired voice of the first user reaches a predetermined value (e.g., 10,000 recordings).
Taking as an example the establishment of a salesperson corpus for the conversation scenario between salespeople and customers, 1,000 customers (i.e., second users) and salespeople can be selected for voice and facial image acquisition, and the words matching the positive emotion and the words matching the negative emotion of each of the 1,000 customers are determined according to the method described in steps S1 to S4 of the first embodiment. Then, a word set matching the positive emotion is constructed from the words matching the positive emotion of the 1,000 customers, and a word set matching the negative emotion is constructed from the words matching the negative emotion of the 1,000 customers. The constructed word set matching the positive emotion can be used as the positive-emotion word set for the conversation scenario between salespeople and customers; the constructed word set matching the negative emotion can be used as the negative-emotion word set for that scenario. Finally, the word set matching the positive emotion can be output for constructing a preferred corpus for salespeople, and the word set matching the negative emotion can be output for constructing an evasive corpus for salespeople.
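A sketch of this embodiment's aggregation loop, assuming run_steps_s1_to_s4() wraps the per-conversation processing described in the first embodiment and returns that customer's positive and negative word lists; the stop condition of 1,000 customers mirrors the example above:

```python
# Repeat steps S1 to S4 for newly selected user pairs until the word set output
# condition is met, then aggregate the per-customer words into scene-level word sets.
def build_scene_word_sets(select_next_pair, run_steps_s1_to_s4, max_customers=1000):
    positive_set, negative_set = set(), set()
    for _ in range(max_customers):                 # word set output condition
        first_user, second_user = select_next_pair()
        words = run_steps_s1_to_s4(first_user, second_user)
        positive_set.update(words["positive"])     # preferred corpus candidates
        negative_set.update(words["negative"])     # evasive corpus candidates
    return positive_set, negative_set
```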
The embodiment of the application discloses a word acquisition method for matching emotion polarity, which collects the voice of the first user and the facial image of the second user over multiple conversations between multiple second users and at least one first user in a preset conversation scenario, and performs the following data processing on the voice and facial images collected for each conversation: determining each facial expression of the second user at different times in the conversation by performing expression recognition on the facial image of the second user; matching each facial expression of the second user with the text obtained by converting the voice of the first user according to the occurrence time of each facial expression and the occurrence time of the voice, and determining the text corresponding to each facial expression; and determining the words matching the preset emotion polarity of the second user according to the text corresponding to each facial expression of the second user. Finally, a word set matching the preset emotion polarity in the conversation scenario is output according to the words matching the preset emotion polarity of all the second users, which helps automatically establish corpora matching different emotion polarities for the conversation scenario. For example, when a training corpus for salespeople is established, words matching different emotion polarities of different users do not need to be counted manually, which improves the efficiency of constructing the corpus.
Example three
The embodiment of the application discloses a word acquisition device for matching emotion polarity. As shown in fig. 3, the device includes:
a voice and facial image acquiring module 310, configured to acquire a voice of a first user and a facial image of a second user during a conversation between the first user and the second user;
a facial expression determining module 320, configured to determine facial expressions of the second user at different times during the conversation by performing expression recognition on a facial image of the second user;
a facial expression and voice matching module 330, configured to match, according to occurrence time of each facial expression and occurrence time of the voice, each facial expression of the second user and a text obtained by voice conversion of the first user, and determine the text corresponding to each facial expression;
and a word determining module 340 for matching emotion polarities, configured to determine, according to the text corresponding to each facial expression of the second user, a word matching a preset emotion polarity of the second user.
In some embodiments of the present application, the facial expression and speech matching module 330 is further configured to:
for each facial expression, taking the voice segment of the first user in the voice, which occurs within a preset time range of the occurrence time of the facial expression, as the voice segment matched with the facial expression;
and for each facial expression, converting the text obtained by converting the voice segment matched with the facial expression to be used as the text matched with the facial expression.
In some embodiments of the present application, the word determining module 340 for matching emotion polarity is further configured to:
determining the emotion polarity matched with each facial expression according to the corresponding relation between the facial expressions and the emotion polarities;
taking the emotion polarity matched with each facial expression as the emotion polarity matched with the text matched with the facial expression;
and determining words matched with the emotional polarities of the second user according to the occurrence frequency of different words in the text matched with the same emotional polarity.
In some embodiments of the present application, the preset emotion polarity includes positive emotion and negative emotion. As shown in fig. 4, the apparatus further includes:
the user positive emotion word bank establishing module 350 is configured to establish a positive emotion word bank of the second user according to a word matching the positive emotion of the second user with the emotion polarity; and/or the presence of a gas in the gas,
and the user negative emotion word bank establishing module 360 is configured to establish a negative emotion word bank of the second user according to the word matching the emotion polarity of the negative emotion of the second user.
By automatically acquiring the facial expression of one party and the voice of the other party in the conversation process of the two users and based on the words of the other party when the facial expression of the user occurs, the words which enable the user to generate positive emotion and negative emotion can be accurately determined.
In some embodiments of the present application, the dialog process of the first user and the second user is a dialog process in a preset session scenario, and as shown in fig. 5, the apparatus further includes:
a multi-dialog word acquisition module 370 for reselecting the first user and the second user and recalling the voice and facial image acquisition module 310, the facial expression determination module 320, the facial expression and voice matching module 330, and the word determination module 340 matching emotion polarities until a word set output condition is satisfied;
and a scene word set output module 380, configured to output a word set matched with the preset emotion polarity in the conversation scene according to the words matched with the preset emotion polarity of all the selected second users.
The word acquisition device for matching emotion polarities disclosed in the embodiment of the present application is used to implement the word acquisition method for matching emotion polarities described in the first embodiment or the second embodiment of the present application, and specific implementation modes of modules of the device are not described again, and reference may be made to specific implementation modes of corresponding steps in the method embodiment.
The word acquisition device matched with the emotion polarity, disclosed by the embodiment of the application, acquires the voice of a first user and the facial image of a second user in the conversation process of the first user and the second user; determining each facial expression of the second user at different time in the conversation process by performing expression recognition on the facial image of the second user; matching each facial expression of the second user with a text obtained by voice conversion of the first user according to the occurrence time of each facial expression and the occurrence time of the voice, and determining the text corresponding to each facial expression; and determining words matched with the preset emotion polarity of the second user according to the text corresponding to each facial expression of the second user, so that the collection efficiency of words based on emotion polarity can be improved. The word acquisition device matched with the emotion polarity, disclosed by the embodiment of the application, can accurately determine the words which enable the user to generate positive emotions and negative emotions by automatically acquiring the facial expressions of one party and the voice of the other party in the conversation process of two users and based on the words of the other party when the facial expressions of the users occur.
The embodiment of the application discloses a word acquisition device for matching emotion polarity, which collects the voice of the first user and the facial image of the second user over multiple conversations between multiple second users and at least one first user in a preset conversation scenario, and performs the following data processing on the voice and facial images collected for each conversation: determining each facial expression of the second user at different times in the conversation by performing expression recognition on the facial image of the second user; matching each facial expression of the second user with the text obtained by converting the voice of the first user according to the occurrence time of each facial expression and the occurrence time of the voice, and determining the text corresponding to each facial expression; and determining the words matching the preset emotion polarity of the second user according to the text corresponding to each facial expression of the second user. Finally, a word set matching the preset emotion polarity in the conversation scenario is output according to the words matching the preset emotion polarity of all the second users, which helps automatically establish corpora matching different emotion polarities for the conversation scenario. For example, when a training corpus for salespeople is established, words matching different emotion polarities of different users do not need to be counted manually, which improves the efficiency of constructing the corpus.
The embodiments in the present specification are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other. For the device embodiment, since it is basically similar to the method embodiment, the description is simple, and for the relevant points, refer to the partial description of the method embodiment.
The word acquisition method and device for matching emotion polarities provided by the application are introduced in detail, specific examples are applied in the text to explain the principle and the implementation mode of the application, and the description of the embodiments is only used for helping to understand the method and a core idea of the application; meanwhile, for a person skilled in the art, according to the idea of the present application, there may be variations in the specific embodiments and the application scope, and in summary, the content of the present specification should not be construed as a limitation to the present application.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
The various component embodiments of the present application may be implemented in hardware, or in software modules running on one or more processors, or in a combination thereof. Those skilled in the art will appreciate that a microprocessor or Digital Signal Processor (DSP) may be used in practice to implement some or all of the functionality of some or all of the components in an electronic device according to embodiments of the present application. The present application may also be embodied as apparatus or device programs (e.g., computer programs and computer program products) for performing a portion or all of the methods described herein. Such programs implementing the present application may be stored on a computer readable medium or may be in the form of one or more signals. Such a signal may be downloaded from an internet website or provided on a carrier signal or in any other form.
For example, fig. 6 illustrates an electronic device that may implement a method according to the present application. The electronic device may be a PC, a mobile terminal, a personal digital assistant, a tablet computer, or the like. The electronic device conventionally comprises a processor 610, a memory 620, and program code 630 stored on the memory 620 and executable on the processor 610; the processor 610 implements the method described in the above embodiments when executing the program code 630. The memory 620 may be a computer program product or a computer readable medium, for example an electronic memory such as a flash memory, an EEPROM (electrically erasable programmable read-only memory), an EPROM, a hard disk, or a ROM. The memory 620 has a storage space 6201 for the program code 630 of a computer program for performing any of the method steps described above. For example, the storage space 6201 may include respective computer programs for implementing the various steps of the above method. The program code 630 is computer readable code. The computer programs may be read from or written to one or more computer program products, which comprise a program code carrier such as a hard disk, a compact disc (CD), a memory card, or a floppy disk. The computer program comprises computer readable code which, when run on an electronic device, causes the electronic device to perform the method according to the above embodiments.
The embodiment of the present application further discloses a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the steps of the word collection method for matching emotion polarities as described in the first embodiment or the second embodiment of the present application.
Such a computer program product may be a computer-readable storage medium, which may have memory segments, memory spaces, etc. arranged similarly to the memory 620 in the electronic device shown in fig. 6. The program code may be stored in the computer readable storage medium, for example, compressed in a suitable form. The computer readable storage medium is typically a portable or fixed storage unit as described with reference to fig. 7. Typically, the storage unit comprises computer readable code 630′, which is code to be read by a processor and which, when executed by the processor, implements the steps of the method described above.
Reference herein to "one embodiment," "an embodiment," or "one or more embodiments" means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the present application. Moreover, it should be noted that instances of the phrase "in one embodiment" herein do not necessarily all refer to the same embodiment.
In the description provided herein, numerous specific details are set forth. However, it is understood that embodiments of the application may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.
In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The application may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In the unit claims enumerating several means, several of these means may be embodied by one and the same item of hardware. The use of the words first, second, third, and so on does not indicate any ordering; these words may be interpreted as names.
Finally, it should be noted that: the above embodiments are only used to illustrate the technical solutions of the present application, and not to limit the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions in the embodiments of the present application.

Claims (12)

1. A word collection method for matching emotion polarities is characterized by comprising the following steps:
step S1, acquiring the voice of a first user and the face image of a second user in the process of the dialogue between the first user and the second user;
step S2, determining each facial expression of the second user occurring at different time in the conversation process by performing expression recognition on the facial image of the second user;
step S3, matching each facial expression of the second user with a text obtained by converting the voice of the first user according to the occurrence time of each facial expression and the occurrence time of the voice, and determining the text corresponding to each facial expression;
step S4, determining words matching the preset emotion polarity of the second user according to the text corresponding to each facial expression of the second user.
2. The method of claim 1, wherein the step of matching the respective facial expressions of the second user with the text converted from the speech of the first user according to the occurrence time of the respective facial expressions and the occurrence time of the speech to determine the text corresponding to each of the facial expressions comprises:
for each facial expression, taking a voice segment in the voice of the first user that occurs within a preset time range of the occurrence time of the facial expression as the voice segment matched with the facial expression;
and for each facial expression, taking the text obtained by converting the voice segment matched with the facial expression as the text matched with the facial expression.
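By way of illustration only, a minimal Python sketch of the time-window matching described in claim 2, assuming each recognised facial expression and each transcribed voice segment carries a timestamp in seconds. The function name match_expressions_to_text, the data shapes, and the 5-second window are hypothetical; the claim itself only requires a preset time range.

from typing import Dict, List

def match_expressions_to_text(expressions: List[Dict],
                              segments: List[Dict],
                              window: float = 5.0) -> List[Dict]:
    # For each facial expression, collect the text of voice segments whose
    # start time falls within `window` seconds before the expression occurs.
    matched = []
    for exp in expressions:
        texts = [seg["text"] for seg in segments
                 if exp["time"] - window <= seg["start"] <= exp["time"]]
        matched.append({"expression": exp["label"], "text": " ".join(texts)})
    return matched

if __name__ == "__main__":
    expressions = [{"label": "smile", "time": 12.0}]
    segments = [{"start": 9.5, "text": "we can offer a discount today"},
                {"start": 30.0, "text": "please sign here"}]
    print(match_expressions_to_text(expressions, segments))
    # [{'expression': 'smile', 'text': 'we can offer a discount today'}]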
3. The method of claim 1, wherein the step of determining words matching a preset emotion polarity of the second user from the text corresponding to each of the facial expressions of the second user comprises:
determining the emotion polarity matched with each facial expression according to the corresponding relation between the facial expressions and the emotion polarities;
taking the emotion polarity matched with each facial expression as the emotion polarity matched with the text matched with the facial expression;
and determining words matched with the emotional polarities of the second user according to the occurrence frequency of different words in the text matched with the same emotional polarity.
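By way of illustration only, a minimal Python sketch of claim 3, assuming a fixed correspondence table from facial expressions to emotion polarities and simple whitespace tokenisation. The mapping, the top_k threshold, and the tokeniser are assumptions made for this sketch, not requirements of the claim.

from collections import Counter
from typing import Dict, List

EXPRESSION_TO_POLARITY = {"smile": "positive", "frown": "negative"}  # assumed mapping

def words_per_polarity(matched: List[Dict], top_k: int = 10) -> Dict[str, List[str]]:
    # Group the matched texts by the polarity of their facial expression and
    # keep the most frequent words within each polarity.
    counters: Dict[str, Counter] = {}
    for item in matched:
        polarity = EXPRESSION_TO_POLARITY.get(item["expression"])
        if polarity is None:
            continue
        counters.setdefault(polarity, Counter()).update(item["text"].lower().split())
    return {polarity: [word for word, _ in counter.most_common(top_k)]
            for polarity, counter in counters.items()}

if __name__ == "__main__":
    matched = [{"expression": "smile", "text": "free delivery free upgrade"},
               {"expression": "frown", "text": "extra surcharge applies"}]
    print(words_per_polarity(matched))
    # {'positive': ['free', 'delivery', 'upgrade'], 'negative': ['extra', 'surcharge', 'applies']}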
4. The method of claim 1, wherein the preset emotion polarity comprises: positive emotion and negative emotion, and after the step of determining words matching a preset emotion polarity of the second user according to the text corresponding to each facial expression of the second user, the method further includes:
establishing a positive emotion word bank of the second user according to the words matched with the emotion polarity of the positive emotion of the second user; and/or,
and establishing a negative emotion word bank of the second user according to the words matched with the emotion polarity of the negative emotion of the second user.
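By way of illustration only, a minimal Python sketch of claim 4: splitting one second user's polarity-matched words into a positive emotion word bank and/or a negative emotion word bank. The plain-set storage format and the function name build_word_banks are assumptions made for this sketch.

from typing import Dict, List, Set

def build_word_banks(polarity_words: Dict[str, List[str]]) -> Dict[str, Set[str]]:
    # Create a positive and/or negative emotion word bank for one second user;
    # a polarity with no words simply yields no bank.
    banks: Dict[str, Set[str]] = {}
    for polarity in ("positive", "negative"):
        words = polarity_words.get(polarity)
        if words:
            banks[polarity] = set(words)
    return banks

if __name__ == "__main__":
    print(build_word_banks({"positive": ["discount", "free"], "negative": []}))
    # e.g. {'positive': {'discount', 'free'}}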
5. The method of claim 1, wherein the dialogue process between the first user and the second user is a dialogue process in a preset dialogue scenario, and wherein after the step of determining words matched with the preset emotion polarity of the second user according to the text corresponding to each of the facial expressions of the second user, the method further comprises:
reselecting the first user and the second user, and repeatedly performing steps S1-S4 until a word set output condition is satisfied;
and outputting a word set matched with the preset emotion polarity in the conversation scene according to the words matched with the preset emotion polarity of all the selected second users.
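By way of illustration only, a minimal Python sketch of the repetition in claim 5: user pairs are reselected and steps S1-S4 are repeated until an assumed word set output condition (here, a minimum number of processed conversations) is satisfied, after which the aggregated words can be output as the scene word set. select_users and collect_polarity_words are hypothetical placeholders for the earlier steps.

from collections import Counter
from typing import Callable, Dict, List

def run_collection(select_users: Callable[[], Dict],
                   collect_polarity_words: Callable[[Dict], List[str]],
                   min_dialogues: int = 50) -> Counter:
    # Repeat the per-conversation steps over reselected user pairs until the
    # assumed output condition (a minimum number of conversations) is met.
    counter: Counter = Counter()
    processed = 0
    while processed < min_dialogues:
        users = select_users()                         # reselect first / second user
        counter.update(collect_polarity_words(users))  # steps S1-S4 for this pair
        processed += 1
    return counter

if __name__ == "__main__":
    import random
    pairs = [{"first": "agent_a", "second": f"customer_{i}"} for i in range(20)]
    words = run_collection(
        select_users=lambda: random.choice(pairs),
        collect_polarity_words=lambda u: ["discount"] if random.random() < 0.5 else ["delay"],
        min_dialogues=10,
    )
    print(words.most_common())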
6. A word collection apparatus for matching emotion polarity, characterized by comprising:
the voice and facial image acquisition module is used for acquiring the voice of a first user and the facial image of a second user in the conversation process of the first user and the second user;
the facial expression determining module is used for determining each facial expression of the second user at different time in the conversation process by performing expression recognition on the facial image of the second user;
the facial expression and voice matching module is used for matching each facial expression of the second user with a text obtained by voice conversion of the first user according to the occurrence time of each facial expression and the occurrence time of the voice, and determining the text corresponding to each facial expression;
and the word determining module is used for determining words matched with the preset emotion polarity of the second user according to the text corresponding to each facial expression of the second user.
7. The apparatus of claim 6, wherein the facial expression and speech matching module is further configured to:
for each facial expression, taking the voice segment of the first user in the voice, which occurs within a preset time range of the occurrence time of the facial expression, as the voice segment matched with the facial expression;
and for each facial expression, taking the text obtained by converting the voice segment matched with the facial expression as the text matched with the facial expression.
8. The apparatus of claim 6, wherein the word determination module that matches emotion polarities is further configured to:
determining the emotion polarity matched with each facial expression according to the corresponding relation between the facial expressions and the emotion polarities;
taking the emotion polarity matched with each facial expression as the emotion polarity matched with the text matched with the facial expression;
and determining words matched with the emotional polarities of the second user according to the occurrence frequency of different words in the text matched with the same emotional polarity.
9. The apparatus of claim 6, wherein the preset emotion polarity comprises positive emotion and negative emotion, and the apparatus further comprises:
the user positive emotion word bank establishing module is used for establishing a positive emotion word bank of the second user according to the words matched with the emotion polarity of the positive emotion of the second user; and/or,
and the user negative emotion word bank establishing module is used for establishing the negative emotion word bank of the second user according to the words matched with the emotion polarity of the negative emotion of the second user.
10. The apparatus of claim 6, wherein the dialogue process between the first user and the second user is a dialogue process in a preset conversation scenario, the apparatus further comprising:
a multi-conversation word acquisition module for reselecting the first user and the second user, and repeatedly calling the voice and facial image acquisition module, the facial expression determination module, the facial expression and voice matching module, and the word determination module matching emotion polarities until a word set output condition is satisfied;
and the scene word set output module is used for outputting the word set matched with the preset emotion polarity in the conversation scene according to the words matched with the preset emotion polarity of all the selected second users.
11. An electronic device, comprising a memory, a processor, and program code stored on the memory and executable on the processor, wherein the processor implements the word collection method for matching emotion polarities of any one of claims 1 to 5 when executing the program code.
12. A computer-readable storage medium having program code stored thereon, wherein the program code, when executed by a processor, implements the steps of the word collection method for matching emotion polarity according to any one of claims 1 to 5.
CN201911419689.0A 2019-12-31 2019-12-31 Word acquisition method and device matched with emotion polarity and electronic equipment Active CN111210818B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201911419689.0A CN111210818B (en) 2019-12-31 2019-12-31 Word acquisition method and device matched with emotion polarity and electronic equipment
PCT/CN2020/100549 WO2021135140A1 (en) 2019-12-31 2020-07-07 Word collection method matching emotion polarity

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911419689.0A CN111210818B (en) 2019-12-31 2019-12-31 Word acquisition method and device matched with emotion polarity and electronic equipment

Publications (2)

Publication Number Publication Date
CN111210818A CN111210818A (en) 2020-05-29
CN111210818B true CN111210818B (en) 2021-10-01

Family

ID=70786549

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911419689.0A Active CN111210818B (en) 2019-12-31 2019-12-31 Word acquisition method and device matched with emotion polarity and electronic equipment

Country Status (2)

Country Link
CN (1) CN111210818B (en)
WO (1) WO2021135140A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111210818B (en) * 2019-12-31 2021-10-01 北京三快在线科技有限公司 Word acquisition method and device matched with emotion polarity and electronic equipment
CN112200051B (en) * 2020-09-30 2023-09-29 重庆天智慧启科技有限公司 Case field inspection system and method

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101558553B1 (en) * 2009-02-18 2015-10-08 삼성전자 주식회사 Facial gesture cloning apparatus
CN105976809B (en) * 2016-05-25 2019-12-17 中国地质大学(武汉) Identification method and system based on speech and facial expression bimodal emotion fusion
CN106373569B (en) * 2016-09-06 2019-12-20 北京地平线机器人技术研发有限公司 Voice interaction device and method
KR102034255B1 (en) * 2017-06-29 2019-10-18 네이버 주식회사 Method and system for human-machine emotional communication
CN110674270B (en) * 2017-08-28 2022-01-28 大国创新智能科技(东莞)有限公司 Humorous generation and emotion interaction method based on artificial intelligence and robot system
CN108334583B (en) * 2018-01-26 2021-07-09 上海智臻智能网络科技股份有限公司 Emotion interaction method and device, computer readable storage medium and computer equipment
CN109360130A (en) * 2018-10-29 2019-02-19 四川文轩教育科技有限公司 A kind of student's mood monitoring method based on artificial intelligence
CN110362833A (en) * 2019-07-22 2019-10-22 腾讯科技(深圳)有限公司 A kind of text based sentiment analysis method and relevant apparatus
CN111210818B (en) * 2019-12-31 2021-10-01 北京三快在线科技有限公司 Word acquisition method and device matched with emotion polarity and electronic equipment

Also Published As

Publication number Publication date
CN111210818A (en) 2020-05-29
WO2021135140A1 (en) 2021-07-08

Similar Documents

Publication Publication Date Title
US8676586B2 (en) Method and apparatus for interaction or discourse analytics
JP6462651B2 (en) Speech translation apparatus, speech translation method and program
JP5602653B2 (en) Information processing apparatus, information processing method, information processing system, and program
CN108920640B (en) Context obtaining method and device based on voice interaction
CN112686048B (en) Emotion recognition method and device based on fusion of voice, semantics and facial expressions
CN110633912A (en) Method and system for monitoring service quality of service personnel
CN112148922A (en) Conference recording method, conference recording device, data processing device and readable storage medium
CN111210818B (en) Word acquisition method and device matched with emotion polarity and electronic equipment
CN111597818B (en) Call quality inspection method, device, computer equipment and computer readable storage medium
US11238869B2 (en) System and method for reconstructing metadata from audio outputs
CN111429157A (en) Method, device and equipment for evaluating and processing complaint work order and storage medium
CN104135638A (en) Optimized video snapshot
CN111223487A (en) Information processing method and electronic equipment
CN114138960A (en) User intention identification method, device, equipment and medium
JP5803617B2 (en) Speech information analysis apparatus and speech information analysis program
CN110580899A (en) Voice recognition method and device, storage medium and computing equipment
EP4020468A1 (en) System, method and apparatus for conversational guidance
CN110765242A (en) Method, device and system for providing customer service information
CN113409774A (en) Voice recognition method and device and electronic equipment
CN113312928A (en) Text translation method and device, electronic equipment and storage medium
CN112632241A (en) Method, device, equipment and computer readable medium for intelligent conversation
CN114449297A (en) Multimedia information processing method, computing equipment and storage medium
CN111582708A (en) Medical information detection method, system, electronic device and computer-readable storage medium
CN112714220B (en) Business processing method and device, computing equipment and computer readable storage medium
US11799679B2 (en) Systems and methods for creation and application of interaction analytics

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant