CN115171673A - Character portrait-based communication assistance method and device, and storage medium - Google Patents

Character portrait-based communication assistance method and device, and storage medium

Info

Publication number
CN115171673A
Authority
CN
China
Prior art keywords
communication
data
character
objects
portrait
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210554278.8A
Other languages
Chinese (zh)
Inventor
林皓
高曦
杨华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing VRV Software Corp Ltd
Original Assignee
Beijing VRV Software Corp Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing VRV Software Corp Ltd filed Critical Beijing VRV Software Corp Ltd
Priority to CN202210554278.8A priority Critical patent/CN115171673A/en
Publication of CN115171673A publication Critical patent/CN115171673A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/174 Facial expression recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20 Movements or behaviour, e.g. gesture recognition
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/02 Feature extraction for speech recognition; Selection of recognition unit
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/06 Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/063 Training
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G10L15/16 Speech classification or search using artificial neural networks
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/26 Speech to text systems
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/06 Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/063 Training
    • G10L2015/0631 Creating reference templates; Clustering
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/226 Procedures used during a speech recognition process, e.g. man-machine dialogue using non-speech characteristics
    • G10L2015/227 Procedures used during a speech recognition process, e.g. man-machine dialogue using non-speech characteristics of the speaker; Human-factor methodology

Abstract

The invention provides a communication assistance method, device, and storage medium based on character portraits. Video data and audio data of a plurality of communication objects are collected, and whether communication assistance needs to be started is judged from the video data and/or the audio data. When it is needed, the role relationships among the communication objects are determined through the video data and/or the audio data, and the character portrait corresponding to each communication object is then acquired according to those role relationships. Based on the character portraits, the communication behavior data of the communication objects are interpreted separately to obtain a first interpretation result, which is processed and output in a form matching the current communication scene. In this way, assistance measures can be started in time when communication is obstructed, accurate interpretation of communication behavior is provided through the character portraits, and multiple output forms are available for the interpretation result, matching the current communication scene to the greatest extent and bringing the user an attentive experience.

Description

Character portrait-based communication assistance method and device, and storage medium
Technical Field
The invention relates to the technical field of artificial intelligence, and in particular to a character portrait-based communication assistance method and device, and a storage medium.
Background
In social communication, people usually transmit information through voice. However, in some cases voice communication between communication objects can be obstructed; in particular, when hearing-impaired or speech-impaired people are involved, communication may be inconvenient. With the development of computer technology, especially speech recognition and computer vision, people have tried to solve these problems with computers.
However, when a computer recognizes voice to assist communication, errors inherent in speech recognition, together with inaccuracies in the user's pronunciation and word choice during spoken communication, can leave a large gap between the recognition result and the user's true expressive intention, sometimes even yielding the opposite of what was meant; as a result, no mature communication assistance system exists at present.
Disclosure of Invention
The invention provides a communication assistance method, device, and storage medium based on character portraits. The scheme can start assistance measures in time when communication is obstructed, provide accurate interpretation of communication behavior through character portraits, and offer multiple output forms for the interpretation result so as to match the current communication scene to the greatest extent and bring users an attentive experience.
In view of the above, an aspect of the present invention provides a communication assistance method based on a character portrait, including:
collecting video data and audio data containing a plurality of communication objects;
judging whether the communication assistance needs to be started or not according to the video data and/or the audio data;
when communication assistance needs to be started, determining the role relationship among a plurality of communication objects through the video data and/or the audio data;
acquiring the character portrait corresponding to each communication object according to the role relationship;
respectively interpreting the communication behavior data of the plurality of communication objects based on the character portrait to obtain a first interpretation result;
and processing the first interpretation result and outputting the first interpretation result in a form matched with the current communication scene.
Optionally, the step of determining whether to start communication assistance according to the video data and/or the audio data includes:
extracting first motion data and/or first face data from the video data; and/or,
extracting first voice data from the audio data;
judging whether the first action data contains action behaviors indicating that communication obstruction exists, and/or whether the first face data contains expressions indicating that communication obstruction exists, and/or whether the first voice data contains keywords indicating that communication obstruction exists;
and when the first action data contains action behaviors indicating that communication obstruction exists, and/or the first face data contains expressions indicating that communication obstruction exists, and/or the first voice data contains keywords indicating that communication obstruction exists, determining that communication assistance needs to be started.
Optionally, the step of determining a role relationship between a plurality of communication objects through the video data and/or the audio data includes:
performing first-class keyword retrieval on the audio data;
performing second-class key information retrieval on the video data;
when the second type of key information is retrieved, determining the current communication place according to the second type of key information;
and when the first type of keywords are retrieved, determining the role relationships among a plurality of communication objects according to the current communication place, the first type of keywords and the communication objects associated with the sentences to which the first type of keywords belong.
Optionally, the communication behavior data includes:
communication language data, communication action data and/or communication expression data.
Optionally, the step of interpreting, based on the character portraits, the communication behavior data of the plurality of communication objects respectively to obtain a first interpretation result includes:
configuring unique object identifiers for the plurality of communication objects respectively;
sorting the communication behavior data by generation time, and segmenting the data according to the different communication objects to obtain a plurality of communication behavior data segments;
marking each of the communication behavior data segments with the corresponding object identifier;
interpreting each communication behavior data segment marked with an object identifier by using the character feature labels from the character portrait of the corresponding communication object;
and fusing the interpretation results of all the communication behavior data segments to obtain a first interpretation result.
Optionally, the processing the first interpretation result and outputting the first interpretation result in a form matching the current communication scene includes:
performing preset data processing on the first interpretation result to obtain output data in various output forms;
acquiring information of a current communication scene, and selecting a first output form matched with the current communication scene according to a corresponding relation between the communication scene and the output form;
selecting first output data from the output data according to the first output form;
presenting the first output data.
Optionally, the step of judging whether the first motion data contains a motion behavior indicating that communication obstruction exists, and/or whether the first face data contains an expression indicating that communication obstruction exists, and/or whether the first voice data contains a keyword indicating that communication obstruction exists includes:
recognizing and extracting gesture actions from the first action data, performing gesture recognition to obtain first gesture data, and/or,
extracting facial expression features of the first face data to obtain first expression data, and/or,
performing voice recognition on the first voice data to obtain first voice recognition data;
judging whether the first gesture data contains a gesture indicating that communication obstruction exists, and/or,
judging whether the first expression data contains an expression indicating that communication obstruction exists, and/or,
judging whether the first voice recognition data contains a keyword indicating that communication obstruction exists.
Optionally, before the step of acquiring video data and audio data containing a plurality of communication objects, the method further includes:
determining a relationship between a first object and a second object from a plurality of communication objects, and generating a first relationship label by using respective unique object identifiers of the first object and the second object;
acquiring first communication behavior data between the first object and the second object;
constructing a first character portrait of the first object and a second character portrait of the second object according to the first communication behavior data and the first relation label;
repeating the above operations until character portraits have been established for all communication objects according to their different roles.
Another aspect of the present invention provides a character portrait-based communication assistance device, including: an acquisition module, a judgment module, a role relationship determination module, a character portrait acquisition module, an interpretation module, and an output module;
the acquisition module is used for acquiring video data and audio data containing a plurality of communication objects;
the judging module is used for judging whether the communication assistance needs to be started or not according to the video data and/or the audio data;
the role relationship determining module is used for determining the role relationship among a plurality of communication objects through the video data and/or the audio data when communication assistance needs to be started;
the character portrait acquisition module is used for acquiring the character portrait corresponding to each communication object according to the role relationship;
the interpretation module is used for respectively interpreting the communication behavior data of the plurality of communication objects based on the character portrait to obtain a first interpretation result;
and the output module is used for processing the first interpretation result and outputting the first interpretation result in a form matched with the current communication scene.
A third aspect of the present invention provides a computer-readable storage medium having stored therein at least one instruction, at least one program, a set of codes, or a set of instructions, which are loaded and executed by a processor to implement the character portrait-based communication assistance method described in any of the foregoing aspects.
According to the technical scheme, video data and audio data containing a plurality of communication objects are first collected, and whether communication assistance needs to be started is judged from the video data and/or the audio data. When assistance is needed, the role relationships among the communication objects are determined through the video data and/or the audio data, and the character portrait corresponding to each communication object is acquired according to those relationships. The communication behavior data of the communication objects are then interpreted separately on the basis of the character portraits to obtain a first interpretation result. Finally, the first interpretation result is processed and output in a form matching the current communication scene. In this way, assistance measures can be started in time when communication is obstructed, accurate interpretation of communication behavior is provided through the character portraits, and multiple output forms are available for the interpretation result, matching the current communication scene to the greatest extent and bringing the user an attentive experience.
Drawings
FIG. 1 is a flowchart of a character portrait-based communication assistance method according to an embodiment of the present invention;
FIG. 2 is a flowchart of the steps of judging whether communication assistance needs to be started according to the video data and/or the audio data, according to another embodiment of the present invention;
FIG. 3 is a flowchart of the steps of determining the role relationships among a plurality of communication objects through the video data and/or the audio data, according to another embodiment of the present invention;
FIG. 4 is a flowchart of the steps of interpreting the communication behavior data of the communication objects based on the character portraits to obtain a first interpretation result, according to another embodiment of the present invention;
FIG. 5 is a flowchart of the steps of processing the first interpretation result and outputting it in a form matching the current communication scene, according to an embodiment of the present invention;
FIG. 6 is a schematic block diagram of a character portrait-based communication assistance device according to an embodiment of the present invention.
Detailed Description
In order that the above objects, features and advantages of the present invention can be more clearly understood, a more particular description of the invention will be rendered by reference to the appended drawings. It should be noted that the embodiments and features of the embodiments of the present application may be combined with each other without conflict.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention, however, the present invention may be practiced otherwise than as specifically described herein, and thus the scope of the present invention is not limited by the specific embodiments disclosed below.
The terms "first," "second," and the like in the description and claims of the present application and in the foregoing drawings are used for distinguishing between different objects and not for describing a particular sequential order. Furthermore, the terms "include" and "have," as well as any variations thereof, are intended to cover a non-exclusive inclusion. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those steps or elements listed, but may alternatively include other steps or elements not listed, or inherent to such process, method, article, or apparatus.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the application. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein can be combined with other embodiments.
A character portrait-based communication assistance method, device, and storage medium according to some embodiments of the present invention are described below with reference to fig. 1 to 6.
As shown in fig. 1, an embodiment of the present invention provides a communication assistance method based on a character portrait, including:
collecting video data and audio data containing a plurality of communication objects;
judging whether communication assistance needs to be started or not according to the video data and/or the audio data;
when communication assistance needs to be started, determining the role relationship among a plurality of communication objects through the video data and/or the audio data;
acquiring the character portrait corresponding to each communication object according to the role relationship;
respectively interpreting the communication behavior data of the plurality of communication objects based on the character portrait to obtain a first interpretation result;
and processing the first interpretation result and outputting it in a form matching the current communication scene.
It can be understood that, in the course of communication, differences in regional culture, educational background, social experience, professional field, accent, living habits, language used, and the like can lead to situations where words fail to convey the intended meaning, expression is unclear, or understanding breaks down; the obstacles are even greater when some of the communication objects have physiological impairments. When a communication obstacle occurs, communication assistance is needed so that the communication objects can accurately understand each other's communication behaviors (such as language, gestures, and expressions) and give correct feedback.
It should be noted that the character portrait-based communication assistance method provided by the embodiment of the present invention can be applied to intelligent terminals such as smart phones, computers, and smart televisions, and also to intercom devices, robots, access control systems, intelligent medical systems, intelligent teaching systems, and the like.
In the embodiment of the present invention, the video data and the audio data may be acquired by an image acquisition unit (e.g., a camera) and a voice acquisition unit (e.g., a microphone) of the acquisition module, respectively, or obtained from a server or another intelligent terminal through a communication network. While the video data and audio data are being collected, related information about the video or voice content is stored as attribute information of the video data and the audio data, respectively.
The communication behavior data includes but is not limited to: communication language data, communication action data and/or communication expression data.
According to the technical scheme of this embodiment, video data and audio data containing a plurality of communication objects are first collected, and whether communication assistance needs to be started is judged from the video data and/or the audio data. When assistance is needed, the role relationships among the communication objects are determined through the video data and/or the audio data, and the character portrait corresponding to each communication object is then acquired according to those relationships. Based on the character portraits, the communication behavior data of the communication objects are interpreted separately to obtain a first interpretation result. Finally, the first interpretation result is processed and output in a form matching the current communication scene, so that assistance measures can be started in time when communication is obstructed, accurate interpretation of communication behavior is provided through the character portraits, and multiple output forms are available for the interpretation result, matching the current communication scene to the greatest extent and bringing the user an attentive experience.
In some possible embodiments of the invention, as shown in fig. 2, the step of determining whether to start communication assistance according to the video data and/or the audio data includes:
extracting first motion data and/or first face data from the video data; and/or,
extracting first voice data from the audio data;
judging whether the first action data contains action behaviors indicating that communication obstruction exists, and/or whether the first face data contains expressions indicating that communication obstruction exists, and/or whether the first voice data contains keywords indicating that communication obstruction exists;
and when the first action data contains action behaviors indicating that communication obstruction exists, and/or the first face data contains expressions indicating that communication obstruction exists, and/or the first voice data contains keywords indicating that communication obstruction exists, determining that communication assistance needs to be started.
It is to be understood that extracting the first motion data from the video data includes: extracting pictures from the video data frame by frame; quantizing the gray level, brightness, or color of each pixel of adjacent frames into N levels (N being a positive integer), counting the pixels at each level, and comparing the resulting histograms; grouping adjacent frames whose histogram difference is smaller than a preset threshold and fusing each group into a new image frame; extracting color, texture, and shape features from the new image frames to select key frames; and extracting dynamic features from the key frames, such as object motion, motion trajectories, relative speeds, and positional changes between objects. From these dynamic features, the first motion data of key parts of the human body can be obtained, covering actions such as gestures, nods, head shakes, and shrugs.
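As a minimal sketch of the adjacent-frame histogram grouping just described, the following Python code uses OpenCV; the quantization level, the Bhattacharyya distance threshold, and the mean-based frame fusion are illustrative assumptions, since the embodiment does not fix these choices:

    import cv2
    import numpy as np

    def group_and_fuse_frames(video_path, n_levels=16, threshold=0.25):
        """Group adjacent frames whose gray-level histograms differ by less
        than a preset threshold, then fuse each group into a new image
        frame (here by pixel-wise mean). n_levels and threshold are
        assumed values."""
        cap = cv2.VideoCapture(video_path)
        groups, current, prev_hist = [], [], None
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
            # Quantize gray levels into n_levels bins, then normalize.
            hist = cv2.calcHist([gray], [0], None, [n_levels], [0, 256])
            hist = cv2.normalize(hist, hist).flatten()
            if prev_hist is not None and cv2.compareHist(
                    prev_hist, hist, cv2.HISTCMP_BHATTACHARYYA) >= threshold:
                groups.append(current)  # histogram jump: close the group
                current = []
            current.append(frame.astype(np.float32))
            prev_hist = hist
        if current:
            groups.append(current)
        cap.release()
        return [np.mean(g, axis=0).astype(np.uint8) for g in groups]

Key frames and the dynamic features (motion trajectories, relative speeds, positional changes) would then be extracted from the fused frames downstream.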
It should be noted that the first face data may also be extracted from the video data, specifically: extracting a plurality of pictures from the video data frame by frame, graying them, and feeding all grayed pictures to a face classifier to obtain a face picture set; resizing the pictures in the face picture set to 64 × 64 pixels and gray-processing them to obtain the face picture set to be recognized; and inputting the face picture set to be recognized into a trained convolutional neural network for face recognition, extracting the first face data.
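A minimal sketch of this pipeline follows, with an OpenCV Haar cascade standing in for the "face classifier" and the trained convolutional network left abstract (both are assumptions; the embodiment does not name a specific classifier):

    import cv2

    face_cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

    def extract_face_pictures(frames):
        """Gray each frame, detect faces, and resize every face crop to
        64x64 gray pixels, ready for the trained recognition CNN."""
        faces_64 = []
        for frame in frames:
            gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
            for (x, y, w, h) in face_cascade.detectMultiScale(gray, 1.1, 5):
                faces_64.append(cv2.resize(gray[y:y + h, x:x + w], (64, 64)))
        return faces_64  # e.g. fed to cnn.predict(...) downstream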
Extracting the first voice data from the audio data specifically includes: extracting feature vectors from the audio data, and performing speech decoding and word search on the feature vectors using the trained acoustic model, the pronunciation dictionary, and the trained language model to obtain recognized text information/data. The acoustic model is trained by extracting feature vectors from an existing audio database and inputting them into a neural network; the language model is trained by extracting text sample data from an existing text database and inputting it into a neural network.
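A hedged sketch of the feature-extraction step: MFCC features via librosa are an assumed choice (the embodiment only says "feature vectors"), and decoding against the trained acoustic model, pronunciation dictionary, and language model is left abstract:

    import librosa

    def extract_speech_features(audio_path, sr=16000, n_mfcc=13):
        """Load the audio and compute one MFCC feature vector per frame;
        these vectors feed the decoding/word-search stage."""
        y, _ = librosa.load(audio_path, sr=sr)
        return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).T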
In this embodiment, the video data and/or the audio data are recognized and analyzed through action recognition, face recognition, speech recognition, and similar technologies to judge whether action behaviors, expressions, or keywords indicating a communication obstacle are present; when they are, communication assistance is started. This not only provides an intelligent, automated communication assistance service but also guarantees its timeliness and accuracy.
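The start-assistance decision itself reduces to a logical OR over the three indicator checks; a minimal Python sketch follows, in which the indicator sets are illustrative placeholders left to configuration:

    # Illustrative indicator sets; the embodiment leaves the concrete
    # gestures, expressions, and keywords to configuration (assumption).
    BLOCK_GESTURES = {"sign_not_understand", "sign_not_hear"}
    BLOCK_EXPRESSIONS = {"confused", "at_a_loss", "questioning"}
    BLOCK_KEYWORDS = {"don't understand", "can't hear"}

    def should_start_assistance(gestures, expressions, transcript):
        """Start communication assistance when any modality signals an
        obstruction (action, expression, or recognized keyword)."""
        return bool(
            BLOCK_GESTURES & set(gestures)
            or BLOCK_EXPRESSIONS & set(expressions)
            or any(k in transcript for k in BLOCK_KEYWORDS)
        )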
In some possible embodiments of the present invention, as shown in fig. 3, the step of determining the role relationship between the plurality of communication objects through the video data and/or the audio data includes:
performing first-class keyword retrieval on the audio data;
performing second-class key information retrieval on the video data;
when the second type of key information is retrieved, determining the current communication place according to the second type of key information;
and when the first type of keywords are retrieved, determining the role relationships among a plurality of communication objects according to the current communication place, the first type of keywords and the communication objects associated with the sentences to which the first type of keywords belong.
In this embodiment, the first class of keywords includes titles, terms of address, and special words that can characterize identity/role information, role relationship information, and the like. Speech recognition is performed on the audio data to convert it into text, and keyword retrieval is performed on the text to determine whether first-class keywords are present.
It can be understood that the video data may contain information characterizing the communication location, such as signage reading "XX hospital", "XX company", "XX residential community", or "XX mall", as well as distinctive environmental information (such as decorations or people's clothing that points to a particular location).
From the current communication place, the first-class keywords, and the communication objects associated with the sentences containing those keywords, the role relationships among the communication objects can be determined. This provides a reference both for interpreting the communication behaviors between the objects and for offering them a correct form of expression, facilitating smooth communication.
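A minimal lookup sketch of this step; the table entries and the fallback behavior are illustrative assumptions:

    # Map (communication place, first-class keyword) to a role relationship.
    RELATION_TABLE = {
        ("hospital", "doctor"): "doctor-patient",
        ("company", "manager"): "superior-subordinate",
        ("school", "teacher"): "teacher-student",
    }

    def infer_role_relation(place, keyword, speaker_id, addressee_id):
        """Resolve the role relationship between the object that uttered
        the sentence containing the first-class keyword and the object it
        addresses, given the current communication place."""
        relation = RELATION_TABLE.get((place, keyword))
        if relation is None:
            return None  # fall back to other cues when nothing matches
        return {(speaker_id, addressee_id): relation}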
In some possible embodiments of the invention, as shown in fig. 4, the step of interpreting communication behavior data of the plurality of communication objects based on the character representation to obtain a first interpretation result includes:
configuring unique object identifiers for the plurality of communication objects respectively;
sorting the communication behavior data by generation time, and segmenting the data according to the different communication objects to obtain a plurality of communication behavior data segments;
marking each of the communication behavior data segments with the corresponding object identifier;
interpreting each communication behavior data segment marked with an object identifier by using the character feature labels from the character portrait of the corresponding communication object;
and fusing the interpretation results of all the communication behavior data segments to obtain a first interpretation result.
It can be understood that, in this embodiment, by configuring a unique object identifier for each communication object, the communication behavior data segments belonging to the same communication object are interpreted with the character feature labels extracted from that object's character portrait; after all segments have been interpreted, the interpretation results are fused in time order to obtain the first interpretation result. Segmenting the communication behavior data breaks the whole into parts, allowing multiple segments to be processed simultaneously and improving processing efficiency; meanwhile, processing the segments that belong to the same communication object together avoids errors and improves accuracy.
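A sketch of this segment-and-interpret flow, under the assumption that each communication behavior event is a dict with "time", "object_id", and "payload" keys and that interpret_segment is supplied elsewhere:

    from itertools import groupby

    def segment_behavior_data(events):
        """Sort events by generation time, then cut a new segment whenever
        the communication object changes."""
        ordered = sorted(events, key=lambda e: e["time"])
        return [{"object_id": oid, "events": list(seg)}
                for oid, seg in groupby(ordered, key=lambda e: e["object_id"])]

    def interpret_all(segments, portraits, interpret_segment):
        """Interpret each segment with the character feature labels of its
        object's portrait, then fuse the results back in time order."""
        return " ".join(interpret_segment(s, portraits[s["object_id"]])
                        for s in segments)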
As shown in fig. 5, in some possible embodiments of the present invention, the step of processing the first interpretation result and outputting it in a form matching the current communication scene includes:
performing preset data processing on the first interpretation result to obtain output data in various output forms;
acquiring information of a current communication scene, and selecting a first output form matched with the current communication scene according to a corresponding relation between the communication scene and the output form;
selecting first output data from the output data according to the first output form;
presenting the first output data.
It can be understood that, to output the first interpretation result in the form best suited to the receiving communication object, this embodiment pre-stores a correspondence between communication scenes and output forms. After preset data processing of the first interpretation result (such as format conversion, sound-effect adjustment, animation generation, or sign-language gesture generation) yields output data in multiple output forms, information about the current communication scene is acquired so that a first output form matching the scene can be determined; the first output data are then selected from the output data according to the first output form and presented, for example by playing speech in a dialect familiar to the communication object, playing a video of animated sign language, or presenting other combined audio-visual forms.
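A minimal sketch of the scene-to-form selection; the scene names, output forms, and renderer interface are illustrative assumptions:

    # Pre-stored correspondence between communication scene and output form.
    SCENE_TO_FORM = {
        "hearing_impaired_listener": "sign_language_animation",
        "dialect_speaker": "dialect_speech",
        "noisy_public_place": "subtitled_video",
    }

    def present(first_interpretation, scene, renderers):
        """Pick the first output form matching the current scene, produce
        the first output data with the corresponding renderer, and return
        both. `renderers` maps each form to a conversion function."""
        form = SCENE_TO_FORM.get(scene, "plain_text")
        return form, renderers[form](first_interpretation)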
In some possible embodiments of the present invention, the step of judging whether the first motion data contains a motion behavior indicating that communication obstruction exists, and/or whether the first face data contains an expression indicating that communication obstruction exists, and/or whether the first voice data contains a keyword indicating that communication obstruction exists includes:
recognizing and extracting gesture actions from the first action data, performing gesture recognition to obtain first gesture data, and/or,
extracting facial expression features of the first face data to obtain first expression data, and/or,
performing voice recognition on the first voice data to obtain first voice recognition data;
judging whether the first gesture data contains a gesture indicating that communication obstruction exists, and/or,
judging whether the first expression data contains an expression indicating that communication obstruction exists, and/or,
judging whether the first voice recognition data contains a keyword indicating that communication obstruction exists.
It is to be understood that the first gesture data may be sign language; judging whether the first gesture data contains a gesture indicating communication obstruction means judging whether it contains a sign-language action meaning "I don't understand", "I can't hear", or the like.
The first expression data may be actions such as mouth-shape changes, frowning, blinking, or pupil dilation and contraction, obtained by extracting facial expression features from the first face data according to preset expression feature points; expressions indicating a communication obstruction may be confusion, being at a loss, puzzlement, or failure to react normally to the communication.
A keyword indicating a communication obstruction may be "don't understand", "can't hear", or a word with the same or similar meaning in another dialect.
Through this embodiment, the judgment of whether a communication obstruction exists can be made more accurately.
In some possible embodiments of the present invention, before the step of acquiring video data and audio data including a plurality of communication objects, the method further includes:
the method comprises the following steps: determining a relationship between a first object and a second object from a plurality of communication objects, and generating a first relationship label by using respective unique object identifications of the first object and the second object;
It can be understood that each user has a unique object identifier. The relationship between communication objects may be a role relationship such as parent-child, couple, friend, colleague, or doctor-patient, or another basic relationship. A role label for a communication object can be constructed from its unique object identifier by a preset rule, for example by appending a role field to the identifier, and the first relationship label can be constructed by fusing the role labels of the first object and the second object. In this step, any two different communication objects are selected at random as the first object and the second object.
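A toy sketch of this label construction; the "#" and "|" separators are assumed formats, since the embodiment only requires appending a role field to the identifier and fusing the two role labels:

    def role_label(object_id, role):
        """Append a role field to the unique object identifier."""
        return f"{object_id}#{role}"

    def first_relation_label(obj_a, role_a, obj_b, role_b):
        """Fuse the two role labels into a first relationship label."""
        return f"{role_label(obj_a, role_a)}|{role_label(obj_b, role_b)}"

    # first_relation_label("u001", "doctor", "u002", "patient")
    # -> "u001#doctor|u002#patient"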
Step two: acquiring first communication behavior data between the first object and the second object;
In this step, communication behaviors include chatting, discussion, teaching, commands, and the like, and the first communication behavior data between the first object and the second object is extracted from the communication behavior data between the objects/roles (such as voice, motion, text, geographical location, distance between the people, number of simultaneous participants, and background noise).
Step three: constructing a first character portrait of the first object and a second character portrait of the second object according to the first communication behavior data and the first relation label;
In this step, a character portrait is created from characteristics such as wording, emotion, age, gender, education stage, accent, and hobbies, and a communication behavior database between each pair of roles is created based on the character portraits and the necessary technical means (keyword recognition, emotion recognition, attitude analysis, and the like); the database records the relationship label (built from the first relationship label) of each section of communication behavior.
Step four: repeating steps one to three until character portraits have been established for all communication objects according to their different roles.
In this way, character portraits of the communication objects are constructed from the communication behavior data on the basis of the role relationships among them, and newly generated communication behavior data can then be interpreted with those portraits.
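The pairwise construction loop can be sketched as follows; the toy keyword-counting extractor stands in for the full portrait dimensions (wording, emotion, age, gender, education stage, accent, hobbies) and is purely an assumption:

    from collections import Counter, defaultdict

    def extract_feature_labels(behaviors):
        """Toy extractor: the most frequent words in a pair's behaviors
        become character feature labels; a real system would also cover
        emotion, age, accent, and the other dimensions named above."""
        words = Counter(w for b in behaviors
                        for w in b.get("text", "").split())
        return {"frequent_words": [w for w, _ in words.most_common(5)]}

    def build_portraits(pairs, behavior_store):
        """Build one portrait per object and per relationship label, so
        the same person is interpreted differently in different roles."""
        portraits = defaultdict(dict)  # object_id -> {relation: labels}
        for obj_a, obj_b, relation_label in pairs:
            labels = extract_feature_labels(
                behavior_store.get((obj_a, obj_b), []))
            portraits[obj_a][relation_label] = labels
            portraits[obj_b][relation_label] = labels
        return portraits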
As shown in FIG. 6, another embodiment of the present invention provides a character portrait-based communication assistance device 600, comprising: an acquisition module 601, a judgment module 602, a role relationship determination module 603, a character portrait acquisition module 604, an interpretation module 605, and an output module 606;
the acquisition module 601 is configured to acquire video data and audio data that include a plurality of communication objects;
the determining module 602 is configured to determine whether communication assistance needs to be started according to the video data and/or the audio data;
the role relationship determining module 603 is configured to determine, when communication assistance needs to be started, a role relationship between a plurality of communication objects through the video data and/or the audio data;
the character portrait acquisition module 604 is configured to acquire a character portrait corresponding to each communication object according to the role relationship;
the interpretation module 605 is configured to interpret the communication behavior data of the plurality of communication objects respectively based on the character portrait to obtain a first interpretation result;
the output module 606 is configured to process the first interpretation result and output the first interpretation result in a form matching the current communication scene.
For the operation of the device provided in this embodiment, please refer to the foregoing method embodiments; details are not repeated here.
Fig. 6 is a schematic block diagram of the device in this embodiment. It will be appreciated that fig. 6 only shows a simplified design. In practical applications, the device may also include other necessary elements, including but not limited to any number of input/output systems, processors, controllers, and memories, and all devices that can implement the communication assistance method of the embodiments of the present application fall within the protection scope of the present application.
Another embodiment of the present invention provides a computer-readable storage medium having stored therein at least one instruction, at least one program, a set of codes, or a set of instructions, which are loaded and executed by a processor to implement the character portrait-based communication assistance method described in any of the foregoing embodiments.
It should be understood that the block diagram of the character portrait-based communication assistance device shown in FIG. 6 is merely illustrative, and the number of modules shown does not limit the scope of the present invention.
It should be noted that, for simplicity of description, the above-mentioned method embodiments are described as a series of acts or combination of acts, but those skilled in the art will recognize that the present application is not limited by the order of acts described, as some steps may occur in other orders or concurrently depending on the application. Further, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments and that the acts and modules referred to are not necessarily required in this application.
In the foregoing embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to the related descriptions of other embodiments.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus may be implemented in other manners. For example, the above-described embodiments of the apparatus are merely illustrative, and for example, the above-described division of the units is only one type of division of logical functions, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection of some interfaces, devices or units, and may be an electric or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer-readable memory. Based on such an understanding, the technical solution of the present application, in essence or in the part contributing to the prior art, or in whole or in part, may be embodied in the form of a software product stored in a memory, including several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the methods of the embodiments of the present application. The aforementioned memory includes media capable of storing program code, such as a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, or an optical disk.
Those skilled in the art will appreciate that all or part of the steps of the methods of the above embodiments may be implemented by a program stored in a computer-readable memory, where the memory may include a flash disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or the like.
The foregoing embodiments have been described in detail, and specific examples are used herein to explain the principles and implementations of the present application; the above description of the embodiments is only intended to help understand the method of the present application and its core idea. Meanwhile, for those skilled in the art, the specific implementation and scope of application may change according to the idea of the present application. In summary, the content of this specification should not be construed as limiting the present application.
Although the present invention is disclosed above, the present invention is not limited thereto. Various changes and modifications can be easily made by those skilled in the art without departing from the spirit and scope of the present invention, and it is within the scope of the present invention to include different functions, combination of implementation steps, software and hardware implementations.

Claims (10)

1. A character portrait-based communication assistance method is characterized by comprising the following steps:
collecting video data and audio data containing a plurality of communication objects;
judging whether communication assistance needs to be started or not according to the video data and/or the audio data;
when communication assistance needs to be started, determining the role relationship among a plurality of communication objects through the video data and/or the audio data;
acquiring the character portrait corresponding to each communication object according to the role relationship;
respectively interpreting the communication behavior data of the plurality of communication objects based on the character portrait to obtain a first interpretation result;
and processing the first interpretation result and outputting it in a form matching the current communication scene.
2. The character portrait-based communication assistance method of claim 1, wherein the step of judging whether communication assistance needs to be started according to the video data and/or the audio data comprises:
extracting first motion data and/or first face data from the video data; and/or,
extracting first voice data from the audio data;
judging whether the first action data contains action behaviors indicating that communication obstruction exists, and/or whether the first face data contains expressions indicating that communication obstruction exists, and/or whether the first voice data contains keywords indicating that communication obstruction exists;
and when the first action data contains action behaviors indicating that communication obstruction exists, and/or the first face data contains expressions indicating that communication obstruction exists, and/or the first voice data contains keywords indicating that communication obstruction exists, determining that communication assistance needs to be started.
3. The character portrait-based communication assistance method of claim 2, wherein the step of determining the role relationships among a plurality of communication objects through the video data and/or the audio data comprises:
performing first-class keyword retrieval on the audio data;
performing second-type key information retrieval on the video data;
when the second type of key information is retrieved, determining the current communication place according to the second type of key information;
and when the first type of keywords are retrieved, determining the role relationships among a plurality of communication objects according to the current communication place, the first type of keywords and the communication objects associated with the sentences to which the first type of keywords belong.
4. The character portrait-based communication assistance method of claim 3, wherein the communication behavior data comprises:
communication language data, communication action data and/or communication expression data.
5. The character portrait-based communication assistance method of claim 4, wherein the step of interpreting the communication behavior data of the plurality of communication objects based on the character portraits to obtain a first interpretation result comprises:
respectively configuring unique object identifiers for the plurality of communication objects;
sorting the communication behavior data by generation time, and segmenting the data according to the different communication objects to obtain a plurality of communication behavior data segments;
marking each of the communication behavior data segments with the corresponding object identifier;
interpreting each communication behavior data segment marked with an object identifier by using the character feature labels from the character portrait of the corresponding communication object;
and fusing the interpretation results of all the communication behavior data segments to obtain a first interpretation result.
6. The character portrait-based communication assistance method of claim 5, wherein the step of processing the first interpretation result and outputting it in a form matching the current communication scene comprises:
performing preset data processing on the first interpretation result to obtain output data in various output forms;
acquiring information of a current communication scene, and selecting a first output form matched with the current communication scene according to a corresponding relation between the communication scene and the output form;
selecting first output data from the output data according to the first output form;
presenting the first output data.
7. The character portrait-based communication assistance method according to claim 6, wherein the step of judging whether the first motion data contains a motion behavior indicating that communication obstruction exists, and/or whether the first face data contains an expression indicating that communication obstruction exists, and/or whether the first voice data contains a keyword indicating that communication obstruction exists comprises:
recognizing and extracting gesture actions from the first action data, performing gesture recognition to obtain first gesture data, and/or,
extracting facial expression features of the first face data to obtain first expression data, and/or,
performing voice recognition on the first voice data to obtain first voice recognition data;
determining whether the first gesture data has a gesture indicating that communication obstruction exists, and/or,
judging whether the first expression data contains an expression indicating that communication obstruction exists, and/or,
judging whether the first voice recognition data contains a keyword indicating that communication obstruction exists.
8. The character portrait-based communication assistance method of claim 7, wherein before the step of collecting video data and audio data containing a plurality of communication objects, the method further comprises:
determining a relationship between a first object and a second object from a plurality of communication objects, and generating a first relationship label by using respective unique object identifiers of the first object and the second object;
acquiring first communication behavior data between the first object and the second object;
constructing a first character portrait of the first object and a second character portrait of the second object according to the first communication behavior data and the first relation label;
repeating the above operations until character portraits have been established for all communication objects according to their different roles.
9. A communication assistance device based on a character portrait, comprising: the system comprises an acquisition module, a judgment module, a role relationship determination module, a role portrait acquisition module, an interpretation module and an output module;
the acquisition module is used for acquiring video data and audio data containing a plurality of communication objects;
the judging module is used for judging whether the communication assistance needs to be started according to the video data and/or the audio data;
the role relationship determining module is used for determining the role relationship among a plurality of communication objects through the video data and/or the audio data when communication assistance needs to be started;
the character portrait acquisition module is used for acquiring the character portrait corresponding to each communication object according to the character relation;
the interpretation module is used for respectively interpreting the communication behavior data of the plurality of communication objects based on the character portrait to obtain a first interpretation result;
and the output module is used for processing the first interpretation result and outputting it in a form matching the current communication scene.
10. A computer-readable storage medium, comprising,
the computer-readable storage medium has stored therein at least one instruction, at least one program, a set of codes, or a set of instructions, which are loaded and executed by a processor to implement the character portrait-based communication assistance method of any one of claims 1-8.
CN202210554278.8A 2022-05-20 2022-05-20 Character portrait-based communication assistance method and device, and storage medium Pending CN115171673A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210554278.8A CN115171673A (en) 2022-05-20 Character portrait-based communication assistance method and device, and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210554278.8A CN115171673A (en) 2022-05-20 Character portrait-based communication assistance method and device, and storage medium

Publications (1)

Publication Number Publication Date
CN115171673A (zh) 2022-10-11

Family

ID=83483245

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210554278.8A Pending CN115171673A (en) Character portrait-based communication assistance method and device, and storage medium

Country Status (1)

Country Link
CN (1) CN115171673A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116032679A (en) * 2023-03-28 2023-04-28 合肥坤语智能科技有限公司 Intelligent host interaction control system for intelligent hotel
CN116032679B (en) * 2023-03-28 2023-05-30 合肥坤语智能科技有限公司 Intelligent host interaction control system for intelligent hotel

Similar Documents

Publication Publication Date Title
CN112651448B (en) Multi-mode emotion analysis method for social platform expression package
US6526395B1 (en) Application of personality models and interaction with synthetic characters in a computing system
WO2017112813A1 (en) Multi-lingual virtual personal assistant
US20240070397A1 (en) Human-computer interaction method, apparatus and system, electronic device and computer medium
CN112784696A (en) Lip language identification method, device, equipment and storage medium based on image identification
CN113380271B (en) Emotion recognition method, system, device and medium
CN112668407A (en) Face key point generation method and device, storage medium and electronic equipment
CN113067953A (en) Customer service method, system, device, server and storage medium
CN111046148A (en) Intelligent interaction system and intelligent customer service robot
CN112632244A (en) Man-machine conversation optimization method and device, computer equipment and storage medium
CN113703585A (en) Interaction method, interaction device, electronic equipment and storage medium
CN114974253A (en) Natural language interpretation method and device based on character image and storage medium
CN115171673A (en) Role portrait based communication auxiliary method and device and storage medium
CN113542797A (en) Interaction method and device in video playing and computer readable storage medium
CN111311713A (en) Cartoon processing method, cartoon display device, cartoon terminal and cartoon storage medium
CN115529500A (en) Method and device for generating dynamic image
CN111160051B (en) Data processing method, device, electronic equipment and storage medium
CN114359446A (en) Animation picture book generation method, device, equipment and storage medium
CN112233648A (en) Data processing method, device, equipment and storage medium combining RPA and AI
CN113762056A (en) Singing video recognition method, device, equipment and storage medium
Bin Munir et al. A machine learning based sign language interpretation system for communication with deaf-mute people
CN111062207A (en) Expression image processing method and device, computer storage medium and electronic equipment
CN117152308B (en) Virtual person action expression optimization method and system
Zikky et al. Utilizing Virtual Humans as Campus Virtual Receptionists
CN115376512B (en) Speech recognition system and method based on portrait

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination