CN111063346A - Cross-media star emotional companionship interaction system based on machine learning - Google Patents

Cross-media star emotional companionship interaction system based on machine learning

Info

Publication number
CN111063346A
CN111063346A (application CN201911272671.2A)
Authority
CN
China
Prior art keywords
star
content
machine learning
media
emotion
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911272671.2A
Other languages
Chinese (zh)
Inventor
李骅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fifth Dimension Tianjin Intelligent Technology Co Ltd
Original Assignee
Fifth Dimension Tianjin Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fifth Dimension Tianjin Intelligent Technology Co Ltd filed Critical Fifth Dimension Tianjin Intelligent Technology Co Ltd
Priority to CN201911272671.2A priority Critical patent/CN111063346A/en
Publication of CN111063346A publication Critical patent/CN111063346A/en
Pending legal-status Critical Current

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00: Speech recognition
    • G10L15/22: Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00: Machine learning
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00: Speech recognition
    • G10L15/26: Speech to text systems
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00: Speaker identification or verification techniques
    • G10L17/22: Interactive procedures; Man-machine interfaces
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00: Speaker identification or verification techniques
    • G10L17/26: Recognition of special voice characteristics, e.g. for use in lie detectors; Recognition of animal voices
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • G10L25/63: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for estimating an emotional state
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00: Network arrangements or protocols for supporting network services or applications
    • H04L67/01: Protocols
    • H04L67/12: Protocols specially adapted for proprietary or special-purpose networking environments, e.g. medical networks, sensor networks, networks in vehicles or remote metering networks
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00: Speech recognition
    • G10L15/22: Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/223: Execution procedure of a spoken command

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Medical Informatics (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Artificial Intelligence (AREA)
  • Child & Adolescent Psychology (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Psychiatry (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Hospice & Palliative Care (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The invention discloses a machine-learning-based cross-media star emotional companionship interaction system, comprising: an input content recognition module for recognizing input content in multiple media forms and reading its meaning; an interactive content matching module for matching the dialogue script files corresponding to the meaning of the input content; a star real-person likeness module, comprising a voice timbre model and an avatar model, for simulating the voice timbre and visual appearance of the real star; and an output module for outputting the script content matched to the input content. With this technical scheme, the interaction system is defined to carry the personal attributes of a star: it simulates the real star's voice timbre and visual appearance, determines the star's speaking style from the defined attributes, and forms response content for different input content through machine learning, so that the user gets a more realistic experience of interacting with the star.

Description

Cross-media star emotional companionship interaction system based on machine learning
Technical Field
The invention relates to a machine-learning-based cross-media star emotional companionship interaction system, and belongs to the technical field of voice robots.
Background
Many intelligent voice devices and robots exist today, such as Cortana, Siri and OK Google on smartphones, Google Home and Amazon Echo among personal assistants, and interactive voice response systems, telephone robots, customer-service robots and telemarketing robots among voice robots. They all work on the same general principle: a complete voice interaction system typically has three modules. Automatic Speech Recognition (ASR) converts voice into text, Natural Language Processing (NLP) interprets the meaning of the text and produces feedback, and finally Text To Speech (TTS) converts the output information back into voice.
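This three-stage loop can be pictured with a minimal sketch; `recognize`, `understand` and `synthesize` below are hypothetical placeholders for concrete ASR, NLP and TTS engines, not any specific product's API:

```python
# Minimal sketch of the ASR -> NLP -> TTS loop described above.
# All three helpers are hypothetical stand-ins, not a vendor API.

def recognize(audio: bytes) -> str:
    """ASR: convert captured voice into text (placeholder)."""
    raise NotImplementedError

def understand(text: str) -> str:
    """NLP: interpret the text and produce reply text (placeholder)."""
    raise NotImplementedError

def synthesize(reply: str) -> bytes:
    """TTS: convert the reply text back into voice (placeholder)."""
    raise NotImplementedError

def voice_interaction_turn(audio: bytes) -> bytes:
    text = recognize(audio)     # voice -> text
    reply = understand(text)    # text -> meaning -> reply text
    return synthesize(reply)    # reply text -> voice
```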
In human-computer interaction, voice is the most natural mode, and unlike text it can also convey emotion through tone and rhythm. Even so, voice interaction has not been widely adopted, because people are not used to conversing with robots or mobile phones: statistically, only about 3% of Siri users use it in public places. People's default is that language is for communicating with other humans, or with animals such as cats and dogs that are treated as quasi-human; toward objects, people prefer physical interaction, and the psychological barrier to voice interaction is high. To reduce users' psychological pressure, many intelligent voice assistants are given virtual personas: Siri is cast as an aloof but loyal female character, and Xiaomi's assistant is set up as a cute, chatty girl; such persona settings attempt to improve people's acceptance of voice robots. Even so, voice human-computer interaction is still far from universal, and crossing this adoption bottleneck requires a new approach. In addition, current human-computer interaction systems still have the following shortcomings:
1. The interaction media are single: most systems offer single-media interaction through text or voice alone.
2. The voice robot is hard to customize: different topic content and personality traits cannot be configured for it.
3. The deployment platform is single: the system can only be deployed on a cloud platform, so operation and maintenance costs are high.
4. The interaction mode is single: the system only passively waits for user access and cannot communicate proactively.
5. The interaction scenario is single: the system cannot give different answers according to the real-time environment and the user portrait.
Disclosure of Invention
Therefore, the object of the present invention is to provide a machine-learning-based cross-media star emotional companionship interaction system that, by defining the interaction system to carry a star's personal attributes, gives the user the experience of interacting with the star.
To achieve the above object, the machine-learning-based cross-media star emotional companionship interaction system of the present invention comprises:
an input content recognition module for recognizing input content in multiple media forms and reading its meaning;
an interactive content matching module for matching the dialogue script files corresponding to the meaning of the input content;
a star real-person likeness module, comprising a voice timbre model and an avatar model, for simulating the voice timbre and visual appearance of the real star;
and an output module for outputting the script content matched to the input content.
The dialogue script files are formed by determining the star's speaking style from the star personal attributes defined for the interaction system, generating response content for different input content through machine learning, and saving the responses as files.
The media forms of the input content include at least text, emoticons, pictures, voice or video.
The video includes chat video captured in real time by the user terminal.
The input content recognition module at least:
performs near-sentence matching and emotion recognition on text content using natural language processing, and summarizes the topic;
or judges the meaning of emoticon content using gallery retrieval;
or obtains the user's gender, age, emotional features and picture content information from picture content using face recognition and image recognition;
or converts voice content into text using voice noise reduction, voice activity detection and speech recognition, and then recognizes the resulting text and reads its meaning;
or decodes video content into image frames and voice frames to form pictures and text, and then recognizes the picture and text content and reads their meanings.
The interactive content matching module matches the dialogue script file corresponding to the meaning of the input content in combination with acquired real-time environment information.
The output module selects the corresponding media form for output according to the media form of the input content, including at least text, pictures, voice or video.
The machine-learning-based cross-media star emotional companionship interaction system further comprises a cloud server for receiving the user's human-computer interaction records and building a user portrait through big-data analysis and deep learning.
The cloud server is also used to contact third parties to give early warning of user behavior.
The output module is also used to proactively call the user.
With this technical scheme, compared with the prior art, the machine-learning-based cross-media star emotional companionship interaction system has the following beneficial effects:
1. The interaction system is defined to carry a star's personal attributes: it simulates the real star's voice timbre and visual appearance, determines the star's speaking style from the defined attributes, and forms response content for different inputs through machine learning, so the user gets a more realistic experience of interacting with the star;
2. The media forms of the input content include text, emoticons, pictures, voice and video, making the interaction more flexible and diversified;
3. The interaction system can be configured with different stars' personal attributes according to different users' preferences, which suits the popularization of human-computer interaction technology and meets the individual needs of different users;
4. The interaction system supports cloud, semi-cloud and offline deployment, making it more flexible and convenient to apply;
5. The interaction system can build a user portrait from the user's human-computer interaction records through big-data analysis and deep learning, so it can place proactive calls in line with the user's habits, achieving a higher degree of personification;
6. The interaction system can match dialogue script files to the meaning of the input content in combination with acquired real-time environment information, presenting different content to different environments and different users.
Drawings
FIG. 1 is a system block diagram of the machine-learning-based cross-media star emotional companionship interaction system.
Detailed Description
The invention is described in further detail below with reference to the figure and specific embodiments.
The invention provides a machine-learning-based cross-media star emotional companionship interaction system that can be configured with the personal characteristic attributes of different stars according to the user's choice and interact with the user in the star's identity; moreover, the interaction is not limited to the voice-only mode of a traditional chat robot and can use multiple media forms.
Specifically, the system comprises an input content recognition module, an interactive content matching module, a star real-person likeness module and an output module. The input content recognition module recognizes and semantically analyzes the content input by the user; the interactive content matching module matches the response information corresponding to the input content; the response information is output in the corresponding media form through the output module; and when voice or image output is involved, the feature data in the star real-person likeness module is called, as the sketch below illustrates.
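A minimal sketch of this four-module wiring; the module internals are hypothetical placeholders, and only the call order reflects the description above:

```python
# Sketch only: recognize the input, match a script, then render the
# script with the star persona's voice/appearance feature data.

class StarCompanionSystem:
    def __init__(self, recognizer, matcher, persona, output):
        self.recognizer = recognizer  # input content recognition module
        self.matcher = matcher        # interactive content matching module
        self.persona = persona        # star real-person likeness module
        self.output = output          # output module

    def interact(self, media_type: str, payload: bytes):
        meaning = self.recognizer.identify(media_type, payload)
        script = self.matcher.match(meaning)
        # voice and image output draw on the persona's feature data
        return self.output.render(script, persona=self.persona)
```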
The star real-person likeness module comprises a voice timbre model and an avatar model and is used to simulate the voice timbre and visual appearance of the real star.
The voice timbre model is formed by recording the star's real voice in advance and adjusting standard timbre parameters. The star's avatar model is built with computer graphics (CG) technology to reproduce the real star's appearance; during a video chat it can show different expressions and match mouth shapes to the chat text, so that simulating voice and appearance produces an effect close to the real person.
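As an illustration of the mouth-shape matching only, the toy table below maps phonemes to visemes of the kind a lip-sync stage might use; the phoneme symbols, viseme names and the `text_to_phonemes` helper are assumptions, not the patent's actual avatar model:

```python
# Toy phoneme-to-viseme lookup for driving an avatar's mouth shapes.
# The table contents are illustrative assumptions.
VISEME_TABLE = {
    "AA": "open",  "IY": "smile",  "UW": "round",
    "M": "closed", "B": "closed",  "P": "closed",
    "F": "teeth",  "V": "teeth",
}

def text_to_phonemes(text: str) -> list[str]:
    """Hypothetical grapheme-to-phoneme step (placeholder)."""
    raise NotImplementedError

def mouth_shapes(text: str) -> list[str]:
    """Viseme sequence to play while the avatar speaks the text."""
    return [VISEME_TABLE.get(p, "neutral") for p in text_to_phonemes(text)]
```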
The input content recognition module recognizes input content in multiple media forms and reads its meaning. The media forms include at least text, emoticons, pictures, voice or video, and video further includes chat video captured in real time by the user terminal.
The input content recognition module can perform near-sentence matching and emotion recognition on text content using natural language processing and summarize the topic; judge the meaning of emoticon content using gallery retrieval; obtain the user's gender, age, emotional features and picture content information from picture content using face recognition and image recognition; convert voice content into text using voice noise reduction, voice activity detection and speech recognition, then recognize the text and read its meaning; and decode video content into image frames and voice frames to form pictures and text, then recognize the picture and text content and read their meanings. A dispatch sketch follows.
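A sketch of that dispatch, with hypothetical recognizer functions standing in for the concrete NLP, gallery-retrieval, face/image-recognition and denoise/VAD/ASR engines:

```python
# Route each media form to its recognition chain; every helper below is a
# hypothetical placeholder for the engines named in the description.

def analyze_text(text: str) -> dict:
    """NLP: near-sentence matching, emotion recognition, topic summary."""
    raise NotImplementedError

def lookup_emoticon(payload: bytes) -> dict:
    """Gallery retrieval of an emoticon's meaning."""
    raise NotImplementedError

def recognize_image(payload: bytes) -> dict:
    """Face/image recognition: gender, age, emotion, scene content."""
    raise NotImplementedError

def denoise_vad_asr(audio: bytes) -> str:
    """Noise reduction + voice activity detection + speech recognition."""
    raise NotImplementedError

def split_video(payload: bytes) -> tuple[bytes, bytes]:
    """Decode video into image frames and an audio track."""
    raise NotImplementedError

def identify_input(media_type: str, payload: bytes) -> dict:
    if media_type == "text":
        return analyze_text(payload.decode("utf-8"))
    if media_type == "emoticon":
        return lookup_emoticon(payload)
    if media_type == "picture":
        return recognize_image(payload)
    if media_type == "voice":
        return analyze_text(denoise_vad_asr(payload))
    if media_type == "video":
        frames, audio = split_video(payload)
        meaning = recognize_image(frames)
        meaning.update(analyze_text(denoise_vad_asr(audio)))
        return meaning
    raise ValueError(f"unsupported media form: {media_type}")
```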
Recognizing the input content is the first step of the human-computer interaction process; only when the input content is recognized effectively can the interaction system carry out the correct subsequent steps. Speech recognition, natural language processing, and face and image recognition are by now relatively mature technologies; for video content, the video can be decoded into image frames and voice frames and then recognized with mature speech recognition and image recognition techniques.
Besides recognizing text and reading its meaning, the recognized gender, age and emotional features of the user can be used to adjust the script matching strategy during subsequent interactive content matching: for example, a caring script strategy can be matched when a sad mood is recognized, or an actively inquiring strategy when a happy mood is recognized; likewise, the strategy is adjusted according to whether the user is recognized as male or female. These adjustment strategies can be preset in the system and used directly offline, or, once connected to a cloud server, the recognized information can be fed to the server as parameters for big-data analysis before script matching.
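A minimal sketch of this adjustment rule; the strategy names and attribute fields are illustrative assumptions, not fixed by the patent:

```python
# Recognized user attributes steer which script strategy is matched.
def select_strategy(user: dict) -> str:
    """Pick a reply strategy from recognized user attributes."""
    if user.get("emotion") == "sad":
        return "caring"      # caring wording when distress is recognized
    if user.get("emotion") == "happy":
        return "inquiring"   # actively ask along when the user is happy
    # gender (and age) can refine the strategy further, e.g. via a suffix
    return f"neutral_{user.get('gender', 'any')}"
```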
The interactive content matching module matches the dialogue script file corresponding to the meaning of the input content in combination with acquired real-time environment information. The script files are formed by determining the star's speaking style from the star personal attributes defined for the interaction system, generating response content for different input content through machine learning, and saving the responses as files.
The star's personal attributes may include the real star's personality traits, temperament type, social evaluation and the like. For example, the personality traits may include one or more of extroverted, lively, enthusiastic, talkative, alert, adaptable, organized, quick-reacting, sentimental, impulsive, irritable, rigid, bold and so on; the temperament types include sanguine, phlegmatic, melancholic and choleric; the social evaluations may include one or more of humorous, observant, self-confident, life-loving, impatient, lazy, well-organized, good at managing money, taciturn, stubborn, approachable, emotionally changeable and so on. In practice, labels are attached when the scripts are compiled, and personal attributes can be preset for a given star: suppose a star is a well-known scholar whose attributes include humorous, alert, erudite, sanguine, sociable, optimistic and approachable; the star's script files are then formed through machine learning as files carrying these feature labels, and during interactive content matching the labeled scripts are selected according to the input content. A script file may also carry further directional labels: some scripts may target female users, some male users, some different age groups, some different moods, and so on, and the final output is further selected in combination with the other information recognized from the input content (e.g. the user is female, aged 21, with a given recognized mood).
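A sketch of the label-based selection under an assumed script-file structure, where each script carries persona labels plus directional labels and the script with the largest label overlap wins:

```python
# Illustrative script store; the label vocabulary is an assumption.
SCRIPTS = [
    {"text": "script A ...", "labels": {"humorous", "female_user", "mood_happy"}},
    {"text": "script B ...", "labels": {"caring", "female_user", "mood_sad"}},
    {"text": "script C ...", "labels": {"humorous", "male_user"}},
]

def match_script(wanted: set[str]) -> dict:
    """Return the script whose label set best overlaps the wanted labels."""
    return max(SCRIPTS, key=lambda s: len(s["labels"] & wanted))

# e.g. a female user in a happy mood, persona label "humorous":
# match_script({"female_user", "mood_happy", "humorous"}) -> script A
```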
Of course, this embodiment only illustrates a few aspects of defining the star's personal attributes; other aspects may also be included and are not repeated here.
According to the defined star personal attributes, response strategies corresponding to the different attributes are formed through machine learning, and the response content for different input content is saved as files.
The interaction system can also acquire real-time environment information, including the user's location, the weather, the user's step count for the day, trending news about the star and so on, and adjust the response content according to this environment information.
The interaction system further comprises a cloud server for receiving the user's human-computer interaction records and building a user portrait through big-data analysis and deep learning; by building the user portrait, the response strategy is adjusted to the user's characteristics so that each interaction yields different response content.
The interaction system in this embodiment uses multiple media forms for input and output, making human-computer interaction more convenient and complete and the interaction robot more anthropomorphic, so the user has a more realistic experience of interacting with the star. If the user sets the interaction system to a certain entertainment star, the system configures its interaction logic and response strategy according to that star's personal characteristic attributes. The star's trending news obtained from real-time environment information and the user portrait kept on the cloud server make responses more lifelike: for example, if the user calls by video chat at night, the system answers by video chat, and if the star has recently been busy with a particular project, the response content can revolve around that project; when the user portrait shows an upcoming exam, the late-night response can show concern for the user and advise rest.
In addition, the cloud server is also used to contact third parties to give early warning of user behavior: for example, when the user is exposed to dangerous behavior, shows suicidal tendencies, or is otherwise at serious personal risk, the cloud server can promptly notify a guardian, relatives and friends, or call the police.
The output module outputs the script content matched to the input content, selecting the corresponding media form according to the media form of the input, including at least text, pictures, voice or video.
The output module is also used to proactively call the user. The proactive-call function can be enabled at the user's choice, or driven by the user habits established in the user portrait on the cloud server: for example, if the user habitually listens to a broadcast at a fixed time, the system can remind the user of the broadcast time in advance or discuss the broadcast content; if the interaction history captures a time by which the user expects to finish something, the system can remind the user as that time approaches; and so on.
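A sketch of habit-driven proactive calling, under an assumed portrait structure in which each mined habit carries a name and its next occurrence time:

```python
import datetime

def due_reminders(portrait: dict, now: datetime.datetime) -> list[str]:
    """Habit events starting within the next ten minutes (illustrative)."""
    lead = datetime.timedelta(minutes=10)
    return [
        habit["name"]
        for habit in portrait.get("habits", [])
        if now <= habit["next_time"] <= now + lead
    ]
```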
The machine-learning-based cross-media star emotional companionship interaction system can be deployed on a cloud server platform, or its basic program can be deployed on the local user terminal. Cloud or semi-cloud deployment allows online information to be acquired immediately and draws on the cloud platform's strong back-end computing power. With cloud deployment, the user accesses the system through a web page, a lightweight client or a telephone line, the input content is uploaded, and the server processes it; with semi-cloud deployment, the user installs software, the server only synchronizes script files and real-time environment information and receives reported conversation records, and the conversation itself is completed by the local client.
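The three modes might be captured by deployment profiles such as the sketch below; the keys and values are illustrative assumptions about what each mode delegates to the server versus the local client:

```python
# Illustrative deployment profiles for the three modes described above.
DEPLOYMENTS = {
    "cloud":      {"dialogue": "server", "scripts": "server", "online_info": True},
    "semi_cloud": {"dialogue": "client", "scripts": "synced", "online_info": True},
    "offline":    {"dialogue": "client", "scripts": "local",  "online_info": False},
}
```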
Of course, the basic program can also be placed entirely on the local user terminal for offline use: the system is fully installed on the client, and real-time environment information and conversation-record reporting are then unavailable.
It should be understood that the above examples are given only for clarity of illustration and do not limit the embodiments. Other variations and modifications will be apparent to persons skilled in the art in light of the above description; it is neither necessary nor possible to exhaust all embodiments here, and obvious variations or modifications derived therefrom remain within the scope of the invention.

Claims (10)

1. A machine-learning-based cross-media star emotional companionship interaction system, characterized by comprising:
an input content recognition module for recognizing input content in multiple media forms and reading its meaning;
an interactive content matching module for matching the dialogue script files corresponding to the meaning of the input content;
a star real-person likeness module, comprising a voice timbre model and an avatar model, for simulating the voice timbre and visual appearance of the real star;
and an output module for outputting the script content matched to the input content.
2. The machine-learning-based cross-media star emotional companionship interaction system of claim 1, wherein: the dialogue script files are formed by determining the star's speaking style from the star personal attributes defined for the interaction system, generating response content for different input content through machine learning, and saving the responses as files.
3. The machine-learning-based cross-media star emotional companionship interaction system of claim 1, wherein: the media forms of the input content include at least text, emoticons, pictures, voice or video.
4. The machine-learning-based cross-media star emotional companionship interaction system of claim 3, wherein: the video includes chat video captured in real time by the user terminal.
5. The machine-learning-based cross-media star emotional companionship interaction system of any of claims 1-4, wherein the input content recognition module at least:
performs near-sentence matching and emotion recognition on text content using natural language processing, and summarizes the topic;
or judges the meaning of emoticon content using gallery retrieval;
or obtains the user's gender, age, emotional features and picture content information from picture content using face recognition and image recognition;
or converts voice content into text using voice noise reduction, voice activity detection and speech recognition, and then recognizes the resulting text and reads its meaning;
or decodes video content into image frames and voice frames to form pictures and text, and then recognizes the picture and text content and reads their meanings.
6. The machine-learning-based cross-media star emotional companionship interaction system of any of claims 1-4, wherein: the interactive content matching module matches the dialogue script file corresponding to the meaning of the input content in combination with acquired real-time environment information.
7. The machine-learning-based cross-media star emotional companionship interaction system of any of claims 1-4, wherein: the output module selects the corresponding media form for output according to the media form of the input content, including at least text, pictures, voice or video.
8. The machine-learning-based cross-media star emotional companionship interaction system of any of claims 1-4, wherein: the system further comprises a cloud server for receiving the user's human-computer interaction records and building a user portrait through big-data analysis and deep learning.
9. The machine-learning-based cross-media star emotional companionship interaction system of claim 8, wherein: the cloud server is also used to contact third parties to give early warning of user behavior.
10. The machine-learning-based cross-media star emotional companionship interaction system of claim 8, wherein: the output module is also used to proactively call the user.
CN201911272671.2A 2019-12-12 2019-12-12 Cross-media star emotion accompany interaction system based on machine learning Pending CN111063346A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911272671.2A CN111063346A (en) 2019-12-12 2019-12-12 Cross-media star emotion accompany interaction system based on machine learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911272671.2A CN111063346A (en) 2019-12-12 2019-12-12 Cross-media star emotion accompany interaction system based on machine learning

Publications (1)

Publication Number Publication Date
CN111063346A true CN111063346A (en) 2020-04-24

Family

ID=70298989

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911272671.2A Pending CN111063346A (en) 2019-12-12 2019-12-12 Cross-media star emotion accompany interaction system based on machine learning

Country Status (1)

Country Link
CN (1) CN111063346A (en)

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018006362A1 (en) * 2016-07-07 2018-01-11 深圳狗尾草智能科技有限公司 Audio input device of virtual robot, and virtual robot apparatus
WO2018006370A1 (en) * 2016-07-07 2018-01-11 深圳狗尾草智能科技有限公司 Interaction method and system for virtual 3d robot, and robot
CN107633719A (en) * 2017-09-15 2018-01-26 张占龙 Anthropomorphic representation artificial intelligence tutoring system and method based on multilingual man-machine interaction
CN107797663A (en) * 2017-10-26 2018-03-13 北京光年无限科技有限公司 Multi-modal interaction processing method and system based on visual human
CN107894833A (en) * 2017-10-26 2018-04-10 北京光年无限科技有限公司 Multi-modal interaction processing method and system based on visual human
CN109032328A (en) * 2018-05-28 2018-12-18 北京光年无限科技有限公司 A kind of exchange method and system based on visual human
CN109917917A (en) * 2019-03-06 2019-06-21 南京七奇智能科技有限公司 A kind of visual human's interactive software bus system and its implementation
CN109767753A (en) * 2019-03-29 2019-05-17 北京赢和博雅文化发展有限公司 Star robot interactive approach and system
CN110370275A (en) * 2019-07-01 2019-10-25 夏博洋 Mood chat robots based on Expression Recognition
CN110362666A (en) * 2019-07-09 2019-10-22 邬欣霖 Using the interaction processing method of virtual portrait, device, storage medium and equipment

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111564202A (en) * 2020-04-30 2020-08-21 深圳市镜象科技有限公司 Psychological counseling method based on man-machine conversation, psychological counseling terminal and storage medium
CN112533069A (en) * 2020-11-25 2021-03-19 拉扎斯网络科技(上海)有限公司 Processing method and device for synthesizing multimedia data
CN113067950A (en) * 2021-03-17 2021-07-02 杭州元声象素科技有限公司 Intelligent call platform

Similar Documents

Publication Publication Date Title
US20220254343A1 (en) System and method for intelligent initiation of a man-machine dialogue based on multi-modal sensory inputs
US20220319517A1 (en) Electronic personal interactive device
US11430439B2 (en) System and method for providing assistance in a live conversation
US10679013B2 (en) IoT-based call assistant device
CN105345818B (en) Band is in a bad mood and the 3D video interactives robot of expression module
CN105320726B (en) Reduce the demand to manual beginning/end point and triggering phrase
CN107870977A (en) Chat robots output is formed based on User Status
US20190206407A1 (en) System and method for personalizing dialogue based on user's appearances
CN112075075A (en) Computerized intelligent assistant for meetings
CN110427472A (en) The matched method, apparatus of intelligent customer service, terminal device and storage medium
US20030167167A1 (en) Intelligent personal assistants
CN107704169B (en) Virtual human state management method and system
CN111063346A (en) Cross-media star emotion accompany interaction system based on machine learning
CN109101663A (en) A kind of robot conversational system Internet-based
EP1490864A2 (en) Intelligent personal assistants
CN114566187B (en) Method of operating a system comprising an electronic device, electronic device and system thereof
EP3939039A1 (en) Providing emotion management assistance
CN109643550A (en) Talk with robot and conversational system and dialogue program
US11381675B2 (en) Command based interactive system and a method thereof
CN111369275A (en) Group identification and description method, coordination device and computer readable storage medium
US20210407504A1 (en) Generation and operation of artificial intelligence based conversation systems
KR102293743B1 (en) AI Chatbot based Care System
Wallis A robot in the kitchen
CN206892866U (en) Intelligent dialogue device with scenario analysis function
CN110390938A (en) Method of speech processing, device and terminal device based on vocal print

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20200424)