CN110956142A - Intelligent interactive training system - Google Patents

Intelligent interactive training system

Info

Publication number
CN110956142A
Authority
CN
China
Prior art keywords
engine
training
natural language
simulated
intelligent
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911221001.8A
Other languages
Chinese (zh)
Inventor
朱丙坤
林砺
张建辉
卢凌云
沈海先
何雪海
毛国庆
覃亚芬
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Pacific Insurance Group Co Ltd CPIC
Original Assignee
China Pacific Insurance Group Co Ltd CPIC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Pacific Insurance Group Co Ltd (CPIC)
Priority to CN201911221001.8A
Publication of CN110956142A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/174Facial expression recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/044Recurrent networks, e.g. Hopfield networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/06Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063Operations research, analysis or management
    • G06Q10/0639Performance analysis of employees; Performance analysis of enterprise or organisation operations
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/08Insurance
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10Services
    • G06Q50/20Education
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168Feature extraction; Face representation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172Classification, e.g. identification

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Resources & Organizations (AREA)
  • Strategic Management (AREA)
  • Economics (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Molecular Biology (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Evolutionary Computation (AREA)
  • Data Mining & Analysis (AREA)
  • Human Computer Interaction (AREA)
  • Computational Linguistics (AREA)
  • General Business, Economics & Management (AREA)
  • Biophysics (AREA)
  • Development Economics (AREA)
  • Educational Administration (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • Tourism & Hospitality (AREA)
  • Marketing (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Accounting & Taxation (AREA)
  • Finance (AREA)
  • Game Theory and Decision Science (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Educational Technology (AREA)
  • Primary Health Care (AREA)
  • Technology Law (AREA)

Abstract

The invention discloses an intelligent interactive training system, which comprises: a virtual scene engine (1) for simulating a business scenario and presenting the simulated scenario; a natural language processing engine (2) for processing natural language information received in the simulated scenario; a multi-modal emotion analysis engine (3) for modeling and analyzing training input information received in the simulated scenario; and a multi-dimensional intelligent analysis engine (4) which communicates with the natural language processing engine and the multi-modal emotion analysis engine and analyzes the training content based on their output results. Using natural language processing and computer vision, the invention evaluates the trainee's expressions, micro-expressions, voiceprint, semantics, and the like, scores across multiple dimensions, and produces an overall training result. The invention is convenient to use, simple to operate, and of considerable commercial value.

Description

Intelligent interactive training system
Technical Field
The invention belongs to the field of computer applications, and particularly relates to an intelligent interactive training system.
Background
In the current insurance industry, insurance salespeople often need to introduce the advantages, disadvantages, and cost-effectiveness of different types of insurance to customers. This requires them to master a large body of professional insurance knowledge, which takes considerable time to learn before taking up the post, so salespeople are usually trained before starting work.
Existing insurance staff training mostly relies on fixed times and places where a lecturer delivers content to trainees. The lecturer must spend considerable time and energy preparing courses and teaching the related knowledge, while the salespeople must master a great deal of insurance knowledge in a short time; this burdens both sides, demands substantial manpower and material resources, offers only a single form of interaction, and makes the training effect difficult to track. Moreover, there is no dedicated means of measuring how well a salesperson has actually mastered the relevant insurance knowledge: the assessment can only be made through tests and lecturers' observation, which cannot accurately gauge the salesperson's service capability across multiple dimensions.
At present, the prior art offers no technical solution to these problems; in particular, an intelligent interactive training system is lacking.
Disclosure of Invention
Aiming at the technical defects in the prior art, the invention aims to provide an intelligent interactive training system, which comprises:
a virtual scene engine for simulating a business scenario and presenting the simulated scenario;
a natural language processing engine for processing natural language information received in the simulated scenario;
a multi-modal emotion analysis engine for modeling and analyzing training input information received in the simulated scenario;
a multi-dimensional intelligent analysis engine which communicates with the natural language processing engine and the multi-modal emotion analysis engine and analyzes the training content based on their output results;
wherein the natural language processing engine, the multi-modal emotion analysis engine, and the multi-dimensional intelligent analysis engine are each connected to and communicate with the virtual scene engine.
Preferably, the system further comprises:
a deep training mining engine which communicates with the multi-dimensional intelligent analysis engine and, based on the output result of the multi-dimensional intelligent analysis engine, triggers new simulated training content in the simulated business scenario, wherein the new simulated training content is adapted to that output result.
Preferably, the virtual scene engine comprises at least a VR generating device for simulating a scenario according to at least the age, sex, occupation, and family structure of a simulated client, and outputting the simulated business scenario.
Preferably, the multi-modal emotion analysis engine comprises at least:
at least one capturing device for capturing character micro-expressions and/or voice input information;
and an emotion analysis engine for performing analysis and modeling at least according to the character micro-expressions and/or voice input information.
Preferably, the emotion analysis engine performs analysis and modeling through the following algorithm:
a. performing face detection and key point calibration based on the deep-learning MTCNN algorithm, and performing face alignment based on the calibrated key points;
b. inputting a standard face data set into a convolutional neural network for training, and constructing a deep expression recognition model;
c. inputting a standard facial image into the deep expression recognition model, determining the probability that the image belongs to each expression using a softmax function, and taking the expression with the largest probability as the final recognition result.
Preferably, when the standard face data set is input into the convolutional neural network for training, an h-swish activation function is adopted to improve the accuracy of the network, with the formula:

h-swish(x) = x · ReLU6(x + 3) / 6, where ReLU6(x) = min(max(x, 0), 6).
Preferably, the natural language processing engine comprises at least:
an intention recognition device for predicting intention after modeling the natural language;
and a dialogue management device for matching the optimal answer to the corresponding semantics of the natural language.
Preferably, the dialogue management device is further configured to conduct the subsequent dialogue based on the optimal answer.
Preferably, the intention recognition device trains and generates a word vector model using the Word2vector algorithm, and predicts intention using the Bi-LSTM algorithm.
Preferably, the multi-dimensional intelligent analysis engine analyzes the training content through the following algorithm:
i: determining one or more items of single-modality data;
ii: performing multi-modal fusion modeling based on a multi-modal deep Boltzmann machine to determine a multi-modal fusion training model;
iii: inputting the one or more items of single-modality data into the multi-modal fusion training model to determine one or more evaluation results.
Preferably, the single-modality data includes at least:
- intonation, semantics, and facial micro-expressions;
- the sequential logic of sentences and the completeness of knowledge-point coverage;
- speech rate and proficiency in the use of auxiliary words;
- similarity to standard sentences;
- facial movements and body posture.
The invention aims to provide an intelligent interactive training system, which comprises: a virtual scene engine for simulating a business scenario and presenting the simulated scenario; a natural language processing engine for processing natural language information received in the simulated scenario; a multi-modal emotion analysis engine for modeling and analyzing training input information received in the simulated scenario; and a multi-dimensional intelligent analysis engine which communicates with the natural language processing engine and the multi-modal emotion analysis engine and analyzes the training content based on their output results, the three engines each being connected to and communicating with the virtual scene engine. The invention empowers the training scenario with artificial intelligence technology, replacing simple repetitive training and enabling salespeople to rehearse simulated business scenarios anytime and anywhere through a convenient app or mini-program tool. It resolves many of the pain points of traditional training, gradually frees lecturers from tedious training work, lets trainees learn through enjoyment and be trained in a relaxed way, and improves the training effect. Using natural language processing and computer vision, the system evaluates the trainee's expressions, micro-expressions, voiceprint, semantics, and the like, scores across multiple dimensions, and produces an overall training result. The invention is convenient to use, simple to operate, and of considerable commercial value.
Drawings
Other features, objects and advantages of the invention will become more apparent upon reading the following detailed description of non-limiting embodiments with reference to the accompanying drawings:
FIG. 1 is a schematic diagram of the module connections of an intelligent interactive training system according to an embodiment of the present invention;
FIG. 2 is a flowchart of the analysis and modeling performed by the emotion analysis engine according to the first embodiment of the present invention; and
FIG. 3 is a flowchart of the training-content analysis process of the multi-dimensional intelligent analysis engine according to the second embodiment of the present invention.
Detailed Description
To present the technical solution of the invention more clearly, the invention is further described below with reference to the accompanying drawings.
Fig. 1 is a schematic diagram of the module connections of an intelligent interactive training system according to an embodiment of the present invention. Those skilled in the art will understand that, in view of the deficiencies of the prior art, the invention discloses an intelligent interactive training system comprising a virtual scene engine for simulating a business scenario and presenting the simulated scenario. The invention is mainly used for training insurance salespeople, but the technical solution described herein is not limited to the insurance industry: other training and teaching institutions may adopt it as well, and details are not repeated here.
Further, the intelligent interactive training system comprises a natural language processing engine for processing the natural language information received in the simulated business scenario. The natural language processing engine mainly receives and processes the user's dialogue content; in the invention, this processing is chiefly performed through semantic algorithm matching, intention prediction, and the like, which are further described in the detailed embodiments below.
Further, the intelligent interactive training system comprises a multi-modal emotion analysis engine for modeling and analyzing the training input information received in the simulated business scenario. As those skilled in the art understand, the training input information includes not only the user's dialogue language but also the user's behavior, specifically the user's micro-expressions, facial movements, expressions, posture, and the like; the multi-modal emotion analysis engine can feed the user's facial features into the model for analysis and obtain the analysis result.
Furthermore, the intelligent interactive training system comprises a multi-dimensional intelligent analysis engine which communicates with the natural language processing engine and the multi-modal emotion analysis engine and analyzes the training content based on their output results. The multi-dimensional intelligent analysis engine comprehensively analyzes the user's multi-dimensional information, gives specific scores and conclusions for each dimension together with the points the user should note and improve next time, and can even create a further training session targeted at the user's specific situation for renewed, focused training.
In the present invention, the user is mainly analyzed in five dimensions; that is, the single-modality data at least includes intonation, semantics, and facial micro-expressions; the sequential logic of sentences and the completeness of knowledge-point coverage; speech rate and proficiency in the use of auxiliary words; similarity to standard sentences; and facial movements and body posture. These are further detailed in the specific embodiments described later and are not repeated here.
Further, with reference to the foregoing embodiment, the natural language processing engine, the multi-modal emotion analysis engine, and the multi-dimensional intelligent analysis engine are all connected to and communicate with the virtual scene engine; that is, all natural language processing, multi-modal emotion analysis, and multi-dimensional intelligent analysis are implemented on the basis of the virtual scene. The user conducts a simulated conversation in the virtual scene, from which features such as sentences, speech rate, specific wording, and facial expressions are obtained and then analyzed.
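To make this wiring concrete, the following is a minimal structural sketch of how the engines could be connected in code; all class and method names are illustrative assumptions, not the patent's implementation:

```python
# Minimal sketch of Fig. 1: the virtual scene engine routes each trainee turn
# to the NLP and emotion engines, then hands their outputs to the analysis
# engine. All names and payloads here are placeholders.
from dataclasses import dataclass


@dataclass
class TurnResult:
    semantics: dict   # output of the natural language processing engine
    emotion: dict     # output of the multi-modal emotion analysis engine


class NaturalLanguageProcessingEngine:
    def process(self, utterance: str) -> dict:
        return {"intent": "ask_price", "answer": "..."}  # placeholder


class MultiModalEmotionAnalysisEngine:
    def analyze(self, frame, audio) -> dict:
        return {"expression": "neutral", "intonation": 0.6}  # placeholder


class MultiDimensionalAnalysisEngine:
    def evaluate(self, result: TurnResult) -> dict:
        return {"score": 0.0, "notes": []}  # placeholder


class VirtualSceneEngine:
    """Hosts the simulated business scenario and routes each trainee turn
    to the three attached engines."""

    def __init__(self):
        self.nlp = NaturalLanguageProcessingEngine()
        self.emotion = MultiModalEmotionAnalysisEngine()
        self.analysis = MultiDimensionalAnalysisEngine()

    def on_trainee_turn(self, utterance: str, frame, audio) -> dict:
        result = TurnResult(
            semantics=self.nlp.process(utterance),
            emotion=self.emotion.analyze(frame, audio),
        )
        return self.analysis.evaluate(result)
```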
Further, in a preferred embodiment, the system comprises a deep training mining engine which communicates with the multi-dimensional intelligent analysis engine and, based on its output result, triggers new simulated training content in the simulated business scenario, the new content being adapted to that output result. In such an embodiment, the deep training mining engine mainly mines the "weak spots" in a salesperson's service ability from the salesperson's profile and historical training results, and uses a machine learning algorithm to intelligently recommend training tasks that strengthen those weak abilities, helping the salesperson improve in a targeted way and accelerating their growth.
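As one illustration of how such a recommendation could work, the following hypothetical sketch averages a trainee's historical scores per dimension and recommends tasks for the weakest ones; the dimension names and task catalogue are assumptions, not the patent's:

```python
# Hypothetical "weak spot" mining: rank historical dimension scores and
# recommend the tasks that target the lowest ones.
from statistics import mean

TASK_CATALOGUE = {
    "proficiency": "speech-rate and filler-word drill",
    "integrity": "knowledge-point coverage drill",
    "accuracy": "standard-answer restatement drill",
    "concentration": "posture and eye-contact drill",
    "emotion": "tone and expression control drill",
}

def recommend_tasks(history: list[dict[str, float]], k: int = 2) -> list[str]:
    """history: one dict of dimension scores (0..1) per completed drill."""
    averages = {dim: mean(h[dim] for h in history) for dim in TASK_CATALOGUE}
    weakest = sorted(averages, key=averages.get)[:k]
    return [TASK_CATALOGUE[dim] for dim in weakest]

# e.g. recommend_tasks([{"proficiency": 0.4, "integrity": 0.8, "accuracy": 0.7,
#                        "concentration": 0.9, "emotion": 0.5}])
# -> ["speech-rate and filler-word drill", "tone and expression control drill"]
```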
Preferably, the virtual scene engine comprises at least a VR generating device for simulating a scenario according to at least the age, sex, occupation, and family structure of a simulated client and outputting the simulated business scenario. Those skilled in the art understand that the invention uses VR and related technologies to simulate the business scenario along dimensions such as the age, sex, occupation, and family structure of the client to be visited, so that the simulated scenario is as close as possible to the real business scenario and the salesperson can enter a near-real setting in advance for intelligent training. The specific process is as follows: a drill scenario is set up, with the trainee selecting attributes such as character type, customer profile (age, gender, family structure, income, and so on), customer mood, meeting place, and training task, after which the business scenario is configured; during the drill the trainee interacts by voice in real time while changes in voice, facial expression, and semantic data are collected and interactive evaluation suggestions are given in real time; a help function is provided during the drill so the trainee can practice and learn at the same time and obtain correct answers promptly; after the drill ends, a comprehensive score and rating across dimensions such as voice, expression, and semantics is given according to the information collected during the drill, together with training advice; if the trainee does not pass, the system indicates whether to repeat the drill.
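The scenario attributes listed above lend themselves to a simple structured encoding; the sketch below is one illustrative way to represent them, with all field names and sample values assumed:

```python
# Illustrative encoding of the drill-scenario attributes (customer profile,
# mood, meeting place, training task).
from dataclasses import dataclass

@dataclass
class CustomerProfile:
    age: int
    gender: str
    occupation: str
    family_structure: str
    income: str

@dataclass
class DrillScenario:
    character_type: str
    customer: CustomerProfile
    customer_mood: str   # e.g. "impatient", "curious"
    meeting_place: str   # e.g. "customer's home", "office"
    training_task: str   # e.g. "introduce a term-life product"

scenario = DrillScenario(
    character_type="price-sensitive customer",
    customer=CustomerProfile(35, "female", "teacher", "married, one child", "middle"),
    customer_mood="curious",
    meeting_place="office",
    training_task="introduce a term-life product",
)
```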
Further, the multi-modal emotion analysis engine comprises at least one capturing device for capturing the character's micro-expressions and/or voice input information. In such an embodiment, the capturing device may be a camera, video recorder, microphone, or the like for capturing the character's expressions and voice; more specifically, it can capture real-time changes in the character's expressions and voice.
The multi-modal emotion analysis engine further comprises an emotion analysis engine for performing analysis and modeling according to the character's micro-expressions and/or voice input information. In such an embodiment, the character's micro-expressions are input into the deep expression recognition model and the voice information into the speech processing model, yielding output results corresponding to the micro-expression and voice inputs.
Fig. 2 shows the specific flow of the analysis and modeling performed by the emotion analysis engine according to the first embodiment of the present invention; specifically, it comprises the following steps:
First, the process proceeds to step S101: face detection and key point calibration are performed based on the deep-learning MTCNN algorithm, and face alignment is performed based on the calibrated key points. Those skilled in the art understand that step S101 actually comprises two steps. First, the MTCNN (Multi-task Cascaded Convolutional Networks) algorithm performs face detection and key point calibration, handling face region detection and facial key point detection together in a cascaded overall framework. The whole can be divided into a three-stage network structure of P-Net, R-Net, and O-Net. P-Net, short for Proposal Network, is essentially a fully convolutional network: it performs initial feature extraction and bounding-box calibration on the image pyramid constructed in the previous step, adjusts candidate windows through Bounding-Box Regression, and filters out most windows through NMS (non-maximum suppression). R-Net, short for Refine Network, is essentially a convolutional neural network; compared with the first-stage P-Net it adds a fully connected layer, so the screening of input data is stricter. After a picture passes through P-Net, many candidate windows remain; all of them are fed into R-Net, which filters out a large number of poor candidates and then applies Bounding-Box Regression and NMS to the remaining ones to further refine the predictions. O-Net, short for Output Network, is essentially a more complex convolutional neural network that adds one more convolutional layer compared with R-Net. O-Net differs from R-Net in that this stage identifies the facial region with more supervision and regresses the facial feature points, finally outputting five facial landmarks. Second, face alignment is applied to the key points of the detected face; specifically, five points, namely the two eyes, the two mouth corners, and the nose, are used for alignment.
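Step S101 could be realized, for example, with the facenet-pytorch implementation of MTCNN plus a five-point similarity transform; the library choice and the 112x112 landmark template below are assumptions commonly used in practice, not specified by the patent:

```python
# Hedged sketch of detection (P-Net/R-Net/O-Net cascade via facenet-pytorch)
# and 5-point alignment with a similarity transform.
import cv2
import numpy as np
from PIL import Image
from facenet_pytorch import MTCNN

# Reference positions of [left eye, right eye, nose, left mouth, right mouth]
# for a 112x112 crop (a widely used template, assumed here).
REF_5PTS = np.array([[38.2946, 51.6963], [73.5318, 51.5014],
                     [56.0252, 71.7366], [41.5493, 92.3655],
                     [70.7299, 92.2041]], dtype=np.float32)

detector = MTCNN(keep_all=True)  # runs the three-stage cascade

def detect_and_align(path: str) -> list[np.ndarray]:
    img = Image.open(path).convert("RGB")
    boxes, _probs, points = detector.detect(img, landmarks=True)
    faces = []
    if boxes is None:
        return faces
    bgr = cv2.cvtColor(np.array(img), cv2.COLOR_RGB2BGR)
    for pts in points:  # pts: (5, 2) landmarks for one face
        M, _ = cv2.estimateAffinePartial2D(pts.astype(np.float32), REF_5PTS)
        faces.append(cv2.warpAffine(bgr, M, (112, 112)))
    return faces
```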
The process then proceeds to step S102: a standard face data set is input into a well-defined convolutional neural network for training, and a deep expression recognition model is constructed. Using the h-swish activation function can effectively improve the accuracy of the network; its formula is:

h-swish(x) = x · ReLU6(x + 3) / 6, where ReLU6(x) = min(max(x, 0), 6).
Finally, the process proceeds to step S103: a standard facial image is input into the deep expression recognition model, the probability that the image belongs to each expression is determined using a softmax function, and the expression with the largest probability is taken as the final recognition result. The softmax function, also called the normalized exponential function, is a generalization of the logistic function that compresses a K-dimensional real vector into a new K-dimensional real vector whose entries lie in the range [0, 1] and sum to 1. These are existing techniques and are not described further here.
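Steps S102 and S103 can be illustrated with a minimal PyTorch sketch: a small convolutional network using the h-swish activation exactly as defined above, followed by softmax inference; the layer sizes are placeholders, since the patent does not fix an architecture:

```python
# Minimal expression-recognition sketch: h-swish CNN + softmax inference.
import torch
import torch.nn as nn
import torch.nn.functional as F

class HSwish(nn.Module):
    def forward(self, x):
        return x * F.relu6(x + 3.0) / 6.0  # h-swish(x) = x * ReLU6(x+3)/6

class ExpressionNet(nn.Module):
    def __init__(self, num_expressions: int = 7):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), HSwish(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), HSwish(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(32, num_expressions)

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))

model = ExpressionNet()
face = torch.randn(1, 3, 112, 112)      # an aligned face crop
probs = F.softmax(model(face), dim=1)   # probability per expression
prediction = probs.argmax(dim=1)        # expression with largest probability
```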
Further, the natural language processing engine comprises at least an intention recognition device for predicting intention after modeling the natural language, and a dialogue management device for matching the optimal answer to the corresponding semantics of the natural language. Those skilled in the art understand that the invention obtains the salesperson's input text through ASR technology, uses natural language technology to match the answer corresponding to the input semantics, and performs intention recognition and dialogue management. The intention recognition model is a multi-class classification model built with the Word2vector and Bi-LSTM algorithms: the Word2vector algorithm trains and generates a word vector model, and the Bi-LSTM algorithm then predicts the intention. There are also several dialogue models: simple questions are answered with FAQ question-answer pairs, while complex problems require collecting information over multiple dialogue turns and must first be sorted and classified. A user's question passes through the intention recognition module; if it belongs to a multi-turn dialogue scenario, multiple dialogue turns are triggered according to the dialogue topic, script, and flow, and the dialogue process is managed by a dialogue control module. In this process, a CRF algorithm is used for named entity recognition and entity extraction. Further, the dialogue management device is also used to conduct the subsequent dialogue based on the optimal answer.
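The described intention recognition pipeline can be sketched as follows, assuming gensim for Word2vector training and PyTorch for the Bi-LSTM; the corpus, dimensions, and intent set are placeholders:

```python
# Hedged sketch: Word2Vec word vectors feeding a Bi-LSTM intent classifier.
import numpy as np
import torch
import torch.nn as nn
from gensim.models import Word2Vec

corpus = [["how", "much", "does", "this", "policy", "cost"],
          ["what", "does", "the", "policy", "cover"]]
w2v = Word2Vec(corpus, vector_size=100, window=5, min_count=1)

class BiLSTMIntent(nn.Module):
    def __init__(self, embed_dim=100, hidden=64, num_intents=5):
        super().__init__()
        self.lstm = nn.LSTM(embed_dim, hidden, batch_first=True,
                            bidirectional=True)
        self.out = nn.Linear(2 * hidden, num_intents)

    def forward(self, x):          # x: (batch, seq_len, embed_dim)
        h, _ = self.lstm(x)
        return self.out(h[:, -1])  # classify from the last time step

def embed(tokens):
    vecs = [w2v.wv[t] for t in tokens if t in w2v.wv]
    return torch.from_numpy(np.stack(vecs)).unsqueeze(0)  # (1, seq_len, 100)

model = BiLSTMIntent()
logits = model(embed(["how", "much", "does", "this", "policy", "cost"]))
intent = logits.softmax(dim=1).argmax(dim=1)
```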
Fig. 3 shows the specific flow of the training-content analysis process implemented by the multi-dimensional intelligent analysis engine according to the second embodiment of the present invention; those skilled in the art understand that it comprises the following steps:
First, the process proceeds to step S201: one or more items of single-modality data are determined. In such an embodiment, five kinds of single-modality data are used. The first comprises intonation, semantics, and facial micro-expressions, i.e. emotion analysis: during the drill, image data such as the salesperson's facial expression and head posture are collected, and changes in the salesperson's emotion and concentration are obtained through deep learning. The second is the sequential logic of sentences and the completeness of knowledge-point coverage, i.e. the integrity of the logical structure, assessed from the ordering of the answers and the coverage of knowledge points. The third is speech rate and proficiency in the use of auxiliary words, i.e. proficiency, identified from the speech rate and auxiliary words. The fourth is similarity to standard sentences, i.e. accuracy, the similarity between the answers and the standard answers. The fifth is the facial movements and body posture of the trainee, i.e. concentration. Furthermore, the proficiency (speech rate, auxiliary words, and so on), emotion analysis (intonation, semantics, facial micro-expressions), integrity (knowledge-point coverage), accuracy (similarity between answers and standard answers), and concentration (changes in the salesperson's head posture) collected during the drill are considered comprehensively, scores are given across these dimensions, and the overall training result is produced, as sketched below.
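One illustrative way to combine the five dimensions into an overall score is a weighted sum; the weights below are assumptions, as the patent does not specify them:

```python
# Illustrative aggregation of the five dimensions into an overall score.
DIM_WEIGHTS = {
    "emotion": 0.2,        # intonation, semantics, facial micro-expressions
    "integrity": 0.25,     # sentence logic and knowledge-point coverage
    "proficiency": 0.2,    # speech rate, auxiliary-word usage
    "accuracy": 0.25,      # similarity to standard sentences
    "concentration": 0.1,  # facial movements and body posture
}

def overall_score(dim_scores: dict[str, float]) -> float:
    """dim_scores: per-dimension scores in [0, 1]."""
    return sum(DIM_WEIGHTS[d] * dim_scores[d] for d in DIM_WEIGHTS)

# e.g. overall_score({"emotion": 0.7, "integrity": 0.8, "proficiency": 0.6,
#                     "accuracy": 0.9, "concentration": 0.75}) -> 0.76
```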
The process then proceeds to step S202: multi-modal fusion modeling is performed based on a multi-modal deep Boltzmann machine to determine a multi-modal fusion training model. Finally, in step S203, the one or more items of single-modality data are input into the multi-modal fusion training model to determine one or more evaluation results. Those skilled in the art understand that the data collected during training come from multiple modalities, such as the speech rate, intonation, and auxiliary-word patterns of the voice, facial micro-expressions, body posture, and semantics. Multi-modal fusion modeling is performed based on a multi-modal deep Boltzmann machine, which analyzes the training result across multiple dimensions; by combining deep Boltzmann machines over the individual modalities, it can learn a joint representation that synthesizes the input modalities. For data with some missing modalities, the model can generate the missing modality by sampling from the conditional distribution and similar methods, and can still obtain a multi-modal representation. The single-modality inputs are denoted x1, x2, ..., xn, and multi-modal fusion computes xm = f(x1, ..., xn); the function f is designed using a deep Boltzmann machine (DBM), the single-modality data are mapped into the multi-modal space, and model training yields a probability value; finally, the training result is evaluated according to this probability value.
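Since a full multi-modal deep Boltzmann machine is beyond a short example, the following simplified stand-in maps each single-modality vector into a shared space and fuses them into a joint representation xm ending in a probability value, mirroring the shape of the computation described above; it is explicitly not the DBM itself, and all dimensions are placeholders:

```python
# Simplified stand-in for xm = f(x1, ..., xn): per-modality projections,
# joint fusion, and a sigmoid head producing the evaluation probability.
import torch
import torch.nn as nn

class MultiModalFusion(nn.Module):
    def __init__(self, modality_dims: list[int], joint_dim: int = 32):
        super().__init__()
        # one projection per modality (voice, micro-expression, posture, ...)
        self.projections = nn.ModuleList(
            nn.Linear(d, joint_dim) for d in modality_dims
        )
        self.head = nn.Sequential(
            nn.Linear(joint_dim * len(modality_dims), joint_dim),
            nn.ReLU(),
            nn.Linear(joint_dim, 1),
            nn.Sigmoid(),  # probability value used for the evaluation
        )

    def forward(self, modalities: list[torch.Tensor]) -> torch.Tensor:
        xm = torch.cat([p(x) for p, x in zip(self.projections, modalities)],
                       dim=1)  # joint multi-modal representation
        return self.head(xm)

fusion = MultiModalFusion([64, 32, 16])  # e.g. voice, face, posture features
score = fusion([torch.randn(1, 64), torch.randn(1, 32), torch.randn(1, 16)])
```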
The main innovations of the invention lie in the following five points. First, building the virtual scene: VR technology is used to simulate the business scenario along dimensions such as the age, sex, occupation, and family structure of the client to be visited, so that the simulated scenario is as close as possible to the real business scenario and the salesperson can enter a near-real setting in advance for scenario training. Second, intelligent interaction and multi-modal emotion analysis: changes in expression and voice during training are modeled and analyzed through technologies such as language input and staff micro-expression recognition. Third, natural language processing: natural language understanding and natural language generation are performed with deep learning models, together with dialogue management. Fourth, combining the above embodiments, fusion modeling and evaluation over the five dimensions are carried out through the multi-dimensional intelligent analysis model. Fifth, intelligent pushing of training content.
The foregoing description of specific embodiments of the present invention has been presented. It is to be understood that the present invention is not limited to the specific embodiments described above, and that various changes and modifications may be made by one skilled in the art within the scope of the appended claims without departing from the spirit of the invention.

Claims (11)

1. An intelligent interactive training system, comprising:
a virtual scene engine (1) for simulating a business scenario and presenting the simulated scenario;
a natural language processing engine (2) for processing natural language information received in the simulated scenario;
a multi-modal emotion analysis engine (3) for modeling and analyzing training input information received in the simulated scenario;
a multi-dimensional intelligent analysis engine (4) which communicates with the natural language processing engine and the multi-modal emotion analysis engine and analyzes the training content based on their output results;
wherein the natural language processing engine (2), the multi-modal emotion analysis engine (3), and the multi-dimensional intelligent analysis engine (4) are each connected to and communicate with the virtual scene engine (1).
2. The intelligent interactive training system according to claim 1, further comprising:
a deep training mining engine (5) which communicates with the multi-dimensional intelligent analysis engine (4) and, based on the output result of the multi-dimensional intelligent analysis engine (4), triggers new simulated training content in the simulated business scenario, wherein the new simulated training content is adapted to the output result of the multi-dimensional intelligent analysis engine (4).
3. The intelligent interactive training system according to claim 1 or 2, wherein the virtual scene engine (1) comprises at least one VR generating device (11) for simulating a scenario based on at least the age, sex, occupation, and family structure of the simulated client and outputting the simulated business scenario.
4. The intelligent interactive training system according to any one of claims 1 to 3, wherein the multi-modal emotion analysis engine (3) comprises at least:
at least one capturing device (31) for capturing character micro-expressions and/or voice input information;
and an emotion analysis engine (32) for analysis and modeling based at least on the character micro-expressions and/or voice input information.
5. The intelligent interactive training system according to claim 4, wherein the emotion analysis engine performs analysis and modeling through the following algorithm:
a. performing face detection and key point calibration based on the deep-learning MTCNN algorithm, and performing face alignment based on the calibrated key points;
b. inputting a standard face data set into a convolutional neural network for training, and constructing a deep expression recognition model;
c. inputting a standard facial image into the deep expression recognition model, determining the probability that the image belongs to each expression using a softmax function, and taking the expression with the largest probability as the final recognition result.
6. The intelligent interactive training system according to claim 5, wherein the standard face data set is input into the convolutional neural network for training and an h-swish activation function is adopted to improve the accuracy of the network, the formula being:

h-swish(x) = x · ReLU6(x + 3) / 6, where ReLU6(x) = min(max(x, 0), 6).
7. The intelligent interactive training system according to any one of claims 1 to 6, wherein the natural language processing engine (2) comprises at least:
an intention recognition device (21) for predicting intention after modeling the natural language;
and a dialogue management device (22) for matching the optimal answer to the corresponding semantics of the natural language.
8. The intelligent interactive training system according to claim 7, wherein the dialogue management device is further configured to conduct the subsequent dialogue based on the optimal answer.
9. The intelligent interactive training system according to claim 7 or 8, wherein the intention recognition device trains and generates a word vector model using the Word2vector algorithm, and predicts intention using the Bi-LSTM algorithm.
10. The intelligent interactive training system according to any one of claims 1 to 9, wherein the multi-dimensional intelligent analysis engine analyzes the training content through the following algorithm:
i: determining one or more items of single-modality data;
ii: performing multi-modal fusion modeling based on a multi-modal deep Boltzmann machine to determine a multi-modal fusion training model;
iii: inputting the one or more items of single-modality data into the multi-modal fusion training model to determine one or more evaluation results.
11. The intelligent interactive training system according to claim 10, wherein the single-modality data comprises at least:
- intonation, semantics, and facial micro-expressions;
- the sequential logic of sentences and the completeness of knowledge-point coverage;
- speech rate and proficiency in the use of auxiliary words;
- similarity to standard sentences;
- facial movements and body posture.
CN201911221001.8A 2019-12-03 2019-12-03 Intelligent interactive training system Pending CN110956142A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911221001.8A CN110956142A (en) 2019-12-03 2019-12-03 Intelligent interactive training system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911221001.8A CN110956142A (en) 2019-12-03 2019-12-03 Intelligent interactive training system

Publications (1)

Publication Number Publication Date
CN110956142A true CN110956142A (en) 2020-04-03

Family

ID=69979543

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911221001.8A Pending CN110956142A (en) 2019-12-03 2019-12-03 Intelligent interactive training system

Country Status (1)

Country Link
CN (1) CN110956142A (en)

Patent Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103366618A (en) * 2013-07-18 2013-10-23 梁亚楠 Scene device for Chinese learning training based on artificial intelligence and virtual reality
CN105843381A (en) * 2016-03-18 2016-08-10 北京光年无限科技有限公司 Data processing method for realizing multi-modal interaction and multi-modal interaction system
CN106485973A (en) * 2016-10-21 2017-03-08 上海申电教育培训有限公司 Electric power skills training platform based on virtual reality technology
CN106919251A (en) * 2017-01-09 2017-07-04 重庆邮电大学 A kind of collaborative virtual learning environment natural interactive method based on multi-modal emotion recognition
CN108877336A (en) * 2018-03-26 2018-11-23 深圳市波心幻海科技有限公司 Teaching method, cloud service platform and tutoring system based on augmented reality
CN108876643A (en) * 2018-05-24 2018-11-23 北京工业大学 It is a kind of social activity plan exhibition network on acquire(Pin)Multimodal presentation method
CN108804654A (en) * 2018-06-07 2018-11-13 重庆邮电大学 A kind of collaborative virtual learning environment construction method based on intelligent answer
CN109324688A (en) * 2018-08-21 2019-02-12 北京光年无限科技有限公司 Exchange method and system based on visual human's behavioral standard
CN110175229A (en) * 2019-05-27 2019-08-27 言图科技有限公司 A kind of method and system carrying out online training based on natural language
CN110209791A (en) * 2019-06-12 2019-09-06 百融云创科技股份有限公司 It is a kind of to take turns dialogue intelligent speech interactive system and device more
CN110471531A (en) * 2019-08-14 2019-11-19 上海乂学教育科技有限公司 Multi-modal interactive system and method in virtual reality
CN110489756A (en) * 2019-08-23 2019-11-22 上海乂学教育科技有限公司 Conversational human-computer interaction spoken language evaluation system
CN110516622A (en) * 2019-08-29 2019-11-29 的卢技术有限公司 A kind of gender of occupant, age and emotional intelligence recognition methods and system

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112767199A (en) * 2020-12-25 2021-05-07 科讯嘉联信息技术有限公司 Enterprise employee training system and method
CN113627801A (en) * 2021-08-12 2021-11-09 云知声(上海)智能科技有限公司 Intelligent partner training method and device, electronic equipment and storage medium
CN114117755A (en) * 2021-11-11 2022-03-01 泰康保险集团股份有限公司 Simulation drilling method and device, computing equipment and storage medium
US11532179B1 (en) 2022-06-03 2022-12-20 Prof Jim Inc. Systems for and methods of creating a library of facial expressions
US11790697B1 (en) 2022-06-03 2023-10-17 Prof Jim Inc. Systems for and methods of creating a library of facial expressions
US11922726B2 (en) 2022-06-03 2024-03-05 Prof Jim Inc. Systems for and methods of creating a library of facial expressions

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication
Application publication date: 20200403