CN110524559B - Intelligent man-machine interaction system and method based on personnel behavior data - Google Patents


Info

Publication number
CN110524559B
Authority
CN
China
Prior art keywords
data
behavior
personnel
face
skeleton
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910815748.XA
Other languages
Chinese (zh)
Other versions
CN110524559A (en)
Inventor
甘小勇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chengdu Weizhi Technology Co., Ltd.
Original Assignee
Chengdu Weizhi Technology Co., Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chengdu Weizhi Technology Co., Ltd.
Priority to CN201910815748.XA
Publication of CN110524559A
Application granted
Publication of CN110524559B
Legal status: Active

Classifications

    • B - PERFORMING OPERATIONS; TRANSPORTING
    • B25 - HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25J - MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J11/00 - Manipulators not otherwise provided for
    • B25J11/0005 - Manipulators having means for high-level communication with users, e.g. speech generator, face recognition means
    • B25J9/00 - Programme-controlled manipulators
    • B25J9/16 - Programme controls
    • B25J9/1602 - Programme controls characterised by the control system, structure, architecture

Abstract

The invention discloses an intelligent human-computer interaction system and method based on personnel behavior data. The human-computer interaction system comprises a local subsystem and a cloud subsystem. The local subsystem comprises a local detection module, a local processing module, a voice synthesis and output module and a mobile unit; the cloud subsystem comprises a behavior recognition module, an identity recognition module and a personnel identity behavior database. The invention can identify the interaction scene from on-site or historical behavior data, make different interactive responses according to the identity of the person, and achieve richer and more natural interactive responses based on the person's current and historical behavior data.

Description

Intelligent man-machine interaction system and method based on personnel behavior data
Technical Field
The invention relates to a human-computer interaction robot, in particular to an intelligent human-computer interaction system and method based on personnel behavior data.
Background
With rapid breakthroughs in artificial intelligence algorithms and computing power, demand for artificial intelligence products keeps growing. Typical representatives at home and abroad are smart speaker products built around intelligent voice technology. These products are mainly characterized by passive voice responses, or mechanical responses triggered by preset scenes or times, and cannot carry out active intelligent interaction based on personnel behavior data such as identity, actions, expressions and social contact.
Traditional voice interaction technology mainly targets standardized scenarios such as security, e-commerce and hotels. The patent with publication number CN 108877797A discloses an active interactive intelligent voice system that can make active or passive responses according to preset scenes or times, but it cannot respond intelligently to on-site or historical behavior data (actions, expressions and social contact) and cannot meet the needs of personalized artificial intelligence interaction scenarios in education, medical care and elderly care.
Disclosure of Invention
To solve the above problems, the invention aims to provide an intelligent human-computer interaction system based on personnel behavior data, which comprises a local subsystem and a cloud subsystem;
the local subsystem comprises:
the local detection module is used for acquiring face and skeleton images, scene images, and voice and voiceprint data;
the local processing module is used for analyzing faces and skeletons in the images, processing the voice data and tracking face targets; the system captures a face image, obtains face characteristic parameters, matches personnel identity information through the identity recognition module, obtains personnel behavior data, recognizes the behavior data through the behavior recognition module and adds the recognition results to the personnel identity behavior database; when the characteristic parameters meet a set value, the characteristic points of the face image are marked and tracked;
the voice synthesis and output module is used for outputting voice interaction data;
the mobile unit comprises a motor and a limiting device; the system sets the initial rotation angle of the motor, and after a face and skeleton are detected, the rotation angle of the motor is adjusted so that the face and skeleton image is positioned in the center of the picture;
the cloud subsystem comprises:
the behavior recognition module is used for detecting the human skeleton image, recognizing human body actions through the action model and performing action model training;
the identity recognition module is used for detecting the face image, extracting face characteristic points and recognizing the identity information of the personnel through characteristic matching;
the personnel identity behavior database stores personnel identity data and current behavior and historical behavior data of personnel;
and the interactive content output module processes the current behavior and historical behavior data of the personnel and outputs interactive data to the voice synthesis and output module.
The intelligent man-machine interaction method based on the personnel behavior data comprises the following steps:
S1, the system collects image data and voice and voiceprint data;
S2, the system extracts face features and skeleton features, recognizes the voice and voiceprint data and recognizes the scene;
S3, the system judges whether the face and the skeleton are at the center of the image; if so, it goes to S4; otherwise, the system calculates the positions of the face and the skeleton, controls the motor to rotate so that the face and the skeleton are at the center of the image, and tracks the face target;
S4, the system identifies the person's identity and behavior according to the personnel identity behavior database;
S5, the system generates voice interaction data according to the person's identity, behavior and scene data;
S6, the system outputs the voice interaction data.
The invention has the following beneficial effects: it can identify the interaction scene from on-site or historical behavior data (actions, expressions and social contact), make different interactive responses according to the identity of the person, and achieve richer and more natural interactive responses based on the person's current and historical behavior data.
Drawings
FIG. 1 is a system diagram of an intelligent human-computer interaction system based on human behavior data;
FIG. 2 is a flow chart of an intelligent human-computer interaction method based on human behavior data.
Detailed Description
The invention will be further described with reference to the accompanying drawings in which:
As shown in FIG. 1, the intelligent human-computer interaction system based on personnel behavior data comprises a local subsystem and a cloud subsystem;
the local subsystem includes:
the local detection module is used for acquiring face and skeleton images, scene images, and voice and voiceprint data;
the local processing module is used for analyzing faces and skeletons in the images, processing the voice data and tracking face targets; the system captures a face image, obtains face characteristic parameters, matches personnel identity information through the identity recognition module, obtains personnel behavior data, recognizes the behavior data through the behavior recognition module and adds the recognition results to the personnel identity behavior database; when the characteristic parameters meet a set value, the characteristic points of the face image are marked and tracked;
the voice synthesis and output module is used for outputting voice interaction data;
the mobile unit comprises a motor and a limiting device; the system sets the initial rotation angle of the motor, and after a face and skeleton are detected, the rotation angle of the motor is adjusted so that the face and skeleton are positioned in the center of the picture;
the cloud subsystem comprises:
the behavior recognition module is used for detecting the human skeleton image, recognizing human body motions through the motion model and performing motion model training;
the identity recognition module is used for detecting the face image, extracting face characteristic points and recognizing the identity information of the personnel through characteristic matching;
the personnel identity behavior database stores personnel identity data and current behavior and historical behavior data of personnel;
and the interactive content output module processes the current behavior and historical behavior data of the personnel and outputs interactive data to the voice synthesis and output module.
The intelligent man-machine interaction method based on the personnel behavior data comprises the following steps:
S1, the system collects image data and voice and voiceprint data;
S2, the system extracts face features and skeleton features, recognizes the voice and voiceprint data and recognizes the scene;
S3, the system judges whether the face and the skeleton are at the center of the image; if so, it goes to S4; otherwise, the system calculates the positions of the face and the skeleton, controls the motor to rotate so that the face and the skeleton are at the center of the image, and tracks the face target;
S4, the system identifies the person's identity and behavior according to the personnel identity behavior database;
S5, the system generates voice interaction data according to the person's identity, behavior and scene data;
S6, the system outputs the voice interaction data.
Further, the face target tracking process includes:
S31, detecting a local target, extracting target feature points, completing feature modeling of the target according to the feature points, and generating a target ID;
S32, inputting the first frame, initializing and creating a new tracker for the detected target, and marking out the target ID;
S33, inputting a subsequent frame and predicting the trajectory with a Kalman filter: from the state and covariance predictions generated by the previous frame's box, calculating the intersection over union (IOU) between all target state predictions of the tracker and the detection boxes of the current frame, obtaining the maximum IOU matching through the Hungarian assignment algorithm, and then removing matching pairs whose values are lower than the IOU threshold; updating the Kalman tracker with the target box matched in this frame, calculating the Kalman gain, state update and covariance update, and outputting the state update value as the tracking box of this frame; if no target box is matched, the tracker is reinitialized.
The local detection module uses a high-definition smart camera and a high-sensitivity microphone array, and samples image and voice data on site at different frequencies according to the specific scene. The local algorithm detects the face, the skeleton and the voice semantics on a high-speed CPU and GPU respectively; if the data show no abnormality, the detected data are uploaded to the cloud subsystem for processing, and if the data are abnormal, the face, skeleton and voice semantic data are detected again. Processing the data in the cloud subsystem reduces the data transmission load on the local processing network and improves the data processing efficiency of the local subsystem. 72 feature parameters of the eyes, mouth, nose, ears and other parts of the face are captured from the video stream; when the confidence of the feature parameters exceeds 80%, the detection target is judged to be a face, the face is assigned an ID, and the marked feature points are tracked and reproduced in real time.
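As a minimal, non-authoritative sketch of the 80% confidence gate just described, the Python fragment below accepts a detection as a face only when its feature confidence exceeds the threshold and then assigns it an ID for later tracking; the detection layout (a dict with 'confidence' and 'features' keys) and the function name gate_detections are assumptions for illustration, not part of the patented implementation.

```python
# Illustrative only: the 0.80 threshold follows the description above; the
# detection format and helper name are assumptions, not the patented design.
FACE_CONFIDENCE_THRESHOLD = 0.80

def gate_detections(detections, next_id):
    """detections: list of dicts with 'confidence' and 'features' (72 values each)."""
    accepted = []
    for det in detections:
        if det["confidence"] > FACE_CONFIDENCE_THRESHOLD:
            det["face_id"] = next_id   # mark the face ID used for later tracking
            next_id += 1
            accepted.append(det)
        # low-confidence detections are simply dropped here
    return accepted, next_id

faces, next_id = gate_detections(
    [{"confidence": 0.93, "features": [0.0] * 72},
     {"confidence": 0.41, "features": [0.0] * 72}],
    next_id=1)
print([f["face_id"] for f in faces])   # [1]
```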
A human body is detected in the image and the position of its bounding rectangle is returned; key points including the nose, the neck and the main joints of the four limbs are located accurately, and personnel behavior data are analyzed on the basis of the position characteristics of these joint points.
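Claim 3 below describes behavior recognition as an LSTM neural network trained on joint and skeleton characteristic data. The following sketch, assuming PyTorch, shows one plausible shape for such a model: an LSTM over per-frame joint coordinates followed by a linear classifier. The number of joints, the number of action classes and the layer sizes are illustrative assumptions, not values given in the patent.

```python
import torch
import torch.nn as nn

class SkeletonActionLSTM(nn.Module):
    """LSTM over per-frame (x, y) joint coordinates, then a linear action classifier."""
    def __init__(self, num_joints=18, num_classes=10, hidden=128):
        super().__init__()
        self.lstm = nn.LSTM(input_size=num_joints * 2,  # (x, y) per joint
                            hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, num_classes)

    def forward(self, x):                 # x: (batch, frames, num_joints * 2)
        _, (h, _) = self.lstm(x)
        return self.head(h[-1])           # logits over action classes

# Example: a batch of 4 skeleton sequences, 30 frames each, 18 joints per frame
model = SkeletonActionLSTM()
logits = model(torch.randn(4, 30, 36))
print(logits.shape)                       # torch.Size([4, 10])
```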
The local subsystem is the intelligent robot end; the robot head is fitted with a brushless motor that drives the head to rotate. Once the robot has been started, networked and self-checked, the initial angle (0 degrees) of the motor is set with the head's brushless motor and limiting device. The motor can rotate plus or minus 90 degrees from this starting angle. After the initial angle is set, local detection is started to check whether a face and skeleton are present. If at least one face ID is detected, the angle is finely adjusted to place the center point of the portrait at the horizontal center of the picture (50% of the frame width); the camera then zooms until the human skeleton occupies 30% of the picture and stays there for 45 seconds, and zooms again so that the face area is no more than 30% of the picture, staying for another 45 seconds. After the two zoom operations are completed, the head rotates to the next observation angle and the next round of face and skeleton detection begins. If no face or skeleton is detected, zooming proceeds gradually until a face and skeleton can be detected, after which fine adjustment is performed and zooming stops.
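A hedged sketch of the fine-adjustment step: given the detected face center, compute a motor angle, clamped to the plus or minus 90 degree range allowed by the limiting device, that brings the portrait to the horizontal center of the picture. The camera field of view and the proportional mapping are assumptions for illustration, not figures from the patent.

```python
HFOV_DEGREES = 60.0          # assumed horizontal field of view of the camera
MOTOR_LIMIT_DEGREES = 90.0   # limiting device allows +/-90 degrees from the start angle

def centering_angle(face_center_x, frame_width, current_angle):
    # how far the face center is from the middle of the frame, as a fraction (-0.5..0.5)
    error = (face_center_x / frame_width) - 0.5
    target = current_angle + error * HFOV_DEGREES
    return max(-MOTOR_LIMIT_DEGREES, min(MOTOR_LIMIT_DEGREES, target))

# face detected right of center in a 1280-pixel-wide frame, head currently at 10 degrees
print(centering_angle(face_center_x=980, frame_width=1280, current_angle=10.0))
```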
Face target tracking is implemented with the OpenCV vision library. After detection is completed, the target face feature points are extracted and a target ID is generated from them. When the first frame is input, a new tracker is initialized and created for the detected target and its ID is marked. When a subsequent frame comes in, the trajectory is predicted with a Kalman filter, i.e. from the state and covariance predictions produced by the previous frame's box. The intersection over union (IOU) between all of the tracker's predicted target states and the detection boxes of the current frame is computed, the maximum IOU matching is found with the Hungarian assignment algorithm, and matching pairs whose values fall below the IOU threshold are removed. The Kalman tracker is updated with the target box matched in this frame, the Kalman gain, state update and covariance update are calculated, and the state update value is output as this frame's tracking box; if no target is matched, the tracker is reinitialized.
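The data-association step (predicted track boxes matched to current detections by IOU with the Hungarian algorithm, low-IOU pairs discarded) can be sketched as follows. The Kalman prediction and update themselves are omitted, the box format [x1, y1, x2, y2] and the 0.3 threshold are illustrative assumptions, and scipy's linear_sum_assignment stands in for the Hungarian solver.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

IOU_THRESHOLD = 0.3   # illustrative value, not specified by the patent

def iou(a, b):
    """Intersection over union of two boxes [x1, y1, x2, y2]."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def associate(predicted_boxes, detected_boxes):
    """Return (matches, unmatched_track_indices, unmatched_detection_indices)."""
    if not predicted_boxes or not detected_boxes:
        return [], list(range(len(predicted_boxes))), list(range(len(detected_boxes)))
    cost = np.zeros((len(predicted_boxes), len(detected_boxes)))
    for i, p in enumerate(predicted_boxes):
        for j, d in enumerate(detected_boxes):
            cost[i, j] = -iou(p, d)          # the Hungarian solver minimises cost
    rows, cols = linear_sum_assignment(cost)
    matches, matched_rows, matched_cols = [], set(), set()
    for r, c in zip(rows, cols):
        if -cost[r, c] >= IOU_THRESHOLD:     # drop low-IOU pairs
            matches.append((r, c))
            matched_rows.add(r)
            matched_cols.add(c)
    unmatched_tracks = [i for i in range(len(predicted_boxes)) if i not in matched_rows]
    unmatched_dets = [j for j in range(len(detected_boxes)) if j not in matched_cols]
    return matches, unmatched_tracks, unmatched_dets

matches, lost_tracks, new_dets = associate(
    [[100, 100, 200, 220]],                         # predicted track box
    [[110, 105, 205, 225], [400, 80, 470, 200]])    # current detections
print(matches, lost_tracks, new_dets)               # [(0, 0)] [] [1]
```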
The personnel identity behavior database of the cloud subsystem comprises face identity, current expression, current action description, current voice content, current time, current course and historical behavior characteristics.
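One possible record layout for that database is sketched below; the field names mirror the items just listed, but the concrete types are assumptions rather than a schema defined by the patent.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class PersonRecord:
    face_id: str                      # face identity
    current_expression: str           # e.g. "smiling"
    current_action: str               # current action description
    current_speech: str               # current voice content
    current_time: str                 # timestamp of the observation
    current_course: str               # current course/activity
    history: List[str] = field(default_factory=list)  # historical behavior features

record = PersonRecord(face_id="F0001", current_expression="smiling",
                      current_action="raising hand", current_speech="May I ask a question?",
                      current_time="2019-08-30T09:15:00", current_course="mathematics")
```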
The identity recognition module of the cloud subsystem judges the identity of the person according to the detected skeleton, face, voice and voiceprint data (if the person is a stranger, new identity information is added to the personnel identity behavior database); the behavior recognition module recognizes the person's behavior, and once recognition is finished the results are stored in the personnel identity behavior database.
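A minimal sketch of that identity decision, assuming face features are compared as fixed-length embedding vectors: the closest database entry is accepted if it lies within a distance threshold, otherwise the person is treated as a stranger and a new identity is enrolled. The embedding size, the threshold and the naming scheme are illustrative assumptions.

```python
import numpy as np

MATCH_THRESHOLD = 0.6   # illustrative distance threshold

def identify_or_enroll(embedding, database):
    """database: dict mapping person_id -> stored embedding (np.ndarray)."""
    best_id, best_dist = None, float("inf")
    for person_id, stored in database.items():
        dist = np.linalg.norm(embedding - stored)
        if dist < best_dist:
            best_id, best_dist = person_id, dist
    if best_id is not None and best_dist < MATCH_THRESHOLD:
        return best_id, False                      # known person
    new_id = f"person_{len(database) + 1}"         # stranger: enrol a new identity
    database[new_id] = embedding
    return new_id, True

db = {"alice": np.zeros(128)}
print(identify_or_enroll(np.ones(128) * 0.001, db))   # ('alice', False)
print(identify_or_enroll(np.ones(128), db))           # ('person_2', True)
```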
The interactive content output module outputs interactive content after processing the current and historical behavior data. The cloud subsystem transmits the processed data back to the local voice synthesis and output module to realize voice interaction. After the person responds to the interactive content, the local detection module collects the on-site response data again and the next round of intelligent interaction begins. During human-machine interaction, the system can thus realize rich and natural interaction according to the person's preferences and characteristics.
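The interaction-content step (claim 4, S51-S53) can be pictured as a corpus lookup keyed by identity, behavior and scene, with a generic fallback reply; the corpus entries below are invented examples, not content from the patent.

```python
# Hypothetical corpus: (identity role, behavior, scene) -> reply text
CORPUS = {
    ("student", "raising_hand", "classroom"): "I see you have a question - go ahead.",
    ("elderly", "sitting_long", "living_room"): "You have been sitting for a while; shall we stretch?",
}

def generate_reply(identity_role, behavior, scene):
    return CORPUS.get((identity_role, behavior, scene),
                      "Hello! Is there anything I can help you with?")

print(generate_reply("student", "raising_hand", "classroom"))
```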
Communication between the cloud subsystem and the local subsystem is realized through a high-speed, low-latency Wi-Fi, 4G or 5G communication module.
The invention can identify the interaction scene from on-site or historical behavior data (actions, expressions and social contact), make interactive responses according to the identity of the person, and achieve richer and more natural interactive responses based on the person's current and historical behavior data.
The technical solution of the present invention is not limited by the above specific embodiments; all technical modifications made according to the technical solution of the present invention fall within the protection scope of the present invention.

Claims (4)

1. An intelligent human-computer interaction system based on personnel behavior data, characterized by comprising a local subsystem and a cloud subsystem;
The local subsystem includes:
the local detection module is used for acquiring face and skeleton images, scene images, and voice and voiceprint data;
the local processing module is used for analyzing faces and skeletons in the images, processing the voice data and tracking face targets; the system captures a face image, obtains face characteristic parameters, matches personnel identity information through a personnel identity recognition module, obtains personnel behavior data, recognizes the behavior data through a behavior recognition module and adds the recognition results to a personnel identity behavior database; when the characteristic parameters meet a set value, the characteristic points of the face image are marked and tracked;
the voice synthesis and output module is used for outputting voice interaction data;
the mobile unit comprises a motor and a limiting device; the system sets the initial rotation angle of the motor, and after a face and skeleton are detected, the rotation angle of the motor is adjusted so that the face and skeleton image is positioned in the center of the picture;
the cloud subsystem comprises:
the behavior recognition module is used for detecting the human skeleton image, recognizing human body actions through the action model and performing action model training;
the personnel identity recognition module is used for detecting the face image, extracting the face characteristic points and recognizing the personnel identity information through feature matching;
The personnel identity behavior database stores personnel identity data and current behavior and historical behavior data of personnel;
and the interactive content output module processes the current behavior and historical behavior data of the personnel and outputs the interactive data to the voice synthesis and output module.
2. An intelligent man-machine interaction method based on personnel behavior data, characterized by comprising the following steps:
S1, the system collects image data and voice and voiceprint data;
S2, the system extracts face features and skeleton features, recognizes the voice and voiceprint data and recognizes the scene;
S3, the system judges whether the face and the skeleton are at the center of the image; if so, it goes to S4; otherwise, the system calculates the positions of the face and the skeleton, controls the motor to rotate so that the face and the skeleton are at the center of the image, and tracks the face target;
S4, the system identifies the person's identity and behavior according to the personnel identity behavior database;
S5, the system generates voice interaction data according to the person's identity, behavior and scene data;
S6, the system outputs the voice interaction data;
the face target tracking process comprises the following steps:
S31, detecting a local target, extracting target feature points, completing feature modeling of the target according to the feature points, and generating a target ID;
S32, inputting the first frame, initializing and creating a new tracker for the detected target, and marking out the target ID;
S33, inputting a subsequent frame and predicting the trajectory with a Kalman filter: from the state and covariance predictions generated by the previous frame's box, calculating the intersection over union (IOU) between all target state predictions of the tracker and the detection boxes of the current frame, obtaining the maximum IOU matching through the Hungarian assignment algorithm, and then removing matching pairs whose values are lower than the IOU threshold; updating the Kalman tracker with the target box matched in this frame, calculating the Kalman gain, state update and covariance update, and outputting the state update value as the tracking box of this frame; if no target box is matched, the tracker is reinitialized.
3. The intelligent human-computer interaction method based on personnel behavior data as claimed in claim 2, wherein the behavior recognition specifically comprises:
S41, constructing an LSTM neural network action database, wherein one action corresponds to one or more data sets;
S42, establishing an action model of the relations among the joints and of the skeleton in the time domain and the space domain;
S43, training the action model and adjusting the action model parameters; when the training of the action model parameters reaches the set value, the action model is trained successfully;
S44, collecting joint and skeleton characteristic data, and inputting the joint and skeleton characteristic data together with the person's identity into the action model for training;
S45, outputting the action recognition result and updating it to the LSTM neural network action database.
4. The intelligent human-computer interaction method based on personnel behavior data as claimed in claim 2, wherein the S5 specifically comprises:
S51, the system builds a corpus database;
S52, the system searches the corpus database according to the person's identity, behavior and scene data to obtain voice interaction data for man-machine interaction;
S53, the system outputs the voice data and updates the corpus.
CN201910815748.XA 2019-08-30 2019-08-30 Intelligent man-machine interaction system and method based on personnel behavior data Active CN110524559B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910815748.XA CN110524559B (en) 2019-08-30 2019-08-30 Intelligent man-machine interaction system and method based on personnel behavior data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910815748.XA CN110524559B (en) 2019-08-30 2019-08-30 Intelligent man-machine interaction system and method based on personnel behavior data

Publications (2)

Publication Number Publication Date
CN110524559A CN110524559A (en) 2019-12-03
CN110524559B true CN110524559B (en) 2022-06-10

Family

ID=68665760

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910815748.XA Active CN110524559B (en) 2019-08-30 2019-08-30 Intelligent man-machine interaction system and method based on personnel behavior data

Country Status (1)

Country Link
CN (1) CN110524559B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112001275A (en) * 2020-08-09 2020-11-27 成都未至科技有限公司 Robot for collecting student information
CN114326625B (en) * 2021-12-28 2023-08-25 毕马智能科技(上海)有限公司 Monitoring system and method for potential safety risk in power grid infrastructure construction

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105093986A (en) * 2015-07-23 2015-11-25 百度在线网络技术(北京)有限公司 Humanoid robot control method based on artificial intelligence, system and the humanoid robot
CN105182983A (en) * 2015-10-22 2015-12-23 深圳创想未来机器人有限公司 Face real-time tracking method and face real-time tracking system based on mobile robot
CN105447466A (en) * 2015-12-01 2016-03-30 深圳市图灵机器人有限公司 Kinect sensor based identity comprehensive identification method
CN105957521A (en) * 2016-02-29 2016-09-21 青岛克路德机器人有限公司 Voice and image composite interaction execution method and system for robot
KR20180054407A (en) * 2016-11-15 2018-05-24 주식회사 로보러스 Apparatus for recognizing user emotion and method thereof, and robot system using the same
CN108510577A (en) * 2018-01-31 2018-09-07 中国科学院软件研究所 A kind of sense of reality action migration and generation method and system based on existing action data
CN108520205A (en) * 2018-03-21 2018-09-11 安徽大学 A kind of human motion recognition method based on Citation-KNN
CN109330494A (en) * 2018-11-01 2019-02-15 珠海格力电器股份有限公司 Sweeping robot control method based on action recognition, system, sweeping robot
CN109623848A (en) * 2019-02-26 2019-04-16 江苏艾萨克机器人股份有限公司 A kind of hotel service robot

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2933067B1 (en) * 2014-04-17 2019-09-18 Softbank Robotics Europe Method of performing multi-modal dialogue between a humanoid robot and user, computer program product and humanoid robot for implementing said method


Also Published As

Publication number Publication date
CN110524559A (en) 2019-12-03


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant