CN114185430A - Human-computer interaction system and method and intelligent robot - Google Patents

Human-computer interaction system and method and intelligent robot

Info

Publication number
CN114185430A
CN114185430A · CN202111358580.8A · CN202111358580A
Authority
CN
China
Prior art keywords
model
face
user
face image
preset
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111358580.8A
Other languages
Chinese (zh)
Inventor
刘娜
袁野
张赛
王中磐
吴国栋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhongyuan Power Intelligent Robot Co ltd
Original Assignee
Zhongyuan Power Intelligent Robot Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhongyuan Power Intelligent Robot Co ltd filed Critical Zhongyuan Power Intelligent Robot Co ltd
Priority to CN202111358580.8A
Publication of CN114185430A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/011Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
    • BPERFORMING OPERATIONS; TRANSPORTING
    • B25HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25JMANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J11/00Manipulators not otherwise provided for
    • B25J11/0005Manipulators having means for high-level communication with users, e.g. speech generator, face recognition means
    • B25J11/001Manipulators having means for high-level communication with users, e.g. speech generator, face recognition means with emotions simulating means
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/048Interaction techniques based on graphical user interfaces [GUI]
    • G06F3/0484Interaction techniques based on graphical user interfaces [GUI] for the control of specific functions or operations, e.g. selecting or manipulating an object, an image or a displayed text element, setting a parameter value or selecting a range

Abstract

The application discloses a human-computer interaction system and method and an intelligent robot. The human-computer interaction system comprises an image acquisition module, a model acceleration module, a model inference module and an interactive function module. The image acquisition module is used for initializing parameters of a camera of the intelligent robot based on a preset initialization strategy and controlling the camera to acquire a user face image when a user interacts with the intelligent robot; the model acceleration module is used for performing model conversion on a preset face recognition model to obtain a TensorRT model and creating a model operation engine based on the TensorRT model; the model inference module is used for performing recognition inference on the user face image based on the model operation engine to obtain an inference result; and the interactive function module is used for feeding back interactive action information to the user according to the inference result. The embodiment can ensure the accuracy of the face recognition result, improve the face recognition speed and reduce the hardware cost.

Description

Human-computer interaction system and method and intelligent robot
Technical Field
The application relates to the technical field of artificial intelligence, in particular to a human-computer interaction system, a human-computer interaction method and an intelligent robot.
Background
Artificial intelligence is the science and technology of simulating, extending and expanding human intelligence. Deep convolutional neural networks in artificial intelligence can extract user features well, so that an intelligent machine can recognize user behaviors more accurately from those features, which facilitates human-computer interaction with the user.
At present, most intelligent machines, such as smart speakers and robotic vacuum cleaners, only provide voice recognition and networking functions, so their human-computer interaction capability is limited. Deep convolutional neural networks, moreover, have high complexity and heavy computation, which places high demands on edge devices; considering the deployment cost of edge devices, a deep convolutional neural network actually deployed on intelligent-robot hardware generally struggles to achieve real-time, efficient and accurate user identification and fast interaction based on the user's facial features.
Disclosure of Invention
The application provides a human-computer interaction system, a human-computer interaction method and an intelligent robot, aiming to solve the technical problem that existing deep convolutional neural networks perform poorly at face recognition on intelligent robots.
In order to solve the above technical problem, in a first aspect, an embodiment of the present application provides a human-computer interaction system applied to an intelligent robot, where the human-computer interaction system includes an image acquisition module, a model acceleration module, a model inference module, and an interactive function module;
the image acquisition module is used for initializing parameters of a camera of the intelligent robot based on a preset initialization strategy, and controlling the camera to acquire a user face image when a user interacts with the intelligent robot;
the model acceleration module is used for performing model conversion on a preset face recognition model to obtain a TensorRT model, and creating a model operation engine based on the TensorRT model;
the model inference module is used for performing recognition inference on the user face image based on the model operation engine to obtain an inference result;
and the interactive function module is used for feeding back interactive action information to the user according to the inference result.
In this embodiment, the image acquisition module initializes the parameters of the camera of the intelligent robot based on the preset initialization strategy, so that images are acquired in the same manner as during model testing, which ensures the quality of the data set and therefore the accuracy of the face recognition result; the model acceleration module performs model conversion on the preset face recognition model to reduce model complexity, improve face recognition speed and lower the hardware cost of deploying the model; and the model inference module and the interactive function module realize the interaction between the intelligent robot and the user.
In an embodiment, the model acceleration module specifically includes:
the conversion unit is used for carrying out interlayer fusion and precision calibration on the face recognition model to obtain the TensorRT model;
the serialization unit is used for serializing the TensorRT model to obtain an optimized file, and storing the optimized file to a preset storage space;
and the creating unit is used for reading the optimized file in the preset storage space, performing deserialization on the optimized file, and creating the model operation engine according to the deserialized optimized file.
In an embodiment, the model inference module specifically includes:
the detection unit is used for extracting the face features of the face image of the user to obtain first face features and detecting a face area and face key points in the face image of the user according to the first face features;
the alignment unit is used for carrying out angle correction on a face area in the user face image by utilizing a Procrustes analysis method and carrying out alignment transformation on face key points in the user face image to obtain a target face image;
and the determining unit is used for extracting the face features of the target face image to obtain second face features, and determining an inference result corresponding to the user face image according to the second face features.
In an embodiment, the detecting unit specifically includes:
the first extraction subunit is used for extracting the overall characteristics of the face image of the user to obtain the overall characteristics;
the second extraction subunit is used for extracting the multi-scale features of the face image of the user according to the overall features to obtain a multi-scale feature map;
and the output subunit is used for outputting the face region and the face key point of the user face image according to the multi-scale feature map based on an SSH algorithm.
In an embodiment, the alignment unit specifically includes:
the first determining subunit is used for determining a face inclination angle between a face area in the user face image and a preset standard area based on a least square method;
the rotation subunit is used for performing rotation transformation on the face area in the user face image according to the face inclination angle;
and the alignment subunit is used for performing alignment transformation on the face key points in the user face image after the rotation transformation to obtain the target face image.
In an embodiment, the determining unit specifically includes:
the enhancement subunit is used for horizontally flipping the target face image and splicing the target face images before and after flipping to obtain a target spliced image;
the third extraction subunit is used for extracting the face features of the target spliced image to obtain the second face features;
the query subunit is configured to traverse a preset face database, and query a target feature ID with the highest similarity to the second face feature and the similarity greater than a preset threshold;
and the second determining subunit is used for determining an inference result corresponding to the user face image according to the target feature ID.
In an embodiment, the interactive function module specifically includes:
the motion control unit is used for controlling the intelligent robot to move to a target position close to the user according to the user position information in the inference result;
the voice unit is used for combining preset voice information with the user identity information in the inference result to obtain target voice information and broadcasting the target voice information to a user;
and the expression control unit is used for simulating the expression of the user according to the user expression information in the inference result.
In a second aspect, an embodiment of the present application provides a human-computer interaction method, which is applied to an intelligent robot, and the method includes:
initializing parameters of a camera based on a preset initialization strategy, and controlling the camera to acquire a user face image when a user interacts with the intelligent robot;
performing model conversion on a preset face recognition model to obtain a TensorRT model, and creating a model operation engine based on the TensorRT model;
performing recognition inference on the user face image based on the model operation engine to obtain an inference result;
and feeding back interactive action information to the user according to the inference result.
In an embodiment, the performing model conversion on a preset face recognition model to obtain a TensorRT model, and creating a model operation engine based on the TensorRT model includes:
carrying out interlayer fusion and precision calibration on the face recognition model to obtain the TensorRT model;
serializing the TensorRT model to obtain an optimized file, and storing the optimized file to a preset storage space;
reading the optimized file in the preset storage space, performing deserialization on the optimized file, and creating the model operation engine according to the deserialized optimized file.
In a third aspect, an embodiment of the present application provides an intelligent robot, including a processor and a memory, where the memory is used to store a computer program, and the processor, when executing the computer program, implements the human-computer interaction method according to the second aspect.
It should be noted that, please refer to the relevant description of the first aspect for the beneficial effects of the second aspect and the third aspect, which are not described herein again.
Drawings
Fig. 1 is a schematic structural diagram of a human-computer interaction system according to an embodiment of the present disclosure;
fig. 2 is a schematic flowchart of a human-computer interaction method according to an embodiment of the present disclosure;
fig. 3 is a schematic structural diagram of an intelligent robot according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
As described in the related art, most intelligent machines, such as smart speakers and robotic vacuum cleaners, only provide voice recognition and networking functions, so their human-computer interaction capability is limited. Deep convolutional neural networks, moreover, have high complexity and heavy computation, which places high demands on edge devices; considering the deployment cost of edge devices, a deep convolutional neural network actually deployed on intelligent-robot hardware generally struggles to achieve real-time, efficient and accurate user identification and fast interaction based on the user's facial features.
Therefore, the embodiments of the present application provide a human-computer interaction system, a human-computer interaction method and an intelligent robot. The image acquisition module initializes the parameters of the camera of the intelligent robot based on a preset initialization strategy, so that images are acquired in the same manner as during model testing, which ensures the quality of the data set and therefore the accuracy of the face recognition result; the model acceleration module performs model conversion on a preset face recognition model to reduce model complexity, improve face recognition speed and lower the hardware cost of deploying the model; and the model inference module and the interactive function module realize the interaction between the intelligent robot and the user.
Referring to fig. 1, fig. 1 is a schematic structural diagram of a human-computer interaction system applied to an intelligent robot according to an embodiment of the present application, where the human-computer interaction system includes an image acquisition module 101, a model acceleration module 102, a model inference module 103, and an interaction function module 104;
the image acquisition module 101 is configured to perform parameter initialization on a camera of the intelligent robot based on a preset initialization strategy, and control the camera to acquire a user face image when interacting with the intelligent robot.
The performance of a deep learning model depends on large amounts of data, and the quality of the data set must be ensured. In the field of robots, because software/hardware design and installation differ, the equipment that collects data before model training often differs from the equipment that collects data when the model is used and tested, and this difference in data acquisition degrades model performance. In this embodiment, the intelligent robot is therefore initialized through the initialization strategy to reduce the data difference introduced during data acquisition.
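As an illustration only, the following is a minimal sketch, assuming OpenCV, of how the camera parameters might be initialized with a preset strategy so that deployed acquisition matches the conditions used during model testing; the specific property values are assumptions, not values from this application.

```python
import cv2

# Hypothetical preset initialization strategy: fixed resolution, frame rate and exposure mode
# so that deployed images resemble the images used when the model was tested.
PRESET_CAMERA_PARAMS = {
    cv2.CAP_PROP_FRAME_WIDTH: 640,
    cv2.CAP_PROP_FRAME_HEIGHT: 480,
    cv2.CAP_PROP_FPS: 30,
    cv2.CAP_PROP_AUTO_EXPOSURE: 1,   # keep exposure fixed to reduce lighting variation
}

def init_camera(device_index=0):
    cap = cv2.VideoCapture(device_index)
    for prop, value in PRESET_CAMERA_PARAMS.items():
        cap.set(prop, value)          # apply the preset parameter initialization
    return cap

def capture_user_face_frame(cap):
    ok, frame = cap.read()            # frame acquired while the user interacts with the robot
    return frame if ok else None
```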
The model acceleration module 102 is configured to perform model conversion on a preset face recognition model to obtain a TensorRT model, and create a model operation engine based on the TensorRT model.
When the inference module of the robot is deployed, the trained face recognition model may be very large, with many parameters, and the performance of the deployment machines varies, so inference can be slow and latency high. To reduce cost, the model acceleration module therefore converts the model with TensorRT. TensorRT is a high-performance deep learning Inference optimizer that can optimize a trained model and provide low-latency, high-throughput inference deployment for deep learning applications. TensorRT can be used to accelerate inference in very large-scale data centers, on embedded platforms or on autonomous driving platforms.
The model inference module 103 is configured to perform recognition inference on the user face image based on the model operation engine to obtain an inference result.
The recognition inference is carried out on the user face image through the model operation engine created based on the TensorRT model, which improves the recognition speed. Optionally, the recognition inference process may include face region and key point detection, face rectification and alignment, and face characterization and recognition.
The interactive function module 104 is configured to feed back interactive action information to the user according to the inference result.
The inference result includes, but is not limited to, user identity information, user expression information, and the like.
In an embodiment, based on the embodiment shown in fig. 1, the model acceleration module 102 specifically includes:
the conversion unit is used for carrying out interlayer fusion and precision calibration on the face recognition model to obtain the TensorRT model;
the serialization unit is used for serializing the TensorRT model to obtain an optimized file, and storing the optimized file to a preset storage space;
and the creating unit is used for reading the optimized file in the preset storage space, performing deserialization on the optimized file, and creating the model operation engine according to the deserialized optimized file.
In this embodiment, an example of the process of accelerating the model with TensorRT includes the following steps (a code sketch follows the list):
installing TensorRT and confirming the CUDA version of the device; converting the trained model from the PyTorch format into the universal ONNX format;
converting the ONNX model into a TensorRT model for acceleration and deployment, where inter-layer fusion and precision calibration are completed during the optimization of the model conversion; the output of this step is an optimized TensorRT model for a specific GPU platform and network model, and the TensorRT model can be serialized and stored to disk or memory;
and testing the engine model: deserializing the model file from the previous step, creating a runtime engine, feeding in data (such as pictures outside the test set or data set), and outputting a classification vector result or a detection result.
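The following is a minimal sketch of this conversion flow, assuming a PyTorch face recognition model, the TensorRT 8+ Python API and an ONNX intermediate file; the file names, input resolution and FP16 precision flag are illustrative assumptions rather than details from this application.

```python
import torch
import tensorrt as trt

def export_to_onnx(model, onnx_path="face_rec.onnx"):
    """Convert the trained PyTorch model into the universal ONNX format."""
    model.eval()
    dummy = torch.randn(1, 3, 112, 112)  # assumed input resolution
    torch.onnx.export(model, dummy, onnx_path,
                      input_names=["input"], output_names=["embedding"])

def build_and_serialize_engine(onnx_path, plan_path="face_rec.plan"):
    """Build the optimized TensorRT model (layer fusion, reduced precision) and serialize it."""
    logger = trt.Logger(trt.Logger.WARNING)
    builder = trt.Builder(logger)
    network = builder.create_network(
        1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
    parser = trt.OnnxParser(network, logger)
    with open(onnx_path, "rb") as f:
        parser.parse(f.read())                 # inter-layer fusion happens inside the builder
    config = builder.create_builder_config()
    config.set_flag(trt.BuilderFlag.FP16)      # reduced precision; INT8 calibration is analogous
    serialized = builder.build_serialized_network(network, config)
    with open(plan_path, "wb") as f:           # store the optimized file in a preset storage space
        f.write(serialized)

def load_runtime_engine(plan_path="face_rec.plan"):
    """Read the optimized file, deserialize it and create the model operation (runtime) engine."""
    logger = trt.Logger(trt.Logger.WARNING)
    runtime = trt.Runtime(logger)
    with open(plan_path, "rb") as f:
        return runtime.deserialize_cuda_engine(f.read())
```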
In an embodiment, on the basis of the embodiment shown in fig. 1, the model inference module 103 specifically includes:
and the detection unit is used for extracting the face features of the user face image to obtain first face features and detecting a face area and face key points in the user face image according to the first face features.
The alignment unit is used for carrying out angle correction on a face area in the user face image by utilizing a Poinch analysis method and carrying out alignment transformation on face key points in the user face image to obtain a target face image;
and the determining unit is used for extracting the face features of the target face image to obtain second face features, and determining an inference result corresponding to the user face image according to the second face features.
In this embodiment, the detection unit is correspondingly applied to a face region and key point detection process, the alignment unit is correspondingly applied to a face rectification alignment process, and the determination unit is correspondingly applied to a face characterization and recognition process.
Optionally, the detection unit specifically includes:
the first extraction subunit is used for extracting the overall characteristics of the face image of the user to obtain the overall characteristics;
the second extraction subunit is used for extracting multi-scale features according to the overall features to obtain a multi-scale feature map;
and the output subunit is used for outputting the face region and the face key point of the user face image according to the multi-scale feature map based on an SSH algorithm.
In this embodiment, the neural network model used for detecting the face region and the key points may be a RetinaFace model, which is encapsulated in the detection unit to implement the face region and key point detection process. The RetinaFace model comprises: a deep convolutional neural network such as MobileNet or ResNet as the backbone network to extract the overall features of the picture; an FPN feature pyramid to extract multi-scale features; and the Context Modeling method of the SSH algorithm, with output heads for the face classification score, the bounding box (i.e., the face region) and the key-point regression, trained with a multi-task loss (classification loss, bounding-box IoU loss and face key-point loss, respectively).
Illustratively, the specific process of RetinaFace network training includes the following steps (a sketch of the loop is given after the list):
Step 1, data loading, preprocessing and initialization of model parameters. Data loading reads the pictures and their labels, converts them into the format used for PyTorch training and normalizes the data; model parameter initialization initializes the backbone network, the FPN feature pyramid network, the SSH detection module and the fully connected network layers of the model.
Step 2, the model processes the sample data and outputs embeddings, including the classification score (probability value), the bounding box coordinates and the key point coordinates;
Step 3, a loss function is calculated from the embeddings and the label data;
and Step 4, the model parameters are adjusted through back-propagation according to the loss function until the model converges, which yields the detection unit.
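The following is an illustrative sketch of Steps 1-4, assuming a RetinaFace-style PyTorch model that returns classification scores, boxes and landmarks, and a multi-task loss combining them; the model, data loader, loss implementation and hyperparameters are hypothetical placeholders.

```python
import torch

def train_detector(model, loader, multitask_loss, epochs=50, lr=1e-3):
    # Step 1: the loader yields normalized images and their labels (classes, boxes, landmarks)
    optimizer = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9, weight_decay=5e-4)
    model.train()
    for _ in range(epochs):
        for images, targets in loader:
            scores, boxes, landmarks = model(images)                 # Step 2: model outputs
            loss = multitask_loss(scores, boxes, landmarks, targets) # Step 3: multi-task loss
            optimizer.zero_grad()
            loss.backward()                                          # Step 4: back-propagation
            optimizer.step()
    return model
```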
Illustratively, the inference process of the detection unit includes: reading a video frame and resizing the frame image to (640 × 640) or (320 × 320); feeding the resized image into the network backbone for feature extraction and outputting the key point positions, scores and a number of face anchors from the extracted features; performing NMS (non-maximum suppression) on the anchors and selecting the highest-quality face box (i.e., the face region); and restoring the face box and its corresponding key points to their positions in the original frame according to the resize ratio and displaying them.
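The following is a minimal sketch of the post-processing described above, namely non-maximum suppression over the candidate face boxes and restoring coordinates to the original frame scale; the box format and IoU threshold are assumptions for illustration.

```python
import numpy as np

def nms(boxes, scores, iou_thresh=0.4):
    """boxes: (N, 4) array as [x1, y1, x2, y2]; returns indices of kept boxes."""
    order = scores.argsort()[::-1]            # sort candidates by score, best first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        xx1 = np.maximum(boxes[i, 0], boxes[order[1:], 0])
        yy1 = np.maximum(boxes[i, 1], boxes[order[1:], 1])
        xx2 = np.minimum(boxes[i, 2], boxes[order[1:], 2])
        yy2 = np.minimum(boxes[i, 3], boxes[order[1:], 3])
        inter = np.maximum(0.0, xx2 - xx1) * np.maximum(0.0, yy2 - yy1)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        areas = (boxes[order[1:], 2] - boxes[order[1:], 0]) * (boxes[order[1:], 3] - boxes[order[1:], 1])
        iou = inter / (area_i + areas - inter)
        order = order[1:][iou <= iou_thresh]  # drop boxes that overlap the kept box too much
    return keep

def restore_to_original(boxes, landmarks, resize_scale):
    """Scale boxes and 5-point landmarks from the resized input back to the original frame."""
    return boxes / resize_scale, landmarks / resize_scale
```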
In an embodiment, on the basis of the embodiment shown in fig. 1, the alignment unit specifically includes:
the first determining subunit is used for determining a face inclination angle between a face area in the user face image and a preset standard area based on a least square method;
the rotation subunit is used for performing rotation transformation on the face area in the user face image according to the face inclination angle;
and the alignment subunit is used for performing alignment transformation on the face key points in the user face image after the rotation transformation to obtain the target face image.
In this embodiment, face rectification and alignment adopt a Procrustes transform algorithm, and the Procrustes analysis method is encapsulated in the alignment unit so that the alignment unit realizes the face rectification and alignment process. Procrustes analysis is a method for analyzing shape distributions: mathematically, it iterates to find a standard shape and uses the least-squares method to find the affine transformation from each sample shape to that standard shape. In this embodiment, a tilted face is corrected and cropped into a face image of uniform size according to the iterated standard template. The objects used to obtain the affine transformation parameters are the set of detected face key points and the set of key points of the labelled template. Procrustes analysis thus pre-processes the raw data and provides a better local deformation model as the basis for subsequent model learning; after images are processed in this way, the normalization makes the facial structure increasingly evident, i.e., the positions of the facial feature clusters move closer and closer to their mean positions.
Illustratively, the specific implementation of the Procrustes analysis method in the alignment unit includes the following steps (a sketch follows the list):
Step 1, averaging each sample point i (i = 1, 2, ..., N) (i.e., the five face key-point positions: left eye, right eye, nose and the two mouth corners) over the N images;
Step 2, normalizing the sizes of all shapes and subtracting the corresponding mean from each sample point;
Step 3, calculating the center of gravity of the shape in each image from the de-centered data;
Step 4, aligning the standard shape and the sample shapes based on the center of gravity and angle so that the Procrustes distance between the two shapes is minimal.
Specifically, the standard shape is obtained by averaging all normalized sample points across the images; the rotation angle from the sample shape in each image to the standard shape is calculated with the least-squares method; the sample shape is rotated accordingly to obtain a new shape aligned with the standard shape; and the above steps are repeated until a specified number of iterations is reached or the change of the standard shape between two iterations falls below a threshold.
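The following is a minimal sketch of aligning the five detected face key points to a standard template by Procrustes analysis (centering, scale normalization and a least-squares rotation); the template itself is assumed to come from the iterative averaging described above.

```python
import numpy as np

def procrustes_align(points, template):
    """points, template: (5, 2) arrays of key-point coordinates; returns points aligned to template."""
    # remove translation (center of gravity) and scale from both shapes
    p = points - points.mean(axis=0)
    t = template - template.mean(axis=0)
    p = p / np.linalg.norm(p)
    t = t / np.linalg.norm(t)
    # least-squares rotation minimizing the Procrustes distance (orthogonal Procrustes via SVD)
    u, _, vt = np.linalg.svd(p.T @ t)
    rotation = u @ vt
    return p @ rotation
```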
In an embodiment, the determining unit specifically includes:
the enhancement subunit is used for horizontally flipping the target face image and splicing the target face images before and after flipping to obtain a target spliced image;
the third extraction subunit is used for extracting the face features of the target spliced image to obtain the second face features;
the query subunit is configured to traverse a preset face database, and query a target feature ID with the highest similarity to the second face feature and the similarity greater than a preset threshold;
and the second determining subunit is used for determining an inference result corresponding to the user face image according to the target feature ID.
In this embodiment, an ArcFace face recognition model is selected for the face characterization and recognition process, and the ArcFace model is encapsulated in the determining unit so that the determining unit realizes the face characterization and recognition process. The ArcFace model adopts the Additive Angular Margin Loss as its metric function to compute the angular (cosine) distance between two features, thereby enlarging the differences between classes and improving the recognition effect. It mainly comprises: extracting features from the aligned and corrected face image through a deep convolutional neural network such as a MobileNet or ResNet-IR network; and computing the cosine similarity between the extracted features and the known face features in the database so as to match the closest known user ID (the target feature ID).
Illustratively, the processing flow of the ArcFace face recognition model includes the following steps (a sketch follows the list):
Step 1, face image enhancement: horizontally flipping the picture and splicing the original and flipped images to obtain a spliced image;
Step 2, extracting 512-dimensional features of the spliced image as the face representation;
Step 3, face feature matching: traversing the feature information in the database, obtaining the target feature ID whose similarity to the feature to be identified is the highest (and greater than a preset threshold), and outputting and displaying the matched ID.
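The following is an illustrative sketch of this matching flow, assuming an embedding model that maps an image to a 512-dimensional feature and a face database stored as a dictionary of normalized features; the splicing layout, threshold and interfaces are assumptions, not details from this application.

```python
import numpy as np

def extract_embedding(embed_model, face_img):
    """face_img: aligned face as an (H, W, 3) array; returns an L2-normalized 512-d representation."""
    flipped = face_img[:, ::-1, :]                          # Step 1: horizontal flip
    spliced = np.concatenate([face_img, flipped], axis=1)   # splice original and flipped views
    emb = embed_model(spliced)                              # Step 2: 512-d face representation
    return emb / np.linalg.norm(emb)

def match_identity(embedding, face_database, threshold=0.5):
    """face_database: dict of user_id -> normalized 512-d feature; returns the best ID or None."""
    best_id, best_sim = None, -1.0
    for user_id, feature in face_database.items():          # Step 3: traverse the database
        sim = float(np.dot(embedding, feature))             # cosine similarity of normalized features
        if sim > best_sim:
            best_id, best_sim = user_id, sim
    return best_id if best_sim > threshold else None
```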
In an embodiment, the interactive function module 104 specifically includes:
the motion control unit is used for controlling the intelligent robot to move to a target position close to the user according to the user position information in the inference result;
the voice unit is used for combining preset voice information with the user identity information in the inference result to obtain target voice information and broadcasting the target voice information to a user;
and the expression control unit is used for simulating the expression of the user according to the user expression information in the inference result.
In this embodiment, the interactive function module includes the motion control unit, the voice unit, the expression control unit and the like, which perform the corresponding interactive actions for the user according to the inference result, for example approaching the user through the motion control unit, imitating the user's expression through the expression control unit, and announcing the user's name when the voice unit plays the voice.
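As a purely hypothetical sketch, the following shows how such a module might dispatch one inference result to the three units above; the InferenceResult fields and unit interfaces are illustrative assumptions only.

```python
from dataclasses import dataclass

@dataclass
class InferenceResult:
    user_id: str        # user identity information
    position: tuple     # user position information, e.g. (x, y)
    expression: str     # user expression information, e.g. "smile"

class InteractiveFunctionModule:
    def __init__(self, motion_unit, voice_unit, expression_unit):
        self.motion_unit = motion_unit
        self.voice_unit = voice_unit
        self.expression_unit = expression_unit

    def feed_back(self, result: InferenceResult):
        self.motion_unit.move_towards(result.position)   # approach the user
        greeting = f"Hello, {result.user_id}!"           # combine preset speech with identity info
        self.voice_unit.broadcast(greeting)
        self.expression_unit.imitate(result.expression)  # simulate the user's expression
```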
Referring to fig. 2, fig. 2 shows a flowchart illustrating a human-computer interaction method, which may be applied to an intelligent robot according to an embodiment of the present application, and as shown in fig. 2, the method includes steps S201 to S204.
Step S201, initializing parameters of a camera based on a preset initialization strategy, and controlling the camera to acquire a user face image when a user interacts with the intelligent robot;
step S202, performing model conversion on a preset face recognition model to obtain a TensorRT model, and creating a model operation engine based on the TensorRT model;
Step S203, performing recognition inference on the user face image based on the model operation engine to obtain an inference result;
and step S204, feeding back interactive action information to the user according to the inference result.
In an embodiment, based on the embodiment shown in fig. 2, the step S202 includes:
carrying out interlayer fusion and precision calibration on the face recognition model to obtain the TensorRT model;
serializing the TensorRT model to obtain an optimized file, and storing the optimized file to a preset storage space;
reading the optimized file in the preset storage space, performing deserialization on the optimized file, and creating the model operation engine according to the deserialized optimized file.
It should be understood that, for the explanation of the steps of the human-computer interaction method of the embodiment, reference may be made to the description of the human-computer interaction system in fig. 1, and details are not described herein again.
Fig. 3 is a schematic structural diagram of an intelligent robot according to an embodiment of the present application. As shown in fig. 3, the intelligent robot 3 of this embodiment includes: at least one processor 30 (only one shown in fig. 3), a memory 31, and a computer program 32 stored in the memory 31 and executable on the at least one processor 30, the processor 30 implementing the steps of any of the above-described method embodiments when executing the computer program 32.
The intelligent robot may include, but is not limited to, a processor 30, a memory 31. Those skilled in the art will appreciate that fig. 3 is merely an example of the intelligent robot 3, and does not constitute a limitation of the intelligent robot 3, and may include more or less components than those shown, or combine some components, or different components, such as input and output devices, network access devices, and the like.
The Processor 30 may be a Central Processing Unit (CPU), and may also be another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
The memory 31 may in some embodiments be an internal storage unit of the intelligent robot 3, such as a hard disk or a memory of the intelligent robot 3. The memory 31 may also be an external storage device of the Smart robot 3 in other embodiments, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like, which are provided on the Smart robot 3. Further, the memory 31 may also include both an internal storage unit and an external storage device of the intelligent robot 3. The memory 31 is used for storing an operating system, an application program, a BootLoader (BootLoader), data, and other programs, such as program codes of the computer program. The memory 31 may also be used to temporarily store data that has been output or is to be output.
In addition, an embodiment of the present application further provides a computer-readable storage medium, where a computer program is stored, and when the computer program is executed by a processor, the computer program implements the steps in any of the method embodiments described above.
The embodiment of the present application provides a computer program product, which when running on an intelligent robot, enables the intelligent robot to implement the steps in the above method embodiments when executed.
In several embodiments provided herein, it will be understood that each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved.
The functions, if implemented in the form of software functional modules and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application or portions thereof that substantially contribute to the prior art may be embodied in the form of a software product stored in a storage medium and including instructions for causing a terminal device to execute all or part of the steps of the method according to the embodiments of the present application. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
The above-mentioned embodiments are further detailed to explain the objects, technical solutions and advantages of the present application, and it should be understood that the above-mentioned embodiments are only examples of the present application and are not intended to limit the scope of the present application. It should be understood that any modifications, equivalents, improvements and the like, which come within the spirit and principle of the present application, may occur to those skilled in the art and are intended to be included within the scope of the present application.

Claims (10)

1. A human-computer interaction system, characterized by comprising an image acquisition module, a model acceleration module, a model inference module and an interactive function module;
the image acquisition module is used for initializing parameters of a camera of the intelligent robot based on a preset initialization strategy and controlling the camera to acquire a user face image when a user interacts with the intelligent robot;
the model acceleration module is used for carrying out model conversion on a preset face recognition model to obtain a TensorRT model and establishing a model operation engine based on the TensorRT model;
the model inference module is used for performing recognition inference on the user face image based on the model operation engine to obtain an inference result;
and the interactive function module is used for feeding back interactive action information to the user according to the inference result.
2. The human-computer interaction system of claim 1, wherein the model acceleration module specifically comprises:
the conversion unit is used for carrying out interlayer fusion and precision calibration on the face recognition model to obtain the TensorRT model;
the serialization unit is used for serializing the TensorRT model to obtain an optimized file, and storing the optimized file to a preset storage space;
and the creating unit is used for reading the optimized file in the preset storage space, performing deserialization on the optimized file, and creating the model operation engine according to the deserialized optimized file.
3. The human-computer interaction system of claim 1, wherein the model inference module specifically comprises:
the detection unit is used for extracting the face features of the face image of the user to obtain first face features and detecting a face area and face key points in the face image of the user according to the first face features;
the alignment unit is used for carrying out angle correction on a face area in the user face image by utilizing a Procrustes analysis method and carrying out alignment transformation on face key points in the user face image to obtain a target face image;
and the determining unit is used for extracting the face features of the target face image to obtain second face features, and determining the inference result corresponding to the user face image according to the second face features.
4. The human-computer interaction system of claim 3, wherein the detection unit specifically comprises:
the first extraction subunit is used for extracting the overall characteristics of the face image of the user to obtain the overall characteristics;
the second extraction subunit is used for extracting the multi-scale features of the user face image according to the overall features to obtain a multi-scale feature map;
and the output subunit is used for outputting the face region and the face key point in the user face image according to the multi-scale feature map by using a preset SSH algorithm.
5. The human-computer interaction system of claim 3, wherein the alignment unit specifically comprises:
the first determining subunit is used for determining a face inclination angle between a face area in the user face image and a preset standard area by using a least square method;
the rotation subunit is used for performing rotation transformation on the face area in the user face image according to the face inclination angle;
and the alignment subunit is used for performing alignment transformation on the face key points in the user face image after the rotation transformation to obtain the target face image.
6. The human-computer interaction system of claim 3, wherein the determining unit specifically comprises:
the enhancement subunit is used for horizontally flipping the target face image and splicing the target face images before and after flipping to obtain a target spliced image;
the third extraction subunit is used for extracting the face features of the target spliced image to obtain the second face features;
the query subunit is configured to traverse a preset face database, and query a target feature ID with the highest similarity to the second face feature and the similarity greater than a preset threshold;
and the second determining subunit is used for determining an inference result corresponding to the user face image according to the target feature ID.
7. The human-computer interaction system of claim 1, wherein the interaction function module specifically comprises:
the motion control unit is used for controlling the intelligent robot to move to a target position close to the user according to the user position information in the inference result;
the voice unit is used for combining preset voice information with the user identity information in the inference result to obtain target voice information and broadcasting the target voice information to a user;
and the expression control unit is used for simulating the expression of the user according to the user expression information in the inference result.
8. A human-computer interaction method is applied to an intelligent robot, and comprises the following steps:
initializing parameters of a camera based on a preset initialization strategy, and controlling the camera to acquire a user face image when a user interacts with the intelligent robot;
performing model conversion on a preset face recognition model to obtain a TensorRT model, and creating a model operation engine based on the TensorRT model;
performing recognition inference on the user face image based on the model operation engine to obtain an inference result;
and feeding back interactive action information to the user according to the inference result.
9. The human-computer interaction method of claim 8, wherein the performing model conversion on the preset face recognition model to obtain a TensorRT model, and creating a model operation engine based on the TensorRT model comprises:
carrying out interlayer fusion and precision calibration on the face recognition model to obtain the TensorRT model;
serializing the TensorRT model to obtain an optimized file, and storing the optimized file to a preset storage space;
reading the optimized file in the preset storage space, performing deserialization on the optimized file, and creating the model operation engine according to the deserialized optimized file.
10. An intelligent robot, characterized by comprising a processor and a memory for storing a computer program which, when executed by the processor, implements the human-computer interaction method of claim 8 or 9.
CN202111358580.8A 2021-11-12 2021-11-12 Human-computer interaction system and method and intelligent robot Pending CN114185430A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111358580.8A CN114185430A (en) 2021-11-12 2021-11-12 Human-computer interaction system and method and intelligent robot

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111358580.8A CN114185430A (en) 2021-11-12 2021-11-12 Human-computer interaction system and method and intelligent robot

Publications (1)

Publication Number Publication Date
CN114185430A true CN114185430A (en) 2022-03-15

Family

ID=80602145

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111358580.8A Pending CN114185430A (en) 2021-11-12 2021-11-12 Human-computer interaction system and method and intelligent robot

Country Status (1)

Country Link
CN (1) CN114185430A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114918935A (en) * 2022-05-17 2022-08-19 上海理工大学 Expression recognition and simulation system based on network reasoning and motor drive

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108564036A (en) * 2018-04-13 2018-09-21 上海思依暄机器人科技股份有限公司 A kind of method for judging identity, device and Cloud Server based on recognition of face
CN109815804A (en) * 2018-12-19 2019-05-28 平安普惠企业管理有限公司 Exchange method, device, computer equipment and storage medium based on artificial intelligence
CN112364744A (en) * 2020-11-03 2021-02-12 珠海市卓轩科技有限公司 TensorRT-based accelerated deep learning image recognition method, device and medium
CN112487922A (en) * 2020-11-25 2021-03-12 奥比中光科技集团股份有限公司 Multi-mode face in-vivo detection method and system
CN112989875A (en) * 2019-12-13 2021-06-18 海信集团有限公司 Face recognition method, face recognition device and storage medium

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108564036A (en) * 2018-04-13 2018-09-21 上海思依暄机器人科技股份有限公司 A kind of method for judging identity, device and Cloud Server based on recognition of face
CN109815804A (en) * 2018-12-19 2019-05-28 平安普惠企业管理有限公司 Exchange method, device, computer equipment and storage medium based on artificial intelligence
CN112989875A (en) * 2019-12-13 2021-06-18 海信集团有限公司 Face recognition method, face recognition device and storage medium
CN112364744A (en) * 2020-11-03 2021-02-12 珠海市卓轩科技有限公司 TensorRT-based accelerated deep learning image recognition method, device and medium
CN112487922A (en) * 2020-11-25 2021-03-12 奥比中光科技集团股份有限公司 Multi-mode face in-vivo detection method and system

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Ye Rui, "Software Design of the Vision System of an Intelligent Service Robot Based on an Embedded GPU", China Master's Theses Full-text Database, Information Science and Technology, page 3 *
Xue Chen et al., "Face Detection Based on MTCNN under Complex Illumination Scenes", Journal of University of South China (Natural Science Edition), pages 1-4 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114918935A (en) * 2022-05-17 2022-08-19 上海理工大学 Expression recognition and simulation system based on network reasoning and motor drive
CN114918935B (en) * 2022-05-17 2024-04-02 上海理工大学 Expression recognition and simulation system based on network reasoning and motor driving


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination