CN115857668A - Human-computer interaction method and device based on image recognition - Google Patents

Human-computer interaction method and device based on image recognition

Info

Publication number
CN115857668A
CN115857668A (application CN202211320829.0A)
Authority
CN
China
Prior art keywords
target
character
key point
image
person
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211320829.0A
Other languages
Chinese (zh)
Inventor
吴志强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Future City Shanghai Architectural Planning And Design Co ltd
Original Assignee
Future City Shanghai Architectural Planning And Design Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Future City Shanghai Architectural Planning And Design Co ltd filed Critical Future City Shanghai Architectural Planning And Design Co ltd
Priority to CN202211320829.0A
Publication of CN115857668A
Legal status: Pending (current)

Landscapes

  • Image Analysis (AREA)

Abstract

The invention provides a human-computer interaction method and device based on image recognition. The method may include: acquiring a person image of a target person; performing age recognition on the person image to obtain a target age group of the target person, and acquiring a target material library corresponding to the target age group; and performing person posture recognition on the person image to obtain a target person posture of the target person, and displaying an animation effect based on the target material library when the target person posture is a preset posture. In this way, by capturing person images and applying artificial-intelligence-based image recognition, an animation effect matching the target person's age group can be displayed, enhancing the realism and fun of human-computer interaction and improving the user experience.

Description

Human-computer interaction method and device based on image recognition
Technical Field
The invention relates to computer technology, and in particular to a human-computer interaction method and device based on image recognition.
Background
Intelligent interaction, an emerging technology of recent years, is widely applied across many fields; in particular, intelligent interactive display screens are widely used and developing rapidly. At the present stage, however, users' needs for convenient interaction still cannot be fully met, especially in human-computer interaction, and there is much room for improvement.
At present, most human-computer interaction products combine computer-graphics methods with infrared or laser detection, or introduce artificial intelligence in only some of the processing stages, so the advantages of intelligent technology are not truly exploited; user experience is poor and the human-computer interaction effect is limited.
Disclosure of Invention
In view of the above, the present invention discloses a human-computer interaction method based on image recognition. The method may include: acquiring a person image of a target person; performing age recognition on the person image to obtain a target age group of the target person, and acquiring a target material library corresponding to the target age group; and performing person posture recognition on the person image to obtain a target person posture of the target person, and displaying an animation effect based on the target material library when the target person posture is a preset posture.
In some embodiments, the age identifying the person image to obtain the target age group of the target person includes: carrying out face detection on the figure image to obtain a target face image; extracting face key points at preset positions of the target face image to obtain at least one face key point; inputting the coordinate information of the at least one face key point and the target face image into an age identification model to obtain a target age group of the target person; the age identification model comprises a deep learning model obtained by training face key points labeled in advance with age information and face image combination samples.
In some embodiments, the performing the person posture recognition on the person image to obtain the target person posture of the target person includes: carrying out human body detection on the figure images to obtain target human body images corresponding to each target figure; extracting human key points at preset positions of the target human body images aiming at each target human body image to obtain at least one human key point corresponding to each target person; and aiming at each target person, obtaining the target person posture of the target person according to the position information of the corresponding at least one human body key point.
In some embodiments, the method of determining whether the target person gesture is a preset gesture comprises: determining whether the relative position between at least one human key point corresponding to the target person meets a preset relative position condition; determining the target character posture as a preset posture under the condition that the relative position meets the preset relative position condition; determining that the target person gesture is not a preset gesture in a case where the relative position does not satisfy the preset relative position condition.
In some embodiments, the human body key points include a right shoulder key point J7, a right elbow key point J9, a right wrist key point J11, a left shoulder key point J8, a left elbow key point J10, a left wrist key point J12, a right hip key point J13, a right knee key point J15, a right ankle key point J17, a right big toe key point J20, a right small toe key point J22, a right heel key point J24, a left hip key point J14, a left knee key point J16, a left ankle key point J18, a left big toe key point J21, a left small toe key point J23, and a left heel key point J25.
In some embodiments, the determining whether the relative position between the at least one human body key point corresponding to the target person meets a preset relative position condition includes: in the case where the person image includes one target person, if

y_J11 > y_J7, y_J9 > y_J7, |y_J11 - y_J9| < (1/3)|y_J12 - y_J10|,

determining that the relative position meets the preset relative position condition, where y_Jn denotes the ordinate of human body key point Jn;

in the case where the person image includes two target persons, if, for the first target person,

y_J11 > y_J7, y_J9 > y_J7, |y_J11 - y_J9| < |y_J12 - y_J10|, y_J12 > y_J10 > y_J8, y_J17 > y_J18,

and, for the second target person,

y_J12 > y_J8, y_J10 > y_J8, y_J11 < y_J9 < y_J7, x_J11 > x_J9,

determining that the relative position meets the preset relative position condition, where y_Jn and x_Jn denote the ordinate and abscissa of human body key point Jn.
In some embodiments, the animation effect comprises at least one of: an animated character corresponding to the target character, the animated character having the target character pose; and a preset interface corresponding to the preset gesture.
In some embodiments, the target story library includes animated character images of animated characters of the target age group; the displaying of the animation character corresponding to the target character based on the target material library comprises: inputting the target character posture and the animation character image in the target material library into a character generation model which is trained in advance to obtain a target animation character image with the target character posture; and displaying the target animation character image.
In some embodiments, the preset interface comprises at least one of: successfully completing a celebration interface with a preset gesture; and a subsequent interface corresponding to the current interface.
The invention further provides a human-computer interaction apparatus based on image recognition. The apparatus comprises: an acquisition module for acquiring a person image of a target person; an identification and acquisition module for performing age recognition on the person image to obtain a target age group of the target person, and acquiring a target material library corresponding to the target age group; and a recognition and display module for performing person posture recognition on the person image to obtain a target person posture of the target person, and displaying an animation effect based on the target material library when the target person posture is a preset posture.
With the technical solution of any of the above embodiments, by capturing person images and applying artificial-intelligence-based image recognition, an animation effect matching the target character's age group can be displayed, enhancing the realism and fun of human-computer interaction and improving the user experience.
Drawings
The drawings that will be used in the description of the embodiments or the related art will be briefly described below.
Fig. 1 is a schematic flow chart of a method of a human-computer interaction method based on image recognition according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a human-computer interaction scenario illustrated in the present invention;
FIG. 3 is a block diagram of a computing system according to the present invention;
FIG. 4 is a flow chart of human-computer interaction shown in the present invention;
FIG. 5 is a schematic flow chart of an age identification method according to the present invention;
FIG. 6 is a schematic diagram of key points according to the present invention;
FIG. 7 is a schematic flow chart of a person gesture recognition method according to the present invention;
FIGS. 8A, 8B, 8C, 8D, 8E, and 8F are schematic diagrams illustrating a user completing a horizontal right-arm-lift action;
FIGS. 9A, 9B, and 9C are schematic diagrams illustrating two users completing a combined gesture;
FIG. 10 is a schematic structural diagram of a human-computer interaction device based on image recognition according to the present invention;
fig. 11 is a schematic diagram of a hardware structure of an electronic device according to an embodiment of the present invention.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present invention. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the invention, as detailed in the appended claims.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in this disclosure and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items. It should also be understood that the word "if" as used herein may be interpreted as "when" or "upon" or "in response to determining," depending on the context.
The invention provides a human-computer interaction method based on image recognition. The method displays an animation effect matching the target person's age group by capturing person images and applying artificial-intelligence-based image recognition, enhancing the realism and fun of human-computer interaction and improving the user experience.
Referring to fig. 1, fig. 1 is a flowchart illustrating a method of a human-computer interaction method based on image recognition according to an embodiment of the present invention.
The human-computer interaction method based on image recognition shown in fig. 1 can be applied to electronic equipment. The electronic equipment can execute the method by running software logic corresponding to the method. The electronic device may be a notebook computer, a server, a mobile phone, a Personal Digital Assistant (PDA), and the like; the present invention does not particularly limit the type of electronic device. The electronic device may also be a client device or a server device.
As shown in fig. 1, the method may include:
s102, acquiring a person image aiming at the target person.
The target person refers to a user needing human-computer interaction, namely a person in the person image.
In general, the target person completes the interaction by interacting with the human-computer interaction system. The human-computer interaction system may include an image capture device used to capture person images, and this step may acquire the person image captured by that device. The capture device may maintain its own frame rate, and this step may acquire at least a portion of the captured images, either synchronously or asynchronously, as shown in the sketch below.
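As a rough illustration of this acquisition step, a minimal sketch assuming OpenCV as the capture backend follows; the camera index and subsampling interval are illustrative choices, not taken from the patent:

```python
import cv2

def capture_person_images(camera_index: int = 0, sample_every: int = 5):
    """Yield every Nth camera frame as a candidate person image."""
    cap = cv2.VideoCapture(camera_index)
    frame_id = 0
    try:
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            # The capture device keeps its own frame rate; subsampling means
            # downstream recognition only consumes a portion of the frames.
            if frame_id % sample_every == 0:
                yield frame
            frame_id += 1
    finally:
        cap.release()
```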
And S104, identifying the ages of the person images to obtain a target age group of the target person, and acquiring a target material library corresponding to the target age group.
The invention may perform age recognition in an artificial-intelligence-based manner.
In some embodiments, a multi-class classifier may be employed for age recognition: a number of person images are obtained in advance and manually labeled with age groups to form training samples, and the classifier is trained on these samples so that it can predict age groups.
The classifier can then be used for age-group prediction. It should be noted that, to improve the accuracy of age prediction, the following embodiments describe an artificial-intelligence-based age recognition method in more detail.
Some material libraries corresponding to different age groups can be deployed in advance in the human-computer interaction system. A material library may include character materials, sounds, and other information corresponding to its age group. In this step, after the target age group of the target person is predicted, the corresponding target material library can be obtained, for example as sketched below.
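A minimal sketch of such a deployment follows; the age-group labels and the fields of the library record are assumptions for illustration only:

```python
from dataclasses import dataclass, field

@dataclass
class MaterialLibrary:
    age_group: str
    character_images: list = field(default_factory=list)  # animation character materials
    sounds: list = field(default_factory=list)            # audio materials

# Hypothetical deployment: one library per age group, keyed by the
# age-group label the recognizer outputs.
MATERIAL_LIBRARIES = {
    "child": MaterialLibrary("child"),
    "adult": MaterialLibrary("adult"),
    "senior": MaterialLibrary("senior"),
}

def get_target_material_library(target_age_group: str) -> MaterialLibrary:
    """Look up the material library matching the predicted age group."""
    return MATERIAL_LIBRARIES[target_age_group]
```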
S106, performing character posture recognition on the character image to obtain a target character posture of the target character, and displaying an animation effect based on the target material library under the condition that the target character posture is a preset posture.
The invention can adopt an artificial intelligence mode to recognize the character posture.
In some embodiments, a multi-class classifier may be trained in advance for person pose prediction, and whether the pose of the target person is a preset pose is then determined from the classifier's prediction result. It should be noted that, to improve the accuracy of pose recognition, the following embodiments describe an artificial-intelligence-based person posture recognition method in more detail.
In the case where the target character pose is a preset pose, animation effects may be generated and displayed in a display device based on materials such as character materials, sounds, and the like in the target material library.
Through S102-S106, by capturing person images and applying artificial-intelligence-based image recognition, an animation effect matching the target character's age group can be displayed, enhancing the realism and fun of human-computer interaction and improving the user experience.
The following embodiments are described with reference to specific human-computer interaction scenarios.
Referring to fig. 2, fig. 2 is a schematic view of a human-computer interaction scene according to the present invention. As shown in fig. 2, the target person 41 is interacting with the human-computer interaction system 01. The human-computer interaction system 01 can capture person images of the target person and recognize, analyze and track human targets such as the target person 41.
The human-computer interaction system 01 can also comprise a computing system 11 and an image acquisition device 12. Image capture device 12 may be, for example, a camera that may be used to visually monitor one or more persons, such as target person 41, such that facial features of the one or more persons and gestures performed may be captured and analyzed on-the-fly to perform one or more controls or actions on a user interface of an operating system or application.
The human-computer interaction system 01 may be connected to audiovisual equipment 02 for displaying video and audio, such as a television, an LED screen, a projector, a photovoltaic glass, etc. The audiovisual device 02 may receive the audiovisual signals from the computing system 11 and may then output audio and video to the target person 41. The audio-visual equipment is connected with the computing system in a wired mode such as an HDMI cable and a VGA cable or in a wireless mode such as Bluetooth and WIFI.
As shown in FIG. 2, human-computer interaction system 01 may be used to recognize and analyze a user's human target, such as target person 41. For example, target person 41 may be tracked using image capture device 12 such that extremity movements of target person 41 may be interpreted as controls that may be used to affect an application or operating system executed by computing system 11.
As shown in FIG. 2, the application executing on the computing system 11 may be a picture-switching human-computer interaction game in which a target character 41 standing in the recommended capture area 302 participates. The computing system 11 may use the audiovisual device 02 to provide the target person 41 with a visual representation of the picture switching. The computing system 11 and the image capture device 12 of the human-computer interaction system 01 can recognize and analyze a character gesture of the target character 41 in physical space (e.g., a left-hand or right-hand lifting motion), and control the human-computer interaction based on that gesture.
Referring to fig. 3, fig. 3 is a schematic diagram illustrating a computing system according to the present invention. As shown in fig. 3, computing system 11 includes hardware components and software components.
In one embodiment, the computing system 11 and the image capture device 12 are used in the human-computer interaction system 01 to identify a target person in the capture area. The image capture device 12 may include a camera 121 that captures live images, and may also include storage 122. The storage 122 may store images or other information captured by the camera 121 for archive management.
Image capture device 12 communicates with computing system 11 via communication link 13. The communication link 13 may be a wired connection including an ethernet cable connection or a wireless connection such as bluetooth, WIFI, etc. The computing system 11 may interact with the image capture device 12 to obtain a person image of the target person 41. Computing system 11 includes base operating system 111, gesture filter 112, gesture library 113, age recognition algorithm 118, multimedia content database 119.
The base operating system 111 may be understood to be an operating system that supports the operation of other systems and/or units. The gesture filter 112 is used to perform character gesture recognition and perform a related determination of whether the character gesture hits a preset gesture. The gesture library 113 stores a number of preset gestures. The multimedia content database 119 can be understood as the aforementioned material library, and includes materials such as pictures, videos, and audios corresponding to different age groups.
Referring to fig. 4, fig. 4 is a flowchart illustrating a human-computer interaction according to the present invention. As shown in fig. 4, the method may include S401-S406.
S401, acquiring a person image of a target person.
In this step, the human-computer interaction system 01 may capture an image of a person located in the capture area 302 through the image capture device and transmit the person image to the computing system 11 for age recognition and person gesture recognition.
S402, identifying the age of the person image to obtain the target age range of the target person.
In this step, based on face key point detection, the spatial positions of face key points and the face image itself can be combined for age recognition, improving the accuracy of age-group determination.
Referring to fig. 5, fig. 5 is a schematic flow chart of an age identification method according to the present invention. As shown in fig. 5, the method may include S502-S506.
And S502, carrying out face detection on the figure image to obtain a target face image.
In this step, an object-detection model may be used for face detection. For example, the model may be Faster R-CNN, trained on image samples labeled with face detection boxes so that it acquires face detection capability. The trained model is then used to perform face detection and obtain the target face image in the person image.
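A sketch of such detection with an off-the-shelf detector follows (torchvision's Faster R-CNN, pretrained on COCO; the patent assumes a model fine-tuned on face detection boxes instead, so treat this as an approximation under that assumption):

```python
import torch
import torchvision

# Pretrained Faster R-CNN; swap in weights fine-tuned on face boxes if available.
detector = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
detector.eval()

@torch.no_grad()
def detect_boxes(image_tensor: torch.Tensor, score_threshold: float = 0.8):
    """image_tensor: float CHW tensor scaled to [0, 1]. Returns kept boxes."""
    output = detector([image_tensor])[0]
    keep = output["scores"] > score_threshold
    return output["boxes"][keep]
```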
S504, extracting the face key points at the preset positions of the target face image to obtain at least one face key point.
In this step, a key point extraction model can be used to extract key points. The key point extraction model is a regression model constructed based on deep learning. The output of the key point extraction model comprises the number of channels with the same number as the number of the key points of the human face, and the feature map of each channel is used for predicting one key point of the human face.
Referring to fig. 6, fig. 6 is a schematic diagram illustrating a key point according to the present invention. The key points illustrated in fig. 6 include a face key point and a human body key point. The face key points comprise a nose key point J0, a right eye key point J1, a left eye key point J2, a right ear key point J3, a left ear key point J4, a vertex key point J5 and a chin key point J6.
In this embodiment, the output of the keypoint extraction model includes 7 channels, and the feature map of each channel is used to predict a face keypoint.
The key point extraction model can be trained in advance based on the image sample marked with the face key points, so that the model has the capability of extracting the face key points. And then, the method can be used for extracting the key points of the human face based on the model.
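Given the one-channel-per-key-point output described above, decoding coordinates reduces to locating each channel's peak response; a minimal sketch, assuming heatmap-style channels:

```python
import torch

def decode_keypoints(heatmaps: torch.Tensor) -> torch.Tensor:
    """heatmaps: (K, H, W), one channel per key point.
    Returns (K, 2) pixel coordinates as (x, y)."""
    k, h, w = heatmaps.shape
    flat = heatmaps.view(k, -1)
    idx = flat.argmax(dim=1)      # index of the peak response per channel
    ys = (idx // w).float()
    xs = (idx % w).float()
    return torch.stack([xs, ys], dim=1)
```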
S506, inputting the coordinate information of the at least one face key point and the target face image into an age identification model to obtain a target age group of the target person.
The age identification model comprises a deep learning model obtained by training face key points labeled in advance with age information and face image combination samples. Therefore, the age identification model has the capability of age identification. The target age bracket of the target person can be determined through the model.
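One plausible shape for such a model fuses an image branch with a key-point-coordinate branch; the layer sizes and number of age groups below are assumptions, not specified by the patent:

```python
import torch
import torch.nn as nn
import torchvision

class AgeRecognitionModel(nn.Module):
    """Fuses the target face image with face key-point coordinates."""

    def __init__(self, num_keypoints: int = 7, num_age_groups: int = 6):
        super().__init__()
        backbone = torchvision.models.resnet18(weights=None)
        backbone.fc = nn.Identity()            # expose the 512-d image feature
        self.backbone = backbone
        self.kp_branch = nn.Sequential(        # encodes (x, y) per key point
            nn.Linear(num_keypoints * 2, 64), nn.ReLU(), nn.Linear(64, 64)
        )
        self.head = nn.Linear(512 + 64, num_age_groups)

    def forward(self, face_image: torch.Tensor, keypoints: torch.Tensor):
        img_feat = self.backbone(face_image)                 # (B, 512)
        kp_feat = self.kp_branch(keypoints.flatten(1))       # (B, 64)
        return self.head(torch.cat([img_feat, kp_feat], 1))  # age-group logits
```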
Through S502-S506, the age can be identified by combining the space positions between the key points of the human face and the human face image based on the detection technology of the key points of the human face, so that the age can be identified by combining the distance factors between human face organs of people in different age groups, and the accuracy of age group determination is improved.
After the age group is obtained, in S403, a target material library corresponding to the target age group is obtained.
In this step, a target material library corresponding to the target age group may be selected from the multimedia content database 119.
S404, performing character posture recognition on the character image to obtain a target character posture of the target character.
In the step, preset human body key points can be extracted based on a key point extraction technology, and then a skeleton model is constructed based on the extracted key points to obtain the target character posture and improve the character posture identification accuracy.
With continued reference to fig. 6, the human body key points include a right shoulder key point J7, a right elbow key point J9, a right wrist key point J11, a left shoulder key point J8, a left elbow key point J10, a left wrist key point J12, a right hip key point J13, a right knee key point J15, a right ankle key point J17, a right big toe key point J20, a right small toe key point J22, a right heel key point J24, a left hip key point J14, a left knee key point J16, a left ankle key point J18, a left big toe key point J21, a left small toe key point J23, and a left heel key point J25.
Referring to fig. 7, fig. 7 is a flowchart illustrating a person gesture recognition method according to the present invention. As shown in fig. 7, the method may include S702-S706.
S702, carrying out human body detection on the person images to obtain target human body images corresponding to each target person.
In this step, an object-detection model may be used for human body detection. For example, the model may be Faster R-CNN, trained on image samples labeled with human body detection boxes so that it acquires human body detection capability. The trained model is then used to perform human body detection and obtain the target human body images in the person image.
S704, aiming at each target human body image, extracting human body key points at preset positions of the target human body image to obtain at least one human body key point corresponding to each target person.
In some cases more than one target person may be included in the person image. In this step, a key point extraction model may be used to extract key points. The key point extraction model is a regression model constructed based on deep learning. The output of the key point extraction model comprises the number of channels with the same number as the number of the human key points, and the feature graph of each channel is used for predicting one human key point.
In this embodiment, the output of the keypoint extraction model includes 18 channels, and the feature map of each channel is used to predict a human keypoint.
The key point extraction model can be trained in advance based on the image sample marked with the human key points, so that the model has the capability of extracting the human key points. And then, the method can be used for extracting the key points of the human body based on the model.
S706, aiming at each target person, obtaining the target person posture of the target person according to the position information of the corresponding at least one human key point.
The relative positions between the body's key points can indicate a character pose. For example, the right elbow key point J9 being higher than the right shoulder key point J7 may indicate that the target character's pose is a right-hand lift; similarly, the right wrist key point J11 being lower than the right hip key point J13 indicates that the right hand hangs naturally. The possibilities are not exhausted here; one such rule check is sketched below.
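A minimal sketch of such rule-based pose reading follows; the dictionary layout is an assumption, and, per this description's convention, a larger y means higher:

```python
def classify_pose(kp: dict) -> str:
    """kp maps key-point names such as 'J7' to (x, y) coordinates."""
    if kp["J9"][1] > kp["J7"][1]:      # right elbow above right shoulder
        return "right_hand_raised"
    if kp["J11"][1] < kp["J13"][1]:    # right wrist below right hip
        return "right_hand_lowered"
    return "other"
```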
Through S702-S706, the human posture can be recognized based on the human key point detection technology and by combining the spatial positions among the human key points, and the accuracy of determining the human posture is improved.
S405, judging whether the posture of the target person is a preset posture.
Because the spatial positions of the human body key points differ between postures, a posture can be preset by defining relative position conditions among certain human body key points; whether the target person's posture is the preset posture can then be judged by checking whether the detected human body key points satisfy those preset relative position conditions.
Specifically, whether the relative position between at least one human body key point corresponding to the target person meets a preset relative position condition or not may be determined;
determining the target character posture as a preset posture under the condition that the relative position meets the preset relative position condition;
determining that the target person gesture is not a preset gesture in a case where the relative position does not satisfy the preset relative position condition.
In this embodiment, different determination methods are provided according to the number of target persons.
In the case where the person image includes one target person, if

y_J11 > y_J7, y_J9 > y_J7, |y_J11 - y_J9| < (1/3)|y_J12 - y_J10|,

it is determined that the relative position meets the preset relative position condition, where y_Jn denotes the ordinate of human body key point Jn.
Please refer to fig. 8A, 8B, 8C, 8D, 8E, and 8F, which are schematic diagrams illustrating the actions of the user to complete horizontal arm raising.
Fig. 8A-8F depict human skeletal mapping for performing a right arm lift gesture, in accordance with one embodiment. The target person at each point in time is depicted. Where fig. 8A is the first point in time and fig. 8F is the last point in time. In fig. 8A, the target person starts with two arms on either side of the torso. As shown in fig. 8B, the target person lifts his right wrist J11 upward in the vertical Y-axis. FIG. 8C shows the target character continuing to raise the right arm. The y-axis coordinates of the key point positions of the right arms J9 and J11 are higher than those of the key point positions of the left arms J10 and J12. FIG. 8D shows that when the absolute value of the right wrist J11 point location y minus the right elbow J9 point location y is less than 1/3 times the absolute value of the left wrist J12 point location y minus the left elbow J10 point location y, this can be expressed as:
Figure BDA0003910829390000091
namely, the right hand completes the horizontal arm-lifting posture. Fig. 8E and 8F depict a return action of the user replacing the right arm back to the starting position.
The user moves continuously during the gesture; the capture device captures images throughout, and the model algorithm performs its judgment on them. The gesture library 113 of the computing system 11 stores a set of gesture actions with predefined rules, and the gesture filter applies these rules to properly detect such movements of the user. In one embodiment, gesture filter 112 may specify that the only mapping information examined is that of the arm in motion, and movements of the rest of the user's body may be filtered out or ignored.
To improve the accuracy of detecting the right-arm-lift posture, the gesture filter also covers the cases where the body is turned to the left or to the right. If the absolute difference between the y coordinate of the target person's left shoulder J8 and the y coordinate of the left elbow J10 is less than 40 pixels, the target person is judged to have turned to the left; if the absolute difference between the y coordinate of the right shoulder J7 and the y coordinate of the right elbow J9 is less than 40 pixels, the target person is judged to have turned to the right.
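Putting the FIG. 8 conditions and the side-turn supplement together, a sketch of the single-person check might read as follows (the dictionary layout is an assumption; the 1/3 ratio and 40-pixel threshold come from this description):

```python
def meets_right_arm_lift(kp: dict) -> bool:
    """Single-person horizontal right-arm-lift check from FIGS. 8A-8F."""
    y = lambda j: kp[j][1]
    wrist_above_shoulder = y("J11") > y("J7")
    elbow_above_shoulder = y("J9") > y("J7")
    # FIG. 8D rule: right forearm near horizontal relative to the left arm.
    horizontal = abs(y("J11") - y("J9")) < abs(y("J12") - y("J10")) / 3
    return wrist_above_shoulder and elbow_above_shoulder and horizontal

def body_turned(kp: dict) -> str:
    """Side-turn supplement: shoulder and elbow nearly level (within 40 px)."""
    y = lambda j: kp[j][1]
    if abs(y("J8") - y("J10")) < 40:   # left shoulder vs. left elbow
        return "left"
    if abs(y("J7") - y("J9")) < 40:    # right shoulder vs. right elbow
        return "right"
    return "front"
```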
In some cases, two target persons may be included in the person image. In the case where the person image includes two target persons, if, for the first target person,

y_J11 > y_J7, y_J9 > y_J7, |y_J11 - y_J9| < |y_J12 - y_J10|, y_J12 > y_J10 > y_J8, y_J17 > y_J18,

and, for the second target person,

y_J12 > y_J8, y_J10 > y_J8, y_J11 < y_J9 < y_J7, x_J11 > x_J9,

it is determined that the relative position meets the preset relative position condition, where y_Jn and x_Jn denote the ordinate and abscissa of human body key point Jn.
Please refer to fig. 9A, 9B, and 9C, which are schematic diagrams of a double person completing the combination posture.
Figs. 9A, 9B, and 9C depict the target characters 41 and 42 interacting with the human-computer interaction system 01 as a pair to perform a two-person combined gesture. As depicted in fig. 9A, the two target characters begin the preparatory action following the prompt screen shown by the audiovisual device 02.
The target person 41 lifts the right arm and the right foot, satisfying five judgments: (1) the right wrist J11 y coordinate is higher than the right shoulder J7 y coordinate; (2) the right elbow J9 y coordinate is higher than the right shoulder J7 y coordinate; (3) the absolute value of the right wrist J11 y coordinate minus the right elbow J9 y coordinate is smaller than the absolute value of the left wrist J12 y coordinate minus the left elbow J10 y coordinate; (4) the left wrist J12 y coordinate is higher than the left elbow J10 y coordinate, which is in turn higher than the left shoulder J8 y coordinate; and (5) the right ankle J17 y coordinate is higher than the left ankle J18 y coordinate. The five judgments correspond, respectively, to:

y_J11 > y_J7, y_J9 > y_J7, |y_J11 - y_J9| < |y_J12 - y_J10|, y_J12 > y_J10 > y_J8, y_J17 > y_J18.

The target person 42 lifts the left arm while the right arm tilts outward, satisfying four judgments: (6) the left wrist J12 y coordinate is higher than the left shoulder J8 y coordinate; (7) the left elbow J10 y coordinate is higher than the left shoulder J8 y coordinate; (8) the right wrist J11 y coordinate is lower than the right elbow J9 y coordinate, which is in turn lower than the right shoulder J7 y coordinate; and (9) the right wrist J11 x coordinate is greater than the right elbow J9 x coordinate. The four judgments correspond, respectively, to:

y_J12 > y_J8, y_J10 > y_J8, y_J11 < y_J9 < y_J7, x_J11 > x_J9.
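The nine judgments above translate directly into a two-person check; a sketch under the same (x, y)-dictionary assumption:

```python
def meets_combined_pose(kp1: dict, kp2: dict) -> bool:
    """kp1: first target person (right arm and right foot raised);
    kp2: second target person (left arm up, right arm tilted outward)."""
    x = lambda kp, j: kp[j][0]
    y = lambda kp, j: kp[j][1]
    first = (
        y(kp1, "J11") > y(kp1, "J7")                             # (1)
        and y(kp1, "J9") > y(kp1, "J7")                          # (2)
        and abs(y(kp1, "J11") - y(kp1, "J9"))
            < abs(y(kp1, "J12") - y(kp1, "J10"))                 # (3)
        and y(kp1, "J12") > y(kp1, "J10") > y(kp1, "J8")         # (4)
        and y(kp1, "J17") > y(kp1, "J18")                        # (5)
    )
    second = (
        y(kp2, "J12") > y(kp2, "J8")                             # (6)
        and y(kp2, "J10") > y(kp2, "J8")                         # (7)
        and y(kp2, "J11") < y(kp2, "J9") < y(kp2, "J7")          # (8)
        and x(kp2, "J11") > x(kp2, "J9")                         # (9)
    )
    return first and second
```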
In fig. 9B, the target characters 41 and 42 have correctly completed the gesture; the human-computer interaction system 01 responds to the detected two-person combined action, and the audiovisual device 02 plays an animation effect on its interface.
In fig. 9C, the audiovisual device 02 has finished playing the preset animation effect, and the users 41 and 42 return to a normal standing position.
S406, under the condition that the target character posture is a preset posture, displaying an animation effect based on the target material library.
The animation effect includes at least one of:
an animation character corresponding to the target character, the animation character having the target character pose;
and a preset interface corresponding to the preset gesture.
The preset interface comprises at least one of the following items:
successfully completing a celebration interface with a preset gesture;
and a subsequent interface corresponding to the current interface.
In some embodiments, the target material library includes animated character images of animated characters of the target age group. In S406, the target character pose and an animated character image from the target material library may be input into a character generation model trained in advance, to obtain a target animated character image having the target character pose; the target animated character image is then displayed. This completes the display of an animated character with the target character's pose, improving the user experience and the human-computer interaction effect.
In some embodiments, the character generation model may include Pose GAN, a model that can blend a character pose into an image. The model's training may follow generative adversarial training, i.e., training on combinations of a character pose, a character image, and the character image blended with that pose. The target animated character image can then be generated with this model, so deep learning completes the combination of the character pose with the animated character from the material library, making the human-computer interaction more vivid and further improving the user experience and interaction effect.
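At inference time the generation step might look like the sketch below; the generator's interface is hypothetical, since the patent does not fix the Pose-GAN-style model's architecture or input format:

```python
import torch

@torch.no_grad()
def render_animated_character(generator: torch.nn.Module,
                              character_image: torch.Tensor,
                              pose_keypoints: torch.Tensor) -> torch.Tensor:
    """Blend the target character pose into an animation character image.

    generator: a pre-trained pose-conditioned GAN generator (hypothetical
    interface: takes an image batch and a flattened pose condition).
    character_image: (C, H, W) image from the target material library.
    pose_keypoints: (K, 2) target person pose coordinates.
    """
    pose = pose_keypoints.flatten().unsqueeze(0)   # (1, K*2) condition vector
    image = character_image.unsqueeze(0)           # (1, C, H, W)
    return generator(image, pose).squeeze(0)       # posed animation character
```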
Through S401-S406, in the above scenario, by capturing person images and applying artificial-intelligence-based image recognition, an animation effect matching the target character's age group can be displayed, enhancing the realism and fun of human-computer interaction and improving the user experience.
The invention further provides a man-machine interaction device 1000 based on image recognition. Referring to fig. 10, fig. 10 is a schematic structural diagram of a human-computer interaction device based on image recognition according to the present invention. The apparatus 1000 comprises:
an obtaining module 1010, configured to obtain a person image for a target person;
the identifying and acquiring module 1020 is configured to perform age identification on the person image to obtain a target age group of the target person, and acquire a target material library corresponding to the target age group;
the recognition and display module 1030 is configured to perform character posture recognition on the character image to obtain a target character posture of the target character, and display an animation effect based on the target material library when the target character posture is a preset posture.
The identifying and obtaining module 1020 is further configured to:
carrying out face detection on the figure image to obtain a target face image;
extracting face key points at preset positions from the target face image to obtain at least one face key point;
inputting the coordinate information of the at least one face key point and the target face image into an age identification model to obtain a target age group of the target person;
the age identification model comprises a deep learning model obtained by training face key points labeled in advance with age information and face image combination samples.
In some embodiments, the identifying and presenting module 1030 is further configured to:
carrying out human body detection on the figure images to obtain target human body images corresponding to each target figure;
extracting human key points at preset positions of the target human body images aiming at each target human body image to obtain at least one human key point corresponding to each target person;
and aiming at each target person, obtaining the target person posture of the target person according to the position information of the corresponding at least one human body key point.
In some embodiments, the apparatus 1000 further comprises a determining module configured to:
determine whether the relative position between at least one human body key point corresponding to the target person meets a preset relative position condition;
determining the target character posture as a preset posture under the condition that the relative position meets the preset relative position condition;
determining that the target person gesture is not a preset gesture in a case where the relative position does not satisfy the preset relative position condition.
In some embodiments, the human body key points include a right shoulder key point J7, a right elbow key point J9, a right wrist key point J11, a left shoulder key point J8, a left elbow key point J10, a left wrist key point J12, a right hip key point J13, a right knee key point J15, a right ankle key point J17, a right big toe key point J20, a right small toe key point J22, a right heel key point J24, a left hip key point J14, a left knee key point J16, a left ankle key point J18, a left big toe key point J21, a left small toe key point J23, and a left heel key point J25.
In some embodiments, the determining module is further configured to:
in the case where the person image includes one target person, if

y_J11 > y_J7, y_J9 > y_J7, |y_J11 - y_J9| < (1/3)|y_J12 - y_J10|,

determine that the relative position meets the preset relative position condition, where y_Jn denotes the ordinate of human body key point Jn;

in the case where the person image includes two target persons, if, for the first target person,

y_J11 > y_J7, y_J9 > y_J7, |y_J11 - y_J9| < |y_J12 - y_J10|, y_J12 > y_J10 > y_J8, y_J17 > y_J18,

and, for the second target person,

y_J12 > y_J8, y_J10 > y_J8, y_J11 < y_J9 < y_J7, x_J11 > x_J9,

determine that the relative position meets the preset relative position condition, where y_Jn and x_Jn denote the ordinate and abscissa of human body key point Jn.
In some embodiments, the animation effect comprises at least one of:
an animated character corresponding to the target character, the animated character having the target character pose;
and a preset interface corresponding to the preset gesture.
In some embodiments, the target material library includes animated character images of animated characters of the target age group; the identifying and displaying module 1030 is further configured to:
inputting the target character posture and the animation character image in the target material library into a character generation model which is trained in advance to obtain a target animation character image with the target character posture;
and displaying the target animation character image.
In some embodiments, the preset interface comprises at least one of:
successfully completing a celebration interface with a preset gesture;
and a subsequent interface corresponding to the current interface.
In these embodiments, by capturing person images and applying artificial-intelligence-based image recognition, an animation effect matching the target character's age group can be displayed, enhancing the realism and fun of human-computer interaction and improving the user experience.
The embodiment of the man-machine interaction device based on the image recognition can be applied to electronic equipment. Accordingly, the present invention discloses an electronic device, which may comprise: a processor.
A memory for storing processor-executable instructions.
Wherein the processor is configured to call the executable instructions stored in the memory to implement the human-computer interaction method based on image recognition shown in any one of the foregoing embodiments.
Referring to fig. 11, fig. 11 is a schematic diagram of a hardware structure of an electronic device according to an embodiment of the present invention.
As shown in fig. 11, the electronic device may include a processor for executing instructions, a network interface for making network connections, a memory for storing operation data for the processor, and a non-volatile memory for storing human-machine interaction device corresponding instructions based on image recognition.
The embodiments of the apparatus may be implemented by software, or by hardware, or by a combination of hardware and software. Taking a software implementation as an example, as a logical device, the device is formed by reading, by a processor of the electronic device where the device is located, a corresponding computer program instruction in the nonvolatile memory into the memory for operation. In terms of hardware, in addition to the processor, the memory, the network interface, and the nonvolatile memory shown in fig. 11, the electronic device where the apparatus is located in the embodiment may also include other hardware generally according to the actual function of the electronic device, which is not described again.
It is to be understood that, to increase processing speed, the instructions corresponding to the image-recognition-based human-computer interaction apparatus may also be stored directly in the memory, which is not limited herein.
The present invention proposes a computer-readable storage medium, which stores a computer program, which can be used to cause a processor to execute the human-computer interaction method based on image recognition shown in any of the foregoing embodiments.
As will be appreciated by one skilled in the art, one or more embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, one or more embodiments of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, one or more embodiments of the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (which may include, but are not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
"and/or" in the present invention means having at least one of the two. The embodiments of the present invention are described in a progressive manner, and the same and similar parts among the embodiments can be referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, as for the data processing apparatus embodiment, since it is substantially similar to the method embodiment, the description is relatively simple, and reference may be made to the partial description of the method embodiment for relevant points.
While this invention contains many specific implementation details, these should not be construed as limitations on the scope of any disclosure or of what may be claimed, but rather as descriptions of features specific to particular embodiments disclosed. Certain features that are described in this specification in the context of separate embodiments can also be implemented in combination in a single embodiment. In other instances, features described in connection with one embodiment may be implemented as discrete components or in any suitable subcombination. Moreover, although features may be described as acting in certain combinations and even initially claimed as such, one or more features from a claimed combination can in some cases be excised from the combination, and the claimed combination may be directed to a subcombination or variation of a subcombination.
Similarly, while operations are depicted in the drawings in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, to achieve desirable results. In some cases, multitasking and parallel processing may be advantageous. Moreover, the separation of various platform modules and components in the embodiments described should not be understood as requiring such separation in all embodiments, and it should be understood that the described program components and platforms may generally be integrated together in a single software product or packaged into multiple software products.
While the invention has been described in terms of what is presently considered to be the most practical and preferred embodiment, it is to be understood that the invention is not to be limited to the disclosed embodiment, but on the contrary, is intended to cover various modifications, equivalent arrangements, and alternatives falling within the spirit and scope of the invention as defined by the appended claims.

Claims (10)

1. A human-computer interaction method based on image recognition is characterized by comprising the following steps:
acquiring a figure image aiming at a target figure;
identifying the age of the person image to obtain a target age group of the target person, and acquiring a target material library corresponding to the target age group;
and performing character posture recognition on the character image to obtain a target character posture of the target character, and displaying an animation effect based on the target material library under the condition that the target character posture is a preset posture.
2. The method of claim 1, wherein the age identifying the person image to obtain the target age group of the target person comprises:
carrying out face detection on the figure image to obtain a target face image;
extracting face key points at preset positions from the target face image to obtain at least one face key point;
inputting the coordinate information of the at least one face key point and the target face image into an age identification model to obtain a target age group of the target person;
the age identification model comprises a deep learning model obtained by training face key points labeled in advance with age information and face image combination samples.
3. The method of claim 1, wherein the performing of the character gesture recognition on the character image to obtain the target character gesture of the target character comprises:
carrying out human body detection on the figure images to obtain target human body images corresponding to each target figure;
aiming at each target human body image, extracting human body key points at preset positions of the target human body image to obtain at least one human body key point corresponding to each target person;
and aiming at each target person, obtaining the target person posture of the target person according to the position information of the corresponding at least one human body key point.
4. The method of claim 3, wherein determining whether the target person pose is a preset pose comprises:
determining whether the relative position between at least one human body key point corresponding to the target person meets a preset relative position condition;
determining the target character posture as a preset posture under the condition that the relative position meets the preset relative position condition;
determining that the target person gesture is not a preset gesture in a case where the relative position does not satisfy the preset relative position condition.
5. The method of claim 4, wherein the human body key points comprise a right shoulder key point J7, a right elbow key point J9, a right wrist key point J11, a left shoulder key point J8, a left elbow key point J10, a left wrist key point J12, a right hip key point J13, a right knee key point J15, a right ankle key point J17, a right big toe key point J20, a right small toe key point J22, a right heel key point J24, a left hip key point J14, a left knee key point J16, a left ankle key point J18, a left big toe key point J21, a left small toe key point J23, and a left heel key point J25.
6. The method according to claim 5, wherein the determining whether the relative position between the at least one human key point corresponding to the target person meets a preset relative position condition comprises:
in the case where the person image includes one target person, if

y_J11 > y_J7, y_J9 > y_J7, |y_J11 - y_J9| < (1/3)|y_J12 - y_J10|,

determining that the relative position meets the preset relative position condition; wherein y_Jn denotes the ordinate of human body key point Jn;

in the case where the person image includes two target persons, if, for the first target person,

y_J11 > y_J7, y_J9 > y_J7, |y_J11 - y_J9| < |y_J12 - y_J10|, y_J12 > y_J10 > y_J8, y_J17 > y_J18,

and, for the second target person,

y_J12 > y_J8, y_J10 > y_J8, y_J11 < y_J9 < y_J7, x_J11 > x_J9,

determining that the relative position meets the preset relative position condition; wherein y_Jn and x_Jn denote the ordinate and abscissa of human body key point Jn.
7. The method of claim 1, wherein the animation effect comprises at least one of:
an animated character corresponding to the target character, the animated character having the target character pose;
and a preset interface corresponding to the preset gesture.
8. The method of claim 7, wherein the target material library includes animated character images of animated characters of the target age group; the displaying of the animated character corresponding to the target character based on the target material library comprises:
inputting the target character posture and the animation character image in the target material library into a character generation model which is trained in advance to obtain a target animation character image with the target character posture;
and displaying the target animation character image.
9. The method of claim 7, wherein the preset interface comprises at least one of:
successfully completing a celebration interface of a preset gesture;
and a subsequent interface corresponding to the current interface.
10. A human-computer interaction device based on image recognition, characterized in that the device comprises:
the acquisition module is used for acquiring a character image aiming at a target character;
the identification and acquisition module is used for identifying the ages of the person images to obtain a target age group of the target person and acquiring a target material library corresponding to the target age group;
and the recognition and display module is used for recognizing the character posture of the character image to obtain the target character posture of the target character, and displaying the animation effect based on the target material library under the condition that the target character posture is a preset posture.
Application CN202211320829.0A, filed 2022-10-26 (priority date 2022-10-26): Human-computer interaction method and device based on image recognition. Publication CN115857668A, status: Pending.

Priority Applications (1)

  • CN202211320829.0A (priority date 2022-10-26, filing date 2022-10-26): Human-computer interaction method and device based on image recognition

Applications Claiming Priority (1)

  • CN202211320829.0A (priority date 2022-10-26, filing date 2022-10-26): Human-computer interaction method and device based on image recognition

Publications (1)

  • CN115857668A, published 2023-03-28

Family

ID=85661870

Family Applications (1)

  • CN202211320829.0A (priority date 2022-10-26, filing date 2022-10-26): Human-computer interaction method and device based on image recognition

Country Status (1)

  • CN: CN115857668A

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination