WO2019128558A1 - Analysis method and system of user limb movement and mobile terminal

Info

Publication number: WO2019128558A1 (international application PCT/CN2018/116700)
Authority: WO (WIPO, PCT)
Prior art keywords: image, user, information, limb, module
Other languages: French (fr), Chinese (zh)
Inventors: 张文波, 刘裕峰, 刘锦龙
Original Assignee: 北京达佳互联信息技术有限公司
Priority application: CN201711464338.2 (published as CN108062533A), filed December 28, 2017
Application PCT/CN2018/116700 filed by 北京达佳互联信息技术有限公司
Publication of WO2019128558A1

Classifications

    • G06K9/00 Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints (G PHYSICS; G06 COMPUTING, CALCULATING, COUNTING; G06K RECOGNITION OF DATA, PRESENTATION OF DATA, RECORD CARRIERS, HANDLING RECORD CARRIERS)
    • G06K9/00221 Acquiring or recognising human faces, facial parts, facial sketches, facial expressions
    • G06K9/00302 Facial expression recognition
    • G06K9/00335 Recognising movements or behaviour, e.g. recognition of gestures, dynamic facial expressions; lip-reading
    • G06K9/62 Methods or arrangements for recognition using electronic means
    • G06K9/6201 Matching; proximity measures
    • G06K9/6215 Proximity measures, i.e. similarity or distance measures
    • G06K9/6267 Classification techniques
    • G06K9/6268 Classification techniques relating to the classification paradigm, e.g. parametric or non-parametric approaches

Abstract

Disclosed in embodiments of the present application are a method and a system for analyzing a user's limb movement, and a mobile terminal. The method comprises the following steps: acquiring a limb image of a user; identifying the body language characterized by the limb image; and matching visual information or audio information with the same meaning as the body language. Because the body language characterized by the user's limb image in a picture is identified, and visual information or audio information with the same meaning as the body language is matched, the information expressed by the limb features in the image is presented in a manner that can be directly interpreted by humans. This realizes a deeper interpretation of human limb movements and facilitates communication between language-impaired persons or users who do not share a common language.

Description

Method, system and mobile terminal for analyzing user's limb movement

The present application claims priority to Chinese Patent Application No. 201711464338.2, entitled "Analysis method, system and mobile terminal for user's limb movement", filed on December 28, 2017, the entire contents of which are incorporated herein by reference.

Technical field

Embodiments of the present application relate to the field of image processing, and in particular to a method, a system, and a mobile terminal for analyzing a user's limb motion.

Background

Image understanding is the study of using computer systems to interpret images, so as to achieve an understanding of the outside world similar to that of the human visual system. The questions it addresses include what information is needed from an image to accomplish a given task, and how that information is used to obtain the necessary interpretation. The study of image understanding involves the methods and devices for obtaining images as well as specific application implementations.

In the related art, image understanding technology is used to recognize text information in a picture, that is, to recognize text in bitmap form and convert it into editable text. The inventors of the present application found in research that image understanding in the related art is limited to converting a fixed bitmap pattern into an editable form, and cannot perform deeper analysis and application based on the understanding result after the image information has been understood.

Summary of the invention

Embodiments of the present application provide a method, a system, and a mobile terminal for analyzing a user's limb motion, which analyze the body language of a user in an image, match information that can be directly perceived by a human being according to the body language, and display and apply the information that can be directly perceived.

In order to solve the above technical problem, a technical solution adopted by the embodiments of the present application is to provide a method for analyzing a user's limb motion, including the following steps:

Obtaining a limb image of the user;

Identifying the body language represented by the limb image;

Matching visual or audio information having the same meaning as the body language.

Optionally, the method for analyzing the user's limb motion further includes the following steps:

Obtaining a face image of the user;

Identifying human facial motion information represented by the facial image;

An emoticon image having the same action meaning as the human facial motion information is matched.

Optionally, the step of acquiring a limb image of the user includes:

Obtaining a face image of the user;

The step of identifying the body language represented by the limb image includes:

Identifying human facial motion information represented by the facial image;

The step of matching the visual information or the audio information having the same meaning as the body language includes:

An emoticon image having the same action meaning as the human facial motion information is matched.

Optionally, before the step of acquiring a face image of the user, the method further includes the following steps:

Retrieving at least one of the pre-stored emoticons;

The emoticon image is placed in the display container according to a preset script to visually display the emoticon image.

Optionally, the step of matching an emoticon image having the same action meaning as the human facial motion includes the following steps:

Comparing the human facial motion information with an expression image within the display container;

When the action meaning represented by the expression picture in the display container is the same as the human face action information, it is confirmed that the display container has an expression picture having the same action meaning as the human face action.

Optionally, after the step of matching the emoticon image having the same action meaning as the human facial motion, the method further includes the following steps:

Obtaining matching degree information of the human facial action information and the emoticon image;

Calculating a bonus score corresponding to the matching degree information according to a preset matching rule.

Optionally, after the step of calculating the bonus score corresponding to the matching degree information according to the preset matching rule, the method further includes the following steps:

Recording all the bonus points within the preset first time threshold;

The bonus scores are summed to form a final score for the user within the first time threshold.

Optionally, the method for analyzing the user's limb motion further includes the following steps:

Extracting a preset number of expression images representing human emotions from the expression packs in a preset unit time, and placing the expression images in the display container;

Collecting the user's face image at fixed intervals or in real time within the unit time, identifying the emotion information represented by the face image, and determining the matching degree between the face image and the emotion information;

An emoticon image having the same emotional meaning as the facial image is matched, and a bonus score of the facial image is confirmed according to the matching degree.

Optionally, the method for analyzing the user's limb motion further includes the following steps:

Extracting a preset number of expression images representing human emotions from the expression packs in a preset unit time, and placing the expression images in the display container;

The step of acquiring a face image of the user includes:

Collecting the user's face image at fixed intervals or in real time within the unit time;

And the step of identifying the human facial motion information represented by the facial image includes:

Identifying the emotion information represented by the face image, and matching the face image with the emotion information;

The step of matching an emoticon image having the same action meaning as the human facial action information includes:

An emoticon image having the same emotional meaning as the facial image is matched, and a bonus score of the facial image is confirmed according to the matching degree.

Optionally, the step of collecting the user's face image at fixed intervals or in real time within the unit time and identifying the emotion information represented by the face image includes the following steps:

Collecting a user's face image;

Inputting the face image into a preset emotion recognition model, and acquiring classification result and classification data of the face image;

Determining the emotion information of the face image according to the classification result, and determining a matching degree of the face image and the emotion information according to the classification data.

Optionally, the step of identifying the emotion information represented by the face image and the matching degree between the face image and the emotion information includes:

Inputting the face image into a preset emotion recognition model, and acquiring classification result and classification data of the face image;

Determining the emotion information of the face image according to the classification result, and determining a matching degree of the face image and the emotion information according to the classification data.

To solve the above technical problem, the embodiment of the present application further provides an analysis system for a user's limb movement, including:

An acquisition module, configured to acquire a limb image of the user;

a processing module for identifying the body language represented by the limb image;

An execution module for matching visual information or audio information having the same meaning as the body language.

Optionally, the analyzing system of the user's limb motion further includes:

a first obtaining submodule, configured to acquire a face image of the user;

a first processing submodule, configured to identify human facial motion information represented by the facial image;

The first execution sub-module is configured to match an emoticon image having the same action meaning as the human facial motion information.

Optionally, the acquiring module includes: a first acquiring submodule, configured to acquire a face image of the user;

The processing module includes: a first processing submodule, configured to identify human facial motion information represented by the facial image;

The execution module includes: a first execution sub-module, configured to match an emoticon image having the same action meaning as the human facial action information.

Optionally, the analyzing system of the user's limb motion further includes:

a first calling submodule, configured to call at least one of the pre-stored emoticons;

The first display sub-module is configured to place the emoticon image in a display container according to a preset script, so that the emoticon image is visually displayed.

Optionally, the analyzing system of the user's limb motion further includes:

a first comparison sub-module, configured to compare the human facial motion information with an expression image in a range of the display container;

a first confirmation sub-module, configured to confirm that the display container contains an expression picture having the same action meaning as the human facial motion when the action meaning represented by the expression picture in the display container is the same as the human facial motion information.

Optionally, the analyzing system of the user's limb motion further includes:

a second obtaining submodule, configured to acquire matching degree information of the human facial action information and the emoticon image;

The second execution sub-module is configured to calculate a bonus score corresponding to the matching degree information according to a preset matching rule.

Optionally, the analyzing system of the user's limb motion further includes:

a first recording sub-module, configured to record all the bonus points in the preset first time threshold;

And a third execution sub-module, configured to accumulate the bonus scores to form a final score of the user within the first time threshold.

Optionally, the analyzing system of the user's limb motion further includes:

a third obtaining sub-module, configured to randomly extract a preset number of emoticons representing human emotions from the emoticon package in a preset unit time, and place the emoticon images in a display container;

a second processing submodule, configured to collect the user's face image at fixed intervals or in real time within the unit time, identify the emotion information represented by the face image, and determine the matching degree between the face image and the emotion information;

And a fourth execution sub-module, configured to match an emoticon image having the same emotional meaning as the facial image, and confirm a bonus score of the facial image according to the matching degree.

Optionally, the analyzing system of the user's limb motion further includes:

a third obtaining sub-module, configured to randomly extract a preset number of emoticons representing human emotions from the emoticon package in a preset unit time, and place the emoticon images in a display container;

a first acquiring submodule, configured to collect the user's face image at fixed intervals or in real time within the unit time;

a first processing sub-module, configured to identify emotion information represented by the face image, and a matching degree between the face image and the emotion information;

And a first execution submodule, configured to match an emoticon image having the same emotional meaning as the facial image, and confirm a bonus score of the facial image according to the matching degree.

Optionally, the analyzing system of the user's limb motion further includes:

a first collection sub-module, configured to collect a face image of the user;

a third processing sub-module, configured to input the facial image into a preset emotion recognition model, and obtain a classification result and classification data of the facial image;

a fifth execution sub-module, configured to determine emotion information of the face image according to the classification result, and determine a matching degree between the face image and the emotion information according to the classification data.

Optionally, the first processing submodule is configured to:

Inputting the face image into a preset emotion recognition model, and acquiring classification result and classification data of the face image;

Determining the emotion information of the face image according to the classification result, and determining a matching degree of the face image and the emotion information according to the classification data.

To solve the above technical problem, the embodiment of the present application further provides a mobile terminal, including:

One or more processors;

A memory; and

One or more applications, wherein the one or more applications are stored in the memory and configured to be executed by the one or more processors, the one or more applications being configured to perform the above method for analyzing the user's limb movement.

In another aspect, an embodiment of the present application provides a computer readable storage medium storing a computer program which, when executed by a processor, performs the steps of the above method for analyzing the user's limb movement.

In another aspect, an embodiment of the present application provides a computer program product which, when run on a computer, causes the computer to perform the steps of the method for analyzing the user's limb motion according to any of the above embodiments.

The beneficial effects of the embodiments of the present application are as follows: the body language represented by the user's limb image in a picture is identified, and visual information or audio information having the same meaning as the body language is matched. In this way, the information expressed by the limb features in the image is presented in a way that can be directly interpreted by humans, thereby realizing a deeper interpretation of human body movements and facilitating communication between language-impaired users or users who do not share a common language.

Of course, implementing any product or method of the present application does not necessarily require all of the advantages described above to be achieved at the same time.

Brief description of the drawings

In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings used in the description of the embodiments are briefly described below. Obviously, the drawings in the following description show only some embodiments of the present application, and those skilled in the art can obtain other drawings based on these drawings without creative effort.

FIG. 1 is a schematic diagram of the basic flow of a method for analyzing a user's limb motion according to an embodiment of the present application;

FIG. 2 is a schematic diagram showing a first embodiment of a method for analyzing a user's limb motion according to an embodiment of the present application;

FIG. 3 is a schematic diagram showing a second embodiment of a method for analyzing a user's limb motion according to an embodiment of the present application;

FIG. 4 is a schematic flowchart of an application of analyzing a user's facial expression according to an embodiment of the present application;

FIG. 5 is a schematic flowchart of an embodiment of displaying an emoticon image according to an embodiment of the present application;

FIG. 6 is a schematic flowchart of confirming that an emoticon image in the display container is the same as the human facial motion information according to an embodiment of the present application;

FIG. 7 is a schematic flowchart of rewarding according to the matching result according to an embodiment of the present application;

FIG. 8 is a schematic flowchart of counting a total score according to an embodiment of the present application;

FIG. 9 is a schematic flowchart of analyzing emotion information of a face image according to an embodiment of the present application;

FIG. 10 is a schematic flowchart of emotion information classification and matching degree detection of a face image according to an embodiment of the present application;

FIG. 11 is a schematic diagram showing a third embodiment of a method for analyzing a user's limb motion according to an embodiment of the present application;

FIG. 12 is a basic structural block diagram of an analysis system for a user's limb movement according to an embodiment of the present application;

FIG. 13 is a schematic diagram of the basic structure of a mobile terminal according to an embodiment of the present application.

Detailed description

The technical solutions in the embodiments of the present application are clearly and completely described in the following with reference to the accompanying drawings in the embodiments of the present application.

The flows described in the specification, claims, and drawings of the present application include a plurality of operations in a specific order, but it should be clearly understood that these operations may be executed out of the order in which they appear herein, or in parallel. The serial numbers of the operations, such as 101 and 102, are only used to distinguish different operations; the serial numbers themselves do not represent any execution order. Additionally, these flows may include more or fewer operations, and these operations may be performed sequentially or in parallel. It should be noted that the terms "first" and "second" herein are used to distinguish different messages, devices, modules, and the like; they do not represent an order, nor do they require that the "first" and "second" be of different types.

It is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of them. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present application without creative effort fall within the scope of the present application.

Embodiments

Please refer to FIG. 1. FIG. 1 is a schematic flow chart of a method for analyzing a user's limb motion according to an embodiment of the present invention.

As shown in FIG. 1 , a method for analyzing a user's limb motion includes the following steps:

S1100, acquiring a limb image of the user;

The limb images in this embodiment may include, but are not limited to, a face image, a gesture motion image, and/or a lip motion image.

Among them, the face image can also be referred to as a face image.

In one implementation, the terminal acquires a target image that includes the user's limb image from a designated area of its local storage space. In another implementation, the user's limb image is directly acquired in real time by turning on a photographing device disposed on the terminal or connected to the terminal.

In another implementation, the terminal may acquire a target image including the user's limb image from a designated area of a connected external device having a storage function.

In one case, the limb image may refer to an image corresponding to the region where the user's limb is located in the target image. In another case, the limb image may refer to a complete target image.
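The following is a minimal illustrative sketch, in Python, of the acquisition step described above. It assumes OpenCV is available; the camera index and storage path are placeholders rather than values specified by the present application.

```python
# Minimal sketch of the image-acquisition step (S1100), assuming OpenCV;
# the camera index and storage path below are illustrative only.
import cv2


def acquire_limb_image(from_camera: bool = True, stored_path: str = "gallery/target.jpg"):
    """Return a target image containing the user's limb, either captured in
    real time from a photographing device or read from local storage."""
    if from_camera:
        cap = cv2.VideoCapture(0)      # photographing device on or connected to the terminal
        ok, frame = cap.read()         # one real-time frame
        cap.release()
        if not ok:
            raise RuntimeError("camera capture failed")
        return frame
    image = cv2.imread(stored_path)    # designated area of local storage
    if image is None:
        raise FileNotFoundError(stored_path)
    return image
```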

S1200. Identify a body language of the limb image representation;

Body language refers to the specific meaning represented by the action in the limb image. The body language includes, but is not limited to: the emotion information represented by the user's face image, the language information represented by the gesture action in a gesture motion image, or the language information represented by the lip movements in a lip motion image.

That is to say, body language refers to the specific meaning represented by the action in the limb image, including but not limited to: emotion information represented by the user's facial image, language information represented by the gesture action in a gesture motion image, and/or language information represented by the lip movements in a lip motion image.

In this embodiment, the body language represented by the limb image may be identified by a deep learning method. Specifically, a large number of pictures containing human limb images are collected as training samples, and the subjective judgment of the body language expressed by each kind of limb image is obtained; the subjective meaning of the limb movement of each training sample is taken as the expected output of that training sample. The training samples are then input into a convolutional neural network model, the feature data of each training sample is extracted, and the classification data of the training sample is output. The classification data are the probability values of the training sample belonging to each classification result in the current round of training, where the classification results in this embodiment are the names of different body languages. The classification result with the largest probability value that is also greater than a preset measurement threshold is the excitation output of the training sample in the current round of training. Whether the expected output is consistent with the excitation output is then compared; the training ends when the expected output is consistent with the excitation output. When the expected output is inconsistent with the excitation output, the weights of the convolutional neural network are corrected by a back propagation algorithm to adjust the output result. After the weights of the convolutional neural network are adjusted, the training samples are input again, and the cycle repeats until the expected output is consistent with the excitation output.

The classification results can be set according to requirements. The number of classification results depends on the required complexity of the output; the more classification results there are, the higher the complexity of the training.

In one case, it is sometimes necessary to input the samples repeatedly to verify the stability of the output, and the training ends when the stability is good. That is to say, in order to ensure the stability of the output of the convolutional neural network, the training samples can be repeatedly input into the convolutional neural network, and the training ends only after the probability that the excitation output of the convolutional neural network for each training sample is the same as the expected output corresponding to that training sample exceeds a preset value.

In one case, the expected output being consistent with the excitation output may mean that the number of training samples whose excitation output is consistent with the corresponding expected output is not less than a preset number. Correspondingly, the expected output being inconsistent with the excitation output may mean that the number of training samples whose excitation output is consistent with the corresponding expected output is less than the preset number.

In another case, the expected output being consistent with the excitation output may mean that the ratio of the number of training samples whose excitation output is consistent with the corresponding expected output to the total number of training samples is not less than a preset ratio. Correspondingly, the expected output being inconsistent with the excitation output may mean that this ratio is less than the preset ratio.

By training the convolutional neural network model to convergence with a large number of limb images that characterize different body languages, the body language of a limb image that did not participate in the training can be determined quickly and accurately.
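As an illustration of the training procedure described above, the following sketch uses PyTorch. The network architecture, the label set, and the 0.5 measurement threshold are assumptions made for the example and are not mandated by the present application.

```python
# Sketch of the training loop described above, assuming PyTorch.
import torch
import torch.nn as nn

BODY_LANGUAGE_CLASSES = ["hello", "thanks", "goodbye"]   # assumed label set
THRESHOLD = 0.5                                          # preset measurement threshold

# A small convolutional neural network model (architecture is illustrative).
model = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
    nn.Flatten(), nn.Linear(32, len(BODY_LANGUAGE_CLASSES)),
)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()


def train_until_consistent(loader, max_epochs=100):
    """`loader` yields (images, expected) pairs, where `expected` is the label
    index encoding the subjective meaning of each training sample."""
    for _ in range(max_epochs):
        consistent, total = 0, 0
        for images, expected in loader:
            logits = model(images)
            probs = torch.softmax(logits, dim=1)     # classification data: per-class probabilities
            top_p, excitation = probs.max(dim=1)     # excitation output of this round
            hit = (excitation == expected) & (top_p > THRESHOLD)
            consistent += int(hit.sum())
            total += expected.numel()
            # Back propagation corrects the weights when outputs disagree.
            loss = loss_fn(logits, expected)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
        if total and consistent == total:            # expected output consistent with excitation output
            break
```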

S1300: Matching visual information or audio information having the same meaning as the body language.

Visual information refers to information that can be observed by human eyes, including but not limited to: text information, picture information, and/or video information.

The body language represented by the user's limb image is obtained by the convolutional neural network model; that is, text information characterizing the user's limb image is obtained. Using the text information as a search key, visual information or audio information having the same meaning as the text information is retrieved from a local database. In some embodiments, to facilitate matching, the visual information or audio information stored in the local database is tagged according to the meaning it expresses, so that the body language can be matched by retrieving the tags.

That is, in one implementation, the body language represented by the user's limb image is obtained by the convolutional neural network model; that is, description information characterizing the limb in the limb image is obtained. The description information is a type of text information and may be called a literal language. Further, using the description information as a search key, visual information or audio information having the same meaning as the description information is retrieved in a first preset database. In some implementations, for the convenience of matching, one or more tags may be configured for each item of visual information or audio information stored in the first preset database according to the meaning it expresses, so as to facilitate the subsequent retrieval process, that is, retrieval and matching through the tags at the time of retrieval.

The first preset database may be stored locally in the terminal, or may be stored in an external device with a storage function connected to the terminal.
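A minimal sketch of the tag-based matching of step S1300 follows; the database entries and tag names below are invented placeholders, not contents of the first preset database described by the present application.

```python
# Sketch of matching visual or audio information by tag (S1300).
FIRST_PRESET_DATABASE = [
    {"type": "text",  "content": "hello",     "tags": {"hello", "greeting"}},
    {"type": "audio", "content": "hello.wav", "tags": {"hello"}},
    {"type": "image", "content": "wave.png",  "tags": {"goodbye", "wave"}},
]


def match_information(body_language: str):
    """Use the recognized body language (a text key) to retrieve every item
    whose tags carry the same meaning."""
    key = body_language.lower()
    return [item for item in FIRST_PRESET_DATABASE if key in item["tags"]]


# e.g. match_information("hello") returns the text entry and the audio clip
```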

Please refer to FIG. 2. FIG. 2 is a schematic diagram showing the first embodiment of the method for analyzing the limb motion of the user.

As shown in FIG. 2, in some embodiments, the method for analyzing the user's limb motion is used to parse the user's limb motion and convert the motion into a text language: the user's limb image is acquired in real time, converted into a text language, and output. For example, the sign language of deaf-mute users or of users in special working environments can be identified, and the body language converted into a written language. In FIG. 2, the "hello" expressed by the user's body language is converted into the text language "hello".

Please refer to FIG. 3. FIG. 3 is a schematic diagram showing a second embodiment of a method for analyzing a user's limb motion in the present embodiment.

As shown in FIG. 3, in some embodiments, the emotion information represented by the user's facial expression action is identified, and an expression having the same emotional meaning as the emotion information is retrieved and output, but the output is not limited thereto. In some embodiments, text, pictures, animations, or speech having the same emotional meaning as the emotion information can be output. As shown in FIG. 3, when the user is chatting and shows an expression of joy, an expression with the meaning of joy, for example an emoticon with the meaning of joy, is sent to the other party.

In the above embodiment, the body language represented by the user's limb image in the picture is recognized, and visual information or audio information having the same meaning as the body language is matched. In this way, the information expressed by the limb features in the image is presented in a way that can be directly interpreted by humans, thereby realizing a deeper interpretation of human body movements and facilitating communication between language-impaired users or users who do not share a common language.

Please refer to FIG. 4. FIG. 4 is a schematic flowchart of an application of analyzing a user's facial expression according to an embodiment of the present application.

As shown in FIG. 4, the method for analyzing the user's limb motion further includes the following steps:

S2100: acquiring a face image of the user;

In one implementation, the terminal acquires a target image that includes the user's face image from a designated area of its local storage space. In another implementation, the user's face image is directly acquired in real time by turning on a photographing device disposed on the terminal or connected to the terminal.

In another implementation, the terminal may acquire a target image including the user's face image from a designated area of a connected external device having a storage function.

S2200: Identify human face motion information represented by the facial image;

The human facial motion information includes emotion information represented by the facial motion, which may also be referred to as facial expression information, such as joy, anger, sorrow, and happiness; it may also be an action made by the user that does not represent an emotion, such as sticking out the tongue or wrinkling the forehead.

In this embodiment, the human facial motion information represented by the facial image may likewise be identified by a deep learning method. Specifically, a large number of pictures containing human face images are collected as training samples, and the subjective judgment of the human facial motion information expressed by each kind of face image is obtained; the subjective meaning of the facial movement of each training sample is taken as the expected output of that training sample. The training samples are then input into a convolutional neural network model, the feature data of each training sample is extracted, and the classification data of the training sample is output. The classification data are the probability values of the training sample belonging to each classification result in the current round of training, where the classification results in this embodiment are the names of different human facial motion information. The classification result with the largest probability value that is also greater than a preset measurement threshold is the excitation output of the training sample in the current round of training. Whether the expected output is consistent with the excitation output is then compared; the training ends when the expected output is consistent with the excitation output. When the expected output is inconsistent with the excitation output, the weights of the convolutional neural network are corrected by a back propagation algorithm to adjust the output result. After the weights of the convolutional neural network are adjusted, the training samples are input again, and the cycle repeats until the expected output is consistent with the excitation output.

The classification results can be set according to requirements. The number of classification results depends on the required complexity of the output; the more classification results there are, the higher the complexity of the training.

In one case, it is sometimes necessary to input the samples repeatedly to verify the stability of the output, and the training ends when the stability is good. That is to say, in order to ensure the stability of the output of the convolutional neural network, the training samples can be repeatedly input into the convolutional neural network, and the training ends only after the probability that the excitation output of the convolutional neural network for each training sample is the same as the expected output corresponding to that training sample exceeds a preset value.

In one case, the expected output being consistent with the excitation output may mean that the number of training samples whose excitation output is consistent with the corresponding expected output is not less than a preset number. Correspondingly, the expected output being inconsistent with the excitation output may mean that the number of training samples whose excitation output is consistent with the corresponding expected output is less than the preset number.

In another case, the expected output being consistent with the excitation output may mean that the ratio of the number of training samples whose excitation output is consistent with the corresponding expected output to the total number of training samples is not less than a preset ratio. Correspondingly, the expected output being inconsistent with the excitation output may mean that this ratio is less than the preset ratio.

By training the convolutional neural network model to convergence with a large number of face images that characterize different facial motion information, the facial motion information of a face image that did not participate in the training can be determined quickly and accurately.

S2300: Match an emoticon image having the same action meaning as the human facial motion information.

An emoticon image refers to a static or animated expression picture designed to imitate a user's expression, stored in the terminal or in an external device with a storage function connected to the terminal.

The human facial motion information represented by the user's face image is obtained by the convolutional neural network model; that is, text information characterizing the user's face image is obtained. Using the text information as a search key, an emoticon image having the same meaning as the text information is retrieved from a local database. In some embodiments, to facilitate matching, the emoticon images stored in the local database are given one or more tags according to the meaning they express, so that the human facial motion information can be matched by retrieving the tags.

That is to say, in one implementation, the facial motion information represented by the facial image is obtained by the convolutional neural network model; that is, description information characterizing the facial expression in the facial image is obtained. The description information is a type of text information and may be referred to as a word language. Further, using the description information as a search key, an emoticon image having the same meaning as the description information is retrieved in a second preset database. In some implementations, in order to facilitate matching, one or more tags may be configured for each emoticon image stored in the second preset database according to the meaning it expresses, so as to facilitate the subsequent retrieval process, that is, searching and matching through the tags when searching.

The second preset database may be stored locally in the terminal, or may be stored in an external device with a storage function connected to the terminal.

After the user's expression information, that is, the emotion information, is parsed, the specific meaning represented by the expression information is obtained, and an expression image with the same meaning as the expression information is then matched. This is convenient for the user's input and also allows deeper interaction processing that combines the analysis result with the user's expression.

Please refer to FIG. 5. FIG. 5 is a schematic flow chart of an embodiment of displaying an emoticon image according to an embodiment of the present invention.

As shown in FIG. 5, before step S2100, the following steps are further included:

S2011: Call at least one of the pre-stored expression images;

In one implementation, an expression package including a plurality of emoticon images, or a plurality of emoticon images scattered in an area or folder, is stored in a designated storage area or folder of the terminal's storage space. Each expression image characterizes one human facial motion.

That is to say, a designated area of the terminal, or an external device having a storage function connected to the terminal, may store an expression package including a plurality of emoticon images. Each expression image characterizes one human facial motion.

In one implementation, one or more emoticons may be called for display according to a preset script.

S2012: The emoticon image is placed in a display container according to a preset script, so that the emoticon image is visually displayed.

The script is a preset program for controlling the display behavior of the emoticon picture. It includes a time control that sets how long the emoticon stays in the display area, a motion control that sets the motion track of the picture in the display area, and a rendering control that sets how the emoticon is rendered when it is successfully matched in the display area. By traversing the above controls, the display of the emoticon in the display container is completed.

The emoticon image is placed in the display container. After typesetting and rendering are performed in the display container using the parameters set by the preset script, the emoticon placed in the display container is shown in the display area of the terminal for the user to view.
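The following sketch illustrates one possible form of such a preset script and display container; the field names and example values are assumptions, not the script format of the present application.

```python
# Illustrative sketch of a preset script driving the display container.
from dataclasses import dataclass, field


@dataclass
class DisplayScript:
    dwell_seconds: float = 5.0          # time control: how long an emoticon stays shown
    track: str = "top-to-bottom"        # motion control: motion track inside the display area
    matched_effect: str = "highlight"   # rendering control applied when a match succeeds


@dataclass
class DisplayContainer:
    script: DisplayScript
    emoticons: list = field(default_factory=list)

    def place(self, emoticon_image):
        """Place an emoticon image so it is typeset and rendered per the script."""
        self.emoticons.append(emoticon_image)


container = DisplayContainer(script=DisplayScript())
container.place("smile.png")            # the image is then shown in the terminal's display area
```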

In some embodiments, the emoticon image is used in an expression-imitation application. With the terminal camera turned on, the user's face image is collected in real time and displayed on the terminal screen, and the user imitates the emoticon images within the display screen range. The image imitated by the user is classified and recognized; when the user's facial expression is the same as the action of one or more expression images within the display screen range, the successfully matched expression image is scored, and a highlight rendering of the emoticon is performed according to the preset script.

That is, in some embodiments, an emoticon image can be used in an expression-imitation application. With the terminal camera turned on, the terminal uses the opened camera to collect the user's face image in real time and then displays the face image in the display area of the terminal, such as the terminal's display screen; the display screen of the terminal can also display emoticon images. Further, the user can imitate the expression in an emoticon image displayed within the display screen range of the terminal, making an expression action that is the same as the expression in that emoticon image. The terminal then uses the opened camera to collect the expression action made by the user and classifies and recognizes the collected image, that is, the image in which the user performs the imitation. When the expression action made by the user, for example the facial expression, is the same as the expression in one or more emoticon images within the display screen range of the terminal, the successfully matched emoticon image is scored, and a highlight rendering of the emoticon is performed according to the preset script.

Please refer to FIG. 6. FIG. 6 is a schematic flowchart of confirming that the emoticon image in the range of the display container is the same as the facial motion information of the human body in the embodiment.

As shown in FIG. 6, step S2300 specifically includes the following steps:

S2310: Compare the human face motion information with an expression image in a range of the display container;

The user's face image is collected in real time and displayed on the terminal screen, and the user imitates the action of an image displayed within the screen range. The image imitated by the user is classified and recognized, and the classification result is then compared with the action information represented by the expression pictures within the display container range.

That is to say, in one implementation, the terminal places the emoticon image in the display container according to the preset script so that the emoticon image is visually displayed; that is, the terminal displays the emoticon image on its display screen and can collect the user's face image in real time through the camera. The face image may include an expression action performed by the user, and that expression action may be an action made by the user to imitate the expression represented by an emoticon image displayed on the display screen of the terminal, or an action that the user makes at will. Further, after obtaining the user's face image, the terminal may classify and recognize the facial expression contained in the face image, that is, recognize the human facial motion information in the face image, and then compare the human facial motion information with the expression images within the display container range, that is, within the display screen range of the terminal, to determine the comparison result.

S2320: When the action meaning represented by the expression picture in the display container is the same as the human face action information, it is confirmed that the display container has an expression picture having the same action meaning as the human face action.

When the user's facial expression is the same as the action represented by one or more expression pictures within the display screen range, it is confirmed that the display container contains an expression picture having the same action meaning as the human facial motion. The user's facial expression here is the expression represented by the human facial motion information.
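A minimal sketch of the comparison and confirmation of steps S2310-S2320 follows, assuming each displayed emoticon carries a label describing the action it represents; the labels and data layout are illustrative.

```python
# Sketch of comparing recognized facial motion information with displayed emoticons.
def confirm_match(facial_action: str, display_container: list):
    """Return the emoticons in the container whose action meaning is the same
    as the recognized human facial motion information."""
    return [e for e in display_container if e["action"] == facial_action]


displayed = [{"file": "laugh.png", "action": "laugh"},
             {"file": "frown.png", "action": "frown"}]
matches = confirm_match("laugh", displayed)   # non-empty: a matching emoticon exists
```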

In some embodiments, when the action meaning represented by the user's facial expression action is the same as that of the expression image, a bonus score also needs to be calculated according to the matching degree of the two. For details, please refer to FIG. 7. FIG. 7 is a schematic flowchart of rewarding according to the matching result in this embodiment.

As shown in FIG. 7, after step S2300, the following steps are further included:

S2411: Obtain matching degree information of the human face motion information and the expression image;

In this embodiment, the human facial motion information is analyzed according to the classification result of the convolutional neural network model. The classification result output by the classification layer of the convolutional neural network model consists of the probability values of the face image belonging to each classification result, and each probability value may lie in the range 0-1. Correspondingly, the classification result corresponding to the face image may include a plurality of values, generally between 0 and 1. For example, if the classification results are set as the four emotion results of joy, anger, sorrow, and happiness, and [0.75 0.2 0.4 0.3] is obtained after the face image is input, then since 0.75 is the maximum value and is greater than the preset threshold of 0.5, the classification result of the face image is "joy". The matching degree information between the human facial motion information and an expression image expressing the emotion "joy" is therefore 0.75; that is, the similarity between the expression action represented by the face image and the expression image expressing "joy" is 75%.

As another example, if the classification results are set as the four expressions of laughing, crying, frowning, and no expression, and [0.79 0.1 0.3 0.1] is obtained after the collected face image is input, then since 0.79 is the maximum value and is greater than the preset threshold of 0.5, the classification result of the face image is "laugh". Furthermore, the matching degree information between the human facial motion information and an expression image with the meaning of "laughing" is 0.79; that is, the similarity between the expression action represented by the face image and the expression image with the meaning of "laughing" is 79%.
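The following sketch shows how the classification layer's output in the above examples could be turned into a classification result and a matching degree; the label list and the 0.5 threshold are taken from the examples, everything else is an assumption.

```python
# Sketch of interpreting the classification layer's output (probabilities per class).
def interpret_classification(probs, labels, threshold=0.5):
    """Return (label, matching_degree), or (None, 0.0) when no class clears the threshold."""
    degree = max(probs)
    label = labels[probs.index(degree)]
    if degree <= threshold:
        return None, 0.0                 # treated as a miss
    return label, degree


label, degree = interpret_classification([0.79, 0.1, 0.3, 0.1],
                                         ["laugh", "cry", "frown", "no expression"])
# label == "laugh", degree == 0.79, i.e. a 79% similarity
```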

S2412. Calculate, according to a preset matching rule, a bonus score corresponding to the matching degree information.

The matching rule is a preset method of calculating a bonus score based on the matching degree information. For example, according to the matching degree information, the matching result is divided into "perfect", "very good", "good", and "missed", where "perfect" is a classification result with a matching degree in the interval 0.9-1.0, "very good" is a classification result with a matching degree in the interval 0.7-0.9, "good" is a classification result with a matching degree in the interval 0.5-0.7, and "missed" is a classification result with a matching degree of 0.5 or less. A "perfect" matching result is set to score 30 points, a "very good" matching result 20 points, a "good" matching result 10 points, and a "missed" matching result 0 points.

The bonus score corresponding to the matching degree information is calculated according to a preset matching rule.

By grading the matching result according to the matching degree information, the quality of the matching result can be further refined and a more accurate bonus score can be obtained.
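A sketch of the example matching rule follows; the score bands mirror the example above, and the handling of the band boundaries is an assumption since the text leaves it open.

```python
# Sketch of mapping the matching degree to a bonus score per the example rule.
def bonus_score(matching_degree: float) -> int:
    if matching_degree >= 0.9:
        return 30        # perfect
    if matching_degree >= 0.7:
        return 20        # very good
    if matching_degree > 0.5:
        return 10        # good
    return 0             # missed


# bonus_score(0.79) -> 20, i.e. the "very good" band of the rule
```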

In some embodiments, the matching results are continuously recorded for a preset period of time, and the user's score within that period is counted after the time is over. For details, refer to FIG. 8. FIG. 8 is a schematic flowchart of counting a total score in this embodiment.

As shown in FIG. 8, after step S2412, the following steps are further included:

S2421: Record all the bonus points in the preset first time threshold;

In one implementation, the first time threshold is the duration of a predetermined matching game; for example, the duration of one matching game may be set to 3 minutes. The specific duration is not limited thereto, and in some alternative embodiments the first time threshold can be shorter or longer.

S2422, accumulating the bonus scores to form a final score of the user within the first time threshold.

The user's bonus scores within the first time threshold are totaled to give the total score of the user participating in the matching within the first time threshold.

That is, all the bonus scores recorded within the first time threshold are added together, and the result is taken as the final score of the user participating in the matching within the first time threshold.
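The following sketch illustrates recording bonus scores within the first time threshold and summing them into a final score; the 3-minute duration is the example value mentioned above, and the class structure is an assumption.

```python
# Sketch of accumulating bonus scores within the first time threshold.
import time


class MatchSession:
    def __init__(self, first_time_threshold: float = 180.0):   # 3 minutes, as in the example
        self.deadline = time.monotonic() + first_time_threshold
        self.bonus_scores = []

    def record(self, score: int):
        if time.monotonic() < self.deadline:                    # only within the threshold
            self.bonus_scores.append(score)

    def final_score(self) -> int:
        return sum(self.bonus_scores)                           # accumulated final score
```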

Please refer to FIG. 9. FIG. 9 is a schematic flowchart of analyzing emotion information of a face image according to an embodiment of the present invention.

As shown in FIG. 9, the method for analyzing the user's limb motion further includes the following steps:

S3100: randomly extract a preset number of expression images representing human emotions from the expression pack in a preset unit time, and place the expression image in the display container;

The unit time is the time for which one wave of emoticon images is loaded into the display container. For example, if the loading time of one wave of emoticon images is 5 seconds, a wave of emoticon images stays in the display container for 5 seconds and is replaced by a new wave of expression pictures after 5 seconds. The number of emoticon images loaded per unit time can be preset. The setting rule can be fixed, for example, five emoticon images are added in each wave per unit time; in some embodiments, the number is set randomly according to the network state of the terminal, where the better the network state, the larger the preset number; in other embodiments, the number of emoticon images added can be incremental, with the increment set according to the actual situation, such as one, two, or more at a time.
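A sketch of loading one wave of emoticon images per unit time (step S3100) follows; the expression pack contents are placeholders, while the wave size of five and the 5-second unit time are the example values from the text.

```python
# Sketch of replacing the display container contents with a new random wave.
import random

EXPRESSION_PACK = ["joy.png", "anger.png", "sorrow.png", "happiness.png",
                   "laugh.png", "cry.png", "frown.png", "neutral.png"]


def load_wave(display_container: list, preset_number: int = 5):
    """Randomly extract a preset number of expression images and place them
    in the display container, replacing the previous wave."""
    display_container.clear()
    display_container.extend(random.sample(EXPRESSION_PACK, preset_number))


# Called once per unit time (e.g. every 5 seconds) by the display loop:
# load_wave(container_images)
```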

S3200: Collect the user's face image at fixed intervals or in real time within the unit time, and identify the emotion information represented by the face image and the matching degree between the face image and the emotion information;

The user's face image is acquired in real time by the photographing device disposed on or connected to the terminal, but the acquisition is not limited thereto. In some embodiments, the face image can also be collected at fixed intervals (for example, every 0.1 s).

The emotion information of the face image is parsed and confirmed according to the classification result of the convolutional neural network model. The classification result output by the classification layer of the convolutional neural network model consists of the probability values of the face image belonging to each classification result, and each probability value may lie in the range 0-1. Correspondingly, the classification result corresponding to the face image may include a plurality of values between 0 and 1. For example, if the classification results are set as the four emotion results of joy, anger, sorrow, and happiness, and [0.75 0.2 0.4 0.3] is obtained after the face image is input, then since 0.75 is the maximum value and is greater than the preset threshold of 0.5, the classification result of the face image is "joy". According to the classification result, the expression image in the display container with the same emotion "joy" is determined, and the matching degree information between the emotion information and that expression image is 0.75; that is, the similarity between the expression action represented by the face image and the expression image is 75%.

S3300: Match an emoticon image having the same emotional meaning as the facial image, and confirm a bonus score of the facial image according to the matching degree.

The matching rule is a preset method of calculating a bonus score based on the matching degree information. For example, according to the matching degree information, the matching result is divided into "perfect", "very good", "good", and "missed", where "perfect" is a classification result with a matching degree in the interval 0.9-1.0, "very good" is a classification result with a matching degree in the interval 0.7-0.9, "good" is a classification result with a matching degree in the interval 0.5-0.7, and "missed" is a classification result with a matching degree of 0.5 or less. A "perfect" matching result is set to score 30 points, a "very good" matching result 20 points, a "good" matching result 10 points, and a "missed" matching result 0 points.

The bonus score corresponding to the matching degree information is calculated according to a preset matching rule.

Please refer to FIG. 10. FIG. 10 is a schematic flowchart of the emotion information classification and the matching degree detection of the face image according to the embodiment.

As shown in FIG. 10, step 3200 specifically includes the following steps:

S3210: collecting a face image of the user;

The user's face image is acquired in real time by turning on the photographing device disposed on or connected to the terminal, but the acquisition is not limited thereto. In some embodiments, face images can also be acquired at fixed intervals (for example, every 0.1 s).

S3220: Input the face image into a preset emotion recognition model, and obtain a classification result and classification data of the face image;

The emotion recognition model is specifically a convolutional neural network model trained to a convergent state.

In this embodiment, the emotion information represented by the face image may be identified by a deep learning method. Specifically, a large number of pictures containing human face images are collected as training samples, and the subjective judgment of the emotion information expressed by each kind of face image is obtained; the subjective meaning of the facial movement of each training sample is taken as the expected output of that training sample. The training samples are then input into a convolutional neural network model, the feature data of each training sample is extracted, and the classification data of the training sample is output. The classification data are the probability values of the training sample belonging to each classification result in the current round of training, where the classification results in this embodiment are the names of different emotion information. The classification result with the largest probability value that is also greater than a preset measurement threshold is the excitation output of the training sample in the current round of training. Whether the expected output is consistent with the excitation output is then compared; the training ends when the expected output is consistent with the excitation output. When the expected output is inconsistent with the excitation output, the weights of the convolutional neural network are corrected by a back propagation algorithm to adjust the output result. After the weights of the convolutional neural network are adjusted, the training samples are input again, and the cycle repeats until the expected output is consistent with the excitation output.

The classification results can be set according to demand. Their number depends on the complexity of the required output, and the more classification results there are, the higher the complexity of the training.

In some cases it is necessary to input the training samples repeatedly to verify the stability of the output, and training ends only when the stability is good. That is, to ensure the stability of the output of the convolutional neural network, the training samples may be repeatedly input into the convolutional neural network until, for each training sample, the probability that the excitation output is the same as the expected output corresponding to that training sample exceeds a preset value; training then ends.

In one case, "the expected output is consistent with the excitation output" may mean that the number of training samples whose excitation output is consistent with the corresponding expected output is not less than a preset number. Correspondingly, "the expected output is inconsistent with the excitation output" may mean that the number of such training samples is less than the preset number.

In another case, "the expected output is consistent with the excitation output" may mean that the ratio of training samples whose excitation output is consistent with the corresponding expected output to the total number of training samples is not less than a preset ratio. Correspondingly, "the expected output is inconsistent with the excitation output" may mean that this ratio is less than the preset ratio. Both stopping criteria are sketched below.
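The two stopping criteria described above can be summarized in a short sketch; the preset number and preset ratio below are illustrative assumptions.

def expected_consistent_with_excitation(num_consistent: int, num_total: int,
                                        preset_number: int = 950,
                                        preset_ratio: float = 0.95) -> bool:
    # Criterion 1: enough training samples have excitation output equal to the
    # expected output; Criterion 2: the ratio of such samples is high enough.
    by_count = num_consistent >= preset_number
    by_ratio = num_total > 0 and (num_consistent / num_total) >= preset_ratio
    return by_count or by_ratio  # an implementation would pick one criterion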

S3230. Determine emotion information of the face image according to the classification result, and determine a matching degree between the face image and the emotion information according to the classification data.

The emotion information conveyed by the human facial motion is confirmed from the classification result of the convolutional neural network model. The classification layer of the model outputs, for the face image, a probability value for each classification result, where each probability value lies in the range 0-1; accordingly, the classification output corresponding to the face image may contain multiple values between 0 and 1. For example, if the classification results are set to the four emotions of joy, anger, sorrow and happiness, and inputting the face image yields [0.75 0.2 0.4 0.3], then since 0.75 is the maximum value and is greater than the preset threshold of 0.5, the classification result of the face image is "joy". The matching degree between the human facial motion information and an expression image expressing the emotion "joy" is 0.75; that is, the similarity between the face image and an expression image expressing "joy" is 75%.

As another example, if the classification results are set to the four expressions of laughing, crying, frowning and no expression, and inputting the acquired face image yields [0.79 0.1 0.3 0.1], then since 0.79 is the maximum value and is greater than the preset threshold of 0.5, the classification result of the face image is "laugh". Furthermore, the matching degree between the human facial motion information and an expression image meaning "laugh" is 0.79; that is, the similarity between the expression represented by the face image and an expression image expressing "laugh" is 79%. A sketch of this mapping from classification data to emotion and matching degree follows.
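A minimal sketch of this step, assuming the classification layer's output is already available as a list of probabilities; the label names follow the example above and the helper name is an assumption.

LABELS = ["laugh", "cry", "frown", "no expression"]
THRESHOLD = 0.5  # preset threshold from the example above

def interpret_classification(probs):
    # Pick the classification result with the largest probability; accept it
    # only if that probability exceeds the preset threshold, and reuse the
    # probability as the matching degree.
    best = max(range(len(probs)), key=probs.__getitem__)
    if probs[best] <= THRESHOLD:
        return None, probs[best]
    return LABELS[best], probs[best]

print(interpret_classification([0.79, 0.1, 0.3, 0.1]))  # ('laugh', 0.79)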

Please refer to FIG. 11. FIG. 11 is a schematic display diagram of a third embodiment of the method for analyzing a user's limb motion.

As shown in FIG. 11, the user's self-portrait image can be displayed in the display area of the terminal at the same time as the emoticon pictures are displayed on the screen. The user imitates the same expression action according to a displayed emoticon picture, and the terminal detects whether the imitated expression is the same as an emoticon picture in the display area. When they match, the matched emoticon picture is enlarged and displayed, and the corresponding bonus score is displayed according to the matching degree.

The above-mentioned imitated expression is the expression the user makes when performing the same expression action as the displayed emoticon picture.

To solve the above technical problem, the embodiment of the present application further provides an analysis system for a user's limb motion. For details, please refer to FIG. 12. FIG. 12 is a basic structural block diagram of an analysis system for a user's limb motion according to the embodiment.

As shown in FIG. 12, an analysis system for a user's limb motion includes an acquisition module 2100, a processing module 2200, and an execution module 2300. The acquisition module 2100 is configured to acquire a limb image of the user; the processing module 2200 is configured to identify the body language represented by the limb image; and the execution module 2300 is configured to match visual information or audio information having the same meaning as the body language. A minimal structural sketch follows.
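A minimal structural sketch of these three modules; the class and method names are illustrative assumptions, not identifiers from the embodiment.

class AcquisitionModule:
    def get_limb_image(self):
        # Return the user's current limb image (e.g. a camera frame).
        raise NotImplementedError

class ProcessingModule:
    def identify_body_language(self, limb_image):
        # Return the body language represented by the limb image.
        raise NotImplementedError

class ExecutionModule:
    def match(self, body_language):
        # Return visual or audio information with the same meaning.
        raise NotImplementedError

class LimbMotionAnalysisSystem:
    def __init__(self, acquisition, processing, execution):
        self.acquisition = acquisition
        self.processing = processing
        self.execution = execution

    def analyze(self):
        image = self.acquisition.get_limb_image()
        body_language = self.processing.identify_body_language(image)
        return self.execution.match(body_language)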

The above embodiment recognizes the body language represented by the limb image of the user in the picture and matches visual information or audio information having the same meaning as the body language. In this way, the information expressed by the limb features in the image is presented in a manner that can be directly interpreted by humans, realizing a deeper interpretation of human body movements and facilitating communication between language-impaired users or users who do not share a common language.

In some embodiments, the analysis system of the user's limb motion further includes a first acquisition sub-module, a first processing sub-module, and a first execution sub-module. The first acquisition sub-module is configured to acquire a facial image of the user; the first processing sub-module is configured to identify the human facial motion information represented by the facial image; and the first execution sub-module is configured to match an emoticon image having the same action meaning as the human facial motion information.

In some embodiments, the acquisition module includes a first acquisition sub-module configured to acquire a face image of the user; the processing module includes a first processing sub-module configured to identify the human facial motion information represented by the facial image; and the execution module includes a first execution sub-module configured to match an emoticon image having the same action meaning as the human facial motion information.

In some embodiments, the analysis system of the user's limb motion further includes a first calling sub-module and a first display sub-module. The first calling sub-module is configured to call at least one pre-stored emoticon image; the first display sub-module is configured to place the emoticon image in the display container according to the preset script, so that the emoticon image is visually displayed.

In some embodiments, the first calling sub-module is configured to call at least one pre-stored emoticon image before the user's facial image is acquired; the first display sub-module is configured to place the emoticon image in the display container according to the preset script, so that the emoticon image is visually displayed.

In some embodiments, the analysis system of the user's limb motion further includes a first comparison sub-module and a first confirmation sub-module. The first comparison sub-module is configured to compare the human facial motion information with the expression images within the range of the display container; the first confirmation sub-module is configured to confirm, when the action meaning represented by an expression picture in the display container is the same as the human facial motion information, that the display container contains an expression picture having the same action meaning as the human facial motion.

In some embodiments, the analysis system of the user's limb motion further includes a second acquisition sub-module and a second execution sub-module. The second acquisition sub-module is configured to obtain the matching degree information of the human facial action information and the emoticon image; the second execution sub-module is configured to calculate the bonus score corresponding to the matching degree information according to the preset matching rule.

In some embodiments, after the emoticon image having the same action meaning as the human facial motion is matched, the second acquisition sub-module acquires the matching degree information of the human facial motion information and the emoticon image, and the second execution sub-module calculates the bonus score corresponding to the matching degree information according to the preset matching rule.

In some embodiments, the analysis system of the user's limb motion further includes: a first recording sub-module and a third execution sub-module. The first recording sub-module is configured to record all the bonus scores in the preset first time threshold; the third execution sub-module is configured to accumulate the bonus scores to form a final score of the user within the first time threshold.

In some embodiments, the analysis system of the user's limb motion further includes a first recording sub-module and a third execution sub-module. The first recording sub-module is configured to record, after the bonus score corresponding to the matching degree information is calculated according to the preset matching rule, all the bonus scores within the preset first time threshold; the third execution sub-module is configured to sum the bonus scores to form the final score of the user within the first time threshold. A sketch of this recording and accumulation follows.
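A minimal sketch of the recording and accumulation just described; the 60-second first time threshold is an illustrative assumption.

import time

class ScoreRecorder:
    def __init__(self, first_time_threshold_s: float = 60.0):
        self.start = time.monotonic()
        self.threshold = first_time_threshold_s
        self.bonus_scores = []

    def record(self, bonus: int) -> None:
        # Record a bonus score only while the first time threshold has not elapsed.
        if time.monotonic() - self.start <= self.threshold:
            self.bonus_scores.append(bonus)

    def final_score(self) -> int:
        # Sum all recorded bonus scores into the user's final score.
        return sum(self.bonus_scores)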

In some embodiments, the analysis system of the user's limb motion further includes a third acquisition sub-module, a second processing sub-module, and a fourth execution sub-module. The third acquisition sub-module is configured to randomly extract a preset number of expression images representing human emotions from the emoticon package in a preset unit time and to place the expression images in the display container; the second processing sub-module is configured to collect the user's face image in a timed or real-time manner within the unit time and to identify the emotion information represented by the face image and the matching degree between the face image and the emotion information; the fourth execution sub-module is configured to match an expression image having the same emotional meaning as the face image and to confirm the bonus score of the face image according to the matching degree.

In some embodiments, the analysis system of the user's limb motion further includes a third acquisition sub-module configured to randomly extract a preset number of expression images representing human emotions from the emoticon package in a preset unit time and to place the expression images in the display container; the first acquisition sub-module is configured to collect the facial image of the user in a timed or real-time manner within the unit time; the first processing sub-module is configured to identify the emotion information represented by the facial image and the matching degree between the face image and the emotion information; the first execution sub-module is configured to match an expression image having the same emotional meaning as the face image and to confirm the bonus score of the face image according to the matching degree. A sketch of the random extraction follows.
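A minimal sketch of the random extraction performed by the third acquisition sub-module; the package contents, container, and preset number are illustrative assumptions.

import random

def refresh_display_container(emoticon_package, display_container, preset_number=4):
    # Randomly draw a preset number of expression images from the emoticon
    # package and place them in the display container for this unit time.
    display_container.clear()
    display_container.extend(random.sample(emoticon_package, preset_number))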

In some embodiments, the analysis system of the user's limb motion further includes a first collection sub-module, a third processing sub-module, and a fifth execution sub-module. The first collection sub-module is configured to collect the user's face image; the third processing sub-module is configured to input the facial image into the preset emotion recognition model and obtain the classification result and the classification data of the facial image; the fifth execution sub-module is configured to determine the emotion information of the facial image according to the classification result, and to determine the matching degree between the face image and the emotion information according to the classification data.

In some embodiments, the first processing sub-module is configured to input the facial image into a preset emotion recognition model and acquire the classification result and classification data of the facial image, to determine the emotion information of the face image according to the classification result, and to determine the degree of matching between the face image and the emotion information according to the classification data.

The terminal in this embodiment may be a mobile terminal or a PC; the mobile terminal is taken as an example for description.

This embodiment also provides a mobile terminal. For details, refer to FIG. 13. FIG. 13 is a schematic structural diagram of a mobile terminal according to an embodiment of the present invention.

It should be noted that, in this embodiment, all of the programs implementing the method for analyzing the user's limb motion in this embodiment are stored in the memory 1520 of the mobile terminal, and the processor 1580 can call the programs in the memory 1520 to perform all of the functions listed in the method for analyzing the user's limb motion. Since the functions implemented by the mobile terminal are described in detail in the method for analyzing the user's limb motion in this embodiment, details are not described herein again.

The embodiment of the present application further provides a mobile terminal. As shown in FIG. 13, for convenience of description, only the parts related to the embodiment of the present application are shown; for details that are not disclosed, refer to the method part of the embodiment of the present application. The terminal may be any terminal device, including a mobile terminal, a tablet computer, a PDA (Personal Digital Assistant), a POS (Point of Sales) terminal, or an in-vehicle computer; the following description takes a mobile terminal as an example.

FIG. 13 is a block diagram showing a partial structure of the mobile terminal provided by an embodiment of the present application. Referring to FIG. 13, the mobile terminal includes a radio frequency (RF) circuit 1510, a memory 1520, an input unit 1530, a display unit 1540, a sensor 1550, an audio circuit 1560, a wireless fidelity (Wi-Fi) module 1570, a processor 1580, a power supply 1590, and the like. It will be understood by those skilled in the art that the mobile terminal structure shown in FIG. 13 does not constitute a limitation on the mobile terminal, which may include more or fewer components than those illustrated, combine some components, or use a different arrangement of components.

The following describes the components of the mobile terminal in detail with reference to FIG. 13:

The RF circuit 1510 can be used for receiving and transmitting signals during the sending and receiving of information or during a call. Specifically, downlink information received from a base station is handed to the processor 1580 for processing, and uplink data is sent to the base station. Generally, the RF circuit 1510 includes, but is not limited to, an antenna, at least one amplifier, a transceiver, a coupler, a Low Noise Amplifier (LNA), a duplexer, and the like. In addition, the RF circuit 1510 can also communicate with networks and other devices via wireless communication. The wireless communication may use any communication standard or protocol, including but not limited to Global System for Mobile communication (GSM), General Packet Radio Service (GPRS), Code Division Multiple Access (CDMA), Wideband Code Division Multiple Access (WCDMA), Long Term Evolution (LTE), e-mail, Short Messaging Service (SMS), and the like.

The memory 1520 can be used to store software programs and modules, and the processor 1580 executes the various functional applications and data processing of the mobile terminal by running the software programs and modules stored in the memory 1520. The memory 1520 may mainly include a program storage area and a data storage area, where the program storage area may store an operating system, applications required for at least one function (such as a sound playing function, an image playing function, etc.), and the like; the data storage area may store data created according to the use of the mobile terminal (such as audio data, a phone book, etc.). Moreover, the memory 1520 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or other non-volatile solid-state storage devices.

The input unit 1530 can be configured to receive input numeric or character information and to generate key signal inputs related to user settings and function control of the mobile terminal. Specifically, the input unit 1530 may include a touch panel 1531 and other input devices 1532. The touch panel 1531, also referred to as a touch screen, can collect touch operations by the user on or near it (such as operations performed by the user on or near the touch panel 1531 with a finger, a stylus, or any other suitable object or accessory) and drive the corresponding connection device according to a preset program. Optionally, the touch panel 1531 may include two parts: a touch detection device and a touch controller. The touch detection device detects the touch position of the user, detects the signal produced by the touch operation, and transmits the signal to the touch controller; the touch controller receives the touch information from the touch detection device, converts it into contact coordinates, and sends them to the processor 1580; it can also receive and execute commands sent by the processor 1580. In addition, the touch panel 1531 can be implemented in various types, such as resistive, capacitive, infrared, and surface acoustic wave types. Besides the touch panel 1531, the input unit 1530 may also include other input devices 1532. Specifically, the other input devices 1532 may include, but are not limited to, one or more of a physical keyboard, function keys (such as volume control buttons, a switch button, etc.), a trackball, a mouse, a joystick, and the like.

The display unit 1540 can be used to display information input by the user or information provided to the user, as well as the various menus of the mobile terminal. The display unit 1540 may include a display panel 1541. Optionally, the display panel 1541 may be configured in the form of a liquid crystal display (LCD), an organic light-emitting diode (OLED) display, or the like. Further, the touch panel 1531 may cover the display panel 1541. After the touch panel 1531 detects a touch operation on or near it, it transmits the operation to the processor 1580 to determine the type of the touch event, and the processor 1580 then provides a corresponding visual output on the display panel 1541 according to the type of the touch event. Although in FIG. 13 the touch panel 1531 and the display panel 1541 are used as two independent components to implement the input and output functions of the mobile terminal, in some embodiments the touch panel 1531 and the display panel 1541 may be integrated to implement the input and output functions of the mobile terminal.

The mobile terminal may also include at least one sensor 1550, such as a light sensor, a motion sensor, and other sensors. Specifically, the light sensor may include an ambient light sensor and a proximity sensor: the ambient light sensor may adjust the brightness of the display panel 1541 according to the brightness of the ambient light, and the proximity sensor may turn off the display panel 1541 and/or the backlight when the mobile terminal is moved to the ear. As one kind of motion sensor, an accelerometer can detect the magnitude of acceleration in all directions (usually three axes) and, when stationary, can detect the magnitude and direction of gravity; it can be used for applications that identify the attitude of the mobile terminal (such as switching between landscape and portrait modes, related games, and magnetometer attitude calibration) and for vibration-recognition-related functions (such as a pedometer or tap detection). Other sensors such as a gyroscope, a barometer, a hygrometer, a thermometer, and an infrared sensor may also be configured in the mobile terminal, and details are not described herein again.

The audio circuit 1560, a speaker 1561, and a microphone 1562 can provide an audio interface between the user and the mobile terminal. The audio circuit 1560 can convert received audio data into an electrical signal and transmit it to the speaker 1561, which converts it into a sound signal for output. Conversely, the microphone 1562 converts a collected sound signal into an electrical signal, which the audio circuit 1560 receives and converts into audio data; after the audio data is processed by the processor 1580, it is transmitted via the RF circuit 1510 to, for example, another mobile terminal, or output to the memory 1520 for further processing.

Wi-Fi is a short-range wireless transmission technology. Through the Wi-Fi module 1570, the mobile terminal can help the user send and receive e-mail, browse web pages, and access streaming media, providing the user with wireless broadband Internet access. Although FIG. 13 shows the Wi-Fi module 1570, it can be understood that it is not an essential component of the mobile terminal and may be omitted as needed without changing the essence of the invention.

The processor 1580 is the control center of the mobile terminal. It connects the various parts of the entire mobile terminal using various interfaces and lines, and performs the various functions of the mobile terminal and processes data by running or executing the software programs and/or modules stored in the memory 1520 and calling the data stored in the memory 1520, thereby monitoring the mobile terminal as a whole. Optionally, the processor 1580 may include one or more processing units; preferably, the processor 1580 may integrate an application processor and a modem processor, where the application processor mainly handles the operating system, user interfaces, applications, and the like, and the modem processor mainly handles wireless communication. It can be understood that the modem processor may also not be integrated into the processor 1580.

The mobile terminal also includes a power supply 1590 (such as a battery) for powering the various components. Preferably, the power supply may be logically coupled to the processor 1580 through a power management system, so that functions such as charging, discharging, and power consumption management are implemented through the power management system.

Although not shown, the mobile terminal may further include a camera, a Bluetooth module, and the like, and details are not described herein again.

Corresponding to the above method embodiments, an embodiment of the present application provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the method for analyzing a user's limb motion described in the above embodiments.

The embodiment of the present application recognizes the body language represented by the limb image of the user in the picture and matches visual information or audio information having the same meaning as the body language. In this way, the information expressed by the limb features in the image is presented in a manner that can be directly interpreted by humans, realizing a deeper interpretation of human body movements and facilitating communication between language-impaired users or users who do not share a common language.

Corresponding to the above method embodiments, an embodiment of the present application provides a computer program product which, when run on a computer, causes the computer to perform the steps of the method for analyzing a user's limb motion according to any one of the foregoing embodiments.

The embodiment of the present application recognizes the body language represented by the limb image of the user in the picture and matches visual information or audio information having the same meaning as the body language. In this way, the information expressed by the limb features in the image is presented in a manner that can be directly interpreted by humans, realizing a deeper interpretation of human body movements and facilitating communication between language-impaired users or users who do not share a common language.

It should be noted that the specification and the accompanying drawings of the present application give preferred embodiments of the present application. However, the present application can be implemented in many different forms and is not limited to the embodiments described in this specification; these embodiments are not intended to limit the scope of the present application, but are provided to make the understanding of the disclosure more thorough. Furthermore, the above technical features may be further combined with each other to form various embodiments not enumerated above, which are all considered to be within the scope of the specification of the present application. Further, those skilled in the art can make improvements or changes according to the above description, and all such improvements and changes shall fall within the scope of protection of the appended claims.

Claims (19)

  1. An analytical method for a user's limb movement, comprising the steps of:
    Obtaining a limb image of the user;
    Identifying the body language of the limb image representation;
    Matching visual or audio information having the same meaning as the body language.
  2. The method for analyzing a limb motion of a user according to claim 1, wherein the step of acquiring a limb image of the user comprises:
    Obtaining a face image of the user;
    The step of identifying the body language of the limb image representation includes:
    Identifying human facial motion information represented by the facial image;
    The step of matching the visual information or the audio information having the same meaning as the body language includes:
    An emoticon image having the same action meaning as the human facial motion information is matched.
  3. The method for analyzing a user's limb motion according to claim 2, wherein the step of acquiring a face image of the user further comprises the following steps:
    Retrieving at least one of the pre-stored emoticons;
    The emoticon image is placed in the display container according to a preset script to visually display the emoticon image.
  4. The method for analyzing a user's limb motion according to claim 3, wherein the step of matching the emoticon image having the same action meaning as the human facial motion comprises the following steps:
    Comparing the human facial motion information with an expression image within the display container;
    When the action meaning represented by the expression picture in the display container is the same as the human face action information, it is confirmed that the display container has an expression picture having the same action meaning as the human face action.
  5. The method for analyzing a user's limb motion according to claim 3, wherein the step of matching the emoticon image having the same action meaning as the human facial motion further comprises the following steps:
    Obtaining matching degree information of the human facial action information and the emoticon image;
    Calculating a bonus score corresponding to the matching degree information according to a preset matching rule.
  6. The method for analyzing a user's limb motion according to claim 5, wherein the step of calculating the bonus score corresponding to the matching degree information according to the preset matching rule further comprises the following steps:
    Recording all the bonus points within the preset first time threshold;
    The bonus scores are summed to form a final score for the user within the first time threshold.
  7. The method for analyzing a user's limb movement according to claim 2, wherein the method for analyzing the user's limb motion further comprises the following steps:
    Extracting a preset number of expression images representing human emotions from the expression packs in a preset unit time, and placing the expression images in the display container;
    The step of acquiring a face image of the user includes:
    Collecting a user's face image in a timed or real time in the unit time;
    And the step of identifying the facial motion information of the facial image representation includes:
    Identifying the emotion information represented by the face image, and the matching degree between the face image and the emotion information;
    The step of matching an emoticon image having the same action meaning as the human facial action information includes:
    An emoticon image having the same emotional meaning as the facial image is matched, and a bonus score of the facial image is confirmed according to the matching degree.
  8. The method for analyzing a user's limb motion according to claim 7, wherein the step of identifying the emotion information represented by the face image and the matching degree between the face image and the emotion information comprises:
    Inputting the face image into a preset emotion recognition model, and acquiring classification result and classification data of the face image;
    Determining the emotion information of the face image according to the classification result, and determining a matching degree of the face image and the emotion information according to the classification data.
  9. An analytical system for a user's limb movement, comprising:
    An acquisition module, configured to acquire a limb image of the user;
    a processing module for identifying a body language of the limb image representation;
    An execution module for matching visual information or audio information having the same meaning as the body language.
  10. The analysis system of the user's limb movement according to claim 9, wherein the acquisition module comprises: a first acquisition sub-module, configured to acquire a face image of the user;
    The processing module includes: a first processing submodule, configured to identify human facial motion information represented by the facial image;
    The execution module includes: a first execution sub-module, configured to match an emoticon image having the same action meaning as the human facial action information.
  11. The analysis system of the user's limb movement according to claim 10, wherein the analysis system of the user's limb movement further comprises:
    a first calling submodule, configured to call at least one of the pre-stored emoticons;
    The first display sub-module is configured to place the emoticon image in a display container according to a preset script, so that the emoticon image is visually displayed.
  12. The analysis system of the user's limb movement according to claim 10, wherein the analysis system of the user's limb movement further comprises:
    a first comparison sub-module, configured to compare the human facial motion information with an expression image in a range of the display container;
    a first confirmation sub-module, configured to confirm that the display container has the same action as the human face motion when the action meaning represented by the expression picture in the display container is the same as the human face motion information The expression of the meaning of the image.
  13. The analysis system of the user's limb movement according to claim 10, wherein the analysis system of the user's limb movement further comprises:
    a second obtaining submodule, configured to acquire matching degree information of the human facial action information and the emoticon image;
    The second execution sub-module is configured to calculate a bonus score corresponding to the matching degree information according to a preset matching rule.
  14. The analysis system of the user's limb movement according to claim 10, wherein the analysis system of the user's limb movement further comprises:
    a first recording sub-module, configured to record all the bonus points in the preset first time threshold;
    And a third execution sub-module, configured to accumulate the bonus scores to form a final score of the user within the first time threshold.
  15. The analysis system of the user's limb movement according to claim 9, wherein the analysis system of the user's limb movement further comprises:
    a third obtaining sub-module, configured to randomly extract a preset number of emoticons representing human emotions from the emoticon package in a preset unit time, and place the emoticon images in a display container;
    a first acquiring submodule, configured to collect a face image of the user in a timed or real time in the unit time;
    a first processing sub-module, configured to identify emotion information represented by the face image, and a matching degree between the face image and the emotion information;
    And a first execution submodule, configured to match an emoticon image having the same emotional meaning as the facial image, and confirm a bonus score of the facial image according to the matching degree.
  16. The analysis system of the user's limb movement according to claim 15, wherein the first processing sub-module is configured to
    Inputting the face image into a preset emotion recognition model, and acquiring classification result and classification data of the face image;
    Determining the emotion information of the face image according to the classification result, and determining a matching degree of the face image and the emotion information according to the classification data.
  17. A mobile terminal, comprising:
    One or more processors;
    Memory
    One or more applications, wherein the one or more applications are stored in the memory and configured to be executed by the one or more processors, the one or more applications being configured to perform the method for analyzing a user's limb movement according to any one of claims 1-8.
  18. A computer readable storage medium, wherein the computer readable storage medium stores a computer program which, when executed by a processor, implements the method for analyzing a user's limb motion according to any one of claims 1-8.
  19. A computer program product, characterized in that, when run on a computer, the computer is caused to perform the method of analyzing the user's limb motion as claimed in any one of claims 1-8.
PCT/CN2018/116700 2017-12-28 2018-11-21 Analysis method and system of user limb movement and mobile terminal WO2019128558A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201711464338.2A CN108062533A (en) 2017-12-28 2017-12-28 Analytic method, system and the mobile terminal of user's limb action
CN201711464338.2 2017-12-28

Publications (1)

Publication Number Publication Date
WO2019128558A1 true WO2019128558A1 (en) 2019-07-04

Family

ID=62140685

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/116700 WO2019128558A1 (en) 2017-12-28 2018-11-21 Analysis method and system of user limb movement and mobile terminal

Country Status (2)

Country Link
CN (1) CN108062533A (en)
WO (1) WO2019128558A1 (en)

Families Citing this family (1)

Publication number Priority date Publication date Assignee Title
CN108062533A (en) * 2017-12-28 2018-05-22 北京达佳互联信息技术有限公司 Analytic method, system and the mobile terminal of user's limb action

Citations (4)

Publication number Priority date Publication date Assignee Title
CN101314081A (en) * 2008-07-11 2008-12-03 深圳华为通信技术有限公司 Lecture background matching method and apparatus
CN104333730A (en) * 2014-11-26 2015-02-04 北京奇艺世纪科技有限公司 Video communication method and video communication device
CN105976843A (en) * 2016-05-18 2016-09-28 乐视控股(北京)有限公司 In-vehicle music control method, device, and automobile
CN108062533A (en) * 2017-12-28 2018-05-22 北京达佳互联信息技术有限公司 Analytic method, system and the mobile terminal of user's limb action

Family Cites Families (9)

Publication number Priority date Publication date Assignee Title
CN101442861B (en) * 2008-12-19 2013-01-02 上海广茂达光艺科技股份有限公司 Control system and control method for LED lamplight scene
CN103842941B (en) * 2011-09-09 2016-12-07 泰利斯航空电子学公司 Gesticulate action in response to the passenger sensed and perform the control of vehicle audio entertainment system
CN104349214A (en) * 2013-08-02 2015-02-11 北京千橡网景科技发展有限公司 Video playing method and device
CN104345873A (en) * 2013-08-06 2015-02-11 北大方正集团有限公司 File operation method and file operation device for network video conference system
CN104464390A (en) * 2013-09-15 2015-03-25 南京大五教育科技有限公司 Body feeling education system
CN104598012B (en) * 2013-10-30 2017-12-05 中国艺术科技研究所 A kind of interactive advertising equipment and its method of work
CN106257489A (en) * 2016-07-12 2016-12-28 乐视控股(北京)有限公司 Expression recognition method and system
CN106502424A (en) * 2016-11-29 2017-03-15 上海小持智能科技有限公司 Based on the interactive augmented reality system of speech gestures and limb action
CN106997457A (en) * 2017-03-09 2017-08-01 广东欧珀移动通信有限公司 Human limbs recognition methods, human limbs identifying device and electronic installation

Also Published As

Publication number Publication date
CN108062533A (en) 2018-05-22

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18896894

Country of ref document: EP

Kind code of ref document: A1