CN113723284A - Information generation method, terminal device and storage medium - Google Patents

Information generation method, terminal device and storage medium

Info

Publication number
CN113723284A
Authority
CN
China
Prior art keywords
training
human body
score
information
preset
Prior art date
Legal status
Pending
Application number
CN202111004868.5A
Other languages
Chinese (zh)
Inventor
李希加
Current Assignee
Weikun Shanghai Technology Service Co Ltd
Original Assignee
Weikun Shanghai Technology Service Co Ltd
Priority date
Filing date
Publication date
Application filed by Weikun Shanghai Technology Service Co Ltd filed Critical Weikun Shanghai Technology Service Co Ltd
Priority to CN202111004868.5A
Publication of CN113723284A
Legal status: Pending


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0639 Performance analysis of employees; Performance analysis of enterprise or organisation operations
    • G06Q10/06393 Score-carding, benchmarking or key performance indicator [KPI] analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10 Services
    • G06Q50/20 Education
    • G06Q50/205 Education administration or guidance

Landscapes

  • Business, Economics & Management (AREA)
  • Human Resources & Organizations (AREA)
  • Engineering & Computer Science (AREA)
  • Strategic Management (AREA)
  • Educational Administration (AREA)
  • Economics (AREA)
  • Tourism & Hospitality (AREA)
  • General Physics & Mathematics (AREA)
  • Educational Technology (AREA)
  • Marketing (AREA)
  • Theoretical Computer Science (AREA)
  • Development Economics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Physics & Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • Quality & Reliability (AREA)
  • Operations Research (AREA)
  • Game Theory and Decision Science (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Primary Health Care (AREA)
  • Electrically Operated Instructional Devices (AREA)

Abstract

The application is applicable to the technical field of artificial intelligence and provides an information generation method, a terminal device and a storage medium. The method comprises: collecting training process information according to a preset collection period, wherein the training process information comprises training video images; performing human body key point detection on the training video images to obtain a plurality of human body region maps and the key point coordinates of the human body presented in each human body region map; determining, according to how the key point coordinates of each human body change over time throughout the training process, the target actions performed by the corresponding human body during the training and the number of times the target actions are performed; determining a training atmosphere score according to the target actions performed by each human body and the number of times they are performed, wherein the training atmosphere score is used to indicate how attentive the trainees are to the training; and generating evaluation result information for the current training course based on the training atmosphere score.

Description

Information generation method, terminal device and storage medium
Technical Field
The present application relates to the field of artificial intelligence technologies, and in particular, to an information generating method, a terminal device, and a storage medium.
Background
At present, organizations such as enterprises and schools often hold various training courses, and in practice information related to a training course usually needs to be collected in order to evaluate the effect of the course. For example, all trainees attending the training course may fill in a training-effect feedback form so that the effect of the training course can be evaluated.
In the related art, information related to a training course is collected by having all attending trainees fill in a form, and the effect of the course is evaluated on that basis. This approach takes up a considerable amount of the trainees' time and is highly subjective, because it depends entirely on manual scoring by the trainees. When some trainees have misgivings and are unwilling to take part in the scoring, the collected feedback information is incomplete, so both the efficiency and the accuracy of evaluating the effect of the training course are low.
Disclosure of Invention
In view of this, embodiments of the present application provide an information generation method, a terminal device and a storage medium, so as to address the low efficiency and accuracy of evaluating the effect of a training course in the related art.
A first aspect of an embodiment of the present application provides an information generating method, including:
collecting training process information according to a preset collection period, wherein the training process information comprises training video images;
performing human body key point detection on the training video images to obtain a plurality of human body region maps and the key point coordinates of the human body presented in each human body region map;
determining, according to how the key point coordinates of each human body change over time throughout the training process, the target actions performed by the corresponding human body during the training and the number of times the target actions are performed;
determining a training atmosphere score according to the target actions performed by each human body and the number of times they are performed, wherein the training atmosphere score is used to indicate how attentive the trainees are to the training;
generating evaluation result information for the current training course based on the training atmosphere score.
Further, determining, according to how the key point coordinates of each human body change over time throughout the training process, the target action performed by the corresponding human body during the training and the number of times the target action is performed comprises:
generating a left wrist coordinate curve and a right wrist coordinate curve for each human body in a preset coordinate system, wherein the horizontal axis of the preset coordinate system is a time axis and the vertical axis is the key point coordinates of the corresponding key point at different times;
determining the intersection points of the left wrist coordinate curve and the right wrist coordinate curve of each human body, and determining, according to the distribution intervals of the intersection points on the time axis, whether the human body performs a clapping action and the number of times the clapping action is performed.
Further, determining, according to the distribution intervals of the intersection points on the time axis, whether the human body performs the clapping action and the number of times the clapping action is performed comprises:
calculating the time interval between every two adjacent intersection points, and assigning any two adjacent intersection points whose time interval is greater than a preset duration threshold to two different intersection point groups, so as to obtain a plurality of intersection point groups;
for any intersection point group, if the number of intersection points in the group is greater than a preset number and the time interval between every two adjacent intersection points in the group falls within a preset time interval range, determining that the intersection point group corresponds to one clapping action.
Further, determining, according to how the key point coordinates of each human body change over time throughout the training process, the target action performed by the corresponding human body during the training and the number of times the target action is performed comprises:
generating, for each human body, a relative position difference curve of the upper and lower lips, wherein the relative position difference curve describes the relative position difference between the key point coordinates of the upper lip and the key point coordinates of the lower lip;
determining, according to the value distribution and the duration of continuous change of the relative position difference curve of the upper and lower lips, whether the corresponding human body performs a response action and the number of times the response action is performed.
Further, determining, according to the value distribution and the duration of continuous change of the relative position difference curve of the upper and lower lips, whether the corresponding human body performs the response action and the number of times the response action is performed comprises:
determining, for each human body, a plurality of continuously changing regions in the relative position difference curve of the upper and lower lips corresponding to that human body, and if any region satisfies a preset response condition, determining that the region corresponds to one response action;
wherein the preset response condition comprises at least one of the following:
the duration of continuous change of the region is longer than a preset change duration;
within a preset reference period before the region, the coordinate value of at least one wrist key point of the human body is higher than or equal to the coordinate value of an eye key point of the human body;
within the region, the face plane of the human body is parallel to the face plane of the lecturer, wherein the face plane is the plane formed by the left eye key point, the right eye key point and the mouth key point.
Further, if the training process information further includes training audio information, the method further includes:
determining the lecturer audio score of the lecturer according to the training audio information of different periods in the training process, wherein the lecturer audio score is used to indicate how standard the lecturer's pronunciation is, and each piece of preset audio information in a preset audio information set corresponds to an audio score.
Further, determining the lecturer audio score of the lecturer according to the training audio information of different periods in the training process comprises:
extracting audio features from the training audio information of different periods in the training process to obtain a plurality of audio features;
determining, based on the plurality of audio features, the training audio information belonging to the lecturer from the extracted training audio information, and recording it as target audio information;
selecting, from the preset audio information set, the preset audio information that matches the target audio information, and determining the audio score corresponding to the selected preset audio information as the lecturer audio score of the lecturer.
Further, generating evaluation result information for the current training course based on the training atmosphere score comprises:
determining a composite score of the current training course according to a weighted sum of the training atmosphere score, the lecturer audio score and a trainee attendance score, and generating evaluation result information for the current training course according to the composite score, wherein the trainee attendance score is the ratio of the number of trainees actually attending to the number of trainees expected to attend.
A second aspect of an embodiment of the present application provides an information generating apparatus, including:
an information acquisition unit, configured to collect training process information according to a preset collection period, wherein the training process information comprises training video images;
a model application unit, configured to perform human body key point detection on the training video images to obtain a plurality of human body region maps and the key point coordinates of the human body presented in each human body region map;
an action determination unit, configured to determine, according to how the key point coordinates of each human body change over time throughout the training process, the target actions performed by the corresponding human body during the training and the number of times the target actions are performed;
an atmosphere determination unit, configured to determine a training atmosphere score according to the target actions performed by each human body and the number of times they are performed, wherein the training atmosphere score is used to indicate how attentive the trainees are to the training;
a result generation unit, configured to generate evaluation result information for the current training course based on the training atmosphere score.
A third aspect of the embodiments of the present application provides a terminal device, which includes a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor implements the steps of the information generation method provided in the first aspect when executing the computer program.
A fourth aspect of the embodiments of the present application provides a storage medium storing a computer program which, when executed by a processor, implements the steps of the information generation method provided in the first aspect.
The information generation method, terminal device and storage medium provided by the embodiments of the present application have the following beneficial effects: by collecting training process information and automatically analyzing it, a training atmosphere score indicating how attentive the trainees are to the training can be obtained, and evaluation result information for the current training course is then generated based on that score. Because the trainees' attentiveness during the training accurately reflects the training effect of the whole training course, evaluation result information that accurately evaluates the training effect can be generated, which improves both the efficiency and the accuracy of evaluating the effect of the training course.
Drawings
In order to illustrate the technical solutions in the embodiments of the present application more clearly, the drawings needed in the description of the embodiments or the related art are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present application, and other drawings can be obtained from them by those of ordinary skill in the art without inventive effort.
Fig. 1 is a flowchart of an implementation of an information generating method according to an embodiment of the present application;
FIG. 2 is a flow chart of another implementation of an information generating method provided in an embodiment of the present application;
fig. 3 is a block diagram of an information generating apparatus according to an embodiment of the present application;
fig. 4 is a block diagram of a terminal device according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
The embodiments of the present application may acquire and process the relevant data based on artificial intelligence technology. Artificial Intelligence (AI) is the theory, method, technique and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge and use that knowledge to obtain the best results.
The artificial intelligence infrastructure generally includes technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operation/interaction systems and mechatronics. Artificial intelligence software technology mainly includes computer vision, robotics, biometric recognition, speech processing, natural language processing and machine learning/deep learning.
In the embodiment of the application, the effect of the training course is evaluated based on an artificial intelligence technology.
The information generation method according to the embodiment of the present application may be executed by a terminal device. When the information generation method is executed by the terminal device, the execution subject is the terminal device.
It should be noted that the terminal device may include, but is not limited to, a server, a mobile phone, a tablet, a wearable smart device, or the like. The server may be an independent server, or may be a cloud server that provides basic cloud computing services such as a cloud service, a cloud database, cloud computing, a cloud function, cloud storage, a web service, cloud communication, a middleware service, a domain name service, a security service, a Content Delivery Network (CDN), and a big data and artificial intelligence platform.
Referring to fig. 1, fig. 1 shows a flowchart of an implementation of an information generating method provided in an embodiment of the present application, including:
step 101, collecting training process information according to a preset collection period.
The preset collection period is usually a preset period value, for example 1 second. The training process information is information generated during the training process and can be used to describe the training process. The training process information may include, but is not limited to, training video images, which are images of all trainees captured during the training process.
In practical applications, the training process information may further include training audio information.
Here, the terminal device may collect the training process information according to a preset collection period.
Step 102, performing human body key point detection on the training video image to obtain a plurality of human body region maps and the key point coordinates of the human body presented in each human body region map.
A human body region map is usually the region of the training video image in which a trainee appears. When there are multiple trainees, multiple human body region maps, one per trainee, are typically detected from the training video image.
In practice, the execution subject may input the training video image into a pre-trained human body key point detection model to obtain the plurality of human body region maps and the key point coordinates of the human body presented in each human body region map.
The human body key point detection model is used to capture the correspondence between an image and the human body region maps in that image, together with the key point coordinates of the human body presented in each region map. In practice, the human body key point detection model may be obtained by training an initial model (for example, a Convolutional Neural Network (CNN) or a residual network (ResNet)) on training samples using a machine learning method.
Here, the terminal device may perform human body key point detection on each frame of the training video images to obtain the plurality of human body region maps corresponding to each frame and the key point coordinates of the human body presented in each region map.
For example, suppose 4 trainees, A, B, C and D, attend the training course. Performing human body key point detection on the training video image yields 4 human body region maps, denoted a, b, c and d, where map a corresponds to trainee A, map b to trainee B, map c to trainee C and map d to trainee D. The coordinates of A's key points presented in region map a, of B's key points in region map b, of C's key points in region map c and of D's key points in region map d can then be obtained.
Since the trainees' seat positions are usually fixed during the training, the relative positions of the trainees in training video images captured at different moments are also usually fixed. The terminal device can therefore identify the same human body across different training video images based on the human body region, or based on the stable relative seating relationship among the trainees, and can then analyze together the key point coordinates belonging to the same human body.
It should be noted that, for each human body region map, the key points of the human body presented in the map may include, but are not limited to, a left wrist key point, a right wrist key point, a left eye key point, a right eye key point, an upper lip key point and a lower lip key point. The key point coordinates are generally the coordinates of these key points in the corresponding training video image.
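As a rough illustration of step 102 and of the cross-frame grouping just described, the following Python sketch assumes a hypothetical `pose_model` callable (the patent does not name a concrete detection model) that returns a bounding box and named key points per detected body, and matches detections across frames by bounding-box position, relying on the fixed-seating assumption; the key point names and the matching radius are illustrative.

```python
from typing import Dict, List, Tuple

Keypoints = Dict[str, Tuple[float, float]]  # key point name -> (x, y) in image pixels


def _center(bbox: Tuple[float, float, float, float]) -> Tuple[float, float]:
    x1, y1, x2, y2 = bbox
    return ((x1 + x2) / 2.0, (y1 + y2) / 2.0)


def collect_tracks(frames, timestamps, pose_model,
                   match_radius: float = 80.0) -> Dict[str, List[Tuple[float, Keypoints]]]:
    """Run a pre-trained keypoint model on every sampled frame and group detections
    per person.  `pose_model(frame)` is a hypothetical callable returning, for each
    detected body, a dict with a "bbox" (x1, y1, x2, y2) and named "keypoints".
    Because seats are assumed fixed, a detection is matched to the existing track
    whose last bounding-box centre is closest."""
    tracks: Dict[str, List[Tuple[float, Keypoints]]] = {}
    last_center: Dict[str, Tuple[float, float]] = {}
    next_id = 0
    for t, frame in zip(timestamps, frames):
        for det in pose_model(frame):
            cx, cy = _center(det["bbox"])
            best_id, best_dist = None, match_radius
            for pid, (px, py) in last_center.items():
                dist = ((cx - px) ** 2 + (cy - py) ** 2) ** 0.5
                if dist < best_dist:
                    best_id, best_dist = pid, dist
            if best_id is None:                 # a body seen for the first time
                best_id = f"person_{next_id}"
                next_id += 1
            tracks.setdefault(best_id, []).append((t, det["keypoints"]))
            last_center[best_id] = (cx, cy)
    return tracks
```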
And 103, determining target actions executed by the corresponding human bodies in the training process and the execution times of the target actions according to the change condition of the key point coordinates of the human bodies in the whole training process along with time.
The target action may include, but is not limited to, a clapping action, a response action, and the like.
Here, for each human body, the terminal device may determine the actions performed by that human body and the number of times they are performed by analyzing how the key point coordinates of the human body change over time throughout the training process. As an example, the execution subject may determine how many times a human body stands up by analyzing how the coordinates of that body's eye key points change over time, and thereby determine the response actions the human body performs throughout the training and the number of times they are performed.
Step 104, determining a training atmosphere score according to the target actions performed by each human body and the number of times they are performed, wherein the training atmosphere score is used to indicate how attentive the trainees are to the training.
The execution subject may determine the training atmosphere score in a variety of ways. As an example, after obtaining the number of times each trainee performs the target actions, the execution subject may derive a score for each trainee and then take the average of all trainees' scores as the training atmosphere score.
In practice, if the target actions include a clapping action and a response action, determining the training atmosphere score according to the target actions performed by each human body and the number of times they are performed may include: determining, as the training atmosphere score, a weighted sum of the score corresponding to the number of clapping actions and the score corresponding to the number of response actions. As an example, the mean number of clapping actions and the mean number of response actions across the trainees may be calculated, and the weighted sum of the score corresponding to the mean number of clapping actions and the score corresponding to the mean number of response actions may be taken as the training atmosphere score.
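A minimal sketch of this weighted combination follows; the count-to-score mapping, its saturation point and the 0.5/0.5 weights are illustrative assumptions rather than values given in the application.

```python
def action_score(mean_count: float, saturation: float = 10.0) -> float:
    """Map an average action count to a 0-100 score (the saturating mapping is an assumption)."""
    return min(mean_count / saturation, 1.0) * 100.0


def training_atmosphere_score(clap_counts, response_counts,
                              w_clap: float = 0.5, w_response: float = 0.5) -> float:
    """Weighted sum of the score for the mean number of claps and the score for the
    mean number of responses across all trainees."""
    mean_claps = sum(clap_counts) / len(clap_counts)
    mean_responses = sum(response_counts) / len(response_counts)
    return w_clap * action_score(mean_claps) + w_response * action_score(mean_responses)
```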
Step 105, generating evaluation result information for the current training course based on the training atmosphere score.
Here, each training atmosphere score may correspond to one piece of evaluation result information. For example, if the training atmosphere score is 100 points, the evaluation result information may be "the training effect of the current training course is excellent", and if the training atmosphere score is 80 points, it may be "the training effect of the current training course is good".
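A sketch of such a score-to-text mapping is shown below; apart from the two examples above, the thresholds and wording are assumptions.

```python
def evaluation_result(atmosphere_score: float) -> str:
    """Map a training atmosphere score to evaluation result text (thresholds assumed)."""
    if atmosphere_score >= 90:
        return "The training effect of the current training course is excellent."
    if atmosphere_score >= 75:
        return "The training effect of the current training course is good."
    if atmosphere_score >= 60:
        return "The training effect of the current training course is fair."
    return "The training effect of the current training course needs improvement."
```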
According to the method provided by this embodiment, a training atmosphere score indicating how attentive the trainees are to the training can be obtained by collecting training process information and automatically analyzing it, and evaluation result information for the current training course is then generated based on that score. Because the trainees' attentiveness during the training accurately reflects the training effect of the whole training course, evaluation result information that accurately evaluates the training effect can be generated, which improves both the efficiency and the accuracy of evaluating the effect of the training course.
In some optional implementations of this embodiment, determining, according to how the key point coordinates of each human body change over time throughout the training process, the target action performed by the corresponding human body during the training and the number of times the target action is performed may include the following steps.
First, a left wrist coordinate curve and a right wrist coordinate curve are generated for each human body in a preset coordinate system.
The horizontal axis of the preset coordinate system is a time axis, and the vertical axis is the key point coordinates of the corresponding key point at different times. The left wrist coordinate curve is the curve formed by the coordinate values of the left wrist key point at different moments, and the right wrist coordinate curve is the curve formed by the coordinate values of the right wrist key point at different moments. Since the key point coordinates are usually two-dimensional image coordinates, the left and right wrist coordinate curves are usually spatial curves.
In practice, for each human body, the execution subject may combine that body's left wrist key point coordinates at different moments to obtain its left wrist coordinate curve, and combine its right wrist key point coordinates at different moments to obtain its right wrist coordinate curve. That is, one left wrist coordinate curve and one right wrist coordinate curve are obtained for each trainee.
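A minimal sketch of assembling these two curves from a per-person track follows; the `(timestamp, keypoints)` format and the key point names match the earlier sketch and are assumptions.

```python
def wrist_curves(track):
    """track: list of (timestamp, keypoints) pairs for one person.
    Returns (times, left_curve, right_curve); each curve is a list of (x, y)
    coordinates sampled at the collection period, i.e. a wrist coordinate curve."""
    times, left, right = [], [], []
    for t, kp in track:
        if "left_wrist" in kp and "right_wrist" in kp:
            times.append(t)
            left.append(kp["left_wrist"])
            right.append(kp["right_wrist"])
    return times, left, right
```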
Then, for each human body, the intersection points of that body's left wrist coordinate curve and right wrist coordinate curve are determined, and whether the human body performs a clapping action, and the number of times the clapping action is performed, is determined according to the distribution intervals of the intersection points on the time axis.
Here, when a trainee claps, the left wrist key point coordinates and the right wrist key point coordinates are generally the same; that is, when a trainee claps, the left and right wrist coordinate curves for that trainee intersect. In practice, a trainee may initiate several clapping actions over the whole training process, each clapping action may consist of several claps, and the time interval between two adjacent claps is usually small, for example in the range of 0.5 to 1 second, whereas the interval between two adjacent clapping actions usually needs to be longer than a certain duration, for example 5 seconds. The execution subject can therefore determine, from the distribution intervals of the intersection points, whether the corresponding trainee performs a clapping action and how many times the trainee claps throughout the training process. For example, two adjacent intersection points whose time interval is greater than a preset duration threshold may be assigned to two different clapping actions, and so on, so that multiple clapping actions are obtained.
In some optional implementations, determining, according to the distribution intervals of the intersection points on the time axis, whether the human body performs the clapping action and the number of times the clapping action is performed may include: calculating the time interval between every two adjacent intersection points, and assigning any two adjacent intersection points whose time interval is greater than a preset duration threshold to two different intersection point groups, so as to obtain a plurality of intersection point groups; then, for any intersection point group, if the number of intersection points in the group is greater than a preset number and the time interval between every two adjacent intersection points in the group falls within a preset time interval range, determining that the intersection point group corresponds to one clapping action.
Since the two wrist points usually move towards and away from each other with a roughly fixed period while clapping, the interval between every two adjacent intersection points basically falls within the preset time interval range, for example 0.5 to 1 second, even though the clapping rate may be faster or slower. In addition, a person generally claps more than once, that is, the two curves can intersect two or more times during one clapping action, so the preset number is usually an integer greater than 2.
Here, the execution subject may calculate the time interval between every two adjacent intersection points and assign any two adjacent intersection points whose time interval is greater than the preset duration threshold, for example 5 seconds, to two different intersection point groups, thereby obtaining a plurality of intersection point groups. Then, for each intersection point group, if the number of intersection points in the group is greater than the preset number and the time interval between every two adjacent intersection points in the group falls within the preset time interval range, it is determined that the group corresponds to one clapping action. It should be noted that grouping the intersection points in this way allows the number of clapping actions performed by a trainee to be determined accurately.
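A sketch of this grouping logic follows. It treats an intersection as a sample at which both wrists are within a small pixel tolerance of each other; the 5-second group gap, the 0.5 to 1 second within-group interval and the minimum intersection count mirror the examples above but are configurable assumptions.

```python
def intersection_times(times, left, right, tol: float = 15.0):
    """Times at which the left and right wrist curves (from wrist_curves()) meet,
    i.e. both wrists are within `tol` pixels of each other."""
    crossings = []
    for t, (lx, ly), (rx, ry) in zip(times, left, right):
        if ((lx - rx) ** 2 + (ly - ry) ** 2) ** 0.5 <= tol:
            crossings.append(t)
    return crossings


def count_clapping_actions(crossings, group_gap: float = 5.0,
                           clap_interval=(0.5, 1.0), preset_number: int = 2):
    """Split the intersection times into groups separated by more than `group_gap`
    seconds; a group counts as one clapping action if it contains more than
    `preset_number` intersections and every adjacent pair of intersections lies
    within the preset `clap_interval` range."""
    groups, current = [], []
    for t in crossings:
        if current and t - current[-1] > group_gap:
            groups.append(current)
            current = []
        current.append(t)
    if current:
        groups.append(current)

    actions = 0
    for g in groups:
        gaps = [b - a for a, b in zip(g, g[1:])]
        if len(g) > preset_number and all(clap_interval[0] <= d <= clap_interval[1] for d in gaps):
            actions += 1
    return actions
```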
In some optional implementations of this embodiment, determining, according to how the key point coordinates of each human body change over time throughout the training process, the target action performed by the corresponding human body during the training and the number of times the target action is performed may include the following steps.
First, a relative position difference curve of the upper and lower lips is generated for each human body.
The relative position difference curve of the upper and lower lips describes the relative position difference between the coordinates of the upper lip key point and the coordinates of the lower lip key point. This curve is a plane curve, and one such curve is generated for each human body.
Here, for each human body, the execution subject may calculate the relative position difference between the coordinates of that body's upper lip key point and lower lip key point; for example, the Euclidean distance between the two coordinates may be calculated and used as their relative position difference.
Then, whether the corresponding human body performs a response action, and the number of times the response action is performed, is determined according to the value distribution and the duration of continuous change of the relative position difference curve of the upper and lower lips.
Here, while a person is speaking, the relative position difference between the upper lip key point coordinates and the lower lip key point coordinates is generally in a state of continuous change, whereas when the person is not speaking it is usually fixed, for example 0 or some constant value. The execution subject may therefore treat each continuously changing region of the relative position difference curve as corresponding to one response action.
It should be noted that, since the upper and lower lips usually keep moving while a person speaks, the relative position difference curve of the upper and lower lips makes it possible to determine more accurately whether a trainee is responding during the training. Here, responding may mean asking a question or answering one.
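The sketch below computes such a curve as the Euclidean distance between the two lip key points and extracts its continuously changing regions; the key point names and the change threshold `eps` are assumptions.

```python
def lip_gap_curve(track):
    """track: list of (timestamp, keypoints) pairs for one person.
    Returns (times, gaps), where each gap is the Euclidean distance between the
    upper and lower lip key points, i.e. the relative position difference curve."""
    times, gaps = [], []
    for t, kp in track:
        if "upper_lip" in kp and "lower_lip" in kp:
            (ux, uy), (lx, ly) = kp["upper_lip"], kp["lower_lip"]
            times.append(t)
            gaps.append(((ux - lx) ** 2 + (uy - ly) ** 2) ** 0.5)
    return times, gaps


def changing_regions(times, gaps, eps: float = 1.0):
    """Maximal regions (start, end) in which the gap keeps changing by more than
    `eps` pixels between consecutive samples; each region is a candidate response."""
    regions, start = [], None
    for i in range(1, len(gaps)):
        if abs(gaps[i] - gaps[i - 1]) > eps:
            if start is None:
                start = times[i - 1]
        elif start is not None:
            regions.append((start, times[i - 1]))
            start = None
    if start is not None:
        regions.append((start, times[-1]))
    return regions
```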
In some optional implementations, determining, according to the value distribution and the duration of continuous change of the relative position difference curve of the upper and lower lips, whether the corresponding human body performs the response action and the number of times the response action is performed includes: determining, for each human body, a plurality of continuously changing regions in the relative position difference curve of the upper and lower lips corresponding to that human body, and if any region satisfies a preset response condition, determining that the region corresponds to one response action.
The preset response condition comprises at least one of the following items:
First, the duration of continuous change of the region is longer than a preset change duration.
Second, within a preset reference period before the region, the coordinate value of at least one wrist key point of the human body is higher than or equal to the coordinate value of an eye key point of the human body. The preset reference period is usually a preset duration value, for example 5 seconds. The eye key point may be the left eye key point or the right eye key point. Here, a wrist key point coordinate being higher than or equal to an eye key point of the human body may indicate that the trainee is raising a hand.
Third, within the region, the face plane of the human body is parallel to the face plane of the lecturer, where the face plane is the plane formed by the left eye key point, the right eye key point and the mouth key point. Here, when a trainee responds, the trainee and the lecturer usually face each other, so the planes corresponding to their faces are usually parallel.
This implementation judges from multiple angles whether a trainee performs a response action, so the number of times the trainee performs the response action can be determined accurately. A sketch of these checks follows.
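In the sketch, the first condition is checked from the region boundaries, the second from wrist and eye key points in the preceding reference window, and the third is delegated to a caller-supplied parallelism test; the thresholds and helper names are assumptions.

```python
def is_response(region, track, min_duration: float = 2.0,
                ref_window: float = 5.0, faces_parallel=None) -> bool:
    """region: (start, end) from changing_regions(); track: (timestamp, keypoints)
    samples for the same person.  Returns True if at least one of the three preset
    response conditions holds."""
    start, end = region

    # Condition 1: the region keeps changing for longer than the preset change duration.
    if end - start > min_duration:
        return True

    # Condition 2: within the preset reference period before the region, a wrist key
    # point is at or above an eye key point (image y grows downward, so "higher"
    # means a smaller y value), suggesting the trainee is raising a hand.
    for t, kp in track:
        if start - ref_window <= t < start:
            eye_ys = [kp[k][1] for k in ("left_eye", "right_eye") if k in kp]
            wrist_ys = [kp[k][1] for k in ("left_wrist", "right_wrist") if k in kp]
            if eye_ys and wrist_ys and min(wrist_ys) <= min(eye_ys):
                return True

    # Condition 3: the trainee's face plane is parallel to the lecturer's face plane
    # within the region; the geometric test is delegated to a caller-supplied check.
    if faces_parallel is not None and faces_parallel(start, end):
        return True

    return False
```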
Referring to fig. 2, fig. 2 is a flowchart of another implementation of an information generation method according to an embodiment of the present application. When the training process information further includes training audio information, the information generation method provided by this embodiment may include the following steps:
step 201, training process information is collected according to a preset collection period.
The training process information comprises training video images and can also comprise training audio information.
Step 202, performing human body key point detection on the training video image to obtain a plurality of human body region maps and the key point coordinates of the human body presented in each human body region map.
Step 203, determining, according to how the key point coordinates of each human body change over time throughout the training process, the target action performed by the corresponding human body during the training and the number of times the target action is performed.
Step 204, determining a training atmosphere score according to the target actions performed by each human body and the number of times they are performed, wherein the training atmosphere score is used to indicate how attentive the trainees are to the training.
In this embodiment, the specific operations of steps 201 to 204 are substantially the same as those of steps 101 to 104 in the embodiment shown in fig. 1 and are not repeated here.
Step 205, determining the lecturer audio score of the lecturer according to the training audio information of different periods in the training process.
The lecturer audio score is used to indicate how standard the lecturer's pronunciation is, and each piece of preset audio information in the preset audio information set corresponds to an audio score.
Here, the execution subject may select training audio information from a plurality of periods to obtain multiple pieces of training audio information, and may then compare the audio features of each piece of training audio information with the audio features of the preset audio information in the preset audio information set, so as to obtain, for each piece of training audio information, the matching preset audio information and its corresponding audio score.
In some optional implementations of this embodiment, determining the lecturer audio score of the lecturer according to the training audio information of different periods in the training process may include the following steps.
Step one, extracting audio features from the training audio information of different periods in the training process to obtain a plurality of audio features.
Here, the terminal device may obtain training audio information for a plurality of different periods and then extract the audio features of each piece of training audio information, thereby obtaining a plurality of audio features.
Step two, determining, based on the plurality of audio features, the training audio information belonging to the lecturer from the extracted pieces of training audio information, and recording it as target audio information.
Here, since the lecturer speaks for a relatively long time, most of the audio comes from the lecturer, so most of the audio features are substantially the same. The training audio information corresponding to the audio feature that occurs most often can therefore be determined to be the training audio information from the lecturer.
Step three, selecting, from the preset audio information set, the preset audio information that matches the target audio information, and determining the audio score corresponding to the selected preset audio information as the lecturer audio score of the lecturer. The lecturer audio score indicates how standard the lecturer's pronunciation is, and each piece of preset audio information in the preset audio information set corresponds to an audio score.
Here, the similarity between the target audio information and each piece of preset audio information in the pre-stored preset audio information set may be calculated in order to find the preset audio information most similar to the target audio information, and the audio score corresponding to that preset audio information is then taken as the lecturer audio score of the lecturer.
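A compact sketch of steps one to three is given below; it assumes per-clip feature vectors are already available (for example, some spectral signature per sampled clip), treats the most frequent signature as the lecturer's, and matches it to the closest reference by squared distance. The function names, the exact-equality frequency count and the distance measure are all assumptions.

```python
from collections import Counter


def lecturer_audio_score(clip_features, preset_set):
    """clip_features: one feature vector (tuple of floats) per sampled training clip.
    preset_set: list of (reference_feature, audio_score) pairs.
    The most frequent clip feature is assumed to belong to the lecturer (a real
    system would cluster rather than require exact equality), and the returned
    score is that of the most similar reference feature."""
    # Step two: the lecturer speaks most of the time, so take the most common feature.
    target, _ = Counter(clip_features).most_common(1)[0]

    # Step three: pick the preset audio whose feature is closest to the target.
    def distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    _, best_score = min(preset_set, key=lambda ref: distance(ref[0], target))
    return best_score
```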
Step 206, generating evaluation result information for the current training course based on the training atmosphere score.
According to the method of this embodiment, in addition to the evaluation result information for the current training course, a lecturer audio score describing how standard the lecturer's pronunciation is can be obtained, which provides a further reference for evaluating the current training course.
In some optional implementations of this embodiment, generating evaluation result information for the current training course based on the training atmosphere score may include:
determining a composite score of the current training course according to a weighted sum of the training atmosphere score, the lecturer audio score and a trainee attendance score, and generating evaluation result information for the current training course according to the composite score.
The trainee attendance score is the ratio of the number of trainees actually attending to the number of trainees expected to attend.
Here, the analysis is performed from multiple angles, namely how standard the lecturer's pronunciation is, the trainees' attendance rate and the trainees' attentiveness, so that evaluation result information accurately evaluating the training effect can be generated, which helps to further improve the efficiency and accuracy of evaluating the effect of the training course.
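A sketch of this composite score follows; the 0.5/0.3/0.2 weights and the scaling of the attendance ratio to the same 0-100 range as the other sub-scores are assumptions.

```python
def composite_score(atmosphere_score: float, lecturer_score: float,
                    actual_attendees: int, expected_attendees: int,
                    w_atmosphere: float = 0.5, w_audio: float = 0.3,
                    w_attendance: float = 0.2) -> float:
    """Weighted sum of the training atmosphere score, the lecturer audio score and the
    trainee attendance score; the attendance ratio is scaled to 0-100 so that all
    three sub-scores share the same range (a presentation assumption)."""
    attendance_score = actual_attendees / expected_attendees * 100.0
    return (w_atmosphere * atmosphere_score
            + w_audio * lecturer_score
            + w_attendance * attendance_score)
```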
Referring to fig. 3, fig. 3 is a block diagram of an information generating apparatus 300 according to an embodiment of the present application. The units of the information generating apparatus in this embodiment are used to perform the steps in the embodiments corresponding to fig. 1 and fig. 2; for details, please refer to fig. 1 and fig. 2 and the related descriptions of the corresponding embodiments. For convenience of explanation, only the parts related to this embodiment are shown. Referring to fig. 3, the information generating apparatus 300 includes:
an information acquisition unit 301, configured to collect training process information according to a preset collection period, wherein the training process information comprises training video images;
a model application unit 302, configured to perform human body key point detection on the training video images to obtain a plurality of human body region maps and the key point coordinates of the human body presented in each human body region map;
an action determination unit 303, configured to determine, according to how the key point coordinates of each human body change over time throughout the training process, the target actions performed by the corresponding human body during the training and the number of times the target actions are performed;
an atmosphere determination unit 304, configured to determine a training atmosphere score according to the target actions performed by each human body and the number of times they are performed, wherein the training atmosphere score is used to indicate how attentive the trainees are to the training;
a result generation unit 305, configured to generate evaluation result information for the current training course based on the training atmosphere score.
As an embodiment of the present application, the action determination unit 303 is specifically configured to: generate a left wrist coordinate curve and a right wrist coordinate curve for each human body in a preset coordinate system, wherein the horizontal axis of the preset coordinate system is a time axis and the vertical axis is the key point coordinates of the corresponding key point at different times; and determine the intersection points of the left wrist coordinate curve and the right wrist coordinate curve of each human body, and determine, according to the distribution intervals of the intersection points on the time axis, whether the human body performs a clapping action and the number of times the clapping action is performed.
As an embodiment of the present application, the action determination unit 303 is further specifically configured to: calculate the time interval between every two adjacent intersection points, and assign any two adjacent intersection points whose time interval is greater than a preset duration threshold to two different intersection point groups, so as to obtain a plurality of intersection point groups; and, for any intersection point group, if the number of intersection points in the group is greater than a preset number and the time interval between every two adjacent intersection points in the group falls within a preset time interval range, determine that the intersection point group corresponds to one clapping action.
As an embodiment of the present application, the action determination unit 303 is specifically configured to: generate, for each human body, a relative position difference curve of the upper and lower lips, wherein the relative position difference curve describes the relative position difference between the key point coordinates of the upper lip and the key point coordinates of the lower lip; and determine, according to the value distribution and the duration of continuous change of the relative position difference curve of the upper and lower lips, whether the corresponding human body performs a response action and the number of times the response action is performed.
As an embodiment of the present application, the action determination unit 303 is further specifically configured to: determine, for each human body, a plurality of continuously changing regions in the relative position difference curve of the upper and lower lips corresponding to that human body, and if any region satisfies a preset response condition, determine that the region corresponds to one response action;
wherein the preset response condition comprises at least one of the following: the duration of continuous change of the region is longer than a preset change duration; within a preset reference period before the region, the coordinate value of at least one wrist key point of the human body is higher than or equal to the coordinate value of an eye key point of the human body; within the region, the face plane of the human body is parallel to the face plane of the lecturer, wherein the face plane is the plane formed by the left eye key point, the right eye key point and the mouth key point.
As an embodiment of the present application, if the training process information further includes training audio information, the apparatus may further include a pronunciation determination unit (not shown in the figure). The pronunciation determination unit is configured to determine the lecturer audio score of the lecturer according to the training audio information of different periods in the training process, wherein the lecturer audio score is used to indicate how standard the lecturer's pronunciation is, and each piece of preset audio information in a preset audio information set corresponds to an audio score.
As an embodiment of the present application, the pronunciation determination unit is specifically configured to: extract audio features from the training audio information of different periods in the training process to obtain a plurality of audio features; determine, based on the plurality of audio features, the training audio information belonging to the lecturer from the extracted training audio information, and record it as target audio information; and select, from the preset audio information set, the preset audio information that matches the target audio information, and determine the audio score corresponding to the selected preset audio information as the lecturer audio score of the lecturer.
As an embodiment of the present application, the result generation unit 305 is specifically configured to: determine a composite score of the current training course according to a weighted sum of the training atmosphere score, the lecturer audio score and a trainee attendance score, and generate evaluation result information for the current training course according to the composite score, wherein the trainee attendance score is the ratio of the number of trainees actually attending to the number of trainees expected to attend.
The apparatus provided by this embodiment obtains, by collecting training process information and automatically analyzing it, a training atmosphere score indicating how attentive the trainees are to the training, and then generates evaluation result information for the current training course based on that score. Because the trainees' attentiveness during the training accurately reflects the training effect of the whole training course, evaluation result information that accurately evaluates the training effect can be generated, which improves both the efficiency and the accuracy of evaluating the effect of the training course.
It should be understood that, in the block diagram of the information generating apparatus shown in fig. 3, each unit is configured to perform the steps in the embodiments corresponding to fig. 1 and fig. 2. These steps have been explained in detail in the above embodiments; please refer to the related descriptions of the embodiments corresponding to fig. 1 and fig. 2, which are not repeated here.
Fig. 4 is a block diagram of a terminal device according to another embodiment of the present application. As shown in fig. 4, the terminal device 400 of this embodiment includes: a processor 401, a memory 402 and a computer program 403 stored in the memory 402 and executable on the processor 401, for example a program implementing the information generation method. When executing the computer program 403, the processor 401 implements the steps in the embodiments of the information generation method described above, such as steps 101 to 105 shown in fig. 1 or steps 201 to 206 shown in fig. 2. Alternatively, when the processor 401 executes the computer program 403, the functions of the units in the embodiment corresponding to fig. 3 are implemented, for example the functions of units 301 to 305 shown in fig. 3; reference is made to the related description of the embodiment corresponding to fig. 3, which is not repeated here.
Illustratively, the computer program 403 may be divided into one or more units, which are stored in the memory 402 and executed by the processor 401 to complete the present application. The one or more units may be a series of computer program instruction segments capable of performing specific functions, and the instruction segments are used to describe the execution of the computer program 403 in the terminal device 400. For example, the computer program 403 may be divided into an information acquisition unit, a model application unit, an action determination unit, an atmosphere determination unit and a result generation unit, whose functions are as described above.
The terminal device may include, but is not limited to, a processor 401 and a memory 402. Those skilled in the art will appreciate that fig. 4 is merely an example of the terminal device 400 and does not constitute a limitation of the terminal device 400, which may include more or fewer components than shown, combine certain components, or use different components; for example, the terminal device may also include input/output devices, network access devices, buses and the like.
The processor 401 may be a Central Processing Unit (CPU), another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
The memory 402 may be an internal storage unit of the terminal device 400, such as a hard disk or a memory of the terminal device 400. The memory 402 may also be an external storage device of the terminal device 400, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card or a flash card provided on the terminal device 400. Further, the memory 402 may include both an internal storage unit and an external storage device of the terminal device 400. The memory 402 is used to store the computer program and other programs and data required by the terminal device, and may also be used to temporarily store data that has been output or is to be output.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated module, if implemented in the form of a software functional unit and sold or used as an independent product, may be stored in a computer-readable storage medium. The computer-readable storage medium may be non-volatile or volatile. Based on such understanding, all or part of the flow in the methods of the above embodiments may be implemented by a computer program, which may be stored in a computer-readable storage medium and, when executed by a processor, implements the steps of the above method embodiments. The computer program comprises computer program code, which may be in the form of source code, object code, an executable file or some intermediate form. The computer-readable storage medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash disk, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium and the like. It should be noted that the content contained in the computer-readable storage medium may be appropriately increased or decreased as required by legislation and patent practice in the relevant jurisdiction; for example, in some jurisdictions, computer-readable storage media do not include electrical carrier signals and telecommunications signals in accordance with legislation and patent practice.
The above embodiments are only used to illustrate the technical solutions of the present application, and not to limit the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not substantially depart from the spirit and scope of the embodiments of the present application and are intended to be included within the scope of the present application.

Claims (10)

1. An information generating method, characterized in that the method comprises:
acquiring training process information according to a preset acquisition period, wherein the training process information comprises training video images;
performing human body key point detection on the training video image to obtain a plurality of human body region images and the key point coordinates of the human body presented in each human body region image;
determining, according to how the key point coordinates of each human body change over time throughout the training process, the target actions performed by the corresponding human body during the training process and the number of times each target action is performed;
determining a training atmosphere score according to the target actions performed by each human body and the number of times the target actions are performed, wherein the training atmosphere score is used for indicating the degree of concentration of the trainees on the training;
and generating evaluation result information for the current training course based on the training atmosphere score.
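The following is a minimal, self-contained sketch of the flow in claim 1. The key point detector is replaced by a stand-in returning fixed wrist coordinates; a real implementation would obtain them from a pose-estimation model, and claims 2 to 5 refine how actions are recognized from the coordinates. All names and thresholds are illustrative assumptions, not part of the disclosure.

```python
# Minimal sketch of the claim 1 pipeline with stand-in detection logic.
def detect_keypoints(frame_index):
    """Stand-in detector: one person, left/right wrist y-coordinates per frame."""
    return {"person_0": {"left_wrist_y": frame_index % 5,
                         "right_wrist_y": 4 - frame_index % 5}}

def count_target_actions(track):
    """Crude stand-in: count frames where both wrists meet (a proxy for a clap)."""
    return sum(1 for kp in track if kp["left_wrist_y"] == kp["right_wrist_y"])

def evaluate_training_course(num_frames=100):
    tracks = {}
    for i in range(num_frames):                      # frames sampled per acquisition period
        for pid, kp in detect_keypoints(i).items():
            tracks.setdefault(pid, []).append(kp)
    counts = {pid: count_target_actions(t) for pid, t in tracks.items()}
    atmosphere_score = sum(counts.values()) / max(len(counts), 1)
    return {"training_atmosphere_score": atmosphere_score,
            "evaluation": f"{len(counts)} trainee(s), atmosphere {atmosphere_score:.1f}"}

print(evaluate_training_course())
```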
2. The information generating method according to claim 1, wherein the determining, according to how the key point coordinates of each human body change over time throughout the training process, the target action performed by the corresponding human body and the number of times the target action is performed comprises:
generating, for each human body, a left wrist coordinate curve and a right wrist coordinate curve in a preset coordinate system, wherein the horizontal axis of the preset coordinate system is a time axis, and the vertical axis represents the key point coordinate of the corresponding key point at each time;
and determining the intersection points of the left wrist coordinate curve and the right wrist coordinate curve of each human body, and determining, according to the distribution intervals of the intersection points on the time axis, whether the human body performs a clapping action and the number of times the clapping action is performed.
3. The information generating method according to claim 2, wherein the determining, according to the distribution intervals of the intersection points on the time axis, whether the human body performs the clapping action and the number of times the clapping action is performed comprises:
calculating the time interval between every two adjacent intersection points, and dividing two adjacent intersection points whose time interval is larger than a preset duration threshold into two different intersection point groups, so as to obtain a plurality of intersection point groups;
and for each intersection point group, if the number of intersection points in the group is larger than a preset number and the time interval between every two adjacent intersection points in the group falls within a preset time interval range, determining that the intersection point group corresponds to one clapping action.
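The following is a minimal sketch of the clap counting in claims 2 and 3: intersections of the left- and right-wrist coordinate curves are grouped wherever the gap between adjacent intersections exceeds a threshold, and each group whose size and internal gaps satisfy the preset condition counts as one clap. The thresholds and the use of zero-crossings of the left-minus-right difference as curve intersections are illustrative assumptions; a practical system would likely smooth the coordinate curves first.

```python
def find_intersections(times, left_wrist, right_wrist):
    """Return the times at which the two wrist coordinate curves cross."""
    diffs = [l - r for l, r in zip(left_wrist, right_wrist)]
    return [times[i] for i in range(1, len(diffs))
            if diffs[i - 1] == 0 or diffs[i - 1] * diffs[i] < 0]

def count_claps(crossings, split_gap=1.0, min_points=2, gap_range=(0.05, 0.5)):
    """Split crossings into groups at gaps larger than split_gap, then count groups
    with more than min_points crossings whose adjacent gaps all fall in gap_range."""
    groups, current = [], []
    for t in crossings:
        if current and t - current[-1] > split_gap:
            groups.append(current)
            current = []
        current.append(t)
    if current:
        groups.append(current)

    claps = 0
    for group in groups:
        gaps = [b - a for a, b in zip(group, group[1:])]
        if len(group) > min_points and all(gap_range[0] <= g <= gap_range[1] for g in gaps):
            claps += 1
    return claps
```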
4. The information generating method according to claim 1, wherein the determining, according to how the key point coordinates of each human body change over time throughout the training process, the target action performed by the corresponding human body and the number of times the target action is performed comprises:
generating, for each human body, an upper-lower lip relative position difference curve, which describes the relative position difference between the key point coordinates of the upper lip and the key point coordinates of the lower lip over time;
and determining whether the corresponding human body performs a response action and the number of times the response action is performed according to the value distribution and the duration of continuous change of the upper-lower lip relative position difference curve.
5. The information generating method according to claim 4, wherein the determining whether the corresponding human body performs the response action and the number of times the response action is performed according to the value distribution and the duration of continuous change of the upper-lower lip relative position difference curve comprises:
for each human body, determining a plurality of continuously changing regions in the upper-lower lip relative position difference curve corresponding to the human body, and if any region meets a preset response condition, determining that the region corresponds to one response action;
wherein the preset response condition comprises at least one of the following items:
the duration of continuous change of the region is longer than a preset change duration;
within a preset reference time before the region, the coordinate value of at least one wrist key point of the human body is higher than or equal to the coordinate value of an eye key point of the human body;
within the region, the face plane of the human body is parallel to the face plane of the instructor, wherein the face plane is the plane formed by the left eye key point, the right eye key point and the mouth key point.
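The following is a minimal sketch of the response-action counting in claims 4 and 5, covering only the first preset condition (a continuously changing region of the lip-gap curve lasting longer than a preset duration); the raised-wrist and face-plane conditions are omitted. The frame interval and thresholds are illustrative assumptions.

```python
def lip_gap_curve(upper_lip_y, lower_lip_y):
    """Relative position difference between the upper- and lower-lip key points."""
    return [abs(u - l) for u, l in zip(upper_lip_y, lower_lip_y)]

def count_responses(gap_curve, frame_interval=0.04, min_duration=0.8, eps=1e-3):
    """Count regions in which the lip gap keeps changing for longer than min_duration."""
    responses, run = 0, 0
    for prev, cur in zip(gap_curve, gap_curve[1:]):
        if abs(cur - prev) > eps:              # curve is still changing
            run += 1
        else:                                  # region ended: check its duration
            if run * frame_interval > min_duration:
                responses += 1
            run = 0
    if run * frame_interval > min_duration:    # region still running at end of clip
        responses += 1
    return responses
```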
6. The information generating method according to any one of claims 1 to 5, wherein if the training process information further includes training audio information, the method further includes:
determining an instructor audio score of the instructor according to the training audio information of different time periods in the training process, wherein the instructor audio score is used for indicating the degree to which the instructor's pronunciation meets the standard, and each piece of preset audio information in a preset audio information set corresponds to an audio score.
7. The information generating method according to claim 6, wherein the determining the instructor audio score of the instructor according to the training audio information of different time periods in the training process comprises:
performing audio feature extraction on the training audio information of different time periods in the training process to obtain a plurality of audio features;
determining, based on the audio features, the training audio information belonging to the instructor from the training audio information, and recording it as target audio information;
and selecting, from the preset audio information set, preset audio information matched with the target audio information, and determining the audio score corresponding to the selected preset audio information as the instructor audio score of the instructor.
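The following is a minimal sketch of the audio-score lookup in claims 6 and 7. Audio features are represented as pre-computed vectors; the preset audio information set maps reference feature vectors to audio scores. Nearest-neighbour matching, and averaging over the instructor's segments, are assumptions standing in for a matching rule the claims leave unspecified.

```python
import math

def nearest_score(feature, preset_set):
    """preset_set: list of (reference_feature, audio_score) pairs."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    _, score = min(preset_set, key=lambda item: dist(feature, item[0]))
    return score

def instructor_audio_score(segment_features, is_instructor, preset_set):
    """Average the matched audio scores over the segments attributed to the instructor."""
    scores = [nearest_score(f, preset_set)
              for f, own in zip(segment_features, is_instructor) if own]
    return sum(scores) / len(scores) if scores else 0.0
```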
8. The information generating method according to claim 6, wherein the generating evaluation result information for the current training course based on the training atmosphere score comprises:
determining a composite score of the current training course according to a weighted sum of the training atmosphere score, the instructor audio score and a student attendance score, and generating the evaluation result information for the current training course according to the composite score, wherein the student attendance score is the ratio of the actual number of attending trainees to the expected number of attending trainees.
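The following is a minimal sketch of the composite score in claim 8. The weights are illustrative assumptions; the claim only requires a weighted sum of the training atmosphere score, the instructor audio score, and the student attendance score (actual attendance divided by expected attendance).

```python
def composite_score(atmosphere_score, instructor_audio_score,
                    actual_attendance, expected_attendance,
                    weights=(0.4, 0.3, 0.3)):
    """Weighted sum of atmosphere, instructor audio and attendance scores."""
    attendance_score = actual_attendance / expected_attendance
    w_atm, w_audio, w_att = weights
    return (w_atm * atmosphere_score
            + w_audio * instructor_audio_score
            + w_att * attendance_score)

# Example (scores on a 0-1 scale): composite_score(0.8, 0.9, 18, 20) == 0.86
```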
9. A terminal device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the steps of the method according to any of claims 1 to 8 when executing the computer program.
10. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 8.
CN202111004868.5A 2021-08-30 2021-08-30 Information generation method, terminal device and storage medium Pending CN113723284A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111004868.5A CN113723284A (en) 2021-08-30 2021-08-30 Information generation method, terminal device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111004868.5A CN113723284A (en) 2021-08-30 2021-08-30 Information generation method, terminal device and storage medium

Publications (1)

Publication Number Publication Date
CN113723284A true CN113723284A (en) 2021-11-30

Family

ID=78679211

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111004868.5A Pending CN113723284A (en) 2021-08-30 2021-08-30 Information generation method, terminal device and storage medium

Country Status (1)

Country Link
CN (1) CN113723284A (en)

Similar Documents

Publication Publication Date Title
CN109522815B (en) Concentration degree evaluation method and device and electronic equipment
CN108648757B (en) Analysis method based on multi-dimensional classroom information
CN111027486A (en) Auxiliary analysis and evaluation system and method for big data of teaching effect of primary and secondary school classroom
Ochoa et al. Multimodal learning analytics-Rationale, process, examples, and direction
CN111046819A (en) Behavior recognition processing method and device
CN110517558B (en) Piano playing fingering judging method and system, storage medium and terminal
CN113870395A (en) Animation video generation method, device, equipment and storage medium
CN111814587A (en) Human behavior detection method, teacher behavior detection method, and related system and device
CN111428686A (en) Student interest preference evaluation method, device and system
CA3048542A1 (en) System for peer-to-peer, self-directed or consensus human motion capture, motion characterization, and software-augmented motion evaluation
CN113723530A (en) Intelligent psychological assessment system based on video analysis and electronic psychological sand table
CN110969045A (en) Behavior detection method and device, electronic equipment and storage medium
CN115131867A (en) Student learning efficiency detection method, system, device and medium
Araya et al. Automatic detection of gaze and body orientation in elementary school classrooms
CN115810163B (en) Teaching evaluation method and system based on AI classroom behavior recognition
CN112766130A (en) Classroom teaching quality monitoring method, system, terminal and storage medium
CN111199378B (en) Student management method, device, electronic equipment and storage medium
Jiang et al. A classroom concentration model based on computer vision
Abdulhamied et al. Real-time recognition of American sign language using long-short term memory neural network and hand detection
CN113723284A (en) Information generation method, terminal device and storage medium
CN111507555B (en) Human body state detection method, classroom teaching quality evaluation method and related device
CN114220135A (en) Method, system, medium and device for recognizing attention and expression of human face in teaching
CN112634100A (en) Behavior analysis education system based on big data
CN116797090B (en) Online assessment method and system for classroom learning state of student
RU153699U1 (en) ANTHROPOMORPHIC ROBOT OF THE EDUCATIONAL PROCESS

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination