CN112069931A - State report generation method and state monitoring system - Google Patents

State report generation method and state monitoring system

Info

Publication number
CN112069931A
Authority
CN
China
Prior art keywords
state
user
key point
image
video segment
Prior art date
Legal status
Pending
Application number
CN202010844814.9A
Other languages
Chinese (zh)
Inventor
周鲁平
胡晓华
Current Assignee
Shenzhen Shuliantianxia Intelligent Technology Co Ltd
Original Assignee
Shenzhen Shuliantianxia Intelligent Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Shenzhen Shuliantianxia Intelligent Technology Co Ltd
Priority to CN202010844814.9A
Publication of CN112069931A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168 Feature extraction; Face representation
    • G06V40/171 Local features and components; Facial parts; Occluding parts, e.g. glasses; Geometrical relationships
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10 Services
    • G06Q50/20 Education
    • G06Q50/205 Education administration or guidance
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/50 Context or environment of the image
    • G06V20/52 Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Business, Economics & Management (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Educational Technology (AREA)
  • Human Computer Interaction (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Educational Administration (AREA)
  • General Engineering & Computer Science (AREA)
  • Strategic Management (AREA)
  • Tourism & Hospitality (AREA)
  • Economics (AREA)
  • Human Resources & Organizations (AREA)
  • Marketing (AREA)
  • Primary Health Care (AREA)
  • General Business, Economics & Management (AREA)
  • Image Analysis (AREA)

Abstract

The application is applicable to the field of intelligent education and provides a state report generation method and a state monitoring system. The state monitoring system comprises a state identification device, a server and a user terminal. The method comprises the following steps: the state identification device determines a state monitoring result of a user in a monitoring period based on a user video acquired in the monitoring period, and sends the state monitoring result to the server; the server generates state information corresponding to the monitoring period based on the received state monitoring result; and if the server receives a status report generation request sent by the user terminal, the server feeds back the status information to the user terminal based on the status report generation request and instructs the user terminal to generate and display a status report. The method and the system can supervise the state of the user in real time and automatically generate a status report so that the user can know his or her own state.

Description

State report generation method and state monitoring system
Technical Field
The application belongs to the field of intelligent education, and particularly relates to a state report generation method and a state monitoring system.
Background
With the advancement of the times, education has become increasingly widespread and accessible, and a wide variety of courses are available for users to choose from. During learning, a user devotes most of his or her attention to the course itself and pays little attention to his or her own state, and in general no one can supervise the user's learning at all times, so the user cannot know his or her condition during the learning period. Therefore, there is a need in the industry for an intelligent education product that supervises a user while the user is learning a course and automatically generates a status report based on the user's performance during learning, so that the user can know his or her state during learning.
Disclosure of Invention
The embodiment of the application provides a state report generation method and a state monitoring system, which can monitor the performance of a user during learning and automatically generate a state report so that the user can know the state during learning.
In a first aspect, an embodiment of the present application provides a method for generating a status report, which is applied to a status monitoring system, where the status monitoring system includes: the system comprises a state identification device, a server and a user terminal; the status report generation method comprises the following steps: the state identification device determines a corresponding state monitoring result of a user in a monitoring time period based on a user video acquired in the monitoring time period, and sends the state monitoring result to the server; the state monitoring result comprises: learning state information, sitting posture information, and attention information; the server generates state information corresponding to the monitoring time period based on the received state monitoring result; and if the server receives a status report generation request sent by the user terminal, the server feeds back the status information to the user terminal based on the status report generation request and instructs the user terminal to generate and display a status report.
In a possible implementation manner of the first aspect, one user video corresponds to one monitoring period, and there are a plurality of monitoring periods; the status report comprises a plurality of status pages, and one status page corresponds to one monitoring period.
Illustratively, the monitoring period refers to the time period from the start time to the end time of the user video, and the user video is a video with a continuous, uninterrupted time axis. The server feeding the status information back to the user terminal based on the status report generation request, and instructing the user terminal to generate and display a status report, may specifically include: the server feeds back the state information corresponding to the plurality of monitoring periods based on the status report generation request, and a state page corresponding to a target monitoring period is generated based on the state information of the target monitoring period, where the target monitoring period is any one of the plurality of monitoring periods.
It should be understood that, when the user terminal displays the status report, the user terminal may specifically display a status page of the status report based on user operation, that is, the user may browse the status page corresponding to a certain monitoring period according to the user's own needs, so as to know the user status of the monitoring period.
In a second aspect, an embodiment of the present application provides a condition monitoring system, where the condition monitoring system includes: the system comprises a state identification device, a server and a user terminal; the state identification device is used for determining a corresponding state monitoring result of a user in a monitoring time period based on a user video acquired in the monitoring time period and sending the state monitoring result to the server; the state monitoring result comprises: learning state information, sitting posture information, and attention information; the server is used for generating state information corresponding to the monitoring time period based on the received state monitoring result; the server is further configured to, if a status report generation request sent by the user terminal is received, feed back the status information to the user terminal based on the status report generation request, and instruct the user terminal to generate and display a status report.
In a third aspect, an embodiment of the present application provides an electronic device, including: a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor, when executing the computer program, implements the method performed by the state identification device and/or the method performed by the server in any one of the implementations of the first aspect above.
In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium, including: the computer-readable storage medium stores a computer program which, when executed by a processor, implements a method performed by the state recognition apparatus of any of the first aspects described above, and/or a method performed by the server.
It is understood that, for the beneficial effects of the second to fourth aspects, reference may be made to the related description of the first aspect, and details are not described herein again.
Compared with the prior art, the embodiments of the present application have the following advantages:
the status report generation method provided by the embodiments supervises the user in real time during learning and generates a status report by recognizing the user's sitting posture and attention state during learning, so that the user's state during learning is recorded and the user can review the report and correct his or her state in time in subsequent learning.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. It is apparent that the drawings in the following description are only some embodiments of the present application, and other drawings can be obtained by those skilled in the art from these drawings without inventive effort.
FIG. 1 is a flow chart of an implementation of a method provided by a first embodiment of the present application;
FIG. 2 is a schematic diagram of an application scenario provided in an embodiment of the present application;
FIG. 3 is a data interaction diagram of a condition monitoring system according to an embodiment of the present application;
FIG. 4 is a flowchart of an implementation of a method provided by the second embodiment of the present application;
fig. 5 is a schematic diagram of a key point extraction network provided in a third embodiment of the present application;
fig. 6 is a schematic flowchart of the method S403 according to a fourth embodiment of the present application;
FIG. 7 is a flow chart of an implementation of a method provided in a fifth embodiment of the present application;
FIG. 8 is a flowchart of an implementation of a method provided by a sixth embodiment of the present application;
FIG. 9 is a flowchart of an implementation of a method provided by the seventh embodiment of the present application;
FIG. 10 is a diagram illustrating the effect of status reporting provided by a seventh embodiment of the present application;
FIG. 11 is a schematic diagram of a condition monitoring system according to an embodiment of the present application;
fig. 12 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular system structures, techniques, etc. in order to provide a thorough understanding of the embodiments of the present application. It will be apparent, however, to one skilled in the art that the present application may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present application with unnecessary detail.
It will be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It should also be understood that the term "and/or" as used in this specification and the appended claims refers to and includes any and all possible combinations of one or more of the associated listed items.
As used in this specification and the appended claims, the term "if" may be interpreted contextually as "when", "upon", "in response to determining" or "in response to detecting". Similarly, the phrase "if it is determined" or "if [a described condition or event] is detected" may be interpreted contextually to mean "upon determining", "in response to determining", "upon detecting [the described condition or event]" or "in response to detecting [the described condition or event]".
Furthermore, in the description of the present application and the appended claims, the terms "first," "second," "third," and the like are used for distinguishing between descriptions and not necessarily for describing or implying relative importance.
Reference throughout this specification to "one embodiment" or "some embodiments," or the like, means that a particular feature, structure, or characteristic described in connection with the embodiment is included in one or more embodiments of the present application. Thus, appearances of the phrases "in one embodiment," "in some embodiments," "in other embodiments," or the like, in various places throughout this specification are not necessarily all referring to the same embodiment, but rather "one or more but not all embodiments" unless specifically stated otherwise. The terms "comprising," "including," "having," and variations thereof mean "including, but not limited to," unless expressly specified otherwise.
Fig. 1 shows a flowchart of an implementation of the method provided in the first embodiment of the present application, which is detailed as follows:
referring to fig. 1, the status report generating method provided in this embodiment is applied to a status monitoring system, where the status monitoring system includes: the system comprises a state identification device, a server and a user terminal; the status report generation method comprises S101-S103.
In S101, the state identification device determines a corresponding state monitoring result of the user in the monitoring period based on the user video acquired in the monitoring period, and sends the state monitoring result to the server.
In this embodiment, the status monitoring result includes: learning status information, sitting posture information, and attention information. The user video may be captured by a camera and uploaded by the camera to the state recognition device, or the camera may be built into the state recognition device. The camera is arranged at a preset position, such as on a desk used by the user during learning, on a display used by the user to watch learning videos, or on a book stand holding books during reading. The camera is used to capture the upper-body region of the user while the user is learning in a sitting posture, where the upper body includes the part of the user's body above the chest, in particular the face and shoulders.
In one possible implementation, the user video includes a plurality of frames of user images. The state identification device may determine a corresponding state monitoring result of the user in the monitoring period based on the user video acquired in the monitoring period, and the process may specifically be: the state recognition device determines the corresponding learning state information, the sitting posture information and the attention information in the monitoring time period according to all user images in the user video collected in the monitoring time period; the learning state information is used for representing that the user is in a learning state or a rest state in each time period in the monitoring period; the sitting posture information is used for representing the sitting postures of the user in each time period in the monitoring time period, and the sitting postures comprise standard sitting postures or non-standard sitting postures and can also be sitting postures of specific types; the attention information is used for representing whether the attention of the user is concentrated in each time period in the monitoring period.
Optionally, the state recognition device separately introduces all user images into a learning state recognition model, recognizes a learning state corresponding to each user image, and obtains the learning state information of the monitoring period based on the learning states of all user images; it should be understood that, in order to monitor the state information of the user during learning in real time, only when the user is in the learning state, the sitting posture state information and the attention state information of the user at this time need to be determined, that is, when the learning state of the user image indicates that the user is in the learning state, the state recognition device respectively guides the user image into a sitting posture recognition model and an attention recognition model, recognizes the sitting posture state and the attention state corresponding to the user image, and obtains the sitting posture state information and the attention state information of the monitoring period based on the sitting posture state and the attention state of all the user images; when the learning state of the user image indicates that the user is in a resting state, the state identifying device may not perform the operation of determining the sitting posture and the attention state, or identify the sitting posture and the attention state as preset values (for example, null values) indicating that the user is in a resting state (i.e., the user image may not include the human body of the user), and not identify the sitting posture and the attention state of the user at that time.
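For illustration only, the per-image dispatch described in the preceding paragraph can be sketched as follows in Python; the model objects and their predict() interface are hypothetical placeholders that are not part of this application, and a null value stands in for the preset value used when the user is resting.

    # Minimal sketch of the per-image dispatch: the sitting posture and attention
    # models are only consulted when the learning model reports a learning state.
    REST_VALUE = None  # preset value (e.g. a null value) used for a resting user

    def monitor_frame(frame, learning_model, sitting_model, attention_model):
        learning = learning_model.predict(frame)      # hypothetical interface
        if learning == "rest":
            # Sitting posture and attention are not evaluated for a resting user.
            return {"learning": "rest", "sitting": REST_VALUE, "attention": REST_VALUE}
        return {
            "learning": "learning",
            "sitting": sitting_model.predict(frame),      # e.g. "standard" / "non-standard"
            "attention": attention_model.predict(frame),  # e.g. "focused" / "unfocused"
        }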
In S102, the server generates status information corresponding to the monitoring period based on the received status monitoring result.
In this embodiment, the server generates the status information corresponding to the monitoring period based on the received status monitoring result sent by the status identification device in the above S101; illustratively, the state information is used to characterize states of the user in the monitoring period, including a learning state, a sitting posture state, and an attention state, and specifically, the server determines, according to the state monitoring result, a time period corresponding to a different learning state, a time period corresponding to a different sitting posture state, and a time period corresponding to a different attention state in the monitoring period.
In one possible implementation, the monitoring period is 10:00 to 11:00. The server generates the state information corresponding to the monitoring period based on the received state monitoring result, which may specifically be as follows. The server divides the monitoring period into a learning time period and a rest time period according to the learning state information in the state monitoring result; for example, if the learning state information indicates that the user is in a learning state from 10:00 to 10:40 and in a rest state from 10:41 to 11:00, the server determines that the learning time period is 10:00 to 10:40 and the rest time period is 10:41 to 11:00. The server divides the learning time period into sitting-posture-standard time periods and sitting-posture-non-standard time periods (which may also be divided into several time periods corresponding to different specific sitting posture types) according to the sitting posture state information in the state monitoring result; for example, if the sitting posture state information indicates that the user is in a non-standard sitting posture from 10:10 to 10:15 and from 10:20 to 10:30, and in a standard sitting posture for the remaining time, the server determines that the sitting-posture-standard time periods are 10:00 to 10:09, 10:16 to 10:19 and 10:31 to 10:40, and the sitting-posture-non-standard time periods are 10:10 to 10:15 and 10:20 to 10:30. The server divides the learning time period into attention-focused time periods and attention-unfocused time periods according to the attention state information in the state monitoring result; for example, if the attention state information indicates that the user is in an unfocused state from 10:10 to 10:20 and from 10:35 to 10:40, and in a focused state for the remaining time, the server determines that the attention-focused time periods are 10:00 to 10:09 and 10:21 to 10:34, and the attention-unfocused time periods are 10:10 to 10:20 and 10:35 to 10:40.
It should be understood that the status information corresponding to the monitoring period may specifically include the number of times the user changed from the standard sitting posture state to the non-standard sitting posture state, and the number of times the user changed from the attention-focused state to the attention-unfocused state, i.e. the number of improper-sitting-posture occurrences and the number of attention lapses in the monitoring period; for example, with reference to the above example, there are 2 improper-sitting-posture occurrences and 2 attention lapses.
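As an illustration of how the monitoring period might be divided and the transition counts derived from a minute-level state sequence, a minimal sketch follows; the input format (an ordered list of (minute label, state) pairs) and the function names are assumptions made for this example only.

    # Group consecutive per-minute states into time segments and count
    # transitions such as standard -> non-standard sitting posture.
    from itertools import groupby

    def to_segments(minute_states):
        """minute_states: ordered list of (minute_label, state) pairs."""
        segments = []
        for state, group in groupby(minute_states, key=lambda item: item[1]):
            group = list(group)
            segments.append((group[0][0], group[-1][0], state))  # (start, end, state)
        return segments

    def count_transitions(minute_states, from_state, to_state):
        states = [state for _, state in minute_states]
        return sum(1 for prev, cur in zip(states, states[1:])
                   if prev == from_state and cur == to_state)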
In S103, if the server receives a status report generation request sent by the user terminal, the server feeds back the status information to the user terminal based on the status report generation request, and instructs the user terminal to generate and display a status report.
In this embodiment, in order to generate a status report, the user terminal needs to send the status report generation request to the server, and the server feeds back the status information of the monitoring period to the user terminal based on the status report generation request.
In a possible implementation manner, the user terminal sends the status report generation request to the server based on a user operation (for example, an operation of a user triggering to view the status report); and if the server receives a status report generation request sent by the user terminal, the server feeds back the status information to the user terminal based on the status report generation request and instructs the user terminal to generate and display a status report so as to meet the user requirement.
In this embodiment, the user terminal generates the status report based on the status information fed back by the server; specifically, the user terminal presets an interface frame of the status report and fills the data in the status information into the corresponding positions of the interface frame.
In a possible implementation manner, the receiving, by the user terminal, the status information that is fed back by the server based on the status report generation request, and generating and displaying the status report based on the fed-back status information may specifically be: the user terminal receives the state information corresponding to the monitoring period, which is generated and requested to be fed back by the server based on the state report, based on user operation (for example, operation of a user for triggering and viewing the state report corresponding to the monitoring period); the user terminal generates the status report corresponding to the monitoring time period based on the status information; and the user terminal outputs the status report through a display device for a user to check.
It should be understood that, in a possible implementation, the steps performed by the state identification means may be performed on the server, i.e. the state identification means is identified as a module of the server.
In this embodiment, the status reporting method applied to the status monitoring system can monitor the status of the user in real time, and generate the status report for representing the status of the user by identifying the status of the user in the user video corresponding to the monitoring period, so as to store the status of the user in the monitoring period in the form of the status report, so as to enable the user to view the status report at any time for status adjustment.
Fig. 2 shows a schematic view of an application scenario provided in an embodiment of the present application. Referring to fig. 2, the user learns using the table and the chair shown in the drawing. The camera is used for collecting the user video in the user learning period; the state recognition device acquires the user video in the user learning period, determines a corresponding state monitoring result of the user in the user learning period, and sends the state monitoring result to the server; the server generates state information corresponding to the user learning time period based on the received state monitoring result; the user terminal sends a status report generation request to the server; and the user terminal receives the state information which is fed back by the server based on the state report generation request, and generates and displays a state report based on the fed back state information so as to make the user know the state in the user learning time period.
FIG. 3 illustrates a data interaction diagram of a condition monitoring system according to an embodiment of the present application. Referring to fig. 3, in the condition monitoring system provided in this embodiment, the condition identifying device sends the condition monitoring result to the server; the user terminal sends a status report generation request to the server; and the server feeds back the state information to the user terminal based on the status report generation request.
Fig. 4 shows a flowchart of an implementation of the method provided in the second embodiment of the present application. Referring to fig. 4, with respect to the embodiment shown in fig. 1, the method S101 provided in this embodiment includes S401 to S404, which are detailed as follows:
further, the state identification device determines a corresponding state monitoring result of the user in the monitoring period based on the user video collected in the monitoring period, and includes:
in S401, the state recognition device divides the user video into a plurality of video segments based on a detection period.
In this embodiment, the status monitoring result includes: learning state information, sitting posture information, and attention information; the state identification device determines a corresponding state monitoring result of the user in the monitoring time period, mainly according to a plurality of frames of user images in the user video; in order to simplify the calculation and to implement the function of determining the state monitoring result, the state recognition device needs to preset a detection period, divide the user video into a plurality of video segments based on the detection period, wherein the duration of each video segment is consistent with the period duration of the detection period, and then the state recognition device analyzes each video segment respectively so as to determine the state monitoring result corresponding to the monitoring period in the following. As an example, the duration of the monitoring period may be one hour, and the duration of the detection period may be one minute, that is, the state identification device divides the user video into sixty video segments based on the detection period.
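A minimal sketch of this division step is given below; the frame rate and the one-minute detection period are assumptions taken from the example above.

    # Split the ordered frames of a user video into video segments, one segment
    # per detection period (here one minute at an assumed 25 frames per second).
    def split_into_segments(frames, fps=25, detection_period_s=60):
        frames_per_segment = fps * detection_period_s
        return [frames[i:i + frames_per_segment]
                for i in range(0, len(frames), frames_per_segment)]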
In S402, the state identification device imports multiple frames of user images in a target video segment into a key point extraction network, respectively, to obtain key point images corresponding to the user images of each frame.
In this embodiment, the target video segment is any one of the video segments. As described above, each video segment is analyzed; taking any frame of user image in the target video segment as an example, the state identification device imports the user image into the key point extraction network to obtain a key point image corresponding to the user image. Illustratively, the key point extraction network is configured to extract key point feature information about the user in the user image, and the key point image contains the feature information of all key points in the user image. The key point extraction network may be a trained key point recognition model for extracting a target object from an image; for example, it may be an OpenPose human body key point recognition model, where the key points include a left eye key point, a right eye key point, a nose key point, a left ear key point, a right ear key point, a left shoulder key point, a right shoulder key point, and a middle (neck) key point.
In a possible implementation manner, the specific implementation manner of S402 may be: the key point extraction network extracts feature information about a left-eye key point, a right-eye key point, a nose key point, a left-ear key point, a right-ear key point, a left-shoulder key point, a right-shoulder key point and a middle (neck) key point of the user in the user image, and obtains the key point image based on the feature information.
In S403, the state identification device determines the user state corresponding to the target video segment based on the key point images corresponding to all the user images in the target video segment.
In the present embodiment, the user states include a learning state, a sitting state, and an attentive state.
In one possible implementation, the state recognition device includes a learning recognition unit, a sitting posture recognition unit, and an attention recognition unit; the state identification device determines the user state corresponding to the video segment based on the key point images corresponding to all the user images in the video segment, and specifically may be: the learning identification unit determines a learning state corresponding to the target video segment based on key point images corresponding to all user images in the target video segment, wherein the learning state is used for representing whether the user is in a learning state or a resting state in the target video segment; the sitting posture identifying unit determines a sitting posture state corresponding to the target video segment based on the key point images corresponding to all the user images in the target video segment, wherein the sitting posture state is used for representing the sitting posture of the user in the target video segment, such as a standard sitting posture state or a non-standard sitting posture state, and can also be a sitting posture state of a specific type; the attention recognition unit determines an attention state corresponding to the target video segment based on key point images corresponding to all user images in the target video segment, wherein the attention state is used for representing whether the attention of the user on the target video segment is focused or not.
It should be understood that the learning identification unit, the sitting posture identification unit, and the attention identification unit may be specifically a trained learning identification model, a sitting posture identification model, and an attention identification model, and the learning identification model, the sitting posture identification model, and the attention identification model are configured to take a plurality of key point images corresponding to the target video segment as input and output the learning state, the sitting posture state, and the attention state corresponding to the target video segment respectively.
In S404, the state identification device encapsulates the user states of all video segments in the monitoring period, so as to obtain the state monitoring result.
In this embodiment, one video segment in the user video corresponds to one detection cycle in the monitoring period, and one video segment corresponds to one user state, that is, one user state corresponds to one detection cycle in the monitoring period. The state identification device encapsulates the user states of all video segments in the monitoring period to obtain the state monitoring result, which may specifically be: and the state identification device packages a plurality of user states according to the sequence of each detection cycle in the monitoring period to obtain a state monitoring result corresponding to the monitoring period, wherein the state monitoring result comprises the corresponding relation between the plurality of user states and the plurality of detection cycles in the monitoring period. As an example, the duration of the detection period may be one minute, that is, the state monitoring result is used to represent the user state corresponding to each minute of the user in the monitoring period.
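The encapsulation step can be sketched as follows; representing the state monitoring result as a mapping from detection-cycle index to user state is an assumption chosen for illustration.

    # Package the per-segment user states in detection-cycle order so that the
    # state monitoring result records which user state belongs to which cycle.
    def encapsulate_states(user_states):
        """user_states: list of per-segment states, ordered by detection cycle."""
        return {cycle: state for cycle, state in enumerate(user_states)}

With a one-minute detection period, cycle index i then corresponds to the i-th minute of the monitoring period.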
In this embodiment, the state recognition device imports the user image into a key point extraction network to obtain a key point image, and determines the user state corresponding to the target video segment based on all key point images corresponding to the target video segment, so as to reduce the amount of calculation in determining the user state, so as to improve the efficiency of the state report generation method.
Fig. 5 shows a schematic diagram of a key point extraction network provided in a third embodiment of the present application. Referring to fig. 5, with respect to the embodiment shown in fig. 4, the method S402 provided in this embodiment includes S501 to S502, which are detailed as follows:
further, the step of importing, by the state recognition device, a plurality of frames of user images in the target video segment into a key point extraction network, to obtain a key point image corresponding to each frame of the user image includes:
referring to fig. 5, in this embodiment, the key point extraction network includes a human body identification layer and a key point identification layer.
In S501, the state recognition device imports a target user image into the human body recognition layer, and captures a human body image from the target user image.
In one possible implementation manner, the target user image is any user image in the multiple frames of user images; the above introducing the target user image into the human body recognition layer and capturing the human body image from the target user image may specifically be: and preprocessing the target user image, determining human body edge contour information in the target user image according to a preprocessed result, and intercepting a human body image containing a human face and an upper half of a human body in the target user image according to the human body edge contour information.
The preprocessing of the target user image may specifically be: performing image processing that highlights edge contours, such as image sharpening, on the target user image to obtain a preprocessed user image. The determining of the human body edge contour information in the target user image according to the preprocessed user image may specifically be: importing the preprocessed user image into a trained human body recognition model for determining the human body edge contour, to obtain the human body edge contour information. The intercepting of the human body image containing the face and upper half of the user from the preprocessed user image according to the human body edge contour information may specifically be: determining the edge contour of the target human body on the preprocessed user image according to the human body edge contour information, and cropping out the area enclosed by the edge contour of the target human body as the human body image. It should be understood that the human body recognition model may be an existing trained model for determining the human body edge contour information in an image containing a human body, and will not be described in detail herein.
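A rough sketch of the sharpening and cropping steps is shown below, assuming OpenCV-style image arrays; the trained human body recognition model is replaced by a hypothetical body_contour_model callable, since its internals are not specified here.

    # Sharpen the image to highlight edge contours, obtain the body contour from
    # a (hypothetical) trained model, and crop the enclosing rectangle as the
    # human body image containing the face and upper body.
    import numpy as np
    import cv2

    SHARPEN_KERNEL = np.array([[0, -1, 0],
                               [-1, 5, -1],
                               [0, -1, 0]], dtype=np.float32)

    def crop_body(user_image, body_contour_model):
        sharpened = cv2.filter2D(user_image, -1, SHARPEN_KERNEL)
        contour = np.asarray(body_contour_model(sharpened), dtype=np.int32)  # N x 2 points
        x, y, w, h = cv2.boundingRect(contour)
        return sharpened[y:y + h, x:x + w]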
In S502, the state recognition device imports the human body image into the key point recognition layer, extracts a plurality of key points on the human body image, and outputs a key point image corresponding to the target user image.
In this embodiment, the key point identification layer is configured to identify a plurality of key points of the user on the human body image; for example, the plurality of key points include a left eye key point, a right eye key point, a nose key point, a left ear key point, a right ear key point, a left shoulder key point, a right shoulder key point, and a middle (neck) key point. Optionally, the key point identification layer may be an OpenPose human body key point recognition model, and its specific implementation is not described herein again.
In a possible implementation manner, the importing the human body image into a key point identification layer, and outputting a key point image corresponding to the target user image may specifically be: extracting feature information of a left eye key point, a right eye key point, a nose key point, a left ear key point, a right ear key point, a left shoulder key point, a right shoulder key point and a middle part (neck) key point of the user in the human body image, obtaining the key point image based on the feature information, specifically, extracting each key point from the human body image, associating the position of each key point with the specific type of each key point, and obtaining the key point image containing the plurality of key points.
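One possible in-memory form of the resulting key point image is sketched below; representing it as a mapping from key point type to position is an assumption made for illustration, not a format prescribed by the application.

    # Associate each extracted key point position with its type, yielding a
    # "key point image" that keeps only the key point feature information.
    KEYPOINT_NAMES = ["left_eye", "right_eye", "nose", "left_ear", "right_ear",
                      "left_shoulder", "right_shoulder", "neck"]

    def to_keypoint_image(detections):
        """detections: hypothetical mapping name -> (x, y, confidence)."""
        return {name: detections.get(name) for name in KEYPOINT_NAMES}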
In this embodiment, arranging a human body recognition layer in the key point extraction network removes the feature information of the unimportant background environment in the target user image and keeps, as far as possible, only the feature information of the target human body. This is equivalent to preprocessing the target user image and reduces the amount of image information to be processed in the subsequent steps (i.e. reduces their computational load), which improves the efficiency of subsequently determining the state information. Arranging a key point identification layer enables key points to be identified for different target human bodies (in various postures or wearing various clothes), so that extracting key points from the human body image broadens the range of people for whom attention detection is applicable; it also further simplifies the feature information to be processed later, since only the key point feature information of the human body image is retained, which again improves the efficiency of subsequently determining the state information.
Fig. 6 shows a flowchart of an implementation of the method provided in the fourth embodiment of the present application. Referring to fig. 6, with respect to the embodiment shown in fig. 4, the method S403 provided in this embodiment includes S601 to S611, which are detailed as follows:
further, the determining, by the state recognition device, the user state corresponding to the target video segment based on the key point images corresponding to all the user images in the target video segment includes:
referring to fig. 6, in S601, the state identification apparatus determines whether a key point preset by the key point extraction network is included in a target key point image; if the key point is included in the target key point image, the state recognition device recognizes a user image corresponding to the target key point image as a learning image; if the key point is not included in the target key point image, the state recognition device recognizes the user image corresponding to the target key point image as a rest image.
In this embodiment, the target keypoint image is any one of the keypoint images corresponding to all the user images; if the target keypoint image includes at least one keypoint, which indicates that the user image corresponding to the target keypoint image includes a human body of the user, that is, the user is in a sitting posture for learning at the moment, the state recognition device may recognize the user image corresponding to the target keypoint image as the learning image to represent that the user in the user image is in a learning state; if the target keypoint image does not include any keypoint, which indicates that the user image corresponding to the target keypoint image does not include any part of the human body of the user, namely the user is at rest and is in a state of leaving a seat at the moment, the state identification device may identify the user image corresponding to the target keypoint image as the rest image to represent that the user in the user image is in a rest state.
In S602, the state recognition device determines a learning state corresponding to the target video segment based on all the learning images and all the rest images in the target video segment.
In this embodiment, the learning state includes a first state and a second state; the first state represents that the user is at rest, and the second state represents that the user is learning. If the number of learning images in the target video segment is greater than the number of rest images, the state recognition device determines that the learning state corresponding to the target video segment is the second state, indicating that the user is in a learning state during the target video segment; if the number of learning images in the target video segment is less than or equal to the number of rest images, the state recognition device determines that the learning state corresponding to the target video segment is the first state, indicating that the user is in a rest state during the target video segment.
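The majority rule described above can be sketched as follows; the per-frame labels are assumed to be the strings "learning" and "rest".

    # Decide the segment-level learning state by comparing the number of
    # learning images with the number of rest images in the target video segment.
    def segment_learning_state(image_labels):
        learning = sum(1 for label in image_labels if label == "learning")
        rest = len(image_labels) - learning
        return "learning" if learning > rest else "rest"  # second state vs first state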
It should be understood that, referring to the embodiment shown in fig. 4, the state recognition device determines the sitting posture state and the attention state within the user state based on the feature information of the key points in the key point image; that is, only when the learning state indicates that the user is learning does the state recognition device need to, and is it able to, determine specific values for the sitting posture state and the attention state. When the learning state indicates that the user is resting, the state recognition device neither needs to nor is able to determine specific values for the sitting posture state and the attention state.
Referring to fig. 6, further, if the learning state corresponding to the target video segment is the first state, then S603 or S604 is executed;
in this embodiment, when the learning state is the first state, the state recognition device does not need to determine specific values of the sitting posture state and the attention state of the user state, but in order to keep a program complete and smooth running, the state recognition device still needs to perform assignment operation on the sitting posture state and the attention state of the user state; the above state recognition device performing assignment operation on the sitting posture state in the user state includes S603; the above state recognition device assigns the attention state among the user states, including S604.
In S603, the state recognition device sets a value of a sitting posture state corresponding to the target video segment to a preset value.
In this embodiment, the preset value may be a null value (or zero). The value of the sitting posture state corresponding to the target video segment is a null value, and is used for representing that the user does not have a sitting posture state in the target video segment, namely that the user is not in a sitting posture state in the target video segment.
In S604, the state recognition apparatus sets the value of the attention state corresponding to the target video segment to a preset value.
In this embodiment, the preset value may be a null value (or zero). The value of the attention state corresponding to the target video segment is a null value, and is used for representing that the user does not have the attention state in the target video segment, that is, the user does not learn in the target video segment, let alone that the learning attention is concentrated or not concentrated.
Referring to fig. 6, further, if the learning state corresponding to the target video segment is the second state, S605 to S608 or S609 to S611 are executed;
in this embodiment, when the learning state is the second state, the state recognition device is only necessary and capable of determining the sitting posture state of the user state and the specific value of the attention state, the determining of the sitting posture state of the user state by the state recognition device includes S605 to S608, and the determining of the attention state of the user state by the state recognition device includes S609 to S611.
In S605, the state recognition device imports the target key point image into a sitting posture recognition network to obtain a sub-sitting posture state of the user image corresponding to the target key point image.
In this embodiment, the sitting posture identification network is configured to determine the sub-sitting posture state according to feature information in the target keypoint image; the sitting posture recognition network is a trained classification model, the key point images are used as input, and the sub-sitting posture state is used as the category of the key point images for outputting. In a possible implementation manner, the state recognition device imports the target key point image into the sitting posture recognition network, extracts key point feature information of the target key point image, performs calculation based on the key point feature information and internal parameters of the sitting posture recognition network, and determines the sub-sitting posture state.
In S606, the state recognition device determines the sitting posture state of the target video segment based on the sub-sitting posture states of all user images in the target video segment.
In this embodiment, the sub-sitting-posture state includes a standard sitting posture and a non-standard sitting posture. The state identification device determining the sitting posture state of the target video segment based on the sub-sitting-posture states of all the user images in the target video segment may specifically be: the state recognition device takes the sub-sitting-posture value (standard sitting posture or non-standard sitting posture) with the highest proportion among the sub-sitting-posture states of all the user images as the sitting posture state of the target video segment.
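Under the highest-proportion reading described above, the segment-level decision can be sketched as follows; the state names are placeholders.

    # Take the sub-sitting-posture value that occurs with the highest proportion
    # across all frames of the segment as the segment's sitting posture state.
    from collections import Counter

    def segment_sitting_state(sub_states):
        """sub_states: per-frame values, e.g. "standard" or "non-standard"."""
        return Counter(sub_states).most_common(1)[0][0]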
In S607, if the sitting posture state is a non-standard sitting posture, the state recognition device uploads the target video segment to the server, so that the server determines the sitting posture type corresponding to the target video segment based on the target video segment.
In this embodiment, the determining, by the server, a sitting posture type corresponding to the video segment based on the target video segment may specifically include: the server respectively guides a plurality of frames of user images in the target video segment into a key point accurate extraction network to obtain key point accurate images corresponding to the user images of each frame; and the server determines the sitting posture type corresponding to the target video segment based on the key point accurate images corresponding to all the user images in the target video segment.
It should be noted that the key point accurate image differs from the key point image in the embodiment shown in fig. 4 in that the number of key points in the key point accurate image is greater than that in the key point image, so that the server has more key point feature information with which to determine the sitting posture type.
In one possible implementation, the sitting posture types include eight types: correct sitting posture, head lowering, head raising, waist bending, head left leaning, head right leaning, body left leaning, and body right leaning. The correct-sitting-posture type is provided to cover cases in which the state recognition device has erroneously recognized the sitting posture state; a sitting posture type of correct sitting posture corresponds to the standard sitting posture state. The process in which the server imports the multiple frames of user images in the target video segment into the key point accurate extraction network to obtain the key point accurate image corresponding to each frame, and determines the sitting posture type corresponding to the target video segment based on the key point accurate images of all the user images in the target video segment, may refer to S402 and S403 and is not described herein again.
In S608, the state recognition device receives the sitting posture type sent by the server, and updates the sitting posture state corresponding to the target video segment according to the sitting posture type.
In this embodiment, the sitting posture types may include eight sitting posture types, such as correct sitting posture, head lowering, head raising, waist bending, head left leaning, head right leaning, body left leaning, and body right leaning.
In a possible implementation, if the sitting posture type is not the correct sitting posture, the state recognition device receives the sitting posture type sent by the server and updates the sitting posture state according to the sitting posture type, which may specifically be: the state recognition device updates the value of the sitting posture state from the non-standard sitting posture state to the combination of the non-standard sitting posture state and the sitting posture type; for example, if the sitting posture type is waist bending, the sitting posture state is updated to the non-standard sitting posture state together with waist bending.
In another possible implementation, if the sitting posture type is the correct sitting posture, the state recognition device receives the sitting posture type sent by the server and updates the sitting posture state according to the sitting posture type, which may specifically be: the state recognition device updates the value of the sitting posture state from the non-standard sitting posture state to the standard sitting posture state.
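A minimal sketch of this update step follows; the type and state names are placeholders chosen for this example.

    # Update the segment's sitting posture state with the sitting posture type
    # returned by the server: a correct sitting posture overrides the earlier
    # non-standard result, otherwise the specific type is attached to it.
    def update_sitting_state(sitting_type):
        if sitting_type == "correct_sitting_posture":
            return "standard"
        return ("non-standard", sitting_type)  # e.g. ("non-standard", "waist_bending")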
In this embodiment, the state recognition device determines a sitting posture state corresponding to the target video segment based on the key point images corresponding to all user images in the target video segment, and the server determines a sitting posture type corresponding to the target video segment based on the key point precise images corresponding to all user images in the target video segment, where the number of key points of the key point precise images is greater than the number of key points of the key point images, and the server can recognize a specific sitting posture type of the user in the target video segment based on more key point feature information, so that the user can perform targeted adjustment when adjusting the state according to the subsequently generated state report.
In S609, the state recognition device imports the target keypoint image into a gesture recognition network to obtain gesture information of the user image corresponding to the target keypoint image.
In this embodiment, the gesture recognition network is an algorithm model trained based on a deep learning algorithm, and is configured to determine the gesture information based on feature information of each keypoint in the target keypoint image, with the target keypoint image as input and the gesture information as output. The pose information is used for representing the pose of the user in the user image, and exemplarily, the pose information may include head-down, normal and head-up, etc. for representing the pose condition of the user body in the user image.
In a possible implementation manner, the state recognition device may import the target keypoint image into a gesture recognition network to obtain gesture information of the user image corresponding to the target keypoint image, and specifically may: and the state recognition device extracts the characteristic information of each key point in the target key point image, and calculates according to the internal parameters of the gesture recognition network and the characteristic information to obtain the gesture information.
In another possible implementation manner, the posture information may include a head rotation vector and a body rotation vector; the head rotation vector is used to characterize an orientation of the human head of the user within the user image; the human body rotation vector is used for representing the orientation of the human body chest of the user in the user image; the state recognition device may specifically import the target keypoint image into a gesture recognition network to obtain gesture information of the user image corresponding to the target keypoint image, where the gesture information is: the state recognition device extracts feature information of each face key point in the target key point image, and calculates according to internal parameters of the gesture recognition network and the feature information to obtain the head rotation vector; and the state recognition device extracts the characteristic information of each human body key point in the key point image, and calculates according to the internal parameters of the posture recognition network and the characteristic information to obtain the human body rotation vector.
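The following sketch is a deliberately simplified stand-in for the gesture recognition network described above: it collapses the trained deep-learning model into two fixed weight matrices, one applied to the flattened face key points and one to the flattened human body key points, so that the split computation of a head rotation vector and a human body rotation vector is visible. The keypoint counts, weight shapes, and function name are assumptions, not part of the embodiment.

```python
import numpy as np

# Illustrative only: the "internal parameters" of the gesture recognition network are
# modelled as two weight matrices; a real network would be a trained deep model.
rng = np.random.default_rng(0)
N_FACE_KP, N_BODY_KP = 68, 17                    # assumed keypoint counts
W_head = rng.normal(size=(3, N_FACE_KP * 2))     # face keypoints -> 3-D head rotation vector
W_body = rng.normal(size=(3, N_BODY_KP * 2))     # body keypoints -> 3-D body rotation vector

def pose_from_keypoints(face_kp, body_kp):
    """Return (head_rotation_vector, body_rotation_vector) for one keypoint image."""
    head_vec = W_head @ face_kp.reshape(-1)      # orientation of the user's head
    body_vec = W_body @ body_kp.reshape(-1)      # orientation of the user's chest
    return head_vec, body_vec

# Dummy keypoints for a single frame, standing in for a real target keypoint image.
face_kp = rng.uniform(size=(N_FACE_KP, 2))
body_kp = rng.uniform(size=(N_BODY_KP, 2))
head_vec, body_vec = pose_from_keypoints(face_kp, body_kp)
```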
In S610, the state recognition device determines whether the user image corresponding to the target keypoint image is recognized as a posture change image based on the posture information of the user image corresponding to the target keypoint image and the posture information of the user image of the frame preceding the user image corresponding to the target keypoint image.
In a possible implementation manner, the state recognition device determines whether the user image is recognized as a posture change image based on the posture information of the user image and the posture information of the user image of the previous frame, which may specifically be: if the posture information of the user image differs from the posture information of the user image of the previous frame, the user image is recognized as a posture change image; otherwise, it is not.
In another possible implementation manner, the posture information may include a head rotation vector and a body rotation vector; the state recognition device may determine whether the user image is recognized as a posture change image based on the posture information of the user image and the posture information of the user image of the previous frame, which may specifically be: if the difference between the head rotation vectors of the user image and of the previous frame is greater than or equal to a preset threshold, or the difference between the human body rotation vectors of the user image and of the previous frame is greater than or equal to a preset threshold, the user image is recognized as a posture change image.
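A minimal sketch of this comparison, assuming the "difference value" between two rotation vectors is their Euclidean distance and assuming a concrete value for the preset threshold; both assumptions go beyond what the embodiment specifies.

```python
import numpy as np

# Assumed preset threshold; the embodiment only states that a preset threshold exists.
POSE_CHANGE_THRESHOLD = 0.3

def is_pose_change(prev_pose, curr_pose, threshold=POSE_CHANGE_THRESHOLD):
    """Flag the current frame as a posture change image if either rotation vector
    differs from the previous frame's by at least the preset threshold.
    Each pose is a (head_rotation_vector, body_rotation_vector) pair of numpy arrays."""
    prev_head, prev_body = prev_pose
    curr_head, curr_body = curr_pose
    head_diff = np.linalg.norm(curr_head - prev_head)   # difference of head rotation vectors
    body_diff = np.linalg.norm(curr_body - prev_body)   # difference of body rotation vectors
    return head_diff >= threshold or body_diff >= threshold
```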
In S611, the state recognition apparatus determines the attention state corresponding to the target video segment based on all the pose change images in the target video segment.
In this embodiment, the state recognition device determines the attention state corresponding to the target video segment based on all the posture change images in the target video segment, which may specifically be: the state recognition device judges whether the proportion of posture change images among all user images in the target video segment is greater than or equal to a preset ratio; if so, the state recognition device recognizes the attention state corresponding to the target video segment as an inattentive state; otherwise, it recognizes the attention state as an attentive state.
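A minimal sketch of this ratio test, assuming a concrete value for the preset ratio and representing each frame's result as a boolean flag; the function name and state labels are illustrative.

```python
PRESET_RATIO = 0.5  # assumed value of the preset ratio

def attention_state(pose_change_flags, preset_ratio=PRESET_RATIO):
    """pose_change_flags: one boolean per user image in the target video segment,
    True if that frame was recognized as a posture change image."""
    ratio = sum(pose_change_flags) / len(pose_change_flags)
    return "inattentive" if ratio >= preset_ratio else "attentive"

# e.g. 7 posture change images out of 10 frames -> "inattentive"
print(attention_state([True] * 7 + [False] * 3))
```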
In this embodiment, posture change images are identified by comparing the posture information of adjacent frames of user images in the target video segment, and the attention state of the target video segment is determined based on the proportion of posture change images within the segment, so that the user state corresponding to the target video segment can subsequently be determined.
In this embodiment, only the user state when the user is in the learning state in the target video segment is determined, and the user state when the user is in the resting state in the target video segment is not determined, so that unnecessary calculation can be reduced.
Fig. 7 shows a flowchart of an implementation of the method provided in the fifth embodiment of the present application. Referring to fig. 7, in comparison with the embodiment shown in fig. 4, the method S102 provided in this embodiment includes S701, which is detailed as follows:
in this embodiment, the status information includes: a learning period, a rest period, and a fatigue period.
Further, the server generates the status information corresponding to the monitoring period based on the received status monitoring result, including:
in S701, the server divides the monitoring period into a learning period, a rest period, and a fatigue period based on the learning state corresponding to each of the video segments in the monitoring period.
In this embodiment, the server divides the monitoring period into a learning period, a rest period and a fatigue period based on the learning state corresponding to each video segment in the monitoring period, which may specifically be: all detection cycles corresponding to video segments whose learning state is rest within the monitoring period are identified as the rest period; all detection cycles corresponding to video segments whose learning state is learning within the monitoring period are identified as the learning period; a standard learning duration is preset; and if the learning period contains a continuous, uninterrupted target time period whose duration exceeds the standard learning duration, the part of that target time period beyond the standard learning duration is identified as the fatigue period. Illustratively, the monitoring period is 10:00 to 11:00 and the detection cycle is one minute; among the learning states corresponding to the video segments in the monitoring period, the learning state corresponding to each minute from 10:00 to 10:40 is learning, and the learning state corresponding to each minute from 10:41 to 11:00 is rest, so the server identifies 10:00 to 10:40 as the learning period and 10:41 to 11:00 as the rest period; the standard learning duration is preset to 30 minutes, so the server identifies 10:31 to 10:40 as the fatigue period.
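A minimal sketch of this division under the worked example above, assuming one-minute detection cycles represented as a list of per-minute labels; the function name, the date used for the timestamps, and the strictly-greater comparison that starts the fatigue period are assumptions.

```python
from datetime import datetime, timedelta

STANDARD_LEARNING_MINUTES = 30  # preset standard learning duration from the worked example

def divide_periods(start, per_minute_states, standard=STANDARD_LEARNING_MINUTES):
    """per_minute_states holds one 'learning' or 'rest' label per one-minute detection cycle,
    starting at `start`. Returns the minutes belonging to the learning, rest and fatigue periods."""
    learning, rest, fatigue = [], [], []
    run_start = None                              # start of the current uninterrupted learning run
    for i, state in enumerate(per_minute_states):
        t = start + timedelta(minutes=i)
        if state == "learning":
            learning.append(t)
            if run_start is None:
                run_start = t
            # Minutes beyond the standard learning duration inside one uninterrupted run are fatigue.
            if t - run_start > timedelta(minutes=standard):
                fatigue.append(t)
        else:
            rest.append(t)
            run_start = None
    return learning, rest, fatigue

# Worked example: learning from 10:00 to 10:40, rest from 10:41 to 11:00 (date is hypothetical).
start = datetime(2020, 5, 7, 10, 0)
states = ["learning"] * 41 + ["rest"] * 20
learning, rest, fatigue = divide_periods(start, states)
print(fatigue[0].strftime("%H:%M"), "-", fatigue[-1].strftime("%H:%M"))  # 10:31 - 10:40
```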
It should be understood that the state information corresponding to the monitoring period includes not only the distribution of the learning period, the rest period and the fatigue period in the monitoring period and the total duration of each period, but also the number of times that the user changes from the standard sitting posture state to the non-standard sitting posture state and the number of times that the user changes from the attention focusing state to the attention non-focusing state in the monitoring period.
In this embodiment, the step of generating the status information corresponding to the monitoring period is arranged to be executed on the server, and data processing resources (such as a memory and a processor) of the server may be fully utilized to execute the method provided by this embodiment, so as to improve the efficiency of generating the status information.
Fig. 8 shows a flowchart of an implementation of the method provided in the sixth embodiment of the present application. Referring to fig. 8, with respect to any of the above embodiments, the method S103 provided in this embodiment includes S801, which is specifically detailed as follows:
in this embodiment, one user video corresponds to one monitoring period, there may be a plurality of monitoring periods, and the status report generation request includes at least one monitoring period.
Further, if the server receives a status report generation request sent by the user terminal, the server feeds back the status information to the user terminal based on the status report generation request, including:
in S801, if the server receives a status report generation request sent by the user terminal, the server feeds back, to the user terminal, status information corresponding to the at least one monitoring period.
In this embodiment, before the server receives a status report generation request sent by the user terminal, the user terminal generates the status report generation request based on a user operation (for example, an operation in which the user requests to view the status report corresponding to at least one monitoring period) and sends the status report generation request to the server. When the server receives the status report generation request sent by the user terminal, the server feeds back to the user terminal the status information corresponding to all monitoring periods included in the status report generation request, so as to meet the user's needs.
It should be understood that the monitoring period should be a continuous period of time.
In this embodiment, there may be multiple monitoring periods; the method provided in this embodiment can monitor the user over these multiple monitoring periods, and the generated status report satisfies the user's need to view the status information corresponding to the multiple monitoring periods.
Fig. 9 shows a flowchart of an implementation of the method provided in the seventh embodiment of the present application. Referring to fig. 9, with respect to the embodiment shown in fig. 8, the method S103 provided in this embodiment includes S901 to S902, which are detailed as follows:
further, the user terminal generates and displays a status report, including:
in S901, the user terminal generates a status page corresponding to each monitoring period based on the fed back status information corresponding to the at least one monitoring period.
In this embodiment, a monitoring period is taken as an example for explanation, and the user terminal generates a status page corresponding to the monitoring period based on status information corresponding to the monitoring period; the status page includes a preset interface frame, and the user terminal generates a status page corresponding to a monitoring period based on status information corresponding to the monitoring period, which may specifically be: and the user terminal fills data in the state information corresponding to the monitoring time period into the interface frame to generate the state page corresponding to the monitoring time period.
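For illustration only, filling the preset interface frame can be sketched as a simple template substitution; the frame layout and field names below are assumptions rather than the actual interface frame of the user terminal.

```python
# Minimal sketch of filling a preset interface frame with the status information
# of one monitoring period (layout and field names are illustrative assumptions).
INTERFACE_FRAME = (
    "Monitoring period: {period}\n"
    "Learning: {learning_min} min   Rest: {rest_min} min\n"
    "Sitting posture reminders: {sitting_reminders}   Attention reminders: {attention_reminders}"
)

def build_status_page(status_info: dict) -> str:
    """Fill the data in the status information into the interface frame."""
    return INTERFACE_FRAME.format(**status_info)

page = build_status_page({
    "period": "10:00-11:00",
    "learning_min": 40,
    "rest_min": 20,
    "sitting_reminders": 2,
    "attention_reminders": 3,
})
print(page)
```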
In S902, the user terminal generates the status report based on the status pages corresponding to all monitoring periods.
In this embodiment, the status report includes at least one of the status pages. The status page includes a plurality of data describing the status of the user during the monitoring period, including a distribution of learning time periods and rest time periods, a distribution of sitting standard time periods and sitting non-standard time periods, a distribution of attentional time periods and inattentional time periods, a number of times the user transitions from the standard sitting state to the non-standard sitting state, and a number of times the user transitions from the attentional state to the inattentional state.
It should be appreciated that, illustratively, the status report may also include status data generated from all status pages; for example, the status data corresponding to a certain time period is generated from all status pages of the monitoring periods contained in that time period. For instance, if the status data is the total learning duration of a certain day, the day includes the 3 monitoring periods 10:00 to 11:00, 14:00 to 15:00 and 20:00 to 21:00, and the learning periods within them are 10:00 to 10:40, 14:00 to 14:40 and 20:00 to 20:40 respectively, then the total learning duration is 123 minutes (each learning period is counted inclusive of both boundary minutes, i.e. 41 minutes per period).
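A minimal sketch of this inclusive-minute arithmetic, using the three learning periods from the example above; the date is hypothetical.

```python
from datetime import datetime

def inclusive_minutes(start, end):
    """Count minutes from start to end, inclusive of both boundary minutes."""
    return int((end - start).total_seconds() // 60) + 1

learning_periods = [
    (datetime(2020, 5, 7, 10, 0), datetime(2020, 5, 7, 10, 40)),
    (datetime(2020, 5, 7, 14, 0), datetime(2020, 5, 7, 14, 40)),
    (datetime(2020, 5, 7, 20, 0), datetime(2020, 5, 7, 20, 40)),
]
total = sum(inclusive_minutes(s, e) for s, e in learning_periods)
print(total)  # 123 minutes, i.e. 3 x 41
```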
To further illustrate the status report, fig. 10 shows a schematic diagram of the effect of the status report provided by the seventh embodiment of the present application. Referring to fig. 10, fig. 10 shows the status page of the monitoring period 10:00 to 11:00 in the status report. In the display page of the status report, the user can select one of the monitoring periods to be displayed; "1/3" in fig. 10 indicates that the current day, May 7, 2020, includes 3 monitoring periods, and that the status page corresponding to the first of the 3 monitoring periods, specifically the status page corresponding to the monitoring period 10:00 to 11:00, is currently being displayed. The illustrated total duration refers to the sum of the durations of the 3 monitoring periods, and the illustrated interval refers to the span from the earliest starting time to the latest ending time of the 3 monitoring periods; for example, if the 3 monitoring periods are 10:00 to 11:00, 14:00 to 15:00 and 20:00 to 21:00, the interval is 10:00 to 21:00. The illustrated "learning 2 hours and 30 minutes" refers to the sum of the learning durations in the 3 monitoring periods, and the illustrated "rest 0 hours and 30 minutes" refers to the sum of the rest durations in the 3 monitoring periods.
Referring to fig. 10, in the bar graph of fig. 10 for the monitoring period 10:00 to 11:00, the highest bar corresponds to the inattention periods, the second highest bar to the non-standard sitting posture periods, the third highest bar to the rest periods, the fourth highest bar to the fatigue periods, and the lowest bar to the learning periods. In fig. 10, the standard learning duration is preset to 40 minutes. As shown in fig. 10, the status page further includes a sitting posture reminder count (i.e., the number of times the user changes from the standard sitting posture state to the non-standard sitting posture state), a non-standard sitting duration (the total duration of the non-standard sitting posture periods), an attention reminder count (i.e., the number of times the user changes from the attention focusing state to the attention non-focusing state), an inattention duration (the total duration of the inattention periods), a fatigue count (the number of fatigue periods), and a fatigue duration (the total duration of the fatigue periods).
In this embodiment, the user terminal generates the status page corresponding to each monitoring period based on a user operation, so as to generate the status report. When the user terminal subsequently outputs the status report based on a user operation, it can output the status page corresponding to the monitoring period that the user wants to view, where that monitoring period is determined based on the user operation.
Fig. 11 shows a schematic structural diagram of a condition monitoring system provided in an embodiment of the present application, corresponding to the method described in the foregoing embodiment, and only shows a part related to the embodiment of the present application for convenience of description.
Referring to fig. 11, the condition monitoring system includes: a state identification device, a server and a user terminal; the state identification device is used for determining a state monitoring result corresponding to a user in a monitoring period based on a user video acquired in the monitoring period and sending the state monitoring result to the server, where the state monitoring result includes learning state information, sitting posture information and attention information; the server is used for generating state information corresponding to the monitoring period based on the received state monitoring result; the user terminal is used for sending a status report generation request to the server; and the user terminal is further used for receiving the state information fed back by the server based on the status report generation request, and generating and displaying the status report based on the fed-back state information.
It should be noted that, for the information interaction, the execution process, and other contents between the above-mentioned apparatuses, the specific functions and the technical effects of the embodiments of the method of the present application are based on the same concept, and specific reference may be made to the section of the embodiments of the method, which is not described herein again.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-mentioned division of the functional units and modules is illustrated, and in practical applications, the above-mentioned function distribution may be performed by different functional units and modules according to needs, that is, the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-mentioned functions. Each functional unit and module in the embodiments may be integrated in one processing unit, or each unit may exist alone physically, or two or more units are integrated in one unit, and the integrated unit may be implemented in a form of hardware, or in a form of software functional unit. In addition, specific names of the functional units and modules are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present application. The specific working processes of the units and modules in the system may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
Fig. 12 is a schematic structural diagram of an electronic device according to an embodiment of the present application, where the electronic device may be the foregoing state identification device or the server, or may be a combination of the foregoing state identification device and the server. As shown in fig. 12, the electronic device 12 includes: at least one processor 120 (only one shown in fig. 12), a memory 121, and a computer program 122 stored in the memory 121 and executable on the at least one processor 120, wherein the processor 120 executes the computer program 122 to implement the steps in any of the various method embodiments corresponding to the state recognition apparatus and/or the server.
The electronic device 12 may be a desktop computer, a notebook, a palm top computer, a cloud server, or other computing devices. The electronic device may include, but is not limited to, a processor 120, a memory 121. Those skilled in the art will appreciate that fig. 12 is merely an example of electronic device 12 and does not constitute a limitation of electronic device 12 and may include more or fewer components than shown, or some components may be combined, or different components, such as input output devices, network access devices, etc.
The Processor 120 may be a Central Processing Unit (CPU), and the Processor 120 may also be another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
The memory 121 may be an internal storage unit of the electronic device 12 in some embodiments, such as a hard disk or a memory of the electronic device 12. The memory 121 may also be an external storage device of the electronic device 12 in other embodiments, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like, which are provided on the electronic device 12. Further, the memory 121 may also include both an internal storage unit and an external storage device of the electronic device 12. The memory 121 is used for storing an operating system, an application program, a BootLoader (BootLoader), data, and other programs, such as program codes of the computer programs. The memory 121 may also be used to temporarily store data that has been output or is to be output.
The present application further provides a computer-readable storage medium, which stores a computer program, and when the computer program is executed by a processor, the computer program can implement the steps in the method embodiments corresponding to the above state recognition apparatus and/or the server.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, all or part of the processes in the methods of the embodiments described above can be implemented by a computer program, which can be stored in a computer-readable storage medium and, when executed by a processor, can implement the steps of the embodiments of the methods described above. The computer program comprises computer program code, which may be in the form of source code, object code, an executable file, some intermediate form, or the like. The computer readable medium may include at least: any entity or device capable of carrying the computer program code to a photographing apparatus/electronic device, a recording medium, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), an electrical carrier signal, a telecommunications signal, and a software distribution medium, for example, a USB flash disk, a removable hard disk, a magnetic disk or an optical disk. In certain jurisdictions, computer-readable media may not be an electrical carrier signal or a telecommunications signal in accordance with legislation and patent practice.
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and reference may be made to the related descriptions of other embodiments for parts that are not described or illustrated in a certain embodiment.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus/electronic device and method may be implemented in other ways. For example, the above-described apparatus/electronic device embodiments are merely illustrative, and for example, the division of the modules or units is only one logical division, and there may be other divisions when actually implemented, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
The above-mentioned embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not substantially depart from the spirit and scope of the embodiments of the present application and are intended to be included within the scope of the present application.

Claims (10)

1. A method for generating a status report is applied to a status monitoring system, and the status monitoring system comprises: the system comprises a state identification device, a server and a user terminal;
the method comprises the following steps:
the state identification device determines a corresponding state monitoring result of a user in a monitoring time period based on a user video acquired in the monitoring time period, and sends the state monitoring result to the server; the state monitoring result comprises: learning state information, sitting posture information, and attention information;
the server generates state information corresponding to the monitoring time period based on the received state monitoring result;
and if the server receives a status report generation request sent by the user terminal, the server feeds back the status information to the user terminal based on the status report generation request and instructs the user terminal to generate and display a status report.
2. The method of claim 1, wherein the status recognition device determines the corresponding status monitoring result of the user in the monitoring period based on the user video collected in the monitoring period, and comprises:
the state recognition device divides the user video into a plurality of video segments based on a detection period;
the state recognition device respectively guides a plurality of frames of user images in a target video segment into a key point extraction network to obtain key point images corresponding to the user images of each frame, wherein the target video segment is any one of the video segments;
the state identification device determines the user state corresponding to the target video segment based on the key point images corresponding to all the user images in the target video segment; the user states include a learning state, a sitting posture state, and an attention state;
and the state recognition device packages the user states of all video segments in the monitoring time period to obtain the state monitoring result.
3. The method of claim 2, wherein the keypoint extraction network comprises a human body recognition layer and a keypoint recognition layer; the state recognition device respectively guides a plurality of frames of user images in a target video segment into a key point extraction network to obtain key point images corresponding to the user images of each frame, and the state recognition device comprises:
the state recognition device leads a target user image into the human body recognition layer, and intercepts a human body image from the target user image, wherein the target user image is any user image in the multi-frame user images;
and the state recognition device leads the human body image into the key point recognition layer, extracts a plurality of key points on the human body image, and outputs the key point image corresponding to the target user image.
4. The method according to claim 2, wherein the determining the user status corresponding to the target video segment by the status recognition device based on the key point images corresponding to all the user images in the target video segment comprises:
the state identification device judges whether a key point preset by the key point extraction network is contained in a target key point image, wherein the target key point image is any key point image in key point images corresponding to all the user images;
if the key point is included in the target key point image, the state recognition device recognizes a user image corresponding to the target key point image as a learning image; if the key point is not contained in the target key point image, identifying the user image corresponding to the target key point image as a rest image;
the state recognition device determines a learning state corresponding to the target video segment based on all learning images and all rest images in the target video segment; the learning state includes a first state and a second state, where the first state is used for representing that the user is at rest, and the second state is used for representing that the user is in learning.
5. The method according to claim 4, wherein the state recognition device determines the user state corresponding to the target video segment based on the key point images corresponding to all the user images in the target video segment, further comprising:
if the learning state corresponding to the target video segment is the first state, the state recognition device sets the value of the sitting posture state corresponding to the target video segment to be a preset value;
if the learning state corresponding to the target video segment is the second state, the state recognition device leads the target key point image into a sitting posture recognition network to obtain a sub-sitting posture state of the user image corresponding to the target key point image;
the state recognition device determines the sitting posture state of the target video segment based on the sub-sitting posture states of all the user images in the target video segment;
if the sitting posture state is a non-standard sitting posture, the state recognition device uploads the target video segment to the server, so that the server determines a sitting posture type corresponding to the target video segment based on the target video segment;
and the state recognition device receives the sitting posture type sent by the server and updates the sitting posture state corresponding to the target video segment according to the sitting posture type.
6. The method according to claim 4, wherein the state recognition device determines the user state corresponding to the target video segment based on the key point images corresponding to all the user images in the target video segment, further comprising:
if the learning state corresponding to the target video segment is the first state, the state recognition device sets the value of the attention state corresponding to the target video segment to be a preset value;
if the learning state corresponding to the target video segment is the second state, the state recognition device imports the target key point image into a gesture recognition network to obtain gesture information of a user image corresponding to the target key point image;
the state recognition device determines whether the user image corresponding to the target key point image is recognized as a posture change image or not based on the posture information of the user image corresponding to the target key point image and the posture information of the user image of the previous frame of the user image corresponding to the target key point image;
the state recognition device determines the attention state corresponding to the target video segment based on all the posture change images in the target video segment.
7. The method of claim 2, wherein the state information comprises: a learning time period, a rest time period, and a fatigue time period; the server generates the state information corresponding to the monitoring time period based on the received state monitoring result, and the state information comprises:
the server divides the monitoring period into a learning period, a rest period and a fatigue period based on the learning state corresponding to each video segment in the monitoring period.
8. A condition monitoring system, comprising: the system comprises a state identification device, a server and a user terminal;
the state identification device is used for determining a corresponding state monitoring result of a user in a monitoring time period based on a user video acquired in the monitoring time period and sending the state monitoring result to the server; the state monitoring result comprises: learning state information, sitting posture information, and attention information;
the server is used for generating state information corresponding to the monitoring time period based on the received state monitoring result;
the server is further configured to, if a status report generation request sent by the user terminal is received, feed back the status information to the user terminal based on the status report generation request, and instruct the user terminal to generate and display a status report.
9. An electronic device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, wherein, when executing the computer program, the processor implements the method performed by the state recognition apparatus and/or the method performed by the server in the method according to any one of claims 1 to 7.
10. A computer-readable storage medium, in which a computer program is stored, wherein the computer program, when executed by a processor, implements the method performed by the state recognition apparatus and/or the method performed by the server in the method according to any one of claims 1 to 7.
CN202010844814.9A 2020-08-20 2020-08-20 State report generation method and state monitoring system Pending CN112069931A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010844814.9A CN112069931A (en) 2020-08-20 2020-08-20 State report generation method and state monitoring system

Publications (1)

Publication Number Publication Date
CN112069931A true CN112069931A (en) 2020-12-11

Family

ID=73662425

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010844814.9A Pending CN112069931A (en) 2020-08-20 2020-08-20 State report generation method and state monitoring system

Country Status (1)

Country Link
CN (1) CN112069931A (en)

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109937152A (en) * 2017-08-10 2019-06-25 北京市商汤科技开发有限公司 Driving condition supervision method and apparatus, driver's monitoring system, vehicle
WO2019120032A1 (en) * 2017-12-21 2019-06-27 Oppo广东移动通信有限公司 Model construction method, photographing method, device, storage medium, and terminal
WO2019127273A1 (en) * 2017-12-28 2019-07-04 深圳市锐明技术股份有限公司 Multi-person face detection method, apparatus, server, system, and storage medium
WO2019232894A1 (en) * 2018-06-05 2019-12-12 中国石油大学(华东) Complex scene-based human body key point detection system and method
CN110334697A (en) * 2018-08-11 2019-10-15 昆山美卓智能科技有限公司 Intelligent table, monitoring system server and monitoring method with condition monitoring function
CN110855934A (en) * 2018-08-21 2020-02-28 北京嘀嘀无限科技发展有限公司 Fatigue driving identification method, device and system, vehicle-mounted terminal and server
WO2020167155A1 (en) * 2019-02-12 2020-08-20 Публичное Акционерное Общество "Сбербанк России" Method and system for detecting troubling events during interaction with a self-service device
CN110796005A (en) * 2019-09-27 2020-02-14 北京大米科技有限公司 Method, device, electronic equipment and medium for online teaching monitoring

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
尹向兵; 王宏群: "Research and Implementation of a Prison Video Recognition and Alarm System", Computer Knowledge and Technology (电脑知识与技术), no. 17 *
康增健; 昝蕊; 张熠; 吕晓军; 马琳: "Intelligent Monitoring System Based on Train Arrival and Departure Information and Selective Attention", China Railway (中国铁路), no. 02 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114049669A (en) * 2021-11-15 2022-02-15 海信集团控股股份有限公司 Method and device for determining learning effect
CN114466200A (en) * 2022-01-18 2022-05-10 上海应用技术大学 Online study room learning state monitoring system and method thereof

Similar Documents

Publication Publication Date Title
CN112101124B (en) Sitting posture detection method and device
KR102385463B1 (en) Facial feature extraction model training method, facial feature extraction method, apparatus, device and storage medium
CN112101123B (en) Attention detection method and device
CN107480265B (en) Data recommendation method, device, equipment and storage medium
CN112069931A (en) State report generation method and state monitoring system
US10972254B2 (en) Blockchain content reconstitution facilitation systems and methods
CN110741377A (en) Face image processing method and device, storage medium and electronic equipment
CN104637035A (en) Method, device and system for generating cartoon face picture
US11392979B2 (en) Information processing system, communication device, control method, and storage medium
CN105874423A (en) Methods and apparatus to detect engagement with media presented on wearable media devices
CN104951117B (en) Image processing system and related method for generating corresponding information by utilizing image identification
CN104036169A (en) Biometric authentication method and biometric authentication device
CN110610125A (en) Ox face identification method, device, equipment and storage medium based on neural network
CN112712053A (en) Sitting posture information generation method and device, terminal equipment and storage medium
CN111107278B (en) Image processing method and device, electronic equipment and readable storage medium
CN111177459A (en) Information recommendation method and device, electronic equipment and computer-readable storage medium
US10638135B1 (en) Confidence-based encoding
CA3050456C (en) Facial modelling and matching systems and methods
CN110569918A (en) sample classification method and related device
CN112199530A (en) Multi-dimensional face library picture automatic updating method, system, equipment and medium
KR102594093B1 (en) Dermatologic treatment recommendation system using deep learning model and method thereof
CN110473049A (en) Finance product recommended method, device, equipment and computer readable storage medium
CN109710371A (en) Font adjusting method, apparatus and system
CN102783174A (en) Image processing device, content delivery system, image processing method, and program
Miniakhmetova et al. An approach to personalized video summarization based on user preferences analysis

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination