CN115933930A - Method, terminal and device for analyzing attention of learning object in education meta universe


Info

Publication number
CN115933930A
Authority
CN
China
Prior art keywords
learning
determining
visual
target
learning object
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211436017.2A
Other languages
Chinese (zh)
Inventor
刘德建
金伟华
钟正
徐建
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Central China Normal University
Fujian TQ Digital Co Ltd
Original Assignee
Central China Normal University
Fujian TQ Digital Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Central China Normal University and Fujian TQ Digital Co Ltd
Priority to CN202211436017.2A
Publication of CN115933930A
Legal status: Pending (current)

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Image Analysis (AREA)

Abstract

The invention discloses an attention analysis method, terminal and device for learning objects in an educational meta-universe. A visual object set corresponding to the learning object is generated according to the virtual scene of the educational meta-universe space, and the geometric-attribute information of each visual object in the set is determined; a learning video of the learning object is acquired, and the sight direction and micro-expressions of the learning object are determined from the learning video; the target visual object at which the learning object gazes is determined according to the geometric-attribute information of each visual object and the sight direction, and the target visual object is associated with the micro-expressions; the attention of the learning object is then analyzed according to the target visual object and its associated micro-expressions. The attention of learning objects in the educational meta-universe can thus be analyzed comprehensively and accurately.

Description

Method, terminal and device for analyzing attention of learning object in education meta universe
Technical Field
The invention relates to the field of teaching application of a meta universe, in particular to a method, a terminal and a device for analyzing attention of a learning object in an educational meta universe.
Background
With the continuing digital transformation of education, new technologies such as artificial intelligence, big data, virtual reality and learning analytics are being applied ever more deeply in teaching, and analyzing learners' attention in online teaching is increasingly needed to improve how teaching resources are used. In recent years the meta-universe has attracted wide attention, and as a vertical application of the meta-universe in the field of education, the educational meta-universe has produced a series of applications in situational teaching, personalized learning, game-based learning and teacher professional development. Automatically analyzing the learning effect of learners in these application scenarios makes it possible to perceive learner attention more intelligently and to recognize learner emotion more accurately.
However, most existing analyses of learning effects in education evaluate dimensions such as learning effect, willingness to use, engagement and technology acceptance through questionnaires and manual interviews. Compared with data-driven analysis methods, these approaches are insufficiently accurate and thinly evidenced, and are hard to use as convincing analytical evidence. Introducing computer vision into the analysis of learner attention in the educational meta-universe therefore enables intelligent perception and accurate analysis of learner attention, further improves the instructional design, teaching interaction and teaching modes of the educational meta-universe, and provides important support for applying the educational meta-universe in various teaching scenarios.
Current systems for analyzing learner attention in the educational meta-universe still have a number of problems: (1) expensive hardware: attention is analyzed with professional wearable devices and eye trackers, and although the results are accurate, the hardware is costly and expensive to deploy and learn, so the approach is difficult to apply widely when learners study in different places; (2) loose coupling with the teaching scene: existing attention analysis systems usually concentrate on the learner's eye-movement frequency and fixation positions, and can hardly provide precise guidance for optimizing and improving the teaching scenes of the educational meta-universe; (3) a single analysis index: most existing attention analysis systems estimate learner attention from a single eye-movement index, so the evidence for attention analysis is thin and a more comprehensive attention analysis result cannot be produced.
Disclosure of Invention
The technical problem to be solved by the invention is to provide a method, terminal and device for analyzing the attention of learning objects in an educational meta-universe, with which that attention can be analyzed comprehensively and accurately.
In order to solve the technical problems, the invention adopts a technical scheme that:
a method for analyzing attention of learning objects in an educational meta universe comprises the following steps:
s1, generating a visual object set corresponding to a learning object according to a virtual scene of an educational meta-space, and determining geometric-attribute information of each visual object in the visual object set;
s2, acquiring a learning video of the learning object, and determining the sight direction and the micro expression of the learning object according to the learning video;
s3, determining a target visual object watched by the learning object according to the geometric-attribute information of each visual object and the sight direction, and associating the target visual object with the micro expression;
and S4, analyzing the attention of the learning object according to the target visual object and the micro expression related to the target visual object.
In order to solve the technical problem, the invention adopts another technical scheme as follows:
an attention analysis terminal for learning objects in an educational meta-universe comprises a memory, a processor and a computer program which is stored in the memory and can run on the processor, wherein the processor executes the computer program to realize each step in the attention analysis method for the learning objects in the educational meta-universe.
In order to solve the technical problem, the invention adopts another technical scheme as follows:
an attention analysis apparatus for educating a learning object in a meta universe, comprising:
the virtual teaching resource organization module is used for generating a visual object set corresponding to a learning object according to a virtual scene of the educational meta-space, and determining the geometric-attribute information of each visual object in the visual object set;
the sight direction and micro expression determining module is used for acquiring a learning video of the learning object and determining the sight direction and the micro expression of the learning object according to the learning video;
the gazing model association module is used for determining a target visual object gazed by the learning object according to the geometric-attribute information of each visual object and the sight direction and associating the target visual object with the micro-expression;
and the attention analysis module is used for analyzing the attention of the learning object according to the target visual object and the micro expression associated with the target visual object.
The invention has the following beneficial effects: a visual object set corresponding to the learning object is generated according to the virtual scene of the educational meta-universe space; a learning video of the learning object is acquired, and the sight direction and micro-expressions of the learning object are determined from the learning video; the target visual object at which the learning object gazes is determined according to the sight direction and associated with the micro-expressions; and the attention of the learning object is analyzed according to the target visual object and its associated micro-expressions. No expensive hardware is required. Because the target visual object is determined from the learning object's sight direction, the method attends not only to the position of the fixation point but also to the virtual object of the teaching scene at that position, so the learning object and the teaching scene are tightly fused and the association among the virtual space and screen plane of the educational meta-universe, the attention of the learning object and the teaching model is strengthened. At the same time, the target visual object gazed at along the sight direction is associated with the corresponding micro-expression, and the attention of the learning object is analyzed from the target visual object together with its associated micro-expression, so the resulting attention analysis is more comprehensive and accurate.
Drawings
FIG. 1 is a flowchart illustrating steps of a method for analyzing attention of learning objects in an educational meta-universe, according to an embodiment of the present invention;
fig. 2 is a schematic structural diagram of an attention analysis terminal for learning objects in an educational meta-universe according to an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of an attention analyzing apparatus for learning objects in the educational meta-universe according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of mapping triangular patches of a model surface in a virtual scene to an octree according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of displaying an image of the meta-universe scene on a terminal screen according to an embodiment of the present invention;
fig. 6 is a schematic diagram of a learning video of a learning object acquired by a camera of a display terminal according to an embodiment of the present invention;
FIG. 7 is a schematic diagram of three attitude deflection angle parameters in head pose positioning according to an embodiment of the present invention;
FIG. 8 is a diagram illustrating a region of interest division of a terminal screen according to an embodiment of the present invention;
FIG. 9 is a diagram of a terminal screen overlaying a learning object attention thermodynamic diagram according to an embodiment of the present invention;
fig. 10 is a schematic structural diagram of a further subdivision of the attention analyzing apparatus for learning objects in the educational meta-space according to the embodiment of the present invention.
Detailed Description
In order to explain technical contents, achieved objects, and effects of the present invention in detail, the following description is made with reference to the accompanying drawings in combination with the embodiments.
The method, the terminal and the device for analyzing the attention of the learning object in the educational meta universe can be applied to the attention analysis of the learning object in the educational meta universe scene, and are explained by specific embodiments as follows:
in an alternative embodiment, referring to fig. 1, a method for analyzing attention of a learning object in an educational meta universe includes the steps of:
s1, generating a visual object set corresponding to a learning object according to a virtual scene of an educational meta-space, and determining geometric-attribute information of each visual object in the visual object set;
specifically, S11, subdividing a virtual scene of an educational meta-space, and generating a model object set;
wherein the virtual scene of the meta-universe space can be subdivided using an octree to organize and generate the model object set;
s12, determining a visual range of a view frustum corresponding to the learning object according to the viewpoint position, the direction and the target point position of the learning object, determining an intersection of the visual range and the model object set, and generating a visual object set corresponding to the learning object according to the intersection;
wherein images of the meta-universe scene can be rendered in the graphics rendering pipeline by traversing the model objects contained in or intersecting the view frustum, thereby obtaining the visual object set corresponding to the learning object;
s13, according to the correspondence between the gaze position of the learning object's sight on the terminal screen and its coordinates in the meta-universe space, determining the visual object to which the gaze position belongs, establishing the correspondence between the gaze position and that visual object, and determining the geometric-attribute information of each visual object in the visual object set according to this correspondence; that is, geometric-attribute queries on the visual objects in the meta-universe are realized from the gaze position of the learning object's fixation point;
step S1 mainly implements a virtual teaching resource organization, and in an optional embodiment, includes the following steps:
(1) Generation of the model object set: triangular patches represent the boundary surface of the geometric model of each teaching object in the scene (such as teaching aids, teaching components and experimental instruments); an octree divides the virtual scene of the meta-universe space; as shown in fig. 4, the triangular patches of the model surfaces in the scene are mapped onto the octree according to the positions of their barycentric coordinates, and the model object set of each scene is organized and generated;
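As an illustration of this organization step, the following minimal sketch maps surface triangles into an octree by barycenter position; the class layout, capacity and depth limits are illustrative assumptions rather than the patent's implementation:

```python
import numpy as np

class OctreeNode:
    """An axis-aligned cube that stores triangles by the position of their barycenter."""
    def __init__(self, center, half_size, depth=0, max_depth=6, capacity=32):
        self.center = np.asarray(center, dtype=float)
        self.half_size = float(half_size)
        self.depth, self.max_depth, self.capacity = depth, max_depth, capacity
        self.items = []       # (triangle_index, barycenter) pairs kept in this leaf
        self.children = None  # eight child nodes once the leaf is split

    def insert(self, tri_index, barycenter):
        if self.children is not None:
            self._child_for(barycenter).insert(tri_index, barycenter)
            return
        self.items.append((tri_index, np.asarray(barycenter, dtype=float)))
        if len(self.items) > self.capacity and self.depth < self.max_depth:
            self._split()

    def _child_for(self, p):
        # Octant index from the sign of the offset on each axis.
        return self.children[int(p[0] > self.center[0])
                             + 2 * int(p[1] > self.center[1])
                             + 4 * int(p[2] > self.center[2])]

    def _split(self):
        h = self.half_size / 2.0
        offsets = [np.array([dx, dy, dz]) * h
                   for dz in (-1, 1) for dy in (-1, 1) for dx in (-1, 1)]
        self.children = [OctreeNode(self.center + o, h, self.depth + 1,
                                    self.max_depth, self.capacity) for o in offsets]
        for tri_index, bc in self.items:
            self._child_for(bc).insert(tri_index, bc)
        self.items = []

def build_scene_octree(vertices, faces, center, half_size):
    """vertices: (N, 3) array; faces: (M, 3) vertex indices of the triangular patches."""
    root = OctreeNode(center, half_size)
    for i, bc in enumerate(vertices[faces].mean(axis=1)):   # triangle barycenters
        root.insert(i, bc)
    return root
```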
(2) Mapping from space to screen: the visible range of the view frustum is determined according to the viewpoint position, direction and target point position of the learning object; the model objects contained in or intersecting the view frustum are traversed, and images of the meta-universe scene are rendered in the graphics rendering pipeline through model, view, projection and viewport transformations, generating the visual object set corresponding to the learning object and displaying it on the terminal screen as shown in fig. 5.
(3) Object picking: the position at which the learning object's sight falls on the terminal screen is acquired, and the spatial coordinates of that point in the meta-universe are calculated according to the display depth and template (stencil) parameter settings; the triangular patch, geometric model and virtual object to which the point belongs (the virtual object here being the visual object) are determined by traversing the octree, and geometric-attribute queries on the virtual object are realized with the support of an object-relational database. That is, the gaze position of the learning object on the terminal screen is converted into the corresponding spatial coordinates in the meta-universe, and the triangular patch to which those coordinates belong is determined by traversing the octree; the object-relational database stores the correspondence between the geometric models and the attribute data of the virtual objects, so once the triangular patch is determined, the virtual object to which the spatial coordinates belong can be determined by querying the database for its geometric model. In this way the virtual object to which the gaze position on the terminal screen belongs is determined, the correspondence between the gaze position and that visual object is established, and this correspondence can be used to query the attribute information (such as object name, category and use) of each visual object in the visual object set.
The algorithm for determining the meta-universe space coordinates corresponding to the gaze position comprises the following steps:
(3-1): the coordinates of the learning object's sight on the terminal screen are converted into projection-window coordinates proj_x, proj_y and proj_z; formula 1 gives the conversion:
(formula 1, given as an image in the original, maps the screen coordinates to the projection window using the screen dimensions)
where sw and sh are respectively the width and height of the terminal screen, and (x, y) are the coordinates at which the learning object's sight falls on the terminal screen;
(3-2): converting the projection window coordinate into a space coordinate by using a projection transformation matrix shown in formula 2, wherein formula 3 is the variable value in the projection transformation matrix calculated by the frustum parameter, and formula 4 is the space point coordinate x obtained by conversion calculation 、y And z
Figure BDA0003946704950000062
Figure BDA0003946704950000063
Figure BDA0003946704950000064
Wherein w and h are the width and height of the near-cutting surface, and Z n And Z f The distances of the near cutting surface and the far cutting surface in the view frustum are respectively, Q is the ratio of the distance from the viewpoint to the far cutting surface to the distance from the near cutting surface to the far cutting surface in the view frustum, and fov is the angle of view.
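Because formulas 1 to 4 appear only as images in the original, the following sketch illustrates the same screen-to-space mapping with the standard unprojection pipeline (screen to normalized device coordinates, inverse projection, inverse view); the exact expressions and conventions in the patent may differ:

```python
import numpy as np

def perspective_matrix(fov_y_deg, aspect, z_near, z_far):
    """A standard perspective projection built from the view-frustum parameters
    (field of view, near/far clipping planes); Q = z_far / (z_far - z_near)."""
    f = 1.0 / np.tan(np.radians(fov_y_deg) / 2.0)
    q = z_far / (z_far - z_near)
    return np.array([[f / aspect, 0.0, 0.0, 0.0],
                     [0.0, f, 0.0, 0.0],
                     [0.0, 0.0, q, -q * z_near],
                     [0.0, 0.0, 1.0, 0.0]])

def screen_to_world(x, y, sw, sh, proj, view, depth=1.0):
    """Map a gaze point (x, y) on a sw-by-sh screen to a point in the virtual scene."""
    # Screen coordinates to normalized device coordinates in [-1, 1].
    ndc = np.array([2.0 * x / sw - 1.0, 1.0 - 2.0 * y / sh, depth, 1.0])
    # Undo the projection, then the view transform; divide out the homogeneous w.
    eye = np.linalg.inv(proj) @ ndc
    eye /= eye[3]
    world = np.linalg.inv(view) @ eye
    return world[:3] / world[3]
```

The resulting world-space point can then be handed to the octree traversal of the object-picking step to find the triangular patch and virtual object to which it belongs.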
S2, acquiring a learning video of the learning object, and determining the sight direction and the micro expression of the learning object according to the learning video;
the step is mainly to determine the sight direction of a learning object and identify the micro expression of the learning object by means of a learning video;
for determining the sight direction of the learning object: a learning video of the learning object is acquired through interactive data collection; specifically, a camera captures the learning video, which is displayed synchronously in a video frame at the upper-left corner of the terminal; a backbone network and three branch networks extract the head posture deflection angles, and the Euler rotation angles are calculated; the coordinates of the four Purkinje spots are acquired with a labeling connectivity (connected-component) analysis algorithm, a polynomial equation is constructed, and the learner's sight direction is estimated;
in an optional embodiment, the determining the direction of the line of sight of the learning object according to the learning video comprises:
locating a head region of the learning object according to the learning video;
extracting a head posture characteristic based on the head region, and calculating an Euler rotation angle of the head according to the head posture characteristic;
determining eye feature point distribution from the head region according to the Euler rotation angle of the head;
and acquiring a pupil center image and corresponding coordinates in the learning video by using a pupil center extraction algorithm according to the eye feature point distribution, and estimating the sight direction of the learning object according to the pupil center image and the corresponding coordinates.
Specifically, the method comprises the following steps:
(1) Learner video data acquisition: as shown in fig. 6, a built-in or external camera of the display terminal captures a high-definition video of the learning object while it is learning, and the video image is displayed synchronously in the upper-left corner of the display terminal screen; if no face of the learning object is detected, the terminal prompts the user to turn on the camera or to keep the face visible in the upper-left video frame. In fig. 6, 601 denotes the gaze direction from the learning object's eyes to the terminal screen and 602 denotes the principal axis of the video captured by the camera;
(2) Head posture positioning: a target detection and tracking algorithm detects the video image of the learning object in real time and locates the head region of the learning object in the image; a backbone network extracts the head posture features, and three branch networks obtain from them the posture deflection angles about the X, Y and Z axes respectively; these three posture deflection angle parameters, shown in fig. 7, are used to calculate the Euler rotation angles of the head. In the figure, 701 denotes a head posture action sequence rotating about the X axis, 702 a sequence rotating about the Y axis, and 703 a sequence rotating about the Z axis;
wherein the head pose positioning algorithm comprises the steps of:
(2-1): acquiring a high-definition video sequence of a learning object acquired by a camera when the learning object participates in learning;
(2-2): sequentially stacking a convolutional neural network, an RFB module and a down-sampling layer to construct a learner head detection network model, inputting a video sequence frame, and outputting an acquired head region of a learning object;
(2-3): adopting a convolutional neural network layer, a maximum pooling layer, a modified linear unit, a Dropout layer and a full connecting layer to construct a backbone network, extracting the head region characteristics of the learning object by using the backbone network, and outputting the head region characteristics as the head posture characteristics of the learning object;
(2-4): three attitude deflection angles along X, Y and the Z axis are estimated from the head attitude feature of the learning object respectively using three branch networks, and three attitude rotation matrices as shown in formulas 5, 6, and 7 are constructed according to the deflection angles:
R_x(θ_x) = [[1, 0, 0], [0, cos θ_x, -sin θ_x], [0, sin θ_x, cos θ_x]] (formula 5)
R_y(θ_y) = [[cos θ_y, 0, sin θ_y], [0, 1, 0], [-sin θ_y, 0, cos θ_y]] (formula 6)
R_z(θ_z) = [[cos θ_z, -sin θ_z, 0], [sin θ_z, cos θ_z, 0], [0, 0, 1]] (formula 7)
where θ_x, θ_y and θ_z are the three posture deflection angles;
(2-5): rotating sequentially about the X, Y and Z axes, the Euler rotation matrix is calculated by formula 8:
R = R_x(θ_x) · R_y(θ_y) · R_z(θ_z) (formula 8)
The Euler rotation matrix is written element-wise as formula 9:
R = [[r_xx, r_xy, r_xz], [r_yx, r_yy, r_yz], [r_zx, r_zy, r_zz]] (formula 9)
(2-6): the calculation of the euler rotation angle of the head of the learning object is shown in equations 10, 11, and 12:
θ x =atan2(-r yz ,r zz ) (formula 10)
Figure BDA0003946704950000086
θ z =atan2(-r xy ,r xx ) (formula 12)
Wherein atan2 is the calculated azimuth function;
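A compact numerical check of the rotation composition and Euler-angle recovery described in (2-4) to (2-6) might look as follows; it assumes the X-Y-Z composition given above, and formula 11 is reconstructed because it appears only as an image in the original:

```python
import numpy as np

def rotation_x(t):
    c, s = np.cos(t), np.sin(t)
    return np.array([[1, 0, 0], [0, c, -s], [0, s, c]])

def rotation_y(t):
    c, s = np.cos(t), np.sin(t)
    return np.array([[c, 0, s], [0, 1, 0], [-s, 0, c]])

def rotation_z(t):
    c, s = np.cos(t), np.sin(t)
    return np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])

def euler_from_matrix(r):
    """Recover (theta_x, theta_y, theta_z) from R = Rx @ Ry @ Rz,
    matching formulas 10 to 12."""
    theta_x = np.arctan2(-r[1, 2], r[2, 2])
    theta_y = np.arctan2(r[0, 2], np.hypot(r[1, 2], r[2, 2]))
    theta_z = np.arctan2(-r[0, 1], r[0, 0])
    return theta_x, theta_y, theta_z

# Self-check: compose a rotation from known angles and recover the same angles.
angles = (0.1, -0.3, 0.5)
R = rotation_x(angles[0]) @ rotation_y(angles[1]) @ rotation_z(angles[2])
print(np.allclose(euler_from_matrix(R), angles))  # True
```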
(3) Eye sight tracking: the eye feature points of the learning object are acquired with a face recognition algorithm based on geometric features; according to the distribution of the eye feature points, a pupil-center extraction algorithm obtains the pupil-center images and coordinates in the video image; a labeling connectivity (connected-component) analysis algorithm obtains the coordinates of the four Purkinje spots; a coordinate series is maintained in a queue, a cubic nonlinear equation is constructed from the relative relation between the current pupil-center coordinates and the preceding frame coordinates in the queue, and the sight direction of the learning object is estimated;
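The labeling connectivity analysis mentioned here corresponds to connected-component labeling. A minimal OpenCV-based sketch of extracting a dark pupil centre and bright Purkinje (corneal reflection) spots from a grayscale eye image is shown below; the thresholds and the choice of the largest or brightest components are illustrative assumptions:

```python
import cv2
import numpy as np

def pupil_and_purkinje(eye_gray, pupil_thresh=60, spot_thresh=220, max_spots=4):
    """Return the pupil centre and up to `max_spots` reflection centroids."""
    # Dark pupil: low-intensity pixels, keep the largest connected component.
    _, pupil_mask = cv2.threshold(eye_gray, pupil_thresh, 255, cv2.THRESH_BINARY_INV)
    n, _, stats, centroids = cv2.connectedComponentsWithStats(pupil_mask)
    pupil_center = None
    if n > 1:
        largest = 1 + int(np.argmax(stats[1:, cv2.CC_STAT_AREA]))  # label 0 is background
        pupil_center = tuple(centroids[largest])

    # Bright Purkinje spots: high-intensity pixels, keep the largest few components.
    n, _, stats, centroids = cv2.connectedComponentsWithStats(
        cv2.threshold(eye_gray, spot_thresh, 255, cv2.THRESH_BINARY)[1])
    order = np.argsort(stats[1:, cv2.CC_STAT_AREA])[::-1][:max_spots]
    purkinje = [tuple(centroids[1 + i]) for i in order]
    return pupil_center, purkinje
```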
for micro-expression recognition of the learning object: label categories are added to the learning object's micro-expression picture set, mainly according to changes of the eyes, mouth corners and eyebrows of the face; a convolutional neural network extracts the facial features of the input image, and after multi-step processing the learner's micro-expression is located in the output image; the category of the micro-expression is identified, a timestamp and course information are added, and the result is uploaded to the cloud server;
in another optional embodiment, the determining the micro-expression of the learning object according to the learning video comprises:
collecting micro-expression pictures of all learning objects in the online teaching process, adding corresponding micro-expression label categories to each micro-expression picture, and generating a micro-expression image labeling set;
intercepting an image of the learning video at a preset frequency, extracting face feature key points of a learning object in the image by using a convolutional neural network, and positioning a target micro expression of the learning object according to the face feature key points;
matching the target micro expression in the micro expression image annotation set according to the target micro expression, and determining the category corresponding to the target micro expression;
specifically, the micro-expression recognition comprises the following steps:
(1) Labeling the micro-expression image: collecting micro-expression picture sets of a plurality of learning objects in the online teaching process, and adding five micro-expression label categories of indifference, aversion, depression, surprise and happiness to the face image according to the variation range of the face eyes, mouth corners and eyebrows of the learning objects in the image to generate a micro-expression image labeling set;
(2) Micro-expression positioning: intercepting a video sequence image at the frequency of one frame per second, extracting the key points of the facial features of the learning object in the image by using a convolutional neural network, taking the key points as input parameters, carrying out nonlinear transformation and compression on the features sequentially through a full connection layer and a normalized exponential function, and positioning the micro expression of the learning object on the output image;
(3) Micro-expression recognition: a feature matching algorithm matches the learning object's micro-expression image against the labeled categories, identifies the micro-expression of the learning object and determines its current category; according to the category code of the micro-expression to which the learning object belongs (indifferent, averse, depressed, surprised or happy), a timestamp and course information are added and the information is uploaded to the cloud server;
the specific steps of the feature matching algorithm are as follows:
(3-1): extracting extreme points of the micro expression of the learning object in the image by using an extreme point detection algorithm shown in formula 13:
l (x, y, σ) = G (x, y, σ) × I (x, y) (formula 13)
Wherein, (x, y) is pixel coordinates, σ is a scale space factor, I (x, y) and G (x, y, σ) are an original image and a gaussian function, respectively, and a convolution operator;
(3-2): constructing a Gaussian difference pyramid shown in formula 14, and acquiring micro-expression key points of a stable learning object:
D(x, y, σ) = L(x, y, kσ) - L(x, y, σ) (formula 14)
Wherein k is the ratio of adjacent scale factors;
(3-3): using a taylor series expansion gaussian difference pyramid function in the scale space as shown in formula 15, interpolating and searching micro expression key points, and removing key points with low contrast:
Figure BDA0003946704950000101
wherein, X = (X, y, sigma) T
(3-4): the gradient direction characteristics of the neighborhood pixels of the key points are used for assigning direction parameters to each key point, and the gradient values and the corresponding directions of the key points are calculated by formulas 16 and 17:
Figure BDA0003946704950000102
Figure BDA0003946704950000103
(3-5): selecting a square pixel area of a peripheral 16 × 16 grid by taking the key point as a center, dividing the square pixel area into 4 sub-areas according to a 4 × 4 rule, calculating gradient accumulated values of 8 directions of each sub-area, and converting each micro-expression key point of a learning object into a 4 × 4 × 8= 128-dimensional feature descriptor;
(3-6): obtaining the distance between the micro expression image of the learning object and the image of the labeling category by using an Euclidean distance calculation formula shown in formula 18:
Figure BDA0003946704950000104
/>
wherein M and N are descriptor vectors of the image;
selecting the labeling category corresponding to the minimum distance d (x, y) as the micro-expression category of the learning object;
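Formula 18 is the usual Euclidean distance between 128-dimensional descriptors. A minimal sketch of the category-matching step follows; descriptor extraction is assumed to be available elsewhere (for example via an OpenCV SIFT detector), and aggregating by the average nearest-neighbour distance is an illustrative choice, since the text only states that the category with the minimum distance is selected:

```python
import numpy as np

def nearest_expression_category(query_descriptors, labeled_sets):
    """query_descriptors: (n, 128) descriptors of the learner's expression image.
    labeled_sets: dict mapping a category name ('happy', 'surprised', ...) to an
    (m, 128) array of descriptors from the annotated image set.
    Returns the category whose descriptors are closest on average (formula 18)."""
    best_category, best_score = None, np.inf
    for category, descriptors in labeled_sets.items():
        # Pairwise Euclidean distances between query and labeled descriptors.
        diff = query_descriptors[:, None, :] - descriptors[None, :, :]
        dists = np.sqrt((diff ** 2).sum(axis=-1))
        score = dists.min(axis=1).mean()   # average nearest-neighbour distance
        if score < best_score:
            best_category, best_score = category, score
    return best_category
```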
s3, determining a target visual object watched by the learning object according to the geometric-attribute information of each visual object and the sight direction, and associating the target visual object with the micro expression;
in an alternative embodiment, determining the target visual object at which the learning object gazes according to the geometric-attribute information of each visual object and the gaze direction includes:
determining a target watching area of the learning object according to the sight line direction, and determining a target visual object watched by the learning object according to the target watching area and the geometric-attribute information of each visual object;
in another optional embodiment, the determining the target visual object at which the learning object gazes according to the geometric-attribute information of each visual object and the gaze direction includes:
dividing a terminal screen into a plurality of interest areas, and determining corresponding target interest areas according to the sight line direction;
determining the three-dimensional coordinates of each corner point of the target interest area in a meta-space, and generating a minimum circumscribed cuboid;
extracting the associated visual objects which are intersected with the minimum external cuboid or contained in the minimum external cuboid according to the geometric-attribute information of each visual object to generate an associated visual object set;
rejecting the occluded associated visual objects in the associated visual object set to generate target visual objects;
in an optional embodiment, after determining the corresponding target interest region according to the gaze direction, the method further includes:
counting the frequency of the sight line of the learning object falling in each interest area and the frequency of the corresponding micro expression in a preset time period;
counting the frequency of the sight line of the learning object falling in each interest area within a preset time period, wherein the counting comprises:
respectively calculating the maximum angle of the learning object with head nodding, head swinging and head shaking postures deviating from the terminal screen according to the length and width of the terminal screen and the eye center coordinates of the learning object;
judging whether the angle of the single head deflection of the learning object falls within the maximum angle range and the duration is longer than a preset number of seconds, if so, counting the frequency of the interest area where the sight of the learning object falls, otherwise, not counting;
in an actual application scenario, a display terminal screen is divided into a plurality of rectangular interest areas, such as: they may be numbered in left-to-right, top-to-bottom order; calculating the coordinates of the sight line and the fixation point of the learning object according to the sight line direction and the fixation point of the learning object, and positioning the position of the sight line and the fixation point in the interest area; setting a sampling interval, and counting the frequency of the learner sight falling in each interest area in a time period, thereby realizing the positioning of the attention of the learning object;
specifically, the method comprises the following steps:
(1) Dividing interest areas: acquiring the length and width values of a terminal screen of a learning object, setting a length threshold value, dividing the whole screen into a plurality of interest areas in rectangular shapes, numbering the interest areas from left to right and from top to bottom as shown in fig. 8, and after starting an attention analysis function, superposing transparent layers on a screen display image to represent the attention frequency of learners in each interest area;
(2) Sight-line positioning: based on the interior and exterior orientation elements of the camera, a local head coordinate system of the learning object is constructed, and the position and orientation of the display terminal in this coordinate system are determined; according to the sight direction and fixation point of the learning object, a polynomial nonlinear regression model calculates the learner's fixation-point coordinates and estimates the interest region in which they lie;
the coordinates of the learner's fixation point are calculated using the polynomial nonlinear regression model shown in formula 19:
(formula 19, given as an image in the original, expresses the fixation-point coordinates as polynomials in the eye-center coordinates and head orientation)
where a_0, a_1, ..., a_11 and b_0, b_1, ..., b_11 are the unknown coefficients of the model, (x, y) are the coordinates of the center of the learner's eyes, and (x_e, y_e, z_e) are the three orientations converted from the learner's head Euler angles;
(3) Attention acquisition: setting sampling intervals, counting the frequency of the sight of a learning object participating in course learning in each interest region in the time period, respectively generating corresponding attribute values of each sampling interval by using the starting time, the ending time, the interest region number and the watching frequency as attributes, recording the results in a JSON file according to the time sequence, and uploading the results to a cloud server;
wherein, the step of counting the frequency of the sight line of the learning object falling in each interest area comprises the following steps:
(3-1): according to the length and width of the terminal screen and the eye center coordinates of the learning object, formulas 20, 21 and 22 are respectively used for calculating the maximum angle of deviation of three head postures of nodding, swinging and shaking of the learner from the terminal screen:
(formulas 20, 21 and 22, given as images in the original, define the maximum deviation angles α, β and γ for the nodding, swinging and shaking postures respectively)
wherein h and d are respectively the length and width of the terminal screen, and x, y and z are the eye center coordinates of the learner;
(3-2): when a single head deflection of the learning object stays within the ranges (-α, α), (-β, β) and (-γ, γ) and the duration exceeds a preset number of seconds, for example 2 seconds, one occurrence of the learning object's sight falling in that interest region is recorded; otherwise it is not counted.
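Putting the interest-region division, sight-line positioning and frequency counting above together, a minimal sketch of the per-region counting and the JSON record might look as follows; the region layout, the sample format and the JSON attribute names are illustrative assumptions:

```python
import json

def count_region_fixations(samples, rows, cols, screen_w, screen_h,
                           max_angles, min_duration=2.0):
    """samples: time-ordered (timestamp, gaze_x, gaze_y, (pitch, yaw, roll)) tuples.
    Counts, per rectangular interest region, how many fixations lasted at least
    `min_duration` seconds while the head deviation stayed within max_angles."""
    counts, run_start, run_region, last_t = {}, None, None, None
    for t, x, y, angles in samples:
        within = all(abs(a) < m for a, m in zip(angles, max_angles))
        if within:
            row = min(rows - 1, int(y // (screen_h / rows)))
            col = min(cols - 1, int(x // (screen_w / cols)))
            region = row * cols + col
        else:
            region = None
        if region != run_region:
            # The previous fixation run ended; count it if it was long enough.
            if run_region is not None and t - run_start >= min_duration:
                counts[run_region] = counts.get(run_region, 0) + 1
            run_start, run_region = t, region
        last_t = t
    if run_region is not None and last_t is not None and last_t - run_start >= min_duration:
        counts[run_region] = counts.get(run_region, 0) + 1
    return counts

def attention_record(start, end, counts):
    """One sampling interval serialized in the JSON layout suggested by the text."""
    return json.dumps({"start_time": start, "end_time": end,
                       "regions": [{"region_id": r, "gaze_count": c}
                                   for r, c in sorted(counts.items())]})
```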
After the sight direction and the micro expression of the learning object are determined, the related data can be sent to the cloud for unified data management, specifically:
cloud data management: the cloud receives and parses the attention and micro-expression recognition data and classifies it according to course and timestamp information; the fixation frequency of each interest region is counted by data mapping, and the associated expression information is identified; on request from a client, the gaze frequencies of the course's learning objects and the associated micro-expression information are compressed, encoded and transmitted from the cloud to the client. The steps are as follows:
(1) Data classification: when the cloud server detects that attention or micro-expression recognition data has been received from a learning object terminal, the attributes and values of the JSON file are parsed according to its specification, the file is classified by course and timestamp information, and the time interval, interest region number, gaze frequency or micro-expression category attributes and their values are inserted into a database table;
(2) Data statistics: a database query command extracts from the cloud database the attention and micro-expression information uploaded by all learning objects of the corresponding course within a given time period; the gaze frequency of each interest region is counted by data mapping, the gaze frequencies of all course participants are obtained, and the associated expression information is identified;
(3) Data distribution: according to an attention acquisition request sent by a learning object client, the cloud server counts the attention focus and associated micro-expression information of each participant in a current time interval in real time, adopts LoRa as a data transmission protocol of an attention analysis system, and sends a statistical result to a request terminal from a cloud server after compressing and encoding.
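The text names LoRa as the transmission protocol but leaves the compression and encoding unspecified; the sketch below uses zlib and base64 purely as illustrative stand-ins for that compress-and-encode step, and the transport itself is not shown:

```python
import base64
import json
import zlib

def encode_statistics(stats):
    """Compress and encode a statistics payload before sending it to a client."""
    raw = json.dumps(stats).encode("utf-8")
    return base64.b64encode(zlib.compress(raw)).decode("ascii")

def decode_statistics(payload):
    """Reverse of encode_statistics, run on the requesting terminal."""
    return json.loads(zlib.decompress(base64.b64decode(payload)).decode("utf-8"))
```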
After the gaze direction of the learning object has been determined, the gaze model associated with that direction, i.e. the visual model, is determined: the three-dimensional coordinates of each corner point of the interest region in the meta-universe space are obtained by inverse projection transformation and the minimum circumscribed cuboid is generated; the set of associated target teaching objects intersecting with or contained in it is extracted with a barycenter judgment method; and, according to the object picking method of step S1 (3), the occluded object models are removed to obtain the associated gaze model. Specifically:
(1) Mapping the interest-region plane to space: the horizontal and vertical coordinates of the four corner points of the rectangle are calculated from the number of the interest region on the display terminal; the three-dimensional spatial coordinates in the meta-universe corresponding to the screen coordinates of each corner point are obtained by inverse projection transformation; the maximum and minimum X, Y and Z values are obtained by traversal, and the minimum circumscribed cuboid is determined;
(2) Determining the teaching objects associated with the interest region: the virtual object surface models organized in the octree are traversed within the range of the circumscribed cuboid corresponding to the interest region; a barycenter judgment method decides whether each surface triangular patch is contained in or intersects the circumscribed cuboid; the containing or intersecting object models are obtained, and the associated target teaching object set is generated;
(3) Gaze model association: according to the object picking method of step S1 (3), the model objects associated with every point in the learner's gaze area are queried, and marks are added to the resulting objects in the target teaching object set; the occluded associated objects that received no mark are removed, and the remaining models are the object models gazed at by the learner, i.e. the target visual objects.
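A condensed sketch of steps (1) and (2) is given below: the unprojected corner points of the interest region define an axis-aligned box, and triangles are kept when their barycenter lies inside it; the stricter intersection test and the occlusion removal of step (3), which reuses the object-picking routine of step S1 (3), are only indicated in comments:

```python
import numpy as np

def region_bounding_box(corner_world_points):
    """corner_world_points: (4, 3) world coordinates of the interest-region corners,
    obtained by inverse projection. Returns the minimum circumscribed box."""
    pts = np.asarray(corner_world_points, dtype=float)
    return pts.min(axis=0), pts.max(axis=0)

def triangles_in_box(vertices, faces, box_min, box_max):
    """Barycenter judgment: keep triangles whose barycenter lies inside the box.
    (A full implementation would also keep triangles that merely intersect it.)"""
    barycenters = vertices[faces].mean(axis=1)
    inside = np.all((barycenters >= box_min) & (barycenters <= box_max), axis=1)
    return np.nonzero(inside)[0]

# The objects owning the selected triangles form the associated teaching object set;
# objects in this set that no gaze-point pick ray marks are treated as occluded
# and removed, leaving the target visual objects.
```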
S4, analyzing the attention of the learning object according to the target visual object and the micro expression related to the target visual object;
specifically, the method comprises the following steps:
evaluating the concentration effect of the learning object according to the micro expression and the micro expression frequency associated with the target visual object in the preset time period;
determining the frequency with which the learning object gazes at the target visual object within the preset time period according to the frequency with which the learning object's sight falls in each interest region within the preset time period;
generating an attention thermodynamic diagram according to the frequency of the target visual objects watched by the learning objects in the preset time period, setting the transparency of a thermodynamic diagram layer in a layering manner according to the high-low order of the frequency, and overlaying the thermodynamic diagram on the corresponding target visual objects in the terminal screen;
in a specific application scene, in the process of attention analysis, weighting coefficients of positive, neutral and negative emotions are given to a learning object, and the concentration effect of the learning object is estimated by adopting an interest calculation formula; representing the emotion of each category of the learning object in the learning process by using a bar graph, and drawing a learning object emotion time sequence diagram; generating an attention thermodynamic diagram according to the statistical result of the attention of the learning object, and adding the transparency of a thermodynamic diagram layer by adopting hierarchical arrangement:
(1) Learning concentration effect evaluation: according to the Ekman emotion classification standard, happiness and surprise are classed as positive and neutral emotions, and apathy, aversion and depression as negative emotions; a weight coefficient is assigned to each emotion, the durations of the various emotions during the learner's participation in the course are counted by category, and the learner's concentration effect is estimated with a learner interest calculation formula;
wherein, the learning concentration effect evaluation step comprises:
(1-1) weighting coefficients of 1.0, 0.25, -0.25, -0.75 and -0.5 are respectively given to happiness, surprise, apathy, aversion and depression according to the micro-expression of the learning object;
(1-2) the learner's learning interest is calculated by formula 23:
Interest = (1.0 · t_happy + 0.25 · t_surprised) / T (formula 23)
where 1.0 and 0.25 are respectively the happiness and surprise weighting coefficients, t_happy and t_surprised are the total durations for which the learner is happy and surprised, and T is the total duration of the learner's participation in the course learning;
(1-3) the calculation of the learner's concentration effect is shown in formula 24:
(formula 24, given as an image in the original, combines the weighted emotion durations defined below)
where t_1, t_2 and t_3 are respectively the total durations of the learning object's positive, neutral and negative emotions; -0.25, -0.75 and -0.5 are respectively the weight coefficients of apathy, aversion and depression; t_4, t_5 and t_6 are respectively the total durations of the learning object's apathy, aversion and depression; and t_7 is the total invalid duration recorded by the sight-line positioning of the learning object;
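A small sketch of the interest and concentration computation follows. The interest function implements formula 23 as given above; formula 24 appears only as an image in the original, so the concentration function below is an assumed weighted-sum form built from the variables and coefficients listed, not the exact expression of the patent:

```python
WEIGHTS = {"happy": 1.0, "surprised": 0.25, "apathetic": -0.25,
           "averse": -0.75, "depressed": -0.5}

def learning_interest(t_happy, t_surprised, t_total):
    """Formula 23: positively weighted emotion time over total course time."""
    return (1.0 * t_happy + 0.25 * t_surprised) / t_total

def concentration_effect(durations, t_invalid, t_total):
    """durations: per-emotion total durations in seconds, keyed as in WEIGHTS.
    Assumed form: weighted emotion time divided by the valid (tracked) time."""
    weighted = sum(WEIGHTS[emotion] * t for emotion, t in durations.items())
    valid_time = max(t_total - t_invalid, 1e-9)
    return weighted / valid_time
```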
(2) Drawing the learning object emotion time sequence diagram: the micro-expression category codes of a single learning object, or of all learning objects participating in the course, during a class period are downloaded from the cloud server; with the course timeline and the frequency of each expression category in each period as the horizontal and vertical axes of a Cartesian coordinate system, the emotions of each category of the learning object during the learning process are represented with bar charts;
(3) Attention thermodynamic diagram generation: the frequency of the learning object's fixation points in each interest region in each time period is counted; an attention thermodynamic diagram is generated from the learning object's attention frequencies, the transparency of the thermodynamic layers is set hierarchically from high to low frequency, and the layers are superimposed on the screen image of the attended object models on the meta-universe display terminal, as shown in fig. 9.
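A minimal sketch of the thermodynamic-diagram overlay described in (3) is shown below, using numpy and matplotlib; mapping frequency to overlay opacity and the 0.6 upper bound are illustrative choices:

```python
import numpy as np
import matplotlib.pyplot as plt

def overlay_attention_heatmap(screen_image, region_counts, rows, cols, ax=None):
    """screen_image: H x W x 3 array of the rendered scene shown on the terminal.
    region_counts: dict {region_id: gaze frequency}. Regions watched more often
    are drawn with a more opaque overlay."""
    h, w = screen_image.shape[:2]
    heat = np.zeros((rows, cols))
    for region_id, count in region_counts.items():
        heat[region_id // cols, region_id % cols] = count
    # Expand the per-region frequencies to a per-pixel map aligned with the screen.
    ys = (np.arange(h) * rows // h)[:, None]
    xs = (np.arange(w) * cols // w)[None, :]
    heat_full = heat[ys, xs]
    alpha = 0.6 * heat_full / max(heat.max(), 1)   # opacity layered by frequency
    ax = ax or plt.gca()
    ax.imshow(screen_image)
    ax.imshow(heat_full, cmap="hot", alpha=alpha, interpolation="nearest")
    ax.set_axis_off()
    return ax
```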
In another alternative embodiment, referring to fig. 2, an attention analysis terminal for learning objects in an educational meta-space includes a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the steps of the attention analysis method for learning objects in the educational meta-space according to the above embodiments when executing the computer program.
In another alternative embodiment, referring to fig. 3, an attention analysis apparatus for learning objects in an educational meta-space, corresponding to the above method for analyzing attention of learning objects in the educational meta-space, includes:
the virtual teaching resource organization module is used for generating a visual object set corresponding to a learning object according to a virtual scene of an educational meta-space, and determining the geometric-attribute information of each visual object in the visual object set;
the sight direction and micro expression determining module is used for acquiring a learning video of the learning object and determining the sight direction and the micro expression of the learning object according to the learning video;
the gazing model association module is used for determining a target visual object gazed by the learning object according to the geometric-attribute information of each visual object and the sight direction and associating the target visual object with the micro expression;
the attention analysis module is used for analyzing the attention of the learning object according to the target visual object and the micro expression related to the target visual object;
in another alternative embodiment, as shown in fig. 10, the attention analysis apparatus for learning objects in the universe of educational elements may be further subdivided, and includes a virtual teaching resource organization module, an interactive data collection module, a micro-expression recognition module, an attention location module, a cloud data management module, a gaze model association module, and an attention analysis module;
the virtual teaching resource organization module is used for realizing the step S1 in the attention analysis method of the learning object in the educational element universe;
the interactive data acquisition module is used for realizing related steps of determining the sight direction of a learning object according to the learning video in the attention analysis method of the learning object in the educational meta universe;
the micro expression recognition module is used for realizing relevant steps of micro expression recognition on the learning object in the attention analysis method of the learning object in the education meta universe;
the attention positioning module is used for realizing related steps of positioning the attention of the learning object in the attention analysis method of the learning object in the education meta universe;
the cloud data management module is used for realizing relevant steps of cloud data management in the attention analysis method of the learning objects in the education meta universe;
the gaze model association module is used for realizing the relevant steps of obtaining the associated gaze model in the attention analysis method of the learning object in the education meta universe;
the attention analysis module is used for realizing the step S4 in the attention analysis method of the learning object in the education meta universe.
In summary, in the method, terminal and device for analyzing the attention of learning objects in the educational meta-universe provided by the invention, the virtual scene of the meta-universe space is subdivided with an octree and the visual object set corresponding to the learning object is generated according to the virtual scene of the educational meta-universe space. A learning video of the learning object is collected, a neural network extracts the head posture deflection angles, an analysis algorithm obtains the Purkinje coordinates, and the learner's sight direction is estimated; a picture set of online learning objects is collected and labeled with micro-expression categories according to facial feature changes, and a neural network extracts the facial features, locates the micro-expression and identifies its category. The target visual object gazed at by the learning object is determined according to the sight direction: the three-dimensional coordinates of each corner point of the interest region in the meta-universe space are obtained by inverse projection transformation, the minimum circumscribed cuboid is generated, the associated target teaching object set intersecting with or contained in it is extracted with a barycenter judgment method, the occluded associated object models are removed to obtain the target visual object, and the target visual object is associated with the micro-expression. The attention of the learning object is analyzed according to the target visual object and its associated micro-expressions: the emotions of each category during learning are represented with bar charts, an emotion sequence diagram of the learning object is drawn, an attention thermodynamic diagram is generated from the attention statistics, and the transparency of the thermodynamic layers is set hierarchically. No expensive hardware is required; the target visual object gazed at is determined from the learning object's sight direction, attending not only to the fixation-point position but also to the virtual object of the teaching scene at that position, so the learning object and the teaching scene are tightly fused and the association among the virtual space and screen plane of the educational meta-universe, the attention of the learning object and the teaching model is strengthened. At the same time, the target visual object gazed at along the sight direction is associated with the corresponding micro-expression and the attention is analyzed from both, so the attention analysis result is more comprehensive and accurate. With the rapid application and integration of the meta-universe in the field of education, this learner attention analysis technique based on computer vision and data driving is non-invasive and has broad application prospects in the field of education.
The above description is only an embodiment of the present invention, and not intended to limit the scope of the present invention, and all equivalent changes made by using the contents of the present specification and the drawings, or applied directly or indirectly to the related technical fields, are included in the scope of the present invention.

Claims (10)

1. A method for analyzing attention of learning objects in an educational meta universe is characterized by comprising the following steps:
s1, generating a visual object set corresponding to a learning object according to a virtual scene of an educational meta-space, and determining geometric-attribute information of each visual object in the visual object set;
s2, acquiring a learning video of the learning object, and determining the sight direction and the micro expression of the learning object according to the learning video;
s3, determining a target visual object watched by the learning object according to the geometric-attribute information of each visual object and the sight direction, and associating the target visual object with the micro expression;
and S4, analyzing the attention of the learning object according to the target visual object and the micro expression associated with the target visual object.
2. The method according to claim 1, wherein the step S1 comprises:
subdividing a virtual scene of an educational meta-space to generate a model object set;
determining a visual range of a visual frustum corresponding to the learning object according to the viewpoint position, the direction and the target point position of the learning object, determining an intersection of the visual range and the model object set, and generating a visual object set corresponding to the learning object according to the intersection;
according to the corresponding relation between the fixation position of the sight line of the learning object on the terminal screen and the coordinate of the sight line of the learning object in the meta-space, determining a visual object to which the fixation position belongs, establishing the corresponding relation between the fixation position and the visual object to which the fixation position belongs, and determining the geometric-attribute information of each visual object in the visual object set according to the corresponding relation;
the determining a target visual object watched by the learning object according to the geometric-attribute information of each visual object and the sight line direction comprises:
and determining a target watching area of the learning object according to the sight direction, and determining a target visual object watched by the learning object according to the target watching area and the geometric-attribute information of each visual object.
3. The method according to claim 2, wherein the determining the visual object to which the gaze position belongs according to the correspondence between the gaze position of the line of sight of the learning object on the terminal screen and the coordinates thereof in the metaspace space comprises:
converting the fixation position coordinates of the sight line of the learning object on the terminal screen into projection window coordinates proj_x, proj_y and proj_z:
(formula 1, given as an image in the original)
In the formula, sw and sh respectively represent the width and the height of a terminal screen, and (x, y) represent the fixation position coordinates of the sight of a learning object on the terminal screen;
transforming, according to the projection transformation matrix of formula 2, the projection window coordinates into the corresponding meta-universe space coordinates x', y' and z' by formula 3 and formula 4:
(formulas 2, 3 and 4 are given as images in the original)
wherein w and h respectively represent the width and height of the near clipping plane of the view frustum, Z_n and Z_f respectively represent the distances of the near and far clipping planes in the view frustum, Q represents the ratio of the distance from the fixation position to the far clipping plane to the distance from the near clipping plane to the far clipping plane in the view frustum, and fov represents the field-of-view angle;
determining, according to the determined meta-universe space coordinates x', y' and z', the visual object to which those coordinates belong.
4. The method of any one of claims 1-3, wherein the determining the direction of the learning object's line of sight from the learning video comprises:
locating a head region of the learning object according to the learning video;
extracting a head posture characteristic based on the head region, and calculating a head Euler rotation angle according to the head posture characteristic;
determining eye feature point distribution from the head region according to the head euler rotation angle;
and acquiring a pupil center image and corresponding coordinates in the learning video by using a pupil center extraction algorithm according to the eye feature point distribution, and estimating the sight direction of the learning object according to the pupil center image and the corresponding coordinates.
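A hedged Python sketch of the head-pose part of claim 4 follows: the six 3-D reference points are commonly used approximate values (in millimetres), not figures from the patent, and `gaze_direction` is a deliberately simplified pupil-offset correction with an assumed gain rather than the patent's pupil-center extraction algorithm.

```python
import numpy as np
import cv2

# Generic face-model reference points (nose tip, chin, eye corners, mouth corners).
MODEL_POINTS = np.array([
    (0.0, 0.0, 0.0),           # nose tip
    (0.0, -330.0, -65.0),      # chin
    (-225.0, 170.0, -135.0),   # left eye outer corner
    (225.0, 170.0, -135.0),    # right eye outer corner
    (-150.0, -150.0, -125.0),  # left mouth corner
    (150.0, -150.0, -125.0),   # right mouth corner
], dtype=np.float64)

def head_euler_angles(image_points, frame_w, frame_h):
    """Head pitch/yaw/roll (degrees) from six 2-D landmarks (same order as
    MODEL_POINTS, float64 array of shape (6, 2)) via a PnP solve."""
    focal = float(frame_w)  # crude focal-length guess: one frame width
    camera = np.array([[focal, 0.0, frame_w / 2.0],
                       [0.0, focal, frame_h / 2.0],
                       [0.0, 0.0, 1.0]], dtype=np.float64)
    _, rvec, _ = cv2.solvePnP(MODEL_POINTS, image_points, camera, None,
                              flags=cv2.SOLVEPNP_ITERATIVE)
    rot, _ = cv2.Rodrigues(rvec)
    sy = np.hypot(rot[0, 0], rot[1, 0])
    pitch = np.degrees(np.arctan2(rot[2, 1], rot[2, 2]))
    yaw = np.degrees(np.arctan2(-rot[2, 0], sy))
    roll = np.degrees(np.arctan2(rot[1, 0], rot[0, 0]))
    return pitch, yaw, roll

def gaze_direction(pitch, yaw, pupil_offset, gain=30.0):
    """Rough gaze angles: head pose plus a pupil-offset correction, where
    `pupil_offset` is the pupil centre's (dx, dy) shift from the eye-region
    centre normalised by the eye width, and `gain` is an assumed degrees/unit."""
    gaze_pitch = pitch + gain * pupil_offset[1]
    gaze_yaw = yaw + gain * pupil_offset[0]
    return gaze_pitch, gaze_yaw
```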
5. The method of any one of claims 1 to 3, wherein said determining the micro-expression of the learning object from the learning video comprises:
collecting micro-expression pictures of all learning objects in the online teaching process, adding corresponding micro-expression label categories to each micro-expression picture, and generating a micro-expression image labeling set;
capturing images from the learning video at a preset frequency, extracting facial feature key points of the learning object in the images by using a convolutional neural network, and locating a target micro-expression of the learning object according to the facial feature key points;
and matching the target micro-expression against the micro-expression image labeling set, and determining the category corresponding to the target micro-expression.
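One possible realisation of the frame-sampling and matching steps of claim 5 is sketched below in Python; the convolutional network that turns a face crop into a feature vector is assumed to exist upstream and is not shown, and names such as `sample_frames` and `match_expression` are illustrative.

```python
import cv2
import numpy as np

def sample_frames(video_path, every_n_seconds=1.0):
    """Yield frames from the learning video at a preset sampling frequency."""
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 25.0     # fall back if the file reports 0
    step = max(1, int(round(fps * every_n_seconds)))
    index = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if index % step == 0:
            yield frame
        index += 1
    cap.release()

def match_expression(feature, labelled_set):
    """Return the label of the closest entry in the micro-expression image
    labeling set, by cosine similarity of feature vectors.

    `labelled_set`: iterable of (feature_vector, label) pairs."""
    best_label, best_sim = None, -1.0
    f = np.asarray(feature, dtype=float)
    f = f / (np.linalg.norm(f) + 1e-9)
    for vec, label in labelled_set:
        v = np.asarray(vec, dtype=float)
        v = v / (np.linalg.norm(v) + 1e-9)
        sim = float(np.dot(f, v))
        if sim > best_sim:
            best_sim, best_label = sim, label
    return best_label, best_sim
```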
6. The method according to claim 1, wherein the determining the target visual object at which the learning object gazes according to the geometric-attribute information of each visual object and the sight line direction comprises:
dividing a terminal screen into a plurality of interest areas, and determining corresponding target interest areas according to the sight direction;
determining the three-dimensional coordinates of each corner point of the target interest area in a meta-space, and generating a minimum circumscribed cuboid;
extracting, according to the geometric-attribute information of each visual object, the visual objects that intersect with or are contained in the minimum circumscribed cuboid, to generate an associated visual object set;
and removing the occluded associated visual objects in the associated visual object set to generate the target visual object.
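A minimal Python sketch of the minimum-circumscribed-cuboid and intersection test of claim 6 follows, assuming each visual object's geometric-attribute information includes an axis-aligned bounding box (`aabb_min`, `aabb_max`); the occlusion-removal step, which would typically need a ray or depth test against the viewpoint, is only noted in a comment.

```python
import numpy as np

def min_bounding_box(points):
    """Axis-aligned minimum circumscribed cuboid of the interest-area corner points."""
    pts = np.asarray(points, dtype=float)
    return pts.min(axis=0), pts.max(axis=0)

def aabb_overlaps(lo_a, hi_a, lo_b, hi_b):
    """True if two axis-aligned boxes intersect, touch, or one contains the other."""
    return bool(np.all(lo_a <= hi_b) and np.all(lo_b <= hi_a))

def associated_objects(roi_corners_3d, visual_objects):
    """Visual objects whose box intersects or lies inside the circumscribed cuboid.

    The subsequent occlusion-removal step would typically cast rays from the
    viewpoint (or compare depths) and drop objects hidden behind others."""
    lo, hi = min_bounding_box(roi_corners_3d)
    return [obj for obj in visual_objects
            if aabb_overlaps(lo, hi,
                             np.asarray(obj["aabb_min"], dtype=float),
                             np.asarray(obj["aabb_max"], dtype=float))]
```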
7. The method according to claim 6, wherein, after the step of determining the corresponding target interest area according to the sight direction, the method further comprises:
counting the frequency of the sight line of the learning object falling in each interest area and the frequency of the corresponding micro expression in a preset time period;
the step S4 includes:
evaluating the concentration effect of the learning object according to the micro expression and the frequency of the micro expression associated with the target visual object in the preset time period;
determining the frequency with which the learning object gazes at each target visual object within the preset time period according to the frequency of the sight line of the learning object falling in each interest area within the preset time period and the target visual objects associated with each interest area;
and generating an attention heat map according to the frequency with which the learning object gazes at the target visual objects within the preset time period, setting the transparency of the heat-map layers hierarchically according to the frequency order, and overlaying the heat-map layers on the corresponding target visual objects on the terminal screen.
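An illustrative Python sketch of the rank-ordered transparency idea in claim 7 follows: one translucent layer is drawn per gazed object on a captured screen image, with more frequently gazed objects rendered more opaquely; the box coordinates, fill colour and `base_alpha` value are assumptions.

```python
import cv2

def overlay_attention_heatmap(frame, object_boxes, gaze_counts, base_alpha=0.15):
    """Overlay rank-based heat layers on a captured terminal-screen image.

    `object_boxes`: object id -> (x1, y1, x2, y2) in screen pixels.
    `gaze_counts`: object id -> gaze frequency within the time window."""
    out = frame.copy()
    ranked = sorted(gaze_counts, key=gaze_counts.get, reverse=True)
    for rank, obj_id in enumerate(ranked):
        if obj_id not in object_boxes:
            continue
        x1, y1, x2, y2 = object_boxes[obj_id]
        # hierarchical transparency: rank 0 (most gazed at) gets the highest opacity
        alpha = min(0.9, base_alpha * (len(ranked) - rank))
        layer = out.copy()
        cv2.rectangle(layer, (x1, y1), (x2, y2), (0, 0, 255), thickness=-1)
        out = cv2.addWeighted(layer, alpha, out, 1.0 - alpha, 0.0)
    return out
```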
8. The method of claim 7, wherein counting the frequency of the learning objects' gaze falling in the respective interest areas within the preset time period comprises:
calculating, according to the length and the width of the terminal screen and the eye-center coordinates of the learning object, the respective maximum angles by which the head postures of the learning object, namely nodding, head swinging and head shaking, may deviate from the terminal screen;
and judging whether the angle of a single head deflection of the learning object falls within the maximum angle range and lasts longer than a preset number of seconds; if so, counting it towards the frequency of the interest area in which the sight line of the learning object falls, and otherwise not counting it.
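A small Python sketch of this validity check follows; the viewing distance is an assumed input (the patent derives the geometry from the screen size and the learner's eye-center coordinates), and the two-second default for `min_seconds` is only a placeholder for the preset number of seconds.

```python
import numpy as np

def max_deflection_angles(screen_w, screen_h, eye_center, viewing_distance):
    """Largest yaw/pitch angles (degrees) at which the gaze can still reach the screen.

    `eye_center`: the learner's eye-centre (x, y) offset from the screen centre,
    in the same units as the screen width and height."""
    max_yaw = np.degrees(np.arctan2(screen_w / 2.0 + abs(eye_center[0]), viewing_distance))
    max_pitch = np.degrees(np.arctan2(screen_h / 2.0 + abs(eye_center[1]), viewing_distance))
    return max_yaw, max_pitch

def should_count(yaw, pitch, duration_s, max_yaw, max_pitch, min_seconds=2.0):
    """Count a sample towards the interest-area frequency only if the single head
    deflection stays within the maximum angle range for long enough."""
    within_range = abs(yaw) <= max_yaw and abs(pitch) <= max_pitch
    return within_range and duration_s >= min_seconds
```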
9. An attention analysis terminal for learning objects in an educational meta-universe, comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the computer program, implements the steps of the method for analyzing the attention of learning objects in an educational meta-universe according to any one of claims 1 to 8.
10. An attention analysis device for learning objects in an educational meta-universe, comprising:
the virtual teaching resource organization module is used for generating a visual object set corresponding to a learning object according to a virtual scene of an educational meta-space, and determining the geometric-attribute information of each visual object in the visual object set;
the sight direction and micro expression determining module is used for acquiring a learning video of the learning object and determining the sight direction and the micro expression of the learning object according to the learning video;
the gazing model association module is used for determining a target visual object gazed by the learning object according to the geometric-attribute information of each visual object and the sight direction and associating the target visual object with the micro expression;
and the attention analysis module is used for analyzing the attention of the learning object according to the target visual object and the micro expression associated with the target visual object.
CN202211436017.2A 2022-11-16 2022-11-16 Method, terminal and device for analyzing attention of learning object in education meta universe Pending CN115933930A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211436017.2A CN115933930A (en) 2022-11-16 2022-11-16 Method, terminal and device for analyzing attention of learning object in education meta universe

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211436017.2A CN115933930A (en) 2022-11-16 2022-11-16 Method, terminal and device for analyzing attention of learning object in education meta universe

Publications (1)

Publication Number Publication Date
CN115933930A true CN115933930A (en) 2023-04-07

Family

ID=86648152

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211436017.2A Pending CN115933930A (en) 2022-11-16 2022-11-16 Method, terminal and device for analyzing attention of learning object in education meta universe

Country Status (1)

Country Link
CN (1) CN115933930A (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116909406A (en) * 2023-09-12 2023-10-20 广州乐庚信息科技有限公司 Virtual classroom display method and system based on meta universe
CN116909406B (en) * 2023-09-12 2023-12-26 广州乐庚信息科技有限公司 Virtual classroom display method and system based on meta universe
CN117541723A (en) * 2023-11-19 2024-02-09 广州寰越教育控股集团有限公司 Vocational education tool scene construction method and system based on meta universe
CN117541723B (en) * 2023-11-19 2024-08-30 寰越创新技术(广州)有限公司 Vocational education tool scene construction method and system based on meta universe
CN117711234A (en) * 2024-02-01 2024-03-15 南昌菱形信息技术有限公司 Vocational education practical training system and method based on meta-universe technology
CN117711234B (en) * 2024-02-01 2024-04-19 南昌菱形信息技术有限公司 Vocational education practical training system and method based on meta-universe technology
CN118133837A (en) * 2024-02-06 2024-06-04 佛山职业技术学院 Thinking course learning method and related device based on education universe
CN118133837B (en) * 2024-02-06 2024-07-19 佛山职业技术学院 Thinking course learning method and related device based on education universe

Similar Documents

Publication Publication Date Title
US11748934B2 (en) Three-dimensional expression base generation method and apparatus, speech interaction method and apparatus, and medium
CN115933930A (en) Method, terminal and device for analyzing attention of learning object in education meta universe
CN110807451B (en) Face key point detection method, device, equipment and storage medium
CN109284737A (en) A kind of students ' behavior analysis and identifying system for wisdom classroom
CN106355153A (en) Virtual object display method, device and system based on augmented reality
CN111240476B (en) Interaction method and device based on augmented reality, storage medium and computer equipment
CN106650619A (en) Human action recognition method
JPH08322033A (en) Method for creating color table in computer unit in order to classify pixel in image
CN104537705A (en) Augmented reality based mobile platform three-dimensional biomolecule display system and method
CN112309215A (en) Demonstration system for clinical medicine internal medicine teaching and control method thereof
US11144991B2 (en) Cognitive assessment system
CN111209811A (en) Method and system for detecting eyeball attention position in real time
CN111062284B (en) Visual understanding and diagnosis method for interactive video abstract model
CN117611774A (en) Multimedia display system and method based on augmented reality technology
Sundstedt et al. A systematic review of visualization techniques and analysis tools for eye-tracking in 3D environments
CN112954313A (en) Method for calculating perception quality of panoramic image
Khan et al. A review of benchmark datasets and training loss functions in neural depth estimation
CN115188051A (en) Object behavior-based online course recommendation method and system
WO2017147826A1 (en) Image processing method for use in smart device, and device
CN113673567A (en) Panorama emotion recognition method and system based on multi-angle subregion self-adaption
Jogeshwar et al. GazeEnViz4D: 4-D gaze-in-environment visualization pipeline
Chunhon et al. Application of virtual reality technology in second language classroom teaching in colleges and universities
CN111105651A (en) AR-based waste classification teaching method and system
CN112802083B (en) Method for acquiring corresponding two-dimensional image through three-dimensional model mark points
CN116246043B (en) Method, device, equipment and storage medium for presenting augmented reality audiovisual content

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination