CN113221007A - Method for recommending answering behavior - Google Patents

Method for recommending answering behavior

Info

Publication number: CN113221007A (granted as CN113221007B)
Application number: CN202110563070.8A
Authority: CN (China)
Other languages: Chinese (zh)
Prior art keywords: target, cognitive state, model, matrix, value
Inventors: 刘菲, 卜晨阳, 孙帅, 胡学钢
Current/Original Assignee: Hefei University of Technology
Application filed by Hefei University of Technology
Legal status: Active (Granted)

Classifications

    • G06F16/9535 Search customisation based on user profiles and personalisation
    • G06F16/958 Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking
    • G06Q50/205 Education administration or guidance
    • G09B7/04 Electrically-operated teaching apparatus or devices working with questions and answers of the type wherein the student is expected to construct an answer to the question which is presented or wherein the machine gives an answer to the question presented by a student, characterised by modifying the teaching programme in response to a wrong answer, e.g. repeating the question, supplying a further explanation

Abstract

The application discloses a method for recommending answering behavior. The method comprises: acquiring the scores obtained by a target object when answering, at different times, exercises associated with a target knowledge point; inputting the scores into a cognitive tracking model to obtain the target object's cognitive state values for the target knowledge point at those times; and inputting the cognitive state value corresponding to the current decision time into a target model corresponding to the target knowledge point to obtain a recommended number of times the target object should continue answering the target test questions, where the current decision time is the starting moment of answering and the target model comprises an answering-behavior reinforcement learning (EBQ) model. This solves the technical problem that the related art does not track students' dynamic cognitive states or, based on those states, recommend whether a student needs to practice related exercises and how many times.

Description

Method for recommending answering behavior
Technical Field
The application relates to the field of exercise recommendation, in particular to a method for recommending answer behaviors.
Background
With the continuing deepening of education informatization and the rapid development of the internet, online education has become an important new research and application direction arising from the integration of computing with the traditional education field.
Existing recommendation systems are limited to recommending content-related exercises to student users. However, since a student's cognitive state changes dynamically, recommending answering behavior (whether to practice related exercises, and how many) has research significance and application value. Especially given that most students currently adopt "question sea" tactics, efficiently choosing the correct answering behavior at different stages helps improve learning efficiency. The related art contains no research on tracking a student's dynamic cognitive state and, on that basis, recommending whether the student needs to practice related exercises and how many times.
In view of the above problems, no effective solution has been proposed.
Disclosure of Invention
The embodiments of the present application provide a method for recommending answering behavior, to at least solve the technical problem that the related art does not track students' dynamic cognitive states or, based on those states, recommend whether a student needs to practice related exercises and how many times.
According to one aspect of the embodiments of the present application, a method for recommending answering behavior is provided, including: acquiring the scores obtained by a target object when answering, at different times, exercises associated with a target knowledge point; inputting the scores into a cognitive tracking model to obtain the target object's cognitive state values for the target knowledge point at those times; and inputting the cognitive state value corresponding to the current decision time into a target model corresponding to the target knowledge point to obtain a recommended number of times the target object should continue answering target test questions, where the current decision time is the starting moment of answering and the target model comprises an answering-behavior reinforcement learning (EBQ) model. The EBQ model represents the relationship between at least one cognitive state value and the recommended number of times to continue answering exercises related to the target knowledge point; the target test questions are a plurality of different exercises associated with the target knowledge point.
Optionally, the EBQ model includes a Q matrix in the reinforcement learning algorithm, where the number of cognitive state values corresponds to the number of rows of the Q matrix and the number of recommendable answer counts corresponds to the number of columns. Obtaining the recommended number of times the target object should continue answering exercises related to the target knowledge point includes: determining the current cognitive state value corresponding to the current decision time from the cognitive tracking model; determining the row of the Q matrix corresponding to the current cognitive state value and finding the maximum value in that row; and determining the column of the Q matrix in which that maximum value lies, and taking the answer count corresponding to that column as the recommended number of times the target object should continue answering exercises related to the target knowledge point.
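As an illustration of the lookup just described, the following sketch assumes 0-indexed NumPy conventions and takes the row index as the cognitive state value multiplied by 10^C, matching the worked example later in this description; it is a sketch under those assumptions, not the patent's implementation:

```python
import numpy as np

def recommend_count(Q: np.ndarray, state: float, C: int = 1) -> int:
    """Map a cognitive state value to its Q-matrix row, then return the
    index of the column holding the row's maximum; that column index is
    the recommended number of further exercises to answer."""
    row = min(int(round(state * 10 ** C)), Q.shape[0] - 1)  # e.g. 0.5 -> row 5
    return int(np.argmax(Q[row]))
```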
Optionally, the Q matrix is determined by: acquiring a zero matrix, i.e., a matrix whose elements are all zero; and updating the zero matrix, based at least on the cognitive tracking model and the reward model corresponding to the EBQ model, to obtain the Q matrix.
Optionally, acquiring the zero matrix includes: determining a dynamic cognitive state set and an action space set, where the dynamic cognitive state set includes a plurality of initial cognitive state values, and the action space set includes a plurality of initial action-count values, each indicating a number of times the target object continues answering test questions related to the target knowledge point; and constructing the zero matrix by taking the number of initial cognitive state values as its number of rows and the number of action-count values as its number of columns.
Optionally, updating the zero matrix, based at least on the cognitive tracking model and the reward model corresponding to the EBQ model, to obtain the Q matrix includes: inputting sample scores corresponding to a plurality of sample objects into the cognitive tracking model and determining the sample cognitive state values of the sample objects at different times, where the sample cognitive state values include a first cognitive state value corresponding to a first decision time and a second cognitive state value corresponding to a second decision time, the first decision time being the decision time immediately preceding the second; and determining the reward model corresponding to the EBQ model, obtaining the potential energy difference corresponding to the reward model, and updating the zero matrix based on the potential energy difference and the first and second cognitive state values to obtain the Q matrix.
Optionally, obtaining the potential energy difference corresponding to the reward model and updating the zero matrix based on the potential energy difference and the first and second cognitive state values to obtain the Q matrix includes: determining the target row corresponding to the first cognitive state value in the zero matrix; randomly selecting a target value from the target row and determining the action-count value corresponding to the column in which it lies; taking that column as the target column; taking the product of a predetermined discount factor and the second cognitive state value, and taking the difference between that product and the first cognitive state value as the potential energy difference; replacing the element at the target row and target column of the zero matrix with the potential energy difference to obtain an initial Q matrix; and, when the function expression corresponding to the reward model attains its maximum, replacing the potential energy differences in the initial Q matrix with the corresponding values from the resulting set of potential energy differences, and taking the replaced initial Q matrix as the Q matrix.
Optionally, after constructing the zero matrix by taking the number of initial cognitive state values as its number of rows and the number of action-count values as its number of columns, the method further includes: forming an array from the elements of each row of the zero matrix and taking an initial cognitive state value as the reference value of that array, with reference values corresponding one-to-one to the arrays; the initial action-count values are determined from the columns of the zero matrix.
Optionally, determining an initial action-count value from a column of the zero matrix includes: subtracting a predetermined value from the column's index to obtain the initial action-count value, where the predetermined value is an integer.
Optionally, determining the target row corresponding to the first cognitive state value in the zero matrix includes: comparing the first cognitive state value with the reference values; if the first cognitive state value equals a reference value, taking that reference value as the target reference value and the row where it lies as the target row; and if the first cognitive state value equals no reference value, taking as the target reference value the reference value whose absolute difference from the first cognitive state value is smallest, and taking the row where it lies as the target row.
According to another aspect of the embodiments of the present application, there is also provided an answering behavior recommendation device, including: an acquisition module, configured to acquire the scores obtained by a target object when answering, at different times, exercises associated with a target knowledge point; a first determination module, configured to input the scores into a cognitive tracking model to obtain the target object's cognitive state values for the target knowledge point at those times; and a second determination module, configured to input the cognitive state value corresponding to the current decision time into a target model corresponding to the target knowledge point to obtain a recommended number of times the target object should continue answering target test questions, where the current decision time is the starting moment of answering, the target model comprises an answering-behavior reinforcement learning (EBQ) model representing the relationship between at least one cognitive state value and the recommended number of times to continue answering exercises related to the target knowledge point, and the target test questions are a plurality of different exercises associated with the target knowledge point.
According to another aspect of the embodiments of the present application, there is also provided a non-volatile storage medium comprising a stored program, wherein, when the program runs, it controls a device in which the non-volatile storage medium is located to execute any one of the above methods for recommending answering behavior.
According to another aspect of the embodiments of the present application, there is also provided a processor configured to run a program, wherein, when running, the program executes any one of the above methods for recommending answering behavior.
In the embodiments of the present application, the number of times the target object should continue answering target test questions is recommended based on an EBQ model: the scores obtained by the target object when answering, at different times, exercises associated with a target knowledge point are acquired; the scores are input into a cognitive tracking model to obtain the target object's cognitive state values for the target knowledge point at those times; and the cognitive state value corresponding to the current decision time is input into a target model corresponding to the target knowledge point to obtain the recommended number of times, where the current decision time is the starting moment of answering and the target model comprises an answering-behavior reinforcement learning (EBQ) model representing the relationship between at least one cognitive state value and the recommended number of times to continue answering exercises related to the target knowledge point, the target test questions being a plurality of different exercises associated with the target knowledge point. Constructing the answering-behavior reinforcement learning EBQ model on the basis of the target object's dynamic cognitive state achieves the technical effect of recommending how many more target test questions the target object should answer, thereby solving the technical problem that the related art does not track students' dynamic cognitive states or, based on those states, recommend whether a student needs to practice related exercises and how many times.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application. In the drawings:
fig. 1 is a flowchart illustrating an alternative method for recommending answer behavior according to an embodiment of the present application;
fig. 2 is a schematic structural diagram of an alternative answering behavior recommendation device according to an embodiment of the present application.
Detailed Description
In order to make the technical solutions better understood by those skilled in the art, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only partial embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
It should be noted that the terms "first," "second," and the like in the description and claims of this application and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the application described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
To facilitate a better understanding of the embodiments related to the present application, technical terms or partial terms that may be related to the embodiments of the present application are explained as follows:
The simulated annealing algorithm is a probability-based algorithm modeled on the solid annealing principle: a solid is heated to a sufficiently high temperature and then cooled slowly. During heating, the particles inside the solid become disordered as the temperature rises and the internal energy increases; during slow cooling, the particles gradually become ordered, reaching an equilibrium state at each temperature, until finally, at the ground state at normal temperature, the internal energy is reduced to a minimum. The earliest idea of the Simulated Annealing (SA) algorithm was proposed by N. Metropolis et al. in 1953, and in 1983 S. Kirkpatrick et al. successfully introduced the annealing concept into the field of combinatorial optimization. SA is a stochastic optimization algorithm based on a Monte Carlo iterative solution strategy, whose starting point is the similarity between the physical annealing of solid matter and general combinatorial optimization problems. The algorithm starts from some high initial temperature and, as the temperature parameter decreases continuously, randomly searches the solution space for the global optimum of the objective function by exploiting its probabilistic jump property; that is, it can probabilistically jump out of local optima and ultimately tend toward the global optimum. Simulated annealing is a general optimization algorithm with, in theory, probabilistic global optimization performance, and it is now widely applied in engineering, for example in VLSI design, production scheduling, control engineering, machine learning, neural networks, and signal processing. It is a serial-structure optimization algorithm that, by endowing the search process with a time-varying jump probability that ultimately tends to zero, can effectively avoid becoming trapped in local minima and ultimately tends toward the global optimum.
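Because the method described below repeatedly uses simulated annealing to select an action from a row of a Q matrix, a minimal illustrative sketch of such a selection is given here. The patent names simulated annealing only as one possible action-selection method; the proposal count, cooling rate, and Metropolis acceptance rule below are assumed choices, not the patent's implementation:

```python
import math
import random

def sa_select_action(q_row, temperature: float = 1.0) -> int:
    """Metropolis-style probabilistic selection of a column index from one
    row of the Q matrix: always move to a better-valued action, accept a
    worse one with probability exp(delta / T), and cool T toward zero so
    the choice concentrates on the best action."""
    current = random.randrange(len(q_row))
    for _ in range(100):                        # assumed number of proposals
        candidate = random.randrange(len(q_row))
        delta = q_row[candidate] - q_row[current]
        if delta >= 0 or random.random() < math.exp(delta / temperature):
            current = candidate
        temperature *= 0.95                     # assumed cooling schedule
    return current
```

At high temperature the selection is nearly uniform, which preserves the exploration the patent wants when a Q row is still all zeros; as the temperature cools, the selection approaches a greedy argmax over the row.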
In accordance with an embodiment of the present application, an embodiment of a method for recommending answering behavior is provided. It should be noted that the steps illustrated in the flowchart of the drawings may be performed in a computer system, such as one executing a set of computer-executable instructions, and that, although a logical order is illustrated in the flowchart, in some cases the steps illustrated or described may be performed in an order different from the one here.
Fig. 1 is a flowchart of a method for recommending answering behavior according to an embodiment of the present application; as shown in fig. 1, the method includes the following steps:
Step S102: acquiring the scores obtained by the target object when answering, at different times, exercises associated with the target knowledge point;
Step S104: inputting the scores into a cognitive tracking model to obtain the target object's cognitive state values for the target knowledge point at those times;
Step S106: inputting the cognitive state value corresponding to the current decision time into a target model corresponding to the target knowledge point to obtain a recommended number of times the target object should continue answering target test questions, where the current decision time is the starting moment of answering and the target model comprises an answering-behavior reinforcement learning (EBQ) model representing the relationship between at least one cognitive state value and the recommended number of times to continue answering exercises related to the target knowledge point, the target test questions being a plurality of different exercises associated with the target knowledge point.
In this method for recommending answering behavior, the scores obtained by the target object when answering, at different times, exercises associated with the target knowledge point are acquired; the scores are input into a cognitive tracking model to obtain the target object's cognitive state values for the target knowledge point at those times; and the cognitive state value corresponding to the current decision time is input into a target model corresponding to the target knowledge point to obtain the recommended number of times the target object should continue answering target test questions, where the current decision time is the starting moment of answering and the target model comprises an answering-behavior reinforcement learning (EBQ) model representing the relationship between at least one cognitive state value and the recommended number of times to continue answering exercises related to the target knowledge point, the target test questions being a plurality of different exercises associated with the target knowledge point. Constructing the EBQ model on the basis of the target object's dynamic cognitive state achieves the technical effect of recommending how many more target test questions the target object should answer, thereby solving the technical problem that the related art does not track students' dynamic cognitive states or, based on those states, recommend whether the target object needs to practice related exercises and how many times.
The cognitive state value indicates the degree of grasp of the target knowledge point by the target object, wherein the larger the cognitive state value, the higher the degree of grasp of the target knowledge point by the target object (i.e., the better the grasp), and the smaller the cognitive state value, the lower the degree of grasp of the target knowledge point by the target object (i.e., the worse the grasp).
The description of the alternative embodiments related to the present application will now be given with reference to specific application steps:
1) Initialize the parameters of the reinforcement learning model EBQ (explicit latent & Q-learning), assuming an online education platform on which each student answers one question at each time step;
2) construct the answering-behavior reinforcement learning model EBQ oriented to students' dynamic cognitive states;
3) train the EBQ model;
4) recommend answering behavior oriented to the student's dynamic cognitive state.
The method comprises the following specific steps:
step 1: the reinforcement learning model EBQ parameters are initialized.
Step 1.1: accuracy of initializing cognitive state ∈ 10-CAnd C is a natural number greater than 1. There are C cognitive states in total.
Step 1.2: and initializing the answering behavior in M.
Step 1.3: the initialization decision time P is 1(P ∈ {1, 2.., P }).
Step 2: and constructing an answering behavior reinforcement learning model EBQ facing the dynamic cognitive state of the student.
Step 2.1: constructing EBQ state space S ═<Sk>,
Figure BDA0003078861970000061
Where S is the representation EBQ state space, SkRepresenting the dynamic cognitive state of the student.
Step 2.2: motion space a of construction EBQ<Am>. Where A is a doublet representing EBQ action space. A. themThe question is recommended to be answered M times continuously, wherein {0, 1, 2,. and M } (M belongs to {0,. and M }); in particular, when m is 0, it indicates that no question is recommended to be answered.
Step 2.3: the matrix Q in the model of the initialization EBQ is a zero matrix of size C x M.
Step 2.4: constructing EBQ reward model rp=ρ(sp,ap,sp+1). Wherein s ispState representing the pth decision time; a ispAn act of representing a pth decision time; sp+1The state of the p +1 th decision time is shown, and the p th decision time is in spAfter the state executes the ap action, the state is transferred to the state of the (p + 1) th decision time, rho(s)p,ap,sp+1) Is a reward function.
Figure BDA0003078861970000062
Figure BDA0003078861970000063
When p is t, apWhen m, p +1 is t + m
Figure BDA0003078861970000064
Figure BDA0003078861970000065
Figure BDA0003078861970000066
If γ is 1, then
Figure BDA0003078861970000071
When a ispWhen m is 0, i.e. the action is not to continue answering, then(s) is performedp,ap) Prize of ρ(s)p,ap,sp+1)=e0=1。
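For illustration, the reward above can be sketched as follows, with potential function Φ(s) = s as in the formulas just given; the function name and signature are assumptions, and the values match the worked example later in this description (0.2 - 0.5 + e^(-2) ≈ -0.165):

```python
import math

def reward(s_p: float, s_next: float, m: int, gamma: float = 1.0) -> float:
    """Discounted potential difference between the cognitive states before
    and after answering, plus the term e^(-m); for m = 0 with gamma = 1
    the state is unchanged, so the reward reduces to e^0 = 1."""
    return gamma * s_next - s_p + math.exp(-m)
```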
Step 2.5: a migration model of EBQ is constructed. At the p-th decision time in state spPerforming action apIn the case of (2), the state is shifted to sp+1When p is t, p +1 is t + m, which is obtained from the cognitive tracking model
Figure BDA0003078861970000072
Is i.e. sp+1
Step 2.6: the objective function of setting EBQ is:
Figure BDA0003078861970000073
and step 3: EBQ model training.
Step 3.1: and when the knowledge point I is equal to 1, circularly executing the operation 3.2-3.10.
Step 3.2: and when N is less than or equal to N, circularly executing the operation 3.3-3.9.
Step 3.3: the problem that student N (N ∈ {1, 2.,. N }) answers is associated with knowledge point I (I ∈ {1, 2.,. I }) is scored as
Figure BDA0003078861970000074
The dynamic cognitive state of the student n on the knowledge point i at time steps 1to t is obtained by taking the dynamic cognitive state as the input of cognitive tracking
Figure BDA0003078861970000075
Step 3.4: and when the decision time P is equal to 1 and P is less than or equal to P, circularly executing the operations 3.5-3.9.
Step 3.5: will state
Figure BDA0003078861970000076
Probability selection (action selection can be performed by adopting methods such as simulated annealing and the like) Q [ sp,:]Medium optimal action ap
Step 3.6: calculation by cognitive tracking model
Figure BDA0003078861970000077
Step 3.7: r is calculated according to equation (4)p=ρ(sp,ap,sp+1)。
Step 3.8: updating the matrix Q: q [ s ]p*10C,ap]=rp
Step 3.9: p is p + 1; t is t + m.
Step 3.10: EBQ model EBQ returning knowledge points ii=Q。
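Pulling steps 3.2 to 3.10 together, a condensed sketch of the training loop for one knowledge point follows. It reuses the reward and sa_select_action sketches above; knowledge_tracer is an assumed callable standing in for the cognitive tracking model, and the row-index rounding and the break when the selected count runs past the observed sequence are assumptions the patent does not spell out:

```python
import numpy as np

def train_ebq(scores_by_student, knowledge_tracer, C=1, M=6, P=3, gamma=1.0):
    """Train EBQ_i = Q for a single knowledge point i (steps 3.2 to 3.10)."""
    Q = np.zeros((10 ** C, M))                     # step 2.3: zero matrix
    for scores in scores_by_student:               # step 3.2: each student n
        states = knowledge_tracer(scores)          # step 3.3: (k_1, ..., k_t)
        t = 0                                      # 0-indexed time step
        for _ in range(P):                         # step 3.4: decision times
            s_p = states[t]
            row = min(int(round(s_p * 10 ** C)), 10 ** C - 1)
            m = sa_select_action(Q[row])           # step 3.5: pick action
            if t + m >= len(states):               # ran past observed steps
                break
            s_next = states[t + m]                 # step 3.6: transition
            Q[row, m] = reward(s_p, s_next, m, gamma)  # steps 3.7 and 3.8
            t += m                                 # step 3.9
    return Q                                       # step 3.10: EBQ_i = Q
```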
Step 4: recommend answering behavior oriented to the student's dynamic cognitive state.
Step 4.1: at time steps 1to t, student N (N ∈ {1, 2., N }) answers the problem associated with knowledge point I (I ∈ {1, 2., I }) with a score of
Figure BDA0003078861970000078
Will be provided with
Figure BDA0003078861970000079
As the input of cognitive tracking, the dynamic cognitive state of the student n on the knowledge point i at the time steps from 1to t can be obtained
Figure BDA00030788619700000710
Step 4.2: will state
Figure BDA00030788619700000711
Inputting the probability to EBQ model, selecting probability (selecting action by simulation annealing method) EBQi[sp*10C,:]Medium optimal action ap,apNamely the answer behavior recommended by the user.
That is, the reinforcement learning model EBQ (explicit latent & Q-learning) is first parameter-initialized; the answering-behavior reinforcement learning model EBQ oriented to the student's dynamic cognitive state is then constructed; the EBQ model is trained; and finally, answering behavior is recommended for the student's dynamic cognitive state.
In order to facilitate the understanding of the related embodiments by those skilled in the art, the related steps are illustrated.
The method comprises the following specific steps:
step 1: the reinforcement learning model EBQ parameters are initialized.
Step 1.1: accuracy of initializing cognitive state ∈ 10-CAnd C is a natural number. It has a total of 10cAnd (4) a cognitive state.
When C is initialized to 1, then e is 0.1.
Step 1.2: and initializing M answering behaviors.
The initialization M is 6.
Step 1.3: the initialization decision time P is 1(P ∈ {1, 2.., P }).
The initialization P is 1 and P is 3.
Step 2: and constructing an answering behavior reinforcement learning model EBQ facing the dynamic cognitive state of the student.
Step 2.1: constructing EBQ state space S ═<Sk>,
Figure BDA0003078861970000081
Where S is the representation EBQ state space, SkRepresenting the dynamic cognitive state of the student.
S=<Sk>,Sk={0,0.1,...,0.9,1};
Step 2.2: motion space a of construction EBQ<Am>。Am1,. M } represents that the recommendation continues to answer the question M times; in particular, when m is 0, it indicates that no question is recommended to be answered.
A=<Am>,Am={0,1,2,3,4,5}
Step 2.3: initializing EBQ the matrix Q in the model to 10CZero matrix of size M.
The initialization matrix Q is a zero matrix of 10 × 6:
Figure BDA0003078861970000091
the rows represent the meanings: taking a value of a cognitive state; the column meanings: answering a plurality of questions;
step 2.4: constructing EBQ reward model rp=ρ(sp,ap,sp+1). Wherein s ispState representing the pth decision time; a ispAn act of representing a pth decision time; sp+1The state of the p +1 th decision time is shown, and the p th decision time is in spState execution apTransition to state at decision time p +1, ρ(s)p,ap,sp+1) Is a reward function.
Figure BDA0003078861970000092
Figure BDA0003078861970000093
When p is t, apWhen m, p +1 is t + m
Figure BDA0003078861970000094
Figure BDA0003078861970000095
Figure BDA0003078861970000096
If γ is 1, then
Figure BDA0003078861970000097
It should be noted that, in the above formula
Figure BDA0003078861970000098
The potential energy function is expressed, gamma represents a discount factor of the potential energy which does not come, e is a natural base number, time steps are different time, and obviously, when gamma is 1, the discount is not realized.
When a ispWhen m is 0, i.e. the action is not to continue answering, then(s) is performedp,ap) Prize of ρ(s)p,ap,sp+1)=e0=1。
Step 2.5: a migration model of EBQ is constructed. At the p-th decision time in state spPerforming action apIn the case of (2), the state is shifted to sp+1. When p is t, p +1 is t + m. From cognitive tracking models
Figure BDA0003078861970000099
Is i.e. sp+1
Step 2.6: the objective function of setting EBQ is:
Figure BDA00030788619700000910
so that f (x) obtains the variable point x (or the set of x) corresponding to the maximum value, corresponding to the model parameter in step 3;
and step 3: EBQ, training the model, wherein the main purpose of the step is to update the matrix Q and find an optimal matrix Q so that the matrix Q can satisfy the maximum objective function.
Step 3.1: and when the knowledge point I is equal to 1, circularly executing the operation 3.2-3.10.
I is 1to I, and I is the number of knowledge points;
step 3.2: and when N is less than or equal to N, circularly executing the operation 3.3-3.9.
n=1to N;
Step 3.3: the problem that student N (N ∈ {1, 2.,. N }) answers is associated with knowledge point I (I ∈ {1, 2.,. I }) is scored as
Figure BDA0003078861970000101
The dynamic cognitive state of the student n on the knowledge point i at time steps 1to t is obtained by taking the dynamic cognitive state as the input of cognitive tracking
Figure BDA0003078861970000102
T ═ 10, that is, student 1 answers the problem associated with knowledge point 1 at these 10 time steps, and its score is expressed as (1, 0, 0, 0, 1, 1, 0, 1, 1, 1,), and using it as an input for cognitive tracking, the cognitive state of student 1 at time steps 1to 10 can be obtained (0.5, 0.3, 0.2, 0.1, 0.5, 0.4, 0.5, 0.5, 0.6, 0.7);
step 3.4: and when the decision time P is equal to 1 and P is less than or equal to P, circularly executing the operations 3.5-3.9.
p=1to P
Step 3.5: will state
Figure BDA0003078861970000103
Probability selection (action selection can be carried out by adopting methods such as simulated annealing and the like to avoid local optimization) Q [ sp*10C,:]Medium optimal action ap
Figure BDA0003078861970000104
That is, the cognitive state value of the first student (n ═ 1) at the first time step (t ═ 1);
look up matrix Q [5 ]:]=[0 0 0 0 0 0]all values ofIn the case of, for example, randomly selecting an action, ap2 ═ m; calculating a cognitive state value of 0.5 by using a cognitive model;
step 3.6: calculation by cognitive tracking model
Figure BDA0003078861970000105
Calculated according to a cognitive tracking model (0.5, 0.3, 0.2, 0.1, 0.5, 0.4, 0.5, 0.5, 0.6, 0.7),
Figure BDA0003078861970000106
two questions are given for p ═ 1 and m ═ 2, then t + m equals 3.
Step 3.7: r is calculated according to equation (5)p=ρ(sp,ap,sp+1)。
Figure BDA0003078861970000107
Let γ equal to 1, ρ(s)p,ap,sp+1)=0.2-0.5+e-2=-0.165
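The arithmetic can be checked directly (e^(-2) ≈ 0.135):

```python
import math

gamma, s_p, s_next, m = 1.0, 0.5, 0.2, 2
r_p = gamma * s_next - s_p + math.exp(-m)
print(round(r_p, 3))   # -0.165, the reward value computed above
```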
Step 3.8: updating the matrix Q: q [ s ]p*10C,ap]=rp
Updating the matrix Q
Figure BDA0003078861970000111
That is, 0.5 corresponds to the 5 th row, and m-2 corresponds to the 3 rd column
Step 3.9: p is p + 1; t is t + m.
Step 3.10: return the EBQ model of knowledge point i: EBQ_i = Q.
The matrix Q is updated iteratively over the loops above, finally yielding the trained model EBQ_i = Q [full matrix image not recoverable from the source; its first row, looked up in step 4.2 below, is [0.35 0.28 -0.55 2.25 3.3 -9.8]].
Step 4: recommend answering behavior oriented to the student's dynamic cognitive state.
Step 4.1: at time steps 1 to t, student n (n ∈ {1, 2, ..., N}) answers the exercises associated with knowledge point i (i ∈ {1, 2, ..., I}) with scores (x_1^{n,i}, ..., x_t^{n,i}); taking them as the input of cognitive tracking yields the dynamic cognitive states of student n on knowledge point i at time steps 1 to t, (k_1^{n,i}, ..., k_t^{n,i}).
Here student 1 answers the exercises associated with knowledge point 1 with scores (0, 1, 0, 0); using these as the input of cognitive tracking yields the cognitive states (0, 0.3, 0.2, 0.1) of student 1 at time steps 1 to 4.
Step 4.2: will state
Figure BDA0003078861970000116
Inputting the probability to EBQ model, selecting probability (selecting action by simulation annealing method) EBQi[sp*10C,:]Medium optimal action ap。apNamely the answer behavior recommended by the user.
Figure BDA0003078861970000121
Look up matrix Q [1 ]:]=[0.35 0.28 -0.55 2.25 3.3 -9.8]select the optimal action, i.e. a1M 4; namely, the user is recommended to practice 4 exercises related to the knowledge point under the condition that the cognitive state of the user is 0.1.
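The step-4.2 lookup can be written as a greedy selection over the row given above (the patent selects probabilistically, e.g., via simulated annealing; greedy argmax is the limiting choice once the temperature has cooled):

```python
import numpy as np

q_row = np.array([0.35, 0.28, -0.55, 2.25, 3.3, -9.8])  # Q[1, :] from above
m = int(np.argmax(q_row))  # column index of the maximum value 3.3
print(m)                   # 4: recommend practicing 4 more exercises
```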
In some embodiments of the present application, the EBQ model includes a Q matrix in the reinforcement learning algorithm. It should be noted that the number of cognitive state values corresponds to the number of rows of the Q matrix and the number of recommendable answer counts corresponds to the number of columns. Obtaining the recommended number of times the target object should continue answering exercises related to the target knowledge point includes: determining the current cognitive state value corresponding to the current decision time from the cognitive tracking model; determining the row of the Q matrix corresponding to the current cognitive state value and finding the maximum value in that row; and determining the column of the Q matrix in which that maximum value lies, and taking the answer count corresponding to that column as the recommended number of times.
In some optional embodiments of the present application, the Q matrix is determined by: acquiring a zero matrix, i.e., a matrix whose elements are all zero; and updating the zero matrix, based at least on the cognitive tracking model and the reward model corresponding to the EBQ model, to obtain the Q matrix. It is easy to note that this corresponds to the EBQ model training process of step 3 above.
In some embodiments of the present application, acquiring the zero matrix includes: determining a dynamic cognitive state set and an action space set, where the dynamic cognitive state set includes a plurality of initial cognitive state values and the action space set includes a plurality of initial action-count values, each indicating a number of times the target object continues answering test questions related to the target knowledge point; and constructing the zero matrix by taking the number of initial cognitive state values as its number of rows and the number of action-count values as its number of columns. This corresponds to steps 2.1 to 2.3 above.
Optionally, updating the zero matrix, based at least on the cognitive tracking model and the reward model corresponding to the EBQ model, to obtain the Q matrix includes: inputting sample scores corresponding to a plurality of sample objects into the cognitive tracking model and determining the sample cognitive state values of the sample objects at different times, where the sample cognitive state values include a first cognitive state value corresponding to a first decision time and a second cognitive state value corresponding to a second decision time, the first decision time being the decision time immediately preceding the second; and determining the reward model corresponding to the EBQ model (i.e., r_p), obtaining the potential energy difference corresponding to the reward model (i.e., the term within the reward function ρ(s_p, a_p, s_{p+1})), and updating the zero matrix based on the potential energy difference and the first and second cognitive state values to obtain the Q matrix. This corresponds to steps 2.4 to 2.6 above.
Specifically, obtaining the potential energy difference corresponding to the reward model and updating the zero matrix based on the potential energy difference and the first and second cognitive state values to obtain the Q matrix is implemented as follows: determine the target row corresponding to the first cognitive state value in the zero matrix; randomly select a target value from the target row and determine the action-count value corresponding to the column in which it lies; take that column as the target column; take the product of the predetermined discount factor and the second cognitive state value, and take the difference between that product and the first cognitive state value as the potential energy difference; replace the element at the target row and target column of the zero matrix with the potential energy difference to obtain an initial Q matrix; and, when the function expression corresponding to the reward model attains its maximum, replace the potential energy differences in the initial Q matrix with the corresponding values from the resulting set of potential energy differences, taking the replaced initial Q matrix as the Q matrix.
It should be noted that, after the zero matrix is constructed by taking the number of initial cognitive state values as its number of rows and the number of action-count values as its number of columns, the elements of each row of the zero matrix may be formed into an array, with an initial cognitive state value serving as the reference value of that array; the reference values correspond one-to-one to the arrays, and the initial action-count values are determined from the columns of the zero matrix.
In some embodiments of the present application, an initial action-count value is determined from a column of the zero matrix by subtracting a predetermined value from the column's index, where the predetermined value is an integer. It should be noted that, in a preferred embodiment, the predetermined value is the integer 1, so the initial action-count value corresponding to column j of the matrix is j - 1; for example, column 5 corresponds to an action count of 4.
In some optional embodiments of the present application, the target row corresponding to the first cognitive state value in the zero matrix may be determined as follows: compare the first cognitive state value with the reference values; if the first cognitive state value equals a reference value, take that reference value as the target reference value and the row where it lies as the target row; if it equals no reference value, take as the target reference value the reference value whose absolute difference from the first cognitive state value is smallest, and take the row where it lies as the target row. For example, if the first cognitive state value is 0.1 and the reference value of the array formed by the first-row elements is 0.1, the target row is row 1. If instead the first cognitive state value is 0.26, the reference value of the second-row array is 0.2, giving an absolute difference of 0.06, while the reference value of the third-row array is 0.3, giving an absolute difference of 0.04; the target row is therefore row 3.
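A minimal sketch of this nearest-reference-value lookup, using the 0.26 example above; rows are 0-indexed in the code and converted to the patent's 1-indexed convention when printed:

```python
def target_row(state: float, reference_values) -> int:
    """Return the index of the reference value closest in absolute
    difference to the given cognitive state value; an exact match falls
    out of the same rule with a difference of zero."""
    return min(range(len(reference_values)),
               key=lambda j: abs(reference_values[j] - state))

rows = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
print(target_row(0.26, rows) + 1)  # 3: |0.26 - 0.3| = 0.04 beats 0.06
```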
It should be noted that the cognitive tracking model may be a Bayesian Knowledge Tracing model.
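For reference, one update step of standard Bayesian Knowledge Tracing is sketched below; the patent only names BKT as a possible cognitive tracking model, and the parameter values and initial mastery here are illustrative, not taken from the patent:

```python
def bkt_update(p_know: float, correct: bool,
               p_transit: float = 0.1, p_slip: float = 0.1,
               p_guess: float = 0.2) -> float:
    """Bayes-update the mastery probability from one observed answer,
    then apply the learning (transit) probability."""
    if correct:
        post = p_know * (1 - p_slip) / (
            p_know * (1 - p_slip) + (1 - p_know) * p_guess)
    else:
        post = p_know * p_slip / (
            p_know * p_slip + (1 - p_know) * (1 - p_guess))
    return post + (1 - post) * p_transit

# Rolling a score sequence through the update yields a cognitive-state
# sequence analogous to (k_1, ..., k_t) used above.
k, states = 0.3, []
for x in (1, 0, 0, 0, 1):
    k = bkt_update(k, bool(x))
    states.append(round(k, 3))
print(states)
```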
Fig. 2 shows an answering behavior recommendation device according to an embodiment of the present application; as shown in fig. 2, the device includes:
the acquisition module 40 is used for acquiring corresponding scores of the target object answering the exercises related to the target knowledge points at different moments;
the first determining module 42 is configured to input the score into the cognitive tracking model, so as to obtain target cognitive state values of the target object to the target knowledge point at different times;
the second determining module 44 is configured to input the target cognitive state value corresponding to the current decision time into a target model corresponding to the target knowledge point, so as to obtain the number of times for recommending the target object to continue to answer the target test question, where the current decision time is a starting time of answering, and the target model includes: the answer behavior reinforcement learning EBQ model, EBQ model is used to represent at least one cognitive state value and recommend a relationship between continued answers and the degree value of the problem associated with the target knowledge point, the target problem being a plurality of different problems associated with the target knowledge point.
In the device for recommending the answering behavior, an obtaining module 40 is used for obtaining corresponding scores of the exercises which are related to the target knowledge points and answered by the target object at different moments; the first determining module 42 is configured to input the score into the cognitive tracking model, so as to obtain target cognitive state values of the target object to the target knowledge point at different times; the second determining module 44 is configured to input the target cognitive state value corresponding to the current decision time into a target model corresponding to the target knowledge point, so as to obtain the number of times for recommending the target object to continue to answer the target test question, where the current decision time is a starting time of answering, and the target model includes: the answer behavior reinforcement learning EBQ model is a model EBQ, and is used for representing at least one cognition state value and recommending the relationship between continuous answer and the times of the questions related to the target knowledge point, the target questions are a plurality of different questions related to the target knowledge point, the dynamic cognition state based on the target object is achieved, the answer behavior reinforcement learning EBQ model is constructed, and the technical effect of recommending the times of continuously answering the target questions by the target object is achieved, so that the technical problems that the dynamic cognition state of students is tracked due to the fact that research is not available in the related technology, whether the target object needs to be trained in related questions or not and how many relevant exercise answer behaviors are recommended based on the dynamic cognition state are solved.
According to another aspect of the embodiments of the present application, there is also provided a non-volatile storage medium comprising a stored program, wherein, when the program runs, it controls a device in which the non-volatile storage medium is located to execute any one of the above methods for recommending answering behavior.
Specifically, the storage medium is used for storing program instructions for executing the following functions, and the following functions are realized:
acquiring the scores obtained by the target object when answering, at different times, exercises associated with the target knowledge point; inputting the scores into a cognitive tracking model to obtain the target object's cognitive state values for the target knowledge point at those times; and inputting the cognitive state value corresponding to the current decision time into a target model corresponding to the target knowledge point to obtain a recommended number of times the target object should continue answering target test questions, where the current decision time is the starting moment of answering and the target model comprises an answering-behavior reinforcement learning (EBQ) model representing the relationship between at least one cognitive state value and the recommended number of times to continue answering exercises related to the target knowledge point, the target test questions being a plurality of different exercises associated with the target knowledge point.
According to another aspect of the embodiments of the present application, there is also provided a processor configured to run a program, wherein, when running, the program executes any one of the above methods for recommending answering behavior.
Specifically, the processor is configured to call a program instruction in the memory, and implement the following functions:
acquiring the scores obtained by the target object when answering, at different times, exercises associated with the target knowledge point; inputting the scores into a cognitive tracking model to obtain the target object's cognitive state values for the target knowledge point at those times; and inputting the cognitive state value corresponding to the current decision time into a target model corresponding to the target knowledge point to obtain a recommended number of times the target object should continue answering target test questions, where the current decision time is the starting moment of answering and the target model comprises an answering-behavior reinforcement learning (EBQ) model representing the relationship between at least one cognitive state value and the recommended number of times to continue answering exercises related to the target knowledge point, the target test questions being a plurality of different exercises associated with the target knowledge point.
The above-mentioned serial numbers of the embodiments of the present application are merely for description and do not represent the merits of the embodiments.
In the above embodiments of the present application, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
In the embodiments provided in the present application, it should be understood that the disclosed technology can be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units may be a logical division, and in actual implementation, there may be another division, for example, multiple units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, units or modules, and may be in an electrical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application may be substantially implemented or contributed to by the prior art, or all or part of the technical solution may be embodied in a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present application. And the aforementioned storage medium includes: a U-disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic or optical disk, and other various media capable of storing program codes.
The foregoing is only a preferred embodiment of the present application and it should be noted that those skilled in the art can make several improvements and modifications without departing from the principle of the present application, and these improvements and modifications should also be considered as the protection scope of the present application.

Claims (12)

1. A method for recommending answering behavior is characterized by comprising the following steps:
acquiring the scores obtained by a target object when answering, at different times, exercises associated with a target knowledge point;
inputting the scores into a cognitive tracking model to obtain the target object's cognitive state values for the target knowledge point at the different times;
inputting the cognitive state value corresponding to the current decision time into a target model corresponding to the target knowledge point to obtain a recommended number of times the target object should continue answering target test questions, wherein the current decision time is the starting moment of answering, and the target model comprises: an answering-behavior reinforcement learning EBQ model, the EBQ model being used to represent the relationship between at least one cognitive state value and the recommended number of times to continue answering exercises related to the target knowledge point, and the target test questions being a plurality of different exercises associated with the target knowledge point.
2. The method of claim 1, wherein the EBQ model comprises a Q matrix in a reinforcement learning algorithm, the cognitive state values correspond to the rows of the Q matrix, and the recommended numbers of times of continuing to answer exercises associated with the target knowledge point correspond to the columns of the Q matrix, and wherein obtaining the number of times the target object is recommended to continue answering exercises associated with the target knowledge point comprises:
determining, according to the cognitive tracking model, the current cognitive state value corresponding to the current decision moment;
determining the row of the Q matrix corresponding to the current cognitive state value, and acquiring the maximum state value in that row;
and determining the column of the Q matrix in which the maximum state value lies, and taking the recommended number of continued answers corresponding to that column as the number of times the target object is to continue answering exercises associated with the target knowledge point.
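Claim 2 reduces the recommendation to a greedy table lookup: locate the Q-matrix row for the current cognitive state, take the maximum entry in that row, and read the recommended answer count off the column of that maximum. A minimal numpy sketch under our own assumptions (rows indexed by discretized state reference values, and the column-to-count rule of claim 8 with a preset of 0):

```python
import numpy as np

def recommend_count(q_matrix, state_refs, state_value, preset=0):
    """Greedy EBQ lookup sketched from claims 2 and 8 (not the patented code).

    q_matrix   : (n_states, n_actions) trained Q matrix
    state_refs : reference cognitive state value attached to each row
    preset     : integer subtracted from the column serial number (claim 8)
    """
    # Row whose reference value is nearest the current cognitive state.
    row = int(np.argmin(np.abs(np.asarray(state_refs) - state_value)))
    # Maximum state value in that row, and the column it sits in.
    col = int(np.argmax(q_matrix[row]))
    # Column serial number minus the preset value gives the answer count.
    return col - preset

q = np.array([[0.1, 0.4, 0.2],
              [0.0, 0.3, 0.6]])
print(recommend_count(q, [0.3, 0.7], state_value=0.65))  # row 1, column 2 -> 2
```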
3. The method of claim 2, wherein the Q matrix is determined by:
acquiring a zero matrix, i.e. a matrix whose elements are all zero;
and updating the zero matrix, based at least on the cognitive tracking model and a reward model corresponding to the EBQ model, to obtain the Q matrix.
4. The method of claim 3, wherein acquiring the zero matrix comprises:
determining a dynamic cognitive state set and an action space set, wherein the dynamic cognitive state set comprises a plurality of initial cognitive state values, and the action space set comprises a plurality of initial action-count values, each indicating a number of times the target object continues to answer exercises associated with the target knowledge point;
and taking the number of initial cognitive state values as the number of rows of the zero matrix and the number of action-count values as the number of its columns, to construct the zero matrix.
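Claims 3 and 4 only set up the table that the later update steps fill in: a zero matrix with one row per initial cognitive state value and one column per action-count value. A one-line construction, with set sizes that are our own assumption:

```python
import numpy as np

# Assumed dynamic cognitive state set and action space set (claim 4).
state_set = np.linspace(0.0, 1.0, 11)  # 11 initial cognitive state values
action_set = np.arange(6)              # continue answering 0..5 exercises

# Rows <- cognitive state values, columns <- action counts (claim 4).
q = np.zeros((len(state_set), len(action_set)))
```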
5. The method of claim 3, wherein updating the zero matrix, based at least on the cognitive tracking model and the reward model corresponding to the EBQ model, to obtain the Q matrix comprises:
inputting sample scores of a plurality of sample objects into the cognitive tracking model, and determining the sample cognitive state values of the plurality of sample objects at different moments, wherein the sample cognitive state values comprise: a first cognitive state value corresponding to a first decision moment and a second cognitive state value corresponding to a second decision moment, the first decision moment being adjacent to, and immediately preceding, the second decision moment;
and determining the reward model corresponding to the EBQ model, acquiring the potential energy difference corresponding to the reward model, and updating the zero matrix based on the potential energy difference, the first cognitive state value, and the second cognitive state value, to obtain the Q matrix.
6. The method of claim 5, wherein acquiring the potential energy difference corresponding to the reward model, and updating the zero matrix based on the potential energy difference, the first cognitive state value, and the second cognitive state value to obtain the Q matrix, comprises:
determining the target row of the zero matrix corresponding to the first cognitive state value;
randomly selecting a target value from the target row, and determining the action-count value corresponding to the column in which the target value lies;
taking the column corresponding to that action-count value as the target column;
determining the product of a predetermined discount factor and the second cognitive state value, and taking the difference between this product and the first cognitive state value as the potential energy difference;
replacing the element at the target row and target column of the zero matrix with the potential energy difference, to obtain an initial Q matrix;
and, when the function expression corresponding to the reward model is determined to attain its maximum value, replacing each potential energy difference in the initial Q matrix with the corresponding potential energy difference in the corresponding potential energy difference set, and taking the initial Q matrix after replacement as the Q matrix.
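Claims 5 and 6 describe how the zero matrix is filled from sample trajectories: for two cognitive state values s1 and s2 at adjacent decision moments, the cell written in s1's row receives the potential energy difference gamma * s2 - s1, the familiar shaping term of potential-based reward shaping. The sketch below shows one simplified update pass under our own assumptions (gamma = 0.9, nearest-row matching per claim 9); the reward-maximizing selection among candidate potential differences in claim 6 is abstracted away:

```python
import numpy as np

GAMMA = 0.9  # assumed value of the predetermined discount factor (claim 6)

def nearest_row(state_refs, value):
    """Claim 9: the row whose reference value is closest to `value`."""
    return int(np.argmin(np.abs(state_refs - value)))

def update_pass(q, state_refs, sample_states, rng):
    """One pass over one sample object's cognitive state trajectory.

    For each adjacent pair (s1 at a first decision moment, s2 at the
    next), write the potential energy difference GAMMA * s2 - s1 into a
    randomly selected column of s1's row, as in claim 6.
    """
    for s1, s2 in zip(sample_states[:-1], sample_states[1:]):
        row = nearest_row(state_refs, s1)
        col = rng.integers(q.shape[1])  # randomly selected target column
        q[row, col] = GAMMA * s2 - s1   # potential energy difference
    return q

rng = np.random.default_rng(0)
state_refs = np.linspace(0.0, 1.0, 11)
q = np.zeros((11, 6))
sample_states = [0.30, 0.45, 0.55, 0.70]  # from the cognitive tracking model
q = update_pass(q, state_refs, sample_states, rng)
```

Repeating such passes over many sample objects, and keeping the set of potential energy differences for which the reward model's function expression is maximal, would then yield the Q matrix of claim 6.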
7. The method of claim 6, wherein, after taking the number of the plurality of initial cognitive state values as the number of rows of the zero matrix and the number of the plurality of action-count values as the number of its columns to construct the zero matrix, the method further comprises:
forming an array from the elements of each row of the zero matrix, and taking an initial cognitive state value as the reference value of the array, wherein the reference values correspond one-to-one with the arrays; and wherein the initial action-count value is determined from the column of the zero matrix.
8. The method of claim 7, wherein determining the initial action-count value from the column of the zero matrix comprises:
subtracting a preset value from the serial number of the column of the zero matrix to obtain the initial action-count value, wherein the preset value is an integer.
9. The method of claim 7, wherein determining the target row of the zero matrix corresponding to the first cognitive state value comprises:
comparing the first cognitive state value with the reference values;
if the first cognitive state value is equal to a reference value, taking that reference value as the target reference value, and taking the row in which the target reference value lies as the target row;
and if the first cognitive state value is not equal to any reference value, taking the reference value whose absolute difference from the first cognitive state value is smallest as the target reference value, and taking the row in which the target reference value lies as the target row.
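Claims 7 to 9 pin down the bookkeeping around the matrix: each row's array carries an initial cognitive state value as its reference value, a column's action count is its serial number minus a preset integer, and an incoming state value matches a row exactly when it equals a reference value and otherwise by smallest absolute difference. A sketch of that matching rule, where the tolerance and the serial-number convention are our assumptions:

```python
import numpy as np

def target_row(state_refs, value, tol=1e-12):
    """Claims 7 and 9: exact match on a row's reference value when the
    state value coincides with one; otherwise the row minimizing the
    absolute difference |value - reference|."""
    state_refs = np.asarray(state_refs, dtype=float)
    exact = np.flatnonzero(np.isclose(state_refs, value, atol=tol))
    if exact.size:  # the "consistent" case of claim 9
        return int(exact[0])
    return int(np.argmin(np.abs(state_refs - value)))  # "inconsistent" case

def column_to_count(col_serial, preset):
    """Claim 8: initial action count = column serial number - preset value."""
    return col_serial - preset

refs = [0.0, 0.25, 0.5, 0.75, 1.0]
print(target_row(refs, 0.5))         # equal to reference 0.5     -> row 2
print(target_row(refs, 0.62))        # nearest reference is 0.5   -> row 2
print(column_to_count(3, preset=1))  # 1-based serial 3, preset 1 -> 2 answers
```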
10. An answering behavior recommendation device, comprising:
an acquisition module, configured to acquire the scores obtained by a target object, at different moments, on exercises associated with a target knowledge point;
a first determining module, configured to input the scores into a cognitive tracking model to obtain target cognitive state values of the target object for the target knowledge point at the different moments;
and a second determining module, configured to input the target cognitive state value corresponding to a current decision moment into a target model corresponding to the target knowledge point, to obtain the number of times the target object is recommended to continue answering target exercises, wherein the current decision moment is the starting moment of answering, and the target model comprises: an answering-behavior reinforcement learning (EBQ) model, wherein the EBQ model represents a relation between at least one cognitive state value and a recommended number of times of continuing to answer exercises associated with the target knowledge point, and the target exercises are a plurality of different exercises associated with the target knowledge point.
11. A non-volatile storage medium comprising a stored program, wherein, when the program runs, the non-volatile storage medium controls a device in which it is located to execute the method for recommending answering behavior according to any one of claims 1 to 9.
12. A processor, characterized in that the processor is configured to run a program, wherein the program, when running, executes the method for recommending answering behavior according to any one of claims 1 to 9.
CN202110563070.8A 2021-05-21 2021-05-21 Method for recommending answering behavior Active CN113221007B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110563070.8A CN113221007B (en) 2021-05-21 2021-05-21 Method for recommending answering behavior

Publications (2)

Publication Number Publication Date
CN113221007A true CN113221007A (en) 2021-08-06
CN113221007B CN113221007B (en) 2022-09-23

Family

ID=77098027

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110563070.8A Active CN113221007B (en) 2021-05-21 2021-05-21 Method for recommending answering behavior

Country Status (1)

Country Link
CN (1) CN113221007B (en)

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180007053A1 (en) * 2016-06-29 2018-01-04 International Business Machines Corporation Dynamic Cognitive Access Control List Management
CN108122437A (en) * 2016-11-28 2018-06-05 北大方正集团有限公司 Adaptive learning method and device
CN108417266A (en) * 2018-03-16 2018-08-17 中国科学技术大学 The determination method and system of student's cognitive state
CN109191345A (en) * 2018-09-17 2019-01-11 合肥工业大学 A kind of cognitive diagnosis method of Student oriented cognitive process
CN109509126A (en) * 2018-11-02 2019-03-22 中山大学 A kind of personalized examination question recommended method based on user's learning behavior
CN110704732A (en) * 2019-09-19 2020-01-17 广州大学 Cognitive diagnosis-based time-sequence problem recommendation method
CN112508334A (en) * 2020-11-06 2021-03-16 华中师范大学 Personalized paper combining method and system integrating cognitive characteristics and test question text information

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
YANG QINGHONG et al.: "The research of personalized learning system based on learner interests and cognitive level", IEEE *
SHAN RUITING et al.: "Collaborative filtering exercise recommendation based on cognitive diagnosis", Computer Systems & Applications *
HU XUEGANG et al.: "Research on cognitive diagnosis based on analysis of students' problem solving", Computer Education *
MA YUHUI et al.: "Research on learning analytics and intelligent tutoring for smart education: an RSM-based personalized learning resource push method", e-Education Research *

Also Published As

Publication number Publication date
CN113221007B (en) 2022-09-23

Similar Documents

Publication Publication Date Title
Chen et al. MOOC dropout prediction using a hybrid algorithm based on decision tree and extreme learning machine
CN111460249B (en) Personalized learning resource recommendation method based on learner preference modeling
CN111813921B (en) Topic recommendation method, electronic device and computer-readable storage medium
CN106327240A (en) Recommendation method and recommendation system based on GRU neural network
El-Bishouty et al. Smart e-course recommender based on learning styles
Wekesa et al. A deep learning model for plant lncRNA-protein interaction prediction with graph attention
Aladag et al. Fuzzy lagged variable selection in fuzzy time series with genetic algorithms
Thaher et al. Teaching learning-based optimization with evolutionary binarization schemes for tackling feature selection problems
CN112256971A (en) Sequence recommendation method and computer-readable storage medium
Barot et al. NetQuilt: deep multispecies network-based protein function prediction using homology-informed network similarity
Meli et al. A modular attractor associative memory with patchy connectivity and weight pruning
Huang et al. Harnessing deep learning for population genetic inference
CN113221007B (en) Method for recommending answering behavior
Salehi Latent feature based recommender system for learning materials using genetic algorithm
Tzeng et al. Massive open online course recommendation system based on a reinforcement learning algorithm
Muhammad et al. A Conceptual Framework for Detecting Learning Style in an Online Education Using Graph Representation Learning
Sahid et al. Categorizing attributes in identifying learning style using rough set theory
CN112150206B (en) Method and equipment for predicting user interested article
Yao et al. Scalable algorithms for CQA post voting prediction
JP7207128B2 (en) Forecasting Systems, Forecasting Methods, and Forecasting Programs
Chatzinikolaou et al. Irregular Learning Cellular Automata for the Resolution of Complex Logic Puzzles
Huang et al. Group Intelligence Recommendation System based on Knowledge Graph and Fusion Recommendation Model
CN111783980B (en) Ranking learning method based on dual cooperation generation type countermeasure network
Jian et al. Applying deep learning for surrogate construction of simulation systems
CN111177557B (en) Interpretable nerve factor recommendation system and method based on inter-domain explicit interaction

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant