CN110288878B - Self-adaptive learning method and device - Google Patents

Self-adaptive learning method and device

Info

Publication number
CN110288878B
CN110288878B
Authority
CN
China
Prior art keywords
learning
knowledge unit
student
target
knowledge
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910584394.2A
Other languages
Chinese (zh)
Other versions
CN110288878A (en)
Inventor
马海平
刘淇
陈恩红
王士进
童世炜
黄振亚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Science and Technology of China USTC
iFlytek Co Ltd
Original Assignee
University of Science and Technology of China USTC
iFlytek Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Science and Technology of China USTC, iFlytek Co Ltd filed Critical University of Science and Technology of China USTC
Priority to CN201910584394.2A priority Critical patent/CN110288878B/en
Publication of CN110288878A publication Critical patent/CN110288878A/en
Application granted granted Critical
Publication of CN110288878B publication Critical patent/CN110288878B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G: PHYSICS
    • G09: EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
    • G09B: EDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
    • G09B 19/00: Teaching not covered by other main groups of this subclass

Landscapes

  • Business, Economics & Management (AREA)
  • Engineering & Computer Science (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Physics & Mathematics (AREA)
  • Educational Administration (AREA)
  • Educational Technology (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

Embodiments of the invention provide an adaptive learning method and device in the technical field of machine learning. The method comprises the following steps: determining a candidate knowledge unit set according to a target learning path and the first knowledge unit currently learned by a student, wherein the target learning path comprises all knowledge units the student needs to learn; and determining, according to the current learning state of the student, the probability that each knowledge unit in the candidate knowledge unit set is the optimal solution when taken as the target knowledge unit, and taking the knowledge unit corresponding to the maximum probability in the candidate knowledge unit set as the target knowledge unit, the target knowledge unit being the next knowledge unit the student needs to learn. Because the recommendation of the next knowledge unit combines the knowledge structure with the student's learning state, the student's degree of knowledge mastery at different moments can be analyzed accurately, the recommendation result better conforms to cognitive rules, and efficient learning paths can be formulated for different students in a personalized manner.

Description

Self-adaptive learning method and device
Technical Field
The invention relates to the technical field of machine learning, in particular to a self-adaptive learning method and device.
Background
At present, traditional education, and classroom education in particular, delivers the same universal instruction to an entire class or group, making it difficult to meet the individual needs of students. Traditional education also demands a large amount of educational resources; when resources are insufficient, they tend to be distributed unevenly, which easily leads to educational inequality. An adaptive learning method that recommends suitable knowledge units to students is therefore urgently needed to meet the personalized learning requirements of different students.
Disclosure of Invention
In order to solve the above problems, embodiments of the present invention provide an adaptive learning method and apparatus that overcome the above problems or at least partially solve them.
According to a first aspect of embodiments of the present invention, there is provided an adaptive learning method, including:
determining a candidate knowledge unit set according to a target learning path and a first knowledge unit currently learned by a student, wherein the target learning path comprises all knowledge units required to be learned by the student;
and determining the probability of the optimal solution when each knowledge unit in the candidate knowledge unit set is used as a target knowledge unit according to the current learning state of the student, and using the knowledge unit corresponding to the maximum probability in the candidate knowledge unit set as the target knowledge unit, wherein the target knowledge unit is the next knowledge unit needing to be learned by the student.
According to a second aspect of the embodiments of the present invention, there is provided an adaptive learning apparatus including:
the first determining module is used for determining a candidate knowledge unit set according to a target learning path and a first knowledge unit currently learned by a student, wherein the target learning path comprises all knowledge units required to be learned by the student;
and the second determining module is used for determining the probability of the optimal solution when each knowledge unit in the candidate knowledge unit set is used as the target knowledge unit according to the current learning state of the student, and taking the knowledge unit corresponding to the maximum probability in the candidate knowledge unit set as the target knowledge unit, wherein the target knowledge unit is the next knowledge unit needing learning of the student.
According to a third aspect of embodiments of the present invention, there is provided an electronic apparatus, including:
at least one processor; and
at least one memory communicatively coupled to the processor, wherein:
the memory stores program instructions executable by the processor, the processor invoking the program instructions to perform the adaptive learning method provided by any of the various possible implementations of the first aspect.
According to a fourth aspect of the present invention, there is provided a non-transitory computer readable storage medium storing computer instructions for causing a computer to perform the adaptive learning method provided by any one of the various possible implementations of the first aspect.
According to the self-adaptive learning method and device provided by the embodiment of the invention, the candidate knowledge unit set is determined according to the target learning path and the first knowledge unit currently learned by the student. And determining the probability of the optimal solution when each knowledge unit in the candidate knowledge unit set is used as a target knowledge unit according to the current learning state of the student, and using the knowledge unit corresponding to the maximum probability in the candidate knowledge unit set as the target knowledge unit. The knowledge structure and the learning state of the student can be combined to recommend the next knowledge unit to be learned, so that the knowledge mastering degree of the student at different moments can be accurately analyzed, the recommendation result is more consistent with the cognitive rule, and efficient learning paths can be formulated for different students in a personalized manner.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory and are not restrictive of embodiments of the invention.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and those skilled in the art can also obtain other drawings according to the drawings without creative efforts.
Fig. 1 is a schematic flow chart of a self-adaptive learning method according to an embodiment of the present invention;
fig. 2 is a schematic diagram of a target learning path according to an embodiment of the present invention;
fig. 3 is a schematic structural diagram of a preset model according to an embodiment of the present invention;
fig. 4 is a schematic structural diagram of an adaptive learning apparatus according to an embodiment of the present invention;
fig. 5 is a block diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
At present, adaptive learning methods mainly fall into the following two categories:
(1) Methods based on the learning state
Because the learning abilities of different students differ, the learning benefit that each knowledge unit brings is also different for each student, and therefore the learning states of different students and their laws of change differ as well. Based on item response theory, the state and ability of a student can be inferred from the student's performance on different knowledge units, and knowledge units of moderate difficulty can be recommended to the student on that basis. Alternatively, the adaptive learning process can be regarded as a Markov decision process, the evolution of the student's learning state can be simulated with the transition matrix of the Markov decision process, and a reinforcement learning algorithm can be used to mine the relation between the learning state and the knowledge units.
(2) Methods based on the knowledge structure
These methods analyze the relations between knowledge units and make recommendations by combining the similarity, difficulty and other attributes of the knowledge units. Specifically, a knowledge graph can be introduced to express the knowledge system in graph form, and recommendation rules can be formulated based on the characteristics and relations of knowledge units to plan a learning path for students. Alternatively, since a student's learning ability is reflected in the student's learning trajectory and similar learning abilities correspond to similar knowledge structures, methods such as collaborative filtering can be used to recommend similar knowledge units to students; for example, traditional e-commerce recommendation methods can be migrated to educational recommendation.
As for the first category, methods based on the student's learning state cannot effectively use the existing knowledge structure and may produce an illogical learning path that violates human cognitive rules. As for the second category, methods based on the knowledge structure cannot effectively formulate personalized learning plans for different students; like conventional education, they can only plan a universal learning path at the group level, so efficient learning cannot be guaranteed.
In view of the problems in the above two categories of methods, the embodiment of the present invention provides an adaptive learning method. The method may be used in any recommendation scenario for knowledge units, which is not specifically limited in the embodiment of the present invention. Specifically, when a student studies a course, the knowledge units build on one another; for example, only after learning functions and limits can the student learn derivatives and calculus. Therefore, after the student finishes learning a certain knowledge unit, the next knowledge unit to be learned needs to be recommended to the student. Referring to fig. 1, the method includes:
101. Determining a candidate knowledge unit set according to a target learning path and the first knowledge unit currently learned by the student, wherein the target learning path comprises all knowledge units the student needs to learn.
The target learning path refers to a universal learning path, which may include knowledge units and the order between them, and may be represented specifically as a directed graph. The first knowledge unit can be determined according to the student's learning progress: if the student is currently learning the 3rd knowledge unit, or has just finished learning it, the 3rd knowledge unit is the first knowledge unit currently learned by the student. In addition, the target learning path includes the first knowledge unit, and the candidate knowledge unit set consists of knowledge units screened from the target learning path.
102. Determining, according to the current learning state of the student, the probability that each knowledge unit in the candidate knowledge unit set is the optimal solution when taken as the target knowledge unit, and taking the knowledge unit corresponding to the maximum probability in the candidate knowledge unit set as the target knowledge unit, wherein the target knowledge unit is the next knowledge unit the student needs to learn.
The current learning state of the student may include the student's historical test results and the student's learning target, and may be embodied in vector form, which is not specifically limited in the embodiment of the present invention. It should be noted that after the next knowledge unit to be learned is determined in the above manner, the knowledge unit can be recommended to the student. Once the student is learning, or has finished learning, that knowledge unit, it can in turn be used as the first knowledge unit currently learned by the student, and the next knowledge unit can be recommended according to steps 101 to 102 above. Through this recommendation process, the knowledge units recommended at each step form a learning path until the student finishes learning; this learning path can itself be used as a new target learning path in subsequent adaptive learning.
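Purely for illustration, the following Python sketch shows how the loop described above (steps 101 and 102 repeated until a learning target is reached) could be organized; the helpers build_candidates and pick_best are hypothetical stand-ins for steps 101 and 102 and are not defined in the patent.

```python
def adaptive_learning_loop(first_unit, learning_targets, build_candidates, pick_best, max_steps=50):
    """Sketch of the overall loop behind steps 101-102: repeatedly build the
    candidate knowledge unit set and recommend the most promising unit until
    a knowledge unit serving as a learning target is reached."""
    path = [first_unit]
    current = first_unit
    for _ in range(max_steps):
        candidates = build_candidates(current)   # step 101: candidate knowledge unit set
        current = pick_best(candidates)          # step 102: maximum-probability knowledge unit
        path.append(current)
        if current in learning_targets:
            break
    return path

# Toy run with trivial stand-ins for the two steps (units are plain integers here).
print(adaptive_learning_loop(
    first_unit=3,
    learning_targets={9},
    build_candidates=lambda unit: [unit + 1],
    pick_best=lambda candidates: candidates[0],
))  # [3, 4, 5, 6, 7, 8, 9]
```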
According to the method provided by the embodiment of the invention, the candidate knowledge unit set is determined according to the target learning path and the first knowledge unit currently learned by the student. And determining the probability of the optimal solution when each knowledge unit in the candidate knowledge unit set is used as a target knowledge unit according to the current learning state of the student, and using the knowledge unit corresponding to the maximum probability in the candidate knowledge unit set as the target knowledge unit. The knowledge structure and the learning state of the student can be combined to recommend the next knowledge unit to be learned, so that the knowledge mastering degree of the student at different moments can be accurately analyzed, the recommendation result is more consistent with the cognitive rule, and efficient learning paths can be formulated for different students in a personalized manner.
Based on the content of the foregoing embodiment, as an optional embodiment, before determining the probability that each knowledge unit in the candidate knowledge unit set is the optimal solution when being used as the target knowledge unit according to the current learning state of the student, the current learning state of the student may also be obtained. The embodiment of the present invention does not specifically limit the manner of obtaining the current learning state of the student, including but not limited to: acquiring a current learning state vector of a student according to a historical test record of the student, wherein the historical test record is used for representing a test result of a knowledge unit in a target learning path; and acquiring an indication vector of the student, splicing the indication vector and the current learning state vector to obtain a vector serving as the current learning state of the student, wherein the indication vector is used for representing a knowledge unit serving as a learning target in a target learning path.
The historical test records refer to the student's answering or learning situation for the knowledge units examined in each historical test. Each historical test record can be represented by a historical test vector, and all historical test records can be represented as the sequence x = (x1, x2, ...). For example, x1 denotes the answering or learning situation of the first historical test record, i.e. the historical test vector corresponding to the first historical test record, x2 denotes the historical test vector corresponding to the second historical test record, and so on. Taking the answering situation as an example, the dimension of x1 may be twice the number of knowledge units.
For example, if only the knowledge unit with ID 130 is examined in the first historical test, one question is examined, and the student answers the question corresponding to the knowledge unit with ID 130 correctly, then x1 = (0, 0, ..., 0, 1, 0, ..., 0), where the element of dimension 261 has the value 1, indicating that the student answered the question correctly. If the student answered the question corresponding to the knowledge unit with ID 130 incorrectly, then x1 = (0, 0, ..., 0, 1, 0, ..., 0), where the element of dimension 260 has the value 1, indicating that the student answered the question incorrectly.
That is, whether the question corresponding to each knowledge unit was answered correctly or incorrectly can be represented by two dimensions. For example, the question corresponding to the knowledge unit with ID 1 can be represented by the elements of the 1st and 2nd dimensions, and the question corresponding to the knowledge unit with ID 130 by the elements of the 260th and 261st dimensions.
As can be seen from the above, a historical test record can be represented by a historical test vector, and the sequence x = (x1, x2, ...) of historical test vectors can reflect the student's learning state and the evolution of the student's learning situation. Therefore, the student's current learning state vector can further be obtained from the student's historical test vectors. The current learning state vector reflects the student's learning state after multiple historical test records as well as the evolution of the student's learning situation.
In addition, the dimension of the indication vector may equal the number of knowledge units. By numbering all knowledge units, each knowledge unit corresponds to an element of one dimension; for example, the i-th knowledge unit may correspond to the element of the i-th dimension of the indication vector. If the element of the i-th dimension is 1, it may indicate that the i-th knowledge unit is a knowledge unit serving as a learning target; if it is 0, it may indicate that the i-th knowledge unit is not a learning target. Of course, the actual implementation may also be reversed, that is, 1 represents a knowledge unit that is not a learning target and 0 represents one that is, which is not specifically limited in the embodiment of the present invention.
It should be noted that there may be more than one knowledge unit serving as a learning target, which is likewise not specifically limited in the embodiment of the present invention. For example, the indication vector may be (0, 0, 0, 0, 1, 0, 0, 1, 0, ..., 0, 1, 0, ..., 0), where the elements of the 5th, 8th and 100th dimensions are 1, meaning that the 5th, 8th and 100th knowledge units are the knowledge units serving as learning targets. After the student's current learning state vector and indication vector are obtained, the two vectors can be spliced, and the spliced vector is used as the current learning state of the student.
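As a concrete illustration of the encodings described above, the following Python sketch builds a historical test vector, an indication vector, and the spliced current learning state; the total number of knowledge units, the exact slot convention and all identifiers are assumptions made for the example only.

```python
import numpy as np

NUM_KNOWLEDGE_UNITS = 200  # assumed size of the knowledge unit inventory

def encode_test_record(unit_id, answered_correctly):
    """One historical test vector: 2 * NUM_KNOWLEDGE_UNITS dimensions, with two
    adjacent slots per knowledge unit (wrong answer / correct answer), in the
    spirit of the ID-130 example above. The exact slot order is an assumption."""
    x = np.zeros(2 * NUM_KNOWLEDGE_UNITS)
    base = 2 * unit_id                     # 0-based index of the "wrong answer" slot
    x[base + 1 if answered_correctly else base] = 1.0
    return x

def encode_indication(target_unit_ids):
    """Indication vector: one dimension per knowledge unit, 1 marks a learning target."""
    g = np.zeros(NUM_KNOWLEDGE_UNITS)
    for unit_id in target_unit_ids:
        g[unit_id] = 1.0
    return g

# Splice (concatenate) the current learning state vector with the indication vector.
x1 = encode_test_record(unit_id=130, answered_correctly=True)
learning_state_vector = np.zeros(32)              # placeholder output of the preset model
indication_vector = encode_indication([5, 8, 100])
current_learning_state = np.concatenate([learning_state_vector, indication_vector])
print(x1.shape, current_learning_state.shape)     # (400,) (232,)
```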
According to the method provided by the embodiment of the invention, the current learning state vector of the student is obtained according to the historical test record of the student. And acquiring an indication vector of the student, and splicing the indication vector and the current learning state vector to obtain a vector serving as the current learning state of the student. The current learning state of the student can reflect the learning state of the student after multiple historical test records, and can also reflect the evolution law of the learning condition of the student, so that efficient learning paths can be formulated for different students in a personalized manner according to the current learning state of the student.
Based on the content of the above embodiment, as an optional embodiment, the historical test records are historical test vectors; accordingly, the embodiment of the present invention does not specifically limit the manner of obtaining the current learning state vector of the student according to the historical test records of the student, including but not limited to: and inputting each historical test vector into a preset model, and outputting a learning state vector corresponding to the latest historical test vector at the test moment as a current learning state vector.
Specifically, each historical test vector may correspond to a learning state vector; for example, historical test vector x1 may correspond to learning state vector S1, and historical test vector xt to learning state vector St. The preset model can be used to predict the student's answer or learning situation in the next test: its input may be the different historical test vectors, its output the student's answer or learning situation in the next test, and the output can likewise be represented as a vector. Since each historical test takes place at a different time, the historical test vectors are ordered in time. The learning state vector corresponding to the historical test vector with the latest test time combines all previous historical test vectors, so it reflects the student's current learning state and the evolution of the student's learning situation, and can be used as the current learning state vector. In addition, the preset model may be a long short-term memory (LSTM) model, or a deep knowledge tracing model improved on the basis of the LSTM model, which is not limited in this embodiment of the present invention.
According to the method provided by the embodiment of the invention, the current learning state vector can reflect the current learning state and the evolution rule of the learning condition of the student, so that an efficient learning path can be formulated for different students individually according to the current learning state of the student.
Based on the content of the above embodiment, as an optional embodiment, the preset model at least includes an embedded layer, a hidden layer, and a full connection layer; accordingly, the embodiment of the present invention does not specifically limit the manner of inputting each historical test vector into the preset model and outputting the learning state vector corresponding to each historical test vector, and includes but is not limited to: inputting each historical test vector into the embedding layer, and outputting a learning representation vector corresponding to each historical test vector; inputting each learning representation vector into the hidden layer, and outputting a learning state hidden vector corresponding to each historical test vector; and inputting the initial learning state hidden vector and each learning state hidden vector into the full connection layer, and outputting the learning state vector corresponding to the latest historical test vector at the test time.
Specifically, since the historical test vectors may be sparse, the embedding layer can turn the sparse vectors into dense vectors, thereby compressing the representation of the learning or answering situation. Fig. 2 shows the structure of the preset model and the process of outputting the learning state vector. In fig. 2, x1 to xt denote the historical test vectors, x1' to xt' denote the learning representation vectors output by the embedding layer, h1 to ht denote the learning state hidden vectors output by the hidden layer, h0 denotes the initial learning state hidden vector, and S1 to St denote the learning state vectors output by the fully connected layer. As can be seen from fig. 2, S1 is obtained based on h0 and h1, S2 is obtained based on h0, h1 and h2, and, by analogy, St is obtained based on h0 to ht.
It should be noted that the initial learning state hidden vector mainly plays an auxiliary role in the computation, and any error it introduces gradually weakens as the computation proceeds from S1 to St. In addition, as shown in fig. 2, the preset model actually outputs a learning state vector for every historical test vector, but the embodiment of the present invention mainly uses St, that is, the learning state vector corresponding to the historical test vector with the latest test time, as the current learning state vector.
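For illustration, a PyTorch-style sketch of such a preset model (embedding layer, recurrent hidden layer, fully connected layer) is given below; the layer sizes, the use of an LSTM as the hidden layer, the way h0 is paired with each hi, and all identifiers are assumptions rather than details fixed by the patent.

```python
import torch
import torch.nn as nn

class PresetModel(nn.Module):
    """Minimal sketch of the described preset model: embedding layer ->
    hidden (recurrent) layer -> fully connected layer."""

    def __init__(self, input_dim, embed_dim=64, state_dim=32):
        super().__init__()
        self.embedding = nn.Linear(input_dim, embed_dim)       # sparse x_i -> dense x_i'
        self.hidden = nn.LSTM(embed_dim, state_dim, batch_first=True)  # x_i' -> h_i
        self.h0 = nn.Parameter(torch.zeros(1, 1, state_dim))   # initial learning state hidden vector
        self.fc = nn.Linear(2 * state_dim, state_dim)          # combines h0 with h_i -> S_i

    def forward(self, x):
        # x: (batch, t, input_dim), the sequence of historical test vectors
        batch, steps, _ = x.shape
        x_dense = torch.tanh(self.embedding(x))                # learning representation vectors
        h, _ = self.hidden(x_dense)                            # h_i already summarizes steps 1..i
        h0 = self.h0.expand(batch, steps, -1)                  # pair h0 with every h_i
        s = self.fc(torch.cat([h0, h], dim=-1))                # learning state vectors S_1..S_t
        return s[:, -1, :]                                     # S_t as the current learning state vector

model = PresetModel(input_dim=2 * 200)
history = torch.zeros(1, 5, 2 * 200)   # five historical test vectors (toy, all-zero input)
print(model(history).shape)            # torch.Size([1, 32])
```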
According to the method provided by the embodiment of the invention, the current learning state vector can reflect the current learning state and the evolution rule of the learning condition of the student, so that an efficient learning path can be formulated for different students individually according to the current learning state of the student.
Based on the content of the foregoing embodiment, as an optional embodiment, the embodiment of the present invention does not specifically limit the manner of determining the candidate knowledge unit set according to the target learning path and the first knowledge unit currently learned by the student, and includes but is not limited to: determining a second knowledge unit in m hops before a first knowledge unit in a target learning path and a third knowledge unit in n hops after the first knowledge unit in the target learning path, wherein m and n are positive integers not less than 1; and determining a candidate knowledge unit set according to the first knowledge unit, the second knowledge unit, the third knowledge unit and the knowledge unit serving as a learning target in the target learning path.
The values of m and n may be set as required, and may be the same or different, which is not specifically limited in the embodiment of the present invention. Fig. 3 is a schematic diagram of a target learning path. In fig. 3, each node represents a knowledge unit, and different knowledge units are distinguished by the labels in the nodes. Taking the node labeled 3 as the first knowledge unit and m = 1 as an example, the second knowledge unit within 1 hop before the first knowledge unit is the node labeled 1. Taking n = 2 as an example, the third knowledge units within 2 hops after the first knowledge unit are the nodes labeled 4 and 8. After the second and third knowledge units in the target learning path are determined, the candidate knowledge unit set can be formed directly from the second and third knowledge units. It should be noted that there may be more than one second knowledge unit and more than one third knowledge unit.
It should be noted that, in the conventional learning manner, after learning the first knowledge unit labeled 3, the student would simply continue learning onward. However, considering that the student may also need to review, the second knowledge unit located before the first knowledge unit is also taken into account as possibly needing to be learned next and is therefore also placed into the candidate knowledge unit set.
According to the method provided by the embodiment of the present invention, the second knowledge unit located before the first knowledge unit is also considered as possibly needing to be learned next, so that the student can review knowledge and achieve a better learning effect.
Considering that the student's learning process needs to follow the target learning path and aim at a learning end point (namely a knowledge unit serving as a learning target), if the candidate knowledge unit set were composed directly of the second knowledge unit and the third knowledge unit, some knowledge units in the candidate knowledge unit set might be unable to reach the learning end point, so that the student's learning process would not conform to cognitive rules. For this situation, based on the content of the foregoing embodiment, as an alternative embodiment, the embodiment of the present invention does not specifically limit the manner of determining the candidate knowledge unit set according to the first knowledge unit, the second knowledge unit, the third knowledge unit and the knowledge unit serving as a learning target in the target learning path, which includes but is not limited to: screening the second knowledge unit and the third knowledge unit based on a preset condition and the target learning path, and forming the candidate knowledge unit set from the screened knowledge units; the preset condition is that a connected path can be formed between the first knowledge unit and the target knowledge unit.
For example, in fig. 3, nodes 1, 2, 3, 4, 8 and 9 can form a connected path, whereas nodes 2, 0, 1 and 3 cannot, because of the directions of the edges between the nodes.
According to the method provided by the embodiment of the present invention, the second knowledge unit and the third knowledge unit are screened to obtain the candidate knowledge unit set, and the next knowledge unit to be learned is then determined based on the screened candidate knowledge unit set. Since each knowledge unit in the candidate knowledge unit set can form a connected path with the first knowledge unit and the target knowledge unit, the learning path determined on this basis conforms to cognitive rules.
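For illustration, the following Python sketch (using networkx) collects the m-hop predecessors and n-hop successors of the first knowledge unit and then applies one possible reading of the connected-path condition, namely keeping only candidates from which a learning-target unit remains reachable; the toy graph only loosely imitates the fig. 3 example and is not the actual figure.

```python
import networkx as nx

def candidate_knowledge_units(graph, first_unit, learning_targets, m=1, n=2):
    """Units within m hops before and n hops after the first knowledge unit,
    filtered so that a learning-target unit is still reachable from each
    candidate (one reading of the connected-path condition)."""
    predecessors = nx.single_source_shortest_path_length(graph.reverse(copy=False), first_unit, cutoff=m)
    successors = nx.single_source_shortest_path_length(graph, first_unit, cutoff=n)
    candidates = (set(predecessors) | set(successors)) - {first_unit}
    return {c for c in candidates
            if any(nx.has_path(graph, c, t) for t in learning_targets)}

# Toy directed graph; node labels only loosely echo the fig. 3 example.
g = nx.DiGraph([(1, 2), (1, 3), (3, 4), (4, 8), (8, 9), (0, 1)])
print(candidate_knowledge_units(g, first_unit=3, learning_targets={9}, m=1, n=2))
# the set {1, 4, 8} (print order may vary)
```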
Based on the content of the foregoing embodiment, as an optional embodiment, the embodiment of the present invention does not specifically limit the manner of determining, according to the current learning state of the student, the probability that each knowledge unit in the candidate knowledge unit set is the optimal solution when taken as the target knowledge unit, which includes but is not limited to: acquiring the learning ability increment value generated after the student learns from the first knowledge unit to the knowledge unit serving as the learning target in the target learning path; and determining the final value of a preset parameter in the strategy network model according to the learning ability increment value, inputting the current learning state into the strategy network model, and outputting the probability that each knowledge unit in the candidate knowledge unit set is the optimal solution when taken as the target knowledge unit.
The above process can be realized by means of reinforcement learning, that is, the next knowledge unit the student needs to learn is determined by the strategy network model and the value network model. The reinforcement learning process mainly involves three elements: state, action and reward.
The state refers to the current learning state of the student. With respect to steps 101 and 102 above, each determination of the next knowledge unit the student needs to learn is an action. In addition, during the learning process the reward signal is always 0 until the knowledge unit serving as the learning target has been learned, that is, until the learning process ends. After the learning process ends, the student's learning ability increment value ΔE can be defined as the reward signal. The learning ability increment value ΔE can be calculated by the following formula:
ΔE = (Ee − Es) / (Esup − Es)
In the above formula, ΔE represents the learning ability increment value, Es denotes the test result at the beginning of the learning phase, Ee denotes the test result after the end of the learning phase, and Esup denotes the full score of the test. The student takes one test before the learning phase and one after it, from which Es and Ee are obtained respectively.
Based on the above, the optimization goal of reinforcement learning can be given in mathematical form as maximizing the return Ri, where:
Ri = Σ_{j=i}^{N} γ^(j−i) · rj
In the above formula, γ is the discount factor constant in reinforcement learning, N is the total number of steps in the learning stage, rj denotes the reward signal of each learning step, and Ri denotes the discounted sum of the reward signals from step i to step N (that is, the return obtained as the student learns from the first knowledge unit to the knowledge unit serving as the learning target).
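To make the reward computation concrete, here is a small Python sketch under the assumption that the learning ability increment is the normalized gain (Ee − Es) / (Esup − Es) given above and that Ri is the standard discounted sum of rewards; the scores and the discount factor are illustrative only.

```python
def learning_gain(score_start, score_end, full_score):
    """Normalized learning ability increment: (Ee - Es) / (Esup - Es)."""
    return (score_end - score_start) / (full_score - score_start)

def discounted_return(rewards, gamma=0.99, start=0):
    """Ri: discounted sum of the per-step rewards from step `start` to the end."""
    return sum(gamma ** (j - start) * r for j, r in enumerate(rewards) if j >= start)

# Intermediate rewards are 0; only the final step carries the learning gain.
rewards = [0.0] * 9 + [learning_gain(score_start=40, score_end=70, full_score=100)]
print(round(discounted_return(rewards), 4))  # 0.5 * 0.99**9 ≈ 0.4568
```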
Based on the content of the foregoing embodiment, as an optional embodiment, the embodiment of the present invention does not specifically limit the manner of determining the final value of the preset parameter in the policy network model according to the incremental learning capability value, which includes but is not limited to: and inputting the current learning state into the value network model, adjusting the value of a preset parameter in the value network model to minimize the difference between the output result of the value network model and the learning capacity incremental value, and taking the value of the preset parameter when the difference is minimum as the final value of the preset parameter, wherein the value network model and the strategy network model both comprise the preset parameter.
Specifically, a value network model v(· | θv) may first be defined to estimate the total value of the reward obtainable in the future for a given state, i.e. vi = v(statei | θv), where θv denotes the preset parameters in the value network model. By applying a stochastic policy together with the value network model, the actor-critic algorithm can be applied to the recommendation of knowledge units. For the action of step i, the policy gradient function can refer to the following formula:
∇ log π(ai | statei) · (Ri − vi)
In the above formula, π(·) represents the policy function in the policy network model and ai denotes the action taken at step i. Each knowledge unit in the candidate knowledge unit set corresponds to one learning path decision; given the current learning state statei and the values of the preset parameters, the probability that each knowledge unit in the candidate knowledge unit set is the optimal action when taken as the next knowledge unit to be learned can be output.
The final value of the preset parameter can be determined through the loss function of the value network model, which can refer to the following formula:
Lv = (Ri − v(statei | θv))²
Combining the policy gradient function of the policy network model with the loss function of the value network model, the loss function of the whole network can be obtained, for example in the following form:
L = α · (−log π(ai | statei) · (Ri − vi)) + β · (Ri − v(statei | θv))²
In the above formula, α and β are both hyperparameters. By adjusting the values of the preset parameters in the value network model, the output result vi of the value network model changes until the difference between the output result of the value network model and the learning ability increment value is minimized, and the values at that moment can be taken as the final values of the preset parameters θv. Because the strategy network model and the value network model both contain the preset parameters, once the preset parameters are determined, the current learning state can be input into the strategy network model and the probability that each knowledge unit in the candidate knowledge unit set is the optimal solution when taken as the target knowledge unit can be output, so that the knowledge unit corresponding to the maximum probability can be taken as the next knowledge unit the student needs to learn. It should be noted that the optimal solution refers to the optimization goal of reinforcement learning, that is, minimizing the difference between the output result of the value network model and the learning ability increment value.
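The following PyTorch-style sketch shows one way a policy head and a value head sharing a set of preset parameters could be trained with an actor-critic update and then used to pick the maximum-probability knowledge unit; the network sizes, the values of α and β, and every identifier are assumptions and do not reproduce the patent's implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ActorCritic(nn.Module):
    """Sketch of a policy network and a value network sharing parameters."""

    def __init__(self, state_dim, num_candidates, hidden=64):
        super().__init__()
        self.shared = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU())  # shared "preset parameters"
        self.policy_head = nn.Linear(hidden, num_candidates)  # one logit per candidate knowledge unit
        self.value_head = nn.Linear(hidden, 1)                # v(state): estimated future reward

    def forward(self, state):
        z = self.shared(state)
        return F.softmax(self.policy_head(z), dim=-1), self.value_head(z).squeeze(-1)

net = ActorCritic(state_dim=232, num_candidates=3)
optimizer = torch.optim.Adam(net.parameters(), lr=1e-3)

state = torch.zeros(1, 232)                 # current learning state (toy input)
probs, value = net(state)                   # probability of each candidate being optimal, v_i
dist = torch.distributions.Categorical(probs)
action = dist.sample()                      # stochastic policy during training
R = torch.tensor([0.5])                     # discounted return observed after this step
advantage = R - value                       # R_i - v_i

alpha, beta = 1.0, 0.5                      # hyperparameters weighting the two loss terms
policy_loss = -dist.log_prob(action) * advantage.detach()
value_loss = advantage.pow(2)
loss = (alpha * policy_loss + beta * value_loss).mean()
optimizer.zero_grad()
loss.backward()
optimizer.step()

# At recommendation time the knowledge unit with the maximum probability is chosen.
print(int(torch.argmax(probs)))
```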
According to the method provided by the embodiment of the invention, the learning path recommendation problem is converted into a stepwise Markov decision problem, and an actor-critic algorithm is applied to dynamically update the recommendation strategy, so that the knowledge units capable of realizing efficient learning are sequentially recommended to different students.
Based on the content of the foregoing embodiments, an embodiment of the present invention provides an adaptive learning apparatus, which is configured to execute the adaptive learning method provided in the foregoing method embodiments. Referring to fig. 4, the apparatus includes:
a first determining module 401, configured to determine a candidate knowledge unit set according to a target learning path and a first knowledge unit currently learned by a student, where the target learning path includes all knowledge units that the student needs to learn;
a second determining module 402, configured to determine, according to the current learning state of the student, a probability that each knowledge unit in the candidate knowledge unit set is an optimal solution when being used as a target knowledge unit, and use a knowledge unit corresponding to a maximum probability in the candidate knowledge unit set as the target knowledge unit, where the target knowledge unit is a knowledge unit that needs to be learned next by the student.
As an alternative embodiment, the apparatus further comprises:
the system comprises a first acquisition module, a second acquisition module and a third acquisition module, wherein the first acquisition module is used for acquiring a current learning state vector of a student according to a historical test record of the student, and the historical test record is used for representing a test result of a knowledge unit in a target learning path;
the second acquisition module is used for acquiring the instruction vectors of students;
and the splicing module is used for splicing the indication vector and the current learning state vector to obtain a vector which is used as the current learning state of the student, and the indication vector is used for representing a knowledge unit which is used as a learning target in the target learning path.
As an optional embodiment, the first obtaining module is configured to input each historical test vector into a preset model, and output a learning state vector corresponding to a historical test vector at a latest test time as a current learning state vector.
As an optional embodiment, the preset model at least comprises an embedded layer, a hidden layer and a full connection layer; correspondingly, the first acquisition module is used for inputting each historical test vector into the embedding layer and outputting a learning representation vector corresponding to each historical test vector; inputting each learning representation vector into the hidden layer, and outputting a learning state hidden vector corresponding to each historical test vector; and inputting the initial learning state hidden vector and each learning state hidden vector into the full connection layer, and outputting the learning state vector corresponding to the latest historical test vector at the test time.
As an optional embodiment, the first determining module 401 is configured to determine a second knowledge unit in m hops before the first knowledge unit in the target learning path and a third knowledge unit in n hops after the first knowledge unit in the target learning path, where m and n are positive integers not less than 1; and determining a candidate knowledge unit set according to the first knowledge unit, the second knowledge unit, the third knowledge unit and the knowledge unit serving as a learning target in the target learning path.
As an alternative embodiment, the second determining module 402 includes:
an acquisition unit configured to acquire a learning ability incremental value generated after learning from the first knowledge unit to a knowledge unit as a learning target in the target learning path;
the second determining unit is used for determining the final value of the preset parameter in the strategy network model according to the learning capacity incremental value;
and the second output unit is used for inputting the current learning state into the strategy network model and outputting the probability that each knowledge unit in the candidate knowledge unit set is the optimal solution when the knowledge unit is taken as the target knowledge unit.
As an optional embodiment, the second determining unit is configured to input the current learning state to the value network model, adjust a value of a preset parameter in the value network model to minimize a difference between an output result of the value network model and the learning ability incremental value, and use the value of the preset parameter when the difference is minimum as a final value of the preset parameter, where the value network model and the policy network model both include the preset parameter.
The device provided by the embodiment of the invention determines the candidate knowledge unit set according to the target learning path and the first knowledge unit currently learned by the student. And determining the probability of the optimal solution when each knowledge unit in the candidate knowledge unit set is used as a target knowledge unit according to the current learning state of the student, and using the knowledge unit corresponding to the maximum probability in the candidate knowledge unit set as the target knowledge unit. The knowledge structure and the learning state of the student can be combined to recommend the next knowledge unit to be learned, so that the knowledge mastering degree of the student at different moments can be accurately analyzed, the recommendation result is more consistent with the cognitive rule, and efficient learning paths can be formulated for different students in a personalized manner.
Fig. 5 illustrates a physical structure diagram of an electronic device, which may include, as shown in fig. 5: a processor (processor)510, a communication Interface (Communications Interface)520, a memory (memory)530 and a communication bus 540, wherein the processor 510, the communication Interface 520 and the memory 530 communicate with each other via the communication bus 540. Processor 510 may call logic instructions in memory 530 to perform the following method: determining a candidate knowledge unit set according to a target learning path and a first knowledge unit currently learned by a student, wherein the target learning path comprises all knowledge units required to be learned by the student; and determining the probability of the optimal solution when each knowledge unit in the candidate knowledge unit set is used as a target knowledge unit according to the current learning state of the student, and using the knowledge unit corresponding to the maximum probability in the candidate knowledge unit set as the target knowledge unit, wherein the target knowledge unit is the next knowledge unit needing to be learned by the student.
Furthermore, the logic instructions in the memory 530 may be implemented in the form of software functional units and stored in a computer readable storage medium when the software functional units are sold or used as independent products. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, an electronic device, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
Embodiments of the present invention further provide a non-transitory computer-readable storage medium, on which a computer program is stored, where the computer program is implemented to perform the method provided in the foregoing embodiments when executed by a processor, and the method includes: determining a candidate knowledge unit set according to a target learning path and a first knowledge unit currently learned by a student, wherein the target learning path comprises all knowledge units required to be learned by the student; and determining the probability of the optimal solution when each knowledge unit in the candidate knowledge unit set is used as a target knowledge unit according to the current learning state of the student, and using the knowledge unit corresponding to the maximum probability in the candidate knowledge unit set as the target knowledge unit, wherein the target knowledge unit is the next knowledge unit needing to be learned by the student.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and the parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (10)

1. An adaptive learning method, comprising:
determining a candidate knowledge unit set according to a target learning path and a first knowledge unit currently learned by a student, wherein the target learning path comprises all knowledge units required to be learned by the student;
determining the probability of an optimal solution when each knowledge unit in the candidate knowledge unit set is used as a target knowledge unit according to the current learning state of the student, and using the knowledge unit corresponding to the maximum probability in the candidate knowledge unit set as the target knowledge unit, wherein the target knowledge unit is the next knowledge unit to be learned by the student;
wherein the current learning state comprises the historical quiz scores of the student and the learning target of the student;
the probability of determining that each knowledge unit in the candidate knowledge unit set is an optimal solution when being used as a target knowledge unit is realized through reinforcement learning according to the current learning state of the student, the state in the reinforcement learning process is the current learning state of the student, the action in the reinforcement learning process is to determine the next knowledge unit needing learning of the student, and the reward in the reinforcement learning process is the learning capacity increment value of the student after the student learns to the target knowledge unit.
2. The adaptive learning method according to claim 1, wherein before determining the probability of the optimal solution for each knowledge unit in the candidate knowledge unit set as the target knowledge unit according to the current learning state of the student, the method further comprises:
acquiring a current learning state vector of the student according to a historical test record of the student, wherein the historical test record is used for representing a test result of a knowledge unit in the target learning path;
and acquiring an indication vector of the student, splicing the indication vector and the current learning state vector to obtain a vector as the current learning state of the student, wherein the indication vector is used for representing a knowledge unit serving as a learning target in the target learning path.
3. The adaptive learning method of claim 2, wherein the historical test record is a historical test vector; correspondingly, the obtaining the current learning state vector of the student according to the historical test record of the student comprises:
and inputting each historical test vector into a preset model, and outputting a learning state vector corresponding to the latest historical test vector at the test moment as the current learning state vector.
4. The adaptive learning method according to claim 3, wherein the predetermined model at least comprises an embedded layer, a hidden layer and a fully connected layer; correspondingly, the inputting each historical test vector into the preset model, and outputting the learning state vector corresponding to the latest historical test vector at the test time includes:
inputting each historical test vector into the embedding layer, and outputting a learning representation vector corresponding to each historical test vector;
inputting each learning representation vector into the hidden layer, and outputting a learning state hidden vector corresponding to each historical test vector;
and inputting the initial learning state hidden vector and each learning state hidden vector into the full connection layer, and outputting the learning state vector corresponding to the latest historical test vector at the test time.
5. The adaptive learning method according to claim 1, wherein the determining a set of candidate knowledge units according to the target learning path and the first knowledge unit currently learned by the student comprises:
determining a second knowledge unit in m hops before the first knowledge unit in the target learning path and a third knowledge unit in n hops after the first knowledge unit in the target learning path, wherein m and n are positive integers not less than 1;
and determining the candidate knowledge unit set according to the first knowledge unit, the second knowledge unit, the third knowledge unit and a knowledge unit serving as a learning target in the target learning path.
6. The adaptive learning method according to claim 1, wherein the determining the probability of the optimal solution for each knowledge unit in the candidate knowledge unit set as the target knowledge unit according to the current learning state of the student comprises:
acquiring a learning ability increment value generated after the student learns from the first knowledge unit to the knowledge unit serving as a learning target in the target learning path;
and determining a final value of a preset parameter in a strategy network model according to the learning capacity increment value, inputting the current learning state into the strategy network model, and outputting the probability that each knowledge unit in the candidate knowledge unit set is an optimal solution when the knowledge unit is used as a target knowledge unit.
7. The adaptive learning method according to claim 6, wherein the determining a final value of a preset parameter in a policy network model according to the learning ability incremental value comprises:
and inputting the current learning state into a value network model, adjusting the value of the preset parameter in the value network model to minimize the difference between the output result of the value network model and the learning capacity incremental value, and taking the value of the preset parameter when the difference is minimum as the final value of the preset parameter, wherein the value network model and the strategy network model both comprise the preset parameter.
8. An adaptive learning apparatus, comprising:
the first determining module is used for determining a candidate knowledge unit set according to a target learning path and a first knowledge unit currently learned by a student, wherein the target learning path comprises all knowledge units required to be learned by the student;
a second determining module, configured to determine, according to the current learning state of the student, a probability that each knowledge unit in the candidate knowledge unit set is an optimal solution when being used as a target knowledge unit, and use a knowledge unit corresponding to a maximum probability in the candidate knowledge unit set as the target knowledge unit, where the target knowledge unit is a knowledge unit that needs to be learned next by the student;
wherein the current learning state comprises the historical quiz scores of the student and the learning target of the student;
the probability of determining that each knowledge unit in the candidate knowledge unit set is an optimal solution when being used as a target knowledge unit is realized through reinforcement learning according to the current learning state of the student, the state in the reinforcement learning process is the current learning state of the student, the action in the reinforcement learning process is to determine the next knowledge unit needing learning of the student, and the reward in the reinforcement learning process is the learning capacity increment value of the student after the student learns to the target knowledge unit.
9. An electronic device, comprising:
at least one processor; and
at least one memory communicatively coupled to the processor, wherein:
the memory stores program instructions executable by the processor, the processor invoking the program instructions to perform the method of any of claims 1 to 7.
10. A non-transitory computer-readable storage medium storing computer instructions that cause a computer to perform the method of any one of claims 1 to 7.
CN201910584394.2A 2019-07-01 2019-07-01 Self-adaptive learning method and device Active CN110288878B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910584394.2A CN110288878B (en) 2019-07-01 2019-07-01 Self-adaptive learning method and device

Publications (2)

Publication Number Publication Date
CN110288878A (en) 2019-09-27
CN110288878B (en) 2021-10-08

Family

ID=68021364

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910584394.2A Active CN110288878B (en) 2019-07-01 2019-07-01 Self-adaptive learning method and device

Country Status (1)

Country Link
CN (1) CN110288878B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110928920B (en) * 2019-11-19 2022-08-09 广东交通职业技术学院 Knowledge recommendation method, system and storage medium based on improved position social contact
CN112053091A (en) * 2020-09-28 2020-12-08 北京爱论答科技有限公司 Data processing method and system based on learning operation
CN112180726A (en) * 2020-09-29 2021-01-05 北京航空航天大学 Spacecraft relative motion trajectory planning method based on meta-learning
CN112906293B (en) * 2021-01-28 2023-05-02 北京航空航天大学 Machine teaching method and system based on review mechanism

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101105853A (en) * 2007-08-16 2008-01-16 上海交通大学 Personalized teaching-guiding system based on non-zero jumping-off point in network teaching
CN103248693A (en) * 2013-05-03 2013-08-14 东南大学 Large-scale self-adaptive composite service optimization method based on multi-agent reinforced learning
CN105700526B (en) * 2016-01-13 2018-07-27 华北理工大学 Online limit of sequence learning machine method with independent learning ability
US10043411B2 (en) * 2016-02-24 2018-08-07 NEUROHM Sp. z o.o. Spolka komandytowa Filters and related methods of use in measuring reaction times

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104765842A (en) * 2015-04-15 2015-07-08 中山大学 Optimum study scheme pushing method and system
CN109155049A (en) * 2016-02-25 2019-01-04 日益得有限公司 Method, equipment and the computer program of personal customization type education content are provided
CN108614865A (en) * 2018-04-08 2018-10-02 暨南大学 Method is recommended in individualized learning based on deeply study
CN109062919A (en) * 2018-05-31 2018-12-21 腾讯科技(深圳)有限公司 A kind of content recommendation method and device based on deeply study
CN109241291A (en) * 2018-07-18 2019-01-18 华南师范大学 Knowledge mapping optimal path inquiry system and method based on deeply study
CN109241424A (en) * 2018-08-29 2019-01-18 陕西师范大学 A kind of recommended method
CN109948054A (en) * 2019-03-11 2019-06-28 北京航空航天大学 A kind of adaptive learning path planning system based on intensified learning

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Research on Educational Data Mining Technology for Online Intelligent Learning; Liu Qi; Chen Enhong; Zhu Tianyu; Huang Zhenya; Wu Runze; Su Yu; Hu Guoping; Pattern Recognition and Artificial Intelligence; 2018-01-31; pp. 77-90 *

Similar Documents

Publication Publication Date Title
CN110288878B (en) Self-adaptive learning method and device
CN110807469B (en) Knowledge tracking method and system integrating long-time memory and short-time memory with Bayesian network
CN112508334B (en) Personalized paper grouping method and system integrating cognition characteristics and test question text information
Abu Seman et al. Millennial learners’ acceptance and satisfaction of blended learning environment
CN111159419B (en) Knowledge tracking data processing method, system and storage medium based on graph convolution
Ivanov et al. Implementation of developmental education in the digital learning environment
CN112085560A (en) Intelligent education method and system based on cloud computing
CN112818025B (en) Test question generation method, device and system, computer storage medium and program product
CN110968512B (en) Software quality evaluation method, device, equipment and computer readable storage medium
Kusmierczyk et al. On the causal effect of badges
CN116541538B (en) Intelligent learning knowledge point mining method and system based on big data
CN112348725A (en) Knowledge point difficulty grading method based on big data
CN115329959A (en) Learning target recommendation method based on double-flow knowledge embedded network
CN114429212A (en) Intelligent learning knowledge ability tracking method, electronic device and storage medium
KR102385073B1 (en) Learning problem recommendation system that recommends evaluable problems through unification of the score probability distribution form and operation thereof
CN114090733A (en) Learning resource recommendation method and device, storage medium and electronic equipment
US11501654B2 (en) Automated decision making for selecting scaffolds after a partially correct answer in conversational intelligent tutor systems (ITS)
CN111815489A (en) Big data-based algorithm analysis design teaching method and device
CN111178770A (en) Answer data evaluation and learning image construction method, device and storage medium
CN117520527B (en) Method, system, electronic device and readable storage medium for generating answering dialogue data
Abbasov et al. Informational modeling of the behavior of a teacher in the learning process based on fuzzy logic
CN115098790B (en) Course management method and system for online education platform
CN115952838B (en) Self-adaptive learning recommendation system-based generation method and system
CN118095432A (en) Information processing method, apparatus, device and storage medium
CN117391541A (en) Online education quality monitoring method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant