CN110288878A

CN110288878A - Adaptive learning method and device

Info

Publication number: CN110288878A
Application number: CN201910584394.2A
Authority: CN
Inventors: 马海平; 刘淇; 陈恩红; 王士进; 童世炜; 黄振亚
Original assignee: University of Science and Technology of China USTC; iFlytek Co Ltd
Current assignee: University of Science and Technology of China USTC; iFlytek Co Ltd
Priority date: 2019-07-01
Filing date: 2019-07-01
Publication date: 2019-09-27
Anticipated expiration: 2039-07-01
Also published as: CN110288878B

Abstract

Embodiments of the present invention provide an adaptive learning method and device, belonging to the technical field of machine learning. The method includes: determining a candidate knowledge unit set according to a target learning path and a first knowledge unit that a student is currently learning, where the target learning path contains all knowledge units the student needs to learn; and, according to the student's current learning state, determining for each knowledge unit in the candidate knowledge unit set the probability that it is the optimal solution when taken as the target knowledge unit, and taking the knowledge unit with the highest probability in the candidate knowledge unit set as the target knowledge unit, the target knowledge unit being the next knowledge unit the student needs to learn. Because the next knowledge unit is recommended using both the knowledge structure and the student's learning state, the student's degree of knowledge mastery at different moments can be analyzed accurately, the recommendations better conform to cognitive laws, and an efficient, personalized learning path can be formulated for each student.

Description

Adaptive learning method and device

Technical field

The present invention relates to the technical field of machine learning, and in particular to an adaptive learning method and device.

Background art

At present, traditional education, and classroom teaching in particular, delivers one-size-fits-all instruction to a whole class or group and can hardly meet the individual needs of students. Traditional education also demands substantial educational resources; when those resources are insufficient, they tend to be distributed unevenly, which readily leads to educational inequality. An adaptive learning method is therefore urgently needed that recommends knowledge units suited to each student, so as to meet the personalized learning needs of different students.

Summary of the invention

To solve the above problems, embodiments of the present invention provide an adaptive learning method and device that overcome the above problems or at least partially solve them.

According to a first aspect of the embodiments of the present invention, an adaptive learning method is provided, comprising:

determining a candidate knowledge unit set according to a target learning path and a first knowledge unit that a student is currently learning, where the target learning path contains all knowledge units the student needs to learn;

according to the student's current learning state, determining, for each knowledge unit in the candidate knowledge unit set, the probability that it is the optimal solution when taken as the target knowledge unit, and taking the knowledge unit with the highest probability in the candidate knowledge unit set as the target knowledge unit, the target knowledge unit being the next knowledge unit the student needs to learn.

According to a second aspect of the embodiments of the present invention, an adaptive learning device is provided, comprising:

a first determining module, configured to determine a candidate knowledge unit set according to a target learning path and a first knowledge unit that a student is currently learning, where the target learning path contains all knowledge units the student needs to learn;

a second determining module, configured to determine, according to the student's current learning state, the probability that each knowledge unit in the candidate knowledge unit set is the optimal solution when taken as the target knowledge unit, and to take the knowledge unit with the highest probability in the candidate knowledge unit set as the target knowledge unit, the target knowledge unit being the next knowledge unit the student needs to learn.

According to a third aspect of the embodiments of the present invention, an electronic device is provided, comprising:

at least one processor; and

at least one memory communicatively connected to the processor, wherein:

the memory stores program instructions executable by the processor, and the processor, by invoking the program instructions, can perform the adaptive learning method provided by any of the possible implementations of the first aspect.

According to a fourth aspect of the present invention, a non-transitory computer-readable storage medium is provided. The non-transitory computer-readable storage medium stores computer instructions that cause a computer to perform the adaptive learning method provided by any of the possible implementations of the first aspect.

In the adaptive learning method and device provided by the embodiments of the present invention, a candidate knowledge unit set is determined according to a target learning path and a first knowledge unit that a student is currently learning. According to the student's current learning state, the probability that each knowledge unit in the candidate knowledge unit set is the optimal solution when taken as the target knowledge unit is determined, and the knowledge unit with the highest probability in the candidate knowledge unit set is taken as the target knowledge unit. Because the next knowledge unit is recommended using both the knowledge structure and the student's learning state, the student's degree of knowledge mastery at different moments can be analyzed accurately, the recommendations better conform to cognitive laws, and an efficient, personalized learning path can be formulated for each student.

It should be understood that the above general description and the following detailed description are exemplary and explanatory, and do not limit the embodiments of the present invention.

Brief description of the drawings

To explain the technical solutions of the embodiments of the present invention or of the prior art more clearly, the drawings needed to describe the embodiments or the prior art are briefly introduced below. The drawings described below show some embodiments of the present invention; those of ordinary skill in the art can obtain other drawings from them without creative effort.

Fig. 1 is a schematic flowchart of an adaptive learning method provided by an embodiment of the present invention;

Fig. 2 is a schematic diagram of a target learning path provided by an embodiment of the present invention;

Fig. 3 is a schematic structural diagram of a preset model provided by an embodiment of the present invention;

Fig. 4 is a schematic structural diagram of an adaptive learning device provided by an embodiment of the present invention;

Fig. 5 is a block diagram of an electronic device provided by an embodiment of the present invention.

Detailed description of the embodiments

To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings. The described embodiments are some, but not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on these embodiments without creative effort fall within the protection scope of the present invention.

At present, there are mainly two kinds of adaptive learning methods:

(1) Methods based on learning state

Because students differ in learning ability, the learning gain that each knowledge unit brings differs from student to student, and so do students' learning states and the way those states change. Based on item response theory, a student's state and ability can be inferred from the student's performance on different knowledge units, and knowledge units of moderate difficulty can then be recommended on that basis. Alternatively, the adaptive learning process can be treated as a Markov decision process: the evolution of the student's learning state is modeled by the transition matrix of the Markov decision process, and a reinforcement learning algorithm is used to mine the relationship between learning states and knowledge units.

(2) Methods based on knowledge structure

These methods analyze data on the relationships between knowledge units and make recommendations based on the units' similarity, difficulty, and so on. Specifically, a knowledge graph can be introduced: the knowledge system is expressed as a graph, and recommendation rules are formulated from the features and relationships of the knowledge units to plan a learning path for the student. Alternatively, because a student's learning ability is reflected in the student's learning trajectory, and similar learning abilities correspond to similar knowledge structures, methods such as collaborative filtering can be used to recommend similar knowledge units to students. For example, recommendation methods from traditional e-commerce can be transferred to educational recommendation.

With the first kind, methods based on the student's learning state cannot make effective use of the existing knowledge structure, and may produce illogical learning paths that violate the laws of human cognition. With the second kind, methods based on knowledge structure cannot effectively formulate personalized learning plans for different students; like traditional education, they can only plan a one-size-fits-all learning path at the group level, and therefore cannot guarantee efficient learning.

To address the problems of the above two kinds of methods, an embodiment of the present invention provides an adaptive learning method. The method can be used in knowledge unit recommendation scenarios, which the embodiments of the present invention do not specifically limit. When a student learns a subject, different knowledge units build on one another; for example, derivatives and calculus can only be learned after functions and limits. After the student has learned a given knowledge unit, the next knowledge unit to learn therefore needs to be recommended. Referring to Fig. 1, the method includes:

101. Determine a candidate knowledge unit set according to the target learning path and the first knowledge unit that the student is currently learning, where the target learning path contains all knowledge units the student needs to learn.

The target learning path is a general-purpose learning path. It may include the knowledge units and the order among them, and can be represented as a directed graph. The first knowledge unit can be determined from the student's learning progress: if the student is currently learning the 3rd knowledge unit, or has just finished learning it, then the 3rd knowledge unit is the first knowledge unit that the student is currently learning. The target learning path contains the first knowledge unit, and the knowledge units in the candidate knowledge unit set are likewise selected from the target learning path.

102. According to the student's current learning state, determine, for each knowledge unit in the candidate knowledge unit set, the probability that it is the optimal solution when taken as the target knowledge unit, and take the knowledge unit with the highest probability in the candidate knowledge unit set as the target knowledge unit, the target knowledge unit being the next knowledge unit the student needs to learn.

The student's current learning state may include the student's historical test performance and the student's learning target, and can be represented as a vector; the embodiments of the present invention do not specifically limit this. Once the next knowledge unit to learn has been determined in the above manner, it can be recommended to the student. When the student is learning that knowledge unit or has finished learning it, it can be taken as the first knowledge unit the student is currently learning, and steps 101 to 102 are repeated to recommend the following knowledge unit. This recommendation process continues until the student finishes learning, and the knowledge units recommended at each step form a learning path. That learning path can in turn serve as a new target learning path for the adaptive learning process.
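The repetition of steps 101 and 102 can be sketched as a simple loop. This is only an illustrative sketch: the function names, the toy knowledge units, and the fixed probability tables are hypothetical, and the probabilities are keyed by the current unit as a stand-in for the student's learning state.

```python
from typing import Callable, Dict, List

def recommend_next(candidates: List[str], probs: Dict[str, float]) -> str:
    """Step 102: pick the candidate with the highest probability."""
    return max(candidates, key=lambda unit: probs[unit])

def learning_path(start: str, goal: str,
                  candidates_of: Callable[[str], List[str]],
                  probs_of: Callable[[str], Dict[str, float]],
                  max_steps: int = 10) -> List[str]:
    """Repeat steps 101 and 102 until the goal unit is recommended."""
    path, current = [], start
    for _ in range(max_steps):
        candidates = candidates_of(current)                       # step 101
        if not candidates:
            break
        current = recommend_next(candidates, probs_of(current))  # step 102
        path.append(current)
        if current == goal:
            break
    return path

# Toy data (hypothetical): fixed candidate sets and fixed probabilities.
CANDIDATES = {"functions": ["limits"],
              "limits": ["functions", "derivatives"],
              "derivatives": ["limits", "integrals"]}
PROBS = {"functions": {"limits": 1.0},
         "limits": {"functions": 0.2, "derivatives": 0.8},
         "derivatives": {"limits": 0.3, "integrals": 0.7}}

path = learning_path("functions", "integrals",
                     lambda unit: CANDIDATES.get(unit, []),
                     lambda unit: PROBS[unit])
print(path)
```

In the method described here the probabilities would come from a trained model rather than a fixed table; the table only keeps the sketch self-contained.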

In the method provided by this embodiment of the present invention, a candidate knowledge unit set is determined according to the target learning path and the first knowledge unit that the student is currently learning. According to the student's current learning state, the probability that each knowledge unit in the candidate knowledge unit set is the optimal solution when taken as the target knowledge unit is determined, and the knowledge unit with the highest probability in the candidate knowledge unit set is taken as the target knowledge unit. Because the next knowledge unit is recommended using both the knowledge structure and the student's learning state, the student's degree of knowledge mastery at different moments can be analyzed accurately, the recommendations better conform to cognitive laws, and an efficient, personalized learning path can be formulated for each student.

Based on the above embodiment, as an optional embodiment, the student's current learning state may be obtained before determining, according to that state, the probability that each knowledge unit in the candidate knowledge unit set is the optimal solution when taken as the target knowledge unit. The embodiments of the present invention do not specifically limit how the student's current learning state is obtained, which includes but is not limited to: obtaining the student's current learning state vector from the student's historical test records, where the historical test records represent the test results for knowledge units in the target learning path; and obtaining the student's indicator vector and taking the vector produced by concatenating the indicator vector with the current learning state vector as the student's current learning state, where the indicator vector indicates which knowledge units in the target learning path serve as learning targets.

A historical test record describes the knowledge units examined in one historical test and the student's answers to, or learning of, those knowledge units. Each historical test record can be represented by a historical test vector, and all historical test records can be represented by the sequence x = (x₁, x₂, ...). Here x₁ is the historical test vector corresponding to the first historical test record, x₂ is the historical test vector corresponding to the second historical test record, and so on for subsequent vectors. Taking answer results as an example, the dimension of x₁ can be twice the number of knowledge units.

For example, suppose the first historical test examined only the knowledge unit with ID 130, with a single question, and the student answered that question correctly. Then x₁ = (0, 0, ..., 0, 1⁽²⁶¹⁾, 0, 0, ..., 0), where "261" marks the 261st-dimension element of x₁; its value of 1 indicates that the student correctly answered the question for the knowledge unit with ID 130. If the student answered that question incorrectly, then x₁ = (0, 0, ..., 0, 1⁽²⁶⁰⁾, 0, 0, ..., 0), where "260" marks the 260th-dimension element of x₁; its value of 1 indicates that the student incorrectly answered the question for the knowledge unit with ID 130.

That is, whether the question for each knowledge unit was answered correctly or incorrectly can be represented by elements in two dimensions. For example, the result for the knowledge unit with ID 1 can be represented by the 1st- and 2nd-dimension elements, and the result for the knowledge unit with ID 130 by the 260th- and 261st-dimension elements.
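The two-elements-per-unit encoding can be sketched as follows. The indexing convention is an assumption: positions are counted from zero, with position 2k marking a wrong answer and 2k + 1 a correct answer for the unit with ID k, chosen so that ID 130 lands on positions 260 and 261 as in the example above. The function name and the total of 200 knowledge units are hypothetical.

```python
from typing import List

def encode_answer(unit_id: int, correct: bool, num_units: int) -> List[int]:
    """One-hot test vector of length 2 * num_units for one answered question."""
    if not 0 <= unit_id < num_units:
        raise ValueError("unit_id out of range")
    vec = [0] * (2 * num_units)
    # Assumed layout: even position = wrong answer, odd position = correct answer.
    vec[2 * unit_id + (1 if correct else 0)] = 1
    return vec

x_correct = encode_answer(unit_id=130, correct=True, num_units=200)
x_wrong = encode_answer(unit_id=130, correct=False, num_units=200)
print(x_correct.index(1), x_wrong.index(1))
```

A test that examines several knowledge units could set one element per answered question in the same vector; the sketch covers only the single-question case from the example.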

As described above, historical test records can be represented by historical test vectors, and the sequence x = (x₁, x₂, ...) formed by these vectors reflects both the student's learning state and the way the student's learning situation evolves. The student's current learning state vector can therefore be obtained from the student's historical test vectors. The current learning state vector reflects the student's learning state after multiple historical test records, and also reflects how the student's learning situation has evolved.

In addition, the dimension of the indicator vector can equal the number of knowledge units. By numbering all knowledge units, each knowledge unit can be made to correspond to the element of one dimension; for example, the i-th knowledge unit can correspond to the i-th-dimension element of the indicator vector. If the i-th-dimension element is 1, the i-th knowledge unit is a knowledge unit serving as a learning target; if it is 0, the i-th knowledge unit is not a learning target. In practice the convention can also be reversed, with 1 meaning that the knowledge unit is not a learning target and 0 meaning that it is; the embodiments of the present invention do not specifically limit this.

It should be noted that more than one knowledge unit may serve as a learning target; the embodiments of the present invention do not specifically limit this. For example, the indicator vector can be (0,0,0,0,1⁽⁵⁾,0,0,1⁽⁸⁾,0,...,0,1⁽¹⁰⁰⁾,0,…,0), where "5", "8", and "100" indicate that the 5th, 8th, and 100th knowledge units are the knowledge units serving as learning targets. After the student's current learning state vector and indicator vector have been obtained, the two can be concatenated, and the resulting vector is taken as the student's current learning state.
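Building the vector that marks the learning targets and concatenating it with the learning state vector can be sketched as below. The function names, the total of 120 knowledge units, and the four-element state vector are hypothetical; knowledge units are numbered from 1, so unit i maps to the i-th element.

```python
from typing import Iterable, List

def indicator_vector(target_units: Iterable[int], num_units: int) -> List[int]:
    """Mark each learning-target unit with 1; unit i maps to the i-th element."""
    vec = [0] * num_units
    for i in target_units:
        if not 1 <= i <= num_units:
            raise ValueError("unit number out of range")
        vec[i - 1] = 1
    return vec

def current_learning_state(state_vec: List[float], goal_vec: List[int]) -> List[float]:
    """Concatenate the learning state vector with the target vector."""
    return list(state_vec) + [float(g) for g in goal_vec]

goal = indicator_vector([5, 8, 100], num_units=120)
state_vec = [0.1, 0.7, 0.4, 0.9]          # hypothetical learning state vector
state = current_learning_state(state_vec, goal)
print(len(state), sum(goal))
```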

In the method provided by this embodiment of the present invention, the student's current learning state vector is obtained from the student's historical test records. The student's indicator vector is obtained, and the vector produced by concatenating the indicator vector with the current learning state vector is taken as the student's current learning state. Because the student's current learning state reflects the student's learning state after multiple historical test records, and also reflects how the student's learning situation has evolved, an efficient, personalized learning path can subsequently be formulated for each student according to that state.

Based on the above embodiment, as an optional embodiment, the historical test records are historical test vectors. Accordingly, the embodiments of the present invention do not specifically limit how the student's current learning state vector is obtained from the student's historical test records, which includes but is not limited to: inputting each historical test vector into a preset model, outputting the learning state vector corresponding to the historical test vector with the latest test time, and taking it as the current learning state vector.

Specifically, each historical test vector corresponds to one learning state vector. For example, historical test vector x₁ corresponds to learning state vector S₁, and historical test vector x_t corresponds to learning state vector S_t. The preset model can be used to predict the student's answers or learning situation in the next test. Its input can be the different historical test vectors, and its output can be the student's answers or learning situation in the next test, which can also be represented as a vector. Because each historical test is taken at a different time, the historical test vectors are ordered in time. The learning state vector corresponding to the historical test vector with the latest test time incorporates all earlier historical test vectors, so it reflects the student's current learning state and the way the student's learning situation has evolved, and can serve as the current learning state vector. The preset model may specifically be a long short-term memory (LSTM) model, or a deep knowledge tracing model obtained by improving on the LSTM model; the embodiments of the present invention do not specifically limit this.

In the method provided by this embodiment of the present invention, because the current learning state vector reflects the student's current learning state and the way the student's learning situation has evolved, an efficient, personalized learning path can subsequently be formulated for each student according to the student's current learning state.

Based on the above embodiment, as an optional embodiment, the preset model includes at least an embedding layer, a hidden layer, and a fully connected layer. Accordingly, the embodiments of the present invention do not specifically limit how each historical test vector is input into the preset model to output the corresponding learning state vector, which includes but is not limited to: inputting each historical test vector into the embedding layer and outputting the learning representation vector corresponding to each historical test vector; inputting each learning representation vector into the hidden layer and outputting the learning state hidden vector corresponding to each historical test vector; and inputting the initial learning state hidden vector and each learning state hidden vector into the fully connected layer, and outputting the learning state vector corresponding to the historical test vector with the latest test time.

Specifically, because the historical test vectors may be sparse, the embedding layer can turn these sparse vectors into dense vectors, compressing the representation of the learning or answer situation. The structure of the preset model and the process of outputting the learning state vectors are shown in Fig. 2. In Fig. 2, x₁ to x_t denote the historical test vectors, x₁' to x_t' denote the learning representation vectors output by the embedding layer, h₁ to h_t denote the learning state hidden vectors output by the hidden layer, h₀ denotes the initial learning state hidden vector, and S₁ to S_t denote the learning state vectors output by the fully connected layer. As Fig. 2 shows, S₁ is obtained from h₀ and h₁, S₂ is obtained from h₀, h₁, and h₂, and so on, so that S_t is obtained from h₀ to h_t.

It should be noted that the initial learning state hidden vector mainly assists the computation; any error it introduces gradually weakens as the computation proceeds from S₁ to S_t. In addition, as shown in Fig. 2, the preset model can in fact output the learning state vector corresponding to every historical test vector. The embodiments of the present invention mainly use S_t, that is, the learning state vector corresponding to the historical test vector with the latest test time, and take it as the current learning state vector.
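The embedding, hidden, and fully connected stages can be sketched with a plain recurrent layer standing in for the LSTM. This is a simplified sketch under stated assumptions, not the model of the embodiment: the weights are random and untrained, the layer sizes are hypothetical, the hidden layer is a tanh recurrence rather than an LSTM cell, and the dependence of S_t on h₀ to h_t is carried only through the recurrence.

```python
import math
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]

def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply matrix m by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]

def rand_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def forward(xs: List[Vector], emb: Matrix, w_in: Matrix,
            w_rec: Matrix, w_out: Matrix) -> List[Vector]:
    """Return the learning state vectors S_1..S_t for test vectors x_1..x_t."""
    h = [0.0] * len(w_rec)                      # h_0: initial hidden vector
    states = []
    for x in xs:
        x_emb = matvec(emb, x)                  # embedding layer: sparse -> dense
        pre = [a + b for a, b in zip(matvec(w_in, x_emb), matvec(w_rec, h))]
        h = [math.tanh(p) for p in pre]         # hidden layer (plain RNN stand-in)
        logits = matvec(w_out, h)               # fully connected layer
        states.append([1.0 / (1.0 + math.exp(-z)) for z in logits])
    return states

rng = random.Random(0)
num_units, emb_dim, hid_dim = 3, 4, 5           # hypothetical sizes
emb = rand_matrix(emb_dim, 2 * num_units, rng)
w_in = rand_matrix(hid_dim, emb_dim, rng)
w_rec = rand_matrix(hid_dim, hid_dim, rng)
w_out = rand_matrix(num_units, hid_dim, rng)
xs = [[0, 1, 0, 0, 0, 0], [0, 0, 0, 0, 1, 0]]   # two one-hot test vectors
states = forward(xs, emb, w_in, w_rec, w_out)
current_state = states[-1]                      # S_t for the latest test
print(current_state)
```

Only the last element of `states` is used as the current learning state vector, matching the use of S_t described above.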

Based on the above embodiment, as an optional embodiment, the embodiments of the present invention do not specifically limit how the candidate knowledge unit set is determined according to the target learning path and the first knowledge unit that the student is currently learning, which includes but is not limited to: determining the second knowledge units within m hops before the first knowledge unit in the target learning path and the third knowledge units within n hops after the first knowledge unit in the target learning path, where m and n are positive integers; and determining the candidate knowledge unit set according to the first knowledge unit, the second knowledge units, the third knowledge units, and the knowledge unit serving as the learning target in the target learning path.

The values of m and n can be set as required and may be the same or different; the embodiments of the present invention do not specifically limit this. Fig. 3 is a schematic diagram of a target learning path. In Fig. 3, each node represents one knowledge unit, and different knowledge units are distinguished by the labels in the nodes. Taking the node labeled 3 as the first knowledge unit and m = 1, the second knowledge unit within 1 hop before the first knowledge unit is the node labeled 1. Taking n = 2, the third knowledge units within 2 hops after the first knowledge unit are the nodes labeled 4 and 8. Once the second and third knowledge units have been determined in the target learning path, the candidate knowledge unit set can be formed directly from them. It should be noted that there may be more than one second knowledge unit and more than one third knowledge unit.

It should also be noted that, in a conventional learning mode, a student who has learned the first knowledge unit labeled 3 would simply continue on to later units. However, since the student may also need to review, the second knowledge units located before the first knowledge unit are also considered as units that may need to be learned next, and are therefore placed in the candidate knowledge unit set as well.

In the method provided by this embodiment of the present invention, because the second knowledge units located before the first knowledge unit are also considered as units that may need to be learned next, the student can review earlier material and achieve a better learning effect.
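Collecting the knowledge units within m hops before and n hops after the current unit can be sketched with a breadth-first search over a directed graph. The graph below is a hypothetical toy example and is not the graph of Fig. 3; the function names are also hypothetical.

```python
from collections import deque
from typing import Dict, List, Set

Graph = Dict[str, List[str]]   # unit -> units that directly follow it

def within_hops(graph: Graph, start: str, hops: int) -> Set[str]:
    """Units reachable from start in 1..hops directed steps (breadth-first)."""
    seen, frontier = {start}, deque([(start, 0)])
    found: Set[str] = set()
    while frontier:
        node, dist = frontier.popleft()
        if dist == hops:
            continue                      # do not expand past the hop limit
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                found.add(nxt)
                frontier.append((nxt, dist + 1))
    return found

def reverse(graph: Graph) -> Graph:
    """Flip every edge so that predecessors can be searched like successors."""
    rev: Graph = {}
    for src, dsts in graph.items():
        rev.setdefault(src, [])
        for dst in dsts:
            rev.setdefault(dst, []).append(src)
    return rev

def hop_candidates(graph: Graph, first: str, m: int, n: int) -> Set[str]:
    second = within_hops(reverse(graph), first, m)   # units before (for review)
    third = within_hops(graph, first, n)             # units after
    return second | third

# Hypothetical toy target learning path.
GRAPH: Graph = {"functions": ["limits"],
                "limits": ["derivatives", "continuity"],
                "derivatives": ["integrals"],
                "integrals": ["series"]}

candidates = hop_candidates(GRAPH, "derivatives", m=1, n=2)
print(sorted(candidates))
```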

The student's learning process must follow the target learning path and aim at the learning endpoint, that is, the knowledge unit serving as the learning target. If the candidate knowledge unit set were formed directly from the second and third knowledge units, it might contain knowledge units from which the learning endpoint cannot be reached, so that the student's learning process would not conform to cognitive laws. To address this, based on the above embodiment and as an optional embodiment, the embodiments of the present invention do not specifically limit how the candidate knowledge unit set is determined according to the first knowledge unit, the second knowledge units, the third knowledge units, and the knowledge unit serving as the learning target in the target learning path, which includes but is not limited to: screening the second and third knowledge units based on a preset condition and the target learning path, and forming the candidate knowledge unit set from the knowledge units that remain after screening. The preset condition is that the knowledge unit can form a connected path with the first knowledge unit and the knowledge unit serving as the target.

For example, in Fig. 3, nodes 1, 2, 3, 4, 8, and 9 can form a connected path, whereas nodes 2, 0, 1, and 3 cannot, because of the direction of the edges between the nodes.

In the method provided by this embodiment of the present invention, the candidate knowledge unit set is obtained by screening the second and third knowledge units, and the next knowledge unit to learn can subsequently be determined from the screened candidate knowledge unit set. Because each knowledge unit in the candidate knowledge unit set can form a connected path with the first knowledge unit and the knowledge unit serving as the target, the learning path determined on this basis conforms to cognitive laws.
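The screening step can be sketched as a reachability check. The sketch adopts one possible reading of the preset condition, stated here as an assumption: a candidate is kept only if the target unit can still be reached from it by following edge directions. The graph and function names are hypothetical.

```python
from typing import Dict, Iterable, List, Set

Graph = Dict[str, List[str]]   # unit -> units that directly follow it

def reaches(graph: Graph, src: str, dst: str) -> bool:
    """True if dst can be reached from src by following edge directions."""
    stack, seen = [src], {src}
    while stack:
        node = stack.pop()
        if node == dst:
            return True
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return False

def filter_candidates(graph: Graph, candidates: Iterable[str], target: str) -> Set[str]:
    """Keep only candidates from which the learning target is still reachable."""
    return {c for c in candidates if reaches(graph, c, target)}

# Hypothetical toy target learning path; "continuity" is a dead end.
GRAPH: Graph = {"functions": ["limits"],
                "limits": ["derivatives", "continuity"],
                "derivatives": ["integrals"],
                "integrals": ["series"]}

kept = filter_candidates(GRAPH, {"limits", "integrals", "continuity"}, target="series")
print(sorted(kept))
```

Here "continuity" is dropped because no directed path leads from it to the target, which mirrors the point that edge direction decides whether a connected path exists.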

Based on the above embodiment, as an optional embodiment, the embodiments of the present invention do not specifically limit how the probability that each knowledge unit in the candidate knowledge unit set is the optimal solution when taken as the target knowledge unit is determined according to the student's current learning state, which includes but is not limited to: obtaining the learning ability increment produced after learning from the first knowledge unit to the knowledge unit serving as the learning target in the target learning path; determining, according to the learning ability increment, the final values of the preset parameters in a policy network model; and inputting the current learning state into the policy network model, which outputs, for each knowledge unit in the candidate knowledge unit set, the probability that it is the optimal solution when taken as the target knowledge unit.

The above process can be implemented by reinforcement learning, that is, the next knowledge unit the student needs to learn is determined by a policy network model and a value network model. Reinforcement learning mainly involves three elements: state, action, and reward.

The state is the student's current learning state. With respect to steps 101 and 102 of the above embodiment, each determination of the next knowledge unit the student needs to learn is one action. During the learning process the reward signal is always 0 until the knowledge unit serving as the target has been learned, that is, until the learning process ends. After the learning process ends, the student's learning ability increment can be defined as the reward signal. The learning ability increment can be computed with the following formula:

In the above formula, the left-hand side denotes the learning ability increment, E_s denotes the test score at the start of the learning stage, E_e denotes the test score after the learning stage ends, and E_sup denotes the full score of the test. The student can take one test before the learning stage starts and one after it ends, and E_s and E_e are obtained from the respective test results.
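The formula itself is not reproduced in this text. One form consistent with the three quantities defined above, offered only as an assumption, normalizes the score gain by the room for improvement that remained at the start of the learning stage. The symbol E_φ for the increment is chosen here for illustration.

```latex
E_{\phi} = \frac{E_e - E_s}{E_{sup} - E_s}
```

Under this assumed form, a student who starts at 60 out of 100 and ends at 80 would have an increment of (80 - 60) / (100 - 60) = 0.5.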

Based on the above, the optimization objective of reinforcement learning can be given in mathematical form:

In the above formula, γ is the discount factor constant in reinforcement learning, N is the total number of learning steps in the learning stage, r_j denotes the reward signal of each learning step, and R_i denotes the sum of the reward signals of the actions from the i-th step to the N-th step (that is, the steps by which the student learns from the first knowledge unit to the knowledge unit serving as the target).
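The quantity R_i described above can be sketched as a standard discounted return. The exact expression is not reproduced in this text, so the form R_i = Σ_{j=i..N} γ^(j-i) · r_j is an assumption based on the stated roles of γ, N and r_j; the function name is hypothetical.

```python
def discounted_returns(rewards, gamma):
    """Return [R_1, ..., R_N], assuming R_i = sum_{j=i..N} gamma**(j-i) * r_j."""
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each R_i reuses R_{i+1}: R_i = r_i + gamma * R_{i+1}.
    for idx in range(len(rewards) - 1, -1, -1):
        running = rewards[idx] + gamma * running
        returns[idx] = running
    return returns


# The reward is 0 at every step except the last one, where it equals the
# learning ability increment value (0.5 is an illustrative number).
rewards = [0.0, 0.0, 0.0, 0.5]
print(discounted_returns(rewards, gamma=0.9))
```

Because the only non-zero reward arrives at the end of the learning stage, earlier steps receive a return that shrinks by a factor of γ per step of distance from the end.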

Based on the content of the above embodiments, as an optional embodiment, the embodiments of the present invention do not specifically limit the manner of determining the final values of the preset parameters in the policy network model according to the learning ability increment value. The manner includes but is not limited to: inputting the current learning state into the value network model and adjusting the values of the preset parameters in the value network model so that the difference between the output of the value network model and the learning ability increment value is minimized, and taking the values of the preset parameters at which the difference is minimized as the final values of the preset parameters, where both the value network model and the policy network model include the preset parameters.

Specifically, a value network model v(·|θ_v) may first be defined to estimate the total reward v_i = v(state_i|θ_v) that a given state can obtain in the future, where θ_v refers to the preset parameters in the value network model. By combining a stochastic policy with the value network model, the actor-critic algorithm can be applied to generating knowledge unit recommendations. When the i-th action is reached, the policy gradient function may refer to the following formula:

In the above formula, π(·) denotes the policy function in the policy network model. Each knowledge unit in the candidate knowledge unit set corresponds to one learning path strategy. Given the current learning state state_i and the values of the preset parameters, the policy function outputs, for each knowledge unit in the candidate knowledge unit set, the probability that choosing it as the next knowledge unit to learn is the optimal action.
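The selection step described above can be sketched as follows. The text does not say how the policy network turns its outputs into probabilities, so the softmax normalization, the candidate names and the raw scores below are all assumptions chosen for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of raw scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def recommend(candidates, scores):
    """Return the candidate with the highest probability, plus all probabilities."""
    probs = softmax(scores)
    best = max(range(len(candidates)), key=lambda k: probs[k])
    return candidates[best], probs


# Hypothetical candidate knowledge units and hypothetical policy-network scores.
candidates = ["fractions", "decimals", "percentages"]
unit, probs = recommend(candidates, scores=[0.2, 1.5, 0.7])
print(unit)  # decimals
```

Taking the maximum-probability unit matches the recommendation rule stated in the text; sampling from the probabilities instead would be the usual choice while the policy is still being trained.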

The final values of the preset parameters may be determined by the loss function of the value network model, which may refer to the following formula:

Combining the policy gradient function of the policy network model with the loss function of the value network model yields the loss function of the whole network, with specific reference to the following formula:

In the above formula, α and β are hyperparameters. Adjusting the values of the preset parameters in the value network model changes the output v_i of the value network model; when the difference between the output of the value network model and the learning ability increment value is minimized, the values at that point may be taken as the final values of the preset parameters θ_v. Since both the policy network model and the value network model include the preset parameters, once the preset parameters are determined the current learning state can be input into the policy network model, which outputs the probability that each knowledge unit in the candidate knowledge unit set is the optimal solution when taken as the target knowledge unit, so that the knowledge unit with the highest probability can be taken as the next knowledge unit the student needs to learn. It should be noted that the optimal solution is the one that achieves the optimization objective of reinforcement learning, that is, the one that minimizes the difference between the output of the value network model and the learning ability increment value.
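To make the combination of the two loss terms concrete, the sketch below uses the standard advantage actor-critic form. None of the three formulas is reproduced in this text, so the policy term `-log π(a_i|state_i) · (R_i - v_i)`, the value term `(R_i - v_i)²` and the weighting `α · policy + β · value` are assumptions; only the symbols α, β, R_i and v_i come from the description.

```python
import math


def actor_critic_loss(log_probs, values, returns, alpha=1.0, beta=0.5):
    """Combined loss over one learning stage (ASSUMED standard actor-critic form).

    log_probs: log pi(a_i | state_i) of the action taken at each step
    values:    value-network outputs v_i
    returns:   discounted returns R_i
    """
    policy_loss = 0.0
    value_loss = 0.0
    for log_p, v, ret in zip(log_probs, values, returns):
        advantage = ret - v               # how much better than the estimate
        policy_loss += -log_p * advantage
        value_loss += advantage ** 2      # squared error of the value estimate
    return alpha * policy_loss + beta * value_loss


# Illustrative numbers only.
log_probs = [math.log(0.5), math.log(0.8)]
values = [0.3, 0.4]
returns = [0.45, 0.5]
loss = actor_critic_loss(log_probs, values, returns)
print(round(loss, 4))
```

In an actual training loop the advantage would be treated as a constant when differentiating the policy term, so that the policy term does not push gradients into the value estimate.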

The method provided by the embodiments of the present invention converts the learning path recommendation problem into a step-by-step Markov decision problem and applies the actor-critic algorithm to update the recommendation policy dynamically, so that knowledge units enabling efficient learning are recommended sequentially to different students.

Based on the content of the above embodiments, an embodiment of the present invention provides an adaptive learning device for executing the adaptive learning method provided in the above method embodiments. Referring to Fig. 4, the device includes:

a first determining module 401, configured to determine a candidate knowledge unit set according to a target learning path and the first knowledge unit the student is currently learning, the target learning path including all knowledge units the student needs to learn;

a second determining module 402, configured to determine, according to the student's current learning state, the probability that each knowledge unit in the candidate knowledge unit set is the optimal solution when taken as the target knowledge unit, and to take the knowledge unit with the highest probability in the candidate knowledge unit set as the target knowledge unit, the target knowledge unit being the next knowledge unit the student needs to learn.

As an optional embodiment, the device further includes:

a first obtaining module, configured to obtain the student's current learning state vector according to the student's historical test records, the historical test records indicating the test results for knowledge units in the target learning path;

a second obtaining module, configured to obtain the student's indicator vector;

a splicing module, configured to take the vector obtained by concatenating the indicator vector with the current learning state vector as the student's current learning state, the indicator vector indicating the knowledge units serving as learning targets in the target learning path.
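The splicing step performed by these modules can be sketched as a plain vector concatenation. The text does not specify how the indicator vector is encoded or in which order the two vectors are joined, so the multi-hot encoding, the ordering and all names below are assumptions.

```python
def indicator_vector(all_units, target_units):
    """Multi-hot vector marking which knowledge units are learning targets.

    ASSUMPTION: one slot per knowledge unit, 1.0 for a target and 0.0 otherwise.
    """
    targets = set(target_units)
    return [1.0 if unit in targets else 0.0 for unit in all_units]


def build_state(current_state_vector, all_units, target_units):
    """Concatenate the indicator vector with the current learning state vector."""
    return indicator_vector(all_units, target_units) + list(current_state_vector)


# Hypothetical knowledge units and a hypothetical learning state vector.
all_units = ["fractions", "decimals", "percentages"]
state = build_state([0.7, 0.2, 0.1], all_units, target_units=["percentages"])
print(state)  # [0.0, 0.0, 1.0, 0.7, 0.2, 0.1]
```

Joining the two vectors lets a single policy network condition its recommendation on both what the student currently knows and what the student is trying to reach.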

As an optional embodiment, the first obtaining module is configured to input each historical test vector into a preset model, output the learning state vector corresponding to the historical test vector with the latest test time, and take it as the current learning state vector.

As an optional embodiment, the preset model includes at least an embedding layer, a hidden layer and a fully connected layer. Correspondingly, the first obtaining module is configured to: input each historical test vector into the embedding layer and output the learning representation vector corresponding to each historical test vector; input each learning representation vector into the hidden layer and output the learning state hidden vector corresponding to each historical test vector; and input the initial learning state hidden vector and each learning state hidden vector into the fully connected layer and output the learning state vector corresponding to the historical test vector with the latest test time.

As an optional embodiment, the first determining module 401 is configured to: determine the second knowledge units within m hops before the first knowledge unit in the target learning path and the third knowledge units within n hops after the first knowledge unit in the target learning path, m and n being positive integers not less than 1; and determine the candidate knowledge unit set according to the first knowledge unit, the second knowledge units, the third knowledge units and the knowledge units serving as learning targets in the target learning path.
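The candidate set construction described above can be sketched with a breadth-first search in both directions. Representing the target learning path as a directed graph stored in a dictionary of successor lists is an assumption made for illustration, as are the function and variable names.

```python
from collections import deque


def within_hops(graph, start, max_hops):
    """Breadth-first search: nodes reachable from start in 1..max_hops steps."""
    seen = {start}
    found = set()
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        if dist == max_hops:
            continue  # do not expand beyond the hop limit
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                found.add(nxt)
                queue.append((nxt, dist + 1))
    return found


def candidate_set(successors, current, targets, m, n):
    """Current unit, units within m hops before, within n hops after, and targets."""
    # Invert the edges so that predecessors can be searched the same way.
    predecessors = {}
    for src, dsts in successors.items():
        for dst in dsts:
            predecessors.setdefault(dst, []).append(src)
    before = within_hops(predecessors, current, m)
    after = within_hops(successors, current, n)
    return {current} | before | after | set(targets)


# Hypothetical learning path A -> B -> C -> D -> E with E as the learning target.
path = {"A": ["B"], "B": ["C"], "C": ["D"], "D": ["E"]}
print(sorted(candidate_set(path, current="C", targets=["E"], m=1, n=1)))
# ['B', 'C', 'D', 'E']
```

Including the units before the current one allows the recommendation to send the student back to review, while the hop limits keep the candidate set small.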

As an optional embodiment, the second determining module 402 includes:

an obtaining unit, configured to obtain the learning ability increment value produced after the student has learned from the first knowledge unit to the knowledge unit serving as the learning target in the target learning path;

a second determining unit, configured to determine the final values of the preset parameters in the policy network model according to the learning ability increment value;

a second output unit, configured to input the current learning state into the policy network model and output the probability that each knowledge unit in the candidate knowledge unit set is the optimal solution when taken as the target knowledge unit.

As an optional embodiment, the second determining unit is configured to input the current learning state into the value network model and adjust the values of the preset parameters in the value network model so that the difference between the output of the value network model and the learning ability increment value is minimized, and to take the values of the preset parameters at which the difference is minimized as the final values of the preset parameters, where both the value network model and the policy network model include the preset parameters.

The device provided by the embodiments of the present invention determines a candidate knowledge unit set according to the target learning path and the first knowledge unit the student is currently learning. According to the student's current learning state, it determines the probability that each knowledge unit in the candidate knowledge unit set is the optimal solution when taken as the target knowledge unit, and takes the knowledge unit with the highest probability in the candidate knowledge unit set as the target knowledge unit. Because the next knowledge unit to learn is recommended by combining the knowledge structure with the student's learning state, the student's degree of knowledge mastery at different moments can be analyzed accurately, the recommendation results better conform to cognitive rules, and an efficient learning path can be formulated individually for different students.

Fig. 5 illustrates a schematic diagram of the physical structure of an electronic device. As shown in Fig. 5, the electronic device may include a processor 510, a communications interface 520, a memory 530 and a communication bus 540, where the processor 510, the communications interface 520 and the memory 530 communicate with one another through the communication bus 540. The processor 510 may call logic instructions in the memory 530 to execute the following method: determining a candidate knowledge unit set according to a target learning path and the first knowledge unit the student is currently learning, the target learning path including all knowledge units the student needs to learn; determining, according to the student's current learning state, the probability that each knowledge unit in the candidate knowledge unit set is the optimal solution when taken as the target knowledge unit, and taking the knowledge unit with the highest probability in the candidate knowledge unit set as the target knowledge unit, the target knowledge unit being the next knowledge unit the student needs to learn.

In addition, when the logic instructions in the memory 530 are implemented in the form of software functional units and sold or used as an independent product, they may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention in essence, or the part that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, an electronic device, a network device or the like) to execute all or part of the steps of the methods of the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disc.

An embodiment of the present invention further provides a non-transitory computer-readable storage medium on which a computer program is stored. When executed by a processor, the computer program performs the methods provided by the above embodiments, for example: determining a candidate knowledge unit set according to a target learning path and the first knowledge unit the student is currently learning, the target learning path including all knowledge units the student needs to learn; determining, according to the student's current learning state, the probability that each knowledge unit in the candidate knowledge unit set is the optimal solution when taken as the target knowledge unit, and taking the knowledge unit with the highest probability in the candidate knowledge unit set as the target knowledge unit, the target knowledge unit being the next knowledge unit the student needs to learn.

The device embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement it without creative effort.

From the above description of the embodiments, those skilled in the art can clearly understand that each embodiment can be implemented by software plus a necessary general-purpose hardware platform, or of course by hardware. Based on this understanding, the above technical solution in essence, or the part that contributes to the prior art, may be embodied in the form of a software product. The computer software product may be stored in a computer-readable storage medium, such as a ROM/RAM, a magnetic disk or an optical disc, and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device or the like) to execute the methods described in the embodiments or in certain parts of the embodiments.

Finally, it should be noted that the above embodiments are only intended to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be equivalently replaced, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims

1. An adaptive learning method, characterized by comprising:

determining a candidate knowledge unit set according to a target learning path and a first knowledge unit a student is currently learning, the target learning path including all knowledge units the student needs to learn;

determining, according to the current learning state of the student, the probability that each knowledge unit in the candidate knowledge unit set is the optimal solution when taken as a target knowledge unit, and taking the knowledge unit with the highest probability in the candidate knowledge unit set as the target knowledge unit, the target knowledge unit being the next knowledge unit the student needs to learn.

2. The adaptive learning method according to claim 1, characterized in that, before the determining, according to the current learning state of the student, the probability that each knowledge unit in the candidate knowledge unit set is the optimal solution when taken as a target knowledge unit, the method further comprises:

obtaining a current learning state vector of the student according to historical test records of the student, the historical test records indicating the test results for knowledge units in the target learning path;

obtaining an indicator vector of the student, and taking the vector obtained by concatenating the indicator vector with the current learning state vector as the current learning state of the student, the indicator vector indicating the knowledge units serving as learning targets in the target learning path.

3. The adaptive learning method according to claim 2, characterized in that the historical test records are historical test vectors; correspondingly, the obtaining a current learning state vector of the student according to historical test records of the student comprises:

inputting each historical test vector into a preset model, outputting the learning state vector corresponding to the historical test vector with the latest test time, and taking it as the current learning state vector.

4. The adaptive learning method according to claim 3, characterized in that the preset model includes at least an embedding layer, a hidden layer and a fully connected layer; correspondingly, the inputting each historical test vector into a preset model and outputting the learning state vector corresponding to the historical test vector with the latest test time comprises:

inputting each historical test vector into the embedding layer, and outputting the learning representation vector corresponding to each historical test vector;

inputting each learning representation vector into the hidden layer, and outputting the learning state hidden vector corresponding to each historical test vector;

inputting the initial learning state hidden vector and each learning state hidden vector into the fully connected layer, and outputting the learning state vector corresponding to the historical test vector with the latest test time.

5. The adaptive learning method according to claim 1, characterized in that the determining a candidate knowledge unit set according to a target learning path and a first knowledge unit a student is currently learning comprises:

determining second knowledge units within m hops before the first knowledge unit in the target learning path and third knowledge units within n hops after the first knowledge unit in the target learning path, m and n being positive integers not less than 1;

determining the candidate knowledge unit set according to the first knowledge unit, the second knowledge units, the third knowledge units and the knowledge units serving as learning targets in the target learning path.

6. The adaptive learning method according to claim 1, characterized in that the determining, according to the current learning state of the student, the probability that each knowledge unit in the candidate knowledge unit set is the optimal solution when taken as a target knowledge unit comprises:

obtaining the learning ability increment value produced after the student has learned from the first knowledge unit to the knowledge unit serving as the learning target in the target learning path;

determining, according to the learning ability increment value, the final values of preset parameters in a policy network model, inputting the current learning state into the policy network model, and outputting the probability that each knowledge unit in the candidate knowledge unit set is the optimal solution when taken as the target knowledge unit.

7. The adaptive learning method according to claim 6, characterized in that the determining, according to the learning ability increment value, the final values of preset parameters in a policy network model comprises:

inputting the current learning state into a value network model, adjusting the values of the preset parameters in the value network model so that the difference between the output of the value network model and the learning ability increment value is minimized, and taking the values of the preset parameters at which the difference is minimized as the final values of the preset parameters, both the value network model and the policy network model including the preset parameters.

8. An adaptive learning device, characterized by comprising:

a first determining module, configured to determine a candidate knowledge unit set according to a target learning path and a first knowledge unit a student is currently learning, the target learning path including all knowledge units the student needs to learn;

a second determining module, configured to determine, according to the current learning state of the student, the probability that each knowledge unit in the candidate knowledge unit set is the optimal solution when taken as a target knowledge unit, and to take the knowledge unit with the highest probability in the candidate knowledge unit set as the target knowledge unit, the target knowledge unit being the next knowledge unit the student needs to learn.

9. An electronic device, characterized by comprising:

at least one processor; and

at least one memory communicatively connected to the processor, wherein:

the memory stores program instructions executable by the processor, and the processor calls the program instructions to execute the method according to any one of claims 1 to 7.

10. A non-transitory computer-readable storage medium, characterized in that the non-transitory computer-readable storage medium stores computer instructions, and the computer instructions cause a computer to execute the method according to any one of claims 1 to 7.