CN114969301A

CN114969301A - Self-adaptive recommendation method, device, equipment and storage medium for online programming teaching

Info

Publication number: CN114969301A
Application number: CN202210902123.9A
Authority: CN
Inventors: 刘淇; 庄严; 黄振亚; 陈恩红; 苏喻
Original assignee: Institute of Artificial Intelligence of Hefei Comprehensive National Science Center
Current assignee: Institute of Artificial Intelligence of Hefei Comprehensive National Science Center
Priority date: 2022-07-29
Filing date: 2022-07-29
Publication date: 2022-08-30

Abstract

The application discloses a self-adaptive recommendation method, device, equipment and storage medium for online programming teaching, wherein the self-adaptive recommendation method for online programming teaching comprises the following steps: acquiring an answer information sample, and dividing the answer information sample into a support set and a query set; minimizing a cross entropy loss function on the support set, and determining a capability estimation algorithm function based on the cross entropy loss function on the support set; performing outer layer optimization on a model to be optimized based on the query set and the capability estimation algorithm function, determining a recommendation strategy algorithm function, and training to obtain a recommendation model based on the recommendation strategy algorithm function; acquiring the answer information of the learner, inputting the answer information into the recommendation model, and carrying out test question recommendation processing on the answer information based on the recommendation model to obtain a recommendation strategy for the learner. Because the learner's answer information is analyzed and test questions are recommended by a recommendation model obtained through a bi-level optimization method, the deviation of test question recommendation is reduced.

Description

Self-adaptive recommendation method, device, equipment and storage medium for online programming teaching

Technical Field

The application relates to the field of artificial intelligence application, in particular to a self-adaptive recommendation method, device, equipment and storage medium for online programming teaching.

Background

At present, recommendation strategies for online programming teaching are generated mainly from expert experience in learning and education, and a recommendation strategy is often bound to a diagnosis model; that is, a corresponding recommendation strategy is designed according to the characteristics of each diagnosis model. The adaptivity of such recommendation strategies is limited: designing them requires expert experience to understand the details and principles of the diagnosis model, and, when assessing a learner's ability, conventional expert-designed methods cannot identify the learner's complex behaviors in the process, such as guessing and slipping (careless errors), so the recommendation deviation is large.

Disclosure of Invention

The application mainly aims to provide a self-adaptive recommendation method, device, equipment and storage medium for online programming teaching, so as to solve the technical problem that, in the prior art, recommendation strategies depend on manual experience and the recommendation deviation is large.

In order to achieve the above object, the present application provides an adaptive recommendation method for online programming teaching, including:

acquiring an answer information sample, and dividing the answer information sample into a support set and a query set;

minimizing a cross entropy loss function on the support set based on the support set, and determining a capability estimation algorithm function based on the cross entropy loss function on the support set;

performing outer layer optimization on a preset model to be optimized based on the query set and the capability estimation algorithm function, determining a recommendation strategy algorithm function, and training to obtain a recommendation model based on the recommendation strategy algorithm function;

and acquiring the answer information of the learner, inputting the answer information into the recommendation model, and recommending the test questions of the answer information based on the recommendation model to obtain the recommendation strategy of the learner.

Optionally, the step of determining the capability estimation algorithm function based on the cross entropy loss function on the support set includes:

and fusing the cross entropy loss function on the support set with preset latent ability estimates to obtain the capability estimation algorithm function.

Optionally, the step of performing outer layer optimization on the model to be optimized based on the query set and the capability estimation algorithm function to determine a recommended policy algorithm function includes:

minimizing a cross entropy loss function on the query set based on the capability estimation algorithm function;

and performing outer layer optimization on the model to be optimized based on the cross entropy loss function on the query set, and determining a recommended strategy algorithm function.

Optionally, the step of performing test question recommendation processing on the answer information based on the recommendation model to obtain the recommendation strategy of the learner includes:

based on the recommendation model, evaluating the answer information and determining the ability information of the learner;

determining a recommendation strategy for the learner based on the ability information.

Optionally, the step of evaluating the answer information based on the recommendation model to determine the learner's ability information includes:

determining the correctly answered questions and the incorrectly answered questions in the answer information;

capturing, through a self-attention operation based on the recommendation model, performance information of the correctly answered questions and of the incorrectly answered questions respectively, and determining first ability information of the learner based on the performance information;

capturing, through a dual attention operation based on the first ability information, a contradiction between the correctly answered questions and the incorrectly answered questions, and determining second ability information of the learner based on the contradiction;

and obtaining the learner's ability information based on the first ability information and the second ability information.

The application also provides a self-adaptive recommendation device for online programming teaching, the self-adaptive recommendation device for online programming teaching including:

the dividing module is used for obtaining an answer information sample and dividing the answer information sample into a support set and a query set;

the inner-layer optimization module is used for minimizing a cross entropy loss function on the support set based on the support set and determining a capability estimation algorithm function based on the cross entropy loss function on the support set;

the outer layer optimization module is used for carrying out outer layer optimization on a preset model to be optimized based on the query set and the capability estimation algorithm function, determining a recommendation strategy algorithm function, and training to obtain a recommendation model based on the recommendation strategy algorithm function;

and the recommendation module is used for acquiring the answer information of the learner, inputting the answer information into the recommendation model, and recommending the test questions to the answer information based on the recommendation model to obtain the recommendation strategy of the learner.

The present application further provides adaptive recommendation equipment for online programming teaching, the adaptive recommendation equipment for online programming teaching including: a memory, a processor, and a program stored on the memory for implementing the adaptive recommendation method for online programming teaching,

the memory is used for storing a program for realizing the self-adaptive recommendation method of the online programming teaching;

the processor is used for executing the program for realizing the self-adaptive recommendation method of the online programming teaching, so as to realize the steps of the self-adaptive recommendation method of the online programming teaching.

The present application also provides a storage medium having stored thereon a program for implementing an adaptive recommendation method for online programming teaching, the program for implementing an adaptive recommendation method for online programming teaching being executed by a processor to implement the steps of the adaptive recommendation method for online programming teaching.

Compared with the prior art, in which recommendation strategies rely on human experience and the recommendation deviation is large, the self-adaptive recommendation method, device, equipment and storage medium for online programming teaching provided by the application acquire an answer information sample and divide the answer information sample into a support set and a query set; minimize a cross entropy loss function on the support set, and determine a capability estimation algorithm function based on the cross entropy loss function on the support set; perform outer layer optimization on a preset model to be optimized based on the query set and the capability estimation algorithm function, determine a recommendation strategy algorithm function, and train to obtain a recommendation model based on the recommendation strategy algorithm function; and acquire the answer information of the learner, input the answer information into the recommendation model, and carry out test question recommendation processing on the answer information based on the recommendation model to obtain the recommendation strategy of the learner. The application analyzes the learner's answer information using a recommendation model obtained by a bi-level optimization method and recommends the test questions with the greatest learning value for the learner; expert experience is not needed, the learner's complex behaviors are taken into account during identification, and the deviation of test question recommendation is reduced.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present application and, together with the description, serve to explain the principles of the application. In order to illustrate the embodiments of the present application or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly described below; those skilled in the art can obtain other drawings from these drawings without inventive effort.

FIG. 1 is a schematic diagram of an apparatus architecture of a hardware operating environment according to an embodiment of the present application;

FIG. 2 is a schematic flowchart illustrating a first embodiment of an adaptive recommendation method for online programming teaching according to the present application;

FIG. 3 is a schematic flow chart of information processing according to a first embodiment of the adaptive recommendation method for online programming teaching according to the present application.

The implementation, functional features and advantages of the objectives of the present application will be further explained with reference to the accompanying drawings.

Detailed Description

It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.

Fig. 1 is a schematic structural diagram of a terminal in a hardware operating environment according to an embodiment of the present application.

The terminal in the embodiment of the application may be a PC, or may be a mobile terminal device having a display function, such as a smart phone, a tablet computer, an e-book reader, an MP3 (MPEG Audio Layer III) player, an MP4 player, a portable computer, or the like.

As shown in fig. 1, the terminal may include: a processor 1001 (such as a CPU), a network interface 1004, a user interface 1003, a memory 1005, and a communication bus 1002. The communication bus 1002 is used to enable communication between these components. The user interface 1003 may include a display screen (Display) and an input unit such as a keyboard (Keyboard); optionally, the user interface 1003 may also include a standard wired interface and a wireless interface. The network interface 1004 may optionally include a standard wired interface and a wireless interface (e.g., a Wi-Fi interface). The memory 1005 may be a high-speed RAM memory or a non-volatile memory (e.g., a magnetic disk memory). The memory 1005 may alternatively be a storage device separate from the processor 1001.

Optionally, the terminal may further include a camera, a radio frequency (RF) circuit, sensors, an audio circuit, a WiFi module, and the like. The sensors include, for example, light sensors, motion sensors and other sensors. Specifically, the light sensors may include an ambient light sensor, which adjusts the brightness of the display screen according to the brightness of the ambient light, and a proximity sensor, which turns off the display screen and/or the backlight when the mobile terminal is moved to the ear. As one kind of motion sensor, a gravity acceleration sensor can detect the magnitude of acceleration in each direction (generally three axes) and the magnitude and direction of gravity when the mobile terminal is stationary; it can be used in applications that recognize the attitude of the mobile terminal (such as switching between landscape and portrait, related games, and magnetometer attitude calibration) and in vibration-recognition functions (such as a pedometer and tap detection). The mobile terminal may of course also be equipped with other sensors such as a gyroscope, a barometer, a hygrometer, a thermometer and an infrared sensor, which are not described here.

Those skilled in the art will appreciate that the terminal structure shown in fig. 1 is not intended to be limiting and may include more or fewer components than those shown, or some components may be combined, or a different arrangement of components.

As shown in fig. 1, the memory 1005, which is a kind of computer storage medium, may include an operating system, a network communication module, a user interface module, and an adaptive recommendation program for online programming teaching.

In the terminal shown in fig. 1, the network interface 1004 is mainly used for connecting to a backend server and performing data communication with the backend server; the user interface 1003 is mainly used for connecting a client (user side) and performing data communication with the client; and processor 1001 may be used to invoke an adaptive recommendation program for online programming teaching stored in memory 1005.

Referring to fig. 2, an embodiment of the present application provides an adaptive recommendation method for online programming teaching, where the adaptive recommendation method for online programming teaching includes:

step S100, obtaining an answer information sample, and dividing the answer information sample into a support set and a query set;

step S200, based on the support set, minimizing a cross entropy loss function on the support set, and based on the cross entropy loss function on the support set, determining a capability estimation algorithm function;

step S300, performing outer layer optimization on a preset model to be optimized based on the query set and the capability estimation algorithm function, determining a recommendation strategy algorithm function, and training to obtain a recommendation model based on the recommendation strategy algorithm function;

and step S400, acquiring the answer information of the learner, inputting the answer information into the recommendation model, and carrying out test question recommendation processing on the answer information based on the recommendation model to obtain the recommendation strategy of the learner.

In this embodiment, the specific application scenarios may be:

At present, recommendation strategies for learning and teaching are generated mainly from expert experience in learning and teaching, and corresponding recommendation strategies are designed according to the characteristics of different diagnosis models. The adaptivity of such recommendation strategies is limited: designing them requires expert experience to understand the details and principles of the diagnosis model, and, when assessing a learner's ability, the traditional expert design method cannot identify the learner's complex behaviors in the process, so the recommendation deviation is large.

The method comprises the following specific steps:

In this embodiment, the adaptive recommendation method for online programming teaching is applied to an adaptive recommendation device for online programming teaching.

In this embodiment, the answer information samples are answer information of multiple learners and used for training a recommendation model, wherein the content of the answer information samples is the same as the answer information of the learners, and is not described herein again.

In this embodiment, the device divides the answer information sample into a support set and a query set; that is, the answer records of learner i are randomly divided into two parts, a query set and a support set.
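The random division described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function name `split_records`, the `query_ratio` of 0.2 and the fixed `seed` are hypothetical choices that do not appear in the source, which only says that the records of learner i are divided at random.

```python
import random
from typing import List, Tuple

# One answer record: (question ID q, correctness a), with a = 1 correct, a = 0 wrong.
Record = Tuple[int, int]


def split_records(
    records: List[Record], query_ratio: float = 0.2, seed: int = 0
) -> Tuple[List[Record], List[Record]]:
    """Randomly split one learner's records into (support set, query set).

    query_ratio and seed are illustrative assumptions, not values from the source.
    """
    rng = random.Random(seed)
    shuffled = records[:]          # copy so the caller's list is left untouched
    rng.shuffle(shuffled)
    n_query = max(1, int(round(len(shuffled) * query_ratio)))
    query = shuffled[:n_query]
    support = shuffled[n_query:]
    return support, query


records = [(101, 1), (102, 0), (103, 1), (104, 1), (105, 0)]
support, query = split_records(records)
```

The support set is the pool from which questions are selected during training, while the query set is held out to judge how well the resulting ability estimate generalizes.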

In the present embodiment, in the inner-layer optimization, the recommendation strategy sequentially selects questions from the support set according to learner i's previous answers; the cross entropy loss on the support set is then minimized to obtain the ability estimate that is used in the outer layer. In the corresponding formula, q represents the question ID and a represents whether the answer to the test question is correct (a = 1 means the answer is correct, a = 0 means it is wrong); one loss term is the cross entropy loss of the ability diagnosis, and the other is the cross entropy loss that measures the accuracy of the ability estimate.
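As a rough illustration of the inner-layer step, the sketch below estimates a scalar ability by minimizing the mean cross entropy on the support records with plain gradient descent. The diagnostic model used here, P(correct) = sigmoid(theta - difficulty), the per-question `difficulty` table, and the learning rate and step count are all assumptions made for the example; the source does not fix a particular diagnostic model.

```python
import math
from typing import Dict, List, Tuple

Record = Tuple[int, int]  # (question ID q, correctness a in {0, 1})


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def cross_entropy(theta: float, records: List[Record], difficulty: Dict[int, float]) -> float:
    """Mean binary cross entropy of the assumed model on the given records."""
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for q, a in records:
        p = sigmoid(theta - difficulty[q])
        total -= a * math.log(p + eps) + (1 - a) * math.log(1.0 - p + eps)
    return total / len(records)


def estimate_ability(
    support: List[Record], difficulty: Dict[int, float], lr: float = 0.5, steps: int = 200
) -> float:
    """Gradient descent on the support-set cross entropy, starting from theta = 0."""
    theta = 0.0
    for _ in range(steps):
        # d/d(theta) of the mean cross entropy is the mean of (p - a).
        grad = sum(sigmoid(theta - difficulty[q]) - a for q, a in support) / len(support)
        theta -= lr * grad
    return theta


difficulty = {1: -1.0, 2: 0.0, 3: 1.0}   # hypothetical question difficulties
support = [(1, 1), (2, 1), (3, 0)]       # two easy questions right, the hard one wrong
theta_hat = estimate_ability(support, difficulty)
```

With this toy data the estimate settles between the difficulty of the hardest solved question and that of the missed one, which is the behavior one would expect from a maximum likelihood fit.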

Specifically, the step S200 includes the following step S210:

Step S210, fusing the cross entropy loss function on the support set with preset latent ability estimates to obtain the capability estimation algorithm function.

In this embodiment, since the ability inferred from the previous response records is not unique, the device improves estimation accuracy and robustness by fusing one or more sets of preset latent ability estimates with the cross entropy loss function on the support set. For example, a programming problem can often be solved by several algorithmic methods, such as dynamic programming, breadth-first search and brute-force search, and the method a learner uses reflects one of several different latent abilities the learner may have.

This means that, due to uncertainty, the learner's previous behavior corresponds to a set of estimates rather than a single value. This approach has two advantages: 1) a set of estimates can describe the learner's ability from different perspectives and resist perturbations, and fusing them can improve the accuracy of the estimate at each step; 2) the estimation method is general and is based on the objective patterns that exist in the learner's interaction with the test questions, without imposing additional restrictions or assumptions on the learning process itself.

The estimates are adjusted in each round: in the inner-layer optimization (maximum likelihood estimation, MLE), a diversity regularization term is added. In the corresponding formula, one term is the MLE loss, which ensures the accuracy of the estimation, and a coefficient weight controls the diversity regularization term. Because the average of the estimates cannot be obtained before all m estimates have been generated, a provisional average of the previous i-1 estimates is substituted for it during optimization. All estimates are generated sequentially, balancing accuracy and diversity among them.

Specifically, the step S300 includes the following steps S310 to S320:

step S310, based on the capability estimation algorithm function, minimizing a cross entropy loss function on the query set;

and S320, performing outer layer optimization on the model to be optimized based on the cross entropy loss function on the query set, and determining a recommended strategy algorithm function.

In this embodiment, in the outer-layer optimization, the cross entropy loss on the learner's query set is minimized, using the capability estimation algorithm function obtained in the inner layer, to generate the final recommendation strategy. In the corresponding formula, n is the total number of learners in the data set; one loss term is the cross entropy loss of the ability diagnosis, another is the cross entropy loss that measures the accuracy of the ability estimate, and the remaining symbol denotes the learner's ability as estimated at the current moment.

In this embodiment, this bi-level optimization has the following three advantages: 1) The error in the ability estimate is mainly due to differences in the selected questions, and this in turn guides the optimization of the recommendation strategy. Because the learner's true ability is unknown, the present invention uses the degree of fit on the query set to measure the estimation error in the outer layer. 2) Since programming learning may stop at any round/time according to different stopping rules, the present invention simplifies the objective by summing the losses over all steps and minimizing the total. 3) The recommendation strategy is independent of the type of diagnostic model and, more importantly, it can select questions efficiently and adaptively based on the characteristics of a given diagnostic model. Once the recommendation strategy is generated, its parameters do not need to be updated during learning, and the next test question can be selected adaptively according to the previous answer records.

In another embodiment, the recommendation strategy is defined as the objective of an optimization problem, and on this basis an equivalent transformation can be applied to it. In the transformed expression, one term is the average cross entropy of the diagnostic model over the query set. The bi-level optimization is thus converted into maximizing the cumulative expected reward in reinforcement learning. Specifically, the Markov decision process (MDP) in reinforcement learning is defined as follows: 1) the state set S, where a state refers to all the response records obtained up to the t-th round; 2) the action set A is the question bank, and each question/action is selected at most once during a learner's learning process; 3) the state transition function P refers to the probability of observing a given learner state after a question is selected; because whether the learner answers the question correctly is unknown in advance, this is the source of uncertainty in the learning process; 4) the reward function R is the negative loss of the ability estimate on the query set in the t-th round.
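The four MDP components listed above can be made concrete with a toy environment. The sketch below is only illustrative: the class name `SelectionEnv`, its methods, and in particular the stand-in diagnostic model (a Laplace-smoothed accuracy used as the predicted probability for every query question) are assumptions introduced for this example and are far cruder than any real ability diagnosis model.

```python
import math
from typing import Dict, List, Set, Tuple

Record = Tuple[int, int]  # (question ID q, correctness a in {0, 1})


class SelectionEnv:
    """Toy MDP: state = records so far, action = an unused question ID,
    reward = negative cross entropy on the query set after the new answer."""

    def __init__(self, support: Dict[int, int], query: List[Record]) -> None:
        self.support = support            # question ID -> recorded answer, hidden until asked
        self.query = query                # held-out records used only for the reward
        self.state: List[Record] = []     # all response records obtained so far
        self.used: Set[int] = set()

    def available_actions(self) -> List[int]:
        # Each question may be selected at most once per learner.
        return sorted(q for q in self.support if q not in self.used)

    def _predict(self) -> float:
        # Assumed stand-in for the diagnostic model: smoothed accuracy so far,
        # which always lies strictly between 0 and 1.
        correct = sum(a for _, a in self.state)
        return (correct + 1.0) / (len(self.state) + 2.0)

    def _query_loss(self) -> float:
        p = self._predict()
        total = sum(a * math.log(p) + (1 - a) * math.log(1.0 - p) for _, a in self.query)
        return -total / len(self.query)

    def step(self, q: int) -> Tuple[List[Record], float]:
        if q in self.used or q not in self.support:
            raise ValueError(f"question {q} is not available")
        self.used.add(q)
        # The answer is revealed only after selection: this is the transition uncertainty.
        self.state.append((q, self.support[q]))
        return list(self.state), -self._query_loss()


env = SelectionEnv(support={1: 1, 2: 0, 3: 1}, query=[(4, 1), (5, 1)])
rewards = [env.step(q)[1] for q in (1, 3)]
cumulative_reward = sum(rewards)
```

Summing the per-round rewards gives the cumulative quantity that a reinforcement learning agent would try to maximize; a learned policy would replace the fixed question order used in the usage lines.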

Based on the above definition, the adaptive recommendation of online programming teaching is recast as a decision process: given the learner's previous answers and some diagnostic model, which test question best suits the learner and most accurately measures the learner's ability? In fact, learner responses, ability estimation and the decisions of the recommendation strategy influence and depend on one another, forming a complex system. The reinforcement learning framework can search, from a long-term perspective, for the test questions best suited to different learners. For example, a combination of several test questions can generally provide a more comprehensive and accurate assessment and improvement of learner ability.

In this embodiment, the reinforcement-learning-based adaptive recommendation method for online programming teaching is applied on an online programming education platform to automatically select test questions suited to the learner during learning, thereby improving evaluation accuracy and programming learning efficiency. The learner data are answer data from the online education platform and can be divided into data describing attributes of the learners and test questions and data describing the learners' answer behavior. The answer data may then be screened according to the learning scope and the target learner group. The screened learner data are further divided into a query set and a support set so that a recommendation strategy can be generated from the objective of the optimization problem. The learner's answer behavior is fed through two channels, split by correctness, into the embedding layer and the behavior learning module, where the test questions are converted into low-dimensional dense feature vectors. These feature vectors are then input into the contradiction capturing module, which uses a dual attention mechanism to extract contradictions between pairs of test questions and thereby identify the learner's guessing and slipping behavior. Finally, to improve the accuracy of the adaptive recommendation, a new ability estimation method is designed on the basis of the learner's multifaceted characteristics during learning, and a group of estimates is fused to make the ability estimation more robust.

In this embodiment, a learner is usually a learner on an online learning platform, and the learner's answer information relates to the questions the learner has previously answered on that platform. The answer information can be divided into two categories. One category is data describing attributes of the learner and the test questions, including but not limited to the learner's basic information, the question ID, and the knowledge points the question involves. The other category is data describing the learner's answer behavior, including but not limited to the time of answering, the ID of the question answered, and the answer result (correct or incorrect).

In this embodiment, the device may acquire the learner's answer information by extracting it from a database of the online learning platform or by having it input into the device; this embodiment places no limitation on the method.

In another embodiment, after the step of obtaining the learner's answer information, the device filters the answer information according to preset filtering criteria. In the embodiment of the invention, the data are screened according to the learning scope and the target learner group; for example, if an online programming course is organized for primary and middle school students, then, based on the knowledge points involved, only the programming problems that can be solved with elementary algorithms are retained. In this process, all answer records of learner i may be represented as a set of tuples D_i = {(q, a)}, where q denotes the question ID, a = 1 means that the learner answered test question q correctly, and a = 0 means that the answer was wrong.
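The screening step and the tuple representation D_i = {(q, a)} might look like the following in code. This is a sketch under assumptions: the function name `filter_records`, the `question_topics` mapping and the example knowledge-point labels are hypothetical, and the rule "keep a question only if all of its knowledge points are allowed" is one plausible reading of the filtering criterion.

```python
from typing import Dict, List, Set, Tuple

Record = Tuple[int, int]  # (question ID q, correctness a in {0, 1})


def filter_records(
    records: List[Record],
    question_topics: Dict[int, Set[str]],
    allowed_topics: Set[str],
) -> List[Record]:
    """Keep only records whose question involves nothing but allowed knowledge points.

    Questions with no known knowledge points are dropped (an assumption of this sketch).
    """
    kept = []
    for q, a in records:
        topics = question_topics.get(q)
        if topics and topics <= allowed_topics:
            kept.append((q, a))
    return kept


# Hypothetical knowledge points for three questions.
question_topics = {
    1: {"loops"},
    2: {"dynamic programming"},
    3: {"loops", "arrays"},
}
d_i = [(1, 1), (2, 0), (3, 1)]  # all answer records of learner i
filtered = filter_records(d_i, question_topics, allowed_topics={"loops", "arrays"})
```

Here the dynamic programming question is removed for a beginner course, while the two questions that rely only on elementary knowledge points are kept.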

In this embodiment, the device inputs the answer information into the preset recommendation model. Using the recommendation model trained by the bi-level optimization method, it analyzes the learner's answer information and recommends the test questions with the greatest learning value for the learner, without depending on expert experience, and it takes the learner's complex behaviors into account during identification so as to reduce the deviation of test question recommendation.

Specifically, the step S400 includes the following steps S410 to S420:

step S410, based on the recommendation model, evaluating the answer information and determining the ability information of the learner;

In this embodiment, the device evaluates the answer information based on the recommendation model, and determines the learner's ability information.

Specifically, each learner's ability may be expressed as a vector, together with a true value of the ability that is to be evaluated, where d refers to the dimension of the ability (e.g., the number of knowledge points to be evaluated). An adaptive recommender for online programming teaching generally comprises two parts: 1) an ability diagnosis model, which predicts the probability that the learner answers each test question correctly in order to model the learner's ability; 2) a recommendation strategy, which selects the most suitable question from the question bank based on the learner's past behavior. Specifically, in the t-th round of learning, the strategy selects the next question and gives it to the learner, who then provides a response.

Specifically, the step S410 includes the following steps S411 to S414:

step S411, determining questions answered correctly and questions answered incorrectly in the answer information;

In this embodiment, the device determines the correctly answered questions and the incorrectly answered questions in the answer information according to the data on the learner's answer behavior, which include the time of answering, the ID of the question answered, the answer result (correct or wrong), and the like.

In this embodiment, referring to fig. 3, the network mainly comprises a dual-channel behavior learning module (denoted PL), a contradiction capturing module (denoted CL) and a decision module. First, the PL separately captures two different kinds of performance information for the learner, from correct answers and from wrong answers (in the real world, wrong answers are usually far fewer than correct answers). Second, the CL identifies and extracts inconsistencies in the learner's performance in order to mitigate the effects of perturbations (i.e., guessing and slipping factors). Finally, the decision module selects the next question.

Specifically, the test questions in the answer records are first embedded: given the current state of learner i and the whole question bank, each test question is embedded into a d-dimensional continuous space to construct an embedding matrix. Because different answer results (correct and wrong) on the same test question convey different information about the learner, each question has two representations, taken from two embedding matrices: one for correct answers and one for wrong answers.
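To make the two-representation idea concrete, the sketch below keeps two separate embedding tables and routes each answered question to one of them according to its correctness. The helper names `make_embedding` and `embed_state`, the Gaussian initialization and the tiny dimension d = 4 are assumptions for illustration; in a trained model the embedding values would be learned rather than sampled.

```python
import random
from typing import Dict, List, Tuple

Record = Tuple[int, int]          # (question ID q, correctness a in {0, 1})
Embedding = Dict[int, List[float]]


def make_embedding(question_ids: List[int], dim: int, seed: int) -> Embedding:
    """One d-dimensional vector per question (random stand-in for learned values)."""
    rng = random.Random(seed)
    return {q: [rng.gauss(0.0, 0.1) for _ in range(dim)] for q in question_ids}


def embed_state(
    state: List[Record], emb_correct: Embedding, emb_wrong: Embedding
) -> Tuple[List[List[float]], List[List[float]]]:
    """Split the state by correctness and look each question up in its own table."""
    correct_rows = [emb_correct[q] for q, a in state if a == 1]
    wrong_rows = [emb_wrong[q] for q, a in state if a == 0]
    return correct_rows, wrong_rows


question_bank = [1, 2, 3, 4]
d = 4
emb_correct = make_embedding(question_bank, d, seed=1)  # used when the answer was right
emb_wrong = make_embedding(question_bank, d, seed=2)    # used when the answer was wrong

state = [(1, 1), (3, 0), (4, 1)]
correct_rows, wrong_rows = embed_state(state, emb_correct, emb_wrong)
```

The two resulting matrices have k₁ and k₂ rows respectively, one row per correctly or incorrectly answered question, and can be passed to the two channels of the behavior learning module.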

Step S412, based on the recommendation model, respectively capturing the performance information of the right-answered question and the wrong-answered question through self-attention operation, and determining first competence information of the learner based on the performance information;

In this embodiment, a two-channel self-attention operation is used to handle correct and incorrect answers separately. Let k₁ and k₂ denote the number of correct and incorrect answers at round t, respectively. The answered questions are represented as two embedding matrices (where 1 denotes correct and 0 denotes incorrect), which are then input into the self-attention block. Such a block usually consists of two sublayers, namely a self-attention layer and a feed-forward network. Specifically, the self-attention layer is defined as:

where the projection matrices are the corresponding learnable parameters. The Attention function is implemented by a scaled dot-product operation:

where Q, K and V denote the query (Query), key (Key) and value (Value), respectively.

The scaling factor prevents the inner-product values from becoming too large. Meanwhile, to endow the layer with nonlinearity and to account for interactions between different latent dimensions, a point-wise feed-forward network is applied to the output of the self-attention layer. It is computed as follows:

where the activation is the ReLU function, the two weight matrices have the same dimension, and b⁽¹⁾ and b⁽²⁾ are two bias terms. The result is the output of the two-channel behavior learning module, i.e., the learner's first competence information.
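The self-attention layer followed by a point-wise feed-forward network can be sketched as follows for a single channel. The sketch assumes the standard scaled dot-product form, softmax(QKᵀ/√d)·V, and a two-layer feed-forward network with a ReLU in between; the exact equations of the application are not reproduced here, and all function names and the toy weights are illustrative.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def softmax_rows(a: Matrix) -> Matrix:
    out = []
    for row in a:
        m = max(row)                              # subtract max for stability
        exps = [math.exp(v - m) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V."""
    d = len(q[0])
    scores = matmul(q, transpose(k))
    scaled = [[s / math.sqrt(d) for s in row] for row in scores]
    return matmul(softmax_rows(scaled), v)


def self_attention_block(x, wq, wk, wv, w1, b1, w2, b2) -> Matrix:
    """Self-attention over x, then a point-wise ReLU feed-forward network."""
    s = attention(matmul(x, wq), matmul(x, wk), matmul(x, wv))
    hidden = [[max(0.0, h + b) for h, b in zip(row, b1)] for row in matmul(s, w1)]
    return [[o + b for o, b in zip(row, b2)] for row in matmul(hidden, w2)]


# Toy run: identity weights and zero biases (assumed values), d = 2, k = 3.
identity = [[1.0, 0.0], [0.0, 1.0]]
zeros = [0.0, 0.0]
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out = self_attention_block(x, identity, identity, identity,
                           identity, zeros, identity, zeros)
```

In the two-channel setting the same block would be run once on the correct-answer matrix and once on the incorrect-answer matrix. Residual connections and normalization, which are common in such blocks, are omitted here because the text does not mention them.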

Step S413, capturing a contradiction between the right-answered questions and the wrong-answered questions through a double attention operation based on the first competence information, and determining second competence information of the learner based on the contradiction;

In this embodiment, the contradiction between the right-answered questions and the wrong-answered questions reflects the complexity of the learner's behavior during learning, which mainly takes the form of guess and slip factors. For example, when a learner encounters a difficult question or algorithm, the learner may pass the test samples by guessing and thus obtain a higher score (the guess factor); when faced with a simple test question, the learner may answer incorrectly due to carelessness or incomplete consideration (the slip factor). To improve the efficiency of programming learning, the recommendation strategy should identify and eliminate these perturbations in the learner's performance in order to better select the next test question. For example, if a learner correctly answers a difficult test question but incorrectly answers a simpler one, the possible explanations are that the difficult question was guessed, that the simple question was a slip, or both.

In this embodiment, the device captures the contradiction between correct and incorrect responses through a double attention operation, using the following formula:

where each score is the contradiction score of a pair of test questions, and the two weight matrices have the same dimension. All the contradiction scores together form a score matrix. The score matrix is then normalized along the row and column dimensions respectively by the Softmax function:

thereby constructing two normalized score matrices. To further extract the question pairs in which contradictions exist, the module applies double attention and feed-forward operations to the matrices generated in performance learning, using these score matrices:

where the two resulting matrices are the contradiction feature matrices of the test questions in the two channels, i.e., the second competence information.
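The following sketch shows one plausible reading of the contradiction capturing step. It assumes a bilinear score between each correct-channel row and each incorrect-channel row, a row-wise and a column-wise Softmax over the score matrix, and attention-weighted sums across channels. The exact score formula and the feed-forward step of the application are not reproduced, so the function `contradiction_features` and its weights `w_a`, `w_b` are hypothetical.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def softmax_rows(a: Matrix) -> Matrix:
    out = []
    for row in a:
        m = max(row)
        exps = [math.exp(v - m) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def contradiction_features(h_right: Matrix, h_wrong: Matrix,
                           w_a: Matrix, w_b: Matrix) -> Tuple[Matrix, Matrix, Matrix]:
    """Return (score matrix, correct-channel features, incorrect-channel features)."""
    # Assumed bilinear score between every (correct, incorrect) question pair.
    scores = matmul(matmul(h_right, w_a), transpose(matmul(h_wrong, w_b)))  # k1 x k2
    row_attn = softmax_rows(scores)               # k1 x k2, each row sums to 1
    col_attn = softmax_rows(transpose(scores))    # k2 x k1, column-wise Softmax
    c_right = matmul(row_attn, h_wrong)           # k1 x d
    c_wrong = matmul(col_attn, h_right)           # k2 x d
    return scores, c_right, c_wrong


identity = [[1.0, 0.0], [0.0, 1.0]]
h_right = [[1.0, 0.0], [0.0, 1.0]]                # k1 = 2 correct answers
h_wrong = [[1.0, 1.0], [2.0, 0.0], [0.0, 2.0]]    # k2 = 3 incorrect answers
scores, c_right, c_wrong = contradiction_features(h_right, h_wrong, identity, identity)
```

Normalizing along both dimensions lets each correct answer attend over the incorrect answers and vice versa, which yields one contradiction feature matrix per channel.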

Step S414, obtaining the learner's competence information based on the first competence information and the second competence information.

In this embodiment, the output of the two-channel behavior learning module is the learner's first competence information, and the contradiction feature matrices of the test questions in the two channels are the second competence information. The device obtains the learner's competence information from the first competence information and the second competence information.

Step S420, based on the ability information, determining the recommendation strategy of the learner.

In this embodiment, the decision layer in the network determines the selection of the next question, i.e., the learner's recommendation strategy, based on the four matrices of the competence information. An average pooling operation is applied to each matrix, and the results are concatenated into a single vector. The Q-learning algorithm is used here, and the decision layer serves as the approximate action-value function. Q-learning is a reinforcement learning algorithm; other reinforcement learning algorithms, such as the temporal-difference (TD) method, Policy Gradient, or Actor-Critic, may also be used in this embodiment, which is not limited herein. The action-value function is represented by a feed-forward neural network:

where the network is parameterized by two weight matrices and two bias terms. Once the action-value function is obtained, the corresponding optimal strategy is the learner's recommendation strategy.
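The decision layer can be sketched as mean pooling, concatenation, a small feed-forward network that outputs one action value per question, and a greedy choice. The toy weights, the exclusion of already answered questions, and the names `q_values` and `recommend` are assumptions made for illustration only.

```python
from typing import List, Set

Matrix = List[List[float]]


def mean_pool(m: Matrix) -> List[float]:
    """Average a (k x d) matrix over its k rows."""
    return [sum(col) / len(col) for col in zip(*m)]


def q_values(state_mats: List[Matrix], w1: Matrix, b1: List[float],
             w2: Matrix, b2: List[float]) -> List[float]:
    """Pool each matrix, concatenate, and apply a ReLU feed-forward network."""
    v = [x for m in state_mats for x in mean_pool(m)]
    hidden = [max(0.0, sum(vi * w for vi, w in zip(v, col)) + b)
              for col, b in zip(zip(*w1), b1)]
    return [sum(h * w for h, w in zip(hidden, col)) + b
            for col, b in zip(zip(*w2), b2)]


def recommend(q: List[float], answered: Set[int]) -> int:
    """Greedy policy: pick the unanswered question with the highest value.

    Skipping answered questions is an assumption of this sketch.
    """
    candidates = [j for j in range(len(q)) if j not in answered]
    return max(candidates, key=lambda j: q[j])


# Toy state: four matrices with d = 1, and a question bank of 3 questions.
mats = [[[1.0], [3.0]], [[2.0]], [[0.0]], [[4.0]]]
w1 = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0], [0.0, 0.0]]   # 4 x 2
w2 = [[1.0, 0.0, 0.5], [0.0, 2.0, 1.0]]                 # 2 x 3
q = q_values(mats, w1, [0.0, 0.0], w2, [0.0, 0.0, 0.0])
next_question = recommend(q, answered={1})
```

A Q-learning training loop, which would fit the weights from rewards, is outside the scope of this sketch.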

In the prior art, recommendation strategies depend on human experience, so the recommendation deviation is large. In contrast, in the adaptive recommendation method for online programming teaching of the present application, answer information samples are obtained and divided into a support set and a query set; a cross entropy loss function on the support set is minimized, and a capability estimation algorithm function is determined based on the cross entropy loss function on the support set; outer layer optimization is performed on a preset model to be optimized based on the query set and the capability estimation algorithm function, a recommendation strategy algorithm function is determined, and a recommendation model is obtained by training based on the recommendation strategy algorithm function; the learner's answer information is then acquired and input into the recommendation model, and test question recommendation processing is performed on the answer information based on the recommendation model to obtain the learner's recommendation strategy. Because the recommendation model is obtained by a double-layer optimization method, the method analyzes the learner's answer information and recommends the test questions with the greatest learning value for the learner without requiring expert experience. The learner's complex behaviors are taken into account during this process, which reduces the test question recommendation deviation.
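The double-layer structure summarized above can be illustrated with a deliberately small sketch. The inner layer estimates a scalar ability by minimizing cross entropy on the support set; the outer layer then evaluates cross entropy on the query set using that estimate. The logistic response model `sigmoid(theta - difficulty)`, the toy data, and the names `estimate_ability` and `cross_entropy` are assumptions of this example and are not taken from the application.

```python
import math
from typing import List, Tuple

Item = Tuple[float, int]   # (assumed question difficulty, answer result 1 or 0)


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def cross_entropy(theta: float, items: List[Item]) -> float:
    """Mean binary cross entropy of predicted correctness on the given items."""
    eps = 1e-12
    total = 0.0
    for difficulty, y in items:
        p = sigmoid(theta - difficulty)
        total -= y * math.log(p + eps) + (1 - y) * math.log(1.0 - p + eps)
    return total / len(items)


def estimate_ability(support: List[Item], steps: int = 200, lr: float = 0.5) -> float:
    """Inner layer: gradient descent on the support-set cross entropy."""
    theta = 0.0
    for _ in range(steps):
        grad = sum(sigmoid(theta - b) - y for b, y in support) / len(support)
        theta -= lr * grad
    return theta


support = [(-1.0, 1), (0.0, 1), (1.0, 0)]   # questions used to estimate ability
query = [(-0.5, 1), (1.5, 0)]               # held-out questions for the outer loss

theta_hat = estimate_ability(support)        # inner-layer result
outer_loss = cross_entropy(theta_hat, query) # signal for the outer layer
```

In a full system the outer loss would be used to update the recommendation strategy that chooses the support questions; that update is omitted here.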

The application also provides an adaptive recommendation device of online programming teaching, the adaptive recommendation device of online programming teaching includes:

and the recommendation module is used for acquiring the answer information of the learner, inputting the answer information into the recommendation model, and recommending the test questions of the answer information based on the recommendation model to obtain the recommendation strategy of the learner.

Optionally, the inner layer optimization module includes:

and the fusion module is used for fusing the cross entropy loss function on the support set with a preset potential capability estimation to obtain the capability estimation algorithm function.

Optionally, the outer layer optimization module includes:

a query set minimization loss function module for minimizing a cross entropy loss function on the query set based on the capability estimation algorithm function;

and the recommendation strategy algorithm function determining module is used for performing outer layer optimization on the model to be optimized based on the cross entropy loss function on the query set to determine a recommendation strategy algorithm function.

Optionally, the recommendation module includes:

the evaluation module is used for evaluating the answer information based on the recommendation model and determining the ability information of the learner;

and the recommendation strategy determination module is used for determining the recommendation strategy of the learner based on the capability information.

Optionally, the evaluation module comprises:

the classification module is used for determining questions answered correctly and questions answered incorrectly in the answer information;

the performance learning module is used for respectively capturing performance information of the right-answered question and the wrong-answered question through self-attention operation based on the recommendation model and determining first ability information of the learner based on the performance information;

a contradiction capturing module, configured to capture a contradiction between the right-answered questions and the wrong-answered questions through a double attention operation based on the first competence information, and determine second competence information of the learner based on the contradiction;

and the competence information determining module is used for obtaining the competence information of the learner based on the first competence information and the second competence information.

The specific implementation manner of the adaptive recommendation device for online programming teaching is basically the same as that of each embodiment of the adaptive recommendation method for online programming teaching, and is not described herein again.

Referring to fig. 1, fig. 1 is a schematic terminal structure diagram of a hardware operating environment according to an embodiment of the present application.

As shown in fig. 1, the terminal may include: a processor 1001, such as a CPU, a network interface 1004, a learner interface 1003, a memory 1005, and a communication bus 1002. The communication bus 1002 is used to enable connection and communication between these components. The learner interface 1003 may include a display screen (Display) and an input unit such as a keyboard (Keyboard), and may optionally further include a standard wired interface and a wireless interface. The network interface 1004 may optionally include a standard wired interface and a wireless interface (e.g., a Wi-Fi interface). The memory 1005 may be a high-speed RAM memory or a non-volatile memory such as a disk memory. The memory 1005 may alternatively be a storage device separate from the processor 1001.

Optionally, the adaptive recommendation device for online programming teaching may further include a rectangular learner interface, a network interface, a camera, a Radio Frequency (RF) circuit, a sensor, an audio circuit, a Wi-Fi module, and the like. The rectangular learner interface may include a display screen (Display) and an input sub-module such as a keyboard (Keyboard), and may optionally further include a standard wired interface and a wireless interface. The network interface may optionally include a standard wired interface and a wireless interface (e.g., a Wi-Fi interface).

Those skilled in the art will appreciate that the adaptive recommendation device architecture of the online programming teaching shown in FIG. 1 does not constitute a limitation of the adaptive recommendation device of the online programming teaching, and may include more or fewer components than shown, or some components in combination, or a different arrangement of components.

As shown in fig. 1, a memory 1005, which is a storage medium, may include an operating system, a network communication module, and an adaptive recommendation program for online programming teaching. The operating system is a program that manages and controls the adaptive recommendation device hardware and software resources for online programming teaching, supports the operation of the adaptive recommendation program for online programming teaching, and other software and/or programs. The network communication module is used for communication among the components in the memory 1005 and with other hardware and software in the adaptive recommendation system for online programming teaching.

In the adaptive recommendation device for online programming teaching shown in fig. 1, the processor 1001 is configured to execute an adaptive recommendation program for online programming teaching stored in the memory 1005, and implement any of the steps of the adaptive recommendation method for online programming teaching described above.

The specific implementation manner of the adaptive recommendation device for online programming teaching of the present application is substantially the same as that of each embodiment of the adaptive recommendation method for online programming teaching, and is not described herein again.

The present application also provides a storage medium having stored thereon a program for implementing an adaptive recommendation method for online programming teaching, the program being executed by a processor to implement the adaptive recommendation method for online programming teaching as follows:

acquiring the answering information of the learner;

and inputting the answer information into a preset recommendation model, and carrying out test question recommendation processing on the answer information based on the recommendation model to obtain the recommendation strategy of the learner, wherein the recommendation model is obtained by carrying out double-layer optimization method training on a model to be optimized based on an answer information sample.

Optionally, before the step of obtaining the learner's answer information, the method includes:

acquiring an answer information sample;

dividing the answer information sample into a support set and a query set;

based on the support set, carrying out inner layer optimization on a preset model to be optimized, and determining a capability estimation algorithm function;

and performing outer layer optimization on a preset model to be optimized based on the query set and the capability estimation algorithm function, determining a recommendation strategy algorithm function, and training to obtain the recommendation model based on the recommendation strategy algorithm function.

Optionally, the step of performing inner-layer optimization on a preset model to be optimized based on the support set to determine a capability estimation algorithm function includes:

minimizing a cross entropy loss function on the support set;

determining the capability estimation algorithm function based on a cross entropy loss function on the support set.

determining a recommendation strategy for the learner based on the competency information.

capturing contradictions between the right-answered questions and the wrong-answered questions through a double attention operation based on the first competence information, and determining second competence information of the learner based on the contradictions;

and obtaining the learner's competence information based on the first competence information and the second competence information.

The specific implementation of the storage medium of the present application is substantially the same as the embodiments of the adaptive recommendation method for online programming teaching, and is not described herein again.

The present application also provides a computer program product, comprising a computer program which, when executed by a processor, performs the steps of the above-described adaptive recommendation method for online programming teaching.

The specific implementation of the computer program product of the present application is substantially the same as the embodiments of the adaptive recommendation method for online programming teaching, and is not described herein again.

It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.

The above-mentioned serial numbers of the embodiments of the present application are merely for description and do not represent the merits of the embodiments.

Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solutions of the present application may be embodied in the form of a software product, which is stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) as described above and includes instructions for enabling a terminal device (e.g., a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present application.

The above description is only a preferred embodiment of the present application, and not intended to limit the scope of the present application, and all modifications of equivalent structures and equivalent processes, which are made by the contents of the specification and the drawings of the present application, or which are directly or indirectly applied to other related technical fields, are included in the scope of the present application.

Claims

1. An adaptive recommendation method for online programming teaching is characterized in that the adaptive recommendation method for online programming teaching comprises the following steps:

2. The adaptive recommendation method for on-line programming teaching of claim 1 wherein said step of determining said capability estimation algorithm function based on a cross entropy loss function on said support set comprises:

3. The adaptive recommendation method for on-line programming teaching of claim 1, wherein said step of performing outer layer optimization on said model to be optimized based on said query set and said capability estimation algorithm function, and determining a recommendation policy algorithm function comprises:

minimizing a cross entropy loss function on the query set based on the capability estimation algorithm function;

4. The adaptive recommendation method for on-line programming teaching of claim 1, wherein the step of performing test question recommendation processing on the answer information based on the recommendation model to obtain the recommendation strategy of the learner comprises:

5. The adaptive recommendation method for on-line programming teaching of claim 4, wherein said step of evaluating said answer information based on said recommendation model to determine said learner's competency information comprises:

6. An adaptive recommendation device for online programming teaching, the adaptive recommendation device comprising:

7. An adaptive recommendation device for online programming teaching, the adaptive recommendation device comprising: a memory, a processor, and a program stored on the memory for implementing the adaptive recommendation method for online programming teaching,

the processor is used for executing a program for implementing the adaptive recommendation method of the online programming teaching to implement the steps of the adaptive recommendation method of the online programming teaching according to any one of claims 1 to 5.

8. A storage medium having stored thereon a program for implementing an adaptive recommendation method for online programming teaching, the program being executed by a processor to implement the steps of the adaptive recommendation method for online programming teaching according to any one of claims 1 to 5.