CN110334204B - Exercise similarity calculation recommendation method based on user records - Google Patents


Info

Publication number
CN110334204B
Authority
CN
China
Prior art keywords
exercise
exercises
target
user
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910444120.3A
Other languages
Chinese (zh)
Other versions
CN110334204A (en)
Inventor
王汉武
骆益军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hunan University
Original Assignee
Hunan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hunan University filed Critical Hunan University
Priority to CN201910444120.3A priority Critical patent/CN110334204B/en
Publication of CN110334204A publication Critical patent/CN110334204A/en
Application granted granted Critical
Publication of CN110334204B publication Critical patent/CN110334204B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/335Filtering based on additional data, e.g. user or group profiles
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/22Matching criteria, e.g. proximity measures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/289Phrasal analysis, e.g. finite state techniques or chunking

Abstract

The invention discloses a method for calculating exercise similarity and recommending exercises based on user records. It effectively combines the advantages of the item2vec idea and a convolutional neural network to solve a long-standing difficulty in exercise recommendation: exercises contain many formula symbols and have a complex content structure, so similar exercise types are hard to match semantically. The method segments each exercise into words from a natural-language-processing perspective, learns the specific grammatical and semantic meaning of the exercise, and matches similar exercise types at the level of word meaning. As a result, the recommendation system can recommend better-matched similar exercise types, and the quality of exercise recommendation is improved.

Description

Exercise similarity calculation recommendation method based on user records
Technical field:
The invention belongs to the field of software, and particularly relates to a method for calculating exercise similarity and recommending exercises based on user records.
Background art:
The most commonly used machine-learning algorithms for detecting text similarity, such as TF-IDF, LSA and LDA, can reach a certain accuracy when the data formats are cleaned and the comparisons are set up properly, but the similarity they capture is only weakly semantic, so the effect in practical recommendation use is not very good: the recommended exercises are basically only superficially similar. Improving the algorithms' understanding at the semantic level, so that the true semantic correlation between exercises is obtained, is therefore very important. Deep-learning-based algorithms are used in many scenarios; models based on LSTM and CNN can learn and represent the semantics of sentences to a certain extent, so deep-learning-based text similarity matching outperforms traditional machine-learning methods. However, exercise text differs fundamentally from ordinary text: its meaning is more convoluted and variable, and it contains various text impurities (mathematical symbols, formulas, etc.), which greatly reduce the accuracy of basic sentence matching on these models. Existing deep-learning-based approaches therefore also have difficulty achieving satisfactory results.
Explanation of terms:
word2vec: a word embedding model proposed by Google in 2013. It is in fact a shallow neural network model with two network structures, CBOW and Skip-gram. This patent mainly uses word2vec with the Skip-gram network structure.
item2vec: applies the word2vec method to recommendation systems: commodity items play the role of words in word2vec, and the set of items a user purchases at one time plays the role of a sentence.
skip-gram network model: a neural network composed of an input layer, a mapping layer and an output layer that infers the context from a target word; that is, the target word is input and the context words are output.
softmax: a normalized exponential function that maps the outputs of multiple neurons into the (0, 1) interval, so that they can be interpreted as probabilities.
Cross entropy: measures the distance between the actual output and the expected output, and is mainly used as the loss function of the model; the smaller the cross-entropy value, the closer the actual output is to the expected output.
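The softmax and cross-entropy definitions above can be illustrated with a small NumPy sketch (all values are illustrative, not from the patent):

```python
import numpy as np

def softmax(z):
    # Subtract the max for numerical stability, then normalize the
    # exponentials so the outputs lie in (0, 1) and sum to 1.
    e = np.exp(z - np.max(z))
    return e / e.sum()

def cross_entropy(expected, actual):
    # Distance between the expected (one-hot) distribution and the
    # actual output distribution; smaller means the outputs are closer.
    return -np.sum(expected * np.log(actual + 1e-12))

logits = np.array([2.0, 1.0, 0.1])   # raw neuron outputs
probs = softmax(logits)              # mapped into (0, 1), sums to 1
target = np.array([1.0, 0.0, 0.0])   # expected output
loss = cross_entropy(target, probs)
```

Note that moving the actual output closer to the expected one (e.g. raising the first logit) shrinks the cross-entropy value, which is exactly what the model's training exploits.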
One exercise: the set of exercises a user does within a certain period of time, or a set number of exercises the user does at one time.
Exercises of the same type: two exercises that are the same form of problem under one subject or one knowledge point.
Summary of the invention:
The invention discloses an exercise similarity calculation and recommendation method based on user records. It addresses the inaccurate exercise recommendation that results when the complex structure of exercise content leads to improper similarity calculation, and effectively improves exercise recommendation accuracy.
In order to achieve the purpose, the invention adopts the following technical scheme:
a method for calculating and recommending exercise similarity based on user records comprises the following steps:
Step one: treat each exercise as a sentence and perform word segmentation to obtain a word embedding vector for each segmented word in the exercise; then connect the word embedding vectors of all words in each exercise into a matrix, in the order in which the words appear in the exercise, to obtain an exercise matrix representing the exercise information, and process the exercise matrix with a convolutional neural network model: the model convolves with filters of different sizes to obtain multiple output features, and the pooled results of these output features are spliced into a vector1;
Step two: treat each exercise as a whole and calculate the similarity between exercises: each exercise is treated as a word, and the set of exercises a user does at one time is treated as a sentence; the probability that two exercises appear in the same exercise set at the same time is calculated as the similarity of the two exercises; finally, an embedded vector of each exercise, vector2, is obtained;
Step three: splice vector1 and vector2 to obtain a final vector, and train with this vector to obtain a trained model;
Step four: input the user's most recent exercises into the trained model. The output is, for each exercise in the exercise library, the probability that it belongs to the same category as the exercises the user has done, which serves as its recommendation probability. Sort all exercises in the result by this probability, and select the a exercises with the highest recommendation probability that the user has not yet done to display to the user, completing the recommendation task; a is the set number of exercises to recommend.
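The ranking in step four can be sketched as follows (the probabilities, done-set, and a are illustrative stand-ins, not values from the patent):

```python
import numpy as np

# Model output: one recommendation probability per exercise in the bank.
probs = np.array([0.05, 0.30, 0.10, 0.25, 0.02, 0.28])
done = {1, 3}   # indices of exercises the user has already done
a = 2           # set number of exercises to recommend

# Sort by recommendation probability (descending), drop exercises the
# user has already done, and keep the top-a remaining ones.
ranked = np.argsort(probs)[::-1]
recommended = [int(i) for i in ranked if i not in done][:a]
```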
In a further improvement, the first step comprises the following steps:
First, each exercise is segmented with the Chinese word segmentation component of the third-party library jieba, and the resulting segments are trained with word2vec's skip-gram network model, mapping each word in the exercise to a d-dimensional word vector; the word vectors of all segmented words in each exercise are connected according to their semantic order in the exercise to obtain a matrix representing the exercise. Take the exercise with the largest number of words, and let that number be n; process every exercise into an n×d matrix, padding exercises with fewer than n words with zeros so that the dimensionality of the input data is consistent. The exercise matrix is then learned with a convolution model: three filter sizes, 2×d, 3×d and 5×d, are set, with three filters used for each size, and a maximum pooling operation is applied to the output features; the nine pooled output features are spliced into a vector1 containing the semantic information of the exercise.
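The convolution-and-pooling procedure above can be sketched in NumPy with random stand-ins for the word vectors and filter weights (n, d, and all values are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 20, 8                         # n: max word count, d: word-vector dimension
exercise = rng.normal(size=(n, d))   # stand-in for a zero-padded exercise matrix

def conv_max_pool(matrix, filters):
    # Slide each h×d filter down the n×d matrix with stride 1, producing a
    # feature map of length n-h+1, then keep only its maximum (max pooling).
    pooled = []
    for f in filters:
        h = f.shape[0]
        feature_map = [np.sum(matrix[i:i + h] * f)
                       for i in range(matrix.shape[0] - h + 1)]
        pooled.append(max(feature_map))
    return pooled

vector1 = []
for h in (2, 3, 5):                  # the three filter sizes 2×d, 3×d, 5×d
    filters = [rng.normal(size=(h, d)) for _ in range(3)]  # three filters per size
    vector1.extend(conv_max_pool(exercise, filters))
vector1 = np.array(vector1)          # nine pooled features spliced into vector1
```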
In a further improvement, the second step includes the following steps:
An embedded vector of each exercise is obtained with a skip-gram network model: first, the exercises a user does in one exercise session are taken as a set; let the number of exercises the user did be S, denoted W1, W2, W3, …, WS. A current target exercise Wi is selected, and the skip-gram network model outputs the other exercises that co-occur with Wi in the exercise set, i.e., the positive samples. The model is trained so that, over all exercise sets, the conditional probability of the target exercise Wi co-occurring with every other exercise in the user's session is maximized, i.e., so that

$$\frac{1}{S}\sum_{i=1}^{S}\sum_{\substack{1\le j\le S \\ j\neq i}}\log p(W_j\mid W_i)$$

is maximal, where

$$p(W_j\mid W_i)=\frac{\exp(u_i^{\top}v_j)}{\sum_{k\in I}\exp(u_i^{\top}v_k)}$$

Here $u_i$ is the vector of the target exercise Wi, $v_j$ is the vector of an exercise that appears in the set together with Wi, I denotes the question bank containing all exercises, k ranges over the exercises in the question bank, and Wj denotes an exercise in the user's session that is different from the target exercise Wi.
A negative sampling method is applied: several exercises that are not in the same set as the target exercise Wi, i.e., negative samples, are randomly extracted to optimize the output and reduce the training computation of the model. Finally, the embedded vector representation of each exercise itself is obtained: vector2.
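The negative-sampling training of the exercise embeddings can be sketched in NumPy on toy data. The session data, pool structure, dimensions, and learning rate below are illustrative assumptions, not from the patent; each (target, co-occurring) pair in a session is a positive example, and randomly drawn exercises outside the session are negatives:

```python
import numpy as np

rng = np.random.default_rng(1)
num_exercises, dim = 50, 16
U = rng.normal(scale=0.1, size=(num_exercises, dim))  # target vectors u_i
V = rng.normal(scale=0.1, size=(num_exercises, dim))  # context vectors v_j

# Toy sessions: each session draws only from one of two pools, so exercises
# within a pool co-occur and exercises across pools never do.
pools = [np.arange(0, 25), np.arange(25, 50)]
sessions = [rng.choice(pools[t % 2], size=5, replace=False) for t in range(300)]

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

lr, num_neg = 0.05, 3
for session in sessions:
    for i in session:
        for j in session:
            if i == j:
                continue
            # Positive pair: raise the score u_i . v_j toward 1.
            g = sigmoid(U[i] @ V[j]) - 1.0
            ui = U[i].copy()
            U[i] -= lr * g * V[j]
            V[j] -= lr * g * ui
            # Negative samples: lower u_i . v_k for random exercises
            # outside the current session.
            for k in rng.choice(num_exercises, size=num_neg):
                if k in session:
                    continue
                g = sigmoid(U[i] @ V[k])
                ui = U[i].copy()
                U[i] -= lr * g * V[k]
                V[k] -= lr * g * ui

vector2 = U  # row i is the embedded vector (vector2) of exercise i
```

After training, exercises that co-occur in sessions score higher against each other than exercises that never co-occur, which is the similarity signal the method relies on.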
In a further improvement, the third step comprises the following steps:
Splice vector1 and vector2 to obtain a final vector, input this vector into a fully-connected neural network, and then perform learning and training with the final vector: exercises of the same type form one training group, and several such groups form the training set. A target exercise is input, and the expected outputs are the other exercises belonging to the same type as the target exercise, so that the output probability of exercises of the same type as the current target exercise is maximized and the computed probability of exercises not of the same type as the current target exercise is minimized, thereby obtaining the trained model.
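A minimal forward-pass sketch of this splice-and-classify step: vector1 (CNN features) and vector2 (co-occurrence embedding) are concatenated and fed through a small fully-connected network with a softmax over the question bank. All dimensions and weights are illustrative random stand-ins, not trained values:

```python
import numpy as np

rng = np.random.default_rng(2)
bank_size = 30                  # number of exercises in the question bank
d1, d2 = 9, 16                  # dims of vector1 (CNN) and vector2 (item2vec)

vector1 = rng.normal(size=d1)   # semantic features from the CNN
vector2 = rng.normal(size=d2)   # co-occurrence embedding from item2vec
final_vector = np.concatenate([vector1, vector2])  # spliced input

# One hidden layer plus a softmax output over the whole question bank.
W1 = rng.normal(scale=0.1, size=(d1 + d2, 32)); b1 = np.zeros(32)
W2 = rng.normal(scale=0.1, size=(32, bank_size)); b2 = np.zeros(bank_size)

hidden = np.tanh(final_vector @ W1 + b1)
logits = hidden @ W2 + b2
e = np.exp(logits - logits.max())
probs = e / e.sum()   # probability each bank exercise is of the same type
```

Training would then push `probs` toward 1 for same-type exercises and toward 0 for the rest, e.g. with the cross-entropy loss defined earlier.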
In a further improvement, a negative sampling method is adopted to accelerate training: for an input target exercise, e exercises that are not in the same set as the target exercise, i.e., negative samples, are randomly extracted to optimize the parameter update process, which reduces the computation and accelerates the training of the network.
The invention has the following beneficial effects: it addresses the inaccurate exercise recommendation that results when the complex structure of exercise content leads to improper similarity calculation, and effectively improves exercise recommendation accuracy.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the invention without limiting the invention. In the drawings:
FIG. 1 is a schematic flow chart of step one.
Fig. 2 is a schematic of the final flow of the present invention.
Detailed description of the embodiments:
the embodiments of the present invention will be described in further detail below with reference to the accompanying drawings, and it should be understood that the embodiments described herein are for purposes of illustration and explanation, and are not intended to limit the invention.
Example 1
The specific steps of the invention are shown in fig. 1 and fig. 2:
1) First, each exercise is segmented using the Chinese word segmentation in the third-party library jieba, and the resulting segments are trained with word2vec's skip-gram network model to map each word into a d-dimensional word vector. The word vectors of all segmented words of an exercise are connected to obtain a matrix representing the exercise. Take the exercise with the largest word count and let that count be n; process every exercise into an n×d matrix, padding exercises with fewer than n words with zeros so the input dimensions stay consistent. All exercises are thus finally represented as n×d matrices. A convolutional neural network then learns the exercise matrix: three filter sizes, 2×d, 3×d and 5×d, are set for the convolution operation, with three filters of each size, and a max pooling operation outputs the maximum of each output feature. The results of processing the nine output features are spliced into a vector1 containing the semantic information of the exercise.
2) Taking each exercise as a whole, we follow the idea of item2vec and use a skip-gram network model to obtain an embedded vector for each exercise. The exercises a user does at one time are taken as a set; let the number of exercises the user did this time be S, denoted W1, W2, W3, …, WS. We select a current target exercise Wi; the skip-gram network is then required to output the other exercises that co-occur with the current target exercise in the set, i.e., the positive samples, while exercises that do not occur in the set are negative samples. The model is trained so that the conditional probability of two exercises co-occurring in an exercise set is maximized. The corresponding objective function of the model is:

$$\frac{1}{S}\sum_{i=1}^{S}\sum_{\substack{1\le j\le S \\ j\neq i}}\log p(W_j\mid W_i)$$

where $p(W_j\mid W_i)$ is a softmax function:

$$p(W_j\mid W_i)=\frac{\exp(u_i^{\top}v_j)}{\sum_{k\in I}\exp(u_i^{\top}v_k)}$$

Here $u_i$ is the vector of the target exercise Wi, $v_j$ is the vector of an exercise that appears in the set together with Wi, I denotes the question bank containing all exercises, and k ranges over the exercises in the question bank.
A negative sampling method is applied: several exercises not in the current set, i.e., negative samples, are randomly extracted to optimize the output, so that only a small number of parameters need to be updated each time, which accelerates training. The embedded vector representation of each exercise is finally obtained: vector2.
3) Splice vector1 and vector2 to obtain a final vector, input this vector into a fully-connected neural network, and then perform learning and training: a target exercise is input, and the expected outputs are the other exercises belonging to the same type as the target exercise. Specifically, a target exercise vector is input and, after passing through a multilayer neural network and being normalized by a softmax function, yields the probability that each exercise in the question bank is of the same type as the current exercise. The fitting target of the model is to maximize the computed probability of exercises of the same type as the current target exercise and minimize the computed probability of exercises not of the same type. After training, the model can compute, from a target exercise vector, the probability that each other exercise in the question bank is of the same type, i.e., the probability of recommending that exercise.
Because the number of exercises is large, a normal training scheme over the full output would require a great deal of computation and time, so the idea of negative sampling is adopted to optimize the output. The specific measure is to randomly select several negative samples for a target exercise (generally 3-7), and to train in cross-entropy form, thereby completing the model training while saving training computation and time compared with training over the full set of exercises.
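The loss for one target exercise under this scheme can be sketched as follows: binary cross entropy over one positive (same-type) exercise plus a few random negatives replaces a softmax over the whole bank. The bank of random vectors, indices, and sizes are illustrative assumptions:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(3)
bank_size, dim = 1000, 16
target = rng.normal(size=dim)               # vector of the target exercise
bank = rng.normal(size=(bank_size, dim))    # output vectors of the whole bank

pos = 7                                     # index of a same-type exercise
negs = rng.choice(bank_size, size=5)        # a handful of random negatives

# Binary cross-entropy over 1 positive + 5 negatives instead of a softmax
# over all 1000 exercises: far fewer terms per parameter update.
loss = -np.log(sigmoid(target @ bank[pos]) + 1e-12)
for k in negs:
    loss -= np.log(1.0 - sigmoid(target @ bank[k]) + 1e-12)
```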
4) Input a sample of exercises the user has done into the model trained in step 3). The output is, for each exercise, the probability that it belongs to the same category as the user's exercises, i.e., its recommendation probability. Sort all exercises in the result by this probability, and select the highest-probability exercises the user has not yet done to display to the user, completing the recommendation task.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents and improvements made within the spirit and principle of the present invention are intended to be included within the scope of the present invention.

Claims (3)

1. A method for calculating and recommending exercise similarity based on user records is characterized by comprising the following steps:
step one, treat each exercise as a sentence and perform word segmentation to obtain word embedding vectors of the segmented words in the exercise; connect the word embedding vectors of all words in each exercise into a matrix, in the order in which the words appear in the exercise, to obtain an exercise matrix representing the exercise information, and process the exercise matrix with a convolutional neural network model: the convolutional neural network model convolves with filters of different sizes to obtain multiple output features, and the pooled results of these output features are spliced into a vector1;
step two, treat each exercise as a whole and calculate the similarity between exercises: each exercise is treated as a word, and the set of exercises a user does at one time is treated as a sentence; the probability that two exercises appear in the same exercise set at the same time is calculated as the similarity of the two exercises; finally, an embedded vector of each exercise, vector2, is obtained;
an embedded vector of each exercise is obtained with a skip-gram network model: first, the exercises a user does in one exercise session are taken as a set; let the number of exercises the user did be S, denoted W1, W2, W3, …, WS; a current target exercise Wi is selected, and the skip-gram network model outputs the other exercises co-occurring with Wi in the exercise set, i.e., the positive samples; the model is trained so that, over all exercise sets, the conditional probability of the target exercise Wi co-occurring with every other exercise in the user's session is maximized, i.e., so that

$$\frac{1}{S}\sum_{i=1}^{S}\sum_{\substack{1\le j\le S \\ j\neq i}}\log p(W_j\mid W_i)$$

is maximal, where

$$p(W_j\mid W_i)=\frac{\exp(u_i^{\top}v_j)}{\sum_{k\in I}\exp(u_i^{\top}v_k)}$$

here $u_i$ is the vector of the target exercise Wi, $v_j$ is the vector of an exercise that appears in the set together with Wi, I denotes the question bank containing all exercises, k ranges over the exercises in the question bank, and Wj denotes an exercise in the user's session that is different from the target exercise Wi;
a negative sampling method is applied: several exercises not in the same set as the target exercise Wi, i.e., negative samples, are randomly extracted to optimize the output and reduce the training computation of the model; finally, the embedded vector representation of each exercise is obtained: vector2;
step three, splice vector1 and vector2 to obtain a final vector, and train with this vector to obtain a trained model: splice vector1 and vector2 to obtain the final vector, input this vector into a fully-connected neural network, and then perform learning and training with the final vector: exercises of the same type form one training group, and several such groups form the training set; a target exercise is input, and the expected outputs are the other exercises belonging to the same type as the target exercise, so that the output probability of exercises of the same type as the current target exercise is maximized and the computed probabilities of exercises not of the same type as the current target exercise are minimized, thereby obtaining a trained model;
step four, input the latest exercises the user has done into the trained model; the output is, for each exercise in the exercise library, the probability that it belongs to the same category as the user's exercises, which serves as its recommendation probability; sort all exercises in the result by this probability, and select the a exercises with the highest recommendation probability that the user has not done to display to the user, completing the recommendation task; a is the set number of exercises to recommend.
2. The method of claim 1, wherein the step one comprises the steps of:
step one, segmenting each exercise by using a third-party library jieba Chinese segmentation component, training the obtained segmentation by using a skip-gram network model of word2vec, mapping each word in the exercise into a d-dimensional word vector, and connecting the word vectors of all the segments in each exercise according to the semantic sequence in the exercise to obtain a representative exercise matrix; taking the exercise with the maximum number of words, wherein the number of words is n; processing each problem into an n x d matrix, and performing 0 complementing operation on the problem with the word number less than n so as to keep the dimension of input data consistent; learning a problem matrix by using a convolution model, setting three sizes of 2 × d,3 × d and 5 × d, performing convolution operation on each size by using three filters respectively, and performing maximum pooling operation on output features; and splicing the processed results of the nine output features into a vector1 containing the semantic information of the problem.
3. The exercise similarity calculation recommendation method based on user records as claimed in claim 1, wherein a negative sampling method is adopted to accelerate the training in the training process, that is, for the input of a target exercise, e exercises which are not in the same set as the target exercise, that is, negative samples are randomly extracted to optimize the updating process of the parameters, so that the calculation amount is reduced, and the training speed of the network is accelerated.
CN201910444120.3A 2019-05-27 2019-05-27 Exercise similarity calculation recommendation method based on user records Active CN110334204B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910444120.3A CN110334204B (en) 2019-05-27 2019-05-27 Exercise similarity calculation recommendation method based on user records

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910444120.3A CN110334204B (en) 2019-05-27 2019-05-27 Exercise similarity calculation recommendation method based on user records

Publications (2)

Publication Number Publication Date
CN110334204A CN110334204A (en) 2019-10-15
CN110334204B true CN110334204B (en) 2022-10-18

Family

ID=68140298

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910444120.3A Active CN110334204B (en) 2019-05-27 2019-05-27 Exercise similarity calculation recommendation method based on user records

Country Status (1)

Country Link
CN (1) CN110334204B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111143604B (en) * 2019-12-25 2024-02-02 腾讯音乐娱乐科技(深圳)有限公司 Similarity matching method and device for audio frequency and storage medium
CN117688248B (en) * 2024-02-01 2024-04-26 安徽教育网络出版有限公司 Online course recommendation method and system based on convolutional neural network

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107832453A (en) * 2017-11-24 2018-03-23 重庆科技学院 Virtual test paper recommendation method oriented to personalized learning scheme
CN109271401A (en) * 2018-09-26 2019-01-25 杭州大拿科技股份有限公司 Method, apparatus, electronic equipment and storage medium are corrected in a kind of search of topic
CN109299380A (en) * 2018-10-30 2019-02-01 浙江工商大学 Exercise personalized recommendation method in online education platform based on multidimensional characteristic
CN109635100A (en) * 2018-12-24 2019-04-16 上海仁静信息技术有限公司 A kind of recommended method, device, electronic equipment and the storage medium of similar topic

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10275820B2 (en) * 2017-01-31 2019-04-30 Walmart Apollo, Llc Systems and methods for utilizing a convolutional neural network architecture for visual product recommendations

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107832453A (en) * 2017-11-24 2018-03-23 重庆科技学院 Virtual test paper recommendation method oriented to personalized learning scheme
CN109271401A (en) * 2018-09-26 2019-01-25 杭州大拿科技股份有限公司 Method, apparatus, electronic equipment and storage medium are corrected in a kind of search of topic
CN109299380A (en) * 2018-10-30 2019-02-01 浙江工商大学 Exercise personalized recommendation method in online education platform based on multidimensional characteristic
CN109635100A (en) * 2018-12-24 2019-04-16 上海仁静信息技术有限公司 A kind of recommended method, device, electronic equipment and the storage medium of similar topic

Also Published As

Publication number Publication date
CN110334204A (en) 2019-10-15

Similar Documents

Publication Publication Date Title
CN110298037B (en) Convolutional neural network matching text recognition method based on enhanced attention mechanism
CN110162593B (en) Search result processing and similarity model training method and device
CN111415740B (en) Method and device for processing inquiry information, storage medium and computer equipment
CN107943784B (en) Relationship extraction method based on generation of countermeasure network
CN109783817B (en) Text semantic similarity calculation model based on deep reinforcement learning
CN108363743B (en) Intelligent problem generation method and device and computer readable storage medium
WO2019153737A1 (en) Comment assessing method, device, equipment and storage medium
CN109902177B (en) Text emotion analysis method based on dual-channel convolutional memory neural network
CN110188272B (en) Community question-answering website label recommendation method based on user background
CN108197109A (en) A kind of multilingual analysis method and device based on natural language processing
CN109977199B (en) Reading understanding method based on attention pooling mechanism
CN112819023B (en) Sample set acquisition method, device, computer equipment and storage medium
CN109726745B (en) Target-based emotion classification method integrating description knowledge
CN110516070B (en) Chinese question classification method based on text error correction and neural network
CN109271524B (en) Entity linking method in knowledge base question-answering system
CN109840328B (en) Deep learning commodity comment text sentiment tendency analysis method
CN111400455A (en) Relation detection method of question-answering system based on knowledge graph
CN111552773A (en) Method and system for searching key sentence of question or not in reading and understanding task
CN110334204B (en) Exercise similarity calculation recommendation method based on user records
CN110874392B (en) Text network information fusion embedding method based on depth bidirectional attention mechanism
CN110569355B (en) Viewpoint target extraction and target emotion classification combined method and system based on word blocks
CN113486645A (en) Text similarity detection method based on deep learning
CN110969005B (en) Method and device for determining similarity between entity corpora
CN115270752A (en) Template sentence evaluation method based on multilevel comparison learning
CN114743029A (en) Image text matching method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant