CN106599054A

CN106599054A - Method and system for title classification and push

Info

Publication number: CN106599054A
Application number: CN201611009278.0A
Authority: CN
Inventors: 刘德建; 章亮; 詹博悍; 陈霖; 吴拥民; 陈宏展
Original assignee: Fujian Tianquan Educational Technology Ltd
Current assignee: Fujian Tianquan Educational Technology Ltd
Priority date: 2016-11-16
Filing date: 2016-11-16
Publication date: 2017-04-26
Anticipated expiration: 2036-11-16
Also published as: CN106599054B

Abstract

The invention relates to the field of classification, in particular to a method and a system for title classification and push. The method comprises the following steps of classifying a first title according to a preset knowledge point classification model to obtain a first classification set and a first correlation degree set, wherein elements in the first correlation degree set are correlation degrees of the first title and the classifications in the first classification set; computing similarity between the first title and the titles included in the classifications in the first classification set to obtain a similarity set corresponding to the classifications in the first classification set; obtaining a second correlation degree set according to the similarity set and the first correlation degree set; obtaining an approximate title set according to the second correlation degree set; and pushing the approximate title set. According to the method and the system, the accuracy of title classification and the correlation of the pushed approximate titles are improved.

Description

A method and system for question classification and pushing

Technical field

The present invention relates to the field of classification, and more particularly to a method and system for question classification and pushing.

Background art

In the era of big data, the volume of data produced every day is growing explosively. K12 education is one of the most important forms of education in China, and the amount of data it generates daily is considerable. China's online education market is growing at more than 30% per year, and its market valuation is expected to exceed 160 billion yuan. K12 online education resources have become hotly contested among enterprises. If the ever-growing body of question data can be analyzed and used, with each question reasonably classified under its corresponding knowledge point, then when a student meets a difficult question or a weak topic, questions strongly associated with that knowledge point can be pushed to the student for in-depth practice, improving the user experience of the application.

The patent document with application No. 201510246727.2 provides a question recommendation method: receiving a retrieval question; obtaining topic attribute information of the retrieval question, and obtaining preliminary retrieval results according to the topic attribute information; obtaining user description information of the user, and sorting the preliminary retrieval results according to the user description information to obtain sorted results; and selecting a preset number of results from the sorted results as the recommended questions. This improves the correlation between the recommended questions and the retrieval question, thereby improving the recommendation effect.

However, the above patent document sorts the preliminary retrieval results according to the user description information, so the accuracy of its results depends on the accuracy of the user description information.

Summary of the invention

The technical problem to be solved by the present invention is to provide a method and system for question classification and pushing that improve the accuracy of question classification and the correlation of the pushed questions.

In order to solve the above technical problem, the present invention adopts the following technical solution:

The present invention provides a method for question classification and pushing, comprising:

S1, classifying a first question according to a preset knowledge point classification model, to obtain a first classification set and a first degree of association set; each element of the first degree of association set is the degree of association between the first question and a classification in the first classification set;

S2, calculating the similarity between the first question and the questions contained in each classification in the first classification set, to obtain a similarity set corresponding to each classification in the first classification set;

S3, obtaining a second degree of association set according to the similarity set and the first degree of association set;

S4, obtaining an approximate question set according to the second degree of association set;

S5, pushing the approximate question set.

The present invention also provides a system for question classification and pushing, comprising:

a classification module, for classifying a first question according to a preset knowledge point classification model, to obtain a first classification set and a first degree of association set; each element of the first degree of association set is the degree of association between the first question and a classification in the first classification set;

a computing module, for calculating the similarity between the first question and the questions contained in each classification in the first classification set, to obtain a similarity set corresponding to each classification in the first classification set;

a first processing module, for obtaining a second degree of association set according to the similarity set and the first degree of association set;

a second processing module, for obtaining an approximate question set according to the second degree of association set;

a pushing module, for pushing the approximate question set.

The beneficial effects of the present invention are as follows. Unlike the prior art, which pushes related approximate questions directly from the classification results of a classification model, the present invention performs similarity analysis between the first question and the questions in the knowledge point classifications obtained from the knowledge point classification model, calculates the degree of association between the first question and each knowledge point classification from that similarity, and then extracts, from the knowledge point classifications with a larger degree of association, the questions most similar to the first question and pushes them to the user as approximate questions. This improves the correlation between the pushed approximate questions and the first question.

Brief description of the drawings

Fig. 1 is a flow chart of the method for question classification and pushing of the present invention;

Fig. 2 is a structural block diagram of the system for question classification and pushing of the present invention;

Reference numerals:

1, classification module; 2, computing module; 3, first processing module; 4, second processing module; 5, pushing module.

Detailed description of the embodiments

To describe the technical content, objectives and effects of the present invention in detail, a description is given below with reference to the embodiments and the accompanying drawings.

The key concept of the present invention is as follows: by performing similarity analysis between the first question and the questions in the knowledge point classifications obtained from the knowledge point classification model, the degree of association between the first question and each knowledge point classification is recalculated, which improves the correlation between the pushed approximate questions and the first question.

As shown in Fig. 1, the present invention provides a method for question classification and pushing, comprising steps S1 to S4 as set out above, followed by:

S5, pushing the approximate question set.

Further, S1 is specifically:

deploying different preset knowledge point classification models on the nodes of a preset classification cluster;

sending the first question to each node in the preset classification cluster, to obtain the first classification set and the first degree of association set.

As can be seen from the above description, using a distributed cluster helps with approximate-question pushing tasks over large batches of questions and improves pushing efficiency.

Further, the method also includes:

obtaining the classification corresponding to each question in the approximate question set, to obtain a second classification set;

updating the preset knowledge point classification model according to the second classification set.

As can be seen from the above description, periodically updating the knowledge point classification model according to the classification results improves the classification accuracy of the model and thus the correlation of the pushed approximate questions.

Further, sending the first question to each node in the preset classification cluster to obtain the first classification set and the first degree of association set is specifically:

sending the first question to each node in the preset classification cluster, to obtain the classification set and degree of association set corresponding to each node;

obtaining the weight value of each node according to the knowledge point classification model deployed on that node;

obtaining the first classification set and the first degree of association set according to the weight value of each node and the classification set and degree of association set corresponding to that node.

As can be seen from the above description, different classification models are deployed on the nodes of the classification cluster, so the classification results obtained by the nodes differ. The weight value of each node is determined from the classification model deployed on it, and the weight values and the corresponding classification results are analyzed together to obtain the knowledge point classifications most strongly associated with the first question. The weight value of each node can be adjusted for the actual application scenario, which helps push the approximate questions that best meet each user's needs.
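The weighted combination of node results described above can be sketched as follows. This is only an illustration under stated assumptions: the function name `merge_node_results`, the node names, the dictionary layout and the equal weights of 1.0 are hypothetical, and a plain weighted sum is assumed as the combination rule. The degrees of association are taken from the two result sets of the embodiment described later, where equal weights reproduce the combined value 1.58 for the value-range knowledge point.

```python
from collections import defaultdict


def merge_node_results(node_results, node_weights):
    """Combine per-node {classification: degree} maps into one weighted map."""
    merged = defaultdict(float)
    for node, result in node_results.items():
        weight = node_weights[node]
        for classification, degree in result.items():
            # Assumed rule: weighted sum of the degrees reported by each node.
            merged[classification] += weight * degree
    return dict(merged)


# Hypothetical node names; degrees follow the embodiment's two result sets.
node_results = {
    "word_frequency": {
        "value range of set elements": 0.85,
        "representation of sets": 0.03,
        "radical operations": 0.01,
    },
    "semantic": {
        "value range of set elements": 0.73,
        "representation of sets": 0.08,
        "radical operations": 0.08,
    },
}
node_weights = {"word_frequency": 1.0, "semantic": 1.0}  # assumed equal weights

first_association = merge_node_results(node_results, node_weights)
```

Raising the weight of one node (for example the semantic node) would shift the combined ranking toward that model's view of the question.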

Further, S1 is specifically:

converting the symbols in the first question according to preset escape characters, to obtain a second question;

extracting the features of the second question, to obtain a feature vector; the feature vector includes a word frequency vector and a semantic vector;

obtaining, according to the preset knowledge point classification model, the first classification set and the first degree of association set corresponding to the feature vector.

As can be seen from the above description, questions from different sources may be written differently, and different formula editors in particular describe the symbols in a formula very differently. Converting the symbols in a formula with preset escape characters normalizes symbols that are written differently but mean the same thing, so that the information in the question is used accurately and fully. This improves the accuracy of question classification, the correlation of the pushed questions, and the efficiency of obtaining approximate questions.

For example, question 1 awaiting approximate-question pushing is "the elements of the set formed by the range of positive integer values that make the function y=√(5-x) meaningful are", and question 2 awaiting approximate-question pushing is "the elements of the set formed by the range of positive integer values that make the function y=(5-x)^1/2 meaningful are". Question 1 and question 2 are in fact essentially identical, but existing methods cannot make full use of the formula information in a question. They simply push questions about finding the range of a variable that makes a function meaningful, and cannot push, in a more targeted way, questions about finding the range of a variable that makes a function containing a radical sign meaningful. Existing methods also cannot recognize that two questions are identical, so the same question has to be parsed repeatedly to obtain approximate questions, which is inefficient.
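A minimal sketch of the symbol conversion step, assuming a simple lookup table. The table `ESCAPES` and the function `normalize_symbols` are hypothetical names. The embodiment names only "sqrt" for the radical sign and the English-input forms of the equals and minus signs, so the parenthesis entries are added here as an assumption, and the sketch does not attempt to rewrite exponent notation such as `^1/2`.

```python
# Assumed escape table: only the radical, equals and minus entries come from
# the description; the full-width parentheses are an illustrative addition.
ESCAPES = {
    "√": "sqrt",
    "＝": "=",  # full-width equals sign -> ASCII equals sign
    "－": "-",  # full-width minus sign -> ASCII minus sign
    "（": "(",
    "）": ")",
}


def normalize_symbols(question: str) -> str:
    """Replace each known symbol in the question with its escape string."""
    for symbol, escape in ESCAPES.items():
        question = question.replace(symbol, escape)
    return question


second_question = normalize_symbols("y＝√（5－x）")
```

After this step, formulas typed with different input methods share one textual form, which is what lets later steps treat them as the same tokens.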

Further, obtaining, according to the preset knowledge point classification model, the first classification set and the first degree of association set corresponding to the feature vector is specifically:

deploying, on a node in the preset classification cluster, a knowledge point classification model based on word frequency;

deploying, on a node in the preset classification cluster, a knowledge point classification model based on semantics;

As can be seen from the above description, the knowledge point classifications related to the first question that are obtained through the classification cluster include classification results from both the word frequency and the semantic dimension. Because both the word frequency and the semantics of the question are considered, classification accuracy improves, and so does the correlation between the pushed approximate questions and the first question.

Further, extracting the features of the second question to obtain the feature vector, the feature vector including a word frequency vector and a semantic vector, is specifically:

parsing the second question, to obtain a Chinese character stack and a non-Chinese character stack;

performing word segmentation on the characters in the Chinese character stack using a word segmentation algorithm, and matching the formulas stored in the non-Chinese character stack using a preset regular expression, to obtain a third question;

deleting stop words from the third question, to obtain a fourth question;

building a word frequency vector according to the fourth question; the number of elements in the word frequency vector is the number of distinct words in the fourth question, and the value of each element is the number of times the corresponding word occurs in the fourth question;

establishing a semantic feature extraction model according to a preset dimension;

building the semantic vector corresponding to the fourth question according to the semantic feature extraction model.

As can be seen from the above description, existing word segmentation algorithms delete the non-Chinese characters in a question and segment only the Chinese characters. The present invention therefore first places the Chinese characters and the non-Chinese characters of the question into separate stacks, segments the Chinese character stack, and applies regular expressions to the non-Chinese character stack to match formulas, separating out the recognizable parts of each formula as far as possible. The question can thus be segmented while its information is retained, which helps extract its feature vector. In addition, storing the Chinese and non-Chinese characters in stacks keeps the character order unchanged, so the original meaning of the question is not altered during segmentation. Furthermore, deleting the stop words in the question, that is, meaningless words such as "的", "it", "is" and "inside", allows the feature vector to be extracted more accurately, ignores irrelevant information, and reduces the redundancy of the feature vector.
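The separation of Chinese and non-Chinese characters can be sketched as below. This is an illustration under assumptions, not the patented implementation: plain lists of `(position, run)` pairs stand in for the two stacks, the names `split_question` and `tokenize_formula` are hypothetical, and the regular expressions are simple examples of a "preset regular expression". Word segmentation of the Chinese runs (for instance with jieba) is left out.

```python
import re

# Runs of CJK unified ideographs; everything between them is treated as formula text.
CJK_RUN = re.compile(r"[\u4e00-\u9fff]+")
# Assumed formula tokens: identifiers, numbers, or any single non-space symbol.
FORMULA_TOKEN = re.compile(r"[A-Za-z]+|\d+(?:\.\d+)?|\S")


def split_question(text):
    """Return (chinese_runs, other_runs) as lists of (start position, run)."""
    chinese, other = [], []
    pos = 0
    for match in CJK_RUN.finditer(text):
        if match.start() > pos:
            other.append((pos, text[pos:match.start()]))
        chinese.append((match.start(), match.group()))
        pos = match.end()
    if pos < len(text):
        other.append((pos, text[pos:]))
    return chinese, other


def tokenize_formula(run):
    """Split a non-Chinese run into recognizable formula tokens."""
    return FORMULA_TOKEN.findall(run)


chinese_runs, other_runs = split_question("使函数y=sqrt(5-x)有意义")
formula_tokens = tokenize_formula(other_runs[0][1])
```

Keeping the start position with each run is one way to restore the original character order when the segmented Chinese words and the formula tokens are merged back into a single question.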

Further, deleting stop words from the third question to obtain the fourth question is specifically:

calculating the weight of each word in the third question;

sorting the words in the third question according to the weights, to form a first queue;

deleting from the third question the words corresponding to the first preset number of elements of the first queue, to obtain the fourth question.

As can be seen from the above description, the specific stop words differ between subjects and between school-age groups, while existing methods obtain stop words by looking them up in a stop word list, which offers little flexibility or specificity. The present invention instead uses a stop-word calculation algorithm, such as the TF-IDF algorithm, to calculate the weight of each word in the question and deletes the lower-weight words from the third question. Different stop words can thus be obtained for different subjects, improving the correlation of the approximate questions obtained.

For example, the word "acceleration" occurs frequently in physics and is important for understanding a physics question, but in biology it might not appear even once in 1000 questions. If "acceleration" is found in a biology question, it can therefore be regarded as a stop word rather than as an important word in biology, and can be deleted.
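A minimal sketch of the deletion step, assuming the first queue is sorted in ascending order of weight. The function name `drop_lowest_weight_words` is hypothetical. The weights are the ones listed for the third question in the embodiment below, the token strings are English glosses, "?" is a stand-in for the final token whose text is not legible in the source, and `n_drop=13` is an assumed preset number chosen because it reproduces the eight-word fourth question of the embodiment.

```python
def drop_lowest_weight_words(words, weights, n_drop):
    """Remove the n_drop lowest-weight tokens while keeping the original order."""
    # Sort token positions by weight (ascending); positions are used because
    # the same word can occur more than once in a question.
    order = sorted(range(len(words)), key=lambda i: weights[i])
    dropped = set(order[:n_drop])
    return [word for i, word in enumerate(words) if i not in dropped]


third_question = [
    ("make", 0.05), ("function", 0.51), ("meaningful", 0.22), ("的", 0.02),
    ("y", 0.09), ("=", 0.07), ("sqrt", 0.22), ("(", 0.01), ("5", 0.01),
    ("-", 0.07), ("x", 0.07), (")", 0.01), ("positive integer", 0.49),
    ("value range", 0.44), ("composition", 0.15), ("的", 0.02), ("set", 0.38),
    ("的", 0.02), ("element", 0.35), ("have", 0.05), ("?", 0.01),
]
words = [word for word, _ in third_question]
weights = [weight for _, weight in third_question]

fourth_question = drop_lowest_weight_words(words, weights, n_drop=13)
```

A fixed weight threshold would give the same result here; the count-based form is used because the method text speaks of a preset number of queue elements.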

In the TF-IDF algorithm, term frequency (TF) refers to the number of times a given word appears in a document. This number is usually normalized (the numerator is generally smaller than the denominator, unlike IDF) to prevent a bias toward long documents. It is calculated as follows:

$$tf_{i,j} = \frac{n_{i,j}}{\sum_k n_{k,j}}$$

In the above formula, $n_{i,j}$ is the number of times the word $t_i$ appears in document $d_j$, and the denominator is the sum of the occurrence counts of all words in document $d_j$.

Inverse document frequency (IDF) is a measure of the general importance of a word. The IDF of a particular word is obtained by dividing the total number of documents by the number of documents containing the word and taking the logarithm of the quotient. Its formula is as follows:

$$idf_i = \log\frac{|D|}{|\{j : t_i \in d_j\}|}$$

where $|D|$ is the total number of documents in the corpus and $|\{j : t_i \in d_j\}|$ is the number of documents containing the word $t_i$. If the word does not appear in the corpus, the denominator would be 0, so $1 + |\{j : t_i \in d_j\}|$ is generally used instead. The final TF-IDF formula is as follows:

$$tfidf_{i,j} = tf_{i,j} \times idf_i$$

A high term frequency within a particular document, combined with a low document frequency of the word across the whole document set, yields a high TF-IDF weight. TF-IDF therefore tends to filter out common words and retain important ones.
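The TF-IDF weighting described above can be sketched in a few lines. This is a minimal illustration using the smoothed denominator (1 plus the document count) mentioned in the text. The function name `tf_idf` and the three-document toy corpus are assumptions made for the example.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {word: weight} map per document (each doc is a list of words)."""
    n_docs = len(docs)
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(doc))  # count each word once per document
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = sum(counts.values())
        weights.append({
            # tf = n_ij / sum_k n_kj ; idf = log(|D| / (1 + document frequency))
            word: (count / total) * math.log(n_docs / (1 + doc_freq[word]))
            for word, count in counts.items()
        })
    return weights


docs = [
    ["function", "of", "x"],
    ["set", "of", "elements"],
    ["root", "of", "equation"],
]
weights = tf_idf(docs)
```

With the smoothed denominator, a word that occurs in every document (here "of") receives a slightly negative weight, so it sorts to the bottom and is a natural stop-word candidate.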

Further, obtaining the approximate question set according to the second degree of association set is specifically:

sorting the classifications in the first classification set according to the second degree of association set, to obtain a first classification queue;

obtaining a preset number of classifications from the first classification queue, to obtain a second classification set;

obtaining the questions in the second classification set whose similarity to the first question is greater than a preset similarity threshold, to obtain the approximate question set.

As can be seen from the above description, forming the approximate question set from questions that are highly similar to the first question, chosen from the knowledge point classifications most strongly associated with it, improves the correlation between the pushed approximate question set and the first question.
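The selection steps above can be sketched as follows, assuming the classifications are ranked in descending order of the second degree of association. All names and values (`select_approximate_questions`, the classes "A" to "C", the questions "q1" to "q4", `top_k`, `threshold`) are hypothetical and serve only to show the control flow.

```python
def select_approximate_questions(second_association, questions_by_class,
                                 similarity, top_k, threshold):
    """Pick questions above `threshold` from the `top_k` best classifications."""
    # First classification queue: classifications by descending association.
    ranked = sorted(second_association, key=second_association.get, reverse=True)
    second_classification_set = ranked[:top_k]
    approximate = []
    for classification in second_classification_set:
        for question in questions_by_class.get(classification, []):
            if similarity[question] > threshold:
                approximate.append(question)
    return approximate


# Hypothetical data for illustration only.
second_association = {"A": 0.9, "B": 0.5, "C": 0.1}
questions_by_class = {"A": ["q1", "q2"], "B": ["q3"], "C": ["q4"]}
similarity = {"q1": 0.95, "q2": 0.40, "q3": 0.80, "q4": 0.99}

approximate_set = select_approximate_questions(
    second_association, questions_by_class, similarity, top_k=2, threshold=0.7
)
```

In this toy data "q4" is very similar to the first question but is not selected, because its classification falls outside the top two; this is the filtering effect of ranking classifications before comparing questions.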

As shown in Fig. 2, the present invention also provides a system for question classification and pushing, comprising:

a classification module 1, for classifying a first question according to a preset knowledge point classification model, to obtain a first classification set and a first degree of association set; each element of the first degree of association set is the degree of association between the first question and a classification in the first classification set;

a computing module 2, for calculating the similarity between the first question and the questions contained in each classification in the first classification set, to obtain a similarity set corresponding to each classification in the first classification set;

a first processing module 3, for obtaining a second degree of association set according to the similarity set and the first degree of association set;

a second processing module 4, for obtaining an approximate question set according to the second degree of association set;

a pushing module 5, for pushing the approximate question set.

As can be seen from the above description, the system for question classification and pushing improves the accuracy of question classification and thereby further improves the correlation between the pushed approximate questions and the first question.

An embodiment of the invention is as follows:

S1, deploying a knowledge point classification model based on word frequency and a knowledge point classification model based on semantics on respective nodes of the preset classification cluster;

Here, the knowledge point classification model based on word frequency operates as follows:

(1) a new question is input;

(2) the new question is converted into LaTeX format;

(3) the text is segmented into words, and the stop words obtained during training are deleted;

(4) the new question is built into a word frequency vector;

(5) the word frequency vector is input into the pre-trained knowledge point classification model based on word frequency, to obtain the corresponding knowledge points and their weights.

The process of training the knowledge point classification model based on word frequency is specifically:

(1) the training questions are input;

(2) the training questions are converted into LaTeX format;

(3) the text is segmented into words;

(4) the weight of each word is calculated using a stop-word algorithm (TF-IDF), stop words are determined according to a set threshold, and the stop words in the training questions are deleted;

(5) each training question is converted into a word frequency vector;

(6) the parameters of the classification algorithm are set;

(7) all the word frequency vectors are input into the classification algorithm for training, to obtain the knowledge point classification model based on word frequency.

The knowledge point classification model based on semantics operates as follows:

(1) a new question is input;

(2) the new question is converted into LaTeX format;

(3) the text is segmented into words, and the stop words obtained during training are deleted;

(4) the new question is input into the pre-trained semantic feature extraction model, to obtain the corresponding semantic vector;

(5) the semantic vector is input into the pre-trained knowledge point classification model based on semantics, to obtain the corresponding knowledge points and their weights.

The process of training the knowledge point classification model based on semantics is specifically:

(1) the training questions are input;

(2) the training questions are converted into LaTeX format;

(3) the text is segmented into words;

(4) the segmented training questions are input into a semantic feature extraction model (such as a word2vec model), and a semantic feature extraction model for the training questions is obtained according to the set model parameters;

(5) each training question is input into the semantic feature extraction model, to obtain the semantic vector of each training question;

(6) the classification algorithm is set (such as the random forest or xgboost algorithm);

(7) all the semantic vectors are input into the classification algorithm for training, to obtain the knowledge point classification model based on semantics.

S2, sending the first question to each node in the preset classification cluster; each node classifies the first question, specifically:

S21, converting the symbols in the first question according to the preset escape characters, to obtain the second question;

Here, the escape character for the symbol √ is "sqrt", the escape character for the symbol "=" is the equals sign entered in English input mode, and the escape character for the symbol "-" is the minus sign entered in English input mode. The second question obtained after escape conversion is "the elements of the set formed by the range of positive integer values that make the function y=sqrt(5-x) meaningful are".

S22, parsing the second question, to obtain a Chinese character stack and a non-Chinese character stack;

calculating the weight of each word in the third question;

deleting from the third question the words corresponding to the first preset number of elements of the first queue, to obtain the fourth question;

establishing a semantic feature extraction model according to a preset dimension;

building the semantic vector corresponding to the fourth question according to the semantic feature extraction model;

Here, performing word segmentation on the characters in the Chinese character stack using the jieba word segmentation algorithm, and matching the formulas stored in the non-Chinese character stack using a preset regular expression, is specifically:

First, the characters in the Chinese character stack are segmented using the jieba word segmentation algorithm, to obtain the third question "make@function@meaningful@的@y@=@sqrt@(@5@-@x@)@positive integer@value range@composition@的@set@的@element@have@", where the symbol @ denotes a separator.

The weight of each word in the third question is calculated using the TF-IDF algorithm. The weights of the words in the third question are, in order:

"make": 0.05, "function": 0.51, "meaningful": 0.22, "的": 0.02, "y": 0.09, "=": 0.07, "sqrt": 0.22, "(": 0.01, "5": 0.01, "-": 0.07, "x": 0.07, ")": 0.01, "positive integer": 0.49, "value range": 0.44, "composition": 0.15, "的": 0.02, "set": 0.38, "的": 0.02, "element": 0.35, "have": 0.05, "": 0.01. The lower-weight words are deleted from the third question to obtain the fourth question, which is: "function@meaningful@sqrt@positive integer@value range@composition@set@element".

The number of times each word occurs in the fourth question is counted, and the word frequency vector of the fourth question is built over the non-stop-word vocabulary formed from all non-stop words, specifically:

If the number of non-stop words across all training sets is 1000, the word frequency vector of the fourth question has length 1000, and each element of the vector is the number of times the corresponding word occurs in the question. For a word that occurs in the fourth question, such as "function", which occurs only once, the dimension corresponding to "function" in the word frequency vector is 1; if "function" occurred twice in the question, that dimension would be 2. The dimensions of all words that do not occur in the question are 0.
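The construction of the word frequency vector can be sketched as below. The function name `word_frequency_vector` and the four-word vocabulary are assumptions for illustration; in the description the vocabulary would hold all non-stop words of the training sets (1000 in the example above).

```python
from collections import Counter


def word_frequency_vector(question_words, vocabulary):
    """Count, for each vocabulary word in order, its occurrences in the question."""
    counts = Counter(question_words)
    # Counter returns 0 for words that do not occur in the question.
    return [counts[word] for word in vocabulary]


# Hypothetical vocabulary, much smaller than a real one.
vocabulary = ["function", "set", "element", "triangle"]

once = word_frequency_vector(["function", "set", "element"], vocabulary)
twice = word_frequency_vector(["function", "set", "element", "function"], vocabulary)
```

Because every question is mapped onto the same vocabulary order, vectors of different questions have equal length and can be compared dimension by dimension.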

Each word occurring in the fourth question is input into a trained semantic model (such as a word2vec or GloVe model) to obtain a vector for each word. Because the vectors obtained for the words all have the same length, they can be superimposed, that is, the values in the same dimension are added, to obtain a single vector covering the whole question. The semantic model is a representation that preserves semantic context. The process of building the semantic vector of the fourth question is specifically:

The fourth question is input into the pre-trained semantic model, and the semantic vector of each word is obtained according to the parameter settings of the pre-trained model. In practice the vector length of each word is usually set to 100 to 200 dimensions; for illustration, each word vector is set to 4 dimensions here.

Word               Dim 1   Dim 2   Dim 3   Dim 4
function           0.41    0.12    0.02    0.31
meaningful         0.21    0.01    0.02    0.22
sqrt               0.02    0.08    0.06    0.05
positive integer   0.35    0.14    0.21    0.33
value range        0.01    0.03    0.05    0.06
composition        0.23    0.41    0.05    0.02
set                0.14    0.02    0.13    0.09
element            0.06    0.04    0.07    0.08

Finally, the values of all the words in each dimension are added, giving the summed vector of the fourth question:

(1.43, 0.85, 0.61, 1.16)

The value in each dimension is then divided by the total number of words in the fourth question (8):

(0.17875, 0.10625, 0.07625, 0.145)

This is the semantic vector of the fourth question.
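The sum-and-average computation above can be checked with a short sketch. The word vectors are the 4-dimensional illustrative values from the table; the function name `average_vectors` is a hypothetical helper, and in practice the vectors would come from a trained model such as word2vec rather than being typed in.

```python
def average_vectors(vectors):
    """Add equal-length vectors dimension by dimension, then divide by their count."""
    count = len(vectors)
    dims = len(vectors[0])
    return [sum(vector[d] for vector in vectors) / count for d in range(dims)]


# Illustrative 4-dimensional word vectors of the fourth question.
word_vectors = {
    "function": [0.41, 0.12, 0.02, 0.31],
    "meaningful": [0.21, 0.01, 0.02, 0.22],
    "sqrt": [0.02, 0.08, 0.06, 0.05],
    "positive integer": [0.35, 0.14, 0.21, 0.33],
    "value range": [0.01, 0.03, 0.05, 0.06],
    "composition": [0.23, 0.41, 0.05, 0.02],
    "set": [0.14, 0.02, 0.13, 0.09],
    "element": [0.06, 0.04, 0.07, 0.08],
}

semantic_vector = average_vectors(list(word_vectors.values()))
```

Dividing by the word count keeps the vector on the same scale for short and long questions, which matters when questions of different lengths are compared.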

S23, obtaining, according to the preset knowledge point classification model, the first classification set and the first degree of association set corresponding to the word frequency vector and the semantic vector;

Here, the first knowledge point set is: {value range of set elements, representation of functions, representation of sets, extreme values of elements in a set, existence of roots and determination of the number of roots, value of a function, radical operations}, and the degrees of association between the fourth question and these knowledge points are, respectively: {0.85, 0.04, 0.03, 0.02, 0.03, 0.02, 0.01}. The second knowledge point set is: {value range of set elements, radical operations, representation of sets, extreme values of elements in a set, equality of sets, domain of a function and how to find it and the value of a function}, and the degrees of association between the fourth question and these knowledge points are, respectively: {0.73, 0.08, 0.08, 0.04, 0.04, 0.02}. The knowledge points in the first and second knowledge point sets are combined to form a third knowledge point set. The knowledge points of the third knowledge point set match the features of both the word frequency vector and the semantic vector of the fourth question and have a larger degree of association with the fourth question. The classifications corresponding to the knowledge points in the third knowledge point set form the first classification set.

S24, calculating the similarity between the first question and the questions contained in each classification in the first classification set, to obtain a similarity set corresponding to each classification in the first classification set;

obtaining the second degree of association set according to the similarity set and the first degree of association set;

obtaining the questions in the second classification set whose similarity to the first question is greater than the preset similarity threshold, to obtain the approximate question set;

Here, the cosine distance formula for calculating similarity is as follows:

$$\cos\theta = \frac{x \cdot y}{\|x\|\,\|y\|} = \frac{\sum_{i=1}^{n} x_i y_i}{\sqrt{\sum_{i=1}^{n} x_i^2}\,\sqrt{\sum_{i=1}^{n} y_i^2}}$$

where x denotes the feature vector of the first question and y denotes the feature vector of each question in the classification. The closer the value of cos θ is to 1, the higher the similarity of the two questions.
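The cosine measure can be sketched as a small function over two equal-length feature vectors. The function name `cosine_similarity` is hypothetical, and returning 0.0 for an all-zero vector is an assumption of this sketch, since the description does not say how that case is handled.

```python
import math


def cosine_similarity(x, y):
    """Cosine of the angle between feature vectors x and y."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0 or norm_y == 0:
        return 0.0  # assumed convention for an empty (all-zero) vector
    return dot / (norm_x * norm_y)


same_direction = cosine_similarity([1, 2, 3], [2, 4, 6])
orthogonal = cosine_similarity([1, 0], [0, 1])
```

Because the measure depends only on direction, a question and a longer question with proportionally scaled term counts score close to 1.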

The first degrees of association between the first question and each classification in the first classification set are, respectively:

{value range of set elements: 1.58, representation of functions: 0.04, representation of sets: 0.11, extreme value of set elements: 0.06, existence of roots and determination of the number of roots: 0.03, value of a function: 0.02, radical operations: 0.09, extreme value of set elements: 0.04, equality of sets: 0.04, the domain of a function and its determination and the value of a function: 0.02}

The process of obtaining the second degree-of-association set according to the similarity set and the first degree-of-association set is specifically:

(1) Obtain the four elements with the largest first degree of association in the first classification set above, namely value range of set elements, representation of sets, radical operations, and extreme value of set elements, and extract the TF-IDF vectors of all questions in the question bank that belong to these knowledge points.

(2) Using the cosine distance formula, calculate the cosine distance between the feature vector of each extracted question and the feature vector of the first question.

(3) Sort the cosine distances between all of the extracted questions and the first question to obtain the second degree-of-association set. From the classifications with a larger second degree of association, select the questions with a higher similarity to the first question to form the approximate-question set.
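Steps (1) to (3) can be sketched as follows, for illustration only. The sketch assumes that the TF-IDF vectors of the question bank are already available as a mapping from knowledge point to question identifiers and vectors; the names `approximate_questions`, `bank`, `top_classes`, and `top_questions` are hypothetical. How the per-classification second degree of association is derived from the sorted similarities is not spelled out in the description, so the sketch simply ranks the candidate questions by similarity.

```python
import math


def cosine(x, y):
    """cos(theta) between two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return dot / norm if norm else 0.0


def approximate_questions(query_vec, first_degrees, bank, top_classes=4, top_questions=3):
    # (1) Keep the classifications with the largest first degree of association.
    selected = sorted(first_degrees, key=first_degrees.get, reverse=True)[:top_classes]
    # (2) Cosine measure between the query and every question in those classifications.
    scored = [
        (cosine(query_vec, vec), knowledge_point, question_id)
        for knowledge_point in selected
        for question_id, vec in bank.get(knowledge_point, {}).items()
    ]
    # (3) Rank by similarity, highest first, and keep the best candidates.
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:top_questions]


# Toy data (hypothetical): three knowledge points and three-dimensional vectors.
first_degrees = {"A": 1.58, "B": 0.11, "C": 0.01}
bank = {
    "A": {"q1": [1.0, 0.0, 0.0], "q2": [1.0, 1.0, 0.0]},
    "B": {"q3": [0.0, 1.0, 0.0]},
    "C": {"q4": [0.0, 0.0, 1.0]},
}
result = approximate_questions([1.0, 0.0, 0.0], first_degrees, bank,
                               top_classes=2, top_questions=2)
```

With this toy data, only the questions under knowledge points "A" and "B" are compared with the query, so question "q4" is never scored.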

S25: pushing the approximate-question set;

S26: obtaining the classification corresponding to each question in the approximate-question set, to obtain a second classification set;

The process of updating the knowledge point classification model is specifically:

(1) First calculate the length of the question after word segmentation. The segmented fourth question is: "meaningful@function@sqrt@positive integer@value range@constitute@set@element". The question length of the fourth question is therefore 8.

(2) updateWeight is a parameter whose value depends on a condition: when the question length of the fourth question is greater than 5, updateWeight = 0.5; otherwise:

That is, updateWeight = 0.5 for the fourth question.

(3) Calculate incomeWeight, which is the average degree of approximation of the approximate questions under the knowledge point whose similarity is greater than 0.1. Let A be the set of all questions under the knowledge point, with a ∈ A, and let x be the question currently being queried, such as the fourth question.

Define A' = {a ∈ A | sim(a, x) > 0.1}, where sim(a, x) is the degree of approximation between questions a and x. The incomeWeight of each knowledge point is calculated as:

$$\mathrm{incomeWeight} = \frac{1}{|A'|} \sum_{a \in A'} \mathrm{sim}(a, x)$$

(4) Update the weight value of each knowledge point according to the following formula:

newWeight = oldWeight × (1 - updateWeight) + incomeWeight × updateWeight

where newWeight is the knowledge point weight after the update and oldWeight is the original knowledge point weight. For example, the old knowledge point weights of the fourth question are 1.58, 0.11, 0.09, and 0.06, respectively. The resulting newWeight is the new knowledge point weight.
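The update steps (1) to (4) can be sketched as follows, for illustration only. The value of updateWeight for question lengths of 5 or less is not reproduced in this text, so the sketch takes it from a caller-supplied function `fallback`, which is purely hypothetical. The function names, the similarity values in the example, and returning 0.0 when no question exceeds the threshold are likewise assumptions.

```python
def question_length(segmented, sep="@"):
    """Number of words in a question whose words are joined by `sep`."""
    return len([word for word in segmented.split(sep) if word.strip()])


def update_weight(length, fallback):
    """0.5 for questions longer than 5 words; otherwise defer to `fallback`.

    `fallback` stands in for the formula that is not reproduced in the text.
    """
    return 0.5 if length > 5 else fallback(length)


def income_weight(similarities, threshold=0.1):
    """Mean similarity of the questions whose similarity exceeds `threshold`."""
    kept = [s for s in similarities if s > threshold]
    # Assumption: with no sufficiently similar question the income is 0.0.
    return sum(kept) / len(kept) if kept else 0.0


def new_weight(old, update, income):
    """newWeight = oldWeight * (1 - updateWeight) + incomeWeight * updateWeight."""
    return old * (1 - update) + income * update


fourth = "meaningful@function@sqrt@positive integer@value range@constitute@set@element"
length = question_length(fourth)                 # 8 words
update = update_weight(length, fallback=lambda n: 0.0)
income = income_weight([0.9, 0.5, 0.05])         # illustrative similarities only
updated = new_weight(1.58, update, income)       # old weight 1.58 from the text
```

Because the formula is a convex combination, the updated weight always lies between the old weight and incomeWeight, and updateWeight controls how strongly the new evidence pulls the weight.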

In summary, the present invention provides a method and a system for question classification and pushing. Similarity analysis is performed between the first question and the questions in the knowledge point classifications obtained from the knowledge point classification model, the degree of association between the first question and each knowledge point classification is calculated from the similarity, and questions with a high similarity to the first question are then extracted from the knowledge point classifications with a larger degree of association and pushed to the user as approximate questions, which improves the relevance of the pushed approximate questions to the first question. Further, as can be seen from the foregoing description, using a distributed cluster facilitates approximate-question pushing tasks for large batches of questions and improves pushing efficiency. Further, periodically updating the knowledge point classification model according to the classification results improves the classification accuracy of the model and thereby the relevance of the pushed approximate questions. Further, adjusting the weight value of each node according to the actual application scenario makes it possible to push the approximate questions that best meet the user's expectations according to the user's different needs. Further, converting the symbols in a formula by means of preset escape characters normalizes symbols that are written differently but are equivalent, so that the information in a question is used accurately and fully, which improves the accuracy of question classification and thereby the relevance of the pushed questions and the efficiency of obtaining approximate questions. Further, considering both the word frequency and the semantics of a question improves the accuracy of classification and thereby the relevance of the pushed approximate questions to the first question. Further, word segmentation can be performed on a question while the information in the question is retained, which facilitates extracting the feature vector of the question. In addition, storing Chinese characters and non-Chinese characters in stacks keeps the character order unchanged, so that the original meaning of the question is not altered during word segmentation. Further, the present invention uses a stop-word calculation algorithm to calculate the weight of each word in a question and deletes the words with a smaller weight from the third question, so that different stop words can be obtained for different subjects, which improves the relevance of the approximate questions obtained. Further, selecting the questions with a higher similarity to the first question from the knowledge point classifications related to the first question by degree of association to form the approximate-question set improves the relevance of the pushed approximate-question set to the first question. The present invention also provides a system for question classification and pushing, by which the accuracy of question classification is improved, thereby further improving the relevance of the pushed approximate questions to the first question.

The foregoing is merely an embodiment of the present invention and does not thereby limit the patent scope of the present invention. Any equivalent transformation made using the contents of the specification and drawings of the present invention, or any direct or indirect application in a related technical field, is likewise included within the scope of patent protection of the present invention.

Claims

1. A method for question classification and pushing, characterized by comprising:

S1: classifying a first question according to a preset knowledge point classification model, to obtain a first classification set and a first degree-of-association set, wherein the elements in the first degree-of-association set are the degrees of association between the first question and each classification in the first classification set;

S2: calculating the similarity between the first question and the questions included in each classification in the first classification set, to obtain a similarity set corresponding to each classification in the first classification set;

S5: pushing the approximate-question set.

2. The method for question classification and pushing according to claim 1, characterized in that S1 is specifically:

sending the first question to each node in the preset classification cluster, to obtain the first classification set and the first degree-of-association set.

3. The method for question classification and pushing according to claim 2, characterized by further comprising:

4. The method for question classification and pushing according to claim 2, characterized in that sending the first question to each node in the preset classification cluster, to obtain the first classification set and the first degree-of-association set, is specifically:

sending the first question to each node in the preset classification cluster, to obtain a classification set and a degree-of-association set corresponding to the node;

obtaining the first classification set and the first degree-of-association set according to the weight value of the node and the classification set and degree-of-association set corresponding to the node.

5. The method for question classification and pushing according to claim 1, characterized in that S1 is specifically:

extracting the features of the second question to obtain a feature vector, wherein the feature vector comprises a word frequency vector and a semantic vector;

obtaining, according to the preset knowledge point classification model, the first classification set and the first degree-of-association set corresponding to the feature vector.

6. The method for question classification and pushing according to claim 5, characterized in that obtaining, according to the preset knowledge point classification model, the first classification set and the first degree-of-association set corresponding to the feature vector is specifically:

7. The method for question classification and pushing according to claim 5, characterized in that extracting the features of the second question to obtain a feature vector, wherein the feature vector comprises a word frequency vector and a semantic vector, is specifically:

performing word segmentation on the characters in the Chinese character stack using a word segmentation algorithm, and matching the formulas stored in the non-Chinese character stack using a preset regular expression, to obtain a third question;

building a word frequency vector according to the fourth question, wherein the number of elements in the word frequency vector is the number of distinct words in the fourth question, and the value of each element in the word frequency vector is the number of times the word corresponding to that element occurs in the fourth question;

establishing a semantic feature extraction model according to a preset dimension;

8. The method for question classification and pushing according to claim 7, characterized in that deleting the stop words from the third question to obtain the fourth question is specifically:

calculating the weight of each word in the third question;

deleting from the third question the words corresponding to the first preset number of elements of the first queue, to obtain the fourth question.

9. The method for question classification and pushing according to claim 1, characterized in that obtaining the approximate-question set according to the second degree-of-association set is specifically:

obtaining the questions in the second classification set whose similarity to the first question is greater than a preset similarity threshold, to obtain the approximate-question set.

10. A system for question classification and pushing, characterized by comprising:

a classification module, configured to classify a first question according to a preset knowledge point classification model, to obtain a first classification set and a first degree-of-association set, wherein the elements in the first degree-of-association set are the degrees of association between the first question and each classification in the first classification set;

a calculation module, configured to calculate the similarity between the first question and the questions included in each classification in the first classification set, to obtain a similarity set corresponding to each classification in the first classification set;

a first processing module, configured to obtain a second degree-of-association set according to the similarity set and the first degree-of-association set;

a pushing module, configured to push the approximate-question set.