CN112035605A

CN112035605A - Topic recommendation method, device, equipment and storage medium

Info

Publication number: CN112035605A
Application number: CN202010774748.2A
Authority: CN
Inventors: 陈静
Original assignee: Guangzhou Shiyuan Electronics Thecnology Co Ltd
Current assignee: Guangzhou Shiyuan Electronics Thecnology Co Ltd
Priority date: 2020-08-04
Filing date: 2020-08-04
Publication date: 2020-12-04

Abstract

The application provides a question (topic) recommendation method, device, equipment and storage medium. The method comprises: acquiring an initial difficulty label of a question and interaction data for the question, the interaction data representing how users answered the question; establishing a prior distribution of the question difficulty according to the initial difficulty label, and correcting the prior distribution based on the interaction data to obtain a posterior distribution of the question difficulty; determining a corrected difficulty label of the question according to the posterior distribution; and recommending questions according to their corrected difficulty labels. The method improves the accuracy and stability of the resulting question difficulty and thereby enables accurate question recommendation.

Description

Topic recommendation method, device, equipment and storage medium

Technical Field

The present application relates to the field of computer software technologies, and in particular, to a topic recommendation method, device, apparatus, and storage medium.

Background

In teaching, exercises are an important learning resource that helps students consolidate, review and check what they have learned. Difficulty is one of the characteristics used to describe a question and is closely tied to question recommendation scenarios. In one example, a teacher needs to order the questions used for preview, study and review according to their difficulty. In another example, an adaptive learning system that aims at personalized, accurate recommendation needs to match question difficulty to each student's abilities and weaknesses, so as to help the student improve.

Many factors influence question difficulty, such as the methods and knowledge points involved and the novelty of the question type. The number of questions in a single subject is often on the order of hundreds of thousands or millions, so labeling question difficulty is time-consuming and labor-intensive, and the quality of the labels directly affects subsequent difficulty-based recommendation.

Disclosure of Invention

In view of the above, the present application provides a topic recommendation method, apparatus, device and storage medium.

According to a first aspect of embodiments of the present application, there is provided a topic recommendation method, including:

acquiring an initial difficulty label of a question and interaction data for the question, the interaction data representing how users answered the question;

establishing a prior distribution of the question difficulty according to the initial difficulty label of the question, and correcting the prior distribution based on the interaction data to obtain a posterior distribution of the question difficulty;

determining the corrected difficulty label of the question according to the posterior distribution;

and recommending questions according to the corrected difficulty labels of the questions.

Optionally, the acquiring an initial difficulty label of the question includes:

acquiring question information of a plurality of questions, wherein the question information comprises the knowledge points involved in the questions;

determining the similarity between every two questions belonging to the same knowledge point according to the question information;

if the question already has an initial difficulty label, determining one or more similar questions of the question according to the similarity between every two questions belonging to the same knowledge point;

correcting the initial difficulty label of the question based on the initial difficulty labels of the one or more similar questions.

Optionally, the acquiring an initial difficulty label of the question includes:

if the questions have no initial difficulty labels, clustering the questions into one or more question sets according to the similarity between every two questions belonging to the same knowledge point;

determining a representative question from each question set, and acquiring the initial difficulty label of the representative question as annotated by a user;

and taking the initial difficulty label of the representative question as the initial difficulty label of the other questions in the question set.

Optionally, the question information further includes question stem information and question analysis information, and the question stem information and the question analysis information are displayed in a text mode and/or an image mode;

the determining the similarity between every two topics belonging to the same knowledge point according to the topic information comprises the following steps:

acquiring a text vector of the question according to the question stem information and/or the question analysis information displayed in a text mode, and/or acquiring an image feature of the question according to the question stem information and/or the question analysis information displayed in an image mode;

determining text similarity according to the distance between text vectors of topics belonging to the same knowledge point, and/or determining image similarity according to the distance between image features of topics belonging to the same knowledge point;

and determining the similarity between every two topics belonging to the same knowledge point according to the text similarity and/or the image similarity.
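As a minimal sketch of this optional step, the similarity of two questions under the same knowledge point could be computed from their text vectors and image features as follows. The application only speaks of a "distance" between vectors; the use of cosine similarity, the function names and the `text_weight` parameter are assumptions made for illustration.

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def question_similarity(text_a, text_b, image_a=None, image_b=None, text_weight=0.5):
    """Combine text and image similarity (hypothetical weighting).

    Falls back to text similarity alone when either question has no image features.
    """
    text_sim = cosine_similarity(text_a, text_b)
    if image_a is None or image_b is None:
        return text_sim
    image_sim = cosine_similarity(image_a, image_b)
    return text_weight * text_sim + (1.0 - text_weight) * image_sim
```

How the text vectors and image features are produced is left open here, as it is in the text above.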

Optionally, the correcting the initial difficulty label of the question based on the initial difficulty labels of the one or more similar questions comprises:

if the initial difficulty label of the question is a discrete value, taking the most frequent value among the initial difficulty labels of the question and of the similar questions as the corrected initial difficulty label of the question;

and if the initial difficulty label of the question is a continuous value, taking the weighted average of the initial difficulty labels of the similar questions and the initial difficulty label of the question as the corrected initial difficulty label of the question.

Optionally, within a question set, the representative question is the question that has the largest number of other questions whose similarity to it exceeds a preset threshold.

Optionally, after the acquiring the initial difficulty label of the question, the method further includes:

adjusting the initial difficulty label of the question according to the attributes of the question to obtain an adjusted initial difficulty label;

the establishing a prior distribution of the question difficulty according to the initial difficulty label of the question comprises: establishing the prior distribution of the question difficulty according to the adjusted initial difficulty label of the question.

Optionally, the question attributes include at least one of: the question type and the number of knowledge points involved in the question.

Optionally, the interaction data comprises at least the time the user took to answer the question;

after the interaction data for the question is acquired, the method further comprises:

for a plurality of pieces of interaction data obtained when the same user answers the same question several times, filtering out all interaction data other than that obtained from the first answer; and/or

for the interaction data corresponding to the same question, filtering out the interaction data whose answering time is not within a preset range.

Optionally, the interaction data comprises at least whether the user answered the question correctly or incorrectly;

the prior distribution and the posterior distribution are both beta distributions;

the correcting the prior distribution based on the interaction data includes:

counting the number of correct answers and the number of wrong answers in the interaction data according to whether each user answered the question correctly;

and correcting the prior distribution according to the number of correct answers and the number of wrong answers in the interaction data.

Optionally, the corrected difficulty label of the question is the expected value of the posterior distribution.

According to a second aspect of embodiments of the present application, there is provided a topic recommendation apparatus, including:

a question data acquisition module, configured to acquire an initial difficulty label of a question and interaction data for the question, the interaction data representing how users answered the question;

a question difficulty correction module, configured to establish a prior distribution of the question difficulty according to the initial difficulty label of the question, and to correct the prior distribution based on the interaction data to obtain a posterior distribution of the question difficulty;

a question difficulty determining module, configured to determine the corrected difficulty label of the question according to the posterior distribution of the question difficulty;

and a question recommending module, configured to recommend questions according to the corrected difficulty labels of the questions.

According to a third aspect of embodiments of the present application, there is provided an electronic apparatus, including:

a processor;

a memory for storing executable instructions;

wherein the processor, when executing the executable instructions, is configured to implement the method of any of the first aspects.

According to a fourth aspect of embodiments of the present application, there is provided a computer-readable storage medium having stored thereon computer instructions which, when executed by a processor, carry out the steps of the method of any one of the first aspect.

The embodiment of the application has the following beneficial effects:

Bayesian inference is used to iteratively update the question difficulty: a prior distribution of the question difficulty is first established from the initial difficulty label of the question, and the prior distribution is then corrected with the interaction data to obtain a posterior distribution of the question difficulty. When there is little interaction data, the interaction data adjusts the question difficulty only slightly and the initial difficulty correspondingly has a larger influence; when there is more interaction data, it adjusts the question difficulty more strongly and the influence of the initial difficulty decreases. The influence of the initial difficulty label and that of the interaction data on the question difficulty are thereby balanced, which helps to eliminate accidental errors in the interaction data and ensures the accuracy and stability of the resulting question difficulty, so that accurate recommendation can be made based on the corrected difficulty labels.

Drawings

FIG. 1 is a flowchart illustrating an embodiment of a topic recommendation method according to an exemplary embodiment of the present application;

FIG. 2 is a flowchart illustrating a second topic recommendation method according to an exemplary embodiment of the present application;

FIG. 3 is a flowchart illustrating an embodiment of a third topic recommendation method according to an exemplary embodiment of the present application;

FIG. 4 is a block diagram of an embodiment of a topic recommendation device according to an exemplary embodiment of the present application;

FIG. 5 is a hardware block diagram of an electronic device according to an exemplary embodiment of the present application.

Detailed Description

Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present application. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present application, as detailed in the appended claims.

The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in this application and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items.

It is to be understood that although the terms first, second, third, etc. may be used herein to describe various information, such information should not be limited to these terms. These terms are only used to distinguish one type of information from another. For example, first information may also be referred to as second information, and similarly, second information may also be referred to as first information, without departing from the scope of the present application. The word "if" as used herein may be interpreted as "at … …" or "when … …" or "in response to a determination", depending on the context.

In the related art, question difficulty is usually labeled from question interaction data, that is, the difficulty is determined directly from the answer accuracy. However, this approach depends on how many times a question has been practiced: if a question has few interaction records, the error of the result is large. In a large question bank it is also difficult to ensure that every question is exposed to users, and a question that has never been exposed has no difficulty label, which makes accurate recommendation according to question difficulty hard to achieve.

Based on this, referring to FIG. 1, an embodiment of the present application provides a topic recommendation method. The method may be applied to an electronic device, which includes but is not limited to a computer, an intelligent interactive tablet, a mobile phone, a server or a cloud server. The method includes:

In step S101, an initial difficulty label of a question and interaction data for the question are acquired; the interaction data represents how users answered the question.

In step S102, a prior distribution of the question difficulty is established according to the initial difficulty label of the question, and the prior distribution is corrected based on the interaction data to obtain a posterior distribution of the question difficulty.

In step S103, the corrected difficulty label of the question is determined according to the posterior distribution.

In step S104, questions are recommended according to the corrected difficulty labels of the questions.
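The text does not prescribe a particular matching rule for step S104. As one possible illustration, the sketch below returns the questions whose corrected difficulty is closest to a target difficulty chosen for a student; the function name `recommend`, the `target` value and the tie-break on question id are hypothetical.

```python
def recommend(difficulty_by_question, target, k=3):
    """Return up to k question ids whose corrected difficulty is closest to target.

    difficulty_by_question maps a question id to its corrected difficulty label
    in [0, 1]; ties are broken by question id so the result is deterministic.
    """
    ranked = sorted(
        difficulty_by_question.items(),
        key=lambda item: (abs(item[1] - target), item[0]),
    )
    return [question_id for question_id, _ in ranked[:k]]
```

For example, `recommend({"q1": 0.2, "q2": 0.5, "q3": 0.55, "q4": 0.9}, target=0.5, k=2)` selects the two questions nearest to a difficulty of 0.5.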

In an embodiment, the electronic device may acquire the initial difficulty label of the question and then establish a prior distribution of the question difficulty according to that label. The prior distribution is a probability distribution over the question difficulty obtained from other experience or knowledge before any statistical experiment is carried out on the interaction data; it reflects empirical knowledge of the question difficulty. The prior distribution need not have an objective basis and may be partly or wholly based on subjective belief.

The electronic device can acquire the initial difficulty label of the question in the following two implementations:

In a first implementation, the question already has an initial difficulty label, and the errors of manual labeling have to be considered: for the same question, different annotators do not apply exactly the same standards and difficulty scales, so the labeled difficulty of the same question or of questions of the same type can easily differ. If the manual labels are used directly, these errors may propagate and affect subsequent operations.

Based on this, when the question already has an initial difficulty label, the electronic device corrects that label. The electronic device first acquires question information of a plurality of questions, the question information including the knowledge points involved in the questions; it then determines, from the question information, the similarity between every two questions belonging to the same knowledge point; next, it determines one or more similar questions of the question according to those similarities; finally, it corrects the initial difficulty label of the question based on the initial difficulty labels of the one or more similar questions.

This embodiment corrects questions that already have initial difficulty labels and corrects highly similar questions in a uniform way, so that highly similar questions have the same or similar initial difficulty labels. This improves the uniformity and standardization of the difficulty scale and the accuracy of subsequent operations.
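The correction of an existing label from its similar questions can be sketched as follows, for discrete labels (most frequent value) and for continuous labels (weighted average). The choice of similarity scores as weights, the `own_weight` parameter and the tie-breaking rule are assumptions for illustration; the text only requires a most-frequent value and a weighted average.

```python
from collections import Counter

def correct_discrete_label(own_label, similar_labels):
    """Most frequent label among the question itself and its similar questions."""
    counts = Counter([own_label, *similar_labels])
    best = max(counts.values())
    # Tie-break (an assumption): keep the question's own label when it is
    # among the most frequent values.
    if counts[own_label] == best:
        return own_label
    return next(label for label, count in counts.items() if count == best)

def correct_continuous_label(own_label, similar_labels, similarities, own_weight=1.0):
    """Weighted average of the question's label and its similar questions' labels.

    Each similar question is weighted by its similarity score (hypothetical choice).
    """
    total_weight = own_weight + sum(similarities)
    weighted_sum = own_weight * own_label + sum(
        s * label for s, label in zip(similarities, similar_labels)
    )
    return weighted_sum / total_weight
```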

In a second implementation, the question has no initial difficulty label. Although question content is highly varied, many questions belonging to the same knowledge point differ only superficially: some differ only in their numerical values and some only in their textual background, and because they are based on the same knowledge point, they are solved with the same method.

Based on this, the electronic device can group highly similar questions together by clustering and select one of them as a representative to be labeled, which reduces the amount of data to be labeled and saves labor. When the questions have no initial difficulty labels, the electronic device first acquires question information of a plurality of questions, the question information including the knowledge points involved in the questions; it then determines, from the question information, the similarity between every two questions belonging to the same knowledge point; it clusters the plurality of questions into one or more question sets according to those similarities; it determines a representative question from each question set and acquires the initial difficulty label that a user has annotated for the representative question; finally, it also takes the initial difficulty label of the representative question as the initial difficulty label of the other questions in the question set.

In this embodiment, because manual labeling of unlabeled questions is time-consuming and labor-intensive, the similarity between questions is exploited: highly similar questions are grouped by clustering, one of them is selected as a representative, and the user only needs to label the initial difficulty of the representative question. The other questions in the same set are then initialized uniformly with the representative's label, which reduces the number of questions to be labeled, lowers the labeling cost and improves labeling efficiency.
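The second implementation can be sketched as below. The text does not name a clustering algorithm, so grouping by connected components of a thresholded similarity graph is an assumption, as are the function names and the `ask_label` callback that stands in for the user's annotation. The representative is chosen as the question with the most other questions above the similarity threshold, following the optional feature described earlier.

```python
def cluster_questions(sim, threshold):
    """Group question indices into connected components where sim[i][j] >= threshold."""
    unvisited = set(range(len(sim)))
    clusters = []
    while unvisited:
        start = min(unvisited)
        unvisited.remove(start)
        stack, cluster = [start], [start]
        while stack:
            i = stack.pop()
            for j in sorted(unvisited):  # sorted() copies, so removal below is safe
                if sim[i][j] >= threshold:
                    unvisited.remove(j)
                    stack.append(j)
                    cluster.append(j)
        clusters.append(sorted(cluster))
    return clusters

def pick_representative(cluster, sim, threshold):
    """Question with the most other cluster members above the similarity threshold."""
    def neighbour_count(i):
        return sum(1 for j in cluster if j != i and sim[i][j] >= threshold)
    return max(cluster, key=neighbour_count)

def propagate_labels(clusters, sim, threshold, ask_label):
    """Label only each representative, then copy its label to the whole cluster."""
    labels = {}
    for cluster in clusters:
        label = ask_label(pick_representative(cluster, sim, threshold))
        for question in cluster:
            labels[question] = label
    return labels
```

With this design the number of manual annotations equals the number of question sets rather than the number of questions.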

In one embodiment, it is considered that question difficulty varies with question attributes such as the question type or the number of knowledge points involved. In one example, the same question has a 50% probability of being guessed correctly as a true/false (judgment) question and a 25% probability as a multiple-choice question, is hard to guess as a fill-in-the-blank question, and, as a free-response (answer) question, requires both the answer and the solution steps to be correct; the difficulty of these four question types therefore increases in that order. In another example, a question that combines several knowledge points is significantly more difficult than a question that involves only one knowledge point.

Based on this, after acquiring the initial difficulty label of the question, the electronic device adjusts it according to the question attributes to obtain an adjusted initial difficulty label. The question attributes include at least one of: the question type and the number of knowledge points involved in the question. In this embodiment, the question difficulty is fine-tuned based on the question attributes, which further improves the accuracy of the initial difficulty label.

In one implementation, a difficulty adjustment value can be set for each attribute value of a question attribute according to the actual application scenario, and the difficulty of a question is adjusted by the adjustment value corresponding to its attribute value. In one example, the adjusted initial difficulty label may be the sum of the initial difficulty label and the corresponding difficulty adjustment value.

In one example, the question attribute is the question type, which may be a true/false (judgment) question, a multiple-choice question, a fill-in-the-blank question or a free-response (answer) question. The difficulty adjustment value is set to -0.2 for true/false questions, -0.1 for multiple-choice questions, 0 for fill-in-the-blank questions and +0.1 for free-response questions, and the initial difficulty label of the question is 0.4. If the question is a true/false question, its adjusted initial difficulty label is 0.2 (0.4 - 0.2); if it is a multiple-choice question, 0.3 (0.4 - 0.1); if it is a fill-in-the-blank question, 0.4, that is, no adjustment is made; if it is a free-response question, 0.5 (0.4 + 0.1).

In another example, the question attribute is the number of knowledge points involved in the question. The difficulty adjustment value is set to -0.2 for 1 knowledge point, -0.1 for 2 knowledge points, 0 for 3 knowledge points and +0.1 for 4 knowledge points, and the initial difficulty label of the question is 0.4. If 1 knowledge point is involved, the adjusted initial difficulty label is 0.2 (0.4 - 0.2); if 2 knowledge points are involved, 0.3 (0.4 - 0.1); if 3 knowledge points are involved, 0.4, that is, no adjustment is made; if 4 knowledge points are involved, 0.5 (0.4 + 0.1).

Of course, the above are only examples and do not limit the embodiments of the present application. The question attributes carry a relatively small weight in the initial difficulty label, that is, the adjustment applied to the initial difficulty label based on the question attributes is relatively small. This prevents questions under different knowledge points from being influenced too strongly by the question type or the number of knowledge points, and ensures the accuracy of the final initial difficulty label.
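The question-type example above amounts to a small lookup table followed by an addition. In the sketch below the adjustment values are the ones from the example; the dictionary keys are hypothetical identifiers for the four question types, and clamping the result to [0, 1] is an added assumption not stated in the text.

```python
# Adjustment values taken from the question-type example; keys are hypothetical names.
TYPE_ADJUSTMENT = {
    "true_false": -0.2,       # judgment question
    "multiple_choice": -0.1,  # selection question
    "fill_in_blank": 0.0,
    "free_response": 0.1,     # answer question
}

def adjust_initial_label(label, question_type, adjustments=TYPE_ADJUSTMENT):
    """Add the type-specific adjustment to the initial difficulty label."""
    adjusted = label + adjustments[question_type]
    # Assumption: keep the adjusted label inside [0, 1].
    return min(1.0, max(0.0, adjusted))
```

A table keyed by the number of knowledge points could be passed as `adjustments` in the same way.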

In an embodiment, after acquiring the initial difficulty label of the question, the electronic device also acquires the interaction data for the question, which represents how users answered the question. The electronic device then establishes a prior distribution of the question difficulty according to the initial difficulty label, corrects the prior distribution based on the interaction data to obtain a posterior distribution of the question difficulty, determines the corrected difficulty label of the question according to the posterior distribution, and finally recommends questions according to the corrected difficulty labels.

In this embodiment, Bayesian inference is used to iteratively update the question difficulty: a prior distribution of the question difficulty is first established from the initial difficulty label of the question, and the prior distribution is then corrected with the interaction data to obtain a posterior distribution of the question difficulty. When there is little interaction data, the interaction data adjusts the question difficulty only slightly and the initial difficulty correspondingly has a larger influence; when there is more interaction data, it adjusts the question difficulty more strongly and the influence of the initial difficulty decreases. The influence of the initial difficulty label and that of the interaction data on the question difficulty are thereby balanced, which helps to eliminate accidental errors in the interaction data, ensures the accuracy and stability of the resulting question difficulty, and improves the effect of subsequent recommendation based on the corrected difficulty labels.

In an embodiment, the interaction data includes at least the answering time for each question and whether the user answered the question correctly. Repeated answering raises the accuracy on a question, which weakens the link between accuracy and question difficulty. Therefore, for a plurality of pieces of interaction data obtained when the same user answers the same question several times, the electronic device filters out all interaction data other than that obtained from the first answer, which improves the accuracy of subsequent steps. In addition, the answering times of students at different levels differ widely, so the interaction data of all students for a question may not reflect its difficulty accurately; for the interaction data corresponding to the same question, the electronic device therefore filters out the interaction data whose answering time is not within a preset range. The preset range can be set according to the actual application scenario and is not limited in the embodiments of the present application. For example, if the specified answering time is 20 min, interaction data with an answering time between 5 min and 15 min may be kept, and interaction data with an answering time outside that range filtered out. By filtering the interaction data in these two ways, this embodiment improves the accuracy of subsequent operations.
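The two filters can be sketched as follows. The `Interaction` record and its field names are hypothetical, and the `answered_at` timestamp is an assumed field needed to identify the first attempt. The 5 min to 15 min range is the one from the example. The sketch applies the first-attempt filter before the time filter, so a user whose first attempt falls outside the range contributes no record; this ordering is a design choice, not something the text specifies.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Interaction:
    user_id: str
    question_id: str
    answered_at: float   # assumed timestamp, used only to order attempts
    duration_min: float  # answering time in minutes
    correct: bool

def filter_interactions(records, min_minutes=5.0, max_minutes=15.0):
    """Keep each user's first attempt per question, then drop out-of-range times."""
    first_attempts = {}
    for record in sorted(records, key=lambda r: r.answered_at):
        # setdefault keeps the earliest record for each (user, question) pair.
        first_attempts.setdefault((record.user_id, record.question_id), record)
    return [
        record
        for record in first_attempts.values()
        if min_minutes <= record.duration_min <= max_minutes
    ]
```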

In an embodiment, after obtaining the initial difficulty label of the topic and the interactive data for the topic, the electronic device establishes a prior distribution related to the topic difficulty based on the initial difficulty label of the topic, and corrects the prior distribution based on the interactive data to obtain a posterior distribution related to the topic difficulty, that is, the prior distribution + the interactive data → the posterior distribution.

The interaction data of the question includes whether each user answered the question correctly. Assuming that whether a user answers correctly depends only on the question difficulty, the number of wrong answers in the interaction data follows a binomial distribution governed by the question difficulty. Let n be the number of interaction records and θ the question difficulty, taken as the probability that a user answers the question incorrectly; then X ~ b(n, θ), meaning that the number of wrong answers X follows a binomial distribution with parameters n and θ. The beta distribution is the conjugate prior of the binomial distribution: if a beta distribution is used as the prior distribution and the wrong answers follow the binomial distribution above, the posterior distribution of the question difficulty obtained by Bayesian inference is also a beta distribution. For this reason, the embodiments of the present application use a beta distribution as the prior distribution of the question difficulty. If the initial difficulty label of the question does not lie in [0, 1], the electronic device can map it to [0, 1]. The electronic device then obtains the two parameters of the beta distribution, denoted α and β, from the mapped initial difficulty label, denoted θ, and thereby determines the beta distribution. [The formula for α and β in terms of θ is not available in this text.]

The prior distribution of the question difficulty can then be represented by beta(α, β).

In an embodiment, the interactive data includes at least whether each user answered the topic correctly or incorrectly. The electronic device can count the number of correct answers and the number of wrong answers in the interactive data, and then correct the prior distribution according to these two counts to obtain the posterior distribution of the topic difficulty.

That is, Beta(α, β) denotes the prior distribution of the topic difficulty. If the total number of user answers in the interactive data is n and the number of wrong answers is n_w, then

$$\mathrm{Beta}(\alpha, \beta) + (n_w,\ n - n_w) \rightarrow \mathrm{Beta}(\alpha + n_w,\ \beta + n - n_w)$$

That is, the posterior distribution of the topic difficulty can be expressed as Beta(α + n_w, β + n − n_w), where n − n_w is the number of correct answers in the interactive data.

In one implementation, the corrected difficulty label of the topic can be the expected value of the posterior distribution. In one example, denoting the corrected difficulty label of the topic as θ_new, the expected value of the posterior Beta(α + n_w, β + n − n_w) gives

$$\theta_{new} = \frac{\alpha + n_w}{\alpha + \beta + n}$$

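Further, the variance of the corrected difficulty label can be obtained from the posterior distribution. Denoting the variance as S, the variance of the posterior Beta(α + n_w, β + n − n_w) gives

$$S = \frac{(\alpha + n_w)(\beta + n - n_w)}{(\alpha + \beta + n)^2\,(\alpha + \beta + n + 1)}$$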

The variance of the corrected difficulty label can be used to represent the stability of the corrected difficulty label. In one implementation, the initial difficulty label of the topic may be corrected multiple times using interactive data acquired at different times, and the more stable corrected difficulty label may be selected according to the variances of the corrected labels. In another implementation, considering that the variance of the corrected difficulty label decreases as the amount of interactive data grows, as much interactive data as possible can be acquired before correcting the prior distribution constructed from the initial difficulty label of the topic.
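The prior-to-posterior correction described above can be sketched in a few lines of Python. This is a minimal illustration, not the applicant's implementation. Because the formula relating θ to α and β is not available in this text, the helper `beta_prior_from_label` assumes that the prior mean α/(α+β) equals θ and uses a hypothetical pseudo-count parameter `strength` (the value of α+β); both the function names and `strength` are assumptions introduced only for this example.

```python
def beta_prior_from_label(theta, strength=10.0):
    """Return (alpha, beta) of a Beta prior whose mean equals theta.

    ASSUMPTION: the source formula is unavailable; `strength` (alpha + beta)
    is a hypothetical pseudo-count controlling how much the prior is trusted.
    """
    if not 0.0 < theta < 1.0:
        raise ValueError("theta must lie strictly between 0 and 1")
    return theta * strength, (1.0 - theta) * strength


def update_difficulty(alpha, beta, n, n_wrong):
    """Conjugate update Beta(alpha, beta) -> Beta(alpha + n_w, beta + n - n_w).

    Returns the posterior parameters, the posterior mean (the corrected
    difficulty label) and the posterior variance (its stability).
    """
    if not 0 <= n_wrong <= n:
        raise ValueError("n_wrong must be between 0 and n")
    a = alpha + n_wrong
    b = beta + (n - n_wrong)
    mean = a / (a + b)
    variance = a * b / ((a + b) ** 2 * (a + b + 1.0))
    return a, b, mean, variance


# Initial label 0.3; 40 answers were collected, 20 of them wrong.
alpha0, beta0 = beta_prior_from_label(0.3)
_, _, theta_new, s = update_difficulty(alpha0, beta0, n=40, n_wrong=20)
```

With more interaction records at the same error rate, the returned variance shrinks, which matches the observation above that more interactive data yields a more stable corrected label.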

Correspondingly, referring to fig. 2, an embodiment of the present application further provides a second topic recommendation method, where the method includes:

In step S201, topic information of a plurality of topics is obtained, where the topic information includes the knowledge points to which the topics relate.

In step S202, the similarity between every two topics belonging to the same knowledge point is determined according to the topic information.

In step S203, if the topic has a corresponding initial difficulty label, one or more similar topics of the topic are determined according to the similarity between every two topics belonging to the same knowledge point.

In step S204, the initial difficulty label of the topic is modified based on the initial difficulty labels of the one or more similar topics.

In step S205, interactive data for the topic is acquired; and the interactive data represents the answering condition of the user to the question. Similar to step S101, the description is omitted here.

In step S206, a priori distribution about the topic difficulty is established according to the initial difficulty label of the topic, and the priori distribution is corrected based on the interactive data, so as to obtain a posterior distribution about the topic difficulty. Similar to step S102, the description is omitted here.

In step S207, the corrected difficulty label of the topic is determined according to the posterior distribution. Similar to step S103, the description is omitted here.

In step S208, topics are recommended according to the corrected difficulty labels of the topics. Similar to step S104, the description is omitted here.

When the topics have corresponding difficulty labels, manual labeling errors must be considered: for the same topic, different annotators do not apply exactly the same measurement standard and difficulty scale, so the labeled difficulty of the same topic, or of topics of the same type, can differ noticeably. If the manually labeled results are used directly, these errors propagate and degrade subsequent operations.

Based on this, when the topic has a corresponding initial difficulty label, the electronic device corrects that label. The electronic device first acquires topic information of a plurality of topics, the topic information including the knowledge points to which the topics relate; it then determines, according to the topic information, the similarity between every two topics belonging to the same knowledge point; next, it determines one or more similar topics of the topic according to that similarity; and finally it corrects the initial difficulty label of the topic based on the initial difficulty labels of the one or more similar topics. This embodiment corrects topics that already have initial difficulty labels and corrects highly similar topics uniformly, so that highly similar topics have the same or similar initial difficulty labels. This improves the uniformity and standardization of the difficulty measure and the accuracy of subsequent operations.

In an embodiment, the topic information further includes question stem information and question parsing information, each of which is displayed as text and/or as an image, where "and/or" means either one or both. To obtain the similarity between every two topics belonging to the same knowledge point, the electronic device may first obtain a text vector of the topic from the question stem information and/or question parsing information displayed as text, and/or obtain an image feature of the topic from the question stem information and/or question parsing information displayed as an image.

The embodiments of the present application do not limit the manner of obtaining the text vector or the image feature. In one example, the electronic device may convert the question stem information and question parsing information displayed as text into a text vector based on a Word2vec model, a GloVe model, an ELMo model, a BERT model, a TF-IDF method, or the like. In one example, the image feature may be a pixel-based feature, such as the pixel values of the image. In one example, the electronic device may extract image features from the question stem information and question parsing information displayed as an image using a feature extraction algorithm such as the SIFT (scale-invariant feature transform) algorithm, the SURF algorithm, or the histogram of oriented gradients.

After the text vectors and/or image features are obtained, the electronic device determines the text similarity according to the distance between the text vectors of topics belonging to the same knowledge point, and/or determines the image similarity according to the distance between the image features of topics belonging to the same knowledge point. The embodiments of the present application do not limit how these distances are calculated; the calculation can be chosen according to the actual application scenario. In one example, the electronic device can calculate a cosine distance, a Euclidean distance, a Manhattan distance, a Hamming distance, a Chebyshev distance, or the like between the text vectors and/or image features of topics belonging to the same knowledge point.

Finally, the electronic device determines the similarity between every two topics belonging to the same knowledge point according to the text similarity and/or the image similarity. When both the text similarity and the image similarity are used, a weighting coefficient may be set for each, and the similarity between every two topics belonging to the same knowledge point is obtained by weighted summation, that is, similarity = text similarity × its weighting coefficient + image similarity × its weighting coefficient. Each weighting coefficient takes a value between 0 and 1, and the specific values can be set according to the actual application scenario. In this embodiment, similarity is calculated only between topics belonging to the same knowledge point, which ensures the accuracy and comparability of the determined similarities.
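As an illustration of the weighted summation, the following sketch computes a cosine similarity between two vectors and combines a text similarity with an image similarity. It is only one possible choice among the distances listed above. The function names and the default weight are hypothetical, and the sketch additionally assumes that the two weighting coefficients sum to 1, which the text does not require.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen here for all-zero vectors
    return dot / (norm_u * norm_v)


def combined_similarity(text_sim, image_sim, text_weight=0.7):
    """Weighted sum of text and image similarity.

    ASSUMPTION: the image weight is taken as 1 - text_weight.
    """
    if not 0.0 <= text_weight <= 1.0:
        raise ValueError("text_weight must be between 0 and 1")
    return text_weight * text_sim + (1.0 - text_weight) * image_sim


text_sim = cosine_similarity([1.0, 2.0, 0.0], [2.0, 4.0, 0.0])
overall = combined_similarity(text_sim, image_sim=0.4, text_weight=0.5)
```

When a topic has only text or only an image, the corresponding single similarity can be used directly instead of the weighted sum.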

In an embodiment, a first threshold on the similarity may be set according to the actual application scenario. The electronic device then determines, for each other topic belonging to the same knowledge point, whether its similarity to the topic is higher than the first threshold; if so, that other topic is a similar topic of the topic. That is, the similar topics of a topic are the topics whose similarity to it is higher than the first threshold.
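The threshold test can be sketched as a simple filter. The data layout (a dictionary from the identifier of each other topic under the same knowledge point to its similarity with the current topic) and the function name are assumptions made for this example.

```python
def find_similar_topics(similarity_to_others, first_threshold):
    """Return ids of topics whose similarity is strictly above the threshold.

    similarity_to_others: dict mapping another topic's id (same knowledge
    point) to its similarity with the current topic (hypothetical layout).
    """
    return [
        other_id
        for other_id, sim in similarity_to_others.items()
        if sim > first_threshold
    ]


similar = find_similar_topics({"q2": 0.91, "q3": 0.42, "q4": 0.80}, 0.8)
```

The comparison is strict ("higher than"), so a topic whose similarity equals the threshold exactly is not treated as similar.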

In an embodiment, after determining one or more similar topics for the topic, the electronic device can modify the initial difficulty tag for the topic based on the initial difficulty tag for the one or more similar topics.

If the initial difficulty label of the topic is a discrete value, the electronic device takes the label that occurs most frequently among the initial difficulty labels of the similar topics and the initial difficulty label of the topic itself as the corrected initial difficulty label of the topic. In one example, the difficulty label has three grades (1, 2, 3), where 1 represents easy, 2 represents medium, and 3 represents difficult. The initial difficulty label of the topic is 1, and the topic has 3 similar topics: the initial difficulty label of similar topic 1 is 2, that of similar topic 2 is 3, and that of similar topic 3 is 2. Difficulty label 2 occurs most frequently, so the electronic device corrects the initial difficulty label of the topic to 2. In this embodiment, highly similar topics are corrected uniformly, so that they have the same or similar initial difficulty labels, which improves the uniformity and standardization of the difficulty measure.
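The discrete case amounts to taking the mode of the labels. A minimal sketch follows; the function name is hypothetical, and the handling of ties (the label seen first wins, so the topic's own label is preferred) is an assumption, since the text does not say how ties are resolved.

```python
from collections import Counter


def correct_discrete_label(own_label, similar_labels):
    """Most frequent label among the topic's own label and its neighbours'.

    ASSUMPTION: on a tie, Counter.most_common keeps first-seen order, so the
    topic's own label (listed first) wins.
    """
    counts = Counter([own_label, *similar_labels])
    return counts.most_common(1)[0][0]


# The example from the text: own label 1, similar topics labelled 2, 3, 2.
corrected = correct_discrete_label(1, [2, 3, 2])  # -> 2
```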

If the initial difficulty label of the topic is a continuous value, the electronic device takes the weighted average of the initial difficulty labels of the similar topics and the initial difficulty label of the topic as the corrected initial difficulty label of the topic. In one example, the weighted summation may use the similarity as the weighting coefficient. Suppose the topic has 2 similar topics, similar topic 1 and similar topic 2; the similarity of the topic to similar topic 1 is a1 and the initial difficulty label of similar topic 1 is B1; the similarity of the topic to similar topic 2 is a2 and the initial difficulty label of similar topic 2 is B2; and the corrected initial difficulty label of the topic is C. Then C = (a1 × B1 + a2 × B2)/(a1 + a2). In one example, the difficulty label takes values in the interval [0,1], the initial difficulty label of the topic is 0.2, and the topic has 2 similar topics; the initial difficulty label of similar topic 1 is 0.3, that of similar topic 2 is 0.4, the similarity of the topic to similar topic 1 is 0.6, and its similarity to similar topic 2 is 0.8. The corrected initial difficulty label of the topic is then approximately 0.36, that is, (0.3 × 0.6 + 0.4 × 0.8)/(0.6 + 0.8).
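The similarity-weighted average in the formula C = (a1 × B1 + a2 × B2)/(a1 + a2) generalizes to any number of similar topics. The sketch below follows that formula as written, so the topic's own label does not enter the sum; the function name and the input layout are assumptions.

```python
def correct_continuous_label(similar):
    """Similarity-weighted average of the similar topics' labels.

    similar: list of (similarity, initial_label) pairs, one per similar topic
    (hypothetical layout).
    """
    total_weight = sum(sim for sim, _ in similar)
    if total_weight == 0:
        raise ValueError("at least one similar topic with non-zero similarity is required")
    return sum(sim * label for sim, label in similar) / total_weight


# The example from the text: similarities 0.6 and 0.8, labels 0.3 and 0.4.
corrected = correct_continuous_label([(0.6, 0.3), (0.8, 0.4)])
print(round(corrected, 2))  # prints 0.36
```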

Correspondingly, referring to fig. 3, an embodiment of the present application further provides a third topic recommendation method, where the method includes:

In step S301, topic information of a plurality of topics is obtained, where the topic information includes the knowledge points to which the topics relate.

In step S302, the similarity between every two topics belonging to the same knowledge point is determined according to the topic information.

In step S303, if the topics have no corresponding initial difficulty labels, the plurality of topics are clustered into one or more topic sets according to the similarity between every two topics belonging to the same knowledge point.

In step S304, a representative topic is determined from the topic set and an initial difficulty tag of the representative topic labeled by the user is obtained.

In step S305, the initial difficulty label of the representative topic is also determined as the initial difficulty labels of the topics other than the representative topic in the topic set.

In step S306, interactive data for the topic is acquired; and the interactive data represents the answering condition of the user to the question. Similar to step S101, the description is omitted here.

In step S307, a priori distribution about the topic difficulty is established according to the initial difficulty label of the topic, and the priori distribution is corrected based on the interactive data, so as to obtain a posterior distribution about the topic difficulty. Similar to step S102, the description is omitted here.

In step S308, the corrected difficulty label of the topic is determined according to the posterior distribution. Similar to step S103, the description is omitted here.

In step S309, topics are recommended according to the corrected difficulty labels of the topics. Similar to step S104, the description is omitted here.

If the topics have no corresponding initial difficulty labels, consider that although current topics come in many forms, many topics belonging to the same knowledge point differ in form but not in substance: some differ only in their numerical values and some only in their textual background, while the solution method is the same because they are based on the same knowledge point.

Based on this, the electronic device can group highly similar topics together through clustering and select one of them as a representative to be labeled, which reduces the amount of data to label and saves labor. When the topics have no corresponding initial difficulty labels, the electronic device first acquires topic information of a plurality of topics, the topic information including the knowledge points to which the topics relate; it then determines, according to the topic information, the similarity between every two topics belonging to the same knowledge point; it clusters the plurality of topics into one or more topic sets according to that similarity; it determines a representative topic from each topic set and acquires the initial difficulty label that a user assigns to the representative topic; and finally it also uses the initial difficulty label of the representative topic as the initial difficulty label of the other topics in the topic set. Manual labeling is time-consuming and laborious when topics have no initial difficulty value. In this embodiment, the user only needs to label the representative topic of each cluster, and the other topics in the cluster are initialized uniformly with that label, which reduces the amount of labeling required, lowers the labeling cost, and improves labeling efficiency.

In an embodiment, the topic information further includes question stem information and question parsing information, each of which is displayed as text and/or as an image, where "and/or" means either one or both. To obtain the similarity between every two topics belonging to the same knowledge point, the electronic device may first obtain a text vector of the topic from the question stem information and/or question parsing information displayed as text, and/or obtain an image feature of the topic from the question stem information and/or question parsing information displayed as an image. The embodiments of the present application do not limit the manner of obtaining the text vector or the image feature. In one example, the electronic device may convert the question stem information and question parsing information displayed as text into a text vector based on a Word2vec model, a GloVe model, an ELMo model, a BERT model, a TF-IDF method, or the like. In one example, the image feature may be a pixel-based feature, such as the pixel values of the image. In one example, the electronic device may extract image features from the question stem information and question parsing information displayed as an image using a feature extraction algorithm such as the SIFT (scale-invariant feature transform) algorithm, the SURF algorithm, or the histogram of oriented gradients.

In an embodiment, when the topics have no corresponding initial difficulty labels, the electronic device may cluster the plurality of topics into one or more topic sets according to the similarity between every two topics belonging to the same knowledge point, where each topic set includes one or more topics. The embodiments of the present application do not limit the clustering method used; it may be chosen according to the actual application scenario. For example, the clustering method may be the K-means method or a hierarchical clustering method.
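Since the clustering method is left open, the following sketch uses a deliberately simple stand-in rather than K-means: topics are linked whenever their pairwise similarity exceeds a threshold, and each connected group becomes one topic set (this is what cutting a single-linkage hierarchical clustering at that threshold produces). The function name, the threshold, and the layout of `pair_similarity` are all assumptions for illustration.

```python
def cluster_by_threshold(topic_ids, pair_similarity, threshold):
    """Group topics into sets by linking pairs with similarity > threshold.

    pair_similarity: dict mapping frozenset({id_a, id_b}) to a similarity
    (hypothetical layout); missing pairs are treated as not linked.
    """
    parent = {t: t for t in topic_ids}

    def find(t):
        # Union-find root lookup with path halving.
        while parent[t] != t:
            parent[t] = parent[parent[t]]
            t = parent[t]
        return t

    for pair, sim in pair_similarity.items():
        if sim > threshold:
            a, b = tuple(pair)
            parent[find(a)] = find(b)

    clusters = {}
    for t in topic_ids:
        clusters.setdefault(find(t), []).append(t)
    return list(clusters.values())


sims = {
    frozenset(("q1", "q2")): 0.90,
    frozenset(("q3", "q4")): 0.95,
    frozenset(("q2", "q3")): 0.20,
}
topic_sets = cluster_by_threshold(["q1", "q2", "q3", "q4"], sims, 0.8)
```

A library implementation of K-means or agglomerative clustering could be substituted here without changing the later steps, which only need the resulting topic sets.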

In an embodiment, after obtaining one or more topic sets, the electronic device may determine a representative topic from each topic set. As one implementation, for each topic in the topic set the electronic device may count the other topics whose similarity to it is higher than a preset threshold, and use the topic with the largest count as the representative topic. That is, within the topic set, the representative topic is the one with the largest number of topics whose similarity to it exceeds the preset threshold.
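The counting rule for choosing a representative can be sketched as follows. The function name and the `pair_similarity` layout are hypothetical, and ties are resolved in favor of the topic that appears first in the set, which is an assumption of this example.

```python
def pick_representative(topic_set, pair_similarity, preset_threshold):
    """Return the topic with the most neighbours above the preset threshold.

    pair_similarity: dict mapping frozenset({id_a, id_b}) to a similarity
    (hypothetical layout); missing pairs count as similarity 0.
    """
    def neighbour_count(topic):
        return sum(
            1
            for other in topic_set
            if other != topic
            and pair_similarity.get(frozenset((topic, other)), 0.0) > preset_threshold
        )

    # max() returns the first topic with the highest count when counts tie.
    return max(topic_set, key=neighbour_count)


sims = {
    frozenset(("q1", "q2")): 0.90,
    frozenset(("q2", "q3")): 0.85,
    frozenset(("q1", "q3")): 0.40,
}
representative = pick_representative(["q1", "q2", "q3"], sims, 0.8)  # -> "q2"
```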

After the representative topic of each topic set is selected, the user may label the initial difficulty label of the representative topic. For example, the representative topic may be displayed on an interactive interface, and the user inputs the initial difficulty label of the representative topic on the interactive interface based on his or her own experience.

After the initial difficulty label of the representative topic of each topic set is obtained, the electronic device also uses that label as the initial difficulty label of the other topics in the corresponding topic set. In this embodiment, on the one hand, the user only needs to label the representative topic of each topic set rather than every topic in it, which reduces the amount of labeling required, lowers the labeling cost, and improves labeling efficiency; on the other hand, highly similar topics have the same or similar initial difficulty labels, which improves the uniformity and standardization of the difficulty measure.

Embodiments of a topic recommendation apparatus, an electronic device, and a computer-readable storage medium are also provided, corresponding to embodiments of a topic recommendation method of the present application.

Referring to FIG. 4, a block diagram of an embodiment of a topic recommendation apparatus of the present application is shown, the apparatus comprising:

a topic data obtaining module 401, configured to obtain an initial difficulty tag of a topic and interactive data for the topic; and the interactive data represents the answering condition of the user to the question.

The topic difficulty correction module 402 is configured to establish prior distribution related to topic difficulty according to the initial difficulty label of the topic, correct the prior distribution based on the interaction data, and obtain posterior distribution related to the topic difficulty.

A topic difficulty determining module 403, configured to determine the corrected difficulty label of the topic according to the posterior distribution.

And a topic recommendation module 404, configured to recommend topics according to the corrected difficulty labels of the topics.

In an embodiment, the topic data obtaining module 401 includes:

the topic information acquisition unit is used for acquiring topic information of a plurality of topics, and the topic information comprises knowledge points related to the topics.

And the similarity determining unit is used for determining the similarity between every two topics belonging to the same knowledge point according to the topic information.

And the similar topic determining unit is used for determining one or more similar topics of the topic according to the similarity between every two topics belonging to the same knowledge point if the topic has a corresponding initial difficulty label.

An initial difficulty label correcting unit, configured to correct the initial difficulty label of the question based on the initial difficulty labels of the one or more similar questions.

In an embodiment, the topic data obtaining module 401 includes:

And the topic set determining unit is used for clustering the plurality of topics into one or more topic sets according to the similarity between every two topics belonging to the same knowledge point if the topics have no corresponding initial difficulty labels.

And the representative topic determining and labeling unit is used for determining a representative topic from the topic set and acquiring an initial difficulty label of the representative topic labeled by a user.

And the initial difficulty label determining unit is used for also determining the initial difficulty label of the representative topic as the initial difficulty label of the other topics in the topic set.

In an embodiment, the question information further includes question stem information and question parsing information, and the question stem information and the question parsing information are displayed in a text mode and/or an image mode.

The similarity determination unit includes:

and the text vector and/or image feature determining subunit is used for acquiring a text vector of the question according to the question stem information and/or the question parsing information displayed in a text mode, and/or acquiring an image feature of the question according to the question stem information and/or the question parsing information displayed in an image mode.

The similarity determining subunit is used for determining text similarity according to the distance between the text vectors of the topics belonging to the same knowledge point, and/or determining image similarity according to the distance between the image features of the topics belonging to the same knowledge point; and determining the similarity between every two topics belonging to the same knowledge point according to the text similarity and/or the image similarity.

In one embodiment, the initial difficulty tag modification unit includes:

and the first correction subunit is used for taking the label that occurs most frequently among the initial difficulty labels of the similar topics and the initial difficulty label of the topic as the corrected initial difficulty label of the topic if the initial difficulty label of the topic is a discrete value.

And the second correction subunit is configured to, if the initial difficulty label of the topic is a continuous value, use a weighted average result of the initial difficulty labels of the similar topics and the initial difficulty label of the topic as the corrected initial difficulty label of the topic.

In one embodiment, in the topic set, the number of topics having similarity higher than a preset threshold with the representative topic is the largest.

In an embodiment, the apparatus further includes, after the topic data obtaining module 401:

and the initial difficulty label adjusting module is used for adjusting the initial difficulty label of the topic according to the attributes of the topic and acquiring the adjusted initial difficulty label.

The topic difficulty correction module 402 is further configured to establish the prior distribution of the topic difficulty according to the adjusted initial difficulty label of the topic.

In one embodiment, the topic attributes include at least one of: topic type and the number of knowledge points to which the topic relates.

In one embodiment, the interactive data includes at least the time the user took to answer the topic.

The apparatus further includes, after the topic data obtaining module 401:

the interactive data filtering module is used for, among a plurality of pieces of interactive data obtained when the same user answers the same topic multiple times, filtering out all interactive data other than that obtained from the first answer; and/or filtering out, among the interactive data corresponding to the same topic, the interactive data whose answer time is not within a preset range.
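The two filters can be sketched as a single pass over the interaction records. The record fields, the function name, and the default time bounds below are hypothetical. The sketch also makes one design choice that the text leaves open: the first answer is identified before the time filter is applied, so a user's later attempts are dropped even when the first attempt falls outside the preset range.

```python
from dataclasses import dataclass


@dataclass
class Interaction:
    # All field names are hypothetical, chosen for this illustration.
    user_id: str
    topic_id: str
    correct: bool
    answer_seconds: float
    timestamp: float


def filter_interactions(records, min_seconds=2.0, max_seconds=600.0):
    """Keep only first answers whose answer time lies in the preset range."""
    seen = set()
    kept = []
    for record in sorted(records, key=lambda r: r.timestamp):
        key = (record.user_id, record.topic_id)
        if key in seen:
            continue  # a repeated answer by the same user to the same topic
        seen.add(key)
        if min_seconds <= record.answer_seconds <= max_seconds:
            kept.append(record)
    return kept


records = [
    Interaction("u1", "q1", False, 35.0, 100.0),
    Interaction("u1", "q1", True, 20.0, 200.0),   # second attempt, dropped
    Interaction("u2", "q1", True, 0.5, 150.0),    # too fast, dropped
    Interaction("u3", "q1", True, 48.0, 160.0),
]
clean = filter_interactions(records)
```

The counts of correct and wrong answers used to correct the prior distribution would then be taken from the filtered records only.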

In one embodiment, the interactive data includes at least whether the user answered the topic correctly or incorrectly; the prior distribution and the posterior distribution are both Beta distributions.

The topic difficulty correction module 402 further comprises:

and the counting unit is used for counting the number of correct answers and the number of wrong answers in the interactive data according to whether each user answered the topic correctly or incorrectly.

And the correction unit is used for correcting the prior distribution according to the number of correct answers and the number of wrong answers in the interactive data.

In one embodiment, the corrected difficulty label of the topic is the expected value of the posterior distribution.

For the device embodiments, since they substantially correspond to the method embodiments, reference may be made to the partial description of the method embodiments for relevant points. The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and the parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules can be selected according to actual needs to achieve the purpose of the scheme of the application. One of ordinary skill in the art can understand and implement it without inventive effort.

Correspondingly, as shown in fig. 5, the present application further provides an electronic device 50, which includes a processor 51 and a memory 52 for storing executable instructions, wherein the processor 51, when executing the executable instructions, is configured to:

establish a prior distribution of the topic difficulty according to the initial difficulty label of the topic, and correct the prior distribution based on the interactive data of the user to obtain a posterior distribution of the topic difficulty;

and determine the corrected difficulty label of the topic according to the posterior distribution of the topic difficulty.

The processor 51 executes the executable instructions stored in the memory 52. The processor 51 may be a Central Processing Unit (CPU), another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor or any conventional processor.

The memory 52 stores executable instructions that implement the topic recommendation method. The memory 52 may include at least one type of storage medium, including a flash memory, a hard disk, a multimedia card, a card-type memory (e.g., SD or DX memory, etc.), a Random Access Memory (RAM), a Static Random Access Memory (SRAM), a Read Only Memory (ROM), an Electrically Erasable Programmable Read Only Memory (EEPROM), a Programmable Read Only Memory (PROM), a magnetic memory, a magnetic disk, an optical disk, and the like. The device may also cooperate with a network storage device that performs the storage function of the memory over a network connection. The memory 52 may be an internal storage unit of the device 50, such as a hard disk or a memory of the device 50. The memory 52 may also be an external storage device of the device 50, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, a flash memory card (Flash Card), etc. provided on the device 50. Further, the memory 52 may include both internal and external storage units of the device 50. The memory 52 is used for storing a computer program 55 as well as other programs and data required by the device. The memory 52 may also be used to temporarily store data that has been output or is to be output.

The various embodiments described herein may be implemented using a computer-readable medium such as computer software, hardware, or any combination thereof. For a hardware implementation, the embodiments described herein may be implemented using at least one of an Application Specific Integrated Circuit (ASIC), a Digital Signal Processor (DSP), a Digital Signal Processing Device (DSPD), a Programmable Logic Device (PLD), a Field Programmable Gate Array (FPGA), a processor, a controller, a microcontroller, a microprocessor, and an electronic unit designed to perform the functions described herein. For a software implementation, the implementation such as a process or a function may be implemented with a separate software module that allows performing at least one function or operation. The software codes may be implemented by software applications (or programs) written in any suitable programming language, which may be stored in memory and executed by the controller.

The electronic device 50 may be a desktop computer, a notebook computer, a palmtop computer, a server, a cloud server, a mobile phone, or another computing device. The device may include, but is not limited to, a processor 51 and a memory 52. Those skilled in the art will appreciate that fig. 5 is merely an example of the electronic device 50 and does not constitute a limitation of it; the electronic device 50 may include more or fewer components than shown, may combine some components, or may use different components. For example, the device may also include input-output devices, network access devices, buses, etc.

The implementation process of the functions and actions of each unit in the above device is specifically described in the implementation process of the corresponding step in the above method, and is not described herein again.

In an exemplary embodiment, there is also provided a non-transitory computer-readable storage medium comprising instructions, such as a memory comprising instructions, the instructions being executable by a processor of an apparatus to perform the above-described method. For example, the non-transitory computer-readable storage medium may be a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.

A non-transitory computer-readable storage medium is also provided, wherein instructions in the storage medium, when executed by a processor of a terminal, enable the terminal to perform the above-described method.

Other embodiments of the present application will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the application and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the application being indicated by the following claims.

It will be understood that the present application is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the application is limited only by the appended claims.

The present invention is not intended to be limited to the particular embodiments shown and described, but is to be accorded the widest scope consistent with the principles and novel features herein disclosed.

Claims

1. A title recommendation method, comprising:

2. The method of claim 1, wherein obtaining an initial difficulty label for a topic comprises:

3. The method of claim 1, wherein obtaining an initial difficulty label for a topic comprises:

4. The method according to claim 2 or 3, wherein the question information further comprises question stem information and question parsing information, and the question stem information and the question parsing information are displayed in a text mode and/or an image mode;

5. The method of claim 2, wherein the modifying the initial difficulty label for the topic based on the initial difficulty labels for the one or more similar topics comprises:

6. The method of claim 3, wherein a number of topics in the set of topics having a similarity to the representative topic above a preset threshold is at a maximum.

7. The method of claim 1, further comprising, after said obtaining an initial difficulty label for a topic:

8. The method of claim 7, wherein the topic attributes include at least one of: the topic type and the number of knowledge points to which the topic relates.

9. The method of claim 1, wherein the interaction data comprises at least a user response time to the topic;

10. The method of claim 9, wherein the interaction data comprises at least whether the user's answer to the topic is correct or incorrect;

the modifying the prior distribution based on the interaction data includes:

11. The method of claim 10, wherein the corrected difficulty label of the topic is an expected value of the posterior distribution.

12. A title recommendation device, comprising:

13. An electronic device, comprising:

a processor;

a memory for storing executable instructions;

wherein the processor, when executing the executable instructions, is configured to implement the method of any of claims 1 to 11.

14. A computer-readable storage medium having stored thereon computer instructions which, when executed by a processor, carry out the steps of the method of any one of claims 1 to 11.
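For illustration only, the prior-to-posterior correction recited in claims 1, 10 and 11 can be sketched as a conjugate Beta-Bernoulli update. This is a minimal sketch under stated assumptions, not the claimed implementation: the claims as reproduced here do not name a distribution family, so the Beta prior, the reading of difficulty as the probability of an incorrect answer on a 0 to 1 scale, the `strength` pseudo-count, and all identifiers (`BetaDifficulty`, `prior_from_label`, `posterior`) are hypothetical. Only the right-or-wrong outcome of each answer (the "positive or negative answer" of claim 10) is used; the answering time of claim 9 is ignored.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class BetaDifficulty:
    """Beta distribution over a topic's difficulty.

    Assumption: difficulty is the probability that a user answers the
    topic incorrectly, so it lies in the open interval (0, 1).
    """
    alpha: float  # pseudo-count of incorrect answers
    beta: float   # pseudo-count of correct answers

    @property
    def mean(self) -> float:
        # Expected value of a Beta(alpha, beta) distribution.
        return self.alpha / (self.alpha + self.beta)


def prior_from_label(initial_label: float, strength: float = 10.0) -> BetaDifficulty:
    """Build a prior whose mean equals the initial difficulty label.

    `strength` (hypothetical parameter) is how many observed answers the
    initial label is worth; larger values make the label harder to move.
    """
    if not 0.0 < initial_label < 1.0:
        raise ValueError("initial_label must lie strictly between 0 and 1")
    if strength <= 0.0:
        raise ValueError("strength must be positive")
    return BetaDifficulty(initial_label * strength, (1.0 - initial_label) * strength)


def posterior(prior: BetaDifficulty, answers_correct: Iterable[bool]) -> BetaDifficulty:
    """Correct the prior with interaction data (True means a correct answer)."""
    answers = list(answers_correct)
    n_correct = sum(answers)
    n_wrong = len(answers) - n_correct
    # Conjugate update: each wrong answer raises alpha, each right answer raises beta.
    return BetaDifficulty(prior.alpha + n_wrong, prior.beta + n_correct)


# Usage: an initial label of 0.5, then 8 wrong and 2 right answers.
prior = prior_from_label(0.5, strength=10.0)
post = posterior(prior, [False] * 8 + [True] * 2)
corrected_label = post.mean  # 13 / 20 = 0.65
```

With few answers the corrected label stays close to the initial label, and with many answers it approaches the observed error rate, which is one way to obtain the stability and accuracy the abstract describes.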