
CN110737771B - Topic distribution method and device based on big data

Info

Publication number: CN110737771B
Application number: CN201910866615.5A
Authority: CN
Inventors: 孙全智; 耿溟; 孙艺恬
Original assignee: Beijing Tenfen Technology Co ltd
Current assignee: Beijing Tenfen Technology Co ltd
Priority date: 2019-09-12
Filing date: 2019-09-12
Publication date: 2022-09-27
Anticipated expiration: 2039-09-12
Also published as: CN110737771A

Abstract

The invention provides a question allocation method and device based on big data, and belongs to the technical field of computer information. The method comprises a data establishment stage and a question allocation stage. In the data establishment stage, users are classified into a plurality of user category groups according to their answer data, and the difficulty coefficient of each question under each user category group is then determined from the answer data of the users in that group. In the question allocation stage, the user category group to which a user belongs and the difficulty coefficient of each question under that group are obtained, and questions with a suitable difficulty coefficient are selected and allocated to the user. Because the difficulty of each question is determined from both the user's own answer data and the answer data of the other users in the group to which the user belongs, the difficulty of a question can be evaluated more accurately, and questions of suitable difficulty can be allocated according to the user's individual needs.

Description

Question distribution method and device based on big data

Technical Field

The invention belongs to the technical field of computer information, and particularly relates to a question allocation method and device based on big data.

Background

Online answering applications, such as online learning, online examinations and online quiz games, are a current trend. At present, websites and applications for online answering generally assign questions in one of several ways when a user wants to answer. For example, questions are drawn at random from a question bank and assigned to the user; or the user selects an answering mode, and all users in the same mode answer the same questions; or the difficulty of each question is evaluated from the user's personal answering record, and questions of moderate difficulty are selected and assigned to the user. These approaches either cannot assign questions according to each user's individual situation, or evaluate difficulty only from that user's own answers, which is highly subject to chance and makes the difficulty evaluation inaccurate, so a good learning effect cannot be achieved.

Disclosure of Invention

The invention aims to solve at least one of the technical problems in the prior art, and provides a question allocation method based on big data, which can accurately evaluate the difficulty of questions and allocate the questions according to the evaluated difficulty and the personalized requirements of users, thereby achieving better answering effect.

The technical solution adopted by the invention to solve this technical problem is a question allocation method based on big data, which comprises the following steps:

a data establishing stage:

establishing a plurality of special topics, and respectively establishing question banks for the plurality of special topics, wherein each question bank comprises a plurality of questions;

classifying each user into a plurality of user category groups through a classifier obtained by pre-training according to answer data of each user;

determining, according to a preset algorithm and the answer data of the users in each user category group, the difficulty coefficient of each question under that user category group;

a question allocation stage:

determining the topic and the answering mode selected by a user;

acquiring the user category group to which the user belongs, and acquiring, according to the user category group, the difficulty coefficient of each question in the topic selected by the user;

and selecting, according to the difficulty coefficient interval preset for the answering mode selected by the user, questions with a matching difficulty coefficient from those questions and allocating them to the user for answering.

According to the method provided by the invention, the difficulty coefficient of each question is determined from both the user's own answer data and the answer data of the users in the user category group to which the user belongs, and questions with a suitable difficulty coefficient are then selected and allocated to the user according to the topic and answering mode the user selected. The difficulty of each question can therefore be evaluated more accurately, and questions can be allocated according to both the evaluated difficulty and the user's individual needs, achieving a better answering effect.

Preferably, in the above method provided by the present invention, the method further comprises:

and after the user finishes answering, updating, according to the user's answer data from this session and the preset algorithm, the difficulty coefficient under the user category group to which the user belongs of each question the user answered in this session.

Preferably, in the above method provided by the present invention, each question in the question bank includes at least one label; the classifying of each user into a plurality of user category groups by a classifier obtained through pre-training according to the answer data of each user specifically includes:

respectively generating associated data between each user and each label according to the answer data of the questions answered by each user and the labels included in those questions; the associated data is, for each label included in the questions answered by the user, the number of questions the user answered and the number of questions the user answered correctly under that label;

and classifying each user into a plurality of user category groups according to the associated data and a classifier obtained by pre-training.

Preferably, in the method provided by the present invention, the pre-trained classifier uses a clustering algorithm to classify each user into a plurality of user category groups.

Preferably, in the above method provided by the present invention, the clustering algorithm includes any one of a K-means clustering algorithm, a center point clustering algorithm, and a random selection clustering algorithm.

Preferably, in the method provided by the present invention, the preset algorithm satisfies the following condition:

wherein K1 is the initial difficulty coefficient of the question; AC is the total number of users in the user category group to which the user belongs; ACF is the number of those users who have answered the question; ACFR is the number of users who, having answered the question, answered it correctly.

Correspondingly, the invention also provides a question allocation device based on big data, which comprises: a data establishing unit and a question allocation unit;

the data establishing unit specifically includes:

the question bank establishing module is used for establishing a plurality of topics and respectively establishing question banks for the plurality of topics, wherein each question bank comprises a plurality of questions;

the user classification module is used for classifying each user into a plurality of user category groups through a classifier obtained through pre-training according to answer data of each user;

the difficulty calculation module is used for determining the difficulty coefficient of each question under each user category group according to a preset algorithm and the answer data of each user in each user category group;

the question allocation unit specifically includes:

the selection module is used for determining the topic and the answering mode selected by the user;

the acquisition module is used for acquiring the user category group to which the user belongs and acquiring, according to the user category group, the difficulty coefficient of each question in the topic selected by the user;

and the distribution module is used for selecting, according to the difficulty coefficient interval preset for the answering mode selected by the user, questions with a matching difficulty coefficient from those questions and allocating them to the user for answering.

Preferably, in the above apparatus provided by the present invention, the apparatus further comprises:

and the difficulty updating unit is used for updating, after the user finishes answering and according to the user's answer data from this session and the preset algorithm, the difficulty coefficient under the user category group to which the user belongs of each question the user answered in this session.

Preferably, in the above apparatus provided by the present invention, each question in the question bank includes at least one label; the user classification module specifically comprises:

the first module is used for respectively generating associated data between each user and each label according to the answer data of the questions answered by each user and the labels included in those questions; the associated data is, for each label included in the questions answered by the user, the number of questions the user answered and the number of questions the user answered correctly under that label;

and the second module is used for classifying each user into a plurality of user category groups according to the associated data and the classifier obtained by pre-training.

Preferably, in the apparatus provided by the present invention, in the second module, the pre-trained classifier classifies each user into a plurality of user category groups by using a clustering algorithm.

Preferably, in the above apparatus provided by the present invention, the clustering algorithm includes any one of a K-means clustering algorithm, a center point clustering algorithm, and a random selection clustering algorithm.

Preferably, in the above apparatus provided by the present invention, in the difficulty calculating module and/or the difficulty updating unit, the preset algorithm satisfies:

Drawings

Fig. 1 is a flowchart of the data establishment stage in the question allocation method based on big data according to this embodiment;

Fig. 2 is a flowchart of the question allocation stage in the question allocation method based on big data according to this embodiment;

Fig. 3 is a detailed flowchart of step S12 of the data establishment stage in the question allocation method based on big data according to this embodiment;

Fig. 4 is a schematic structural diagram of the question allocation device based on big data according to this embodiment.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention clearer, the present invention will be described in further detail with reference to the accompanying drawings, and it is obvious that the described embodiments are only some embodiments of the present invention, not all embodiments. All other embodiments, which can be obtained by a person skilled in the art without making any creative effort based on the embodiments in the present invention, belong to the protection scope of the present invention.

The shapes and sizes of the various elements in the drawings are not to scale and are merely intended to facilitate an understanding of the contents of the embodiments of the invention.

This embodiment provides a question allocation method based on big data, which comprises the following steps:

As shown in Fig. 1, in the data establishment stage:

S11, establishing a plurality of topics, and respectively establishing question banks for the topics, wherein each question bank comprises a plurality of questions.

Specifically, a plurality of topics are established as needed, for example mathematics, politics and English topics, and a question bank is then established for each topic according to the content the topic requires, the question bank of each topic comprising a plurality of questions.

Further, after the question bank of each topic is established, initialization information is established for each question in each question bank. The initialization information of a question may include its initial difficulty coefficient K1 and its labels; the initial difficulty coefficient K1 is the same for all questions and may be assigned arbitrarily. The labels of a question can be determined from its content, and each question includes at least one label. For example, if the question bank of a history topic contains the question "In which year did the An-Shi Rebellion (An Lushan Rebellion) break out?", labels such as "Chinese history", "Tang dynasty" and "year" may be established for that question. The specific labels may be established as required, which is not limited herein.
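The initialization information described above can be pictured as a small record per question. The following is a minimal sketch only: the class name `Question`, the field names, the value chosen for K1 and the sample mathematics question are all hypothetical, since the text only requires a shared initial difficulty coefficient and at least one label per question.

```python
from dataclasses import dataclass

# Hypothetical value: the text only says K1 is the same for every question
# and may be assigned arbitrarily.
INITIAL_DIFFICULTY_K1 = 1.0


@dataclass
class Question:
    question_id: str
    content: str
    labels: list[str]                    # each question carries at least one label
    k1: float = INITIAL_DIFFICULTY_K1    # initial difficulty coefficient

    def __post_init__(self):
        if not self.labels:
            raise ValueError("each question needs at least one label")


# One question bank per topic, keyed by topic name (illustrative data).
question_banks: dict[str, list[Question]] = {
    "mathematics": [
        Question(
            question_id="m-001",
            content="What is the derivative of x squared?",
            labels=["calculus", "derivative"],
        )
    ],
}
```

Keeping K1 as a field with a shared default mirrors the statement that all questions start from the same initial difficulty coefficient.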

S12, classifying the users into a plurality of user category groups through a classifier obtained through pre-training according to the answer data of the users.

Specifically, the users can be classified according to their historical answer data and the labels of the questions they answered. As shown in Fig. 3, S12 may specifically include:

S121, respectively generating associated data between each user and each label according to the answer data of the questions answered by each user and the labels included in those questions; the associated data may be, for each label included in the questions answered by the user, the number of questions the user answered and the number of questions the user answered correctly under that label.

Specifically, all labels included in the questions answered by the user are obtained, the number of questions answered and the number answered correctly by the user under each label are counted, and the associated data between the user and each label is generated.

For example, user A answers 10 questions and gets 9 of them right. The 10 questions include label M and label N: under label M, user A answered 7 questions and got 6 right; under label N, user A answered 3 questions and got all 3 right. The associated data between user A and labels M and N can then be generated from the answer data of these 10 questions, as shown in Table 1-1:

User	Label	Correct answers / questions answered	Accuracy
A	M	6/7	85%
A	N	3/3	100%

TABLE 1-1

Of course, the associated data may be in other forms, may include other contents, and may be specifically designed according to needs, and is not limited herein.

It should be noted that a user's answer data specifically includes the questions the user answered and, for each answered question, whether it was answered correctly or incorrectly.
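Step S121 amounts to counting, per user and per label, how many questions were answered and how many were answered correctly. The sketch below is one possible way to do that; the function name `build_associated_data` and the record layout `(user, question_id, is_correct)` are assumptions made for illustration. The sample data reproduces the user A figures from Table 1-1.

```python
from collections import defaultdict


def build_associated_data(answers, question_labels):
    """Count correct and answered questions per user and per label.

    answers: iterable of (user, question_id, is_correct) records.
    question_labels: dict mapping question_id to its list of labels.
    Returns {user: {label: (correct_count, answered_count)}}.
    """
    counts = defaultdict(lambda: defaultdict(lambda: [0, 0]))
    for user, question_id, is_correct in answers:
        for label in question_labels[question_id]:
            pair = counts[user][label]
            pair[0] += int(is_correct)   # answered correctly
            pair[1] += 1                 # answered at all
    return {
        user: {label: tuple(pair) for label, pair in per_label.items()}
        for user, per_label in counts.items()
    }


# User A: 7 questions under label M (6 correct), 3 under label N (3 correct).
question_labels = {f"m{i}": ["M"] for i in range(7)}
question_labels.update({f"n{i}": ["N"] for i in range(3)})
answers = [("A", f"m{i}", i != 0) for i in range(7)]
answers += [("A", f"n{i}", True) for i in range(3)]

associated = build_associated_data(answers, question_labels)
```

A question that carries several labels contributes to each of its labels, which is why the inner loop runs over `question_labels[question_id]`.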

S122, classifying the users into a plurality of user category groups according to the associated data of the users and a classifier obtained by pre-training.

Specifically, the pre-trained classifier may use a clustering algorithm to classify the users into a plurality of user category groups according to their associated data. The classifier takes the associated data of each user as input data and each label as a dimension, selects a suitable metric, such as the Euclidean distance or the Manhattan distance, and performs cluster analysis on the users to obtain a plurality of data clusters; each data cluster is a user category group, and each data point in a cluster is a user. Because users are divided into user category groups according to the questions they answered under each label and how accurately they answered them, the users within one user category group are those whose answering behaviour is most closely related. Users can thus be classified more accurately, avoiding the inaccuracy that results from classifying users on a single variable. Of course, other methods may be adopted to classify the users, and the specific method may be designed according to actual needs, which is not limited herein.
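To make "each label as a dimension" concrete, the sketch below turns one user's associated data into a vector and compares two users with the Euclidean and Manhattan distances named above. Using the per-label accuracy as the coordinate, and 0.0 for a label the user has never answered, are assumptions of this sketch; the text does not fix the encoding.

```python
import math


def to_feature_vector(user_data, labels):
    """Map {label: (correct, answered)} to one accuracy value per label.

    Assumption: a label the user never answered is encoded as 0.0.
    """
    vector = []
    for label in labels:
        correct, answered = user_data.get(label, (0, 0))
        vector.append(correct / answered if answered else 0.0)
    return vector


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


labels = ["M", "N"]
user_a = to_feature_vector({"M": (6, 7), "N": (3, 3)}, labels)
user_b = to_feature_vector({"M": (1, 4)}, labels)  # hypothetical second user

distance_euclidean = euclidean(user_a, user_b)
distance_manhattan = manhattan(user_a, user_b)
```

Either distance can be passed to a clustering routine; the choice mainly affects how strongly a large gap on a single label dominates the comparison.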

Optionally, the pre-trained classifier may adopt various types of clustering algorithm to classify the users, for example any one of a K-means clustering algorithm, a center point clustering algorithm, and a random selection clustering algorithm. Taking K-means as an example, if the users need to be classified into K user category groups, a multi-dimensional coordinate system is established from the associated data of the users. The associated data of K users is first selected at random as the initial cluster centers; the distance between each user's associated data and each cluster center is then calculated, and each user is assigned to the nearest cluster center. Once all users are assigned, K data clusters are obtained, and K new cluster centers are calculated from the positions of the users in each of the K data clusters. This process is repeated until a termination condition is met. The termination condition can be set as desired, for example that no users (or only a minimum number) are reassigned to a different data cluster, or that no cluster centers (or only a minimum number) change. The specific design may be as required, which is not limited herein.
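The K-means procedure described above can be sketched in plain Python as follows. This is an illustrative sketch under assumptions, not the patented classifier: it uses squared Euclidean distance, recomputes each center as the mean of its members (the standard K-means update), and stops when no user changes cluster. The function names and the sample vectors are hypothetical.

```python
import random


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_point(points):
    return [sum(coords) / len(points) for coords in zip(*points)]


def k_means(vectors, k, max_iterations=100, seed=0):
    """Return (assignment, centers); assignment[i] is the cluster of vectors[i]."""
    rng = random.Random(seed)
    # Randomly pick K users' vectors as the initial cluster centers.
    centers = [list(v) for v in rng.sample(vectors, k)]
    assignment = None
    for _ in range(max_iterations):
        # Assign every user to the nearest cluster center.
        new_assignment = [
            min(range(k), key=lambda c: squared_distance(v, centers[c]))
            for v in vectors
        ]
        # Termination condition: no user was reassigned.
        if new_assignment == assignment:
            break
        assignment = new_assignment
        # Recompute each center from the users currently in its cluster.
        for c in range(k):
            members = [v for v, a in zip(vectors, assignment) if a == c]
            if members:
                centers[c] = mean_point(members)
    return assignment, centers


# Four hypothetical users described by per-label accuracy on two labels.
vectors = [[0.9, 1.0], [0.85, 0.95], [0.2, 0.1], [0.25, 0.0]]
groups, centers = k_means(vectors, k=2)
```

On this toy data the two high-accuracy users end up in one user category group and the two low-accuracy users in the other, whichever initial centers are drawn.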

Further, the user category group of each user may be updated according to the answer data of the user.

S13, determining, according to a preset algorithm and the answer data of the users in each user category group, the difficulty coefficient of each question under that user category group.

Specifically, the preset algorithm satisfies:

wherein K1 is the initial difficulty coefficient of the question; AC is the total number of users in the user category group to which the user belongs; ACF is the number of users in that group who have answered the question; ACFR is the number of users who, having answered the question, answered it correctly. Of course, the difficulty coefficient of the question may also be calculated in other manners, and the specific design may be as required, which is not limited herein.

For a given user category group, the first term of the preset algorithm's formula is the base difficulty of the question in that group, and the second term is the group's accuracy rate on the question. The difficulty coefficient is thus evaluated from big data, namely the answer data of many users, and combines the base difficulty with the accuracy rate within the user category group, so the difficulty of the question can be evaluated more accurately than when the difficulty coefficient is calculated from a single variable. In addition, a separate difficulty coefficient is calculated for each user category group, which makes it easier to allocate questions according to each user's individual needs.
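The quantities AC, ACF and ACFR defined above can be counted directly from the answer records of one user category group. The formula that combines them with K1 is not reproduced in this text, so the sketch below deliberately stops at the inputs and the accuracy rate ACFR/ACF and does not guess the final difficulty coefficient. The function name, the record layout and the rule that a user counts as correct if any of their attempts was correct are assumptions.

```python
def group_question_stats(group_members, answers, question_id):
    """Count AC, ACF and ACFR for one question within one user category group.

    group_members: set of user ids in the group.
    answers: iterable of (user, question_id, is_correct) records.
    """
    attempted = set()
    correct = set()
    for user, qid, is_correct in answers:
        if qid != question_id or user not in group_members:
            continue
        attempted.add(user)
        if is_correct:
            correct.add(user)   # assumption: any correct attempt counts
    acf = len(attempted)
    acfr = len(correct)
    return {
        "AC": len(group_members),                 # users in the group
        "ACF": acf,                               # users who answered the question
        "ACFR": acfr,                             # users who answered it correctly
        "accuracy": acfr / acf if acf else None,  # group accuracy rate
    }


group = {"u1", "u2", "u3", "u4"}
answers = [
    ("u1", "q1", True),
    ("u2", "q1", False),
    ("u3", "q1", True),
    ("u5", "q1", True),   # not a member of this group, so ignored
    ("u1", "q2", False),
]
stats = group_question_stats(group, answers, "q1")
```

Because the counts are taken per group, the same question can yield different statistics, and therefore a different difficulty coefficient, in each user category group.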

As shown in Fig. 2, in the question allocation stage:

S21, determining the topic and the answering mode selected by the user.

Specifically, in addition to multiple topics, multiple answering modes may be set. For example, if the method provided in this embodiment is applied to learning software, a learning mode, an examination mode, an error-prone question review mode and other modes may be set. Each answering mode is preset with a different difficulty coefficient interval, and a user may select an answering mode according to personal needs.

S22, acquiring the user category group to which the user belongs, and acquiring, according to that user category group, the difficulty coefficient of each question in the topic selected by the user.

Specifically, each question has a different difficulty coefficient in each user category group. Therefore, after the topic selected by the user is determined, the user category group to which the user belongs is obtained, and the difficulty coefficient under that group of each question in the selected topic is then obtained.

S23, selecting, according to the difficulty coefficient interval preset for the answering mode selected by the user, questions with a matching difficulty coefficient from the questions in the topic selected by the user, and allocating the selected questions to the user for answering.

Specifically, the questions whose difficulty coefficient falls within the difficulty coefficient interval of the mode selected by the user are screened out of the question bank of the selected topic, and the selected questions are then allocated to the user for answering according to a certain rule, for example in order of difficulty coefficient from low to high. Each answering mode is preset with its own difficulty coefficient interval as required; for example, the learning mode can be given a wider interval so that the questions allocated to the user cover a broader range. The specific design may be as required, which is not limited herein.
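Steps S21 to S23 can be sketched as a lookup followed by a filter and a sort. In the sketch below the mode names, the interval bounds and the difficulty values are all hypothetical; only the overall flow (look up the coefficient under the user's category group, keep those inside the mode's interval, order from low to high) follows the description above.

```python
# Hypothetical difficulty coefficient intervals per answering mode.
MODE_INTERVALS = {
    "learning": (0.0, 1.0),       # wider interval, broader coverage
    "examination": (0.4, 0.8),
    "error_review": (0.6, 1.0),
}


def select_questions(topic_question_ids, user_group, mode,
                     difficulty_by_group, limit=None):
    """Pick questions whose coefficient for this group lies in the mode's interval.

    difficulty_by_group: dict keyed by (group_id, question_id).
    Returns question ids ordered by difficulty coefficient, low to high.
    """
    low, high = MODE_INTERVALS[mode]
    candidates = []
    for question_id in topic_question_ids:
        coefficient = difficulty_by_group.get((user_group, question_id))
        if coefficient is not None and low <= coefficient <= high:
            candidates.append((coefficient, question_id))
    candidates.sort()
    selected = [question_id for _, question_id in candidates]
    return selected if limit is None else selected[:limit]


# The same question may have a different coefficient in each group.
difficulty_by_group = {
    ("g1", "q1"): 0.9, ("g1", "q2"): 0.5, ("g1", "q3"): 0.3, ("g1", "q4"): 0.7,
    ("g2", "q1"): 0.5,
}
topic = ["q1", "q2", "q3", "q4"]
exam_for_g1 = select_questions(topic, "g1", "examination", difficulty_by_group)
exam_for_g2 = select_questions(topic, "g2", "examination", difficulty_by_group)
```

Question q1 is outside the examination interval for group g1 but inside it for group g2, which illustrates how the per-group coefficients personalise the allocation.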

In summary, in the method provided in this embodiment, the difficulty coefficient of each question is determined from both the user's own answer data and the answer data of the users in the user category group to which the user belongs, and questions with a suitable difficulty coefficient are then selected and allocated to the user according to the topic and answering mode the user selected. The difficulty of each question can therefore be evaluated more accurately, and questions can be allocated according to both the evaluated difficulty and the user's individual needs, achieving a better answering effect.

Optionally, in the method provided in this embodiment, the method may further include:

S31, after the user finishes answering, updating, according to the user's answer data from this session and the preset algorithm, the difficulty coefficient under the user category group to which the user belongs of each question the user answered in this session.

Specifically, each time the user finishes a session, the questions the user answered and whether each was answered correctly are recorded, and the difficulty coefficients of those questions are then updated according to the algorithm in S13. The difficulty coefficient of each question is updated as users' category groups change, as users' answer data changes, and as the preset algorithm changes. Because the difficulty coefficient is updated dynamically from the answer data of all users, it can be evaluated more accurately, so questions with a suitable difficulty coefficient can be better allocated according to each user's individual needs.
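One way to support the update in S31 is to keep running records of who has answered each question in each user category group, and to refresh them after every session. The sketch below shows only that bookkeeping; the class name `GroupQuestionCounters` and its methods are hypothetical, and recomputing the difficulty coefficient from the refreshed counts is left to the preset algorithm of S13, whose formula is not reproduced in this text.

```python
from collections import defaultdict


class GroupQuestionCounters:
    """Tracks, per (group, question), which users answered and which were correct."""

    def __init__(self):
        self.attempted = defaultdict(set)
        self.correct = defaultdict(set)

    def record_session(self, user, group, session_answers):
        """Record one finished session.

        session_answers: iterable of (question_id, is_correct).
        Returns the (group, question_id) keys whose coefficient needs updating.
        """
        touched = []
        for question_id, is_correct in session_answers:
            key = (group, question_id)
            self.attempted[key].add(user)
            if is_correct:
                self.correct[key].add(user)
            touched.append(key)
        return touched

    def counts(self, group, question_id):
        """Return (ACF, ACFR) for one question within one group."""
        key = (group, question_id)
        return len(self.attempted.get(key, ())), len(self.correct.get(key, ()))


counters = GroupQuestionCounters()
to_update = counters.record_session("u1", "g1", [("q1", True), ("q2", False)])
counters.record_session("u2", "g1", [("q1", False)])
```

Returning the touched keys keeps the update limited to the questions the user answered in this session, as the step describes.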

Optionally, when a user answers for the first time, questions covering the various difficulty coefficients may be randomly selected from the topics and allocated to the user for answering, and the user is then classified according to the resulting answer data.

Correspondingly, as shown in Fig. 4, this embodiment further provides a question allocation device based on big data, including: a data establishing unit 1 and a question allocation unit 2.

Specifically, the data establishing unit 1 includes:

The question bank establishing module 11, used for establishing a plurality of topics and respectively establishing question banks for the plurality of topics, wherein each question bank comprises a plurality of questions.

And the user classification module 12 is configured to classify each user into a plurality of user category groups through a classifier obtained through pre-training according to the answer data of each user.

The difficulty calculating module 13, configured to determine, according to a preset algorithm and the answer data of the users in each user category group, the difficulty coefficient of each question under that user category group.

Specifically, the question allocation unit 2 includes:

The selection module 21, used for determining the topic and the answering mode selected by the user.

The obtaining module 22, configured to obtain the user category group to which the user belongs, and to obtain, according to that user category group, the difficulty coefficient of each question in the topic selected by the user.

The distribution module 23, configured to select, according to the difficulty coefficient interval preset for the answering mode selected by the user, questions with a matching difficulty coefficient from the questions in the topic selected by the user, and to allocate the selected questions to the user for answering.

Optionally, in the above apparatus provided in this embodiment, the apparatus further includes:

The difficulty updating unit 3, used for updating, after the user finishes answering and according to the user's answer data from this session and the preset algorithm, the difficulty coefficient under the user category group to which the user belongs of each question the user answered in this session.

Optionally, in the above apparatus provided in this embodiment, each question in the question banks established by the question bank establishing module 11 includes at least one label. The user classification module 12 specifically includes:

The first module 01, used for respectively generating associated data between each user and each label according to the answer data of the questions answered by each user and the labels included in those questions; the associated data is, for each label included in the questions answered by the user, the number of questions the user answered and the number of questions the user answered correctly under that label.

A second module 02, configured to classify each user into a plurality of user category groups according to the associated data and a classifier obtained through pre-training.

Optionally, in the apparatus provided in this embodiment, in the second module 02, a pre-trained classifier classifies each user into a plurality of user category groups by using a clustering algorithm.

Optionally, in the apparatus provided in this embodiment, in the second module 02, the clustering algorithm adopted by the pre-trained classifier includes any one of a K-means clustering algorithm, a center point clustering algorithm, and a random selection clustering algorithm.

Optionally, in the apparatus provided in this embodiment, in the difficulty calculating module 13 and/or the difficulty updating unit 3, the preset algorithm satisfies:

wherein K1 is the initial difficulty coefficient of the question; AC is the total number of users in the user category group to which the user belongs; ACF is the number of users in that group who have answered the question; ACFR is the number of users who, having answered the question, answered it correctly.

In summary, according to the question allocation method based on big data provided by the present invention, the difficulty coefficient of each question is determined from both the user's own answer data and the answer data of the users in the user category group to which the user belongs, and questions with a suitable difficulty coefficient are then selected and allocated to the user according to the topic and answering mode the user selected. The difficulty of each question can therefore be evaluated more accurately, and questions can be allocated according to both the evaluated difficulty and the user's individual needs, achieving a better answering effect.

It will be understood that the above embodiments are merely exemplary embodiments adopted to illustrate the principles of the present invention, and the present invention is not limited thereto. It will be apparent to those skilled in the art that various modifications and improvements can be made without departing from the spirit and substance of the invention, and these modifications and improvements are also considered to be within the scope of the invention.

Claims

1. A question allocation method based on big data, characterized by comprising the following steps:

a data establishment stage:

classifying each user into a plurality of user category groups through a classifier obtained through pre-training according to answer data of each user;

a question allocation stage:

acquiring the user category group to which the user belongs, and acquiring, according to the user category group, the difficulty coefficient of each question in the topic selected by the user;

selecting, according to the difficulty coefficient interval preset for the answering mode selected by the user, questions with a matching difficulty coefficient from those questions and allocating them to the user for answering;

each question in the question bank comprises at least one label; the classifying of each user into a plurality of user category groups by a classifier obtained through pre-training according to the answer data of each user specifically includes:

classifying each user into a plurality of user category groups according to the associated data and a classifier obtained by pre-training; the relevance of the answers of all users is highest in the same user category group;

the preset algorithm satisfies the following conditions:

wherein K1 is the initial difficulty coefficient of the question; AC is the total number of users in the user category group to which the user belongs; ACF is the number of those users who have answered the question; ACFR is the number of users who, having answered the question, answered it correctly.

2. The method of claim 1, further comprising:

3. The method of claim 1, wherein the pre-trained classifier employs a clustering algorithm to classify users into a plurality of user class groups.

4. The method according to claim 3, wherein the clustering algorithm comprises any one of a K-means clustering algorithm, a center point clustering algorithm, and a random selection clustering algorithm.

5. A question allocation device based on big data, characterized by comprising: a data establishing unit and a question allocation unit;

the data establishing unit specifically includes:

the difficulty calculation module is used for determining, according to a preset algorithm and the answer data of the users in each user category group, the difficulty coefficient of each question under that user category group;

the question allocation unit specifically comprises:

the selection module is used for determining the topic and the answering mode selected by the user;

the distribution module is used for selecting, according to the difficulty coefficient interval preset for the answering mode selected by the user, questions with a matching difficulty coefficient from the questions and allocating them to the user for answering;

each question in the question bank comprises at least one label; the user classification module specifically comprises:

the first module is used for respectively generating associated data between each user and each label according to the answer data of the questions answered by each user and the labels included in those questions; the associated data is, for each label included in the questions answered by the user, the number of questions the user answered and the number of questions the user answered correctly under that label;

the second module is used for classifying each user into a plurality of user category groups according to the associated data and a classifier obtained by pre-training; wherein, in the same user category group, the relevance of each user answering is highest;

in the difficulty calculation module and/or the difficulty updating unit, the preset algorithm satisfies the following condition:

wherein K1 is the initial difficulty coefficient of the question; AC is the total number of users in the user category group to which the user belongs; ACF is the number of those users who have answered the question; ACFR is the number of users who, having answered the question, answered it correctly.

6. The apparatus of claim 5, further comprising:

and the difficulty updating unit is used for updating, after the user finishes answering and according to the user's answer data from this session and the preset algorithm, the difficulty coefficient under the user category group to which the user belongs of each question the user answered in this session.

7. The apparatus of claim 5, wherein the pre-trained classifier in the second module classifies users into a plurality of user class groups using a clustering algorithm.

8. The apparatus of claim 7, wherein the clustering algorithm comprises any one of a K-means clustering algorithm, a center point clustering algorithm, and a random selection clustering algorithm.