WO2022170985A1 - Question selection method, apparatus, computer device and storage medium - Google Patents
Question selection method, apparatus, computer device and storage medium
- Publication number
- WO2022170985A1 (international application PCT/CN2022/074152)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- difficulty
- exercise
- electronic
- exercises
- target
Classifications
-
- G—PHYSICS
- G09—EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
- G09B—EDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
- G09B7/00—Electrically-operated teaching apparatus or devices working with questions and answers
- G09B7/02—Electrically-operated teaching apparatus or devices working with questions and answers of the type wherein the student is expected to construct an answer to the question which is presented or wherein the machine gives an answer to the question presented by a student
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/906—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/10—Services
- G06Q50/20—Education
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/10—Services
- G06Q50/20—Education
- G06Q50/205—Education administration or guidance
Definitions
- The embodiments of the present application relate to the technical field of information processing, and in particular to a question selection method, apparatus, computer device, and storage medium.
- Electronic resources offer better timeliness, larger quantity, and wider coverage. Educational platforms provide them to users online, which has made online education widely available.
- Supplementary books, examination papers, and practice questions are integrated into electronic resources to generate electronic exercises, that is, test questions and worked examples selected according to subject category, knowledge structure, and other factors, for preparing specific subject knowledge, checking learning outcomes, and testing skills.
- Based on a massive online question bank and rich learning data, an education platform can select suitable electronic exercises in a more targeted way according to each user's learning level, achieving personalized ("thousands of people, thousands of faces") exercise recommendation.
- Current methods of screening electronic exercises may return many repeated or similar electronic exercises, so students end up answering repetitive exercises, which lowers learning efficiency.
- The embodiments of the present application propose a question selection method, apparatus, computer device, and storage medium to solve the problem of low learning efficiency caused by screening repeated electronic exercises for users whose learning time is limited.
- In a first aspect, an embodiment of the present application provides a question selection method, including:
- selecting electronic exercises for the user from each of the plurality of exercise sets as target exercises, where the difficulty of the target exercises satisfies the first condition and the number of target exercises satisfies a preset second condition.
- An embodiment of the present application also provides a question selection device, including:
- a learning task determination module configured to determine the learning task of the user;
- an exercise set determination module configured to determine a plurality of exercise sets whose content is related to the learning task, each exercise set containing a plurality of electronic exercises with the same or similar content;
- a behavior data acquisition module configured to acquire behavior data recorded when the user answers electronic exercises related to the learning task;
- a difficulty condition setting module configured to set a first condition in the difficulty dimension according to the behavior data; and
- a target exercise selection module configured to select electronic exercises for the user from each of the plurality of exercise sets as target exercises, where the difficulty of the target exercises satisfies the first condition and the number of target exercises satisfies a preset second condition.
- An embodiment of the present application further provides a computer device, including:
- one or more processors; and
- a memory configured to store one or more programs,
- where, when the one or more programs are executed by the one or more processors, the one or more processors implement the question selection method described in the first aspect.
- An embodiment of the present application further provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the question selection method described in the first aspect.
- In the embodiments, the learning task is matched with multiple exercise sets, each containing multiple electronic exercises with the same or similar content.
- The exercise sets are generated mainly from the content of the electronic exercises themselves and are independent of users' interaction behavior.
- This keeps the approach feasible for question banks with tens of millions of exercises: in such a bank most exercises are never exposed to users, so similarity computed from users' relatively sparse interaction behavior cannot be applied to it.
- FIG. 1 is a flowchart of a question selection method provided in Embodiment 1 of the present application;
- FIG. 2 is a flowchart of a question selection method provided in Embodiment 2 of the present application;
- FIG. 3 is a schematic diagram of screening target exercises provided in Embodiment 2 of the present application;
- FIG. 4 is a schematic diagram of clustering electronic exercises provided in Embodiment 2 of the present application;
- FIG. 5A is an example diagram of an electronic exercise provided in Embodiment 2 of the present application;
- FIG. 5B is an example diagram of another electronic exercise provided in Embodiment 2 of the present application;
- FIG. 5C is an example diagram of another electronic exercise provided in Embodiment 2 of the present application;
- FIG. 5D is an example diagram of another electronic exercise provided in Embodiment 2 of the present application;
- FIG. 6 is a schematic structural diagram of a question selection device provided in Embodiment 3 of the present application;
- FIG. 7 is a schematic structural diagram of a computer device according to Embodiment 4 of the present application.
- Although the terms first, second, third, etc. may be used in this application to describe various information, such information should not be limited by these terms. These terms are only used to distinguish information of the same type from one another.
- For example, first information may also be referred to as second information, and similarly, second information may also be referred to as first information, without departing from the scope of the present application.
- The word "if" as used herein may be interpreted as "at the time of", "when", or "in response to determining".
- In one existing approach, factors such as the user's learning situation and the popularity of electronic exercises are considered, and electronic exercises are screened for users by rules or linear weighting.
- This approach exploits the popularity of electronic exercises but ignores the intrinsic relationships among them: several electronic exercises with the same or similar content may all be highly popular and all suit a user's learning situation, so they may all be screened for that user at the same time. However, the user's learning time is limited, and repeatedly answering electronic exercises with the same or similar content leads to low learning efficiency.
- In another existing approach, cognitive diagnosis is used to assess the user's mastery of each knowledge point, and electronic exercises of suitable difficulty are screened for the user accordingly.
- By using cognitive diagnosis, this approach captures the difficulty suited to a user more accurately, but it mainly considers the user's learning level and ignores information about the electronic exercises themselves, which is one of the important considerations for the business need of screening exercises. Like the rule-based approach, the cognitive-diagnosis-based approach tends to screen out electronic exercises of suitable difficulty whose content is the same or similar, and therefore also suffers from low efficiency.
- Collaborative filtering likewise considers only the user's own learning situation and does not use information about the electronic exercises themselves, so it still tends to screen out many electronic exercises with the same or similar content, which lowers efficiency.
- In addition, collaborative filtering relies on similarity between users, but two similar users do not necessarily share the same weak knowledge points, and even when they do, it does not follow that they get the same exercises wrong. User similarity is only one dimension for screening electronic exercises, so screening based on it may yield results that deviate from the expected effect of the exercises.
- Another existing approach computes the similarity between electronic exercises, taking the individual electronic exercise as the unit, and uses this similarity to assist exercise selection, that is, to screen out electronic exercises in light of the user's prior knowledge.
- For example, electronic exercises similar to one the user answered incorrectly are recommended to the user.
- The record of which electronic exercises users have answered is used to derive this behavior-based similarity.
- The applicable scenarios of this approach are relatively limited and it covers only a small number of electronic exercises; it suits mainly scenarios such as classroom assignments.
- In a real large-scale question bank, for example one with tens of millions of exercises, not every electronic exercise will have been answered by users, so the correlation between most electronic exercises cannot be computed from user behavior.
- Moreover, electronic exercises contain text data, image data, formula data, and information in other modalities, which this approach hardly uses; that is, it still makes little use of the content of the electronic exercises.
- The information of the electronic exercises themselves determines the quantity of electronic exercises (that is, how many exercises share the same or similar content and how many have different content). The difficulty and the quantity of electronic exercises serve as measures for applying electronic exercises to practice and testing.
- These characteristics are closely related to the business need of screening electronic exercises, and an optimal combination exists for screening electronic exercises based on their difficulty and quantity.
- FIG. 1 is a flowchart of a question selection method provided in Embodiment 1 of the present application.
- This embodiment is applicable to screening electronic exercises based on a combination of factors such as typicality, difficulty, and quantity. The method can be executed by a question selection device.
- The question selection device can be implemented in software and/or hardware and configured in a computer device of the educational platform, such as a server, a workstation, or a personal computer.
- In one example, the computer device acts as a server that maintains the logic for screening electronic exercises and provides users with an exercise recommendation service. To save storage resources and reduce client updates, the user logs in to a client, which receives the electronic exercises pushed by the server and displays them for the user to answer and practice.
- The server may provide the exercise recommendation service to the current user as well as to other users; this embodiment places no limitation on this.
- For example, users include teachers and students.
- A teacher can log in to the client, select some or all students based on their learning situation, and notify the server to select suitable electronic exercises for these students; the server then pushes the electronic exercises to the clients the corresponding students are logged in to, so that each student answers and practices them.
- A student can log in to the client and notify the server to select suitable electronic exercises; the server then pushes the electronic exercises to the client the student is logged in to, so that the student answers and practices them.
- In another example, the number of electronic exercises can be reduced, for example to 100,000, and the client can download the electronic exercises from the server, maintain the logic for screening electronic exercises, and provide the user with the exercise recommendation service itself, so that the user can still answer and practice normally offline; this embodiment places no limitation on this.
- Step 101 Determine the learning task of the user.
- The application may be a client that provides learning services independently, or a functional module (such as an SDK (Software Development Kit)) that provides learning services within other clients, such as instant messaging tools and industry work clients.
- It may also be a client with a browsing component, such as a browser or an application configured with a browsing component (such as a WebView); this embodiment places no limitation on this.
- The user may be logged in to the application, in which case the user is represented by identity data. If the user is not logged in, temporary identity data can be assigned to the user and bound to the device identifier, so that temporary identity data bound to the same device identifier is merged; if the temporary user later registers and logs in, the temporary identity data can be converted into formal identity data.
- The client can provide a UI (User Interface) on which the user triggers an operation to practice electronic exercises for a specified learning task, for example clicking a learning task to study it, clicking a learning task to take a test, or refreshing the electronic exercises under a learning task. The operation specifies a learning task, and the user then practices the electronic exercises of that learning task.
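- The temporary-identity handling described above can be sketched as a small registry. This is a minimal illustration under assumptions: the class `IdentityRegistry`, its methods, and the `tmp-` prefix are hypothetical names, and the source does not say how identities are generated or stored.

```python
import uuid


class IdentityRegistry:
    """Hypothetical registry: one temporary identity per device identifier,
    converted into a formal identity when the user registers."""

    def __init__(self):
        self.temp_by_device = {}  # device identifier -> temporary identity
        self.formal_of = {}       # temporary identity -> formal user ID

    def identity_for(self, device_id):
        """Return the device's temporary identity, creating it on first use,
        so all anonymous activity from one device shares one identity."""
        if device_id not in self.temp_by_device:
            self.temp_by_device[device_id] = "tmp-" + uuid.uuid4().hex
        return self.temp_by_device[device_id]

    def register(self, device_id, formal_user_id):
        """Convert the device's temporary identity into a formal identity."""
        temp_id = self.identity_for(device_id)
        self.formal_of[temp_id] = formal_user_id
        return temp_id

    def resolve(self, identity):
        """Map an identity to its formal user ID if it has been converted."""
        return self.formal_of.get(identity, identity)
```

- Keeping the temporary-to-formal mapping, rather than rewriting history, lets behavior data recorded under the temporary identity be attributed to the registered user later.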
- The client can send the operation to the server; upon receiving it, the server starts the logic for screening electronic exercises (i.e., executes Steps 102 to 105).
- Alternatively, when the client receives an operation on the UI, the client itself starts the logic for screening electronic exercises (i.e., executes Steps 102 to 105).
- A learning task is a task the user learns, and its meaning varies by business scenario, for example scenarios in which students study in K12 (kindergarten through twelfth grade), at university, and so on.
- For example, the learning task can refer to a chapter of a subject (such as Chapter 1, "Concepts of Sets and Functions", in Compulsory Volume 1 of senior high school mathematics, which includes sections such as "Meaning and Representation of Sets", "Basic Relationships between Sets", and "Basic Operations on Sets"), a specific knowledge point (such as appreciation of ancient poetry in Chinese, correcting ungrammatical sentences, or the past tense in English), or a grade stage (such as the middle of third grade or the end of fourth grade).
- In industry scenarios, the learning task can refer to knowledge points of the industry, such as the structural composition of houses and the structural forms of multi-storey and high-rise buildings in civil engineering, laws and regulations in law, or the patent application process.
- Step 102 Determine a plurality of exercise sets whose contents are related to the learning task.
- In this embodiment, multiple exercise sets matching the learning task may be configured in advance.
- Here, "matching" means related in content to the learning task: with at least one of learning, practice, and testing as the goal, multiple exercise sets are configured for the learning task, each containing multiple electronic exercises with the same or similar content.
- The electronic exercises collected in one exercise set have the same or similar content and can be regarded as one series of electronic exercises, that is, typical electronic exercises.
- The exercise sets are associated with learning tasks, and both the exercise sets and the associations are stored in the question bank, which may be a database storing electronic exercises. Traversing these associations identifies the multiple exercise sets matching the learning task.
- The general approach to personalized screening of electronic exercises considers the user's learning level but not the content of the electronic exercises themselves, and computing similarity from users' interaction behavior cannot handle a question bank of tens of millions of exercises. This embodiment therefore takes the content of the electronic exercises into account and aggregates electronic exercises with the same or similar content into the same exercise set, which scales to a question bank of tens of millions of exercises.
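- The association lookup described above can be sketched with an in-memory stand-in for the question bank. This is an illustration only: `QuestionBank`, `add_set`, and `sets_for_task` are hypothetical names, and a real question bank would be a database rather than Python containers.

```python
class QuestionBank:
    """Hypothetical in-memory stand-in for the question bank: exercise sets
    keyed by set ID, plus (learning task, exercise set) associations."""

    def __init__(self):
        self.exercise_sets = {}  # set_id -> list of exercise IDs
        self.associations = []   # (task_id, set_id) pairs

    def add_set(self, task_id, set_id, exercise_ids):
        """Store an exercise set and associate it with a learning task."""
        self.exercise_sets[set_id] = list(exercise_ids)
        self.associations.append((task_id, set_id))

    def sets_for_task(self, task_id):
        """Traverse the associations and return the sets matching a task."""
        return {
            set_id: self.exercise_sets[set_id]
            for t, set_id in self.associations
            if t == task_id
        }
```

- For example, after `add_set("sets-basics", "S1", ["q1", "q2"])` and `add_set("sets-basics", "S2", ["q3"])`, calling `sets_for_task("sets-basics")` returns both sets. A flat association list keeps the sketch short; a database would index the associations by task.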
- Step 103 Acquire behavior data recorded when the user answers electronic exercises whose content is related to the learning task.
- The server can record users' online learning behaviors in logs and other files.
- These behaviors include answering the electronic exercises matching each learning task, for example which electronic exercises the user answered, the answering result (such as correct, incorrect, or a score), and the answering time. This behavior data gives the education platform a richer and more complete record of learning.
- Here, "matching" again means related in content to the learning task, that is, electronic exercises configured for the learning task with at least one of learning, practice, and testing as the goal.
- Data about the user's answers to the electronic exercises matching each learning task can be extracted from the logs and other files and partitioned with the learning task as the dimension, forming the behavior data of the user answering the electronic exercises matching each learning task; this behavior data is recorded in a server-side database.
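- Partitioning answer records with the learning task as the dimension can be sketched as a simple grouping. The record fields (`user_id`, `task_id`, `exercise_id`, `correct`, `answered_at`) are assumptions about what a parsed log line contains; the source does not specify a log format.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class AnswerRecord:
    """Hypothetical parsed log entry for one answered electronic exercise."""
    user_id: str
    task_id: str       # learning task the exercise matches
    exercise_id: str
    correct: bool
    answered_at: int   # e.g. a Unix timestamp


def partition_by_task(records):
    """Group answer records into {(user_id, task_id): [records...]}."""
    behavior = defaultdict(list)
    for r in records:
        behavior[(r.user_id, r.task_id)].append(r)
    # Keep each user's answers for a task in chronological order.
    for rs in behavior.values():
        rs.sort(key=lambda r: r.answered_at)
    return dict(behavior)
```

- Each key of the result corresponds to one user's behavior data for one learning task, which is the unit later looked up when screening exercises.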
- The behavior data of the user answering the electronic exercises matching the learning task can then be looked up in the database.
- Alternatively, the server can push the current user's behavior data to the client, which stores it locally and reads it when the logic for screening electronic exercises is started.
- The client may also record the user's learning behaviors locally to form the behavior data of the user answering the electronic exercises matching the learning task; this embodiment places no limitation on this.
- Step 104 Set a first condition in the dimension of difficulty according to the behavior data.
- Each electronic exercise can be pre-configured with a difficulty, that is, how hard it is to answer the electronic exercise correctly.
- the exercises are easier.
- The difficulty can be labeled manually by educators based on their judgment after reviewing the electronic exercises, or learned from big data; this embodiment places no limitation on this.
- Some electronic exercises are labeled with a difficulty, such as 1, 2, 3, 4, or 5, while others are not.
- Where the difficulty of an electronic exercise is a discrete value, the most frequent value among the difficulty of the current electronic exercise and the difficulties of other similar electronic exercises is used as their revised difficulty.
- Where the difficulty of an electronic exercise is a continuous value, the difficulty of the current electronic exercise and the difficulties of other similar electronic exercises are linearly combined (e.g., by weighted summation) into their revised difficulty.
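- The two revision rules above can be sketched as follows, assuming the difficulties of an exercise and its similar exercises are already collected into a list. The function names are hypothetical, as are the tie-breaking rule and the normalization of weights to sum to 1; the source fixes neither.

```python
from collections import Counter


def revise_discrete(difficulties):
    """Discrete case: the most frequent level among the current exercise and
    its similar exercises becomes their revised difficulty. On a tie,
    Counter.most_common returns the level encountered first."""
    return Counter(difficulties).most_common(1)[0][0]


def revise_continuous(difficulties, weights=None):
    """Continuous case: a weighted sum of the difficulties, with the weights
    normalised to sum to 1 (equal weights when none are given)."""
    if weights is None:
        weights = [1.0] * len(difficulties)
    total = sum(weights)
    return sum(d * w for d, w in zip(difficulties, weights)) / total
```

- For example, `revise_discrete([3, 2, 3, 4])` gives 3, and `revise_continuous([0.2, 0.8], [3, 1])` gives approximately 0.35. The same functions can assign a difficulty to an unlabeled exercise by passing only the labeled difficulties of its similar exercises.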
- A first condition can be set in the difficulty dimension according to the behavior data; the first condition is used to filter electronic exercises.
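- The source does not say in this passage how the behavior data becomes a difficulty condition. Purely as a hypothetical illustration, the sketch below centers a difficulty window on a level derived from the user's accuracy for the task and returns it as a predicate. The accuracy-to-level mapping, the window `width`, and the default for a user with no history are all assumptions.

```python
def first_condition(records, levels=(1, 2, 3, 4, 5), width=1):
    """Hypothetical rule: build a difficulty window from the user's accuracy
    on this learning task and return it as a predicate over difficulty."""
    if records:
        accuracy = sum(1 for r in records if r["correct"]) / len(records)
    else:
        accuracy = 0.5  # assumption: no history, so aim at the middle level
    lo, hi = min(levels), max(levels)
    # Higher accuracy shifts the window towards harder levels.
    centre = lo + round(accuracy * (hi - lo))
    return lambda difficulty: abs(difficulty - centre) <= width
```

- With three correct answers out of four, the accuracy is 0.75, the window is centered on level 4, and levels 3 to 5 satisfy the condition.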
- Step 105 Select electronic exercises for the user from each of the multiple exercise sets as target exercises.
- The difficulty of the target exercises satisfies the first condition.
- The number of target exercises satisfies the preset second condition. That is, the aim is to find a combination of electronic exercises such that, within one set of questions, the user is exposed through as few electronic exercises as possible to as many electronic exercises suited to the user's learning level as possible.
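- Because each exercise set groups exercises with the same or similar content, taking at most one exercise per set avoids repetition. The sketch below shows this under assumptions: the second condition is treated as an upper bound `max_count` on the number of targets, the first condition is any predicate over difficulty, and the first qualifying exercise in a set is taken. The source does not state how an exercise is chosen within a set.

```python
def select_targets(exercise_sets, difficulty, condition, max_count):
    """Pick at most one exercise per exercise set, keep only exercises whose
    difficulty satisfies the first condition, and stop once max_count
    targets are chosen (the assumed form of the second condition)."""
    targets = []
    for exercise_ids in exercise_sets.values():
        if len(targets) >= max_count:
            break
        candidates = [e for e in exercise_ids if condition(difficulty[e])]
        if candidates:
            # Assumption: take the first qualifying exercise in the set.
            targets.append(candidates[0])
    return targets
```

- For example, with sets `{"S1": ["a", "b"], "S2": ["c"], "S3": ["d", "e"]}`, difficulties `{"a": 1, "b": 3, "c": 5, "d": 3, "e": 4}`, and the condition 2 ≤ difficulty ≤ 4, a limit of 2 selects `["b", "d"]`: set S2 contributes nothing because its only exercise is too hard.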
- As a result, the screened electronic exercises have a low repetition rate, and the user obtains the expected practice effect without manually searching the question bank for suitable electronic exercises. This avoids wasting the resources consumed in screening electronic exercises (such as the processor, memory, and bandwidth resources of the education platform), the resources the user's electronic device consumes in displaying electronic exercises, and the user's time spent answering them.
- FIG. 2 is a flowchart of a question selection method provided in Embodiment 2 of the present application. Building on the foregoing embodiment, this embodiment further refines the operations of generating exercise sets, setting the first condition, and screening target exercises. The method specifically includes the following steps:
- Step 201 Determine the learning task of the user.
- Step S301 is executed to determine the learning task the current user is to study, such as a chapter or a knowledge point.
- Step 202 Acquire electronic exercises whose content is related to the learning task.
- Step S302 may be performed in advance to configure exercise sets for different learning tasks and store them in the question bank.
- Specifically, Step S302 can be performed offline to cluster the electronic exercises under each learning task into multiple exercise sets and store the associations between learning tasks and exercise sets in the question bank, and Step S303 can be performed online to search the question bank for the exercise sets matching the learning task.
- For clustering, Step S401 is executed to extract the electronic exercises matching each learning task from the question bank.
- Step S402 can be executed to preprocess the electronic exercises, for example removing tags from the electronic exercises, filtering out electronic exercises flagged as erroneous, and filtering out electronic exercises flagged as duplicates.
- Regarding tags: formula data in mathematics and character data in English are usually recorded in specific formats so that they display correctly on the page, such as LaTeX (a typesetting system built on the TeX programming language), HTML (HyperText Markup Language), and MathML (Mathematical Markup Language), and recording in these formats produces tags.
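- The preprocessing in Step S402 can be sketched as below. Assumptions: each exercise is a dict whose `stem` field holds HTML/MathML-style markup and whose optional `flags` field marks errors and duplicates; both field names are hypothetical. The regex removes angle-bracket tags only and leaves LaTeX commands untouched.

```python
import html
import re

TAG_RE = re.compile(r"<[^>]+>")  # HTML/MathML-style tags


def strip_tags(text):
    """Remove angle-bracket tags and unescape entities such as &gt;."""
    return html.unescape(TAG_RE.sub("", text))


def preprocess(exercises):
    """Drop exercises flagged as erroneous or duplicate; strip tags from the
    stems of the remaining exercises."""
    cleaned = []
    for ex in exercises:
        if {"error", "duplicate"} & set(ex.get("flags", ())):
            continue
        cleaned.append({**ex, "stem": strip_tags(ex["stem"])})
    return cleaned
```

- For example, `strip_tags("<p>Solve <b>x</b> &gt; 1</p>")` returns `"Solve x > 1"`. A regex is adequate for this illustration; malformed markup is better handled by a real HTML parser such as `html.parser`.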
- An electronic exercise includes at least one of stem information, option information, and analysis information, where the analysis information may include knowledge points (or test points), answers, and analysis (or detailed explanations).
- Step S403 can be performed to cluster the electronic exercises using at least one of the stem information, option information, and analysis information they contain.
- As shown in FIG. 5A, a multiple-choice question usually includes stem information 511, option information 512, and analysis information 513.
- As shown in FIG. 5B, a true/false question usually includes stem information 521 and analysis information 522.
- As shown in FIG. 5C, a fill-in-the-blank question usually includes stem information 531 and analysis information 532.
- As shown in FIG. 5D, a short-answer question usually includes stem information 541 and analysis information 542.
- Step 203 Divide the electronic exercises into multiple types of exercise information.
- Electronic exercises of different disciplines and question types may contain different types of exercise information, such as text data, formula data, and first image data.
- the stem information, option information, and analysis information may include text data, formula data, and first image data, where the first image data represents geometric figures, problem scenarios, statistical graphs, function curves, and more.
- the question stem information, option information, and analysis information may all contain text data and first image data, where the first image data represents chemical instruments, statistical charts, experimental procedures, etc.
- steps S4041, S4042, and S4043 can be executed respectively to divide each electronic exercise into the corresponding exercise information according to type (text data, formula data, first image data, etc.), thereby forming exercise information of multiple modalities.
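The split in steps S4041 to S4043 can be sketched as follows. It assumes a storage convention the text does not specify: inline LaTeX formulas are wrapped in single `$...$` delimiters, and images are referenced by `<img src="...">` tags. Display math (`$$...$$`) and other formats would need additional rules.

```python
import re

FORMULA_RE = re.compile(r"\$([^$]+)\$")  # assumed inline LaTeX delimiters: $...$
IMAGE_RE = re.compile(r"""<img\s+[^>]*src=["']([^"']+)["'][^>]*>""")  # assumed <img src="...">


def split_modalities(raw: str) -> dict:
    """Split one exercise field into text, formula and image modalities."""
    formulas = FORMULA_RE.findall(raw)
    images = IMAGE_RE.findall(raw)
    # Remove formulas and image tags; whatever remains is treated as text data.
    text = IMAGE_RE.sub(" ", FORMULA_RE.sub(" ", raw))
    return {
        "text": " ".join(text.split()),
        "formula": formulas,  # later rendered to bitmaps (second image data)
        "image": images,      # references to first image data
    }
```

An empty list for a modality marks that kind of exercise information as missing for the exercise.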
- Step 204 Extract candidate feature information from the exercise information respectively.
- a strategy corresponding to the type (modality) can be used to extract features from each kind of exercise information as candidate feature information.
- step S4061 is executed to determine a language model, such as ERNIE 2.0 or BERT (Bidirectional Encoder Representations from Transformers), and step S4071 is executed: the text data of each electronic exercise is input into the language model for processing, which outputs features of a specified first length as the candidate feature information of the text data.
- the candidate feature information is usually a sentence vector, which can also be called a text feature vector.
- a pre-trained model can be used as the language model.
- pre-training refers to using self-supervised learning (such as autoregressive language modeling and auto-encoding techniques) to obtain a pre-trained model from large-scale data that is independent of the specific task (i.e., extracting features of the text data in electronic exercises). The pre-trained model reflects the semantic representation of a word in a general context and implicitly learns general syntactic and semantic knowledge.
- common text data, such as encyclopedia data, in languages such as Chinese and English can be used as the corpus to pre-train the language model, so that it performs well across different languages.
- because the text data of electronic exercises may differ from general text data (such as encyclopedia data), and pre-training makes the language model extensible, the text data of some electronic exercises can be used as a corpus to fine-tune the language model through self-supervised learning. This adapts it to the specific task (extracting features of the text data in electronic exercises).
- that is, the text data of the electronic exercises is used as the label (Tag), and the text data processed by sentence rearrangement, document rotation, and similar operations is input into the language model. The model is trained until its output converges to the label, so that the trained language model better fits the distribution of text data in the electronic-exercise scenario.
- the text data of the electronic exercises can also be directly used as the corpus to train the language model, which is not limited in this embodiment.
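The fine-tuning data described above can be sketched as follows. The original exercise text is the label, and a noised copy is the model input. The two noising functions follow the operations of the same names used in BART-style denoising (sentence permutation and document rotation). The text does not specify how the application implements them, so the splitting rule and whitespace tokenization here are assumptions.

```python
import random
import re


def split_sentences(text: str) -> list[str]:
    """Split after Chinese or English sentence-ending punctuation."""
    return [s for s in re.split(r"(?<=[。！？.!?])\s*", text) if s]


def sentence_permutation(text: str, rng: random.Random) -> str:
    """Shuffle the order of sentences (sentence rearrangement)."""
    sentences = split_sentences(text)
    rng.shuffle(sentences)
    return " ".join(sentences)


def document_rotation(text: str, rng: random.Random) -> str:
    """Rotate the token sequence so it starts at a randomly chosen token."""
    tokens = text.split()
    if len(tokens) < 2:
        return text
    start = rng.randrange(1, len(tokens))
    return " ".join(tokens[start:] + tokens[:start])


def make_denoising_pairs(texts: list[str], seed: int = 0) -> list[tuple[str, str]]:
    """Build (noised input, original text as label) pairs for fine-tuning."""
    rng = random.Random(seed)
    pairs = []
    for text in texts:
        pairs.append((sentence_permutation(text, rng), text))
        pairs.append((document_rotation(text, rng), text))
    return pairs
```

The resulting pairs would then feed an ordinary sequence-to-sequence training loop. That loop is not shown here.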
- step S4062 is performed to determine the first image model, such as ResNet 50, VGG, or DenseNet, and step S4072 is performed: the first image data is input into the first image model for processing, which outputs features of a specified second length as the candidate feature information of the first image data.
- the candidate feature information can also be called an image feature vector.
- the pre-trained model can be directly used as the first image model.
- the first image data of the electronic exercise may also be directly used as a sample to train the first image model, which is not limited in this embodiment.
- the first image data contains color (such as RGB: red, green, and blue) and is usually a three-dimensional tensor (Tensor), so its candidate feature information is also a three-dimensional tensor, i.e., (width, height, color depth), written (width, height, color_depth). To match the dimensions of the candidate feature information across types, a two-dimensional pooling operation can be used. Examples are 2D max pooling, 2D average pooling, and 2D min pooling, which slide a window over the candidate feature information and take the maximum, average, or minimum value in each window. This reduces the candidate feature information of the first image data from three dimensions to one dimension.
- for example, if the first image model is ResNet 50, inputting the first image data into ResNet 50 outputs candidate feature information with dimensions 7*7*2048; after two-dimensional pooling, the dimension of the candidate feature information becomes 1*1*2048.
- if there is a single frame of first image data, the candidate feature information of that frame is the candidate feature information of the electronic exercise under the first-image-data type.
- if there are multiple frames of first image data, the average of their candidate feature information in each dimension is calculated. This average serves as the candidate feature information of the frames as a whole, and as the candidate feature information of the electronic exercise under the first-image-data type.
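The pooling and frame-averaging steps can be sketched with NumPy as follows. It assumes each frame's feature map has already been produced by the image model with shape (height, width, channels), e.g. 7×7×2048 for ResNet 50. Using a window equal to the whole 7×7 map (global pooling) is one way to reach the 1×1×2048 size mentioned in the text; it is shown here flattened to a length-2048 vector.

```python
import numpy as np


def global_pool(features: np.ndarray, mode: str = "avg") -> np.ndarray:
    """Pool an (H, W, C) feature map over its spatial axes to a (C,) vector."""
    if mode == "avg":
        return features.mean(axis=(0, 1))
    if mode == "max":
        return features.max(axis=(0, 1))
    if mode == "min":
        return features.min(axis=(0, 1))
    raise ValueError(f"unknown pooling mode: {mode}")


def image_modality_feature(frames: list[np.ndarray], mode: str = "avg") -> np.ndarray:
    """Pool each frame, then average the pooled vectors dimension by dimension.

    With a single frame the result is just that frame's pooled vector.
    """
    pooled = [global_pool(f, mode) for f in frames]
    return np.mean(pooled, axis=0)
```

The same functions apply unchanged to the formula bitmaps (second image data).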
- Step S405 converts the formula data into second image data: the formula data is drawn in memory according to its stored form to form a bitmap, and the bitmap is extracted as the second image data.
- step S4063 can be executed to determine a second image model
- the second image model can be the same as or different from the first image model, such as ResNet 50, VGG, DenseNet, etc.
- the second image model may be a pre-trained model. Alternatively, it may be trained directly using the second image data recording the formula data as samples, which is not limited in this embodiment.
- step S4073 is performed: the second image data is input into the second image model for processing, which outputs features of a specified third length as the candidate feature information of the formula data.
- the candidate feature information can also be referred to as an image feature vector.
- the second image data contains color, which is usually a three-dimensional tensor, and its candidate feature information is also a three-dimensional tensor, that is (width, height, color depth).
- candidate feature information of the formula data is reduced from three dimensions to one dimension.
- if there is a single piece of formula data, its candidate feature information is the candidate feature information of the electronic exercise under the formula-data type.
- if there are multiple pieces of formula data, the average of their candidate feature information in each dimension is calculated. This average serves as the candidate feature information of the formula data as a whole, and as the candidate feature information of the electronic exercise under the formula-data type.
- the above candidate feature information and extraction methods are only examples. When implementing this embodiment, other candidate feature information and extraction methods may be set according to actual conditions, and those skilled in the art may also adopt others according to actual needs, which is not limited in the embodiments of the present application.
- one or more types of exercise information may be missing from an electronic exercise, that is, one or more types of exercise information may be empty.
- for example, some electronic exercises have text data, first image data, and formula data, with no exercise information missing; some have text data and first image data but lack formula data; and some have text data but lack both formula data and first image data.
- for a missing type of exercise information, the candidate feature information can be set to a specified value, such as 0.
- Step 205 splicing the candidate feature information into target feature information.
- step S408 is performed, and each type of candidate feature information is spliced end to end in a preset order to form target feature information representing the overall feature of the electronic exercise.
- the candidate feature information of the text data, the first image data, and the formula data can be spliced end to end in a preset order to form the target feature information.
- the candidate feature information of the text data is ranked first
- the candidate feature information of the first image data is ranked second
- the candidate feature information of the formula data is ranked third.
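The splicing in step S408 can be sketched as follows, using the order above (text, first image, formula) and filling missing modalities with the specified value 0. The per-modality lengths are hypothetical: 768 matches a BERT-base sentence vector, and 2048 matches pooled ResNet 50 features. The text only calls them the first, second, and third lengths.

```python
import numpy as np

# Hypothetical first, second and third lengths.
DIMS = {"text": 768, "image": 2048, "formula": 2048}
ORDER = ("text", "image", "formula")  # preset splice order


def target_feature(candidates: dict) -> np.ndarray:
    """Concatenate per-modality vectors end to end; missing modalities become zeros."""
    parts = []
    for name in ORDER:
        vec = candidates.get(name)
        if vec is None:
            vec = np.zeros(DIMS[name])  # missing exercise information -> specified value 0
        vec = np.asarray(vec, dtype=float)
        if vec.shape != (DIMS[name],):
            raise ValueError(f"{name} feature must have shape ({DIMS[name]},)")
        parts.append(vec)
    return np.concatenate(parts)
```

A fixed order and fixed lengths keep every exercise's target feature vector aligned dimension by dimension, which the distance computation in clustering relies on.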
- Step 206 using the target feature information to cluster the electronic exercises into multiple clusters to obtain multiple exercise sets.
- step S409 is executed: the electronic exercises are clustered into multiple clusters using the target feature information, which is composed of the candidate feature information of exercise information such as text data, first image data, and formula data.
- each cluster is one exercise set. Clustering takes into account the features of multiple modalities and multiple aspects, so the similarity between electronic exercises is measured more accurately. This improves the clustering effect and ensures that the electronic exercises in each exercise set are typical.
- Clustering is an unsupervised learning method. When mining electronic exercises, it can be used to discover their distribution and implicit patterns.
- in clustering, neither the category of each sample (electronic exercise) in a batch nor any other prior knowledge is known in advance. Samples are classified based on their features (target feature information), and a similarity measure is used to group samples with the same or similar features into one category. That is, a set of unlabeled data (electronic exercises) is automatically divided into several categories, such that data in the same category have similar characteristics.
- depending on business requirements, any of the following clustering algorithms can be applied to cluster the electronic exercises into multiple clusters using the target feature information, with each cluster regarded as one exercise set:
- partition-based methods: given N samples, K groupings are constructed (K is a positive integer, K ≤ N), each grouping representing one cluster, for example, the K-MEANS, K-MEDOIDS, and CLARANS algorithms;
- density-based methods: as long as the density of points (electronic exercises) in a region exceeds a certain threshold, they are added to a nearby cluster, for example, the DBSCAN, OPTICS, and DENCLUE algorithms.
- K-MEANS is used as an example of a clustering algorithm for description.
- the principle of K-MEANS is relatively simple. Based on similarity, samples that are more similar and less different are grouped into the same class (cluster), eventually forming multiple clusters in which samples within a cluster are highly similar and different clusters differ greatly. Clustering electronic exercises with K-MEANS is therefore simple, easy to implement, and fast to converge. Moreover, with tens of millions of electronic exercises, the clusters are dense and the differences between them are obvious, so K-MEANS clusters the electronic exercises well.
- the value of K may be determined as the number of clusters, i.e., the electronic exercises are to be clustered into K clusters.
- the value of K (ie the number of clusters) is an empirically set value.
- the number of electronic exercises matching the learning task can be queried in the question bank, and the number of clusters set based on it, where the number of clusters and the number of electronic exercises satisfy a nonlinear positive correlation.
- nonlinear means that the number of clusters is not proportional or linearly related to the number of electronic exercises.
- positive correlation means that the more electronic exercises there are, the more clusters there are, and conversely, the fewer electronic exercises, the fewer clusters. This ensures that each kind of typical electronic exercise can be clustered separately, which guarantees the clustering effect.
- the coefficient is a positive number less than 1;
- the operations involved include taking a square root and rounding (rounding to nearest, rounding up, or rounding down);
- i denotes the i-th learning task;
- k_i denotes the number of clusters used when clustering the exercises of the i-th learning task;
- int denotes rounding down;
- n_i denotes the number of electronic exercises matching the i-th learning task.
- the above method for calculating the number of clusters is only an example. Other methods may be set according to actual conditions, for example, inputting the number of electronic exercises into a nonlinear activation function (such as a growth function) and taking the output as the number of clusters, which is not limited in this embodiment.
- those skilled in the art can also adopt other methods for calculating the number of clusters according to actual needs, which are not limited in this embodiment of the present application.
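The exact cluster-count formula is not preserved in this text. The sketch below therefore assumes the form k_i = int(c · √n_i), chosen only because it matches the stated ingredients: a coefficient c in (0, 1), a square root, and rounding down. It is one illustrative nonlinear positive relationship, not necessarily the application's formula.

```python
import math


def num_clusters(n_exercises: int, coef: float = 0.5) -> int:
    """Hypothetical k_i = int(coef * sqrt(n_i)), clamped to the range [1, n_i]."""
    if n_exercises <= 0:
        return 0
    if not 0 < coef < 1:
        raise ValueError("coef must lie in (0, 1)")
    k = int(coef * math.sqrt(n_exercises))  # int() rounds down for positive values
    return min(max(k, 1), n_exercises)
```

Under this form, quadrupling the number of exercises only doubles the number of clusters. For example, 10000 exercises give 50 clusters and 40000 give 100.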
- clusters are generated according to the number of clusters.
- the clusters have a center point (also called a centroid), and the initial value of the center point is randomly selected.
- the target feature information is used to calculate the distance (such as cosine distance or Euclidean distance) between each electronic exercise and each center point.
- each electronic exercise is assigned to the cluster corresponding to its nearest center point.
- SSE denotes the sum of squared errors.
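The K-MEANS procedure above can be sketched in NumPy. The steps are: random initial center points, assignment of each exercise to its nearest center, recomputation of centers, and the final SSE. The sketch uses Euclidean distance. For cosine distance, one common approach is to L2-normalize the target feature vectors first, since for unit vectors the squared Euclidean distance equals 2 − 2·cos. The iteration cap and convergence test are assumptions.

```python
import numpy as np


def _assign(X: np.ndarray, centers: np.ndarray) -> np.ndarray:
    """Index of the nearest center (Euclidean distance) for every exercise."""
    dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)  # shape (n, k)
    return dists.argmin(axis=1)


def kmeans(X: np.ndarray, k: int, n_iter: int = 100, seed: int = 0):
    """Plain K-means; returns (labels, centers, SSE)."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Initial center points: k distinct exercises chosen at random.
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        labels = _assign(X, centers)
        # Move each center to the mean of its cluster; keep it if the cluster is empty.
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    labels = _assign(X, centers)
    sse = float(((X - centers[labels]) ** 2).sum())  # sum of squared errors
    return labels, centers, sse
```

The pairwise distance array has shape (n, k, d), which is fine for a sketch. For tens of millions of exercises, a batched or library implementation would be needed.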
- Step 207 Acquire behavior data recorded when the user answers the electronic exercise questions whose content is related to the learning task.
- step S304 is executed: for the learning task selected by the user, the database is queried for the behavior data recorded when the user first answered the electronic exercises matching the learning task.
- Step 208 Identify the first exercise question and the second exercise question from the answering behavior data.
- step S305 is performed to set a first condition in the dimension of difficulty. The first condition includes a difficulty interval that constrains the difficulty of each individual electronic exercise, so as to screen electronic exercises whose difficulty matches the user's learning level.
- the first exercise questions and the second exercise questions can be identified from the answering behavior data. The first exercise questions are the electronic exercises answered by the user, including both those answered correctly and those answered incorrectly. The second exercise questions are the electronic exercises the user answered incorrectly; that is, the first exercise questions contain the second exercise questions.
- for example, the user's answering behavior data from the most recent n practice sessions (n is a positive integer, e.g., 1) of the current learning task can be screened out, and the first and second exercise questions identified from it. Alternatively, the answering behavior data from practice of the current learning task within a recent period (e.g., within one month) can be screened out and the first and second exercise questions identified from it.
- if the ratio between the user's score and the standard score is greater than or equal to a preset ratio (e.g., 0.5), the electronic exercise is considered answered correctly; if the ratio is less than the preset ratio, it is considered answered incorrectly.
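Identifying the first and second exercise questions with the score-ratio rule can be sketched as follows. `AnswerRecord` is a hypothetical record shape; the text does not define how answering behavior data is stored. The default ratio of 0.5 is the example value from the text.

```python
from dataclasses import dataclass


@dataclass
class AnswerRecord:  # hypothetical shape of one answering-behavior record
    exercise_id: str
    difficulty: float
    score: float       # score the user obtained
    full_score: float  # standard score of the exercise


def is_correct(rec: AnswerRecord, ratio: float = 0.5) -> bool:
    """Correct if score / full_score >= ratio."""
    return rec.score / rec.full_score >= ratio


def split_answered(records: list[AnswerRecord], ratio: float = 0.5):
    """Return (first exercise questions, second exercise questions)."""
    first = list(records)                                    # every answered exercise
    second = [r for r in first if not is_correct(r, ratio)]  # answered incorrectly
    return first, second
```

By construction, the second list is a subset of the first, matching the text's statement that the first exercise questions contain the second.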
- Step 209 Set the upper limit of the difficulty interval with reference to the difficulty of the first exercise.
- the difficulty interval has an upper limit value, i.e., the endpoint with the largest value. Assuming the difficulty interval is [a, b], b is called the upper limit value, and it can be set with reference to the difficulty of the first exercise questions answered by the user.
- the difficulty of each first exercise may be compared, and the difficulty with the largest numerical value in the first exercise may be taken as the upper limit of the difficulty interval.
- the above method of setting the upper limit of the difficulty interval is only an example. Other methods may be set according to the actual situation, for example, taking the average of m difficulties as the upper limit value, which is not limited in this embodiment of the present application.
- those skilled in the art may also adopt other methods for setting the upper limit of the difficulty interval according to actual needs, which are not limited in the embodiments of the present application.
- Step 210 Set the lower limit of the difficulty interval with reference to the difficulty of the second exercise.
- the difficulty interval has a lower limit value, i.e., the endpoint with the smallest value. Assuming the difficulty interval is [a, b], a is called the lower limit value, and it can be set with reference to the difficulty of the second exercise questions that the user answered incorrectly.
- the difficulty of each second exercise question can be compared, and the smallest difficulty among the second exercise questions taken as the lower limit of the difficulty interval.
- alternatively, the difficulty of each first exercise question can be compared, and the smallest difficulty among the first exercise questions taken as the lower limit of the difficulty interval.
- the above method of setting the lower limit of the difficulty interval is only an example; other methods may be set according to actual conditions, and those skilled in the art may also adopt other methods according to actual needs, which are not limited in the embodiments of the present application.
- the upper limit value and the lower limit value of the difficulty interval may not be clearly separated, and may even be equal.
- if the difference between the upper limit value and the lower limit value is less than or equal to a preset first threshold (e.g., 0.2), the upper limit value can be selectively raised and/or the lower limit value lowered, so that the difference is greater than the first threshold.
- for example, the upper limit of the difficulty interval can be raised by a specified first range (e.g., 0.1) and/or the lower limit lowered. After each adjustment, the difference between the upper limit and the lower limit is recalculated, until the difference is greater than the first threshold.
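Steps 209 and 210 and the widening rule can be combined into one small function, sketched below. The upper limit is the largest difficulty among the first exercise questions. The lower limit is the smallest difficulty among the second exercise questions, falling back to the first exercise questions (the alternative in the text) when the user answered nothing incorrectly. The 0.2 threshold and 0.1 step are the text's example values. Raising and lowering both limits on each pass is one of the "and/or" options.

```python
def difficulty_interval(first_diffs, second_diffs, min_gap=0.2, step=0.1):
    """Return (lower, upper) of the difficulty interval.

    first_diffs: difficulties of all answered exercises (first exercise questions).
    second_diffs: difficulties of the incorrectly answered ones (second exercise questions).
    """
    if step <= 0:
        raise ValueError("step must be positive")
    upper = max(first_diffs)  # largest difficulty among first exercise questions
    # Smallest difficulty among second exercise questions, else among first ones.
    lower = min(second_diffs) if second_diffs else min(first_diffs)
    while upper - lower <= min_gap:  # widen until the limits are clearly separated
        upper += step
        lower -= step
    return lower, upper
```

Because the second exercise questions are a subset of the first, the initial lower limit never exceeds the upper limit, so the loop always terminates. Clamping to a fixed difficulty scale such as [0, 1] is left out because the text does not state one.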
- Step 211 extracting historical target difficulty and evaluation index from the answering behavior data.
- step S306 is performed to set a first condition in the dimension of difficulty that includes a real-time target difficulty. The real-time target difficulty constrains a statistic of the difficulty of the electronic exercises as a whole, such as the average or a quantile, so as to screen electronic exercises whose difficulty is adapted to the user's learning level.
- historical target difficulty and evaluation indicators can be calculated from the answering behavior data.
- the historical target difficulty is a statistic of the difficulty of the electronic exercises matching the learning task that the user has answered. It is of the same statistical type as the real-time target difficulty, such as the average or a quantile.
- the evaluation index is used to evaluate the user's performance on the electronic exercises matching the learning task that the user has answered.
- the evaluation index can include positive evaluation indexes, such as the accuracy rate and the number of correct answers, and can also include negative evaluation indexes, such as the error rate and the number of wrong answers.
- for example, the user's answering behavior data from the last n (e.g., 1) practice sessions of the current learning task can be screened out, and the historical target difficulty and evaluation index calculated from it. Alternatively, the answering behavior data from practice of the current learning task within a recent period (e.g., within one month) can be screened out, and the historical target difficulty, evaluation index, and the like calculated from it.
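Step 211 can be sketched as follows, using the mean or median as the statistic for the historical target difficulty and the accuracy rate as the evaluation index. The input format, a list of `(difficulty, answered_correctly)` tuples already filtered to the recent sessions, is a hypothetical simplification.

```python
from statistics import mean, median


def history_stats(records, statistic="mean"):
    """Historical target difficulty and accuracy rate from recent answer records.

    records: hypothetical list of (difficulty, answered_correctly) tuples, e.g.
    taken from the last n practice sessions or the last month.
    """
    if not records:
        raise ValueError("no answering behavior data")
    difficulties = [d for d, _ in records]
    if statistic == "mean":
        hist = mean(difficulties)
    elif statistic == "median":
        hist = median(difficulties)
    else:
        raise ValueError(f"unsupported statistic: {statistic}")
    accuracy = sum(1 for _, ok in records if ok) / len(records)
    return hist, accuracy
```

Whichever statistic is used here should also be used for the real-time target difficulty, since the text requires the two to be of the same type.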
- Step 212 adjusting the historical target difficulty with reference to the evaluation index to obtain the real-time target difficulty.
- the historical target difficulty can reflect the overall difficulty of the electronic exercises from the perspective of electronic exercises, and the evaluation index reflects the overall learning level of users from the user's point of view.
- taking a positive evaluation index (e.g., the accuracy rate) as an example, the evaluation index may be compared with a preset second threshold.
- if the evaluation index (such as the accuracy rate) is greater than the preset second threshold, the overall difficulty of the electronic exercises is below the user's learning level. The historical target difficulty can then be raised to obtain the real-time target difficulty, increasing the difficulty of this round of electronic exercises and improving the user's learning efficiency.
- the sum of the historical target difficulty and a preset first step length can be calculated as the real-time target difficulty, expressed as follows: $d_t = d_{t-1} + g_1$
- $d_t$ is the real-time target difficulty
- $d_{t-1}$ is the historical target difficulty
- $g_1$ is the first step length.
- if the evaluation index (such as the accuracy rate) is less than the preset second threshold, the overall difficulty of the electronic exercises exceeds the user's overall learning level. The historical target difficulty can then be lowered to obtain the real-time target difficulty, reducing the difficulty of this round of electronic exercises and improving the user's learning efficiency.
- the difference between the historical target difficulty and a preset second step length can be calculated as the real-time target difficulty, expressed as follows: $d_t = d_{t-1} - g_2$
- $d_t$ is the real-time target difficulty
- $d_{t-1}$ is the historical target difficulty
- $g_2$ is the second step length.
- the first step length may be greater than, equal to, or smaller than the second step length, which is not limited in this embodiment.
- adjusting the historical target difficulty with small first and second step lengths reduces fluctuations in the overall difficulty of the electronic exercises caused by chance in the user's answers. The overall difficulty of the electronic exercises thus gradually converges to the user's learning level.
- the above method for adjusting the historical target difficulty is only an example, and other methods may be set according to the actual situation when implementing the embodiments of the present application. For example, if the evaluation index (such as the accuracy rate) is greater than the preset second threshold, the historical target difficulty multiplied by a specified multiple (greater than 1) is taken as the real-time target difficulty. If the evaluation index is less than the preset second threshold, the historical target difficulty multiplied by a specified coefficient (greater than 0 and less than 1) is taken as the real-time target difficulty. This is not limited in this embodiment of the present application.
- those skilled in the art may also adopt other methods for adjusting the historical target difficulty according to actual needs, which are not limited in this embodiment of the present application.
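Both adjustment rules can be written as small functions: the additive one ($d_t = d_{t-1} + g_1$ or $d_t = d_{t-1} - g_2$) and the multiplicative alternative. The threshold c = 0.8, the step lengths, and the multiple and coefficient are hypothetical values; the text gives none. The text also does not say what happens when the accuracy exactly equals the threshold, so this sketch leaves the difficulty unchanged in that case.

```python
def realtime_difficulty_additive(d_prev, acc, c=0.8, g1=0.05, g2=0.05):
    """d_t = d_{t-1} + g_1 if acc > c; d_t = d_{t-1} - g_2 if acc < c."""
    if acc > c:
        return d_prev + g1
    if acc < c:
        return d_prev - g2
    return d_prev  # acc == c is not specified in the text; kept unchanged here


def realtime_difficulty_multiplicative(d_prev, acc, c=0.8, multiple=1.1, coef=0.9):
    """Alternative rule: scale by multiple > 1 if acc > c, by 0 < coef < 1 if acc < c."""
    if acc > c:
        return d_prev * multiple
    if acc < c:
        return d_prev * coef
    return d_prev
```

The additive form moves the target by a fixed amount regardless of its current value. The multiplicative form moves it proportionally, so its steps shrink as the difficulty approaches zero.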
- Step 213 Taking the target exercises as variables, plan them so that their difficulty satisfies the first condition and their number satisfies the preset second condition.
- Step 214 When the number of target exercises is set as an integer, solve for the target exercises.
- as shown in FIG. 3, besides setting the first condition in the dimension of difficulty, step S307 may be performed to set the second condition in the dimension of quantity. Step S308 may then be performed to plan the target exercises so that they satisfy the constraints of both dimensions, where the target exercises are electronic exercises selected from the respective exercise sets.
- the difficulty of the target exercise meeting the first condition includes at least one of the following:
- the difficulty of each single target exercise lies in the difficulty interval
- the statistical value of the difficulty of all target exercises is greater than or equal to the real-time target difficulty
- the statistical value of the difficulty of all target exercises is less than or equal to the real-time target difficulty.
- the difficulty interval constrains the difficulty of individual electronic exercises, and the real-time target difficulty constrains the difficulty of the electronic exercises as a whole. Constraining difficulty in both respects ensures that the selected electronic exercises match the user's learning level both individually and as a whole. This improves the effect of practicing the electronic exercises and thus the user's learning efficiency.
- the number of target exercises satisfying the preset second condition includes at least one of the following:
- the number of all target exercises is less than or equal to the preset fourth threshold
- the number of target exercises extracted in each exercise set is less than or equal to a preset fifth threshold.
- a unified fifth threshold, such as 1, can be set for each exercise set; that is, at most one electronic exercise is selected from each exercise set as a typical exercise, which reduces the amount of calculation. The fifth threshold can also be set adaptively for each exercise set.
- for example, the number of electronic exercises in an exercise set can be counted and the fifth threshold set based on it, so that the fifth threshold and the number of electronic exercises in the set satisfy a nonlinear positive correlation. To a certain extent, the more electronic exercises an exercise set contains, the more important it is. Adaptively adjusting the number of exercises selected from each set exposes more exercises from larger sets, so that the limited number of selected exercises focuses on the key points, thereby improving the user's learning efficiency.
- the first condition and the second condition can be expressed as follows:
- $x_i \in \{0, 1\}$, where $x_i = 0$ means that the i-th electronic exercise is not selected and $x_i = 1$ means that it is selected
- $\sum_i x_i \le N$: the number of electronic exercises (i.e., target exercises) selected from all exercise sets is less than or equal to the fourth threshold N
- $\sum_{i \in t_j} x_i \le Q$: the number of electronic exercises (i.e., target exercises) selected from the j-th exercise set $t_j$ is less than or equal to the fifth threshold Q
- $a \le d_i \le b$ for every i with $x_i = 1$: the difficulty $d_i$ of each target exercise lies in the difficulty interval [a, b]
- if the evaluation index $acc_{before}$ is greater than the preset second threshold c, the statistical value of the difficulties of all target exercises is greater than or equal to the real-time target difficulty $d_t$
- if the evaluation index $acc_{before}$ is less than or equal to the preset second threshold c, the statistical value of the difficulties of all target exercises is less than or equal to the real-time target difficulty $d_t$
- the above first condition and second condition are only examples; other first and second conditions may be set according to actual conditions when implementing the embodiments of the present application.
- those skilled in the art can also adopt other first and second conditions according to actual needs, which are not limited in the embodiments of the present application.
- the selection of electronic exercises for the user is regarded as an optimization problem. That is, an optimal combination of electronic exercises is planned that simultaneously satisfies the first condition in terms of difficulty and the second condition in terms of quantity.
- the target exercises are the variables of the plan; if the number of target exercises is restricted to integers rather than fractions or decimals, the planning is called integer programming.
- the variables (target exercises) can be solved for by the branch and bound method, the cutting plane method, the implicit enumeration method, the Hungarian method, the Monte Carlo method, etc. The electronic exercises whose difficulty satisfies the first condition and whose number satisfies the second condition are taken as the target exercises.
- each electronic exercise is either selected as a target exercise or not selected. The selection of electronic exercises is therefore an assignment problem, which is a special case of 0-1 programming and of the transportation problem.
- applying the Hungarian method to solve for the target exercises is computationally relatively simple.
- the Hungarian method was proposed for minimization problems, i.e., problems whose objective is to be minimized.
- step S309 is performed, and the selected target exercises can be deduplicated to avoid repeated answers by the user, that is, the electronic exercises that the user has answered earlier are inquired, and the electronic exercises that the user answered earlier are removed from the target exercises. .
- the screened target exercises can be displayed for the user to answer after secondary screening in conjunction with other methods, or combined with the electronic exercises screened by other methods and displayed for the user to answer, and can also be directly displayed to the user for answering. This embodiment This is not restricted.
- FIG. 6 is a structural block diagram of a question selection device provided in Embodiment 3 of the present application, which may specifically include the following modules:
- a learning task determination module 601, configured to determine the learning task of the user;
- an exercise set determination module 602, configured to determine multiple exercise sets whose contents are related to the learning task, each exercise set containing multiple electronic exercises with the same or similar content;
- the behavior data acquisition module 603 is used for acquiring the behavior data recorded when the user answers the electronic exercises related to the learning task;
- a difficulty condition setting module 604 configured to set a first condition in the dimension of difficulty according to the behavior data
- the target exercise selection module 605 is configured to select electronic exercises for the user from the multiple exercise sets respectively as target exercises, where the difficulty of the target exercises satisfies the first condition and the number of the target exercises satisfies the preset second condition.
- the exercise set determination module 602 includes:
- a matching exercise acquisition module used for acquiring electronic exercises whose content is related to the learning task, and the electronic exercises include at least one of question stem information, option information, and analysis information;
- an exercise information division module used for dividing the electronic exercise into multiple types of exercise information
- a candidate feature information extraction module for extracting candidate feature information from the exercise information respectively
- a target feature information splicing module used for splicing the candidate feature information into target feature information
- the exercise clustering module is configured to use the target feature information to cluster the electronic exercises into multiple clusters to obtain multiple exercise sets.
- the candidate feature information extraction module includes:
- a language model determination module for determining a language model if the type of the exercise information is text data
- a text feature processing module configured to input the text data into the language model for processing to output candidate feature information of the text data
- a first image model determination module configured to determine the first image model if the type of the exercise information is the first image data
- a first image feature processing module configured to input the first image data into the first image model for processing to output candidate feature information of the first image data
- a first image feature dimension reduction module configured to reduce the candidate feature information of the first image data from three dimensions to one dimension
- an image conversion module for converting the formula data into second image data if the type of the exercise information is formula data
- a second image model determination module configured to determine the second image model
- a second image feature processing module configured to input the second image data into the second image model for processing to output candidate feature information of the formula data
- the second image feature dimension reduction module is used to reduce the candidate feature information of the formula data from three-dimensional to one-dimensional;
- An empty information processing module configured to set the candidate feature information of the exercise information to a specified value if the exercise information of a certain type is empty.
- the candidate feature information extraction module further includes:
- the first mean value calculation module is configured to, if the electronic exercise contains multiple frames of the first image data, calculate the mean value in each dimension of the candidate feature information of those frames, as the candidate feature information of the multiple frames of first image data as a whole;
- the second mean value calculation module is configured to, if the electronic exercise contains multiple pieces of formula data, calculate the mean value in each dimension of their candidate feature information, as the candidate feature information of the multiple pieces of formula data as a whole.
- the exercise clustering module includes:
- the cluster number determination module is used to determine the number of clusters
- a cluster generation determination module configured to generate clusters according to the number of clusters, and the clusters have center points
- a distance calculation module for calculating the distance between the electronic exercise and the center point using the target feature information
- an exercise division module configured to assign the electronic exercise to the cluster of the center point to which its distance is the smallest;
- the convergence judgment module is used to judge whether the clustering has converged; if so, the exercise set output module is called; if not, the center updating module is called;
- An exercise set output module for outputting the clusters as exercise sets
- the center updating module is configured to update the center point in the cluster, and return to executing the calculation of the distance between the electronic exercise and the center point by using the target feature information.
- the cluster number determination module includes:
- an exercise number query module used for querying the number of electronic exercises matching the learning task
- the number of clusters setting module is configured to set the number of clusters based on the number of the electronic exercises, where the number of clusters and the number of electronic exercises satisfy a nonlinear positive correlation.
- the number of clusters setting module includes:
- the nonlinear mapping module is used for taking the square root of the product of the number of electronic exercises and a specified coefficient, rounded up, as the number of clusters.
- the first condition includes a difficulty interval;
- the difficulty condition setting module 604 includes:
- an exercise identification module for identifying first exercises and second exercises from the answering behavior data, where the first exercises are electronic exercises answered correctly by the user and the second exercises are electronic exercises answered incorrectly by the user;
- an upper limit value setting module for setting the upper limit value of the difficulty interval with reference to the difficulty of the first exercise
- a lower limit value setting module configured to set the lower limit value of the difficulty interval with reference to the difficulty of the second exercise.
- the upper limit value setting module includes:
- the maximum difficulty value module is used to take the difficulty with the largest numerical value in the first exercise as the upper limit value of the difficulty interval.
- the lower limit value setting module includes:
- the first minimum difficulty value module is used for taking the difficulty with the smallest numerical value in the second exercise as the lower limit value of the difficulty interval if the second exercise is a non-empty set;
- the second minimum difficulty value module is configured to, if the second exercise is an empty set, take the difficulty with the smallest numerical value in the first exercise as the lower limit value of the difficulty interval.
- the difficulty condition setting module 604 further includes:
- a difference calculation module for calculating the difference between the upper limit value and the lower limit value
- a valid determination module configured to determine that the difficulty interval is valid if the difference is greater than a preset first threshold
- a difficulty interval adjustment module configured to increase the upper limit value and/or decrease the lower limit value if the difference is less than or equal to the preset first threshold, so that the difference becomes greater than the first threshold.
- the first condition includes real-time target difficulty;
- the difficulty condition setting module 604 includes:
- the historical parameter extraction module is used to extract the historical target difficulty and the evaluation index from the answering behavior data, where the historical target difficulty is a statistic of the difficulty of the electronic exercises matching the learning task that the user has answered, and the evaluation index evaluates the user's performance on those electronic exercises;
- the historical target difficulty adjustment module is used to adjust the historical target difficulty with reference to the evaluation index to obtain the real-time target difficulty.
- the historical target difficulty adjustment module includes:
- a historical target difficulty increasing module configured to increase the historical target difficulty as a real-time target difficulty if the evaluation index is greater than a preset second threshold
- a historical target difficulty reduction module configured to reduce the historical target difficulty as a real-time target difficulty if the evaluation index is less than a preset second threshold.
- the historical target difficulty increasing module includes:
- the step size increasing module is used to calculate the sum of the historical target difficulty and a preset first step size as the real-time target difficulty.
- the historical target difficulty reduction module includes:
- the step size reduction module is used to calculate the difference between the historical target difficulty and the preset second step size as the real-time target difficulty.
- the target exercise selection module 605 includes:
- the conditional planning module is configured to take the target exercises, which are electronic exercises selected from the respective exercise sets, as variables, and to plan them so that their difficulty satisfies the first condition and their number satisfies the preset second condition;
- the variable solving module is configured to solve for the target exercises when their number is restricted to integers.
- the difficulty of the target exercise meeting the first condition includes at least one of the following:
- the difficulty of each single target exercise lies within the difficulty interval
- the statistical value of the difficulty of all the target exercises is greater than or equal to the real-time target difficulty
- the statistical value of the difficulty of all the target exercises is less than or equal to the real-time target difficulty.
- the number of the target exercises satisfying the preset second condition includes at least one of the following:
- the number of all the target exercises is less than or equal to a preset fourth threshold
- the number of the target exercises extracted in each of the exercise sets is less than or equal to a preset fifth threshold.
- the question selection device provided by the embodiments of the present application can execute the question selection method provided by any embodiment of the present application, and has the functional modules and beneficial effects corresponding to the executed method.
- FIG. 7 is a schematic structural diagram of a computer device according to Embodiment 4 of the present application.
- FIG. 7 shows a block diagram of an exemplary computer device 12 suitable for use in implementing embodiments of the present application.
- the computer device 12 shown in FIG. 7 is only an example, and should not impose any limitations on the functions and scope of use of the embodiments of the present application.
- computer device 12 takes the form of a general-purpose computing device.
- Components of computer device 12 may include, but are not limited to, one or more processors or processing units 16 , system memory 28 , and a bus 18 connecting various system components including system memory 28 and processing unit 16 .
- Bus 18 represents one or more of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus structures.
- these architectures include, but are not limited to, the Industry Standard Architecture (ISA) bus, the Micro Channel Architecture (MCA) bus, the Enhanced ISA bus, the Video Electronics Standards Association (VESA) local bus, and the Peripheral Component Interconnect (PCI) bus.
- Computer device 12 typically includes a variety of computer system readable media. These media can be any available media that can be accessed by computer device 12, including both volatile and nonvolatile media, removable and non-removable media.
- System memory 28 may include computer system readable media in the form of volatile memory, such as random access memory (RAM) 30 and/or cache memory 32 .
- Computer device 12 may further include other removable/non-removable, volatile/non-volatile computer system storage media.
- storage system 34 may be used to read from and write to non-removable, non-volatile magnetic media (not shown in FIG. 7, commonly referred to as a "hard drive").
- a magnetic disk drive for reading from and writing to a removable non-volatile magnetic disk (e.g., a "floppy disk"), and an optical disk drive for reading from and writing to a removable non-volatile optical disk (e.g., a CD-ROM, DVD-ROM, or other optical media), may be provided.
- each drive may be connected to bus 18 through one or more data media interfaces.
- Memory 28 may include at least one program product having a set (eg, at least one) of program modules configured to perform the functions of various embodiments of the present application.
- a program/utility 40 having a set (at least one) of program modules 42, which may be stored, for example, in memory 28, such program modules 42 including, but not limited to, an operating system, one or more application programs, other program modules, and program data , each or some combination of these examples may include an implementation of a network environment.
- Program modules 42 generally perform the functions and/or methods of the embodiments described herein.
- Computer device 12 may also communicate with one or more external devices 14 (eg, keyboard, pointing device, display 24, etc.), may also communicate with one or more devices that enable a user to interact with computer device 12, and/or communicate with Any device (eg, network card, modem, etc.) that enables the computer device 12 to communicate with one or more other computing devices. Such communication may take place through input/output (I/O) interface 22 . Also, the computer device 12 may communicate with one or more networks (eg, a local area network (LAN), a wide area network (WAN), and/or a public network such as the Internet) through a network adapter 20 . As shown, network adapter 20 communicates with other modules of computer device 12 via bus 18 . It should be understood that, although not shown, other hardware and/or software modules may be used in conjunction with computer device 12, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives and data backup storage systems.
- the processing unit 16 executes various functional applications and data processing by running the programs stored in the system memory 28, for example, implementing the question selection method provided by the embodiments of the present application.
- the fifth embodiment of the present application further provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements each process of the above question selection method and achieves the same technical effect; to avoid repetition, the details are not repeated here.
- the computer-readable storage medium may include, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus or device, or any combination of the above. More specific examples (a non-exhaustive list) of computer readable storage media include: electrical connections having one or more wires, portable computer disks, hard disks, random access memory (RAM), read only memory (ROM), Erasable programmable read only memory (EPROM or flash memory), optical fiber, portable compact disk read only memory (CD-ROM), optical storage devices, magnetic storage devices, or any suitable combination of the foregoing.
- a computer-readable storage medium can be any tangible medium that contains or stores a program that can be used by or in conjunction with an instruction execution system, apparatus, or device.
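The difficulty-condition modules listed above (upper and lower limit setting, interval widening against the first threshold, and step-size adjustment of the target difficulty) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the applicant's implementation: the function names, the symmetric widening, and the `margin` parameter are hypothetical, and leaving the difficulty unchanged when the evaluation index exactly equals the second threshold is an assumption the text does not settle.

```python
from typing import Sequence, Tuple


def difficulty_interval(
    correct: Sequence[float],
    incorrect: Sequence[float],
    first_threshold: float,
    margin: float = 0.5,  # hypothetical extra width beyond the first threshold
) -> Tuple[float, float]:
    """Return (lower, upper) of the difficulty interval (first condition)."""
    if not correct:
        raise ValueError("at least one correctly answered exercise is required")
    upper = max(correct)  # largest difficulty among the first exercises
    # Smallest difficulty among the second exercises, or among the first
    # exercises if the second exercises form an empty set.
    lower = min(incorrect) if incorrect else min(correct)
    diff = upper - lower
    if diff <= first_threshold:
        # Assumption: widen both ends equally so the new width exceeds the threshold.
        pad = (first_threshold - diff) / 2 + margin / 2
        upper += pad
        lower -= pad
    return lower, upper


def realtime_target_difficulty(
    historical: float,
    evaluation: float,
    second_threshold: float,
    step_up: float,
    step_down: float,
) -> float:
    """Adjust the historical target difficulty by a preset step size."""
    if evaluation > second_threshold:
        return historical + step_up
    if evaluation < second_threshold:
        return historical - step_down
    return historical  # assumption: unchanged when equal to the threshold


print(difficulty_interval([2, 3, 4], [3, 5], first_threshold=1.0))
```

For correct answers at difficulties 2, 3, 4 and wrong answers at 3 and 5, the raw interval [3, 4] is only as wide as the threshold, so it is widened to (2.75, 4.25).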
Landscapes
- Engineering & Computer Science (AREA)
- Business, Economics & Management (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Educational Administration (AREA)
- Educational Technology (AREA)
- General Physics & Mathematics (AREA)
- Tourism & Hospitality (AREA)
- Strategic Management (AREA)
- Health & Medical Sciences (AREA)
- Marketing (AREA)
- Primary Health Care (AREA)
- Economics (AREA)
- General Business, Economics & Management (AREA)
- General Health & Medical Sciences (AREA)
- Human Resources & Organizations (AREA)
- Databases & Information Systems (AREA)
- Data Mining & Analysis (AREA)
- General Engineering & Computer Science (AREA)
- Electrically Operated Instructional Devices (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
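A question selection method and device, a computer device, and a storage medium. The method includes: determining a learning task of a user (101); determining multiple exercise sets whose contents are related to the learning task (102); acquiring behavior data recorded when the user answers electronic exercises whose contents are related to the learning task (103); setting a first condition in the dimension of difficulty according to the behavior data (104); and selecting electronic exercises for the user from the multiple exercise sets respectively as target exercises (105), where the difficulty of the target exercises satisfies the first condition and the number of the target exercises satisfies a preset second condition.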
Description
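This application claims priority to Chinese Patent Application No. 202110181881.1, filed with the Chinese Patent Office on February 9, 2021, the entire contents of which are incorporated herein by reference.
The embodiments of the present application relate to the technical field of information processing, and in particular to a question selection method and device, a computer device, and a storage medium.
With the rapid development of educational informatization, many educational materials have been turned into electronic resources. Electronic resources offer higher timeliness, larger volume, and broader coverage, and are provided to users online by education platforms, which has made online education widespread.
To support online education, supplementary books, test papers, and practice questions are converted into electronic resources to generate electronic exercises; that is, according to factors such as subject category and knowledge structure, candidate test questions and reference examples are provided for specific subject knowledge, learning-effect assessment, and skill testing.
To support users' online learning, education platforms can draw on massive online question banks and rich learning-status data to select suitable electronic exercises in a more targeted way according to each user's learning level, achieving personalized ("a thousand faces for a thousand people") exercise recommendation.
Current methods for selecting electronic exercises may return many duplicate or similar electronic exercises, so that students spend time answering repetitive exercises and learn inefficiently.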
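Summary of the Invention
The embodiments of the present application provide a question selection method and device, a computer device, and a storage medium, to solve the problem that selecting repetitive electronic exercises within limited learning time lowers the user's learning efficiency.
In a first aspect, an embodiment of the present application provides a question selection method, including:
determining a learning task of a user;
determining multiple exercise sets whose contents are related to the learning task, each exercise set containing multiple electronic exercises with the same or similar content;
acquiring behavior data recorded when the user answers electronic exercises whose contents are related to the learning task;
setting a first condition in the dimension of difficulty according to the behavior data;
selecting electronic exercises for the user from the multiple exercise sets respectively as target exercises, where the difficulty of the target exercises satisfies the first condition and the number of the target exercises satisfies a preset second condition.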
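In a second aspect, an embodiment of the present application further provides a question selection device, including:
a learning task determination module, configured to determine a learning task of a user;
an exercise set determination module, configured to determine multiple exercise sets whose contents are related to the learning task, each exercise set containing multiple electronic exercises with the same or similar content;
a behavior data acquisition module, configured to acquire behavior data recorded when the user answers electronic exercises whose contents are related to the learning task;
a difficulty condition setting module, configured to set a first condition in the dimension of difficulty according to the behavior data;
a target exercise selection module, configured to select electronic exercises for the user from the multiple exercise sets respectively as target exercises, where the difficulty of the target exercises satisfies the first condition and the number of the target exercises satisfies a preset second condition.
In a third aspect, an embodiment of the present application further provides a computer device, including:
one or more processors; and
a memory, configured to store one or more programs;
where, when the one or more programs are executed by the one or more processors, the one or more processors implement the question selection method according to the first aspect.
In a fourth aspect, an embodiment of the present application further provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the question selection method according to the first aspect.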
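In this embodiment, a learning task is matched with multiple exercise sets, each containing multiple electronic exercises with the same or similar content. The exercise sets are generated mainly from the content of the electronic exercises themselves and do not depend on user interaction behavior, so the approach applies to question banks with tens of millions of exercises. This guarantees feasibility for large-scale question banks and avoids the problem that, because most exercises in a large bank are never exposed to users, similarity computed from sparse user interactions cannot be applied at that scale. The method determines the user's learning task, determines multiple exercise sets whose contents are related to the learning task, acquires the behavior data recorded when the user answers related electronic exercises, sets a first condition in the dimension of difficulty according to the behavior data, and selects electronic exercises for the user from the multiple exercise sets respectively as target exercises, whose difficulty satisfies the first condition and whose number satisfies a preset second condition. Selecting exercises from each exercise set under constraints on both difficulty and quantity not only increases the exposure of different types of electronic exercises but also adapts the selected exercises to the user's learning level, making the combination of exercises more reasonable and globally optimal and reducing repeated selection of the same or similar exercises. The user is thus exposed to more typical exercises that match his or her learning level within limited time, which improves learning efficiency.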
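FIG. 1 is a flowchart of a question selection method provided in Embodiment 1 of the present application;
FIG. 2 is a flowchart of a question selection method provided in Embodiment 2 of the present application;
FIG. 3 is an architecture diagram for selecting target exercises provided in Embodiment 2 of the present application;
FIG. 4 is an architecture diagram for clustering electronic exercises provided in Embodiment 2 of the present application;
FIG. 5A is an example diagram of an electronic exercise provided in Embodiment 2 of the present application;
FIG. 5B is an example diagram of another electronic exercise provided in Embodiment 2 of the present application;
FIG. 5C is an example diagram of another electronic exercise provided in Embodiment 2 of the present application;
FIG. 5D is an example diagram of another electronic exercise provided in Embodiment 2 of the present application;
FIG. 6 is a schematic structural diagram of a question selection device provided in Embodiment 3 of the present application;
FIG. 7 is a schematic structural diagram of a computer device provided in Embodiment 4 of the present application.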
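Exemplary embodiments are described in detail here, with examples shown in the accompanying drawings. When the following description refers to the drawings, unless otherwise indicated, the same numbers in different drawings denote the same or similar elements. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present application; rather, they are merely examples of devices and methods consistent with some aspects of the present application as detailed in the appended claims.
The terms used in the present application are only for the purpose of describing particular embodiments and are not intended to limit the present application. The singular forms "a", "said", and "the" used in the present application and the appended claims are also intended to include the plural forms, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and includes any or all possible combinations of one or more of the associated listed items.
It should be understood that although the terms first, second, third, and so on may be used in the present application to describe various information, the information should not be limited by these terms. These terms are only used to distinguish information of the same type from each other. For example, without departing from the scope of the present application, first information may also be referred to as second information, and similarly, second information may also be referred to as first information. Depending on the context, the word "if" as used herein may be interpreted as "when", "while", or "in response to determining".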
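In teaching, electronic exercises are an important learning resource that helps users consolidate, review, and test what they have learned. The question bank of a teaching platform is generally large, possibly on the order of tens of millions of exercises, which makes selecting exercises from it for users time-consuming and laborious. At present, exercises are usually selected for users by some method that considers factors such as the solution methods and knowledge points involved and the novelty of the question type. The main methods are:
1. Rule-based methods
In this method, factors such as the user's learning status and the popularity of electronic exercises are considered, and exercises are selected for the user by rules or by linear weighting.
However, this method uses the popularity of exercises but ignores the intrinsic relationships among the exercises themselves. Several exercises with the same or similar content may all be popular, and if they suit a user's learning status, they may all be selected for that user at the same time. Since the user's learning time is limited, repeatedly answering exercises with the same or similar content lowers learning efficiency.
2. Cognitive-diagnosis-based methods
In this method, the user's mastery of each knowledge point is diagnosed, and exercises of suitable difficulty are selected for the user.
Cognitive diagnosis captures the difficulty suitable for the user more accurately, but the method mainly considers the user's learning level and ignores the information in the exercises themselves, which is one of the important factors in the business requirement of selecting exercises. As with rule-based methods, cognitive-diagnosis-based methods are likely to select exercises of similar difficulty that also have the same or similar content, and therefore also suffer from low efficiency.
3. Collaborative-filtering-based methods
In this method, other users whose learning behavior and learning status resemble the current user's are found, and the exercises those users answered poorly are selected for the current user.
Collaborative filtering also considers only the users' own learning status and likewise makes no use of the information in the exercises themselves, so it still tends to select many exercises with the same or similar content, resulting in low efficiency.
Moreover, collaborative filtering relies on the similarity between users, but two similar users do not necessarily have the same weak knowledge points; and even if they do, an exercise that one user answered incorrectly will not necessarily be answered incorrectly by the other. User similarity is at most one dimension of exercise selection, and exercises selected on that basis may deviate somewhat from the expected effect.
4. Content-based methods
In this method, the similarity between exercises is computed exercise by exercise from how users answered them, and this similarity is used to assist selection; that is, exercises similar to those the user previously answered incorrectly are selected for the user.
This method relies on users' answering records to find similarities in user behavior, so its applicable scenarios are relatively limited and it covers only a small number of exercises, typically in classroom assignments. In a real question bank with tens of millions of exercises, not every exercise will be answered by users, so the correlation between most exercises cannot be computed from user behavior. Meanwhile, the exercises themselves contain information in multiple modalities, such as text data, image data, and formula data, which is hardly used; that is, the method still makes little use of exercise content.
In summary, these methods mainly consider the user's learning level and much less the information in the exercises themselves, let alone the combination of the two. The user's learning level affects the difficulty of the exercises to select, while the exercise information affects the number of exercises (that is, the number of exercises with the same or similar content and the number of exercises with different content). Difficulty and number, as features for measuring practice and testing with electronic exercises, are closely tied to the business requirement of selecting exercises, and there is an optimal combination of exercises with respect to both.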
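Embodiment 1
FIG. 1 is a flowchart of a question selection method provided in Embodiment 1 of the present application. This embodiment is applicable to selecting electronic exercises based on a combination of factors such as typicality, difficulty, and quantity. The method may be performed by a question selection device, which may be implemented in software and/or hardware and configured in a computer device of an education platform, such as a server, a workstation, or a personal computer.
In general, because electronic exercises are numerous (up to tens of millions) and occupy enormous storage, and because the selection logic is complex and occasionally updated, the computer device acts as a server that maintains the selection logic and provides exercise recommendation services to users, which saves storage resources and reduces client updates. The user logs in to a client, which receives the electronic exercises pushed by the server and displays them for the user to answer and practice.
It should be noted that the server may provide the exercise recommendation service for the current user as well as for other users, which is not limited in this embodiment.
For example, in an education scenario, users include teachers and students. On the one hand, a teacher may log in to the client and, based on the students' learning status, select some or all students and ask the server to select suitable exercises for them; the server pushes the exercises to the clients those students are logged in to, and the students answer and practice. On the other hand, a student may log in to the client and ask the server to select suitable exercises; the server pushes the exercises to that student's client for the student to answer and practice.
Of course, in some business scenarios, such as practicing exercises for a given subject at a given grade or exercises for professional examinations, the number of exercises may drop to the order of one hundred thousand. In that case, the client may download the exercises from the server, maintain the selection logic, and provide the recommendation service itself, so that the user can still answer and practice offline. This is not limited in this embodiment.
As shown in FIG. 1, the method specifically includes the following steps: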
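Step 101: determine the learning task of the user.
On the electronic device used by the user, such as a learning device or a mobile terminal (e.g., a mobile phone, tablet computer, or personal digital assistant), whose operating system may be Android, iOS, Windows, or the like, an application supporting answering electronic exercises may be installed. The application may be a client that independently provides learning services; a functional module providing learning services inside another client (such as an SDK (Software Development Kit)), e.g., an instant messaging tool or an industry work client; or a client with a browsing component, such as a browser or an application configured with a browsing component (such as WebView). This is not limited in this embodiment.
The user may log in to the application and is then represented by identity data. If the user has not logged in, temporary identity data may be provided and bound to the device identifier, and temporary identity data bound to the same device identifier may be merged. If the temporary user later registers and logs in, the temporary identity data can be converted into formal identity data.
The client may provide a UI (User Interface) on which the user triggers an operation to practice electronic exercises for a specified learning task, for example by tapping a learning task to study it, tapping a learning task to take a test, or refreshing the exercises under a learning task, thereby specifying the learning task whose exercises are to be practiced.
If this embodiment is applied to the server, the client sends the operation to the server, and upon receiving it the server starts the exercise selection logic (that is, performs steps 102 to 105).
If this embodiment is applied to the client, upon receiving the operation on the UI, the client starts the exercise selection logic (that is, performs steps 102 to 105).
Further, a learning task is a task the user studies, and its meaning differs across business scenarios. For example, in student scenarios such as K12 (kindergarten through twelfth grade) and university, a learning task may be a chapter of a subject (for example, Chapter 1 "Concepts of Sets and Functions" of High School Mathematics Compulsory Book 1 contains sections such as "Meaning and Representation of Sets", "Basic Relations Between Sets", and "Basic Operations on Sets"), a topical knowledge point (such as classical poetry appreciation or correcting faulty sentences in Chinese, or the past tense in English), or a grade stage (such as the Grade 3 midterm or the Grade 4 final). In industry learning scenarios such as civil engineering, law, and patents, a learning task may be a knowledge point of that industry, such as the composition of building structures or the structural forms of multi-story and high-rise buildings in civil engineering, legal provisions in law, or application procedures in patents.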
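Step 102: determine multiple exercise sets whose contents are related to the learning task.
In this embodiment, multiple matching exercise sets may be set up in advance for the learning task. "Matching" may mean that the content is related to the learning task, that is, the exercise sets are set up for the learning task with at least one of learning, practice, and testing as the goal. Each exercise set contains multiple electronic exercises with the same or similar content.
"Matching" may also mean that the exercises in the exercise set are designed for the knowledge in the learning task, help the user learn that knowledge, and do not go beyond its scope.
Exercises collected in the same exercise set have the same or similar content; they may be called a series of electronic exercises and are typical exercises.
These exercise sets are associated with the learning task, and the exercise sets and associations are stored in a question bank, that is, a database storing electronic exercises. Once the learning task the user currently wants to practice is determined, the associations in the question bank can be traversed to identify the multiple exercise sets matching that learning task.
General personalized exercise selection methods consider the user's learning level but not the content of the exercises themselves, and computing similarity from user interactions cannot handle similarity computation in a question bank with tens of millions of exercises. Therefore, in this embodiment, the content of the exercises is considered, and exercises with the same or similar content are aggregated into the same exercise set, which scales to question banks of tens of millions of exercises.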
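Step 103: acquire the behavior data recorded when the user answers electronic exercises whose contents are related to the learning task.
For an education platform, the server can record users' online learning behaviors in files such as logs. These behaviors include answering the exercises matching each learning task, for example, which exercises the user answered, the answering results (such as correct, incorrect, or a score), and the answering time. Such behavior data let the education platform keep richer and more complete learning-status data. Here, "matching" may mean that the content is related to the learning task, that is, the exercises are set up for the learning task with at least one of learning, practice, and testing as the goal.
For the same user, the data about answering exercises matching each learning task can be extracted from files such as logs and partitioned by learning task, forming the user's behavior data for each learning task, which is stored in the server's database.
Once the learning task the user currently wants to practice is determined, the user's behavior data for answering exercises matching that learning task can be looked up in the database.
In addition, if this embodiment is applied to the client, the server may push the current user's behavior data to the client, which stores it locally and loads it when starting the exercise selection logic; alternatively, the client may locally record the user's learning behaviors to form the behavior data for the learning task. This is not limited in this embodiment.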
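Step 104: set a first condition in the dimension of difficulty according to the behavior data.
In this embodiment, a difficulty may be configured in advance for each electronic exercise, indicating how hard it is to answer the exercise correctly. Generally, the higher the difficulty, the harder it is to answer correctly; the lower the difficulty, the easier.
Further, the difficulty may be labeled manually by educators according to their judgment after reviewing the exercise, or labeled through big-data learning, which is not limited in this embodiment.
In one example of difficulty levels, some exercises are labeled with a difficulty such as 1, 2, 3, 4, or 5 and some are not; a larger value means higher difficulty and a smaller value means lower difficulty.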
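For exercises without a difficulty label: although exercises come in many styles, many exercises under the same knowledge point are essentially the same; some differ only in numerical values, others only in background, and they are solved with the same method. Therefore, the similarity between exercises under the same knowledge point can be compared, and the difficulty labeled on an exercise can be assigned to similar exercises that have no difficulty label.
For exercises with a difficulty label: these labels reflect the judgment of one or more educators, whose criteria and difficulty scales may differ from those of users, so the labels may contain errors, and exercises of the same type, or even the same exercise, may receive noticeably different labels. Therefore, the similarity between exercises under the same knowledge point can be compared, and the difficulty of similar exercises can be corrected based on the difficulty labeled on certain exercises, so that highly similar exercises have the same or close initial difficulty labels. This makes the difficulty scale more uniform and standardized and thus improves the accuracy of subsequent exercise selection.
For example, if the difficulty is a discrete value, the most frequent value among the difficulty of the current exercise and the difficulties of its similar exercises is taken as their corrected difficulty.
For another example, if the difficulty is a continuous value, the difficulty of the current exercise and those of its similar exercises are linearly fused (e.g., weighted and then summed) to give their corrected difficulty.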
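The two correction rules above (most frequent value for discrete difficulties, weighted fusion for continuous ones) can be sketched in Python as follows. This assumes the similar exercises have already been found by some similarity comparison, which is not shown; the function names are hypothetical, tie-breaking in favor of the current exercise's label is an assumption, and the weights are normalized so the fused value stays on the original scale (the text only says "weighted and then summed").

```python
from collections import Counter
from typing import Sequence


def corrected_discrete_difficulty(current: int, similar: Sequence[int]) -> int:
    """Most frequent difficulty among the current exercise and its similar ones."""
    counts = Counter([current, *similar])
    # Counter.most_common orders equal counts by first insertion, so on a tie
    # the current exercise's own label is kept (an assumption).
    return counts.most_common(1)[0][0]


def corrected_continuous_difficulty(
    values: Sequence[float], weights: Sequence[float]
) -> float:
    """Weighted linear fusion of difficulties, with weights normalized to sum to 1."""
    if not values or len(values) != len(weights):
        raise ValueError("values and weights must be non-empty and equally long")
    total = sum(weights)
    return sum(v * w for v, w in zip(values, weights)) / total


print(corrected_discrete_difficulty(3, [2, 3, 4]))
print(corrected_continuous_difficulty([2.0, 4.0], [1.0, 3.0]))
```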
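In this embodiment, analyzing the behavior data recorded when the user answers exercises matching the learning task reflects, to some extent, how well the user has mastered the knowledge in that task. Based on this mastery and other educational needs, a first condition can be set in the dimension of difficulty according to the behavior data, and it is used to select electronic exercises.
For example, for a highly motivated user, or a user who has repeatedly answered relatively easy exercises, the difficulty of the exercises selected this time may be raised to serve the educational need of improving the user's level, exposing gaps in the user's knowledge of the learning task.
For another example, for a less motivated user, or a user who has repeatedly answered relatively difficult exercises, the difficulty of the exercises selected this time may be lowered to serve the educational need of managing the user's mood, so that the user does well this time and gains confidence.
Step 105: select electronic exercises for the user from the multiple exercise sets respectively as target exercises.
Even after the first condition is set on difficulty, the number of candidate exercises is still large. Therefore, in addition to the first condition on difficulty, a second condition can be set on quantity to limit the number of exercises finally selected.
In this embodiment, exercises may be selected for the user from the multiple exercise sets under both the difficulty and the quantity dimensions. "Respectively" means selecting from every exercise set as far as possible rather than concentrating on a few sets. For ease of distinction, the selected exercises are called target exercises.
The difficulty of the target exercises satisfies the first condition and their number satisfies the preset second condition; that is, the goal is to find a combination of exercises such that, within one set of questions, the user meets as many exercises suited to his or her learning level as possible while using as few exercises as possible.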
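As a rough illustration of choosing one combination under a difficulty condition and a quantity condition, the following Python sketch uses a simple greedy heuristic: for each exercise set it keeps exercises inside the difficulty interval, skips ones the user already answered, and takes those closest to the real-time target difficulty, subject to a per-set limit and a total limit. This is a swapped-in simplification, not the integer-programming or Hungarian-method solution the document describes; the names, the `(exercise_id, difficulty)` tuple layout, and the closeness-to-target ranking are all hypothetical.

```python
from typing import AbstractSet, List, Sequence, Tuple

Exercise = Tuple[str, float]  # hypothetical layout: (exercise id, difficulty)


def select_target_exercises(
    exercise_sets: Sequence[Sequence[Exercise]],
    interval: Tuple[float, float],
    target: float,
    max_total: int,  # stands in for the "fourth threshold" on the total count
    per_set_limit: int = 1,  # stands in for the "fifth threshold" per exercise set
    answered: AbstractSet[str] = frozenset(),
) -> List[str]:
    """Greedy selection: spread picks across sets, prefer difficulty near target."""
    lower, upper = interval
    selected: List[str] = []
    for exercises in exercise_sets:
        if len(selected) >= max_total:
            break
        # Keep exercises whose difficulty lies in the interval and that the
        # user has not answered before (deduplication).
        candidates = sorted(
            (abs(difficulty - target), ex_id)
            for ex_id, difficulty in exercises
            if lower <= difficulty <= upper and ex_id not in answered
        )
        room = min(per_set_limit, max_total - len(selected))
        selected.extend(ex_id for _, ex_id in candidates[:room])
    return selected


sets = [[("a1", 2.0), ("a2", 3.0)], [("b1", 5.0), ("b2", 3.5)], [("c1", 1.0)]]
print(select_target_exercises(sets, (2.0, 4.0), target=3.0, max_total=5))
```

A greedy pass is easy to read but gives no optimality guarantee; an exact 0-1 formulation would be needed to reproduce the "globally optimal" combination the text aims for.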
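In this embodiment, a learning task is matched with multiple exercise sets, each containing multiple electronic exercises with the same or similar content. The exercise sets are generated mainly from the content of the exercises themselves and do not depend on user interaction behavior, so the approach applies to question banks with tens of millions of exercises. This guarantees feasibility for large-scale question banks and avoids the problem that, because most exercises in a large bank are never exposed to users, similarity computed from sparse user interactions cannot be applied at that scale. The method determines the user's learning task, determines multiple exercise sets whose contents are related to the learning task, acquires the behavior data recorded when the user answers related exercises, sets a first condition in the dimension of difficulty according to the behavior data, and selects exercises for the user from the multiple exercise sets respectively as target exercises, whose difficulty satisfies the first condition and whose number satisfies a preset second condition. Selecting exercises from each exercise set under constraints on both difficulty and quantity not only increases the exposure of different types of exercises but also adapts the selected exercises to the user's learning level, making the combination more reasonable and globally optimal and reducing repeated selection of the same or similar exercises. The user is thus exposed to more typical exercises that match his or her learning level within limited time, which improves learning efficiency.
If rule-based, cognitive-diagnosis-based, collaborative-filtering-based, or content-based methods are used to select exercises, the repetition rate is high. To learn more efficiently, users may then search the question bank manually for suitable exercises; since the bank is large, manual selection is inefficient, and the resources spent on the earlier selection (such as the education platform's processor, memory, and bandwidth), the resources the user's device spends displaying the exercises, and the user's time spent answering them are all wasted.
With this embodiment, the repetition rate of the selected exercises is low and the user obtains the expected practice effect without searching the question bank manually, which avoids wasting the resources spent on selecting exercises (such as the education platform's processor, memory, and bandwidth), the resources the user's device spends displaying them, and the user's time spent answering them.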
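Embodiment 2
FIG. 2 is a flowchart of a question selection method provided in Embodiment 2 of the present application. Building on the foregoing embodiment, this embodiment further details the operations of generating exercise sets, setting the first condition, and selecting target exercises. The method specifically includes the following steps:
Step 201: determine the learning task of the user.
As shown in FIG. 3, step S301 is performed to determine the learning task the current user is to study, such as a chapter or a knowledge point.
Step 202: acquire electronic exercises whose contents are related to the learning task.
As shown in FIG. 3, step S302 may be performed in advance to configure exercise sets for the different learning tasks and store them in the question bank.
Generally, step S302 may be performed offline to cluster the exercises under each learning task into multiple exercise sets and store the associations between learning tasks and exercise sets in the question bank; step S303 is performed online to query the question bank for the exercise sets matching the learning task specified by the user.
When the question bank is clustered offline, as shown in FIG. 4, step S401 is performed to extract the exercises matching each learning task from the question bank.
As shown in FIG. 4, to improve the quality of the exercises and thus the accuracy and performance of the later clustering into exercise sets, step S402 may be performed to preprocess the exercises, for example by removing certain tags from them, filtering out exercises marked as erroneous, and filtering out exercises marked as duplicates.
Taking tag removal as an example of preprocessing: formula data in mathematics and character data in English are usually recorded in specific formats so that they display correctly on a page, such as LaTeX (an electronic typesetting system built on a low-level programming language), HTML (HyperText Markup Language), and MathML (Mathematical Markup Language), and recording in these formats produces tags.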
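For example, consider the formula for solving a quadratic equation in one variable, $x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}$. When it is recorded in MathML, the tag <math> marks the start of the document, the tag <mi> marks each identifier element (a variable, function name, constant, etc.) such as x, b, a, and c, the tag <mo> marks operator elements such as =, ±, and -, the tag <mfrac> marks $\frac{-b \pm \sqrt{b^2 - 4ac}}{2a}$ as a fraction, the tag <msup> marks $b^2$ as a superscript, and so on. When this formula is preprocessed, these tags can be removed.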
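A minimal sketch of the tag-removal step for MathML, using Python's standard `xml.etree.ElementTree` parser: it keeps only the text content of the elements and joins the tokens with spaces. The function name and the hand-written MathML for the quadratic formula are illustrative assumptions (real question banks may use namespaces, entities such as `&#x2212;`, or HTML wrappers that would need extra handling).

```python
import xml.etree.ElementTree as ET

# Hypothetical MathML encoding of x = (-b ± sqrt(b^2 - 4ac)) / (2a).
QUADRATIC_MATHML = (
    "<math><mi>x</mi><mo>=</mo><mfrac>"
    "<mrow><mo>-</mo><mi>b</mi><mo>±</mo>"
    "<msqrt><msup><mi>b</mi><mn>2</mn></msup><mo>-</mo>"
    "<mn>4</mn><mi>a</mi><mi>c</mi></msqrt></mrow>"
    "<mrow><mn>2</mn><mi>a</mi></mrow></mfrac></math>"
)


def strip_mathml_tags(mathml: str) -> str:
    """Drop all MathML tags and return the remaining tokens separated by spaces."""
    root = ET.fromstring(mathml)
    # itertext() yields the text of every element in document order.
    tokens = [text.strip() for text in root.itertext() if text.strip()]
    return " ".join(tokens)


print(strip_mathml_tags(QUADRATIC_MATHML))  # x = - b ± b 2 - 4 a c 2 a
```

Note that stripping tags also discards the fraction, root, and superscript structure; this is one reason the description later converts formula data into images rather than relying on the tag-free text alone.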
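Generally, an electronic exercise includes at least one of stem information, option information, and analysis information, where the analysis information may include the knowledge points (or test points), the answer, and the analysis (or detailed solution). In this embodiment, as shown in FIG. 4, step S403 may be performed to extract at least one of the stem information, option information, and analysis information from the exercise and use it for clustering.
Across subjects there are many question types, such as multiple-choice, true/false, fill-in-the-blank, and open-ended (also called problem-solving) questions, and the information an exercise contains differs by type. For example, as shown in FIG. 5A, a multiple-choice question usually includes stem information 511, option information 512, and analysis information 513; as shown in FIG. 5B, a true/false question usually includes stem information 521 and analysis information 522; as shown in FIG. 5C, a fill-in-the-blank question usually includes stem information 531 and analysis information 532; and as shown in FIG. 5D, an open-ended question usually includes stem information 541 and analysis information 542.
Step 203: divide the electronic exercise into multiple types of exercise information.
For exercises of different subjects and question types, the stem, option, and analysis information may each contain different types of exercise information, such as text data, formula data, and first image data.
For example, for multiple-choice, fill-in-the-blank, and open-ended mathematics questions, the stem, option, and analysis information may all contain text data, formula data, and first image data, where the first image data represents geometric figures, problem scenarios, statistical charts, function curves, and the like.
For another example, for multiple-choice, fill-in-the-blank, and open-ended chemistry questions, the stem, option, and analysis information may all contain text data and first image data, where the first image data represents chemical instruments, statistical charts, experimental procedures, and the like.
In this embodiment, as shown in FIG. 4, steps S4041, S4042, and S4043 may be performed respectively to divide the exercise by type (text data, formula data, first image data, and so on) into the corresponding exercise information, forming exercise information in multiple modalities.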
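Step 204: extract candidate feature information from each type of exercise information.
For each type (modality) of exercise information, a strategy corresponding to that type may be used to extract features from it as candidate feature information.
In one case, as shown in FIG. 4, if the type of exercise information is text data, step S4061 is performed to determine a language model, such as ERNIE 2.0 or BERT (Bidirectional Encoder Representations from Transformers), and step S4071 is performed to input the text data of each exercise into the language model and output a feature of a specified first length as the candidate feature information. For text data, the candidate feature information is usually a sentence vector, also called a text feature vector.
In natural language processing (NLP), a pre-trained model may be used as the language model to improve training efficiency. Pre-training means obtaining, through self-supervised learning on large-scale data (such as autoregressive language modeling and autoencoding), a model that is independent of the specific task (here, extracting features from exercise text); the model captures the semantic representation of a word in a general context and implicitly learns general syntactic and semantic knowledge.
In this embodiment, general text data (such as encyclopedia data) in different languages (such as Chinese and English) may be used as the corpus to pre-train the language model, so that it performs better across languages.
Because the distribution of exercise text may differ from that of general text (such as encyclopedia data), and taking advantage of the extensibility of pre-trained models, the text of some exercises may be used as a corpus to fine-tune the language model through self-supervised learning, adapting it to the specific task (extracting features from exercise text). That is, the exercise text serves as the label (tag); after operations such as sentence reordering and document rotation, the text is fed into the language model, which is trained to converge to that label, so that the trained model better fits the distribution of exercise text.
Of course, instead of pre-training, the language model may also be trained directly on exercise text as the corpus, which is not limited in this embodiment.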
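In another case, as shown in FIG. 4, if the type of exercise information is first image data, step S4062 is performed to determine a first image model, such as ResNet 50, VGG, or DenseNet, and step S4072 is performed to input the first image data into the first image model and output a feature of a specified second length as the candidate feature information. For first image data, the candidate feature information may also be called an image feature vector.
Because image models are highly general and only weakly tied to the specific task (extracting features from the first image data of exercises), a pre-trained model may be used directly as the first image model.
Of course, instead of a pre-trained model, the first image model may also be trained directly on the first image data of exercises as samples, which is not limited in this embodiment.
The first image data contains color (such as RGB (red, green, blue)) and is usually a three-dimensional tensor, and its candidate feature information is also a three-dimensional tensor, i.e., (width, height, color depth), (width, height, color_depth). To match the dimensions of the candidate feature information across types, the candidate feature information of the first image data can be reduced from three dimensions to one by a two-dimensional pooling operation (such as 2D max pooling, 2D average pooling, or 2D min pooling, i.e., sliding a window over the candidate feature information and taking the maximum, average, or minimum within the window).
For example, if the first image model is ResNet 50, inputting the first image data into it yields candidate feature information of dimension 7*7*2048, which becomes 1*1*2048 after 2D average pooling.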
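The 7*7*2048 to 1*1*2048 reduction above corresponds to pooling with a window that covers the whole spatial grid (global pooling). A minimal pure-Python sketch, assuming the feature map is given as nested lists in (height, width, channels) order; the function name and the `mode` switch are illustrative, not part of the source.

```python
from typing import Callable, Dict, List, Sequence

# Feature map as nested lists in (height, width, channels) order.
Tensor3D = Sequence[Sequence[Sequence[float]]]


def global_pool_2d(features: Tensor3D, mode: str = "avg") -> List[float]:
    """Collapse an H x W x C feature map into a length-C vector."""
    reducers: Dict[str, Callable[[List[float]], float]] = {
        "avg": lambda xs: sum(xs) / len(xs),
        "max": max,
        "min": min,
    }
    reduce = reducers[mode]
    height, width = len(features), len(features[0])
    channels = len(features[0][0])
    # For each channel, reduce over every spatial position (h, w).
    return [
        reduce([features[h][w][c] for h in range(height) for w in range(width)])
        for c in range(channels)
    ]


resnet_like = [[[1.0] * 2048 for _ in range(7)] for _ in range(7)]
print(len(global_pool_2d(resnet_like)))  # 2048
```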
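In addition, if the exercise contains one frame of first image data, the candidate feature information of that frame is taken as the exercise's candidate feature information for the first image data type.
If the exercise contains multiple (i.e., two or more) frames of first image data, the mean of their candidate feature information is computed in each dimension as the candidate feature information of the frames as a whole, which serves as the exercise's candidate feature information for the first image data type.
In yet another case, if the type of exercise information is formula data, it may be stored in different forms, such as LaTeX, HTML, or image data. As shown in FIG. 4, to make formula processing more general, step S405 is performed to convert the formula data into second image data, that is, the formula is rendered in memory according to its storage form to produce a bitmap, which is taken as the second image data.
As shown in FIG. 4, for the second image data, step S4063 may be performed to determine a second image model, which may be the same as or different from the first image model, such as ResNet 50, VGG, or DenseNet. The second image model may be a pre-trained model, or may be trained directly using second image data recording formula data as samples, which is not limited in this embodiment.
As shown in FIG. 4, step S4073 is performed to input the second image data into the second image model and output a feature of a specified third length as the candidate feature information of the formula data. For second image data, the candidate feature information may also be called an image feature vector.
The second image data contains color and is usually a three-dimensional tensor, and its candidate feature information is also a three-dimensional tensor, i.e., (width, height, color depth). To match the dimensions of the candidate feature information across types, the candidate feature information of the formula data can be reduced from three dimensions to one by a two-dimensional pooling operation (such as 2D max pooling, 2D average pooling, or 2D min pooling, i.e., sliding a window over the candidate feature information and taking the maximum, average, or minimum within the window).
In addition, if the exercise contains one formula, the candidate feature information of that formula is taken as the exercise's candidate feature information for the formula data type.
If the exercise contains multiple (i.e., two or more) formulas, the mean of their candidate feature information is computed in each dimension as the candidate feature information of the formulas as a whole, which serves as the exercise's candidate feature information for the formula data type.
Of course, the above candidate feature information and extraction methods are only examples. When implementing this embodiment, other candidate feature information and extraction methods may be set according to actual circumstances, which is not limited in this embodiment. In addition, those skilled in the art may adopt other candidate feature information and extraction methods according to actual needs, which is also not limited in the embodiments of the present application.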
在某些情况下,电子习题可能会缺失一种或多种类型的习题信息,即存在一种或多种类型的习题信息为空的情况。
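For example, consider Chinese-language multiple-choice, fill-in-the-blank, and short-answer questions. Some may contain text data, first image data, and formula data, with no exercise information missing. Some may contain text data and first image data but lack formula data. Others may contain only text data, lacking both formula data and first image data.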
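The types of exercise information present in an electronic exercise are compared with the standard types. If the electronic exercise lacks a certain type, that is, that type of exercise information is empty, the candidate feature information for that type can be set to a specified value, such as 0. This gives every target feature of the same dimension, which facilitates clustering.
Step 205: Concatenate the candidate feature information into target feature information.
In this embodiment, as shown in FIG. 4, step S408 is executed to concatenate the candidate feature information of each type end to end in a preset order. The result is target feature information that represents the features of the electronic exercise as a whole.
If the types of the electronic exercise include text data, first image data, and formula data, their three sets of candidate feature information can be concatenated end to end into target feature information in the preset order.
For example, the candidate feature information of the text data comes first, that of the first image data second, and that of the formula data third.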
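The zero-filling of empty exercise information and the fixed-order concatenation can be combined in one helper. The sketch below uses the order from the example above. The text does not specify the per-type lengths (it refers only to a "specified second/third length"), so the small sizes in `MODALITY_DIMS` and all names are hypothetical.

```python
from typing import Dict, List, Optional

# Hypothetical sizes; the text only speaks of specified lengths per type.
MODALITY_DIMS = {"text": 4, "image": 3, "formula": 3}
# Preset order from the example: text first, first image data second, formula third.
MODALITY_ORDER = ("text", "image", "formula")


def build_target_feature(
    candidates: Dict[str, Optional[List[float]]], fill_value: float = 0.0
) -> List[float]:
    """Concatenate per-type candidate features end to end in a fixed order.

    A missing (empty) type is replaced by a constant vector of the expected
    length, so every exercise ends up with a target feature of the same size.
    """
    target: List[float] = []
    for modality in MODALITY_ORDER:
        dim = MODALITY_DIMS[modality]
        vector = candidates.get(modality)
        if vector is None:
            vector = [fill_value] * dim
        elif len(vector) != dim:
            raise ValueError(f"{modality} feature has length {len(vector)}, expected {dim}")
        target.extend(vector)
    return target
```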
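Step 206: Cluster the electronic exercises into multiple clusters using the target feature information to obtain multiple exercise sets.
As shown in FIG. 4, step S409 is executed to cluster the electronic exercises using the target feature information. This information is composed of the candidate feature information of exercise information such as text data, first image data, and formula data. Clustering yields multiple clusters, each of which is one exercise set. Because the clustering takes features from multiple modalities and multiple aspects into account, the similarity between electronic exercises is measured more accurately. This improves the clustering and ensures that the electronic exercises in each exercise set are typical.
Clustering is an unsupervised learning method. When mining electronic exercises, it can be used to discover their distribution and hidden patterns. It requires no class labels or other prior knowledge about the individual samples (electronic exercises) in a batch. Samples are grouped by their features (the target feature information): a similarity measure is applied, and samples with identical or similar features are assigned to the same class. In other words, a collection of unlabeled data (electronic exercises) is automatically divided into several classes so that data in the same class have similar features.
In this embodiment, any of the following clustering approaches can be applied according to business requirements to cluster the electronic exercises into multiple clusters using the target feature information, where each cluster can be regarded as an exercise set: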
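1. Partition-based methods
Given a data set (of electronic exercises) with N tuples or records (N a positive integer), a partitioning method constructs K groups (K a positive integer, K < N). Each group represents one cluster. Examples include the K-MEANS, K-MEDOIDS, and CLARANS algorithms.
2. Hierarchy-based methods
The given data set (of electronic exercises) is decomposed hierarchically until some condition is met. These methods are further divided into "bottom-up" and "top-down" schemes. Examples include the BIRCH, CURE, and CHAMELEON algorithms.
3. Density-based methods
If the density of points (electronic exercises) in a region exceeds a threshold, the region is added to the nearby cluster. Examples include the DBSCAN, OPTICS, and DENCLUE algorithms.
4. Grid-based methods
The data space is divided into a grid structure with a finite number of cells, and all processing operates on individual cells. Examples include the STING, CLIQUE, and WAVE-CLUSTER algorithms.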
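To help those skilled in the art understand this application, this embodiment uses K-MEANS as an example clustering algorithm. The principle of K-MEANS is simple. Based on the distance (similarity) between samples, more similar samples are grouped into the same class (cluster), forming multiple clusters. Samples within a cluster are highly similar, and different clusters differ greatly. Clustering electronic exercises with K-MEANS is therefore computationally simple, easy to implement, and fast to converge. It works well even for tens of millions of electronic exercises, provided the clusters are dense and clearly separated from one another.
In this example, a value K is determined as the number of clusters, that is, the electronic exercises are expected to be clustered into K clusters.
In one case, the value K (the number of clusters) is set empirically.
In another case, the number of electronic exercises associated with the learning task can be queried in the question bank, and the number of clusters can be set based on it. The number of clusters and the number of electronic exercises then have a nonlinear positive correlation. Nonlinear means that the number of clusters does not grow in direct proportion to the number of electronic exercises. Positive correlation means that more electronic exercises yield more clusters, and fewer electronic exercises yield fewer clusters. This ensures that each kind of typical electronic exercise can form its own cluster, which ensures good clustering.
Of course, the above method of calculating the number of clusters is only an example. When implementing this embodiment, other methods can be set according to actual conditions or adopted by those skilled in the art according to actual needs. For example, the number of electronic exercises can be input into a nonlinear activation function (such as a growth function), and the output taken as the number of clusters. The embodiments of this application impose no limitation on this.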
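One concrete nonlinear positive mapping appears in the apparatus embodiment and in claim 7. There, the number of clusters is the square root of the product of the exercise count and a specified coefficient, rounded to an integer. The sketch below follows that rule. The coefficient value 0.5, rounding down, and clamping K to at least 1 are illustrative assumptions.

```python
import math


def cluster_count(num_exercises: int, coefficient: float = 0.5) -> int:
    """Number of clusters K = floor(sqrt(coefficient * num_exercises)), at least 1.

    sqrt grows more slowly than its input, which gives the nonlinear positive
    correlation described above. The default coefficient and the rounding-down
    behaviour are assumptions for illustration.
    """
    if num_exercises <= 0:
        raise ValueError("num_exercises must be positive")
    k = int(math.sqrt(coefficient * num_exercises))
    # Keep K within [1, num_exercises] so that K-means remains well defined.
    return max(1, min(k, num_exercises))
```

Quadrupling the number of exercises only doubles K. For example, 200 exercises give 10 clusters and 800 give 20.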
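Clusters are generated in the vector space according to the number of clusters. Each cluster has a center point (also called a centroid), whose initial value is selected at random.
The distance between each electronic exercise and each center point, such as the cosine distance or Euclidean distance, is computed using the target feature information.
The distances between the electronic exercise and all center points are compared.
The electronic exercise is assigned to the cluster whose center point is nearest to it.
In the current iteration, convergence of the clusters is judged by a criterion such as the sum of squared errors (SSE). A smaller SSE means that the data points are closer to their center points, which indicates better clustering.
If the clusters have converged, they are output as exercise sets.
If not, the center points are updated, for example by taking the mean of the target feature vectors of all electronic exercises in each cluster as its new center point. The process then returns to computing the distances between the electronic exercises and the center points using the target feature information, and the next iteration begins.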
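The iteration above can be sketched in plain Python. The sketch uses random initial centers, nearest-center assignment, an SSE-based convergence test, and mean updates. It uses squared Euclidean distance rather than cosine distance. It assumes that convergence means the SSE has stopped decreasing by more than `tol`. Both choices, and all names, are assumptions for illustration.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def squared_euclidean(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(
    points: List[Vector], k: int, rng: random.Random, max_iter: int = 100, tol: float = 1e-9
) -> Tuple[List[int], List[List[float]]]:
    """Cluster target feature vectors into k clusters; returns (labels, centers)."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    # Initial center points: k distinct exercises chosen at random.
    centers = [list(p) for p in rng.sample(points, k)]
    prev_sse = math.inf
    labels: List[int] = []
    for _ in range(max_iter):
        # Assign every exercise to the cluster whose center is nearest.
        labels = [
            min(range(k), key=lambda j: squared_euclidean(p, centers[j])) for p in points
        ]
        # Sum of squared errors of the current assignment.
        sse = sum(squared_euclidean(p, centers[j]) for p, j in zip(points, labels))
        if prev_sse - sse <= tol:
            break  # SSE no longer decreases: treat the clusters as converged
        prev_sse = sse
        # Move each center to the mean of its members (an empty cluster keeps its center).
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers
```

Each resulting cluster, given by the exercises that share a label, would become one exercise set. For cosine distance, the vectors could be L2-normalized before calling the function.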
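Step 207: Obtain the behavior data recorded when the user answered electronic exercises whose content is related to the learning task.
As shown in FIG. 3, step S304 is executed. For the learning task selected by the user, the database is queried for the behavior data recorded when the user previously answered electronic exercises associated with that learning task.
Step 208: Identify first exercises and second exercises from the answering behavior data.
In this embodiment, as shown in FIG. 3, step S305 is executed to set a first condition in the difficulty dimension. The first condition includes a difficulty interval. The interval constrains the difficulty of individual electronic exercises, so that exercises whose difficulty suits the user's learning level are selected.
To set the difficulty interval, first exercises and second exercises can be identified from the answering behavior data. First exercises are all electronic exercises the user has answered, whether correctly or incorrectly. Second exercises are the electronic exercises the user answered incorrectly, so the first exercises include the second exercises.
The difficulty interval should truly reflect the user's learning level and remain accurate. To this end, the answering behavior data from the user's most recent n practice sessions on the current learning task can be selected (n is a positive integer, optionally 1), and the first and second exercises identified from it. Alternatively, the answering behavior data from the user's practice on the current learning task within a recent period (such as the past month) can be used, and so on.
For true/false, multiple-choice, fill-in-the-blank, and similar question types, an electronic exercise is considered answered correctly if the user's answer matches the reference answer, and incorrectly otherwise.
For short-answer and similar question types, an electronic exercise is considered answered correctly if the ratio of the user's score to the standard (full) score is greater than or equal to a preset ratio (such as 0.5). It is considered answered incorrectly if that ratio is less than the preset ratio.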
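The two correctness rules can be expressed as a single predicate. In this sketch, the question-type labels in `OBJECTIVE_TYPES` are hypothetical. Exact string equality is a simplification, since a production system might normalize answers first. The default ratio 0.5 follows the example in the text.

```python
from typing import Optional

# Hypothetical labels for the objective question types named in the text.
OBJECTIVE_TYPES = {"true_false", "multiple_choice", "fill_in_blank"}


def is_answer_correct(
    question_type: str,
    user_answer: Optional[str] = None,
    reference_answer: Optional[str] = None,
    score: Optional[float] = None,
    full_score: Optional[float] = None,
    ratio_threshold: float = 0.5,
) -> bool:
    if question_type in OBJECTIVE_TYPES:
        # Objective questions: correct only when the answer matches the reference.
        return user_answer == reference_answer
    # Subjective questions (e.g. short answer): compare the score ratio with the threshold.
    if score is None or full_score is None or full_score <= 0:
        raise ValueError("subjective questions need a score and a positive full score")
    return score / full_score >= ratio_threshold
```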
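Step 209: Set the upper limit of the difficulty interval with reference to the difficulty of the first exercises.
The difficulty interval has an upper limit, which is its endpoint with the largest value: for a difficulty interval [a, b], b is the upper limit. The upper limit can be set with reference to the difficulty of the first exercises the user has answered.
Exemplarily, the difficulties of the first exercises can be compared, and the largest one taken as the upper limit of the difficulty interval.
Of course, the above method of setting the upper limit is only an example. When implementing the embodiments of this application, other methods can be set according to actual conditions or adopted by those skilled in the art according to actual needs. For example, the m largest difficulties among the first exercises can be averaged to give the upper limit. The embodiments of this application impose no limitation on this.
Step 210: Set the lower limit of the difficulty interval with reference to the difficulty of the second exercises.
The difficulty interval has a lower limit, which is its endpoint with the smallest value: for a difficulty interval [a, b], a is the lower limit. The lower limit can be set with reference to the difficulty of the second exercises, which the user answered incorrectly.
Exemplarily, it can be determined whether the set of second exercises is empty, that is, whether the user has previously answered any electronic exercise under the current learning task incorrectly.
If the set of second exercises is non-empty, the user has previously answered some electronic exercises under the current learning task incorrectly. In this case, the difficulties of the second exercises can be compared, and the smallest one taken as the lower limit of the difficulty interval.
If the set of second exercises is empty, the user has not previously answered any electronic exercise under the current learning task incorrectly. In this case, the difficulties of the first exercises can be compared, and the smallest one taken as the lower limit of the difficulty interval.
Of course, the above method of setting the lower limit is only an example. When implementing the embodiments of this application, other methods can be set according to actual conditions or adopted by those skilled in the art according to actual needs. For example, the m smallest difficulties among the first exercises can be averaged to give the lower limit. The embodiments of this application impose no limitation on this.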
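In addition, when the user has answered few electronic exercises (that is, the first exercises are sparse), the upper and lower limits of the difficulty interval may be barely distinguishable or even equal. The interval should stay wide enough to filter electronic exercises effectively. For this purpose, the difference between the upper limit and the lower limit can be computed and compared with a preset first threshold (such as 0.2).
If the difference is greater than the first threshold, the difficulty interval is determined to be valid and is left unchanged.
If the difference is less than or equal to the first threshold, the upper limit can be increased and/or the lower limit decreased, depending on the learning need, until the difference is greater than the first threshold.
For example, to expose more difficult electronic exercises, the upper limit of the difficulty interval can be increased by a specified first amount (such as 0.1). To raise answer accuracy, the lower limit can be decreased by a specified second amount (such as 0.1). After each increase of the upper limit and/or decrease of the lower limit, the difference is recomputed. The adjustment stops once the difference is greater than the first threshold.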
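Steps 209 and 210 and the widening rule can be combined as follows. This is a minimal sketch. It assumes that difficulties are plain floats, uses the example values 0.2 and 0.1 as defaults, and does not clamp the limits to any difficulty range, because the text does not give one. The function and parameter names are hypothetical.

```python
from typing import Sequence, Tuple


def difficulty_interval(
    answered: Sequence[float],
    wrong: Sequence[float],
    first_threshold: float = 0.2,
    first_amount: float = 0.1,
    second_amount: float = 0.1,
    widen_upper: bool = True,
    widen_lower: bool = True,
) -> Tuple[float, float]:
    """Return (lower, upper), i.e. [a, b], built from the user's answering history.

    answered: difficulties of all answered exercises (the first exercises).
    wrong: difficulties of incorrectly answered exercises (the second exercises).
    """
    if not answered:
        raise ValueError("need at least one answered exercise")
    if not (widen_upper or widen_lower) or first_amount <= 0 or second_amount <= 0:
        raise ValueError("at least one limit must be widened by a positive amount")
    upper = max(answered)
    # Fall back to the easiest answered exercise when there are no wrong answers.
    lower = min(wrong) if wrong else min(answered)
    # Widen step by step until the interval is wider than the first threshold.
    while upper - lower <= first_threshold:
        if widen_upper:
            upper += first_amount
        if widen_lower:
            lower -= second_amount
    return lower, upper
```

With answered difficulties 0.3, 0.5, and 0.7 and wrong answers at 0.5 and 0.7, the raw interval [0.5, 0.7] is too narrow. One widening step turns it into roughly [0.4, 0.8].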
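Step 211: Extract the historical target difficulty and the evaluation metric from the answering behavior data.
In this embodiment, as shown in FIG. 3, step S306 is executed to set a first condition in the difficulty dimension. This first condition includes a real-time target difficulty. It constrains the overall difficulty of the electronic exercises, expressed as a statistic such as the mean or a quantile, so that exercises whose difficulty suits the user's learning level are selected.
To set the real-time target difficulty, the historical target difficulty and the evaluation metric can be computed from the answering behavior data. The historical target difficulty is a statistic of the difficulty of the answered electronic exercises associated with the learning task. It is of the same type as the real-time target difficulty (such as the mean or a quantile). The evaluation metric evaluates the user's performance on those electronic exercises. It can be a positive metric, such as accuracy or the number of correct answers, or a negative metric, such as the error rate or the number of incorrect answers.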
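The real-time target difficulty should truly reflect the user's learning level and remain accurate. To this end, the answering behavior data from the user's most recent n practice sessions on the current learning task (such as n = 1) can be selected, and the historical target difficulty and evaluation metric computed from it. Alternatively, the answering behavior data from the user's practice on the current learning task within a recent period (such as the past month) can be used, and so on.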
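Step 212: Adjust the historical target difficulty with reference to the evaluation metric to obtain the real-time target difficulty.
Consider the same batch of electronic exercises under the same learning task. The historical target difficulty reflects their overall difficulty from the exercises' perspective, and the evaluation metric reflects the user's overall learning level from the user's perspective. According to business needs, when the user practices the electronic exercises of the same learning task this time, the historical target difficulty can be adjusted with reference to the evaluation metric to obtain the real-time target difficulty. The overall difficulty of the electronic exercises then better suits the user's overall learning level.
In a specific implementation, if the evaluation metric is a positive metric (such as accuracy), it can be compared with a preset second threshold.
If the evaluation metric (such as accuracy) is greater than the second threshold, the overall difficulty of the electronic exercises has not yet reached the user's learning level. The historical target difficulty can then be increased and used as the real-time target difficulty. This raises the difficulty of the electronic exercises in this session and improves the user's learning efficiency.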
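Exemplarily, the sum of the historical target difficulty and a preset first step can be computed as the real-time target difficulty:
$$d_t = d_{t-1} + g_1$$
where $d_t$ is the real-time target difficulty, $d_{t-1}$ is the historical target difficulty, and $g_1$ is the first step.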
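If the evaluation metric (such as accuracy) is less than the second threshold, the overall difficulty of the electronic exercises exceeds the user's overall learning level. The historical target difficulty can then be decreased and used as the real-time target difficulty. This lowers the difficulty of the electronic exercises in this session and improves the user's learning efficiency.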
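Exemplarily, the difference between the historical target difficulty and a preset second step can be computed as the real-time target difficulty:
$$d_t = d_{t-1} - g_2$$
where $d_t$ is the real-time target difficulty, $d_{t-1}$ is the historical target difficulty, and $g_2$ is the second step.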
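Further, the first step may be greater than, equal to, or less than the second step; this embodiment imposes no limitation on this.
In this embodiment, the first and second steps adjust the historical target difficulty in small increments. This reduces fluctuations in the overall difficulty caused by chance in the user's answers and thus reduces the impact on the user's answering. The overall difficulty of the electronic exercises gradually converges to the user's learning level.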
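Steps 211 and 212 can be sketched together. The sketch uses the mean as the historical statistic and accuracy as the positive evaluation metric, both of which the text lists as options. The text does not specify what happens when the metric equals the second threshold; the sketch keeps the difficulty unchanged in that case, which is an assumption. The record format and names are hypothetical.

```python
from statistics import mean
from typing import Sequence, Tuple


def history_stats(records: Sequence[Tuple[float, bool]]) -> Tuple[float, float]:
    """(historical target difficulty, accuracy) from (difficulty, correct) records."""
    if not records:
        raise ValueError("no answering records")
    difficulties = [d for d, _ in records]
    accuracy = sum(1 for _, ok in records if ok) / len(records)
    return mean(difficulties), accuracy


def realtime_target_difficulty(
    historical: float,
    metric: float,
    second_threshold: float,
    first_step: float,
    second_step: float,
) -> float:
    if metric > second_threshold:
        return historical + first_step  # d_t = d_{t-1} + g_1
    if metric < second_threshold:
        return historical - second_step  # d_t = d_{t-1} - g_2
    return historical  # equality is unspecified in the text; unchanged here (assumption)
```

With four past answers of mean difficulty 0.5 and accuracy 0.75, a second threshold of 0.6 and a first step of 0.05 raise the target to about 0.55.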
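Of course, the above methods of adjusting the historical target difficulty are only examples. When implementing the embodiments of this application, other methods can be set according to actual conditions or adopted by those skilled in the art according to actual needs. For example, if the evaluation metric (such as accuracy) is greater than the second threshold, the historical target difficulty can be multiplied by a specified factor greater than 1. If it is less than the second threshold, the historical target difficulty can be multiplied by a specified coefficient between 0 and 1. In either case, the product is used as the real-time target difficulty. The embodiments of this application impose no limitation on this.
Step 213: Taking the target exercises as variables, plan so that their difficulty satisfies the first condition and their number satisfies a preset second condition.
Step 214: Solve for the target exercises, with the number of target exercises constrained to be an integer.
In this embodiment, a second condition can also be set according to learning needs, in addition to the first condition in the difficulty dimension. As shown in FIG. 3, step S307 is executed to set the second condition in the quantity dimension. Step S308 is then executed to plan target exercises that satisfy the constraints in both dimensions. The target exercises are the electronic exercises selected from the respective exercise sets.
In one example, the difficulty of the target exercises satisfies the first condition in at least one of the following ways:
the difficulty of each individual target exercise lies within the difficulty interval;
when the evaluation metric is greater than the preset second threshold, the statistic of the difficulties of all target exercises is greater than or equal to the real-time target difficulty;
when the evaluation metric is less than or equal to the preset second threshold, the statistic of the difficulties of all target exercises is less than or equal to the real-time target difficulty.
In the first condition, the difficulty interval constrains the difficulty of individual electronic exercises, and the real-time target difficulty constrains the difficulty of the electronic exercises as a whole. Together, these two constraints ensure that the selected exercises suit the user's learning level both individually and as a whole. This improves the effect of practice and hence the user's learning efficiency.
In addition, the number of target exercises satisfies the preset second condition in at least one of the following ways:
the total number of target exercises is less than or equal to a preset fourth threshold;
the number of target exercises extracted from each exercise set is less than or equal to a preset fifth threshold.
In the second condition, a uniform fifth threshold, such as 1, can be set for every exercise set. In that case, at most one electronic exercise is selected from each exercise set as a typical question, which reduces computation. Alternatively, the fifth threshold can be set adaptively per exercise set. For example, the number of electronic exercises in an exercise set can be counted, and the fifth threshold set so that it has a nonlinear positive correlation with that count. An exercise set with more electronic exercises can, to some extent, be regarded as more important. Adaptively selecting more exercises from such a set exposes more of its content and sets priorities within a limited number of exercises, which improves the user's learning efficiency.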
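In this example, the first condition and the second condition can be expressed as follows:
$$
\begin{aligned}
\text{s.t.}\quad & x_i \in \{0, 1\}, && i = 1, \dots, n \\
& \sum_{i=1}^{n} x_i \le N \\
& \sum_{i \in t_j} x_i \le Q, && j = 1, \dots, M \\
& a \le d_i \le b, && \text{for all } i \text{ with } x_i = 1 \\
& \operatorname{stat}\{d_i : x_i = 1\} \ge d_t, && \text{if } acc_{\mathrm{before}} > c \\
& \operatorname{stat}\{d_i : x_i = 1\} \le d_t, && \text{if } acc_{\mathrm{before}} \le c
\end{aligned}
$$
Here, the M exercise sets contain n electronic exercises in total. The selection result (variable) for the i-th electronic exercise is $x_i$, and its difficulty is $d_i$. Selection is subject to (s.t.) the following conditions:
- $x_i = 0$ means that the i-th electronic exercise is not selected, and $x_i = 1$ means that it is selected.
- The number of electronic exercises selected from all exercise sets (the target exercises) is at most the fourth threshold $N$.
- The number selected from the j-th exercise set $t_j$ is at most the fifth threshold $Q$.
- The difficulty $d_i$ of each target exercise lies within the difficulty interval $[a, b]$.
- When the evaluation metric $acc_{\mathrm{before}}$ is greater than the second threshold $c$, the statistic of the difficulties of all target exercises is greater than or equal to the real-time target difficulty $d_t$. This statistic is written here as $\operatorname{stat}\{d_i : x_i = 1\}$, for example their mean.
- When $acc_{\mathrm{before}}$ is less than or equal to $c$, that statistic is less than or equal to $d_t$.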
Of course, the above first and second conditions are only examples. When implementing the embodiments of this application, other first and second conditions can be set according to actual conditions or adopted by those skilled in the art according to actual needs; the embodiments of this application impose no limitation on this.
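In this embodiment, selecting electronic exercises for the user is treated as an optimization problem. The goal is to plan an optimal combination of electronic exercises that satisfies both the first condition on difficulty and the second condition on quantity. As shown in FIG. 3, the target exercises are the variables of the program in step S308. Because the number of target exercises is constrained to be an integer rather than a fraction or decimal, the program is an integer program.
An integer program can be solved for its variables (the target exercises) by methods such as branch and bound, the cutting-plane method, implicit enumeration, the Hungarian method, or the Monte Carlo method. Solving means finding, in each exercise set, electronic exercises that satisfy the first condition and whose number satisfies the second condition; these become the target exercises.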
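The formulation above lists constraints but states no objective function. The sketch below therefore assumes one: select as many exercises as the caps allow, and break ties by the smallest gap between the mean difficulty and $d_t$. The mean is also assumed as the statistic. The search is plain exhaustive enumeration over combinations. It is not the implicit enumeration, branch-and-bound, or Hungarian method named above, and it is feasible only for small candidate pools. All names are hypothetical.

```python
from collections import Counter
from itertools import combinations
from statistics import mean
from typing import List, Optional, Sequence, Tuple


def select_exercises(
    difficulties: Sequence[float],
    set_of: Sequence[int],  # set_of[i] = index j of the exercise set t_j holding exercise i
    interval: Tuple[float, float],
    target: float,
    acc_before: float,
    second_threshold: float,
    max_total: int,  # fourth threshold N
    max_per_set: int,  # fifth threshold Q
) -> List[int]:
    """Return indices i with x_i = 1 for the best feasible selection (or [])."""
    a, b = interval
    # Individual constraint: only exercises inside [a, b] can ever be chosen.
    candidates = [i for i, d in enumerate(difficulties) if a <= d <= b]
    # Try the largest selections first (assumed objective: maximize the count).
    for size in range(min(max_total, len(candidates)), 0, -1):
        best: Optional[Tuple[float, List[int]]] = None
        for combo in combinations(candidates, size):
            # Per-set cap Q.
            if max(Counter(set_of[i] for i in combo).values()) > max_per_set:
                continue
            stat = mean(difficulties[i] for i in combo)
            # Overall constraint; its direction depends on the evaluation metric.
            ok = stat >= target if acc_before > second_threshold else stat <= target
            if ok:
                gap = abs(stat - target)
                if best is None or gap < best[0]:
                    best = (gap, list(combo))
        if best is not None:
            return best[1]
    return []  # no non-empty selection satisfies the constraints
```

For realistic question banks, the same constraints would instead be passed to an integer-programming solver.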
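Because electronic exercises are indivisible, each electronic exercise is either selected as a target exercise or not selected. Screening electronic exercises is therefore an assignment problem, which is a special case of both 0-1 programming and the transportation problem. Solving for the target exercises with the Hungarian method is relatively simple.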
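The Hungarian method was developed for minimization problems. Its basic principle is as follows. Consider a coefficient (cost) matrix with elements $c_{ij} \ge 0$. Suppose the matrix can be transformed so that it contains a set of zero elements ($c'_{ij} = 0$) lying in distinct rows and distinct columns, and these zeros are marked with parentheses. The decision variable is set to $x_{ij} = 1$ for each marked element and $x_{ij} = 0$ for each unmarked element. The objective value $Z$ is then at its minimum (0), and this combination is an optimal solution.
Specifically, a constant $u_i$ can be subtracted from (or added to) each row of the matrix $(c_{ij})$, and a constant $v_j$ from (or to) each column. This yields a new matrix $(c'_{ij})$ with
$$c'_{ij} = c_{ij} \pm (u_i + v_j),$$
and the optimal solution $(x_{ij})$ for $(c'_{ij})$ is equivalent to the optimal solution for the original matrix $(c_{ij})$.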
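The equivalence in the last paragraph can be checked numerically. The sketch performs only the row and column reduction step, choosing $u_i$ and $v_j$ as the row and column minima (one common choice). It then compares the optimal assignments of the original and reduced matrices by brute force over permutations. It is not a complete Hungarian method, which would also cover zeros with lines and adjust the matrix until an independent set of zeros exists. In the example below, the reduced optimum is still 1 rather than 0, so those further steps would be needed. For real workloads, a library routine such as SciPy's `scipy.optimize.linear_sum_assignment` is the usual choice.

```python
from itertools import permutations
from typing import List, Sequence, Tuple

Matrix = List[List[int]]


def reduce_matrix(cost: Sequence[Sequence[int]]) -> Tuple[Matrix, int]:
    """Subtract each row's minimum (u_i), then each column's minimum (v_j).

    Returns the reduced matrix c'_ij = c_ij - (u_i + v_j) and the constant
    sum(u) + sum(v) by which the cost of every complete assignment drops.
    """
    row_min = [min(row) for row in cost]
    rows = [[c - u for c in row] for row, u in zip(cost, row_min)]
    col_min = [min(col) for col in zip(*rows)]
    reduced = [[c - v for c, v in zip(row, col_min)] for row in rows]
    return reduced, sum(row_min) + sum(col_min)


def assignment_cost(cost: Sequence[Sequence[int]], perm: Sequence[int]) -> int:
    """Total cost when row i is assigned to column perm[i]."""
    return sum(cost[i][j] for i, j in enumerate(perm))


def best_assignment(cost: Sequence[Sequence[int]]) -> Tuple[int, ...]:
    """Brute-force optimum over all permutations; only for tiny matrices."""
    return min(permutations(range(len(cost))), key=lambda p: assignment_cost(cost, p))
```

For the cost matrix `[[4, 1, 3], [2, 0, 5], [3, 2, 2]]`, both the original and the reduced matrix have the same optimal assignment (1, 0, 2). The reduced matrix's optimal cost plus the offset 4 recovers the original optimum of 5.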
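As shown in FIG. 3, step S309 is executed to deduplicate the selected target exercises so that the user does not answer the same exercise twice. The electronic exercises the user has previously answered are queried and removed from the target exercises.
In addition, the selected target exercises can be handled in several ways before being displayed to the user. They can be screened again with other methods, combined with electronic exercises selected by other methods, or displayed directly; this embodiment imposes no limitation on this.
It should be noted that, for simplicity of description, the method embodiments are described as a series of combined actions. Those skilled in the art should understand that the embodiments of this application are not limited by the described order of actions, because some steps may be performed in other orders or simultaneously. Those skilled in the art should also understand that the actions described in the specification are not necessarily all required by the embodiments of this application.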
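Embodiment 3
FIG. 6 is a structural block diagram of a question selection apparatus provided in Embodiment 3 of this application. The apparatus may specifically include the following modules:
a learning task determination module 601, configured to determine a learning task of a user;
an exercise set determination module 602, configured to determine multiple exercise sets whose content is related to the learning task, each exercise set having multiple electronic exercises with identical or similar content;
a behavior data acquisition module 603, configured to acquire behavior data recorded when the user answered electronic exercises whose content is related to the learning task;
a difficulty condition setting module 604, configured to set a first condition in the difficulty dimension according to the behavior data;
a target exercise selection module 605, configured to select electronic exercises for the user from the multiple exercise sets respectively as target exercises, the difficulty of the target exercises satisfying the first condition and the number of the target exercises satisfying a preset second condition.
In an embodiment of this application, the exercise set determination module 602 includes:
an associated exercise acquisition module, configured to acquire electronic exercises whose content is related to the learning task, the electronic exercises including at least one of stem information, option information, and explanation information;
an exercise information division module, configured to divide the electronic exercises into multiple types of exercise information;
a candidate feature information extraction module, configured to extract candidate feature information from each type of exercise information;
a target feature information concatenation module, configured to concatenate the candidate feature information into target feature information;
an exercise clustering module, configured to cluster the electronic exercises into multiple clusters using the target feature information to obtain multiple exercise sets.
In an embodiment of this application, the candidate feature information extraction module includes:
a language model determination module, configured to determine a language model if the type of the exercise information is text data;
a text feature processing module, configured to input the text data into the language model for processing so as to output the candidate feature information of the text data;
and/or,
a first image model determination module, configured to determine a first image model if the type of the exercise information is first image data;
a first image feature processing module, configured to input the first image data into the first image model for processing so as to output the candidate feature information of the first image data;
a first image feature dimensionality reduction module, configured to reduce the candidate feature information of the first image data from three dimensions to one;
and/or,
an image conversion module, configured to convert the formula data into second image data if the type of the exercise information is formula data;
a second image model determination module, configured to determine a second image model;
a second image feature processing module, configured to input the second image data into the second image model for processing so as to output the candidate feature information of the formula data;
a second image feature dimensionality reduction module, configured to reduce the candidate feature information of the formula data from three dimensions to one;
and/or,
an empty information processing module, configured to set the candidate feature information of a type of exercise information to a specified value if that type of exercise information is empty.
In an embodiment of this application, the candidate feature information extraction module further includes:
a first mean calculation module, configured to, if the electronic exercise contains multiple frames of first image data, average the candidate feature information of the frames dimension by dimension as the candidate feature information of the frames as a whole;
and/or,
a second mean calculation module, configured to, if the electronic exercise contains multiple pieces of formula data, average their candidate feature information dimension by dimension as the candidate feature information of the formulas as a whole.
In an embodiment of this application, the exercise clustering module includes:
a cluster number determination module, configured to determine the number of clusters;
a cluster generation module, configured to generate clusters according to the number of clusters, each cluster having a center point;
a distance calculation module, configured to calculate the distance between each electronic exercise and each center point using the target feature information;
an exercise assignment module, configured to assign an electronic exercise to the cluster of the center point to which its distance is smallest;
a convergence judgment module, configured to judge whether the clusters have converged, and to invoke the exercise set output module if so or the center update module if not;
an exercise set output module, configured to output the clusters as exercise sets;
a center update module, configured to update the center points of the clusters and to return to calculating the distance between the electronic exercises and the center points using the target feature information.
In an embodiment of this application, the cluster number determination module includes:
an exercise count query module, configured to query the number of electronic exercises associated with the learning task;
a cluster number setting module, configured to set the number of clusters based on the number of electronic exercises, the number of clusters having a nonlinear positive correlation with the number of electronic exercises.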
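In an embodiment of this application, the cluster number setting module includes:
a nonlinear mapping module, configured to take the square root of the product of the number of electronic exercises and a specified coefficient and round it to an integer as the number of clusters.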
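In an embodiment of this application, the first condition includes a difficulty interval, and the difficulty condition setting module 604 includes:
an exercise identification module, configured to identify first exercises and second exercises from the answering behavior data, the first exercises being electronic exercises answered by the user and the second exercises being electronic exercises answered incorrectly by the user;
an upper limit setting module, configured to set the upper limit of the difficulty interval with reference to the difficulty of the first exercises;
a lower limit setting module, configured to set the lower limit of the difficulty interval with reference to the difficulty of the second exercises.
In an embodiment of this application, the upper limit setting module includes:
a maximum difficulty module, configured to take the largest difficulty among the first exercises as the upper limit of the difficulty interval.
In an embodiment of this application, the lower limit setting module includes:
a first minimum difficulty module, configured to take the smallest difficulty among the second exercises as the lower limit of the difficulty interval if the set of second exercises is non-empty;
a second minimum difficulty module, configured to take the smallest difficulty among the first exercises as the lower limit of the difficulty interval if the set of second exercises is empty.
In an embodiment of this application, the difficulty condition setting module 604 further includes:
a difference calculation module, configured to calculate the difference between the upper limit and the lower limit;
a validity determination module, configured to determine that the difficulty interval is valid if the difference is greater than a preset first threshold;
a difficulty interval adjustment module, configured to increase the upper limit and/or decrease the lower limit if the difference is less than or equal to the preset first threshold, so that the difference becomes greater than the first threshold.
In an embodiment of this application, the first condition includes a real-time target difficulty, and the difficulty condition setting module 604 includes:
a historical parameter extraction module, configured to extract a historical target difficulty and an evaluation metric from the answering behavior data, the historical target difficulty being a statistic of the difficulty of the answered electronic exercises associated with the learning task, and the evaluation metric evaluating the user's performance on those electronic exercises;
a historical target difficulty adjustment module, configured to adjust the historical target difficulty with reference to the evaluation metric to obtain the real-time target difficulty.
In an embodiment of this application, the historical target difficulty adjustment module includes:
a historical target difficulty increasing module, configured to increase the historical target difficulty and use it as the real-time target difficulty if the evaluation metric is greater than a preset second threshold;
a historical target difficulty decreasing module, configured to decrease the historical target difficulty and use it as the real-time target difficulty if the evaluation metric is less than the preset second threshold.
In an embodiment of this application, the historical target difficulty increasing module includes:
a step increase module, configured to compute the sum of the historical target difficulty and a preset first step as the real-time target difficulty.
In an embodiment of this application, the historical target difficulty decreasing module includes:
a step decrease module, configured to compute the difference between the historical target difficulty and a preset second step as the real-time target difficulty.
In an embodiment of this application, the target exercise selection module 605 includes:
a condition planning module, configured to take the target exercises as variables and plan so that their difficulty satisfies the first condition and their number satisfies the preset second condition, the target exercises being electronic exercises selected from the respective exercise sets;
a variable solving module, configured to solve for the target exercises, with the number of target exercises constrained to be an integer.
In an embodiment of this application, the difficulty of the target exercises satisfies the first condition in at least one of the following ways:
the difficulty of each individual target exercise lies within the difficulty interval;
when the evaluation metric is greater than the preset second threshold, the statistic of the difficulties of all target exercises is greater than or equal to the real-time target difficulty;
when the evaluation metric is less than or equal to the preset second threshold, the statistic of the difficulties of all target exercises is less than or equal to the real-time target difficulty.
In an embodiment of this application, the number of target exercises satisfies the preset second condition in at least one of the following ways:
the total number of target exercises is less than or equal to a preset fourth threshold;
the number of target exercises extracted from each exercise set is less than or equal to a preset fifth threshold.
The question selection apparatus provided by the embodiments of this application can execute the question selection method provided by any embodiment of this application. It has the functional modules and beneficial effects that correspond to the method.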
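Embodiment 4
FIG. 7 is a schematic structural diagram of a computer device provided in Embodiment 4 of this application. FIG. 7 shows a block diagram of an exemplary computer device 12 suitable for implementing the embodiments of this application. The computer device 12 shown in FIG. 7 is only an example and should not impose any limitation on the functions or scope of use of the embodiments of this application.
As shown in FIG. 7, the computer device 12 takes the form of a general-purpose computing device. Its components may include, but are not limited to, one or more processors or processing units 16, a system memory 28, and a bus 18 connecting the different system components (including the system memory 28 and the processing unit 16).
The bus 18 represents one or more of several types of bus structures. These include a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. By way of example, these architectures include, but are not limited to, the Industry Standard Architecture (ISA) bus, the Micro Channel Architecture (MCA) bus, the Enhanced ISA bus, the Video Electronics Standards Association (VESA) local bus, and the Peripheral Component Interconnect (PCI) bus.
The computer device 12 typically includes a variety of computer-system-readable media. These media may be any available media accessible by the computer device 12, including volatile and non-volatile media and removable and non-removable media.
The system memory 28 may include computer-system-readable media in the form of volatile memory, such as random access memory (RAM) 30 and/or cache memory 32. The computer device 12 may further include other removable/non-removable, volatile/non-volatile computer system storage media. By way of example only, the storage system 34 may be used to read from and write to non-removable, non-volatile magnetic media (not shown in FIG. 7, commonly called a "hard disk drive"). Although not shown in FIG. 7, a magnetic disk drive may be provided for reading from and writing to a removable non-volatile magnetic disk (such as a "floppy disk"). An optical disc drive may likewise be provided for a removable non-volatile optical disc (such as a CD-ROM, DVD-ROM, or other optical media). In these cases, each drive may be connected to the bus 18 through one or more data media interfaces. The memory 28 may include at least one program product having a set of (for example, at least one) program modules configured to perform the functions of the embodiments of this application.
A program/utility 40 having a set of (at least one) program modules 42 may be stored, for example, in the memory 28. Such program modules 42 include, but are not limited to, an operating system, one or more application programs, other program modules, and program data. Each of these examples, or some combination of them, may include an implementation of a network environment. The program modules 42 generally perform the functions and/or methods of the embodiments described in this application.
The computer device 12 may also communicate with one or more external devices 14 (such as a keyboard, a pointing device, or a display 24) and with one or more devices that enable a user to interact with the computer device 12. It may also communicate with any device (such as a network card or modem) that enables the computer device 12 to communicate with one or more other computing devices. Such communication may take place through input/output (I/O) interfaces 22. Moreover, the computer device 12 may communicate with one or more networks through a network adapter 20. Such networks include a local area network (LAN), a wide area network (WAN), and/or a public network such as the Internet. As shown, the network adapter 20 communicates with the other modules of the computer device 12 through the bus 18. Although not shown in the figure, other hardware and/or software modules may be used in conjunction with the computer device 12. These include, but are not limited to, microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems.
The processing unit 16 runs programs stored in the system memory 28 to perform various functional applications and data processing, for example implementing the question selection method provided by the embodiments of this application.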
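Embodiment 5
Embodiment 5 of this application further provides a computer-readable storage medium storing a computer program. When executed by a processor, the computer program implements each process of the above question selection method and achieves the same technical effects. To avoid repetition, the details are not described again here.
The computer-readable storage medium may be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of these. More specific examples (a non-exhaustive list) of computer-readable storage media include an electrical connection having one or more wires, a portable computer diskette, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of these. In this document, a computer-readable storage medium may be any tangible medium that contains or stores a program for use by, or in connection with, an instruction execution system, apparatus, or device.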
Claims (19)
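- A question selection method, comprising: determining a learning task of a user; determining multiple exercise sets whose content is related to the learning task, each exercise set having multiple electronic exercises with identical or similar content; acquiring behavior data recorded when the user answered electronic exercises whose content is related to the learning task; setting a first condition in the difficulty dimension according to the behavior data; and selecting electronic exercises for the user from the multiple exercise sets respectively as target exercises, the difficulty of the target exercises satisfying the first condition and the number of the target exercises satisfying a preset second condition.
- The method according to claim 1, wherein determining multiple exercise sets whose content is related to the learning task comprises: acquiring electronic exercises whose content is related to the learning task, the electronic exercises comprising at least one of stem information, option information, and explanation information; dividing the electronic exercises into multiple types of exercise information; extracting candidate feature information from each type of exercise information; concatenating the candidate feature information into target feature information; and clustering the electronic exercises into multiple clusters using the target feature information to obtain multiple exercise sets.
- The method according to claim 2, wherein extracting candidate feature information from each type of exercise information comprises: if the type of the exercise information is text data, determining a language model and inputting the text data into the language model for processing so as to output the candidate feature information of the text data; and/or, if the type of the exercise information is first image data, determining a first image model, inputting the first image data into the first image model for processing so as to output the candidate feature information of the first image data, and reducing the candidate feature information of the first image data from three dimensions to one; and/or, if the type of the exercise information is formula data, converting the formula data into second image data, determining a second image model, inputting the second image data into the second image model for processing so as to output the candidate feature information of the formula data, and reducing the candidate feature information of the formula data from three dimensions to one; and/or, if a certain type of exercise information is empty, setting the candidate feature information of that exercise information to a specified value.
- The method according to claim 3, wherein extracting candidate feature information from each type of exercise information further comprises: if the electronic exercise contains multiple frames of first image data, averaging the candidate feature information of the frames dimension by dimension as the candidate feature information of the frames as a whole; and/or, if the electronic exercise contains multiple pieces of formula data, averaging their candidate feature information dimension by dimension as the candidate feature information of the formulas as a whole.
- The method according to claim 2, wherein clustering the electronic exercises into multiple clusters using the target feature information to obtain multiple exercise sets comprises: determining the number of clusters; generating clusters according to the number of clusters, each cluster having a center point; calculating the distance between each electronic exercise and each center point using the target feature information; if the distance to a certain center point is the smallest, assigning the electronic exercise to the cluster corresponding to that center point; judging whether the clusters have converged; if so, outputting the clusters as exercise sets; and if not, updating the center points of the clusters and returning to calculating the distance between the electronic exercises and the center points using the target feature information.
- The method according to claim 5, wherein determining the number of clusters comprises: querying the number of electronic exercises associated with the learning task; and setting the number of clusters based on the number of electronic exercises, the number of clusters having a nonlinear positive correlation with the number of electronic exercises.
- The method according to claim 6, wherein setting the number of clusters based on the number of electronic exercises comprises: taking the square root of the product of the number of electronic exercises and a specified coefficient and rounding it to an integer as the number of clusters.
- The method according to claim 1, wherein the first condition comprises a difficulty interval, and setting a first condition in the difficulty dimension according to the behavior data comprises: identifying first exercises and second exercises from the answering behavior data, the first exercises being electronic exercises answered by the user and the second exercises being electronic exercises answered incorrectly by the user; setting an upper limit of the difficulty interval with reference to the difficulty of the first exercises; and setting a lower limit of the difficulty interval with reference to the difficulty of the second exercises.
- The method according to claim 8, wherein setting the upper limit of the difficulty interval with reference to the difficulty of the first exercises comprises: taking the largest difficulty among the first exercises as the upper limit of the difficulty interval.
- The method according to claim 8, wherein setting the lower limit of the difficulty interval with reference to the difficulty of the second exercises comprises: if the set of second exercises is non-empty, taking the smallest difficulty among the second exercises as the lower limit of the difficulty interval; and if the set of second exercises is empty, taking the smallest difficulty among the first exercises as the lower limit of the difficulty interval.
- The method according to claim 8, wherein setting a first condition in the difficulty dimension according to the behavior data further comprises: calculating the difference between the upper limit and the lower limit; if the difference is greater than a preset first threshold, determining that the difficulty interval is valid; and if the difference is less than or equal to the preset first threshold, increasing the upper limit and/or decreasing the lower limit so that the difference is greater than the first threshold.
- The method according to claim 1, wherein the first condition comprises a real-time target difficulty, and setting a first condition in the difficulty dimension according to the behavior data comprises: extracting a historical target difficulty and an evaluation metric from the answering behavior data, the historical target difficulty being a statistic of the difficulty of the answered electronic exercises associated with the learning task, and the evaluation metric evaluating the user's performance on those electronic exercises; and adjusting the historical target difficulty with reference to the evaluation metric to obtain the real-time target difficulty.
- The method according to claim 12, wherein adjusting the historical target difficulty with reference to the evaluation metric to obtain the real-time target difficulty comprises: if the evaluation metric is greater than a preset second threshold, increasing the historical target difficulty as the real-time target difficulty; and if the evaluation metric is less than the preset second threshold, decreasing the historical target difficulty as the real-time target difficulty.
- The method according to claim 13, wherein increasing the historical target difficulty as the real-time target difficulty comprises: computing the sum of the historical target difficulty and a preset first step as the real-time target difficulty; and decreasing the historical target difficulty as the real-time target difficulty comprises: computing the difference between the historical target difficulty and a preset second step as the real-time target difficulty.
- The method according to any one of claims 1-14, wherein selecting electronic exercises for the user from the multiple exercise sets respectively as target exercises comprises: taking the target exercises as variables and planning so that their difficulty satisfies the first condition and their number satisfies the preset second condition, the target exercises being electronic exercises selected from the respective exercise sets; and solving for the target exercises, with the number of target exercises constrained to be an integer.
- The method according to any one of claims 1-14, wherein the difficulty of the target exercises satisfying the first condition comprises at least one of the following: the difficulty of each individual target exercise lies within a difficulty interval; when the evaluation metric is greater than a preset second threshold, the statistic of the difficulties of all target exercises is greater than or equal to a real-time target difficulty; and when the evaluation metric is less than or equal to the preset second threshold, the statistic of the difficulties of all target exercises is less than or equal to the real-time target difficulty; and the number of target exercises satisfying the preset second condition comprises at least one of the following: the total number of target exercises is less than or equal to a preset fourth threshold; and the number of target exercises extracted from each exercise set is less than or equal to a preset fifth threshold.
- A question selection apparatus, comprising: a learning task determination module, configured to determine a learning task of a user; an exercise set determination module, configured to determine multiple exercise sets whose content is related to the learning task, each exercise set having multiple electronic exercises with identical or similar content; a behavior data acquisition module, configured to acquire behavior data recorded when the user answered electronic exercises whose content is related to the learning task; a difficulty condition setting module, configured to set a first condition in the difficulty dimension according to the behavior data; and a target exercise selection module, configured to select electronic exercises for the user from the multiple exercise sets respectively as target exercises, the difficulty of the target exercises satisfying the first condition and the number of the target exercises satisfying a preset second condition.
- A computer device, comprising: one or more processors; and a memory configured to store one or more programs; wherein, when the one or more programs are executed by the one or more processors, the one or more processors implement the question selection method according to any one of claims 1-16.
- A computer-readable storage medium storing a computer program which, when executed by a processor, implements the question selection method according to any one of claims 1-16.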
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110181881.1A CN114913729B (zh) | 2021-02-09 | 2021-02-09 | Question selection method and apparatus, computer device, and storage medium |
CN202110181881.1 | 2021-02-09 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2022170985A1 (zh) | 2022-08-18 |
Family ID: 82760845
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2022/074152 WO2022170985A1 (zh) | Question selection method, apparatus, computer device and storage medium | 2021-02-09 | 2022-01-27 |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN114913729B (zh) |
WO (1) | WO2022170985A1 (zh) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116561260A (zh) * | 2023-07-10 | 2023-08-08 | 北京十六进制科技有限公司 | 一种基于语言模型的习题生成方法、设备及介质 |
CN118627476A (zh) * | 2024-08-01 | 2024-09-10 | 福建鹿鸣教育科技有限公司 | 一种用于答题卡生成制备的管理系统 |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
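CN102629272A (zh) * | 2012-03-14 | 2012-08-08 | 北京邮电大学 | Clustering-based method for optimizing the question bank of an examination system |
JP2014071161A (ja) * | 2012-09-27 | 2014-04-21 | Dainippon Printing Co Ltd | Learning system, program, and learning communication system |
CN109147446A (zh) * | 2018-08-20 | 2019-01-04 | 国政通科技有限公司 | Electronic examination system |
CN110390019A (zh) * | 2019-07-26 | 2019-10-29 | 江苏曲速教育科技有限公司 | Test question clustering method, deduplication method, and system |
CN110413728A (zh) * | 2019-06-20 | 2019-11-05 | 平安科技(深圳)有限公司 | Exercise recommendation method, apparatus, device, and storage medium |
CN110930274A (zh) * | 2019-12-02 | 2020-03-27 | 中山大学 | Cognitive-diagnosis-based practice effect evaluation and learning path recommendation system and method |
CN112035605A (zh) * | 2020-08-04 | 2020-12-04 | 广州视源电子科技股份有限公司 | Question recommendation method, apparatus, device, and storage medium |
CN112256869A (zh) * | 2020-10-12 | 2021-01-22 | 浙江大学 | System and method for grouping test questions on the same knowledge point based on question text |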
Family Cites Families (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102693300A (zh) * | 2012-05-18 | 2012-09-26 | 苏州佰思迈信息咨询有限公司 | Test question generation method for teaching software |
CN107203582A (zh) * | 2017-03-27 | 2017-09-26 | 杭州博世数据网络有限公司 | Intelligent test assembly method based on item response theory analysis results |
CN109461103A (zh) * | 2018-10-16 | 2019-03-12 | 安徽弘讯教育软件科技有限公司 | Online education platform |
US11113323B2 (en) * | 2019-05-23 | 2021-09-07 | Adobe Inc. | Answer selection using a compare-aggregate model with language model and condensed similarity information from latent clustering |
CN110245207B (zh) * | 2019-05-31 | 2021-07-06 | 深圳市轱辘车联数据技术有限公司 | Question bank construction method, question bank construction apparatus, and electronic device |
CN110489454A (zh) * | 2019-07-29 | 2019-11-22 | 北京大米科技有限公司 | Adaptive assessment method, apparatus, storage medium, and electronic device |
CN110427534A (zh) * | 2019-07-31 | 2019-11-08 | 广州视源电子科技股份有限公司 | Electronic exercise processing method, apparatus, device, and storage medium |
CN110765278B (zh) * | 2019-10-24 | 2022-10-25 | 深圳小蛙出海科技有限公司 | Method for finding similar exercises, computer device, and storage medium |
CN111831914A (zh) * | 2020-07-22 | 2020-10-27 | 上海掌学教育科技有限公司 | Intelligent question recommendation system for online education |
CN112184089B (zh) * | 2020-11-27 | 2021-03-09 | 北京世纪好未来教育科技有限公司 | Training method, apparatus, device, and storage medium for a test question difficulty prediction model |
- 2021-02-09: CN application CN202110181881.1A, patent CN114913729B (zh), status: Active
- 2022-01-27: WO application PCT/CN2022/074152, patent WO2022170985A1 (zh), status: Active, Application Filing
Non-Patent Citations (1)
Title |
---|
LI, JINHONG: "Application Research on Clustering Algorithms in Exam Question Storage", PROCEEDINGS OF THE 2013 ANNUAL CONFERENCE OF THE NATIONAL METALLURGICAL AUTOMATION INFORMATION NETWORK, 21 May 2013 (2013-05-21), XP055958713, [retrieved on 20220907] * |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
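CN116384393A (zh) * | 2023-04-27 | 2023-07-04 | 圣麦克思智能科技(江苏)有限公司 | Operation and maintenance data processing system and method based on natural language processing |
CN116384393B (zh) * | 2023-04-27 | 2023-11-21 | 圣麦克思智能科技(江苏)有限公司 | Operation and maintenance data processing system and method based on natural language processing |
CN116663537A (zh) * | 2023-07-26 | 2023-08-29 | 中信联合云科技有限责任公司 | Topic-selection planning information processing method and system based on big data analysis |
CN116663537B (zh) * | 2023-07-26 | 2023-11-03 | 中信联合云科技有限责任公司 | Topic-selection planning information processing method and system based on big data analysis |
CN117557426A (zh) * | 2023-12-08 | 2024-02-13 | 广州市小马知学技术有限公司 | Homework data feedback method and learning evaluation system based on an intelligent question bank |
CN117557426B (zh) * | 2023-12-08 | 2024-05-07 | 广州市小马知学技术有限公司 | Homework data feedback method and learning evaluation system based on an intelligent question bank |
CN118132858A (zh) * | 2024-05-08 | 2024-06-04 | 江西财经大学 | AI-based personalized learning recommendation method and system |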
Also Published As
Publication number | Publication date |
---|---|
CN114913729A (zh) | 2022-08-16 |
CN114913729B (zh) | 2023-06-20 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2022170985A1 (zh) | | Question selection method, apparatus, computer device and storage medium |
CN107230174B (zh) | | Network-based online interactive learning system and method |
CN106528656A (zh) | | Method and system for course recommendation based on learners' historical and real-time learning state parameters |
Sorour et al. | | Comment data mining to estimate student performance considering consecutive lessons |
CN105512214A (zh) | | Knowledge database, construction method, and learning-status diagnosis system |
WO2021218028A1 (zh) | | Artificial-intelligence-based interview content refining method, apparatus, device, and medium |
CN113886567A (zh) | | Knowledge-graph-based teaching method and system |
CN109754349B (zh) | | Intelligent teacher-student matching system for online education |
Laukkanen | | Comparative causal mapping and CMAP3 software in qualitative studies |
US20230101354A1 (en) | | Method, system, and storage medium for intelligent analysis of student's actual learning based on exam paper |
CN117150151A (zh) | | Large-language-model-based wrong-answer analysis and test question recommendation system and method |
Wibawa et al. | | Learning analytic and educational data mining for learning science and technology |
CN113196318A (zh) | | Science teaching system, method of using the same, and computer-readable storage medium |
Mühling | | Investigating knowledge structures in computer science education |
Valckx et al. | | Measuring and exploring the structure of teachers’ educational beliefs |
Xie et al. | | Virtual reality primary school mathematics teaching system based on GIS data fusion |
CN109800880B (zh) | | Adaptive learning feature extraction system based on dynamic learning style information, and application thereof |
Yang et al. | | A Learning Preference Analysis Method Based on a Novel Developed Teaching Skill Training App for Mobile Learning |
Ma et al. | | Format-aware item response theory for predicting vocabulary proficiency |
CN113487928A (zh) | | Precise teaching evaluation and diagnosis method and system |
CN113761145A (zh) | | Language model training method, language processing method, and electronic device |
Xia et al. | | Assessing concept mapping competence using item expansion‐based diagnostic classification analysis |
Lee | | Comparative Study on Predicting Student Grades using Five Machine Learning Algorithms |
Hagiwara et al. | | DEVELOPMENT OF A SYSTEM TO SUPPORT GRASPING FOSTERING STATUS OF STUDENTS’QUALITIES AND ABILITIES FROM TEXT DATA OF REFLECTIONS ON LEARNING |
Huang et al. | | An Interpretation of Cross-cultural English Translation Teaching Strategies under Clustering Algorithm |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 22752146; Country of ref document: EP; Kind code of ref document: A1 |
| NENP | Non-entry into the national phase | Ref country code: DE |
| 122 | Ep: pct application non-entry in european phase | Ref document number: 22752146; Country of ref document: EP; Kind code of ref document: A1 |