CN110705025A - Large-scale examination data analog simulation method, device and storage medium - Google Patents

Large-scale examination data analog simulation method, device and storage medium Download PDF

Info

Publication number
CN110705025A
CN110705025A (application CN201910830105.2A)
Authority
CN
China
Prior art keywords
data
simulation
subject
basic condition
score
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910830105.2A
Other languages
Chinese (zh)
Inventor
柯瑞强
李宏强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Crystal Ball Education Information Technology Co Ltd
Original Assignee
Crystal Ball Education Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Crystal Ball Education Information Technology Co Ltd filed Critical Crystal Ball Education Information Technology Co Ltd
Priority to CN201910830105.2A
Publication of CN110705025A
Legal status: Pending

Links

Images

Classifications

    • G: PHYSICS
    • G09: EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
    • G09B: EDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
    • G09B7/00: Electrically-operated teaching apparatus or devices working with questions and answers
    • G09B7/02: Electrically-operated teaching apparatus or devices working with questions and answers of the type wherein the student is expected to construct an answer to the question which is presented or wherein the machine gives an answer to the question presented by a student

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Physics & Mathematics (AREA)
  • Educational Administration (AREA)
  • Educational Technology (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a large-scale examination data simulation method, device and storage medium. The method comprises the following steps: inputting basic condition parameters, wherein the basic condition parameters comprise examinee information, subject information and historical examination data; calculating the score data of each individual subject from the basic condition parameters using the probability density function of the normal distribution; and performing optimization matching and iterative processing on the per-subject score data using the least-squares principle to obtain the simulated result data. The invention is suitable for research and analysis of large-scale examinations and can effectively improve the quality of simulation data, thereby improving the level and quality of examinations and tests.

Description

Large-scale examination data analog simulation method, device and storage medium
Technical Field
The invention relates to the technical field of simulation, and in particular to a large-scale examination data simulation method, device and storage medium.
Background
Large-scale examinations and evaluations have conventionally included qualification examinations and graded (selective) examinations for different purposes, used for talent evaluation and talent selection. Conventional examinations and evaluations rely largely on techniques based on CTT (classical test theory). In recent years, with advances in theoretical research, IRT (item response theory) has been used increasingly, but large-scale examinations remain based on CTT as a whole.
With the reform of the middle-school and college entrance examinations, the requirements on large-scale examinations have grown ever higher. Two basic principles of traditional large-scale examinations have remained unchanged throughout: quality and fairness; on that basis, the requirement for efficiency has gradually risen to the level of a principle as well. Amid the entrance-examination reforms of recent years, research in related fields has multiplied, and problems have increasingly surfaced in traditional examination means, content and standard specifications, including score-assignment modes, which cannot meet the demands of the public and of national policy for educational fairness and higher quality standards.
For large-scale examinations, especially highly influential ones such as the college entrance examination, related research work has great significance and value; and because large-scale examinations and evaluations are very costly, data simulation is necessary and important when conducting such research. The entrance-examination reforms carried out across the country in recent years are a case in point: in the standard specifications of the provinces of the first reform batch (Zhejiang and Shanghai) and the second batch (Beijing, Shandong, Tianjin, Hainan and others), the score-assignment modes in particular differ greatly from the traditional mode.
In related research, qualitative studies generally adopt method routes such as literature review, experience summarization and comprehensive evaluation; quantitative studies generally adopt simulation, using big-data analysis and simulation technology to study how related factors affect score-assignment modes and to propose corresponding solutions.
Disclosure of Invention
The technical problem to be solved by the invention is to provide a large-scale examination data simulation method, device and storage medium that are suitable for research and analysis of large-scale examinations and can effectively improve the quality of simulation data, thereby improving the level and quality of examinations and tests.
An embodiment of the present invention provides a large-scale examination data simulation method, including:
inputting basic condition parameters, wherein the basic condition parameters comprise examinee information, subject information and historical examination data;
calculating the score data of each individual subject from the basic condition parameters using the probability density function of the normal distribution;
and performing optimization matching and iterative processing on the per-subject score data using the least-squares principle to obtain the simulated result data.
The examinee information comprises the number of examinees, the examinee numbers and the subject-selection information; the subject information comprises the basic information, score range and difficulty parameters of each subject's test paper.
Calculating the score data of each individual subject from the basic condition parameters using the probability density function of the normal distribution comprises:
calculating the theoretical distribution probability of each score value using the probability density function of the normal distribution according to the basic condition parameters, and multiplying by the number of examinees to obtain the number of examinees at each score, i.e. the score-count distribution;
and assigning the score-count distribution into a score array whose length equals the number of examinees, obtaining the score data of each individual subject.
Performing optimization matching and iterative processing on the per-subject score data using the least-squares principle to obtain the simulated result data comprises the following steps:
matching the score data of each individual subject with the examinee information and fixing the matching relation;
calculating the Pearson product-moment correlation coefficient between every pair of subjects using the following formula:

r = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^2}\,\sqrt{\sum_{i=1}^{n}(y_i-\bar{y})^2}}

or

r = \frac{n\sum x_i y_i-\sum x_i\sum y_i}{\sqrt{n\sum x_i^2-(\sum x_i)^2}\,\sqrt{n\sum y_i^2-(\sum y_i)^2}}

calculating the sum of squared errors of all correlation coefficients using the least-squares principle, taking the minimum of \sum(Y_i - Y_j)^2 as the optimization criterion, wherein Y_i corresponds to a calculated Pearson product-moment correlation coefficient R1 and Y_j corresponds to the configured correlation coefficient R;
and iterating repeatedly until the simulated result data are obtained.
An embodiment of the present invention further provides a large-scale examination data simulation apparatus, including:
a basic condition parameter input unit, configured to input basic condition parameters, wherein the basic condition parameters comprise examinee information, subject information and historical examination data;
a per-subject score data calculation unit, configured to calculate the score data of each individual subject from the basic condition parameters using the probability density function of the normal distribution;
and a simulation unit, configured to perform optimization matching and iterative processing on the per-subject score data using the least-squares principle to obtain the simulated result data.
The examinee information comprises the number of examinees, the examinee numbers and the subject-selection information; the subject information comprises the basic information, score range and difficulty parameters of each subject's test paper.
An embodiment of the present invention further provides a big data statistics sampling server, including:
one or more processors;
storage means for storing one or more programs;
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the large-scale examination data simulation method described above.
An embodiment of the present invention further provides a computer-readable storage medium. The storage medium includes a stored computer program; when the computer program runs, the device on which the storage medium is located is controlled to execute the large-scale examination data simulation method described above.
The embodiment of the invention has the following beneficial effects:
according to the teaching of the embodiment, the invention is suitable for the research and analysis related to large-scale examinations, and can effectively improve the quality of simulation data, thereby improving the level and quality of examinations and tests.
Drawings
In order to more clearly illustrate the technical solution of the present invention, the drawings needed to be used in the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art that other drawings can be obtained according to the drawings without creative efforts.
Fig. 1 is a schematic flow chart of a simulation method for large-scale test data according to an embodiment of the present invention;
fig. 2 is a schematic structural diagram of a large-scale examination data simulation apparatus according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be understood that the step numbers used herein are for convenience of description only and are not intended as limitations on the order in which the steps are performed.
It is to be understood that the terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in the specification of the present invention and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
The terms "comprises" and "comprising" indicate the presence of the described features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
The term "and/or" refers to and includes any and all possible combinations of one or more of the associated listed items.
Please refer to fig. 1.
An embodiment of the present invention provides a large-scale examination data simulation method, including:
s100, inputting basic condition parameters, wherein the basic condition parameters comprise examinee information, subject information and historical examination data.
The examinee information comprises the number of examinees, the examinee numbers and the subject-selection information; the subject information comprises the basic information, score range and difficulty parameters of each subject's test paper.
In particular embodiments, the method targets large multi-subject or multi-item examinations and assessments. The data simulation for a single subject or item can draw on multiple sources: it can be generated from historical data (detail records or statistics), or, when no historical data exist, from a standard normal distribution function or another distribution function (such as a beta distribution) or method. For the generated multidimensional data (the per-subject or per-item data sets), a specific algorithm changes the data attribution (ordering) of each single-dimensional data set, so that a multidimensional data set that originally had no correlation (or a correlation far from the theoretical or actual expectation) better matches practical requirements and expectations.
S200: calculating the score data of each individual subject from the basic condition parameters using the probability density function of the normal distribution.
This comprises:
calculating the theoretical distribution probability of each score value using the probability density function of the normal distribution according to the basic condition parameters, and multiplying by the number of examinees to obtain the number of examinees at each score, i.e. the score-count distribution;
and assigning the score-count distribution into a score array whose length equals the number of examinees, obtaining the score data of each individual subject.
In a specific embodiment, the score data of each individual subject are simulated according to the configuration parameters, the subject configuration parameters and the historical data; they can be simulated from historical data, or purely from theory (generally using a normal distribution).
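The two-step procedure of S200 (theoretical probabilities scaled to examinee counts, then expanded into a score array) can be sketched in Python. The function name and the choice to absorb the rounding error at the modal score are illustrative assumptions for this sketch, not details fixed by the invention:

```python
import math

def simulate_subject_scores(n, mu, sigma, lo=0, hi=100):
    """Build n integer scores whose distribution follows N(mu, sigma^2).

    Step 1: evaluate the normal probability density at every integer score
    and scale by the examinee count n to get the score-count distribution.
    Step 2: correct the rounding error so the counts sum to exactly n
    (absorbed at the modal score here, an illustrative choice).
    Step 3: expand the counts into a flat score array of length n.
    """
    def pdf(x):
        return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

    counts = {s: round(pdf(s) * n) for s in range(lo, hi + 1)}
    mode = max(counts, key=counts.get)
    counts[mode] += n - sum(counts.values())  # error correction to total n
    return [s for s, c in counts.items() for _ in range(c)]
```

With average score 65, standard deviation 6.5 and 100,000 examinees, this reproduces the score-count distribution of the worked example below (6065 examinees at score 64, 201 at score 48, and so on).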
For the generated multidimensional data (the per-subject or per-item data sets), the algorithm iterates continuously and repeatedly against a configured, relatively constant correlation coefficient (the Pearson product-moment correlation coefficient), using the least-squares principle (Gauss-Markov theorem), and takes the optimal calculation result as the final simulated data.
S300: performing optimization matching and iterative processing on the per-subject score data using the least-squares principle to obtain the simulated result data.
This comprises:
matching the score data of each individual subject with the examinee information and fixing the matching relation;
calculating the Pearson product-moment correlation coefficient between every pair of subjects using the following formula:

r = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^2}\,\sqrt{\sum_{i=1}^{n}(y_i-\bar{y})^2}}

or

r = \frac{n\sum x_i y_i-\sum x_i\sum y_i}{\sqrt{n\sum x_i^2-(\sum x_i)^2}\,\sqrt{n\sum y_i^2-(\sum y_i)^2}}

calculating the sum of squared errors of all correlation coefficients using the least-squares principle, taking the minimum of \sum(Y_i - Y_j)^2 as the optimization criterion, wherein Y_i corresponds to a calculated Pearson product-moment correlation coefficient R1 and Y_j corresponds to the configured correlation coefficient R;
and iterating repeatedly until the simulated result data are obtained.
In a specific embodiment, during the least-squares iterative processing the exchanged data are not chosen completely at random; the exchanges are optimized by sorting, guided by the definition of the correlation coefficient. This avoids iterating with completely random data exchanges and greatly improves the efficiency of the iterative processing.
For the generated multi-subject simulation data, the per-subject score data are matched to the examinee data by continuous iteration under the least-squares principle (Gauss-Markov theorem), against the configured correlation coefficients (Pearson product-moment correlation coefficients). The main operation during iteration is reordering the data; once iteration reaches the optimal result (minimum error), the resulting data, containing each examinee together with his or her score in every subject, are the final simulated data.
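The matching-by-iteration scheme just described can be sketched minimally in Python. This is a hypothetical illustration, not the patented implementation: it fixes one subject, reorders a second subject's scores by pairwise swaps, and keeps a swap only when it reduces the squared error against the configured Pearson coefficient (the patent additionally directs the swaps via sorting and balances several coefficients at once):

```python
import math
import random

def pearson(x, y):
    """Pearson product-moment correlation coefficient (deviation form)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return num / den

def match_by_swaps(fixed, scores, target_r, iters=5000, seed=1):
    """Reorder `scores` against the fixed subject by pairwise swaps,
    keeping a swap only if it reduces (pearson - target_r)**2."""
    rng = random.Random(seed)
    scores = list(scores)
    err = (pearson(fixed, scores) - target_r) ** 2
    for _ in range(iters):
        i, j = rng.randrange(len(scores)), rng.randrange(len(scores))
        scores[i], scores[j] = scores[j], scores[i]
        new_err = (pearson(fixed, scores) - target_r) ** 2
        if new_err < err:
            err = new_err  # keep the improving swap
        else:
            scores[i], scores[j] = scores[j], scores[i]  # revert
    return scores, err
```

Because only the ordering changes, each subject's marginal score distribution is preserved exactly while the correlation is driven toward the configured target.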
For the final simulated data, relevant analysis is then performed according to the scope and content of the quantitative research to produce the final target results.
According to the teachings of this embodiment, the invention is mainly used for data simulation of large-scale examinations and is applicable to research and analysis related to large-scale examinations (such as entrance examinations for high schools and universities), including qualification examinations, graded examinations, the design of rule standards for talent-selection evaluation, the design of score-assignment modes, and related research fields. It suits large-scale, multi-subject (multi-item) examinations and evaluations. It can be combined with modern measurement theories such as CTT (classical test theory) and IRT (item response theory): one-dimensional and multi-dimensional data generated from a Gaussian (normal) distribution or other methods are iterated repeatedly, against configured correlation coefficients (Pearson product-moment correlation coefficients) and under the least-squares principle (Gauss-Markov theorem), to produce simulated data that better match theory and reality. This effectively improves the quality of the simulation data and thus the level and quality of examinations and tests. The method is easy to implement at low cost and high efficiency with modern IT technology, in particular big data and cloud computing, and improves the means and technical level available for meeting the core requirements of large-scale examinations: quality, fairness and efficiency.
The technical solutions in the embodiments of the present invention will be further clearly and completely described below with reference to practical examples.
Suppose that for a college entrance examination in a certain province, the influence of the number of examinees choosing each subject (or of other factors) on the standard specification of the score-assignment mode must be quantitatively analysed through data simulation. One or more groups of examinee score data must be simulated from the configured initial conditions and parameters, and the total score and converted total score calculated.
Calculating the total score and the converted total score poses no technical problem, and the data simulation of any single subject is theoretically unproblematic. The key is the correlation among the subject scores produced by the final simulation: if correlation is ignored, the correlation coefficients among subjects in the final simulated data will certainly be close to 0, which fits neither theory nor reality; and even if some algorithm is applied, can the final result data be guaranteed optimal? That is basically difficult to guarantee. This key problem is solved by the method of the invention, explained step by step below by example.
First, assume that the historical data are insufficient, or that changes in question sources and policy make simulation from historical data far from reliable, so the simulation is carried out entirely from theory.
Configuration of basic condition parameters:
the number of examinees is 10 ten thousand, and the examinee numbers are assumed to be C000001 to C100000; three public basic disciplines: chinese (YW), math (SX), English (YY); the subjects of six to three include: physical (WL), chemical (HX), biological (SW), historical (LS), geographic (DL), political (ZZ);
subject test paper difficulty setting: assuming that a normal distribution is used to perform data simulation of a single subject, the difficulty and the degree of distinction of the test paper are controlled by an average score and a standard deviation, and the average score of each subject is μ _ yw, μ _ SX, μ _ YY, μ _ WL, μ _ HX, μ _ SW, μ _ LS, μ _ DL, and μ _ ZZ; the standard deviation is: σ _ yw, σ _ SX, σ _ YY, σ _ WL, σ _ HX, σ _ SW, σ _ LS, σ _ DL, σ _ ZZ; as shown in the following table:
Subject       Average score   Standard deviation   Score range (lower)   Score range (upper)
Chinese       μ_YW            σ_YW                 0                     100
Mathematics   μ_SX            σ_SX                 0                     100
English       μ_YY            σ_YY                 0                     100
Physics       μ_WL            σ_WL                 0                     100
Chemistry     μ_HX            σ_HX                 0                     100
Biology       μ_SW            σ_SW                 0                     100
History       μ_LS            σ_LS                 0                     100
Geography     μ_DL            σ_DL                 0                     100
Politics      μ_ZZ            σ_ZZ                 0                     100
Setting of the correlation coefficients:
(The configured Pearson product-moment correlation-coefficient matrix R between the subjects was given here as a table; the original image is not reproduced.)
generating subject performance data according to the parameters:
according to the basic configuration, there are 10 thousands of examinees, and the result data of each subject of 10 thousands of examinees are X _ YW [ ], X _ SX [ ], X _ YY [ ], X _ WL [ ], X _ HX [ ], X _ SW [ ], X _ LS [ ], X _ DL [ ], X _ ZZ [ ], each array contains 10 thousands of result data, the examinee list data is KS [ ] array, and 10 thousands of examination numbers of C000001 to C100000 are stored;
probability density function according to normal distribution:
Figure BDA0002189932450000071
wherein, the theoretical distribution probability of each point value can be calculated by substituting sigma into the standard deviation and mu into the average point, and then the number of people is multiplied by 10 ten thousands of sample numbers, so that the number of people distributed in each point can be obtained, wherein, the average point 65 and the standard deviation 6.5 are used for simulation calculation, and the result of the number distribution of the achievements is as follows (the number which is not listed is 0):
Score  Count   Score  Count   Score  Count   Score  Count   Score  Count
0      0       48     201     61     5079    74     2353    87     20
…      0       49     297     62     5516    75     1879    88     12
37     1       50     428     63     5854    76     1466    89     7
38     1       51     603     64     6065    77     1117    90     4
39     2       52     831     65     6138    78     831     91     2
40     4       53     1117    66     6065    79     603     92     1
41     7       54     1466    67     5854    80     428     93     1
42     12      55     1879    68     5517    81     296     94     0
43     20      56     2353    69     5079    82     201     …      0
44     33      57     2878    70     4566    83     133     100    0
45     54      58     3437    71     4008    84     86
46     86      59     4008    72     3437    85     54
47     133     60     4566    73     2878    86     33
Score-count distribution table calculated from the normal distribution
The distribution curve is a standard normal shape. Note that precision errors may arise in the calculation, so the final total can deviate slightly from 100,000; an error correction is applied to bring the total to exactly 100,000. Once the score-count distribution has been calculated, the scores can be assigned, according to their counts, into the 100,000-element score arrays.
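Under the stated assumptions (average score 65, standard deviation 6.5, 100,000 examinees), the table above can be reproduced directly from the normal probability density function; the helper name below is illustrative:

```python
import math

# Worked-example parameters from the text above.
mu, sigma, n = 65.0, 6.5, 100000

def count_at(score):
    """Theoretical examinee count at one integer score value."""
    density = math.exp(-((score - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))
    return round(density * n)

print(count_at(65), count_at(64), count_at(48), count_at(43))  # 6138 6065 201 20

# The rounded counts do not sum to exactly 100000, which is why the text
# applies an error correction afterwards.
total = sum(count_at(s) for s in range(0, 101))
```

The residual |total − 100000| stays small (bounded by half a unit per score value), matching the "small error" the text says must be corrected.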
Following this method and these steps, the score data of all subjects are calculated and simulated separately. At this point the single-subject simulation data exist but have not yet been matched to the examinees. Generally a random matching would be used; if the data were matched uniformly after sorting, the correlation coefficient between subjects would be 1 or -1, and if matched randomly it would be approximately 0. Neither fits theory or reality. The next step is therefore the core processing stage of the method.
Data optimization matching and iteration using the least-squares principle
The mathematical justification of the least-squares principle follows from the Gauss-Markov theorem and is omitted here; the processing of this step is illustrated with the actual data calculation and iterative flow.
First, the data of one subject are matched to the examinee data and the matching relation between that subject's scores and the examinees is fixed. Assuming Chinese is the fixed subject, the simulated result data can be generated as follows:
(Table of examinee numbers and their fixed Chinese scores; the original image is not reproduced.)
the correlation coefficients between all disciplines were calculated:
Figure BDA0002189932450000073
Figure BDA0002189932450000081
here, the correlation coefficient (pearson product-moment correlation coefficient) is calculated as follows:
Figure BDA0002189932450000082
or
Figure BDA0002189932450000083
After all correlation coefficient values R1 are calculated, the sum of squares of errors of all correlation coefficients is calculated using the principle of least squares, and Σ (Y) is usedi-Uj)2]Minimum as the optimal criterion, where Yi is equivalent to R1 (calculated correlation coefficient) and Yj is equivalent to R (configured correlation coefficient);
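The two forms of the coefficient above are algebraically equivalent; a small sketch showing both, together with the least-squares criterion (function names are illustrative):

```python
import math

def pearson_deviation_form(x, y):
    """r = sum((xi - mx)(yi - my)) / sqrt(sum((xi - mx)^2) * sum((yi - my)^2))"""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return num / den

def pearson_raw_form(x, y):
    """Algebraically equivalent raw-sums form of the same coefficient."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sxx, syy = sum(a * a for a in x), sum(b * b for b in y)
    return (n * sxy - sx * sy) / math.sqrt((n * sxx - sx * sx) * (n * syy - sy * sy))

def sse(calculated, configured):
    """Least-squares criterion: sum of (Y_i - Y_j)^2 over all subject pairs."""
    return sum((a - b) ** 2 for a, b in zip(calculated, configured))
```

The raw-sums form needs only one pass over the data, which matters when the coefficient is recomputed on every iteration over 100,000-element arrays.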
then, through repeated iteration, firstly, the score sequence of other disciplines of the examinee except the Chinese discipline (the fixed subject) is modified, then the correlation coefficients of all the disciplines are calculated, and according to the optimal criterion, repeated iteration is carried out, and finally optimized data is obtained, namely the final result data of simulation;
it should be noted that there is an extremely important issue to consider when performing the iteration: efficiency; if the algorithm is not optimized, the iteration efficiency is very low and very poor, so that a certain algorithm design is pertinently adopted to optimize the whole iteration process, and the low efficiency of conventional random processing is avoided;
the optimization algorithm of the specific iterative process is as follows:
(1) because a plurality of groups of correlation coefficients exist, the correlation coefficient with the largest error is found out firstly for optimization processing;
(2) after the error of the correlation coefficient of a certain group of data is reduced to a certain degree, when the error is not maximum, the correlation coefficients of another group or two disciplines are adjusted until the errors of all the correlation coefficients are kept at a relatively balanced level;
(3) in order to carry out iteration most efficiently, when subject result data is exchanged, the adjustment is carried out at a destination after the adjustment direction is determined, and the adjustment is avoided by adopting a completely random mode, and the specific method comprises the following steps: one group of data is sorted and fixed (assumed to be in ascending order here), the other group of data is also sorted in ascending order and marked in the order, and when data exchange is carried out and the correlation coefficient needs to be improved, the high-score data arranged in the front is exchanged with the low-score data arranged in the back; when the correlation coefficient needs to be reduced, the operation is performed vice versa, so that the targeted operation is pointed, and the efficiency of iterative processing can be greatly improved;
4. after the simulation of the performance data is completed, subsequent data analysis processing, such as influence of the selection data and the like, can be performed according to research needs, which are not within the scope of the method and are not described here.
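Heuristic (3) above can be sketched as follows: with the fixed subject sorted ascending, a targeted swap moves a high score that sits early to a later position (raising the correlation) or the reverse (lowering it). The function names are illustrative assumptions:

```python
import math

def pearson(x, y):
    """Pearson product-moment correlation coefficient (deviation form)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return num / den

def directed_swap(scores, increase=True):
    """One directed exchange against a fixed, ascending-sorted reference:
    to raise the correlation, swap a high score placed early with a lower
    score placed later; to lower it, the reverse. Returns True if a swap
    was made."""
    n = len(scores)
    for i in range(n):
        for j in range(n - 1, i, -1):
            if (scores[i] > scores[j]) == increase:
                scores[i], scores[j] = scores[j], scores[i]
                return True
    return False
```

Each directed swap moves the ordering toward (or away from) the fixed subject's ordering, which is why this avoids the wasted steps of fully random exchanges.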
Please refer to fig. 2.
An embodiment of the present invention further provides a large-scale examination data simulation apparatus, including:
the basic condition parameter input unit 10 is configured to input basic condition parameters, where the basic condition parameters include examinee information, subject information, and historical examination data.
The examinee information comprises the number of examinees, the examinee numbers and the subject-selection information; the subject information comprises the basic information, score range and difficulty parameters of each subject's test paper.
In particular embodiments, the apparatus targets large multi-subject or multi-item examinations and assessments. The data simulation for a single subject or item can draw on multiple sources: it can be generated from historical data (detail records or statistics), or, when no historical data exist, from a standard normal distribution function or another distribution function (such as a beta distribution) or method. For the generated multidimensional data (the per-subject or per-item data sets), a specific algorithm changes the data attribution (ordering) of each single-dimensional data set, so that a multidimensional data set that originally had no correlation (or a correlation far from the theoretical or actual expectation) better matches practical requirements and expectations.
The score data calculation unit 20 for each individual subject is used for calculating the score data of each individual subject by using the probability density function of the normal distribution according to the basic condition parameters.
Wherein the calculating of the score data of each individual subject according to the basic condition parameters, using the probability density function of the normal distribution, comprises the following steps:
calculating the theoretical distribution probability of each score value by using the probability density function of the normal distribution according to the basic condition parameters, and multiplying it by the number of examinees to obtain the head count at each score, namely the score-count distribution;
and assigning the score-count distribution to a score array whose length equals the number of examinees, to obtain the score data of each individual subject.
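The two listed steps can be sketched as follows. The discretisation to integer scores, the normalisation over the valid range, and the repair of rounding drift are assumptions made for the sketch; the patent states only the pdf-times-examinee-count construction.

```python
import numpy as np

def score_array_from_normal(n_examinees, full_score, mean, std):
    """Theoretical probability of every integer score from the normal pdf,
    multiplied by the examinee count to get the score-count distribution,
    then expanded into a score array whose length equals the number of
    examinees."""
    scores = np.arange(full_score + 1)
    pdf = np.exp(-0.5 * ((scores - mean) / std) ** 2) / (std * np.sqrt(2 * np.pi))
    probs = pdf / pdf.sum()                        # normalise over 0..full_score
    counts = np.round(probs * n_examinees).astype(int)
    counts[np.argmax(counts)] += n_examinees - counts.sum()  # fix rounding drift
    return np.repeat(scores, counts)
```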
In a specific embodiment, the score data of each individual subject are simulated according to the configuration parameters, the subject configuration parameters and the historical data; they can be simulated from historical data or purely from theory (generally using a normal distribution).
For the generated multidimensional data (each single-subject or single-item data group), the algorithm iterates repeatedly against a configured, constant correlation coefficient (the Pearson product-moment correlation coefficient), using the principle of the least square method (Gauss-Markov theorem), and takes the optimal calculation result as the final simulated data.
The simulation unit 30 is used for performing optimization matching and iterative processing on the score data of each individual subject by using the principle of the least square method to obtain the simulated result data.
Wherein the performing of optimization matching and iterative processing on the score data of each individual subject by using the least square principle to obtain the simulated result data comprises the following steps:
matching the score data of a single subject with the examinee information, and fixing the matching relation;
calculating the Pearson product-moment correlation coefficient between all subjects by adopting the following formula:
R = Σ(Xi − X̄)(Yi − Ȳ) / √( Σ(Xi − X̄)² · Σ(Yi − Ȳ)² )
or the equivalent computational form
R = ( nΣXiYi − ΣXiΣYi ) / √( (nΣXi² − (ΣXi)²)(nΣYi² − (ΣYi)²) )
Calculating the sum of squared errors of all correlation coefficients by using the principle of least squares, with the minimum of Σ(Yi − Yj)² as the optimality criterion, wherein Yi corresponds to the calculated Pearson product-moment correlation coefficient R1, and Yj corresponds to the configured correlation coefficient R;
and repeating iteration until the result data of the simulation is obtained.
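The steps above can be condensed into a greedy accept/reject loop. This is a sketch, not the patent's exact algorithm: the pairwise-swap proposal, the stopping tolerance, and the function signature are assumptions.

```python
import numpy as np

def iterate_to_target(fixed, other, target_r, max_iter=20000, tol=1e-4, seed=0):
    """Greedy iteration over pairwise swaps: a swap of two scores in `other`
    is kept only if it reduces the squared error (R1 - R)**2 between the
    computed Pearson coefficient and the configured coefficient."""
    rng = np.random.default_rng(seed)
    other = other.copy()
    err = (np.corrcoef(fixed, other)[0, 1] - target_r) ** 2
    for _ in range(max_iter):
        if err < tol:
            break
        i, j = rng.choice(len(other), size=2, replace=False)
        other[i], other[j] = other[j], other[i]
        new_err = (np.corrcoef(fixed, other)[0, 1] - target_r) ** 2
        if new_err < err:
            err = new_err                            # keep the swap
        else:
            other[i], other[j] = other[j], other[i]  # revert it
    return other, err
```

Because only the ordering of `other` changes, the simulated marginal score distribution of each subject is preserved while the cross-subject correlation converges toward the configured value.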
In a specific embodiment, when the least square method is used for the iterative processing, the exchanged data are not chosen completely at random; instead, the exchange is optimized by sorting, in combination with the definition of the correlation coefficient. This avoids the iteration cost of a completely random exchange scheme and greatly improves the efficiency of the iterative processing.
For the generated multi-subject simulation data, the groups of single-subject score data are matched to the examinee data by continuous iteration, using the least-squares principle (Gauss-Markov theorem) against the configured correlation coefficient (Pearson product-moment correlation coefficient). During the iteration the main operation is reordering of the data; once the iteration reaches the optimal result (minimum error), the resulting data, containing the examinees and the scores of each subject, are the final simulated data.
For the final simulated data, relevant analysis is performed according to the scope and content of the related quantitative research to generate the final target result.
According to the above, the embodiment of the invention is mainly used for data simulation of large-scale examinations and is suitable for research and analysis related to such examinations (for example, high school and middle school examinations), including qualification examinations, graded examinations, the design of rules and standards for talent-selection evaluation, the design of score-assignment modes, and related research fields. It is suitable for large-scale multi-subject (multi-item) examinations and assessments. It can be combined with modern measurement theories such as CTT (classical test theory) and IRT (item response theory): one-dimensional and multidimensional data generated from the Gaussian (normal) distribution or other methods are iterated repeatedly, combined with correlation coefficients (Pearson product-moment correlation coefficients), on the basis of the least square method (Gauss-Markov theorem), so as to simulate data that better accord with theory and reality. This effectively improves the quality of the simulated data and, in turn, the level and quality of examination and testing. The method is easy to realize with modern IT technology, particularly big data, cloud computing and related technologies, at low cost and high efficiency, and effectively improves the means and technical level for addressing the core requirements of large-scale examinations: quality, fairness and efficiency.
An embodiment of the present invention further provides a big data statistics sampling server, including:
one or more processors;
storage means for storing one or more programs;
when the one or more programs are executed by the one or more processors, the one or more processors are caused to implement the large-scale examination data simulation method described above.
An embodiment of the present invention further provides a computer-readable storage medium, where the storage medium includes a stored computer program, and when the computer program runs, the apparatus on which the storage medium is located is controlled to execute the large-scale examination data simulation method described above.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program, which can be stored in a computer-readable storage medium, and can include the processes of the embodiments of the methods described above when the computer program is executed. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), or the like.
While the foregoing is directed to the preferred embodiment of the present invention, it will be understood by those skilled in the art that various changes and modifications may be made without departing from the spirit and scope of the invention.

Claims (8)

1. A large-scale examination data simulation method is characterized by comprising the following steps:
inputting basic condition parameters, wherein the basic condition parameters comprise examinee information, subject information and historical examination data;
according to the basic condition parameters, calculating and obtaining the result data of each individual subject by utilizing a normally distributed probability density function;
and performing optimization matching and iterative processing on the score data of each individual subject by using the least square principle to obtain the simulated result data.
2. The large-scale examination data simulation method according to claim 1, wherein the examinee information comprises the number of examinees, the examinee numbers and the subject-selection information; the subject information comprises the basic information, score range and difficulty parameters of each individual test paper.
3. The large-scale examination data simulation method according to claim 1, wherein the calculating of the performance data of each individual subject according to the basic condition parameters by using a normally distributed probability density function comprises:
calculating the theoretical distribution probability of each score value by using the probability density function of the normal distribution according to the basic condition parameters, and multiplying it by the number of examinees to obtain the head count at each score, namely the score-count distribution;
and assigning the score-count distribution to a score array whose length equals the number of examinees, to obtain the score data of each individual subject.
4. The large-scale examination data simulation method according to claim 1, wherein the performing of optimization matching and iterative processing on the score data of each individual subject using the least square principle to obtain the simulated result data comprises:
matching the score data of a single subject with the examinee information, and fixing the matching relation;
calculating the correlation coefficient of Pearson product moments among all disciplines by adopting the following formula:
R = Σ(Xi − X̄)(Yi − Ȳ) / √( Σ(Xi − X̄)² · Σ(Yi − Ȳ)² )
calculating the sum of squared errors of all correlation coefficients by using the principle of least squares, with the minimum of Σ(Yi − Yj)² as the optimality criterion, wherein Yi corresponds to the calculated Pearson product-moment correlation coefficient R1, and Yj corresponds to the configured correlation coefficient R;
and repeating iteration until the result data of the simulation is obtained.
5. A large-scale examination data simulation apparatus, comprising:
the basic condition parameter input unit is used for inputting basic condition parameters, wherein the basic condition parameters comprise examinee information, subject information and historical examination data;
the achievement data calculation unit of each individual subject is used for measuring and calculating achievement data of each individual subject by utilizing a probability density function of normal distribution according to the basic condition parameters;
and the simulation unit is used for performing optimization matching and iterative processing on the score data of each individual subject by using the principle of the least square method to obtain the simulated result data.
6. The large-scale examination data simulation device according to claim 5, wherein the examinee information comprises the number of examinees, the examinee numbers and the subject-selection information; the subject information comprises the basic information, score range and difficulty parameters of each individual test paper.
7. A big data statistics sampling server, comprising:
one or more processors;
storage means for storing one or more programs;
when the one or more programs are executed by the one or more processors, the one or more processors are caused to implement the large-scale examination data simulation method according to any one of claims 1 to 4.
8. A computer-readable storage medium, comprising a stored computer program, wherein the computer program, when executed, controls a device on which the storage medium is located to perform the large-scale examination data simulation method according to any one of claims 1 to 4.
CN201910830105.2A 2019-09-03 2019-09-03 Large-scale examination data analog simulation method, device and storage medium Pending CN110705025A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910830105.2A CN110705025A (en) 2019-09-03 2019-09-03 Large-scale examination data analog simulation method, device and storage medium


Publications (1)

Publication Number Publication Date
CN110705025A true CN110705025A (en) 2020-01-17

Family

ID=69193533

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910830105.2A Pending CN110705025A (en) 2019-09-03 2019-09-03 Large-scale examination data analog simulation method, device and storage medium

Country Status (1)

Country Link
CN (1) CN110705025A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113191002A (en) * 2021-05-04 2021-07-30 河南环球优路教育科技有限公司 Examination simulation method and system



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination