CN111274234B - Machine scoring system and method based on data analysis - Google Patents

Publication number
CN111274234B
CN111274234B CN202010058365.5A CN202010058365A CN111274234B CN 111274234 B CN111274234 B CN 111274234B CN 202010058365 A CN202010058365 A CN 202010058365A CN 111274234 B CN111274234 B CN 111274234B
Authority
CN
China
Prior art keywords
data
analysis
result
training
information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010058365.5A
Other languages
Chinese (zh)
Other versions
CN111274234A (en)
Inventor
陈红光
张福秋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wenzhou Zhongding Network Technology Co ltd
Original Assignee
Wenzhou Zhongding Network Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wenzhou Zhongding Network Technology Co ltd
Priority to CN202010058365.5A
Publication of CN111274234A
Application granted
Publication of CN111274234B
Legal status: Active (current)
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases
    • G06F16/285 Clustering or classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/18 Complex mathematical operations for evaluating statistical data, e.g. average values, frequency distributions, probability functions, regression analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/602 Providing cryptographic facilities or services
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Optimization (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Computing Systems (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Biomedical Technology (AREA)
  • Evolutionary Biology (AREA)
  • Molecular Biology (AREA)
  • Pure & Applied Mathematics (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Operations Research (AREA)
  • Probability & Statistics with Applications (AREA)
  • Quality & Reliability (AREA)
  • Algebra (AREA)
  • Bioethics (AREA)
  • Computer Hardware Design (AREA)
  • Computer Security & Cryptography (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention belongs to the technical field of data analysis and in particular relates to a machine scoring system based on data analysis, comprising: a data entry device for entering students' score data, wherein the data entry devices are distributed to different teachers and each teacher enters students' score information through a dedicated data entry device; and a data analysis device for performing data analysis on the entered score information to obtain the students' comprehensive scores and for encrypting the entered score information. The invention offers high data security, guarantees data privacy, and protects students' privacy, while the data analysis allows students' scores to be analyzed more accurately and presented more objectively.

Description

Machine scoring system and method based on data analysis
Technical Field
The invention belongs to the technical field of data analysis, and particularly relates to a machine scoring system and method based on data analysis.
Background
Data analysis means analyzing large amounts of collected data with appropriate statistical methods, then summarizing, understanding and digesting the data so as to make the fullest use of it. It is the process of studying and summarizing data in detail in order to extract useful information and form conclusions.
The mathematical foundations of data analysis were established in the early 20th century, but it was not until the advent of computers that practical operation became possible and data analysis came into widespread use. Data analysis is thus a product of the combination of mathematics and computer science.
The International Organization for Standardization (ISO) defines computer system security as: the technical and administrative security protection established and adopted for a data processing system to protect computer hardware, software and data from being damaged, altered or disclosed by accidental or malicious causes. The security of a computer network can therefore be understood as: by adopting various technical and administrative measures, the network system is kept operating normally, thereby ensuring the availability, integrity and confidentiality of network data. The purpose of establishing network security measures is thus to ensure that data transmitted and exchanged over the network is not added to, modified, lost or leaked.
Information security, or data security, has two contrasting aspects. The first is the security of the data itself, which mainly means actively protecting data with modern cryptographic algorithms, for example data confidentiality, data integrity and two-way identity authentication. The second is the security of data protection, which mainly means actively protecting data with modern information storage means such as disk arrays, data backup and remote disaster recovery. Data security is an active protective measure; the security of the data itself must rest on reliable cryptographic algorithms and security systems, chiefly of two types: symmetric algorithms and public-key cryptosystems.
The security of data processing refers to how to effectively prevent data from being damaged or lost during entry, processing, statistics or printing because of hardware failure, power failure, system crash, human error, program defects, viruses or hackers, and how to prevent leakage caused by sensitive or confidential data being read by unqualified personnel or operators.
Disclosure of Invention
The main object of the invention is to provide a machine scoring system and method based on data analysis that offer high data security, guarantee data privacy, and protect both students' privacy and data security, while the data analysis allows students' scores to be analyzed more accurately and presented more objectively.
In order to achieve the purpose, the technical scheme of the invention is realized as follows:
a machine scoring system based on data analysis, comprising: a data entry device for entering students' score data, wherein the data entry devices are distributed to different teachers and each teacher enters students' score information through a dedicated data entry device; and a data analysis device for performing data analysis on the entered score information to obtain the students' comprehensive scores and for encrypting the entered score information; characterized in that the method by which the data analysis device encrypts the entered score information executes the following steps: Step 1: setting a first encryption key function, denoted S_i, and performing a convolution operation on each input variable and its corresponding weight function to obtain an intermediate result of the first encryption key; Step 2: setting a second encryption key function, which is a bright function:
Figure BDA0002373584840000021
setting a random decision threshold P; performing a convolution operation on the first encryption key function, the second encryption key function and the random decision threshold to obtain an encryption result:
Figure BDA0002373584840000022
Step 3: analyzing and calculating the error of the forward encryption result; the output variable E_i of the training is set as the actual value, whereas model training produces a predicted value O_i, so the error function is obtained as:
Figure BDA0002373584840000023
wherein m represents the number of modeling samples input this time, and i represents the i-th variable; Step 4: back-propagating and updating the weight w until the value calculated by the error function falls within a preset threshold range, and storing the final result obtained as the encrypted result.
Further, the data analysis device performs data analysis on the entered result information to obtain the comprehensive results of the students and executes the following steps: cleaning and integrating the acquired original data; carrying out discretization processing on the preprocessed data by utilizing entropy to obtain a sample set of nominal data; constructing a logistic regression model for training, randomly extracting a data set from the sample set subjected to data conversion to serve as a training set, and solving parameters of the constructed logistic regression model based on the training set to obtain a final analysis model; taking the other data set in the sample set as a test set, and testing by using the obtained final analysis model to obtain an analysis result; a variety of evaluation metrics are employed to evaluate the accuracy of the analysis results.
Further, the data analysis device performs the following steps for the method for cleaning and integrating the collected achievement information: the data filtering sets with four quadrants arranged in the coordinate axis are respectively expressed by the following formulas:
Figure BDA0002373584840000031
wherein, x is the result information,
Figure BDA0002373584840000032
is the set of real numbers; p is a first parameter with a value range of 1 to 10; q is a second parameter with a value range of 11 to 100; data whose score information falls outside the data filtering set is eliminated, which completes the cleaning and integration of the data.
Further, the method for discretizing the preprocessed data by using entropy to obtain a sample set of nominal data performs the following steps: separating the cleaned and integrated data through a paradigm matrix and discretizing the continuous data, which reduces the amount of subsequent computation; the separation through the paradigm matrix uses the following formula:
Figure BDA0002373584840000041
where || · || denotes the norm operation; X is the coefficient matrix, whose numbers of rows and columns match the number of types of data to be discretized; a is the matrix into which the data to be discretized are arranged; and G is an adjustment coefficient set in the range 0.2 to 0.7.
Further, the method for constructing a logistic regression model for training, randomly extracting a data set from the sample set subjected to data conversion as a training set, and solving parameters of the constructed logistic regression model based on the training set to obtain a final analysis model performs the following steps: carrying out linear transformation on the data by using the following conversion functions based on the training set, and solving parameters of a logistic regression model:
Figure BDA0002373584840000042
wherein x* is the parameter obtained by solving; x is the training set; min is the minimum value in the data; max is the maximum value in the data; performing data modeling to obtain the analysis model; and performing an effect analysis, comprising: after model training is finished, calculating the accuracy of the analysis result and the prediction result of the analysis model with the following formula to obtain the R² score, where a higher score represents higher analysis accuracy:
Figure BDA0002373584840000043
wherein y represents the predicted outcome;
Figure BDA0002373584840000044
represents the result of the analysis;
n_samples represents the number of samples entering the model.
A machine scoring method based on data analysis, the method performing the steps of: the data input device is used for inputting score data of students; and the data analysis device is used for carrying out data analysis on the recorded result information to obtain the comprehensive results of the students and encrypting the recorded result information.
Further, the data analysis device performs data analysis on the entered score information to obtain the students' comprehensive scores, and the method for encrypting the entered score information executes the following steps: Step 1: setting a first encryption key function, denoted S_i, and performing a convolution operation on each input variable and its corresponding weight function to obtain an intermediate result of the first encryption key; Step 2: setting a second encryption key function, which is a bright function:
Figure BDA0002373584840000051
setting a random decision threshold P; performing a convolution operation on the first encryption key function, the second encryption key function and the random decision threshold to obtain an encryption result:
Figure BDA0002373584840000052
Step 3: analyzing and calculating the error of the forward encryption result; the output variable E_i of the training is set as the actual value, whereas model training produces a predicted value O_i, so the error function is obtained as:
Figure BDA0002373584840000053
wherein m represents the number of modeling samples input this time, and i represents the i-th variable; Step 4: back-propagating and updating the weight w until the value calculated by the error function falls within a preset threshold range, and storing the final result obtained as the encrypted result.
Further, the data analysis device performs data analysis on the entered result information to obtain the comprehensive results of the students and executes the following steps: cleaning and integrating the acquired original data; carrying out discretization processing on the preprocessed data by utilizing entropy to obtain a sample set of nominal data; constructing a logistic regression model for training, randomly extracting a data set from the sample set subjected to data conversion to serve as a training set, and solving parameters of the constructed logistic regression model based on the training set to obtain a final analysis model; taking the other data set in the sample set as a test set, and testing by using the obtained final analysis model to obtain an analysis result; a variety of evaluation metrics are employed to evaluate the accuracy of the analysis results.
Further, the data analysis device performs the following steps for the method for cleaning and integrating the collected achievement information: the data filtering sets with four quadrants arranged in the coordinate axis are respectively expressed by the following formulas:
Figure BDA0002373584840000061
wherein, x is the result information,
Figure BDA0002373584840000062
is the set of real numbers; p is a first parameter with a value range of 1 to 10; q is a second parameter with a value range of 11 to 100; data whose score information falls outside the data filtering set is eliminated, which completes the cleaning and integration of the data.
Further, the method for discretizing the preprocessed data by using entropy to obtain a sample set of nominal data performs the following steps: separating the cleaned and integrated data through a paradigm matrix and discretizing the continuous data, which reduces the amount of subsequent computation; the separation through the paradigm matrix uses the following formula:
Figure BDA0002373584840000063
where || · || denotes the norm operation; X is the coefficient matrix, whose numbers of rows and columns match the number of types of data to be discretized; a is the matrix into which the data to be discretized are arranged; and G is an adjustment coefficient set in the range 0.2 to 0.7.
The machine scoring system and method based on data analysis provided by the invention have the following beneficial effects: the invention applies a new encryption scheme based on a convolutional neural network to the score information, which ensures the security of the data; data encrypted in this way is harder to crack than data encrypted conventionally, and the encryption result can be predicted, that is, the decrypting party can know the encryption result in advance without any key being transmitted, which further enhances data security; in addition, data cleaning and data integration are performed with a four-quadrant data set, so that cleaning is efficient, the filtering result is more accurate, and valid data is not deleted; data discretization also speeds up the later data analysis; meanwhile, the invention uses model analysis, which can predict analysis results, has a higher degree of intelligence and learning ability, and gradually improves analysis efficiency, so that in the long run its data analysis efficiency is considerably higher than that of common data analysis methods.
Drawings
FIG. 1 is a system diagram of a machine scoring system and method based on data analysis according to an embodiment of the present invention;
FIG. 2 is a schematic method flow diagram of a machine scoring method based on data analysis according to an embodiment of the present invention;
FIG. 3 is a schematic overall method flow diagram of a machine scoring method based on data analysis according to an embodiment of the present invention;
FIG. 4 is a comparative experimental graph of the cracking-rate curve for data encrypted by the machine scoring system and method based on data analysis according to an embodiment of the present invention and the cracking-rate curve for data encrypted by the prior art;
FIG. 5 is a comparative experimental graph of the data analysis efficiency of the machine scoring system and method based on data analysis according to an embodiment of the present invention and the data analysis efficiency of the prior art.
In the figures: 1 - experimental curve of the present invention; 2 - experimental curve of the prior art.
Detailed Description
The method of the present invention will be described in further detail below with reference to the accompanying drawings and embodiments of the invention.
Example 1
As shown in FIGS. 1, 3 and 4, a machine scoring system based on data analysis comprises: a data entry device for entering students' score data, wherein the data entry devices are distributed to different teachers and each teacher enters students' score information through a dedicated data entry device; and a data analysis device for performing data analysis on the entered score information to obtain the students' comprehensive scores and for encrypting the entered score information; characterized in that the method by which the data analysis device encrypts the entered score information executes the following steps: Step 1: setting a first encryption key function, denoted S_i, and performing a convolution operation on each input variable and its corresponding weight function to obtain an intermediate result of the first encryption key; Step 2: setting a second encryption key function, which is a bright function:
Figure BDA0002373584840000081
setting a random decision threshold P; performing a convolution operation on the first encryption key function, the second encryption key function and the random decision threshold to obtain an encryption result:
Figure BDA0002373584840000082
Step 3: analyzing and calculating the error of the forward encryption result; the output variable E_i of the training is set as the actual value, whereas model training produces a predicted value O_i, so the error function is obtained as:
Figure BDA0002373584840000083
wherein m represents the number of modeling samples input this time, and i represents the i-th variable; Step 4: back-propagating and updating the weight w until the value calculated by the error function falls within a preset threshold range, and storing the final result obtained as the encrypted result.
With the above technical scheme, the invention applies a new encryption scheme based on a convolutional neural network to the score information, which ensures the security of the data; data encrypted in this way is harder to crack than data encrypted conventionally, and the encryption result can be predicted, that is, the decrypting party can know the encryption result in advance without any key being transmitted, which further enhances data security; in addition, data cleaning and data integration are performed with a four-quadrant data set, so that cleaning is efficient, the filtering result is more accurate, and valid data is not deleted; data discretization also speeds up the later data analysis; meanwhile, the invention uses model analysis, which can predict analysis results, has a higher degree of intelligence and learning ability, and gradually improves analysis efficiency, so that in the long run its data analysis efficiency is considerably higher than that of common data analysis methods.
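The four steps of Example 1 (forward computation, error calculation, back-propagation, stopping once the error falls within a preset threshold) can be illustrated with a minimal Python sketch. The patent's formulas are not reproduced in this text, so the sketch assumes a plain weighted sum for S_i, a sigmoid as the second function, and a halved mean squared error; the names `sigmoid` and `train`, the learning rate and the tolerance are hypothetical and are not taken from the patent. It shows only the training loop, not how the patent derives ciphertext from the result.

```python
import math
import random


def sigmoid(z):
    """Assumed second function (the patent's formula is not available)."""
    return 1.0 / (1.0 + math.exp(-z))


def train(samples, targets, lr=0.5, tol=1e-3, max_epochs=10000, seed=0):
    """Fit weights w so that sigmoid(w . x) approximates each target."""
    rng = random.Random(seed)
    n = len(samples[0])
    m = len(samples)
    w = [rng.uniform(-0.5, 0.5) for _ in range(n)]
    err = float("inf")
    for _ in range(max_epochs):
        # Steps 1-2: weighted sum S_i followed by the assumed activation.
        outputs = [sigmoid(sum(wj * xj for wj, xj in zip(w, x))) for x in samples]
        # Step 3: assumed error function, sum of squared errors over 2m.
        err = sum((e - o) ** 2 for e, o in zip(targets, outputs)) / (2 * m)
        if err <= tol:  # Step 4 stopping rule: error within preset threshold
            break
        # Step 4: gradient of the error with respect to each weight.
        grad = [0.0] * n
        for x, e, o in zip(samples, targets, outputs):
            delta = (o - e) * o * (1.0 - o)
            for j in range(n):
                grad[j] += delta * x[j] / m
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w, err


# Toy data: first component acts as a bias input.
weights, final_error = train([[1.0, 0.0], [1.0, 1.0]], [0.2, 0.8])
print(weights, final_error)
```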
Example 2
On the basis of the above embodiment, the data analysis device performs data analysis on the entered result information to obtain the comprehensive results of the students, and the method includes the following steps: cleaning and integrating the acquired original data; carrying out discretization processing on the preprocessed data by utilizing entropy to obtain a sample set of nominal data; constructing a logistic regression model for training, randomly extracting a data set from the sample set subjected to data conversion to serve as a training set, and solving parameters of the constructed logistic regression model based on the training set to obtain a final analysis model; taking the other data set in the sample set as a test set, and testing by using the obtained final analysis model to obtain an analysis result; a variety of evaluation metrics are employed to evaluate the accuracy of the analysis results.
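The random training/test split and the evaluation step described above can be sketched as follows. This is a hedged illustration only: the patent does not name its evaluation metrics or its split ratio, so accuracy, precision and recall, the 30% test ratio, and the helper names `train_test_split` and `binary_metrics` are assumptions.

```python
import random


def train_test_split(rows, test_ratio=0.3, seed=0):
    """Shuffle a copy of rows and split it into (train, test)."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_ratio)
    return shuffled[n_test:], shuffled[:n_test]


def binary_metrics(y_true, y_pred):
    """Accuracy, precision and recall for 0/1 labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


train_rows, test_rows = train_test_split(list(range(10)))
print(len(train_rows), len(test_rows))
print(binary_metrics([1, 0, 1, 1], [1, 0, 0, 1]))
```

Fixing the seed keeps the split reproducible, which makes it easier to compare metrics between runs.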
With the above technical scheme, discretization is a common technique in program design that can effectively reduce time complexity. The basic idea is to consider, among many possible cases, only the values that are actually used. Discretization can improve an inefficient algorithm and can even make feasible an algorithm that would otherwise be impractical. Mastering the idea requires working through many problems to understand the characteristics of the method. For example, discretization may be considered when there is not enough memory to build a segment tree.
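In the programming sense used in this paragraph, discretization is often implemented as coordinate compression: large or sparse values are replaced by their rank among the distinct values, so later data structures only need as many slots as there are distinct values. The sketch below is a generic illustration; the function name `compress` is hypothetical and the patent does not prescribe this code.

```python
def compress(values):
    """Map each value to its rank among the distinct sorted values."""
    ranks = {v: i for i, v in enumerate(sorted(set(values)))}
    return [ranks[v] for v in values]


# Four values spanning a range of a million need only three ranks.
print(compress([1000000, 5, 70000, 5]))  # prints [2, 0, 1, 0]
```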
Example 3
On the basis of the previous embodiment, the data analysis device performs the following steps on the method for cleaning and integrating the collected achievement information: the data filtering sets with four quadrants arranged in the coordinate axis are respectively expressed by the following formulas:
Figure BDA0002373584840000091
wherein, x is the result information,
Figure BDA0002373584840000092
is the set of real numbers; p is a first parameter with a value range of 1 to 10; q is a second parameter with a value range of 11 to 100; data whose score information falls outside the data filtering set is eliminated, which completes the cleaning and integration of the data.
Specifically, when a continuous system is simulated on a digital computer, the first problem encountered is how to reconcile the discreteness of the digital computer in value and time with the continuity of the simulated system in value and time. Fundamentally, the calculations performed by a digital computer are "numerical" calculations: the precision of the values is limited by the word length, which introduces rounding errors; moreover, the calculation proceeds step by step according to instructions, so time must be discretized and the system behavior can be obtained only at discrete time points. In digital simulation, the integration of differential equations is carried out by some numerical calculation method, and any such method can only approximate the original integral. Therefore, continuous system simulation essentially discretizes the original continuous system in both time and value and selects an appropriate numerical method to approximate the integral operation, so that the resulting discrete model approximates the original continuous model. How to ensure that the results of the discrete model faithfully represent the behavior of the original system is the first problem that the digital simulation of continuous systems must solve.
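The time discretization described in this paragraph can be made concrete with the forward Euler method, one of the simplest numerical integration schemes. The patent names no specific scheme, so the choice of Euler, the function name `euler`, and the test equation dy/dt = -y are assumptions made purely for illustration.

```python
import math


def euler(f, y0, t0, t1, steps):
    """Approximate y(t1) for dy/dt = f(t, y), y(t0) = y0, with fixed steps."""
    h = (t1 - t0) / steps
    t, y = t0, y0
    for _ in range(steps):
        y += h * f(t, y)  # advance the state using the slope at the current point
        t += h
    return y


# dy/dt = -y with y(0) = 1 has the exact solution y(1) = exp(-1).
coarse = euler(lambda t, y: -y, 1.0, 0.0, 1.0, 10)
fine = euler(lambda t, y: -y, 1.0, 0.0, 1.0, 1000)
print(abs(coarse - math.exp(-1)), abs(fine - math.exp(-1)))
```

The error shrinks as the step count grows, which is the sense in which the discrete model approximates the continuous one.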
By adopting the technical scheme, the data after discretization can be more easily classified, so that the analysis is more efficient than continuous data.
Example 4
On the basis of the above embodiment, the method for discretizing the preprocessed data by using entropy to obtain a sample set of nominal data performs the following steps: separating the cleaned and integrated data through a paradigm matrix and discretizing the continuous data, which reduces the amount of subsequent computation; the separation through the paradigm matrix uses the following formula:
Figure BDA0002373584840000101
where || · || denotes the norm operation; X is the coefficient matrix, whose numbers of rows and columns match the number of types of data to be discretized; a is the matrix into which the data to be discretized are arranged; and G is an adjustment coefficient set in the range 0.2 to 0.7.
Specifically, discretization methods have developed along different lines according to different requirements, and several classification systems for them now exist. Each classification system emphasizes a different aspect of how the methods differ. The main classifications are supervised versus unsupervised, dynamic versus static, global versus local, splitting (top-down) versus merging (bottom-up), univariate versus multivariate, and direct versus incremental.
Discretization methods can be classified as supervised or unsupervised according to whether they use the class label information of the data set during discretization. Unsupervised methods do not use class information; typical representatives are binning methods, including equal-width binning and equal-frequency binning. Binning discretizes the data by replacing each value in a bin with the bin mean or bin median. In practice, binning performs poorly, especially when the numerical data is unevenly distributed. Supervised methods use class information during discretization, and previous studies have shown that they work better than unsupervised methods.
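The two unsupervised binning strategies mentioned above can be sketched as follows. The function names and the sample data are hypothetical; the example is chosen to show why equal-width binning behaves poorly on unevenly distributed values.

```python
def equal_width_bins(values, k):
    """Assign each value to one of k bins of equal width over [min, max]."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    if width == 0:
        return [0] * len(values)
    # The maximum value would land in bin k, so clamp it into the last bin.
    return [min(int((v - lo) / width), k - 1) for v in values]


def equal_frequency_bins(values, k):
    """Assign each value to one of k bins holding roughly equal counts."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * k // len(values)
    return bins


skewed = [1, 2, 3, 4, 100, 101]
print(equal_width_bins(skewed, 2))      # prints [0, 0, 0, 0, 1, 1]
print(equal_frequency_bins(skewed, 2))  # prints [0, 0, 0, 1, 1, 1]
```

With the skewed data, equal-width binning crowds four of the six values into one bin, while equal-frequency binning balances the counts but places 4 and 100 in the same bin.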
Discretization methods are also often classified as dynamic or static. A dynamic method discretizes continuous features while the classification model is being built, whereas a static method completes the discretization before classification.
Discretization methods can further be divided into global and local, depending on whether the discretization is applied to the entire training data space. A global method uses all instances, while a local method uses only a portion of them.
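As an illustration of supervised, entropy-based discretization, the sketch below picks the single cut point that maximizes information gain with respect to class labels. The patent's own separation formula is not reproduced in this text, so this is a generic textbook technique swapped in for illustration, not the patented method; `entropy`, `best_cut` and the pass/fail labels are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def best_cut(values, labels):
    """Return (cut_point, information_gain) for the best binary split."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    base = entropy([label for _, label in pairs])
    best = (None, 0.0)
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no boundary between equal values
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        gain = base - (len(left) / n) * entropy(left) - (len(right) / n) * entropy(right)
        if gain > best[1]:
            best = ((pairs[i - 1][0] + pairs[i][0]) / 2, gain)
    return best


scores = [55, 58, 62, 71, 80, 90]
labels = ["fail", "fail", "pass", "pass", "pass", "pass"]
print(best_cut(scores, labels))
```

For this toy data the best cut lies midway between 58 and 62, where both sides become pure. Applying the function recursively to each side would yield more than two intervals.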
Example 5
On the basis of the above embodiment, the method of constructing a logistic regression model for training, randomly extracting a data set from the data-converted sample set as a training set, and solving the parameters of the constructed logistic regression model based on the training set to obtain a final analysis model performs the following steps: based on the training set, linearly transforming the data using the following conversion function and solving the parameters of the logistic regression model:
Figure BDA0002373584840000111
where x* is the parameter obtained by solving; x is the training set; min is the minimum value in the data; max is the maximum value in the data; performing data modeling to obtain an analysis model; and performing an effect analysis, comprising: after model training is finished, calculating the accuracy of the analysis result against the prediction result of the analysis model using the following formula to obtain an R² score, where a higher score represents higher analysis accuracy:
Figure BDA0002373584840000121
wherein y represents the predicted outcome;
Figure BDA0002373584840000122
represents the result of the analysis;
n_samples represents the number of samples entering the model.
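The two formulas in this embodiment survive only as figure placeholders in this text. The sketch below therefore assumes the standard definitions that the surrounding wording suggests: min-max scaling, x* = (x - min) / (max - min), and the usual coefficient of determination, R² = 1 - Σ(y_i - ŷ_i)² / Σ(y_i - ȳ)². Both are assumptions rather than a transcription of the patent figures, and the function names are illustrative.

```python
def min_max_scale(xs):
    """Linearly map values onto [0, 1] using the minimum and maximum of the data."""
    lo, hi = min(xs), max(xs)
    if hi == lo:
        return [0.0 for _ in xs]  # constant data: nothing to spread out
    return [(x - lo) / (hi - lo) for x in xs]


def r2_score(y, y_hat):
    """Standard coefficient of determination over n_samples paired values.

    y holds the reference values and y_hat the values compared against them.
    """
    n_samples = len(y)
    y_bar = sum(y) / n_samples
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))  # residual sum of squares
    ss_tot = sum((a - y_bar) ** 2 for a in y)             # total sum of squares
    return 1.0 - ss_res / ss_tot


scaled = min_max_scale([60, 70, 80, 100])
perfect = r2_score([1, 2, 3], [1, 2, 3])
baseline = r2_score([1, 2, 3], [2, 2, 2])
```

Under this definition a perfect match scores 1, and always predicting the mean scores 0, which is consistent with the statement that a higher score represents higher accuracy.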
Specifically, the distinctive nonlinear adaptive information-processing capability of artificial neural networks overcomes the shortcomings of traditional artificial intelligence methods in intuition-based tasks such as pattern recognition, speech recognition, and unstructured information processing, and neural networks have been successfully applied in neural expert systems, pattern recognition, intelligent control, combinatorial optimization, prediction, and other fields. Combining artificial neural networks with other traditional methods will promote the development of artificial intelligence and information processing technology. In recent years, artificial neural networks have advanced further toward simulating human cognition and have been combined with fuzzy systems, genetic algorithms, evolutionary mechanisms, and the like to form computational intelligence, which has become an important direction of artificial intelligence and is being developed in practical applications. Applying information geometry to the study of artificial neural networks has opened a new avenue for their theoretical research. Research on neural computers is developing rapidly, and products have entered the market. Optoelectronic neural computers provide good conditions for the development of artificial neural networks.
Neural networks have been applied successfully in many fields, but many aspects still need study. In particular, combining neural networks, which offer distributed storage, parallel processing, self-learning, self-organization, and nonlinear mapping, with other technologies, and the resulting hybrid methods and hybrid systems, has become a major research focus. Because other methods have their own advantages, combining a neural network with them can compensate for its deficiencies and yield better application results. Current work in this area covers the fusion of neural networks with fuzzy logic, expert systems, genetic algorithms, wavelet analysis, chaos, rough set theory, fractal theory, evidence theory, grey systems, and the like.
With this technical scheme, both the analysis result and the analysis efficiency are improved: the analysis accuracy is gradually improved by comparing the neural network's predicted value with the analysis result, and the analysis efficiency is higher than that of the prior art.
Example 6
As shown in Fig. 2, a machine scoring method based on data analysis performs the following steps: entering score data of students through the data entry device; and performing, by the data analysis device, data analysis on the entered score information to obtain the comprehensive scores of the students and encrypting the entered score information.
Example 7
On the basis of the above embodiment, the method by which the data analysis device performs data analysis on the entered score information to obtain the comprehensive scores of the students and encrypts the entered score information performs the following steps:
Step 1: setting a first encryption key function, denoted S_i, and performing a convolution operation on each input variable and its corresponding weight function to obtain an intermediate result of the first encryption key;
Step 2: setting a second encryption key function, wherein the second encryption key function is a bright function:
Figure BDA0002373584840000131
setting a random decision threshold as: p; performing convolution operation on the first encryption key function, the second encryption key function and the random judgment threshold value to obtain an encryption result:
Figure BDA0002373584840000132
Step 3: analyzing and calculating the error of the forward encryption result; the output variable of the training, E_I, is set as the actual value, whereas model training produces a predicted value O_i, so the error function is obtained as:
Figure BDA0002373584840000133
where m represents the number of modeling samples input in this round, and i denotes the i-th variable;
Step 4: back-propagating and updating the weight w until the value calculated by the error function is within a preset threshold range, and storing the final result as the encrypted result.
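The four steps above follow the general shape of a forward pass, an error computation, and weight updates by back-propagation until the error is within a threshold. Because the key function, the combined result, and the error function appear only as figures, the sketch below substitutes assumed forms: a plain weighted sum for Step 1, a sigmoid for the function in Step 2, and a mean squared error divided by 2m for Step 3. The random decision threshold is omitted because its role cannot be recovered from the text. All names and the toy data are hypothetical, so this is a sketch of the iterative structure only, not of the patented computation.

```python
import math


def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))


def train(samples, targets, lr=0.5, tol=0.02, max_epochs=10000):
    """Fit one sigmoid unit by gradient descent; stop once the error is within tol."""
    w = [0.0] * len(samples[0])
    m = len(samples)
    err = float("inf")
    for _ in range(max_epochs):
        # Steps 1-2 (assumed form): weighted sum of the inputs, then the activation.
        outputs = [sigmoid(sum(wj * xj for wj, xj in zip(w, x))) for x in samples]
        # Step 3 (assumed form): squared error between actual and predicted values.
        err = sum((t - o) ** 2 for t, o in zip(targets, outputs)) / (2 * m)
        if err <= tol:
            break
        # Step 4: propagate the error back and update each weight.
        for j in range(len(w)):
            grad = sum((o - t) * o * (1 - o) * x[j]
                       for x, t, o in zip(samples, targets, outputs)) / m
            w[j] -= lr * grad
    return w, err


# Hypothetical toy data: the first input is a constant bias term.
samples = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
targets = [0.0, 0.0, 1.0, 1.0]
weights, final_error = train(samples, targets)
```

The stopping rule mirrors Step 4: training ends as soon as the error function value falls within the preset threshold, here the hypothetical parameter `tol`.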
Example 8
On the basis of the above embodiment, the data analysis device performs data analysis on the entered result information to obtain the comprehensive results of the students, and the method includes the following steps: cleaning and integrating the acquired original data; carrying out discretization processing on the preprocessed data by utilizing entropy to obtain a sample set of nominal data; constructing a logistic regression model for training, randomly extracting a data set from the sample set subjected to data conversion to serve as a training set, and solving parameters of the constructed logistic regression model based on the training set to obtain a final analysis model; taking the other data set in the sample set as a test set, and testing by using the obtained final analysis model to obtain an analysis result; a variety of evaluation metrics are employed to evaluate the accuracy of the analysis results.
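The entropy-based discretization step can be illustrated with a common supervised technique: choosing the cut point that minimizes the weighted class entropy of the two resulting intervals. This is an assumed stand-in for illustration only; the patent's own separation formula is not reproduced in this text, and the function names, scores, and labels below are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def best_cut_point(values, labels):
    """Return the threshold that minimizes the weighted class entropy of the two sides."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (float("inf"), None)
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no boundary between identical values
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        weighted = (len(left) * entropy(left) + len(right) * entropy(right)) / n
        cut = (pairs[i - 1][0] + pairs[i][0]) / 2  # midpoint between neighbours
        if weighted < best[0]:
            best = (weighted, cut)
    return best[1]


# Hypothetical continuous scores with a nominal class label per student.
scores = [45, 52, 58, 61, 70, 88]
passed = ["no", "no", "no", "yes", "yes", "yes"]
cut = best_cut_point(scores, passed)
```

Applying the cut turns each continuous score into one of two nominal values (below or above the threshold); repeating the search inside each interval would yield more intervals.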
Example 9
On the basis of the previous embodiment, the method by which the data analysis device cleans and integrates the collected score information performs the following steps: setting data filtering sets for the four quadrants of the coordinate axes, expressed respectively by the following formulas:
Figure BDA0002373584840000141
wherein, x is the result information,
Figure BDA0002373584840000142
is the set of real numbers; p is a first parameter with a value range of 1 to 10; q is a second parameter with a value range of 11 to 100; and data whose score information falls outside the data filtering sets are eliminated to complete the cleaning and integration of the data.
With this technical scheme, a four-quadrant data filtering method is used in the data cleaning process, so filtering is faster and more efficient than with a simple data filtering method.
Example 10
On the basis of the previous embodiment, the method of discretizing the preprocessed data using entropy to obtain a sample set of nominal data performs the following steps: separating the cleaned and integrated data through a norm matrix and discretizing the continuous data, which reduces the amount of subsequent computation; the separation through the norm matrix uses the following formula:
Figure BDA0002373584840000143
where ‖·‖ denotes the norm operation; X is the coefficient matrix, whose numbers of rows and columns match the number of types of data to be discretized; A is the matrix into which the data to be discretized are arranged; and G is an adjustment coefficient with a set range of 0.2 to 0.7.
The above description is only an embodiment of the present invention and is not intended to limit its scope; any structural change made according to the present invention without departing from its spirit should be considered as falling within the scope of protection of the present invention.
It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working process and related description of the system described above may refer to the corresponding process in the foregoing method embodiments, and will not be described herein again.
It should be noted that, the system provided in the foregoing embodiment is only illustrated by dividing the functional modules, and in practical applications, the functions may be distributed by different functional modules according to needs, that is, the modules or steps in the embodiment of the present invention are further decomposed or combined, for example, the modules in the foregoing embodiment may be combined into one module, or may be further split into multiple sub-modules, so as to complete all or part of the functions described above. The names of the modules and steps involved in the embodiments of the present invention are only for distinguishing the modules or steps, and are not to be construed as unduly limiting the present invention.
It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes and related descriptions of the storage device and the processing device described above may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
Those of skill in the art would appreciate that the various illustrative modules, method steps, and modules described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that programs corresponding to the software modules, method steps may be located in Random Access Memory (RAM), memory, Read Only Memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art. To clearly illustrate this interchangeability of electronic hardware and software, various illustrative components and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as electronic hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
The terms "first," "second," and the like are used for distinguishing between similar elements and not necessarily for describing or implying a particular order or sequence.
The terms "comprises," "comprising," or any other similar term are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.
So far, the technical solutions of the present invention have been described in connection with the preferred embodiments shown in the drawings, but it is easily understood by those skilled in the art that the scope of the present invention is obviously not limited to these specific embodiments. Equivalent changes or substitutions of related technical features can be made by those skilled in the art without departing from the principle of the invention, and the technical scheme after the changes or substitutions can fall into the protection scope of the invention.
The above description is only a preferred embodiment of the present invention, and is not intended to limit the scope of the present invention.

Claims (9)

1. A machine scoring system based on data analysis, comprising:
the data entry device is used for entering score data of students; data entry devices are assigned to different teachers, and each teacher enters the score information of students through his or her dedicated data entry device; the data analysis device is used for performing data analysis on the entered score information to obtain the comprehensive score of the student and for encrypting the entered score information; characterized in that,
the data analysis device performs data analysis on the input result information to obtain the comprehensive results of the students and executes the following steps: cleaning and integrating the acquired original data; carrying out discretization processing on the preprocessed data by utilizing entropy to obtain a sample set of nominal data; constructing a logistic regression model for training, randomly extracting a data set from the sample set subjected to data conversion to serve as a training set, and solving parameters of the constructed logistic regression model based on the training set to obtain a final analysis model; taking the other data set in the sample set as a test set, and testing by using the obtained final analysis model to obtain an analysis result; evaluating the accuracy of the analysis result by adopting various evaluation metrics;
the method of constructing a logistic regression model for training, randomly extracting a data set from the data-converted sample set as a training set, and solving the parameters of the constructed logistic regression model based on the training set to obtain a final analysis model comprises the following steps: based on the training set, linearly transforming the data using the following conversion function and solving the parameters of the logistic regression model:
Figure FDA0003103215190000011
where x* is the parameter obtained by solving; x is the training set; min is the minimum value in the data; max is the maximum value in the data; performing data modeling to obtain an analysis model; and performing an effect analysis, comprising: after model training is finished, calculating the accuracy of the analysis result against the prediction result of the analysis model using the following formula to obtain an R² score, where a higher score represents higher analysis accuracy:
Figure FDA0003103215190000012
wherein y represents the predicted outcome;
Figure FDA0003103215190000021
represents the result of the analysis;
n_samples represents the number of samples entering the model;
the data analysis device performs data analysis on the recorded result information to obtain the comprehensive results of the students, and the method for encrypting the recorded result information executes the following steps:
step 1: setting a first encryption key function, denoted S_i, and performing a convolution operation on each input variable and its corresponding weight function to obtain an intermediate result of the first encryption key;
step 2: setting a second encryption key function, wherein the second encryption key function is a bright function:
Figure FDA0003103215190000022
setting a random decision threshold as: p; performing convolution operation on the first encryption key function, the second encryption key function and the random judgment threshold value to obtain an encryption result:
Figure FDA0003103215190000023
Figure FDA0003103215190000024
step 3: analyzing and calculating the error of the forward encryption result; the output variable of the training, E_I, is set as the actual value, whereas model training produces a predicted value O_i, so the error function is obtained as:
Figure FDA0003103215190000025
Figure FDA0003103215190000026
wherein m represents the number of the input modeling samples at this time, and i represents the ith variable;
step 4: back-propagating and updating the weight w until the value calculated by the error function is within a set threshold range, and storing the final result as the encrypted result.
2. The system of claim 1, wherein the data analysis device performs data analysis on the entered achievement information to obtain the comprehensive achievement of the student, and the method comprises the following steps: cleaning and integrating the acquired original data; carrying out discretization processing on the preprocessed data by utilizing entropy to obtain a sample set of nominal data; constructing a logistic regression model for training, randomly extracting a data set from the sample set subjected to data conversion to serve as a training set, and solving parameters of the constructed logistic regression model based on the training set to obtain a final analysis model; taking the other data set in the sample set as a test set, and testing by using the obtained final analysis model to obtain an analysis result; a variety of evaluation metrics are employed to evaluate the accuracy of the analysis results.
3. The system of claim 2, wherein the method by which the data analysis device cleans and integrates the collected score information performs the following steps: setting data filtering sets for the four quadrants of the coordinate axes, expressed respectively by the following formulas:
Figure FDA0003103215190000031
wherein, beta is the result information,
Figure FDA0003103215190000032
is a real number set, p is a first parameter, and the value range is: 1-10; q is a second parameter, and the value range is as follows: 11 to 100; and eliminating the data of which the score information falls outside the data filtering set to finish the cleaning and integration of the data.
4. The system of claim 3, wherein the method of discretizing the preprocessed data using entropy to obtain a sample set of nominal data performs the following steps: separating the cleaned and integrated data through a norm matrix and discretizing the continuous data, which reduces the amount of subsequent computation; the separation through the norm matrix uses the following formula:
Figure FDA0003103215190000033
where ‖·‖₂ denotes the norm operation; X is a coefficient matrix whose numbers of rows and columns match the number of types of data to be discretized; A is the matrix into which the data to be discretized are arranged; and G is an adjustment coefficient with a set range of 0.2 to 0.7.
5. A machine scoring method based on data analysis based on the system of one of claims 1 to 4, characterized in that the method performs the following steps:
firstly, inputting score data of students through a data input device;
then, the recorded result information is subjected to data analysis by the data analysis device to obtain the comprehensive result of the student, and the recorded result information is encrypted.
6. The method of claim 5,
the method for analyzing the recorded result information by the data analysis device to obtain the comprehensive result of the student and encrypting the recorded result information comprises the following steps:
step 1: setting a first encryption key function, denoted S_i, and performing a convolution operation on each input variable and its corresponding weight function to obtain an intermediate result of the first encryption key;
step 2: setting a second encryption key function, wherein the second encryption key function is a bright function:
Figure FDA0003103215190000041
setting a random decision threshold as: p; performing convolution operation on the first encryption key function, the second encryption key function and the random judgment threshold value to obtain an encryption result:
Figure FDA0003103215190000042
Figure FDA0003103215190000043
step 3: analyzing and calculating the error of the forward encryption result; the output variable of the training, E_I, is set as the actual value, whereas model training produces a predicted value O_i, so the error function is obtained as:
Figure FDA0003103215190000044
Figure FDA0003103215190000045
wherein m represents the number of the input modeling samples at this time, and i represents the ith variable;
step 4: back-propagating and updating the weight w until the value calculated by the error function is within a set threshold range, and storing the final result as the encrypted result.
7. The method of claim 6, wherein the method of data analyzing the entered achievement information by the data analysis device to derive the student's combined achievement performs the steps of: cleaning and integrating the acquired original data; carrying out discretization processing on the preprocessed data by utilizing entropy to obtain a sample set of nominal data; constructing a logistic regression model for training, randomly extracting a data set from the sample set subjected to data conversion to serve as a training set, and solving parameters of the constructed logistic regression model based on the training set to obtain a final analysis model; taking the other data set in the sample set as a test set, and testing by using the obtained final analysis model to obtain an analysis result; a variety of evaluation metrics are employed to evaluate the accuracy of the analysis results.
8. The method of claim 7, wherein the method of cleaning and integrating the collected performance information by the data analysis device performs the steps of: the data filtering sets with four quadrants arranged in the coordinate axis are respectively expressed by the following formulas:
Figure FDA0003103215190000051
wherein, beta is the result information,
Figure FDA0003103215190000052
is a real number set, p is a first parameter, and the value range is: 1-10; q is a second parameter, and the value range is as follows: 11 to 100; and eliminating the data of which the score information falls outside the data filtering set to finish the cleaning and integration of the data.
9. The method of claim 6, wherein the method of discretizing the preprocessed data using entropy to obtain a sample set of nominal data comprises: separating the cleaned and integrated data through a norm matrix and discretizing the continuous data, which reduces the amount of subsequent computation; the separation through the norm matrix uses the following formula:
Figure FDA0003103215190000053
where ‖·‖₂ denotes the norm operation; X is a coefficient matrix whose numbers of rows and columns match the number of types of data to be discretized; A is the matrix into which the data to be discretized are arranged; and G is an adjustment coefficient with a set range of 0.2 to 0.7.
CN202010058365.5A 2020-01-19 2020-01-19 Machine scoring system and method based on data analysis Active CN111274234B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010058365.5A CN111274234B (en) 2020-01-19 2020-01-19 Machine scoring system and method based on data analysis

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010058365.5A CN111274234B (en) 2020-01-19 2020-01-19 Machine scoring system and method based on data analysis

Publications (2)

Publication Number Publication Date
CN111274234A CN111274234A (en) 2020-06-12
CN111274234B true CN111274234B (en) 2021-07-30

Family

ID=71003070

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010058365.5A Active CN111274234B (en) 2020-01-19 2020-01-19 Machine scoring system and method based on data analysis

Country Status (1)

Country Link
CN (1) CN111274234B (en)


Also Published As

Publication number Publication date
CN111274234A (en) 2020-06-12


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant