CN112364002A - Modeling method of data analysis model - Google Patents

Modeling method of data analysis model

Info

Publication number
CN112364002A
Authority
CN
China
Prior art keywords
data
model
data analysis
analysis
determining
Legal status
Pending
Application number
CN202011218969.8A
Other languages
Chinese (zh)
Inventor
李晓红
陈燕群
彭海宇
Current Assignee
Shanghai Xinpengcheng Data Technology Development Co ltd
Original Assignee
Shanghai Xinpengcheng Data Technology Development Co ltd
Priority date
2020-11-04
Filing date
2020-11-04
Publication date
2021-02-12
Application filed by Shanghai Xinpengcheng Data Technology Development Co ltd
Priority to CN202011218969.8A
Publication of CN112364002A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/283 Multi-dimensional databases or data warehouses, e.g. MOLAP or ROLAP

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Quality & Reliability (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a modeling method of a data analysis model, belonging to the technical field of data analysis. The method comprises the following steps: step one, determining a model target and a data analysis range; step two, performing data association analysis within the data analysis range determined in step one, and selecting data dimensions and measures; step three, constructing an initial model according to the result of step two and the requirements of the model target, and determining the basic dimensionality of the initial model; step four, importing test data into the initial model from step three and performing a data fitting operation on the model; step five, checking the result of the fitting operation in step four: if the accuracy exceeds 90%, the model is deemed established and modeling ends; otherwise, the method returns to step two; and step six, storing the model data in a model library. By automatically collecting and intelligently cleaning data of various types, the invention addresses the problems of complex, non-uniform data formats and data that are difficult to collect.

Description

Modeling method of data analysis model
Technical Field
The invention relates to the technical field of data analysis, in particular to a modeling method of a data analysis model.
Background
Data analysis is the process of analyzing a large amount of collected data with appropriate statistical methods in order to extract useful information, draw conclusions, and study and summarize the data in detail. This process also supports quality management systems. In practice, data analysis helps people make decisions so that they can take appropriate action.
The mathematical basis for data analysis was established in the early 20th century, but it was not until the advent of computers that data analysis became practical and widespread. Data analysis is thus a product of the combination of mathematics and computer science.
With the development of science and technology, colleges and universities have introduced campus information systems to monitor and manage all internal business data conveniently. Staff and student information is entered into these systems, and each user has a single real-name-authenticated account, so staff and students can log in at any time and from anywhere to view announcements issued by the school and obtain school information.
However, building such a campus system requires a data analysis model, and existing data analysis models support only a single type of data collection and cleaning. They have a low degree of intelligence, poor compatibility, and low efficiency, and are inconvenient for processing multiple types of data.
To this end, we propose a modeling method of a data analysis model to solve the above problems.
Disclosure of Invention
The invention aims to overcome the defects of the prior art by providing a modeling method of a data analysis model; its technical effects are briefly described below.
To achieve this purpose, the invention adopts the following technical scheme:
a modeling method of a data analysis model comprises the following steps:
step one, determining a model target and a data analysis range;
step two, performing data association analysis within the data analysis range determined in step one, and selecting data dimensions and measures;
step three, constructing an initial model according to the result of step two and the requirements of the model target, and determining the basic dimensionality of the initial model;
step four, importing test data into the initial model from step three, and performing a data fitting operation on the model;
step five, checking the result of the fitting operation in step four: if the accuracy exceeds 90%, the model is deemed established and modeling ends; otherwise, returning to step two;
and step six, storing the model data in a model library.
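The six steps form an iterative loop in which steps two to five repeat until the accuracy criterion is met. The following Python sketch illustrates that control flow only. The callback names (`select_dimensions`, `build_initial_model`, `fit_and_score`), the plain list used as the model library, and the `max_rounds` cap (the method itself places no limit on the number of iterations) are illustrative assumptions, not part of the disclosed method.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class ModelingSteps:
    # Hypothetical callbacks; the method does not prescribe how each step is implemented.
    select_dimensions: Callable[[Any], dict]         # step two: association analysis
    build_initial_model: Callable[[dict, Any], Any]  # step three: initial model
    fit_and_score: Callable[[Any], float]            # steps four and five: accuracy in [0, 1]


def run_modeling(target: Any, steps: ModelingSteps, model_library: list,
                 threshold: float = 0.90, max_rounds: int = 10) -> Any:
    """Repeat steps two to five until accuracy exceeds `threshold`, then store the model."""
    # Step one (choosing `target` and its analysis range) happens before this call.
    for round_no in range(1, max_rounds + 1):
        selection = steps.select_dimensions(target)
        model = steps.build_initial_model(selection, target)
        accuracy = steps.fit_and_score(model)
        if accuracy > threshold:  # "exceeds 90%", so exactly 90% does not pass
            # Step six: store the accepted model in the (hypothetical) model library.
            model_library.append({"target": target, "model": model,
                                  "accuracy": accuracy, "rounds": round_no})
            return model
        # Otherwise fall through and return to step two.
    raise RuntimeError(f"accuracy never exceeded {threshold:.0%} in {max_rounds} rounds")
```

Passing the steps as callbacks keeps the loop independent of any particular association or fitting technique, which the method leaves open.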
Further, the test data comprise no fewer than 1,000 sample sets.
Further, the data may be imported in a structured data input format, an unstructured data input format, or as structured data.
Further, the structured data input format is the JSON file format;
the unstructured data input format is a text file format;
the structured data is an Excel file.
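As an illustration of how a single import entry point might accept the three formats listed above, the following Python sketch dispatches on file extension. The function name `load_records`, the record shapes, and the treatment of the first Excel row as a header are assumptions; reading Excel files additionally assumes the third-party `openpyxl` package is installed.

```python
import json
from pathlib import Path


def load_records(path: str) -> list:
    """Load records from a JSON, plain-text, or Excel file (hypothetical helper)."""
    p = Path(path)
    suffix = p.suffix.lower()
    if suffix == ".json":
        data = json.loads(p.read_text(encoding="utf-8"))
        # Accept either a single JSON object or a list of objects.
        return data if isinstance(data, list) else [data]
    if suffix == ".txt":
        # Unstructured text: one record per non-empty line, kept as raw text.
        lines = p.read_text(encoding="utf-8").splitlines()
        return [{"text": line.strip()} for line in lines if line.strip()]
    if suffix in (".xlsx", ".xlsm"):
        from openpyxl import load_workbook  # third-party; imported only when needed
        wb = load_workbook(p, read_only=True)
        try:
            rows = wb.active.iter_rows(values_only=True)
            header = next(rows, None)  # assumption: first row holds column names
            return [] if header is None else [dict(zip(header, row)) for row in rows]
        finally:
            wb.close()
    raise ValueError(f"unsupported import format: {suffix!r}")
```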
Further, the data association analysis is performed using Chinese natural language processing (NLP) technology.
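The method does not specify how the association analysis is computed. One common way to quantify association between terms extracted from text is lift over document co-occurrence, lift(a, b) = P(a and b) / (P(a) P(b)). The sketch below assumes that measure and uses plain substring matching in place of a real Chinese word segmenter; the names `keyword_associations` and `min_support` are hypothetical.

```python
from itertools import combinations


def keyword_associations(docs: list[str], keywords: list[str], min_support: int = 2):
    """Rank keyword pairs by lift = P(a and b) / (P(a) * P(b)) over documents."""
    n = len(docs)
    # Plain substring matching stands in for a proper Chinese word segmenter.
    present = [{k for k in keywords if k in doc} for doc in docs]
    count = {k: sum(k in s for s in present) for k in keywords}
    results = []
    for a, b in combinations(keywords, 2):
        both = sum(a in s and b in s for s in present)
        if both >= max(min_support, 1):  # at least one co-occurrence avoids division by zero
            lift = both * n / (count[a] * count[b])
            results.append((a, b, both, round(lift, 3)))
    # Strongest associations first; ties broken by co-occurrence count.
    return sorted(results, key=lambda r: (-r[3], -r[2]))
```

A lift above 1 indicates that two terms appear together more often than independent occurrence would predict, which is one possible basis for choosing related dimensions in step two.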
Compared with the prior art, the invention has the following beneficial effects:
1. The method uses Chinese NLP technology to analyze collected college information, news, theses, and similar material, and extracts keywords and characteristic words along 5 preset dimensions to form a feature library of vocational education data;
2. By automatically collecting and intelligently cleaning data of various types and combining the associations among the data, the method addresses the problems of complex, non-uniform data formats and data that are difficult to collect. Because multiple types of data can be collected and cleaned automatically, the modeling method offers better compatibility than existing methods, together with high accuracy and high data processing efficiency.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention and not to limit the invention.
Fig. 1 is a schematic flow chart of a modeling method of a data analysis model according to the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings. Evidently, the described embodiments are only some, not all, of the embodiments of the present invention.
In the description of the present invention, it is to be understood that the terms "upper", "lower", "front", "rear", "left", "right", "top", "bottom", "inner", "outer", and the like, indicate orientations or positional relationships based on the orientations or positional relationships shown in the drawings, are merely for convenience in describing the present invention and simplifying the description, and do not indicate or imply that the device or element being referred to must have a particular orientation, be constructed and operated in a particular orientation, and thus, should not be construed as limiting the present invention.
Referring to Fig. 1, a modeling method of a data analysis model includes the following steps:
step one, determining a model target and a data analysis range;
step two, performing data association analysis within the data analysis range determined in step one, and selecting data dimensions and measures;
step three, constructing an initial model according to the result of step two and the requirements of the model target, and determining the basic dimensionality of the initial model;
step four, importing test data into the initial model from step three, and performing a data fitting operation on the model;
step five, checking the result of the fitting operation in step four: if the accuracy exceeds 90%, the model is deemed established and modeling ends; otherwise, returning to step two;
and step six, storing the model data in a model library.
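Steps two and three select dimensions and measures and fix the basic dimensionality of the initial model. A minimal sketch of such a model, assuming it is a simple multidimensional aggregate in the style of an OLAP cube (sum and count of one measure for each combination of dimension values), is shown below; `build_cube` and the example field names are hypothetical.

```python
from collections import defaultdict


def build_cube(records, dimensions, measure):
    """Aggregate `measure` (sum and count) for every combination of `dimensions`."""
    cube = defaultdict(lambda: {"sum": 0.0, "count": 0})
    for rec in records:
        value = rec.get(measure)
        if value is None:
            continue  # skip records that lack the measure
        # The tuple of dimension values identifies one cell of the cube.
        key = tuple(rec.get(d) for d in dimensions)
        cell = cube[key]
        cell["sum"] += float(value)
        cell["count"] += 1
    return dict(cube)
```

For example, expense information data could be aggregated with `dimensions=["college", "year"]` and `measure="fee"`; the number of entries in `dimensions` then corresponds to the model's basic dimensionality.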
The model target may be staff information data, student information data, general school information data, expense information data, public notice information data, or master's students' thesis information data of the school.
More specifically, the test data comprise no fewer than 1,000 sample sets.
Testing the constructed initial model with multiple sets of test data improves its accuracy and trains it more thoroughly.
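The sample-size requirement and the 90% criterion of step five can be expressed as a single check. This sketch assumes that accuracy means the fraction of test sets whose predicted value exactly matches the expected value, which the method does not define; `check_fit` and the constant names are hypothetical.

```python
MIN_TEST_SETS = 1000        # "no fewer than 1,000 sample sets"
ACCURACY_THRESHOLD = 0.90   # step five: accuracy must exceed 90%


def check_fit(predicted, expected):
    """Return (accuracy, passed) for one fitting run on the test data."""
    if len(predicted) != len(expected):
        raise ValueError("predicted and expected must have the same length")
    if len(expected) < MIN_TEST_SETS:
        raise ValueError(f"need at least {MIN_TEST_SETS} test sets, got {len(expected)}")
    # Exact-match accuracy is an assumption; regression models would need a tolerance.
    correct = sum(p == e for p, e in zip(predicted, expected))
    accuracy = correct / len(expected)
    return accuracy, accuracy > ACCURACY_THRESHOLD
```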
More specifically, the data may be imported in a structured data input format, an unstructured data input format, or as structured data.
More specifically, the structured data input format is the JSON file format;
the unstructured data input format is a text file format;
the structured data is an Excel file.
The constructed model can handle data in various formats, which improves the compatibility of the data analysis model. The model can also collect and intelligently clean data automatically, and it processes data more efficiently than existing data analysis models.
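The method does not detail its "intelligent cleaning". A minimal rule-based sketch, assuming records have already been loaded as dictionaries, normalizes field names, treats blank strings as missing, and removes empty and duplicate records (the de-duplication named in the G06F16/215 classification); the rules and the name `clean_records` are illustrative assumptions.

```python
import json


def clean_records(records):
    """Normalize field names and strings, then drop empty and duplicate records."""
    seen = set()
    cleaned = []
    for rec in records:
        norm = {}
        for key, value in rec.items():
            name = str(key).strip().lower()    # " Name " and "name" become one field
            if isinstance(value, str):
                value = value.strip() or None  # blank strings count as missing
            norm[name] = value
        if all(v is None for v in norm.values()):
            continue                           # nothing usable in this record
        # Serialize with sorted keys so field order does not affect de-duplication.
        fingerprint = json.dumps(norm, sort_keys=True, ensure_ascii=False, default=str)
        if fingerprint in seen:
            continue                           # duplicate after normalization
        seen.add(fingerprint)
        cleaned.append(norm)
    return cleaned
```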
More specifically, the data association analysis is based on Chinese NLP technology: collected college information, news, theses, and similar material are analyzed, and keywords and characteristic words are extracted along 5 preset dimensions to form a feature library of vocational education data.
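The 5 preset dimensions are not named in the method. The sketch below assumes five hypothetical dimensions with a few Chinese seed terms each and builds a feature library by counting how often each term appears in the collected texts. A production system would more likely use a word segmenter and statistical keyword extraction, so the dimension names, seed terms, and `build_feature_library` are placeholders only.

```python
from collections import Counter

# Hypothetical dimensions and seed terms; the method only says "5 preset dimensions".
DIMENSIONS = {
    "enrollment": ["招生", "录取"],
    "employment": ["就业", "实习"],
    "faculty": ["教师", "师资"],
    "research": ["论文", "科研"],
    "programs": ["专业", "课程"],
}


def build_feature_library(docs):
    """Count occurrences of each seed term, grouped by dimension, across all documents."""
    library = {dim: Counter() for dim in DIMENSIONS}
    for doc in docs:
        for dim, terms in DIMENSIONS.items():
            for term in terms:
                hits = doc.count(term)  # non-overlapping substring occurrences
                if hits:
                    library[dim][term] += hits
    return library
```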
The working principle and use of the invention are as follows:
the invention automatically collects and intelligently cleans data of various types and combines the associations among the data, thereby addressing complex, non-uniform data formats and data that are difficult to collect. Because multiple types of data can be collected and cleaned automatically at the same time, the method achieves higher compatibility than existing modeling methods, together with high accuracy and high data processing efficiency.
The above description is only a preferred embodiment of the present invention, and the scope of protection of the present invention is not limited thereto. Any equivalent substitution or modification made by a person skilled in the art within the technical scope disclosed by the present invention, according to its technical solutions and inventive concept, shall fall within the scope of protection of the present invention.

Claims (5)

1. A modeling method of a data analysis model, characterized by comprising the following steps:
step one, determining a model target and a data analysis range;
step two, performing data association analysis within the data analysis range determined in step one, and selecting data dimensions and measures;
step three, constructing an initial model according to the result of step two and the requirements of the model target, and determining the basic dimensionality of the initial model;
step four, importing test data into the initial model from step three, and performing a data fitting operation on the model;
step five, checking the result of the fitting operation in step four: if the accuracy exceeds 90%, the model is deemed established and modeling ends; otherwise, returning to step two;
and step six, storing the model data in a model library.
2. The method of claim 1, wherein the test data comprise no fewer than 1,000 sample sets.
3. The method of claim 1, wherein the data import format is selected from the group consisting of a structured data input format, an unstructured data input format, and structured data.
4. The method of claim 3, wherein the structured data input format is the JSON file format;
the unstructured data input format is a text file format;
the structured data is an Excel file.
5. The modeling method of a data analysis model according to claim 1, wherein the data association analysis is based on Chinese NLP technology.
CN202011218969.8A 2020-11-04 2020-11-04 Modeling method of data analysis model Pending CN112364002A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011218969.8A CN112364002A (en) 2020-11-04 2020-11-04 Modeling method of data analysis model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011218969.8A CN112364002A (en) 2020-11-04 2020-11-04 Modeling method of data analysis model

Publications (1)

Publication Number Publication Date
CN112364002A true CN112364002A (en) 2021-02-12

Family

ID=74513537

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011218969.8A Pending CN112364002A (en) 2020-11-04 2020-11-04 Modeling method of data analysis model

Country Status (1)

Country Link
CN (1) CN112364002A (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108121785A (en) * 2017-12-15 2018-06-05 华中师范大学 A kind of analysis method based on education big data
CN109241030A (en) * 2018-08-09 2019-01-18 南方电网科学研究院有限责任公司 Robot manipulating task data analytics server and robot manipulating task data analysing method
US20190156231A1 (en) * 2017-11-17 2019-05-23 Adobe Systems Incorporated User segmentation using predictive model interpretation
CN109889476A (en) * 2018-12-05 2019-06-14 国网冀北电力有限公司信息通信分公司 A kind of network safety protection method and network security protection system
CN110288001A (en) * 2019-05-28 2019-09-27 西南电子技术研究所(中国电子科技集团公司第十研究所) Target identification method based on the training study of target data feature
CN110489513A (en) * 2019-06-24 2019-11-22 覃立万 A kind of intelligent robot social information processing method and the social intercourse system with people
JP2020024184A (en) * 2018-07-26 2020-02-13 直人 今西 Structural inside deformation character detection device
CN110990525A (en) * 2019-11-15 2020-04-10 华融融通(北京)科技有限公司 Natural language processing-based public opinion information extraction and knowledge base generation method
CN111192176A (en) * 2019-12-30 2020-05-22 华中师范大学 Online data acquisition method and device supporting education informatization assessment

Similar Documents

Publication Publication Date Title
CN106127634B (en) Student academic achievement prediction method and system based on naive Bayes model
CN109272789A (en) Learning effect assessment system and appraisal procedure based on data analysis
Matsebula et al. A big data architecture for learning analytics in higher education
WO2021143090A1 (en) Community life circle space identification method and system, computer device and storage medium
CN105072173A (en) Customer service method and system for automatically switching between automatic customer service and artificial customer service
WO2016184192A1 (en) Data processing method and device
WO2022170985A1 (en) Exercise selection method and apparatus, and computer device and storage medium
CN102591929B (en) Library data processing system and data processing method thereof
CN112862234A (en) College subject evaluation method and system
CN112685514A (en) AI intelligent customer value management platform
CN107590622A (en) The 360 ° of evaluating systems and method of a kind of Standardized Training of Residents process
Tian Interactive music instructional mode based on cloud computing
Song et al. Learning analytics at large: The lifelong learning network of 160,000 European teachers
CN112364002A (en) Modeling method of data analysis model
CN111915224A (en) Teaching evaluation system
Hendel et al. A comparative analysis of higher education ranking systems in Europe
CN113918588A (en) Wrong question dynamic intelligent management system based on knowledge points
CN108830755A (en) A kind of learning diagnosis system of the easy-operating knowledge based map of low cost
CN111402656A (en) Cloud computing teaching system
CN110991904A (en) Teaching quality assessment method, system and storage medium
CN111191859A (en) Examination prediction method and system
Jianyun Big data assisted online teaching platform for ideological and political theory course in universities
Hadisoebroto et al. The Use of The Learning Analytics Method in Moodle LMS Data to Predict The Final Score of Students in The Vocational Faculty
Zhu et al. Development and validation of information literacy assessment tool for primary students
CN108764860A (en) A kind of business Education Administration Information System based on big data

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210212