CN111104442A - Preprocessing method for enterprise comprehensive data - Google Patents

Preprocessing method for enterprise comprehensive data

Info

Publication number
CN111104442A
Authority
CN
China
Prior art keywords
data
preprocessing
enterprise
sampling
database
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911077743.8A
Other languages
Chinese (zh)
Inventor
高婧
李依青
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Green Cheng Network Technology Co Ltd
Original Assignee
Hangzhou Green Cheng Network Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2019-11-06
Filing date
2019-11-06
Publication date
2020-05-05
Application filed by Hangzhou Green Cheng Network Technology Co Ltd
Priority to CN201911077743.8A
Publication of CN111104442A
Current legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G06F16/254 Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses
    • G06F16/252 Integrating or interfacing systems involving database management systems between a Database Management System and a front-end application

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Complex Calculations (AREA)

Abstract

The invention relates to a method for preprocessing enterprise comprehensive data. The method comprises: establishing a plurality of data storage libraries and a preprocessing rule library, and storing enterprise data by category to form a data list; sampling each database, extracting its data characteristics, and denoising them; discretizing the data characteristics to form a nonlinear discrete data characteristic library; sampling the nonlinear discrete data and decomposing the sampled data to form a data matrix subset; and importing data on the basis of the data matrix and evaluating the preprocessing result. By classifying the acquired data into different databases, establishing different characteristic parameters for each, and building the enterprise rule base and the discrete characteristic database, the method can conveniently preprocess different types of data.

Description

Preprocessing method for enterprise comprehensive data
Technical Field
The invention relates to the field of image recognition, in particular to a method for preprocessing enterprise comprehensive data.
Background
Data preprocessing refers to processing applied to data before the main processing. For example, before most geophysical areal observation data are transformed or enhanced, the irregularly distributed measurement network is first converted into a regular grid by interpolation to facilitate computation. For some profile measurement data, such as seismic data, preprocessing includes vertical stacking, rearrangement, trace header addition, editing, resampling, multi-path editing, and the like. Real-world data are mostly incomplete, inconsistent "dirty" data on which data mining either cannot be performed directly or yields unsatisfactory results. Data preprocessing techniques have been developed to improve the quality of data mining; they include data cleaning, data integration, data transformation, data reduction, and the like. Planning an enterprise's direction of development and setting its targets requires a large amount of supporting data as the object of analysis. Because the information an enterprise requires is varied and complex, preprocessing the collected information is necessary.
Disclosure of Invention
The invention provides a method and a device for preprocessing enterprise comprehensive data and a computer readable storage medium, and mainly aims to improve the efficiency of image annotation.
In order to achieve the above object, the present invention provides a method for preprocessing enterprise integrated data, which is applied in an electronic device, and the method includes:
establishing a plurality of data storage libraries and a preprocessing rule library, and storing enterprise data in a classified manner to form a data list;
respectively sampling and extracting the data characteristics of each database, and denoising the data characteristics;
discretizing the data characteristics to form a nonlinear discrete data characteristic library;
sampling the nonlinear discrete data, and decomposing the sampled data to form a data matrix subset;
and importing data on the basis of the data matrix and evaluating a preprocessing result.
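The five steps above can be read as a linear pipeline in which each step consumes the output of the previous one. The following is a minimal sketch under that reading. The `PipelineState` type, the `run_pipeline` function, the stage names, and the pass-through stage bodies are all illustrative assumptions and do not come from the patent.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class PipelineState:
    """Intermediate result handed from one stage to the next (hypothetical)."""
    data: object
    completed: List[str] = field(default_factory=list)


# A stage takes the current data and returns the transformed data.
Stage = Callable[[object], object]


def run_pipeline(data: object, stages: List[Tuple[str, Stage]]) -> PipelineState:
    """Apply each named stage in order and record which stages ran."""
    state = PipelineState(data=data)
    for name, stage in stages:
        state.data = stage(state.data)
        state.completed.append(name)
    return state


# Placeholder stages: each lambda only stands in for one step of the method
# and passes the data through unchanged.
STAGES: List[Tuple[str, Stage]] = [
    ("classify_and_store", lambda d: d),
    ("sample_and_denoise", lambda d: d),
    ("discretize", lambda d: d),
    ("decompose_to_matrix_subset", lambda d: d),
    ("import_and_evaluate", lambda d: d),
]
```

Keeping the stages as a named list makes the order of the steps explicit and lets a single stage be swapped out without touching the others.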
Optionally, the establishing of the data repository and the preprocessing rule base further includes creating a data table in each database and naming in a standardized manner, sampling the data to be preprocessed, importing the sampled data into the new data table, and performing mathematical statistics on the value of each field of the sampled preprocessed data.
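As an illustration of this optional step, the sketch below creates a table with a standardized name, imports a random sample into it, and computes simple statistics for one field. It uses Python's built-in `sqlite3` module as a stand-in database. The naming convention, the column names `revenue` and `headcount`, and the chosen statistics are assumptions made for the example, since the patent does not specify them.

```python
import random
import sqlite3
import statistics


def standardized_table_name(database: str, purpose: str) -> str:
    """Build a lowercase, underscore-separated table name (assumed convention)."""
    return f"{database}_{purpose}".lower().replace(" ", "_")


def sample_into_table(conn, table, rows, sample_size, seed=0):
    """Create `table`, insert a random sample of `rows`, and return the sample.

    Table names are interpolated into SQL, so pass only trusted names.
    """
    rng = random.Random(seed)
    sample = rng.sample(rows, min(sample_size, len(rows)))
    conn.execute(f'CREATE TABLE "{table}" (revenue REAL, headcount REAL)')
    conn.executemany(f'INSERT INTO "{table}" VALUES (?, ?)', sample)
    return sample


def field_statistics(conn, table, column):
    """Return count, mean, min, and max for one column of the sampled table."""
    values = [row[0] for row in conn.execute(f'SELECT "{column}" FROM "{table}"')]
    return {
        "count": len(values),
        "mean": statistics.fmean(values),
        "min": min(values),
        "max": max(values),
    }
```

A caller would open a connection with `sqlite3.connect(":memory:")`, sample into the new table, and call `field_statistics` once per field of interest.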
Optionally, the plurality of data repositories includes:
an enterprise information database for storing prospective plans and target plans of an enterprise;
a market information database for recording big data information on related industries and questionnaire evaluation feedback information;
and a policy information database for recording government policy information for the industry and changes to it.
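Storing data "in a classified manner" across these three repositories amounts to routing each record to one of them. The sketch below shows one way this could look. The `kind` tag on each record and the mapping table are hypothetical, since the patent does not say how a record's category is determined.

```python
from collections import defaultdict
from enum import Enum


class Repository(Enum):
    ENTERPRISE = "enterprise_information"
    MARKET = "market_information"
    POLICY = "policy_information"


# Hypothetical mapping from a record's "kind" tag to its repository.
KIND_TO_REPOSITORY = {
    "prospective_plan": Repository.ENTERPRISE,
    "target_plan": Repository.ENTERPRISE,
    "industry_data": Repository.MARKET,
    "questionnaire_feedback": Repository.MARKET,
    "government_policy": Repository.POLICY,
}


def build_data_list(records):
    """Group records by repository; an unknown kind raises KeyError."""
    data_list = defaultdict(list)
    for record in records:
        data_list[KIND_TO_REPOSITORY[record["kind"]]].append(record)
    return dict(data_list)
```

Failing loudly on an unknown kind is a design choice for the example: it keeps unclassified records from silently landing in the wrong repository.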
Optionally, the step of discretizing the data features to form a nonlinear discrete data feature library includes:
for each database, selecting a plurality of sampling point data characteristic values $X_1, X_2, X_3, \ldots, X_i$, the corresponding sampling values being $Y_1, Y_2, Y_3, \ldots, Y_i$ respectively, and setting three interpolation basis functions;
the sampling data characteristic calculation formula is as follows:
[Formula image BDA0002263005770000021, not reproduced in this text]
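The formula for this step survives here only as an image reference, so the patent's actual basis functions cannot be recovered from this text. As one possible reading, and purely as an assumption, three interpolation basis functions could be the quadratic Lagrange basis functions through three sampling points $(X_k, Y_k)$. The sketch below shows that standard construction. The names `lagrange_basis` and `interpolate` are illustrative, and nothing here should be taken as the patent's formula.

```python
def lagrange_basis(xs, k, x):
    """Evaluate the k-th Lagrange basis function for nodes `xs` at `x`.

    The basis function equals 1 at xs[k] and 0 at every other node.
    """
    result = 1.0
    for j, xj in enumerate(xs):
        if j != k:
            result *= (x - xj) / (xs[k] - xj)
    return result


def interpolate(xs, ys, x):
    """Interpolate the sampled values `ys` at `x` using the Lagrange basis."""
    return sum(y * lagrange_basis(xs, k, x) for k, y in enumerate(ys))
```

With three nodes the interpolant is a quadratic that passes exactly through the three sampled points, so any data that is itself quadratic is reproduced without error.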
Optionally, the step of importing data based on the data matrix and evaluating the preprocessing result further includes:
calculating the characteristic vector of each section in the data matrix subset and the characteristic vector of the regular data in the rule base, and correcting according to the error judgment between the two characteristic vectors;
importing all data to be preprocessed into a newly-built data table, preprocessing the data according to a data preprocessing method in a rule base, and evaluating a preprocessing result;
and generating an evaluation log for feeding back evaluation information.
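The evaluation steps above compare a characteristic vector for each section of the data matrix subset with a reference vector from the rule base and record the outcome in a log. The sketch below is one hedged interpretation. Using per-column means as the characteristic vector, a relative Euclidean distance as the error, and a fixed tolerance are all assumptions, since the patent does not define how the vectors or the error judgment are computed.

```python
import math


def feature_vector(section):
    """Summarise a section (a list of rows) by its per-column means (assumed)."""
    columns = list(zip(*section))
    return [sum(col) / len(col) for col in columns]


def relative_error(vector, reference):
    """Euclidean distance between the vectors, scaled by the reference norm."""
    distance = math.dist(vector, reference)
    norm = math.hypot(*reference)
    return distance / norm if norm else distance


def evaluate_sections(sections, reference, tolerance=0.1):
    """Return one log entry per section, flagging those that need correction."""
    log = []
    for index, section in enumerate(sections):
        error = relative_error(feature_vector(section), reference)
        log.append(
            {"section": index, "error": error, "needs_correction": error > tolerance}
        )
    return log
```

The returned list plays the role of the evaluation log: each entry can be written out or fed back to whatever correction procedure the rule base prescribes.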
An electronic device comprising a memory and a processor, the memory having stored thereon a pre-processing program of enterprise integrated data executable on the processor, the pre-processing program of enterprise integrated data when executed by the processor implementing the steps of:
establishing a plurality of data storage libraries and a preprocessing rule library, and storing enterprise data in a classified manner to form a data list;
respectively sampling and extracting the data characteristics of each database, and denoising the data characteristics;
discretizing the data characteristics to form a nonlinear discrete data characteristic library;
sampling the nonlinear discrete data, and decomposing the sampled data to form a data matrix subset;
and importing data on the basis of the data matrix and evaluating a preprocessing result.
Optionally, the preprocessing program of the enterprise integrated data, when executed by the processor, further implements the following steps: creating a new data table in each database and naming it in a standardized manner, sampling the data to be preprocessed, importing the sampled data into the new data table, and performing mathematical statistics on the value of each field of the sampled data.
Optionally, the step of forming the non-linear discrete data feature library by discretizing the data features includes:
for each database, selecting a plurality of sampling point data characteristic values $X_1, X_2, X_3, \ldots, X_i$, the corresponding sampling values being $Y_1, Y_2, Y_3, \ldots, Y_i$ respectively, and setting three interpolation basis functions;
the sampling data characteristic calculation formula is as follows:
[Formula image BDA0002263005770000031, not reproduced in this text]
The enterprise comprehensive data preprocessing method provided by the invention classifies the acquired data into different databases, establishes different characteristic parameters for each, and builds the enterprise rule base and the discrete characteristic database, so that different types of data can be preprocessed conveniently.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
The invention provides a method for preprocessing enterprise comprehensive data. The method may be performed by an apparatus, which may be implemented by software and/or hardware.
In this embodiment, the method for preprocessing the enterprise integrated data includes:
establishing a plurality of data storage libraries and a preprocessing rule library, and storing enterprise data in a classified manner to form a data list;
respectively sampling and extracting the data characteristics of each database, and denoising the data characteristics;
discretizing the data characteristics to form a nonlinear discrete data characteristic library;
sampling the nonlinear discrete data, and decomposing the sampled data to form a data matrix subset;
and importing data on the basis of the data matrix and evaluating a preprocessing result.
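The second step listed above, sampling each database and denoising the extracted characteristics, is not tied to a particular algorithm in the text. The sketch below substitutes two common, simple techniques as stand-ins: systematic sampling (every n-th value) and a sliding median filter for noise suppression. Both choices, and the function names, are assumptions made for illustration.

```python
import statistics


def systematic_sample(values, step):
    """Take every `step`-th value, starting from the first."""
    return values[::step]


def median_filter(values, window=3):
    """Replace each value by the median of a centred window.

    Near the edges the window shrinks to the values that exist.
    """
    half = window // 2
    return [
        statistics.median(values[max(0, i - half): i + half + 1])
        for i in range(len(values))
    ]
```

A median filter is used here because a single outlier inside the window does not shift the median, whereas it would pull a moving average toward itself.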
Further, the establishing of the data storage library and the preprocessing rule library further comprises the steps of establishing a data table in each database, conducting standardized naming, conducting sampling on data to be preprocessed, then importing the data into the newly established data table, and conducting mathematical statistics on the value of each field of the sampled preprocessed data.
Further, the plurality of data stores comprise:
an enterprise information database for storing prospective plans and target plans of an enterprise;
a market information database for recording big data information on related industries and questionnaire evaluation feedback information;
and a policy information database for recording government policy information for the industry and changes to it.
Further, the discretizing the data feature to form a nonlinear discrete data feature library includes:
for each database, selecting a plurality of sampling point data characteristic values $X_1, X_2, X_3, \ldots, X_i$, the corresponding sampling values being $Y_1, Y_2, Y_3, \ldots, Y_i$ respectively, and setting three interpolation basis functions;
the sampling data characteristic calculation formula is as follows:
[Formula image BDA0002263005770000051, not reproduced in this text]
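In general terms, discretization maps continuous characteristic values onto a finite set of levels. Because the patent's own formula is not reproduced in this text, the sketch below substitutes plain equal-width binning, which is a common technique and not the patented one, to show what an entry in a discrete characteristic library might look like. The function name and the bin count are assumptions.

```python
def equal_width_bins(values, bin_count):
    """Map each value to a bin index in the range 0 .. bin_count - 1.

    Bins have equal width across [min(values), max(values)]; the maximum
    value is placed in the last bin.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical, so they share a single bin.
        return [0 for _ in values]
    width = (hi - lo) / bin_count
    return [min(int((v - lo) / width), bin_count - 1) for v in values]
```

The resulting bin indices are discrete labels that can be stored and compared directly, which is the property a discrete characteristic library needs regardless of the specific formula used.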
Further, the step of importing data based on the data matrix and evaluating the preprocessing result further includes:
calculating the characteristic vector of each section in the data matrix subset and the characteristic vector of the regular data in the rule base, and correcting according to the error judgment between the two characteristic vectors;
importing all data to be preprocessed into a newly-built data table, preprocessing the data according to a data preprocessing method in a rule base, and evaluating a preprocessing result;
and generating an evaluation log for feeding back evaluation information.
An electronic device comprising a memory and a processor, the memory having stored thereon a pre-processing program of enterprise integrated data executable on the processor, the pre-processing program of enterprise integrated data when executed by the processor implementing the steps of:
establishing a plurality of data storage libraries and a preprocessing rule library, and storing enterprise data in a classified manner to form a data list;
respectively sampling and extracting the data characteristics of each database, and denoising the data characteristics;
discretizing the data characteristics to form a nonlinear discrete data characteristic library;
sampling the nonlinear discrete data, and decomposing the sampled data to form a data matrix subset;
and importing data on the basis of the data matrix and evaluating a preprocessing result.
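The fourth step listed above decomposes the sampled data into a "data matrix subset" without saying how the split is made. One simple interpretation, assumed here, is to cut a row-major matrix into consecutive blocks of rows, each block being one section of the subset. The function name and the block size parameter are illustrative.

```python
def decompose(matrix, rows_per_block):
    """Split a row-major matrix into consecutive blocks of rows.

    The last block may be shorter when the row count is not a multiple
    of `rows_per_block`.
    """
    if rows_per_block <= 0:
        raise ValueError("rows_per_block must be positive")
    return [
        matrix[start:start + rows_per_block]
        for start in range(0, len(matrix), rows_per_block)
    ]
```

Each returned block can then be summarised and compared against the rule base independently of the others.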
Further, the pre-processing program of the enterprise integrated data, when executed by the processor, further implements the steps of: creating a new data table in each database and naming it in a standardized manner, sampling the data to be preprocessed, importing the sampled data into the new data table, and performing mathematical statistics on the value of each field of the sampled data.
Further, the step of forming a non-linear discrete data feature library by discretizing the data features comprises:
for each database, selecting a plurality of sampling point data characteristic values $X_1, X_2, X_3, \ldots, X_i$, the corresponding sampling values being $Y_1, Y_2, Y_3, \ldots, Y_i$ respectively, and setting three interpolation basis functions;
the sampling data characteristic calculation formula is as follows:
[Formula image BDA0002263005770000061, not reproduced in this text]
It should be noted that the above numbering of the embodiments of the present invention is merely for description and does not represent the relative merits of the embodiments. The terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, apparatus, article, or method that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, apparatus, article, or method. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in a process, apparatus, article, or method that includes the element.
Through the above description of the embodiments, those skilled in the art will clearly understand that the above embodiment method can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better embodiment. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) as described above, and includes instructions for enabling a terminal device (e.g., a mobile phone, a computer, a server, or a network device) to execute the method according to the embodiments of the present invention.
The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications made by the equivalent structures or equivalent processes in the present specification, or directly or indirectly applied to other related technical fields are included in the scope of the present invention.

Claims (8)

1. A method for preprocessing enterprise integrated data is applied to an electronic device, and is characterized in that the method comprises the following steps:
establishing a plurality of data storage libraries and a preprocessing rule library, and storing enterprise data in a classified manner to form a data list;
respectively sampling and extracting the data characteristics of each database, and denoising the data characteristics;
discretizing the data characteristics to form a nonlinear discrete data characteristic library;
sampling the nonlinear discrete data, and decomposing the sampled data to form a data matrix subset;
and importing data on the basis of the data matrix and evaluating a preprocessing result.
2. The method for preprocessing enterprise integrated data according to claim 1, wherein the establishing of the data repository and the preprocessing rule base further comprises: creating a new data table in each database and naming it in a standardized manner, sampling the data to be preprocessed, importing the sampled data into the new data table, and performing mathematical statistics on the value of each field of the sampled data.
3. The method for preprocessing enterprise integrated data as recited in claim 1, wherein said plurality of data repositories comprises:
an enterprise information database for storing prospective plans and target plans of an enterprise;
a market information database for recording big data information on related industries and questionnaire evaluation feedback information;
and a policy information database for recording government policy information for the industry and changes to it.
4. The method for preprocessing enterprise integrated data according to claim 1, wherein the step of discretizing the data features to form a nonlinear discrete data feature library comprises:
for each database, selecting a plurality of sampling point data characteristic values $X_1, X_2, X_3, \ldots, X_i$, the corresponding sampling values being $Y_1, Y_2, Y_3, \ldots, Y_i$ respectively, and setting three interpolation basis functions;
the sampling data characteristic calculation formula is as follows:
[Formula image FDA0002263005760000021, not reproduced in this text]
5. The method for preprocessing enterprise integrated data according to claim 1, wherein the step of importing data and evaluating the preprocessing result based on the data matrix further comprises:
calculating the characteristic vector of each section in the data matrix subset and the characteristic vector of the regular data in the rule base, and correcting according to the error judgment between the two characteristic vectors;
importing all data to be preprocessed into a newly-built data table, preprocessing the data according to a data preprocessing method in a rule base, and evaluating a preprocessing result;
and generating an evaluation log for feeding back evaluation information.
6. An electronic device, comprising a memory and a processor, wherein the memory stores a pre-processing program of enterprise integrated data operable on the processor, and the pre-processing program of enterprise integrated data when executed by the processor implements the steps of:
establishing a plurality of data storage libraries and a preprocessing rule library, and storing enterprise data in a classified manner to form a data list;
respectively sampling and extracting the data characteristics of each database, and denoising the data characteristics;
discretizing the data characteristics to form a nonlinear discrete data characteristic library;
sampling the nonlinear discrete data, and decomposing the sampled data to form a data matrix subset;
and importing data on the basis of the data matrix and evaluating a preprocessing result.
7. The electronic device of claim 6, wherein the pre-processing program of enterprise integrated data, when executed by the processor, further implements the steps of: creating a new data table in each database and naming it in a standardized manner, sampling the data to be preprocessed, importing the sampled data into the new data table, and performing mathematical statistics on the value of each field of the sampled data.
8. The electronic device of claim 6, wherein the step of forming a non-linear discrete data feature library by discretizing the data features comprises:
for each database, selecting a plurality of sampling point data characteristic values $X_1, X_2, X_3, \ldots, X_i$, the corresponding sampling values being $Y_1, Y_2, Y_3, \ldots, Y_i$ respectively, and setting three interpolation basis functions;
the sampling data characteristic calculation formula is as follows:
[Formula image FDA0002263005760000031, not reproduced in this text]
CN201911077743.8A 2019-11-06 2019-11-06 Preprocessing method for enterprise comprehensive data Pending CN111104442A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911077743.8A CN111104442A (en) 2019-11-06 2019-11-06 Preprocessing method for enterprise comprehensive data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911077743.8A CN111104442A (en) 2019-11-06 2019-11-06 Preprocessing method for enterprise comprehensive data

Publications (1)

Publication Number Publication Date
CN111104442A true CN111104442A (en) 2020-05-05

Family

ID=70420486

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911077743.8A Pending CN111104442A (en) 2019-11-06 2019-11-06 Preprocessing method for enterprise comprehensive data

Country Status (1)

Country Link
CN (1) CN111104442A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2009271641A (en) * 2008-05-01 2009-11-19 Japan Science & Technology Agency Audio processing device and audio processing method
CN105447090A (en) * 2015-11-05 2016-03-30 华中科技大学 Automated data mining preprocessing method
CN106372185A (en) * 2016-08-31 2017-02-01 广东京奥信息科技有限公司 Data preprocessing method for heterogeneous data sources
CN107909274A (en) * 2017-11-17 2018-04-13 平安科技(深圳)有限公司 Enterprise investment methods of risk assessment, device and storage medium
CN109190937A (en) * 2018-08-16 2019-01-11 深圳前海乘方互联网金融服务有限公司 A kind of investment value assessment system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200505