CN110750695A - Credit data processing method and computer readable storage medium - Google Patents

Info

Publication number
CN110750695A
Authority
CN
China
Prior art keywords
data
metadata
basic data
processing
data set
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910899813.1A
Other languages
Chinese (zh)
Inventor
孙中海
沈毅
姚思明
柯文朴
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xiamen Information Group Big Data Operation Co Ltd
Original Assignee
Xiamen Information Group Big Data Operation Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2019-09-23
Filing date
2019-09-23
Publication date
2020-02-04
Application filed by Xiamen Information Group Big Data Operation Co Ltd
Priority to CN201910899813.1A
Publication of CN110750695A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90: Details of database functions independent of the retrieved data types
    • G06F16/907: Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90: Details of database functions independent of the retrieved data types
    • G06F16/906: Clustering; Classification
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00: Administration; Management
    • G06Q10/06: Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063: Operations research, analysis or management
    • G06Q10/0639: Performance analysis of employees; Performance analysis of enterprise or organisation operations
    • G06Q10/06395: Quality analysis or management

Abstract

The invention provides a credit data processing method comprising the following steps: S1, screening relevant data tables from the raw data and analyzing them; S2, importing the analyzed data, and classifying and storing the imported data to generate metadata; S3, cataloging the metadata and proceeding to step S4; S4, evaluating the quality of the metadata: discarding the metadata if it is unusable, proceeding to step S6 if it fully meets the requirements, and proceeding to step S5 if it needs governance; S5, governing the metadata and proceeding to step S6; and S6, generating basic data.

Description

Credit data processing method and computer readable storage medium
Technical Field
The invention relates to a credit data processing method based on big data and a computer readable storage medium.
Background
With the rapid development of internet technology, the credit standing, or credit rating, of the entity corresponding to a terminal has become an important part of a user's life.
At present, although many kinds of credit data are available to credit evaluation systems, the raw credit data of each entity is heterogeneous and difficult to use directly. There is currently no system or method for integrating such a wide variety of raw credit data.
Disclosure of Invention
The present invention provides a credit data processing method and a computer-readable storage medium, which can effectively solve the above problems.
The invention is realized as follows:
A method for processing credit data, comprising the following steps:
S1, screening relevant data tables from the raw data and analyzing them;
S2, importing the analyzed data, and classifying and storing the imported data to generate metadata;
S3, cataloging the metadata and proceeding to step S4;
S4, evaluating the quality of the metadata: discarding the metadata if it is unusable, proceeding to step S6 if it fully meets the requirements, and proceeding to step S5 if it needs governance;
S5, governing the metadata and proceeding to step S6; and
S6, generating basic data.
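The three-way branch in step S4 can be sketched as a small routing function. This is a minimal illustration only: the `Quality` enum, its member names and the `route` function are hypothetical, and the patent does not specify how the verdict itself is reached.

```python
from enum import Enum


class Quality(Enum):
    """Hypothetical verdicts for the S4 quality evaluation."""
    UNAVAILABLE = "unavailable"            # metadata is discarded
    NEEDS_GOVERNANCE = "needs_governance"  # S4 -> S5 -> S6
    QUALIFIED = "qualified"                # S4 -> S6


def route(quality: Quality) -> list[str]:
    """Return the remaining steps for a metadata batch after the S4 verdict."""
    if quality is Quality.UNAVAILABLE:
        return ["discard"]
    if quality is Quality.NEEDS_GOVERNANCE:
        return ["S5", "S6"]
    return ["S6"]


if __name__ == "__main__":
    for verdict in Quality:
        print(verdict.value, "->", route(verdict))
```

Keeping the verdict separate from the routing means the evaluation criteria can change without touching the control flow.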
As a further improvement, the credit data processing method further includes:
S7, aggregating the basic data to produce a basic data set;
S8, evaluating the quality of the basic data set: discarding the basic data set if it is unusable, proceeding to step S10 if it fully meets the requirements, and proceeding to step S9 if it needs governance;
S9, governing the basic data set and proceeding to step S10; and
S10, generating a new data directory.
As a further improvement, in step S4, evaluating the quality of the metadata includes:
evaluating the integrity of the metadata and the repetition rate of the data.
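The patent names the two quality measures but does not define them. One plausible reading, shown here as an assumption, treats integrity as the share of required fields that are filled in and the repetition rate as the share of records whose key has already appeared. The function names, field names and sample records are all hypothetical.

```python
def integrity(records, required_fields):
    """Share of required field slots that hold a non-empty value."""
    total = len(records) * len(required_fields)
    if total == 0:
        return 0.0
    filled = sum(
        1
        for rec in records
        for field in required_fields
        if rec.get(field) not in (None, "")
    )
    return filled / total


def repetition_rate(records, key_fields):
    """Share of records whose key was already seen earlier in the batch."""
    if not records:
        return 0.0
    seen = set()
    repeats = 0
    for rec in records:
        key = tuple(rec.get(f) for f in key_fields)
        if key in seen:
            repeats += 1
        else:
            seen.add(key)
    return repeats / len(records)


# Hypothetical sample: one repeated key and one empty name.
sample = [
    {"code": "A1", "name": "Alpha Co"},
    {"code": "A1", "name": "Alpha Co"},
    {"code": "B2", "name": ""},
    {"code": "C3", "name": "Gamma Co"},
]
print(integrity(sample, ["code", "name"]))  # 7 of 8 slots are filled
print(repetition_rate(sample, ["code"]))    # 1 of 4 records repeats a key
```

Thresholds on these two numbers could then drive the discard, pass or govern decision, although the patent gives no threshold values.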
As a further improvement, in step S2, classifying and storing the imported data includes:
classifying and storing the imported data under the following categories: traffic, construction, court, society, customs, tax, and industry and commerce.
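A minimal sketch of the classified storage, assuming each imported table is assigned to a category by a lookup keyed on its table name. The seven category names come from the text; the table names in `TABLE_CATEGORY`, the `unclassified` fallback and the function name are hypothetical.

```python
from collections import defaultdict

# The seven categories named in the text.
CATEGORIES = (
    "traffic", "construction", "court", "society",
    "customs", "tax", "industry_and_commerce",
)

# Hypothetical mapping from a source table name to a category.
TABLE_CATEGORY = {
    "vehicle_violation": "traffic",
    "court_judgment": "court",
    "unified_social_credit_code": "industry_and_commerce",
}


def classify_and_store(tables):
    """Group imported tables by category; unknown tables are set aside for review."""
    store = defaultdict(list)
    for name, rows in tables.items():
        category = TABLE_CATEGORY.get(name, "unclassified")
        store[category].append({"table": name, "row_count": len(rows)})
    return dict(store)


imported = {
    "unified_social_credit_code": [{"code": "A1"}],
    "mystery_table": [],
}
print(classify_and_store(imported))
```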
As a further improvement, in step S5, governing the metadata includes:
deduplicating the metadata.
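The patent does not say which copy survives deduplication. The sketch below assumes one reasonable policy: keep a single record per key and prefer the copy with the most filled-in fields. The function name and the record layout are hypothetical.

```python
def deduplicate(records, key_fields):
    """Keep one record per key, preferring the copy with the most filled fields."""
    def filled(rec):
        return sum(1 for value in rec.values() if value not in (None, ""))

    best = {}
    for rec in records:
        key = tuple(rec.get(f) for f in key_fields)
        # Replace the stored copy only when the new one is strictly more complete.
        if key not in best or filled(rec) > filled(best[key]):
            best[key] = rec
    return list(best.values())


rows = [
    {"code": "A1", "name": ""},
    {"code": "A1", "name": "Alpha Co"},
    {"code": "B2", "name": "Beta Co"},
]
print(deduplicate(rows, ["code"]))
```

Preferring the most complete copy addresses both problems the text mentions for local data: repeated records and missing information.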
As a further improvement, in step S7, aggregating the basic data includes:
aggregating the basic data by means of union/join commands.
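The text does not name a database engine, so the following sketch assumes the union/join commands are SQL `UNION` and `JOIN`, and uses an in-memory SQLite database from the Python standard library. All table names, column names and rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE national (code TEXT, name TEXT);
    CREATE TABLE local_a  (code TEXT, name TEXT);
    CREATE TABLE tax      (code TEXT, tax_status TEXT);
    INSERT INTO national VALUES ('A1', 'Alpha Co'), ('B2', 'Beta Co');
    INSERT INTO local_a  VALUES ('B2', 'Beta Co'),  ('C3', 'Gamma Co');
    INSERT INTO tax      VALUES ('A1', 'normal'),   ('C3', 'overdue');
""")

# UNION stacks rows from same-shaped tables and drops exact duplicates.
entities = conn.execute("""
    SELECT code, name FROM national
    UNION
    SELECT code, name FROM local_a
    ORDER BY code
""").fetchall()

# LEFT JOIN attaches attributes from another table through the shared key.
basic_data_set = conn.execute("""
    SELECT e.code, e.name, t.tax_status
    FROM (SELECT code, name FROM national
          UNION
          SELECT code, name FROM local_a) AS e
    LEFT JOIN tax AS t ON t.code = e.code
    ORDER BY e.code
""").fetchall()

print(entities)
print(basic_data_set)
```

`UNION` removes only rows that are identical in every column, so copies that differ in a single field would still need the governance step.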
As a further improvement, in step S9, governing the basic data set includes:
deduplicating the basic data set.
The present invention further provides a computer-readable storage medium storing a computer program, wherein the computer program is executed by a processor to implement the above-mentioned credit data processing method.
The beneficial effects of the invention are as follows. The credit data processing method provided by the invention can effectively import, catalog, clean, govern and publish raw credit data. The method was applied to 221 million credit records from different channels: the repetition rate and integrity of the data were analyzed and the data were deduplicated, finally yielding 96 million valid records. Practice shows that the method can effectively govern data from different channels, forms an optimized data result, and, by integrating the data, improves data quality so that originally disordered data acquires practical value.
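The reported record counts imply the reduction worked out below. Only the two totals come from the text; the split of the removed records between duplicates and otherwise unusable data is not stated, so the sketch computes the overall figures only.

```python
raw_records = 221_000_000    # 2.21 x 10^8 records fed into the method
valid_records = 96_000_000   # 0.96 x 10^8 valid records remaining

removed = raw_records - valid_records
retained_share = valid_records / raw_records

print(f"removed: {removed:,}")
print(f"retained: {retained_share:.1%}")
```

Roughly 43% of the input survives, so more than half of the raw records were removed during processing.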
Drawings
In order to illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings used in the embodiments are briefly described below. It should be understood that the following drawings illustrate only some embodiments of the present invention and therefore should not be regarded as limiting its scope; those skilled in the art can derive other related drawings from them without inventive effort.
Fig. 1 is a flowchart of a credit data processing method according to an embodiment of the present invention.
Fig. 2 is a flowchart of a credit data processing method according to another embodiment of the invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions of the embodiments are described clearly and completely below with reference to the accompanying drawings. The described embodiments are some, but not all, of the embodiments of the present invention. All other embodiments that a person skilled in the art can obtain from the embodiments of the present invention without inventive effort fall within the scope of protection of the present invention. Accordingly, the following detailed description of the embodiments, as presented in the figures, is not intended to limit the scope of the invention as claimed, but merely represents selected embodiments of the invention.
Referring to Fig. 1, an embodiment of the present invention provides a method for processing credit data, including the following steps:
S1, screening relevant data tables from the raw data and analyzing them;
S2, importing the analyzed data, and classifying and storing the imported data to generate metadata;
S3, cataloging the metadata and proceeding to step S4;
S4, evaluating the quality of the metadata: discarding the metadata if it is unusable, proceeding to step S6 if it fully meets the requirements, and proceeding to step S5 if it needs governance;
S5, governing the metadata and proceeding to step S6; and
S6, generating basic data.
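The cataloging in step S3 is not detailed in the text. As an assumption, a catalog entry could record each imported table's name, category, column names and row count. `CatalogEntry`, `build_catalog` and the input layout are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogEntry:
    """One hypothetical catalog record describing an imported table."""
    table: str
    category: str
    columns: tuple
    row_count: int


def build_catalog(imported):
    """Create one entry per imported table, sorted by category and then table name."""
    entries = []
    for table, info in imported.items():
        rows = info["rows"]
        # Collect every column name that appears in any row of the table.
        columns = tuple(sorted({col for row in rows for col in row}))
        entries.append(CatalogEntry(table, info["category"], columns, len(rows)))
    return sorted(entries, key=lambda e: (e.category, e.table))


imported = {
    "usc_code_national": {
        "category": "industry_and_commerce",
        "rows": [{"code": "A1", "name": "Alpha Co"}],
    },
}
for entry in build_catalog(imported):
    print(entry)
```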
In step S1, this embodiment uses the unified social credit code as an example: the raw data is screened for the data tables related to that code, as listed in Table 1.
Table 1. Relevant data tables screened from the raw data for analysis, using the unified social credit code as an example.
[Table 1 appears in the original publication as images: Figures BDA0002211462510000051, BDA0002211462510000071 and BDA0002211462510000081.]
The following conclusions can be drawn from a rough analysis:
The unified social credit code data is mainly divided into national general table data and local data, and the local data in total exceeds the national data in volume. The local data exists in multiple copies, and there are problems such as partially repeated data and missing related information.
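A rough analysis of this kind could be reproduced by comparing key sets across the national table and the local copies. The sketch below assumes every record carries its unified social credit code under a `code` field. The function name, the field name and the sample keys are hypothetical.

```python
def compare_sources(national, local_copies, key="code"):
    """Summarise how local copies overlap with each other and with the national table."""
    national_keys = {rec[key] for rec in national}
    local_keys_per_copy = [{rec[key] for rec in copy} for copy in local_copies]
    all_local = set().union(*local_keys_per_copy)

    # A key is "repeated" if it appears in more than one local copy.
    seen, repeated = set(), set()
    for keys in local_keys_per_copy:
        repeated |= keys & seen
        seen |= keys

    return {
        "only_local": sorted(all_local - national_keys),
        "only_national": sorted(national_keys - all_local),
        "repeated_across_local_copies": sorted(repeated),
    }


national = [{"code": "A1"}, {"code": "B2"}]
local_copies = [
    [{"code": "B2"}, {"code": "C3"}],
    [{"code": "C3"}, {"code": "D4"}],
]
print(compare_sources(national, local_copies))
```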
As a further improvement, in step S2, classifying and storing the imported data includes:
classifying and storing the imported data under the following categories: traffic, construction, court, society, customs, tax, and industry and commerce, which facilitates data management. In this embodiment, the unified social credit code data is stored under the industry and commerce category.
As a further improvement, in step S4, evaluating the quality of the metadata includes:
evaluating the integrity of the metadata and the repetition rate of the data.
As a further improvement, in step S5, governing the metadata includes:
deduplicating the metadata.
An embodiment of the present invention further provides a system for processing credit data, including:
an analysis module, used for screening relevant data tables from the raw data and analyzing them;
a data import module, used for importing the analyzed data, and classifying and storing the imported data to generate metadata;
a metadata module, used for cataloging the metadata;
a data quality evaluation module, used for evaluating the quality of the metadata;
a data governance module, used for governing the metadata; and
a basic data module, used for generating basic data.
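The module list maps naturally onto a set of pluggable functions wired together by one driver. The sketch below is an illustrative assumption, not the patent's implementation: `CreditDataSystem`, its attribute names and the string verdicts are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional

Records = list


@dataclass
class CreditDataSystem:
    analyze: Callable[[Records], Records]      # analysis module
    import_data: Callable[[Records], Records]  # data import module
    catalog: Callable[[Records], dict]         # metadata module
    evaluate: Callable[[Records], str]         # data quality evaluation module
    govern: Callable[[Records], Records]       # data governance module

    def run(self, raw: Records) -> Optional[Records]:
        """Run the modules in order and return basic data, or None if discarded."""
        metadata = self.import_data(self.analyze(raw))
        self.catalog(metadata)
        verdict = self.evaluate(metadata)
        if verdict == "unavailable":
            return None
        if verdict == "needs_governance":
            metadata = self.govern(metadata)
        return list(metadata)                  # basic data module output


system = CreditDataSystem(
    analyze=lambda rows: [r for r in rows if r],   # drop empty entries
    import_data=lambda rows: rows,
    catalog=lambda rows: {"row_count": len(rows)},
    evaluate=lambda rows: "needs_governance" if len(set(rows)) < len(rows) else "qualified",
    govern=lambda rows: sorted(set(rows)),         # deduplicate
)
print(system.run(["A1", "A1", "B2", ""]))
```

Because each module is an injected callable, any one of them can be replaced or tested in isolation, which matches the later remark that the module division is only logical.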
Another embodiment of the present invention further provides a credit data processing method, including the following steps:
S1, screening relevant data tables from the raw data and analyzing them;
S2, importing the analyzed data, and classifying and storing the imported data to generate metadata;
S3, cataloging the metadata and proceeding to step S4;
S4, evaluating the quality of the metadata: discarding the metadata if it is unusable, proceeding to step S6 if it fully meets the requirements, and proceeding to step S5 if it needs governance;
S5, governing the metadata and proceeding to step S6;
S6, generating basic data;
S7, aggregating the basic data to produce a basic data set;
S8, evaluating the quality of the basic data set: discarding the basic data set if it is unusable, proceeding to step S10 if it fully meets the requirements, and proceeding to step S9 if it needs governance;
S9, governing the basic data set and proceeding to step S10; and
S10, generating a new data directory.
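Steps S7 to S10 repeat the evaluate-then-govern pattern at the level of the aggregated data set. The sketch below is a minimal illustration under stated assumptions: aggregation is a plain concatenation, the verdict strings are hypothetical, and the data directory is taken to list sources, row count and columns, none of which the patent specifies.

```python
def second_stage(basic_tables, evaluate, govern):
    """Aggregate basic data, evaluate it, govern it if needed, and build a directory."""
    data_set = [row for table in basic_tables.values() for row in table]  # S7
    verdict = evaluate(data_set)                                          # S8
    if verdict == "unavailable":
        return None                                                       # discarded
    if verdict == "needs_governance":
        data_set = govern(data_set)                                       # S9
    return {                                                              # S10
        "sources": sorted(basic_tables),
        "row_count": len(data_set),
        "columns": sorted({col for row in data_set for col in row}),
    }


basic_tables = {
    "national": [{"code": "A1"}],
    "local": [{"code": "A1"}, {"code": "B2"}],
}
directory = second_stage(
    basic_tables,
    evaluate=lambda rows: "needs_governance",
    # Deduplicate by code, keeping the last copy seen for each code.
    govern=lambda rows: list({row["code"]: row for row in rows}.values()),
)
print(directory)
```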
As a further improvement, in step S7, aggregating the basic data includes:
aggregating the basic data by means of union/join commands.
As a further improvement, in step S9, governing the basic data set includes:
deduplicating the basic data set.
An embodiment of the present invention further provides a system for processing credit data, including:
an analysis module, used for screening relevant data tables from the raw data and analyzing them;
a data import module, used for importing the analyzed data, and classifying and storing the imported data to generate metadata;
a metadata module, used for cataloging the metadata;
a data quality evaluation module, used for evaluating the quality of the metadata;
a data governance module, used for governing the metadata;
a basic data generation module, used for generating basic data;
an aggregation module, used for aggregating the basic data to produce a basic data set;
a basic data set quality evaluation module, used for evaluating the quality of the basic data set;
a basic data set governance module, used for governing the basic data set; and
a data directory generation module, used for generating a data directory.
In the several embodiments provided by the present invention, it should be understood that the disclosed system and method may be implemented in other ways. The apparatus embodiments described above are merely illustrative. For example, the division into modules or units is only a logical division, and an actual implementation may divide them differently: several units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through interfaces, devices or units, and may be electrical, mechanical or of another form.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
If the integrated unit is implemented as a software functional unit and sold or used as a stand-alone product, it may be stored in a computer-readable storage medium. Based on this understanding, all or part of the technical solution of the present invention may be embodied as a software product that is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to execute all or part of the steps of the methods of the embodiments of the present invention. The storage medium includes any medium capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk.
The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (8)

1. A method for processing credit data, comprising the steps of:
S1, screening relevant data tables from the original data for analysis;
S2, importing the analyzed data, and classifying and storing the imported data to generate metadata;
S3, cataloging the metadata and entering S4;
S4, performing quality evaluation on the metadata, directly performing abandonment processing when the metadata are unavailable, entering step S6 when the metadata completely meet the requirements, and entering step S5 when the metadata need to be treated;
S5, the metadata is treated, and the step S6 is carried out; and
S6, generating basic data.
2. The method of processing credit data of claim 1, further comprising:
S7, carrying out aggregation processing on the basic data to produce a basic data set;
S8, the quality of the basic data set is evaluated, the basic data set is directly discarded when the metadata are unavailable, the step S10 is carried out when the basic data set completely meets the requirements, and the step S9 is carried out when the basic data set needs to be treated;
S9, treating the basic data set, and entering the step S10; and
S10, generating a new data directory.
3. The method for processing credit data as claimed in claim 1, wherein in step S4, the step of performing quality evaluation on the metadata comprises:
evaluating the integrity of the metadata and the repetition rate of the data.
4. The method for processing credit data according to claim 1, wherein in step S2, the step of storing the imported data by classification includes:
classifying and storing the imported data according to the traffic, construction, court, society, customs, tax, and industry and commerce categories.
5. The method for processing credit data as claimed in claim 1, wherein in step S5, the step of administering the metadata comprises:
carrying out duplicate removal treatment on the metadata.
6. The method for processing credit data according to claim 1, wherein in step S7, the step of performing aggregation processing on the basic data includes:
carrying out aggregation processing on the basic data through a union/join command.
7. The method for processing credit data according to claim 1, wherein in step S9, the step of administering the basic data set includes:
carrying out duplicate removal treatment on the basic data set.
8. A computer-readable storage medium, in which a computer program is stored, which, when being executed by a processor, implements the method for processing credit data according to any one of claims 1 to 7.
CN201910899813.1A 2019-09-23 2019-09-23 Credit data processing method and computer readable storage medium Pending CN110750695A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910899813.1A CN110750695A (en) 2019-09-23 2019-09-23 Credit data processing method and computer readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910899813.1A CN110750695A (en) 2019-09-23 2019-09-23 Credit data processing method and computer readable storage medium

Publications (1)

Publication Number Publication Date
CN110750695A true CN110750695A (en) 2020-02-04

Family

ID=69276836

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910899813.1A Pending CN110750695A (en) 2019-09-23 2019-09-23 Credit data processing method and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN110750695A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111913954A (en) * 2020-06-20 2020-11-10 杭州城市大数据运营有限公司 Intelligent data standard catalog generation method and device

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107256247A (en) * 2017-06-07 2017-10-17 九次方大数据信息集团有限公司 Big data data administering method and device
CN107766418A (en) * 2017-09-08 2018-03-06 广州汪汪信息技术有限公司 A kind of credit estimation method based on Fusion Model, electronic equipment and storage medium
CN110232098A (en) * 2019-04-22 2019-09-13 汇通达网络股份有限公司 A kind of data warehouse administered based on data and genetic connection designs
CN110245921A (en) * 2019-06-20 2019-09-17 普元信息技术股份有限公司 The method that data service upstream and downstream link tracing function is realized based on metadata in big data improvement

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111913954A (en) * 2020-06-20 2020-11-10 杭州城市大数据运营有限公司 Intelligent data standard catalog generation method and device
CN111913954B (en) * 2020-06-20 2023-08-04 杭州城市大数据运营有限公司 Intelligent data standard catalog generation method and device

Similar Documents

Publication Publication Date Title
JP5575902B2 (en) Information retrieval based on query semantic patterns
DE112012005037B4 (en) Manage redundant immutable files using deduplications in storage clouds
US20160285918A1 (en) System and method for classifying documents based on access
CN104424202B (en) Duplicate checking method and system are carried out to the customer information in crm system
CN111400392B (en) Multi-source heterogeneous data processing method and device
CN112417492A (en) Service providing method based on data classification and classification
Utamachant et al. An analysis of high-value datasets: a case study of Thailand’s open government data
US11928433B2 (en) Systems and methods for term prevalence-volume based relevance
CN102945246A (en) Method and device for processing network information data
TWI254880B (en) Method for classifying electronic document analysis
CN108427667B (en) Legal document segmentation method and device
CN110750695A (en) Credit data processing method and computer readable storage medium
CN103336800A (en) Fingerprint storage and comparison method based on behavior analysis
WO2021114634A1 (en) Text annotation method, device, and storage medium
CN117331975A (en) Method and device for executing data processing task, computer equipment and storage medium
CN105786929B (en) A kind of information monitoring method and device
US20130318104A1 (en) Method and system for analyzing data in artifacts and creating a modifiable data network
US20090300000A1 (en) Method and System For Improved Search Relevance In Business Intelligence systems through Networked Ranking
US11709798B2 (en) Hash suppression
CN114443727A (en) Human vein data processing method, device, equipment and storage medium
EP3480821B1 (en) Clinical trial support network data security
CN111984798A (en) Atlas data preprocessing method and device
Timonin et al. Research of filtration methods for reference social profile data
Assiroj et al. The performance of Naïve Bayes, support vector machine, and logistic regression on Indonesia immigration sentiment analysis
US10936665B2 (en) Graphical match policy for identifying duplicative data

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination