CN111897845A - Method and system for processing mass credit information based on process - Google Patents

Method and system for processing mass credit information based on process

Info

Publication number
CN111897845A
Authority
CN
China
Prior art keywords
data
information
processing
verification
repeated
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010743848.9A
Other languages
Chinese (zh)
Other versions
CN111897845B (en)
Inventor
汤自华
张城炜
江浩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xuzhou Kingdee Software Co ltd
Original Assignee
Xuzhou Kingdee Software Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xuzhou Kingdee Software Co ltd
Priority to CN202010743848.9A
Publication of CN111897845A
Application granted
Publication of CN111897845B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution
    • G06F16/24552 Database cache management
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2457 Query processing with adaptation to user needs
    • G06F16/24573 Query processing with adaptation to user needs using data annotations, e.g. user-defined metadata
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G06F16/258 Data format conversion from or to a database
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0633 Workflow analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10 Services
    • G06Q50/26 Government or public services

Abstract

The invention discloses a process-based method and system for processing massive credit information. All credit-related resource information and verification logic are described uniformly through metadata and stored in a database. A thread pool whose size is matched to the server configuration handles multiple tasks: the data processing flows corresponding to the tasks are executed concurrently, and the data processing flow of each task is divided into a plurality of data processing sub-flows that are executed in sequence. The invention abstracts the entire processing logic as a flow, accommodates the different processing services and resource-specific business logic of different resources, and monitors execution in real time through listeners added before and after the flow and each of its steps. All flow instances are executed concurrently and block by block, which keeps system processing efficient and stable. Duplicate verification for the same resource is queued and executed serially, which ensures the uniqueness of credit records.

Description

Method and system for processing mass credit information based on process
Technical Field
The invention relates to a method for processing mass credit information based on a process, belonging to the technical field of data query.
Background
The national standards on public credit information define 7 major classes, 37 minor classes and 102 information resources for credit subjects (natural persons, legal persons and other organizations) and their credit information, and further credit information from other fields may be added in practice. The relevant agencies enter this information into a public credit information platform mainly in three ways: manual file import, direct exchange and import through a connected front-end database, and automatic data acquisition by calling a web service. Files are commonly in Excel, CSV or TXT format.
Whichever way is used, the information must first be processed, which mainly includes data format verification, association verification, duplicate verification and encryption. Data format verification covers both the verification of a single information item, for example whether an identification number is compliant, and verification between information items, for example that the issue date is earlier than the expiration date. Association verification associates the certificate type and certificate number with the subject information, so that all credit information of the same subject is linked by a unique subject ID, which facilitates later query, analysis and aggregation applications; for data that cannot be associated, the subject information must be supplemented according to the identification information items, ensuring that every piece of credit information is associated with subject information. Duplicate verification prevents duplicate credit records; whether a record is a duplicate is judged from several key business fields in the record. Encryption means that information items involving personal privacy must be encrypted before storage.
At present, whole-batch processing is generally used, that is, all data are loaded and then processed, but this causes the following problems in practice:
1. to keep the data volume from becoming too large, the size of the data loaded at one time is limited, so an operator has to split the data manually into several files, which is cumbersome and inefficient;
2. because there are many information resources, the volume of information is large, the processing logic is complex, processing efficiency is low and concurrency is high during business peaks, the system is often unstable and may even go down because of memory overload;
3. different resources have different verification methods and logic, so customized verification has to be developed separately, which results in poor generality and low maintainability.
Disclosure of Invention
To address the problems in the prior art, the invention provides a process-based method and system for processing massive credit information, which make full use of the computing capacity of the system to process massive data efficiently, offer good stability and high concurrent processing capability, and define verification rules through metadata, so that the flexible verification rules of different resources are supported with strong generality.
To achieve this purpose, the invention adopts the following technical solution: a process-based method for processing massive credit information, comprising the following steps:
uniformly describing all credit-related resource information and verification logic through metadata, and storing them in a database;
setting up a thread pool whose size is matched to the server configuration to handle multi-task processing, wherein the data processing flows corresponding to a plurality of tasks are executed concurrently, and the data processing flow corresponding to each task is divided into a plurality of data processing sub-flows that are performed in sequence;
each data processing sub-flow comprises three steps, in order: acquiring data, processing data and storing data;
acquiring data means acquiring, as a stream and block by block, the data of the data processing flow to which the sub-flow belongs; the maximum number of records per block is a parameter set according to the server configuration, and the number of blocks equals the number of data processing sub-flows in that data processing flow;
processing data means processing the acquired data;
storing data means writing the processed qualified data into the database, generating an error record file from the unqualified data, and notifying the client;
after each data processing flow is started, the meta-information is loaded into the flow execution context; event listeners can be attached to the data processing flow and to its steps so that related business logic can be inserted before and after execution; and after each step completes, the processing state is persisted so that the processing status and the numbers of successes and failures can be counted.
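The listener and state-persistence idea in the last step above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the names `FlowRunner`, `StepStats` and the event strings are hypothetical, and an in-memory dictionary stands in for the persisted processing state.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical signatures: a listener receives (event, step name, context);
# a step receives (records, context) and returns (passed, failed).
Listener = Callable[[str, str, Dict[str, Any]], None]
Step = Callable[[List[Any], Dict[str, Any]], Tuple[List[Any], List[Any]]]


@dataclass
class StepStats:
    success: int
    failure: int


class FlowRunner:
    """Runs steps in order and fires listeners before and after each one."""

    def __init__(self, meta: Dict[str, Any]) -> None:
        # Meta-information is loaded once into the execution context.
        self.context: Dict[str, Any] = {"meta": meta}
        self.listeners: List[Listener] = []
        # Stand-in for persisted per-step state (assumption: in memory).
        self.stats: Dict[str, StepStats] = {}

    def add_listener(self, listener: Listener) -> None:
        self.listeners.append(listener)

    def _fire(self, event: str, step_name: str) -> None:
        for listener in self.listeners:
            listener(event, step_name, self.context)

    def run(self, steps: List[Tuple[str, Step]], records: List[Any]) -> List[Any]:
        self._fire("before_flow", "")
        for name, step in steps:
            self._fire("before_step", name)
            passed, failed = step(records, self.context)
            self.stats[name] = StepStats(len(passed), len(failed))
            self._fire("after_step", name)
            records = passed  # only passing records reach the next step
        self._fire("after_flow", "")
        return records


events: List[Tuple[str, str]] = []
runner = FlowRunner(meta={"resource_code": "demo_resource"})
runner.add_listener(lambda event, step, ctx: events.append((event, step)))


def keep_even(records, ctx):
    # Toy step: even numbers pass, odd numbers fail.
    passed = [r for r in records if r % 2 == 0]
    failed = [r for r in records if r % 2 != 0]
    return passed, failed


result = runner.run([("process", keep_even)], [1, 2, 3, 4])
print(result, runner.stats["process"])
```

Because listeners only observe events and the context, business-specific logic can be added without changing the steps themselves.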
Describing through metadata means that each credit-related information resource is described by a resource name, a resource code and a record duplication rule, and that resources are given preliminary classified processing according to these items; the metadata of each information item of a resource comprises a name, a description, a data type, whether the item is mandatory, a verification rule, an encryption or decryption rule and a conversion rule.
The data processing flow corresponding to each task is started manually by a user or automatically by the system on a schedule.
Processing data comprises format verification, in-block duplicate verification, association verification and information conversion;
format verification is performed according to the verification rules configured in the meta-information;
in-block duplicate verification ensures that the records processed in the current run contain no mutual duplicates, so that the subsequent steps can be processed concurrently; a digest of each record is generated from the list of duplicate-check fields configured in the meta-information and stored in the cache in a de-duplication set named after the code of the resource currently being processed, and if the digest is already present, an "already exists" status is returned; duplicate records are marked as duplicates and treated as error data that is not processed further; records that are not duplicated continue to the subsequent steps; after the whole flow is finished, it is determined whether other tasks for the current resource are still running, and if not, the de-duplication set holding the digests of the current resource is cleared from the cache;
association verification associates the acquired data with credit subject information, that is, only the subject identification information of the acquired data is compared with the subject identification records loaded into the cache at system startup; a subject ID field is added to each successfully associated record; for data that cannot be associated, a unique subject ID is generated and added to the current record, a supplement event is triggered, and the corresponding subject information is added to the database by the supplement listener; handling the supplement separately preserves the processing efficiency of the main flow;
information conversion converts and encrypts some fields of the acquired data according to the meta-information configuration while retaining the original field values; all records that pass verification are written to the database with the converted field data, and records that fail verification are exported to a file with the original data.
Before qualified data are written to the database, the storing step must compare them with the historical records to verify whether they are duplicates; data processing flows that run simultaneously on the same resource are queued and executed serially, while data processing flows that run simultaneously on different resources are executed concurrently; historical duplicate verification uses the same mechanism as in-block duplicate verification, comparing digests, is carried out in batches, and marks duplicate records with a duplicate error mark.
In the storing step, the error record file is written in batches and records are filtered by state: a record is written to the file only if it is marked with a verification error and not with a duplicate error mark, and duplicate records are discarded directly; the stream is opened before the step starts and closed after the step ends to produce the file, records are buffered in memory up to a set window value, and a batch is written once the window value is exceeded; writing to the database is likewise performed in batches according to the window value configured for the server, and only records not in an error state are written.
A process-based system for processing massive credit information comprises:
a database storing all credit-related resource information and verification logic described by metadata;
a server provided with a thread pool whose size is matched to the server configuration and which is responsible for multi-task processing;
data processing flows, which correspond one to one with the tasks to be processed and each comprise a plurality of data processing sub-flows performed in sequence; the data processing flows corresponding to the tasks are executed concurrently;
each data processing sub-flow comprises a data acquisition unit, a data processing unit and a data storage unit;
the data acquisition unit acquires, as a stream and block by block, the data of the data processing flow to which the sub-flow belongs; the maximum number of records per block is a parameter set according to the server configuration, and the number of blocks equals the number of data processing sub-flows in that data processing flow;
the data processing unit processes the acquired data;
the data storage unit writes the processed qualified data into the database, generates an error record file from the unqualified data and notifies the client;
after each data processing flow is started, the meta-information is loaded into the flow execution context; event listeners can be attached to the data processing flow and to its steps so that related business logic can be inserted before and after execution; and after each step completes, the processing state is persisted so that the processing status and the numbers of successes and failures can be counted.
The data processing flow corresponding to each task is started manually by a user or automatically by the system on a schedule.
The data processing unit comprises a format verification unit, an in-block duplicate verification unit, an association verification unit and an information conversion unit;
the format verification unit verifies data according to the verification rules configured in the meta-information;
the in-block duplicate verification unit ensures that the records processed in the current run contain no mutual duplicates, so that the subsequent steps can be processed concurrently; a digest of each record is generated from the list of duplicate-check fields configured in the meta-information and stored in the cache in a de-duplication set named after the code of the resource currently being processed, and if the digest is already present, an "already exists" status is returned; duplicate records are marked as duplicates and treated as error data that is not processed further; records that are not duplicated continue to the subsequent steps; after the whole flow is finished, it is determined whether other tasks for the current resource are still running, and if not, the de-duplication set holding the digests of the current resource is cleared from the cache;
the association verification unit associates the acquired data with credit subject information, that is, only the subject identification information of the acquired data is compared with the subject identification records loaded into the cache at system startup; a subject ID field is added to each successfully associated record; for data that cannot be associated, a unique subject ID is generated and added to the current record, a supplement event is triggered, and the corresponding subject information is added to the database by the supplement listener; handling the supplement separately preserves the processing efficiency of the main flow;
the information conversion unit converts and encrypts some fields of the acquired data according to the meta-information configuration while retaining the original field values; all records that pass verification are written to the database with the converted field data, and records that fail verification are exported to a file with the original data.
Before qualified data are written to the database, the data storage unit compares them with the historical records to verify whether they are duplicates; data processing flows that run simultaneously on the same resource are queued and executed serially, while data processing flows that run simultaneously on different resources are executed concurrently; historical duplicate verification uses the same mechanism as in-block duplicate verification, comparing digests, is carried out in batches, and marks duplicate records with a duplicate error mark.
Compared with the prior art, the invention abstracts the entire processing logic as a flow, accommodates the different processing services and resource-specific business logic of different resources, and monitors execution in real time through listeners added before and after the flow and each of its steps. All flow instances are executed concurrently and block by block, which keeps system processing efficient and stable. Duplicate verification for the same resource is queued and executed serially, which ensures the uniqueness of credit records.
Drawings
FIG. 1 is a main flow diagram of the present invention;
FIG. 2 is a schematic diagram of the event listening of the present invention;
FIG. 3 is a flow chart of data processing according to the present invention;
FIG. 4 is a schematic diagram of a data processing flow corresponding to a plurality of tasks according to the present invention;
FIG. 5 is a comparison of whole-batch processing and flow-based processing.
Detailed Description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person skilled in the art on the basis of these embodiments without creative effort fall within the protection scope of the present invention.
As shown in FIG. 1 to FIG. 4, the process-based method for processing massive credit information provided by the present invention includes:
All credit-related resource information and verification logic are described uniformly through metadata and stored in a database. Describing through metadata means that each credit-related information resource is described by items such as a resource name, a resource code (corresponding to the data table name) and a record duplication rule, so that resources can later be given preliminary classified processing according to these items. The metadata of each information item of a resource mainly comprises a name, a description, a data type, whether the item is mandatory, verification rules, an encryption or decryption rule and a conversion rule. Several verification rules can be set at the same time, and they include static rules and dynamic rules. Static rules include the rules for various certificate numbers, date formats, custom rule expressions and the like. Dynamic rules compare and verify information items against each other and include date intervals, inequality conditions, equality conditions, bound conditions and the like. A conversion rule converts data, for example converting the gender value male to "01" and female to "02". A duplicate verification rule identifies a record uniquely by combining several designated fields in order; for example, a record may be uniquely determined by the three fields certificate type, certificate number and permission number.
Each rule has a code that identifies it and an error message that describes the cause of the error in a record that fails verification.
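One way to picture this metadata is as a small set of record types. The sketch below is an assumption made for illustration: the class names `Rule`, `ItemMeta` and `ResourceMeta`, the field names, the resource code `admin_permission` and the rule code `S010` are all hypothetical, and the source stores the metadata in a database rather than in code. The gender conversion and the three duplicate-check fields are taken from the description above.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Rule:
    code: str            # identifies the rule
    expression: str      # e.g. a pattern or comparison expression (hypothetical form)
    error_message: str   # explains the failure in a rejected record


@dataclass
class ItemMeta:
    name: str
    description: str
    data_type: str
    required: bool
    rules: List[Rule] = field(default_factory=list)
    encrypted: bool = False
    conversion: Dict[str, str] = field(default_factory=dict)


@dataclass
class ResourceMeta:
    name: str
    code: str                    # also the data table name
    duplicate_fields: List[str]  # combined in order to identify a record
    items: List[ItemMeta] = field(default_factory=list)


# Hypothetical resource used only to exercise the structure.
permission_meta = ResourceMeta(
    name="Administrative permission information",
    code="admin_permission",
    duplicate_fields=["cert_type", "cert_no", "permission_no"],
    items=[
        ItemMeta("cert_type", "Certificate type", "string", True),
        ItemMeta(
            "cert_no", "Certificate number", "string", True,
            rules=[Rule("S010", r"\w{6,18}", "certificate number is malformed")],
            encrypted=True,
        ),
        ItemMeta("permission_no", "Permission number", "string", True),
        ItemMeta(
            "gender", "Gender", "string", False,
            conversion={"male": "01", "female": "02"},
        ),
    ],
)

gender_item = next(i for i in permission_meta.items if i.name == "gender")
print(gender_item.conversion["male"])
```

Keeping rules, encryption flags and conversions as data is what lets one generic flow serve resources with different verification logic.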
The meta-information of two typical credit resources is listed below.
TABLE 1 Utility fee arrears information
[Table 1 appears as images in the original: BDA0002607675030000071, BDA0002607675030000081]
TABLE 2 Customs advanced certified enterprise information
[Table 2 appears as an image in the original: BDA0002607675030000082]
As Tables 1 and 2 show, the resource meta-information of two different kinds of credit information, utility fee arrears information and customs advanced certified enterprise information, comprises a name, a code and a duplicate verification rule, and the meta-information of each information item comprises a name, a description, a data type, whether the item is mandatory, verification rules, whether it is encrypted, a conversion rule and so on; the specific duplicate verification rule and the descriptive entries of each information item are set according to the characteristics of the credit information resource itself.
At system startup, the subject identification information already in the database, comprising certificate type, certificate number and subject ID, is loaded into a cache. After a data processing flow is started, the meta-information is loaded into the flow execution context, and the subsequent steps derive their data processing logic from it. Loading the subject identification information and the meta-information up front avoids frequent database interaction during the flow, which would otherwise degrade overall processing performance.
A thread pool whose size is matched to the server configuration is responsible for multi-task processing. As shown in FIG. 4, the data processing flows corresponding to a plurality of tasks are executed concurrently, and the data processing flow of each task is divided into a plurality of data processing sub-flows that are performed in sequence.
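The relationship between concurrent flows and sequential sub-flows can be sketched with a standard thread pool. This is an illustrative sketch only: sizing the pool from `os.cpu_count()` is an assumption about what "matched to the server configuration" might mean, and `run_flow` and `run_tasks` are hypothetical names with a placeholder body instead of real acquire, process and store steps.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Optional, Tuple


def run_flow(task_name: str, blocks: List[list]) -> Tuple[str, int]:
    """One data processing flow: its sub-flows (one per block) run in order."""
    processed = 0
    for block in blocks:
        # Placeholder for the acquire / process / store steps of a sub-flow.
        processed += len(block)
    return task_name, processed


def run_tasks(tasks: Dict[str, List[list]], pool_size: Optional[int] = None) -> Dict[str, int]:
    """Flows of different tasks run concurrently on a bounded thread pool."""
    size = pool_size or os.cpu_count() or 1  # assumption: size follows CPU count
    with ThreadPoolExecutor(max_workers=size) as pool:
        futures = [pool.submit(run_flow, name, blocks) for name, blocks in tasks.items()]
        return dict(f.result() for f in futures)


totals = run_tasks({"task_a": [[1, 2], [3]], "task_b": [[4]]}, pool_size=2)
print(totals)
```

Bounding the pool keeps the number of simultaneously active flows, and therefore the number of blocks held in memory, under control during peak load.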
The data processing flow corresponding to each task is started manually by a user or automatically by the system on a schedule. Manual start means that a user uploads a credit information data file by hand. Scheduled start is mainly used for database-to-database exchange tasks: a cron expression is set for the exchange task, and the system starts the processing flow automatically according to that schedule. With scheduled start, the data processing flow is matched automatically with the information in the database, so an incorrect file format cannot occur. With manual start, an operator imports a file, and if the file format is wrong the system generates a notice that the imported file has an incorrect format and prompts the user to adjust or replace it.
Each data processing sub-flow comprises three steps, in order: acquiring data, processing data and storing data. The acquiring step obtains the data as a stream, block by block. The processing step performs format verification, in-block duplicate verification, association verification, information conversion and similar operations on the data. The storing step writes the processed qualified data into the database and generates an error record file from the unqualified data, so that the user can download, check and correct it.
Specifically, the acquiring step obtains data as a stream, block by block, and the maximum number of records per block is a parameter set according to the server configuration. Files are read through a block buffer and databases are read page by page, which keeps the memory occupied by each block of data stable. The database connection pool parameters of the corresponding data source must be configured to support multiple tasks, so that the read performance of parallel tasks is guaranteed.
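Block-wise streaming of a file can be sketched with a generator that never holds more than one block in memory. The function name `read_blocks` and the CSV input are assumptions chosen for illustration; the source also covers Excel and TXT files and paged database reads, which are not shown here.

```python
import csv
import io
from itertools import islice
from typing import Dict, Iterable, Iterator, List


def read_blocks(lines: Iterable[str], max_records: int) -> Iterator[List[Dict[str, str]]]:
    """Yield successive blocks of at most max_records rows from CSV text."""
    reader = csv.DictReader(lines)
    while True:
        block = list(islice(reader, max_records))
        if not block:
            return
        yield block


sample = io.StringIO("cert_no,name\n111,a\n222,b\n333,c\n")
block_sizes = [len(block) for block in read_blocks(sample, max_records=2)]
print(block_sizes)
```

Each yielded block would feed one data processing sub-flow, so memory use depends on `max_records` rather than on the size of the file.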
The processing step comprises format verification, in-block duplicate verification, association verification and information conversion. Format verification is performed according to the verification rules configured in the meta-information.
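Format verification with static and dynamic rules might look like the sketch below. The rule tables, rule codes and field names are hypothetical, and the rules are hard-coded here only to keep the example short; in the described method they are read from the meta-information. The dynamic rule mirrors the earlier example that the issue date must precede the expiration date.

```python
from datetime import datetime
from typing import Dict, List, Tuple


def is_iso_date(value) -> bool:
    """Static check: the value parses as a YYYY-MM-DD date."""
    try:
        datetime.strptime(value, "%Y-%m-%d")
        return True
    except (TypeError, ValueError):
        return False


def to_date(value: str):
    return datetime.strptime(value, "%Y-%m-%d").date()


# Static rules: one information item at a time -> (check, code, message).
STATIC_RULES = {
    "issue_date": (is_iso_date, "S001", "issue_date is not a valid date"),
    "expiry_date": (is_iso_date, "S002", "expiry_date is not a valid date"),
}

# Dynamic rules: comparisons between information items.
DYNAMIC_RULES = [
    (
        lambda r: to_date(r["issue_date"]) < to_date(r["expiry_date"]),
        "D001",
        "issue_date must be earlier than expiry_date",
    ),
]


def verify_format(record: Dict[str, str]) -> List[Tuple[str, str]]:
    """Return (rule code, error message) pairs; an empty list means the record passes."""
    errors = []
    for name, (check, code, message) in STATIC_RULES.items():
        if not check(record.get(name)):
            errors.append((code, message))
    if not errors:  # dynamic rules assume the single items are already valid
        for check, code, message in DYNAMIC_RULES:
            if not check(record):
                errors.append((code, message))
    return errors


print(verify_format({"issue_date": "2022-01-01", "expiry_date": "2021-01-01"}))
```

Returning the rule code together with its message matches the description that each rule carries both an identifier and an error prompt for the error record file.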
In-block duplicate verification ensures that the records processed in the current run contain no mutual duplicates, so that the subsequent steps can be processed concurrently. A digest of each record is generated from the list of duplicate-check fields configured in the meta-information and stored in the cache in a de-duplication set named after the code of the resource currently being processed; if the digest is already present, an "already exists" status is returned. Duplicate records are marked as duplicates and treated as error data that is not processed further, while records that are not duplicated continue to the subsequent steps. After the whole flow is finished, it is determined whether other tasks for the current resource are still running, and if not, the de-duplication set holding the digests of the current resource is cleared from the cache.
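The digest-and-set mechanism can be sketched as follows. Several details are assumptions not stated in the source: SHA-256 as the digest function, a unit-separator character to join fields, a plain dictionary standing in for the cache, and the names `record_digest`, `mark_duplicates` and `clear_if_idle`.

```python
import hashlib
from typing import Dict, List, Set

# Stand-in for the cache: resource code -> de-duplication set of digests.
dedup_cache: Dict[str, Set[str]] = {}


def record_digest(record: Dict[str, str], fields: List[str]) -> str:
    """Digest of the duplicate-check fields, combined in their configured order."""
    joined = "\x1f".join(str(record.get(name, "")) for name in fields)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def mark_duplicates(resource_code: str, records: List[dict], fields: List[str]) -> List[dict]:
    """Mark repeated records and return only the ones seen for the first time."""
    seen = dedup_cache.setdefault(resource_code, set())
    unique = []
    for record in records:
        key = record_digest(record, fields)
        if key in seen:
            record["status"] = "DUPLICATE"  # treated as error data, not processed further
        else:
            seen.add(key)
            unique.append(record)
    return unique


def clear_if_idle(resource_code: str, other_running_tasks: int) -> None:
    """Drop the set once no other task of the same resource is running."""
    if other_running_tasks == 0:
        dedup_cache.pop(resource_code, None)


fields = ["cert_type", "cert_no", "permission_no"]
records = [
    {"cert_type": "ID", "cert_no": "1", "permission_no": "P1"},
    {"cert_type": "ID", "cert_no": "1", "permission_no": "P1"},
    {"cert_type": "ID", "cert_no": "2", "permission_no": "P1"},
]
unique = mark_duplicates("admin_permission", records, fields)
print(len(unique), records[1]["status"])
```

Because the set is keyed by resource code, blocks of several concurrent tasks for the same resource share one set, which is why it is cleared only when the last of those tasks has finished.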
Association verification associates the acquired data with credit subject information, that is, only the subject identification information of the acquired data is compared with the subject identification records loaded into the cache at system startup. A subject ID field is added to each successfully associated record. For data that cannot be associated, a unique subject ID is generated and added to the current record, a supplement event is triggered, and the corresponding subject information is added to the database by the supplement listener. Handling the supplement separately preserves the processing efficiency of the main flow.
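A minimal sketch of this association step is shown below, with several assumptions: the cache is a dictionary keyed by (certificate type, certificate number), the unique subject ID is a random UUID, the supplement event is a plain callback, and a newly generated ID is also placed in the cache so later records of the same subject reuse it. The function and field names are hypothetical.

```python
import uuid
from typing import Callable, Dict, List, Tuple

SubjectKey = Tuple[str, str]  # (certificate type, certificate number)


def associate(
    records: List[dict],
    subject_cache: Dict[SubjectKey, str],
    on_supplement: Callable[[dict], None],
) -> List[dict]:
    """Attach a subject ID to every record, creating one when none is cached."""
    for record in records:
        key = (record["cert_type"], record["cert_no"])
        subject_id = subject_cache.get(key)
        if subject_id is None:
            subject_id = uuid.uuid4().hex  # assumption: UUID as the unique ID
            subject_cache[key] = subject_id  # assumption: reuse for later records
            # The listener persists the new subject outside the main flow.
            on_supplement({"subject_id": subject_id, "cert_type": key[0], "cert_no": key[1]})
        record["subject_id"] = subject_id
    return records


cache = {("ID", "111"): "S1"}  # loaded from the database at startup in the source
supplements: List[dict] = []
linked = associate(
    [
        {"cert_type": "ID", "cert_no": "111"},
        {"cert_type": "ID", "cert_no": "222"},
        {"cert_type": "ID", "cert_no": "222"},
    ],
    cache,
    supplements.append,
)
print(linked[0]["subject_id"], len(supplements))
```

In a real deployment the callback would typically hand the new subject to an asynchronous writer, so the main flow never waits on the supplementary database insert.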
Information conversion converts and encrypts some fields of the acquired data according to the meta-information configuration, while retaining the original information of those fields. Finally, all records that pass verification are written to the database using the converted field data, and records that fail verification are exported to a file using the original data.
Before stored data is written to the database, it must be compared with the historical records to verify whether it is a duplicate. Data processing flows that run simultaneously and belong to the same resource are queued and executed serially, while data processing flows that run simultaneously and belong to different resources are executed concurrently. The duplicate check against historical records uses the same mechanism as the duplicate check within a block: verification compares summary information, is performed in batches, and sets a duplicate error mark on duplicated records.
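The serial-per-resource, concurrent-across-resources rule can be sketched as below. Note the substitution: the sketch uses one lock per resource as a simple stand-in for the queue described in the text, and an in-memory set as a stand-in for the historical records in the database. The class name HistoryStore and the sample digests are hypothetical.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Set, Tuple


class HistoryStore:
    """Toy in-memory stand-in for the historical digests kept in the database."""

    def __init__(self) -> None:
        self._guard = threading.Lock()
        self._locks: Dict[str, threading.Lock] = {}
        self._history: Dict[str, Set[str]] = {}

    def _entry(self, resource_code: str) -> Tuple[threading.Lock, Set[str]]:
        # Create the per-resource lock and digest set exactly once.
        with self._guard:
            lock = self._locks.setdefault(resource_code, threading.Lock())
            known = self._history.setdefault(resource_code, set())
        return lock, known

    def store_batch(self, resource_code: str, digests: List[str]) -> List[str]:
        """Store a batch and return only the digests that were new."""
        lock, known = self._entry(resource_code)
        with lock:  # same resource: one batch at a time; other resources proceed
            fresh = [d for d in dict.fromkeys(digests) if d not in known]
            known.update(fresh)
        return fresh


store = HistoryStore()
with ThreadPoolExecutor(max_workers=3) as pool:
    first_a = pool.submit(store.store_batch, "A", ["d1", "d2"])
    second_a = pool.submit(store.store_batch, "A", ["d2", "d3"])
    only_b = pool.submit(store.store_batch, "B", ["d2"])
    stored_a = len(first_a.result()) + len(second_a.result())
    stored_b = only_b.result()
```

Whichever resource-A batch runs first, the shared digest d2 is stored only once, while the resource-B batch is unaffected by the resource-A lock.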
Within the data storage step, the error record file is written in batches. Records are filtered by state: a record is written to the file only if it is marked with a verification error state and not with a duplicate error mark, and duplicate records are discarded directly. The stream is opened before the step starts and closed after the step ends to produce the file; records are buffered in memory up to a set window value and written as a batch once the window value is exceeded. Writing to the database within the data storage step is also performed in batches, according to a window value configured for the server, and only records not in an error state are written. The connection pool parameters of the target database need to be configured to support multiple tasks, so that the write performance of parallel tasks is guaranteed.
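The windowed writing of the error record file might look like the following sketch. The status values VERIFY_ERROR and DUPLICATE, the raw field holding the original data, and the function name are assumptions; the sketch flushes when the buffer reaches the window size.

```python
import io
from typing import Iterable, List, TextIO


def write_error_file(records: Iterable[dict], out: TextIO, window: int) -> int:
    """Write verification-error records in batches; return the count written."""
    buffer: List[str] = []
    written = 0
    for record in records:
        # Only verification errors are exported; duplicates are simply dropped.
        if record["status"] != "VERIFY_ERROR":
            continue
        buffer.append(record["raw"])
        if len(buffer) >= window:
            out.write("\n".join(buffer) + "\n")
            written += len(buffer)
            buffer.clear()
    if buffer:  # flush the final partial window
        out.write("\n".join(buffer) + "\n")
        written += len(buffer)
    return written


sample = [
    {"raw": "row-1", "status": "VERIFY_ERROR"},
    {"raw": "row-2", "status": "DUPLICATE"},
    {"raw": "row-3", "status": "OK"},
    {"raw": "row-4", "status": "VERIFY_ERROR"},
    {"raw": "row-5", "status": "VERIFY_ERROR"},
]
error_file = io.StringIO()  # stand-in for a file stream opened before the step
count = write_error_file(sample, error_file, window=2)
```

The same buffering pattern applies to the database write, with the filter inverted so that only records not in an error state are inserted.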
Event listeners can be attached to the data processing flow and to each flow step, so that relevant business logic can be inserted before and after execution. For the events handled in each step, exceptions are handled and the processing state is persisted in order to count the processing status, the number of successes and the number of failures. The number of failures comprises the number of format verification errors and the number of duplicate records. In the storing step, if records are found to duplicate historical records before the batch is executed, the duplicate count is increased accordingly.
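The listener and counting mechanism can be illustrated with a small sketch. The StepRunner class, the status names and the in-memory counter are assumptions; in particular, the counter here is not persisted, whereas the text above describes persisting the processing state.

```python
from collections import Counter
from typing import Callable, List

Listener = Callable[[str, List[dict]], None]


class StepRunner:
    """Runs a step with before/after listeners and tallies record statuses."""

    def __init__(self) -> None:
        self.before: List[Listener] = []
        self.after: List[Listener] = []
        self.stats: Counter = Counter()

    def run(self, name: str, step: Callable[[List[dict]], List[dict]],
            records: List[dict]) -> List[dict]:
        for listener in self.before:
            listener(name, records)
        try:
            result = step(records)
        except Exception:
            self.stats["exception"] += 1  # exception handling hook
            raise
        for record in result:
            self.stats[record.get("status", "OK")] += 1
        for listener in self.after:
            listener(name, result)
        return result

    @property
    def failures(self) -> int:
        # Failures are format errors plus duplicates, as described above.
        return self.stats["FORMAT_ERROR"] + self.stats["DUPLICATE"]


log: List[str] = []
runner = StepRunner()
runner.before.append(lambda name, recs: log.append(f"before {name}"))
runner.after.append(lambda name, recs: log.append(f"after {name}"))
result = runner.run("process", lambda recs: recs, [
    {"status": "OK"}, {"status": "FORMAT_ERROR"}, {"status": "DUPLICATE"},
])
```

Because listeners are plain callables, resource-specific business logic can be attached without changing the step implementations themselves.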
Embodiments of the present invention are described in detail below with reference to the accompanying drawings:
Take the data processing flows corresponding to the six tasks shown in fig. 4 as an example. Assume that resource processing flow A handles the utility fee arrears information of table 1, resource processing flow B handles the customs advanced certified enterprise information of table 2, and resource processing flow C handles corporate bank loan information. In the early stage of processing, the six data processing flows are executed simultaneously in a concurrent manner and are processed independently without affecting one another. The data processing flow corresponding to each task is divided into a plurality of data processing sub-flows that are performed in sequence. Taking the first resource-A processing flow as an example:
The first resource-A processing flow is divided into a plurality of data processing sub-flows performed in sequence, such as A11, A12, A13, and so on. The data of A11, A12, A13, ... is acquired block by block in a streaming manner, that is, one block of data at a time. Once the data corresponding to A11 has been acquired, as shown in fig. 3, block record duplicate verification can be performed on A11: whether the A11 data contains mutually duplicated records is determined according to the duplicate-check rule of the data (code type, code, debt type, arrears amount, determining department and determination date); duplicate records are marked as duplicates and treated as error data that is not processed further, and non-duplicate data continues to the subsequent steps.
Format verification is then performed. Specifically, whether the data format of A11 meets the requirements is verified against the meta-information of its information items (name, description, data type, whether the item is required, verification rule, whether to encrypt, and conversion rule). Data that meets the requirements enters the subsequent processing; otherwise it is treated as error data and is not processed further.
Association verification is then performed: the subject identification information of A11 is compared with the subject identification records loaded into the cache at system start-up, and a subject identification (ID) field is added to each successfully associated record. For data that cannot be associated, a unique subject ID is generated and added to the current record, and a supplement event is triggered so that the corresponding subject information is added to the database by a supplement listener process. Handling the supplement process independently preserves the processing efficiency of the main flow.
The successfully associated records and the records with supplemented subject information are then checked. Non-compliant records are written to a file (exported using the original data). For compliant records, the relevant fields are converted and encrypted according to the meta-information configuration while the original field information is retained, and all compliant records are written to the database using the converted field data.
Finally, before the data is written to the database, it is compared with the historical records to verify whether it is a duplicate. Data processing flows that run simultaneously and belong to the same resource are queued and executed serially, while data processing flows that run simultaneously and belong to different resources are executed concurrently. The duplicate check against historical records uses the same mechanism as the duplicate check within a block: verification compares summary information, is performed in batches, and sets a duplicate error mark on duplicated records.
After data processing sub-flow A11 has finished, the same operations are performed on data processing sub-flow A12, and so on, until all data processing sub-flows into which the first resource-A processing flow is divided have been processed, which completes that flow.
While the first resource-A processing flow is running, the other resource-A processing flow and the processing flows of other resources are executed simultaneously in a concurrent manner and are processed independently without affecting one another. For flows of the same resource, qualified data must be compared with the historical records to verify whether it is a duplicate before it is written to the database, and these warehousing operations are queued and executed serially.
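The overall scheduling, with flows running concurrently and the sub-flows of each flow running in order, can be sketched as follows. Sizing the pool from os.cpu_count() is an assumption about what matching the server configuration means, the task names are hypothetical, and the body of each sub-flow is reduced to counting records in place of the acquire, process and store steps.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Optional


def run_flow(task_name: str, blocks: List[List[str]]) -> int:
    """Process the sub-flows of one task strictly in order."""
    processed = 0
    for block in blocks:
        # Placeholder for the acquire / process / store steps of one sub-flow.
        processed += len(block)
    return processed


def run_all(tasks: Dict[str, List[List[str]]],
            workers: Optional[int] = None) -> Dict[str, int]:
    """Run every task's flow concurrently on a pool sized from the host."""
    pool_size = workers or max(1, os.cpu_count() or 1)
    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        futures = {name: pool.submit(run_flow, name, blocks)
                   for name, blocks in tasks.items()}
        return {name: future.result() for name, future in futures.items()}


tasks = {
    "A-first": [["a1", "a2"], ["a3"]],
    "A-second": [["a4"]],
    "B-first": [["b1", "b2", "b3"]],
}
totals = run_all(tasks)
```

Only the warehousing step needs coordination between flows of the same resource; everything before it can run freely in parallel, which is why the flows are submitted to the pool independently.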
The quality of a data processing method is judged mainly by its data processing time, so the practical effect of the processing method can be verified by measuring that time. Comparison experiments were therefore carried out on several data sets of different sizes on the same server.
The test server parameters are as follows:
CPU: i5-4750 (4 cores); memory: 16 GB (DDR4, 8 GB × 2); disk capacity: 1 TB (7200 rpm)
Scenario one:
The test data files contain 6,000, 24,000, 72,000 and 240,000 records respectively (0.6w/2.4w/7.2w/24w, where w denotes ten thousand), and overall processing is compared with flow-based processing.
The test performance data are shown in the following table:
Table 3. Comparison of time consumption between overall processing and flow-based processing
[Table 3 is provided as an image in the original publication: Figure BDA0002607675030000131]
A more intuitive comparison chart of the data processing times in table 3 is shown in fig. 5. As the table shows, flow-based processing improves on the time consumption of overall processing by 707% on average, so its performance is far better than that of the overall data processing method. When started manually, the flow reaches a processing rate of 1000 records per second.
Scenario two:
Table 4. Time consumption of flow-based database-to-database concurrent processing
[Table 4 is provided as images in the original publication: Figures BDA0002607675030000132 and BDA0002607675030000141]
The performance comparison for database-to-database concurrent processing is shown in table 4. All system parameters remain stable while the tasks run concurrently, and the data processing speed reaches 2500 records per second on average. When the business volume is particularly large, multiple instances can be run, which fully meets the data processing needs of a provincial platform during business peaks.
The invention also provides a system for processing massive credit information based on a process, comprising:
a database storing all credit-related resource information and verification logic described by metadata;
a server provided with thread pools, matched in number to the configuration of the server, that are responsible for multi-task processing;
data processing flows, which correspond one to one with the tasks to be processed and each comprise a plurality of data processing sub-flows performed in sequence, the data processing flows corresponding to a plurality of tasks being executed simultaneously in a concurrent manner;
each data processing sub-flow comprises a data acquisition unit, a data processing unit and a data storage unit;
the data acquisition unit acquires, block by block in a streaming manner, the data of the data processing flow to which the data processing sub-flow belongs, wherein the maximum number of records per block is a parameter set according to the configuration of the server, and the number of blocks equals the number of data processing sub-flows in that data processing flow;
the data processing unit processes the acquired data;
the data storage unit writes the processed qualified data into a database, generates an error record file from the unqualified data and notifies the client;
after each data processing flow is started, meta-information is loaded into the flow execution context; event listeners can be attached to the data processing flow and to the flow steps in order to insert relevant business logic before and after execution; and after each step is processed, the processing state is persisted in order to count the processing status, the number of successes and the number of failures.
The data processing flow corresponding to each task is started manually by a user or automatically by the system on a schedule.
The data processing unit comprises a format verification unit, a block record duplicate verification unit, an association verification unit and an information conversion unit.
The format verification unit verifies records against the verification rules configured in the meta-information.
The block record duplicate verification unit ensures that the records processed in the current batch do not duplicate one another, so that the subsequent steps can be processed concurrently. Summary information is generated for each record from the duplicate-check field list configured in the meta-information and is stored in a de-duplication set in the cache, named after the code of the resource currently being processed; if the summary information is already present, an "already exists" status is returned. Duplicate records are marked as duplicates and treated as error data that is not processed further, while records that are not duplicated within the block continue to the subsequent steps. After the whole flow has finished, the unit checks whether other tasks for the current resource are still running and, if not, clears from the cache the de-duplication set that stores the summary information of the current resource.
The association verification unit associates the acquired data with credit subject information; that is, the subject identification information of the acquired data is compared only with the subject identification records loaded into the cache at system start-up. A subject identification (ID) field is added to each successfully associated record. For data that cannot be associated, a unique subject ID is generated and added to the current record, and a supplement event is triggered so that the corresponding subject information is added to the database by a supplement listener process. Handling the supplement process independently preserves the processing efficiency of the main flow.
The information conversion unit converts and encrypts some fields of the acquired data according to the meta-information configuration, while retaining the original field information. All records that pass verification are written to the database using the converted field data, and records that fail verification are exported to a file using the original data.
Before qualified data is written to the database, the data storage unit compares it with the historical records to verify whether it is a duplicate. Data processing flows that run simultaneously and belong to the same resource are queued and executed serially, while data processing flows that run simultaneously and belong to different resources are executed concurrently. The duplicate check against historical records uses the same mechanism as the duplicate check within a block: verification compares summary information, is performed in batches, and sets a duplicate error mark on duplicated records.
In summary, the present invention abstracts the whole processing logic into flows, accommodates the different processing services and specific business logic of different resources, and enables real-time monitoring of execution by attaching listeners before and after the flows and their steps. All processing flow instances are executed block by block in a concurrent manner, which ensures that system processing is efficient and stable. Duplicate verification for the same resource is queued and executed serially, which ensures the uniqueness of credit records.
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from its spirit or essential attributes. Furthermore, it should be understood that although this description is organized by embodiments, not every embodiment contains only a single independent technical solution. This manner of description is adopted for clarity only; those skilled in the art should treat the description as a whole, and the technical solutions of the embodiments may be appropriately combined to form other embodiments that those skilled in the art can understand.

Claims (10)

1. A method for processing massive credit information based on a process is characterized by comprising the following steps:
uniformly describing all credit-related resource information and verification logic through metadata, and storing them in a database;
setting up thread pools, matched in number to the configuration of the server, to be responsible for multi-task processing, wherein the data processing flows corresponding to a plurality of tasks are executed simultaneously in a concurrent manner, and the data processing flow corresponding to each task is divided into a plurality of data processing sub-flows performed in sequence;
each data processing sub-flow comprises three steps in sequence: acquiring data, processing data and storing data;
acquiring data means acquiring, block by block in a streaming manner, the data of the data processing flow to which the data processing sub-flow belongs, wherein the maximum number of records per block is a parameter set according to the configuration of the server, and the number of blocks equals the number of data processing sub-flows in that data processing flow;
processing data means processing the acquired data;
storing data means writing the processed qualified data into a database, generating an error record file from the unqualified data and notifying the client;
after each data processing flow is started, meta-information is loaded into the flow execution context; event listeners can be attached to the data processing flow and to the flow steps in order to insert relevant business logic before and after execution; and after each step is processed, the processing state is persisted in order to count the processing status, the number of successes and the number of failures.
2. The method for processing massive credit information based on a process according to claim 1, wherein describing through metadata means describing each item of credit-related resource information by resource name, resource code and resource record duplication rule, and performing a preliminary classification of the resource information according to the resource name, resource code and resource record duplication rule; each metadata item in each resource information comprises a name, a description, a data type, whether the item is required, a verification rule, whether to encrypt, and a conversion rule.
3. The method for processing massive credit information based on a process according to claim 1, wherein the data processing flow corresponding to each task is started manually by a user or automatically by the system on a schedule.
4. The method as claimed in claim 1, wherein processing data comprises format verification, block record duplicate verification, association verification and information conversion;
the format verification is performed according to the verification rules configured in the meta-information;
the block record duplicate verification ensures that the records processed in the current batch do not duplicate one another, so that the subsequent steps can be processed concurrently; summary information is generated for each record from the duplicate-check field list configured in the meta-information and is stored in a de-duplication set in the cache, named after the code of the resource currently being processed, and if the summary information is already present, an "already exists" status is returned; duplicate records are marked as duplicates and treated as error data that is not processed further; records that are not duplicated within the block continue to the subsequent steps; after the whole flow has finished, it is determined whether other tasks for the current resource are still running, and if not, the de-duplication set that stores the summary information of the current resource is cleared from the cache;
the association verification associates the acquired data with credit subject information, that is, the subject identification information of the acquired data is compared only with the subject identification records loaded into the cache at system start-up; a subject identification (ID) field is added to each successfully associated record; for data that cannot be associated, a unique subject ID is generated and added to the current record, and a supplement event is triggered so that the corresponding subject information is added to the database by a supplement listener process; handling the supplement process independently preserves the processing efficiency of the main flow;
the information conversion converts and encrypts some fields of the acquired data according to the meta-information configuration, while retaining the original information of those fields; all records that pass verification are written to the database using the converted field data, and records that fail verification are exported to a file using the original data.
5. The method for processing massive credit information based on a process according to claim 1, wherein, before stored data is written to the database, it is compared with the historical records to verify whether it is a duplicate; data processing flows that run simultaneously and belong to the same resource are queued and executed serially, and data processing flows that run simultaneously and belong to different resources are executed concurrently; the duplicate check against historical records uses the same mechanism as the duplicate check within a block, verification compares summary information and is performed in batches, and a duplicate error mark is set on duplicated records.
6. The method for processing massive credit information based on a process according to claim 1, wherein, within the storing of data, the error record file is written in batches and records are filtered by state, a record being written to the file only if it is marked with a verification error state and not with a duplicate error mark, and duplicate records being discarded directly; the stream is opened before the step starts and closed after the step ends to produce the file, and records are buffered in memory up to a set window value and written as a batch once the window value is exceeded; and writing to the database within the storing of data is performed in batches according to a window value configured for the server, and only records not in an error state are written.
7. A system for processing mass credit information based on a process is characterized by comprising:
a database storing all credit-related resource information and validation logic described by metadata;
a server provided with thread pools, matched in number to the configuration of the server, that are responsible for multi-task processing;
data processing flows, which correspond one to one with the tasks to be processed and each comprise a plurality of data processing sub-flows performed in sequence, the data processing flows corresponding to a plurality of tasks being executed simultaneously in a concurrent manner;
each data processing sub-flow comprises a data acquisition unit, a data processing unit and a data storage unit;
the data acquisition unit acquires, block by block in a streaming manner, the data of the data processing flow to which the data processing sub-flow belongs, wherein the maximum number of records per block is a parameter set according to the configuration of the server, and the number of blocks equals the number of data processing sub-flows in that data processing flow;
the data processing unit processes the acquired data;
the data storage unit writes the processed qualified data into a database, generates an error record file from the unqualified data and notifies the client;
after each data processing flow is started, meta-information is loaded into the flow execution context; event listeners can be attached to the data processing flow and to the flow steps in order to insert relevant business logic before and after execution; and after each step is processed, the processing state is persisted in order to count the processing status, the number of successes and the number of failures.
8. The system according to claim 7, wherein the data processing flow corresponding to each task is started manually by a user or automatically by the system on a schedule.
9. The system as claimed in claim 7, wherein the data processing unit comprises a format verification unit, a block record duplicate verification unit, an association verification unit and an information conversion unit;
the format verification unit verifies records against the verification rules configured in the meta-information;
the block record duplicate verification unit ensures that the records processed in the current batch do not duplicate one another, so that the subsequent steps can be processed concurrently; summary information is generated for each record from the duplicate-check field list configured in the meta-information and is stored in a de-duplication set in the cache, named after the code of the resource currently being processed, and if the summary information is already present, an "already exists" status is returned; duplicate records are marked as duplicates and treated as error data that is not processed further; records that are not duplicated within the block continue to the subsequent steps; after the whole flow has finished, it is determined whether other tasks for the current resource are still running, and if not, the de-duplication set that stores the summary information of the current resource is cleared from the cache;
the association verification unit associates the acquired data with credit subject information, that is, the subject identification information of the acquired data is compared only with the subject identification records loaded into the cache at system start-up; a subject identification (ID) field is added to each successfully associated record; for data that cannot be associated, a unique subject ID is generated and added to the current record, and a supplement event is triggered so that the corresponding subject information is added to the database by a supplement listener process; handling the supplement process independently preserves the processing efficiency of the main flow;
the information conversion unit converts and encrypts some fields of the acquired data according to the meta-information configuration, while retaining the original field information; all records that pass verification are written to the database using the converted field data, and records that fail verification are exported to a file using the original data.
10. The system according to claim 7, wherein, before qualified data is written to the database, the data storage unit compares it with the historical records to verify whether it is a duplicate; data processing flows that run simultaneously and belong to the same resource are queued and executed serially, and data processing flows that run simultaneously and belong to different resources are executed concurrently; the duplicate check against historical records uses the same mechanism as the duplicate check within a block, verification compares summary information and is performed in batches, and a duplicate error mark is set on duplicated records.
CN202010743848.9A 2020-07-29 2020-07-29 Method and system for processing massive credit information based on flow Active CN111897845B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010743848.9A CN111897845B (en) 2020-07-29 2020-07-29 Method and system for processing massive credit information based on flow

Publications (2)

Publication Number Publication Date
CN111897845A true CN111897845A (en) 2020-11-06
CN111897845B CN111897845B (en) 2023-10-31

Family

ID=73182641

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010743848.9A Active CN111897845B (en) 2020-07-29 2020-07-29 Method and system for processing massive credit information based on flow

Country Status (1)

Country Link
CN (1) CN111897845B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102982180A (en) * 2012-12-18 2013-03-20 华为技术有限公司 Method and device for storing data
CN104657497A (en) * 2015-03-09 2015-05-27 国家电网公司 Mass electricity information concurrent computation system and method based on distributed computation
CN106612320A (en) * 2016-06-14 2017-05-03 四川用联信息技术有限公司 Encrypted data dereplication method for cloud storage
CN106611135A (en) * 2016-06-21 2017-05-03 四川用联信息技术有限公司 Storage data integrity verification and recovery method
US20170286695A1 (en) * 2016-04-01 2017-10-05 Egnyte, Inc. Methods for Improving Performance and Security in a Cloud Computing System

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
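Cui Jifeng; Zhang Yong; Xing Chunxiao: "Application of Metadata in Database Interoperation" (元数据在数据库互操作中的应用), Journal of Frontiers of Computer Science and Technology (计算机科学与探索), no. 04, pages 305-312 *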

Also Published As

Publication number Publication date
CN111897845B (en) 2023-10-31

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 221000 room 603, block a, building 2, Xuzhou Software Park, No. 6, Software Park Road, Quanshan District, Xuzhou City, Jiangsu Province

Applicant after: Jiangsu xindie Digital Technology Co.,Ltd.

Address before: 221000 room 603, block a, building 2, Xuzhou Software Park, No. 6, Software Park Road, Quanshan District, Xuzhou City, Jiangsu Province

Applicant before: Xuzhou Kingdee Software Co.,Ltd.

GR01 Patent grant