CN116701381A - Multistage verification system and method for distributed data acquisition and warehousing - Google Patents


Publication number
CN116701381A
CN116701381A (application CN202310967006.5A; granted publication CN116701381B)
Authority
CN
China
Prior art keywords
data
check
verification
checked
level
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202310967006.5A
Other languages
Chinese (zh)
Other versions
CN116701381B (en)
Inventor
姚含
方红渊
崔冬祥
李鸿羽
黄少意
王惠云
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing Mochou Intelligent Information Technology Co ltd
Original Assignee
Nanjing Mochou Intelligent Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing Mochou Intelligent Information Technology Co ltd filed Critical Nanjing Mochou Intelligent Information Technology Co ltd
Priority to CN202310967006.5A priority Critical patent/CN116701381B/en
Publication of CN116701381A publication Critical patent/CN116701381A/en
Application granted granted Critical
Publication of CN116701381B publication Critical patent/CN116701381B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention belongs to the technical field of data verification, and particularly relates to a multi-stage verification system and method for distributed data acquisition and warehousing. By applying multi-stage verification to the input data, the invention can clean and normalize the data quickly and flexibly. During verification, the data is checked stage by stage according to the priority of each verification condition, which guarantees the validity of the data without introducing verification errors, thereby effectively reducing the difficulty of subsequent data processing. In addition, flexible configuration information is added, which improves the applicability of the framework.

Description

Multistage verification system and method for distributed data acquisition and warehousing
Technical Field
The invention belongs to the technical field of data verification, and particularly relates to a multi-stage verification system and a verification method for distributed data acquisition and warehousing.
Background
With the advent of the information age, more and more online data need to be integrated and transmitted. Owing to today's explosion of information, data sources are numerous and data repetition is high; in particular, when multiple data sources exist, the data must be processed uniformly within a short time while maintaining high processing efficiency. Meanwhile, duplicate data occupies extra storage space, which not only slows data transmission but also degrades data retrieval.
Existing framework structures cannot meet these requirements: they cannot clean and normalize data quickly and flexibly, redundant data keeps accumulating, and the timeliness of the data cannot be guaranteed, so a large amount of stale data builds up. To solve this problem, the present scheme provides a multi-stage verification system that uses three-stage verification to reduce the difficulty of subsequent data processing, and adds flexible configuration information to solve the problem of framework applicability.
Disclosure of Invention
The invention aims to provide a multi-stage verification system and a verification method for distributed data acquisition and warehousing, which can reduce the difficulty of subsequent data processing by utilizing three-stage verification and solve the problem of the applicability of a framework by adding flexible configuration information.
The technical scheme adopted by the invention is as follows:
a multi-stage verification method for distributed data acquisition and warehousing comprises the following steps:
acquiring input data, and carrying out disassembly processing on the input data to obtain a plurality of message bodies;
adding identification information into the message body to obtain data to be verified, wherein the identification information comprises date, source, destination, size, section, name, row ID and file name;
inputting the data to be checked into a multi-level check model, and judging whether the data to be checked passes the check;
if yes, uploading the data to be checked to an online database through a database operation engine;
if not, word segmentation is carried out on the data to be verified to obtain data to be optimized, and the data to be optimized is synchronously uploaded to an offline database;
inputting the data to be optimized into a data conversion model to obtain unique data, carrying out cluster calculation on the unique data, carrying out cluster combination on calculation results, and uploading combined results to an online database.
In a preferred embodiment, the input data is disassembled in rows.
In a preferred embodiment, the step of inputting the data to be verified into a multi-level verification model to determine whether the data to be verified passes verification includes:
acquiring data to be verified;
invoking a verification condition from the verification model, wherein the verification condition comprises content repetition verification, content deletion verification and content query verification;
and sequentially inputting the data to be checked into the check conditions; data that meets the check conditions is determined to pass the check and is synchronously uploaded to an online database, while data that does not meet the check conditions is determined to fail the check and is synchronously uploaded to an offline database.
In a preferred scheme, the priority of the content duplicate check is higher than the priority of the content deletion check, and the priority of the content deletion check is higher than the priority of the content query check;
and the data to be checked is checked step by step according to the priorities of the content repeated check, the content missing check and the content query check, and when the data to be checked passes the check condition with high priority, the check condition with low priority is not executed.
In a preferred embodiment, the step of repeatedly checking the content includes:
acquiring data to be checked, uploading the data to an online database, and judging whether repeated data consistent with the data to be checked exist in the online database;
if so, the data to be checked is retained, and the repeated data is screened out from the online database;
if not, obtaining the structural information of the data to be checked, calibrating it as primary check data, and judging whether a newly added field exists in the primary check data;
if the newly added field exists in the primary check data, inquiring the total data reporting time according to the structure change time, and judging whether repeated reporting records exist or not;
if yes, the repeated data before the time node is cleaned, the first-level check data is reserved, and is summarized into a first-level data set, otherwise, the first-level check data is directly summarized into the first-level data set;
if no new field exists in the primary check data, acquiring date information of the primary check data, setting a date approval field based on a primary data set, and judging whether only the approval field exists in the primary check data and is inconsistent with the date information;
if yes, judging that the data to be checked passes the content repeated check, and summarizing the data to be checked into a primary data set;
if not, judging that the data to be checked does not pass the content repetition check, calibrating it as second-level check data, and summarizing it into a second-level data set.
In a preferred embodiment, the step of performing the content deletion check includes:
acquiring secondary check data and corresponding missing fields thereof from the secondary data set;
acquiring a key field and an identification field corresponding to the secondary check data from the online database, and comparing the key field and the identification field with the secondary check data;
if the missing field in the second-level check data is a key field, judging that the data does not pass the content missing check, calibrating it as third-level check data, and summarizing it into a third-level data set;
if the missing field in the secondary check data is an identification field or a non-key field, judging that the content missing check is passed, and supplementing identification information and non-key field information into the secondary check data.
In a preferred embodiment, the step of performing the content query verification includes:
acquiring three-level check data from the three-level data set;
counting the number of missing key fields in the three-level check data, and calibrating the number of missing key fields as parameters to be compared;
acquiring an evaluation threshold value and comparing the evaluation threshold value with the parameter to be compared;
if the parameter to be compared is greater than or equal to the evaluation threshold, the third-level check data does not pass the content query check, and the third-level check data is uploaded to an offline database;
and if the parameter to be compared is smaller than the evaluation threshold, indicating that the three-level check data passes the content query check, and supplementing key fields into the three-level check data.
In a preferred embodiment, the step of inputting the data to be optimized into a data conversion model to obtain unique data includes:
obtaining data to be optimized from the offline database;
calling a conversion algorithm from the data conversion model, inputting the data to be optimized into the conversion algorithm, and calibrating a conversion result into unique data;
wherein the conversion algorithm is a hash algorithm.
The invention also provides a multi-stage verification system for distributed data acquisition and storage, which is applied to the multi-stage verification method for distributed data acquisition and storage, and comprises the following steps:
the acquisition module is used for acquiring input data and carrying out disassembly processing on the input data to obtain a plurality of message bodies;
the identification module is used for adding identification information into the message body to obtain data to be verified, wherein the identification information comprises a date, a source, a destination, a size, a section, a name, a row ID and a file name;
the verification module is used for inputting the data to be verified into a multi-level verification model and judging whether the data to be verified passes the verification;
if yes, uploading the data to be checked to an online database through a database operation engine;
if not, word segmentation is carried out on the data to be verified to obtain data to be optimized, and the data to be optimized is synchronously uploaded to an offline database;
the data conversion module is used for inputting the data to be optimized into a data conversion model to obtain unique data, carrying out cluster calculation on the unique data, carrying out cluster combination on calculation results, and uploading combined results to an online database.
And a multi-stage verification terminal for distributed data acquisition and warehousing, comprising:
at least one processor;
and a memory communicatively coupled to the at least one processor;
the memory stores a computer program executable by the at least one processor, and the computer program is executed by the at least one processor, so that the at least one processor can execute the multi-stage verification method for distributed data acquisition and warehousing.
The invention has the technical effects that:
the invention utilizes the multistage verification of the logarithmic input data, can clean and normalize the data rapidly and flexibly, and can follow the priority of each verification condition to carry out the stage-by-stage verification in the verification process, ensure the validity of the data, and simultaneously can not cause verification errors, thereby being capable of effectively reducing the difficulty of subsequent data processing, and in addition, flexible configuration information is added, thereby solving the application degree of the framework.
Drawings
FIG. 1 is a flow chart of a method provided by the present invention;
fig. 2 is a block diagram of a system provided by the present invention.
Detailed Description
In order that the above-recited objects, features and advantages of the present invention will become more readily apparent, a more particular description of the invention will be rendered by reference to specific embodiments thereof which are illustrated in the appended drawings.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention, but the present invention may be practiced in other ways other than those described herein, and persons skilled in the art will readily appreciate that the present invention is not limited to the specific embodiments disclosed below.
Further, reference herein to "one embodiment" or "an embodiment" means that a particular feature, structure, or characteristic can be included in at least one implementation of the invention. The appearances of the phrase "in one preferred embodiment" in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments.
Referring to fig. 1 and 2, the present invention provides a multi-stage verification method for distributed data acquisition and storage, which includes:
s1, acquiring input data, and carrying out disassembly processing on the input data to obtain a plurality of message bodies;
s2, adding identification information into the message body to obtain data to be checked, wherein the identification information comprises date, source, destination, size, section, name, row ID and file name;
s3, inputting the data to be checked into a multi-stage check model, and judging whether the data to be checked passes the check;
if yes, uploading the data to be checked to an online database through a database operation engine;
if not, word segmentation is carried out on the data to be checked to obtain data to be optimized, and the data to be optimized is synchronously uploaded to an offline database;
s4, inputting the data to be optimized into a data conversion model to obtain unique data, performing cluster calculation on the unique data, performing cluster combination on calculation results, and uploading combined results to an online database.
As described in steps S1 to S4 above, because of today's explosion of information, data sources are numerous and data repetition is high; in particular, when multiple data sources exist, the data must be processed uniformly within a short time while keeping processing efficient. In this embodiment, an acquisition server disassembles the input data by rows and packages each row into a message body; relevant information such as the source, destination, size, section, name, row ID, and file name of the data is added to the message header, and the message is sent to a message queue. A multi-stage verification model then performs verification on the input data. After the database operation engine has collected a certain amount of verified data, that data is committed to the online database; the remaining data is reprocessed by word segmentation and the like and committed to an offline database, where it is calibrated as data to be optimized. The data to be optimized is then fed to a data conversion model to obtain unique data, which is transmitted to a Spark cluster for computation; the computed results are clustered and combined, and the combined results are uploaded to the online database.
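The disassembly and packaging of steps S1-S2 can be sketched in Python as follows. This is a minimal illustration; the `Message` class, the `disassemble` function, and the way each header field is derived are assumptions, since the patent does not specify an implementation, and the section field is left as a placeholder.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class Message:
    header: dict   # identification information added per row
    body: str      # one row of the input data

def disassemble(raw: str, source: str, destination: str, file_name: str) -> list[Message]:
    """Split the input data by rows and wrap each row in a message body
    whose header carries the identification fields listed in the text."""
    messages = []
    for row_id, line in enumerate(raw.splitlines()):
        header = {
            "date": date.today().isoformat(),
            "source": source,
            "destination": destination,
            "size": len(line),
            "section": 0,  # placeholder: the patent does not define how sections are derived
            "name": file_name.rsplit(".", 1)[0],
            "row_id": row_id,
            "file_name": file_name,
        }
        messages.append(Message(header=header, body=line))
    return messages
```

In a real deployment each `Message` would then be serialized and pushed to the message queue the embodiment mentions.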
In a preferred embodiment, the step of inputting the data to be verified into the multi-level verification model to determine whether the data to be verified passes the verification includes:
s301, acquiring data to be verified;
s302, calling a verification condition from a verification model, wherein the verification condition comprises content repetition verification, content deletion verification and content query verification;
S303, sequentially inputting the data to be checked into the check conditions; data that meets the check conditions is determined to pass the check and is synchronously uploaded to the online database, while data that does not meet the check conditions is determined to fail the check and is synchronously uploaded to the offline database.
As described in steps S301-S303 above, once the data to be verified is determined, it is input directly into the multi-stage verification model, in which several verification conditions are set: content repetition verification, content missing verification, and content query verification. Data to be verified that passes verification is uploaded to the online database; data that fails is uploaded to the offline database, where it is subsequently optimized and converted until it satisfies the conditions for uploading to the online database.
In a preferred embodiment, the priority of the content duplication check is higher than the priority of the content deletion check, and the priority of the content deletion check is higher than the priority of the content query check;
and when the data to be checked passes the check condition with high priority, the check condition with low priority is not executed.
In this embodiment, content repetition verification has the highest priority among the verification conditions, followed by content missing verification and finally content query verification. When verifying the data to be checked, the verification conditions are executed from high priority to low; once the data passes a higher-priority condition, the subsequent verification steps are no longer executed. This guarantees both the accuracy of data verification and the smoothness of verifying the input data.
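The priority-ordered, short-circuiting execution described above can be sketched as follows. This is illustrative only: the check functions are stand-ins for the patent's three verification conditions, and the return convention is an assumption.

```python
def run_checks(record, checks):
    """checks: list of (name, check_fn) pairs in descending priority order;
    check_fn returns True when the record passes that condition.
    Once a higher-priority condition passes, lower-priority ones are skipped."""
    for name, check in checks:
        if check(record):
            return True, name   # passed: remaining (lower-priority) checks not executed
    return False, None          # failed every condition: routed to the offline database
```

A record that fails one condition falls through to the next-priority condition, matching the staged routing (primary, secondary, tertiary check data) described in the embodiment.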
In a preferred embodiment, the step of performing the content repetition check includes:
stp1, acquiring data to be checked, uploading the data to an online database, and judging whether repeated data consistent with the data to be checked exist in the online database;
Stp2, if so, the data to be checked is retained, and the repeated data is screened out from the online database;
stp3, if not, obtaining the structural information of the data to be checked, calibrating the structural information into first-level check data, and judging whether a new field exists in the first-level check data;
stp4, if the newly added field exists in the primary check data, inquiring the total data reporting time according to the structural change time, and judging whether repeated reporting records exist or not;
stp5, if yes, cleaning the repeated data before the time node, reserving first-level check data, summarizing the first-level check data into a first-level data set, and otherwise, summarizing the first-level check data into the first-level data set directly;
stp6, if no new field exists in the primary check data, acquiring date information of the primary check data, setting a date approval field based on the primary data set, and judging whether only the approval field exists in the primary check data and is inconsistent with the date information;
stp7, if yes, judging that the data to be checked passes the content repeated check, and summarizing the data to be checked to a first-level data set;
Stp8, if not, judging that the data to be checked does not pass the content repetition check, calibrating it as secondary check data, and summarizing it into a secondary data set.
As described in steps Stp1-Stp8 above, when performing content repetition verification, it is first determined whether the online database contains repeated data consistent with the data to be checked. If so, the duplicate with the earlier date is screened out of the online database and the duplicate with the later date is retained. The structure information of the data to be checked is then verified: whether a newly added field exists in the primary check data is determined, the total data reporting time is queried according to the structure change time, and whether a repeated reporting record exists is determined. When there is no repeated reporting record, the date information of the primary check data is approved, which avoids data being reported repeatedly merely because its date information is inconsistent. When the date information and structure information are inconsistent, the data is judged not to have passed the content repetition verification, is marked as secondary check data, and continues to be checked under the verification condition of the next priority.
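A heavily simplified sketch of the duplicate-handling core of this check follows, assuming an in-memory list stands in for the online database and reducing the newly-added-field and reporting-record logic to a date comparison; the real check inspects structure information as well.

```python
def repetition_check(record: dict, online_db: list[dict]) -> bool:
    """Return True if the record passes the content repetition check.
    Duplicates with the earlier date are screened out; the later copy is kept."""
    core = {k: v for k, v in record.items() if k != "date"}
    for existing in list(online_db):
        if {k: v for k, v in existing.items() if k != "date"} == core:
            if existing["date"] <= record["date"]:
                # existing copy is older: screen it out of the online database
                online_db.remove(existing)
            else:
                # incoming copy is older: fails, routed to secondary check data
                return False
    online_db.append(record)
    return True
```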
In a preferred embodiment, the step of performing the content deletion check includes:
stp9, acquiring secondary check data and corresponding missing fields thereof from the secondary data set;
stp10, acquiring a key field and an identification field corresponding to the secondary check data from an online database, and comparing the key field and the identification field with the secondary check data;
stp11, if the missing field in the second-level check data is a key field, judging that the missing field does not pass the content missing check, and calibrating the missing field as third-level check data, and summarizing the missing field as a third-level data set;
stp12, if the missing field in the second-level check data is an identification field or a non-key field, judging that the second-level check data passes the content missing check, and supplementing identification information and non-key field information into the second-level check data.
As described in steps Stp9-Stp12 above, when performing the content missing verification, it must be determined whether a missing field in the secondary check data is a key field. Secondary check data whose missing field is a key field is judged not to pass the content missing verification and is marked as tertiary check data. For secondary check data whose missing field is a non-key field or an identification field, the missing part is supplemented, and the supplemented secondary check data is judged to have passed the verification.
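This key/non-key decision can be sketched as follows, under the assumption that each field's role is known from a schema and that a reference copy obtained from the online database supplies the values used to supplement identification and non-key fields.

```python
def missing_check(record: dict, schema: dict, reference: dict) -> bool:
    """schema maps field name -> 'key' | 'id' | 'other';
    reference holds the corresponding values from the online database.
    A missing key field fails the check; other missing fields are filled in."""
    for name, role in schema.items():
        if record.get(name) is None:
            if role == "key":
                return False              # becomes third-level check data
            record[name] = reference[name]  # supplement id / non-key field
    return True
```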
In a preferred embodiment, the step of performing the content query verification includes:
stp13, acquiring three-level check data from the three-level data set;
stp14, counting the missing quantity of key fields in the three-level verification data, and calibrating the missing quantity as a parameter to be compared;
stp15, acquiring an evaluation threshold value, and comparing the evaluation threshold value with parameters to be compared;
stp16, if the parameter to be compared is greater than or equal to the evaluation threshold, indicating that the three-level verification data do not pass the content query verification, and uploading the three-level verification data to an offline database;
stp17, if the parameter to be compared is smaller than the evaluation threshold, shows that the third-level check data passes the content query check, and supplements key fields to the third-level check data.
As described in steps Stp13-Stp17 above, content query verification is decided by the number of missing key fields in the tertiary check data; in this embodiment that number is calibrated as the parameter to be compared. When the parameter to be compared is greater than or equal to the evaluation threshold, the tertiary check data is judged to fail the verification and is synchronously uploaded to the offline database; otherwise, the key fields are supplemented into the tertiary check data, and the supplemented tertiary check data is judged to pass the verification and is uploaded to the online database.
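The threshold comparison can be sketched as follows. The evaluation threshold value and the source of the supplemented key-field values are illustrative assumptions; the patent does not state a concrete threshold.

```python
def query_check(record: dict, key_fields: list[str], reference: dict,
                threshold: int = 2) -> bool:
    """Count missing key fields (the parameter to be compared) and compare
    against the evaluation threshold; below-threshold records are repaired."""
    missing = [f for f in key_fields if record.get(f) is None]
    if len(missing) >= threshold:
        return False              # fails: uploaded to the offline database
    for f in missing:
        record[f] = reference[f]  # supplement key fields, then pass
    return True
```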
In a preferred embodiment, the step of inputting the data to be optimized into the data conversion model to obtain the unique data includes:
s401, acquiring data to be optimized from an offline database;
s402, calling a conversion algorithm from the data conversion model, inputting data to be optimized into the conversion algorithm, and calibrating a conversion result into unique data;
wherein the conversion algorithm is a hash algorithm.
As described in steps S401-S402 above, the data to be optimized is processed by the hash algorithm and then submitted to the Spark cluster for computation; the computed results are clustered and uploaded to the online database. During uploading, duplicate data already in the online database is screened out, and the currently uploaded data is retained as the updated online data.
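Deriving unique data with a hash algorithm can be sketched as follows. SHA-256 is an assumption standing in for the unspecified hash, records are assumed to be JSON-serializable dicts, and the Spark submission step is omitted.

```python
import hashlib
import json

def to_unique(records: list[dict]) -> dict[str, dict]:
    """Hash each record to a stable digest so that identical records
    collapse to a single entry (the 'unique data') before cluster computation."""
    unique = {}
    for rec in records:
        digest = hashlib.sha256(
            json.dumps(rec, sort_keys=True).encode("utf-8")
        ).hexdigest()
        unique[digest] = rec   # duplicates share a digest and overwrite
    return unique
```

The resulting digest-keyed records would then be distributed to the Spark cluster, with the digest serving as a natural partitioning and deduplication key.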
The invention also provides a multi-stage verification system for distributed data acquisition and storage, which is applied to the multi-stage verification method for distributed data acquisition and storage, and comprises the following steps:
the acquisition module is used for acquiring input data and carrying out disassembly processing on the input data to obtain a plurality of message bodies;
the identification module is used for adding identification information into the message body to obtain data to be verified, wherein the identification information comprises a date, a source, a destination, a size, a section, a name, a row ID and a file name;
the verification module is used for inputting the data to be verified into the multi-level verification model and judging whether the data to be verified passes the verification;
if yes, uploading the data to be checked to an online database through a database operation engine;
if not, word segmentation is carried out on the data to be checked to obtain data to be optimized, and the data to be optimized is synchronously uploaded to an offline database;
the data conversion module is used for inputting the data to be optimized into the data conversion model to obtain the unique data, carrying out cluster calculation on the unique data, carrying out cluster combination on calculation results, and uploading the combination results to the online database.
When the system described above runs, the acquisition module first acquires the input data and disassembles it by rows according to the disassembly template, yielding a number of message bodies. The identification module then adds identification information to the message bodies to obtain the data to be verified; this guarantees the uniqueness of the input data, so no data disorder occurs during subsequent transmission and verification. The verification module then verifies the data to be checked, executing multiple verification steps that ensure the smoothness of the verification process. Data that fails verification is determined to be data to be optimized and is uploaded to the data conversion module for conversion; the converted data then undergoes cluster computation and cluster combination, and finally the combined result is uploaded to the online database.
And a multi-stage verification terminal for distributed data acquisition and warehousing, comprising:
at least one processor;
and a memory communicatively coupled to the at least one processor;
the memory stores a computer program executable by the at least one processor, and the computer program is executed by the at least one processor, so that the at least one processor can execute the multi-stage verification method for distributed data acquisition and warehousing.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, apparatus, article, or method that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, apparatus, article, or method. Without further limitation, an element preceded by the phrase "comprising a …" does not exclude the presence of other identical elements in the process, apparatus, article, or method that comprises the element.
The foregoing is merely a preferred embodiment of the present invention. It should be noted that those skilled in the art may make modifications and adaptations without departing from the principles of the present invention, and such modifications and adaptations are also intended to fall within the scope of the present invention. Structures, devices, and methods of operation not specifically described and illustrated herein are, unless otherwise indicated and limited, implemented according to conventional means in the art.

Claims (10)

1. A multi-stage verification method for distributed data acquisition and warehousing, characterized by comprising the following steps:
acquiring input data, and carrying out disassembly processing on the input data to obtain a plurality of message bodies;
adding identification information into the message body to obtain data to be verified, wherein the identification information comprises date, source, destination, size, section, name, row ID and file name;
inputting the data to be checked into a multi-level check model, and judging whether the data to be checked passes the check;
if yes, uploading the data to be checked to an online database through a database operation engine;
if not, word segmentation is carried out on the data to be verified to obtain data to be optimized, and the data to be optimized is synchronously uploaded to an offline database;
inputting the data to be optimized into a data conversion model to obtain unique data, carrying out cluster calculation on the unique data, carrying out cluster combination on calculation results, and uploading combined results to an online database.
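A minimal sketch of the identification step of claim 1, assuming a Python record. The eight field names come from the claim; the types and the `tag` helper are hypothetical:

```python
from dataclasses import dataclass, asdict

@dataclass(frozen=True)
class Identification:
    # The eight fields enumerated in claim 1; the types are assumptions.
    date: str
    source: str
    destination: str
    size: int
    section: str
    name: str
    row_id: int
    file_name: str

def tag(body: str, ident: Identification) -> dict:
    # Merge the identification info with the message body to form
    # the data to be checked.
    record = asdict(ident)
    record["body"] = body
    return record
```

The frozen dataclass makes each record's identity immutable once assigned, which is one way to preserve the uniqueness the description relies on.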
2. The multi-stage verification method for distributed data acquisition and warehousing of claim 1, characterized in that: the input data is disassembled row by row.
3. The multi-stage verification method for distributed data acquisition and warehousing of claim 1, characterized in that the step of inputting the data to be checked into a multi-level check model and judging whether the data to be checked passes the check comprises:
acquiring data to be verified;
invoking check conditions from the check model, wherein the check conditions comprise a content repetition check, a content missing check, and a content query check;
and sequentially inputting the data to be checked into the check conditions, wherein data that meets the check conditions is determined to pass the check and is synchronously uploaded to the online database, while data that does not meet the check conditions is determined to fail the check and is synchronously uploaded to the offline database.
4. The multi-stage verification method for distributed data acquisition and warehousing of claim 3, characterized in that: the priority of the content repetition check is higher than that of the content missing check, and the priority of the content missing check is higher than that of the content query check;
and the data to be checked is checked step by step according to the priorities of the content repetition check, the content missing check, and the content query check, and when the data to be checked passes a check condition of higher priority, the check conditions of lower priority are not executed.
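The priority ordering and short-circuiting of claims 3 and 4 can be sketched as a chain of predicates. The three lambdas below are toy stand-ins for the claimed checks, not the patented logic:

```python
def run_checks(record, checks):
    # Run checks from highest to lowest priority; a record that passes a
    # higher-priority check skips the remaining ones, while a failing
    # record escalates to the next level.
    for level, check in enumerate(checks, start=1):
        if check(record):
            return True, level
    return False, len(checks)

# Illustrative stand-ins for the three claimed checks.
seen_bodies = {"duplicate"}
checks = [
    lambda r: r["body"] not in seen_bodies,   # content repetition check
    lambda r: "key" in r,                     # content missing check
    lambda r: r.get("missing_keys", 0) < 2,   # content query check
]
```

A fresh record exits at level 1, a duplicate with its key field intact exits at level 2, and a duplicate with too many missing keys falls all the way through.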
5. The multi-stage verification method for distributed data acquisition and warehousing of claim 3, characterized in that the step of performing the content repetition check comprises:
acquiring the data to be checked, uploading it to the online database, and judging whether repeated data consistent with the data to be checked exists in the online database;
if such repeated data exists, retaining the data to be checked and screening the repeated data out of the online database;
if not, obtaining the structural information of the data to be checked, calibrating it as primary check data, and judging whether a newly added field exists in the primary check data;
if a newly added field exists in the primary check data, querying the overall data reporting time according to the structure change time, and judging whether a repeated reporting record exists;
if so, cleaning the repeated data before the time node, retaining the primary check data, and summarizing it into a primary data set; otherwise, directly summarizing the primary check data into the primary data set;
if no newly added field exists in the primary check data, acquiring the date information of the primary check data, setting a date approval field based on the primary data set, and judging whether only the approval field in the primary check data is inconsistent with the date information;
if so, judging that the data to be checked passes the content repetition check, and summarizing the data to be checked into the primary data set;
if not, judging that the data to be checked does not pass the content repetition check, calibrating the data to be checked as secondary check data, and summarizing it into a secondary data set.
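A simplified sketch of the repetition check of claim 5, assuming dict records keyed by a `body` field. The reporting-time and date-approval branches are collapsed into the fall-through case, so this illustrates only the duplicate and new-field branches:

```python
def repetition_check(record, online_db, known_fields):
    # Duplicate branch: retain the incoming record, screen duplicates out.
    duplicates = [r for r in online_db if r["body"] == record["body"]]
    if duplicates:
        for d in duplicates:
            online_db.remove(d)
        online_db.append(record)
        return True
    # New-field branch: a schema change is treated as a fresh report here.
    if set(record) - known_fields:
        online_db.append(record)
        return True
    # No duplicate, no new field: escalate to the secondary (missing) check.
    return False
```

Note the claimed behavior keeps the newest copy: the incoming record replaces any older duplicate rather than being discarded.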
6. The multi-stage verification method for distributed data acquisition and warehousing of claim 5, characterized in that the step of performing the content missing check comprises:
acquiring secondary check data and corresponding missing fields thereof from the secondary data set;
acquiring a key field and an identification field corresponding to the secondary check data from the online database, and comparing the key field and the identification field with the secondary check data;
if a missing field in the secondary check data is a key field, judging that the secondary check data does not pass the content missing check, calibrating it as third-level check data, and summarizing it into a third-level data set;
if the missing fields in the secondary check data are identification fields or non-key fields, judging that the secondary check data passes the content missing check, and supplementing the identification information and non-key field information into the secondary check data.
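The content missing check of claim 6 distinguishes key fields from identification and non-key fields. A minimal sketch, with `defaults` standing in for the reference information held in the online database:

```python
def missing_check(record, key_fields, id_fields, defaults):
    # A record missing a key field fails and becomes third-level check data;
    # missing identification / non-key fields are supplemented from defaults.
    missing_keys = [f for f in key_fields if f not in record]
    if missing_keys:
        return False, missing_keys
    for f in id_fields:
        record.setdefault(f, defaults.get(f))
    return True, []
```

A record with all key fields present is repaired in place; one missing any key field is handed on with the list of what it lacks.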
7. The multi-stage verification method for distributed data acquisition and warehousing of claim 6, characterized in that the step of performing the content query check comprises:
acquiring the third-level check data from the third-level data set;
counting the number of missing key fields in the third-level check data, and calibrating the number as the parameter to be compared;
acquiring an evaluation threshold and comparing it with the parameter to be compared;
if the parameter to be compared is greater than or equal to the evaluation threshold, the third-level check data does not pass the content query check, and the third-level check data is uploaded to the offline database;
and if the parameter to be compared is smaller than the evaluation threshold, the third-level check data passes the content query check, and the key fields are supplemented into the third-level check data.
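The threshold comparison of claim 7 can be sketched as follows. The claim as printed says both branches pass the check, which appears to be a typo, so this sketch assumes a record at or above the threshold fails and stays offline:

```python
def query_check(record, key_fields, threshold):
    # Count missing key fields and compare against the evaluation threshold.
    missing = [f for f in key_fields if f not in record]
    if len(missing) >= threshold:
        return False          # stays in the offline database
    for f in missing:
        record[f] = None      # supplement key fields with placeholder values
    return True
```

The placeholder value `None` is an assumption; the claim says only that the key fields are supplemented, not with what.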
8. The multi-stage verification method for distributed data acquisition and warehousing of claim 1, characterized in that the step of inputting the data to be optimized into a data conversion model to obtain unique data comprises:
obtaining data to be optimized from the offline database;
invoking a conversion algorithm from the data conversion model, inputting the data to be optimized into the conversion algorithm, and calibrating the conversion result as the unique data;
wherein the conversion algorithm is a hash algorithm.
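Claim 8 names only "a hash algorithm" as the conversion; a sketch using SHA-256 as an assumed choice:

```python
import hashlib

def to_unique(body: str) -> str:
    # The claim does not fix a specific hash; SHA-256 is an assumption.
    # The digest gives each distinct body a stable unique identifier.
    return hashlib.sha256(body.encode("utf-8")).hexdigest()
```

Identical bodies map to the same digest, so downstream cluster calculation and merging can treat the digest as the record's unique key.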
9. A multi-stage verification system for distributed data acquisition and warehousing, applying the multi-stage verification method for distributed data acquisition and warehousing according to any one of claims 1 to 8, characterized in that it comprises:
the acquisition module is used for acquiring input data and carrying out disassembly processing on the input data to obtain a plurality of message bodies;
the identification module is used for adding identification information into the message body to obtain data to be verified, wherein the identification information comprises a date, a source, a destination, a size, a section, a name, a row ID and a file name;
the verification module is used for inputting the data to be verified into a multi-level verification model and judging whether the data to be verified passes the verification;
if yes, uploading the data to be checked to an online database through a database operation engine;
if not, word segmentation is carried out on the data to be verified to obtain data to be optimized, and the data to be optimized is synchronously uploaded to an offline database;
the data conversion module is used for inputting the data to be optimized into a data conversion model to obtain unique data, carrying out cluster calculation on the unique data, carrying out cluster combination on calculation results, and uploading combined results to an online database.
10. A multi-stage verification terminal for distributed data acquisition and warehousing, characterized in that it comprises:
at least one processor;
and a memory communicatively coupled to the at least one processor;
wherein the memory stores a computer program executable by the at least one processor to enable the at least one processor to perform the multi-stage verification method for distributed data acquisition and warehousing of any one of claims 1 to 8.
CN202310967006.5A 2023-08-03 2023-08-03 Multistage verification system and method for distributed data acquisition and warehousing Active CN116701381B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310967006.5A CN116701381B (en) 2023-08-03 2023-08-03 Multistage verification system and method for distributed data acquisition and warehousing


Publications (2)

Publication Number Publication Date
CN116701381A true CN116701381A (en) 2023-09-05
CN116701381B CN116701381B (en) 2023-11-03

Family

ID=87839625


Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116975206A (en) * 2023-09-25 2023-10-31 华云天下(南京)科技有限公司 Vertical field training method and device based on AIGC large model and electronic equipment

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2002010962A1 (en) * 2000-07-28 2002-02-07 Storymail, Inc. System, method and computer program product for device, operating system, and network transport neutral secure interactive multi-media messaging
CN110598466A (en) * 2019-07-30 2019-12-20 百度时代网络技术(北京)有限公司 Offline field checking method, device and equipment and computer readable storage medium
CN111291026A (en) * 2018-12-07 2020-06-16 北京京东尚科信息技术有限公司 Data access method, system, device and computer readable medium
CN111711623A (en) * 2020-06-15 2020-09-25 深圳前海微众银行股份有限公司 Data verification method and device
CN113343556A (en) * 2021-05-07 2021-09-03 青岛蓝智现代服务业数字工程技术研究中心 Supply chain optimizing system
CN116303385A (en) * 2023-02-13 2023-06-23 中国铁塔股份有限公司 Data auditing method and device, electronic equipment and storage medium
US20230237029A1 (en) * 2022-01-25 2023-07-27 Dell Products L.P. Data deduplication in a storage system


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
XU Quan et al., "Adaptive Phasor Algorithm Based on Multi-level Self-check and Multiple Switching", Southern Power System Technology, vol. 13, no. 4, pp. 18-24 *




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant