CN117271489A - Method, device, equipment and medium for verifying data - Google Patents
Method, device, equipment and medium for verifying data
- Publication number
- CN117271489A (application number CN202311219020.3A)
- Authority
- CN
- China
- Prior art keywords
- data
- dictionary
- warehousing system
- type
- combined
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/21—Design, administration or maintenance of databases
- G06F16/215—Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
Abstract
The application discloses a method, an apparatus, a device and a medium for verifying data, which can be applied in the field of big data or the field of finance. With this method, when the data warehousing system receives new data, the data type and data pattern of the new data can be identified automatically, and the data pattern of the new data is merged with the data dictionary of the data warehousing system, so that the data dictionary of the data warehousing system is updated automatically. The historical data in the data warehousing system can then be verified automatically against the updated data dictionary, so that all data in the data warehousing system conforms to the definitions in the data dictionary. This improves the efficiency of verifying data in the data warehousing system and allows the data dictionary to better reflect the current state of the data.
Description
Technical Field
The present application relates to the field of big data, and in particular, to a method, an apparatus, a device, and a medium for verifying data.
Background
A data warehousing system may be used to store large amounts of business data. A data dictionary is a collection of information that describes data; it may define and describe the data items, data structures, data flows, data stores, and processing logic of the data.
In the prior art, the data dictionary corresponding to a data warehousing system is compiled manually in order to maintain the data in the system. This manual approach is inefficient.
Disclosure of Invention
In view of this, the present application provides a method and an apparatus for verifying data, so as to improve the efficiency of data verification.
The method for verifying data comprises the following steps:
receiving and parsing data to obtain the data type of the data;
merging the data pattern corresponding to the data type with the data dictionary of the data warehousing system to obtain a merged data dictionary;
verifying historical data in the data warehousing system according to the merged data dictionary.
Optionally, the data pattern corresponding to the data type includes:
rules corresponding to the data type and association relationships corresponding to the data type.
Optionally, verifying the historical data in the data warehousing system according to the merged data dictionary includes:
screening out, based on the rules and association relationships contained in the merged data dictionary, the fields in the historical data that do not conform to those rules and association relationships.
Optionally, after merging the data pattern corresponding to the data type with the data dictionary of the data warehousing system to obtain the merged data dictionary, the method further includes:
generating a full-end display diagram of the merged data dictionary.
Optionally, after generating the full-end display diagram of the merged data dictionary, the method further includes:
adding an identifier to the data pattern corresponding to the data type in the full-end display diagram.
The application also provides an apparatus for verifying data, which is applied to a data warehousing system and comprises an analysis module, a merging module and a verification module;
the analysis module is configured to receive and parse data to obtain the data type of the data;
the merging module is configured to merge the data pattern corresponding to the data type with the data dictionary of the data warehousing system to obtain a merged data dictionary;
the verification module is configured to verify historical data in the data warehousing system according to the merged data dictionary.
Optionally, the verification module is specifically configured to screen out, based on the rules and association relationships contained in the merged data dictionary, the fields in the historical data that do not conform to those rules and association relationships.
Optionally, the apparatus further comprises a generating module;
the generating module is configured to generate the full-end display diagram of the merged data dictionary.
The present application also provides a computer device comprising a processor and a memory, wherein the processor is coupled to the memory, at least one computer program instruction is stored in the memory, and the at least one computer program instruction is loaded and executed by the processor so that the computer device implements the method for verifying data described above.
The present application also provides a computer storage medium storing a computer program which, when executed, is adapted to carry out the method of verifying data as described above.
The beneficial effects of this application are therefore as follows. When the data warehousing system receives new data, the data type and data pattern of the new data can be identified automatically, and the data pattern of the new data is merged with the data dictionary of the data warehousing system, so that the data dictionary of the data warehousing system is updated automatically. The historical data in the data warehousing system can then be verified automatically against the updated data dictionary, so that all data in the data warehousing system conforms to the definitions in the data dictionary. This improves the efficiency of verifying data in the data warehousing system and allows the data dictionary to better reflect the current state of the data.
Drawings
In order to illustrate the embodiments of the present application or the technical solutions in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the present application, and a person skilled in the art may obtain other drawings from the drawings provided.
FIG. 1 is a flow chart of a first embodiment of the present application;
FIG. 2 is a flow chart of a second embodiment of the present application;
FIG. 3 is a schematic view of an apparatus of the present application;
FIG. 4 is a schematic diagram of a computer device of the present application.
Detailed Description
The technical solutions in the embodiments of the present application are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present application. All other embodiments that a person of ordinary skill in the art can obtain from the present disclosure without undue burden fall within the scope of the present disclosure.
The method, the device, the equipment and the medium for verifying the data can be used in the big data field or the financial field. The foregoing is merely exemplary, and is not intended to limit the application fields of the method, apparatus, device, and medium for verifying data provided in the present application.
The method and the device for verifying the data can be applied to a data warehousing system, so that the data in the data warehousing system can be verified.
In embodiments of the present application, the device that verifies the data may include, but is not limited to, a computer device.
The computer device may include a processor and a memory, wherein the processor is coupled to the memory, at least one computer program instruction is stored in the memory, and the at least one computer program instruction is loaded and executed by the processor so that the computer device implements the method for verifying data. The computer device is referred to simply as a computer in the following embodiments.
Referring to FIG. 1, the specific steps of the first embodiment of the present application are as follows:
S101: The computer receives and parses the data to obtain the data type of the data.
Because the data warehousing system frequently receives new data, and the data dictionary corresponding to the new data may differ from the data dictionary of the data warehousing system, the new data needs to be parsed to obtain its data dictionary. That data dictionary is then merged with the data dictionary of the data warehousing system so that the data in the data warehousing system can be managed.
The new data received by the data warehousing system may originate from different databases, and different databases may store different types of data, so the new data may be of different types. For example, the data type of the data may be an account number, a client number, an address, or a mobile phone number. The data types may be set according to actual requirements and are not limited to those listed above.
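As an illustration of step S101, the following is a minimal sketch of how a data type might be inferred from raw values. The application does not specify a recognition technique; the regular expressions, the type names and the functions `infer_data_type` and `infer_column_type` are all assumptions made for this example.

```python
import re

# Hypothetical patterns: the application does not say how types are recognised.
TYPE_PATTERNS = {
    "mobile_number": re.compile(r"1\d{10}"),
    "client_number": re.compile(r"C\d{8}"),
    "account_number": re.compile(r"\d{16,19}"),
}


def infer_data_type(value: str) -> str:
    """Return the first type whose pattern matches the whole value."""
    for type_name, pattern in TYPE_PATTERNS.items():
        if pattern.fullmatch(value):
            return type_name
    # Assumption: anything unmatched is treated as free text such as an address.
    return "address"


def infer_column_type(values: list) -> str:
    """Pick the most frequent inferred type among the values of one column."""
    counts = {}
    for value in values:
        inferred = infer_data_type(value)
        counts[inferred] = counts.get(inferred, 0) + 1
    return max(counts, key=counts.get)
```

Voting over a whole column rather than trusting a single value is one way to tolerate a few malformed entries in the incoming data.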
S102: The computer merges the data pattern corresponding to the data type with the data dictionary of the data warehousing system to obtain a merged data dictionary.
The data pattern corresponding to the data is the data dictionary corresponding to that data. Specifically, the data pattern corresponding to a data type may include, but is not limited to, rules corresponding to the data type and association relationships corresponding to the data type.
It should be noted that different data types may correspond to different data patterns. For example, data of an amount type is typically numeric and data of an address type is typically text; the rule corresponding to the amount type may therefore constrain the data to be numeric, and the rule corresponding to the address type may constrain the data to be text. As another example, data of the amount type and data of the client number type are both typically numeric but may differ in length; in that case the rule corresponding to the amount type may constrain the data to length A, and the rule corresponding to the client number type may constrain the data to length B. The data pattern corresponding to the data may be set according to actual requirements and is not limited to the above examples.
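The rules described above (numeric versus text, and a fixed length per type) could be represented as follows. This is only a sketch: the `Rule` class, the `RULES` table and the concrete lengths 6 and 8, which stand in for "length A" and "length B", are assumptions and do not come from the application.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rule:
    kind: str                     # "number" or "text"
    length: Optional[int] = None  # required length, or None for any length

    def accepts(self, value: str) -> bool:
        # Assumption: numeric values are stored as plain digit strings.
        if self.kind == "number" and not value.isdigit():
            return False
        if self.length is not None and len(value) != self.length:
            return False
        return True


# Hypothetical rule table: 6 and 8 stand in for "length A" and "length B".
RULES = {
    "amount": Rule(kind="number", length=6),
    "client_number": Rule(kind="number", length=8),
    "address": Rule(kind="text"),
}
```

With this table, a six-digit string satisfies the amount rule but is rejected by the client number rule, which mirrors the length-based distinction in the text.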
In addition, different data types may be related to one another. The association relationships in the data pattern can reflect both the associations between different pieces of data and the associations between different data types.
For example, one customer may hold several accounts; that is, data A of the client number type may be associated with data B, data C and data D of the account type. In this case, the association relationship corresponding to data A represents the associations between data A and each of data B, data C and data D, while the association relationships corresponding to data B, data C and data D each represent the association between that piece of data and data A.
There may also be associations between data of the same data type. If the new data received by the data warehousing system contains several texts, the association relationships can link data of the same data type across the different texts.
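The customer and account example can be sketched as a symmetric association index. The `AssociationIndex` class and the `(data_type, value)` tuples are hypothetical; the application does not prescribe a storage structure for association relationships.

```python
from collections import defaultdict


class AssociationIndex:
    """Stores symmetric associations between pieces of data."""

    def __init__(self):
        self._links = defaultdict(set)

    def link(self, left, right):
        # Record the association in both directions, as in the example where
        # data B points back to data A.
        self._links[left].add(right)
        self._links[right].add(left)

    def related(self, item):
        # Return a copy so callers cannot modify the index by accident.
        return set(self._links.get(item, set()))


index = AssociationIndex()
data_a = ("client_number", "A")
for account in ("B", "C", "D"):
    index.link(data_a, ("account", account))
```

Here `index.related(data_a)` yields the three accounts, while `index.related(("account", "B"))` yields only data A.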
It should be noted that when the computer merges the data pattern corresponding to the data type with the data dictionary of the data warehousing system to obtain a merged data dictionary, the data dictionary corresponding to the new data is fused with the original data dictionary of the data warehousing system, and the fused data dictionary replaces the original one as the new data dictionary of the data warehousing system.
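The merge in step S102 might look like the sketch below. The dictionary layout (`rule` and `associations` per type) and the function `merge_dictionaries` are assumptions. The application also does not say what happens when both dictionaries define different rules for the same type, so this sketch keeps the existing rule and merely reports the conflict.

```python
def merge_dictionaries(existing: dict, incoming: dict):
    """Fuse the new data's dictionary into a copy of the system dictionary.

    Returns (merged, added, conflicts), where `added` lists types that were
    not previously defined and `conflicts` lists types whose rules differ.
    """
    merged = {
        name: {"rule": entry["rule"], "associations": set(entry["associations"])}
        for name, entry in existing.items()
    }
    added, conflicts = [], []
    for name, entry in incoming.items():
        if name not in merged:
            merged[name] = {
                "rule": entry["rule"],
                "associations": set(entry["associations"]),
            }
            added.append(name)
            continue
        # Known type: union the association relationships.
        merged[name]["associations"] |= set(entry["associations"])
        # Assumption: the existing rule wins; the conflict is only reported.
        if merged[name]["rule"] != entry["rule"]:
            conflicts.append(name)
    return merged, added, conflicts
```

Returning the list of added types separately makes it straightforward to highlight them later when the merged dictionary is displayed.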
S103: The computer verifies the historical data in the data warehousing system according to the merged data dictionary.
The historical data in the data warehousing system may contain irregular fields that do not meet requirements, such as garbled data or meaningless data, or the original historical data may not conform to the definitions in the merged data dictionary. The original historical data in the data warehousing system therefore needs to be verified according to the merged data dictionary.
In some implementations, the computer verifies the historical data in the data warehousing system according to the merged data dictionary as follows: based on the rules and association relationships contained in the merged data dictionary, the computer screens out the fields in the historical data that do not conform to those rules and association relationships.
Screening the fields of the historical data according to the rules removes garbled and meaningless data, which makes the historical data more accurate. The fields of the historical data are also screened according to the association relationships: if a table intended to store data of data type A is found to contain data of data type B, the screening removes the data of data type B from that table.
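A minimal sketch of the field screening in step S103 follows. The row layout, the mapping from columns to data types and the regular expressions in `MERGED_DICTIONARY` are assumptions made for illustration; only the idea of checking every historical field against the merged dictionary comes from the text.

```python
import re

# Hypothetical merged dictionary: one whole-value pattern per data type.
MERGED_DICTIONARY = {
    "client_number": re.compile(r"\d{8}"),
    "mobile_number": re.compile(r"1\d{10}"),
}


def screen_history(rows, column_types, dictionary):
    """Return (row_index, column, value) for every field that fails its rule."""
    failures = []
    for index, row in enumerate(rows):
        for column, value in row.items():
            pattern = dictionary[column_types[column]]
            if not pattern.fullmatch(str(value)):
                failures.append((index, column, value))
    return failures


history = [
    {"client": "00012345", "phone": "13800138000"},
    # A mobile number stored in the client column, plus a garbled phone field.
    {"client": "13800138000", "phone": "???"},
]
column_types = {"client": "client_number", "phone": "mobile_number"}
rejected = screen_history(history, column_types, MERGED_DICTIONARY)
```

In this sample, both fields of the second row are reported: one because a value of another type appears in the client column, the other because the value is garbled.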
In other implementations, the historical data in the data warehousing system may first be verified according to the data dictionary of the new data, after which the data dictionary of the new data is merged with the data dictionary of the data warehousing system.
In other implementations, after the computer merges the data pattern corresponding to the data type with the data dictionary of the data warehousing system to obtain the merged data dictionary, it may generate a full-end display diagram of the merged data dictionary. This presents the merged data dictionary more intuitively and improves its usability.
Specifically, after generating the full-end display diagram of the merged data dictionary, the computer may add an identifier to the data pattern corresponding to the data type in the diagram, so that the content newly added to the data dictionary of the data warehousing system can be seen at a glance.
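The application leaves the form of the display diagram and of the identifier open. As a purely illustrative stand-in, the sketch below renders the merged dictionary as plain text and appends a marker to newly added entries; the function `render_dictionary` and the `[NEW]` marker are assumptions.

```python
def render_dictionary(merged: dict, newly_added: set, marker: str = "[NEW]") -> str:
    """Render one line per data type, flagging entries that came from new data."""
    lines = []
    for name in sorted(merged):
        suffix = f" {marker}" if name in newly_added else ""
        lines.append(f"{name}: {merged[name]}{suffix}")
    return "\n".join(lines)


view = render_dictionary(
    {"client_number": "8 digits", "address": "text"},
    newly_added={"address"},
)
```

A graphical implementation could apply the same idea by colouring or labelling the nodes that correspond to newly merged data patterns.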
In the first embodiment of the application, when the data warehousing system receives new data, the data type and data pattern of the new data can be identified automatically, and the data pattern of the new data is merged with the data dictionary of the data warehousing system, so that the data dictionary of the data warehousing system is updated automatically. The historical data in the data warehousing system can then be verified automatically against the updated data dictionary, so that all data in the data warehousing system conforms to the definitions in the data dictionary. This improves the efficiency of verifying data in the data warehousing system and allows the data dictionary to better reflect the current state of the data.
Since the new data received by the data warehousing system may contain several different types of data, this case is described below.
Referring to FIG. 2, the steps of the second embodiment of the present application are as follows:
S201: The computer receives and parses the data to obtain data type A, data type B and data type C corresponding to the data.
In this embodiment, the data includes data of data type A, data of data type B, and data of data type C. Specifically, data type A may be a client number type, data type B may be an address type, and data type C may be a mobile phone number type.
In some implementations, the data may include text A, which contains the client number type and the address type, and text B, which contains the client number type and the mobile phone number type. The client number type, the address type and the mobile phone number type can be obtained by receiving the two texts and parsing the data they contain.
S202: Based on data type A, data type B and data type C, the computer obtains the rule corresponding to each of the three data types and the association relationships among them, thereby obtaining the data dictionary of the data.
Specifically, the rules corresponding to the client number type, the address type and the mobile phone number type can be set according to actual requirements. The client number type has an association relationship with the address type and with the mobile phone number type, and corresponding association relationships also exist among the data of the client number type, the data of the address type and the data of the mobile phone number type.
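Steps S201 and S202 can be sketched as building a dictionary from the two texts. The sketch assumes that two data types are associated when they occur in the same text, which reproduces the associations described above; this co-occurrence heuristic, the function `build_dictionary` and the rule strings are assumptions rather than details taken from the application.

```python
from itertools import combinations


def build_dictionary(texts: dict, rules: dict) -> dict:
    """Build a data dictionary from the data types found in each text.

    `texts` maps a text name to the list of data types it contains.
    """
    dictionary = {}
    for columns in texts.values():
        for data_type in columns:
            dictionary.setdefault(
                data_type, {"rule": rules[data_type], "associations": set()}
            )
    # Assumption: types that appear in the same text are associated.
    for columns in texts.values():
        for left, right in combinations(columns, 2):
            dictionary[left]["associations"].add(right)
            dictionary[right]["associations"].add(left)
    return dictionary


texts = {
    "text_a": ["client_number", "address"],
    "text_b": ["client_number", "mobile_number"],
}
rules = {"client_number": "digits:8", "address": "text", "mobile_number": "digits:11"}
new_dictionary = build_dictionary(texts, rules)
```

Because the client number appears in both texts, it ends up associated with both the address type and the mobile phone number type, while those two types are not directly linked to each other.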
S203: The computer merges the data dictionary of the data with the data dictionary of the data warehousing system to obtain a merged data dictionary.
S204: The computer obtains historical data in the data warehousing system.
In this embodiment, the acquired historical data may include only historical data of the client number type, the address type, and the mobile phone number type.
S205: Based on the rules and association relationships contained in the merged data dictionary, the computer screens out the fields in the historical data that do not conform to those rules and association relationships.
S206: The computer generates a full-end display diagram of the merged data dictionary.
S207: The computer adds an identifier to the data dictionary of the data in the full-end display diagram.
It should be noted that this application does not limit the execution order of steps S204 to S205 relative to steps S206 to S207: steps S204 to S205 may be executed before steps S206 to S207, or steps S206 to S207 may be executed before steps S204 to S205.
The identifier may be set according to actual requirements, and this application does not limit its specific form.
In the second embodiment of the application, the data dictionary of the new data is obtained by parsing the new data received by the data warehousing system and is merged with the data dictionary of the data warehousing system. This addresses problems such as a confused, redundant or outdated data dictionary, automates the whole process, and improves the efficiency of managing the data dictionary. Displaying the merged data dictionary also improves its usability.
Referring to FIG. 3, the present application provides an apparatus 300 for verifying data, which is applied to a data warehousing system and includes an analysis module 301, a merging module 302 and a verification module 303.
Analysis module 301: configured to receive and parse the data to obtain the data type of the data.
Merging module 302: configured to merge the data pattern corresponding to the data type with the data dictionary of the data warehousing system to obtain a merged data dictionary.
Verification module 303: configured to verify the historical data in the data warehousing system according to the merged data dictionary.
With this apparatus for verifying data, when the data warehousing system receives new data, the data type and data pattern of the new data can be identified automatically, and the data pattern of the new data is merged with the data dictionary of the data warehousing system, so that the data dictionary of the data warehousing system is updated automatically. The historical data in the data warehousing system can then be verified automatically against the updated data dictionary, so that all data in the data warehousing system conforms to the definitions in the data dictionary. This improves the efficiency of verifying data in the data warehousing system and allows the data dictionary to better reflect the current state of the data.
Optionally, the verification module 303 is specifically configured to screen out, based on the rules and association relationships contained in the merged data dictionary, the fields in the historical data that do not conform to those rules and association relationships.
Optionally, the apparatus 300 for verifying data further comprises a generating module 304.
Generating module 304: configured to generate a full-end display diagram of the merged data dictionary.
Optionally, the apparatus 300 for verifying data further comprises an adding module 305.
Adding module 305: configured to add an identifier to the data pattern corresponding to the data type in the full-end display diagram.
The specific manner in which each module of the apparatus in the above embodiments performs its operations has been described in detail in connection with the method embodiments and is not repeated here.
It should be noted that the apparatus for verifying data provided in the above embodiment is described using the above division into functional modules only as an example. In practical applications, the functions may be assigned to different functional modules as needed; that is, the internal structure of the apparatus may be divided into different functional modules to implement all or part of the functions described above. In addition, the apparatus for verifying data and the method for verifying data provided in the foregoing embodiments belong to the same concept; for the specific implementation of the apparatus, see the detailed description of the method embodiments, which is not repeated here.
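The module structure of apparatus 300 can be pictured as three pluggable functions wired together. The class `VerificationDevice` and the stand-in functions below are hypothetical; they use a toy rule (a required length per data type) simply to show how the analysis, merging and verification modules hand results to one another.

```python
class VerificationDevice:
    """Wires an analysis, a merging and a verification module together."""

    def __init__(self, analysis_module, merging_module, verification_module):
        self.analysis_module = analysis_module
        self.merging_module = merging_module
        self.verification_module = verification_module

    def on_new_data(self, new_data, system_dictionary, history):
        pattern = self.analysis_module(new_data)
        merged = self.merging_module(system_dictionary, pattern)
        rejected = self.verification_module(history, merged)
        return merged, rejected


# Toy stand-ins: a "dictionary" here maps a data type to a required length.
def analyse(new_data):
    return {"mobile_number": len(new_data[0])}


def merge(system_dictionary, pattern):
    return {**system_dictionary, **pattern}


def verify(history, dictionary):
    return [(kind, value) for kind, value in history if len(value) != dictionary[kind]]


device = VerificationDevice(analyse, merge, verify)
merged, rejected = device.on_new_data(
    ["13800138000"],
    {"client_number": 8},
    [("client_number", "00012345"), ("mobile_number", "123")],
)
```

Passing the modules in as callables reflects the remark that the functional division is only illustrative: any module can be replaced or split without changing the overall flow.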
Referring to FIG. 4, the present application further provides a computer device 400, including a processor 401 and a memory 402.
The processor 401 is coupled to the memory 402. At least one computer program instruction is stored in the memory 402 and is loaded and executed by the processor 401 to cause the computer device to carry out the method of verifying data.
The present application also provides a computer storage medium storing a computer program which, when executed, is adapted to carry out the method of verifying data as described above.
Finally, it should be noted that relational terms such as first and second are used herein solely to distinguish one entity or action from another, without necessarily requiring or implying any actual relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element introduced by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article, or apparatus that comprises the element.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
Claims (10)
1. A method of verifying data for use in a data warehousing system, the method comprising:
receiving and parsing data to obtain the data type of the data;
merging the data pattern corresponding to the data type with the data dictionary of the data warehousing system to obtain a merged data dictionary;
verifying historical data in the data warehousing system according to the merged data dictionary.
2. The method of claim 1, wherein the data pattern corresponding to the data type comprises:
rules corresponding to the data type and association relationships corresponding to the data type.
3. The method of claim 1, wherein said verifying historical data in the data warehousing system according to the merged data dictionary comprises:
screening out, based on the rules and association relationships contained in the merged data dictionary, the fields in the historical data that do not conform to those rules and association relationships.
4. The method of claim 1, further comprising, after merging the data pattern corresponding to the data type with the data dictionary of the data warehousing system to obtain the merged data dictionary:
generating a full-end display diagram of the merged data dictionary.
5. The method of claim 4, further comprising, after generating the full-end display diagram of the merged data dictionary:
adding an identifier to the data pattern corresponding to the data type in the full-end display diagram.
6. An apparatus for verifying data for use in a data warehousing system, the apparatus comprising an analysis module, a merging module and a verification module;
the analysis module is configured to receive and parse data to obtain the data type of the data;
the merging module is configured to merge the data pattern corresponding to the data type with the data dictionary of the data warehousing system to obtain a merged data dictionary;
the verification module is configured to verify historical data in the data warehousing system according to the merged data dictionary.
7. The apparatus of claim 6, wherein the verification module is specifically configured to screen out, based on the rules and association relationships contained in the merged data dictionary, the fields in the historical data that do not conform to those rules and association relationships.
8. The apparatus of claim 6, wherein the apparatus further comprises a generating module;
the generating module is configured to generate a full-end display diagram of the merged data dictionary.
9. A computer device, the computer device comprising: a processor coupled to a memory having stored therein at least one computer program instruction that is loaded and executed by the processor to cause the computer device to implement the method of any of claims 1-5.
10. A computer storage medium storing a computer program which, when executed, implements the method of any one of claims 1-5.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311219020.3A CN117271489A (en) | 2023-09-20 | 2023-09-20 | Method, device, equipment and medium for verifying data |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311219020.3A CN117271489A (en) | 2023-09-20 | 2023-09-20 | Method, device, equipment and medium for verifying data |
Publications (1)
Publication Number | Publication Date |
---|---|
CN117271489A | 2023-12-22 |
Family
ID=89210008
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202311219020.3A (pending; published as CN117271489A) | Method, device, equipment and medium for verifying data | 2023-09-20 | 2023-09-20 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117271489A (en) |
- 2023-09-20: CN application CN202311219020.3A filed, published as CN117271489A (status: pending)
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||