CN117271489A - Method, device, equipment and medium for verifying data - Google Patents

Info

Publication number
CN117271489A
Authority
CN
China
Prior art keywords
data
dictionary
warehousing system
type
combined
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311219020.3A
Other languages
Chinese (zh)
Inventor
张冰
马文治
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Bank of China Ltd
Original Assignee
Bank of China Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2023-09-20
Filing date
2023-09-20
Publication date
2023-12-22
Application filed by Bank of China Ltd
Priority to CN202311219020.3A
Publication of CN117271489A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20: Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21: Design, administration or maintenance of databases
    • G06F16/215: Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application discloses a method, a device, equipment and a medium for verifying data, which can be applied to the field of big data or the field of finance. By the method, when the data warehousing system receives new data, the data type and the data mode of the new data can be automatically identified, and the data mode of the new data and the data dictionary of the data warehousing system are combined, so that the data dictionary of the data warehousing system can be automatically updated; and simultaneously, historical data in the data warehousing system can be automatically verified according to the updated data dictionary, so that the data in the data warehousing system all conform to the definition in the data dictionary. Therefore, the verification efficiency of the data in the data warehousing system is improved, and the data dictionary in the data warehousing system can better reflect the current situation of the data.

Description

Method, device, equipment and medium for verifying data
Technical Field
The present application relates to the field of big data, and in particular, to a method, an apparatus, a device, and a medium for verifying data.
Background
A data warehousing system may be used to store a large amount of business data. A data dictionary is a set of information that describes data; it may define and describe the data items, data structures, data flows, data storage, processing logic, and the like of the data.
In the prior art, the data dictionary corresponding to the data warehousing system is compiled manually, and the data in the data warehousing system is maintained on that basis. However, this manual approach is inefficient.
Disclosure of Invention
In view of this, the present application provides a method and apparatus for verifying data, so as to achieve the purpose of improving the verification efficiency of data.
The method for verifying the data is realized by the following steps:
receiving and analyzing data to obtain the data type of the data;
combining the data mode corresponding to the data type with a data dictionary of the data warehousing system to obtain a combined data dictionary;
and verifying historical data in the data warehousing system according to the combined data dictionary.
Optionally, the data pattern corresponding to the data type includes:
rules corresponding to the data types and association relations corresponding to the data types.
Optionally, verifying the historical data in the data warehousing system according to the combined data dictionary includes:
and screening out fields which do not accord with the rules and the association relations in the historical data based on the rules and the association relations contained in the combined data dictionary.
Optionally, after merging the data mode corresponding to the data type with the data dictionary of the data warehousing system to obtain the merged data dictionary, the method further includes:
and generating a full-end display diagram of the combined data dictionary.
Optionally, after generating the full-end display diagram of the combined data dictionary, the method further includes:
and adding an identifier to the data mode corresponding to the data type in the full-end display diagram.
The application also provides a device for verifying data, which is applied to a data warehousing system and comprises: an analysis module, a merging module and a verification module;
the analysis module is used for receiving and analyzing the data to obtain the data type of the data;
the merging module is used for merging the data mode corresponding to the data type with the data dictionary of the data warehousing system to obtain a merged data dictionary;
and the verification module is used for verifying the historical data in the data warehousing system according to the combined data dictionary.
Optionally, the verification module is specifically configured to screen out fields in the history data, which do not conform to the rule and the association relationship, based on the rule and the association relationship included in the merged data dictionary.
Optionally, the apparatus further comprises: a generating module;
and the generating module is used for generating the full-end display diagram of the combined data dictionary.
The present application also provides a computer device comprising: a processor and a memory, wherein the processor is coupled with the memory, at least one computer program instruction is stored in the memory, and the at least one computer program instruction is loaded and executed by the processor, so that the computer device implements the method for verifying data described above.
The present application also provides a computer storage medium storing a computer program which, when executed, is adapted to carry out the method of verifying data as described above.
Therefore, the beneficial effects of this application are: when the data warehousing system receives new data, the data type and the data mode of the new data can be automatically identified, and the data mode of the new data and the data dictionary of the data warehousing system are combined, so that the data dictionary of the data warehousing system can be automatically updated; and simultaneously, historical data in the data warehousing system can be automatically verified according to the updated data dictionary, so that the data in the data warehousing system all conform to the definition in the data dictionary. Therefore, the verification efficiency of the data in the data warehousing system is improved, and the data dictionary in the data warehousing system can better reflect the current situation of the data.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings required for describing the embodiments or the prior art are briefly introduced below. The drawings in the following description show only some embodiments of the present application; a person skilled in the art may obtain other drawings from the provided drawings.
FIG. 1 is a flow chart of a first embodiment of the present application;
FIG. 2 is a flow chart of a second embodiment of the present application;
FIG. 3 is a schematic view of an apparatus of the present application;
FIG. 4 is a schematic diagram of a computer device of the present application.
Detailed Description
The following description of the embodiments of the present application will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are only some, but not all, of the embodiments of the present application. All other embodiments, which can be made by one of ordinary skill in the art without undue burden from the present disclosure, are within the scope of the present disclosure.
The method, the device, the equipment and the medium for verifying the data can be used in the big data field or the financial field. The foregoing is merely exemplary, and is not intended to limit the application fields of the method, apparatus, device, and medium for verifying data provided in the present application.
The method and the device for verifying the data can be applied to a data warehousing system, so that the data in the data warehousing system can be verified.
In embodiments of the present application, the device that verifies the data may include, but is not limited to, a computer device.
The computer device may include: a processor and a memory, wherein the processor is coupled with the memory, at least one computer program instruction is stored in the memory, and the at least one computer program instruction is loaded and executed by the processor so that the computer device implements the method for verifying data. The computer device is simply referred to as a computer in the following embodiments.
Referring to fig. 1, the specific steps of the first embodiment of the present application are as follows:
S101: The computer receives and analyzes the data to obtain the data type of the data.
The data warehousing system often receives new data, and the data dictionary corresponding to the new data may differ from the data dictionary of the data warehousing system. The new data therefore needs to be analyzed to obtain its data dictionary, which is then combined with the data dictionary of the data warehousing system so that the data in the data warehousing system can be managed.
The new data received by the data warehousing system may originate from different databases and the different databases may be used to store different types of data, and thus the new data received by the data warehousing system may be of different types. At this time, the data type of the data may be an account number, a client number, an address, or a mobile phone number. Specifically, the data type of the data may be set according to actual requirements, and is not limited to the above-mentioned types.
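As an illustration of step S101, type identification could be sketched with pattern matching over the received values. The application does not say how types are recognized, so the pattern table, the formats it encodes, and the function name `infer_data_type` below are all assumptions made for this example.

```python
import re

# Hypothetical formats; the application does not define any of them.
TYPE_PATTERNS = {
    "mobile_number": re.compile(r"1\d{10}"),     # assumed: 11 digits starting with 1
    "account_number": re.compile(r"\d{16,19}"),  # assumed: 16 to 19 digits
    "customer_number": re.compile(r"C\d{8}"),    # assumed: 'C' followed by 8 digits
}


def infer_data_type(values):
    """Return the first type whose pattern matches every non-empty value.

    Falls back to "text" when no pattern fits and to "unknown" when
    there is nothing to inspect.
    """
    cleaned = [v.strip() for v in values if v and v.strip()]
    if not cleaned:
        return "unknown"
    for type_name, pattern in TYPE_PATTERNS.items():
        if all(pattern.fullmatch(v) for v in cleaned):
            return type_name
    return "text"
```

In practice the set of types and their formats would be set according to actual requirements, as the description states.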
S102: The computer combines the data mode corresponding to the data type with the data dictionary of the data warehousing system to obtain a combined data dictionary.
The data pattern corresponding to the data is the data dictionary corresponding to the data. Specifically, the data patterns corresponding to the data types may include, but are not limited to: rules corresponding to the data types and association relations corresponding to the data types.
It should be noted that different data types may correspond to different data patterns. For example, data of the amount type is typically numeric and data of the address type is typically text, so the rule corresponding to the amount type may constrain the data to be a number, and the rule corresponding to the address type may constrain the data to be text. As another example, data of the amount type and data of the client number type are both generally numeric but may differ in length; in that case, the rule corresponding to the amount type may constrain the data to length A, and the rule corresponding to the client number type may constrain the data to length B. The data pattern corresponding to the data may be set according to actual requirements and is not limited to the above examples.
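The per-type rules described above (numeric versus text, and a fixed length per type) might be represented as follows. This is a minimal sketch: the `Rule` class, the `RULES` table, and the concrete values chosen for "length A" and "length B" are hypothetical and do not come from the application.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rule:
    numeric: bool                 # True: ASCII digits only; False: free text
    length: Optional[int] = None  # exact length, or None for no length constraint

    def accepts(self, value: str) -> bool:
        """Check one value against the numeric and length constraints."""
        if self.numeric and not (value.isascii() and value.isdigit()):
            return False
        if self.length is not None and len(value) != self.length:
            return False
        return True


# Assumed values: 6 stands in for "length A" and 9 for "length B".
RULES = {
    "amount": Rule(numeric=True, length=6),
    "customer_number": Rule(numeric=True, length=9),
    "address": Rule(numeric=False),
}
```

A rule object of this kind keeps each constraint independent, so further constraints could be added without changing callers.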
Meanwhile, the different data types may have relevance, and the relevance in the data mode can reflect the relevance between the different data and the relevance between the different data types.
For example, a customer may correspond to a plurality of accounts, that is, the data a of the customer number type may be associated with the data B, the data C and the data D of the account type, where the association relationship corresponding to the data a may represent the association between the data a and the data B, the association relationship corresponding to the data B may represent the association between the data B and the data a, the association relationship corresponding to the data C may represent the association between the data C and the data a, and the association relationship corresponding to the data D may also represent the association between the data D and the data a.
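The customer and account example can be written down as a symmetric mapping, in which every association is recorded in both directions. The helper name `build_associations` and the use of plain strings for data A to D are assumptions made for illustration.

```python
from collections import defaultdict


def build_associations(pairs):
    """Record each (left, right) pair in both directions."""
    links = defaultdict(set)
    for left, right in pairs:
        links[left].add(right)
        links[right].add(left)
    return dict(links)


# Data A is a customer number; data B, C and D are its accounts.
associations = build_associations([("A", "B"), ("A", "C"), ("A", "D")])
```

Looking up `associations["A"]` gives the three accounts, while looking up any account leads back to customer A.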
In addition, data of the same data type may also be associated. If the new data received by the data warehousing system contains a plurality of texts, the association relationship can associate data of the same data type across the different texts.
It should be noted that, when the computer combines the data mode corresponding to the data type with the data dictionary of the data warehousing system to obtain a combined data dictionary, the data dictionary corresponding to the new data is fused with the original data dictionary of the data warehousing system, and the fused data dictionary replaces the original one as the new data dictionary of the data warehousing system.
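The fusion step could look like the sketch below. The application does not specify how conflicts between the two dictionaries are resolved, so this example assumes that an existing rule is kept and association sets are unioned; the dictionary layout and the name `merge_dictionaries` are likewise hypothetical.

```python
import copy


def merge_dictionaries(warehouse_dict, new_pattern):
    """Return (merged dictionary, list of newly added type names).

    Assumption: for a type present in both inputs, the existing rule is
    kept and the association sets are unioned. Inputs are not modified.
    """
    merged = copy.deepcopy(warehouse_dict)
    added = []
    for type_name, entry in new_pattern.items():
        if type_name not in merged:
            merged[type_name] = copy.deepcopy(entry)
            added.append(type_name)
        else:
            merged[type_name]["associations"] |= entry["associations"]
    return merged, added
```

Returning the list of added types alongside the merged dictionary makes it straightforward to highlight new entries later.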
S103: The computer verifies the historical data in the data warehousing system according to the combined data dictionary.
The historical data in the data warehousing system may contain irregular or unsatisfactory fields, such as garbled or meaningless data, or may not conform to the definitions in the combined data dictionary. The original historical data in the data warehousing system therefore needs to be verified according to the combined data dictionary.
In some implementations, the step in which the computer verifies historical data in the data warehousing system according to the combined data dictionary may be implemented as follows: based on the rules and association relations contained in the combined data dictionary, the computer screens out the fields in the historical data that do not conform to those rules and association relations.
Screening the fields of the historical data according to the rules removes garbled and meaningless data, making the historical data more accurate. The fields are also screened according to the association relations: for example, if a table is intended to store data of data type A but data of data type B appears in it, the screening can remove the data of data type B from that table.
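A simplified sketch of the screening step is given below. It assumes that each column of a historical table has a known data type and that each table declares which types it is meant to hold; the `RULES` table, the formats in it, and the function `screen_rows` are illustrative assumptions rather than details from the application.

```python
import re

# Hypothetical rules taken from a combined data dictionary.
RULES = {
    "customer_number": re.compile(r"\d{9}"),
    "mobile_number": re.compile(r"\d{11}"),
}


def screen_rows(rows, column_types, allowed_types):
    """Return (row index, column, reason) for every field that fails a check."""
    rejected = []
    for index, row in enumerate(rows):
        for column, value in row.items():
            type_name = column_types[column]
            if type_name not in allowed_types:
                # The table is not meant to hold this data type at all.
                rejected.append((index, column, "type not expected in this table"))
            elif not RULES[type_name].fullmatch(value):
                # The value breaks the rule for its data type, e.g. garbled data.
                rejected.append((index, column, "violates rule"))
    return rejected
```

The function only reports offending fields; whether they are then deleted, quarantined, or corrected is left to the caller.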
In other implementations, the historical data in the data warehousing system may be verified according to the data dictionary of the new data, and then the data dictionary of the new data and the data dictionary of the data warehousing system are combined.
In other implementations, after the computer combines the data mode corresponding to the data type with the data dictionary of the data warehousing system and obtains the combined data dictionary, a full-end display diagram of the combined data dictionary can be generated, so that the combined data dictionary is displayed more intuitively and its usability is improved.
Specifically, after the computer generates the full-end display diagram of the combined data dictionary, an identifier can be added to the data mode corresponding to the data type in the full-end display diagram, so that the content newly added to the data dictionary of the data warehousing system can be seen at a glance.
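The application leaves both the form of the display diagram and the form of the identifier open. As one possible reading, the sketch below renders the combined data dictionary as plain text lines and uses a `[NEW]` prefix as the identifier; the function name `render_dictionary` and the text layout are assumptions.

```python
def render_dictionary(merged, newly_added):
    """Render each data type with its associated types, one per line.

    Types listed in newly_added are prefixed with "[NEW] ".
    """
    lines = []
    for type_name in sorted(merged):
        marker = "[NEW] " if type_name in newly_added else ""
        related = ", ".join(sorted(merged[type_name]))
        lines.append(f"{marker}{type_name} -> {related}")
    return "\n".join(lines)
```

A graphical front end could apply the same idea by coloring or labeling the nodes that correspond to newly added entries.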
In the first embodiment of the application, when the data warehousing system receives new data, the data type and the data mode of the new data can be automatically identified, and the data mode of the new data and the data dictionary of the data warehousing system are combined, so that the data dictionary of the data warehousing system can be automatically updated; and simultaneously, historical data in the data warehousing system can be automatically verified according to the updated data dictionary, so that the data in the data warehousing system all conform to the definition in the data dictionary. Therefore, the verification efficiency of the data in the data warehousing system is improved, and the data dictionary in the data warehousing system can better reflect the current situation of the data.
Since the new data received by the data warehousing system may contain a variety of different types of data, this will be described below.
Referring to fig. 2, the steps of the second embodiment of the present application are as follows:
S201: The computer receives and analyzes the data to obtain a data type A, a data type B and a data type C corresponding to the data.
In the present embodiment, the data includes data of data type A, data of data type B, and data of data type C. Specifically, the data type A may be a client number type, the data type B may be an address type, and the data type C may be a mobile phone number type.
In some implementations, the data may include text A, which contains data of the client number type and the address type, and text B, which contains data of the client number type and the mobile phone number type. The client number type, the address type and the mobile phone number type can be obtained by receiving the two texts and analyzing the data in them.
S202: According to the data type A, the data type B and the data type C, the computer obtains the rule corresponding to each of the three data types and the association relations among them, thereby obtaining a data dictionary of the data.
Specifically, the rule corresponding to the client number type, the rule corresponding to the address type and the rule corresponding to the mobile phone number type can be set according to actual requirements. The client number type has an association relationship with the address type and with the mobile phone number type, and corresponding association relationships also exist among the data of the client number type, the data of the address type and the data of the mobile phone number type.
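One way to arrive at the type-level associations in this embodiment is to treat data types that appear in the same text as associated. The application does not state that associations are derived this way, so co-occurrence is an assumption here, as are the function name `associate_types` and the input layout.

```python
from itertools import combinations


def associate_types(texts):
    """Associate every pair of data types that occur in the same text."""
    associations = {}
    for types_in_text in texts.values():
        for left, right in combinations(sorted(types_in_text), 2):
            associations.setdefault(left, set()).add(right)
            associations.setdefault(right, set()).add(left)
    return associations


# Text A holds client numbers and addresses; text B holds client numbers
# and mobile phone numbers, as in the second embodiment.
type_links = associate_types({
    "text_a": {"customer_number", "address"},
    "text_b": {"customer_number", "mobile_number"},
})
```

Under this assumption the client number type is linked to both other types, while the address type and the mobile phone number type are linked only through the client number type.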
S203: The computer combines the data dictionary of the data with the data dictionary of the data warehousing system to obtain a combined data dictionary.
S204: the computer obtains historical data in the data warehousing system.
In this embodiment, the acquired history data may include only the history data of the client number type, the address type, and the mobile phone number type.
S205: Based on the rules and association relations contained in the combined data dictionary, the computer screens out the fields in the historical data that do not conform to those rules and association relations.
S206: The computer generates a full-end display diagram of the combined data dictionary.
S207: The computer adds an identifier to the data dictionary of the data in the full-end display diagram.
It should be noted that, the execution sequence of steps S204 to S205 and steps S206 to S207 is not limited in this application, and steps S204 to S205 may be executed first, steps S206 to S207 may be executed later, steps S206 to S207 may be executed first, and steps S204 to S205 may be executed later.
Specifically, the identifier may be set according to actual requirements, and the specific manner of the identifier is not limited in this application.
In the second embodiment of the application, the data dictionary of the new data is obtained by analyzing the new data received by the data warehousing system and is combined with the data dictionary of the data warehousing system, so that the problems of confusion, redundancy, untimely updating and the like of the data dictionary can be solved, the automatic processing of the whole flow is realized, and the management efficiency of the data dictionary is improved; by displaying the combined data dictionary, the usability of the combined data dictionary can be increased.
Referring to FIG. 3, the present application provides an apparatus 300 for verifying data, which is applied to a data warehousing system and includes: an analysis module 301, a merging module 302 and a verification module 303.
The analysis module 301 is used for receiving and analyzing the data to obtain the data type of the data.
The merging module 302 is used for merging the data mode corresponding to the data type with the data dictionary of the data warehousing system to obtain a merged data dictionary.
The verification module 303 is used for verifying the historical data in the data warehousing system according to the merged data dictionary.
By the device for verifying the data, when the data warehousing system receives new data, the data type and the data mode of the new data can be automatically identified, and the data mode of the new data and the data dictionary of the data warehousing system are combined, so that the data dictionary of the data warehousing system can be automatically updated; and simultaneously, historical data in the data warehousing system can be automatically verified according to the updated data dictionary, so that the data in the data warehousing system all conform to the definition in the data dictionary. Therefore, the verification efficiency of the data in the data warehousing system is improved, and the data dictionary in the data warehousing system can better reflect the current situation of the data.
Optionally, the verification module 303 is specifically used for screening out fields in the historical data that do not conform to the rules and the association relations, based on the rules and the association relations contained in the merged data dictionary.
Optionally, the apparatus 300 for verifying data further comprises: a generating module 304.
The generating module 304 is used for generating a full-end display diagram of the merged data dictionary.
Optionally, the apparatus 300 for verifying data further comprises: an adding module 305.
The adding module 305 is used for adding an identifier to the data mode corresponding to the data type in the full-end display diagram.
The specific manner in which the various modules perform the operations in the apparatus of the above embodiments have been described in detail in connection with the embodiments of the method, and will not be described in detail herein.
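The division into analysis, merging and verification modules can be pictured as three callables wired together by one class. The sketch below is only one possible arrangement: the class name `DataVerificationApparatus`, the toy module functions, and the use of value length as the only rule are assumptions made to keep the example short.

```python
class DataVerificationApparatus:
    """Chains an analysis, a merging and a verification module."""

    def __init__(self, analysis_module, merging_module, verification_module):
        self.analysis_module = analysis_module
        self.merging_module = merging_module
        self.verification_module = verification_module

    def process(self, new_data, warehouse_dictionary, history):
        pattern = self.analysis_module(new_data)
        merged = self.merging_module(pattern, warehouse_dictionary)
        rejected = self.verification_module(history, merged)
        return merged, rejected


def analyze(new_data):
    # Toy analysis: map each column to the length of its first value.
    return {column: len(values[0]) for column, values in new_data.items()}


def merge(pattern, warehouse_dictionary):
    # Toy merge: entries from the new data are added to the existing ones.
    return {**warehouse_dictionary, **pattern}


def verify(history, merged):
    # Toy verification: flag fields of unknown type or unexpected length.
    return [(column, value) for column, value in history
            if column not in merged or len(value) != merged[column]]


apparatus = DataVerificationApparatus(analyze, merge, verify)
merged, rejected = apparatus.process(
    {"mobile_number": ["13800138000"]},
    {"customer_number": 9},
    [("customer_number", "123456789"), ("mobile_number", "1380")],
)
```

Because the modules are passed in, the functional division can be changed without touching the class, which matches the note that the modules may be split differently in practice.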
It should be noted that: in the device for verifying data provided in the above embodiment, when the function of verifying data is implemented, only the division of the above functional modules is used for illustration, in practical application, the above functional allocation may be implemented by different functional modules according to needs, that is, the internal structure of the device for verifying data is divided into different functional modules, so as to implement all or part of the functions described above. In addition, the device for verifying data and the method embodiment for verifying data provided in the foregoing embodiments belong to the same concept, and specific implementation processes of the device for verifying data and the method embodiment are detailed in the detailed description of the method embodiment, which is not repeated here.
Referring to fig. 4, the present application further provides a computer device 400, including: a processor 401 and a memory 402.
The processor 401 is coupled to the memory 402. At least one computer program instruction is stored in the memory 402 and is loaded and executed by the processor 401 to cause the computer device to carry out the method of verifying data.
The present application also provides a computer storage medium storing a computer program which, when executed, is adapted to carry out the method of verifying data as described above.
Finally, it is further noted that relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. A method of validating data for use in a data warehousing system, the method comprising:
receiving and analyzing data to obtain the data type of the data;
combining the data mode corresponding to the data type with the data dictionary of the data warehousing system to obtain a combined data dictionary;
and verifying historical data in the data warehousing system according to the combined data dictionary.
2. The method of claim 1, wherein the data pattern corresponding to the data type comprises:
rules corresponding to the data types and association relations corresponding to the data types.
3. The method of claim 1, wherein said verifying historical data in the data warehousing system according to the combined data dictionary comprises:
and screening out fields which do not accord with the rules and the association relations in the historical data based on the rules and the association relations contained in the combined data dictionary.
4. The method of claim 1, wherein after the merging the data pattern corresponding to the data type with the data dictionary of the data warehousing system to obtain the merged data dictionary, the method further comprises:
and generating a full-end display diagram of the combined data dictionary.
5. The method of claim 4, further comprising, after generating the full-end representation of the merged data dictionary:
and adding an identifier to the data mode corresponding to the data type in the full-end display diagram.
6. An apparatus for validating data for use in a data warehousing system, the apparatus comprising: an analysis module, a merging module and a verification module;
the analysis module is used for receiving and analyzing the data to obtain the data type of the data;
the merging module is used for merging the data mode corresponding to the data type with the data dictionary of the data warehousing system to obtain a merged data dictionary;
and the verification module is used for verifying the historical data in the data warehousing system according to the combined data dictionary.
7. The apparatus of claim 6, wherein the verification module is specifically configured to screen out fields in the history data that do not conform to the rules and the association relationships based on the rules and the association relationships included in the merged data dictionary.
8. The apparatus of claim 6, wherein the apparatus further comprises: a generating module;
and the generating module is used for generating the full-end display diagram of the combined data dictionary.
9. A computer device, the computer device comprising: a processor coupled to a memory having stored therein at least one computer program instruction that is loaded and executed by the processor to cause the computer device to implement the method of any of claims 1-5.
10. A computer storage medium storing a computer program for implementing the method of any one of claims 1-5 when executed.
CN202311219020.3A 2023-09-20 2023-09-20 Method, device, equipment and medium for verifying data Pending CN117271489A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311219020.3A CN117271489A (en) 2023-09-20 2023-09-20 Method, device, equipment and medium for verifying data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311219020.3A CN117271489A (en) 2023-09-20 2023-09-20 Method, device, equipment and medium for verifying data

Publications (1)

Publication Number Publication Date
CN117271489A true CN117271489A (en) 2023-12-22

Family

ID=89210008

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311219020.3A Pending CN117271489A (en) 2023-09-20 2023-09-20 Method, device, equipment and medium for verifying data

Country Status (1)

Country Link
CN (1) CN117271489A (en)

Similar Documents

Publication Publication Date Title
CN109299169B (en) Data visualization method, system, terminal and computer readable storage medium
CN111443912B (en) Component-based page rendering method, device, computer equipment and storage medium
CN110716951B (en) Label configuration method, device and equipment convenient to configure and storage medium
CN110334109B (en) Relational database data query method, system, medium and electronic device
WO2009006063A2 (en) Automatic designation of xbrl taxonomy tags
CN111339166A (en) Word stock-based matching recommendation method, electronic device and storage medium
CN112559101A (en) Page label processing method and device, computer equipment and medium
CN113835692A (en) Dictionary data processing method and device, electronic equipment and computer storage medium
US10503823B2 (en) Method and apparatus providing contextual suggestion in planning spreadsheet
CN110232156B (en) Information recommendation method and device based on long text
CN117271489A (en) Method, device, equipment and medium for verifying data
CN116127154A (en) Knowledge tag recommendation method and device, electronic equipment and storage medium
CN105893614A (en) Information recommendation method and device and electronic equipment
CN115617338A (en) Method and device for quickly generating service page and readable storage medium
CN114169306A (en) Method, device and equipment for generating electronic receipt and readable storage medium
CN115687704A (en) Information display method and device, electronic equipment and computer readable storage medium
US20080201652A1 (en) Techniques for viewing and managing work items and their relationships
US8639668B2 (en) Structured requirements management
US7996366B1 (en) Method and system for identifying stale directories
CN111191057A (en) User-defined retrieval method and device, electronic equipment and storage medium thereof
CN110032564A (en) A kind of determination method and apparatus of tables of data incidence relation
CN117112654B (en) City data display method, device, computer equipment and storage medium
CN116227452B (en) Method, apparatus, device and storage medium for analyzing templates using assembled cards
CN118331933A (en) Electronic file position code assignment method and device, storage medium and electronic equipment
US20140244685A1 (en) Method of searching and generating a relevant search string

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination