CN110008193A - Data normalization method and device - Google Patents

Data normalization method and device

Info

Publication number
CN110008193A
CN110008193A CN201910304451.7A CN201910304451A CN110008193A CN 110008193 A CN110008193 A CN 110008193A CN 201910304451 A CN201910304451 A CN 201910304451A CN 110008193 A CN110008193 A CN 110008193A
Authority
CN
China
Prior art keywords
metadata
data
professional standard
standard library
library
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910304451.7A
Other languages
Chinese (zh)
Other versions
CN110008193B (en)
Inventor
刘俊良
廖华琛
王怡君
王双
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chengdu Sefon Software Co Ltd
Original Assignee
Chengdu Sefon Software Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chengdu Sefon Software Co Ltd
Priority to CN201910304451.7A (patent CN110008193B)
Publication of CN110008193A
Application granted
Publication of CN110008193B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/17 Details of further file system functions
    • G06F16/176 Support for shared access to files; File sharing support
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/17 Details of further file system functions
    • G06F16/178 Techniques for file synchronisation in file systems
    • G06F16/1794 Details of file format conversion
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G06F16/258 Data format conversion from or to a database

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application provides a data normalization method and device. The metadata of a business database are compared in turn with the metadata of each of multiple industry standard libraries, and metadata that are identical are identified as similar metadata. For difference metadata, that is, metadata in the business database that differ from those of the industry standard library, the similarity between the data corresponding to the difference metadata and the sample data prestored in the industry standard library is calculated. Metadata corresponding to sample data whose similarity is greater than a preset threshold are identified as similar metadata in that industry standard library. The number of metadata identified as similar metadata in each industry standard library is counted, and the industry standard library with the largest number is determined to be the industry standard library closest to the business database.

Description

Data normalization method and device
Technical field
This application relates to the field of data processing, and in particular to a data normalization method and device.
Background
With the spread and development of information technology, governments and enterprises are becoming increasingly informatized, and the volume of business data grows accordingly. Faced with large amounts of business data, building accurate and standardized data models efficiently and quickly has become a trend. However, given the large number of industry standards, manually identifying the relationship between actual business data and existing standards takes a great deal of time and effort.
Summary of the invention
To overcome at least one deficiency in the prior art, a first object of the present application is to provide a data normalization method applied to data processing equipment. The data processing equipment prestores multiple industry standard libraries, and each industry standard library prestores sample data. The method includes:
Obtaining a business database;
For each industry standard library, comparing the metadata of the industry standard library with the metadata of the business database;
Identifying metadata in the industry standard library that are identical to metadata of the business database as similar metadata;
For difference metadata in the business database, that is, metadata that differ from those of the industry standard library, calculating the similarity between the data corresponding to the difference metadata and the sample data in the industry standard library, and identifying the metadata corresponding to sample data whose similarity exceeds a preset threshold as similar metadata in the industry standard library;
Counting the number of metadata identified as similar metadata in each industry standard library, and determining the industry standard library with the largest number to be the industry standard library closest to the business database.
Optionally, the step of calculating the similarity between the data corresponding to the difference metadata and the sample data in the industry standard library includes:
Calculating, by means of an artificial neural network, the similarity between the data corresponding to the difference metadata and the sample data in the industry standard library.
Optionally, the method further includes:
Creating a standard information database according to the similar metadata in the closest industry standard library;
Obtaining, from the business database, the data corresponding to the similar metadata in the closest industry standard library, and storing the data in the standard information database.
Optionally, the data processing equipment further includes an industry shared information library, and the method further includes:
Comparing the metadata of the industry shared information library with the metadata of the standard information database to determine the shared metadata in the standard information database that are identical to metadata of the industry shared information library;
Creating a shared data table from the data corresponding to the shared metadata.
Optionally, the method further includes:
For each shared data table, providing a corresponding interface so that other devices can obtain the data in the shared data table through the interface.
Optionally, the metadata include field names, and the step of identifying metadata in the industry standard library that are identical to metadata of the business database as similar metadata includes:
Identifying field names in the industry standard library that are identical to field names of the business database as similar metadata.
Optionally, the metadata further include table names, field types, and field lengths.
Another object of the embodiments of the present application is to provide a data normalization device applied to data processing equipment. The data processing equipment prestores multiple industry standard libraries, and each industry standard library prestores sample data. The data normalization device includes an acquisition module, a comparison module, a mark module, a similarity calculation module, and a statistical module;
The acquisition module is used to obtain a business database;
The comparison module is used to compare, for each industry standard library, the metadata of the industry standard library with the metadata of the business database;
The mark module is used to identify metadata in the industry standard library that are identical to metadata of the business database as similar metadata;
The similarity calculation module is used, for difference metadata in the business database that differ from those of the industry standard library, to calculate the similarity between the data corresponding to the difference metadata and the sample data in the industry standard library, and to identify the metadata corresponding to sample data whose similarity exceeds a preset threshold as similar metadata in the industry standard library;
The statistical module is used to count the number of metadata identified as similar metadata in each industry standard library, and to determine the industry standard library with the largest number to be the industry standard library closest to the business database.
Optionally, the comparison module compares the metadata of the industry standard library with the metadata of the business database in the following manner:
Calculating, by means of an artificial neural network, the similarity between the data corresponding to the difference metadata and the sample data in the industry standard library.
Optionally, the data normalization device further includes a creation module and a writing module;
The creation module is used to create a standard information database according to the similar metadata in the closest industry standard library;
The writing module is used to obtain, from the business database, the data corresponding to the similar metadata in the closest industry standard library, and to store the data in the standard information database.
Compared with the prior art, the present application has the following beneficial effects:
The embodiments of the present application provide a data normalization method and device. The metadata of a business database are compared in turn with the metadata of each of multiple industry standard libraries, and metadata that are identical are identified as similar metadata. For difference metadata, that is, metadata in the business database that differ from those of the industry standard library, the similarity between the data corresponding to the difference metadata and the sample data prestored in the industry standard library is calculated. Metadata corresponding to sample data whose similarity is greater than a preset threshold are identified as similar metadata in that industry standard library. The number of metadata identified as similar metadata in each industry standard library is counted, and the industry standard library with the largest number is determined to be the industry standard library closest to the business database.
Brief description of the drawings
To explain the technical solutions of the embodiments of the present application more clearly, the drawings needed in the embodiments are briefly introduced below. It should be understood that the following drawings show only some embodiments of the application and therefore should not be regarded as limiting its scope; a person of ordinary skill in the art can obtain other related drawings from these drawings without creative effort.
Fig. 1 is a block diagram of the data processing equipment provided by an embodiment of the present application;
Fig. 2 is a flow chart of the steps of the data normalization method provided by an embodiment of the present application;
Fig. 3 is a schematic comparison of the business data table and the industry standard data table provided by an embodiment of the present application;
Fig. 4 is the first structural schematic diagram of the data normalization device provided by an embodiment of the present application;
Fig. 5 is the second structural schematic diagram of the data normalization device provided by an embodiment of the present application.
Reference numerals: 100 - data processing equipment; 130 - processor; 120 - memory; 110 - data normalization device; 500 - business data table; 600 - industry standard data table; 1101 - acquisition module; 1102 - comparison module; 1103 - mark module; 1104 - similarity calculation module; 1105 - statistical module; 1106 - creation module; 1107 - writing module.
Detailed description of the embodiments
To make the objects, technical solutions, and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. The described embodiments are evidently only some, not all, of the embodiments of the present application. The components of the embodiments, as generally described and illustrated in the drawings herein, can be arranged and designed in a variety of different configurations.
Therefore, the following detailed description of the embodiments provided in the drawings is not intended to limit the claimed scope of the application, but merely represents selected embodiments. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present application without creative effort fall within the scope of protection of the application.
It should also be noted that similar reference numerals and letters denote similar items in the following drawings; therefore, once an item is defined in one drawing, it need not be further defined or explained in subsequent drawings.
Referring to Fig. 1, Fig. 1 is a block diagram of the data processing equipment 100 provided by an embodiment of the present application. The data processing equipment 100 includes a data normalization device 110, a memory 120, and a processor 130.
The memory 120 and the processor 130 are electrically connected to each other, directly or indirectly, to transmit or exchange data. For example, these elements can be electrically connected through one or more communication buses or signal lines. The data normalization device 110 includes at least one software function module that can be stored in the memory 120 in the form of software or firmware, or built into the operating system (OS) of the data processing equipment 100. The processor 130 executes the executable modules stored in the memory 120, such as the software function modules and computer programs included in the data normalization device 110.
The data processing equipment 100 may be, but is not limited to, a smartphone, a personal computer (PC), a tablet computer, a personal digital assistant (PDA), a mobile Internet device (MID), and the like.
The memory 120 may be, but is not limited to, random access memory (RAM), read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), and the like. The memory 120 stores a program, and the processor 130 executes the program after receiving an execution instruction.
The processor 130 may be an integrated circuit chip with signal processing capability. The processor may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), and the like; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. It can implement or execute the methods, steps, and logic block diagrams disclosed in the embodiments of the present application. The general-purpose processor may be a microprocessor or any conventional processor.
Referring to Fig. 2, Fig. 2 is a flow chart of the steps of the data normalization method applied to the data processing equipment 100 shown in Fig. 1. The data processing equipment 100 prestores multiple industry standard libraries, and each industry standard library prestores sample data. Each step of the data normalization method is described in detail below.
Step S100: obtain a business database.
Optionally, an industry standard library is a database that records the typical data of a given industry. For example, in one possible example, the industry standard library of the education sector includes data such as student name, student class, student gender, and student performance. The industry standard library of the financial industry includes data such as principal, interest rate, depositor name, gender, and term. The data processing equipment 100 connects to the business database and obtains its metadata, which include the database name, table names, field names, and field types.
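As an illustration of step S100, the following is a minimal sketch of reading table names, field names, and field types from a database. It assumes the business database is a SQLite database, which the application does not state; `read_metadata` and the `business` table are hypothetical names chosen for this example.

```python
import sqlite3


def read_metadata(conn):
    """Return {table_name: {field_name: declared_field_type}} for a SQLite database."""
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    metadata = {}
    for table in tables:
        # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk) per column.
        columns = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
        metadata[table] = {col[1]: col[2] for col in columns}
    return metadata


# Hypothetical business database with one table, mirroring table 500 in Fig. 3.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE business (age INTEGER, firstname TEXT, lastname TEXT)")
business_metadata = read_metadata(conn)
print(business_metadata)
# {'business': {'age': 'INTEGER', 'firstname': 'TEXT', 'lastname': 'TEXT'}}
```

Other database engines expose the same information through their own catalog views, so only the two queries would need to change.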
Step S200: for each industry standard library, compare the metadata of the industry standard library with the metadata of the business database.
Step S300: identify metadata in the industry standard library that are identical to metadata of the business database as similar metadata.
Optionally, the data processing equipment 100 takes each industry standard library in turn as the target industry standard library, compares the metadata of the business database with the metadata of the target industry standard library, and finds the identical metadata. The data processing equipment 100 marks the identical metadata as similar metadata. For example, referring to Fig. 3, in one possible example the metadata include field names. The business data table 500 includes the field names "age", "firstname", and "lastname". The industry standard data table 600 includes the field names "age", "number", and "name". The data processing equipment 100 compares the business data table 500 with the industry standard data table 600; the field name "age" is identical in both, so the "age" field is marked as similar metadata.
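The exact-match comparison of steps S200 and S300 can be sketched as follows, using the field names of tables 500 and 600 from the Fig. 3 example. The function name `split_by_name` is hypothetical, and the sketch compares field names only, not table names, types, or lengths.

```python
def split_by_name(business_fields, standard_fields):
    """Split business field names into those found verbatim in the standard table
    (similar metadata) and those that are not (difference metadata)."""
    standard = set(standard_fields)
    similar = [f for f in business_fields if f in standard]
    difference = [f for f in business_fields if f not in standard]
    return similar, difference


business_table_500 = ["age", "firstname", "lastname"]
standard_table_600 = ["age", "number", "name"]

similar, difference = split_by_name(business_table_500, standard_table_600)
print(similar)     # ['age']
print(difference)  # ['firstname', 'lastname']
```

The `difference` list is what the later similarity calculation on the underlying data would operate on.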
Optionally, to further ensure that the data corresponding to identical metadata in the business database and the industry standard library are actually similar, the data processing equipment 100 calculates the similarity between the data corresponding to the identical metadata in the business database and in the industry standard library. Metadata whose similarity is greater than the preset threshold are identified as similar metadata. Referring to Fig. 2, the data processing equipment 100 calculates the similarity between the data of the "age" field in the business database and the data of the "age" field in the industry standard library.
Comparing whether metadata are identical quickly filters out the similar metadata in the business database and the industry standard library. However, different developers may name the field for the same data differently. For example, for students' examination results, different developers may name the field "score" or "achievement". A simple metadata comparison cannot determine whether two such fields are similar.
Step S400: for difference metadata in the business database, that is, metadata that differ from those of the industry standard library, calculate the similarity between the data corresponding to the difference metadata and the sample data in the industry standard library, and identify the metadata corresponding to sample data whose similarity exceeds the preset threshold as similar metadata in the industry standard library.
Optionally, the business database may contain fields whose names differ from those in the industry standard library but whose actual data are similar. The data processing equipment 100 therefore calculates the similarity between the data corresponding to the difference metadata in the business database and all the sample data in the industry standard library, and identifies the metadata corresponding to sample data whose similarity exceeds the preset threshold as similar metadata in the industry standard library.
In one embodiment provided by the present application, the data processing equipment 100 inputs the data corresponding to the difference metadata and all the sample data in the industry standard library into an artificial neural network, which calculates the similarity between the data corresponding to each difference metadata item and the sample data corresponding to each metadata item in the industry standard library. The data processing equipment 100 identifies the metadata corresponding to sample data whose similarity is greater than the preset threshold as similar metadata.
In another embodiment provided by the present application, the data processing equipment 100 selects target difference metadata from the difference metadata one at a time, calculates the similarity between the data corresponding to the target difference metadata and the sample data corresponding to each metadata item in the industry standard library, and identifies the metadata corresponding to sample data whose similarity is greater than the preset threshold as similar metadata. Referring again to Fig. 3, the difference metadata in the business data table 500 are "lastname" and "firstname". The data processing equipment 100 calculates the similarity between the data of the "lastname" field and the data of the "age", "number", and "name" fields in the industry standard data table 600, respectively. It then does the same for the data of the "firstname" field. Suppose the similarities between the "lastname" field and the "age", "number", and "name" fields are 0.2, 0.1, and 0.7, respectively, and the preset similarity threshold is 0.6. The data processing equipment 100 then identifies the "name" field in the industry standard data table 600 as the similar field corresponding to the "lastname" field.
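The per-field comparison in this embodiment can be sketched as below. The application computes similarity with an artificial neural network and does not specify its architecture; this sketch swaps in a simple Jaccard overlap of value sets as a stand-in, so `jaccard_similarity`, `score_against_standard`, and `fields_above_threshold` are all hypothetical names rather than the patented method. The thresholding step is then run on the scores quoted in the example above.

```python
def jaccard_similarity(values, samples):
    """Stand-in similarity (NOT the neural network of the application):
    share of distinct values that the two columns have in common."""
    a, b = set(values), set(samples)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def score_against_standard(values, standard_samples, similarity=jaccard_similarity):
    """Score one business column against the sample data of every standard field."""
    return {field: similarity(values, samples)
            for field, samples in standard_samples.items()}


def fields_above_threshold(scores, threshold):
    """Keep the standard fields whose similarity is strictly greater than the threshold."""
    return [field for field, score in scores.items() if score > threshold]


# Scores for the "lastname" field as given in the example, threshold 0.6.
lastname_scores = {"age": 0.2, "number": 0.1, "name": 0.7}
print(fields_above_threshold(lastname_scores, 0.6))  # ['name']
```

Keeping the similarity function as a parameter means a trained model could replace the stand-in without touching the thresholding logic.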
Step S500: count the number of metadata identified as similar metadata in each industry standard library, and determine the industry standard library with the largest number to be the industry standard library closest to the business database.
Optionally, because the data processing equipment 100 prestores multiple industry standard libraries, it counts the number of metadata marked as similar in each industry standard library and determines the industry standard library with the most similar metadata to be the one closest to the business database.
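Step S500 amounts to a count followed by a maximum, which can be sketched as follows. The library names and their similar-field sets are hypothetical illustration data, and `closest_library` is a name chosen for this example.

```python
def closest_library(similar_by_library):
    """Return the name of the library with the most fields marked as similar.

    similar_by_library maps a library name to the set of its fields that were
    identified as similar metadata in steps S300 and S400.
    """
    return max(similar_by_library, key=lambda name: len(similar_by_library[name]))


# Hypothetical result of the earlier steps for two prestored libraries.
similar_by_library = {
    "education": {"age", "name"},
    "finance": {"age"},
}
print(closest_library(similar_by_library))  # education
```

The text does not say how ties are resolved; `max` here simply returns the first library encountered with the highest count.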
Optionally, the data processing equipment 100 creates a standard information database according to the similar metadata in the closest industry standard library. The data processing equipment 100 obtains from the business database the data corresponding to the similar metadata in the closest industry standard library and stores them in the standard information database.
Referring once again to Fig. 3, the data processing equipment 100 extracts the "name" and "age" fields from the industry standard library and creates the standard information database according to these two fields. It then stores the data corresponding to the "age" and "lastname" fields in the business data table 500 in the standard information database. Notably, when the data processing equipment 100 stores data from the business data table 500 in the standard information database, it performs the corresponding conversion if the data type or data length differs.
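A minimal sketch of creating the standard table and copying the matched data is shown below. It assumes SQLite and, for brevity, keeps the business table and the standard table in one database, whereas the application describes separate databases. The function `build_standard_table`, the table names, and the sample row are hypothetical; the type conversion is illustrated with SQL `CAST`, which is only one possible reading of the "corresponding conversion".

```python
import sqlite3


def build_standard_table(conn, business_table, standard_table, field_map, field_types):
    """Create standard_table and fill it from business_table.

    field_map maps business field names to standard field names;
    field_types maps standard field names to their declared types.
    Identifiers are interpolated directly, so they must come from trusted metadata.
    """
    columns = ", ".join(f'"{std}" {field_types[std]}' for std in field_map.values())
    conn.execute(f'CREATE TABLE "{standard_table}" ({columns})')
    # CAST converts each value when the business type differs from the standard type.
    select_list = ", ".join(
        f'CAST("{biz}" AS {field_types[std]})' for biz, std in field_map.items())
    target_list = ", ".join(f'"{std}"' for std in field_map.values())
    conn.execute(
        f'INSERT INTO "{standard_table}" ({target_list}) '
        f'SELECT {select_list} FROM "{business_table}"')


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE business_500 (age TEXT, firstname TEXT, lastname TEXT)")
conn.execute("INSERT INTO business_500 VALUES ('18', 'Wei', 'Li')")  # hypothetical row

build_standard_table(
    conn, "business_500", "standard_info",
    field_map={"age": "age", "lastname": "name"},
    field_types={"age": "INTEGER", "name": "TEXT"})

rows = conn.execute("SELECT age, name FROM standard_info").fetchall()
print(rows)  # [(18, 'Li')]
```

Here the business "age" column is stored as text and arrives in the standard table as an integer, which mirrors the type mismatch case mentioned above.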
Optionally, the data processing equipment 100 further includes an industry shared information library. The metadata of the industry shared information library are compared with the metadata of the standard information database to determine the shared metadata in the standard information database that are identical to metadata of the industry shared information library. The data processing equipment 100 creates a shared data table from the data corresponding to the shared metadata.
Optionally, a corresponding interface is provided for each shared data table, so that other devices can access the data in the shared data table through the interface.
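The shared-table step can be sketched as an intersection of field names followed by a read-only accessor. The application does not say what form the interface takes (it could, for instance, be a network endpoint); this sketch models it as a plain callable. All names and the sample data are hypothetical.

```python
def shared_fields(shared_library_fields, standard_db_fields):
    """Fields of the standard information database that also appear,
    with the same name, in the industry shared information library."""
    shared = set(shared_library_fields)
    return [f for f in standard_db_fields if f in shared]


def make_interface(rows, fields):
    """Return a callable that exposes only the shared fields of each row."""
    def get_shared_data():
        return [{f: row[f] for f in fields} for row in rows]
    return get_shared_data


# Hypothetical contents of the standard information database and shared library.
standard_rows = [{"age": 18, "name": "Li"}]
fields = shared_fields(["name", "gender"], ["age", "name"])
get_shared_data = make_interface(standard_rows, fields)

print(fields)             # ['name']
print(get_shared_data())  # [{'name': 'Li'}]
```

Building each result as a fresh list of dictionaries keeps callers from modifying the underlying rows through the interface.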
An embodiment of the present application also provides a data normalization device 110 applied to the data processing equipment 100. The data processing equipment prestores multiple industry standard libraries, and each industry standard library prestores sample data. Referring to Fig. 4, the data normalization device 110 includes an acquisition module 1101, a comparison module 1102, a mark module 1103, a similarity calculation module 1104, and a statistical module 1105.
The acquisition module 1101 is used to obtain a business database.
In this embodiment, the acquisition module 1101 executes step S100 in Fig. 2; for a detailed description of the acquisition module 1101, refer to the detailed description of step S100.
The comparison module 1102 is used to compare, for each industry standard library, the metadata of the industry standard library with the metadata of the business database.
In this embodiment, the comparison module 1102 executes step S200 in Fig. 2; for a detailed description of the comparison module 1102, refer to the detailed description of step S200.
The mark module 1103 is used to identify metadata in the industry standard library that are identical to metadata of the business database as similar metadata.
In this embodiment, the mark module 1103 executes step S300 in Fig. 2; for a detailed description of the mark module 1103, refer to the detailed description of step S300.
The similarity calculation module 1104 is used, for difference metadata in the business database that differ from those of the industry standard library, to calculate the similarity between the data corresponding to the difference metadata and the sample data in the industry standard library, and to identify the metadata corresponding to sample data whose similarity exceeds the preset threshold as similar metadata in the industry standard library.
In this embodiment, the similarity calculation module 1104 executes step S400 in Fig. 2; for a detailed description of the similarity calculation module 1104, refer to the detailed description of step S400.
The statistical module 1105 is used to count the number of metadata identified as similar metadata in each industry standard library, and to determine the industry standard library with the largest number to be the industry standard library closest to the business database.
In this embodiment, the statistical module 1105 executes step S500 in Fig. 2; for a detailed description of the statistical module 1105, refer to the detailed description of step S500.
Optionally, the comparison module 1102 compares the metadata of the industry standard library with the metadata of the business database in the following manner:
Calculating, by means of an artificial neural network, the similarity between the data corresponding to the difference metadata and the sample data in the industry standard library.
Referring to Fig. 5, the data normalization device 110 further includes a creation module 1106 and a writing module 1107.
The creation module 1106 is used to create a standard information database according to the similar metadata in the closest industry standard library.
The writing module 1107 is used to obtain, from the business database, the data corresponding to the similar metadata in the closest industry standard library, and to store the data in the standard information database.
In conclusion, the embodiments of the present application provide a data normalization method and device. The metadata of a business database are compared in turn with the metadata of each of multiple industry standard libraries, and metadata that are identical are identified as similar metadata. For difference metadata, that is, metadata in the business database that differ from those of the industry standard library, the similarity between the data corresponding to the difference metadata and the sample data prestored in the industry standard library is calculated. Metadata corresponding to sample data whose similarity is greater than a preset threshold are identified as similar metadata in that industry standard library. The number of metadata identified as similar metadata in each industry standard library is counted, and the industry standard library with the largest number is determined to be the industry standard library closest to the business database.
In embodiment provided herein, it should be understood that disclosed device and method, it can also be by other Mode realize.The apparatus embodiments described above are merely exemplary, for example, the flow chart and block diagram in attached drawing are shown According to device, the architectural framework in the cards of method and computer program product, function of multiple embodiments of the application And operation.In this regard, each box in flowchart or block diagram can represent one of a module, section or code Point, a part of the module, section or code includes one or more for implementing the specified logical function executable Instruction.It should also be noted that function marked in the box can also be attached to be different from some implementations as replacement The sequence marked in figure occurs.For example, two continuous boxes can actually be basically executed in parallel, they sometimes may be used To execute in the opposite order, this depends on the function involved.It is also noted that each of block diagram and or flow chart The combination of box in box and block diagram and or flow chart can be based on the defined function of execution or the dedicated of movement The system of hardware is realized, or can be realized using a combination of dedicated hardware and computer instructions.
In addition, each functional module in each embodiment of the application can integrate one independent portion of formation together Point, it is also possible to modules individualism, an independent part can also be integrated to form with two or more modules.
It, can be with if the function is realized and when sold or used as an independent product in the form of software function module It is stored in a computer readable storage medium.Based on this understanding, the technical solution of the application is substantially in other words The part of the part that contributes to existing technology or the technical solution can be embodied in the form of software products, the meter Calculation machine software product is stored in a storage medium, including some instructions are used so that a computer equipment (can be a People's computer, server or network equipment etc.) execute each embodiment the method for the application all or part of the steps. And storage medium above-mentioned includes: that USB flash disk, mobile hard disk, read-only memory (ROM, Read-Only Memory), arbitrary access are deposited The various media that can store program code such as reservoir (RAM, Random Access Memory), magnetic or disk.
It should be noted that, in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between these entities or operations. Moreover, the terms "include", "comprise", or any other variant thereof are intended to cover a non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the existence of other identical elements in the process, method, article, or device that includes the element.
The above are merely various embodiments of this application, but the scope of protection of this application is not limited thereto. Any change or replacement that a person skilled in the art can readily conceive within the technical scope disclosed in this application shall fall within the scope of protection of this application. Therefore, the scope of protection of this application shall be subject to the scope of protection of the claims.

Claims (10)

1. A data normalization method, characterized in that it is applied to a data processing device, the data processing device pre-stores multiple professional standard libraries, and each professional standard library pre-stores sample data; the method comprises:
obtaining a service database;
for each professional standard library, comparing the metadata in the professional standard library with the metadata of the service database;
marking the metadata in the professional standard library that is identical to metadata of the service database as similar metadata;
for difference metadata in the service database that differs from the metadata of the professional standard library, calculating the similarity between the data corresponding to the difference metadata and the sample data in the professional standard library, and marking the metadata in the professional standard library that corresponds to sample data whose similarity exceeds a preset threshold as similar metadata;
counting the number of metadata items marked as similar metadata in each professional standard library, and determining the professional standard library with the largest number as the professional standard library closest to the service database.
2. The data normalization method according to claim 1, characterized in that the step of calculating the similarity between the data corresponding to the difference metadata and the sample data in the professional standard library comprises:
calculating, by means of an artificial neural network, the similarity between the data corresponding to the difference metadata and the sample data in the professional standard library.
3. The data normalization method according to claim 1, characterized in that the method further comprises:
creating a standard information database according to the similar metadata in the closest professional standard library;
obtaining, from the service database, the data corresponding to the similar metadata in the closest professional standard library, and storing the data in the standard information database.
4. The data normalization method according to claim 3, characterized in that the data processing device further comprises an industry shared information library, and the method further comprises:
comparing the metadata of the industry shared information library with the metadata of the standard information database, and determining the shared metadata in the standard information database that is identical to metadata of the industry shared information library;
creating a shared data table according to the data corresponding to the shared metadata.
5. The data normalization method according to claim 4, characterized in that the method further comprises:
providing, for each shared data table, a corresponding interface, so that other devices can obtain the data in the shared data table through the interface.
6. The data normalization method according to claim 1, characterized in that the metadata comprises field names, and the step of marking the metadata in the professional standard library that is identical to metadata of the service database as similar metadata comprises:
marking the field names in the professional standard library that are identical to field names of the service database as similar metadata.
7. The data normalization method according to claim 1, characterized in that the metadata further comprises table names, field types, and field lengths.
8. A data normalization device, characterized in that it is applied to a data processing device, the data processing device pre-stores multiple professional standard libraries, each professional standard library pre-stores sample data, and the data normalization device comprises an acquisition module, a comparison module, a marking module, a similarity calculation module, and a statistics module;
the acquisition module is configured to obtain a service database;
the comparison module is configured to compare, for each professional standard library, the metadata in the professional standard library with the metadata of the service database;
the marking module is configured to mark the metadata in the professional standard library that is identical to metadata of the service database as similar metadata;
the similarity calculation module is configured to, for difference metadata in the service database that differs from the metadata of the professional standard library, calculate the similarity between the data corresponding to the difference metadata and the sample data in the professional standard library, and mark the metadata in the professional standard library that corresponds to sample data whose similarity exceeds a preset threshold as similar metadata;
the statistics module is configured to count the number of metadata items marked as similar metadata in each professional standard library, and determine the professional standard library with the largest number as the professional standard library closest to the service database.
9. The data normalization device according to claim 8, characterized in that the comparison module compares the metadata of the professional standard library with the metadata of the service database in the following manner:
calculating, by means of an artificial neural network, the similarity between the data corresponding to the difference metadata and the sample data in the professional standard library.
10. The data normalization device according to claim 8, characterized in that the data normalization device further comprises a creation module and a writing module;
the creation module is configured to create a standard information database according to the similar metadata in the closest professional standard library;
the writing module is configured to obtain, from the service database, the data corresponding to the similar metadata in the closest professional standard library, and store the data in the standard information database.
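The downstream steps recited in claims 3 to 5 (filling a standard information database, deriving a shared data table, and exposing it through an interface) can be sketched in Python as follows. This is only an illustration under stated assumptions: rows are modelled as dictionaries, field_map is a hypothetical mapping from service database field names to the matching similar metadata of the closest professional standard library, and the "interface" is reduced to a plain read function rather than any particular network API.

```python
from typing import Callable, Dict, List, Set

Row = Dict[str, object]


def build_standard_info(business_rows: List[Row], field_map: Dict[str, str]) -> List[Row]:
    """Copy the data of mapped fields into rows keyed by the standard field names.

    `field_map` (service field -> standard field) is a hypothetical structure
    assumed to come from the matching step.
    """
    return [
        {std: row[src] for src, std in field_map.items() if src in row}
        for row in business_rows
    ]


def build_shared_table(standard_rows: List[Row], shared_library_fields: Set[str]) -> List[Row]:
    """Keep only the fields that also appear in the industry shared information library."""
    standard_fields = {field for row in standard_rows for field in row}
    shared_fields = sorted(standard_fields & shared_library_fields)
    return [{field: row.get(field) for field in shared_fields} for row in standard_rows]


def make_interface(shared_table: List[Row]) -> Callable[[], List[Row]]:
    """Return a read function through which other devices could fetch the table."""
    def read() -> List[Row]:
        # Hand out copies so callers cannot modify the stored table.
        return [dict(row) for row in shared_table]
    return read
```

Returning copies from the read function is a design choice of this sketch, not something the claims require; it simply keeps the shared data table unchanged by its consumers.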
CN201910304451.7A 2019-04-16 2019-04-16 Data standardization method and device Active CN110008193B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910304451.7A CN110008193B (en) 2019-04-16 2019-04-16 Data standardization method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910304451.7A CN110008193B (en) 2019-04-16 2019-04-16 Data standardization method and device

Publications (2)

Publication Number Publication Date
CN110008193A true CN110008193A (en) 2019-07-12
CN110008193B CN110008193B (en) 2021-06-18

Family

ID=67172159

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910304451.7A Active CN110008193B (en) 2019-04-16 2019-04-16 Data standardization method and device

Country Status (1)

Country Link
CN (1) CN110008193B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
FR2793906A1 (en) * 1999-05-19 2000-11-24 Bull Sa SYSTEM AND METHOD FOR MANAGING ATTRIBUTES IN AN OBJECT-ORIENTED ENVIRONMENT
CN106845058A (en) * 2015-12-04 2017-06-13 北大医疗信息技术有限公司 The standardized method of disease data and modular station
CN107844560A (en) * 2017-10-30 2018-03-27 北京锐安科技有限公司 A kind of method, apparatus of data access, computer equipment and readable storage medium storing program for executing
CN109408561A (en) * 2018-10-17 2019-03-01 杭州骑轻尘信息技术有限公司 Business Name matching process and device

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110765118A (en) * 2019-10-21 2020-02-07 北京明略软件系统有限公司 Data revision method, revision device and readable storage medium
CN110765118B (en) * 2019-10-21 2022-05-17 北京明略软件系统有限公司 Data revision method, revision device and readable storage medium
CN111078639A (en) * 2019-12-03 2020-04-28 望海康信(北京)科技股份公司 Data standardization method and device and electronic equipment
WO2021184995A1 (en) * 2020-03-19 2021-09-23 华为技术有限公司 Data processing method and data standard management system
CN112084245A (en) * 2020-09-03 2020-12-15 深圳力维智联技术有限公司 Data management method, device and equipment based on micro-service architecture and storage medium
CN112084245B (en) * 2020-09-03 2024-03-12 深圳力维智联技术有限公司 Data management method, device, equipment and storage medium based on micro-service architecture
CN113282650A (en) * 2020-11-24 2021-08-20 苏州律点信息科技有限公司 Service data processing method and device based on big data
CN112699160A (en) * 2021-03-23 2021-04-23 中国信息通信研究院 Metadata template upgrading method and device and readable storage medium
CN113111636A (en) * 2021-05-17 2021-07-13 京东科技控股股份有限公司 Data uniqueness standard identification method and device
CN113111636B (en) * 2021-05-17 2024-04-12 京东科技控股股份有限公司 Data uniqueness standard identification method and device
CN115185923A (en) * 2022-07-07 2022-10-14 中国气象局气象探测中心 Method, system and intelligent terminal for managing meteorological observation metadata
CN115185923B (en) * 2022-07-07 2023-03-07 中国气象局气象探测中心 Method and system for managing meteorological observation metadata and intelligent terminal

Also Published As

Publication number Publication date
CN110008193B (en) 2021-06-18

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant