CN117332286A - System, method and device for data mapping verification - Google Patents


Info

Publication number
CN117332286A
Authority
CN
China
Prior art keywords
data
mapping
field
mapping relation
module
Prior art date
Legal status
Pending
Application number
CN202311272327.XA
Other languages
Chinese (zh)
Inventor
郑清正
Current Assignee
Jiangsu Suning Bank Co Ltd
Original Assignee
Jiangsu Suning Bank Co Ltd
Priority date
Filing date
Publication date
Application filed by Jiangsu Suning Bank Co Ltd filed Critical Jiangsu Suning Bank Co Ltd
Priority to CN202311272327.XA
Publication of CN117332286A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/10 Pre-processing; Data cleansing
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F18/23 Clustering techniques
    • G06F18/24 Classification techniques
    • G06F18/25 Fusion techniques
    • G06F18/253 Fusion techniques of extracted features

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a system, a method and a device for data mapping verification. The system comprises: a normalization module for acquiring data source information, preprocessing historical data and newly added data, and constructing a sample set; a data clustering module for acquiring the data of the sample set, clustering the sample set, and constructing a mapping dictionary; and a data multi-classification module for training a prediction model, inputting newly added field information into the trained prediction model, and outputting a mapping relation according to the prediction result. If the prediction result is null, a mapping relation is established for the newly added field information, a mapping data set is generated according to the mapping relation, and the mapping dictionary and the prediction model are updated. The system, method and device for data mapping verification solve the mapping problem of field names and field contents that are similar, train the prediction model on existing data, and thereby automate the maintenance of mapping relations, reducing the time spent on manual operation and improving working efficiency.

Description

System, method and device for data mapping verification
Technical Field
The invention belongs to the technical field of data mapping verification, and particularly relates to a system, a method and a device for data mapping verification.
Background
Financial enterprises such as banks need to connect to different external data service providers. A common scenario is accessing data sources of the same class, either simultaneously or one after another. The data field structures provided by different service providers are partly similar and partly different: fields may be of the same kind while their code-value specifications are inconsistent, or some fields may simply not match. From the standpoint of unified data management and maintenance, such data needs to be fused together, and combing through the differences manually is time-consuming and laborious. Likewise, when data governance work combs through financial business standards, the standard definitions of the same business across different products must be judged for similarity and merged or optimized. A similarity-checking and fusion-analysis technique for data source fields is therefore also desirable there.
The prior art offers the following reference methods. Method one, patent CN114462421A, performs matching using the similarity of data tables and fields: semantic recognition is performed on the table names and field names of the data source and the destination to obtain source semantics and destination semantics; the semantics of each source field are compared for similarity with the semantics of all destination fields, yielding a semantic similarity list for each source field; the mapping relation between the data source and the destination is determined from a mapping rule set according to the semantic similarity list; all mapping relations are stored in a mapping relation library; whether every mapping relation in the library is reasonable is then judged, and if not, an alarm is raised and manual intervention is awaited; mapping relations confirmed after manual intervention are incorporated into the mapping rule set. Method two, patent CN115729935B (2022), provides a data interaction processing method and system based on an ORM framework: the configuration of the data source to be converted is turned into a rule for reading the adapted data source, and data from different data sources are built into data types following unified rules to obtain unified data. Neither approach can be applied directly to data-fusion scenarios where the sources are similar yet different, and because they rely on rule-based mapping, considerable time must still be invested in defining a mapping for each field.
Therefore, a way is needed to automate at least part of the mapping process for the management and standardization of similar data sources, and to reduce the time spent on manual one-to-one mapping.
Disclosure of Invention
The invention aims to provide a system, a method and a device for data mapping verification, to solve the problem that the management of similar data sources and data standardization currently require manual processing, which makes the mapping work time-consuming and inefficient.
In order to achieve the above purpose, the present invention provides the following technical solutions: a system for data mapping verification, comprising:
the normalization module is used for acquiring data source information, wherein the data source information comprises historical data and newly-added data, preprocessing the historical data and the newly-added data respectively to obtain historical field information and newly-added field information, selecting samples of the historical field information, and constructing a sample set;
the data clustering module is used for acquiring the data of the sample set, clustering the sample set to obtain a clustering result, mapping and storing the clustering result, cleaning mapped clustering results that are similar, and constructing a mapping dictionary;
the data multi-classification module is used for acquiring the mapping relation in the mapping dictionary, extracting and fusing the characteristics of the mapping relation, training a prediction model, inputting newly added field information into the trained prediction model to obtain a prediction result, outputting the mapping relation, if the prediction result is null, establishing the mapping relation of the newly added field information, generating a mapping data set according to the mapping relation, and updating the mapping dictionary and the prediction model.
Preferably, the history data and the newly added data each include a field name, a field content and field data,
the normalization module comprises:
the field name preprocessing module is used for cleaning the fields of the field names to obtain standard field names;
the field content preprocessing module is used for carrying out field cleaning on the field content to obtain standard field content and constructing a sample set of the standard field content;
the field data preprocessing module is used for carrying out field data duplication elimination and counting field data;
and the field merging module is used for merging the standard field name, the standard field content and the field data into a fusion character string and carrying out vectorization processing on the character string.
Preferably, the clustering result includes a point cluster and noise points,
the data clustering module comprises:
a cluster calculation module, used for calculating the data of the sample set, generating a clustering result, and identifying the point clusters and noise points in the clustering result;
the feature mapping module is used for mapping the clustering result, mapping the point clusters and the noise points and constructing a mapping dictionary according to the mapping relation;
the manual intervention module is used for providing a port for manual operation;
the data verification module is used for verifying the mapping relation in the feature mapping module:
responding to the noise point checking command, inputting the noise point to the feature mapping module through the manual intervention module, and if the existing mapping relation does not exist, establishing a new mapping relation;
and responding to the field information checking command, judging the similarity of the standard field names through a manual intervention module, and manually determining the mapping relation of the standard field names with the similarity but different meanings.
Preferably, the data multi-classification module includes:
the model training module is used for acquiring the mapping relation and the corresponding standard field names as characteristics, fusing the characteristics by converting the standard field names, inputting the fused characteristics into the prediction model, and training the prediction model;
and the data updating module is used for acquiring newly added field information with the empty prediction result, inputting the newly added field information into the data clustering module, updating the mapping relation of the newly added field information, and updating the mapping dictionary and the prediction model.
A method of data mapping verification, comprising:
acquiring historical data, preprocessing the historical data to obtain historical field information, and performing sample selection on the historical field information to construct a sample set;
based on the data of the sample set, clustering the sample set to obtain a clustering result, mapping the clustering result, cleaning mapped clustering results that are similar, and constructing a mapping dictionary;
based on the constructed mapping dictionary, obtaining a mapping relation, extracting and fusing the characteristics of the mapping relation, and training a prediction model;
acquiring newly added data, and preprocessing the newly added data to obtain newly added field information;
based on the obtained newly added field information, inputting the newly added field information into a trained prediction model to obtain a prediction result, outputting a mapping relation, if the prediction result is null, establishing the mapping relation of the newly added field information, generating a mapping data set according to the mapping relation, and updating a mapping dictionary and the prediction model.
Preferably, the history data and the newly added data each include a field name, a field content and field data,
preprocessing the historical data and the newly added data respectively comprises the following steps:
preprocessing a field name, and cleaning the field name to obtain a standard field name;
preprocessing field content, performing field cleaning on the field content to obtain standard field content, and constructing a sample set of the standard field content;
preprocessing field data, de-duplicating the field data, and counting the field data.
Preferably, the sample selection is performed on the history field information, and before the sample set is constructed, the method further comprises: vectorization processing is carried out on the history field information and the newly added field information respectively, and the method comprises the following steps:
acquiring a standard field name, standard field content and field data;
merging the standard field name, the standard field content and the field data into a fused character string; if the resulting string is too large, the standard field content and the field data are sampled and processed first, and then merged to construct the fused string;
and carrying out vectorization processing on the character string.
Preferably, the clustering result includes a point cluster and noise points,
building the mapping dictionary includes:
clustering calculation: calculating data of the sample set, generating a clustering result, and identifying point clusters and noise points in the clustering result;
feature mapping, mapping clustering results, mapping point clusters and noise points, and constructing a mapping dictionary according to a mapping relation;
manual intervention, providing manual operation during data verification;
data verification, verifying the mapping relation in the feature mapping:
in response to a noise point check command, the noise point is input to the feature mapping step through manual intervention, and if no existing mapping relation applies, a new mapping relation is established;
in response to a field information check command, the similarity of standard field names is judged through manual intervention, and the mapping relation of standard field names that are similar but have different meanings is determined manually.
Preferably, extracting and fusing the features of the mapping relation, and training the prediction model includes:
model training, namely acquiring a mapping relation and corresponding standard field names as features, fusing the features by converting the standard field names, inputting the fused features into a prediction model, and training the prediction model;
and updating data, namely acquiring newly added field information with a null prediction result, establishing a mapping relation of the newly added field information, and updating a mapping dictionary and a prediction model.
A device for data mapping verification, characterized in that it comprises a processor and a memory, the memory storing a computer program executable by the processor, the processor implementing the above method when executing the computer program.
The invention has the technical effects and advantages that:
the data multi-classification module trains the prediction model through the existing historical field information to realize automatic verification of the mapping relation, establishes the mapping relation for the non-existing newly-added field information by inputting the newly-added field information, increases the mapping relation and grouping names, solves the mapping problem of the newly-added field information, perfects the prediction model, further realizes automation of the mapping relation, reduces time consumption of manual operation and improves the working efficiency.
Drawings
FIG. 1 is a schematic diagram of a system of the present invention;
FIG. 2 is a schematic diagram of the method of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The invention provides a system, a method and a device for data mapping verification, as shown in figures 1-2. The system comprises the modules described above, the method is executed using these modules, and the running environment can be Python. The steps are as follows:
s1: inputting a standardized public data dictionary of a plurality of pieces of external tax data, such as Jiangsu tax, anhui tax and the like, to a normalization module to serve as data source information, and mapping the external data source into the standardized public dictionary, wherein the data source information comprises historical data and newly-added data;
the historical data and the newly added data are each processed by the field preprocessing modules, which handle the field names, field contents and field data of the various tables separately.
The preprocessing of field names comprises: cleaning the field names, for example by removing stray spaces, brackets and other special characters, to obtain standard field names.
The preprocessing of field content comprises: cleaning the field content in the same way (removing repeated spaces, brackets and other special characters) to obtain standard field content. For fields whose content description is excessively long, secondary processed information of a specified text length can be constructed.
The preprocessing of field data comprises: de-duplicating the field values and recording the maximum value, the minimum value and the distinct count.
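A minimal sketch of this preprocessing in Python is given below. The helper names clean_field_text and summarize_field_data are illustrative assumptions rather than names used by the invention, and the regular expressions only cover the cleaning rules mentioned above (stray spaces, brackets, special characters) plus the de-duplication statistics.

```python
import re

def clean_field_text(text: str) -> str:
    """Field cleaning: drop spaces, brackets and other special characters."""
    text = re.sub(r"\s+", "", text)                      # remove duplicate/stray spaces
    text = re.sub(r"[()\[\]{}（）【】]", "", text)          # remove ASCII and full-width brackets
    return re.sub(r"[^\w\u4e00-\u9fff]", "", text)        # keep letters, digits, underscore, CJK

def summarize_field_data(values):
    """De-duplicate field values and record max, min and distinct count."""
    unique = set(values)
    numeric = [float(v) for v in unique
               if str(v).replace(".", "", 1).lstrip("-").isdigit()]
    return {
        "distinct_count": len(unique),
        "max": max(numeric) if numeric else None,
        "min": min(numeric) if numeric else None,
    }

print(clean_field_text("注册 日期(Registration Date)"))   # -> 注册日期RegistrationDate
print(summarize_field_data(["1", "2", "2", "10"]))        # -> distinct_count=3, max=10.0, min=1.0
```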
After this processing, the standard field names, standard field contents and field data in the historical data and the newly added data are vectorized to generate the historical field information and the newly added field information respectively, as follows:
an Embedding tool, such as a word-vector or sentence-vector model, converts the standard field name, the standard field content and the field data into three Embedding vectors, which are spliced in order into one large vector V. A mapping relation D_v = {v_i : c_i, …} between the encodings and the field names is established, where v_i is the Embedding vector and c_i is the original field name; this step is repeated so that both the historical field information and the newly added field information complete vectorization.
Finally, N different samples are drawn from the vectorized standard field names, standard field contents and field data to construct the sample set. For the field data it must also be judged whether the field content is of enumeration type or of continuous numerical type: for an enumeration-type field, the de-duplicated enumeration contents are built into a sample list; for a continuous field, the sample records the maximum value, the minimum value and the distinct count. Constructing the sample set of historical field information in this way addresses the problem of missing information dimensions and facilitates the subsequent clustering.
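The vectorization and fusion step above can be sketched as follows, assuming a sentence-embedding library such as sentence-transformers. The model name, the fuse_field_vector helper and the index-keyed layout of D_v (raw vectors are not hashable, so each entry stores the vector v_i next to the original field name c_i) are illustrative assumptions rather than the patent's concrete choices.

```python
import numpy as np
from sentence_transformers import SentenceTransformer  # assumed Embedding tool

encoder = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")  # assumed model

def fuse_field_vector(std_name: str, std_content: str, data_summary: str) -> np.ndarray:
    """Encode name, content and data summary separately, then splice them into one large vector V."""
    name_v, content_v, data_v = encoder.encode([std_name, std_content, data_summary])
    return np.concatenate([name_v, content_v, data_v])

# Toy fields: (original name c_i, standard name, sampled standard content, data summary)
fields = [
    ("REG_DT", "注册日期", "2023-01-01;2023-02-15", "distinct=2 min=20230101 max=20230215"),
    ("regdate", "注册日期", "2022-12-31;2023-03-01", "distinct=2 min=20221231 max=20230301"),
]
D_v = {i: {"vector": fuse_field_vector(n, c, d), "orig_name": o}
       for i, (o, n, c, d) in enumerate(fields)}
```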
S2: the sample set is processed by the data clustering module. Unsupervised clustering is applied to the set of Embedding vectors V using the DBSCAN algorithm (a density-based clustering algorithm), yielding a number of distinct groups {g_1, g_2, …} and a set E of partially unclassified scatter (noise) points. The constraint imposed on DBSCAN is that the minimum sample count must be greater than the number N of samples randomly selected for each field.
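A minimal clustering sketch using scikit-learn's DBSCAN implementation follows; the eps value, the placeholder matrix X standing in for the fused Embedding vectors V, and the per-field sample count N are assumed values for illustration only.

```python
import numpy as np
from sklearn.cluster import DBSCAN

N = 5                                   # number of samples drawn per field (assumed value)
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 768))          # placeholder for the fused Embedding vectors V, one row per field

# Density-based clustering; the label -1 marks the unclassified scatter (noise) points E
clustering = DBSCAN(eps=0.5, min_samples=N + 1, metric="cosine").fit(X)
labels = clustering.labels_

groups = {g: np.where(labels == g)[0].tolist() for g in set(labels) if g != -1}
noise_points = np.where(labels == -1)[0].tolist()
print(f"{len(groups)} groups and {len(noise_points)} noise points")
```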
A unified group-naming specification is established and a standard mapping relation is built through the feature mapping module: for each group g_i, a mapping dictionary entry {V_i : C_i} is constructed, where V_i is an Embedding vector belonging to g_i and C_i is the standard field name that serves as the group name; combined with the dictionary D_v, this establishes, for each point cluster, the mapping relation from the original field name c_i to the standard field name C_i.
checking the correctness of the noise points and the mapping relation through a data checking module comprises the following steps:
Manual checking of noise points: the similarity of each noise point to the existing mapping relations is checked to judge whether the noise point has a subset it can be merged into. If a related subset exists, the noise point is merged into the existing mapping relation; if none can be related, a new group and group name are created to store the mapping relation of the scattered point. Several similar candidate mapping relations are compared through the data verification module so that the relation between point clusters and noise points is taken into account; the mapping relation of each noise point is recorded through manual verification, every clustering result is considered, and a complete mapping relation is constructed.
Manual checking of the mapping relations: when the correctness of the mapping relations of the clustering result is checked and two similar standard field names or standard field contents are reviewed manually, then if their semantics are the same and only the names differ slightly, they are merged and given a unified name; if two standard field names or field contents are similar but semantically different, manual intervention classifies them, for example by adjusting the standard field name, the standard field content or the field data so that the two can be distinguished (e.g. "registration date" versus "change date"), and the mapping relation is determined manually to avoid confusion.
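The dictionary construction of this step and the noise-point check can be sketched as below, reusing X, groups and noise_points from the DBSCAN sketch. The patent resolves noise points and ambiguous names through manual verification; the cosine-similarity ranking here only produces a shortlist to support that manual step, and the STD_GROUP_*/NEW_GROUP_* naming, the threshold and the placeholder orig_names list are assumptions, not the patent's procedure.

```python
from sklearn.metrics.pairwise import cosine_similarity

orig_names = [f"field_{i}" for i in range(X.shape[0])]    # placeholder original field names c_i

# Mapping dictionary: original field name c_i -> standard field name C_i (the group name)
mapping_dict = {}
for g, member_idx in groups.items():
    standard_name = f"STD_GROUP_{g}"                      # unified group-naming convention (assumed)
    for i in member_idx:
        mapping_dict[orig_names[i]] = standard_name

# Pre-rank each noise point against the group centroids to assist the manual check
SIM_THRESHOLD = 0.8                                       # illustrative cut-off for a "combinable" subset
centroids = {g: X[idx].mean(axis=0) for g, idx in groups.items()}
for i in noise_points:
    if not centroids:                                     # no groups yet: every scatter point starts a new group
        mapping_dict[orig_names[i]] = f"NEW_GROUP_{i}"
        continue
    sims = {g: float(cosine_similarity(X[i:i + 1], c.reshape(1, -1))[0, 0])
            for g, c in centroids.items()}
    best_g, best_sim = max(sims.items(), key=lambda kv: kv[1])
    if best_sim >= SIM_THRESHOLD:
        mapping_dict[orig_names[i]] = f"STD_GROUP_{best_g}"   # candidate merge into an existing relation
    else:
        mapping_dict[orig_names[i]] = f"NEW_GROUP_{i}"        # candidate new group, to be confirmed manually
```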
And by means of manual intervention, the automatic mapping accuracy is improved, and the integrity of a subsequently generated prediction model is ensured.
S3: obtaining all mapping relations in a mapping dictionary;
training a prediction model according to the existing mapping relation, and comprising the following steps:
the mapping recording sequence is scattered randomly, so that the problem of sample sequence is avoided, and the robustness and accuracy of a prediction model are influenced;
the existing transducer Model is utilized to carry out fine tuning on the transducer Model or a multi-classification Model is established on the basis of the transducer Model, in the embodiment, a Bert-chip-Base is adopted as a basic prediction Model, the prediction Model is trained by taking the mapping relation of historical field information as training content, namely, extracting the characteristics of standard fields, generating a prediction Model-X, and the mapping relation of noise points is input in a manual confirmation mode, so that the integrity of the mapping result of the trained prediction Model-X is ensured.
S4: the newly added field information is acquired and input to the prediction model; if the generated prediction result is a null value, the newly added data is entered manually and a mapping relation is established to improve prediction Model-X, and the consistency of the newly added field information with the existing standard fields is confirmed manually:
if manual confirmation finds them consistent, the newly added field information is added, based on the mapping result, to the mapping relation of the corresponding group name;
if manual confirmation finds them inconsistent, or if the screening score is lower than a preset screening parameter (e.g. 50%) so that the output is null, then a new sample and mapping relation are created after the clustering result is obtained, and the mapping result is entered manually into the mapping dictionary; once a certain number of new entries has accumulated, calibration training is performed on the prediction model again and it is updated to prediction Model-X.
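The null-output screening can be sketched as follows, reusing tokenizer, model and label_names from the fine-tuning sketch; the 0.5 threshold mirrors the 50% screening parameter mentioned above, and predict_mapping is an illustrative helper name.

```python
import torch

def predict_mapping(text: str, threshold: float = 0.5):
    """Return the predicted standard field name, or None (a null result) when the best
    softmax score falls below the preset screening parameter."""
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=128)
    with torch.no_grad():
        logits = model(**inputs).logits
    probs = torch.softmax(logits, dim=-1)[0]
    score, idx = torch.max(probs, dim=0)
    if score.item() < threshold:
        return None          # unmapped: route to clustering and manual entry, retrain later
    return label_names[idx.item()]

print(predict_mapping("REG_DATE 注册日期 2023-01-01"))
```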
The parts of the newly added field information that are consistent with the standard fields are merged automatically; for the inconsistent parts, the mapping relation is entered manually and a new group name is created, and the mapping relation is updated into prediction Model-X. As prediction Model-X becomes more complete, the standard mapping data set becomes more complete and the verification of mapping relations becomes more accurate.
The prediction model is trained on the mapping relations of the historical field information to obtain the trained prediction Model-X, which finally outputs the mapping verification result. The data clustering module groups the mapping relations automatically, saving a large amount of up-front data exploration and analysis time; combined with manual confirmation, the data multi-classification module processes the result further and generates prediction Model-X. Manually confirming the newly added field information into prediction Model-X improves its mapping accuracy, saves the time cost of manual analysis and mining, and improves working efficiency.
Corresponding to the above system and method, the invention also provides a device for data mapping verification, which comprises a processor and a memory, wherein the memory stores a computer program executable by the processor, and the processor implements the data mapping verification method when executing the computer program.
Finally, it should be noted that: the foregoing description is only illustrative of the preferred embodiments of the present invention, and although the present invention has been described in detail with reference to the foregoing embodiments, it will be apparent to those skilled in the art that modifications may be made to the embodiments described, or equivalents may be substituted for elements thereof, and any modifications, equivalents, improvements or changes may be made without departing from the spirit and principles of the present invention.

Claims (10)

1. A system for data mapping verification, characterized by comprising:
the normalization module is used for acquiring data source information, wherein the data source information comprises historical data and newly-added data, preprocessing the historical data and the newly-added data respectively to obtain historical field information and newly-added field information, selecting samples of the historical field information, and constructing a sample set;
the data clustering module is used for acquiring the data of the sample set, clustering the sample set to obtain a clustering result, mapping and storing the clustering result, cleaning mapped clustering results that are similar, and constructing a mapping dictionary;
the data multi-classification module is used for acquiring the mapping relation in the mapping dictionary, extracting and fusing the characteristics of the mapping relation, training a prediction model, inputting newly added field information into the trained prediction model to obtain a prediction result, outputting the mapping relation, if the prediction result is null, establishing the mapping relation of the newly added field information, generating a mapping data set according to the mapping relation, and updating the mapping dictionary and the prediction model.
2. The system for data mapping verification of claim 1, wherein,
the history data and the newly added data each include a field name, a field content and field data,
the normalization module comprises:
the field name preprocessing module is used for cleaning the fields of the field names to obtain standard field names;
the field content preprocessing module is used for carrying out field cleaning on the field content to obtain standard field content and constructing a sample set of the standard field content;
the field data preprocessing module is used for carrying out field data duplication elimination and counting field data;
and the field merging module is used for merging the standard field name, the standard field content and the field data into a fusion character string and carrying out vectorization processing on the character string.
3. A system for data mapping verification as defined in claim 1, wherein:
the clustering result includes a point cluster and noise points,
the data clustering module comprises:
a cluster calculation module, used for calculating the data of the sample set, generating a clustering result, and identifying the point clusters and noise points in the clustering result;
the feature mapping module is used for mapping the clustering result, mapping the point clusters and the noise points and constructing a mapping dictionary according to the mapping relation;
the manual intervention module is used for providing a port for manual operation;
the data verification module is used for verifying the mapping relation in the feature mapping module:
responding to the noise point checking command, inputting the noise point to the feature mapping module through the manual intervention module, and if the existing mapping relation does not exist, establishing a new mapping relation;
and responding to the field information checking command, judging the similarity of the standard field names through a manual intervention module, and manually determining the mapping relation of the standard field names with the similarity but different meanings.
4. A system for data mapping verification according to claim 3, wherein:
the data multi-classification module comprises:
the model training module is used for acquiring the mapping relation and the corresponding standard field names as characteristics, fusing the characteristics by converting the standard field names, inputting the fused characteristics into the prediction model, and training the prediction model;
and the data updating module is used for acquiring newly added field information with the empty prediction result, inputting the newly added field information into the data clustering module, updating the mapping relation of the newly added field information, and updating the mapping dictionary and the prediction model.
5. A method of data mapping verification, characterized by comprising the following steps:
acquiring historical data, preprocessing the historical data to obtain historical field information, and performing sample selection on the historical field information to construct a sample set;
based on the data of the sample set, clustering the sample set to obtain a clustering result, mapping the clustering result, cleaning mapped clustering results that are similar, and constructing a mapping dictionary;
based on the constructed mapping dictionary, obtaining a mapping relation, extracting and fusing the characteristics of the mapping relation, and training a prediction model;
acquiring newly added data, and preprocessing the newly added data to obtain newly added field information;
based on the obtained newly added field information, inputting the newly added field information into a trained prediction model to obtain a prediction result, outputting a mapping relation, if the prediction result is null, establishing the mapping relation of the newly added field information, generating a mapping data set according to the mapping relation, and updating a mapping dictionary and the prediction model.
6. The method for data mapping verification of claim 5, wherein:
the history data and the newly added data each include a field name, a field content and field data,
preprocessing the historical data and the newly added data respectively comprises the following steps:
preprocessing a field name, and cleaning the field name to obtain a standard field name;
preprocessing field content, performing field cleaning on the field content to obtain standard field content, and constructing a sample set of the standard field content;
preprocessing field data, de-duplicating the field data, and counting the field data.
7. The method for data mapping verification of claim 5, wherein:
sample selection is carried out on the history field information, and the method further comprises the following steps before the sample set is constructed: vectorization processing is carried out on the history field information and the newly added field information respectively, and the method comprises the following steps:
acquiring a standard field name, standard field content and field data;
merging the standard field name, the standard field content and the field data into a fusion character string;
and carrying out vectorization processing on the character string.
8. The method of data mapping verification of claim 6, wherein:
the clustering result includes a point cluster and noise points,
building the mapping dictionary includes:
clustering calculation: calculating data of the sample set, generating a clustering result, and identifying point clusters and noise points in the clustering result;
feature mapping, mapping clustering results, mapping point clusters and noise points, and constructing a mapping dictionary according to a mapping relation;
manual intervention, providing manual operation during data verification;
data verification, verifying the mapping relation in the feature mapping:
in response to a noise point check command, the noise point is input to the feature mapping step through manual intervention, and if no existing mapping relation applies, a new mapping relation is established;
in response to a field information check command, the similarity of standard field names is judged through manual intervention, and the mapping relation of standard field names that are similar but have different meanings is determined manually.
9. The method for data mapping verification of claim 5, wherein:
extracting and fusing the features of the mapping relation, and training a prediction model comprises:
model training, namely acquiring a mapping relation and corresponding standard field names as features, fusing the features by converting the standard field names, inputting the fused features into a prediction model, and training the prediction model;
and updating data, namely acquiring newly added field information with a null prediction result, establishing a mapping relation of the newly added field information, and updating a mapping dictionary and a prediction model.
10. A device for data mapping verification, characterized in that it comprises a processor and a memory, the memory storing a computer program executable by the processor, the processor implementing the method of any one of claims 5-9 when executing the computer program.
Application CN202311272327.XA, priority and filing date 2023-09-28: System, method and device for data mapping verification (pending, published as CN117332286A)

Priority Applications (1)

Application number: CN202311272327.XA; priority date: 2023-09-28; filing date: 2023-09-28; title: System, method and device for data mapping verification


Publications (1)

CN117332286A, published 2024-01-02

Family

ID=89282401

Family Applications (1)

CN202311272327.XA (pending, published as CN117332286A), priority and filing date 2023-09-28: System, method and device for data mapping verification

Country Status (1)

Country Link
CN (1) CN117332286A (en)

Similar Documents

Publication Publication Date Title
CN108256074B (en) Verification processing method and device, electronic equipment and storage medium
CN110580308B (en) Information auditing method and device, electronic equipment and storage medium
CN113191156A (en) Medical examination item standardization system and method based on medical knowledge graph and pre-training model
CN113760891B (en) Data table generation method, device, equipment and storage medium
US9542456B1 (en) Automated name standardization for big data
CN113254507B (en) Intelligent construction and inventory method for data asset directory
CN112163553B (en) Material price accounting method, device, storage medium and computer equipment
CN110597796B (en) Big data real-time modeling method and system based on full life cycle
CN113127339A (en) Method for acquiring Github open source platform data and source code defect repair system
CN113987199A (en) BIM intelligent image examination method, system and medium with standard automatic interpretation
CN112685374B (en) Log classification method and device and electronic equipment
CN115146062A (en) Intelligent event analysis method and system fusing expert recommendation and text clustering
CN115794798A (en) Market supervision informationized standard management and dynamic maintenance system and method
CN113742396A (en) Mining method and device for object learning behavior pattern
CN117648093A (en) RPA flow automatic generation method based on large model and self-customized demand template
CN116841779A (en) Abnormality log detection method, abnormality log detection device, electronic device and readable storage medium
CN115953123A (en) Method, device and equipment for generating robot automation flow and storage medium
CN117544482A (en) Operation and maintenance fault determining method, device, equipment and storage medium based on AI
CN117827923A (en) Query demand processing method and device, computer equipment and storage medium
CN110929509B (en) Domain event trigger word clustering method based on louvain community discovery algorithm
CN115587190A (en) Construction method and device of knowledge graph in power field and electronic equipment
CN117332286A (en) System, method and device for data mapping verification
CN115688729A (en) Power transmission and transformation project cost data integrated management system and method thereof
CN114780403A (en) Software defect prediction method and device based on enhanced code attribute graph
CN115204179A (en) Entity relationship prediction method and device based on power grid public data model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination