CN111062185A - External verification method and device for data format, computer equipment and storage medium - Google Patents


Info

Publication number
CN111062185A
Authority
CN
China
Prior art keywords
data
verification
format
current
data source
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911182276.5A
Other languages
Chinese (zh)
Inventor
于善友
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
OneConnect Smart Technology Co Ltd
OneConnect Financial Technology Co Ltd Shanghai
Original Assignee
OneConnect Financial Technology Co Ltd Shanghai
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by OneConnect Financial Technology Co Ltd Shanghai
Priority to CN201911182276.5A
Publication of CN111062185A
Priority to PCT/CN2020/103952 (WO2021103607A1)

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00: Handling natural language data
    • G06F40/10: Text processing
    • G06F40/103: Formatting, i.e. changing of presentation of documents

Abstract

The invention discloses an external verification method and device for a data format, computer equipment and a storage medium. The method comprises: receiving a current data source, uploaded by a second uploading end, on which format verification is to be performed; performing key value extraction preprocessing on each piece of data in the current data source to obtain a feature vector corresponding to each piece of data; inputting the feature vectors into a convolutional neural network to obtain a format check result corresponding to each piece of data in the current data source; and sending the format check result to the second uploading end. In this way an external server is dedicated to format verification of uploaded data, and the convolutional neural network stored in the server can find erroneous data formats in the data source in time, which improves the efficiency of format verification for massive data.

Description

External verification method and device for data format, computer equipment and storage medium
Technical Field
The present invention relates to the field of data processing technologies, and in particular, to an external verification method and apparatus for a data format, a computer device, and a storage medium.
Background
At present, data interaction between user sides and servers, and between servers, is increasingly frequent. For example, before a first server sends a data source to a second server, code for data format verification is generally deployed locally on the second server in advance, and that code can only check for predetermined error formats. If a new way of identifying an erroneous data format needs to be added, the code deployed locally on the second server has to be modified and reissued, so the whole deployment process is cumbersome and the efficiency of format verification for massive data is reduced.
Disclosure of Invention
The embodiments of the invention provide an external verification method and device for a data format, computer equipment and a storage medium, aiming to solve the following problem in the prior art: a server needs code deployed in advance to verify the data format of a data source, and that code can only check for preset error formats; if a new way of identifying an erroneous data format needs to be added, the code must be modified and reissued, which is inefficient.
In a first aspect, an embodiment of the present invention provides an external verification method for a data format, which includes:
receiving a historical data source uploaded by a first uploading end;
performing data format verification on the historical data source through a first verification script for verifying the format to obtain a first data set passing the verification and a second data set not passing the verification;
sending first notification information, indicating that the first verification script is to be edited, together with the second data set to the first uploading end;
receiving a second check script uploaded by the first uploading end, and performing data format check on the second data set through the second check script for checking the format to obtain a first sub data set passing the check and a second sub data set not passing the check;
performing key value extraction preprocessing on each piece of data in the second sub data set to obtain a feature vector corresponding to each piece of data in the second sub data set;
taking the feature vector corresponding to each data in the second sub data set as the input of the convolutional neural network to be trained, taking the label value corresponding to each feature vector as the output of the convolutional neural network to be trained, and training the convolutional neural network to obtain the convolutional neural network for identifying the wrong data format;
receiving a current data source to be subjected to format verification uploaded by a second uploading end;
performing key value extraction preprocessing on each piece of data in the current data source to obtain a feature vector corresponding to each piece of data in the current data source;
inputting the feature vectors corresponding to the data in the current data source into the convolutional neural network to obtain format check results corresponding to the data in the current data source; and
sending the format check result to the second uploading end.
In a second aspect, an embodiment of the present invention provides an external verification device for a data format, which includes:
the first receiving unit is used for receiving the historical data source uploaded by the first uploading end;
the first verification unit is used for verifying the data format of the historical data source through a first verification script used for verifying the format to obtain a first data set which passes verification and a second data set which does not pass verification;
the notification unit is used for sending first notification information, indicating that the first verification script is to be edited, together with the second data set to the first uploading end;
the second checking unit is used for receiving a second checking script uploaded by the first uploading end, and performing data format checking on the second data set through the second checking script for checking the format to obtain a first sub data set passing the checking and a second sub data set not passing the checking;
a first feature extraction unit, configured to perform a key value extraction preprocessing on each data in the second sub-data set to obtain a feature vector corresponding to each data in the second sub-data set;
the model training unit is used for taking the feature vectors corresponding to the data in the second sub data set as the input of the convolutional neural network to be trained, taking the labeled values corresponding to the feature vectors as the output of the convolutional neural network to be trained, and training the convolutional neural network to obtain the convolutional neural network for identifying the wrong data format;
the second receiving unit is used for receiving the current data source to be subjected to format verification uploaded by the second uploading end;
the second feature extraction unit is used for performing key value extraction preprocessing on each piece of data in the current data source to obtain a feature vector corresponding to each piece of data in the current data source;
the third verification unit is used for inputting the feature vectors corresponding to the data in the current data source into the convolutional neural network to obtain format verification results corresponding to the data in the current data source; and
the result sending unit is used for sending the format check result to the second uploading end.
In a third aspect, an embodiment of the present invention further provides a computer device, which includes a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor implements the external verification method for the data format according to the first aspect when executing the computer program.
In a fourth aspect, an embodiment of the present invention further provides a computer-readable storage medium, where the computer-readable storage medium stores a computer program, and the computer program, when executed by a processor, causes the processor to execute the external verification method for a data format according to the first aspect.
The embodiments of the invention provide an external verification method and device for a data format, computer equipment and a storage medium, in which an external server is dedicated to format verification of uploaded data. The convolutional neural network stored in the server can find erroneous data formats in a data source in time, which improves the efficiency of format verification for massive data.
Drawings
In order to illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings needed in the description of the embodiments are briefly introduced below. The drawings described below show some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic view of an application scenario of an external verification method for a data format according to an embodiment of the present invention;
Fig. 2 is a schematic flow chart of an external verification method for a data format according to an embodiment of the present invention;
Fig. 3 is a schematic sub-flow chart of an external verification method for a data format according to an embodiment of the present invention;
Fig. 4 is a schematic block diagram of an external verification device for a data format according to an embodiment of the present invention;
Fig. 5 is a schematic block diagram of a sub-unit of an external verification device for a data format according to an embodiment of the present invention;
Fig. 6 is a schematic block diagram of a computer device provided by an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It will be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It is also to be understood that the terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in the specification of the present invention and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should be further understood that the term "and/or" as used in this specification and the appended claims refers to and includes any and all possible combinations of one or more of the associated listed items.
Referring to fig. 1 and fig. 2, fig. 1 is a schematic view of an application scenario of an external verification method for a data format according to an embodiment of the present invention; fig. 2 is a schematic flow chart of an external verification method for a data format according to an embodiment of the present invention, where the external verification method for a data format is applied to a server, and the method is executed by application software installed in the server.
As shown in fig. 2, the method includes steps S101 to S110.
S101, receiving a historical data source uploaded by a first uploading end.
In this embodiment, a verification server may be set up separately to perform format verification on data. The verification server can at any time receive verification scripts hot-deployed by a plurality of uploading ends, and can at any time receive data sources uploaded in real time by each uploading end. When a verification script is received, the processes of the verification server do not need to be suspended for deployment; the script is simply hot-deployed. For example, the first uploading end may be a terminal dedicated to uploading test data sources in order to test the verification passing rate of each verification script in the verification server; it may also adjust a verification script according to the test results and upload it to the verification server again for hot deployment.
In one embodiment, step S101 is followed by:
caching the historical data source into a cache region which is correspondingly established by taking the serial number of the historical data source as an identification name; the serial number of the historical data source comprises uploading time and data size.
In this embodiment, after receiving the historical data source uploaded by the first uploading end, the verification server obtains the uploading time of the data (e.g., 201810281312, meaning the data was uploaded at 13:12 on October 28, 2018) and the data size (e.g., 1546 KB). The string 2018102813121546, formed by concatenating 201810281312 and 1546, is then used as the serial number of the historical data source. A cache region is created in the verification server with this serial number as its name, and the historical data source is cached in the newly created cache region. Every other data source received by the verification server may be stored in the same way.
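The serial-number scheme above can be sketched as follows. This is only an illustration under stated assumptions: the names `make_serial_number` and `CacheRegions` are hypothetical, and an in-memory dictionary stands in for whatever cache the verification server actually uses.

```python
from datetime import datetime


def make_serial_number(upload_time: datetime, size_kb: int) -> str:
    """Concatenate the upload time (yyyyMMddHHmm) and the size in KB."""
    return upload_time.strftime("%Y%m%d%H%M") + str(size_kb)


class CacheRegions:
    """Hypothetical in-memory stand-in for the server's cache regions."""

    def __init__(self):
        self._regions = {}

    def store(self, upload_time, size_kb, records):
        # The serial number is used as the identification name of the region.
        serial = make_serial_number(upload_time, size_kb)
        self._regions[serial] = list(records)
        return serial

    def load(self, serial):
        return self._regions[serial]


cache = CacheRegions()
serial = cache.store(datetime(2018, 10, 28, 13, 12), 1546, [{"tag1": 1}])
print(serial)  # 2018102813121546
```

Two data sources uploaded in the same minute with the same size would share a serial number under this scheme; the source does not say how such a collision is handled.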
S102, carrying out data format verification on the historical data source through a first verification script for verifying the format to obtain a first data set passing the verification and a second data set not passing the verification.
In this embodiment, the first verification script may be a Groovy script written at the first uploading end (Groovy is an agile dynamic language for the Java virtual machine; it is a mature object-oriented programming language that can also be used as a pure scripting language). For example, the verification rule written in the first verification script is as follows:
the data includes tag 1, tag 2, and tag 3, with tag 1 = tag 2 = tag 3 = 1, field 1 = X1, field 2 = X2, and field 3 = X3;
Before the Groovy script is executed to perform format verification on each piece of data in the historical data source, the historical data source is cached in a storage area of the verification server; the Groovy script is then executed to obtain a first data set that passes verification and a second data set that does not. After the verification is finished, if the ratio of the total number of data in the second data set to the total number of data in the first data set is greater than a preset data proportion threshold (for example, a threshold of 1:9), first notification information indicating that the first verification script is to be edited is sent to the first uploading end. For example, the verification server receives a historical data source containing 10000 pieces of data (most of which should pass data verification, although some may fail the initial format verification). The historical data source is cached first and then verified item by item through the first verification script; suppose 9000 pieces of data pass and the remaining 1000 do not. In that case, the first notification information indicating that the first verification script is to be edited needs to be sent to the first uploading end. The first verification script thus verifies the historical data source automatically, without manual verification.
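A minimal Python sketch of this first verification pass is given below; it does not reproduce the Groovy script itself. The record layout (dictionaries keyed `tag1`, `field1`, and so on) and the function names are assumptions made for illustration. Because the 9000/1000 example yields a ratio of exactly 1:9, the sketch also assumes that a ratio equal to the threshold triggers the notification.

```python
from fractions import Fraction


def first_rule(record):
    """Rule from the text: tags 1-3 equal 1, and field i equals Xi for i = 1..3."""
    tags_ok = all(record.get(t) == 1 for t in ("tag1", "tag2", "tag3"))
    fields_ok = all(record.get(f"field{i}") == f"X{i}" for i in (1, 2, 3))
    return tags_ok and fields_ok


def split_by_rule(records, rule):
    """Return (first data set, second data set): records that pass and fail."""
    passed, failed = [], []
    for record in records:
        (passed if rule(record) else failed).append(record)
    return passed, failed


def script_needs_editing(passed, failed, threshold=Fraction(1, 9)):
    """Assumption: a failed-to-passed ratio at or above the threshold triggers the notice."""
    if not passed:
        return bool(failed)
    return Fraction(len(failed), len(passed)) >= threshold


good = {"tag1": 1, "tag2": 1, "tag3": 1, "field1": "X1", "field2": "X2", "field3": "X3"}
bad = dict(good, field1="unexpected")
first_set, second_set = split_by_rule([good] * 9000 + [bad] * 1000, first_rule)
print(len(first_set), len(second_set), script_needs_editing(first_set, second_set))
```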
S103, sending first notification information, indicating that the first verification script is to be edited, together with the second data set to the first uploading end.
In this embodiment, part of the historical data source (i.e., the second data set) failed verification. So that part of the data in the second data set can pass when verified again, the first verification script may be adjusted by reducing the limiting conditions of its verification rule. The verification server therefore sends the first notification information and the second data set to the first uploading end to notify it that the first verification script needs to be adjusted; the script can then be modified with reference to the second data set.
S104, receiving a second check script uploaded by the first uploading end, and performing data format check on the second data set through the second check script for checking the format to obtain a first sub data set passing the check and a second sub data set not passing the check.
In this embodiment, the first uploading end adjusts the first verification script with reference to the second data set to obtain the second verification script, and uploads the second verification script to the verification server. For example, after receiving the first notification information and the second data set, the first uploading end removes some verification rules from the first verification script, so that most of the data in the second data set can pass the second verification script. For example, the rule in the second verification script is:
the data includes tag 1, tag 2, and tag 3, with tag 1 = tag 2 = tag 3 = 1 and field 3 = X3;
The verification server then receives the second verification script uploaded by the first uploading end and performs data format verification on the second data set through it, obtaining a first sub data set that passes and a second sub data set that does not. Because the second verification script is also compatible with the first data set, the data in the second data set that still fails format verification under the second verification script, i.e., the second sub data set, is data with format errors and can be used as the training set of the convolutional neural network to be trained. Multiple rounds of screening thus effectively isolate the data that really has format errors.
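The second screening round can be sketched in the same spirit. The relaxed rule below mirrors the example rule in the text; the dictionary record layout, the function names, and the two sample records are assumptions made for illustration.

```python
def second_rule(record):
    """Relaxed rule from the text: tags 1-3 equal 1 and field 3 equals X3."""
    tags_ok = all(record.get(t) == 1 for t in ("tag1", "tag2", "tag3"))
    return tags_ok and record.get("field3") == "X3"


def screen(records, rule):
    """Return (first sub data set, second sub data set) for one screening round."""
    passed = [r for r in records if rule(r)]
    failed = [r for r in records if not rule(r)]
    return passed, failed


# Hypothetical second data set: both records failed the stricter first rule.
second_data_set = [
    {"tag1": 1, "tag2": 1, "tag3": 1, "field1": "other", "field2": "X2", "field3": "X3"},
    {"tag1": 0, "tag2": 0, "tag3": 0, "field1": "X1", "field2": "X2", "field3": "X3"},
]
first_sub, second_sub = screen(second_data_set, second_rule)

# Only the record that fails both rules is kept as training data.
training_set = second_sub
print(len(first_sub), len(training_set))  # 1 1
```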
S105, performing key value extraction preprocessing on each piece of data in the second sub data set to obtain a feature vector corresponding to each piece of data in the second sub data set.
In this embodiment, to quantify the erroneous data formats in the second sub data set, key value extraction may be performed on each piece of data in the second sub data set to obtain the corresponding feature vector.
In one embodiment, as shown in fig. 3, step S105 includes:
S1051, obtaining the data content corresponding to each piece of data in the second sub data set;
S1052, extracting, from the data content corresponding to each piece of data, the values of the fields that match the key fields in a preset key field list, and concatenating them into the feature vector corresponding to that piece of data in the second sub data set.
In this embodiment, for example, the data content corresponding to one of the pieces of data in the second sub data set is as follows:
tag 1 = tag 2 = tag 3 = 0, tag 4 = 1, tag 5 = 1, field 1 = X1, field 2 = X2, field 3 = X3, field 4 = X4, ……, and field N = XN;
Each field and its value are located to obtain the data content corresponding to each piece of data;
The preset key field list comprises tag 1, tag 2, tag 3, tag 4, field 1, field 2, field 4 and field 6. The values of these fields are extracted from the data content of each piece of data and concatenated in order to form the feature vector corresponding to that piece of data in the second sub data set. More specifically, for the piece of data above, extracting the specified field values according to the preset key field list gives the feature vector [0 0 0 1 X1 X2 X4 X6], that is, 0 (tag 1), 0 (tag 2), 0 (tag 3), 1 (tag 4), X1 (field 1), X2 (field 2), X4 (field 4), and X6 (field 6). Through the above process, the quantifiable features of each piece of data in the second sub data set are extracted.
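The key value extraction of steps S1051 and S1052 can be sketched as a lookup over the preset key field list. The dictionary record layout and the use of `None` for a missing field are assumptions of this sketch; the source does not say how absent fields are handled, nor how placeholder values such as X1 are encoded numerically.

```python
# Preset key field list from the example in the text.
KEY_FIELDS = ["tag1", "tag2", "tag3", "tag4", "field1", "field2", "field4", "field6"]


def extract_feature_vector(record, key_fields=KEY_FIELDS):
    """Concatenate, in list order, the values of the key fields found in a record."""
    # Assumption: a missing key field is represented by None.
    return [record.get(key) for key in key_fields]


# The example record: tags 1-3 are 0, tags 4 and 5 are 1, field i holds Xi.
record = {"tag1": 0, "tag2": 0, "tag3": 0, "tag4": 1, "tag5": 1}
record.update({f"field{i}": f"X{i}" for i in range(1, 7)})

vector = extract_feature_vector(record)
print(vector)  # [0, 0, 0, 1, 'X1', 'X2', 'X4', 'X6']
```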
S106, taking the feature vector corresponding to each piece of data in the second sub data set as the input of the convolutional neural network to be trained, and taking the label value corresponding to each feature vector as the output of the convolutional neural network to be trained, so as to train the convolutional neural network and obtain a convolutional neural network for identifying erroneous data formats.
In this embodiment, the label values corresponding to the feature vectors of the data in the second sub data set are all 0, indicating that all of this data has a format error. After the convolutional neural network to be trained has been trained on the second sub data set, the trained convolutional neural network is a model capable of identifying erroneous data formats.
In an embodiment, the following is further included before step S106:
automatically assigning the label 0 to the feature vector corresponding to each piece of data in the second sub data set as the label value of that feature vector; wherein the label value 0 indicates that the data has a format error.
In this embodiment, this automatic labeling may be performed in the verification server: a label value column is added to the data in the second sub data set and is assigned the value 0 by default, which automatically labels every piece of data in the second sub data set.
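The automatic labeling amounts to pairing every feature vector with the constant label 0. A minimal sketch, with hypothetical names and sample vectors:

```python
FORMAT_ERROR = 0  # label value 0 means "this data has a format error"


def label_as_format_errors(feature_vectors):
    """Attach the default label 0 to every feature vector in the second sub data set."""
    return [(vector, FORMAT_ERROR) for vector in feature_vectors]


labeled = label_as_format_errors([[0, 0, 0, 1], [0, 1, 0, 1]])
print(labeled)  # [([0, 0, 0, 1], 0), ([0, 1, 0, 1], 0)]
```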
S107, receiving the current data source to be subjected to format verification uploaded by the second uploading end.
In this embodiment, when a user corresponding to another upload terminal (i.e., a user of the second upload terminal) needs to upload data for format verification, it is only necessary to directly upload the current data source to be subjected to format verification from the second upload terminal to the verification server.
S108, performing key value extraction preprocessing on each piece of data in the current data source to obtain a feature vector corresponding to each piece of data in the current data source.
In this embodiment, the key value extraction preprocessing of each piece of data in the current data source may be performed in the same way as for the second sub data set, yielding a feature vector for each piece of data in the current data source.
S109, inputting the feature vectors corresponding to the data in the current data source into a convolutional neural network to obtain format check results corresponding to the data in the current data source.
In this embodiment, after the feature vectors corresponding to the data in the current data source are obtained, they are input into the trained convolutional neural network, which identifies the data that has problems similar to those of the second sub data set and therefore cannot pass format verification. Specifically, after the feature vectors corresponding to the data in the current data source are input into the convolutional neural network, data whose output value is 0 is data that fails format verification, and data whose output value is not 0 is data that passes format verification.
At this time, the first data set of the current data source may be formed by data which passes verification in the format verification result corresponding to each data in the current data source, and the second data set of the current data source may be formed by data which does not pass verification in the format verification result corresponding to each data in the current data source. Through the convolutional neural network, intelligent verification of the format of each data in the current data source is realized, manual verification is not needed, and the verification efficiency is improved.
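How the network outputs are turned into the two data sets of the current data source can be sketched as below. The trained convolutional neural network is not reproduced here: `stand_in_model` is a hypothetical placeholder that returns 0 or 1, and only the partitioning by output value follows the text.

```python
def partition_by_output(records, outputs):
    """Output 0 means the format check failed; any non-zero output means it passed."""
    if len(records) != len(outputs):
        raise ValueError("one output value is required per record")
    passed = [r for r, out in zip(records, outputs) if out != 0]
    failed = [r for r, out in zip(records, outputs) if out == 0]
    return passed, failed


def stand_in_model(vector):
    """Hypothetical placeholder for the trained network, not the real model."""
    return 0 if vector[:3] == [0, 0, 0] else 1


vectors = [[1, 1, 1, 1, "X1"], [0, 0, 0, 1, "X1"]]
outputs = [stand_in_model(v) for v in vectors]
current_first_set, current_second_set = partition_by_output(vectors, outputs)
print(len(current_first_set), len(current_second_set))  # 1 1
```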
S110, sending the format verification result to the second uploading end.
In this embodiment, after the intelligent verification of the format of each data in the current data source is completed in the verification server through the convolutional neural network, both the first data set of the current data source and the second data set of the current data source may be sent to the second upload terminal to notify the format verification result.
In an embodiment, step S110 is followed by:
S111, dividing the total number of the verification passing data in the format verification result corresponding to each data in the current data source by the total number of the data in the current data source to obtain the pre-judgment passing rate.
In this embodiment, after the first data set and the second data set of the current data source are obtained, the pre-judgment passing rate is calculated as (total number of data in the first data set of the current data source) / (total number of data in the first data set of the current data source + total number of data in the second data set of the current data source), i.e., the number of data passing verification divided by the total number of data in the current data source; it is denoted Y1. The pre-judgment passing rate is a preliminary judgment of the current data source by the convolutional neural network, and serves as the benchmark against which the passing rate later obtained by verifying the current data source with a verification script is compared.
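The pre-judgment passing rate Y1 is a simple ratio, sketched below. The function name and the sample counts (9500 passing, 500 failing) are illustrative and do not come from the source; rejecting an empty data source is likewise a choice made for this sketch.

```python
def pre_judgment_pass_rate(n_passed, n_failed):
    """Y1 = passed / (passed + failed) for the current data source."""
    total = n_passed + n_failed
    if total == 0:
        raise ValueError("the current data source is empty")
    return n_passed / total


y1 = pre_judgment_pass_rate(9500, 500)
print(y1)  # 0.95
```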
In an embodiment, step S111 is followed by:
S112, receiving a current verification script uploaded by a second uploading end, and performing data format verification on the current data source through the current verification script for verifying the format to obtain a third subdata set passing the verification and a data occupation ratio passing the verification;
S113, judging whether the data occupation ratio is lower than the pre-judgment passing rate or not;
S114, if the data occupation ratio is lower than the pre-judgment passing rate, sending second notification information that the current verification script does not reach the verification target to a second uploading end;
S115, if the data occupation ratio is not lower than the pre-judgment passing rate, sending third notification information that the current verification script reaches the verification target to a second uploading end.
In this embodiment, the second upload terminal may upload the edited current verification script, and then implement hot deployment of the current verification script in the verification server. At this time, performing data format verification on the current data source through the current verification script to obtain a verified third sub data set and a verified data occupation ratio, and recording the obtained data occupation ratio as Y2.
If Y2 < Y1, the verification rule of the uploaded current verification script is defective and is not compatible with some of the data. The second uploading end then needs to be prompted to modify the uploaded current verification script so that more data of the current data source passes verification, so the second notification information, stating that the current verification script has not reached the verification target, is sent to the second uploading end.
If Y2 ≥ Y1, the verification rule of the uploaded current verification script works well and is compatible with the verification of the data, and the third notification information, stating that the current verification script has reached the verification target, is sent to the second uploading end.
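The comparison in steps S113 to S115 can be sketched as a single branch on Y2 and Y1. The function name and the message strings are hypothetical; only the direction of the comparison is taken from the text.

```python
def notification_for_script(y2, y1):
    """Compare the script's pass ratio Y2 with the pre-judgment passing rate Y1."""
    if y2 < y1:
        # S114: the script passes fewer records than the network predicted.
        return "second notification: current verification script has not reached the verification target"
    # S115: Y2 >= Y1.
    return "third notification: current verification script has reached the verification target"


print(notification_for_script(0.90, 0.95))
print(notification_for_script(0.95, 0.95))
```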
The method realizes that the external verification server is specially used for carrying out format verification on the uploaded data, and can find error data formats in the data source in time by the convolutional neural network stored in the verification server, thereby improving the efficiency of mass data format verification.
The embodiment of the invention also provides an external verification device for a data format, which is used for executing any embodiment of the external verification method for a data format. Specifically, referring to fig. 4, fig. 4 is a schematic block diagram of an external verification device for a data format according to an embodiment of the present invention. The external verification device 100 for a data format may be configured in a server.
As shown in fig. 4, the external verification device 100 for a data format includes a first receiving unit 101, a first verifying unit 102, a notifying unit 103, a second verifying unit 104, a first feature extracting unit 105, a model training unit 106, a second receiving unit 107, a second feature extracting unit 108, a third verifying unit 109, and a result sending unit 110.
The first receiving unit 101 is configured to receive a history data source uploaded by the first uploading end.
In this embodiment, a verification server may be set up separately to perform format verification on data. The verification server can at any time receive verification scripts hot-deployed by a plurality of uploading ends, and can at any time receive data sources uploaded in real time by each uploading end. When a verification script is received, the processes of the verification server do not need to be suspended for deployment; the script is simply hot-deployed. For example, the first uploading end may be a terminal dedicated to uploading test data sources in order to test the verification passing rate of each verification script in the verification server; it may also adjust a verification script according to the test results and upload it to the verification server again for hot deployment.
In an embodiment, the external verification device 100 in data format further includes:
the cache region creating unit is used for caching the historical data source into a cache region which is created correspondingly by taking the serial number of the historical data source as an identification name; the serial number of the historical data source comprises uploading time and data size.
In this embodiment, after receiving the historical data source uploaded by the first uploading end, the verification server needs to obtain the uploading time of the data (e.g. 201810281312, which represents data uploaded at 13:12 on October 28, 2018) and the data size (e.g. 1546 KB). The string 2018102813121546, formed by joining 201810281312 and 1546, is then used as the serial number of the historical data source. A cache region is created in the verification server according to the serial number, and the historical data source is cached in the newly created cache region. Every data source subsequently received by the verification server may be stored in the same way as the historical data source.
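The serial-number scheme described above can be sketched as follows. This is a minimal illustration only: the function names, the in-memory dictionary standing in for the cache regions, and the choice of kilobytes as the size unit are assumptions made for the example and are not taken from the embodiment.

```python
from datetime import datetime

# Hypothetical in-memory stand-in for the cache regions on the verification server.
cache_regions = {}


def make_serial_number(upload_time, size_kb):
    """Join the upload time (yyyyMMddHHmm) and the data size into one string."""
    return upload_time.strftime("%Y%m%d%H%M") + str(size_kb)


def cache_data_source(upload_time, size_kb, records):
    """Create a cache region named after the serial number and store the records in it."""
    serial = make_serial_number(upload_time, size_kb)
    cache_regions[serial] = list(records)
    return serial


serial = cache_data_source(datetime(2018, 10, 28, 13, 12), 1546, [{"tag1": 1}])
print(serial)  # 2018102813121546
```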
The first checking unit 102 is configured to perform data format checking on the historical data source through a first checking script for checking a format, so as to obtain a first data set that passes the checking and a second data set that does not pass the checking.
In this embodiment, the first verification script may be a Groovy script from the first uploading end (Groovy is an agile dynamic language for the Java virtual machine; it is a mature object-oriented programming language that can also be used as a pure scripting language). For example, the verification rule written in the first verification script is as follows:
including tag 1, tag 2, and tag 3, and tag 1 = tag 2 = tag 3 = 1, field 1 = X1, field 2 = X2, field 3 = X3;
Before the Groovy script is executed to perform format verification on each piece of data in the historical data source, the historical data source is cached in a storage area of the verification server; the Groovy script is then executed to obtain a first data set that passes verification and a second data set that does not. After the verification is finished, if the ratio of the total number of data in the second data set to the total number of data in the first data set is greater than a preset data proportion threshold (for example, the data proportion threshold is set to 1:9), first notification information indicating that the first verification script is to be edited is sent to the first uploading end. For example, the verification server receives a historical data source including 10000 pieces of data (most of which should pass data verification, although some may fail the initial format verification). The historical data source is cached first and then verified item by item through the first verification script; suppose 9000 pieces of data pass and the remaining 1000 do not. The ratio of failed to passed data is then 1000:9000, i.e. 1:9, and in this example first notification information indicating that the first verification script is to be edited needs to be sent to the first uploading end. The first verification script thus achieves automatic verification of the historical data source without manual verification.
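As a rough illustration of this step, the sketch below applies the example rule to a list of records, splits them into the first and second data sets, and compares the failed-to-passed ratio with the threshold. It is written in Python rather than Groovy, the record layout (a dictionary with keys such as tag1 and field1) and all function names are assumptions, and X1 to X3 are treated as literal placeholder values. The strict comparison follows the wording "greater than"; how the boundary case is handled is a design choice.

```python
from fractions import Fraction

# Placeholder expected values; the embodiment only names them X1, X2 and X3.
EXPECTED_FIELDS = {"field1": "X1", "field2": "X2", "field3": "X3"}


def first_rule(record):
    """tag1 = tag2 = tag3 = 1 and field1..field3 hold the expected values."""
    tags_ok = all(record.get(tag) == 1 for tag in ("tag1", "tag2", "tag3"))
    fields_ok = all(record.get(k) == v for k, v in EXPECTED_FIELDS.items())
    return tags_ok and fields_ok


def partition(records, rule):
    """Return (first data set that passes, second data set that fails)."""
    passed = [r for r in records if rule(r)]
    failed = [r for r in records if not rule(r)]
    return passed, failed


def needs_editing(passed, failed, threshold=Fraction(1, 9)):
    """True when the failed:passed ratio is strictly greater than the threshold."""
    if not passed:
        return bool(failed)
    return Fraction(len(failed), len(passed)) > threshold
```

Using exact fractions avoids floating-point surprises when the ratio sits close to the threshold.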
The notification unit 103 is configured to send the first notification information, indicating that the first verification script is to be edited, and the second data set to the first uploading end.
In this embodiment, part of the historical data source (i.e., the second data set) has failed verification. So that part of the data in the second data set can pass when verified again, the first verification script may be adjusted to reduce the limiting conditions of its verification rule. The verification server therefore sends the first notification information and the second data set to the first uploading end to notify it that the first verification script needs to be adjusted; the first verification script can then be modified with specific reference to the second data set.
The second checking unit 104 is configured to receive the second verification script uploaded by the first uploading end, and to perform data format verification on the second data set through the second verification script to obtain a first sub data set that passes verification and a second sub data set that does not.
In this embodiment, the first uploading end adjusts the first verification script with reference to the second data set to obtain the second verification script, and then uploads the second verification script to the verification server. For example, after receiving the first notification information and the second data set, the first uploading end removes some verification rules from the first verification script, so that most of the data in the second data set can pass verification under the second verification script. For example, the rule in the second verification script is:
including tag 1, tag 2, and tag 3, and tag 1 = tag 2 = tag 3 = 1, field 3 = X3;
The verification server then receives the second verification script uploaded by the first uploading end and performs data format verification on the second data set through it, obtaining the first sub data set that passes verification and the second sub data set that does not. Because the second verification script is compatible with the first data set, the data in the second data set that still fail format verification under the second verification script form the second sub data set of data with format errors, and the second sub data set can be used as the training set of the convolutional neural network to be trained. Through this multi-round screening, data with genuine format errors can be acquired effectively.
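The two-round screening can be sketched as below. Again this is only an illustration under assumptions: records are dictionaries, X1 to X3 are literal placeholder values, and the helper names make_rule and two_round_screening are hypothetical. The second rule differs from the first only in that the conditions on field 1 and field 2 have been dropped, as in the example above.

```python
def make_rule(required_fields):
    """Build a rule: tag1 = tag2 = tag3 = 1 plus the given field constraints."""
    def rule(record):
        tags_ok = all(record.get(t) == 1 for t in ("tag1", "tag2", "tag3"))
        return tags_ok and all(
            record.get(k) == v for k, v in required_fields.items()
        )
    return rule


first_rule = make_rule({"field1": "X1", "field2": "X2", "field3": "X3"})
second_rule = make_rule({"field3": "X3"})  # relaxed: fewer limiting conditions


def two_round_screening(records):
    """Return (first sub data set, second sub data set) of the second data set."""
    second_data_set = [r for r in records if not first_rule(r)]
    first_sub = [r for r in second_data_set if second_rule(r)]
    second_sub = [r for r in second_data_set if not second_rule(r)]
    return first_sub, second_sub
```

Only the records that fail both rounds end up in the second sub data set, which is the set later used for training.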
A first feature extraction unit 105, configured to perform a preprocessing of extracting a key value on each data in the second sub-data set to obtain a feature vector corresponding to each data in the second sub-data set.
In this embodiment, in order to quantify the erroneous data formats in the second sub-data set, key value extraction may be performed on each piece of data in the second sub-data set to obtain the feature vector corresponding to each piece of data in the second sub-data set.
In one embodiment, as shown in fig. 5, the first feature extraction unit 105 includes:
a data content obtaining unit 1051, configured to obtain data content corresponding to each data in the second sub-data set;
a key field extracting unit 1052, configured to extract, from the data content corresponding to each piece of data, the values of the fields that match the key fields in a preset key field list, and to concatenate them into the feature vector corresponding to each piece of data in the second sub-data set.
In this embodiment, for example, the data content corresponding to one piece of data in the second sub-data set is as follows:
tag 1 = tag 2 = tag 3 = 0, tag 4 = 1, tag 5 = 1, field 1 = X1, field 2 = X2, field 3 = X3, field 4 = X4, …, field N = XN;
Each field and its value are located to obtain the data content corresponding to each piece of data;
The preset key field list comprises tag 1, tag 2, tag 3, tag 4, field 1, field 2, field 4 and field 6. The values of these fields are extracted from the data content of each piece of data and concatenated in order to form the feature vector corresponding to each piece of data in the second sub-data set. More specifically, for the piece of data shown above, extracting the specified field values according to the preset key field list gives the feature vector [0 0 0 1 X1 X2 X4 X6], that is, 0 (tag 1), 0 (tag 2), 0 (tag 3), 1 (tag 4), X1 (field 1), X2 (field 2), X4 (field 4) and X6 (field 6). This process extracts quantized features from each piece of data in the second sub-data set.
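The key value extraction can be illustrated with a short sketch. The dictionary record layout and the function name are assumptions, X1 to X6 are kept as placeholder strings, and returning None for a field that is absent from a record is a choice made here for the example rather than something the embodiment specifies.

```python
# Preset key field list from the example: tags 1 to 4 and fields 1, 2, 4 and 6.
KEY_FIELDS = ["tag1", "tag2", "tag3", "tag4", "field1", "field2", "field4", "field6"]


def extract_feature_vector(record, key_fields=KEY_FIELDS):
    """Pick the values of the key fields, in list order, as the feature vector."""
    return [record.get(name) for name in key_fields]


record = {
    "tag1": 0, "tag2": 0, "tag3": 0, "tag4": 1, "tag5": 1,
    "field1": "X1", "field2": "X2", "field3": "X3",
    "field4": "X4", "field5": "X5", "field6": "X6",
}
print(extract_feature_vector(record))
# [0, 0, 0, 1, 'X1', 'X2', 'X4', 'X6']
```

Fields outside the key field list, such as tag 5 and field 3, do not appear in the vector.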
And the model training unit 106 is configured to use the feature vector corresponding to each data in the second sub-data set as an input of the convolutional neural network to be trained, use the labeled value corresponding to each feature vector as an output of the convolutional neural network to be trained, train the convolutional neural network to be trained, and obtain the convolutional neural network for identifying an erroneous data format.
In this embodiment, the labeled values of the feature vectors corresponding to the data in the second sub-data set are all 0, which indicates that all of these data have a format error. After the convolutional neural network to be trained has been trained on the second sub-data set, the trained convolutional neural network is a model capable of identifying erroneous data formats.
In an embodiment, the external verification device 100 in data format further includes:
the automatic labeling unit is used for correspondingly and automatically labeling 0 for the characteristic vector corresponding to each data in the second sub-data set to serve as a labeled value corresponding to each characteristic vector; wherein, the marking value 0 indicates that the data has a format error.
In this embodiment, the automatic labeling of the feature vector corresponding to each piece of data in the second sub-data set with 0 may be performed automatically in the verification server. That is, a labeled value column is added to each piece of data in the second sub-data set, and the column is assigned 0 by default, which achieves automatic labeling of each piece of data in the second sub-data set.
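A minimal sketch of the automatic labeling, assuming the feature vectors are held as Python lists; the function name auto_label and the pair representation (vector, label) are hypothetical.

```python
def auto_label(feature_vectors, label=0):
    """Attach the default labeled value 0 (format error) to every feature vector."""
    return [(vector, label) for vector in feature_vectors]


training_set = auto_label([[0, 0, 0, 1, "X1"], [1, 0, 1, 1, "X2"]])
```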
The second receiving unit 107 is configured to receive the current data source to be format-checked, which is uploaded by the second uploading end.
In this embodiment, when a user corresponding to another upload terminal (i.e., a user of the second upload terminal) needs to upload data for format verification, it is only necessary to directly upload the current data source to be subjected to format verification from the second upload terminal to the verification server.
A second feature extraction unit 108, configured to perform a preprocessing of extracting a key value on each data in the current data source to obtain a feature vector corresponding to each data in the current data source.
In this embodiment, the preprocessing for extracting the key value of each data in the current data source may refer to a manner of extracting the key value of each data in the second sub-data set, and perform preprocessing on each data in the current data source to obtain a feature vector corresponding to each data in the current data source.
And a third checking unit 109, configured to input the feature vector corresponding to each data in the current data source to a convolutional neural network, so as to obtain a format checking result corresponding to each data in the current data source.
In this embodiment, after the feature vectors corresponding to the data in the current data source are obtained, they are input to the convolutional neural network, so that the data which have problems similar to those of the second sub-data set and cannot pass format verification can be identified. Specifically, after the feature vectors corresponding to the data in the current data source are input to the convolutional neural network, data with an output value of 0 are data that fail format verification, and data with a non-zero output value are data that pass format verification.
At this time, the first data set of the current data source may be formed by data which passes verification in the format verification result corresponding to each data in the current data source, and the second data set of the current data source may be formed by data which does not pass verification in the format verification result corresponding to each data in the current data source. Through the convolutional neural network, intelligent verification of the format of each data in the current data source is realized, manual verification is not needed, and the verification efficiency is improved.
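How the model output is turned into the two data sets of the current data source can be sketched as follows. The trained convolutional neural network is represented only by a caller-supplied predict function, which is an assumption made for the example; no particular deep learning library or model interface is implied.

```python
def split_by_model_output(records, feature_vectors, predict):
    """Output 0 means failed format verification; any other output means passed."""
    passed, failed = [], []
    for record, vector in zip(records, feature_vectors):
        if predict(vector) == 0:
            failed.append(record)
        else:
            passed.append(record)
    return passed, failed


# Stand-in for a trained model, used only to exercise the function.
def stub_predict(vector):
    return 0 if vector[0] == 0 else 1


passed, failed = split_by_model_output(
    ["record_a", "record_b"], [[1, 1], [0, 1]], stub_predict
)
```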
A result sending unit 110, configured to send the format check result to the second upload end.
In this embodiment, after the intelligent verification of the format of each data in the current data source is completed in the verification server through the convolutional neural network, both the first data set of the current data source and the second data set of the current data source may be sent to the second upload terminal to notify the format verification result.
In an embodiment, the external verification device 100 in data format further includes:
and the pre-judgment passing rate acquisition unit is used for dividing the total number of the verification passing data in the format verification result corresponding to each data in the current data source by the total number of the data in the current data source to obtain the pre-judgment passing rate.
In this embodiment, after the first data set and the second data set of the current data source are obtained, the pre-judgment passing rate is calculated as the total number of data in the first data set of the current data source / (the total number of data in the first data set of the current data source + the total number of data in the second data set of the current data source), that is, the total number of data in the first data set of the current data source / the total number of data in the current data source; the pre-judgment passing rate is denoted, for example, as Y1. The pre-judgment passing rate results from a preliminary judgment of the current data source by the convolutional neural network, and can serve as a reference against which to compare the verification passing rate obtained when the current data source is subsequently format-verified by a verification script.
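The calculation of Y1 can be written out as a small function. The function name and the error raised for an empty data source are assumptions made for this sketch.

```python
def pre_judgment_pass_rate(passed_count, failed_count):
    """Y1 = passed / (passed + failed) for the current data source."""
    total = passed_count + failed_count
    if total == 0:
        raise ValueError("the current data source contains no data")
    return passed_count / total


y1 = pre_judgment_pass_rate(900, 100)
```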
In an embodiment, the external verification device 100 in data format further includes:
the third verification unit is used for receiving the current verification script uploaded by the second uploading end, and performing data format verification on the current data source through the current verification script for verifying the format to obtain a third subdata set passing the verification and a data occupation ratio passing the verification;
the judging unit is used for judging whether the data occupation ratio is lower than the pre-judging passing rate or not;
the verification failure notification unit is used for sending second notification information that the current verification script does not reach the verification target to a second uploading end if the data occupation ratio is lower than the pre-judgment passing rate;
and the verification passing notification unit is used for sending third notification information that the current verification script reaches the verification target to a second uploading end if the data occupation ratio is not lower than the pre-judgment passing rate.
In this embodiment, the second uploading end may upload the edited current verification script, which is then hot-deployed in the verification server. Data format verification is performed on the current data source through the current verification script to obtain a third sub data set that passes verification and the data occupation ratio (the proportion of the data that passes verification), which is denoted as Y2.
If Y2 < Y1, the verification rule of the uploaded current verification script is defective and is not compatible with some of the data. The second uploading end then needs to be notified to modify the uploaded current verification script so that more data of the current data source pass verification; in this case, second notification information that the current verification script does not reach the verification target is sent to the second uploading end.
If Y2 ≥ Y1, the verification rule of the uploaded current verification script verifies well and is compatible with the data; in this case, third notification information that the current verification script has reached the verification target is sent to the second uploading end.
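The comparison of Y2 with Y1 can be sketched as below. The rule is passed in as an ordinary Python callable standing in for the hot-deployed verification script, and the function name and returned message strings are hypothetical labels for the second and third notification information.

```python
def evaluate_current_script(records, rule, y1):
    """Compute Y2 under the current script and pick the notification to send."""
    if not records:
        raise ValueError("the current data source contains no data")
    third_sub_data_set = [r for r in records if rule(r)]
    y2 = len(third_sub_data_set) / len(records)
    if y2 < y1:
        notification = "second notification: verification target not reached"
    else:
        notification = "third notification: verification target reached"
    return y2, notification


records = [{"ok": True}, {"ok": True}, {"ok": True}, {"ok": False}]
y2, notification = evaluate_current_script(records, lambda r: r["ok"], y1=0.9)
```

Here Y2 is 0.75, which is below the assumed Y1 of 0.9, so the second notification is selected.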
The device uses a dedicated external verification server to perform format verification on uploaded data, and the convolutional neural network stored in the verification server allows erroneous data formats in a data source to be found promptly, thereby improving the efficiency of format verification for large volumes of data.
The above external verification device for a data format may be implemented in the form of a computer program, which may be run on a computer device as shown in fig. 6.
Referring to fig. 6, fig. 6 is a schematic block diagram of a computer device according to an embodiment of the present invention. The computer device 500 is a server, and the server may be an independent server or a server cluster composed of a plurality of servers.
Referring to fig. 6, the computer device 500 includes a processor 502, memory, and a network interface 505 connected by a system bus 501, where the memory may include a non-volatile storage medium 503 and an internal memory 504.
The non-volatile storage medium 503 may store an operating system 5031 and a computer program 5032. The computer program 5032, when executed, causes the processor 502 to perform the external verification method for a data format.
The processor 502 is used to provide computing and control capabilities that support the operation of the overall computer device 500.
The internal memory 504 provides an environment for running the computer program 5032 stored in the non-volatile storage medium 503; when the computer program 5032 is executed by the processor 502, the processor 502 performs the external verification method for a data format.
The network interface 505 is used for network communication, such as the transmission of data information. Those skilled in the art will appreciate that the configuration shown in fig. 6 is a block diagram of only the part of the configuration relevant to the solution of the present invention and does not limit the computer device 500 to which the solution is applied; a particular computer device 500 may include more or fewer components than those shown, may combine certain components, or may have a different arrangement of components.
The processor 502 is configured to run the computer program 5032 stored in the memory to implement the external verification method of the data format disclosed in the embodiment of the present invention.
Those skilled in the art will appreciate that the embodiment of a computer device illustrated in fig. 6 does not constitute a limitation on the specific construction of the computer device, and that in other embodiments a computer device may include more or fewer components than those illustrated, or some components may be combined, or a different arrangement of components. For example, in some embodiments, the computer device may only include a memory and a processor, and in such embodiments, the structures and functions of the memory and the processor are consistent with those of the embodiment shown in fig. 6, and are not described herein again.
It should be understood that, in the embodiment of the present invention, the Processor 502 may be a Central Processing Unit (CPU), and the Processor 502 may also be other general purpose processors, Digital Signal Processors (DSPs), Application Specific Integrated Circuits (ASICs), Field Programmable Gate Arrays (FPGAs) or other Programmable logic devices, discrete Gate or transistor logic devices, discrete hardware components, and the like. Wherein a general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
In another embodiment of the invention, a computer-readable storage medium is provided. The computer readable storage medium may be a non-volatile computer readable storage medium. The computer readable storage medium stores a computer program, wherein the computer program, when executed by a processor, implements the external verification method for data formats disclosed in embodiments of the present invention.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described apparatuses, devices and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again. Those of ordinary skill in the art will appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein may be implemented in electronic hardware, computer software, or a combination of both, and that the components and steps of the examples have been described above generally in terms of their functions in order to illustrate clearly the interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and the design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
In the embodiments provided by the present invention, it should be understood that the disclosed apparatus, device and method can be implemented in other ways. For example, the above-described embodiments of the apparatus are merely illustrative. The division of the units is only a logical division, and other divisions are possible in actual implementation: units having the same function may be grouped into one unit, a plurality of units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices or units, and may also be an electrical, mechanical or other form of connection.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment of the present invention.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a storage medium. Based on such understanding, the technical solution of the present invention in essence, or the part of it that contributes over the prior art, or all or part of the technical solution, can be embodied in the form of a software product that is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a magnetic disk, or an optical disk.
While the invention has been described with reference to specific embodiments, the invention is not limited thereto, and various equivalent modifications and substitutions can be easily made by those skilled in the art within the technical scope of the invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (10)

1. An external verification method for a data format is characterized by comprising the following steps:
receiving a historical data source uploaded by a first uploading end;
performing data format verification on the historical data source through a first verification script for verifying the format to obtain a first data set passing the verification and a second data set not passing the verification;
sending the first notification information to be edited of the first verification script and the second data set to a first uploading end;
receiving a second check script uploaded by the first uploading end, and performing data format check on the second data set through the second check script for checking the format to obtain a first sub data set passing the check and a second sub data set not passing the check;
preprocessing key value extraction on each data in the second sub data set to obtain a feature vector corresponding to each data in the second sub data set;
taking the feature vector corresponding to each data in the second sub data set as the input of the convolutional neural network to be trained, taking the label value corresponding to each feature vector as the output of the convolutional neural network to be trained, and training the convolutional neural network to obtain the convolutional neural network for identifying the wrong data format;
receiving a current data source to be subjected to format verification uploaded by a second uploading end;
preprocessing key value extraction is carried out on each data in the current data source to obtain a feature vector corresponding to each data in the current data source;
inputting the feature vectors corresponding to the data in the current data source into a convolutional neural network to obtain format check results corresponding to the data in the current data source; and
sending the format verification result to the second uploading end.
2. The external verification method for data format according to claim 1, wherein after sending the format verification result to the second upload terminal, the method further comprises:
and dividing the total number of the verification passing data in the format verification result corresponding to each data in the current data source by the total number of the data in the current data source to obtain the pre-judgment passing rate.
3. The external verification method for data format according to claim 2, wherein after the predetermined passing rate is obtained by dividing the total number of data passing verification in the format verification result corresponding to each data in the current data source by the total number of data in the current data source, the method further comprises:
receiving a current verification script uploaded by a second uploading end, and performing data format verification on the current data source through the current verification script for verifying the format to obtain a third subdata set passing the verification and a data occupation ratio passing the verification;
judging whether the data occupation ratio is lower than the pre-judgment passing rate or not;
if the data occupation ratio is lower than the pre-judgment passing rate, sending second notification information that the current verification script does not reach the verification target to a second uploading end;
and if the data occupation ratio is not lower than the pre-judgment passing rate, sending third notification information that the current verification script reaches the verification target to a second uploading end.
4. The external verification method for data format according to claim 3, after receiving the historical data source uploaded by the first uploading end, further comprising:
caching the historical data source into a cache region which is correspondingly established by taking the serial number of the historical data source as an identification name; the serial number of the historical data source comprises uploading time and data size.
5. The external verification method for data format according to any one of claims 1 to 4, wherein performing the key value extraction preprocessing on each data in the second sub data set to obtain a feature vector corresponding to each data in the second sub data set comprises:
acquiring data content corresponding to each data in the second sub-data set;
and extracting field values which are the same as related key fields in a preset key field list in the data content corresponding to each data so as to be connected in series into a feature vector corresponding to each data in the second sub-data set.
6. The external verification method for data format according to any one of claims 1 to 4, wherein before the using the feature vector corresponding to each data in the second sub data set as an input of the convolutional neural network to be trained, and using the labeled value corresponding to each feature vector as an output of the convolutional neural network to be trained, so as to train the convolutional neural network, and obtain the convolutional neural network for identifying the wrong data format, the method further comprises:
automatically labeling 0 corresponding to the feature vector corresponding to each data in the second sub-data set to serve as a labeled value corresponding to each feature vector; wherein, the marking value 0 indicates that the data has a format error.
7. An external verification device for data format, comprising:
a first receiving unit, configured to receive a historical data source uploaded by a first uploading end;
a first verification unit, configured to verify the data format of the historical data source using a first verification script for format verification, to obtain a first data set that passes verification and a second data set that fails verification;
a notification unit, configured to send to the first uploading end the second data set together with first notification information indicating that the first verification script is to be edited;
a second verification unit, configured to receive a second verification script uploaded by the first uploading end, and to verify the data format of the second data set using the second verification script, to obtain a first sub-data set that passes verification and a second sub-data set that fails verification;
a first feature extraction unit, configured to perform key-value extraction preprocessing on each data item in the second sub-data set, to obtain a feature vector corresponding to each data item in the second sub-data set;
a model training unit, configured to take the feature vectors corresponding to the data items in the second sub-data set as the input of a convolutional neural network to be trained and the labeled values corresponding to the feature vectors as its output, and to train the convolutional neural network to obtain a convolutional neural network for identifying erroneous data formats;
a second receiving unit, configured to receive a current data source, to be format-verified, uploaded by a second uploading end;
a second feature extraction unit, configured to perform key-value extraction preprocessing on each data item in the current data source, to obtain a feature vector corresponding to each data item in the current data source;
a third verification unit, configured to input the feature vectors corresponding to the data items in the current data source into the convolutional neural network, to obtain format verification results corresponding to the data items in the current data source; and
a result sending unit, configured to send the format verification results to the second uploading end.
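The two script-based verification passes performed by the first and second verification units can be sketched as follows. This is only an illustration under stated assumptions: the claims do not say what a verification script looks like, so each script is modelled here as a hypothetical regular-expression check, and the sample records are invented.

```python
import re
from typing import Callable, Iterable, List, Tuple

# Assumption: a "verification script" is modelled as a predicate on one record.
Check = Callable[[str], bool]


def partition(records: Iterable[str], check: Check) -> Tuple[List[str], List[str]]:
    """Split records into (passed, failed) according to one verification script."""
    passed: List[str] = []
    failed: List[str] = []
    for record in records:
        (passed if check(record) else failed).append(record)
    return passed, failed


# Hypothetical first script: accepts only dates written as YYYY-MM-DD.
ISO_DATE = re.compile(r"\d{4}-\d{2}-\d{2}")
# Hypothetical second (edited) script: additionally accepts YYYY/MM/DD.
SLASH_DATE = re.compile(r"\d{4}/\d{2}/\d{2}")


def first_script(record: str) -> bool:
    return ISO_DATE.fullmatch(record) is not None


def second_script(record: str) -> bool:
    return SLASH_DATE.fullmatch(record) is not None


# Invented historical data source.
historical = ["2019-11-27", "2019/11/27", "27.11.2019", "n/a"]

# First pass: first data set (passed) and second data set (failed).
first_set, second_set = partition(historical, first_script)
# Second pass over the failures: first and second sub-data sets.
first_sub, second_sub = partition(second_set, second_script)
```

In this sketch only the records that fail both scripts (`second_sub`) would go on to key-value extraction and model training, which matches the order of the units in the claim.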
8. The external verification device for data format of claim 7, further comprising:
a pre-judgment pass rate acquisition unit, configured to divide the number of data items that pass verification in the format verification results for the current data source by the total number of data items in the current data source, to obtain a pre-judgment pass rate.
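The pre-judgment pass rate is a simple ratio, shown here as a minimal sketch. The function name and the representation of results as booleans are assumptions; the claim does not say how an empty data source is handled, so this sketch rejects it.

```python
from typing import Sequence


def prejudged_pass_rate(results: Sequence[bool]) -> float:
    """Return (number of passing results) / (total number of results).

    `results` holds one boolean per data item in the current data source,
    True meaning the item passed format verification.
    """
    if not results:
        # Assumption: an empty data source has no defined pass rate.
        raise ValueError("the current data source contains no data")
    return sum(1 for passed in results if passed) / len(results)


# Invented verification results: three of four items pass.
rate = prejudged_pass_rate([True, True, False, True])
```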
9. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the external verification method of the data format according to any one of claims 1 to 6 when executing the computer program.
10. A computer-readable storage medium, characterized in that it stores a computer program which, when executed by a processor, causes the processor to carry out the method of external verification of a data format according to any one of claims 1 to 6.
CN201911182276.5A 2019-11-27 2019-11-27 External verification method and device for data format, computer equipment and storage medium Pending CN111062185A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201911182276.5A CN111062185A (en) 2019-11-27 2019-11-27 External verification method and device for data format, computer equipment and storage medium
PCT/CN2020/103952 WO2021103607A1 (en) 2019-11-27 2020-07-24 External data format checking method, device, computer apparatus, and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911182276.5A CN111062185A (en) 2019-11-27 2019-11-27 External verification method and device for data format, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
CN111062185A (en) 2020-04-24

Family

ID=70298960

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911182276.5A Pending CN111062185A (en) 2019-11-27 2019-11-27 External verification method and device for data format, computer equipment and storage medium

Country Status (2)

Country Link
CN (1) CN111062185A (en)
WO (1) WO2021103607A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021103607A1 (en) * 2019-11-27 2021-06-03 深圳壹账通智能科技有限公司 External data format checking method, device, computer apparatus, and storage medium

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8316277B2 (en) * 2007-12-06 2012-11-20 Fusion-Io, Inc. Apparatus, system, and method for ensuring data validity in a data storage process
CN108460058A (en) * 2017-02-22 2018-08-28 北京京东尚科信息技术有限公司 Data processing method and system
CN109711145A (en) * 2018-11-09 2019-05-03 深圳壹账通智能科技有限公司 Data verification method and device, storage medium, computer equipment
CN110222282A (en) * 2019-04-17 2019-09-10 深圳壹账通智能科技有限公司 Data processing method, device, server and storage medium
CN111062185A (en) * 2019-11-27 2020-04-24 深圳壹账通智能科技有限公司 External verification method and device for data format, computer equipment and storage medium

Also Published As

Publication number Publication date
WO2021103607A1 (en) 2021-06-03

Similar Documents

Publication Publication Date Title
CN110046073B (en) Log collection method and device, equipment and storage medium
CN105630977A (en) Application recommending method, device and system
CN103106186A (en) Form verification method and form verification system
CN111045921A (en) Automatic interface testing method and device, computer equipment and storage medium
CN109002424B (en) File format conversion method and device, computer equipment and storage medium
CN106570984B (en) Support card number verification method, the apparatus and system of a variety of Wiegand formats
CN109859002B (en) Product pushing method, device, computer equipment and storage medium
JP2009017298A (en) Data analysis apparatus
CN110069279B (en) Method, device and storage medium for checking direct current control protection program
CN106790727A (en) Information push method and device
CN108536580A (en) Utilize the system and method for lightweight device authentication protocol test equipment
WO2023169274A1 (en) Data processing method and device, and storage medium and processor
CN107911227A (en) A kind of breakpoint data follow-up method, electronic device and computer-readable recording medium
CN111126928B (en) Method and device for auditing release content
CN111062185A (en) External verification method and device for data format, computer equipment and storage medium
CN109783287B (en) Test instruction generation method, system, terminal and medium based on configuration file
CN115242896A (en) Dynamic message analysis method and device, electronic equipment and computer readable storage medium
CN114219596A (en) Data processing method based on decision tree model and related equipment
CN106168918A (en) Extended error correction coded data stores
CN111277569B (en) Network message decoding method and device and electronic equipment
CN112529321A (en) Risk prediction method and device based on user data and computer equipment
CN113726610B (en) Routing protocol-based UI (user interface) automatic test method, device, equipment and medium
CN110351330B (en) Data uploading method and device, computer equipment and storage medium
CN114386853A (en) Data auditing processing method, device and equipment based on universal auditing model
CN109040990B (en) Information acquisition method and device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination