CN110427375A - Method and device for identifying field categories - Google Patents
Method and device for identifying field categories
- Publication number
- CN110427375A (application CN201910690819.8A)
- Authority
- CN
- China
- Prior art keywords
- field
- classification information
- name
- identified
- recognized
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/22—Indexing; Data structures therefor; Storage structures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2455—Query execution
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- Software Systems (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The present invention provides a method and device for identifying field categories. The method comprises: matching the field name of a field to be identified with the field classification information in a dictionary library, wherein each piece of field classification information in the dictionary library includes a field name and a field category; in the case where field classification information matching the field name of the field to be identified exists in the dictionary library, obtaining the first field category in the matched field classification information; verifying the field to be identified according to the verification rule corresponding to the first field category; and, in the case where the verification passes, determining that the field category of the field to be identified is the first field category. The invention solves the problem that the category to which a field belongs cannot be identified quickly and accurately because the same field is named differently.
Description
Technical Field
The invention relates to the field of communication, in particular to a field type identification method and device.
Background
In a structured data store based on a table schema, fields of the same class differ in naming due to human reasons or differences in naming specifications, i.e., fields belonging to the same class have different names. For example, for a field whose category is an identification number, different names exist in different structured data storage systems, that is, the field may be named as a certificate number, an identification card, an identification number, and the like, and the categories of the fields identified by the different names are identification number categories. The difference in naming for the same field results in an inability to quickly and accurately identify the category to which the field belongs.
No effective technical solution has yet been proposed for the problem in the related art that the category of a field cannot be identified quickly and accurately because the same field is named differently.
Disclosure of Invention
The embodiment of the invention provides a field type identification method and device, which at least solve the problem that the type of a field cannot be quickly and accurately identified due to the fact that the same field is named differently in the related technology.
According to an embodiment of the present invention, there is provided a field category identification method, including:
matching the field name of the field to be recognized with field classification information in a dictionary library, wherein each field classification information in the dictionary library comprises: field name and field category;
under the condition that field classification information matched with the field name of the field to be recognized exists in the dictionary library, acquiring a first field type in the matched field classification information;
verifying the field to be identified according to a verification rule corresponding to the first field type;
and under the condition that the check is passed, determining that the field type of the field to be identified is the first field type.
Optionally, the field names of the fields to be identified include a first field name and a second field name; the matching of the field name of the field to be recognized and the field classification information in the dictionary library comprises the following steps: matching the first field name with the field classification information; and matching the second field name with the field classification information under the condition that the field classification information matched with the first field name does not exist in the dictionary database.
Optionally, the field to be identified includes instance data; before the matching of the field name of the field to be recognized with the field classification information in the dictionary library, the method further comprises: storing the corresponding relation between the field types and the check rules, wherein each field type corresponds to one check rule, and the check rules are used for checking whether the instance data belong to the field type corresponding to the check rule.
Optionally, after determining that the field category of the field to be identified is the first field category, the method further includes: checking whether the field name of the field to be identified exists in the field classification information of the dictionary library; and under the condition that the field name of the field to be recognized does not exist in the field classification information of the dictionary library, adding a field classification information in the dictionary library, wherein the added field classification information comprises the field name of the field to be recognized and the first field category.
Optionally, the dictionary database includes a plurality of the field classification information, where for each field category, at least one corresponding field classification information exists in the dictionary database; the method further comprises the following steps: and under the condition that the dictionary library does not have field classification information matched with the field name of the field to be recognized, determining the class of the field to be recognized as unknown classification.
Optionally, the added field classification information is used to identify the field classification of the next field to be identified.
Optionally, when the verification fails, determining that the category of the field to be identified is an unknown classification.
According to another embodiment of the present invention, there is also provided an apparatus for identifying a field category, including:
the matching module is used for matching the field name of the field to be identified with the field classification information in the dictionary library, wherein each field classification information in the dictionary library comprises: field name and field category;
the acquisition module is used for acquiring a first field category in the matched field classification information under the condition that the field classification information matched with the field name of the field to be recognized exists in the dictionary library;
the verification module is used for verifying the field to be identified according to a verification rule corresponding to the first field type;
and the determining module is used for determining the field type of the field to be identified as the first field type under the condition that the check is passed.
Optionally, the field names of the fields to be identified include a first field name and a second field name; wherein the matching module comprises: the first matching unit is used for matching the first field name with the field classification information; and the second matching unit is used for matching the second field name with the field classification information under the condition that the field classification information matched with the first field name does not exist in the dictionary database.
Optionally, the field to be identified includes: instance data; wherein the apparatus further comprises: and the storage module is used for storing the corresponding relation between the field types and the check rules before the field names of the fields to be identified are matched with the field classification information in the dictionary library, wherein each field type corresponds to one check rule, and the check rules are used for checking whether the example data belong to the field type corresponding to the check rule.
Optionally, the apparatus further comprises: the checking module is used for checking whether the field name of the field to be recognized exists in the field classification information of the dictionary library after the field class of the field to be recognized is determined to be the first field class; and the adding module is used for adding field classification information in the dictionary library under the condition that the field name of the field to be recognized does not exist in the field classification information of the dictionary library, wherein the added field classification information comprises the field name of the field to be recognized and the first field category.
Optionally, the dictionary database includes a plurality of the field classification information, where for each field category, at least one corresponding field classification information exists in the dictionary database; the determining module is further configured to determine that the category of the field to be recognized is unknown classification under the condition that field classification information matched with the field name of the field to be recognized does not exist in the dictionary repository.
Optionally, the matching module is further configured to identify a field category of a next field to be identified by using the added field classification information.
Optionally, the determining module is further configured to determine that the category of the field to be identified is an unknown classification when the verification fails.
According to another embodiment of the present invention, there is also provided a storage medium having a computer program stored therein, wherein the computer program is configured to execute, when run, any one of the above field category identification methods.
According to another embodiment of the present invention, there is also provided an electronic apparatus, including a memory and a processor, where the memory stores a computer program, and the processor is configured to execute the computer program to perform any one of the above field category identification methods.
According to the invention, the field name of the field to be identified is matched with the field classification information in the dictionary library, wherein each field classification information in the dictionary library comprises: field name and field category; under the condition that field classification information matched with the field name of the field to be recognized exists in the dictionary library, acquiring a first field type in the matched field classification information; verifying the field to be identified according to a verification rule corresponding to the first field type; and under the condition that the check is passed, determining that the field type of the field to be identified is the first field type. By adopting the technical scheme, the problem that the category of the field cannot be quickly and accurately identified due to the naming difference of the same field in the related technology is solved, and the effect of quickly and accurately identifying the category of the field is achieved.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the invention without limiting the invention. In the drawings:
FIG. 1 is a flow diagram of a field category identification method according to an embodiment of the invention;
FIG. 2 is a flow diagram of another optional field category identification method according to an embodiment of the invention;
FIG. 3 is a block diagram of a structure of a field category identification apparatus according to an embodiment of the present invention.
Detailed Description
The invention will be described in detail hereinafter with reference to the accompanying drawings in conjunction with embodiments. It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict.
It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order.
Example 1
An embodiment of the present invention provides a field type identification method, and fig. 1 is a flowchart of a field type identification method according to an embodiment of the present invention, as shown in fig. 1, including:
step S102, matching the field name of the field to be identified with the field classification information in the dictionary library, wherein each field classification information in the dictionary library comprises: field name and field category;
step S104, under the condition that field classification information matched with the field name of the field to be recognized exists in the dictionary database, acquiring a first field type in the matched field classification information;
step S106, according to the check rule corresponding to the first field type, checking the field to be identified;
and step S108, determining the field type of the field to be identified as the first field type under the condition that the verification is passed.
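Steps S102 to S108 can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions, not the claimed implementation: the names `DICTIONARY`, `CHECK_RULES` and `identify_category`, the exact-match lookup, and the simplified rule (18 characters, digits with an optional trailing X) are all introduced here for illustration.

```python
from typing import Callable, Dict, Optional

# Hypothetical dictionary library: field name -> field category.
DICTIONARY: Dict[str, str] = {
    "sfzh": "identity card class",
    "zjhm": "identity card class",
}

# Hypothetical check rules: field category -> predicate over instance data.
# The rule below is a simplified assumption, not the rule of the invention.
CHECK_RULES: Dict[str, Callable[[str], bool]] = {
    "identity card class": lambda v: len(v) == 18
    and v[:17].isdigit()
    and (v[17].isdigit() or v[17] in "Xx"),
}

UNKNOWN = "unknown classification"


def identify_category(field_name: str, instance_data: str) -> str:
    # S102: match the field name against the dictionary library.
    first_category: Optional[str] = DICTIONARY.get(field_name)
    if first_category is None:
        return UNKNOWN
    # S104: first_category is the category of the matched entry.
    # S106: check the field with the rule stored for that category.
    rule = CHECK_RULES.get(first_category)
    if rule is None or not rule(instance_data):
        return UNKNOWN
    # S108: the check passed, so the first category is the result.
    return first_category
```

A name that is absent from the dictionary, or instance data that fails the rule, yields the unknown classification, which corresponds to the optional embodiments described below.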
According to the invention, the field name of the field to be identified is matched with the field classification information in the dictionary library, wherein each field classification information in the dictionary library comprises: field name and field category; under the condition that field classification information matched with the field name of the field to be recognized exists in the dictionary library, acquiring a first field type in the matched field classification information; according to the check rule corresponding to the first field type, checking the field to be identified; and under the condition that the check is passed, determining the field type of the field to be identified as the first field type. By adopting the technical scheme, the problem that the category of the field cannot be quickly and accurately identified due to the naming difference of the same field in the related technology is solved, and the effect of quickly and accurately identifying the category of the field is achieved. The first field type identified for the field to be identified can be used to probe data quality and to evaluate and check the legality and correctness of the data. Subsequent data governance can also apply the processing logic corresponding to the classification result of the field, which greatly reduces the amount of computation in the data governance process while improving the quality of the data governance.
In an optional embodiment of the present invention, the field name of the field to be identified includes a first field name and a second field name; the matching of the field name of the field to be recognized and the field classification information in the dictionary library comprises the following steps: matching the first field name with the field classification information; and matching the second field name with the field classification information under the condition that the field classification information matched with the first field name does not exist in the dictionary database.
In this embodiment, the field to be recognized may have a plurality of field names, for example, the field to be recognized has a chinese name (i.e., the first field name) and an english name (i.e., the second field name).
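The fallback from the first field name to the second field name may be sketched as below. The function name `match_category` and the representation of the dictionary library as a plain mapping from field name to field category are assumptions made for illustration.

```python
from typing import Dict, Optional


def match_category(first_name: str, second_name: str,
                   dictionary: Dict[str, str]) -> Optional[str]:
    """Return the matched field category, or None if neither name matches."""
    # Match the first field name (e.g. the Chinese name) first.
    category = dictionary.get(first_name)
    if category is None:
        # Only when the first name has no match is the second name tried.
        category = dictionary.get(second_name)
    return category
```

Trying the names in a fixed order keeps the result deterministic when the two names would match entries of different categories.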
In an optional embodiment of the present invention, the field to be identified includes instance data; before the matching of the field name of the field to be recognized with the field classification information in the dictionary library, the method further comprises: storing the corresponding relation between the field types and the check rules, wherein each field type corresponds to one check rule, and the check rules are used for checking whether the instance data belong to the field type corresponding to the check rule.
In this embodiment, a corresponding relationship between field categories and check rules is stored in advance, each field category corresponds to one check rule, and the check rules are used to check whether a field to be identified belongs to a field category corresponding to the check rule. For example, the instance data included in the field to be identified may be checked using a check rule.
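One possible way of storing the correspondence between field categories and check rules in advance is sketched below. The helper names `store_rule` and `check`, the regular expressions, and the "mobile number class" category are hypothetical and serve only to show that each category maps to exactly one rule.

```python
import re
from typing import Callable, Dict

CheckRule = Callable[[str], bool]

# Correspondence between field categories and check rules (one rule each).
_rules: Dict[str, CheckRule] = {}


def store_rule(category: str, rule: CheckRule) -> None:
    """Store (or overwrite) the single check rule of a field category."""
    _rules[category] = rule


def check(category: str, instance_data: str) -> bool:
    """Check whether the instance data belong to the given field category."""
    rule = _rules.get(category)
    return rule is not None and rule(instance_data)


# Rules are stored before any field name is matched.
# Both patterns are simplified assumptions.
store_rule("identity card class",
           lambda v: re.fullmatch(r"\d{17}[\dXx]", v) is not None)
store_rule("mobile number class",
           lambda v: re.fullmatch(r"1\d{10}", v) is not None)
```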
In an optional embodiment of the present invention, after determining that the field category of the field to be identified is the first field category, the method further includes: checking whether the field name of the field to be identified exists in the field classification information of the dictionary library; and under the condition that the field name of the field to be recognized does not exist in the field classification information of the dictionary library, adding a field classification information in the dictionary library, wherein the added field classification information comprises the field name of the field to be recognized and the first field category.
In this embodiment, field classification information is newly added to the dictionary repository, the field name of the newly added field classification information is the field name of the field to be recognized, and the field category is the field category (i.e., the first field category) in which the field to be recognized is recognized.
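Adding a piece of field classification information only when the field name is not yet present may be sketched as follows. The function name `add_if_absent` and the keys `"field name"` and `"field category"` are assumptions chosen for illustration.

```python
from typing import Dict, List


def add_if_absent(dictionary: List[Dict[str, str]],
                  field_name: str, first_category: str) -> bool:
    """Add one entry for field_name unless the name already exists.

    Returns True when a new entry was added.
    """
    if any(entry["field name"] == field_name for entry in dictionary):
        return False
    dictionary.append({"field name": field_name,
                       "field category": first_category})
    return True
```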
In an optional embodiment of the present invention, the dictionary repository includes a plurality of the field classification information, wherein, for each field category, there is at least one corresponding field classification information in the dictionary repository; the method further comprises the following steps: and under the condition that the dictionary library does not have field classification information matched with the field name of the field to be recognized, determining the class of the field to be recognized as unknown classification.
In this embodiment, a dictionary library stores a plurality of field classification information, each of which includes a field name and a field category. For a specified field category, at least one field classification information belonging to the specified field category exists in the dictionary database, namely, the field categories of the at least one field classification information in the dictionary database are all the specified field categories, and the field names of the field classification information can be different.
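The relation between field categories and field names in the dictionary library can be illustrated by grouping the entries by category. The function name `names_by_category` and the entry keys are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set


def names_by_category(dictionary: List[Dict[str, str]]) -> Dict[str, Set[str]]:
    """Group the field names of the dictionary library by field category."""
    grouped: Dict[str, Set[str]] = defaultdict(set)
    for entry in dictionary:
        grouped[entry["field category"]].add(entry["field name"])
    return dict(grouped)
```

Every category that appears in the result has at least one field name, and several differently named entries may share one category.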
In an optional embodiment of the present invention, the added field classification information is used to identify the field class of the next field to be identified.
In this embodiment, when the field type of the next field to be identified is identified, the field name of the next field to be identified may be matched with the added field classification information. And updating the dictionary database by using the field to be recognized of the recognized field type, and recognizing the field type of the next field to be recognized by using the updated dictionary database. Therefore, the dictionary database can be automatically updated, the accuracy of identifying the category of the field to be identified by using the dictionary database is further improved, and the effect of further improving the identification accuracy of the field category is achieved.
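The effect of the automatic update on the next field to be identified may be sketched as follows. The function `recognize`, the field name "resident id" and the simplified length rule are assumptions; the sketch also assumes, as stated above, that every category has one check rule.

```python
from typing import Callable, Dict, List

UNKNOWN = "unknown classification"


def recognize(names: List[str], instance_data: str,
              dictionary: Dict[str, str],
              rules: Dict[str, Callable[[str], bool]]) -> str:
    # Try the field names in order and take the first matching category.
    category = next((dictionary[n] for n in names if n in dictionary), None)
    if category is None or not rules[category](instance_data):
        return UNKNOWN
    # Update the dictionary with any name of the field not yet present.
    for n in names:
        dictionary.setdefault(n, category)
    return category


dictionary = {"identity card": "identity card class"}
rules = {"identity card class": lambda v: len(v) == 18 and v[:17].isdigit()}
sample = "340323198910100533"

# Neither name is known yet, so the field cannot be classified.
before = recognize(["resident id", "sfzhhm"], sample, dictionary, rules)
# This field matches by "identity card"; "sfzhhm" is added to the dictionary.
first = recognize(["identity card", "sfzhhm"], sample, dictionary, rules)
# The same field as in `before` now matches through the added "sfzhhm".
after = recognize(["resident id", "sfzhhm"], sample, dictionary, rules)
```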
In an optional embodiment of the present invention, when the verification fails, the category of the field to be identified is determined to be an unknown classification. In this embodiment, when the field to be identified fails the check performed with the check rule corresponding to the first field type, the category of the field to be identified is determined to be unknown.
The following describes the field type identification method with reference to an example, but the method is not limited to the technical solution of the embodiment of the present invention, and the technical solution of the example of the present invention is as follows:
step 1, storing N pieces of field classification information in a dictionary library, wherein the N pieces of field classification information relate to M field categories. Any field classification information f among the N pieces corresponds to one of the M field categories and comprises a Chinese field name and an English field name;
step 2, the field F to be classified (i.e., the above-mentioned field to be identified) includes instance data V, which may be the instance data stored in the field F. The Chinese field name or the English field name of the field F is matched with the dictionary library. Optionally, the Chinese field name of the field F may first be matched with the field classification information in the dictionary library, and when field classification information matching the field F exists in the dictionary library, the field category C of the matched field classification information is extracted; when no matching item exists in the dictionary library, the English field name of the field F is matched with the field classification information in the dictionary library; if neither the Chinese field name nor the English field name of the field F finds a matching item in the dictionary library, the field F is identified as unknown classification, and the process ends;
step 3, verifying the instance data V according to the verification rule corresponding to the field type C. If the verification is passed, determining the field type of the field F as the field type C; if the verification fails, confirming that the field F does not belong to the field type C, identifying the field F as unknown classification, and ending the process;
step 4, checking whether the Chinese field name or the English field name of the field F exists in the dictionary library, and if not, synchronizing the field names (namely the Chinese field name and the English field name) of the field F into the dictionary library;
step 5, repeatedly executing steps 1-4 for each new field to be identified.
The following describes the above field type identification method with reference to another example, but the method is not limited to the technical solution of the embodiment of the present invention, and the technical solution of the example of the present invention is as follows:
Three pieces of field classification information are stored in the classification dictionary library Dict, that is, the classification dictionary library Dict can be expressed as follows:
Dict = [{"Chinese name": "identification number", "English name": "sfzh", "category": "identity card class"},
{"Chinese name": "certificate number", "English name": "zjhm", "category": "identity card class"},
{"Chinese name": "identity card", "English name": "sfz", "category": "identity card class"}]
The field F to be identified is represented in the form:
F = {"Chinese name": "identity card", "English name": "sfzhhm", "sample data": "340323198910100533"}. In the present embodiment, the sample data, i.e., the above-mentioned instance data, is the data stored in the field F.
The category to which the field F belongs is identified according to the following steps:
Matching is performed in the dictionary library according to the Chinese name (i.e., the Chinese field name) "identity card" of the field F.
Matching field classification information exists in the dictionary library, namely the field classification information {"Chinese name": "identity card", "English name": "sfz", "category": "identity card class"}; the category "identity card class" of this field classification information is extracted.
The sample data in the field F is checked, namely, data checking is performed on the sample data in the field F according to the checking rule (for example, an identity card number rule) corresponding to the category "identity card class", for example, checking whether the sample data is a valid 18-digit citizen identity card number.
Under the condition that the verification result is true, the field F belongs to the identity card class, and the field type of the field F is identified as the identity card class.
Checking whether the English name of the field F exists in the dictionary library Dict, and if not, synchronizing the field F to the dictionary library Dict, for example, adding the following field classification information in the dictionary library: {"Chinese name": "identity card", "English name": "sfzhhm", "category": "identity card class"}.
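The example above can be reproduced with the following Python sketch. The text only states that the sample data is checked for being a valid 18-digit citizen identity card number; the concrete rule used here (format, a valid birth date in positions 7 to 14, and the weighted check character of GB 11643-1999) is an assumption about what such a rule might contain, and the helper name `is_valid_id_number` is hypothetical.

```python
import re
from datetime import datetime

# Weights and check characters of the GB 11643-1999 check-digit scheme.
WEIGHTS = (7, 9, 10, 5, 8, 4, 2, 1, 6, 3, 7, 9, 10, 5, 8, 4, 2)
CHECK_CHARS = "10X98765432"


def is_valid_id_number(value: str) -> bool:
    """Assumed check rule for the category "identity card class"."""
    if re.fullmatch(r"[0-9]{17}[0-9Xx]", value) is None:
        return False
    try:
        # Characters 7 to 14 must form a real calendar date.
        datetime.strptime(value[6:14], "%Y%m%d")
    except ValueError:
        return False
    total = sum(int(d) * w for d, w in zip(value[:17], WEIGHTS))
    return CHECK_CHARS[total % 11] == value[17].upper()


Dict = [
    {"Chinese name": "identification number", "English name": "sfzh",
     "category": "identity card class"},
    {"Chinese name": "certificate number", "English name": "zjhm",
     "category": "identity card class"},
    {"Chinese name": "identity card", "English name": "sfz",
     "category": "identity card class"},
]
F = {"Chinese name": "identity card", "English name": "sfzhhm",
     "sample data": "340323198910100533"}

# Match by Chinese name, check the sample data, then synchronize F.
match = next((e for e in Dict if e["Chinese name"] == F["Chinese name"]), None)
category = "unknown classification"
if match is not None and is_valid_id_number(F["sample data"]):
    category = match["category"]
    if not any(e["English name"] == F["English name"] for e in Dict):
        Dict.append({"Chinese name": F["Chinese name"],
                     "English name": F["English name"],
                     "category": category})
```

With the sample data of the example, the check passes and a fourth entry with the English name "sfzhhm" is appended to Dict.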
The following describes the above field category identification method with reference to another example, but the method is not limited to the technical solution of the embodiment of the present invention, fig. 2 is a flowchart of another optional field category identification method according to the embodiment of the present invention, as shown in fig. 2, the technical solution of the example of the present invention is as follows:
step 101, extracting the Chinese name, the English name and the sample data of the field to be identified, and turning to step 102;
step 102, matching the Chinese name and the English name obtained in step 101 with the field classification information in the dictionary library in sequence, and if the dictionary library has field classification information matching the Chinese name or the English name of the field to be recognized, turning to step 104; otherwise, turning to step 103;
step 103, identifying the classification of the field to be identified as unknown classification, and ending;
step 104, using the verification rule corresponding to the field type of the field classification information matched in step 102 to perform data verification on the sample data obtained in step 101; if the verification result is false, turning to step 105, and if the verification result is true, turning to step 106;
step 105, identifying the classification of the field to be identified as unknown classification, and ending;
step 106, identifying the classification of the field to be identified as the classification of the matched field classification information, and turning to step 107;
step 107, checking whether the Chinese name or the English name of the field to be recognized exists in the dictionary library; if not, turning to step 108, otherwise, ending;
step 108, synchronizing the Chinese name, the English name and the classification of the field to be identified into the dictionary library, and ending.
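The branching of steps 101 to 108 may be traced with the sketch below, which returns the classification together with the sequence of step numbers visited. The function name `run_flow`, the mapping-based dictionary library, and the reading of step 107 as "add whichever of the two names is missing" are assumptions made for illustration.

```python
from typing import Callable, Dict, List, Tuple

UNKNOWN = "unknown classification"


def run_flow(field: Dict[str, str], dictionary: Dict[str, str],
             rules: Dict[str, Callable[[str], bool]]) -> Tuple[str, List[int]]:
    # Step 101: extract the Chinese name, English name and sample data.
    trace = [101]
    names = [field["Chinese name"], field["English name"]]
    sample = field["sample data"]
    # Step 102: match the Chinese name, then the English name.
    trace.append(102)
    category = next((dictionary[n] for n in names if n in dictionary), None)
    if category is None:
        trace.append(103)  # no match: unknown classification
        return UNKNOWN, trace
    # Step 104: verify the sample data with the rule of the category.
    trace.append(104)
    if not rules[category](sample):
        trace.append(105)  # verification failed: unknown classification
        return UNKNOWN, trace
    trace.append(106)  # the matched category is the result
    # Step 107: check whether both names already exist in the dictionary.
    trace.append(107)
    missing = [n for n in names if n not in dictionary]
    if missing:
        trace.append(108)  # synchronize the missing names
        for n in missing:
            dictionary[n] = category
    return category, trace
```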
Through the above description of the embodiments, those skilled in the art can clearly understand that the method according to the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but the former is a better implementation mode in many cases. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which is stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal device (e.g., a mobile phone, a computer, a server, or a network device) to execute the method according to the embodiments of the present invention.
Example 2
In this embodiment, a field type identification apparatus is further provided, and the apparatus is used to implement the foregoing embodiments and preferred embodiments, and the description already made is omitted. As used below, the term "module" may be a combination of software and/or hardware that implements a predetermined function. Although the means described in the embodiments below are preferably implemented in software, an implementation in hardware, or a combination of software and hardware is also possible and contemplated.
Fig. 3 is a block diagram of a structure of an apparatus for identifying field categories according to an embodiment of the present invention, as shown in fig. 3, the apparatus including:
a matching module 202, configured to match a field name of a field to be identified with field classification information in a dictionary library, where each field classification information in the dictionary library includes: field name and field category;
an obtaining module 204, configured to obtain a first field category in the matched field classification information when the field classification information matching the field name of the field to be identified exists in the dictionary repository;
a checking module 206, configured to check the field to be identified according to a checking rule corresponding to the first field type;
a determining module 208, configured to determine, when the check is passed, that the field category of the field to be identified is the first field category.
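Since a module may be implemented in software, the cooperation of modules 202 to 208 can be sketched as plain Python classes. All class and method names below are hypothetical, and the entry keys and exact-match lookup are assumptions; the sketch only illustrates how the four modules divide the work.

```python
from typing import Callable, Dict, List, Optional

UNKNOWN = "unknown classification"


class MatchingModule:
    """Module 202: matches a field name against the dictionary library."""

    def __init__(self, dictionary: List[Dict[str, str]]) -> None:
        self.dictionary = dictionary

    def match(self, field_name: str) -> Optional[Dict[str, str]]:
        for entry in self.dictionary:
            if entry["field name"] == field_name:
                return entry
        return None


class ObtainingModule:
    """Module 204: obtains the first field category from a matched entry."""

    def obtain(self, entry: Dict[str, str]) -> str:
        return entry["field category"]


class CheckingModule:
    """Module 206: checks instance data with the rule of a category."""

    def __init__(self, rules: Dict[str, Callable[[str], bool]]) -> None:
        self.rules = rules

    def check(self, category: str, instance_data: str) -> bool:
        rule = self.rules.get(category)
        return rule is not None and rule(instance_data)


class DeterminingModule:
    """Module 208: determines the final field category."""

    def determine(self, category: str, passed: bool) -> str:
        return category if passed else UNKNOWN


class IdentificationApparatus:
    """Wires the four modules together in the order 202, 204, 206, 208."""

    def __init__(self, dictionary: List[Dict[str, str]],
                 rules: Dict[str, Callable[[str], bool]]) -> None:
        self.matching = MatchingModule(dictionary)
        self.obtaining = ObtainingModule()
        self.checking = CheckingModule(rules)
        self.determining = DeterminingModule()

    def identify(self, field_name: str, instance_data: str) -> str:
        entry = self.matching.match(field_name)
        if entry is None:
            return UNKNOWN
        category = self.obtaining.obtain(entry)
        passed = self.checking.check(category, instance_data)
        return self.determining.determine(category, passed)
```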
According to the present invention, the field name of the field to be identified is matched against the field classification information in the dictionary library, where each piece of field classification information in the dictionary library includes a field name and a field category; when field classification information matching the field name of the field to be identified exists in the dictionary library, a first field category is obtained from the matched field classification information; the field to be identified is verified according to a verification rule corresponding to the first field category; and when the verification passes, the field category of the field to be identified is determined to be the first field category. This technical solution addresses the problem in the related art that the category of a field cannot be identified quickly and accurately because the same field is named inconsistently, and thereby makes it possible to identify field categories quickly and accurately.
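The match, obtain, verify, and determine flow described above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the dictionary contents, the category names, the regular expressions, exact-string matching of field names, and the choice to require every instance value to pass are all hypothetical and are not specified by the embodiment.

```python
import re
from typing import Callable, Dict, List

UNKNOWN = "unknown"  # hypothetical label for a field that cannot be classified

# Hypothetical dictionary library: each entry pairs a field name with a field category.
DICTIONARY_LIBRARY: Dict[str, str] = {
    "mobile": "phone_number",
    "tel": "phone_number",
    "email": "email_address",
}

# Hypothetical verification rules, one per field category.
VERIFICATION_RULES: Dict[str, Callable[[str], bool]] = {
    "phone_number": lambda v: re.fullmatch(r"\d{11}", v) is not None,
    "email_address": lambda v: re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", v) is not None,
}


def identify_field_category(field_name: str, instance_data: List[str]) -> str:
    """Return the field category, or UNKNOWN if matching or verification fails."""
    # Matching + obtaining: exact lookup of the field name (an assumption).
    first_category = DICTIONARY_LIBRARY.get(field_name)
    if first_category is None:
        return UNKNOWN
    # Verifying: apply the rule of the first field category to the instance data.
    rule = VERIFICATION_RULES[first_category]
    if instance_data and all(rule(value) for value in instance_data):
        # Determining: the verification passed, so the first category is accepted.
        return first_category
    return UNKNOWN
```

The name lookup is cheap, and the verification step guards against a column that carries a familiar name but holds data of a different kind.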
In an optional embodiment of the present invention, the field name of the field to be identified includes a first field name and a second field name, and the matching module includes: a first matching unit, configured to match the first field name against the field classification information; and a second matching unit, configured to match the second field name against the field classification information when no field classification information matching the first field name exists in the dictionary library.
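The two-stage matching performed by the first and second matching units might look like the following sketch. The function name `match_field_names` and the use of a plain `dict` as the dictionary library are assumptions made for illustration, as is the example of a cryptic first name paired with a more descriptive second name.

```python
from typing import Dict, Optional


def match_field_names(
    first_name: str, second_name: str, dictionary_library: Dict[str, str]
) -> Optional[str]:
    """Try the first field name; fall back to the second only if the first has no match."""
    # First matching unit.
    category = dictionary_library.get(first_name)
    if category is not None:
        return category
    # Second matching unit: consulted only when the first name is not in the library.
    return dictionary_library.get(second_name)


# Hypothetical usage: a cryptic first name with a more descriptive second name.
library = {"phone": "phone_number"}
print(match_field_names("col_7", "phone", library))
```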
In an optional embodiment of the present invention, the field to be identified includes instance data, and the apparatus further includes: a storage module, configured to store, before the field name of the field to be identified is matched against the field classification information in the dictionary library, a correspondence between field categories and verification rules, where each field category corresponds to one verification rule, and each verification rule is used to verify whether the instance data belongs to the field category corresponding to that rule.
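One way to picture the storage module is a small registry that maps each field category to exactly one verification rule. The `RuleStore` class below, the regular-expression rule, and the decisions to reject a second rule for the same category and to treat empty instance data as a failed verification are assumptions of this sketch, not details given in the embodiment.

```python
import re
from typing import Callable, Dict, Iterable

Rule = Callable[[str], bool]


class RuleStore:
    """Hypothetical store for the correspondence between field categories and rules."""

    def __init__(self) -> None:
        self._rules: Dict[str, Rule] = {}

    def store(self, category: str, rule: Rule) -> None:
        # Each field category corresponds to one verification rule,
        # so a second rule for the same category is rejected in this sketch.
        if category in self._rules:
            raise ValueError(f"a rule is already stored for category {category!r}")
        self._rules[category] = rule

    def verify(self, category: str, instance_data: Iterable[str]) -> bool:
        """Check whether the instance data belongs to the given field category."""
        rule = self._rules[category]
        values = list(instance_data)
        # Assumption: empty data cannot confirm a category; all values must pass.
        return bool(values) and all(rule(value) for value in values)


# Hypothetical rule: an 18-character identity number (17 digits plus a digit or X).
store = RuleStore()
store.store("id_number", lambda v: re.fullmatch(r"\d{17}[\dXx]", v) is not None)
```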
In an optional embodiment of the present invention, the apparatus further includes: a checking module, configured to check, after the field category of the field to be identified is determined to be the first field category, whether the field name of the field to be identified exists in the field classification information of the dictionary library; and an adding module, configured to add a piece of field classification information to the dictionary library when the field name of the field to be identified does not exist in the field classification information of the dictionary library, where the added field classification information includes the field name of the field to be identified and the first field category.
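The checking and adding modules amount to an "add if absent" update of the dictionary library. The sketch below assumes a plain `dict` as the library and a hypothetical helper name `add_if_absent`. One plausible situation in which the name is absent is a field whose category was found through its second field name, so that its first field name is still unknown to the library.

```python
from typing import Dict


def add_if_absent(
    dictionary_library: Dict[str, str], field_name: str, first_category: str
) -> bool:
    """Add (field_name, first_category) to the library unless the name already exists.

    Returns True when a new piece of field classification information was added.
    """
    # Checking module: does the field name already exist in the library?
    if field_name in dictionary_library:
        return False
    # Adding module: record the new name under the verified category.
    dictionary_library[field_name] = first_category
    return True
```

Existing entries are left unchanged, so a name that is already in the library keeps its current category.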
In an optional embodiment of the present invention, the dictionary library includes a plurality of pieces of field classification information, where, for each field category, at least one corresponding piece of field classification information exists in the dictionary library; the determining module is further configured to determine that the category of the field to be identified is an unknown category when no field classification information matching the field name of the field to be identified exists in the dictionary library.
In an optional embodiment of the present invention, the matching module 202 is further configured to use the added field classification information to identify the field category of a next field to be identified.
In an optional embodiment of the present invention, the determining module 208 is further configured to determine that the category of the field to be identified is an unknown category when the verification fails.
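The optional behaviours above (an unknown category when nothing matches, an unknown category when verification fails, and reuse of newly added dictionary entries for the next field) can be combined in one loop. The following is a sketch under assumptions: the representation of a field as a pair of candidate names and instance data, the `classify_all` helper, and the sample rule are all hypothetical.

```python
import re
from typing import Callable, Dict, List, Optional, Sequence, Tuple

UNKNOWN = "unknown"
Rule = Callable[[str], bool]
Field = Tuple[List[str], List[str]]  # (candidate field names, instance data)


def classify_all(
    fields: Sequence[Field],
    dictionary_library: Dict[str, str],
    rules: Dict[str, Rule],
) -> List[str]:
    """Classify fields in order, letting earlier results enrich the library."""
    results: List[str] = []
    for names, instance_data in fields:
        # Take the category of the first candidate name found in the library.
        category: Optional[str] = next(
            (dictionary_library[n] for n in names if n in dictionary_library), None
        )
        if category is None:
            results.append(UNKNOWN)  # no matching field classification information
            continue
        rule = rules[category]
        if not instance_data or not all(rule(v) for v in instance_data):
            results.append(UNKNOWN)  # verification failed
            continue
        results.append(category)
        # Add any names not yet in the library so the next field can match them.
        for n in names:
            dictionary_library.setdefault(n, category)
    return results


library = {"phone": "phone_number"}
rules = {"phone_number": lambda v: re.fullmatch(r"\d{11}", v) is not None}
fields = [
    (["contact_no", "phone"], ["13800138000"]),  # matched via the second name
    (["contact_no"], ["13900139000"]),           # matched via the entry just added
    (["phone"], ["n/a"]),                        # name matches, data does not
    (["nickname"], ["bob"]),                     # no match in the library
]
print(classify_all(fields, library, rules))
```

In this example the second field is classified only because the first field added `contact_no` to the library.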
It should be noted that the above modules may be implemented by software or by hardware. In the latter case, this may be done in, but is not limited to, the following ways: the modules are all located in the same processor; or the modules are located in different processors in any combination.
It will be apparent to those skilled in the art that the modules or steps of the present invention described above may be implemented by a general-purpose computing device. They may be centralized on a single computing device or distributed across a network of multiple computing devices. Optionally, they may be implemented by program code executable by a computing device, so that they may be stored in a storage device and executed by the computing device; in some cases, the steps shown or described may be performed in an order different from that described herein. Alternatively, they may each be fabricated as individual integrated circuit modules, or several of the modules or steps may be fabricated as a single integrated circuit module. Thus, the present invention is not limited to any specific combination of hardware and software.
Example 3
Embodiments of the present invention also provide a storage medium comprising a stored program, wherein the program is arranged to perform any of the methods described above when executed.
Optionally, in this embodiment, the storage medium may be configured to store program code for performing the following steps:
step S1, matching the field name of the field to be identified against the field classification information in the dictionary library, where each piece of field classification information in the dictionary library includes a field name and a field category;
step S2, when field classification information matching the field name of the field to be identified exists in the dictionary library, obtaining a first field category from the matched field classification information;
step S3, verifying the field to be identified according to the verification rule corresponding to the first field category;
step S4, when the verification passes, determining that the field category of the field to be identified is the first field category.
Optionally, in this embodiment, the storage medium may include, but is not limited to, various media capable of storing program code, such as a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, or an optical disk.
Optionally, for specific examples in this embodiment, reference may be made to the examples described in the above embodiments and optional implementations, which are not repeated here.
Example 4
Embodiments of the present invention also provide an electronic device comprising a memory having a computer program stored therein and a processor arranged to run the computer program to perform the steps of any of the above method embodiments.
Optionally, the electronic device may further include a transmission device and an input/output device, each of which is connected to the processor.
Optionally, in this embodiment, the processor may be configured to execute the following steps by means of the computer program:
step S1, matching the field name of the field to be identified against the field classification information in the dictionary library, where each piece of field classification information in the dictionary library includes a field name and a field category;
step S2, when field classification information matching the field name of the field to be identified exists in the dictionary library, obtaining a first field category from the matched field classification information;
step S3, verifying the field to be identified according to the verification rule corresponding to the first field category;
step S4, when the verification passes, determining that the field category of the field to be identified is the first field category.
Optionally, for specific examples in this embodiment, reference may be made to the examples described in the above embodiments and optional implementations, which are not repeated here.
The above are merely preferred embodiments of the present invention and are not intended to limit it; those skilled in the art may make various modifications and changes. Any modification, equivalent replacement, or improvement made within the principle of the present invention shall fall within the protection scope of the present invention.
Claims (10)
1. A method for identifying a field category, comprising:
matching the field name of a field to be identified against field classification information in a dictionary library, wherein each piece of field classification information in the dictionary library comprises a field name and a field category;
when field classification information matching the field name of the field to be identified exists in the dictionary library, obtaining a first field category from the matched field classification information;
verifying the field to be identified according to a verification rule corresponding to the first field category; and
when the verification passes, determining that the field category of the field to be identified is the first field category.
2. The method of claim 1, wherein the field name of the field to be identified comprises a first field name and a second field name;
and wherein matching the field name of the field to be identified against the field classification information in the dictionary library comprises:
matching the first field name against the field classification information; and
when no field classification information matching the first field name exists in the dictionary library, matching the second field name against the field classification information.
3. The method of claim 1, wherein the field to be identified comprises instance data, and wherein, before matching the field name of the field to be identified against the field classification information in the dictionary library, the method further comprises:
storing a correspondence between field categories and verification rules, wherein each field category corresponds to one verification rule, and each verification rule is used to verify whether the instance data belongs to the field category corresponding to that verification rule.
4. The method of claim 1, wherein after determining that the field category of the field to be identified is the first field category, the method further comprises:
checking whether the field name of the field to be identified exists in the field classification information of the dictionary library; and
when the field name of the field to be identified does not exist in the field classification information of the dictionary library, adding a piece of field classification information to the dictionary library, wherein the added field classification information comprises the field name of the field to be identified and the first field category.
5. The method of claim 1, wherein the dictionary library comprises a plurality of pieces of the field classification information, and, for each field category, at least one corresponding piece of the field classification information exists in the dictionary library; the method further comprising:
when no field classification information matching the field name of the field to be identified exists in the dictionary library, determining that the category of the field to be identified is an unknown category.
6. An apparatus for identifying a field category, comprising:
a matching module, configured to match the field name of a field to be identified against field classification information in a dictionary library, wherein each piece of field classification information in the dictionary library comprises a field name and a field category;
an obtaining module, configured to obtain a first field category from the matched field classification information when field classification information matching the field name of the field to be identified exists in the dictionary library;
a verification module, configured to verify the field to be identified according to a verification rule corresponding to the first field category; and
a determining module, configured to determine, when the verification passes, that the field category of the field to be identified is the first field category.
7. The apparatus of claim 6, wherein the field name of the field to be identified comprises a first field name and a second field name; and wherein
the matching module comprises: a first matching unit, configured to match the first field name against the field classification information; and a second matching unit, configured to match the second field name against the field classification information when no field classification information matching the first field name exists in the dictionary library.
8. The apparatus of claim 6, wherein the field to be identified comprises instance data, and wherein the apparatus further comprises:
a storage module, configured to store, before the field name of the field to be identified is matched against the field classification information in the dictionary library, a correspondence between field categories and verification rules, wherein each field category corresponds to one verification rule, and each verification rule is used to verify whether the instance data belongs to the field category corresponding to that verification rule.
9. The apparatus of claim 6, further comprising:
a checking module, configured to check, after the field category of the field to be identified is determined to be the first field category, whether the field name of the field to be identified exists in the field classification information of the dictionary library; and
an adding module, configured to add a piece of field classification information to the dictionary library when the field name of the field to be identified does not exist in the field classification information of the dictionary library, wherein the added field classification information comprises the field name of the field to be identified and the first field category.
10. The apparatus of claim 6, wherein the dictionary library comprises a plurality of pieces of the field classification information, and, for each field category, at least one corresponding piece of the field classification information exists in the dictionary library;
the determining module is further configured to determine that the category of the field to be identified is an unknown category when no field classification information matching the field name of the field to be identified exists in the dictionary library.
Priority Applications (1)
Application Number | Publication | Priority Date | Filing Date | Title |
---|---|---|---|---|
CN201910690819.8A | CN110427375B (en) | 2019-07-29 | 2019-07-29 | Method and device for identifying field type |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110427375A (en) | 2019-11-08 |
CN110427375B (en) | 2022-12-09 |
Family
ID=68411177
Family Applications (1)
Application Number | Status | Publication | Priority Date | Filing Date | Title |
---|---|---|---|---|---|
CN201910690819.8A | Active | CN110427375B (en) | 2019-07-29 | 2019-07-29 | Method and device for identifying field type |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110427375B (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107784058A (en) * | 2017-04-11 | 2018-03-09 | 平安医疗健康管理股份有限公司 | Drug data processing method and processing device |
US20180196886A1 (en) * | 2017-01-12 | 2018-07-12 | Innovationdock, Inc. | Devices and methods for implementing dynamic collaborative workflow systems |
CN109656985A (en) * | 2018-09-27 | 2019-04-19 | 深圳壹账通智能科技有限公司 | Data lead-in method, system, terminal and storage medium |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111104481A (en) * | 2019-12-17 | 2020-05-05 | 东软集团股份有限公司 | Method, device and equipment for identifying matching field |
CN111104481B (en) * | 2019-12-17 | 2023-10-10 | 东软集团股份有限公司 | Method, device and equipment for identifying matching field |
CN111143374A (en) * | 2019-12-31 | 2020-05-12 | 杭州依图医疗技术有限公司 | Data auxiliary identification method, system, computing equipment and storage medium |
CN111143374B (en) * | 2019-12-31 | 2023-04-25 | 杭州依图医疗技术有限公司 | Data auxiliary identification method, system, computing device and storage medium |
CN111209538A (en) * | 2020-01-03 | 2020-05-29 | 北京明略软件系统有限公司 | Table data quality probing method and device |
CN115168345A (en) * | 2022-06-27 | 2022-10-11 | 天翼爱音乐文化科技有限公司 | Database classification method, system, device and storage medium |
CN115168345B (en) * | 2022-06-27 | 2023-04-18 | 天翼爱音乐文化科技有限公司 | Database classification method, system, device and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN110427375B (en) | 2022-12-09 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||