CN110427375A - Method and device for identifying field categories - Google Patents

Method and device for identifying field categories

Info

Publication number
CN110427375A
Authority
CN
China
Prior art keywords
field
classification information
name
identified
recognized
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910690819.8A
Other languages
Chinese (zh)
Other versions
CN110427375B (en)
Inventor
堵新政
张毅然
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Mininglamp Software System Co ltd
Original Assignee
Beijing Mininglamp Software System Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Mininglamp Software System Co ltd filed Critical Beijing Mininglamp Software System Co ltd
Priority to CN201910690819.8A priority Critical patent/CN110427375B/en
Publication of CN110427375A publication Critical patent/CN110427375A/en
Application granted granted Critical
Publication of CN110427375B publication Critical patent/CN110427375B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22Indexing; Data structures therefor; Storage structures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2455Query execution

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention provides a method and device for identifying field categories. The method comprises: matching the field name of a field to be identified with the field classification information in a dictionary library, wherein each piece of field classification information in the dictionary library comprises a field name and a field category; when field classification information matching the field name of the field to be identified exists in the dictionary library, obtaining the first field category in the matched field classification information; checking the field to be identified according to the check rule corresponding to the first field category; and, when the check passes, determining that the field category of the field to be identified is the first field category. The invention solves the problem that the category of a field cannot be identified quickly and accurately because the same field is named differently.

Description

Method and device for identifying field type
Technical Field
The invention relates to the field of communication, in particular to a field type identification method and device.
Background
In a structured data store based on a table schema, fields of the same class differ in naming due to human reasons or differences in naming specifications, i.e., fields belonging to the same class have different names. For example, for a field whose category is an identification number, different names exist in different structured data storage systems, that is, the field may be named as a certificate number, an identification card, an identification number, and the like, and the categories of the fields identified by the different names are identification number categories. The difference in naming for the same field results in an inability to quickly and accurately identify the category to which the field belongs.
No effective technical solution has yet been proposed for the problem in the related art that the category of a field cannot be identified quickly and accurately because the same field is named differently.
Disclosure of Invention
The embodiment of the invention provides a field type identification method and device, which at least solve the problem that the type of a field cannot be quickly and accurately identified due to the fact that the same field is named differently in the related technology.
According to an embodiment of the present invention, there is provided a field category identification method, including:
matching the field name of the field to be recognized with field classification information in a dictionary library, wherein each field classification information in the dictionary library comprises: field name and field category;
under the condition that field classification information matched with the field name of the field to be recognized exists in the dictionary library, acquiring a first field type in the matched field classification information;
verifying the field to be identified according to a verification rule corresponding to the first field type;
and under the condition that the check is passed, determining that the field type of the field to be identified is the first field type.
Optionally, the field names of the fields to be identified include a first field name and a second field name; the matching of the field name of the field to be recognized and the field classification information in the dictionary library comprises the following steps: matching the first field name with the field classification information; and matching the second field name with the field classification information under the condition that the field classification information matched with the first field name does not exist in the dictionary database.
Optionally, the field to be identified includes instance data. Before the field name of the field to be identified is matched with the field classification information in the dictionary library, the method further includes: storing the correspondence between field categories and check rules, wherein each field category corresponds to one check rule, and the check rule is used to check whether the instance data belongs to the field category corresponding to that check rule.
Optionally, after determining that the field category of the field to be identified is the first field category, the method further includes: checking whether the field name of the field to be identified exists in the field classification information of the dictionary library; and under the condition that the field name of the field to be recognized does not exist in the field classification information of the dictionary library, adding a field classification information in the dictionary library, wherein the added field classification information comprises the field name of the field to be recognized and the first field category.
Optionally, the dictionary database includes a plurality of the field classification information, where for each field category, at least one corresponding field classification information exists in the dictionary database; the method further comprises the following steps: and under the condition that the dictionary library does not have field classification information matched with the field name of the field to be recognized, determining the class of the field to be recognized as unknown classification.
Optionally, the added field classification information is used to identify the field classification of the next field to be identified.
Optionally, when the verification fails, determining that the category of the field to be identified is an unknown classification.
According to another embodiment of the present invention, there is also provided an apparatus for identifying a field category, including:
the matching module is used for matching the field name of the field to be identified with the field classification information in the dictionary library, wherein each field classification information in the dictionary library comprises: field name and field category;
the acquisition module is used for acquiring a first field category in the matched field classification information under the condition that the field classification information matched with the field name of the field to be recognized exists in the dictionary library;
the verification module is used for verifying the field to be identified according to a verification rule corresponding to the first field type;
and the determining module is used for determining the field type of the field to be identified as the first field type under the condition that the check is passed.
Optionally, the field names of the fields to be identified include a first field name and a second field name; wherein the matching module comprises: the first matching unit is used for matching the first field name with the field classification information; and the second matching unit is used for matching the second field name with the field classification information under the condition that the field classification information matched with the first field name does not exist in the dictionary database.
Optionally, the field to be identified includes instance data, and the apparatus further comprises: a storage module, configured to store the correspondence between field categories and check rules before the field name of the field to be identified is matched with the field classification information in the dictionary library, wherein each field category corresponds to one check rule, and the check rule is used to check whether the instance data belongs to the field category corresponding to that check rule.
Optionally, the apparatus further comprises: the checking module is used for checking whether the field name of the field to be recognized exists in the field classification information of the dictionary library after the field class of the field to be recognized is determined to be the first field class; and the adding module is used for adding field classification information in the dictionary library under the condition that the field name of the field to be recognized does not exist in the field classification information of the dictionary library, wherein the added field classification information comprises the field name of the field to be recognized and the first field category.
Optionally, the dictionary database includes a plurality of the field classification information, where for each field category, at least one corresponding field classification information exists in the dictionary database; the determining module is further configured to determine that the category of the field to be recognized is unknown classification under the condition that field classification information matched with the field name of the field to be recognized does not exist in the dictionary repository.
Optionally, the matching module is further configured to identify a field category of a next field to be identified by using the added field classification information.
Optionally, the determining module is further configured to determine that the category of the field to be identified is an unknown classification when the verification fails.
According to another embodiment of the present invention, there is also provided a storage medium having a computer program stored therein, wherein the computer program is configured to execute, when run, any one of the above field category identification methods.
According to another embodiment of the present invention, there is also provided an electronic apparatus, including a memory and a processor, where the memory stores a computer program, and the processor is configured to execute the computer program to perform any one of the above field category identification methods.
According to the invention, the field name of the field to be identified is matched with the field classification information in the dictionary library, wherein each field classification information in the dictionary library comprises: field name and field category; under the condition that field classification information matched with the field name of the field to be recognized exists in the dictionary library, acquiring a first field type in the matched field classification information; verifying the field to be identified according to a verification rule corresponding to the first field type; and under the condition that the check is passed, determining that the field type of the field to be identified is the first field type. By adopting the technical scheme, the problem that the category of the field cannot be quickly and accurately identified due to the naming difference of the same field in the related technology is solved, and the effect of quickly and accurately identifying the category of the field is achieved.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the invention without limiting the invention. In the drawings:
FIG. 1 is a flowchart of a field category identification method according to an embodiment of the invention;
FIG. 2 is a flowchart of another optional field category identification method according to an embodiment of the invention;
FIG. 3 is a block diagram of the structure of a field category identification apparatus according to an embodiment of the invention.
Detailed Description
The invention will be described in detail hereinafter with reference to the accompanying drawings in conjunction with embodiments. It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict.
It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order.
Example 1
An embodiment of the present invention provides a field category identification method. FIG. 1 is a flowchart of the field category identification method according to an embodiment of the present invention. As shown in FIG. 1, the method includes:
step S102, matching the field name of the field to be identified with the field classification information in the dictionary library, wherein each field classification information in the dictionary library comprises: field name and field category;
step S104, under the condition that field classification information matched with the field name of the field to be recognized exists in the dictionary database, acquiring a first field type in the matched field classification information;
step S106, according to the check rule corresponding to the first field type, checking the field to be identified;
and step S108, determining the field type of the field to be identified as the first field type under the condition that the verification is passed.
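Steps S102 to S108 can be sketched in Python as follows. This is only an illustrative sketch: the entry layout, the names `DICTIONARY`, `CHECK_RULES` and `identify_category`, and the placeholder length rule are assumptions made for the example and are not specified by the patent.

```python
from typing import Callable

# Hypothetical dictionary library: each entry holds a field name and a field category.
DICTIONARY = [
    {"name": "certificate number", "category": "identity card class"},
    {"name": "identity card", "category": "identity card class"},
]

# Hypothetical check rules, one per field category (placeholder rule: 18 characters).
CHECK_RULES: dict[str, Callable[[str], bool]] = {
    "identity card class": lambda value: len(value) == 18,
}

UNKNOWN = "unknown classification"


def identify_category(field_name: str, instance_data: str) -> str:
    # S102: match the field name against the dictionary library.
    match = next((e for e in DICTIONARY if e["name"] == field_name), None)
    if match is None:
        return UNKNOWN
    # S104: obtain the first field category from the matched entry.
    first_category = match["category"]
    # S106: check the field with the rule stored for that category.
    passed = CHECK_RULES[first_category](instance_data)
    # S108: the category is confirmed only when the check passes.
    return first_category if passed else UNKNOWN
```

The name match alone never decides the result here; the check on the instance data acts as a second gate, which is the point of steps S106 and S108.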
According to the invention, the field name of the field to be identified is matched with the field classification information in the dictionary library, wherein each piece of field classification information in the dictionary library comprises a field name and a field category; when field classification information matching the field name of the field to be identified exists in the dictionary library, the first field category in the matched field classification information is obtained; the field to be identified is checked according to the check rule corresponding to the first field category; and, when the check passes, the field category of the field to be identified is determined to be the first field category. This technical solution solves the problem in the related art that the category of a field cannot be identified quickly and accurately because the same field is named differently, and achieves the effect of identifying the field category quickly and accurately. The first field category identified for the field to be identified can be used to probe data quality and to evaluate and check the legality and correctness of the data. Subsequent data governance can also apply the processing logic that corresponds to the classification result of the field, which greatly reduces the amount of computation in the data governance process while improving the quality of data governance.
In an optional embodiment of the present invention, the field name of the field to be identified includes a first field name and a second field name; the matching of the field name of the field to be recognized and the field classification information in the dictionary library comprises the following steps: matching the first field name with the field classification information; and matching the second field name with the field classification information under the condition that the field classification information matched with the first field name does not exist in the dictionary database.
In this embodiment, the field to be recognized may have a plurality of field names, for example, the field to be recognized has a chinese name (i.e., the first field name) and an english name (i.e., the second field name).
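The fallback from the first field name to the second can be sketched as below. The mapping `NAME_TO_CATEGORY` and the function `match_names` are hypothetical names introduced for illustration, and a flat name-to-category mapping is assumed only to keep the sketch short.

```python
from typing import Optional

# Hypothetical dictionary library, flattened to a name -> category mapping.
NAME_TO_CATEGORY = {
    "identity card": "identity card class",  # stands in for a Chinese field name
    "sfz": "identity card class",            # English field name
}


def match_names(first_name: str, second_name: str) -> Optional[str]:
    """Try the first field name, and only if it has no match try the second one."""
    for name in (first_name, second_name):
        category = NAME_TO_CATEGORY.get(name)
        if category is not None:
            return category
    return None  # neither name has a matching entry
```

For example, `match_names("unlisted name", "sfz")` falls through to the second name and returns `"identity card class"`.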
In an optional embodiment of the present invention, the field to be identified includes instance data. Before the field name of the field to be identified is matched with the field classification information in the dictionary library, the method further includes: storing the correspondence between field categories and check rules, wherein each field category corresponds to one check rule, and the check rule is used to check whether the instance data belongs to the field category corresponding to that check rule.
In this embodiment, a corresponding relationship between field categories and check rules is stored in advance, each field category corresponds to one check rule, and the check rules are used to check whether a field to be identified belongs to a field category corresponding to the check rule. For example, the instance data included in the field to be identified may be checked using a check rule.
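One possible way to store the correspondence between field categories and check rules is a small registry, sketched below. The function names `store_rule` and `check_instance` and the format-only regular expression are assumptions for illustration; the patent does not prescribe how rules are represented.

```python
import re
from typing import Callable

CheckRule = Callable[[str], bool]

# Correspondence between field categories and check rules (one rule per category).
_rules: dict[str, CheckRule] = {}


def store_rule(category: str, rule: CheckRule) -> None:
    """Register the single check rule that belongs to a field category."""
    if category in _rules:
        raise ValueError(f"category {category!r} already has a check rule")
    _rules[category] = rule


def check_instance(category: str, instance_data: str) -> bool:
    """Return True if the instance data passes the rule of the given category."""
    return _rules[category](instance_data)


# Assumed format-only rule: 17 digits followed by a digit or "X".
store_rule("identity card class",
           lambda v: re.fullmatch(r"[0-9]{17}[0-9X]", v) is not None)
```

Rejecting a second rule for the same category is a design choice that mirrors the statement that each field category corresponds to one check rule.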
In an optional embodiment of the present invention, after determining that the field category of the field to be identified is the first field category, the method further includes: checking whether the field name of the field to be identified exists in the field classification information of the dictionary library; and under the condition that the field name of the field to be recognized does not exist in the field classification information of the dictionary library, adding a field classification information in the dictionary library, wherein the added field classification information comprises the field name of the field to be recognized and the first field category.
In this embodiment, field classification information is newly added to the dictionary repository, the field name of the newly added field classification information is the field name of the field to be recognized, and the field category is the field category (i.e., the first field category) in which the field to be recognized is recognized.
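The update of the dictionary library can be sketched as a guarded append. The function name `add_if_missing` and the entry keys `name` and `category` are hypothetical and chosen only for this example.

```python
def add_if_missing(dictionary: list[dict], field_name: str, category: str) -> bool:
    """Append a classification entry when the field name is not yet in the dictionary.

    Returns True when a new entry was added, False when the name already existed.
    """
    if any(entry["name"] == field_name for entry in dictionary):
        return False
    dictionary.append({"name": field_name, "category": category})
    return True
```

Calling it twice with the same field name adds only one entry, so repeated identification of the same field does not grow the dictionary library.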
In an optional embodiment of the present invention, the dictionary repository includes a plurality of the field classification information, wherein, for each field category, there is at least one corresponding field classification information in the dictionary repository; the method further comprises the following steps: and under the condition that the dictionary library does not have field classification information matched with the field name of the field to be recognized, determining the class of the field to be recognized as unknown classification.
In this embodiment, the dictionary library stores a plurality of pieces of field classification information, each of which includes a field name and a field category. For any specified field category, at least one piece of field classification information belonging to that category exists in the dictionary library; that is, several pieces of field classification information may share the same field category while having different field names.
In an optional embodiment of the present invention, the added field classification information is used to identify the field class of the next field to be identified.
In this embodiment, when the field category of the next field to be identified is identified, the field name of that field may be matched with the added field classification information. That is, the dictionary library is updated with each field whose category has been identified, and the updated dictionary library is then used to identify the field category of the next field to be identified. The dictionary library is thus updated automatically, which further improves the accuracy of identifying field categories with the dictionary library.
In an optional embodiment of the present invention, when the check fails, the category of the field to be identified is determined to be an unknown classification. In this embodiment, when the field to be identified fails the check rule corresponding to the first field category, the category of the field to be identified is determined to be unknown.
The field category identification method is described below with reference to an example, which is not intended to limit the technical solution of the embodiment of the present invention. The technical solution of this example is as follows:
step 1, storing N pieces of field classification information in a dictionary library, wherein the N pieces of field classification information relate to M field categories. Any piece of field classification information f among the N pieces corresponds to one of the M field categories and comprises a Chinese field name and an English field name;
step 2, the field F to be classified (i.e., the above-mentioned field to be identified) includes instance data V, which may be the instance data stored in the field F. The Chinese field name or the English field name of the field F is matched with the dictionary library. Optionally, the Chinese field name of the field F is first matched with the field classification information in the dictionary library, and when matching field classification information exists in the dictionary library, the field category C of the matched field classification information is extracted; when no matching item exists in the dictionary library, the English field name of the field F is matched with the field classification information in the dictionary library; if neither the Chinese field name nor the English field name of the field F has a matching item in the dictionary library, the field F is identified as an unknown classification, and the process ends;
step 3, checking the instance data V according to the check rule corresponding to the field category C. If the check passes, the field category of the field F is determined to be the field category C; if the check fails, it is confirmed that the field F does not belong to the field category C, the field F is identified as an unknown classification, and the process ends;
step 4, checking whether the Chinese field name or the English field name of the field F exists in the dictionary library, and if not, synchronizing the field names of the field F (namely the Chinese field name and the English field name) into the dictionary library;
step 5, repeating steps 1 to 4 for each new field to be identified.
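Steps 2 to 4, repeated for several fields as in step 5, can be sketched as a loop over fields that shares one dictionary library. The keys `cn`, `en` and `data`, the function `identify_all`, and the length-based rule are hypothetical. Step 4 is read here as "synchronize when either name is missing", which is one interpretation of the text rather than something the patent states explicitly.

```python
UNKNOWN = "unknown classification"


def identify_all(dictionary, rules, fields):
    """Apply steps 2 to 4 to each field in turn and return one category per field."""
    results = []
    for f in fields:
        # Step 2: match the Chinese name first, then fall back to the English name.
        entry = next((e for e in dictionary if e["cn"] == f["cn"]), None)
        if entry is None:
            entry = next((e for e in dictionary if e["en"] == f["en"]), None)
        if entry is None:
            results.append(UNKNOWN)
            continue
        # Step 3: check instance data V with the rule of category C.
        category = entry["category"]
        if not rules[category](f["data"]):
            results.append(UNKNOWN)
            continue
        results.append(category)
        # Step 4: synchronize the names of F when either one is missing.
        cn_known = any(e["cn"] == f["cn"] for e in dictionary)
        en_known = any(e["en"] == f["en"] for e in dictionary)
        if not (cn_known and en_known):
            dictionary.append({"cn": f["cn"], "en": f["en"], "category": category})
    return results


# Usage: the second field is matched only through the name learned from the first.
library = [{"cn": "identity card", "en": "sfz", "category": "identity card class"}]
rules = {"identity card class": lambda v: len(v) == 18}  # placeholder rule
fields = [
    {"cn": "identity card", "en": "sfzhhm", "data": "340323198910100533"},
    {"cn": "id no", "en": "sfzhhm", "data": "340323198910100533"},
]
categories = identify_all(library, rules, fields)
```

In this usage the second field has no matching Chinese name, but its English name was synchronized into the library while the first field was processed, so it is still classified.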
The above field category identification method is described below with reference to another example, which is not intended to limit the technical solution of the embodiment of the present invention. The technical solution of this example is as follows:
Three pieces of field classification information are stored in the classification dictionary library Dict, which can be expressed as follows:
Dict = [{"Chinese name": "identification number", "English name": "sfzh", "category": "identity card class"},
{"Chinese name": "certificate number", "English name": "zjhm", "category": "identity card class"},
{"Chinese name": "identity card", "English name": "sfz", "category": "identity card class"}]
The field F to be identified is represented in the form:
F = {"Chinese name": "identity card", "English name": "sfzhhm", "sample data": "340323198910100533"}. In the present embodiment, the sample data, i.e., the above-mentioned instance data, is the data stored in the field F.
The category of the field F is identified according to the following steps:
Matching is performed in the dictionary library according to the Chinese name (i.e., the Chinese field name) "identity card" of the field F.
Matching field classification information exists in the dictionary library, namely {"Chinese name": "identity card", "English name": "sfz", "category": "identity card class"}, and the category "identity card class" of this field classification information is extracted.
The sample data in the field F is checked, that is, data checking is performed on the sample data according to the check rule (for example, an identity card number rule) corresponding to the category "identity card class", for example, checking whether the sample data is a valid 18-digit citizen identity card number.
When the check result is true, the field F belongs to the identity card class, and the field category of the field F is identified as the identity card class.
Whether the English name of the field F exists in the dictionary library Dict is checked, and if not, the field F is synchronized into the dictionary library Dict, for example, by adding the following field classification information to the dictionary library: {"Chinese name": "identity card", "English name": "sfzhhm", "category": "identity card class"}.
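The worked example above can be reproduced with the short script below. The patent only says that the sample data is checked for being a valid 18-digit citizen identity card number; the concrete rule used here (format, birth date, and the weighted check character commonly attributed to GB 11643-1999) is an assumption, as are the names `is_valid_id_number`, `WEIGHTS` and `CHECK_CHARS`.

```python
import re
from datetime import datetime

# Assumed rule: position weights and check characters for the 18-character number.
WEIGHTS = (7, 9, 10, 5, 8, 4, 2, 1, 6, 3, 7, 9, 10, 5, 8, 4, 2)
CHECK_CHARS = "10X98765432"


def is_valid_id_number(value: str) -> bool:
    """Format, birth-date and check-character test for an 18-character ID number."""
    if re.fullmatch(r"[0-9]{17}[0-9X]", value) is None:
        return False
    try:
        datetime.strptime(value[6:14], "%Y%m%d")  # characters 7 to 14: birth date
    except ValueError:
        return False
    total = sum(int(d) * w for d, w in zip(value[:17], WEIGHTS))
    return CHECK_CHARS[total % 11] == value[17]


Dict = [
    {"Chinese name": "identification number", "English name": "sfzh", "category": "identity card class"},
    {"Chinese name": "certificate number", "English name": "zjhm", "category": "identity card class"},
    {"Chinese name": "identity card", "English name": "sfz", "category": "identity card class"},
]
F = {"Chinese name": "identity card", "English name": "sfzhhm", "sample data": "340323198910100533"}

# Match on the Chinese name, extract the category, then check the sample data.
matched = next(e for e in Dict if e["Chinese name"] == F["Chinese name"])
category = matched["category"]
if is_valid_id_number(F["sample data"]):
    # The English name "sfzhhm" is not yet in Dict, so the field is synchronized.
    if all(e["English name"] != F["English name"] for e in Dict):
        Dict.append({"Chinese name": F["Chinese name"],
                     "English name": F["English name"],
                     "category": category})
```

After the script runs, Dict holds a fourth entry carrying the English name "sfzhhm", matching the entry added at the end of the example.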
The above field category identification method is described below with reference to another example, which is not intended to limit the technical solution of the embodiment of the present invention. FIG. 2 is a flowchart of another optional field category identification method according to the embodiment of the present invention. As shown in FIG. 2, the technical solution of this example is as follows:
step 101, extracting Chinese names, English names and sample data of fields to be identified, and turning to step 102;
step 102, matching the Chinese name and the English name obtained in step 101 with the field classification information in the dictionary library in sequence; if the dictionary library has field classification information matching the Chinese name or the English name of the field to be identified, going to step 104; otherwise going to step 103;
step 103, identifying the classification of the field to be identified as an unknown classification, and ending;
step 104, using the check rule corresponding to the field category of the field classification information matched in step 102 to perform data checking on the sample data from step 101; if the check result is false, going to step 105, and if the check result is true, going to step 106;
step 105, identifying the classification of the field to be identified as an unknown classification, and ending;
step 106, identifying the classification of the field to be identified as the category of the matched field classification information, and going to step 107;
step 107, checking whether the Chinese name or the English name of the field to be identified exists in the dictionary library; if not, going to step 108, otherwise ending;
step 108, synchronizing the Chinese name, the English name and the classification of the field to be identified into the dictionary library, and ending.
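The branching of steps 101 to 108 can be made visible by recording which steps a field passes through. The sketch below returns both the classification and that trace. The function `run_flow` and the keys `cn`, `en` and `sample` are hypothetical, and step 107 is read as "go to step 108 when either name is missing", which is an interpretation.

```python
def run_flow(field, dictionary, rules):
    """Return (classification, visited step numbers) following steps 101 to 108."""
    trace = [101]
    cn, en, sample = field["cn"], field["en"], field["sample"]  # step 101
    trace.append(102)
    entry = next((e for e in dictionary if e["cn"] == cn), None)
    if entry is None:
        entry = next((e for e in dictionary if e["en"] == en), None)
    if entry is None:
        trace.append(103)
        return "unknown classification", trace
    trace.append(104)
    if not rules[entry["category"]](sample):
        trace.append(105)
        return "unknown classification", trace
    trace.append(106)
    trace.append(107)
    cn_known = any(e["cn"] == cn for e in dictionary)
    en_known = any(e["en"] == en for e in dictionary)
    if not (cn_known and en_known):
        trace.append(108)
        dictionary.append({"cn": cn, "en": en, "category": entry["category"]})
    return entry["category"], trace
```

A field that matches and passes the check but carries a new English name visits steps 101, 102, 104, 106, 107 and 108, while a field with no matching name stops at step 103.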
Through the above description of the embodiments, those skilled in the art can clearly understand that the method according to the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but the former is a better implementation mode in many cases. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which is stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal device (e.g., a mobile phone, a computer, a server, or a network device) to execute the method according to the embodiments of the present invention.
Example 2
In this embodiment, a field type identification apparatus is further provided, and the apparatus is used to implement the foregoing embodiments and preferred embodiments, and the description already made is omitted. As used below, the term "module" may be a combination of software and/or hardware that implements a predetermined function. Although the means described in the embodiments below are preferably implemented in software, an implementation in hardware, or a combination of software and hardware is also possible and contemplated.
Fig. 3 is a block diagram of a structure of an apparatus for identifying field categories according to an embodiment of the present invention, as shown in fig. 3, the apparatus including:
a matching module 202, configured to match the field name of a field to be identified against field classification information in a dictionary library, where each piece of field classification information in the dictionary library includes a field name and a field category;
an obtaining module 204, configured to obtain a first field category from the matched field classification information when field classification information matching the field name of the field to be identified exists in the dictionary library;
a verification module 206, configured to verify the field to be identified according to the verification rule corresponding to the first field category;
a determining module 208, configured to determine, when the verification passes, that the field category of the field to be identified is the first field category.
According to the invention, the field name of the field to be identified is matched against the field classification information in the dictionary library, where each piece of field classification information includes a field name and a field category; when field classification information matching the field name of the field to be identified exists in the dictionary library, a first field category is obtained from the matched field classification information; the field to be identified is verified according to the verification rule corresponding to the first field category; and when the verification passes, the field category of the field to be identified is determined to be the first field category. This technical solution addresses the problem in the related art that the category of a field cannot be identified quickly and accurately because the same field may be named differently, and thereby allows field categories to be identified quickly and accurately.
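By way of non-limiting illustration, the cooperation of the matching, obtaining, verification, and determining modules may be sketched as follows. The sketch is in Python; the dictionary entries, the category labels, the regular expressions, and the requirement that every instance value satisfy the rule are assumptions made for illustration only and are not specified by this embodiment.

```python
import re
from typing import Callable, Dict, List, Optional

# Hypothetical dictionary library: field name -> field category.
DICTIONARY: Dict[str, str] = {
    "phone": "phone_number",
    "mobile": "phone_number",
    "id_card": "id_number",
}

# Hypothetical verification rules: field category -> predicate over one value.
VERIFICATION_RULES: Dict[str, Callable[[str], bool]] = {
    "phone_number": lambda v: re.fullmatch(r"\d{11}", v) is not None,
    "id_number": lambda v: re.fullmatch(r"\d{17}[\dXx]", v) is not None,
}


def identify_category(field_name: str, instance_data: List[str]) -> Optional[str]:
    """Return the first field category if the name matches and verification passes."""
    # Matching and obtaining: look the field name up in the dictionary library.
    first_category = DICTIONARY.get(field_name.lower())
    if first_category is None:
        return None  # no matching field classification information
    # Verification: apply the rule that corresponds to the first field category.
    rule = VERIFICATION_RULES[first_category]
    # Assumption: verification passes only if data exists and every value conforms.
    if instance_data and all(rule(value) for value in instance_data):
        return first_category  # determining: the verification passed
    return None
```

In this sketch a field named `mobile` whose values are all 11-digit strings is assigned `phone_number`, whereas the same field name with non-conforming values yields no category.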
In an optional embodiment of the present invention, the field name of the field to be identified includes a first field name and a second field name, and the matching module includes: a first matching unit, configured to match the first field name against the field classification information; and a second matching unit, configured to match the second field name against the field classification information when no field classification information matching the first field name exists in the dictionary library.
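The fallback from the first field name to the second may, for example, be sketched as below. The function name `match_names` and the sample names are hypothetical; in particular, treating the first name as an abbreviated column name and the second as a descriptive name is an assumption, since this embodiment does not say what the two names are.

```python
from typing import Dict, Optional, Tuple


def match_names(
    first_name: str, second_name: str, dictionary: Dict[str, str]
) -> Optional[Tuple[str, str]]:
    """Return (matched name, field category), trying the first name before the second."""
    for name in (first_name, second_name):
        category = dictionary.get(name)
        if category is not None:
            # The second name is only reached when the first name did not match.
            return name, category
    return None  # neither name has matching field classification information
```

With a dictionary containing `联系电话` but not `lxdh`, the call `match_names("lxdh", "联系电话", dictionary)` matches on the second name.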
In an optional embodiment of the present invention, the field to be identified includes instance data, and the apparatus further includes: a storage module, configured to store the correspondence between field categories and verification rules before the field name of the field to be identified is matched against the field classification information in the dictionary library, where each field category corresponds to one verification rule, and the verification rule is used to verify whether the instance data belongs to the field category corresponding to that verification rule.
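One possible shape for the stored correspondence is sketched below. The class `RuleStore`, the e-mail pattern, and the `min_ratio` threshold are illustrative assumptions; this embodiment only requires that each field category correspond to one verification rule applied to the instance data.

```python
import re
from typing import Callable, Dict, Iterable

VerificationRule = Callable[[str], bool]


class RuleStore:
    """Keeps exactly one verification rule per field category."""

    def __init__(self) -> None:
        self._rules: Dict[str, VerificationRule] = {}

    def register(self, category: str, rule: VerificationRule) -> None:
        # Enforce the one-rule-per-category correspondence.
        if category in self._rules:
            raise ValueError(f"category {category!r} already has a rule")
        self._rules[category] = rule

    def verify(
        self, category: str, instance_data: Iterable[str], min_ratio: float = 1.0
    ) -> bool:
        """True if at least min_ratio of the instance values satisfy the rule."""
        values = list(instance_data)
        if not values:
            return False  # assumption: no instance data means no confirmation
        passed = sum(1 for v in values if self._rules[category](v))
        return passed / len(values) >= min_ratio


store = RuleStore()
store.register(
    "email", lambda v: re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", v) is not None
)
```

The `min_ratio` parameter is a design choice of the sketch: real columns often contain a few dirty values, so a threshold below 1.0 may be preferable to requiring every value to conform.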
In an optional embodiment of the present invention, the apparatus further includes: a checking module, configured to check, after the field category of the field to be identified is determined to be the first field category, whether the field name of the field to be identified exists in the field classification information of the dictionary library; and an adding module, configured to add a piece of field classification information to the dictionary library when the field name of the field to be identified does not exist in the field classification information of the dictionary library, where the added field classification information includes the field name of the field to be identified and the first field category.
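The check-then-add behaviour may be sketched as follows, assuming for illustration that the dictionary library is an in-memory mapping from field name to field category; the function name `record_confirmed_name` is hypothetical.

```python
from typing import Dict


def record_confirmed_name(
    dictionary: Dict[str, str], field_name: str, first_category: str
) -> bool:
    """Add (field_name, first_category) if the name is not yet in the dictionary.

    Returns True when a new piece of field classification information was added.
    """
    if field_name in dictionary:
        return False  # already present; the existing entry is left untouched
    dictionary[field_name] = first_category
    return True
```

Once a previously unseen name such as `lxdh` has been recorded in this way, a later field carrying the same name can be matched against the dictionary directly.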
In an optional embodiment of the present invention, the dictionary library includes a plurality of pieces of field classification information, where for each field category there is at least one corresponding piece of field classification information in the dictionary library; the determining module is further configured to determine that the category of the field to be identified is an unknown classification when no field classification information matching the field name of the field to be identified exists in the dictionary library.
In an optional embodiment of the present invention, the matching module 202 is further configured to identify the field category of the next field to be identified by using the added field classification information.
In an optional embodiment of the present invention, the determining module 208 is further configured to determine that the category of the field to be identified is an unknown classification when the verification fails.
It should be noted that the above modules may be implemented in software or in hardware. In the latter case, this may be done in, but is not limited to, the following ways: the modules are all located in the same processor; or the modules are located in different processors in any combination.
It will be apparent to those skilled in the art that the modules or steps of the present invention described above may be implemented by a general-purpose computing device. They may be centralized on a single computing device or distributed across a network of multiple computing devices. Alternatively, they may be implemented as program code executable by a computing device, so that they can be stored in a storage device and executed by the computing device; in some cases the steps shown or described may be performed in an order different from the one described herein. They may also each be fabricated as a separate integrated circuit module, or several of them may be fabricated as a single integrated circuit module. Thus, the present invention is not limited to any specific combination of hardware and software.
Example 3
Embodiments of the present invention also provide a storage medium comprising a stored program, wherein the program is arranged to perform any of the methods described above when executed.
Optionally, in this embodiment, the storage medium may be configured to store program code for performing the following steps:
step S1, matching the field name of the field to be identified against the field classification information in the dictionary library, where each piece of field classification information in the dictionary library includes a field name and a field category;
step S2, when field classification information matching the field name of the field to be identified exists in the dictionary library, obtaining a first field category from the matched field classification information;
step S3, verifying the field to be identified according to the verification rule corresponding to the first field category;
step S4, when the verification passes, determining that the field category of the field to be identified is the first field category.
Optionally, in this embodiment, the storage medium may include, but is not limited to, various media capable of storing program code, such as a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, or an optical disk.
Optionally, for specific examples in this embodiment, reference may be made to the examples described in the above embodiments and optional implementations; details are not repeated here.
Example 4
Embodiments of the present invention also provide an electronic device comprising a memory having a computer program stored therein and a processor arranged to run the computer program to perform the steps of any of the above method embodiments.
Optionally, the electronic device may further include a transmission device and an input/output device, each of which is connected to the processor.
Optionally, in this embodiment, the processor may be configured to execute the following steps by running the computer program:
step S1, matching the field name of the field to be identified against the field classification information in the dictionary library, where each piece of field classification information in the dictionary library includes a field name and a field category;
step S2, when field classification information matching the field name of the field to be identified exists in the dictionary library, obtaining a first field category from the matched field classification information;
step S3, verifying the field to be identified according to the verification rule corresponding to the first field category;
step S4, when the verification passes, determining that the field category of the field to be identified is the first field category.
Optionally, for specific examples in this embodiment, reference may be made to the examples described in the above embodiments and optional implementations; details are not repeated here.
The above describes only preferred embodiments of the present invention and is not intended to limit it; those skilled in the art may make various modifications and changes. Any modification, equivalent replacement, or improvement made within the principles of the present invention shall fall within its scope of protection.

Claims (10)

1. A method for identifying a field category, comprising:
matching the field name of a field to be identified against field classification information in a dictionary library, wherein each piece of field classification information in the dictionary library comprises a field name and a field category;
when field classification information matching the field name of the field to be identified exists in the dictionary library, obtaining a first field category from the matched field classification information;
verifying the field to be identified according to a verification rule corresponding to the first field category; and
when the verification passes, determining that the field category of the field to be identified is the first field category.
2. The method of claim 1, wherein the field name of the field to be identified comprises a first field name and a second field name; and
the matching of the field name of the field to be identified against the field classification information in the dictionary library comprises:
matching the first field name against the field classification information; and
when no field classification information matching the first field name exists in the dictionary library, matching the second field name against the field classification information.
3. The method of claim 1, wherein the field to be identified comprises instance data, and before the matching of the field name of the field to be identified against the field classification information in the dictionary library, the method further comprises:
storing a correspondence between field categories and verification rules, wherein each field category corresponds to one verification rule, and the verification rule is used to verify whether the instance data belongs to the field category corresponding to that verification rule.
4. The method of claim 1, wherein after determining that the field category of the field to be identified is the first field category, the method further comprises:
checking whether the field name of the field to be identified exists in the field classification information of the dictionary library; and
when the field name of the field to be identified does not exist in the field classification information of the dictionary library, adding a piece of field classification information to the dictionary library, wherein the added field classification information comprises the field name of the field to be identified and the first field category.
5. The method of claim 1, wherein the dictionary library comprises a plurality of pieces of the field classification information, and for each field category there is at least one corresponding piece of the field classification information in the dictionary library; the method further comprising:
when no field classification information matching the field name of the field to be identified exists in the dictionary library, determining that the category of the field to be identified is an unknown classification.
6. An apparatus for identifying a field category, comprising:
a matching module, configured to match the field name of a field to be identified against field classification information in a dictionary library, wherein each piece of field classification information in the dictionary library comprises a field name and a field category;
an obtaining module, configured to obtain a first field category from the matched field classification information when field classification information matching the field name of the field to be identified exists in the dictionary library;
a verification module, configured to verify the field to be identified according to a verification rule corresponding to the first field category; and
a determining module, configured to determine, when the verification passes, that the field category of the field to be identified is the first field category.
7. The apparatus of claim 6, wherein the field name of the field to be identified comprises a first field name and a second field name; and
the matching module comprises: a first matching unit, configured to match the first field name against the field classification information; and a second matching unit, configured to match the second field name against the field classification information when no field classification information matching the first field name exists in the dictionary library.
8. The apparatus of claim 6, wherein the field to be identified comprises instance data, and the apparatus further comprises:
a storage module, configured to store a correspondence between field categories and verification rules before the field name of the field to be identified is matched against the field classification information in the dictionary library, wherein each field category corresponds to one verification rule, and the verification rule is used to verify whether the instance data belongs to the field category corresponding to that verification rule.
9. The apparatus of claim 6, further comprising:
a checking module, configured to check, after the field category of the field to be identified is determined to be the first field category, whether the field name of the field to be identified exists in the field classification information of the dictionary library; and
an adding module, configured to add a piece of field classification information to the dictionary library when the field name of the field to be identified does not exist in the field classification information of the dictionary library, wherein the added field classification information comprises the field name of the field to be identified and the first field category.
10. The apparatus of claim 6, wherein the dictionary library comprises a plurality of pieces of the field classification information, and for each field category there is at least one corresponding piece of the field classification information in the dictionary library; and
the determining module is further configured to determine that the category of the field to be identified is an unknown classification when no field classification information matching the field name of the field to be identified exists in the dictionary library.
CN201910690819.8A 2019-07-29 2019-07-29 Method and device for identifying field type Active CN110427375B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910690819.8A CN110427375B (en) 2019-07-29 2019-07-29 Method and device for identifying field type

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910690819.8A CN110427375B (en) 2019-07-29 2019-07-29 Method and device for identifying field type

Publications (2)

Publication Number Publication Date
CN110427375A true CN110427375A (en) 2019-11-08
CN110427375B CN110427375B (en) 2022-12-09

Family

ID=68411177

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910690819.8A Active CN110427375B (en) 2019-07-29 2019-07-29 Method and device for identifying field type

Country Status (1)

Country Link
CN (1) CN110427375B (en)

Also Published As

Publication number Publication date
CN110427375B (en) 2022-12-09

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant