CN117951172A

CN117951172A - Key field processing method and device, electronic equipment and storage medium

Info

Publication number: CN117951172A
Application number: CN202311786271.XA
Authority: CN
Inventors: 陈刚; 赫建营
Original assignee: Mashang Consumer Finance Co Ltd
Current assignee: Mashang Consumer Finance Co Ltd
Priority date: 2023-12-22
Filing date: 2023-12-22
Publication date: 2024-04-30

Abstract

The embodiment of the application discloses a key field processing method, a device, electronic equipment and a storage medium. The method comprises the following steps: acquiring first data corresponding to a first field from a database; matching the first data with a preset key field expression to obtain a matching result corresponding to the first field; the key field expression is used for representing the data content characteristics which need to be met if one field is used as a key field; and if the first field is determined to meet the preset matching condition according to the matching result, determining that the first field is a key field in the database. The method and the device can improve the accuracy of key field identification.

Description

Key field processing method and device, electronic equipment and storage medium

Technical Field

The present application relates to the field of internet technologies, and in particular, to a method and apparatus for processing a key field, an electronic device, and a storage medium.

Background

The rapid development of the internet has generated large amounts of data that can serve the development of various businesses; for example, enterprises use the generated data to make business decisions, so storing and computing large amounts of data has become increasingly important to them. Key data such as customer names, mobile phone numbers, identification card numbers and bank card numbers is typically distributed throughout this mass of data. How to quickly and accurately learn where the key data is distributed, and how to avoid leaking key information, has become a prominent problem, so it is important to implement an automatic key data discovery scheme for big data.

Disclosure of Invention

The embodiment of the application aims to provide a method and a device for processing key fields, electronic equipment and a storage medium, which are used for improving the accuracy of key field identification.

In order to solve the technical problems, the embodiment of the application is realized as follows:

In one aspect, an embodiment of the present application provides a method for processing a key field, including:

acquiring first data corresponding to a first field from a database;

matching the first data with a preset key field expression to obtain a matching result corresponding to the first field, where the key field expression is used for representing the data content characteristics that a field needs to satisfy to be a key field; and

if it is determined according to the matching result that the first field meets a preset matching condition, determining that the first field is a key field in the database.

In another aspect, an embodiment of the present application provides a device for processing a key field, including:

an acquisition module, used for acquiring first data corresponding to a first field from a database;

a matching module, used for matching the first data with a preset key field expression to obtain a matching result corresponding to the first field, where the key field expression is used for representing the data content characteristics that a field needs to satisfy to be a key field; and

a determining module, used for determining that the first field is a key field in the database if it is determined according to the matching result that the first field meets a preset matching condition.

In yet another aspect, an embodiment of the present application provides an electronic device, including a processor and a memory electrically connected to the processor, where the memory stores a computer program, and the processor is configured to call and execute the computer program from the memory to implement the above-described key field processing method.

In yet another aspect, embodiments of the present application provide a computer readable storage medium storing a computer program executable by a processor to implement the above-described key field processing method.

By adopting the technical scheme of the embodiment of the application, the matching result is obtained by matching the first data corresponding to the first field with the preset key field expression, and further, if the first field is determined to meet the preset matching condition according to the matching result corresponding to the first field, the first field is determined to be the key field in the database. Because the key field expression is used for representing the data content characteristics that a field needs to satisfy if being used as the key field, the first data and the key field expression are matched, so that the key field can be identified from the dimension of the data content characteristics, the condition that the key field is omitted when only the field name is matched or only the database name is matched (for example, the field name is not the name of the key field) is avoided, and the accuracy of key field identification is improved while the automatic identification of the key field is realized.

Drawings

In order to more clearly illustrate one or more embodiments of the present application or the technical solutions in the prior art, the following description will briefly describe the drawings used in the embodiments or the description of the prior art, and it is apparent that the drawings in the following description are only some embodiments described in one or more embodiments of the present application, and other drawings may be obtained according to these drawings without inventive effort to a person skilled in the art.

FIG. 1 is a schematic flow chart of a method of processing key fields according to an embodiment of the application;

FIG. 2 is a schematic flow chart of a method of processing key fields according to another embodiment of the application;

FIG. 3 is a schematic block diagram of a key field processing apparatus in accordance with an embodiment of the present application;

FIG. 4 is a schematic block diagram of an electronic device in accordance with an embodiment of the application.

Detailed Description

The embodiment of the application provides a method and a device for processing key fields, electronic equipment and a storage medium, which are used for improving the accuracy of key field identification.

In order to make the technical solution of the present application better understood by those skilled in the art, the technical solution of the present application will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present application, and it is apparent that the described embodiments are only some embodiments of the present application, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, shall fall within the scope of the application.

As described above, how to accurately find key fields in a database is an important research topic in the internet field. Currently, key fields in a database are identified by scanning database names, table names or field names: if a scanned database name, table name or field name contains a preset keyword, the corresponding fields are determined to be key fields. For example, if the scanned table name is "user bank card information", the fields in that table may be determined to be key fields. For another example, if a scanned field name includes "identification card number", that field may be determined to be a key field. However, it is difficult to identify all key fields in the database with this scanning method, because database names, table names and field names are usually set manually and are therefore highly subjective. Names with the same meaning are numerous and contain different keywords; for example, for a bank card number, some field names are "card number" and some are "bank card". It is difficult for a user to enumerate all the keywords contained in these names, which is labor intensive and makes it difficult to accurately identify all key fields. Moreover, identifying key fields merely by scanning database names, table names or field names cannot ensure that all key fields are identified. For example, the field names of some key fields do not contain any keyword although the corresponding data content is key content (i.e., the data content corresponding to a key field); such key fields are often missed, so the accuracy of key field identification is low.
In view of the above problems, if the data in the database were matched directly against keywords, that is, if all data content in the database were scanned to determine whether it contains the keywords, the scanning task would take so much time that the cost would outweigh the benefit. Especially for databases that store huge amounts of data (such as Hive tables), such key field identification is very inefficient.

According to the key field processing method provided by the application, the first data corresponding to the first field is obtained from the database, the first data corresponding to the first field is matched with the preset key field expression, a matching result is obtained, and further, if the first field is determined to meet the preset matching condition according to the matching result corresponding to the first field, the first field is determined to be the key field in the database. Because the key field expression is used for representing the data content characteristics that a field needs to satisfy if being used as the key field, the first data and the key field expression are matched, so that the key field can be identified from the dimension of the data content characteristics, the condition that the key field is omitted when only the field name is matched or only the database name is matched (for example, the field name is not the name of the key field) is avoided, and the accuracy of key field identification is improved while the automatic identification of the key field is realized.

The method for processing the key field provided by the embodiment of the application can be executed by an electronic device or by software installed in the electronic device. Specifically, the electronic device can be a terminal device or a server device. The terminal device may include a smart phone, a notebook computer, an intelligent wearable device, an on-board terminal, and the like, and the server device may include an independent physical server, a server cluster composed of a plurality of servers, or a cloud server capable of performing cloud computing.

FIG. 1 is a schematic flow chart of a key field processing method according to an embodiment of the application, as shown in FIG. 1, the method includes:

S102, acquiring first data corresponding to the first field from a database.

One or more fields may be included in the database, each field corresponding to one or more data. The database may be any database having a data storage function, such as Hive tables. Alternatively, the fields in the database may be processed (or identified) in multiple passes, i.e., a portion of all the fields are processed at a time. The first field refers to a field to be processed at this time, and the first field may be one or more fields in the database.

The data corresponding to a field refers to the data stored under the field. For example, the data corresponding to the field "identification card number" is the identification card numbers stored under that field. The data corresponding to each field has its own data content characteristics, which reflect how the data is formed. For example, the data corresponding to the field "mobile phone number" has the following data content characteristics: the mobile phone number consists of 6 parts, where the first part is 86, 17951, 95 or "space+000"; the second part is "1" followed by 2 digits from 0 to 9; the third part is a space or "-"; the fourth part is 4 digits from 0 to 9; the fifth part is a space or "-"; and the sixth part is 4 digits from 0 to 9.

When data corresponding to a field is obtained from the database, the data matching the field is queried in the database and output in either of the following modes. In the first mode, the data matching the field is output one by one; that is, however many pieces of data match, each is output as independent data, so there are multiple pieces of data corresponding to the field. In the second mode, the data matching the field is spliced into one piece of data for output, so there is one piece of data corresponding to the field. For example, assume that a field is an identification card field and that 2 identification card numbers corresponding to the field, identification card number 1 and identification card number 2, are obtained from the database. In the first mode, identification card number 1 and identification card number 2 are output separately; in the second mode, identification card number 1 and identification card number 2 are spliced into one piece of identification card data 3, and identification card data 3 is output.

If the first mode is adopted for output, the first data may be any one of the N second data corresponding to the first field; the following description uses the first data as an example of how to determine whether the first field is a key field, and the other second data corresponding to the first field can be handled in the same way. If the second mode is adopted for output, the first data includes the N second data and is formed by splicing them.

S104, matching the first data with a preset key field expression to obtain a matching result corresponding to the first field; the key field expression is used to characterize the data content characteristics that a field would satisfy if it were a key field.

If the first data is any one of the N second data, the first data may be directly matched with the key field expression, and at this time, the matching result corresponding to the first field is the matching result between the first data and the key field expression. If the first data includes N second data and is formed by splicing the N second data, that is, the first data is output as a whole, the first data may be first split into N independent second data, and then each second data is matched with the key field expression, where the matching result corresponding to the first field includes the matching result between the N second data and the key field expression. How to split the first data into N independent second data will be described in detail in the following embodiments.

Optionally, the first data is composed of N second data corresponding to the first field (N is an integer greater than 1), and the matching result corresponding to the first field may include at least one of the following: the number of second data in the first data that match the key field expression, and the proportion of the matched second data in the first data. If the first field includes a plurality of fields, the matching result corresponding to the first field may include a matching result corresponding to each field.

The matching result between the data (first data or second data) and the key field expression may include: matched or unmatched. And if the data accords with the data content characteristic characterized by the key field expression, indicating that the data is matched with the key field expression. And if the data does not accord with the data content characteristic characterized by the key field expression, indicating that the data is not matched with the key field expression.

S106, if the first field is determined to meet the preset matching condition according to the matching result, determining that the first field is a key field in the database.

Optionally, the first data is composed of N second data corresponding to the first field (N is an integer greater than 1), and the first field meeting the preset matching condition may include at least one of the following: the number of second data in the first data that match the key field expression reaches a preset number threshold, and the proportion of the matched second data in the first data reaches a preset proportion threshold.

If the number of the first data is one, the first field satisfying the preset matching condition may include: the match result indicates that the first data matches the key field expression.
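The flow of S102 to S106 can be sketched in Python as follows. This is a minimal sketch rather than the implementation of the application: the names `MatchResult`, `match_field` and `is_key_field`, the simplified expression and the sample values are all hypothetical, and the sketch assumes that every configured threshold must be reached.

```python
import re
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class MatchResult:
    """Matching result for one field (hypothetical structure)."""
    matched: int  # number of second data that match the key field expression
    total: int    # number of second data examined

    @property
    def proportion(self) -> float:
        # Proportion of matched second data in the first data.
        return self.matched / self.total if self.total else 0.0


def match_field(values: Iterable[str], expression: str) -> MatchResult:
    """S104: match each piece of data against the key field expression."""
    pattern = re.compile(expression)
    items = list(values)
    matched = sum(1 for item in items if pattern.search(item))
    return MatchResult(matched=matched, total=len(items))


def is_key_field(result: MatchResult,
                 min_count: Optional[int] = None,
                 min_proportion: Optional[float] = None) -> bool:
    """S106: check the preset matching condition.

    Assumption: at least one threshold is configured, and every configured
    threshold must be reached.
    """
    if min_count is None and min_proportion is None:
        raise ValueError("configure at least one threshold")
    if min_count is not None and result.matched < min_count:
        return False
    if min_proportion is not None and result.proportion < min_proportion:
        return False
    return True


# Hypothetical sample data for one field and a simplified expression.
sample = ["13812345678", "13912345678", "n/a", "13712345678"]
result = match_field(sample, r"^1\d{10}$")
print(result.matched, result.total, is_key_field(result, min_proportion=0.7))
```

With a proportion threshold, a field can still be identified as a key field when a few stored values are empty or malformed.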

In one embodiment, the key field expression is a regular expression. The key field expressions that several key fields need to satisfy, i.e., the regular expressions corresponding to those key fields, are listed below.

The regular expression corresponding to the key field 'name' is:

/^[\u4E00-\u9FA5]{2,10}((\\u00B7|\.)[\u4E00-\u9FA5]{2,10}){0,2}$/

In the regular expression corresponding to the key field "name", "/^…$/" anchors the match from the beginning to the end of the data. "[\u4E00-\u9FA5]" represents the Chinese characters from \u4E00 to \u9FA5, i.e., all Chinese characters are matched. "{2,10}" means that the preceding item may occur 2 to 10 times, i.e., 2 to 10 Chinese characters conforming to "[\u4E00-\u9FA5]" are matched. "(\\u00B7|\.)" means that two adjacent groups of matching content are connected by "·" or ".". The group that follows the first group of Chinese characters is a second group of matching content, and "{0,2}" means that this group appears at most twice. The expression can therefore match data satisfying the following rules a1-a2:

Rule a1: the data begins with a Chinese character and consists of 2 to 10 Chinese characters; for example, the name "Zhang San".

Rule a2: the data begins with a Chinese character and consists of 2 to 10 Chinese characters, followed by a connector "·" or "." and a further 2 to 10 Chinese characters, with the connected part repeated at most twice; for example, the name "Ab. Ab XX".
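Rules a1 and a2 can be checked with a short Python sketch. It assumes that the separator group is intended to match the middle dot (U+00B7) or a period, and the function name `looks_like_name` and the sample names are hypothetical.

```python
import re

# Assumption: the separator group is meant to match U+00B7 (middle dot) or ".".
NAME_PATTERN = re.compile(
    r"^[\u4E00-\u9FA5]{2,10}((\u00B7|\.)[\u4E00-\u9FA5]{2,10}){0,2}$"
)


def looks_like_name(value: str) -> bool:
    """Return True if the value satisfies rule a1 or rule a2."""
    return NAME_PATTERN.match(value) is not None


# Rule a1: 2 to 10 Chinese characters.
print(looks_like_name("张三"))
# Rule a2: groups of Chinese characters joined by a middle dot.
print(looks_like_name("阿卜杜\u00B7拉希德"))
# A single character, or a romanized name, does not satisfy either rule.
print(looks_like_name("张"), looks_like_name("Zhang San"))
```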

The regular expression corresponding to the key field 'identity card number' is:

/^[1-9]\d{5}(18|19|([23]\d))\d{2}((0[1-9])|(10|11|12))(([0-2][1-9])|10|20|30|31)\d{3}[0-9Xx]$/

In the regular expression corresponding to the key field "identity card number", "/^…$/" anchors the match from the beginning to the end of the data, and the regular expression divides the "identity card number" into 8 parts in total. The first part "[1-9]" represents one digit from 1 to 9; the second part "\d{5}" represents 5 digits from 0 to 9; the third part "(18|19|([23]\d))" represents 18, 19, or a 2 or 3 followed by one digit from 0 to 9; the fourth part "\d{2}" represents 2 digits from 0 to 9; the fifth part "((0[1-9])|(10|11|12))" represents 10, 11, 12, or a 0 followed by one digit from 1 to 9; the sixth part "(([0-2][1-9])|10|20|30|31)" represents 10, 20, 30, 31, or a digit from 0 to 2 followed by a digit from 1 to 9; the seventh part "\d{3}" represents 3 digits from 0 to 9; the eighth part "[0-9Xx]" represents one digit from 0 to 9 or the letter X or x.
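The eight parts can be exercised with the following Python sketch. The function name `looks_like_id_number` is hypothetical and the sample numbers are synthetic strings chosen only to satisfy or violate the format; the expression checks the format only, not any checksum.

```python
import re

# re.ASCII keeps \d limited to the digits 0-9.
ID_PATTERN = re.compile(
    r"^[1-9]\d{5}(18|19|([23]\d))\d{2}"
    r"((0[1-9])|(10|11|12))"
    r"(([0-2][1-9])|10|20|30|31)"
    r"\d{3}[0-9Xx]$",
    re.ASCII,
)


def looks_like_id_number(value: str) -> bool:
    """Return True if the value has the 18-character identity card format."""
    return ID_PATTERN.match(value) is not None


print(looks_like_id_number("11010519900307123X"))  # well-formed sample
print(looks_like_id_number("110105199013071234"))  # month part is "13"
print(looks_like_id_number("01010519900307123X"))  # first part is "0"
```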

The regular expression corresponding to the key field "telephone number" is:

/(.*?)(^|\s*\+?00?0|86|17951|95|\D)(1\d{2})[\s-]{0,3}(\d{4})[\s-]{0,3}(\d{4})(?=\D|$)/

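In the regular expression corresponding to the key field "telephone number", "/…/" encloses the expression. The leading "(.*?)" is a non-greedy match of any characters; for example, "a.*?ccc" may match "axxccc", "aoccc", etc. The regular expression divides the "telephone number" into 6 parts in total. The first part "(^|\s*\+?00?0|86|17951|95|\D)" represents the prefix: the beginning of the data; optional whitespace and an optional "+" followed by "00" or "000"; 86; 17951; 95; or one non-digit character. The second part "(1\d{2})" represents a 1 followed by 2 digits from 0 to 9; the third part "[\s-]{0,3}" represents 0 to 3 spaces or "-"; the fourth part "(\d{4})" represents 4 digits from 0 to 9; the fifth part "[\s-]{0,3}" represents 0 to 3 spaces or "-"; the sixth part "(\d{4})" represents 4 digits from 0 to 9. The trailing "(?=\D|$)" requires that the number be followed by a non-digit character or by the end of the data.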
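A minimal Python sketch of this expression follows. Because the expression is not anchored at both ends, the sketch uses `re.search` and rebuilds the number from the three digit groups; the function name `extract_phone` and the sample strings are hypothetical.

```python
import re
from typing import Optional

PHONE_PATTERN = re.compile(
    r"(.*?)(^|\s*\+?00?0|86|17951|95|\D)"
    r"(1\d{2})[\s-]{0,3}(\d{4})[\s-]{0,3}(\d{4})(?=\D|$)",
    re.ASCII,
)


def extract_phone(value: str) -> Optional[str]:
    """Return the 11 digits of the first phone-like run, or None."""
    found = PHONE_PATTERN.search(value)
    if found is None:
        return None
    # Groups 3 to 5 hold the second, fourth and sixth parts of the number.
    return found.group(3) + found.group(4) + found.group(5)


print(extract_phone("13812345678"))
print(extract_phone("tel: 138 1234 5678"))
print(extract_phone("1381234567"))  # only 10 digits
```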

The regular expression corresponding to the key field 'bank card number' is:

/^([1-9]{1})(\d{15}|\d{16}|\d{18})$/

In the regular expression corresponding to the key field "bank card number", "/^…$/" anchors the match from the beginning to the end of the data, and the regular expression divides the "bank card number" into 2 parts in total. The first part "([1-9]{1})" represents one digit from 1 to 9, and the second part "(\d{15}|\d{16}|\d{18})" represents 15, 16 or 18 digits from 0 to 9.
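Since the first part contributes one digit and the second part 15, 16 or 18 digits, the expression accepts card numbers with a total length of 16, 17 or 19 digits. The short Python sketch below illustrates this with synthetic digit strings; the name `accepted_lengths` is hypothetical.

```python
import re

# re.ASCII keeps \d limited to the digits 0-9.
BANK_CARD_PATTERN = re.compile(r"^([1-9]{1})(\d{15}|\d{16}|\d{18})$", re.ASCII)

# Try synthetic numbers of 14 to 20 digits and keep the lengths that match.
accepted_lengths = [
    length
    for length in range(14, 21)
    if BANK_CARD_PATTERN.match("6" + "0" * (length - 1))
]
print(accepted_lengths)  # [16, 17, 19]
```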

Therefore, the data content characteristics of the data corresponding to the key field are represented by the key field expression, so that the first data is matched with the key field expression, and whether the field corresponding to the first data is the key field or not is accurately identified. That is, as long as the first data accords with the data content characteristics corresponding to the key fields, the first fields corresponding to the first data can be identified as the key fields, and compared with the mode of only matching the database names or the field names, the identification accuracy of the key fields is greatly improved.

Based on this representation of the key field expression, if the first data is composed of N second data corresponding to the first field (N is an integer greater than 1), then when the first data is matched with the preset key field expression, the first data can first be split into N character strings according to the data content characteristics corresponding to the first field, with one character string corresponding to one second data. Each character string is then matched with the regular expression to obtain the matching result corresponding to that character string, and the matching result corresponding to the first field is determined from the matching results corresponding to the N character strings. The matching result corresponding to the first field may include at least one of: the number of second data in the first data that match the key field expression, and the proportion of the matched second data in the first data.

For example, the second data corresponding to the first field is an identification card number, and the data content characteristic corresponding to the first field is that the data length is 18 characters (i.e., the length of an identification card number). The first data may then be split according to the rule that each second data is 18 characters long, so as to obtain N character strings, each 18 characters in length.

By splitting the first data into a plurality of character strings, each character string corresponds exactly to one second data. This avoids matching several different second data spliced together and avoids matching incomplete data, either of which would produce a wrong matching result because the data content characteristics of the second data could not be matched accurately. Splitting the first data so that one character string corresponds to one second data therefore ensures that the data matched against the key field expression each time is a single complete piece of data, which in turn ensures the accuracy of the matching result.
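The splitting step for fixed-length data can be sketched in Python as follows. The sketch assumes that the spliced first data is a plain concatenation with no delimiter; `split_fixed_width`, `match_spliced` and the sample strings are hypothetical, and a trailing fragment shorter than the fixed length is kept so that it is counted as unmatched.

```python
import re
from typing import List, Tuple

ID_PATTERN = re.compile(
    r"^[1-9]\d{5}(18|19|([23]\d))\d{2}((0[1-9])|(10|11|12))"
    r"(([0-2][1-9])|10|20|30|31)\d{3}[0-9Xx]$",
    re.ASCII,
)


def split_fixed_width(first_data: str, width: int) -> List[str]:
    """Split spliced first data into strings of `width` characters each."""
    return [first_data[i:i + width] for i in range(0, len(first_data), width)]


def match_spliced(first_data: str, width: int) -> Tuple[int, float]:
    """Return (number of matched second data, proportion of matched second data)."""
    pieces = split_fixed_width(first_data, width)
    matched = sum(1 for piece in pieces if ID_PATTERN.match(piece))
    return matched, (matched / len(pieces) if pieces else 0.0)


# Two well-formed synthetic numbers and one 18-character string that is not one.
spliced = "11010519900307123X" + "440301198512150016" + "not-an-id-number!!"
count, proportion = match_spliced(spliced, 18)
print(count, round(proportion, 2))
```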

In one embodiment, after determining that the first field is a critical field in the database, the first field may be cached in a corresponding critical field cache of the database. Wherein the key field cache is used for caching key fields in the database.

When the first data corresponding to the first field is obtained from the database, whether the key field cache contains the first field or not can be firstly queried, and if the key field cache does not contain the first field, the first data corresponding to the first field is further obtained from the database.

In this embodiment, the first field determined as the key field is cached in the key field cache, so that the field identified as the key field is recorded, and therefore, when the key field is continuously identified later, the cached field does not need to be identified any more, for example, if the key field cache contains the first field, it is indicated that the first field is determined as the key field, and the first data corresponding to the first field does not need to be acquired any more, so that the field processing time is saved, and the identification efficiency of the key field is improved.
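The cache lookup described above can be sketched in Python as follows. `KeyFieldCache` and `identify_key_fields` are hypothetical names, the cache is a plain in-memory structure chosen for illustration, and `fetch` and `is_key` stand in for the data acquisition and matching steps.

```python
from typing import Callable, Dict, Iterable, List, Set


class KeyFieldCache:
    """Hypothetical in-memory key field cache: one set of field names per table."""

    def __init__(self) -> None:
        self._fields_by_table: Dict[str, Set[str]] = {}

    def contains(self, table: str, field: str) -> bool:
        return field in self._fields_by_table.get(table, set())

    def add(self, table: str, field: str) -> None:
        self._fields_by_table.setdefault(table, set()).add(field)


def identify_key_fields(table: str,
                        fields: Iterable[str],
                        cache: KeyFieldCache,
                        fetch: Callable[[str, str], List[str]],
                        is_key: Callable[[List[str]], bool]) -> List[str]:
    """Return the fields newly identified as key fields, skipping cached ones."""
    newly_identified = []
    for field in fields:
        if cache.contains(table, field):
            continue  # already known to be a key field: no data is fetched
        first_data = fetch(table, field)
        if is_key(first_data):
            cache.add(table, field)
            newly_identified.append(field)
    return newly_identified


# Hypothetical usage: column1 is already cached, so only column2 and column3 are fetched.
cache = KeyFieldCache()
cache.add("A", "column1")
fetched: List[str] = []


def fake_fetch(table: str, field: str) -> List[str]:
    fetched.append(field)
    return ["13812345678"] if field == "column2" else ["hello"]


found = identify_key_fields("A", ["column1", "column2", "column3"], cache,
                            fake_fetch, lambda data: all(v.isdigit() for v in data))
print(found, fetched)
```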

In one embodiment, when first data corresponding to a first field is acquired from a database, firstly acquiring the field name of the first field from a metadata base according to a preset data acquisition rule and the database name of the database; and then generating a data acquisition statement corresponding to the first field according to a preset data acquisition rule and the field name of the first field. Further, based on the data acquisition statement, first data corresponding to the first field is acquired from the database.

The metadata base is used for storing the database names of databases and the field names of the fields in those databases. If there are a plurality of databases, the database names and the field names can be stored in association in the metadata base, so that the field names stored in association with a database name can be obtained according to that database name. The preset data acquisition rule includes at least one of the following: a data acquisition period, a data acquisition mode, a field acquisition mode, the number of fields to acquire, and the number of data to acquire. The data acquisition mode may include a data acquisition order, a data acquisition quantity, data designated for acquisition, and the like. Data may be acquired sequentially according to its storage position in the database, or randomly. The field acquisition mode may include a field acquisition order, fields designated for acquisition, and the like. Fields may be acquired sequentially according to their arrangement positions in the database, or randomly.

Several examples of optional data acquisition rules are as follows: performing a full-library scan the first time and precise scans thereafter; performing a full-library scan every time; performing a precise scan every time. A full-library scan means that the objects of each data acquisition are all fields in the database, and a precise scan means that the objects of each data acquisition are specified fields in the database. The field names of the specified fields can be set in the data acquisition rule. Precise scanning is suitable for data acquisition in a specific service. For example, a specific service corresponds to a plurality of fields, where it is known that the data corresponding to field A is of low importance, i.e., field A cannot be a key field, while the data corresponding to field B is of high importance, i.e., field B may be a key field. Field B can then be specified in the data acquisition rule set for the specific service, which saves unnecessary data acquisition work and improves field identification efficiency.

For example, the preset data acquisition rule is as follows: data corresponding to two fields is acquired each time, and 100,000 pieces of data are acquired each time. The first field then includes two fields. When the first data corresponding to the first field is acquired for the first time, 100,000 pieces of data can be acquired from the data corresponding to the first two fields, according to the arrangement positions of the fields in the database, and used as the first data. The number of data acquired for each of the two fields may be left unconstrained or may be set in the data acquisition rule. For example, 50,000 pieces of data are acquired for each of the two fields; or all the data corresponding to the first of the two fields is acquired first and the remainder is then acquired from the data corresponding to the second, as long as the total amount of acquired data reaches 100,000.

When generating a data acquisition statement corresponding to the first field according to a preset data acquisition rule and the field name of the first field, the data acquisition statement may take the following form: "select [field names] from [table name] limit N". Here, select means extraction, N is the number of data acquired each time (in a single acquisition), and the value of N can be determined according to the preset data acquisition rule.

For example, the table name of a database table is A (hereinafter referred to as table A), and table A has 5 fields, namely column1, column2, column3, column4 and column5, with 10 million pieces of data in total corresponding to the 5 fields. Assume that, in the data acquisition rule, the number of data acquired each time is 1000 and the field acquisition mode is: data of all unidentified fields is acquired. Assuming that 100,000 pieces of data are to be acquired in total, 100 acquisitions of 1000 pieces each are required from table A. From the data acquisition rule and the field names, the following data acquisition statement may be generated: "select column1, column2, column3, column4, column5 from A limit 1000". This statement acquires the first 1000 pieces of data of the 5 fields column1, column2, column3, column4 and column5 from table A, i.e., 1000 pieces are acquired at a time.

If field column1 is determined to be a key field after the first acquisition, column1 is cached in the key field cache corresponding to table A. The next time data is acquired from table A, the data of field column1 no longer needs to be processed, and the generated data acquisition statement is: "select column2, column3, column4, column5 from A limit 1000, 1000". This statement acquires the 1001st to 2000th pieces of data of the 4 fields column2, column3, column4 and column5 from table A, i.e., 1000 pieces are again acquired at a time. And so on, until 100,000 pieces of data have been obtained from table A.

In this embodiment, before each data acquisition, the current data acquisition statement is generated, so that the data is acquired according to the data acquisition statement, and accuracy and order of data acquisition are ensured. And by dynamically adjusting the data acquisition statement and deleting the field name which is determined to be the key field from the data acquisition statement, the field which is determined to be the key field can be prevented from being repeatedly processed, and the recognition efficiency of the key field is improved.
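The dynamic generation of the data acquisition statement can be sketched in Python as follows. The function `build_statement` and its parameters are hypothetical. The sketch assumes that the two-number form of the limit clause means "limit offset, count", as in MySQL and Hive, and that table and field names come from the trusted metadata base, so they are interpolated directly.

```python
from typing import Iterable, Optional, Sequence


def build_statement(table: str,
                    fields: Sequence[str],
                    cached_key_fields: Iterable[str],
                    batch_size: int,
                    batch_index: int) -> Optional[str]:
    """Build the statement for one acquisition, omitting cached key fields.

    batch_index is 0 for the first acquisition, 1 for the second, and so on.
    Returns None when every field has already been identified as a key field.
    """
    cached = set(cached_key_fields)
    remaining = [field for field in fields if field not in cached]
    if not remaining:
        return None
    offset = batch_index * batch_size
    # Assumption: "limit offset, count" semantics; other dialects may differ.
    limit = f"limit {batch_size}" if offset == 0 else f"limit {offset}, {batch_size}"
    return f"select {', '.join(remaining)} from {table} {limit}"


columns = ["column1", "column2", "column3", "column4", "column5"]
first = build_statement("A", columns, [], batch_size=1000, batch_index=0)
second = build_statement("A", columns, ["column1"], batch_size=1000, batch_index=1)
print(first)
print(second)
```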

In one embodiment, the number of first fields is a plurality. When generating a data acquisition statement corresponding to the first field according to the data acquisition rule and the field name of the first field, the plurality of first fields can be split into a plurality of field groups, and each field group comprises at least one first field. And further, generating a data acquisition statement corresponding to the field group according to a preset data acquisition rule and the field name of the first field in the field group for each field group.

Suppose the field acquisition mode set in the data acquisition rule is: acquire the data of all unidentified fields. Assuming that the target database includes 100 fields, which is a large number, the 100 fields can be split into a plurality of field groups. The number of fields included in each field group can be preset, for example to 5 fields, in which case the 100 fields are split into 20 field groups, each including 5 first fields. A data acquisition statement is generated for each of the 20 field groups, giving 20 data acquisition statements in total. Two of the data acquisition statements are listed as follows:

Field group 1: "select column1, column2, column3, column4, column5 from A limit 1000";

Field group 2: "select column6, column7, column8, column9, column10 from A limit 1000".

It can be seen that the first fields column1, column2, column3, column4, column5 are divided into one field group. The first fields column6, column7, column8, column9, column10 are divided into one field group.

After generating the data acquisition statement corresponding to each field group, sequentially executing, for each field group of the plurality of field groups: and acquiring the first data corresponding to the first field included in the field group from the database based on the data acquisition statement corresponding to the field group.

Continuing the above example, the first data corresponding to the first fields included in field group 1 is obtained from the database first, and then the first data corresponding to the first fields included in field group 2 is obtained from the database.

In this embodiment, the plurality of first fields are split into a plurality of field groups, a data acquisition statement is generated for each field group according to the preset data acquisition rule and the field names of the first fields in that group, and the data of each field group is acquired based on its own data acquisition statement. In this way, when the number of fields is large, data can be acquired sequentially in batches, which avoids the task interruptions that tend to occur when a single acquisition fetches a large amount of data and ensures the accuracy of data acquisition.
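The field grouping can be illustrated with a short Python sketch. The helper names `split_into_field_groups` and `statements_for_groups` are hypothetical, and the grouping strategy (consecutive fields, fixed group size) is an assumption; the embodiment only requires that each group contain at least one first field.

```python
def split_into_field_groups(field_names, group_size):
    """Split field names into consecutive groups of at most group_size fields."""
    if group_size < 1:
        raise ValueError("group_size must be at least 1")
    return [field_names[i:i + group_size] for i in range(0, len(field_names), group_size)]


def statements_for_groups(table, groups, batch_size=1000):
    """Generate one data acquisition statement per field group."""
    return [f"select {', '.join(group)} from {table} limit {batch_size}" for group in groups]


field_names = [f"column{i}" for i in range(1, 101)]  # 100 fields, as in the example
groups = split_into_field_groups(field_names, 5)      # 20 groups of 5 fields
statements = statements_for_groups("A", groups)       # executed one group at a time
```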

In one embodiment, the first data corresponding to the first field is obtained from the database according to a pre-created data acquisition task. The data acquisition task includes at least one of the following information: the field name of the first field, the number of fields, the field location, and the quantity of the first data. The number of fields refers to the number of fields that need to be processed when the data acquisition task is executed. The field location is used to characterize the location information of the field in the database. The quantity of the first data refers to the amount of data that needs to be acquired when the data acquisition task is executed.

In one embodiment, when the first data corresponding to the first field is obtained from the database, whether an idle thread exists or not may be determined first, and if the idle thread exists, the data obtaining task is allocated to the idle thread, and the idle thread is triggered to execute the data obtaining task. If the idle thread does not exist, the data acquisition task is added into a task queue to be processed, or the data acquisition task is intercepted until the idle thread exists.

The state of all threads can be periodically detected, and whether idle threads exist or not is judged according to the state of the threads. If no idle thread exists, the data acquisition task can be added into the task queue to be processed, and after the idle thread is detected, the data acquisition task is pulled from the task queue to be processed and distributed to the idle thread for processing. Or the data acquisition task can be intercepted, and after the idle thread is detected, the intercepted data acquisition task is distributed to the idle thread for processing.

In this embodiment, the number of threads in the thread pool, the maximum number of threads working simultaneously, and the length of the task queue to be processed may be flexibly set according to actual requirements. By flexibly setting this information, the data acquisition tasks can be executed in an orderly manner, and no further data acquisition tasks are allocated once the maximum number of threads is reached, which avoids the situation in which the device is overloaded and task processing is interrupted because of too many data acquisition tasks. Therefore, controlling the number of threads controls the number of data acquisition tasks executed simultaneously, so that smooth execution of the data acquisition tasks can be ensured.
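One possible way to realize the idle-thread check and the task queue to be processed is sketched below in Python. The class `TaskDispatcher` and its parameters are hypothetical. Instead of periodically polling thread states, this sketch tracks an idle-thread counter, and it rejects a task when the queue is full; both choices are assumptions, since the embodiment does not specify them.

```python
import queue
import threading
from concurrent.futures import ThreadPoolExecutor


class TaskDispatcher:
    """Run tasks on idle threads; queue tasks when all threads are busy."""

    def __init__(self, max_threads, max_pending):
        self._pool = ThreadPoolExecutor(max_workers=max_threads)
        self._idle = max_threads                      # currently idle threads
        self._pending = queue.Queue(maxsize=max_pending)
        self._lock = threading.Lock()

    def submit(self, task):
        """Return 'running', 'queued', or 'rejected' (queue full; an assumption)."""
        with self._lock:
            if self._idle > 0:
                self._idle -= 1
                self._pool.submit(self._run, task)
                return "running"
            try:
                self._pending.put_nowait(task)
                return "queued"
            except queue.Full:
                return "rejected"

    def _run(self, task):
        while True:
            try:
                task()
            except Exception:
                pass  # a failed task must not take its thread out of service
            with self._lock:
                try:
                    task = self._pending.get_nowait()  # pull the next pending task
                except queue.Empty:
                    self._idle += 1                    # nothing pending: thread is idle
                    return

    def shutdown(self):
        self._pool.shutdown(wait=True)


gate = threading.Event()
results = []


def make_task(name):
    def task():
        gate.wait()           # keep the thread busy until the gate opens
        results.append(name)
    return task


dispatcher = TaskDispatcher(max_threads=2, max_pending=1)
statuses = [dispatcher.submit(make_task(n)) for n in ("t1", "t2", "t3", "t4")]
gate.set()
dispatcher.shutdown()
```

With two threads and a queue of length one, the first two tasks run immediately, the third waits in the queue until a thread becomes idle, and the fourth is turned away.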

Fig. 2 is a schematic flow chart of a key field processing method according to another embodiment of the present application, and as shown in fig. 2, the method is applied to an electronic device, and includes the following steps:

S201, acquiring task identification information of a data acquisition task in response to arrival of task execution time.

The task information of the data acquisition task can be created in advance. The task information comprises task identification information and task attribute information; the task identification information serves as a unique identifier, and the task attribute information can include at least one of the following information: the field name of the first field, the number of fields, the field location, and the quantity of the first data. The first field refers to a field that needs to be processed when the current data acquisition task is executed, and the first data refers to data that needs to be acquired when the current data acquisition task is executed. The number of fields refers to the number of fields that need to be processed to perform the current data acquisition task. The field location is used to characterize the location information of the field in the database. The quantity of the first data refers to the amount of data that needs to be acquired to perform the current data acquisition task.

After the task information of the data acquisition task is created, it may be stored in advance in a security system of the electronic device. Optionally, a plurality of data acquisition tasks may be created in advance, and each data acquisition task may be stored in the electronic device in association with its task information.

S202, checking the data acquisition task.

In this embodiment, the task information of the data acquisition task may be checked, for example, whether the task information is complete, whether the task information conforms to a preset information description rule, and the like.

S203, after the check is passed, judging whether an idle thread exists currently. If yes, executing S204; if not, executing S209.

S204, distributing the data acquisition task to the idle thread, and triggering the idle thread to execute the data acquisition task.

When the data acquisition task is distributed to the idle thread, the task identification information is sent to the idle thread. The idle thread begins executing the data acquisition task in accordance with the subsequent steps.

S205, acquiring task attribute information of the data acquisition task according to task identification information of the data acquisition task.

S206, generating a data acquisition statement corresponding to the data acquisition task according to the task attribute information, and acquiring first data corresponding to the first field from the database according to the data acquisition statement.

The task attribute information may include at least one of the following: the field name of the first field, the number of fields, the field location, and the quantity of data. When the data acquisition task is created in advance, the task attribute information can be determined according to a preset data acquisition rule. The data acquisition rule may include at least one of the following: a data acquisition period, a data acquisition mode, a field acquisition mode, the number of fields acquired, and the quantity of data acquired. It can be seen that, according to the data acquisition rule, the field name, the number of fields and the field location of the fields to be processed (i.e., the first field) corresponding to each data acquisition task, as well as information such as the quantity and location of the data to be acquired (i.e., the first data), can be determined.

Optionally, if the current data acquisition task is not the first data acquisition task corresponding to the database, before generating the data acquisition statement, it may be determined whether the first field is stored in the key field cache corresponding to the database, and if not, the data acquisition statement may be further generated. If so, indicating that the first field has been identified as a key field, the field name of the first field may be filtered out when generating the data acquisition statement.

S207, matching the acquired first data with a preset key field expression to obtain a matching result corresponding to the first field.

The key field expression may be a regular expression. How a regular expression is used to characterize the data content features of the data corresponding to a key field has been described in detail in the above embodiments and is not repeated here.

S208, determining the first field meeting the preset matching condition as a key field according to the matching result.

If the first data is composed of N second data corresponding to the first field (N is an integer greater than 1), the matching result corresponding to the first field may include at least one of the following: the number of second data in the first data that match the key field expression, and the proportion of the matched second data in the first data. The first field meeting the preset matching condition includes at least one of: the number of matched second data reaches a preset number threshold, and the proportion of the matched second data in the first data reaches a preset proportion threshold.

If the number of the first data is one, the matching result corresponding to the first field is the matching result between the first data and the key field expression, and the first field meeting the preset matching condition comprises: the first data matches the key field expression.

S209, adding the data acquisition task into a task queue to be processed, or intercepting the data acquisition task until an idle thread exists.

By adopting the technical scheme of the embodiment, the matching result is obtained by matching the first data corresponding to the first field with the preset key field expression, and further, if the first field is determined to meet the preset matching condition according to the matching result corresponding to the first field, the first field is determined to be the key field in the database. Because the key field expression is used for representing the data content characteristics that a field needs to satisfy if being used as the key field, the key field can be identified from the dimension of the data content characteristics by matching the first data with the key field expression, and the condition that the key field is omitted when only the field name is matched or only the database name is matched (for example, the field name is not the name of the key field) is avoided, so that the accuracy of key field identification is improved while the automatic identification of the key field is realized. In addition, the first field determined as the key field is cached in the key field cache, so that the field which is already identified as the key field is recorded, and therefore, when the key field is continuously identified later, the cached field does not need to be identified any more, for example, if the key field cache contains the first field, it is indicated that the first field is already determined as the key field, and the first data corresponding to the first field does not need to be acquired any more, so that the field processing time is saved, and the identification efficiency of the key field is improved.
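The threshold check and the use of the key field cache can be sketched in Python as follows. The function names, the default proportion threshold of 0.8, and the mobile-number-like expression are all hypothetical. Treating the condition as satisfied when any configured threshold is reached is one reading of "at least one of" and is an assumption.

```python
import re


def meets_matching_condition(matched, total, count_threshold=None, ratio_threshold=None):
    """Return True if at least one configured threshold is reached (an assumption)."""
    if total == 0:
        return False
    by_count = count_threshold is not None and matched >= count_threshold
    by_ratio = ratio_threshold is not None and matched / total >= ratio_threshold
    return by_count or by_ratio


def identify_key_fields(values_by_field, expression, key_field_cache, ratio_threshold=0.8):
    """Check each uncached field and cache those identified as key fields."""
    pattern = re.compile(expression)
    for field, values in values_by_field.items():
        if field in key_field_cache:
            continue  # already identified; its data need not be processed again
        matched = sum(1 for v in values if pattern.fullmatch(v))
        if meets_matching_condition(matched, len(values), ratio_threshold=ratio_threshold):
            key_field_cache.add(field)
    return key_field_cache


# Hypothetical expression: an 11-digit number starting with 1 (mobile-number-like).
PHONE_EXPRESSION = r"1\d{10}"
sample = {
    "column1": ["13800000000", "13900000001", "n/a", "13700000002", "13600000003"],
    "column2": ["alice", "bob", "carol", "dave", "erin"],
}
cache = identify_key_fields(sample, PHONE_EXPRESSION, set())
```

In this sample, four of the five values in column1 match the expression, so column1 reaches the proportion threshold and is cached, whereas column2 is not, regardless of what the fields are named.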

In summary, particular embodiments of the present subject matter have been described. Other embodiments are within the scope of the following claims. In some cases, the actions recited in the claims can be performed in a different order and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may be advantageous.

Based on the same idea as the key field processing method provided by the embodiments of the present application above, an embodiment of the present application further provides a key field processing apparatus.

FIG. 3 is a schematic block diagram of an apparatus for processing key fields according to an embodiment of the present application, as shown in FIG. 3, the apparatus comprising:

an obtaining module 31, configured to obtain first data corresponding to the first field from the database;

The matching module 32 is configured to match the first data with a preset key field expression, so as to obtain a matching result corresponding to the first field; the key field expression is used for representing the data content characteristics which need to be met if one field is used as a key field;

and the determining module 33 is configured to determine that the first field is a key field in the database if the first field is determined to satisfy a preset matching condition according to the matching result.

In one embodiment, the first data is composed of N second data corresponding to the first field, where N is an integer greater than 1; the matching result comprises at least one of the following: the number of the second data matched with the key field expression in the first data and the ratio of the matched second data in the first data;

The first field meeting the preset matching condition includes at least one of the following: the number of matched second data reaches a preset number threshold, and the proportion of the matched second data in the first data reaches a preset proportion threshold.

In one embodiment, the apparatus further comprises:

The caching module is used for caching the first field into a key field cache library corresponding to the database after the first field is determined to be the key field in the database;

The obtaining module 31 performs the following steps when obtaining the first data corresponding to the first field from the database:

querying whether the key field cache library contains the first field;

and if the key field cache library does not contain the first field, acquiring the first data corresponding to the first field from the database.

In one embodiment, the obtaining module 31 performs the following steps when obtaining the first data corresponding to the first field from the database:

Acquiring a field name of the first field from a metadata base according to a preset data acquisition rule and a database name of the database; the metadata base is used for storing the database name and the field name of the field in the database;

Generating a data acquisition statement corresponding to the first field according to the data acquisition rule and the field name of the first field; the data acquisition rule includes at least one of: a data acquisition period, a data acquisition mode, a field acquisition mode, the number of fields acquired, and the quantity of data acquired;

And acquiring the first data corresponding to the first field from the database based on the data acquisition statement.

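In one embodiment, the obtaining module 31 performs the following steps when obtaining the first data corresponding to the first field from the database:

Acquiring the first data from the database according to a pre-created data acquisition task; the data acquisition task includes at least one of the following information: the field name of the first field, the number of fields, the field location, and the quantity of the first data.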

In one embodiment, the acquiring module 31 performs the following steps when acquiring the first data from the database according to a pre-created data acquisition task:

judging whether an idle thread exists or not;

If yes, the data acquisition task is distributed to the idle thread, and the idle thread is triggered to execute the data acquisition task;

If not, adding the data acquisition task into a task queue to be processed, or intercepting the data acquisition task until an idle thread exists.

In one embodiment, the key field expression is a regular expression;

The matching module 32 performs the following steps when matching the first data with a preset key field expression to obtain a matching result corresponding to the first field:

Splitting the first data according to the data content characteristics corresponding to the first field to obtain N character strings; one character string corresponds to one second data;

for each character string, matching the character string with the regular expression to obtain a matching result corresponding to the character string;

and determining the matching result corresponding to the first field according to the matching results corresponding to the N character strings.
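The three steps performed by the matching module can be sketched in Python as follows. The function `match_first_data` is hypothetical, splitting on a newline separator is an assumed stand-in for splitting "according to the data content characteristics", and the bank-card-like expression is only an example.

```python
import re


def match_first_data(first_data, expression, separator="\n"):
    """Split first_data into strings, match each one, and aggregate the results."""
    # Step 1: split the first data into N character strings (one per second data).
    strings = [s.strip() for s in first_data.split(separator) if s.strip()]
    # Step 2: match every character string against the regular expression.
    pattern = re.compile(expression)
    per_string = [pattern.fullmatch(s) is not None for s in strings]
    # Step 3: derive the field-level matching result from the N per-string results.
    matched = sum(per_string)
    proportion = matched / len(strings) if strings else 0.0
    return {"total": len(strings), "matched": matched, "proportion": proportion}


# Hypothetical expression: 16 to 19 digits (bank-card-number-like).
CARD_EXPRESSION = r"\d{16,19}"
cards = ["6222" + "0" * 11 + str(i) for i in range(1, 4)]  # three 16-digit strings
first_data = "\n".join(cards[:2] + ["unknown"] + cards[2:])
result = match_first_data(first_data, CARD_EXPRESSION)
```

The returned count and proportion are the values that would then be compared with the preset number threshold and proportion threshold.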

By adopting the device of the embodiment of the application, the matching result is obtained by matching the first data corresponding to the first field with the preset key field expression, and further, if the first field is determined to meet the preset matching condition according to the matching result corresponding to the first field, the first field is determined to be the key field in the database. Because the key field expression is used for representing the data content characteristics that a field needs to satisfy if being used as the key field, the first data and the key field expression are matched, so that the key field can be identified from the dimension of the data content characteristics, the condition that the key field is omitted when only the field name is matched or only the database name is matched (for example, the field name is not the name of the key field) is avoided, and the accuracy of key field identification is improved while the automatic identification of the key field is realized.

It should be understood by those skilled in the art that the key field processing apparatus in FIG. 3 can be used to implement the foregoing key field processing method, and its detailed description is similar to that of the method section above; to avoid redundancy, it is not repeated here.

Based on the same thought, the embodiment of the application also provides electronic equipment, as shown in fig. 4. The electronic device may vary considerably in configuration or performance and may include one or more processors 401 and memory 402, where the memory 402 may store one or more stored applications or data. Wherein the memory 402 may be transient storage or persistent storage. The application programs stored in memory 402 may include one or more modules (not shown), each of which may include a series of computer-executable instructions for use in an electronic device. Still further, the processor 401 may be arranged to communicate with the memory 402 and execute a series of computer executable instructions in the memory 402 on an electronic device. The electronic device may also include one or more power supplies 403, one or more wired or wireless network interfaces 404, one or more input/output interfaces 405, and one or more keyboards 406.

In particular, in this embodiment, an electronic device includes a memory, and one or more programs, where the one or more programs are stored in the memory, and the one or more programs may include one or more modules, and each module may include a series of computer-executable instructions for the electronic device, and the one or more programs configured to be executed by one or more processors include instructions for:

Acquiring first data corresponding to a first field from a database;

The embodiments of the present application also provide a computer-readable storage medium storing one or more computer programs, the one or more computer programs including instructions, which when executed by an electronic device including a plurality of application programs, enable the electronic device to perform the processes of the above-mentioned key field processing method embodiments, and specifically configured to perform:

Acquiring first data corresponding to a first field from a database;

The system, apparatus, module or unit set forth in the above embodiments may be implemented in particular by a computer chip or entity, or by a product having a certain function. One typical implementation is a computer. In particular, the computer may be, for example, a personal computer, a laptop computer, a cellular telephone, a camera phone, a smart phone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.

For convenience of description, the above devices are described as being functionally divided into various units, respectively. Of course, the functions of each element may be implemented in the same piece or pieces of software and/or hardware when implementing the present application.

It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

In one typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.

The memory may include volatile memory in a computer-readable medium, such as random access memory (RAM), and/or nonvolatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.

Computer-readable media include both permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of storage media for a computer include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium, which can be used to store information that can be accessed by a computing device. Computer-readable media, as defined herein, do not include transitory computer-readable media (transmission media), such as modulated data signals and carrier waves.

It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in a process, method, article or apparatus that comprises the element.

The application may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The application may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.

The embodiments of the present application are described in a progressive manner; for the same or similar parts, the embodiments may be referred to one another, and each embodiment focuses on its differences from the other embodiments. In particular, since the system embodiments are substantially similar to the method embodiments, their description is relatively brief, and for relevant details reference may be made to the description of the method embodiments.

The foregoing is merely exemplary of the present application and is not intended to limit the present application. Various modifications and variations of the present application will be apparent to those skilled in the art. Any modification, equivalent replacement, improvement, etc. which come within the spirit and principles of the application are to be included in the scope of the claims of the present application.

Claims

1. A method for processing a key field, comprising:

acquiring first data corresponding to a first field from a database;

2. The method of claim 1, wherein the first data is composed of N second data corresponding to the first field, N being an integer greater than 1; the matching result comprises at least one of the following: the number of the second data matched with the key field expression in the first data and the ratio of the matched second data in the first data;

3. The method of claim 1, wherein after said determining that said first field is a critical field in said database, said method further comprises:

Caching the first field into a key field cache library corresponding to the database;

The obtaining the first data corresponding to the first field from the database includes:

querying whether the key field cache library contains the first field;

4. The method of claim 1, wherein the obtaining the first data corresponding to the first field from the database comprises:

5. The method of claim 1, wherein the obtaining the first data corresponding to the first field from the database comprises:

6. The method of claim 5, wherein the acquiring the first data from the database according to a pre-created data acquisition task comprises:

judging whether an idle thread exists or not;

7. The method of claim 2, wherein the key field expression is a regular expression;

the step of matching the first data with a preset key field expression to obtain a matching result corresponding to the first field comprises the following steps:

for each character string, matching the character string with the regular expression to obtain a matching result corresponding to the character string;

8. A key field processing apparatus, comprising:

9. An electronic device comprising a processor and a memory electrically connected to the processor, the memory storing a computer program, the processor being configured to invoke and execute the computer program from the memory to implement the method of processing the key fields of any of claims 1-7.

10. A computer readable storage medium storing a computer program executable by a processor to implement the method of processing key fields according to any of claims 1-7.