CN112231417A

CN112231417A - Data classification method and device, electronic equipment and storage medium

Info

Publication number: CN112231417A
Application number: CN202011099802.4A
Authority: CN
Inventors: 陈昱彬; 李婕
Original assignee: Ping An International Smart City Technology Co Ltd
Current assignee: Ping An International Smart City Technology Co Ltd
Priority date: 2020-10-14
Filing date: 2020-10-14
Publication date: 2021-01-15

Abstract

The invention relates to big data technology and discloses a data classification method comprising: obtaining an original data dictionary set and a preset service subject set; extracting the data dictionaries in the original data dictionary set to the service subject set to obtain an original service entity table; carrying out missing value detection and duplicate removal on the original service entity table to obtain a standard service entity table; obtaining a standard data table and generating a mapping relation table according to the standard data table; generating a query statement according to the standard service entity table and the mapping relation table; generating a data extraction script according to the query statement; and extracting data by using the data extraction script and classifying the data to obtain a classification result. In addition, the invention relates to blockchain technology, and the classification result can be stored in a node of a blockchain. The invention also provides a data classification device, electronic equipment and a computer readable storage medium. The invention can solve the problem that technical personnel need to know specific services in order to classify data.

Description

Data classification method and device, electronic equipment and storage medium

Technical Field

The present invention relates to the field of big data technologies, and in particular, to a data classification method and apparatus, an electronic device, and a computer-readable storage medium.

Background

With the improvement of internet big data platforms and technologies, professional industries increasingly require business data analysis and prediction from the big data field. Technicians need to clean, integrate, process and standardize multi-channel, multi-source data so as to provide managers with accurate business analysis and business prediction.

For the above scenario, the prior art has the following drawbacks: 1. Data processing on the market is mainly based on processing and applying general source data, and technicians need to know the specific service behind the data in order to process it. 2. Some fields with higher requirements lack dedicated automatic data processing methods. For example, service data in the judicial field requires data modeling, classification and layering, and industry standardization, and the prior art lacks a processing method for such complex service data.

Disclosure of Invention

The invention provides a data classification method, a data classification device and a computer readable storage medium, and mainly aims to solve the problem that a technician needs to know specific services to classify data.

In order to achieve the above object, the present invention provides a data classification method, including:

acquiring an original data dictionary set and a preset service theme set, and extracting a data dictionary in the original data dictionary set to the service theme set to obtain an original service entity table under each service theme;

carrying out missing value detection and duplicate removal operation on the original business entity table to obtain a standard business entity table;

acquiring a standard data table, and generating a mapping relation table according to the standard business entity table and the standard data table;

generating a query statement according to the standard business entity table and the mapping relation table;

and generating a data extraction script according to the query statement, extracting data by using the data extraction script and classifying to obtain a classification result.

Optionally, the obtaining an original data dictionary set and a preset service theme set, and extracting a data dictionary in the original data dictionary set to the service theme set to obtain an original service entity table under each service theme includes:

extracting keywords in the service theme set by using a preset language processing algorithm;

matching a corresponding data dictionary in the original data dictionary set according to the keywords, and extracting metadata in the data dictionary to the service theme set;

and summarizing metadata in all data dictionaries under all the business topics in the business topic set to obtain the original business entity table.

Optionally, the extracting the keywords in the service theme set by using a preset language processing algorithm includes:

performing word segmentation processing on the text in the service theme set, and removing stop words to obtain word segmentation results;

and selecting one or more keywords from the word segmentation result.

Optionally, the performing missing value detection and duplicate removal operations on the original business entity table to obtain a standard business entity table includes:

carrying out missing value detection and filling on the data in the original business entity table to obtain a filled original business entity table;

and carrying out duplication removal operation on the data filled in the original business entity table, and obtaining the standard business entity table according to a preset business rule.

Optionally, the generating a mapping relationship table according to the standard business entity table and the standard data table includes:

finding data which is the same as the standard field name in the standard data table from the standard business entity table;

and configuring the mapping relation between the data and the standard code value corresponding to the standard field, and generating a mapping relation table.

Optionally, the generating a query statement according to the standard business entity table and the mapping relationship table includes:

generating a table building statement of the standard business entity table by using a preset statement building function;

acquiring the mapping ID of the standard business entity table, and searching all mapping scripts under the same mapping ID in the mapping relation table;

and summarizing the table building statement and the mapping script to obtain the query statement.

Optionally, the generating a data extraction script according to the query statement, extracting data by using the data extraction script, and classifying to obtain a classification result includes:

acquiring an operation script template of a preset platform, and generating the data extraction script by using the operation script template and the query statement;

and running the data extraction script in a preset time, extracting data from a database according to the data extraction script, and classifying to obtain the classification result.

In order to solve the above problem, the present invention also provides a data classification apparatus, comprising:

the data dictionary extraction module is used for acquiring an original data dictionary set and a preset service theme set, extracting a data dictionary in the original data dictionary set to the service theme set and obtaining an original service entity table under each service theme;

the entity table processing module is used for carrying out missing value detection and duplicate removal operation on the original business entity table to obtain a standard business entity table;

the relation mapping module is used for acquiring a standard data table and generating a mapping relation table according to the standard business entity table and the standard data table;

the statement generating module is used for generating query statements according to the standard business entity table and the mapping relation table;

and the data classification module is used for generating a data extraction script according to the query statement, extracting data by using the data extraction script and classifying to obtain a classification result.

In order to solve the above problem, the present invention also provides an electronic device, including:

a memory storing at least one instruction; and

a processor that executes the instructions stored in the memory to implement the data classification method described above.

In order to solve the above problem, the present invention further provides a computer-readable storage medium, which stores at least one instruction, and the at least one instruction is executed by a processor in an electronic device to implement the data classification method described above.

According to the embodiment of the invention, the original business entity table under each business topic can be accurately determined from the original data dictionary set and the preset business topic set. Missing value detection and duplicate removal are performed on the original business entity table to obtain the standard business entity table, which improves the accuracy of the data in the standard business entity table. The mapping relation table is then generated according to the standard business entity table and the standard data table, the query statement is generated according to the standard business entity table and the mapping relation table, and the data extraction script is generated according to the query statement, so that data standardization and data classification can be carried out directly. Therefore, the data classification method, the data classification device, the electronic equipment and the computer readable storage medium provided by the invention can solve the problem that technical personnel need to know specific services in order to classify data.

Drawings

Fig. 1 is a schematic flow chart of a data classification method according to an embodiment of the present invention;

FIG. 2 is a schematic flow chart of a detailed implementation of step S1 in FIG. 1;

FIG. 3 is a schematic flow chart of a detailed implementation of step S2 in FIG. 1;

FIG. 4 is a schematic flow chart of a detailed implementation of step S3 in FIG. 1;

FIG. 5 is a schematic diagram of a mapping relation table;

FIG. 6 is a schematic flow chart of a detailed implementation of step S4 in FIG. 1;

FIG. 7 is a schematic diagram of a mapping script;

FIG. 8 is a schematic flow chart of a detailed implementation of step S5 in FIG. 1;

FIG. 9 is a functional block diagram of a data classification apparatus according to an embodiment of the present invention;

FIG. 10 is a schematic structural diagram of an electronic device implementing the data classification method according to an embodiment of the present invention.

The implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.

Detailed Description

It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.

The embodiment of the application provides a data classification method. The execution subject of the data classification method includes, but is not limited to, at least one of electronic devices such as a server and a terminal, which can be configured to execute the method provided by the embodiments of the present application. In other words, the data classification method may be performed by software or hardware installed in the terminal device or the server device, and the software may be a blockchain platform. The server includes but is not limited to: a single server, a server cluster, a cloud server or a cloud server cluster, and the like.

Fig. 1 is a schematic flow chart of a data classification method according to an embodiment of the present invention. In this embodiment, the data classification method includes:

S1, acquiring an original data dictionary set and a preset service theme set, extracting a data dictionary in the original data dictionary set to the service theme set, and obtaining an original service entity table under each service theme.

In an embodiment of the present invention, a data dictionary contains metadata that describes the content of data, and the original data dictionary set includes a plurality of data dictionaries. For example, on a big data platform, the "case table" has fields such as case ID, case number, and contractor, whose values are 23423546666, "(2020) yue 0308 min 453", and "Zhang San", respectively. Here the "case table" is a data dictionary, and case ID, case number, and contractor are metadata in that data dictionary. The preset service theme set may cover service themes in multiple fields; for example, a judicial service theme set in the judicial field may be divided into case information, judge information, party information, document information, evidence information, and the like.

Preferably, referring to fig. 2, the S1 includes:

S10, extracting keywords in the service theme set by using a preset language processing algorithm;

S11, matching a corresponding data dictionary in the original data dictionary set according to the keywords, and extracting metadata in the data dictionary to the service theme set;

S12, summarizing metadata in all data dictionaries under all the business topics in the business topic set to obtain the original business entity table.

In detail, the extracting the keywords in the service theme set by using a preset language processing algorithm includes:

performing word segmentation processing on the text in the service theme set, and removing stop words to obtain a word segmentation result;

and selecting one or more keywords from the word segmentation result.

The preset language processing algorithm in the embodiment of the invention may be a publicly available algorithm such as TextRank or a semantics-based keyword extraction algorithm. For example, under judicial business, the keyword "case" is extracted from the business topic "case information" in the judicial business topic set, the data dictionary "case table" in the original data dictionary set is matched according to the keyword "case", the metadata in that data dictionary (fields such as case ID, case number, and contractor) are extracted to the "case information" business topic, and the metadata are summarized to obtain the original business entity table under "case information".
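The keyword extraction and dictionary matching of S10 to S12 can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes English topic names split on whitespace, a hypothetical stop-word list, and substring matching against dictionary names, whereas the text names TextRank or a semantics-based algorithm for keyword selection.

```python
# Hypothetical stop-word list; the patent does not specify one.
STOP_WORDS = {"information", "table", "the", "of"}

def extract_keywords(topic):
    """Split a topic name into lowercase tokens and drop stop words."""
    return [t for t in topic.lower().split() if t not in STOP_WORDS]

def build_entity_tables(topics, dictionaries):
    """For each topic, collect the metadata of every data dictionary
    whose name contains one of the topic's keywords."""
    entity_tables = {}
    for topic in topics:
        keywords = extract_keywords(topic)
        fields = []
        for dict_name, metadata in dictionaries.items():
            if any(k in dict_name.lower() for k in keywords):
                fields.extend(metadata)
        entity_tables[topic] = fields
    return entity_tables

# Sample data modeled on the "case table" example; the judge table is invented.
topics = ["case information", "judge information"]
dictionaries = {
    "case table": ["case ID", "case number", "contractor"],
    "judge table": ["judge ID", "judge name"],
}
entity_tables = build_entity_tables(topics, dictionaries)
```

Here `entity_tables["case information"]` holds the three fields of the "case table", which corresponds to the original business entity table under that topic.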

According to the embodiment of the invention, the preset language processing algorithm is utilized to quickly identify the data in the original data dictionary, so that the omission of some key information in the original data dictionary is avoided.

S2, carrying out missing value detection and duplicate removal operation on the original business entity table to obtain a standard business entity table.

Preferably, referring to fig. 3, the S2 includes:

S20, carrying out missing value detection and filling on the data in the original business entity table to obtain a filled original business entity table;

S21, carrying out a duplicate removal operation on the data in the filled original business entity table, and obtaining the standard business entity table according to a preset business rule.

In the embodiment of the invention, whether the data in the original business entity table has a missing value can be detected through a mismap missing-value function. If the data has no missing value, the data is not processed; if the data has a missing value, the missing value is filled by using a preset filling algorithm to obtain the filled original business entity table.
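The detect-then-fill flow can be sketched as below. This is only an illustration under stated assumptions: rows are plain dictionaries, a value counts as missing when it is `None` or an empty string, and the fill uses the most frequent observed value of the field as a simple stand-in for the preset filling algorithm. The helper names are hypothetical.

```python
from collections import Counter

def find_missing(rows):
    """Return (row_index, field) pairs whose value is None or an empty string."""
    return [
        (i, field)
        for i, row in enumerate(rows)
        for field, value in row.items()
        if value in (None, "")
    ]

def fill_missing(rows):
    """Fill each missing value with the most frequent observed value of that
    field (a stand-in for the preset filling algorithm)."""
    filled = [dict(row) for row in rows]  # leave the input untouched
    for i, field in find_missing(rows):
        observed = [r[field] for r in rows if r[field] not in (None, "")]
        if observed:
            filled[i][field] = Counter(observed).most_common(1)[0][0]
    return filled

# Invented sample rows describing fields of an entity table.
rows = [
    {"field": "case_id", "type": "string"},
    {"field": "case_number", "type": None},
    {"field": "contractor", "type": "string"},
]
missing = find_missing(rows)
filled_rows = fill_missing(rows)
```

If no value is missing, `find_missing` returns an empty list and `fill_missing` returns an unchanged copy, which matches the "not processed" branch in the text.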

In detail, the preset filling algorithm may be:

wherein L(θ) represents the filled data missing value, x_i represents the i-th data missing value, θ represents the probability parameter corresponding to the filled data missing value, n represents the quantity of data in the original business entity table, and p(x_i | θ) represents the probability of the filled data missing value.
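The formula itself is not reproduced in this text. One reading that is consistent with the symbols defined above, offered as an assumption rather than as the patent's exact expression, is a likelihood function over the n values, with the fill value derived from the parameter that maximizes it:

```latex
L(\theta) = \prod_{i=1}^{n} p(x_i \mid \theta),
\qquad
\hat{\theta} = \arg\max_{\theta} L(\theta)
```

Under this reading, the most probable value given \hat{\theta} would be used to fill the missing entry.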

Further, in the embodiment of the present invention, the data in the filled original business entity table is deduplicated by using a distance formula, where the distance formula includes:

wherein d represents the distance value between any two pieces of data in the filled original business entity table, and w_1j and w_2j represent those two pieces of data. When the distance value is smaller than a preset distance value, either one of the two pieces of data is deleted; when the distance value is not smaller than the preset distance value, both pieces of data are kept. Preferably, the preset distance value may be 0.1.
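The threshold rule can be sketched as follows. The distance formula is not reproduced in this text, so the sketch assumes a Euclidean distance over the components j of two numeric records, and it assumes the records have already been encoded as numeric vectors; both are assumptions, and the function names are hypothetical.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length numeric records (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def deduplicate(records, threshold=0.1):
    """Keep a record only if it is at least `threshold` away from every
    record already kept; closer records are treated as duplicates."""
    kept = []
    for record in records:
        if all(euclidean(record, k) >= threshold for k in kept):
            kept.append(record)
    return kept

# The second record lies about 0.05 from the first and is dropped.
records = [(1.0, 2.0), (1.0, 2.05), (3.0, 4.0)]
unique_records = deduplicate(records)
```

The default threshold of 0.1 follows the preferred preset distance value given in the text.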

Further, in the embodiment of the present invention, the preset business rule refers to a rule for keeping or discarding original business entity tables in different business scenarios. For example, in a judicial business scenario, if "certificate data" appears repeatedly in both the "case table" and the "document table", the "document table" is removed from the original business entity table.

According to the embodiment of the invention, the missing value detection and the duplicate removal operation are carried out on the data in the original business entity table, and the data in the original business entity table is adjusted according to the preset business rule, so that the accuracy of the data is improved.

S3, acquiring a standard data table, and generating a mapping relation table according to the standard business entity table and the standard data table.

Preferably, the standard data table may be a national standard data table, and the national standard data table specifies each standard field and a corresponding standard code value of the standard field. For example, in the national standard data table, the value 1 of the gender field indicates male, and the value 2 indicates female.

In detail, referring to fig. 4, the generating a mapping relationship table according to the standard business entity table and the standard data table includes:

S30, finding, in the standard business entity table, the data that is the same as a standard field name in the standard data table;

S31, configuring the mapping relation between the data and the standard code value corresponding to the standard field, and generating a mapping relation table.

For example, the standard business entity table "party information" has a gender field that may merge gender data from system A (value 01 represents male, value 02 represents female) and system B (value 00 represents male, value 01 represents female). The mapping relation table unifies the data in the standard business entity table, which improves data utilization efficiency. Illustratively, in the mapping relation table shown in fig. 5, the source fields "XB" and "SEX" are both mapped to the standard field "xingbie", and the source code values "00 represents male, 01 represents female" and "01 represents male, 02 represents female" are both mapped to "1 represents male, 2 represents female", and so on.
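The gender example can be sketched as a small lookup structure. The row layout is an assumption, as is the pairing of field "SEX" with the 01/02 codes and field "XB" with the 00/01 codes; the text lists the fields and code sets but does not state which belongs to which system.

```python
# Each row: (source_field, source_code, standard_field, standard_code).
# The pairing of fields with code sets is assumed for illustration.
MAPPING_ROWS = [
    ("SEX", "01", "xingbie", "1"),
    ("SEX", "02", "xingbie", "2"),
    ("XB", "00", "xingbie", "1"),
    ("XB", "01", "xingbie", "2"),
]

# Index the rows so a source (field, code) pair resolves in one lookup.
LOOKUP = {
    (field, code): (std_field, std_code)
    for field, code, std_field, std_code in MAPPING_ROWS
}

def standardize(source_field, source_code):
    """Return (standard_field, standard_code), or None if no mapping is configured."""
    return LOOKUP.get((source_field, source_code))
```

Note that the same source code "01" resolves to different standard codes depending on the source field, which is why the lookup key includes both.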

S4, generating a query statement according to the standard business entity table and the mapping relation table.

Preferably, the query statement generated by the embodiment of the present invention may be written in the publicly available Structured Query Language (SQL). SQL is one of the most widely used languages in data processing and allows a user to state the required business logic concisely. SQL is a declarative language: the user only needs to express the requirement clearly and does not need to know the specific implementation. SQL can also be optimized: built-in query optimizers translate an SQL statement into an optimized execution plan.

Preferably, referring to fig. 6, the S4 includes:

S40, generating a table building statement of the standard business entity table by using a preset statement building function;

S41, obtaining the mapping ID of the standard business entity table, and searching the mapping relation table for all mapping scripts under the same mapping ID;

S42, summarizing the table building statement and the mapping script to obtain the query statement.

In the embodiment of the present invention, the preset statement creation function may be, for example, CREATE TABLE IF NOT EXISTS RY_ZP_HTXX (id string comment 'xx'). The table building statement generated with the statement creation function may be CREATE TABLE IF NOT EXISTS RY_ZP_HTXX (id string comment 'id', ryid string comment 'person id', scbs string comment 'delete identifier', …), where "ryid" denotes "person id" and "scbs" denotes "delete identifier".
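A statement creation function of this kind can be sketched as a small string builder. The function name and the (name, type, comment) tuple layout are hypothetical; the output follows the Hive-style `comment` syntax used in the example above.

```python
def build_create_table(table, columns):
    """Render a CREATE TABLE IF NOT EXISTS statement from
    (column_name, column_type, comment) tuples."""
    column_defs = ", ".join(
        f"{name} {col_type} comment '{comment}'"
        for name, col_type, comment in columns
    )
    return f"CREATE TABLE IF NOT EXISTS {table} ({column_defs})"

ddl = build_create_table(
    "RY_ZP_HTXX",
    [
        ("id", "string", "id"),
        ("ryid", "string", "person id"),
        ("scbs", "string", "delete identifier"),
    ],
)
```

The sketch interpolates names directly into the statement, so it assumes the column definitions come from a trusted entity table rather than from user input.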

In the embodiment of the invention, each standard business entity table has a unique mapping ID. For example, as shown in the mapping script of fig. 7, all mapping scripts under the mapping ID "MP0001" are searched to obtain a complete mapping script: "select case when a.xb = '00' then '1' when a.xb = '01' then '2' else null end as a.xb from dsrxxx a join CD_yinggsxx_YSLDMXX b on a.xb = BDMZ".
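The code-value translation in such a mapping script can be sketched by generating the CASE expression from a code map and running it against an in-memory SQLite database. The table name `party`, the output alias `xingbie`, and the helper name are assumptions for illustration, and the join to the code table is omitted.

```python
import sqlite3

def build_case_expression(column, code_map):
    """Render a CASE expression that translates source codes to standard codes."""
    whens = " ".join(
        f"when {column} = '{src}' then '{std}'" for src, std in code_map.items()
    )
    return f"case {whens} else null end"

# Codes taken from the example: '00' -> '1', '01' -> '2'.
expr = build_case_expression("a.xb", {"00": "1", "01": "2"})
query = f"select {expr} as xingbie from party a order by a.rowid"

# Run the generated query on a throwaway table to check its behavior.
conn = sqlite3.connect(":memory:")
conn.execute("create table party (xb text)")
conn.executemany("insert into party values (?)", [("00",), ("01",), ("09",)])
standardized = [row[0] for row in conn.execute(query)]
conn.close()
```

Codes without a configured mapping, such as '09' here, fall through to the `else null` branch, so unmapped source values are visible in the result instead of being silently passed through.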

According to the embodiment of the invention, the mapping script is generated through the unique mapping ID and the mapping relation table, so that the mapping is more accurate, the modification difficulty of the mapping script is reduced, and the maintainability is improved.

S5, generating a data extraction script according to the query statement, extracting data by using the data extraction script and classifying to obtain a classification result.

Preferably, referring to fig. 8, the S5 includes:

S50, acquiring an operation script template of a preset platform, and generating the data extraction script by using the operation script template and the query statement;

S51, running the data extraction script at a preset time, extracting data from a database according to the data extraction script, and classifying to obtain the classification result.

Preferably, the preset platform may be a pre-constructed big data management platform, and the data extraction script may be a shell script. The big data management platform has a scheduled-task management module that provides a script template for running jobs at set times. The query statement is input into the big data management platform and a new scheduling task is created; the platform can then generate the data extraction script from the query statement at the scheduled time, extract data into the standard business entity table, and use the mapping script in the query statement to standardize the data in the standard business entity table, which yields the final classification result. For example, if the script is set to extract data every morning, the classification result is obtained directly after each run.
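Filling a script template with a query statement can be sketched as below. The template text is hypothetical, and running the query through the `hive -e` command line is an assumption; the text only states that the platform supplies a template and that the script may be a shell script.

```python
from string import Template

# Hypothetical job template; a real platform would supply its own.
SCRIPT_TEMPLATE = Template(
    "#!/bin/bash\n"
    "# Extraction job: $job_name\n"
    'hive -e "$query"\n'
)

def render_script(job_name, query):
    """Fill the template, escaping double quotes so the query stays
    inside the shell's double-quoted argument."""
    safe_query = query.replace('"', '\\"')
    return SCRIPT_TEMPLATE.substitute(job_name=job_name, query=safe_query)

script = render_script("party_info_daily", "select 1")
```

Only double quotes are escaped here; a query containing `$` or backticks would still be expanded by the shell, so a production template would need stricter quoting. The scheduling itself (for example, a daily morning run) is left to the platform's task module.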

The embodiment of the invention uses the big data management platform to generate the data extraction script automatically, which lowers the operating threshold: a technician can run the process without knowing the specific services.

According to the embodiment of the invention, the original business entity table under each business topic can be accurately determined from the original data dictionary set and the preset business topic set. Missing value detection and duplicate removal are performed on the original business entity table to obtain the standard business entity table, which improves the accuracy of the data in the standard business entity table. The mapping relation table is then generated according to the standard business entity table and the standard data table, the query statement is generated according to the standard business entity table and the mapping relation table, and the data extraction script is generated according to the query statement, so that data standardization and data classification can be carried out directly. Therefore, the embodiment of the invention can solve the problem that a technician needs to know specific services in order to classify data.

Fig. 9 is a functional block diagram of a data classification apparatus according to an embodiment of the present invention.

The data classification apparatus 100 of the present invention may be installed in an electronic device. According to the functions implemented, the data classification apparatus 100 may include a data dictionary extraction module 101, an entity table processing module 102, a relationship mapping module 103, a statement generation module 104, and a data classification module 105. A module of the present invention, which may also be referred to as a unit, is a series of computer program segments that are stored in a memory of the electronic device, can be executed by a processor of the electronic device, and perform a fixed function.

In the present embodiment, the functions regarding the respective modules/units are as follows:

the data dictionary extraction module 101 is configured to obtain an original data dictionary set and a preset service theme set, extract a data dictionary in the original data dictionary set to the service theme set, and obtain an original service entity table under each service theme.

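Preferably, the data dictionary extraction module 101 obtains the original business entity table by: extracting keywords in the service theme set by using a preset language processing algorithm; matching a corresponding data dictionary in the original data dictionary set according to the keywords, and extracting metadata in the data dictionary to the service theme set; and summarizing the metadata in all data dictionaries under all the business topics in the business topic set to obtain the original business entity table.

In detail, the data dictionary extraction module 101 obtains the keywords in the business topic set by performing word segmentation processing on the text in the service theme set, removing stop words to obtain a word segmentation result, and selecting one or more keywords from the word segmentation result.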

The entity table processing module 102 is configured to perform missing value detection and duplicate removal operations on the original business entity table to obtain a standard business entity table.

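Preferably, the entity table processing module 102 obtains the standard business entity table by: carrying out missing value detection and filling on the data in the original business entity table to obtain a filled original business entity table; and carrying out a duplicate removal operation on the data in the filled original business entity table, and obtaining the standard business entity table according to a preset business rule.

In detail, the preset filling algorithm and the distance formula used for duplicate removal are the same as those described above for step S2 of the method.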

The relation mapping module 103 is configured to obtain a standard data table, and generate a mapping relation table according to the standard business entity table and the standard data table.

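In detail, the relationship mapping module 103 generates the mapping relationship table by: finding, in the standard business entity table, the data that is the same as a standard field name in the standard data table; and configuring the mapping relation between the data and the standard code value corresponding to the standard field, to generate the mapping relation table.

For example, the standard business entity table "party information" has a gender field that may merge gender data from system A (value 01 represents male, value 02 represents female) and system B (value 00 represents male, value 01 represents female). The mapping relation table unifies the data in the standard business entity table, which improves data utilization efficiency.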

The statement generating module 104 is configured to generate a query statement according to the standard business entity table and the mapping relationship table.

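Preferably, the statement generation module 104 generates the query statement by: generating a table building statement of the standard business entity table by using a preset statement building function; obtaining the mapping ID of the standard business entity table, and searching the mapping relation table for all mapping scripts under the same mapping ID; and summarizing the table building statement and the mapping script to obtain the query statement.

In the embodiment of the present invention, for example, the preset statement creation function may be CREATE TABLE IF NOT EXISTS RY_ZP_HTXX (id string comment 'xx'), and the table building statement may be CREATE TABLE IF NOT EXISTS RY_ZP_HTXX (id string comment 'id', ryid string comment 'person id', scbs string comment 'delete identifier', …), where "ryid" denotes "person id" and "scbs" denotes "delete identifier".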

Further, in this embodiment of the present invention, each of the standard business entity tables has a unique mapping ID.

The data classification module 105 is configured to generate a data extraction script according to the query statement, extract data by using the data extraction script, and classify the data to obtain a classification result.

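Preferably, the data classification module 105 obtains the classification result by: acquiring an operation script template of a preset platform, and generating the data extraction script by using the operation script template and the query statement; and running the data extraction script at a preset time, extracting data from a database according to the data extraction script, and classifying to obtain the classification result.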

Fig. 10 is a schematic structural diagram of an electronic device implementing a data classification method according to an embodiment of the present invention.

The electronic device 1 may comprise a processor 10, a memory 11 and a bus, and may further comprise a computer program, such as a data classification program 12, stored in the memory 11 and operable on the processor 10.

The memory 11 includes at least one type of readable storage medium, which includes flash memory, removable hard disk, multimedia card, card-type memory (e.g., SD or DX memory, etc.), magnetic memory, magnetic disk, optical disk, etc. The memory 11 may in some embodiments be an internal storage unit of the electronic device 1, such as a removable hard disk of the electronic device 1. The memory 11 may also be an external storage device of the electronic device 1 in other embodiments, such as a plug-in mobile hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like, which are provided on the electronic device 1. Further, the memory 11 may also include both an internal storage unit and an external storage device of the electronic device 1. The memory 11 may be used not only to store application software installed in the electronic device 1 and various types of data, such as the code of the data classification program 12, but also to temporarily store data that has been output or is to be output.

The processor 10 may be composed of an integrated circuit in some embodiments, for example, a single packaged integrated circuit, or may be composed of a plurality of integrated circuits packaged with the same or different functions, including one or more Central Processing Units (CPUs), microprocessors, digital Processing chips, graphics processors, and combinations of various control chips. The processor 10 is a Control Unit (Control Unit) of the electronic device, connects various components of the electronic device by using various interfaces and lines, and executes various functions and processes data of the electronic device 1 by running or executing programs or modules (e.g., data classification programs, etc.) stored in the memory 11 and calling data stored in the memory 11.

The bus may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. The bus is arranged to enable connection communication between the memory 11 and at least one processor 10 or the like.

Fig. 10 shows only an electronic device with components, and it will be understood by those skilled in the art that the structure shown in fig. 10 does not constitute a limitation of the electronic device 1, and may comprise fewer or more components than those shown, or some components may be combined, or a different arrangement of components.

For example, although not shown, the electronic device 1 may further include a power supply (such as a battery) for supplying power to each component, and preferably, the power supply may be logically connected to the at least one processor 10 through a power management device, so as to implement functions of charge management, discharge management, power consumption management, and the like through the power management device. The power supply may also include any component of one or more dc or ac power sources, recharging devices, power failure detection circuitry, power converters or inverters, power status indicators, and the like. The electronic device 1 may further include various sensors, a bluetooth module, a Wi-Fi module, and the like, which are not described herein again.

Further, the electronic device 1 may further include a network interface, and optionally, the network interface may include a wired interface and/or a wireless interface (such as a WI-FI interface, a bluetooth interface, etc.), which are generally used for establishing a communication connection between the electronic device 1 and other electronic devices.

Optionally, the electronic device 1 may further comprise a user interface, which may be a display or an input unit (such as a keyboard), and optionally a standard wired interface or a wireless interface. In some embodiments, the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode) touch device, or the like. The display, which may also be referred to as a display screen or display unit, is used for displaying information processed in the electronic device 1 and for displaying a visualized user interface.

It is to be understood that the described embodiments are for purposes of illustration only and that the scope of the appended claims is not limited to such structures.

The data classification program 12 stored in the memory 11 of the electronic device 1 is a combination of instructions that, when executed in the processor 10, may implement:

Specifically, the specific implementation method of the processor 10 for the instruction may refer to the description of the relevant steps in the embodiments corresponding to fig. 1 to fig. 8, which is not repeated herein.

Further, the integrated modules/units of the electronic device 1, if implemented in the form of software functional units and sold or used as separate products, may be stored in a non-volatile computer-readable storage medium. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, or a Read-Only Memory (ROM).

In the embodiments provided in the present invention, it should be understood that the disclosed apparatus, device and method can be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative: the division of the modules is only one logical functional division, and other divisions may be used in practice.

The modules described as separate parts may or may not be physically separate, and parts displayed as modules may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment.

In addition, the functional modules in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit. The integrated unit can be realized in the form of hardware, or in the form of hardware plus a software functional module.

It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential attributes thereof.

The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference signs in the claims shall not be construed as limiting the claim concerned.

The blockchain is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms, and encryption algorithms. A blockchain is essentially a decentralized database: a series of data blocks linked to one another by a cryptographic method, where each data block contains the information of a batch of network transactions and is used to verify the validity (anti-counterfeiting) of that information and to generate the next block. The blockchain may include a blockchain underlying platform, a platform product service layer, an application service layer, and the like.
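As a purely illustrative sketch of how data blocks may be associated by a cryptographic method, the following Python fragment links each block to its predecessor through a SHA-256 hash and checks the chain for tampering. It is not the storage mechanism of the present invention: the function names `block_hash`, `append_block` and `verify_chain`, the block fields, and the sample payloads are all hypothetical, and consensus, networking and node management are omitted.

```python
import hashlib
import json

GENESIS_PREV_HASH = "0" * 64  # assumed placeholder for the first block's back-link


def block_hash(index, payload, prev_hash):
    """Hash a block's contents together with the previous block's hash."""
    body = json.dumps(
        {"index": index, "payload": payload, "prev_hash": prev_hash},
        sort_keys=True,
    )
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def append_block(chain, payload):
    """Append a block whose hash covers its payload and its predecessor's hash."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS_PREV_HASH
    index = len(chain)
    block = {
        "index": index,
        "payload": payload,
        "prev_hash": prev_hash,
        "hash": block_hash(index, payload, prev_hash),
    }
    chain.append(block)
    return block


def verify_chain(chain):
    """Return True only if every block's hash and back-link are intact."""
    prev_hash = GENESIS_PREV_HASH
    for position, block in enumerate(chain):
        if block["index"] != position or block["prev_hash"] != prev_hash:
            return False
        expected = block_hash(block["index"], block["payload"], block["prev_hash"])
        if block["hash"] != expected:
            return False
        prev_hash = block["hash"]
    return True


# Hypothetical usage: store two classification results, then detect tampering.
chain = []
append_block(chain, {"table": "t_example_a", "topic": "topic_1"})
append_block(chain, {"table": "t_example_b", "topic": "topic_2"})
intact = verify_chain(chain)
chain[0]["payload"]["topic"] = "tampered"
tampered_ok = verify_chain(chain)
```

Because each hash covers the previous block's hash, altering an earlier block invalidates the stored hash of that block, which is what lets a verifier detect the change.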

Furthermore, it is obvious that the word "comprising" does not exclude other elements or steps, and the singular does not exclude the plural. A plurality of units or means recited in the system claims may also be implemented by one unit or means in software or hardware. Terms such as first and second are used to denote names and do not indicate any particular order.

Finally, it should be noted that the above embodiments are only for illustrating the technical solutions of the present invention and not for limiting, and although the present invention is described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications or equivalent substitutions may be made on the technical solutions of the present invention without departing from the spirit and scope of the technical solutions of the present invention.

Claims

1. A method of data classification, the method comprising:

2. The data classification method according to claim 1, wherein the obtaining an original data dictionary set and a preset service topic set, and extracting a data dictionary from the original data dictionary set to the service topic set to obtain an original service entity table under each service topic comprises:

3. The data classification method according to claim 2, wherein the extracting the keywords in the business topic sets by using a preset language processing algorithm comprises:

and selecting one or more keywords from the word segmentation result.

4. The data classification method of claim 1, wherein the performing missing value detection and deduplication operations on the original business entity table to obtain a standard business entity table comprises:

5. The data classification method according to claim 1, wherein the generating a mapping relation table according to the standard business entity table and the standard data table comprises:

6. The data classification method according to claim 1, wherein the generating a query statement from the standard business entity table and the mapping relationship table comprises:

7. The data classification method according to claim 1, wherein the generating a data extraction script according to the query statement, and extracting and classifying data by using the data extraction script to obtain a classification result comprises:

8. An apparatus for classifying data, the apparatus comprising:

9. An electronic device, characterized in that the electronic device comprises:

at least one processor; and

a memory communicatively coupled to the at least one processor; wherein

the memory stores instructions executable by the at least one processor to enable the at least one processor to perform a data classification method as claimed in any one of claims 1 to 7.

10. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out a data classification method according to any one of claims 1 to 7.