CN109597892A - Method, apparatus, device and storage medium for classifying data in a database

Method, apparatus, device and storage medium for classifying data in a database

Info

Publication number
CN109597892A
Authority
CN
China
Prior art keywords
data
target
database
sample survey
type
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201811595262.1A
Other languages
Chinese (zh)
Inventor
林国华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Dt Dream Technology Co Ltd
Original Assignee
Hangzhou Dt Dream Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Dt Dream Technology Co Ltd
Priority to CN201811595262.1A
Publication of CN109597892A
Legal status: Pending

Abstract

This application discloses a method for classifying data in a database, comprising: obtaining metadata and sampled data from a target database; extracting data features from the metadata; and classifying the sampled data in combination with the data features to obtain corresponding data types. Compared with prior-art methods that classify the sampled data alone, this method additionally obtains the metadata of the target database, extracts the data features it contains, and uses those features when classifying the sampled data, so as to obtain the data type of each piece of sampled data. Classifying the sampled data in combination with the data features of the metadata improves the accuracy of classifying the sampled data, and therefore the accuracy of classifying the data stored in the database. This application also discloses an apparatus, a device, and a computer-readable storage medium for classifying data in a database, all of which have the same beneficial effect.

Description

Method, apparatus, device and storage medium for classifying data in a database
Technical field
The present invention relates to the field of database processing, and in particular to a method, an apparatus, a device, and a computer-readable storage medium for classifying data in a database.
Background
Driven by the rapid development of computer and Internet technology, big data applications have grown quickly. At the same time, the unlawful collection of data, the conflict between data openness and privacy protection, and crude, one-size-fits-all management of data pose serious security challenges to the development of big data applications.
The prior art therefore classifies and grades the data stored in a database in order to achieve fine-grained management and protection of data resources and to balance data use against data protection. Specifically, prior-art methods for classifying data in a database generally obtain sampled data from the database and then classify the sampled data with a classification model such as regular expressions or named entity recognition. However, such methods are often inaccurate, so the data remains exposed to serious security risks.
Therefore, how to improve the accuracy of classifying data in a database is a technical problem that those skilled in the art currently need to solve.
Summary of the invention
In view of this, an object of the present invention is to provide a method for classifying data in a database that improves the accuracy of classifying the data in the database; another object of the present invention is to provide an apparatus, a device, and a computer-readable storage medium for classifying data in a database, all of which have the same beneficial effect.
To solve the above technical problem, the present invention provides a method for classifying data in a database, comprising:
obtaining metadata and sampled data from a target database;
extracting data features from the metadata;
classifying the sampled data in combination with the data features to obtain corresponding data types.
Preferably, classifying the sampled data in combination with the data features to obtain corresponding data types specifically comprises:
dividing the sampled data into code-type data and text-type data by means of a pre-classification model;
matching the code-type data against preset regular expressions in combination with the data features;
setting, according to the correspondence between regular expressions and data types, the corresponding data type for target code-type data that matches a target regular expression;
extracting, in combination with the data features, the named entities in target text-type data by named entity recognition, and setting the corresponding data type for the target text-type data according to the named entities.
Preferably, setting the corresponding data type for the target text-type data according to the named entities specifically comprises:
determining whether more than one named entity has been extracted;
if so, matching the multiple named entities against each of a plurality of preset text templates;
when a target text template matching the multiple named entities exists, setting for the target text-type data, according to the correspondence between text templates and data types, the data type corresponding to the target text template;
if not, setting the corresponding data type for the target text-type data according to the data type of the single named entity.
Preferably, setting the corresponding data type for the target text-type data according to the named entities specifically comprises:
determining whether more than one named entity has been extracted;
if so, calculating the TF-IDF value of each named entity in the target text-type data;
determining whether each TF-IDF value is greater than a preset threshold;
if a target TF-IDF value greater than the preset threshold exists, setting the corresponding data type for the target text-type data using the data type of the target named entity that corresponds to the target TF-IDF value;
if not, setting the corresponding data type for the target text-type data according to the data type of the single named entity.
Preferably, after obtaining the metadata and sampled data from the target database, the method further comprises:
determining whether there is target metadata that matches a preset field template;
if so, setting, according to the correspondence between field templates and data types, the corresponding data type for the sampled data that corresponds to the target metadata.
Preferably, the method further comprises:
when no target regular expression matches the target code-type data, training a feature matching model by supervised learning;
inputting the target code-type data into the feature matching model for feature matching;
setting the corresponding data type for the target code-type data.
Preferably, after classifying the sampled data in combination with the data features to obtain corresponding data types, the method further comprises:
setting, according to a preset correspondence between data types and data levels, the corresponding data level for the sampled data, so that a user accesses the sampled data of the data level permitted by the user's own access rights.
To solve the above technical problem, the present invention also provides an apparatus for classifying data in a database, comprising:
an acquisition module, configured to obtain metadata and sampled data from a target database;
an extraction module, configured to extract data features from the metadata;
a classification module, configured to classify the sampled data in combination with the data features to obtain corresponding data types.
Preferably, the apparatus further comprises:
a judgment module, configured to determine whether there is target metadata that matches a preset field template and, if so, to call a first setting module;
the first setting module, configured to set, according to the correspondence between field templates and data types, the corresponding data type for the sampled data that corresponds to the target metadata.
Preferably, the apparatus further comprises:
a training module, configured to obtain a target regular expression by supervised learning when no target regular expression matches the target code-type data;
a matching module, configured to match the target code-type data against the target regular expression in combination with the data features;
a second setting module, configured to set the corresponding data type for the target code-type data according to the correspondence between the target regular expression and data types.
Preferably, the apparatus further comprises:
a data level setting module, configured to set, according to a preset correspondence between data types and data levels, the corresponding data level for the sampled data, so that a user accesses the sampled data of the data level permitted by the user's own access rights.
To solve the above technical problem, the present invention also provides a device for classifying data in a database, comprising:
a memory, configured to store a computer program;
a processor, configured to implement the steps of any of the above methods for classifying data in a database when executing the computer program.
To solve the above technical problem, the present invention also provides a computer-readable storage medium on which a computer program is stored; when executed by a processor, the computer program implements the steps of any of the above methods for classifying data in a database.
Compared with prior-art methods that classify the sampled data alone, the method for classifying data in a database provided by the present invention additionally obtains the metadata of the target database, extracts the data features from the metadata, and then classifies the sampled data in combination with the extracted features to obtain the data type of each piece of sampled data. Owing to the particular structure of a database, everything it stores is structured data, and metadata is the data that describes the sampled data, so the metadata contains data features of the stored sampled data. Classifying the sampled data in combination with the data features of the metadata therefore improves the accuracy of classifying the sampled data, and hence the accuracy of classifying the data stored in the database.
To solve the above technical problem, the present invention also provides an apparatus, a device, and a computer-readable storage medium for classifying data in a database, all of which have the same beneficial effect.
Brief description of the drawings
To explain the technical solutions of the embodiments of the present invention or of the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. The drawings described below are evidently only some embodiments of the present invention; a person of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flow chart of a method for classifying data in a database provided by an embodiment of the present invention;
Fig. 2 is a detailed flow chart of step S30 of the method shown in Fig. 1, in which the sampled data is classified in combination with the data features to obtain corresponding data types;
Fig. 3 is a structural diagram of an apparatus for classifying data in a database provided by an embodiment of the present invention;
Fig. 4 is a structural diagram of a device for classifying data in a database provided by an embodiment of the present invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings in the embodiments. The described embodiments are evidently only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art on the basis of these embodiments without creative effort fall within the scope of protection of the present invention.
The core of the embodiments of the present invention is to provide a method for classifying data in a database that improves the accuracy of classifying the data in the database; another core of the present invention is to provide an apparatus, a device, and a computer-readable storage medium for classifying data in a database, all of which have the same beneficial effect.
To help those skilled in the art better understand the solution of the present invention, the present invention is described in further detail below with reference to the drawings and specific embodiments.
Fig. 1 is a flow chart of a method for classifying data in a database provided by an embodiment of the present invention. As shown in Fig. 1, the method comprises:
S10: obtain the metadata and sampled data from a target database;
The purpose of this step is to obtain the metadata and sampled data from the target database. The target database is the database that holds the data to be classified. Metadata is the data in the target database that describes the stored data; it includes the field names and field descriptions that describe the stored data, the table names that describe the database tables, the database name, and so on. This embodiment does not limit the specific type of metadata. The sampled data may be all of the data actually stored in the target database, or a portion of the data selected at random from the target database; classifying the sampled data achieves the classification of all data stored in the target database. Note that in a specific implementation it is preferable to obtain only the metadata that corresponds to the sampled data, which reduces the amount of metadata obtained and therefore the number of feature-extraction operations that must be performed. In other embodiments, the metadata may of course be obtained according to the needs of the actual application; this embodiment does not limit this. Specifically, the metadata and sampled data may be extracted from the target database directly, or obtained by running a preset collection script; this embodiment does not specifically limit this either.
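Step S10 can be illustrated with a minimal sketch. It assumes, purely for illustration, that the target database is a SQLite database accessed through Python's standard `sqlite3` module; the helper name `fetch_metadata_and_sample`, the table `citizen`, and its columns are hypothetical and are not specified by this embodiment.

```python
import sqlite3

def fetch_metadata_and_sample(conn, table, sample_size=100):
    """Return (column metadata, randomly sampled rows) for one table."""
    # Table names cannot be bound as SQL parameters, so validate before interpolating.
    if not table.isidentifier():
        raise ValueError(f"unsafe table name: {table!r}")
    # PRAGMA table_info yields one row per column:
    # (cid, name, type, notnull, dflt_value, pk).
    columns = [
        {"table": table, "name": row[1], "declared_type": row[2]}
        for row in conn.execute(f"PRAGMA table_info({table})")
    ]
    # Random sampling; a sample_size at least as large as the table returns every row.
    rows = conn.execute(
        f"SELECT * FROM {table} ORDER BY RANDOM() LIMIT ?", (sample_size,)
    ).fetchall()
    return columns, rows

# Hypothetical target database used only to exercise the helper.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE citizen (id_card TEXT, mobile TEXT, remark TEXT)")
conn.executemany(
    "INSERT INTO citizen VALUES (?, ?, ?)",
    [("11010519491231002X", "13800138000", "Zhang San is reading a book")] * 5,
)
columns, rows = fetch_metadata_and_sample(conn, "citizen", sample_size=3)
```

Only the metadata of the sampled table is read here, in line with the preference stated above for obtaining the metadata that corresponds to the sampled data.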
S20: extract the data features from the metadata.
It will be understood that, owing to the particular structure of the target database, the data stored in it is usually fully structured; and because metadata is the data in the target database that describes the stored data, the metadata contains data features of the stored data. Metadata is usually text data, so extracting the data features from the metadata amounts to mining data features from text; the specific ways of mining such features are well known to those skilled in the art and are not described here. The purpose of this step is to extract the data features from the metadata so that they can serve as classification features when the sampled data is classified, allowing the sampled data to be classified more precisely.
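The embodiment leaves the feature-mining technique open. One possible sketch, assuming that the data features are simply the lower-cased word tokens found in the database name, table name, field name, and field description, is given here; the function `extract_metadata_features` and the sample names are hypothetical, and metadata written in Chinese would need a word segmenter instead of this splitting rule.

```python
import re

def extract_metadata_features(database, table, field_name, field_comment=""):
    """Turn the textual metadata of one field into a set of keyword features."""
    text = " ".join([database, table, field_name, field_comment])
    # Split camelCase boundaries such as "idCardNo" -> "id Card No".
    text = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", text)
    # Split on anything that is not an ASCII letter or digit (underscores, spaces, ...).
    tokens = re.split(r"[^A-Za-z0-9]+", text.lower())
    return {token for token in tokens if token}

features = extract_metadata_features(
    "hr_db", "employee_info", "idCardNo", "citizen ID number"
)
```

The resulting token set can then be passed to the later classification steps as the "data features" of the column.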
S30: classify the sampled data in combination with the data features to obtain corresponding data types.
Specifically, a data type is the label obtained by classifying the sampled data. After the data features have been extracted from the metadata, the sampled data is classified in combination with the extracted features to obtain the data type of each piece of sampled data. Specifically, on top of classifying the sampled data with regular expressions or named entity recognition, the data features extracted from the metadata are used to obtain the general category of the data stored in the current database; these features are taken into account as a factor when classifying the sampled data, and the corresponding data type is set for each piece of classified sampled data.
Compared with prior-art methods that classify the sampled data alone, the method for classifying data in a database provided by this embodiment of the present invention additionally obtains the metadata of the target database, extracts the data features from the metadata, and then classifies the sampled data in combination with the extracted features to obtain the data type of each piece of sampled data. Owing to the particular structure of a database, everything it stores is structured data, and metadata is the data that describes the sampled data, so the metadata contains data features of the stored sampled data. Classifying the sampled data in combination with the data features of the metadata therefore improves the accuracy of classifying the sampled data, and hence the accuracy of classifying the data stored in the database.
Fig. 2 is a detailed flow chart of step S30 of the method shown in Fig. 1, in which the sampled data is classified in combination with the data features to obtain corresponding data types. Referring to Fig. 2, classifying the sampled data in combination with the data features to obtain corresponding data types specifically comprises:
S21: divide the sampled data into code-type data and text-type data by means of a pre-classification model.
Specifically, in this embodiment, after the metadata and sampled data have been obtained from the target database, the sampled data is first divided into code-type data and text-type data by the pre-classification model, so that each kind can then be classified with the method suited to it.
More specifically, the pre-classification model may divide the sampled data into code-type data and text-type data by means of a clustering algorithm set in the model. There are many clustering algorithms, for example K-Means clustering, mean-shift clustering, density-based clustering (DBSCAN), and agglomerative hierarchical clustering; this embodiment does not limit the type of clustering algorithm used in the pre-classification model.
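A pre-classification model of this kind might be sketched as follows. The sketch assumes a single hand-picked feature (the fraction of digit characters in a value) and a plain one-dimensional K-Means with k = 2; both choices, and the function names, are illustrative assumptions rather than part of this embodiment.

```python
def digit_ratio(value):
    """Fraction of characters that are digits; 0.0 for an empty string."""
    return sum(ch.isdigit() for ch in value) / len(value) if value else 0.0

def two_means_1d(points, iterations=20):
    """Plain 1-D K-Means with k=2, initialised at the minimum and maximum."""
    low, high = min(points), max(points)
    for _ in range(iterations):
        near_low = [p for p in points if abs(p - low) <= abs(p - high)]
        near_high = [p for p in points if abs(p - low) > abs(p - high)]
        if not near_low or not near_high:
            break  # all points collapsed into one cluster
        low = sum(near_low) / len(near_low)
        high = sum(near_high) / len(near_high)
    return low, high

def preclassify(values):
    """Split values into (code-type, text-type) by clustering their digit ratio."""
    ratios = [digit_ratio(v) for v in values]
    low, high = two_means_1d(ratios)
    code, text = [], []
    for value, ratio in zip(values, ratios):
        # The cluster with the higher digit ratio is treated as code-type data.
        (code if abs(ratio - high) < abs(ratio - low) else text).append(value)
    return code, text

code, text = preclassify([
    "11010519491231002X",
    "13800138000",
    "Zhang San is reading a book",
    "No. 1 Main Street",
])
```

If every value has the same digit ratio, the two centres coincide and this sketch places everything in the text-type group; a production model would need more features and a rule for that case.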
S22: match the code-type data against preset regular expressions in combination with the data features;
S23: according to the correspondence between regular expressions and data types, set the corresponding data type for target code-type data that matches a target regular expression.
Specifically, in this step the corresponding data type is set for the code-type data by means of preset regular expressions in combination with the data features. Code-type data is data with obvious numerical features, such as ID card numbers and mobile phone numbers; it is matched against preset regular expressions. Specifically, a plurality of regular expressions may be preset and each of them matched against the target code-type data, in which case a regular expression that matches the target code-type data is the target regular expression; alternatively, a single preset regular expression may be matched against a plurality of pieces of code-type data, in which case a match yields both the target code-type data and the corresponding target regular expression.
Specifically, a corresponding data type is set for each regular expression when the regular expression is defined. When a target regular expression matches target code-type data, the data type corresponding to that target regular expression is therefore set for the target code-type data. For example, when the target regular expression for ID card numbers matches target code-type data, the data type of that data is an ID card number, so its data type is set to ID card number.
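Steps S22 and S23 could be sketched as follows. The patterns, the data type names, and the hint tokens are simplified, hypothetical examples; in particular, the way the metadata features are used here (to break ties when several regular expressions match the same value) is only one assumed reading of "in combination with the data features".

```python
import re

# (data type, regular expression, metadata tokens that support this type)
RULES = [
    ("id_card_number", re.compile(r"\d{17}[\dXx]"), {"id", "card", "identity"}),
    ("mobile_number", re.compile(r"1[3-9]\d{9}"), {"mobile", "phone", "tel"}),
    ("order_number", re.compile(r"\d{11}"), {"order"}),
]

def classify_code_value(value, metadata_features=frozenset()):
    """Return the data type of a code-type value, or None if no rule matches."""
    candidates = [
        (len(hints & metadata_features), data_type)
        for data_type, pattern, hints in RULES
        if pattern.fullmatch(value)
    ]
    if not candidates:
        return None
    # Prefer the rule best supported by the metadata; earlier rules win ties.
    return max(candidates, key=lambda candidate: candidate[0])[1]
```

For instance, the value `13800138000` matches both the mobile-number and the order-number pattern; metadata tokens such as `{"order", "no"}` steer the result to `order_number`, whereas without metadata the first matching rule is used.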
S24: in combination with the data features, extract the named entities in target text-type data by named entity recognition, and set the corresponding data type for the target text-type data according to the named entities.
Specifically, text-type data generally contains Chinese characters, English letters, phonetic symbols, and the like. In named entity recognition, rule templates are set in advance on the basis of a knowledge base or dictionary. The feature words in a rule template include keywords, indicator words, direction words, position words (such as tail words), and head words. The target text-type data is matched against the rule templates; when a feature word is detected in the target text-type data, the text before and after that feature word is used to extract the named entity from the target text-type data, and the corresponding data type is then set for the target text-type data according to the extracted named entity. For example, suppose a piece of target text-type data is "Zhang San is reading a book". Using the keyword "Zhang" in a preset rule template, together with the knowledge base or dictionary, "Zhang San" is extracted and its data type is set to "name".
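The "Zhang San" example can be reproduced with a deliberately small sketch of rule-based extraction. It assumes a hypothetical surname dictionary as the only feature words and romanised, space-separated text; a real rule template would also use the indicator, position, and head words mentioned above.

```python
# Hypothetical dictionary of surname keywords (feature words).
SURNAMES = {"Zhang", "Li", "Wang"}

def extract_person_names(text):
    """Return (entity, entity type) pairs found by a surname-keyword rule."""
    tokens = text.split()
    entities = []
    for index, token in enumerate(tokens[:-1]):
        following = tokens[index + 1]
        # Keyword hit: use the text right after the surname as context.
        if token in SURNAMES and following[:1].isupper():
            entities.append((f"{token} {following}", "name"))
    return entities

entities = extract_person_names("Zhang San is reading a book")
```

Here the keyword "Zhang" triggers the rule and the capitalised token after it completes the entity, so the text is assigned the data type "name".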
As can be seen, in this embodiment the sampled data is first divided into code-type data and text-type data, which are then classified with regular expressions and named entity recognition respectively, thereby improving the accuracy of classifying the data in the database.
In practical applications, named entity recognition may extract more than one named entity, in which case a suitable way is needed to set the corresponding data type from the multiple named entities identified. On the basis of the above embodiment, this embodiment further describes and optimizes the technical solution. Specifically, setting the corresponding data type for the target text-type data according to the named entities comprises:
determining whether more than one named entity has been extracted;
if so, matching the multiple named entities against each of a plurality of preset text templates;
when a target text template matching the multiple named entities exists, setting for the target text-type data, according to the correspondence between text templates and data types, the data type corresponding to the target text template;
if not, setting the corresponding data type for the target text-type data according to the data type of the single named entity.
In this embodiment, as a preferred implementation, after the named entities in the target text-type data have been extracted by named entity recognition in combination with the data features, it is first determined whether more than one named entity has been extracted; if so, the multiple named entities are matched against each of a plurality of preset text templates. Specifically, each text template contains information on several named entities, and different text templates contain different named entity information. The extracted named entities are matched against the different text templates in turn. When a target text template matching the multiple named entities exists, the named entities contained in the target text template correspond to the named entities extracted from the target text-type data, so the data type corresponding to the target text template is set for the target text-type data according to the preset correspondence between text templates and data types. If only one named entity has been extracted, the corresponding data type is set for the target text-type data according to the type of that named entity.
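The text-template branch might look like the following sketch. It assumes that a text template is simply the set of named entity types it expects, and that a template matches when all of those types were extracted; the template names and entity types are hypothetical.

```python
# Hypothetical text templates: data type -> named entity types it requires.
TEXT_TEMPLATES = {
    "shipping_record": {"person_name", "address", "phone_number"},
    "employment_record": {"person_name", "organization"},
}

def type_from_entities(entities):
    """entities is a list of (entity text, entity type) pairs."""
    if len(entities) == 1:
        # Single named entity: reuse its own type.
        return entities[0][1]
    found = {entity_type for _, entity_type in entities}
    matching = [
        (len(required), data_type)
        for data_type, required in TEXT_TEMPLATES.items()
        if required <= found
    ]
    # Prefer the most specific (largest) matching template, if any.
    return max(matching)[1] if matching else None
```

With the entities `("Zhang San", "person_name")` and `("DT Dream", "organization")`, the sketch returns `employment_record`; with only the first entity it returns `person_name`.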
In another specific implementation, setting the corresponding data type for the target text-type data according to the named entities specifically comprises:
determining whether more than one named entity has been extracted;
if so, calculating the TF-IDF value of each named entity in the target text-type data;
determining whether each TF-IDF value is greater than a preset threshold;
if a target TF-IDF value greater than the preset threshold exists, setting the corresponding data type for the target text-type data using the data type of the target named entity that corresponds to the target TF-IDF value;
if not, setting the corresponding data type for the target text-type data according to the data type of the single named entity.
It will be understood that if a named entity w occurs with high frequency in target text-type data d and seldom occurs in other text-type data, then w is considered to discriminate well and is suitable for distinguishing d from other text-type data; the data type of w can therefore be used as the data type of the target text-type data.
Specifically, when more than one named entity has been extracted from the target text-type data, the TF-IDF (term frequency-inverse document frequency) value of each named entity in the target text-type data is calculated, and it is then determined whether each TF-IDF value is greater than the preset threshold. The way of calculating TF-IDF values is well known to those skilled in the art and is not described here. After the TF-IDF value of each named entity has been calculated, if a target TF-IDF value greater than the preset threshold exists, there is a target named entity that can be used to distinguish the target text-type data from other text-type data. The target named entity corresponding to the target TF-IDF value is therefore obtained from the correspondence between TF-IDF values and named entities, and its data type is used to set the corresponding data type for the target text-type data. If only one named entity has been extracted, the corresponding data type is set for the target text-type data according to the type of that named entity.
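The embodiment does not state which TF-IDF variant is used. The sketch here assumes the common form tf(w, d) x log(N / df(w)), where tf is the relative frequency of entity w in document d, N is the number of documents, and df(w) is the number of documents containing w. Each document is represented as the list of named entities extracted from one piece of text-type data; the corpus, the threshold, and the function names are hypothetical.

```python
import math
from collections import Counter

def tf_idf_scores(document, corpus):
    """TF-IDF of every distinct entity in `document`, which must belong to `corpus`."""
    scores = {}
    for entity, count in Counter(document).items():
        tf = count / len(document)
        df = sum(1 for other in corpus if entity in other)
        idf = math.log(len(corpus) / max(df, 1))
        scores[entity] = tf * idf
    return scores

def type_by_tf_idf(document, corpus, entity_types, threshold):
    """Data type of the highest-scoring entity above the threshold, else None."""
    scores = tf_idf_scores(document, corpus)
    above = {entity: score for entity, score in scores.items() if score > threshold}
    if not above:
        return None
    return entity_types[max(above, key=above.get)]

corpus = [
    ["Zhang San", "Zhang San", "Hangzhou"],  # target text-type data
    ["Li Si", "Hangzhou"],
    ["Wang Wu", "Hangzhou"],
    ["Hangzhou", "Beijing"],
]
entity_types = {"Zhang San": "person_name", "Hangzhou": "place_name"}
scores = tf_idf_scores(corpus[0], corpus)
data_type = type_by_tf_idf(corpus[0], corpus, entity_types, threshold=0.5)
```

In this toy corpus "Hangzhou" appears in every document, so its score is zero, while "Zhang San" appears only in the target and exceeds the threshold; the target is therefore labelled with the type of "Zhang San".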
Note that, after a data type has been set for target text-type data containing multiple named entities, if the multiple named entities come from the same data table, this embodiment may also set a table type for that data table; likewise, if the multiple named entities come from several data tables in the same database, this embodiment may also set a data type for that database. The choice depends on how the multiple named entities in the target text-type data are composed.
As can be seen, the method provided by this embodiment can set the corresponding data type for text-type data that contains multiple named entities, which improves the accuracy of classifying the data in the database.
On the basis of the above embodiments, this embodiment further describes and optimizes the technical solution. Specifically, after the metadata and sampled data have been obtained from the target database, the method further comprises:
determining whether there is target metadata that matches a preset field template;
if so, setting, according to the correspondence between field templates and data types, the corresponding data type for the sampled data that corresponds to the target metadata.
Specifically, the metadata may be the column name of a column of sampled data in a data table, in which case the data type of the sampled data in that column corresponds to the column name. In this embodiment, field templates for matching sensitive fields in the metadata are preset, together with the correspondence between field templates and data types. After the metadata and sampled data have been obtained from the target database, the metadata is matched against the preset field templates. When a field template matches a sensitive field in some target metadata, the data type of the sampled data corresponding to that target metadata is the data type of the field template, and the corresponding data type is set for that sampled data accordingly.
Note that, once data types have been set for some sampled data according to the field templates, that sampled data may be removed from the sampled data obtained, so that the subsequent steps only need to classify the sampled data whose data type has not yet been set, which improves the efficiency of classifying the data in the database.
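Field-template matching and the subsequent removal of already-typed columns can be sketched as follows, assuming that field templates are regular expressions applied to column names; the templates, type names, and column names are hypothetical.

```python
import re

# Hypothetical field templates: pattern on the column name -> data type.
FIELD_TEMPLATES = [
    (re.compile(r"id_?card|identity", re.IGNORECASE), "id_card_number"),
    (re.compile(r"mobile|phone|tel", re.IGNORECASE), "phone_number"),
]

def type_from_column_name(column_name):
    """Return the data type of the first field template that matches, else None."""
    for pattern, data_type in FIELD_TEMPLATES:
        if pattern.search(column_name):
            return data_type
    return None

def split_by_field_template(column_names):
    """Return ({column: data type}, [columns still to be classified])."""
    typed, remaining = {}, []
    for name in column_names:
        data_type = type_from_column_name(name)
        if data_type is None:
            remaining.append(name)  # left for the later classification steps
        else:
            typed[name] = data_type
    return typed, remaining

typed, remaining = split_by_field_template(["ID_CARD_NO", "mobile", "remark"])
```

Only the `remaining` columns need to go through pre-classification, regular expression matching, and named entity recognition, which is the efficiency gain described above.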
In this embodiment, the regular expressions are preset by technical staff, so when encoding-class data is matched with regular expressions, there may be no target regular expression that matches the target encoding-class data. Therefore, on the basis of the above embodiments, this embodiment further describes and optimizes the technical solution. Specifically, the method further comprises:
When there is no target regular expression that matches the target encoding-class data, training a feature matching model using a supervised learning method;
Inputting the target encoding-class data into the feature matching model for feature matching;
Setting the corresponding data type for the target encoding-class data.
Specifically, in this embodiment, when it is determined that no target regular expression matches the target encoding-class data, a feature matching model for matching the target encoding-class data is trained by supervised learning from sample encoding-class data and the corresponding data types. A large amount of sample encoding-class data, such as ID card numbers and telephone numbers, is used in advance for feature extraction, and the extracted data features are associated with the known data types, so that a feature matching model corresponding to the data features of each kind of encoding-class data is trained. That is, the feature matching model is a single-input, multiple-output model. When the target encoding-class data is input into the model, the model matches the data features of the target encoding-class data against the data features of the previously known data types, and the corresponding data type is set for the target encoding-class data.
As can be seen, in this embodiment, when the preset regular expressions contain no target regular expression corresponding to the target encoding-class data, the target encoding-class data is input into the feature matching model for feature matching and the corresponding data type is set for it, which further improves the accuracy of setting data types for the data in the database.
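The regex-then-model fallback can be sketched as below. The source does not specify the model or the features, so this sketch substitutes a nearest-centroid classifier over three hand-picked features (scaled length, digit ratio, letter ratio) as a stand-in for the feature matching model. The regular expression, the labels, and the sample values are illustrative assumptions, not real data.

```python
import math
import re
from collections import defaultdict

# Hypothetical preset regular expressions: data type -> pattern.
REGEX_BY_TYPE = {
    "PHONE_NUMBER": re.compile(r"^1\d{10}$"),
}


def extract_features(value):
    """Assumed features: scaled length, digit ratio, letter ratio."""
    n = max(len(value), 1)
    digits = sum(ch.isdigit() for ch in value)
    letters = sum(ch.isalpha() for ch in value)
    return (len(value) / 20.0, digits / n, letters / n)


def train_feature_model(samples):
    """Supervised training: average the feature vectors of each label."""
    sums = defaultdict(lambda: [0.0, 0.0, 0.0])
    counts = defaultdict(int)
    for value, label in samples:
        for i, feature in enumerate(extract_features(value)):
            sums[label][i] += feature
        counts[label] += 1
    return {label: tuple(s / counts[label] for s in sums[label]) for label in sums}


def classify_code_value(value, model):
    """Try the preset regexes first; fall back to the trained model."""
    for data_type, pattern in REGEX_BY_TYPE.items():
        if pattern.match(value):
            return data_type
    features = extract_features(value)
    # Nearest centroid: the label whose mean feature vector is closest.
    return min(model, key=lambda label: math.dist(features, model[label]))


# Made-up labeled samples for illustration only.
model = train_feature_model([
    ("110101199001011234", "ID_CARD_NUMBER"),
    ("32010219851212345X", "ID_CARD_NUMBER"),
    ("E12345678", "PASSPORT_NUMBER"),
    ("G87654321", "PASSPORT_NUMBER"),
])
```

A real deployment would likely use richer features and a proper classifier; the point of the sketch is only the control flow, in which the model is consulted when no preset regular expression matches.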
On the basis of the above embodiments, this embodiment further describes and optimizes the technical solution. Specifically, after the sampled data is classified in combination with the data features and the corresponding data types are obtained, the method further comprises:
Setting the corresponding data level for the sampled data according to a preset correspondence between data types and data levels, so that a user accesses the sampled data of the corresponding data level according to the user's own access authority.
It should be noted that, in actual operation, different user identities have different access authorities, that is, different users can access different data levels in the target database, and a user with higher authority can access more data than a user with lower authority. Therefore, in this embodiment, the corresponding data level is set for the sampled data according to the preset correspondence between data types and data levels. One data level can be set for each data type, in which case data types and data levels correspond one to one. However, because the number of user identities is generally smaller than the number of data types, this embodiment sets one data level for multiple data types, that is, multiple data types correspond to the same data level. The mapping is configured according to actual requirements, and this embodiment does not limit it.
As can be seen, this embodiment sets data levels for the sampled data so that a user accesses the sampled data of the corresponding data level according to the user's own access authority, which further improves the security of the data.
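The many-to-one mapping from data types to data levels, and the access check it enables, might look like the sketch below. The level numbers, the type labels, and the rule that unknown types default to the most restrictive level are assumptions made for illustration; the source leaves the mapping to actual requirements.

```python
# Hypothetical mapping: several data types share one data level.
DATA_LEVEL_BY_TYPE = {
    "ID_CARD_NUMBER": 3,
    "PHONE_NUMBER": 2,
    "EMAIL": 2,
    "CITY_NAME": 1,
}


def assign_levels(column_types, default_level=3):
    """Map each column's data type to a data level.

    Unknown types fall back to default_level (assumed here to be the
    most restrictive level, which is a design choice of this sketch).
    """
    return {
        column: DATA_LEVEL_BY_TYPE.get(data_type, default_level)
        for column, data_type in column_types.items()
    }


def accessible_columns(column_levels, user_clearance):
    """Return the columns whose level does not exceed the user's clearance."""
    return sorted(
        column for column, level in column_levels.items()
        if level <= user_clearance
    )


levels = assign_levels({
    "id_card_no": "ID_CARD_NUMBER",
    "user_phone": "PHONE_NUMBER",
    "city": "CITY_NAME",
})
visible = accessible_columns(levels, user_clearance=2)
```

With a clearance of 2, the user in this example can read the `city` and `user_phone` columns but not `id_card_no`.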
To help those skilled in the art better understand the technical solutions in the present application, the technical solutions in the embodiments of the present application are described in detail below with reference to a practical application scenario. The specific steps of the method for classifying data in a database provided in this embodiment are as follows:
(1) Obtain the metadata and sampled data in the target database DB. The metadata includes the database name of the target database, the names of the data tables in the target database, and the names that identify each column of data in the data tables;
(2) Obtain the data actually stored in the target database DB, namely the sampled data, by running a collection script;
(3) Extract the data features in the metadata by means of data mining;
(4) Divide the sampled data into encoding-class data and text-class data with a clustering algorithm;
(5) In combination with the data features, match the encoding-class data with the preset regular expressions, and set the data type of encoding-class data that matches a regular expression to the type corresponding to that regular expression;
(6) In combination with the data features, extract the named entities in the text-class data by named entity recognition, and judge whether more than one named entity was extracted. If there are multiple named entities, match them against the preset text templates to obtain the text template that matches the multiple named entities, and use the type of that text template to set the corresponding data type for the text-class data containing them. If there is a single named entity, set the corresponding data type for the text-class data according to the data type of that named entity.
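Steps (4) to (6) can be sketched end to end as follows. Several parts are simplified stand-ins and should be read as assumptions: a character-based heuristic replaces the clustering algorithm, a small dictionary lookup replaces named entity recognition, and the use of metadata features is omitted. All patterns, entity labels, and template contents are hypothetical.

```python
import re

# Hypothetical preset regular expressions for encoding-class data.
CODE_PATTERNS = {
    "PHONE_NUMBER": re.compile(r"^1\d{10}$"),
    "EMAIL": re.compile(r"^[\w.+-]+@[\w-]+(\.[\w-]+)+$"),
}
# Stand-in for named entity recognition: token -> entity type.
GAZETTEER = {"Zhejiang": "PROVINCE", "Hangzhou": "CITY", "Alice": "PERSON_NAME"}
# Hypothetical text templates: set of entity types -> data type.
TEXT_TEMPLATES = {frozenset({"PROVINCE", "CITY"}): "ADDRESS"}


def is_code_like(value):
    """Stand-in for step (4): no whitespace and at least one digit or '@'."""
    has_marker = any(ch.isdigit() or ch == "@" for ch in value)
    return has_marker and not any(ch.isspace() for ch in value)


def classify_code(value):
    """Step (5): the first matching preset regular expression decides the type."""
    for data_type, pattern in CODE_PATTERNS.items():
        if pattern.match(value):
            return data_type
    return None


def classify_text(value):
    """Step (6): one entity sets the type; several are matched to a template."""
    tokens = re.findall(r"\w+", value)
    entity_types = [GAZETTEER[token] for token in tokens if token in GAZETTEER]
    if len(entity_types) > 1:
        return TEXT_TEMPLATES.get(frozenset(entity_types))
    if len(entity_types) == 1:
        return entity_types[0]
    return None


def classify_value(value):
    """Route a sampled value to the encoding-class or text-class branch."""
    return classify_code(value) if is_code_like(value) else classify_text(value)
```

Values that neither branch recognizes come back as `None`, which leaves room for a fallback such as a trained model.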
Compared with prior-art methods that classify only the sampled data, the method for classifying data in a database provided in the embodiment of the present invention additionally obtains the metadata in the target database, extracts the data features in the metadata, and then classifies the sampled data in combination with the extracted data features to obtain the data type of each piece of sampled data. Because of the particular structure of a database, all the data it stores is structured, and the metadata is data that describes the sampled data, so the metadata contains data features of the stored sampled data. Classifying the sampled data in combination with the data features of the metadata therefore improves the accuracy of classifying the sampled data, that is, the accuracy of classifying the data stored in the database.
The embodiments of the method for classifying data in a database provided by the present invention have been described in detail above. The present invention also provides an apparatus, a device, and a computer-readable storage medium for classifying data in a database that correspond to the method. Because the embodiments of the apparatus, device, and computer-readable storage medium correspond to the embodiments of the method, refer to the description of the method embodiments for details, which are not repeated here.
Fig. 3 is a structural diagram of an apparatus for classifying data in a database provided in an embodiment of the present invention. As shown in Fig. 3, the apparatus for classifying data in a database includes:
An obtaining module 31, configured to obtain the metadata and sampled data in the target database;
An extraction module 32, configured to extract the data features in the metadata;
A classification module 33, configured to classify the sampled data in combination with the data features to obtain the corresponding data types.
The apparatus for classifying data in a database provided in the embodiment of the present invention has the beneficial effects of the above method for classifying data in a database.
As a preferred embodiment, the apparatus for classifying data in a database provided in this embodiment further includes:
A judgment module, configured to judge whether there is target metadata that matches a preset field template, and if so, to call the first setting module;
A first setting module, configured to set the corresponding data type for the sampled data corresponding to the target metadata according to the correspondence between field templates and data types.
As a preferred embodiment, the apparatus for classifying data in a database provided in this embodiment further includes:
A training module, configured to obtain a target regular expression by training with a supervised learning method when there is no target regular expression that matches the target encoding-class data;
A matching module, configured to match the target encoding-class data with the target regular expression in combination with the data features;
A second setting module, configured to set the corresponding data type for the target encoding-class data according to the correspondence between the target regular expression and data types.
As a preferred embodiment, the apparatus for classifying data in a database provided in this embodiment further includes:
A data level setting module, configured to set the corresponding data level for the sampled data according to the preset correspondence between data types and data levels, so that a user accesses the sampled data of the corresponding data level according to the user's own access authority.
Fig. 4 is a structural diagram of a device for classifying data in a database provided in an embodiment of the present invention. As shown in Fig. 4, the device for classifying data in a database includes:
A memory 41, configured to store a computer program;
A processor 42, configured to implement the steps of the above method for classifying data in a database when executing the computer program.
The device for classifying data in a database provided in the embodiment of the present invention has the beneficial effects of the above method for classifying data in a database.
To solve the above technical problem, the present invention also provides a computer-readable storage medium on which a computer program is stored. When the computer program is executed by a processor, the steps of the above method for classifying data in a database are implemented.
The computer-readable storage medium provided in the embodiment of the present invention has the beneficial effects of the above method for classifying data in a database.
The method, apparatus, device, and computer-readable storage medium for classifying data in a database provided by the present invention have been described in detail above. Specific embodiments are used herein to explain the principle and implementation of the present invention, and the description of the embodiments is only intended to help in understanding the method of the present invention and its core idea. It should be pointed out that those of ordinary skill in the art can make several improvements and modifications to the present invention without departing from its principle, and these improvements and modifications also fall within the scope of protection of the claims of the present invention.
The embodiments in this specification are described in a progressive manner. Each embodiment focuses on its differences from the other embodiments, and the same or similar parts of the embodiments may be referred to one another. Because the apparatus disclosed in the embodiments corresponds to the method disclosed in the embodiments, its description is relatively brief, and the relevant points can be found in the description of the method.
Those skilled in the art will further appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of the two. To clearly illustrate the interchangeability of hardware and software, the composition and steps of each example have been described above in general terms according to their functions. Whether these functions are implemented in hardware or software depends on the specific application and design constraints of the technical solution. Those skilled in the art may use different methods to implement the described functions for each specific application, but such implementations should not be considered beyond the scope of the present invention.

Claims (10)

1. A method for classifying data in a database, characterized by comprising:
Obtaining the metadata and sampled data in a target database;
Extracting the data features in the metadata;
Classifying the sampled data in combination with the data features to obtain the corresponding data types.
2. The method according to claim 1, characterized in that classifying the sampled data in combination with the data features to obtain the corresponding data types specifically comprises:
Dividing the sampled data into encoding-class data and text-class data with a pre-classification model;
Matching the encoding-class data with preset regular expressions in combination with the data features;
Setting the corresponding data type for target encoding-class data that matches a target regular expression, according to the correspondence between regular expressions and data types;
Extracting, in combination with the data features, the named entities in target text-class data by named entity recognition, and setting the corresponding data type for the target text-class data according to the named entities.
3. The method according to claim 2, characterized in that setting the corresponding data type for the target text-class data according to the named entities specifically comprises:
Judging whether more than one named entity was extracted;
If so, matching the multiple named entities against each of a plurality of preset text templates;
When there is a target text template that matches the multiple named entities, setting the data type corresponding to the target text template for the target text-class data, according to the correspondence between text templates and data types;
If not, setting the corresponding data type for the target text-class data according to the data type of the single named entity.
4. The method according to claim 2, characterized in that setting the corresponding data type for the target text-class data according to the named entities specifically comprises:
Judging whether more than one named entity was extracted;
If so, calculating the TF-IDF value of each named entity in the target text-class data;
Judging whether each TF-IDF value is greater than a preset threshold;
If there is a target TF-IDF value greater than the preset threshold, setting the corresponding data type for the target text-class data using the data type of the target named entity corresponding to the target TF-IDF value;
If not, setting the corresponding data type for the target text-class data according to the data type of the single named entity.
5. The method according to claim 1, characterized in that, after obtaining the metadata and sampled data in the target database, the method further comprises:
Judging whether there is target metadata that matches a preset field template;
If so, setting the corresponding data type for the sampled data corresponding to the target metadata, according to the correspondence between field templates and data types.
6. The method according to claim 2, characterized by further comprising:
When there is no target regular expression that matches the target encoding-class data, training a feature matching model using a supervised learning method;
Inputting the target encoding-class data into the feature matching model for feature matching;
Setting the corresponding data type for the target encoding-class data.
7. The method according to any one of claims 1 to 6, characterized in that, after classifying the sampled data in combination with the data features to obtain the corresponding data types, the method further comprises:
Setting the corresponding data level for the sampled data according to a preset correspondence between data types and data levels, so that a user accesses the sampled data of the corresponding data level according to the user's own access authority.
8. An apparatus for classifying data in a database, characterized by comprising:
An obtaining module, configured to obtain the metadata and sampled data in a target database;
An extraction module, configured to extract the data features in the metadata;
A classification module, configured to classify the sampled data in combination with the data features to obtain the corresponding data types.
9. A device for classifying data in a database, characterized by comprising:
A memory, configured to store a computer program;
A processor, configured to implement the steps of the method for classifying data in a database according to any one of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium, characterized in that a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the steps of the method for classifying data in a database according to any one of claims 1 to 7 are implemented.
CN201811595262.1A 2018-12-25 2018-12-25 Classification method, device, equipment and the storage medium of data in a kind of database Pending CN109597892A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811595262.1A CN109597892A (en) 2018-12-25 2018-12-25 Classification method, device, equipment and the storage medium of data in a kind of database

Publications (1)

Publication Number Publication Date
CN109597892A true CN109597892A (en) 2019-04-09

Family

ID=65962704

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811595262.1A Pending CN109597892A (en) 2018-12-25 2018-12-25 Classification method, device, equipment and the storage medium of data in a kind of database

Country Status (1)

Country Link
CN (1) CN109597892A (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104731976A (en) * 2015-04-14 2015-06-24 海量云图(北京)数据技术有限公司 Method for finding and sorting private data in data table
CN106408140A (en) * 2015-07-27 2017-02-15 广州西麦信息科技有限公司 Grading and classifying model method based on power grid enterprise data
US20170320103A1 (en) * 2016-05-04 2017-11-09 Jessica Marie Schreiber Scalable systems and methods for classifying textile samples
CN106815605A (en) * 2017-01-23 2017-06-09 上海上讯信息技术股份有限公司 A kind of data classification method and equipment based on machine learning
CN107368892A (en) * 2017-06-07 2017-11-21 无锡小天鹅股份有限公司 Model training method and device based on machine learning

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110427992A (en) * 2019-07-23 2019-11-08 杭州城市大数据运营有限公司 Data matching method, device, computer equipment and storage medium
CN110704873A (en) * 2019-09-25 2020-01-17 全球能源互联网研究院有限公司 Method and system for preventing sensitive data from being leaked
CN110704873B (en) * 2019-09-25 2021-05-25 全球能源互联网研究院有限公司 Method and system for preventing sensitive data from being leaked
CN110727743A (en) * 2019-10-12 2020-01-24 杭州城市大数据运营有限公司 Data identification method and device, computer equipment and storage medium
CN110781173A (en) * 2019-10-12 2020-02-11 杭州城市大数据运营有限公司 Data identification method and device, computer equipment and storage medium
CN114860941A (en) * 2022-07-05 2022-08-05 南京云创大数据科技股份有限公司 Industry data management method and system based on data brain

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20190409