CN109597892A - Method, apparatus, device, and storage medium for classifying data in a database
- Publication number: CN109597892A
- Application number: CN201811595262.1A
- Authority: CN (China)
- Prior art keywords: data, target, database, sampled data, type
- Legal status: Pending (the listed status is an assumption, not a legal conclusion)
Abstract
This application discloses a method for classifying data in a database, comprising: obtaining metadata and sampled data from a target database; extracting data features from the metadata; and classifying the sampled data in combination with the data features to obtain the corresponding data types. Compared with prior-art methods that classify sampled data alone, this method additionally obtains the metadata of the target database, extracts data features from that metadata, and then uses the extracted features when classifying the sampled data to obtain the data type of each piece of sampled data. Using the metadata's data features in this way improves the accuracy with which the sampled data, and therefore the data stored in the database, is classified. This application also discloses an apparatus, a device, and a computer-readable storage medium for classifying data in a database, all of which have the same beneficial effect.
Description
Technical field
The present invention relates to the field of database processing, and in particular to a method, an apparatus, a device, and a computer-readable storage medium for classifying data in a database.
Background technique
Driven by the rapid development of computer and Internet technology, big data applications have grown rapidly. At the same time, unlawful collection of data, the tension between open data and privacy protection, and crude one-size-fits-all management of data pose severe security challenges to the development of big data applications.
The prior art therefore classifies and grades the data stored in a database so that data resources can be analyzed and protected at a fine granularity, ensuring an effective balance between data use and data protection. Specifically, prior-art methods for classifying the data in a database generally obtain sampled data from the database and then classify it with a classification model such as regular expressions or named entity recognition. However, such methods often classify inaccurately, so the data remains exposed to serious security risks.
How to improve the accuracy of classifying the data in a database is therefore a technical problem that those skilled in the art currently need to solve.
Summary of the invention
In view of this, an object of the present invention is to provide a method for classifying data in a database that improves the accuracy of classifying the data in the database. Another object of the present invention is to provide an apparatus, a device, and a computer-readable storage medium for classifying data in a database, all of which have the same beneficial effect.
To solve the above technical problem, the present invention provides a method for classifying data in a database, comprising:
obtaining metadata and sampled data from a target database;
extracting data features from the metadata;
classifying the sampled data in combination with the data features to obtain the corresponding data types.
Preferably, classifying the sampled data in combination with the data features to obtain the corresponding data types specifically comprises:
dividing the sampled data into code-type data and text-type data by means of a pre-classification model;
matching the code-type data against preset regular expressions in combination with the data features;
setting, according to the correspondence between regular expressions and data types, the corresponding data type for the target code-type data that matches a target regular expression;
extracting, in combination with the data features, the named entities in the target text-type data by named entity recognition, and setting the corresponding data type for the target text-type data according to the named entities.
Preferably, setting the corresponding data type for the target text-type data according to the named entities specifically comprises:
determining whether more than one named entity has been extracted;
if so, matching the multiple named entities against multiple preset text templates;
when a target text template matches the multiple named entities, setting for the target text-type data the data type corresponding to the target text template, according to the correspondence between text templates and data types;
if not, setting the corresponding data type for the target text-type data according to the data type of the single named entity.
Preferably, setting the corresponding data type for the target text-type data according to the named entities specifically comprises:
determining whether more than one named entity has been extracted;
if so, calculating the TF-IDF value of each named entity in the target text-type data;
determining whether each TF-IDF value is greater than a preset threshold;
if a target TF-IDF value greater than the preset threshold exists, setting the corresponding data type for the target text-type data using the data type of the target named entity that corresponds to the target TF-IDF value;
if not, setting the corresponding data type for the target text-type data according to the data type of the single named entity.
Preferably, after obtaining the metadata and sampled data from the target database, the method further comprises:
determining whether any target metadata matches a preset field template;
if so, setting the corresponding data type for the sampled data that corresponds to the target metadata, according to the correspondence between field templates and data types.
Preferably, the method further comprises:
when no target regular expression matches the target code-type data, training a feature matching model by supervised learning;
inputting the target code-type data into the feature matching model for feature matching;
setting the corresponding data type for the target code-type data.
Preferably, after classifying the sampled data in combination with the data features to obtain the corresponding data types, the method further comprises:
setting the corresponding data level for the sampled data according to a preset correspondence between data types and data levels, so that a user can access the sampled data of the data level permitted by the user's own access rights.
To solve the above technical problem, the present invention also provides an apparatus for classifying data in a database, comprising:
an obtaining module, configured to obtain metadata and sampled data from a target database;
an extraction module, configured to extract data features from the metadata;
a classification module, configured to classify the sampled data in combination with the data features to obtain the corresponding data types.
Preferably, the apparatus further comprises:
a judgment module, configured to determine whether any target metadata matches a preset field template and, if so, to call a first setting module;
the first setting module, configured to set the corresponding data type for the sampled data that corresponds to the target metadata, according to the correspondence between field templates and data types.
Preferably, the apparatus further comprises:
a training module, configured to obtain a target regular expression by supervised learning when no target regular expression matches the target code-type data;
a matching module, configured to match the target code-type data against the target regular expression in combination with the data features;
a second setting module, configured to set the corresponding data type for the target code-type data according to the correspondence between the target regular expression and data types.
Preferably, the apparatus further comprises:
a data level setting module, configured to set the corresponding data level for the sampled data according to a preset correspondence between data types and data levels, so that a user can access the sampled data of the data level permitted by the user's own access rights.
To solve the above technical problem, the present invention also provides a device for classifying data in a database, comprising:
a memory, configured to store a computer program;
a processor, configured to implement the steps of any of the above methods for classifying data in a database when executing the computer program.
To solve the above technical problem, the present invention also provides a computer-readable storage medium on which a computer program is stored; when executed by a processor, the computer program implements the steps of any of the above methods for classifying data in a database.
Compared with prior-art methods that classify sampled data alone, the method for classifying data in a database provided by the present invention additionally obtains the metadata of the target database, extracts the data features from the metadata, and then classifies the sampled data in combination with the extracted features to obtain the data type of each piece of sampled data. Owing to the particular structure of a database, everything it stores is structured data, and metadata is the data that describes the sampled data, so the metadata contains data features of the stored sampled data. Classifying the sampled data in combination with the metadata's data features therefore improves the accuracy of classifying the sampled data, and hence the accuracy of classifying the data stored in the database.
To solve the above technical problem, the present invention also provides an apparatus, a device, and a computer-readable storage medium for classifying data in a database, all of which have the same beneficial effect.
Brief description of the drawings
To explain the technical solutions of the embodiments of the present invention or of the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below are evidently only some embodiments of the present invention; those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flowchart of a method for classifying data in a database according to an embodiment of the present invention;
Fig. 2 is a detailed flowchart of step S30 of the method shown in Fig. 1, in which the sampled data is classified in combination with the data features to obtain the corresponding data types;
Fig. 3 is a structural diagram of an apparatus for classifying data in a database according to an embodiment of the present invention;
Fig. 4 is a structural diagram of a device for classifying data in a database according to an embodiment of the present invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings of those embodiments. The described embodiments are evidently only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art from the embodiments of the present invention without creative effort fall within the scope of protection of the present invention.
The core of the embodiments of the present invention is to provide a method for classifying data in a database that improves the accuracy of classifying the data in the database; another core of the present invention is to provide an apparatus, a device, and a computer-readable storage medium for classifying data in a database, all of which have the same beneficial effect.
To help those skilled in the art better understand the solution of the present invention, the present invention is described in further detail below with reference to the drawings and specific embodiments.
Fig. 1 is a flowchart of a method for classifying data in a database according to an embodiment of the present invention. As shown in Fig. 1, the method for classifying data in a database comprises:
S10: Obtain the metadata and sampled data from the target database.
The purpose of this step is to obtain the metadata and sampled data from the target database. The target database is the database that holds the data to be classified. Metadata is the data in the target database that describes the stored data; it includes, for example, the field names and field descriptions that describe the stored data, and the table names and database names that describe the database tables. This embodiment does not limit the specific kinds of metadata. The sampled data may be all of the data actually stored in the target database, or it may be a part of the data selected at random from the target database; classifying the sampled data then serves to classify all of the data stored in the target database.
It should be noted that, in a specific implementation, it is preferable to obtain only the metadata that corresponds to the sampled data, so that fewer metadata items are obtained and less feature extraction work is needed. In other embodiments the metadata may instead be obtained according to the needs of the application; this embodiment does not limit this. Specifically, the metadata and sampled data may be extracted from the target database directly, or they may be obtained by running a preset collection script; this embodiment does not limit this either.
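Step S10 can be sketched as follows. This is only an illustration under stated assumptions: SQLite stands in for the target database, and the table name customer, its columns, and the helper fetch_metadata_and_sample are hypothetical names that do not come from the patent.

```python
import sqlite3

def fetch_metadata_and_sample(conn, table, sample_size=100):
    """Return (column metadata, randomly sampled rows) for one table."""
    # Quote the identifier so unusual table names do not break the statements.
    quoted = '"' + table.replace('"', '""') + '"'
    # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk) per column.
    columns = [
        {"table": table, "name": row[1], "declared_type": row[2]}
        for row in conn.execute(f"PRAGMA table_info({quoted})")
    ]
    # Random sampling of rows; the sample may also be the whole table.
    rows = conn.execute(
        f"SELECT * FROM {quoted} ORDER BY RANDOM() LIMIT ?", (sample_size,)
    ).fetchall()
    return columns, rows

# Hypothetical target database with one table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id_card TEXT, phone TEXT, remark TEXT)")
conn.executemany(
    "INSERT INTO customer VALUES (?, ?, ?)",
    [("11010519900307123X", "13800138000", "Zhang San is reading a book")] * 5,
)
metadata, sample = fetch_metadata_and_sample(conn, "customer", sample_size=3)
```

Only the metadata of the sampled table is collected here, in line with the preference stated above for obtaining just the metadata that corresponds to the sampled data.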
S20: Extract the data features from the metadata.
It will be understood that, owing to the particular structure of the target database, the data stored in it is usually structured data. Because metadata is the data in the target database that describes the stored data, the metadata contains data features of the stored data. Metadata is usually text, so extracting the data features from the metadata amounts to mining data features from text; the specific ways of doing so are well known to those skilled in the art and are not repeated here. The purpose of this step is to extract the data features from the metadata so that they can be used as classification features when the sampled data is classified, making that classification more precise.
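The patent leaves the feature mining technique open. One simple possibility, shown here purely as an assumption, is to turn field names and field descriptions into keyword tokens; the function metadata_features and the sample field name are hypothetical.

```python
import re

def metadata_features(field_name, field_description=""):
    """Split a field name and its description into lowercase keyword tokens."""
    text = f"{field_name} {field_description}"
    # Insert a space at camelCase boundaries, e.g. "idCard" -> "id Card".
    text = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", text)
    # Treat any run of non-alphanumeric characters (underscores, hyphens) as a separator.
    tokens = re.split(r"[^A-Za-z0-9]+", text)
    return {t.lower() for t in tokens if t}

features = metadata_features("custIdCard_no", "customer identity card number")
```

Tokens such as "id" and "card" can then serve as hints for the later classification steps.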
S30: Classify the sampled data in combination with the data features to obtain the corresponding data types.
Specifically, a data type is the label obtained by classifying the sampled data. After the data features have been extracted from the metadata, the sampled data is classified in combination with those features to obtain the data type of each piece of sampled data. That is, on top of classifying the sampled data with regular expressions or named entity recognition, the data features extracted from the metadata are used to obtain the approximate categories of the data stored in the current database, and these features are taken into account as a factor when the sampled data is classified; the classified sampled data is then assigned the corresponding data type.
Compared with prior-art methods that classify sampled data alone, the method for classifying data in a database provided by this embodiment of the present invention additionally obtains the metadata of the target database, extracts the data features from the metadata, and then classifies the sampled data in combination with the extracted features to obtain the data type of each piece of sampled data. Owing to the particular structure of a database, everything it stores is structured data, and metadata is the data that describes the sampled data, so the metadata contains data features of the stored sampled data. Classifying the sampled data in combination with the metadata's data features therefore improves the accuracy of classifying the sampled data, and hence the accuracy of classifying the data stored in the database.
Fig. 2 is a detailed flowchart of step S30 of the method shown in Fig. 1, in which the sampled data is classified in combination with the data features to obtain the corresponding data types. Referring to Fig. 2, classifying the sampled data in combination with the data features to obtain the corresponding data types specifically comprises:
S21: Divide the sampled data into code-type data and text-type data by means of a pre-classification model.
Specifically, in this embodiment, after the metadata and sampled data have been obtained from the target database, the sampled data is first divided into code-type data and text-type data by the pre-classification model, so that each kind can then be classified with the method suited to it.
More specifically, the pre-classification model may divide the sampled data into code-type data and text-type data by means of a clustering algorithm configured in the model. Many clustering algorithms exist, for example K-Means clustering, mean-shift clustering, density-based clustering (DBSCAN), and agglomerative hierarchical clustering; this embodiment does not limit the type of clustering algorithm used in the pre-classification model.
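A minimal sketch of such a pre-classification step follows. It assumes a single hand-picked feature (the share of digit characters) and a one-dimensional K-Means with k = 2; the patent prescribes neither, and the names digit_ratio and two_means_split are hypothetical.

```python
def digit_ratio(value):
    """Fraction of characters in the value that are digits."""
    return sum(ch.isdigit() for ch in value) / len(value) if value else 0.0

def two_means_split(values, iterations=10):
    """Cluster values into code-like and text-like groups with 1-D k-means (k=2)."""
    if not values:
        return [], []
    ratios = [digit_ratio(v) for v in values]
    low, high = min(ratios), max(ratios)  # deterministic initial centroids
    is_code = [False] * len(values)
    for _ in range(iterations):
        # Assign each value to the nearer centroid (True means the high-digit cluster).
        is_code = [abs(r - high) < abs(r - low) for r in ratios]
        code_ratios = [r for r, c in zip(ratios, is_code) if c]
        text_ratios = [r for r, c in zip(ratios, is_code) if not c]
        # Move each centroid to the mean of its members.
        if code_ratios:
            high = sum(code_ratios) / len(code_ratios)
        if text_ratios:
            low = sum(text_ratios) / len(text_ratios)
    code = [v for v, c in zip(values, is_code) if c]
    text = [v for v, c in zip(values, is_code) if not c]
    return code, text

code_values, text_values = two_means_split(
    ["11010519900307123X", "13800138000", "Zhang San is reading a book", "Beijing"]
)
```

A real pre-classification model would likely use richer features, but the split into two clusters follows the same idea.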
S22: Match the code-type data against preset regular expressions in combination with the data features.
S23: According to the correspondence between regular expressions and data types, set the corresponding data type for the target code-type data that matches a target regular expression.
Specifically, these steps set the corresponding data type for code-type data by using preset regular expressions in combination with the data features. Code-type data is data with pronounced numeric characteristics, such as ID card numbers and mobile phone numbers, and it is matched against regular expressions set in advance. Specifically, multiple regular expressions may be set in advance and each used in turn to match the target code-type data; when one of them matches the target code-type data, it is the target regular expression. Alternatively, a single preset regular expression may be used to match multiple pieces of code-type data; when it matches a piece of code-type data, that piece is the target code-type data and the regular expression is the corresponding target regular expression.
Specifically, a data type is assigned to each regular expression when the regular expression is set. When a target regular expression matches target code-type data, the data type corresponding to that target regular expression is therefore set for the target code-type data. For example, when a target regular expression for matching ID card numbers matches target code-type data, the data type of that data is an ID card number, so the data type of the target code-type data is set to ID card number.
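Steps S22 and S23 can be sketched as a table that pairs each preset regular expression with a data type. The two patterns below are simplified assumptions (18-character ID card numbers and 11-digit mobile numbers), and using metadata hints only to decide which patterns are tried first is one possible reading of "in combination with the data features", not the patent's stated mechanism.

```python
import re

# Hypothetical pattern table: each preset regular expression maps to one data type.
PATTERN_TO_TYPE = [
    (re.compile(r"\d{17}[\dXx]"), "id_card_number"),
    (re.compile(r"1[3-9]\d{9}"), "mobile_phone_number"),
]

def classify_code_value(value, hinted_types=()):
    """Return the data type of the first pattern that fully matches, else None."""
    # Patterns whose type is suggested by the metadata features are tried first.
    ordered = sorted(PATTERN_TO_TYPE, key=lambda item: item[1] not in hinted_types)
    for pattern, data_type in ordered:
        if pattern.fullmatch(value):
            return data_type
    return None
```

For instance, classify_code_value("11010519900307123X") yields "id_card_number", while a value that matches no pattern yields None and can be passed to a fallback step.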
S24: In combination with the data features, extract the named entities in the target text-type data by named entity recognition, and set the corresponding data type for the target text-type data according to the named entities.
Specifically, text-type data generally contains Chinese characters, English letters, phonetic symbols, and the like. In the named entity recognition method, rule templates are set in advance on the basis of a knowledge base or dictionary. The feature words in a rule template include keywords, indicator words, direction words, position words (such as tail words), and head words. The target text-type data is matched against the rule templates; when a feature word is detected in the target text-type data, the text before and after that feature word is examined to extract the named entity, and the corresponding data type is then set for the target text-type data according to the extracted named entity. For example, suppose a piece of target text-type data is "Zhang San is reading a book"; using the keyword "Zhang" in a preset rule template together with the knowledge base or dictionary, "Zhang San" is extracted and the data type of "Zhang San" is set to "name".
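The "Zhang San" example can be mimicked with a toy rule template. This sketch assumes romanized text, a three-entry surname dictionary, and a rule that a surname keyword followed by a capitalized token forms a name; all of these are illustrative assumptions, and a practical system would rely on a proper knowledge base or dictionary.

```python
import re

# Hypothetical rule template: surnames act as keywords that signal a personal name.
SURNAME_KEYWORDS = {"Zhang", "Li", "Wang"}

def extract_names(text):
    """Return (entity, data type) pairs for surname keywords followed by a given name."""
    tokens = re.findall(r"[A-Za-z]+", text)
    entities = []
    for current, following in zip(tokens, tokens[1:]):
        # The keyword fires only when the next token looks like a given name.
        if current in SURNAME_KEYWORDS and following[0].isupper():
            entities.append((f"{current} {following}", "name"))
    return entities

entities = extract_names("Zhang San is reading a book")
```

The context check on the following token corresponds to examining the text around a detected feature word.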
As can be seen, in this embodiment the sampled data is first divided into code-type data and text-type data, which are then classified with regular expressions and named entity recognition respectively, thereby improving the accuracy of classifying the data in the database.
In practical applications, named entity recognition may extract more than one named entity, so a suitable way is needed to set the data type when multiple named entities are identified. On the basis of the above embodiment, this embodiment further describes and optimizes the technical solution. Specifically, setting the corresponding data type for the target text-type data according to the named entities specifically comprises:
determining whether more than one named entity has been extracted;
if so, matching the multiple named entities against multiple preset text templates;
when a target text template matches the multiple named entities, setting for the target text-type data the data type corresponding to the target text template, according to the correspondence between text templates and data types;
if not, setting the corresponding data type for the target text-type data according to the data type of the single named entity.
As a preferred implementation, in this embodiment, after the named entities in the target text-type data have been extracted by named entity recognition in combination with the data features, it is first determined whether more than one named entity has been extracted. If so, the multiple named entities are matched against multiple preset text templates. Specifically, each text template contains information on multiple named entities, and different text templates contain different named entity information. The extracted named entities are matched against the different text templates in turn. When a target text template matches the multiple named entities, the named entities contained in the target text template correspond to the named entities extracted from the target text-type data, so the data type corresponding to the target text template is set for the target text-type data according to the preset correspondence between text templates and data types. If only one named entity has been extracted, the corresponding data type is set for the target text-type data according to the type of that named entity.
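The text template branch can be sketched as set matching over entity types. The templates, the type names, and the rule that a template matches when all of its entity types were extracted are assumptions for illustration; the patent does not define the matching criterion.

```python
# Hypothetical text templates: a set of entity types and the data type it implies.
TEXT_TEMPLATES = [
    ({"name", "address", "phone"}, "delivery_record"),
    ({"name", "disease"}, "medical_record"),
]

def type_from_entities(entity_types):
    """Pick a data type for a text value from the types of its named entities."""
    if len(entity_types) == 1:
        # Single entity: its own type becomes the data type.
        return entity_types[0]
    found = set(entity_types)
    for required, data_type in TEXT_TEMPLATES:
        # A template matches when every entity type it lists was extracted.
        if required <= found:
            return data_type
    return None
```

For example, entity types ["name", "disease", "date"] match the second template and yield "medical_record", while a lone ["name"] simply yields "name".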
In another specific implementation, setting the corresponding data type for the target text-type data according to the named entities specifically comprises:
determining whether more than one named entity has been extracted;
if so, calculating the TF-IDF value of each named entity in the target text-type data;
determining whether each TF-IDF value is greater than a preset threshold;
if a target TF-IDF value greater than the preset threshold exists, setting the corresponding data type for the target text-type data using the data type of the target named entity that corresponds to the target TF-IDF value;
if not, setting the corresponding data type for the target text-type data according to the data type of the single named entity.
It will be understood that if a named entity w occurs frequently in the target text-type data d but rarely in other text-type data, w is considered to discriminate well and is suitable for distinguishing d from other text-type data; the data type of w can therefore be used as the data type of the target text-type data.
Specifically, when it is determined that more than one named entity has been extracted from the target text-type data, the TF-IDF (term frequency-inverse document frequency) value of each named entity in the target text-type data is calculated, and it is then determined whether each TF-IDF value is greater than the preset threshold. The way to calculate a TF-IDF value is well known to those skilled in the art and is not repeated here. After the TF-IDF value of each named entity has been calculated, if a target TF-IDF value greater than the preset threshold exists, there is a target named entity that can distinguish the target text-type data from other text-type data. The target named entity corresponding to the target TF-IDF value is therefore obtained from the correspondence between TF-IDF values and named entities, and its data type is used to set the corresponding data type for the target text-type data. If only one named entity has been extracted, the corresponding data type is set for the target text-type data according to the type of that named entity.
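The TF-IDF branch can be illustrated with a short calculation. The sketch assumes that text values are already tokenized, uses the smoothed variant idf = log(N / (1 + df)), and, when several entities exceed the threshold, picks the highest-scoring one; the patent fixes none of these details, and the function names are hypothetical.

```python
import math

def tf_idf(entity, target_tokens, corpus):
    """TF-IDF of one entity in the target document against a corpus of token lists."""
    tf = target_tokens.count(entity) / len(target_tokens)
    containing = sum(1 for doc in corpus if entity in doc)
    # Smoothed IDF so an entity absent from the corpus does not divide by zero.
    idf = math.log(len(corpus) / (1 + containing))
    return tf * idf

def dominant_entity_type(entities, target_tokens, corpus, threshold):
    """Return the type of the highest-scoring entity above the threshold, else None."""
    scored = [(tf_idf(e, target_tokens, corpus), t) for e, t in entities]
    above = [pair for pair in scored if pair[0] > threshold]
    return max(above)[1] if above else None

target = ["diabetes", "patient", "diabetes", "Beijing"]
corpus = [target, ["patient", "Beijing"], ["Beijing", "hotel"], ["patient", "visit"]]
entities = [("diabetes", "disease"), ("Beijing", "city")]
result = dominant_entity_type(entities, target, corpus, threshold=0.1)
```

Here "diabetes" is frequent in the target and rare elsewhere, so its type "disease" is selected, whereas "Beijing" appears in most documents and scores zero.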
It should be noted that, after a data type has been set for target text-type data that contains multiple named entities, if the multiple named entities come from the same data table, this embodiment may also set a table type for that data table; and if the multiple named entities come from multiple data tables in the same database, this embodiment may also set a data type for that database, depending on how the multiple named entities in the target text-type data are composed.
As can be seen, the method provided by this embodiment can set the corresponding data type for text-type data that contains multiple named entities, improving the accuracy of classifying the data in the database.
On the basis of the above embodiments, this embodiment further describes and optimizes the technical solution. Specifically, after obtaining the metadata and sampled data from the target database, the method further comprises:
determining whether any target metadata matches a preset field template;
if so, setting the corresponding data type for the sampled data that corresponds to the target metadata, according to the correspondence between field templates and data types.
Specifically, metadata may be the column name of a column of sampled data in a data table, in which case the data type of the sampled data in that column corresponds to the column name. In this embodiment, field templates for matching sensitive fields in the metadata are set in advance, together with the correspondence between field templates and data types. After the metadata and sampled data have been obtained from the target database, the metadata is matched against the preset field templates. When a field template matches a sensitive field in some target metadata, the data type of the sampled data corresponding to that target metadata is consistent with the data type of the field template, and the corresponding data type is set for that sampled data accordingly.
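Field template matching can be sketched as patterns applied to column names. The two templates and the column names below are hypothetical; the patent states only that field templates match sensitive fields in the metadata and map to data types.

```python
import re

# Hypothetical field templates: a pattern over column names and the implied data type.
FIELD_TEMPLATES = [
    (re.compile(r"(id_?card|identity)", re.IGNORECASE), "id_card_number"),
    (re.compile(r"(phone|mobile|tel)", re.IGNORECASE), "phone_number"),
]

def types_from_column_names(column_names):
    """Map each column whose name matches a field template to its data type."""
    assigned = {}
    for name in column_names:
        for pattern, data_type in FIELD_TEMPLATES:
            if pattern.search(name):
                assigned[name] = data_type
                break
    return assigned

assigned = types_from_column_names(["cust_idcard", "MobileNo", "remark"])
```

Columns that receive a type this way need no further classification, and columns missing from the result, such as "remark" here, go on to the content-based steps.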
It should be noted that, once data types have been set for the corresponding sampled data according to the field templates, the sampled data that already has a data type may be removed from the obtained sampled data. The subsequent steps then only need to classify the sampled data that has not yet been assigned a data type, which improves the efficiency of classifying the data in the database.
In this embodiment, it is considered that the regular expressions are set in advance by technical staff, so when code-type data is matched against them there may be no target regular expression that matches the target code-type data. On the basis of the above embodiments, this embodiment therefore further describes and optimizes the technical solution. Specifically, the method further comprises:
when no target regular expression matches the target code-type data, training a feature matching model by supervised learning;
inputting the target code-type data into the feature matching model for feature matching;
setting the corresponding data type for the target code-type data.
Specifically, in this embodiment, when it is determined that no target regular expression matches the target code-type data, a feature matching model for matching the target code-type data is trained by supervised learning on sample code-type data and their corresponding data types. In particular, features are extracted in advance from a large number of sample code-type data, such as ID card numbers and telephone numbers, and the extracted data features are associated with the known data types, so as to train a feature matching model corresponding to the data features of each kind of code-type data. In other words, the feature matching model is a single-input, multiple-output model: when the target code-type data is input, the model matches its data features against the data features of the known data types and sets the corresponding data type for the target code-type data.
It can be seen that, when the preset regular expressions contain no target regular expression corresponding to the target code-type data, this embodiment inputs the target code-type data into the feature matching model for feature matching and sets the corresponding data type for it, which further improves the accuracy of setting data types for the data in the database.
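A minimal sketch of this fallback is given below. The application does not specify which supervised model is used, so a simple nearest-centroid classifier over hand-picked features stands in for the feature matching model here. The regular expression, the features, the data type names, and all function names are assumptions for illustration only.

```python
import math
import re

# Hypothetical preset regular expressions and their data types.
REGEX_TYPES = [
    (re.compile(r"^1\d{10}$"), "phone_number"),
]


def extract_features(value):
    """Turn a code-type string into a small numeric feature vector."""
    length = len(value)
    digits = sum(ch.isdigit() for ch in value)
    letters = sum(ch.isalpha() for ch in value)
    return (float(length), digits / max(length, 1), letters / max(length, 1))


def train_centroids(labelled_samples):
    """Supervised training: average the feature vectors of each known data type."""
    sums, counts = {}, {}
    for value, data_type in labelled_samples:
        features = extract_features(value)
        acc = sums.setdefault(data_type, [0.0] * len(features))
        for i, feature in enumerate(features):
            acc[i] += feature
        counts[data_type] = counts.get(data_type, 0) + 1
    return {t: tuple(x / counts[t] for x in acc) for t, acc in sums.items()}


def classify_code_value(value, centroids):
    """Try the preset regular expressions first, then fall back to the model."""
    for pattern, data_type in REGEX_TYPES:
        if pattern.match(value):
            return data_type
    # No regular expression matched: pick the data type with the nearest centroid.
    features = extract_features(value)
    return min(centroids, key=lambda t: math.dist(features, centroids[t]))
```

In this sketch the raw length dominates the distance because the features are not normalized; a real implementation would likely scale the features and use far more training samples than a toy example can show.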
On the basis of the above embodiments, this embodiment further describes and optimizes the technical solution. Specifically, after the sampled data is classified in combination with the data features to obtain the corresponding data types, the method further comprises:
Setting a corresponding data level for the sampled data according to a preset correspondence between data types and data levels, so that a user accesses the sampled data of the corresponding data level according to the user's own access permission.
It should be noted that, in practice, different user identities have different access permissions; that is, different users can access different data levels in the target database, and a user with higher permission can access more data than a user with lower permission. Therefore, in this embodiment, a corresponding data level is set for the sampled data according to the preset correspondence between data types and data levels. Specifically, one data level may be set for each data type, so that data types and data levels correspond one to one. However, because the number of user identities is generally smaller than the number of data types, this embodiment sets one data level for several data types; that is, multiple data types correspond to the same data level. The mapping is configured according to actual requirements and is not limited by this embodiment.
It can be seen that, by setting data levels for the sampled data so that users access the sampled data of the corresponding level according to their own access permissions, this embodiment can further improve the security of the data.
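The many-to-one mapping from data types to data levels, and the resulting access check, can be illustrated with the short sketch below. The level numbers, the type names, the default level for unknown types, and both function names are assumptions chosen for the example; the application leaves the concrete mapping to actual requirements.

```python
# Hypothetical mapping: several data types may share one data level.
TYPE_TO_LEVEL = {
    "id_card_number": 3,
    "phone_number": 2,
    "email_address": 2,
    "city_name": 1,
}
# Assumption: a type missing from the mapping gets the most restrictive level.
DEFAULT_LEVEL = 3


def assign_levels(column_types):
    """Map each column's data type to its data level."""
    return {col: TYPE_TO_LEVEL.get(t, DEFAULT_LEVEL) for col, t in column_types.items()}


def accessible_columns(column_levels, user_clearance):
    """Return, in sorted order, the columns whose level does not exceed the user's clearance."""
    return sorted(col for col, level in column_levels.items() if level <= user_clearance)
```

With this layout, a user with clearance 2 would see phone and email columns but not ID card columns, which is the behavior the paragraph above describes for users with lower permissions.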
To help those skilled in the art better understand the technical solutions of this application, the technical solutions in the embodiments are described in detail below with reference to a practical application scenario. The classification method of data in a database provided in this embodiment proceeds as follows:
(1) Obtain the metadata and sampled data in the target database DB. The metadata includes the database name of the target database, the names of the data tables in the target database, and the names that identify each column of data in those tables;
(2) Obtain the data actually stored in the target database DB, namely the sampled data, by running a collection script;
(3) Extract the data features in the metadata by means of data mining;
(4) Divide the sampled data into code-type data and text-type data using a clustering algorithm;
(5) In combination with the data features, match the code-type data using the preset regular expressions, and set the data type of the code-type data that matches a regular expression to the type corresponding to that regular expression;
(6) In combination with the data features, extract the named entities in the text-type data by named entity recognition, and judge whether more than one named entity was extracted. If there are multiple named entities, match them against the preset text templates to find a text template that matches the multiple named entities, and use the type of that text template to set the corresponding data type for the data containing them. If there is a single named entity, set the corresponding data type for the text-type data according to the data type of that named entity.
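Steps (4) to (6) above can be sketched roughly as follows. Several simplifications are assumed and should not be read as the method of the application: a character-based heuristic stands in for the clustering algorithm, a small dictionary lookup stands in for named entity recognition, and all patterns, entity types, templates, and function names are hypothetical.

```python
import re

# Hypothetical regular expressions for code-type data (step 5).
CODE_PATTERNS = [
    (re.compile(r"^1\d{10}$"), "phone_number"),
    (re.compile(r"^[\w.+-]+@[\w-]+(\.[\w-]+)+$"), "email_address"),
]

# Hypothetical entity dictionary standing in for a real NER model (step 6).
ENTITY_LEXICON = {"Beijing": "city", "Shanghai": "city", "Haidian": "district"}

# Hypothetical text templates: a set of entity types maps to one data type.
TEXT_TEMPLATES = {frozenset({"city", "district"}): "address"}


def is_code_like(value):
    """Crude stand-in for the clustering step (step 4)."""
    has_space = any(ch.isspace() for ch in value)
    return not has_space and any(ch.isdigit() or ch == "@" for ch in value)


def classify_code(value):
    """Step 5: return the type of the first matching regular expression, if any."""
    for pattern, data_type in CODE_PATTERNS:
        if pattern.match(value):
            return data_type
    return None


def classify_text(value):
    """Step 6: use entity types directly, or a text template when there are several."""
    words = value.replace(",", " ").split()
    entity_types = [ENTITY_LEXICON[w] for w in words if w in ENTITY_LEXICON]
    if not entity_types:
        return None
    if len(entity_types) == 1:
        return entity_types[0]
    return TEXT_TEMPLATES.get(frozenset(entity_types))


def classify_sample(value):
    """Route a sampled value to the code-type or text-type branch."""
    return classify_code(value) if is_code_like(value) else classify_text(value)
```

A value that matches nothing returns `None` in this sketch, leaving room for the supervised fallback described earlier for unmatched code-type data.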
Compared with prior-art methods that classify the sampled data alone, the classification method of data in a database provided by this embodiment of the present invention additionally obtains the metadata in the target database, extracts the data features in the metadata, and then classifies the sampled data in combination with the extracted data features to obtain the data type corresponding to each piece of sampled data. Because of the particular structure of a database, all of the data it stores is structured, and the metadata is data that describes the sampled data, so the metadata contains data features of the stored sampled data. Classifying the sampled data in combination with the data features of the metadata therefore improves the accuracy of classifying the sampled data, that is, the accuracy of classifying the data stored in the database.
The embodiments of the classification method of data in a database provided by the present invention have been described in detail above. The present invention also provides a classification device, a classification apparatus, and a computer-readable storage medium corresponding to the method. Because the embodiments of the device, apparatus, and computer-readable storage medium correspond to the embodiments of the method, refer to the description of the method embodiments for details; they are not repeated here.
Fig. 3 is a structural diagram of a classification device for data in a database provided by an embodiment of the present invention. As shown in Fig. 3, the classification device comprises:
An acquisition module 31, configured to obtain the metadata and sampled data in the target database;
An extraction module 32, configured to extract the data features in the metadata;
A classification module 33, configured to classify the sampled data in combination with the data features to obtain the corresponding data types.
The classification device for data in a database provided by this embodiment of the present invention has the same beneficial effects as the classification method described above.
As a preferred embodiment, the classification device for data in a database provided in this embodiment further comprises:
A judgment module, configured to judge whether there is target metadata that matches a preset field template and, if so, to call the first setting module;
A first setting module, configured to set the corresponding data type for the sampled data corresponding to the target metadata according to the correspondence between field templates and data types.
As a preferred embodiment, the classification device for data in a database provided in this embodiment further comprises:
A training module, configured to obtain a target regular expression by training with a supervised learning method when no target regular expression matches the target code-type data;
A matching module, configured to match the target code-type data using the target regular expression in combination with the data features;
A second setting module, configured to set the corresponding data type for the target code-type data according to the correspondence between target regular expressions and data types.
As a preferred embodiment, the classification device for data in a database provided in this embodiment further comprises:
A data-level setting module, configured to set a corresponding data level for the sampled data according to the preset correspondence between data types and data levels, so that a user accesses the sampled data of the corresponding data level according to the user's own access permission.
Fig. 4 is a structural diagram of a classification apparatus for data in a database provided by an embodiment of the present invention. As shown in Fig. 4, the classification apparatus comprises:
A memory 41, configured to store a computer program;
A processor 42, configured to implement the steps of the above classification method of data in a database when executing the computer program.
The classification apparatus for data in a database provided by this embodiment of the present invention has the same beneficial effects as the classification method described above.
To solve the above technical problems, the present invention also provides a computer-readable storage medium on which a computer program is stored. When executed by a processor, the computer program implements the steps of the above classification method of data in a database.
The computer-readable storage medium provided by this embodiment of the present invention has the same beneficial effects as the classification method described above.
The classification method, device, apparatus, and computer-readable storage medium for data in a database provided by the present invention have been described in detail above. Specific embodiments are used herein to explain the principles and implementation of the present invention, and the description of the embodiments is only intended to help in understanding the method of the present invention and its core idea. It should be pointed out that those of ordinary skill in the art can make improvements and modifications to the present invention without departing from its principles, and these improvements and modifications also fall within the scope of protection of the claims of the present invention.
The embodiments in this specification are described in a progressive manner. Each embodiment focuses on its differences from the other embodiments, and the same or similar parts of the embodiments may be referred to one another. Because the device disclosed in the embodiments corresponds to the method disclosed in the embodiments, its description is relatively brief; for the relevant details, refer to the description of the method.
Those skilled in the art will further appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of the two. To clearly illustrate the interchangeability of hardware and software, the composition and steps of each example have been described above in general terms of their functions. Whether these functions are implemented in hardware or software depends on the specific application and the design constraints of the technical solution. Those skilled in the art may use different methods to implement the described functions for each specific application, but such implementations should not be considered beyond the scope of the present invention.
Claims (10)
1. A classification method of data in a database, characterized by comprising:
Obtaining metadata and sampled data in a target database;
Extracting data features in the metadata;
Classifying the sampled data in combination with the data features to obtain corresponding data types.
2. The method according to claim 1, characterized in that classifying the sampled data in combination with the data features to obtain corresponding data types specifically comprises:
Dividing the sampled data into code-type data and text-type data using a pre-classification model;
Matching the code-type data using preset regular expressions in combination with the data features;
Setting, according to the correspondence between regular expressions and data types, the corresponding data type for target code-type data that matches a target regular expression;
Extracting, in combination with the data features, the named entities in target text-type data by named entity recognition, and setting the corresponding data type for the target text-type data according to the named entities.
3. The method according to claim 2, characterized in that setting the corresponding data type for the target text-type data according to the named entities specifically comprises:
Judging whether more than one named entity has been extracted;
If so, matching the multiple named entities against a plurality of preset text templates;
When there is a target text template that matches the multiple named entities, setting the data type corresponding to the target text template for the target text-type data according to the correspondence between text templates and data types;
If not, setting the corresponding data type for the target text-type data according to the data type of the single named entity.
4. The method according to claim 2, characterized in that setting the corresponding data type for the target text-type data according to the named entities specifically comprises:
Judging whether more than one named entity has been extracted;
If so, calculating the TF-IDF value of each named entity in the target text-type data;
Judging whether each TF-IDF value is greater than a preset threshold;
If there is a target TF-IDF value greater than the preset threshold, setting the corresponding data type for the target text-type data using the data type of the target named entity corresponding to the target TF-IDF value;
If not, setting the corresponding data type for the target text-type data according to the data type of the single named entity.
5. The method according to claim 1, characterized in that, after obtaining the metadata and sampled data in the target database, the method further comprises:
Judging whether there is target metadata that matches a preset field template;
If so, setting the corresponding data type for the sampled data corresponding to the target metadata according to the correspondence between field templates and data types.
6. The method according to claim 2, characterized by further comprising:
When no target regular expression matches the target code-type data, training a feature matching model using a supervised learning method;
Inputting the target code-type data into the feature matching model for feature matching;
Setting the corresponding data type for the target code-type data.
7. The method according to any one of claims 1 to 6, characterized in that, after classifying the sampled data in combination with the data features to obtain corresponding data types, the method further comprises:
Setting a corresponding data level for the sampled data according to a preset correspondence between data types and data levels, so that a user accesses the sampled data of the corresponding data level according to the user's own access permission.
8. A classification device for data in a database, characterized by comprising:
An acquisition module, configured to obtain metadata and sampled data in a target database;
An extraction module, configured to extract data features in the metadata;
A classification module, configured to classify the sampled data in combination with the data features to obtain corresponding data types.
9. A classification apparatus for data in a database, characterized by comprising:
A memory, configured to store a computer program;
A processor, configured to implement the steps of the classification method of data in a database according to any one of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium, characterized in that a computer program is stored on the computer-readable storage medium, and the computer program, when executed by a processor, implements the steps of the classification method of data in a database according to any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811595262.1A CN109597892A (en) | 2018-12-25 | 2018-12-25 | Classification method, device, equipment and the storage medium of data in a kind of database |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109597892A true CN109597892A (en) | 2019-04-09 |
Family
ID=65962704
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811595262.1A Pending CN109597892A (en) | 2018-12-25 | 2018-12-25 | Classification method, device, equipment and the storage medium of data in a kind of database |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109597892A (en) |
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104731976A (en) * | 2015-04-14 | 2015-06-24 | 海量云图(北京)数据技术有限公司 | Method for finding and sorting private data in data table |
CN106408140A (en) * | 2015-07-27 | 2017-02-15 | 广州西麦信息科技有限公司 | Grading and classifying model method based on power grid enterprise data |
US20170320103A1 (en) * | 2016-05-04 | 2017-11-09 | Jessica Marie Schreiber | Scalable systems and methods for classifying textile samples |
CN106815605A (en) * | 2017-01-23 | 2017-06-09 | 上海上讯信息技术股份有限公司 | A kind of data classification method and equipment based on machine learning |
CN107368892A (en) * | 2017-06-07 | 2017-11-21 | 无锡小天鹅股份有限公司 | Model training method and device based on machine learning |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110427992A (en) * | 2019-07-23 | 2019-11-08 | 杭州城市大数据运营有限公司 | Data matching method, device, computer equipment and storage medium |
CN110704873A (en) * | 2019-09-25 | 2020-01-17 | 全球能源互联网研究院有限公司 | Method and system for preventing sensitive data from being leaked |
CN110704873B (en) * | 2019-09-25 | 2021-05-25 | 全球能源互联网研究院有限公司 | Method and system for preventing sensitive data from being leaked |
CN110727743A (en) * | 2019-10-12 | 2020-01-24 | 杭州城市大数据运营有限公司 | Data identification method and device, computer equipment and storage medium |
CN110781173A (en) * | 2019-10-12 | 2020-02-11 | 杭州城市大数据运营有限公司 | Data identification method and device, computer equipment and storage medium |
CN114860941A (en) * | 2022-07-05 | 2022-08-05 | 南京云创大数据科技股份有限公司 | Industry data management method and system based on data brain |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20190409 |