CN109408555A - Data type recognition method and device, data storage method and device

Publication number
CN109408555A
Authority
CN
China
Prior art keywords
data
feature
column
training
classification model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201811096054.7A
Other languages
Chinese (zh)
Other versions
CN109408555B (en
Inventor
王海波
李晓宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Yunnan Smartq Beijing Mdt Infotech Ltd
Original Assignee
Yunnan Smartq Beijing Mdt Infotech Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Yunnan Smartq Beijing Mdt Infotech Ltd
Priority to CN201811096054.7A
Publication of CN109408555A
Application granted
Publication of CN109408555B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting

Abstract

The invention discloses a data type recognition method and device and a data storage method and device. The data type recognition method comprises: S1, obtaining column data to be identified, the column data comprising a column header and data content; S2, extracting features of the column data to obtain a feature vector, the feature vector comprising column header features and data content features; and S3, inputting the feature vector into a pre-trained classification model to classify it, thereby completing recognition of the column data. A feature vector is obtained from the column header and data content of the column data and input into the pre-trained classification model for classification, which yields the semantic attribute to which the column data belongs and completes recognition of the structured data type. The method is simple, convenient, and efficient, requires no manual intervention, and greatly saves manpower and material resources.

Description

Data type recognition method and device, data storage method and device
Technical field
The present invention relates to the technical field of data processing, and in particular to a data type recognition method and device and a data storage method and device.
Background art
Structured data analysis is an important part of data mining. Structured data stored in files of formats such as Excel and csv is difficult to analyze directly. Analysts usually perform complex analysis operations by means of a relational database or a graph database; that is, the data in the file must first be stored in a relational database or graph database, and the analysis is then completed with other analysis frameworks. During storage, the analyst needs to map each column of data in the Excel, csv, or similar file to a field in the database.
Currently, the field-matching problem during data storage is usually handled in one of two ways. In the first, analysts complete the mapping manually, which requires a great deal of manual intervention, is time-consuming and laborious, and is inefficient. In the second, automatic mapping is achieved through strategies, which can be implemented in the following two ways: 1. recording the results of earlier manual mappings, so that if a column of the current file (usually identified by its column header) has been processed before, the mapping is matched quickly; 2. completing the mapping through strategies such as exact (hard) matching or regular-expression matching between the column header and the database field. Both approaches are inflexible: when a column appears for which no similar data has been processed before, manual intervention is still required.
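To illustrate why the rule-based strategy is brittle, the following is a minimal sketch of regular-expression matching between column headers and database fields. The field names and patterns are hypothetical and are not taken from the patent; the point is only that a header the rules did not anticipate falls through and still needs a manual mapping.

```python
import re

# Hypothetical database fields and matching rules (illustrative assumptions).
FIELD_PATTERNS = {
    "phone_num": re.compile(r"phone|mobile|cell", re.IGNORECASE),
    "id_card": re.compile(r"id[\s_]*card|identity", re.IGNORECASE),
}


def match_by_rule(header):
    """Return the first database field whose pattern matches the header, else None."""
    for field, pattern in FIELD_PATTERNS.items():
        if pattern.search(header):
            return field
    return None


print(match_by_rule("Mobile number"))  # matched by the "mobile" alternative
print(match_by_rule("caller"))         # no rule applies, so a person must map it
```

Every new way of naming a column requires a new rule, which is the inflexibility the background section describes.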
Summary of the invention
The object of the present invention is to provide a data type recognition method and device and a data storage method and device that effectively solve the technical problems of inflexible and inefficient recognition of structured data types in the prior art.
The technical solution provided by the present invention is as follows:
A data type recognition method, comprising:
S1: obtaining column data to be identified, the column data comprising a column header and data content;
S2: extracting features of the column data to obtain a feature vector, the feature vector comprising column header features and data content features;
S3: inputting the feature vector into a pre-trained classification model to classify it, thereby completing recognition of the column data.
Further preferably, step S2 comprises:
S21: extracting the column header from the column data to obtain the column header features;
S22: extracting a first preset feature from individual data items in the data content;
S23: extracting a second preset feature from the data content as a whole;
S24: concatenating the column header features, the first preset feature, and the second preset feature to obtain the feature vector of the column data.
Further preferably, in step S21, a word embedding model is used to convert the column header into a feature vector of a preset dimension;
and/or, in step S22, string length, format, and constituent-element features are extracted from individual data items in the data content;
and/or, in step S23, dispersion, continuity, and variance features are extracted from the data content as a whole.
Further preferably, before step S1, the method further comprises a step of training the classification model, comprising:
S01: selecting a training corpus and preprocessing it;
S02: selecting a classification model;
S03: extracting training samples from the preprocessed training corpus;
S04: labeling the extracted training samples with classification categories;
S05: inputting the labeled training samples into the classification model to train it.
The present invention also provides a data storage method, which comprises the above data type recognition method and further comprises:
S4: obtaining the semantic attribute to which the column data belongs according to the classification category output by the classification model, a mapping relationship between the classification categories output by the classification model and the semantic attributes to which they belong being stored in advance;
S5: matching the obtained semantic attribute of the column data with the semantic attributes of the database fields to complete the operation of storing the column data in the database, a mapping relationship between the semantic attributes of the column data output by the classification model and the semantic attributes of the database fields being stored in advance.
The present invention also provides a data type recognition device, comprising:
a data acquisition module, configured to obtain column data to be identified, the column data comprising a column header and data content;
a feature extraction module, configured to extract features of the column data obtained by the data acquisition module to obtain a feature vector, the feature vector comprising column header features and data content features;
a data classification module, configured to input the feature vector extracted by the feature extraction module into a pre-trained classification model to classify it, thereby completing recognition of the column data.
Further preferably, the feature extraction module comprises:
a feature extraction unit, configured to extract the column header from the column data to obtain the column header features, to extract a first preset feature from individual data items in the data content, and to extract a second preset feature from the data content as a whole;
a feature concatenation unit, configured to concatenate the column header features, the first preset feature, and the second preset feature to obtain the feature vector of the column data.
Further preferably, the feature extraction unit uses a word embedding model to convert the column header into a feature vector of a preset dimension, extracts string length, format, and constituent-element features from individual data items in the data content, and extracts dispersion, continuity, and variance features from the data content as a whole.
Further preferably, the recognition device further comprises a training module configured to train the classification model; the training module comprises:
a corpus preprocessing unit, configured to select a training corpus and preprocess it;
a sample extraction unit, configured to extract training samples from the preprocessed training corpus;
a labeling unit, configured to label the extracted training samples with classification categories;
a training unit, configured to input the labeled training samples into the selected classification model to train it.
The present invention also provides a data storage device, which comprises the above data type recognition device and further comprises:
a matching module, configured to obtain the semantic attribute to which the column data belongs according to the classification category output by the classification model, and to match the recognition result of the data type recognition device with the semantic attributes of the database fields to complete the operation of storing the column data in the database, wherein a mapping relationship between the classification categories output by the classification model and the semantic attributes to which they belong, and a mapping relationship between the semantic attributes of the column data and the semantic attributes of the database fields, are stored in advance in a storage module.
In the data type recognition method and device provided by the present invention, a feature vector is obtained from the column header and data content of the column data and input into a pre-trained classification model for classification, which yields the semantic attribute to which the column data belongs and completes recognition of the structured data type. The approach is simple, convenient, and efficient, requires no manual intervention, and greatly saves manpower and material resources. In addition, a corresponding classification model can be trained for each application scenario, so the approach is widely applicable. When storing structured data, it is only necessary to establish a mapping between the semantic attributes of the column data and the semantic attributes of the database fields to obtain fast, flexible, and accurate mapping recommendations.
Brief description of the drawings
Preferred embodiments are described below in a clear and easily understood manner with reference to the accompanying drawings, to further explain the above characteristics, technical features, advantages, and implementation of the data type recognition method and device and the data storage method and device.
Fig. 1 is a schematic flow diagram of the data type recognition method of the present invention;
Fig. 2 is a schematic flow diagram of classification model training in the present invention;
Fig. 3 is a schematic diagram of the data type recognition device of the present invention.
Description of reference numerals:
100: data type recognition device; 110: data acquisition module; 120: feature extraction module; 130: data classification module.
Detailed description of the embodiments
In order to explain the embodiments of the present invention or the technical solutions in the prior art more clearly, specific embodiments of the present invention are described below with reference to the accompanying drawings. Obviously, the drawings described below show only some embodiments of the present invention; those of ordinary skill in the art can obtain other drawings, and other embodiments, from these drawings without creative effort.
For simplicity, each figure only schematically shows the parts relevant to the present invention, and the figures do not represent the actual structure of a product. In addition, to keep the figures simple and easy to understand, where several components in a figure have the same structure or function, only one of them is schematically drawn or labeled. Herein, "a" or "one" does not only mean "only one" but can also mean "more than one".
Fig. 1 is a schematic flow diagram of the data type recognition method provided by the present invention. As can be seen from the figure, the recognition method comprises: S1, obtaining column data to be identified, the column data comprising a column header and data content; S2, extracting features of the column data to obtain a feature vector, the feature vector comprising column header features and data content features; and S3, inputting the feature vector into a pre-trained classification model to classify it, thereby completing recognition of the column data.
In this method, column data refers to a column of data in a file of a format such as Excel or csv. Its data format is normally column header + data content, where the column header is the first row of data in the file and describes the content of the current column. Word embedding is a class of word representations in which words with similar meanings have similar representations; it is a general term for methods that map vocabulary to real-valued vectors. In the data type recognition process, a column of data in the file is treated as one analysis object, and the goal is achieved in three stages: feature extraction, classification model training, and data classification.
During feature extraction, because column data is divided into a column header and data content, feature extraction is likewise divided into column header feature extraction and data content feature extraction. The column header is usually a description of the characteristics of the column data, so it is converted into a feature vector of a specified dimension by a word embedding model (such as Word2Vec, CBoW, or the Skip-Gram model). For the data content, data samples are first obtained by sampling. Features such as string length, format, and constituent elements are then extracted from each individual data item (one row) in the sample to obtain the first preset feature, and features such as dispersion, continuity, and variance are extracted from all sampled data to obtain the second preset feature. Finally, the column header features, the first preset feature, and the second preset feature are concatenated to obtain the feature vector of the column data, which serves as the feature description of the current analysis object (the column data).
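The feature extraction stage described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the hash-based `header_embedding` is a stand-in for a trained word embedding model, the concrete formulas for dispersion, continuity, and variance are hypothetical choices, and averaging the per-item features over the sample is an assumption made so that the vector has a fixed length.

```python
import hashlib
import random
import re
import statistics


def header_embedding(header, dim=8):
    """Stand-in for a word embedding (assumption): a deterministic vector from a hash."""
    digest = hashlib.sha256(header.encode("utf-8")).digest()
    return [b / 255.0 for b in digest[:dim]]


def value_features(value):
    """First preset feature of one data item: length, constituent elements, format."""
    n = max(len(value), 1)
    return [
        float(len(value)),                                # string length
        sum(c.isdigit() for c in value) / n,              # share of digit characters
        sum(c.isalpha() for c in value) / n,              # share of letter characters
        1.0 if re.fullmatch(r"[0-9]+", value) else 0.0,   # format: digits only
    ]


def column_features(values):
    """Second preset feature over all sampled items (hypothetical definitions)."""
    dispersion = len(set(values)) / len(values)           # share of distinct values
    numeric = sorted(int(v) for v in values if re.fullmatch(r"[0-9]+", v))
    steps = [b - a for a, b in zip(numeric, numeric[1:])]
    continuity = sum(s == 1 for s in steps) / len(steps) if steps else 0.0
    variance = float(statistics.pvariance([len(v) for v in values]))
    return [dispersion, continuity, variance]


def column_feature_vector(header, values, sample_size=100, seed=0):
    """Concatenate header features, averaged per-item features, and column features."""
    if not values:
        raise ValueError("the column has no data content")
    if len(values) > sample_size:
        sample = random.Random(seed).sample(values, sample_size)
    else:
        sample = list(values)
    per_item = [value_features(v) for v in sample]
    first = [statistics.fmean(col) for col in zip(*per_item)]
    second = column_features(sample)
    return header_embedding(header) + first + second


vec = column_feature_vector("phone", ["13800000001", "13800000002", "13800000003"])
print(len(vec), vec[8:])
```

With the default `dim=8`, the resulting vector has 15 entries: 8 for the header, 4 averaged per-item features, and 3 column-level features.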
After feature extraction is complete, the feature vector is fed into the trained classification model for classification, and the classification result to which it belongs is obtained, thereby achieving automatic recognition of the data type. The specific form of the classification model is not limited here; any model that achieves the purpose of the invention falls within the scope of the present invention. For example, classification models such as an SVM (support vector machine), a decision tree, a random forest, or a neural network (deep learning) may be used.
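The classification step can be illustrated with a deliberately simple model. The sketch below uses a nearest-centroid classifier purely as a dependency-free stand-in; it is not one of the models the patent names (SVM, decision tree, random forest, neural network), and the two-dimensional feature vectors and class labels are made-up illustration data.

```python
import math


class NearestCentroidClassifier:
    """Stand-in classifier (assumption): predicts the class with the closest mean vector."""

    def fit(self, vectors, labels):
        sums, counts = {}, {}
        for vec, label in zip(vectors, labels):
            acc = sums.setdefault(label, [0.0] * len(vec))
            for i, x in enumerate(vec):
                acc[i] += x
            counts[label] = counts.get(label, 0) + 1
        # One centroid per class: the component-wise mean of its training vectors.
        self.centroids = {
            label: [s / counts[label] for s in acc] for label, acc in sums.items()
        }
        return self

    def predict(self, vec):
        return min(self.centroids, key=lambda lab: math.dist(vec, self.centroids[lab]))


# Toy vectors: [average string length, share of digit characters].
train_x = [[11.0, 1.0], [11.0, 1.0], [18.0, 0.9], [18.0, 0.95]]
train_y = [0, 0, 1, 1]  # hypothetical classes: 0 = phone number, 1 = ID card number
model = NearestCentroidClassifier().fit(train_x, train_y)
print(model.predict([17.0, 0.9]))
```

Any supervised classifier exposing a comparable fit and predict interface could take the place of this stand-in.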
As shown in Fig. 2, training the classification model comprises: S01, selecting a training corpus and preprocessing it; S02, selecting a classification model; S03, extracting training samples from the preprocessed training corpus; S04, labeling the extracted training samples with classification categories; and S05, inputting the labeled training samples into the classification model to train it.
Specifically, because the word embedding model places high demands on the training corpus, the corpus should, as far as possible, consist of articles that cover the domains of the data to be analyzed (the data in Excel and csv files). The corpus is then preprocessed according to the specific usage scenario, for example by deleting English text, deleting special characters, and converting between simplified and traditional Chinese characters; a word segmentation tool such as jieba or HanLP is then used to segment the corpus.
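Two of the preprocessing operations mentioned above, deleting English text and deleting special characters, can be sketched with regular expressions. This is a minimal sketch assuming a Chinese corpus; the choice to keep digits and the character range used for Chinese characters are assumptions. Simplified/traditional conversion and word segmentation require dedicated tools (the text names jieba and HanLP) and are not shown.

```python
import re

_ENGLISH = re.compile(r"[A-Za-z]+")
# Anything that is not a CJK unified ideograph, an ASCII digit, or whitespace.
_SPECIAL = re.compile(r"[^\u4e00-\u9fff0-9\s]")


def preprocess(text):
    """Delete English words and special characters, then normalize whitespace."""
    text = _ENGLISH.sub(" ", text)
    text = _SPECIAL.sub(" ", text)
    return " ".join(text.split())


print(preprocess("手机号码 (mobile): 138-0000!"))
```

The cleaned text would then be passed to the word segmentation step before training the word embedding model.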
Before training, the required classification categories are determined according to the business scenario, and a corresponding number of classification results is mapped according to the number of categories; for example, if there are n categories, they are mapped to 0, 1, ..., (n-1). Training samples are then extracted from the training corpus (specifically, the feature vectors obtained by feature extraction from the selected training column data), and each training sample is labeled with a classification category. The label is the classification result mapped from the category; if the mapping is numeric, the label is the corresponding number. The selected training samples should cover all classification categories, and the number of training samples per category should not differ greatly; the samples should be distributed as evenly as possible.
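The label mapping and the balance requirement can be sketched as follows. The category names and the imbalance measure (largest class count divided by smallest) are illustrative assumptions; the text only requires that categories map to 0, 1, ..., (n-1) and that class sizes be roughly equal.

```python
from collections import Counter


def build_label_map(categories):
    """Map each distinct category to an integer 0..n-1 in order of first appearance."""
    return {cat: idx for idx, cat in enumerate(dict.fromkeys(categories))}


def label_samples(samples, label_map):
    """Replace the category name of each (feature_vector, category) pair by its number."""
    return [(vec, label_map[cat]) for vec, cat in samples]


def imbalance_ratio(labeled):
    """Largest class size divided by smallest; 1.0 means perfectly balanced."""
    counts = Counter(label for _, label in labeled)
    return max(counts.values()) / min(counts.values())


# Hypothetical training samples: (feature vector, category name).
samples = [
    ([11.0, 1.0], "phone"),
    ([18.0, 0.9], "id_card"),
    ([11.0, 1.0], "phone"),
    ([3.0, 0.0], "name"),
]
label_map = build_label_map(cat for _, cat in samples)
labeled = label_samples(samples, label_map)
print(label_map, imbalance_ratio(labeled))
```

A ratio far above 1.0 would suggest collecting more samples for the under-represented categories before training.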
As for the models, Word2vec is a popular word embedding model that has been integrated into a variety of open-source frameworks. The present invention uses the open-source gensim framework to train a word2vec model on the preprocessed corpus. The classification algorithm may be an SVM, a decision tree, a random forest, a neural network, or the like. After supervised training on the selected training samples, the classification model can be used to recognize column data types.
Based on the above data type recognition method, the present invention also provides a data storage method. In addition to the above data type recognition method, the method further comprises: S4, obtaining the semantic attribute to which the column data belongs according to the classification category output by the classification model, a mapping relationship between the classification categories output by the classification model and the semantic attributes to which they belong being stored in advance; and S5, matching the obtained semantic attribute of the column data with the semantic attributes of the database fields to complete the operation of storing the column data in the database, a mapping relationship between the semantic attributes of the column data output by the classification model and the semantic attributes of the database fields being stored in advance.
In this method, the column header is the first row of data in the file; it describes the content of the current column and is one of the possible expressions of a semantic attribute. A semantic attribute describes the characteristics of a column of data; it is a high-level description built on low-level features, such as ID card number or mobile phone number. In general, structured data (including column data) has a corresponding semantic attribute, and each database field likewise has its corresponding semantic attribute. Because the column header of the column data and the database field are both expressions of a semantic attribute, and the same semantic attribute can be expressed in many ways, it is difficult to complete the mapping by matching column headers directly against database fields. For example, the database field may be phone_num while the column header is "phone number", "mobile number", "caller number", and so on. This method therefore completes the mapping from column data to database fields by matching semantic attributes.
After the classification model outputs a classification result (corresponding to a certain classification category), the semantic attribute to which the classification result belongs is obtained by looking up the stored mapping between classification categories and semantic attributes. The stored mapping between the semantic attributes of column data and the semantic attributes of database fields is then looked up, that is, the result is matched against the database fields in the database, and the column data is stored at the corresponding position in the database. In other embodiments, when training the classification model, the classification results are mapped directly from the semantic attributes required by the business scenario (covering the database fields); similarly, assuming there are n semantic attributes, they are mapped to classes 0, 1, ..., (n-1). After the feature vector of the column data is input into the classification model, the semantic attribute of the column data is obtained directly from the mapping between classification results and semantic attributes and is then matched with the semantic attributes of the database fields.
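The two pre-stored mappings used in steps S4 and S5 amount to two lookups. The sketch below assumes they are held in plain dictionaries; the class numbers, attribute names, and the `id_card` field are hypothetical, while `phone_num` follows the example in the text.

```python
# Hypothetical pre-stored mapping: classification category -> semantic attribute.
CLASS_TO_ATTRIBUTE = {
    0: "mobile phone number",
    1: "ID card number",
}

# Hypothetical pre-stored mapping: semantic attribute -> database field.
ATTRIBUTE_TO_FIELD = {
    "mobile phone number": "phone_num",
    "ID card number": "id_card",
}


def resolve_field(class_id):
    """Return the database field for a classifier output, or None if unmapped."""
    attribute = CLASS_TO_ATTRIBUTE.get(class_id)   # step S4
    if attribute is None:
        return None
    return ATTRIBUTE_TO_FIELD.get(attribute)       # step S5


print(resolve_field(0))
print(resolve_field(7))
```

Returning `None` for an unmapped class leaves room for a fallback, such as asking the analyst, instead of writing the column to the wrong field.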
Fig. 3 is a schematic diagram of the data type recognition device 100 provided by the present invention. As can be seen from the figure, the data type recognition device 100 comprises a data acquisition module 110, a feature extraction module 120, and a data classification module 130, wherein the feature extraction module 120 is connected to the data acquisition module 110 and to the data classification module 130. In operation, the data acquisition module 110 first obtains the column data to be identified, the column data comprising a column header and data content; the feature extraction module 120 then extracts features of the column data obtained by the data acquisition module 110 to obtain a feature vector comprising column header features and data content features; finally, the data classification module 130 inputs the feature vector extracted by the feature extraction module 120 into the pre-trained classification model to classify it, thereby completing recognition of the column data.
Specifically, column data refers to a column of data in a file of a format such as Excel or csv. Its data format is normally column header + data content, where the column header is the first row of data in the file and describes the content of the current column. Word embedding is a class of word representations in which words with similar meanings have similar representations; it is a general term for methods that map vocabulary to real-valued vectors. In the data type recognition process, a column of data in the file is treated as one analysis object, and the goal is achieved in three stages: feature extraction, classification model training, and data classification.
Specifically, the feature extraction module 120 comprises a feature extraction unit and a feature concatenation unit. During feature extraction, because column data is divided into a column header and data content, feature extraction is likewise divided into column header feature extraction and data content feature extraction. The column header is usually a description of the characteristics of the column data, so the feature extraction unit converts it into a feature vector of a specified dimension by a word embedding model (such as Word2Vec, CBoW, or the Skip-Gram model). For the data content, data samples are first obtained by sampling. The feature extraction unit then extracts features such as string length, format, and constituent elements from each individual data item (one row) in the sample to obtain the first preset feature, and extracts features such as dispersion, continuity, and variance from all sampled data to obtain the second preset feature. Finally, the feature concatenation unit concatenates the column header features, the first preset feature, and the second preset feature to obtain the feature vector of the column data, which serves as the feature description of the current analysis object (the column data).
After feature extraction is complete, the feature vector is fed into the trained classification model for classification, and the classification result to which it belongs is obtained, thereby achieving automatic recognition of the data type. The specific form of the classification model is not limited here; any model that achieves the purpose of the invention falls within the scope of the present invention. For example, classification models such as an SVM (support vector machine), a decision tree, a random forest, or a neural network (deep learning) may be used.
The training module comprises a corpus preprocessing unit, a sample extraction unit, a labeling unit, and a training unit, wherein the sample extraction unit is connected to the corpus preprocessing unit, the labeling unit is connected to the sample extraction unit, and the training unit is connected to the labeling unit. When training the classification model, the corpus preprocessing unit selects a training corpus and preprocesses it; the sample extraction unit then extracts training samples from the preprocessed training corpus; the labeling unit labels the extracted training samples with classification categories; finally, the training unit inputs the labeled training samples into the selected classification model to train it.
Specifically, because the word embedding model places high demands on the training corpus, the corpus should, as far as possible, consist of articles that cover the domains of the data to be analyzed (the data in Excel and csv files). The corpus preprocessing unit then preprocesses the corpus according to the specific usage scenario, for example by deleting English text, deleting special characters, and converting between simplified and traditional Chinese characters; a word segmentation tool such as jieba or HanLP is then used to segment the corpus.
Before training, the required classification categories are determined according to the business scenario, and a corresponding number of classification results is mapped according to the number of categories; for example, if there are n categories, they are mapped to 0, 1, ..., (n-1). The sample extraction unit then extracts training samples from the training corpus (specifically, the feature vectors obtained by feature extraction from the selected training column data), and the labeling unit labels each training sample with a classification category. The label is the classification result mapped from the category; if the mapping is numeric, the label is the corresponding number. The selected training samples should cover all classification categories, and the number of training samples per category should not differ greatly; the samples should be distributed as evenly as possible.
As for the models, Word2vec is a popular word embedding model that has been integrated into a variety of open-source frameworks. The present invention uses the open-source gensim framework to train a word2vec model on the preprocessed corpus. The classification algorithm may be an SVM, a decision tree, a random forest, a neural network, or the like. After the training unit performs supervised training of the classification model on the selected training samples, the model can be used to recognize column data types.
On this basis, the present invention also provides a data storage device which, in addition to the above data type recognition device, further comprises a matching module configured to obtain the semantic attribute to which the column data belongs according to the classification category output by the classification model, and to match the recognition result of the data type recognition device with the semantic attributes of the database fields to complete the operation of storing the column data in the database, wherein a mapping relationship between the classification categories output by the classification model and the semantic attributes to which they belong, and a mapping relationship between the semantic attributes of the column data and the semantic attributes of the database fields, are stored in advance in a storage module.
In the data storage device, after the classification model outputs a classification result (corresponding to a certain classification category), the semantic attribute to which the classification result belongs is obtained by looking up the stored mapping between classification categories and semantic attributes. The stored mapping between the semantic attributes of column data and the semantic attributes of database fields is then looked up, that is, the result is matched against the database fields in the database, and the column data is stored at the corresponding position in the database.
It should be noted that the above embodiments can be freely combined as needed. The above are only preferred embodiments of the present invention. It should be pointed out that those of ordinary skill in the art can make several improvements and modifications without departing from the principle of the present invention, and these improvements and modifications should also be regarded as falling within the scope of protection of the present invention.

Claims (10)

1. A data type recognition method, characterized in that the recognition method comprises:
S1: obtaining column data to be identified, the column data comprising a column header and data content;
S2: extracting features of the column data to obtain a feature vector, the feature vector comprising column header features and data content features;
S3: inputting the feature vector into a pre-trained classification model to classify it, thereby completing recognition of the column data.
2. The recognition method according to claim 1, characterized in that step S2 comprises:
S21: extracting the column header from the column data to obtain the column header features;
S22: extracting a first preset feature from individual data items in the data content;
S23: extracting a second preset feature from the data content as a whole;
S24: concatenating the column header features, the first preset feature, and the second preset feature to obtain the feature vector of the column data.
3. The recognition method according to claim 2, characterized in that
in step S21, a word embedding model is used to convert the column header into a feature vector of a preset dimension;
and/or, in step S22, string length, format, and constituent-element features are extracted from individual data items in the data content;
and/or, in step S23, dispersion, continuity, and variance features are extracted from the data content as a whole.
4. The recognition method according to any one of claims 1-3, characterized in that, before step S1, the method further comprises a step of training the classification model, comprising:
S01: selecting a training corpus and preprocessing it;
S02: selecting a classification model;
S03: extracting training samples from the preprocessed training corpus;
S04: labeling the extracted training samples with classification categories;
S05: inputting the labeled training samples into the classification model to train it.
5. A data storage method, characterized in that the data storage method comprises the data type identification method according to any one of claims 1-4, and further comprises:
S4: obtaining the semantic attribute of the column data according to the classification category output by the classification model, wherein a mapping relationship is pre-stored between the classification categories output by the classification model and their corresponding semantic attributes;
S5: matching the obtained semantic attribute of the column data against the semantic attributes of database fields to complete the storage of the column data into the database, wherein a mapping relationship is pre-stored between the semantic attributes of the column data output by the classification model and the semantic attributes of the database fields.
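The two pre-stored mapping relationships of steps S4 and S5 can be sketched with plain dictionaries and an in-memory SQLite database. This is only an illustrative sketch: the category names, semantic attribute names, table name and field names are all hypothetical, and the claims do not prescribe a particular database or storage format for the mappings.

```python
import sqlite3

# Hypothetical pre-stored mappings; every name here is illustrative.
CATEGORY_TO_ATTRIBUTE = {
    "phone_number": "contact.phone",
    "person_name": "person.name",
}
ATTRIBUTE_TO_FIELD = {
    "contact.phone": "phone",
    "person.name": "full_name",
}


def resolve_field(category):
    """S4 then S5: category -> semantic attribute -> database field."""
    attribute = CATEGORY_TO_ATTRIBUTE.get(category)
    if attribute is None:
        raise KeyError(f"no semantic attribute for category {category!r}")
    field = ATTRIBUTE_TO_FIELD.get(attribute)
    if field is None:
        raise KeyError(f"no database field for attribute {attribute!r}")
    return field


def store_columns(conn, table, classified_columns):
    """Write classified columns into their matched fields, row by row."""
    fields = [resolve_field(category) for category, _ in classified_columns]
    rows = list(zip(*(values for _, values in classified_columns)))
    placeholders = ", ".join("?" for _ in fields)
    # Table and field names come only from trusted code and the mapping above,
    # never from user input; cell values are passed as bound parameters.
    sql = f"INSERT INTO {table} ({', '.join(fields)}) VALUES ({placeholders})"
    conn.executemany(sql, rows)
    conn.commit()
    return fields


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (full_name TEXT, phone TEXT)")
stored = store_columns(conn, "customers", [
    ("person_name", ["Alice", "Bob"]),
    ("phone_number", ["13800000000", "13900000001"]),
])
print(stored)
```

Keeping the two mappings separate means a new classification category or a renamed database field can be handled by editing one table without retraining the model.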
6. A data type identification device, characterized in that the identification device comprises:
a data acquisition module, configured to obtain column data to be identified, the column data comprising a column header and data content;
a feature extraction module, configured to extract features of the column data obtained by the data acquisition module to obtain a feature vector, the feature vector comprising a column header feature and data content features;
a data classification module, configured to input the feature vector extracted by the feature extraction module into a pre-trained classification model for classification, thereby completing identification of the column data.
7. The identification device according to claim 6, characterized in that the feature extraction module comprises:
a feature extraction unit, configured to extract the column header from the column data to obtain a column header feature, to extract first preset features of individual data items in the data content, and to extract second preset features over all of the data content;
a feature concatenation unit, configured to concatenate the column header feature, the first preset features and the second preset features to obtain the feature vector of the column data.
8. The identification device according to claim 7, characterized in that:
in the feature extraction unit, the column header is converted into a feature vector of a preset dimension using a word embedding model; string length, format and constituent element features of individual data items in the data content are extracted; and dispersion, continuity and variance features are extracted over all of the data content.
9. The identification device according to any one of claims 6-8, characterized in that the identification device further comprises a training module configured to train the classification model, the training module comprising:
a corpus preprocessing unit, configured to select a training corpus and preprocess it;
a sample extraction unit, configured to extract training samples from the preprocessed training corpus;
a labeling unit, configured to label the extracted training samples with classification categories;
a training unit, configured to input the labeled training samples into the selected classification model to train it.
10. A data storage device, characterized in that the data storage device comprises the data type identification device according to any one of claims 6-8, and further comprises:
a matching module, configured to obtain the semantic attribute of the column data according to the classification category output by the classification model, and to match the identification result of the data type identification device against the semantic attributes of database fields to complete the storage of the column data into the database, wherein a mapping relationship is pre-stored between the classification categories output by the classification model and their corresponding semantic attributes, a mapping relationship is pre-stored between the semantic attributes of the column data and the semantic attributes of the database fields, and the mapping relationships are stored in a storage module.
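As a rough illustration of the modular structure recited in the device claims, the acquisition, feature extraction and classification modules can be modeled as three interchangeable callables composed by one object. This is a sketch only: the class name, the function names, the single digit-ratio feature and the threshold rule are hypothetical placeholders, not the modules described in the claims.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Column = Tuple[str, List[str]]  # (column header, data content)


@dataclass
class IdentificationDevice:
    """Composes the three modules; each one can be swapped independently."""
    acquire: Callable[[Sequence[Sequence[str]], int], Column]
    extract: Callable[[Column], List[float]]
    classify: Callable[[List[float]], str]

    def identify(self, table, index):
        column = self.acquire(table, index)
        return self.classify(self.extract(column))


def acquire_column(table, index):
    """Data acquisition module: split a table column into header and content."""
    header, *rows = table
    return header[index], [row[index] for row in rows]


def digit_ratio_feature(column):
    """Feature extraction module (placeholder): share of digit characters."""
    _, values = column
    chars = "".join(values)
    return [sum(c.isdigit() for c in chars) / max(len(chars), 1)]


def threshold_classifier(vector):
    """Data classification module (placeholder for a pre-trained model)."""
    return "numeric" if vector[0] > 0.5 else "text"


device = IdentificationDevice(acquire_column, digit_ratio_feature, threshold_classifier)
table = [["name", "age"], ["Alice", "30"], ["Bob", "41"]]
print(device.identify(table, 0), device.identify(table, 1))
```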
CN201811096054.7A 2018-09-19 2018-09-19 Data type identification method and device and data storage method and device Active CN109408555B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811096054.7A CN109408555B (en) 2018-09-19 2018-09-19 Data type identification method and device and data storage method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811096054.7A CN109408555B (en) 2018-09-19 2018-09-19 Data type identification method and device and data storage method and device

Publications (2)

Publication Number Publication Date
CN109408555A true CN109408555A (en) 2019-03-01
CN109408555B CN109408555B (en) 2022-11-11

Family

ID=65465012

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811096054.7A Active CN109408555B (en) 2018-09-19 2018-09-19 Data type identification method and device and data storage method and device

Country Status (1)

Country Link
CN (1) CN109408555B (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109993235A (en) * 2019-04-10 2019-07-09 苏州浪潮智能科技有限公司 Multivariate data classification method and device
CN110134957A (en) * 2019-05-14 2019-08-16 云南电网有限责任公司电力科学研究院 Scientific and technological achievement warehousing method and system based on semantic analysis
CN110232150A (en) * 2019-05-21 2019-09-13 平安科技(深圳)有限公司 User data analysis method and device, readable storage medium and terminal equipment
CN111046632A (en) * 2019-11-29 2020-04-21 智器云南京信息科技有限公司 Data extraction and conversion method, system, storage medium and electronic equipment
CN111104466A (en) * 2019-12-25 2020-05-05 航天科工网络信息发展有限公司 Method for rapidly classifying massive database tables
CN113312354A (en) * 2021-06-10 2021-08-27 北京百度网讯科技有限公司 Data table identification method, device, equipment and storage medium
CN114781471A (en) * 2021-06-02 2022-07-22 清华大学 Entity record matching method and system

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103970736A (en) * 2013-01-25 2014-08-06 苏州精易会信息技术有限公司 Method for converting Excel sheet to database table
CN105825138A (en) * 2015-01-04 2016-08-03 北京神州泰岳软件股份有限公司 Sensitive data identification method and device
CN106503222A (en) * 2016-11-04 2017-03-15 上海轻维软件有限公司 Batch based on Excel imports the method and device of management data base
CN106776843A (en) * 2016-11-28 2017-05-31 浪潮软件集团有限公司 Method for importing excel file based on xml analysis
CN107527070A (en) * 2017-08-25 2017-12-29 江苏赛睿信息科技股份有限公司 Recognition methods, storage medium and the server of dimension data and achievement data

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
姚泱 (Yao Yang): "导入Excel时对字段自动匹配" (Automatic field matching when importing Excel), 《ACCESS》 *

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109993235A (en) * 2019-04-10 2019-07-09 苏州浪潮智能科技有限公司 Multivariate data classification method and device
CN110134957A (en) * 2019-05-14 2019-08-16 云南电网有限责任公司电力科学研究院 Scientific and technological achievement warehousing method and system based on semantic analysis
CN110134957B (en) * 2019-05-14 2023-06-13 云南电网有限责任公司电力科学研究院 Scientific and technological achievement warehousing method and system based on semantic analysis
CN110232150A (en) * 2019-05-21 2019-09-13 平安科技(深圳)有限公司 User data analysis method and device, readable storage medium and terminal equipment
CN110232150B (en) * 2019-05-21 2023-04-14 平安科技(深圳)有限公司 User data analysis method and device, readable storage medium and terminal equipment
CN111046632A (en) * 2019-11-29 2020-04-21 智器云南京信息科技有限公司 Data extraction and conversion method, system, storage medium and electronic equipment
CN111046632B (en) * 2019-11-29 2023-11-10 智器云南京信息科技有限公司 Data extraction and conversion method, system, storage medium and electronic equipment
CN111104466A (en) * 2019-12-25 2020-05-05 航天科工网络信息发展有限公司 Method for rapidly classifying massive database tables
CN114781471A (en) * 2021-06-02 2022-07-22 清华大学 Entity record matching method and system
CN114781471B (en) * 2021-06-02 2022-12-27 清华大学 Entity record matching method and system
CN113312354A (en) * 2021-06-10 2021-08-27 北京百度网讯科技有限公司 Data table identification method, device, equipment and storage medium
CN113312354B (en) * 2021-06-10 2023-07-28 北京百度网讯科技有限公司 Data table identification method, device, equipment and storage medium

Also Published As

Publication number Publication date
CN109408555B (en) 2022-11-11

Similar Documents

Publication Publication Date Title
CN109408555A (en) Data type recognition methods and device, data storage method and device
CN106156365B (en) A kind of generation method and device of knowledge mapping
CN107766371B (en) Text information classification method and device
KR101657495B1 (en) Image recognition method using deep learning analysis modular systems
CN107943911A (en) Data pick-up method, apparatus, computer equipment and readable storage medium storing program for executing
US20170337260A1 (en) Method and device for storing data
CN105243055B (en) Based on multilingual segmenting method and device
CN112232058A (en) False news identification method and system based on deep learning three-layer semantic extraction framework
US11243971B2 (en) System and method of database creation through form design
CN108399157B (en) Dynamic extraction method of entity and attribute relationship, server and readable storage medium
CN110750977B (en) Text similarity calculation method and system
CN110209828A (en) Case querying method and case inquiry unit, computer equipment and storage medium
KR20210106372A (en) New category tag mining method and device, electronic device and computer-readable medium
CN109933671A (en) Construct method, apparatus, computer equipment and the storage medium of personal knowledge map
CN109033282A (en) A kind of Web page text extracting method and device based on extraction template
CN108536673B (en) News event extraction method and device
CN114239588A (en) Article processing method and device, electronic equipment and medium
CN109635125B (en) Vocabulary atlas building method and electronic equipment
CN114970514A (en) Artificial intelligence based Chinese word segmentation method, device, computer equipment and medium
CN110321557A (en) A kind of file classification method, device, electronic equipment and storage medium
CN115759293A (en) Model training method, image retrieval device and electronic equipment
CN110197175A (en) A kind of method and system of books title positioning and part-of-speech tagging
CN109522407A (en) Business connection prediction technique, device, computer equipment and storage medium
CN111401047A (en) Method and device for generating dispute focus of legal document and computer equipment
CN115563278A (en) Question classification processing method and device for sentence text

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
PE01 Entry into force of the registration of the contract for pledge of patent right

Denomination of invention: Data type identification method and device, data entry method and device

Effective date of registration: 20231027

Granted publication date: 20221111

Pledgee: Bank of Hangzhou Limited by Share Ltd. Nanjing branch

Pledgor: COGNITIVE COMPUTING NANJING INFORMATION TECHNOLOGY Co.,Ltd.

Registration number: Y2023980062710
