CN110362620A - Table data structuring method based on machine learning - Google Patents

Table data structuring method based on machine learning

Info

Publication number
CN110362620A
Authority
CN
China
Prior art keywords
score
row
column
processed
spreadsheet
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910623601.0A
Other languages
Chinese (zh)
Other versions
CN110362620B (en)
Inventor
廖闻剑
李曙光
宋万军
姜广栋
杨万刚
尹若成
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
NANJING FIBERHOME INFORMATION DEVELOPMENT Co Ltd
Original Assignee
NANJING FIBERHOME INFORMATION DEVELOPMENT Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by NANJING FIBERHOME INFORMATION DEVELOPMENT Co Ltd
Priority to CN201910623601.0A
Publication of CN110362620A
Application granted
Publication of CN110362620B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G06F16/258 Data format conversion from or to a database
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/374 Thesaurus

Abstract

The present invention relates to a table data structuring method based on machine learning. The objects in the cells of a large number of sample spreadsheets are counted to build a dictionary table. For each cell of the spreadsheet to be processed, the number of times its object occurs in that spreadsheet is combined with the object's quantity in the dictionary table to obtain a score for the cell. With the cell score as the smallest unit, rows and columns are compared to find the header rows or header columns of the spreadsheet to be processed and thereby obtain each header item; the data items are then extracted and structured on the basis of the header items. This overcomes the shortcomings of prior-art rule-based approaches, which identify only horizontal headers and cannot identify multiple headers, and achieves accurate and efficient structuring of spreadsheet data.

Description

Table data structuring method based on machine learning
Technical field
The present invention relates to a table data structuring method based on machine learning, and belongs to the technical field of table data structuring.
Background art
The spreadsheet is one of the most commonly used computer software tools. In the prior art, for a Sheet (spreadsheet) whose content is unknown, all that can be done after opening the file is to read the data item in each cell. The steps are as follows:
(1) Open the Excel file through an interface;
(2) Read the Sheet in the Excel file through an interface;
(3) Read the cells in the Sheet through an interface.
When the above method is carried out, the meaning of each data item is unknown, so the data cannot be structured. The meaning of a data item is described by the table's header; without knowing the header, the data cannot be understood. Therefore, to structure table data, some work adopts an assumption, namely that the header of the table is located in the first row. Under this assumption, the header is extracted first and the data afterwards, which completes the structuring of the table data. The steps are as follows:
(1) Open the Excel file through an interface;
(2) Read the Sheet in the Excel file through an interface;
(3) Read the cells of the first row of the Sheet through an interface and use them as the header;
(4) Read the data corresponding to each header item column by column to complete the data structuring.
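The first-row assumption can be illustrated with a minimal Python sketch. It assumes the Sheet has already been read into a list of rows (the file-reading interface is left out), and the function name `structure_with_first_row_header` and the sample data are hypothetical, not taken from the patent.

```python
from typing import Any


def structure_with_first_row_header(grid: list[list[Any]]) -> list[dict[Any, Any]]:
    """Treat row 0 as the header and turn every later row into a record."""
    if not grid:
        return []
    header, *data_rows = grid
    return [dict(zip(header, row)) for row in data_rows]


# A sheet whose header really is in the first row is structured correctly.
sheet = [
    ["name", "phone"],
    ["Alice", "13800000000"],
    ["Bob", "13900000000"],
]
records = structure_with_first_row_header(sheet)

# A sheet with a title line above the header breaks the assumption:
# the title row is used as keys and the real header becomes a data record.
titled = [
    ["Contact list", ""],
    ["name", "phone"],
    ["Alice", "13800000000"],
]
wrong = structure_with_first_row_header(titled)
```

The second call shows the kind of misjudgment that arises as soon as the header is not in the first row.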
This assumption has an obvious defect: only horizontal headers can be extracted, and the header must be in the first row. Tables with a vertical header, tables whose header is not in the first row, and tables containing several header rows are all misjudged. For this reason, some work refines the approach with prior knowledge, which solves the problem of a header that is not in the first row. The steps are as follows:
(1) Open the Excel file through an interface;
(2) Read the Sheet in the Excel file through an interface;
(3) Read the data of each row and column of the Sheet in turn through an interface until recognizable data is encountered (matched by rules, for example a mobile phone number, an identity card number or a bank card number); then, starting from the row and column of that cell, search upward for the first row that does not match the rule and use that row as the header;
(4) Read the data corresponding to each header item column by column to complete the data structuring.
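A rough sketch of this rule-based search is given below. It assumes a rectangular grid of strings and a single hypothetical rule (an 11-digit number starting with 1 is treated as a mobile phone number); the pattern and the helper names are illustrative assumptions, not rules stated in the patent.

```python
import re

# Hypothetical rule: an 11-digit string starting with 1 looks like a mobile number.
PHONE = re.compile(r"^1\d{10}$")


def is_recognized(value) -> bool:
    """Return True if the cell value matches the (assumed) recognition rule."""
    return isinstance(value, str) and PHONE.match(value) is not None


def find_header_row_by_rule(grid):
    """Return the index of the row used as header, or None if nothing matches."""
    for r, row in enumerate(grid):
        for c, value in enumerate(row):
            if is_recognized(value):
                # Walk upward in the same column to the first non-matching cell.
                for up in range(r - 1, -1, -1):
                    if not is_recognized(grid[up][c]):
                        return up
                return None
    return None


sheet = [
    ["Contact list", ""],
    ["name", "phone"],
    ["Alice", "13800000000"],
]
header_row = find_header_row_by_rule(sheet)
no_header = find_header_row_by_rule([["a", "b"], ["c", "d"]])
```

When no cell matches any rule, the function returns `None`, so a table without recognizable data yields no header under this approach.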
This approach also has problems: tables with a vertical header, or with multiple headers in one table, are still misjudged, and for tables that contain no recognizable data the header cannot be identified at all.
Summary of the invention
The technical problem to be solved by the present invention is to provide a table data structuring method based on machine learning that can accurately identify the header items in a spreadsheet and, based on each header item, efficiently complete the structuring of the data items in the spreadsheet.
To solve the above technical problem, the present invention adopts the following technical solution: the present invention provides a table data structuring method based on machine learning for structuring the data items in a spreadsheet to be processed, characterized by comprising the following steps:
Step A. For a preset number of sample spreadsheets, count the objects in the cells to obtain each object and its corresponding quantity, and build a dictionary table; then go to step B;
Step B. For each cell in the spreadsheet to be processed, count the number of times, count, that the object in the cell occurs in the spreadsheet to be processed; then go to step C;
Step C. For each cell in the spreadsheet to be processed, obtain the quantity c that the object in the cell has in the dictionary table, where, if the dictionary table does not contain the object in a cell of the spreadsheet to be processed, the quantity of that object in the dictionary table is taken as 0; then go to step D;
Step D. For each cell in the spreadsheet to be processed, according to the following formula:
obtain the score, score, of the cell; then go to step E;
Step E. For each row in the spreadsheet to be processed, take the sum of the scores of the cells in the row as the score of that row;
Meanwhile, for each column in the spreadsheet to be processed, take the sum of the scores of the cells in the column as the score of that column;
The scores of each row and each column in the spreadsheet to be processed are thus obtained; then go to step F;
Step F. According to the scores of the rows in the spreadsheet to be processed, cluster the rows of the spreadsheet to be processed; for each row cluster, take the average of the scores of its rows as the score of that row cluster; then select the row cluster with the highest score as the candidate row cluster;
Meanwhile, according to the scores of the columns in the spreadsheet to be processed, cluster the columns of the spreadsheet to be processed; for each column cluster, take the average of the scores of its columns as the score of that column cluster; then select the column cluster with the highest score as the candidate column cluster;
Then go to step G;
Step G. Among the rows of the candidate row cluster, select the row with the highest score and, from the score of that row, obtain the average score of the non-empty cells in the row as the row cell average score;
Meanwhile, among the columns of the candidate column cluster, select the column with the highest score and, from the score of that column, obtain the average score of the non-empty cells in the column as the column cell average score;
Then go to step H;
Step H. If the row cell average score is greater than the column cell average score, the rows of the candidate row cluster are the header rows of the spreadsheet to be processed; obtain the header items in them and go to step J;
If the row cell average score is less than the column cell average score, the columns of the candidate column cluster are the header columns of the spreadsheet to be processed; obtain the header items in them and go to step J;
Step J. According to the header items in the spreadsheet to be processed, read the data items in the spreadsheet to be processed and structure the table data.
As a preferred technical solution of the present invention: in step A, after the dictionary table has been built, the following steps I to II are used to update the dictionary table before going to step B;
Step I. Obtain the maximum quantity value among the quantities of the objects in the dictionary table; then go to step II;
Step II. For each object in the dictionary table, perform the following steps II-1 to II-2 to update the quantity of the object and thereby update the dictionary table;
Step II-1. Determine whether the object belongs to a preset header item set; if so, set the quantity of the object to the maximum quantity value; otherwise go to step II-2;
Step II-2. Determine whether the object belongs to a preset data item set; if so, set the quantity of the object to 0; otherwise leave the quantity of the object unchanged.
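Steps I to II amount to a single pass over the dictionary table. The sketch below assumes the dictionary table is a plain Python mapping from object to quantity and that the preset header item set and data item set are Python sets; the function name and the sample values are hypothetical.

```python
def update_dictionary(dictionary, header_items, data_items):
    """Apply steps I and II to a {object: quantity} mapping; return a new mapping."""
    if not dictionary:
        return {}
    max_quantity = max(dictionary.values())      # step I
    updated = {}
    for obj, quantity in dictionary.items():     # step II
        if obj in header_items:                  # step II-1: known header item
            updated[obj] = max_quantity
        elif obj in data_items:                  # step II-2: known data item
            updated[obj] = 0
        else:                                    # neither: quantity is unchanged
            updated[obj] = quantity
    return updated


dictionary = {"name": 40, "phone": 25, "Alice": 3, "remark": 7}
updated = update_dictionary(
    dictionary,
    header_items={"phone"},
    data_items={"Alice"},
)
```

Because step II-1 is checked first, an object that appears in both preset sets is treated as a header item, which matches the order of the steps.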
As a preferred technical solution of the present invention: in step F, according to the scores of the rows in the spreadsheet to be processed, the rows of the spreadsheet to be processed are clustered by the following steps Fa-1 to Fa-3;
Step Fa-1. Obtain the minimum row score and the maximum row score among the scores of the rows in the spreadsheet to be processed, and go to step Fa-2;
Step Fa-2. Divide the span from the minimum row score to the maximum row score according to the preset row score levels to obtain the row score intervals; then go to step Fa-3;
Step Fa-3. According to the scores of the rows in the spreadsheet to be processed, assign each row of the spreadsheet to be processed to a row score interval; each row score interval that contains at least one row of the spreadsheet to be processed is a row cluster;
Meanwhile, according to the scores of the columns in the spreadsheet to be processed, the columns of the spreadsheet to be processed are clustered by the following steps Fb-1 to Fb-3;
Step Fb-1. Obtain the minimum column score and the maximum column score among the scores of the columns in the spreadsheet to be processed, and go to step Fb-2;
Step Fb-2. Divide the span from the minimum column score to the maximum column score according to the preset column score levels to obtain the column score intervals; then go to step Fb-3;
Step Fb-3. According to the scores of the columns in the spreadsheet to be processed, assign each column of the spreadsheet to be processed to a column score interval; each column score interval that contains at least one column of the spreadsheet to be processed is a column cluster.
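The interval-based clustering of steps Fa-1 to Fa-3 (and equally Fb-1 to Fb-3) might be sketched as follows. The text does not say how the span is divided, so equal-width intervals are assumed here; the function name `cluster_by_score` and the sample scores are hypothetical.

```python
def cluster_by_score(scores, levels):
    """Group line indices into score intervals and return the non-empty ones."""
    if not scores:
        return []
    lo, hi = min(scores), max(scores)            # step Fa-1 / Fb-1
    if hi == lo:
        # All scores are equal, so there is a single cluster.
        return [list(range(len(scores)))]
    width = (hi - lo) / levels                   # step Fa-2 / Fb-2 (equal width assumed)
    buckets = [[] for _ in range(levels)]
    for index, score in enumerate(scores):       # step Fa-3 / Fb-3
        # The maximum score is placed in the last interval, not in a new one.
        k = min(int((score - lo) / width), levels - 1)
        buckets[k].append(index)
    # Only intervals that actually contain rows (or columns) count as clusters.
    return [bucket for bucket in buckets if bucket]


row_scores = [9.0, 1.0, 1.5, 0.5, 8.5]
clusters = cluster_by_score(row_scores, levels=3)
```

With three levels, the middle interval receives no rows in this example and is therefore not a cluster.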
Compared with the prior art, the table data structuring method based on machine learning of the present invention, using the above technical solution, has the following technical effects:
In the table data structuring method based on machine learning designed by the present invention, the objects in the cells of a large number of sample spreadsheets are counted to build a dictionary table. For each cell of the spreadsheet to be processed, the number of times its object occurs is combined with the object's quantity in the dictionary table to obtain the score of the cell. With the cell score as the smallest unit, rows and columns are compared to find the header rows or header columns of the spreadsheet to be processed and thereby obtain each header item; the data items are then extracted and structured on the basis of the header items. This overcomes the shortcomings of prior-art rule-based approaches, which identify only horizontal headers and cannot identify multiple headers, and achieves accurate and efficient structuring of spreadsheet data.
Description of the drawings
Fig. 1 is a schematic diagram of the table data structuring method based on machine learning designed by the present invention.
Detailed description of the embodiments
Specific embodiments of the present invention are described in further detail below with reference to the accompanying drawings.
The present invention provides a table data structuring method based on machine learning for structuring the data items in a spreadsheet to be processed. In practical application, the following steps A to J are performed.
Step A. For a preset number of sample spreadsheets, count the objects in the cells to obtain each object and its corresponding quantity, and build a dictionary table; then use the following steps I to II to update the dictionary table, and go to step B.
Step I. Obtain the maximum quantity value among the quantities of the objects in the dictionary table; then go to step II.
Step II. For each object in the dictionary table, perform the following steps II-1 to II-2 to update the quantity of the object and thereby update the dictionary table.
Step II-1. Determine whether the object belongs to a preset header item set; if so, set the quantity of the object to the maximum quantity value; otherwise go to step II-2;
Step II-2. Determine whether the object belongs to a preset data item set; if so, set the quantity of the object to 0; otherwise leave the quantity of the object unchanged.
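The counting part of step A, before any update, can be sketched with `collections.Counter`. The sketch assumes each sample spreadsheet is already available as a list of rows and that empty cells are not counted; both points, and the name `build_dictionary`, are assumptions rather than details given in the text.

```python
from collections import Counter


def build_dictionary(sample_sheets):
    """Count how often each non-empty cell object occurs across the sample sheets."""
    counts = Counter()
    for sheet in sample_sheets:
        for row in sheet:
            for value in row:
                # Assumption: empty cells carry no object and are skipped.
                if value is not None and value != "":
                    counts[value] += 1
    return counts


samples = [
    [["name", "phone"], ["Alice", "13800000000"]],
    [["name", "address", ""], ["Bob", "Nanjing", None]],
]
dictionary = build_dictionary(samples)
```

Objects that recur across many sample sheets, which header items typically do, end up with large quantities, while one-off data values stay near 1.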
Step B. For each cell in the spreadsheet to be processed, count the number of times, count, that the object in the cell occurs in the spreadsheet to be processed; then go to step C.
Step C. For each cell in the spreadsheet to be processed, obtain the quantity c that the object in the cell has in the dictionary table, where, if the dictionary table does not contain the object in a cell of the spreadsheet to be processed, the quantity of that object in the dictionary table is taken as 0; then go to step D.
Step D. For each cell in the spreadsheet to be processed, according to the following formula:
obtain the score, score, of the cell; then go to step E.
Step E. For each row in the spreadsheet to be processed, take the sum of the scores of the cells in the row as the score of that row;
Meanwhile, for each column in the spreadsheet to be processed, take the sum of the scores of the cells in the column as the score of that column;
The scores of each row and each column in the spreadsheet to be processed are thus obtained; then go to step F.
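Steps B to E can be sketched as below. The scoring formula of step D is not reproduced in this text, so the sketch takes the formula as a parameter `score_fn(count, c)`; the function `placeholder_score` is only a stand-in chosen for illustration and should not be read as the patent's formula. All function names are hypothetical.

```python
from collections import Counter


def score_cells(grid, dictionary, score_fn):
    """Return a grid of per-cell scores computed from (count, c)."""
    occurrences = Counter(v for row in grid for v in row)   # step B: count
    return [
        # step C: c defaults to 0 when the object is not in the dictionary table
        # step D: score_fn stands in for the formula, which is not shown here
        [score_fn(occurrences[v], dictionary.get(v, 0)) for v in row]
        for row in grid
    ]


def row_and_column_scores(cell_scores):
    """Step E: sum the cell scores along each row and along each column."""
    row_scores = [sum(row) for row in cell_scores]
    col_scores = [sum(col) for col in zip(*cell_scores)]
    return row_scores, col_scores


def placeholder_score(count, c):
    """Placeholder only, NOT the patent's formula: rarer in-sheet, higher score."""
    return c / count


grid = [
    ["name", "phone"],
    ["Alice", "138"],
    ["Bob", "139"],
]
dictionary = {"name": 10, "phone": 8}
cell_scores = score_cells(grid, dictionary, placeholder_score)
row_scores, col_scores = row_and_column_scores(cell_scores)
```

Passing the formula in as a function keeps the rest of the pipeline unchanged once the actual formula from the patent drawing is substituted.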
Step F. According to the scores of the rows in the spreadsheet to be processed, cluster the rows of the spreadsheet to be processed by the following steps Fa-1 to Fa-3; for each row cluster, take the average of the scores of its rows as the score of that row cluster; then select the row cluster with the highest score as the candidate row cluster.
Step Fa-1. Obtain the minimum row score and the maximum row score among the scores of the rows in the spreadsheet to be processed, and go to step Fa-2;
Step Fa-2. Divide the span from the minimum row score to the maximum row score according to the preset row score levels to obtain the row score intervals; then go to step Fa-3;
Step Fa-3. According to the scores of the rows in the spreadsheet to be processed, assign each row of the spreadsheet to be processed to a row score interval; each row score interval that contains at least one row of the spreadsheet to be processed is a row cluster.
Meanwhile, according to the scores of the columns in the spreadsheet to be processed, cluster the columns of the spreadsheet to be processed by the following steps Fb-1 to Fb-3; for each column cluster, take the average of the scores of its columns as the score of that column cluster; then select the column cluster with the highest score as the candidate column cluster.
Step Fb-1. Obtain the minimum column score and the maximum column score among the scores of the columns in the spreadsheet to be processed, and go to step Fb-2;
Step Fb-2. Divide the span from the minimum column score to the maximum column score according to the preset column score levels to obtain the column score intervals; then go to step Fb-3;
Step Fb-3. According to the scores of the columns in the spreadsheet to be processed, assign each column of the spreadsheet to be processed to a column score interval; each column score interval that contains at least one column of the spreadsheet to be processed is a column cluster.
After the candidate row cluster and the candidate column cluster have been obtained, go to step G.
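Selecting the candidate cluster in step F reduces to taking the cluster with the highest mean score. A minimal sketch, assuming each cluster is a list of row (or column) indices and using hypothetical names and sample values:

```python
def pick_candidate_cluster(clusters, scores):
    """Return the cluster (a list of indices) whose mean score is highest."""
    def mean_score(cluster):
        return sum(scores[i] for i in cluster) / len(cluster)

    return max(clusters, key=mean_score)


row_scores = [9.0, 1.0, 1.5, 0.5, 8.5]
row_clusters = [[1, 2, 3], [0, 4]]   # e.g. produced by the interval clustering
candidate_rows = pick_candidate_cluster(row_clusters, row_scores)
```

The same function applies unchanged to column clusters and column scores.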
Step G. Among the rows of the candidate row cluster, select the row with the highest score and, from the score of that row, obtain the average score of the non-empty cells in the row as the row cell average score;
Meanwhile, among the columns of the candidate column cluster, select the column with the highest score and, from the score of that column, obtain the average score of the non-empty cells in the column as the column cell average score;
Then go to step H.
Step H. If the row cell average score is greater than the column cell average score, the rows of the candidate row cluster are the header rows of the spreadsheet to be processed; obtain the header items in them and go to step J;
If the row cell average score is less than the column cell average score, the columns of the candidate column cluster are the header columns of the spreadsheet to be processed; obtain the header items in them and go to step J;
Step J. According to the header items in the spreadsheet to be processed, read the data items in the spreadsheet to be processed and structure the table data.
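Steps G to J might look like the sketch below. Several points are assumptions: the cell average is taken as the line score divided by the number of non-empty cells, the tie case (which the text does not define) falls back to rows, and step J is shown only for a single header row or column. All names and sample values are hypothetical.

```python
def is_empty(value):
    return value is None or value == ""


def non_empty_average(total_score, cells):
    """Line score divided by the number of non-empty cells in that line."""
    filled = sum(1 for v in cells if not is_empty(v))
    return total_score / filled if filled else 0.0


def choose_headers(grid, row_scores, col_scores, candidate_rows, candidate_cols):
    """Steps G and H: return ("row", indices) or ("column", indices)."""
    best_row = max(candidate_rows, key=lambda r: row_scores[r])
    best_col = max(candidate_cols, key=lambda c: col_scores[c])
    row_avg = non_empty_average(row_scores[best_row], grid[best_row])
    col_avg = non_empty_average(col_scores[best_col], [row[best_col] for row in grid])
    # Assumption: the tie case is undefined in the text; rows win here.
    if row_avg >= col_avg:
        return "row", sorted(candidate_rows)
    return "column", sorted(candidate_cols)


def structure(grid, orientation, header_indices):
    """Step J, simplified to one header line: map header items to data items."""
    h = header_indices[0]
    if orientation == "row":
        header = grid[h]
        return [dict(zip(header, row)) for i, row in enumerate(grid) if i != h]
    header = [row[h] for row in grid]
    return [
        dict(zip(header, [row[c] for row in grid]))
        for c in range(len(grid[0]))
        if c != h
    ]


grid = [["name", "phone"], ["Alice", "138"], ["Bob", "139"]]
orientation, header_indices = choose_headers(
    grid,
    row_scores=[18.0, 0.0, 0.0],
    col_scores=[10.0, 8.0],
    candidate_rows=[0],
    candidate_cols=[0],
)
records = structure(grid, orientation, header_indices)
```

Handling several header rows or columns at once would need a rule for combining their items, which this sketch leaves out.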
In the table data structuring method based on machine learning designed by the above technical solution, the objects in the cells of a large number of sample spreadsheets are counted to build a dictionary table. For each cell of the spreadsheet to be processed, the number of times its object occurs is combined with the object's quantity in the dictionary table to obtain the score of the cell. With the cell score as the smallest unit, rows and columns are compared to find the header rows or header columns of the spreadsheet to be processed and thereby obtain each header item; the data items are then extracted and structured on the basis of the header items. This overcomes the shortcomings of prior-art rule-based approaches, which identify only horizontal headers and cannot identify multiple headers, and achieves accurate and efficient structuring of spreadsheet data.
Embodiments of the present invention have been explained in detail above with reference to the accompanying drawings, but the present invention is not limited to the above embodiments; within the knowledge of a person of ordinary skill in the art, various changes can also be made without departing from the purpose of the present invention.

Claims (3)

1. A table data structuring method based on machine learning, for structuring the data items in a spreadsheet to be processed, characterized by comprising the following steps:
Step A. For a preset number of sample spreadsheets, count the objects in the cells to obtain each object and its corresponding quantity, and build a dictionary table; then go to step B;
Step B. For each cell in the spreadsheet to be processed, count the number of times, count, that the object in the cell occurs in the spreadsheet to be processed; then go to step C;
Step C. For each cell in the spreadsheet to be processed, obtain the quantity c that the object in the cell has in the dictionary table, where, if the dictionary table does not contain the object in a cell of the spreadsheet to be processed, the quantity of that object in the dictionary table is taken as 0; then go to step D;
Step D. For each cell in the spreadsheet to be processed, according to the following formula:
obtain the score, score, of the cell; then go to step E;
Step E. For each row in the spreadsheet to be processed, take the sum of the scores of the cells in the row as the score of that row;
Meanwhile, for each column in the spreadsheet to be processed, take the sum of the scores of the cells in the column as the score of that column;
The scores of each row and each column in the spreadsheet to be processed are thus obtained; then go to step F;
Step F. According to the scores of the rows in the spreadsheet to be processed, cluster the rows of the spreadsheet to be processed; for each row cluster, take the average of the scores of its rows as the score of that row cluster; then select the row cluster with the highest score as the candidate row cluster;
Meanwhile, according to the scores of the columns in the spreadsheet to be processed, cluster the columns of the spreadsheet to be processed; for each column cluster, take the average of the scores of its columns as the score of that column cluster; then select the column cluster with the highest score as the candidate column cluster;
Then go to step G;
Step G. Among the rows of the candidate row cluster, select the row with the highest score and, from the score of that row, obtain the average score of the non-empty cells in the row as the row cell average score;
Meanwhile, among the columns of the candidate column cluster, select the column with the highest score and, from the score of that column, obtain the average score of the non-empty cells in the column as the column cell average score;
Then go to step H;
Step H. If the row cell average score is greater than the column cell average score, the rows of the candidate row cluster are the header rows of the spreadsheet to be processed; obtain the header items in them and go to step J;
If the row cell average score is less than the column cell average score, the columns of the candidate column cluster are the header columns of the spreadsheet to be processed; obtain the header items in them and go to step J;
Step J. According to the header items in the spreadsheet to be processed, read the data items in the spreadsheet to be processed and structure the table data.
2. The table data structuring method based on machine learning according to claim 1, characterized in that: in step A, after the dictionary table has been built, the following steps I to II are used to update the dictionary table before going to step B;
Step I. Obtain the maximum quantity value among the quantities of the objects in the dictionary table; then go to step II;
Step II. For each object in the dictionary table, perform the following steps II-1 to II-2 to update the quantity of the object and thereby update the dictionary table;
Step II-1. Determine whether the object belongs to a preset header item set; if so, set the quantity of the object to the maximum quantity value; otherwise go to step II-2;
Step II-2. Determine whether the object belongs to a preset data item set; if so, set the quantity of the object to 0; otherwise leave the quantity of the object unchanged.
3. The table data structuring method based on machine learning according to claim 1, characterized in that: in step F, according to the scores of the rows in the spreadsheet to be processed, the rows of the spreadsheet to be processed are clustered by the following steps Fa-1 to Fa-3;
Step Fa-1. Obtain the minimum row score and the maximum row score among the scores of the rows in the spreadsheet to be processed, and go to step Fa-2;
Step Fa-2. Divide the span from the minimum row score to the maximum row score according to the preset row score levels to obtain the row score intervals; then go to step Fa-3;
Step Fa-3. According to the scores of the rows in the spreadsheet to be processed, assign each row of the spreadsheet to be processed to a row score interval; each row score interval that contains at least one row of the spreadsheet to be processed is a row cluster;
Meanwhile, according to the scores of the columns in the spreadsheet to be processed, the columns of the spreadsheet to be processed are clustered by the following steps Fb-1 to Fb-3;
Step Fb-1. Obtain the minimum column score and the maximum column score among the scores of the columns in the spreadsheet to be processed, and go to step Fb-2;
Step Fb-2. Divide the span from the minimum column score to the maximum column score according to the preset column score levels to obtain the column score intervals; then go to step Fb-3;
Step Fb-3. According to the scores of the columns in the spreadsheet to be processed, assign each column of the spreadsheet to be processed to a column score interval; each column score interval that contains at least one column of the spreadsheet to be processed is a column cluster.
CN201910623601.0A 2019-07-11 2019-07-11 Table data structuring method based on machine learning Active CN110362620B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910623601.0A CN110362620B (en) 2019-07-11 2019-07-11 Table data structuring method based on machine learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910623601.0A CN110362620B (en) 2019-07-11 2019-07-11 Table data structuring method based on machine learning

Publications (2)

Publication Number Publication Date
CN110362620A true CN110362620A (en) 2019-10-22
CN110362620B CN110362620B (en) 2021-04-06

Family

ID=68218702

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910623601.0A Active CN110362620B (en) 2019-07-11 2019-07-11 Table data structuring method based on machine learning

Country Status (1)

Country Link
CN (1) CN110362620B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040107205A1 (en) * 2002-12-03 2004-06-03 Lockheed Martin Corporation Boolean rule-based system for clustering similar records
CN1741020A (en) * 2005-09-29 2006-03-01 北京勤哲软件技术有限责任公司 Method for storing electronic table unit lattice content with relational data base
CN102799574A (en) * 2012-06-29 2012-11-28 无锡永中软件有限公司 Data partitioning and merging method for electronic forms
CN106156239A (en) * 2015-04-27 2016-11-23 中国移动通信集团公司 A kind of form abstracting method and device
CN108009264A (en) * 2017-12-14 2018-05-08 北京航天测控技术有限公司 A kind of comparative approach of versions of data for Excel format files
CN109522452A (en) * 2018-11-13 2019-03-26 南京烽火星空通信发展有限公司 A kind of processing method of magnanimity semi-structured data

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111523420A (en) * 2020-04-14 2020-08-11 南京烽火星空通信发展有限公司 Header classification and header list semantic identification method based on multitask deep neural network
CN111523420B (en) * 2020-04-14 2023-07-07 南京烽火星空通信发展有限公司 Header classification and header column semantic recognition method based on multi-task deep neural network
CN113010503A (en) * 2021-03-01 2021-06-22 广州智筑信息技术有限公司 Engineering cost data intelligent analysis method and system based on deep learning
CN113901214A (en) * 2021-10-08 2022-01-07 北京百度网讯科技有限公司 Extraction method and device of table information, electronic equipment and storage medium
CN113901214B (en) * 2021-10-08 2023-11-17 北京百度网讯科技有限公司 Method and device for extracting form information, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN110362620B (en) 2021-04-06

Similar Documents

Publication Publication Date Title
CN110362620A (en) Table data structuring method based on machine learning
CN105260359B (en) Semantic key words extracting method and device
CN101079025B (en) File correlation computing system and method
CN101788988B (en) Information extraction method
CN106709032A (en) Method and device for extracting structured information from spreadsheet document
CN103336766A (en) Short text garbage identification and modeling method and device
CN104317891B (en) A kind of method and device that label is marked to the page
CN107704539A (en) The method and device of extensive text message batch structuring
Nurminen Algorithmic extraction of data in tables in PDF documents
CN102063424A (en) Method for Chinese word segmentation
CN110210294A (en) Evaluation method, device, storage medium and the computer equipment of Optimized model
CN106055539A (en) Name disambiguation method and apparatus
CN109614626A (en) Keyword Automatic method based on gravitational model
CN102929902A (en) Character splitting method and device based on Chinese retrieval
CN110083832A (en) Recognition methods, device, equipment and the readable storage medium storing program for executing of article reprinting relationship
CN106528527A (en) Identification method and identification system for out of vocabularies
CN103186556A (en) Method for obtaining and searching structural semantic knowledge and corresponding device
CN106339481A (en) Chinese compound new-word discovery method based on maximum confidence coefficient
CN102270244A (en) Method for quickly extracting webpage content key words based on core sentence
CN101763403A (en) Query translation method facing multi-lingual information retrieval system
CN109344233B (en) Chinese name recognition method
CN105528432B (en) A kind of digital resource hot spot generation method and device
CN101872363A (en) Method for extracting keywords
CN104239292B (en) A kind of method for obtaining specialized vocabulary translation
CN103942332B (en) Web page logic link block identification method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant