CN110362620A - Table data structuring method based on machine learning - Google Patents
Table data structuring method based on machine learning
- Publication number: CN110362620A
- Application number: CN201910623601.0A
- Authority: CN (China)
- Prior art keywords: score, row, column, processed, spreadsheet
- Legal status: Granted (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
- G: Physics
  - G06: Computing; calculating or counting
    - G06F: Electric digital data processing
      - G06F16/00: Information retrieval; database structures therefor; file system structures therefor
        - G06F16/20: Information retrieval; database structures therefor; file system structures therefor, of structured data, e.g. relational data
          - G06F16/25: Integrating or interfacing systems involving database management systems
            - G06F16/258: Data format conversion from or to a database
        - G06F16/30: Information retrieval; database structures therefor; file system structures therefor, of unstructured textual data
          - G06F16/35: Clustering; classification
          - G06F16/36: Creation of semantic tools, e.g. ontology or thesauri
            - G06F16/374: Thesaurus
Abstract
The present invention relates to a machine-learning-based method for structuring table data. The objects in the cells of a large number of sample spreadsheets are counted to build a dictionary table. For a spreadsheet to be processed, the number of times each cell's object occurs in that spreadsheet is combined with the object's quantity in the dictionary table to obtain a score for every cell. Taking the cell score as the smallest unit and comparing rows with columns, the method obtains the header rows or header columns of the spreadsheet to be processed, derives each header entry from them, and then extracts and structures the data items based on those header entries. This overcomes the shortcomings of prior-art rule-based approaches, which identify only horizontal headers and cannot identify multiple headers, and structures spreadsheet data accurately and efficiently.
Description
Technical field
The present invention relates to a machine-learning-based method for structuring table data and belongs to the field of table data structuring technology.
Background art
Spreadsheets are among the most commonly used computer software tools. In the prior art, for a Sheet (worksheet) whose content is unknown, only the data item in each cell can be read after the file is opened, as follows:
(1) Open the Excel file through an interface;
(2) Read the Sheet in the Excel file through the interface;
(3) Read the cells in the Sheet through the interface.
When this method is executed, the meaning of each data item is unknown, so the data cannot be structured: the meaning of a data item is described by the table header, and without knowing the header the data cannot be understood. Therefore, to structure table data, some work adopts an assumption, namely that the table header is located in the first row of the table. Based on this assumption, the header is extracted first and the data afterwards, completing the structuring of the table data with the following steps:
(1) Open the Excel file through an interface;
(2) Read the Sheet in the Excel file through the interface;
(3) Read the first row of cells in the Sheet through the interface and use it as the header;
(4) Read the data corresponding to each header column by column to complete data structuring.
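The first-row assumption above can be sketched in a few lines of Python. This is a minimal illustration, assuming the Sheet has already been read into a list of rows of cell strings (a stand-in for the unspecified Excel reading interface); the function name `structure_first_row_header` is hypothetical.

```python
from typing import Dict, List

# Hypothetical stand-in for "open the Excel file / read the Sheet": the sheet is
# already loaded as a list of rows, each row a list of cell strings.
Sheet = List[List[str]]


def structure_first_row_header(sheet: Sheet) -> List[Dict[str, str]]:
    """Prior-art approach: assume row 0 is the header and read the data column by column."""
    if not sheet:
        return []
    header = sheet[0]
    records = []
    for row in sheet[1:]:
        # Pad short rows so every header gets a value (missing cells treated as empty).
        padded = list(row) + [""] * (len(header) - len(row))
        records.append(dict(zip(header, padded)))
    return records


if __name__ == "__main__":
    sheet = [["Name", "Phone"], ["Zhang San", "13800000000"]]
    print(structure_first_row_header(sheet))
```

A sheet whose header sits lower down, or runs down the first column, is silently mis-structured by this sketch, which is exactly the weakness of the assumption.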
This assumption has obvious defects: only horizontal headers can be extracted, and the header must be in the first row. Tables with vertical headers, tables whose header is not in the first row, and tables with multi-row headers will all be misjudged. To address this, some work optimizes the approach with prior knowledge to solve the problem of headers that are not in the first row, with the following steps:
(1) Open the Excel file through an interface;
(2) Read the Sheet in the Excel file through the interface;
(3) Read the data of each row and column in the Sheet in turn through the interface until data that can be recognized is encountered (by rule matching, e.g. mobile phone numbers, ID card numbers, bank card numbers); from that row, search upward for the first row that does not match the rule, and use that row as the header;
(4) Read the data corresponding to each header column by column to complete data structuring.
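A minimal sketch of this rule-based prior-art search, again over a sheet held as a list of rows of cell strings. The regular expressions are illustrative assumptions standing in for the phone-number, ID-card and bank-card rules mentioned in step (3); they are not taken from the patent, and `find_header_row` is a hypothetical name.

```python
import re
from typing import List, Optional

Sheet = List[List[str]]

# Illustrative recognition rules (assumptions, not taken from the patent):
# an 11-digit mainland mobile number, an 18-character ID number, a 16-19 digit card number.
RULES = [re.compile(p) for p in (r"1\d{10}", r"\d{17}[\dXx]", r"\d{16,19}")]


def row_matches_rule(row: List[str]) -> bool:
    """True if any cell in the row is fully matched by any recognition rule."""
    return any(rule.fullmatch(cell.strip()) for cell in row for rule in RULES)


def find_header_row(sheet: Sheet) -> Optional[int]:
    """Prior-art search: find the first recognizable row, then walk upward to the first non-matching row."""
    for i, row in enumerate(sheet):
        if row_matches_rule(row):
            for j in range(i - 1, -1, -1):
                if not row_matches_rule(sheet[j]):
                    return j
            return None  # recognizable data starts at the top: no header row above it
    return None  # nothing recognizable: this approach cannot locate a header
```

A sheet with no recognizable values, such as `[["Name", "City"], ["A", "X"]]`, yields `None`: the search never starts, which is the limitation for tables without recognizable data.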
This approach still has problems: tables with vertical headers or with multiple headers are also misjudged, and for tables containing no recognizable data the header is not identified at all.
Summary of the invention
The technical problem to be solved by the invention is to provide a machine-learning-based method for structuring table data that can accurately identify the header entries in a spreadsheet and, based on each header entry, efficiently complete the structuring of the data items in the spreadsheet.
To solve the above technical problem, the present invention adopts the following technical solution: the present invention designs a machine-learning-based method for structuring table data, used to structure the data items in a spreadsheet to be processed, characterized by comprising the following steps:
Step A. Count the objects in each cell of a preset number of sample spreadsheets to obtain each object and its corresponding quantity, and build a dictionary table; then proceed to step B;
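Step A can be sketched with `collections.Counter`. The sketch assumes that an "object" is the trimmed text of a cell, that empty cells are ignored, and that an object's quantity is the number of cells across all samples holding it; the text does not pin down these details, and `build_dictionary` is a hypothetical name.

```python
from collections import Counter
from typing import Iterable, List

Sheet = List[List[str]]


def build_dictionary(sample_sheets: Iterable[Sheet]) -> Counter:
    """Step A sketch: count every non-empty cell object across the sample spreadsheets."""
    dictionary: Counter = Counter()
    for sheet in sample_sheets:
        for row in sheet:
            for cell in row:
                obj = cell.strip()  # assumption: an object is the trimmed cell text
                if obj:             # assumption: empty cells are not objects
                    dictionary[obj] += 1
    return dictionary
```

Because a `Counter` returns 0 for missing keys, looking up an unseen object already gives the 0 that step C later requires.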
Step B. For each cell in the spreadsheet to be processed, obtain the number of occurrences, count, of the cell's object in the spreadsheet to be processed; then proceed to step C;
Step C. For each cell in the spreadsheet to be processed, obtain the quantity c of the cell's object in the dictionary table; if the dictionary table does not contain the object of a cell in the spreadsheet to be processed, the quantity of that object in the dictionary table is taken as 0; then proceed to step D;
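Steps B and C can be sketched together: `count` is how often the cell's object appears in the sheet being processed, and `c` is its dictionary quantity, defaulting to 0 when the object is absent, as step C requires. Treating trimmed cell text as the object and the name `count_and_lookup` are assumptions.

```python
from collections import Counter
from typing import Dict, List, Tuple

Sheet = List[List[str]]
Cell = Tuple[int, int]


def count_and_lookup(sheet: Sheet, dictionary: Dict[str, int]) -> Dict[Cell, Tuple[int, int]]:
    """Steps B and C sketch: map each (row, column) to (count, c)."""
    # Step B: occurrences of every object within the sheet being processed.
    local = Counter(cell.strip() for row in sheet for cell in row)
    pairs: Dict[Cell, Tuple[int, int]] = {}
    for r, row in enumerate(sheet):
        for col, cell in enumerate(row):
            obj = cell.strip()
            # Step C: dictionary quantity, 0 when the dictionary lacks the object.
            pairs[(r, col)] = (local[obj], dictionary.get(obj, 0))
    return pairs
```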
Step D. For each cell in the spreadsheet to be processed, obtain the score corresponding to the cell according to the following formula; then proceed to step E;
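The formula of step D is not reproduced in this text, so the sketch below only shows where it plugs in. It assumes the formula takes each cell's `count` and `c`, since steps B and C compute exactly these inputs. `placeholder_score` (c / count) is a hypothetical stand-in chosen for illustration and must not be read as the patent's formula; the real formula would be passed as `score_fn`.

```python
from typing import Callable, Dict, Tuple

Cell = Tuple[int, int]
ScoreFn = Callable[[int, int], float]


def placeholder_score(count: int, c: int) -> float:
    """HYPOTHETICAL placeholder, NOT the patent's formula (which is not reproduced here).

    It favours objects that are frequent in the sample dictionary (large c)
    but rare inside the sheet being processed (small count).
    """
    return c / count if count > 0 else 0.0


def score_cells(pairs: Dict[Cell, Tuple[int, int]],
                score_fn: ScoreFn = placeholder_score) -> Dict[Cell, float]:
    """Step D sketch: apply a scoring function to each cell's (count, c) pair."""
    return {cell: score_fn(count, c) for cell, (count, c) in pairs.items()}
```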
Step E. For each row in the spreadsheet to be processed, take the sum of the scores of the cells in the row as the row's score;
Meanwhile, for each column in the spreadsheet to be processed, take the sum of the scores of the cells in the column as the column's score;
This gives the score of each row and each column in the spreadsheet to be processed; then proceed to step F;
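Step E is a pair of sums. A minimal sketch, assuming cell scores are held in a dictionary keyed by (row, column) and that the sheet dimensions are known; `row_and_column_scores` is a hypothetical name.

```python
from typing import Dict, List, Tuple

Cell = Tuple[int, int]


def row_and_column_scores(scores: Dict[Cell, float], n_rows: int,
                          n_cols: int) -> Tuple[List[float], List[float]]:
    """Step E sketch: a row's (column's) score is the sum of its cells' scores."""
    row_scores = [0.0] * n_rows
    col_scores = [0.0] * n_cols
    for (r, c), s in scores.items():
        row_scores[r] += s  # contributes to its row total
        col_scores[c] += s  # and to its column total
    return row_scores, col_scores
```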
Step F. According to the score of each row in the spreadsheet to be processed, cluster the rows; for each row cluster, take the average score of its rows as the cluster's score, and select the row cluster with the highest score as the candidate row cluster;
Meanwhile, according to the score of each column in the spreadsheet to be processed, cluster the columns; for each column cluster, take the average score of its columns as the cluster's score, and select the column cluster with the highest score as the candidate column cluster;
Then proceed to step G;
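In this form, step F leaves the clustering algorithm open (an interval-based variant is given as a preferred solution). The sketch below assumes clusters are already available as lists of row or column indices and shows only the selection rule: score each cluster by the mean score of its members and keep the best. `select_candidate_cluster` is a hypothetical name; ties go to the first cluster listed, which the text does not specify.

```python
from statistics import mean
from typing import List, Sequence


def select_candidate_cluster(scores: Sequence[float], clusters: List[List[int]]) -> List[int]:
    """Step F sketch: return the cluster whose members have the highest mean score."""
    non_empty = [cluster for cluster in clusters if cluster]
    if not non_empty:
        raise ValueError("at least one non-empty cluster is required")
    # max() keeps the first cluster on ties; the method does not say how ties are broken.
    return max(non_empty, key=lambda cluster: mean(scores[i] for i in cluster))
```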
Step G. Among the rows in the candidate row cluster, select the row with the highest score and, from that row's score, obtain the average score over the non-empty cells in the row as the row cell average score;
Meanwhile, among the columns in the candidate column cluster, select the column with the highest score and, from that column's score, obtain the average score over the non-empty cells in the column as the column cell average score;
Then proceed to step H;
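The sketch below reads "according to the score of the row" as dividing the selected row's score by its number of non-empty cells; that interpretation is an assumption. The same function serves columns when given the transposed grid. `best_line_cell_average` is a hypothetical name.

```python
from typing import List, Sequence

Sheet = List[List[str]]


def best_line_cell_average(line_scores: Sequence[float], cluster: Sequence[int],
                           lines: Sheet) -> float:
    """Step G sketch: highest-scoring line in the cluster, score averaged over its non-empty cells."""
    best = max(cluster, key=lambda i: line_scores[i])
    non_empty = sum(1 for cell in lines[best] if cell.strip())
    # Assumption: the average is the line score divided by the line's non-empty cell count.
    return line_scores[best] / non_empty if non_empty else 0.0
```

For columns of a rectangular sheet, pass `[list(col) for col in zip(*sheet)]` together with the column scores.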
Step H. If the row cell average score is greater than the column cell average score, the rows in the candidate row cluster are the header rows of the spreadsheet to be processed; obtain each header entry from them and proceed to step J;
If the row cell average score is less than the column cell average score, the columns in the candidate column cluster are the header columns of the spreadsheet to be processed; obtain each header entry from them and proceed to step J;
Step J. According to each header entry in the spreadsheet to be processed, read each data item in the spreadsheet to be processed and structure the table data.
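Steps H and J can be sketched as follows. Several details are assumptions rather than statements of the patent: multi-line header entries are joined with `" / "`, a header slot with no text is named `field j`, blank data lines are skipped, and an exact tie between the two averages (which the text does not cover) raises an error. All function names are hypothetical.

```python
from typing import Dict, List, Sequence

Sheet = List[List[str]]


def _pad(lines: Sheet) -> Sheet:
    """Make every line the same length by padding with empty cells."""
    width = max(len(line) for line in lines)
    return [list(line) + [""] * (width - len(line)) for line in lines]


def _records(lines: Sheet, header_idx: Sequence[int]) -> List[Dict[str, str]]:
    """Step J sketch: one record per non-header line, keyed by the header entries."""
    grid = _pad(lines)
    keys = [" / ".join(grid[h][j] for h in header_idx if grid[h][j].strip()) or f"field {j}"
            for j in range(len(grid[0]))]
    skip = set(header_idx)
    return [dict(zip(keys, line)) for i, line in enumerate(grid)
            if i not in skip and any(cell.strip() for cell in line)]


def structure_table(sheet: Sheet, row_avg: float, col_avg: float,
                    header_rows: Sequence[int], header_cols: Sequence[int]) -> List[Dict[str, str]]:
    """Step H sketch: compare the step G averages, then structure along the winning axis."""
    if row_avg > col_avg:  # the candidate rows are the header rows
        return _records(sheet, header_rows)
    if row_avg < col_avg:  # the candidate columns are the header columns: work on the transpose
        columns = [list(col) for col in zip(*_pad(sheet))]
        return _records(columns, header_cols)
    raise ValueError("equal averages: case not specified by the method")
```

Transposing for header columns lets one record builder serve both orientations, which mirrors the symmetric treatment of rows and columns in steps E to H.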
As a preferred technical solution of the present invention: in step A, after the dictionary table is built, it is updated by the following steps I to II, and then the method proceeds to step B;
Step I. Obtain the maximum of the quantities corresponding to the objects in the dictionary table; then proceed to step II;
Step II. For each object in the dictionary table, perform the following steps II-1 to II-2 to update the quantity corresponding to the object, thereby updating the dictionary table;
Step II-1. Determine whether the object belongs to the preset header entry set; if so, set the quantity corresponding to the object to the maximum quantity value; otherwise proceed to step II-2;
Step II-2. Determine whether the object belongs to the preset data item set; if so, set the quantity corresponding to the object to 0; otherwise leave the quantity corresponding to the object unchanged.
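Steps I and II amount to pinning known header entries to the top of the count range and known data items to the bottom. A minimal sketch, assuming the dictionary is a plain `dict` and the preset header-entry and data-item sets are supplied by the caller; `update_dictionary` is a hypothetical name. The maximum is taken before any update, since step I precedes step II.

```python
from typing import Dict, Set


def update_dictionary(dictionary: Dict[str, int], header_entries: Set[str],
                      data_items: Set[str]) -> Dict[str, int]:
    """Steps I and II sketch: known headers get the maximum quantity, known data items get 0."""
    if not dictionary:
        return {}
    max_count = max(dictionary.values())  # Step I: computed on the original quantities
    updated: Dict[str, int] = {}
    for obj, qty in dictionary.items():   # Step II: every object in the dictionary
        if obj in header_entries:         # Step II-1
            updated[obj] = max_count
        elif obj in data_items:           # Step II-2
            updated[obj] = 0
        else:                             # otherwise unchanged
            updated[obj] = qty
    return updated
```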
As a preferred technical solution of the present invention: in step F, the rows in the spreadsheet to be processed are clustered according to their scores by the following steps Fa-1 to Fa-3;
Step Fa-1. Obtain the minimum row score and the maximum row score among the scores of the rows in the spreadsheet to be processed, and proceed to step Fa-2;
Step Fa-2. Divide the span from the minimum row score to the maximum row score according to preset row score levels to obtain row score intervals; then proceed to step Fa-3;
Step Fa-3. According to the score of each row in the spreadsheet to be processed, assign each row to a row score interval; each row score interval that contains rows of the spreadsheet to be processed is a row cluster;
Meanwhile, the columns in the spreadsheet to be processed are clustered according to their scores by the following steps Fb-1 to Fb-3;
Step Fb-1. Obtain the minimum column score and the maximum column score among the scores of the columns in the spreadsheet to be processed, and proceed to step Fb-2;
Step Fb-2. Divide the span from the minimum column score to the maximum column score according to preset column score levels to obtain column score intervals; then proceed to step Fb-3;
Step Fb-3. According to the score of each column in the spreadsheet to be processed, assign each column to a column score interval; each column score interval that contains columns of the spreadsheet to be processed is a column cluster.
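Steps Fa-1 to Fa-3 (and Fb-1 to Fb-3 for columns) can be sketched as interval binning. The text says only that the span is divided by preset score levels; this sketch assumes a preset number of equal-width intervals and drops intervals that receive no line. `bin_clusters` is a hypothetical name.

```python
from typing import List, Sequence


def bin_clusters(scores: Sequence[float], levels: int) -> List[List[int]]:
    """Fa/Fb sketch: split [min, max] into `levels` equal-width intervals; non-empty ones are clusters."""
    if not scores:
        return []
    if levels < 1:
        raise ValueError("levels must be >= 1")
    lo, hi = min(scores), max(scores)          # Fa-1 / Fb-1
    width = (hi - lo) / levels                 # Fa-2 / Fb-2 (equal width is an assumption)
    buckets: List[List[int]] = [[] for _ in range(levels)]
    for i, s in enumerate(scores):             # Fa-3 / Fb-3
        # The maximum score falls on the upper edge, so clamp it into the last interval.
        k = 0 if width == 0 else min(int((s - lo) / width), levels - 1)
        buckets[k].append(i)
    return [bucket for bucket in buckets if bucket]
```

With scores `[10.0, 9.5, 1.0, 2.0, 1.5]` and 3 levels, the three low-scoring lines form one cluster and the two high-scoring lines another.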
Compared with the prior art, the machine-learning-based table data structuring method of the present invention, by adopting the above technical solution, has the following technical effects:
The machine-learning-based table data structuring method designed by the invention counts the objects in each cell of a large number of sample spreadsheets to build a dictionary table, and combines the number of times each cell's object occurs in the spreadsheet to be processed with its quantity in the dictionary table to obtain a score for each cell of the spreadsheet to be processed. Taking the cell score as the smallest unit and comparing rows with columns, it obtains the header rows or header columns of the spreadsheet to be processed, derives each header entry from them, and then extracts and structures the data items based on those header entries. This overcomes the shortcoming of the prior art, which identifies only horizontal headers by rules and cannot identify multiple headers, and accurately and efficiently structures spreadsheet data.
Description of the drawings
Fig. 1 is a schematic diagram of the machine-learning-based table data structuring method designed by the present invention.
Detailed description of the embodiments
Specific embodiments of the present invention are described in further detail below with reference to the accompanying drawings.
The present invention designs a machine-learning-based method for structuring table data, used to structure the data items in a spreadsheet to be processed. In a specific practical application, the following steps A to J are executed.
Step A. Count the objects in each cell of a preset number of sample spreadsheets to obtain each object and its corresponding quantity, and build a dictionary table; then update the dictionary table by the following steps I to II, and proceed to step B.
Step I. Obtain the maximum of the quantities corresponding to the objects in the dictionary table; then proceed to step II.
Step II. For each object in the dictionary table, perform the following steps II-1 to II-2 to update the quantity corresponding to the object, thereby updating the dictionary table.
Step II-1. Determine whether the object belongs to the preset header entry set; if so, set the quantity corresponding to the object to the maximum quantity value; otherwise proceed to step II-2;
Step II-2. Determine whether the object belongs to the preset data item set; if so, set the quantity corresponding to the object to 0; otherwise leave the quantity corresponding to the object unchanged.
Step B. For each cell in the spreadsheet to be processed, obtain the number of occurrences, count, of the cell's object in the spreadsheet to be processed; then proceed to step C.
Step C. For each cell in the spreadsheet to be processed, obtain the quantity c of the cell's object in the dictionary table; if the dictionary table does not contain the object of a cell in the spreadsheet to be processed, the quantity of that object in the dictionary table is taken as 0; then proceed to step D.
Step D. For each cell in the spreadsheet to be processed, obtain the score corresponding to the cell according to the following formula; then proceed to step E.
Step E. For each row in the spreadsheet to be processed, take the sum of the scores of the cells in the row as the row's score;
Meanwhile, for each column in the spreadsheet to be processed, take the sum of the scores of the cells in the column as the column's score;
This gives the score of each row and each column in the spreadsheet to be processed; then proceed to step F.
Step F. According to the score of each row in the spreadsheet to be processed, cluster the rows by the following steps Fa-1 to Fa-3; for each row cluster, take the average score of its rows as the cluster's score, and select the row cluster with the highest score as the candidate row cluster.
Step Fa-1. Obtain the minimum row score and the maximum row score among the scores of the rows in the spreadsheet to be processed, and proceed to step Fa-2;
Step Fa-2. Divide the span from the minimum row score to the maximum row score according to preset row score levels to obtain row score intervals; then proceed to step Fa-3;
Step Fa-3. According to the score of each row in the spreadsheet to be processed, assign each row to a row score interval; each row score interval that contains rows of the spreadsheet to be processed is a row cluster.
Meanwhile, according to the score of each column in the spreadsheet to be processed, cluster the columns by the following steps Fb-1 to Fb-3; for each column cluster, take the average score of its columns as the cluster's score, and select the column cluster with the highest score as the candidate column cluster.
Step Fb-1. Obtain the minimum column score and the maximum column score among the scores of the columns in the spreadsheet to be processed, and proceed to step Fb-2;
Step Fb-2. Divide the span from the minimum column score to the maximum column score according to preset column score levels to obtain column score intervals; then proceed to step Fb-3;
Step Fb-3. According to the score of each column in the spreadsheet to be processed, assign each column to a column score interval; each column score interval that contains columns of the spreadsheet to be processed is a column cluster.
After the candidate row cluster and the candidate column cluster have been obtained, proceed to step G.
Step G. Among the rows in the candidate row cluster, select the row with the highest score and, from that row's score, obtain the average score over the non-empty cells in the row as the row cell average score;
Meanwhile, among the columns in the candidate column cluster, select the column with the highest score and, from that column's score, obtain the average score over the non-empty cells in the column as the column cell average score;
Then proceed to step H.
Step H. If the row cell average score is greater than the column cell average score, the rows in the candidate row cluster are the header rows of the spreadsheet to be processed; obtain each header entry from them and proceed to step J;
If the row cell average score is less than the column cell average score, the columns in the candidate column cluster are the header columns of the spreadsheet to be processed; obtain each header entry from them and proceed to step J;
Step J. According to each header entry in the spreadsheet to be processed, read each data item in the spreadsheet to be processed and structure the table data.
The machine-learning-based table data structuring method designed by the above technical solution counts the objects in each cell of a large number of sample spreadsheets to build a dictionary table, and combines the number of times each cell's object occurs in the spreadsheet to be processed with its quantity in the dictionary table to obtain a score for each cell of the spreadsheet to be processed. Taking the cell score as the smallest unit and comparing rows with columns, it obtains the header rows or header columns of the spreadsheet to be processed, derives each header entry from them, and then extracts and structures the data items based on those header entries. This overcomes the shortcoming of the prior art, which identifies only horizontal headers by rules and cannot identify multiple headers, and accurately and efficiently structures spreadsheet data.
The embodiments of the present invention have been explained in detail above with reference to the accompanying drawings, but the present invention is not limited to these embodiments; various changes can be made within the knowledge of a person skilled in the art without departing from the purpose of the present invention.
Claims (3)
1. A machine-learning-based method for structuring table data, used to structure the data items in a spreadsheet to be processed, characterized by comprising the following steps:
Step A. Count the objects in each cell of a preset number of sample spreadsheets to obtain each object and its corresponding quantity, and build a dictionary table; then proceed to step B;
Step B. For each cell in the spreadsheet to be processed, obtain the number of occurrences, count, of the cell's object in the spreadsheet to be processed; then proceed to step C;
Step C. For each cell in the spreadsheet to be processed, obtain the quantity c of the cell's object in the dictionary table; if the dictionary table does not contain the object of a cell in the spreadsheet to be processed, the quantity of that object in the dictionary table is taken as 0; then proceed to step D;
Step D. For each cell in the spreadsheet to be processed, obtain the score corresponding to the cell according to the following formula; then proceed to step E;
Step E. For each row in the spreadsheet to be processed, take the sum of the scores of the cells in the row as the row's score;
Meanwhile, for each column in the spreadsheet to be processed, take the sum of the scores of the cells in the column as the column's score;
This gives the score of each row and each column in the spreadsheet to be processed; then proceed to step F;
Step F. According to the score of each row in the spreadsheet to be processed, cluster the rows; for each row cluster, take the average score of its rows as the cluster's score, and select the row cluster with the highest score as the candidate row cluster;
Meanwhile, according to the score of each column in the spreadsheet to be processed, cluster the columns; for each column cluster, take the average score of its columns as the cluster's score, and select the column cluster with the highest score as the candidate column cluster;
Then proceed to step G;
Step G. Among the rows in the candidate row cluster, select the row with the highest score and, from that row's score, obtain the average score over the non-empty cells in the row as the row cell average score;
Meanwhile, among the columns in the candidate column cluster, select the column with the highest score and, from that column's score, obtain the average score over the non-empty cells in the column as the column cell average score;
Then proceed to step H;
Step H. If the row cell average score is greater than the column cell average score, the rows in the candidate row cluster are the header rows of the spreadsheet to be processed; obtain each header entry from them and proceed to step J;
If the row cell average score is less than the column cell average score, the columns in the candidate column cluster are the header columns of the spreadsheet to be processed; obtain each header entry from them and proceed to step J;
Step J. According to each header entry in the spreadsheet to be processed, read each data item in the spreadsheet to be processed and structure the table data.
2. The machine-learning-based method for structuring table data according to claim 1, characterized in that: in step A, after the dictionary table is built, it is updated by the following steps I to II, and then the method proceeds to step B;
Step I. Obtain the maximum of the quantities corresponding to the objects in the dictionary table; then proceed to step II;
Step II. For each object in the dictionary table, perform the following steps II-1 to II-2 to update the quantity corresponding to the object, thereby updating the dictionary table;
Step II-1. Determine whether the object belongs to the preset header entry set; if so, set the quantity corresponding to the object to the maximum quantity value; otherwise proceed to step II-2;
Step II-2. Determine whether the object belongs to the preset data item set; if so, set the quantity corresponding to the object to 0; otherwise leave the quantity corresponding to the object unchanged.
3. The machine-learning-based method for structuring table data according to claim 1, characterized in that: in step F, the rows in the spreadsheet to be processed are clustered according to their scores by the following steps Fa-1 to Fa-3;
Step Fa-1. Obtain the minimum row score and the maximum row score among the scores of the rows in the spreadsheet to be processed, and proceed to step Fa-2;
Step Fa-2. Divide the span from the minimum row score to the maximum row score according to preset row score levels to obtain row score intervals; then proceed to step Fa-3;
Step Fa-3. According to the score of each row in the spreadsheet to be processed, assign each row to a row score interval; each row score interval that contains rows of the spreadsheet to be processed is a row cluster;
Meanwhile, the columns in the spreadsheet to be processed are clustered according to their scores by the following steps Fb-1 to Fb-3;
Step Fb-1. Obtain the minimum column score and the maximum column score among the scores of the columns in the spreadsheet to be processed, and proceed to step Fb-2;
Step Fb-2. Divide the span from the minimum column score to the maximum column score according to preset column score levels to obtain column score intervals; then proceed to step Fb-3;
Step Fb-3. According to the score of each column in the spreadsheet to be processed, assign each column to a column score interval; each column score interval that contains columns of the spreadsheet to be processed is a column cluster.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910623601.0A CN110362620B (en) | 2019-07-11 | 2019-07-11 | Table data structuring method based on machine learning |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910623601.0A CN110362620B (en) | 2019-07-11 | 2019-07-11 | Table data structuring method based on machine learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110362620A (en) | 2019-10-22 |
CN110362620B (en) | 2021-04-06 |
Family ID: 68218702
Family Applications (1)
Application Number | Status | Title | Priority Date | Filing Date |
---|---|---|---|---|
CN201910623601.0A (CN110362620B) | Active | Table data structuring method based on machine learning | 2019-07-11 | 2019-07-11 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110362620B (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20040107205A1 (en) * | 2002-12-03 | 2004-06-03 | Lockheed Martin Corporation | Boolean rule-based system for clustering similar records |
CN1741020A (en) * | 2005-09-29 | 2006-03-01 | 北京勤哲软件技术有限责任公司 | Method for storing spreadsheet cell content in a relational database |
CN102799574A (en) * | 2012-06-29 | 2012-11-28 | 无锡永中软件有限公司 | Data partitioning and merging method for electronic forms |
CN106156239A (en) * | 2015-04-27 | 2016-11-23 | 中国移动通信集团公司 | Table extraction method and device |
CN108009264A (en) * | 2017-12-14 | 2018-05-08 | 北京航天测控技术有限公司 | Method for comparing data versions of Excel-format files |
CN109522452A (en) * | 2018-11-13 | 2019-03-26 | 南京烽火星空通信发展有限公司 | Method for processing massive semi-structured data |
- 2019-07-11: application CN201910623601.0A filed in CN (CN110362620B), status active
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111523420A (en) * | 2020-04-14 | 2020-08-11 | 南京烽火星空通信发展有限公司 | Header classification and header list semantic identification method based on multitask deep neural network |
CN111523420B (en) * | 2020-04-14 | 2023-07-07 | 南京烽火星空通信发展有限公司 | Header classification and header column semantic recognition method based on multi-task deep neural network |
CN113010503A (en) * | 2021-03-01 | 2021-06-22 | 广州智筑信息技术有限公司 | Engineering cost data intelligent analysis method and system based on deep learning |
CN113901214A (en) * | 2021-10-08 | 2022-01-07 | 北京百度网讯科技有限公司 | Extraction method and device of table information, electronic equipment and storage medium |
CN113901214B (en) * | 2021-10-08 | 2023-11-17 | 北京百度网讯科技有限公司 | Method and device for extracting form information, electronic equipment and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN110362620B (en) | 2021-04-06 |
Similar Documents
Publication | Title |
---|---|
CN110362620A (en) | Table data structuring method based on machine learning |
CN105260359B (en) | Semantic keyword extraction method and device |
CN101079025B (en) | File correlation computing system and method |
CN101788988B (en) | Information extraction method |
CN106709032A (en) | Method and device for extracting structured information from spreadsheet document |
CN103336766A (en) | Short text spam identification and modeling method and device |
CN104317891B (en) | Method and device for labeling pages |
CN107704539A (en) | Method and device for batch structuring of large-scale text information |
Nurminen | Algorithmic extraction of data in tables in PDF documents |
CN102063424A (en) | Method for Chinese word segmentation |
CN110210294A (en) | Evaluation method and device for an optimization model, storage medium and computer equipment |
CN106055539A (en) | Name disambiguation method and apparatus |
CN109614626A (en) | Keyword Automatic method based on gravitational model |
CN102929902A (en) | Character splitting method and device based on Chinese retrieval |
CN110083832A (en) | Method, device and equipment for recognizing article reprint relationships, and readable storage medium |
CN106528527A (en) | Identification method and identification system for out-of-vocabulary words |
CN103186556A (en) | Method for obtaining and searching structural semantic knowledge and corresponding device |
CN106339481A (en) | Chinese compound new-word discovery method based on maximum confidence coefficient |
CN102270244A (en) | Method for quickly extracting webpage content key words based on core sentence |
CN101763403A (en) | Query translation method for multilingual information retrieval systems |
CN109344233B (en) | Chinese name recognition method |
CN105528432B (en) | Digital resource hotspot generation method and device |
CN101872363A (en) | Method for extracting keywords |
CN104239292B (en) | Method for obtaining translations of specialized vocabulary |
CN103942332B (en) | Web page logic link block identification method |
Legal Events
Code | Title |
---|---|
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |
GR01 | Patent grant |