CN102193940A - Method of carrying out characteristic analysis and data extraction on two-dimensional table - Google Patents
- Publication number
- CN102193940A
- Authority
- CN
- China
- Prior art keywords
- data
- signature analysis
- carried out
- dimensional
- data extract
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Abstract
The invention discloses a method of carrying out characteristic analysis and data extraction on a two-dimensional table. A sample table that has the required characteristics undergoes characteristic analysis and data extraction according to user-specified keys to form dictionary data, and the sample table can then be updated according to that dictionary data. The method meets the need to import the data of one table into another table field by field, for example importing the data of a translation file provided by a client into a translation file on a mobile phone platform.
Description
Technical field
The present invention relates to the field of information technology, and specifically to a method of carrying out characteristic analysis and data extraction on a two-dimensional table.
Background art
In practice, one often needs to import the data of one table into another table field by field. If only a few records need updating, this can be done by hand; but when the records number in the tens of thousands, the update has to be automated with a tool. The method developed in the present invention handles this problem well and lets the user freely choose the match keywords, which greatly improves work efficiency. It is currently used to update the data of translation files on an embedded mobile phone platform.
Summary of the invention
To address the shortcomings of the prior art, the object of this invention is to provide a method of carrying out characteristic analysis and data extraction on a two-dimensional table that significantly improves work efficiency and can be applied in many situations.
A method of carrying out characteristic analysis and data extraction on a two-dimensional table comprises: a table conforming to certain characteristics, a visual program window, a characteristic analysis algorithm, a data extraction algorithm, and a data import algorithm.
The main characteristic of a conforming table is that it contains multiple rows and columns of text, with rows separated by newline characters and columns separated by tabs. The table can be logically divided into a header and a body: the header contains a number of fields, and the body contains the corresponding data.
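As a minimal sketch, assuming the whole file fits in memory and the first non-empty line is the header, a table in the format described above can be split into a header (field names) and a body (data rows). The function name `parse_table` is hypothetical and not taken from the patent.

```python
def parse_table(text: str) -> tuple[list[str], list[list[str]]]:
    """Split tab-separated table text into a header and a body.

    Assumes rows are separated by newlines, columns by tabs, and that the
    first non-empty line is the header (the list of field names).
    """
    # Drop a trailing "\r" (Windows line endings) and skip blank lines.
    lines = [line.rstrip("\r") for line in text.split("\n")]
    lines = [line for line in lines if line]
    if not lines:
        return [], []
    header = lines[0].split("\t")
    body = [line.split("\t") for line in lines[1:]]
    return header, body


sample = "id\ten\tfr\nBTN_OK\tOK\tValider\nBTN_CANCEL\tCancel\tAnnuler\n"
header, body = parse_table(sample)
print(header)  # ['id', 'en', 'fr']
print(body)    # [['BTN_OK', 'OK', 'Valider'], ['BTN_CANCEL', 'Cancel', 'Annuler']]
```

Splitting on the literal tab and newline characters keeps the sketch faithful to the format as stated; a file that quotes tabs inside cells would need a real TSV parser instead.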
The visual program window is mainly responsible for accepting the parameters entered by the user, carrying out characteristic analysis, displaying the fields found by the analysis as a list, and asking the user to select the match keywords. It then carries out data extraction and imports the extracted data into the other table, completing the data import.
The parameters entered by the user comprise the file paths of the two tables (table A and table B) and the match keywords specified by the user.
The characteristic analysis algorithm is mainly responsible for automatically identifying the table header, analyzing the fields of the table, and forming them into a list.
The data extraction algorithm mainly extracts data from the table body according to the keywords specified by the user, forming a dictionary data structure.
The dictionary data structure is defined in the form {key: value}, where key is a tuple of the form (key1, key2) and value is a list of the form [v1, v2, ...]. When extraction encounters data with the same key, the corresponding data is appended to that key's value list. The complete data structure has the form: (key1, key2): [v1, v2, v3, ...]
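The description does not say exactly what key1, key2 and the values stand for. One plausible reading, consistent with the embodiment's conversion of two-dimensional table data into one-dimensional data, is that key1 is a row's value in the user-selected key column, key2 is a field name, and the list collects the cell values found under that pair. The sketch below assumes that reading and a single key column; the function name `extract` is hypothetical.

```python
from collections import defaultdict


def extract(header: list[str], body: list[list[str]], key_field: str) -> dict[tuple[str, str], list[str]]:
    """Flatten a table body into {(key1, key2): [v1, v2, ...]}.

    Assumed reading: key1 is the row's value in the key column, key2 is the
    name of another field, and the list gathers that field's cells for every
    row that shares the same key1.
    """
    key_col = header.index(key_field)
    data = defaultdict(list)
    for row in body:
        key1 = row[key_col]
        for col, field in enumerate(header):
            if col == key_col or col >= len(row):
                continue  # skip the key column itself and missing trailing cells
            # A repeated (key1, key2) pair appends to the list instead of overwriting.
            data[(key1, field)].append(row[col])
    return dict(data)


header = ["id", "en", "fr"]
body = [
    ["BTN_OK", "OK", "Valider"],
    ["BTN_CANCEL", "Cancel", "Annuler"],
    ["BTN_OK", "OK", "Confirmer"],
]
flat = extract(header, body, "id")
print(flat[("BTN_OK", "fr")])      # ['Valider', 'Confirmer']
print(flat[("BTN_CANCEL", "en")])  # ['Cancel']
```

Keying on (row key, field name) is what makes the structure one-dimensional: every cell of the two-dimensional body becomes one dictionary entry, and duplicate keys accumulate in the value list as the description requires.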
The data import algorithm carries out characteristic analysis and data extraction on each row of data in table B, uses the key to take the corresponding value from table A's dictionary data, updates table B's dictionary data with it, and then restores the result to table B's text data.
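Under the same assumed (row key, field name) reading of the dictionary, the import could look roughly like the sketch below. It assumes both tables are tab-separated text with a header line, that rows are matched on one user-chosen key field, and that when table A holds several values for a key the first one is used; all function names are hypothetical.

```python
from collections import defaultdict


def parse(text: str) -> list[list[str]]:
    """Split tab-separated text into rows of cells, skipping blank lines."""
    return [line.split("\t") for line in text.splitlines() if line]


def extract(header, rows, key_field):
    """Flatten rows into {(row key, field name): [cell values]} (assumed reading)."""
    key_col = header.index(key_field)
    data = defaultdict(list)
    for row in rows:
        for col, field in enumerate(header):
            if col != key_col and col < len(row):
                data[(row[key_col], field)].append(row[col])
    return data


def import_table(text_a: str, text_b: str, key_field: str) -> str:
    """Return table B's text with cells replaced by table A's where (key, field) match."""
    rows_a, rows_b = parse(text_a), parse(text_b)
    dict_a = extract(rows_a[0], rows_a[1:], key_field)
    header_b = rows_b[0]
    key_col = header_b.index(key_field)
    out = ["\t".join(header_b)]
    for row in rows_b[1:]:
        # Extract this single B row into its own one-dimensional dictionary.
        row_dict = extract(header_b, [row], key_field)
        for pair in row_dict:
            if pair in dict_a:
                # Assumption: if A holds several values for a key, the first wins.
                row_dict[pair] = [dict_a[pair][0]]
        # Restore the updated one-dimensional record to a two-dimensional row.
        row_key = row[key_col]
        cells = [row_key if col == key_col else row_dict.get((row_key, field), [""])[0]
                 for col, field in enumerate(header_b)]
        out.append("\t".join(cells))
    return "\n".join(out)


table_a = "id\ten\tfr\nBTN_OK\tOK\tValider\nBTN_CANCEL\tCancel\tAnnuler\n"
table_b = "id\ten\tfr\tde\nBTN_OK\tOK\tTODO\tOK\nBTN_HELP\tHelp\tTODO\tHilfe\n"
result = import_table(table_a, table_b, "id")
for line in result.splitlines():
    print(line.split("\t"))
# ['id', 'en', 'fr', 'de']
# ['BTN_OK', 'OK', 'Valider', 'OK']
# ['BTN_HELP', 'Help', 'TODO', 'Hilfe']
```

Fields that exist only in table B (here `de`), and rows whose key is absent from table A (here `BTN_HELP`), pass through unchanged.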
Description of drawings
Fig. 1 is a schematic diagram of the data structure used by the present invention when carrying out characteristic analysis of the table header;
Fig. 2 is a schematic diagram of the one-dimensional data structure used by the present invention when extracting data;
Fig. 3 is an example table;
Fig. 4 is a schematic diagram of the data extracted by the present invention;
Fig. 5 is an example table;
Fig. 6 is a schematic diagram of the data extracted from the first row of the table in Fig. 5;
Fig. 7 is a schematic diagram of importing the first row of data of the table in Fig. 5.
Embodiment
The present invention is further described below with reference to the accompanying drawings.
Carrying out characteristic analysis on one table, extracting its data, and importing that data into another table takes three steps:
First step: the program window receives the table file path parameters entered by the user and, based on these parameters, pre-reads the first few lines of both files. These lines are split on tabs to form the data structure shown in Fig. 1. The two sets of data are then compared to find fields with the same name, and these fields are shown to the user as candidate keywords.
Second step: after the user selects the keywords and clicks the Next button in the program window, the program extracts the table data into the data structure shown in Fig. 2; Fig. 3 and Fig. 4 show an example table and the data extracted from it. This extraction algorithm and data structure convert two-dimensional table data into one-dimensional data, which simplifies the data structure and makes it easy to import into another two-dimensional table.
Third step: once the data has been generated, the user clicks the Import button in the program window, and the program extracts each record of the other table using the algorithm above; the example table and its sample data are shown in Fig. 5 and Fig. 6. Using the key of each one-dimensional record, the program looks up the record with the same key in the data of Fig. 4 and, if it is found, replaces the original value with the corresponding value, completing the import of that record (see Fig. 7 for an example). After all records of the row have been imported, the one-dimensional data is restored to the two-dimensional table, and the next row is processed in the same way until all the data in the table has been handled.
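The first step above, pre-reading the first few lines of both files and offering the same-named fields as candidate keywords, might be sketched as follows. The preview size of five lines, the function names, and the use of in-memory streams in place of the user's file paths are assumptions made to keep the example self-contained.

```python
import io
from itertools import islice
from typing import TextIO


def preview_rows(stream: TextIO, n: int = 5) -> list[list[str]]:
    """Pre-read at most n lines of a tab-separated table and split each on tabs."""
    return [line.rstrip("\r\n").split("\t") for line in islice(stream, n)]


def candidate_keys(stream_a: TextIO, stream_b: TextIO, n: int = 5) -> list[str]:
    """Return the header fields present in both tables, in table A's order."""
    header_a = (preview_rows(stream_a, n) or [[]])[0]
    header_b = set((preview_rows(stream_b, n) or [[]])[0])
    return [field for field in header_a if field in header_b]


# In-memory stand-ins for the two files the user would select.
table_a = io.StringIO("id\ten\tfr\nBTN_OK\tOK\tValider\n")
table_b = io.StringIO("id\ten\tde\tfr\nBTN_OK\tOK\tOK\tValider\n")
print(candidate_keys(table_a, table_b))  # ['id', 'en', 'fr']
```

With real files, the same function can be called on handles opened with `open(path, encoding="utf-8")`; `islice` stops after `n` lines, so large tables are not read in full at this stage.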
Claims (8)
1. A method of carrying out characteristic analysis and data extraction on a two-dimensional table, characterized in that the method comprises: a table conforming to certain characteristics, a visual program window, a characteristic analysis algorithm, a data extraction algorithm, and a data import algorithm.
2. The method of carrying out characteristic analysis and data extraction on a two-dimensional table according to claim 1, characterized in that the main characteristic of the table conforming to certain characteristics is that it contains multiple rows and columns of text, with rows separated by newline characters and columns separated by tabs; the table can be logically divided into a header and a body, the header contains a number of fields, and the body contains the corresponding data.
3. The method of carrying out characteristic analysis and data extraction on a two-dimensional table according to claim 1, characterized in that the visual program window is mainly responsible for accepting the parameters entered by the user, carrying out characteristic analysis, displaying the fields found by the analysis as a list, and requiring the user to select match keywords; it then carries out data extraction and imports the extracted data into another table document, completing the data import.
4. The method of carrying out characteristic analysis and data extraction on a two-dimensional table according to claim 3, characterized in that the parameters entered by the user are the file paths and the match keywords specified by the user.
5. The method of carrying out characteristic analysis and data extraction on a two-dimensional table according to claim 1, characterized in that the characteristic analysis algorithm is mainly responsible for automatically identifying the table header, analyzing the fields of the table, and forming them into a list.
6. The method of carrying out characteristic analysis and data extraction on a two-dimensional table according to claim 1, characterized in that the data extraction algorithm mainly extracts data from the table body according to the keywords specified by the user, forming a dictionary data structure.
7. The method of carrying out characteristic analysis and data extraction on a two-dimensional table according to claim 6, characterized in that the dictionary data structure is defined in the form {key: value}, where key is a tuple of the form (key1, key2) and value is a list of the form [v1, v2, ...]; when extraction encounters data with the same key, the corresponding data is appended to the value list.
8. The method of carrying out characteristic analysis and data extraction on a two-dimensional table according to claim 1, characterized in that the data import algorithm carries out characteristic analysis and data extraction on each row of data in the table (Fig. 5), then uses the key to take the corresponding value from the dictionary data of the table (Fig. 3) and updates the dictionary data of the table (Fig. 5) with it, and then restores the updated one-dimensional data to the text data of the table (Fig. 5).
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN 201010123219 (CN102193940A) | 2010-03-11 | 2010-03-11 | Method of carrying out characteristic analysis and data extraction on two-dimensional table |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN 201010123219 (CN102193940A) | 2010-03-11 | 2010-03-11 | Method of carrying out characteristic analysis and data extraction on two-dimensional table |
Publications (1)
Publication Number | Publication Date |
---|---|
CN102193940A | 2011-09-21 |
Family ID: 44602019
Family Applications (1)
Application Number | Status | Priority Date | Filing Date | Title |
---|---|---|---|---|
CN 201010123219 (CN102193940A) | Pending | 2010-03-11 | 2010-03-11 | Method of carrying out characteristic analysis and data extraction on two-dimensional table |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN102193940A |
- 2010-03-11: application CN 201010123219 filed (CN102193940A, Pending)
Cited By (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2015113301A1 * | 2014-01-30 | 2015-08-06 | Microsoft Technology Licensing, Llc | Automatic insights for spreadsheets |
US10747950B2 | 2014-01-30 | 2020-08-18 | Microsoft Technology Licensing, Llc | Automatic insights for spreadsheets |
CN103810153A * | 2014-02-17 | 2014-05-21 | 深圳市世纪安软信息技术有限公司 | Temperature measurement form generation method and device for temperature measurement terminal and temperature measurement system |
CN105320739A * | 2015-09-22 | 2016-02-10 | 深圳市永兴元科技有限公司 | Information extraction method and apparatus |
CN106547786A * | 2015-09-22 | 2017-03-29 | 阿里巴巴集团控股有限公司 | A kind of date storage method and device |
CN105630916A * | 2015-12-21 | 2016-06-01 | 浙江工业大学 | Method for extracting and organizing unstructured sheet document data under big data environment |
CN105630916B * | 2015-12-21 | 2018-11-06 | 浙江工业大学 | Unstructured form document data pick-up and method for organizing under a kind of big data environment |
CN108415927A * | 2018-01-17 | 2018-08-17 | 中国科学院声学研究所 | A kind of restoring method and device of non-sequential table |
CN109388633A * | 2018-08-22 | 2019-02-26 | 盐城优易数据有限公司 | A kind of data cleaning method |
CN109388633B * | 2018-08-22 | 2021-09-28 | 盐城优易数据有限公司 | Data cleaning method |
CN111507075A * | 2019-01-31 | 2020-08-07 | 贵州白山云科技股份有限公司 | Method and device for data format conversion |
CN113065813A * | 2021-03-12 | 2021-07-02 | 云汉芯城(上海)互联网科技股份有限公司 | Material list processing method and device and computer storage medium |
Legal Events
Code | Title | Description |
---|---|---|
C06 | Publication | |
PB01 | Publication | |
C10 | Entry into substantive examination | |
SE01 | Entry into force of request for substantive examination | |
C02 | Deemed withdrawal of patent application after publication (patent law 2001) | |
WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20110921 |