CN102193940A - Method of carrying out characteristic analysis and data extraction on two-dimensional table - Google Patents

Info

Publication number
CN102193940A
Authority
CN
China
Prior art keywords
data
feature analysis
carried out
two-dimensional
data extraction
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN 201010123219
Other languages
Chinese (zh)
Inventor
黄晓东
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Longcheer Technology Co Ltd
Original Assignee
Shanghai Longcheer Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Longcheer Technology Co Ltd
Priority to CN 201010123219
Publication of CN102193940A
Legal status: Pending

Abstract

The invention discloses a method for performing feature analysis and data extraction on a two-dimensional table. A sample table that conforms to the required characteristics is analyzed and its data is extracted according to a specified key to form dictionary data, and the sample table can then be updated according to that dictionary data. The method meets the need to import the data of one table into another table field by field, for example importing the data of a translation file provided by a client into a translation file on a mobile phone platform.

Description

A method for performing feature analysis and data extraction on a two-dimensional table
Technical field
The present invention relates to the field of information technology, and specifically to a method for performing feature analysis and data extraction on a two-dimensional table.
Background
In practical work, one often needs to import the data of one table into another table field by field. If only a few records need updating, this can be done manually; but when the records number in the tens of thousands, the update has to be automated with a tool. The method of the present invention handles this problem well and lets the user freely choose the matching key, which greatly improves work efficiency. It has already been applied to updating the data of translation files on an embedded mobile phone platform.
Summary of the invention
In view of the deficiencies of the prior art, the object of the present invention is to provide a method for performing feature analysis and data extraction on a two-dimensional table. The method can significantly improve work efficiency and can be applied in a variety of situations.
A method for performing feature analysis and data extraction on a two-dimensional table, comprising a table conforming to certain characteristics, a visual program window, a feature analysis algorithm, a data extraction algorithm, and a data import algorithm.
The table conforming to certain characteristics is mainly characterized in that it contains multiple rows and columns of text, with rows separated by newline characters and columns separated by tab characters. Logically, the table can be divided into a table header and a table body; the header contains a number of fields, and the body contains the corresponding data.
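The table format described above can be illustrated with a minimal Python sketch. The function name `parse_table` and the sample translation data are hypothetical, and the sketch assumes the header is simply the first non-empty row; the patent does not state how the header is recognized.

```python
def parse_table(text):
    """Split tab-delimited text into (header_fields, body_rows)."""
    # Rows are separated by newlines; skip lines that are completely blank.
    lines = [line for line in text.splitlines() if line.strip()]
    if not lines:
        return [], []
    # Columns are separated by tabs. Assumption: the first row is the header.
    header = lines[0].split("\t")
    body = [line.split("\t") for line in lines[1:]]
    return header, body


# Hypothetical sample in the spirit of a translation file.
sample = "ID\tEnglish\tFrench\nSTR_OK\tOK\tD'accord\nSTR_CANCEL\tCancel\tAnnuler\n"
header, body = parse_table(sample)
```

Here `header` holds the field names and `body` holds one list of cell values per data row.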
The visual program window is mainly responsible for accepting the parameters entered by the user, performing feature analysis, displaying the fields found by the analysis as a list, and asking the user to select the matching key. It then performs data extraction and imports the extracted data into the other table, completing the data import.
The parameters entered by the user comprise the file paths of the two tables (table A and table B) and the matching key specified by the user.
The feature analysis algorithm is mainly responsible for automatically identifying the table header, analyzing the fields of the table, and forming a list of them.
The data extraction algorithm extracts data from the table body, mainly according to the key specified by the user, to form a dictionary data structure.
The dictionary data structure is defined in the form {key: value}, where key is a tuple of the form (key1, key2) and value is a list of the form [v1, v2, ...]. When data with the same key is encountered during extraction, the corresponding data is appended to the value list. The complete form of the data structure is: {(key1, key2): [v1, v2, v3, ...]}
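A minimal Python sketch of such a dictionary follows. The patent does not say what key1 and key2 stand for; this sketch assumes key1 is the row's value in the user-selected key column and key2 is the field name of the cell being stored. The function name `extract_to_dict` and the sample data are hypothetical.

```python
def extract_to_dict(header, body, key_field):
    """Build {(key_value, field_name): [values...]} from a parsed table."""
    key_index = header.index(key_field)
    data = {}
    for row in body:
        if len(row) <= key_index:
            continue  # row is too short to contain the key column
        for col, field in enumerate(header):
            if col == key_index or col >= len(row):
                continue
            # Rows that share the same key append to the same value list.
            data.setdefault((row[key_index], field), []).append(row[col])
    return data


header = ["ID", "English"]
body = [["STR_OK", "OK"], ["STR_OK", "Okay"], ["STR_NO", "No"]]
dictionary = extract_to_dict(header, body, "ID")
```

Because `STR_OK` appears twice, its list holds both values in the order they were read.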
The data import algorithm performs feature analysis on table B and extracts data from each of its rows, then uses the key to take the corresponding value from the dictionary data of table A, updates it into the dictionary data of table B, and finally restores the result to the text data of table B.
Description of drawings
Fig. 1 is a schematic diagram of the data structure used by the present invention for header feature analysis;
Fig. 2 is a schematic diagram of the one-dimensional data structure used by the present invention for extracting data;
Fig. 3 is an example table;
Fig. 4 is a schematic diagram of the data extracted by the present invention;
Fig. 5 is an example table;
Fig. 6 is a schematic diagram of the data extracted from the first row of the table in Fig. 5;
Fig. 7 is a schematic diagram of the data imported into the first row of the table in Fig. 5.
Embodiment
The present invention is further described below with reference to the accompanying drawings.
Performing feature analysis on one table, extracting its data, and importing that data into another table takes three steps:
Step 1: the program window first receives the table file path parameters entered by the user and, according to these parameters, pre-reads the first few rows of data from the two files. These data are split on tab characters to form the data structure shown in Fig. 1. The two sets of data are then compared to find the fields with the same names, and these fields are shown to the user as candidate keys.
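Step 1 can be sketched in Python as below. The function names, the default of five pre-read rows, and the UTF-8 encoding are assumptions for illustration, and the sketch again treats the first row of each file as its header.

```python
from itertools import islice


def read_leading_rows(path, count=5, encoding="utf-8"):
    """Pre-read the first `count` rows of a file and split each on tabs."""
    with open(path, encoding=encoding, newline="") as handle:
        return [line.rstrip("\r\n").split("\t") for line in islice(handle, count)]


def candidate_keys(path_a, path_b):
    """Return the field names that appear in the headers of both tables."""
    rows_a = read_leading_rows(path_a)
    rows_b = read_leading_rows(path_b)
    if not rows_a or not rows_b:
        return []
    fields_b = set(rows_b[0])
    # Keep the column order of table A in the list offered to the user.
    return [field for field in rows_a[0] if field in fields_b]
```

The returned list is what a program window could present to the user as selectable keys.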
Step 2: after selecting the key, the user clicks the Next button in the program window, and the program extracts the table data using the data structure shown in Fig. 2. Fig. 3 and Fig. 4 show an example table and the data extracted from it. The purpose of this extraction algorithm and data structure is to convert the two-dimensional table data into one-dimensional data, simplifying the data structure so that it can be conveniently imported into another two-dimensional table.
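The conversion between two-dimensional and one-dimensional data might look like the following Python sketch. The figures are not reproduced in this text, so the record layout (key value, field name, cell value) is an assumption, as are the function names `flatten_row` and `restore_row`.

```python
def flatten_row(header, row, key_field):
    """Turn one table row into a list of (key_value, field, value) records."""
    key_index = header.index(key_field)
    return [
        (row[key_index], field, value)
        for col, (field, value) in enumerate(zip(header, row))
        if col != key_index
    ]


def restore_row(header, records, key_field):
    """Rebuild a table row from its records (needs at least one record)."""
    key_value = records[0][0]
    by_field = {field: value for _, field, value in records}
    # Fields without a record are restored as empty cells.
    return [key_value if field == key_field else by_field.get(field, "")
            for field in header]


header = ["ID", "English", "French"]
row = ["STR_OK", "OK", "D'accord"]
records = flatten_row(header, row, "ID")
```

Each record carries its own key, so records from different tables can be matched without regard to column position, and `restore_row(header, records, "ID")` rebuilds the original row.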
Step 3: after the data has been generated, the user clicks the Import button in the program window, and the program extracts data from each record in the other table using the algorithm above; the example table and sample data are shown in Fig. 5 and Fig. 6. According to the key of each one-dimensional record, the program searches the data of Fig. 4 for a record with the same key and, if one is found, replaces the original value with the corresponding value, completing the import for that record; see Fig. 7 for an example. After all of a row's records have been imported, the one-dimensional data is restored to two-dimensional table form, and the next row of data is then processed in the same way until all data in the table has been processed.
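The import step can be sketched in Python as follows, under stated assumptions: the dictionary extracted from the first table is keyed by (key value, field name), and when a key holds several values the last one is used. The patent does not specify which value wins, so that choice is arbitrary, and the name `import_into_table` is hypothetical.

```python
def import_into_table(dict_a, text_b, key_field):
    """Update table B's text with values looked up in table A's dictionary."""
    lines = text_b.splitlines()
    if not lines:
        return text_b
    header = lines[0].split("\t")
    key_index = header.index(key_field)
    output = [lines[0]]
    for line in lines[1:]:
        row = line.split("\t")
        if len(row) > key_index:
            for col, field in enumerate(header[:len(row)]):
                if col == key_index:
                    continue
                values = dict_a.get((row[key_index], field))
                if values:
                    # Assumption: take the most recently extracted value.
                    row[col] = values[-1]
        # Restore the updated row to tab-delimited text.
        output.append("\t".join(row))
    return "\n".join(output) + "\n"


dict_a = {("STR_OK", "French"): ["D'accord"], ("STR_NO", "French"): ["Non"]}
text_b = "ID\tFrench\tGerman\nSTR_OK\t\tOK\nSTR_YES\tOui\tJa\n"
updated = import_into_table(dict_a, text_b, "ID")
```

Rows and fields with no match in the dictionary are left unchanged, so only cells that the first table actually covers are overwritten.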

Claims (8)

1. A method for performing feature analysis and data extraction on a two-dimensional table, characterized in that the method comprises a table conforming to certain characteristics, a visual program window, a feature analysis algorithm, a data extraction algorithm, and a data import algorithm.
2. The method for performing feature analysis and data extraction on a two-dimensional table according to claim 1, characterized in that the table conforming to certain characteristics mainly contains multiple rows and columns of text, with rows separated by newline characters and columns separated by tab characters; logically, the table can be divided into a table header and a table body, the header containing a number of fields and the body containing the corresponding data.
3. The method for performing feature analysis and data extraction on a two-dimensional table according to claim 1, characterized in that the visual program window is mainly responsible for accepting the parameters entered by the user, performing feature analysis, displaying the fields found by the analysis as a list, and asking the user to select the matching key (keyword); it then performs data extraction and imports the extracted data into the other table file, completing the data import.
4. The method for performing feature analysis and data extraction on a two-dimensional table according to claim 3, characterized in that the parameters entered by the user comprise the file paths and the matching key specified by the user.
5. The method for performing feature analysis and data extraction on a two-dimensional table according to claim 1, characterized in that the feature analysis algorithm is mainly responsible for automatically identifying the table header, analyzing the fields of the table, and forming a list of them.
6. The method for performing feature analysis and data extraction on a two-dimensional table according to claim 1, characterized in that the data extraction algorithm extracts data from the table body, mainly according to the key specified by the user, to form a dictionary data structure.
7. The method for performing feature analysis and data extraction on a two-dimensional table according to claim 6, characterized in that the dictionary data structure is defined in the form {key: value}, where key is a tuple of the form (key1, key2) and value is a list of the form [v1, v2, ...]; when data with the same key is encountered during extraction, the corresponding data is appended to the value list.
8. The method for performing feature analysis and data extraction on a two-dimensional table according to claim 1, characterized in that the data import algorithm performs feature analysis on the table of Fig. 5 and extracts data from each of its rows, then uses the key to take the corresponding value from the dictionary data of the table of Fig. 3, updates it into the dictionary data of the table of Fig. 5, and finally restores the updated one-dimensional data to the text data of the table of Fig. 5.
CN 201010123219 2010-03-11 2010-03-11 Method of carrying out characteristic analysis and data extraction on two-dimensional table Pending CN102193940A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN 201010123219 CN102193940A (en) 2010-03-11 2010-03-11 Method of carrying out characteristic analysis and data extraction on two-dimensional table

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN 201010123219 CN102193940A (en) 2010-03-11 2010-03-11 Method of carrying out characteristic analysis and data extraction on two-dimensional table

Publications (1)

Publication Number Publication Date
CN102193940A true CN102193940A (en) 2011-09-21

Family

ID=44602019

Family Applications (1)

Application Number Title Priority Date Filing Date
CN 201010123219 Pending CN102193940A (en) 2010-03-11 2010-03-11 Method of carrying out characteristic analysis and data extraction on two-dimensional table

Country Status (1)

Country Link
CN (1) CN102193940A (en)

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2015113301A1 (en) * 2014-01-30 2015-08-06 Microsoft Technology Licensing, Llc Automatic insights for spreadsheets
US10747950B2 (en) 2014-01-30 2020-08-18 Microsoft Technology Licensing, Llc Automatic insights for spreadsheets
CN103810153A (en) * 2014-02-17 2014-05-21 深圳市世纪安软信息技术有限公司 Temperature measurement form generation method and device for temperature measurement terminal and temperature measurement system
CN105320739A (en) * 2015-09-22 2016-02-10 深圳市永兴元科技有限公司 Information extraction method and apparatus
CN106547786A (en) * 2015-09-22 2017-03-29 阿里巴巴集团控股有限公司 A kind of date storage method and device
CN105630916A (en) * 2015-12-21 2016-06-01 浙江工业大学 Method for extracting and organizing unstructured sheet document data under big data environment
CN105630916B (en) * 2015-12-21 2018-11-06 浙江工业大学 Unstructured form document data pick-up and method for organizing under a kind of big data environment
CN108415927A (en) * 2018-01-17 2018-08-17 中国科学院声学研究所 A kind of restoring method and device of non-sequential table
CN109388633A (en) * 2018-08-22 2019-02-26 盐城优易数据有限公司 A kind of data cleaning method
CN109388633B (en) * 2018-08-22 2021-09-28 盐城优易数据有限公司 Data cleaning method
CN111507075A (en) * 2019-01-31 2020-08-07 贵州白山云科技股份有限公司 Method and device for data format conversion
CN113065813A (en) * 2021-03-12 2021-07-02 云汉芯城(上海)互联网科技股份有限公司 Material list processing method and device and computer storage medium

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C02 Deemed withdrawal of patent application after publication (patent law 2001)
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20110921