GB2470943A - Converting a large data file to a spreadsheet format to allow processing using remote procedure calls - Google Patents

Info

Publication number
GB2470943A
Authority
GB
United Kingdom
Prior art keywords
data
data file
data set
spreadsheet
server
Prior art date
Legal status
Withdrawn
Application number
GB0910038A
Other versions
GB0910038D0 (en
Inventor
Joseph Kilbride
Current Assignee
T B I REFUNDS IPR Ltd
Original Assignee
T B I REFUNDS IPR Ltd
Priority date
Filing date
Publication date
Application filed by T B I REFUNDS IPR Ltd
Priority to GB0910038A
Publication of GB0910038D0
Publication of GB2470943A
Legal status: Withdrawn

Classifications

    • G06F17/246
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/166 Editing, e.g. inserting or deleting
    • G06F40/177 Editing, e.g. inserting or deleting of tables; using ruled lines
    • G06F40/18 Editing, e.g. inserting or deleting of tables; using ruled lines of spreadsheets

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A method of processing a large data file between a server and a remote client, the file containing a number of data sets, each of which has a plurality of data items. The large data file is manipulated at the client by formatting the data file (5) into a spreadsheet format and arranging the data sets (7) into a spreadsheet. The spreadsheet is loaded into a calculation component (9), which scans each data set (11) and transmits a plurality of data items to the server, e.g. by performing an embedded SQL call or a remote procedure call. At the server, each data set is processed in a data set equation (19) using at least some of the data items and coefficients (15) retrieved from a data table (13) in a database. The resulting data set result is transmitted back to the calculation component on the client, which dynamically updates the spreadsheet. The client then combines the data set results (23) into a data file result and exports it.

Description

"A method of processing a large data file"
Introduction
This invention relates to a method of processing a large data file. More specifically, this invention relates to a method of processing a large data file between a server and a client, the large data file containing a large number of data sets each having a plurality of data items and requiring a number of distinct processing steps to be carried out on the data sets.
There is a constant desire to improve performance and reduce the processing times taken to process large data files. Numerous techniques have been proposed to reduce the processing times. One commonly used technique is to provide a faster processor that has the ability to perform a larger number of operations per time period. Another commonly used technique involves providing additional processors and spreading the processing load across a number of processors. Although effective in improving performance and reducing processing times, both of the above methods require additional capital expenditure which is undesirable.
It is an object of the present invention to provide a method of processing a large data file that overcomes at least some of the problems with the known methods.
Statements of Invention
According to the invention there is provided a computer implemented method of processing a large data file between a server and a remote client, the large data file containing a plurality of data sets, each data set comprising a plurality of data items, and in which each data set requires a calculation to be performed using the plurality of data items in the data set to produce a data set result, and in which a further calculation is performed using a plurality of the data set results to produce a data file result, the method comprising the steps of:
(a) the client structuring the plurality of data sets in the large data file in a spreadsheet format, the spreadsheet format comprising a plurality of rows and a plurality of columns, each data set populating a row and a plurality of columns of the spreadsheet format;
(b) the client loading the data file in spreadsheet format into a calculation component;
(c) for each data set, the calculation component scanning the data set and transmitting a plurality of data items to the server;
(d) for each data set, the server querying a data table stored in a database for appropriate coefficients to use in a data set equation based on one or more of the received data items from the data set;
(e) for each data set, the server retrieving those coefficients from the data table, inserting those coefficients and one or more data items of the data set into the data set equation;
(f) the server calculating a data set result at a database level by processing the data set equation;
(g) the server transmitting the data set result to the calculation component;
(h) the calculation component updating the large data file dynamically in the spreadsheet format by inserting the data set result in a column of the spreadsheet format; and
(i) the client combining the data set results into a data file result and exporting the data file result.
By having such a method, it will be possible to significantly reduce the processing time required to process the large data file without employing additional or alternative expensive resources. Firstly, on the client side, the data is structured into a spreadsheet format in which it may be more easily referenced and manipulated. Data received in a number of different formats may be quickly transformed into a single uniform format prior to processing of the large data file. This is due in part to the simplicity of manipulating the data once it is in the spreadsheet format, which speeds up and simplifies the handling of the data. Secondly, the calculations on the data items are performed at a database level by the server processing the data set equation, which improves the processing speed of the method. Finally, the large data file is updated dynamically in the spreadsheet format on the client side, which allows the data set results to be processed on the client side and further speeds up the processing of the large data file.
In one embodiment of the invention there is provided a computer implemented method in which the step of transmitting a plurality of data items to the server comprises performing an embedded SQL call. By implementing the method in this manner, the code being executed is not stored on the server database side and the data never has to reside on the server yet all calculations may be carried out on the server side.
In one embodiment of the invention there is provided a computer implemented method in which the step of transmitting a plurality of data items to the server comprises performing a remote procedure call. This is seen as useful as the function may run on the server side as opposed to the client side and the minimum amount of data is transferred between the client and the server side.
In another embodiment of the invention there is provided a computer implemented method in which the step of structuring the data sets in the large data file in a spreadsheet format further comprises rearranging the columns of the spreadsheet into a pre-selected spreadsheet format.
In a further embodiment of the invention there is provided a computer implemented method in which the step of structuring the data sets in the large data file in a spreadsheet format comprises structuring the data sets in an Excel® spreadsheet format.
In another embodiment of the invention there is provided a computer implemented method in which the method comprises the initial step of converting the large data file to a spreadsheet format by changing the file extension of the large data file from a .txt extension to a .xls extension.
In one embodiment of the invention there is provided a computer implemented method in which the method comprises the initial step of converting the large data file to a spreadsheet format by changing the file extension of the large data file from a .csv extension to a .xls extension.
In a further embodiment of the invention there is provided a computer implemented method in which the data file result is saved in a spreadsheet file format.
In another embodiment of the invention there is provided a computer implemented method in which the data file result is saved in an Excel® file format.
Detailed Description of the Invention
The invention will now be more clearly understood from the following description of some embodiments thereof given by way of example only with reference to the accompanying drawing, in which:
Fig. 1 is a flow diagram of the method according to the present invention.
Referring to the drawing, the method, indicated generally by the reference numeral 1, comprises the initial step 3 of providing a large data file for processing. The large data file comprises a plurality of data sets, each of which in turn comprises a plurality of data items. Typically, the large data file will comprise of the order of 100 to 100,000 data sets and each data set will comprise of the order of 4 to 100 data items. Once provided, the large data file is formatted in step 5. The formatting step comprises converting the large data file from a .txt format or other format into a spreadsheet format, in this case a Microsoft® Excel® .xls format.
Once formatted, the data sets of the large data file are structured in step 7 which comprises placing the data sets into a spreadsheet format having a plurality of rows and columns. Each data set occupies a row and a plurality of columns of the spreadsheet format and the data items populate a plurality of the columns. The spreadsheet format will depend on the number of data sets and the number of data items contained in the data set having the most data items. For example, if there are 1,000 data sets and the data set with the largest number of data items has 15 data items, the spreadsheet will have at least 1,000 rows and 15 columns. Other additional rows and columns may be provided for headings, numerical identifiers and the like. What is important is that all of the data sets are placed in an ordered manner in the spreadsheet and this will facilitate the manipulation of the data sets. The large data file in the spreadsheet format is then loaded into a calculation component in step 9.
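The structuring step 7 can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions: the function name `structure_data_sets`, the comma delimiter and the sample values are hypothetical and do not come from the specification, which describes an Excel-based implementation. The sketch places each data set on its own row and pads shorter rows so that the grid is as wide as the data set with the most data items.

```python
import csv
import io


def structure_data_sets(raw_text, delimiter=","):
    """Place each data set on its own row and pad rows to a common width.

    Hypothetical helper: the specification does not name such a function.
    """
    reader = csv.reader(io.StringIO(raw_text), delimiter=delimiter)
    rows = [row for row in reader if row]  # skip blank lines
    # The column count is set by the data set with the most data items.
    width = max((len(row) for row in rows), default=0)
    return [row + [""] * (width - len(row)) for row in rows]


# Invented sample input: two data sets with 4 and 5 data items.
raw = "FR,2009-03-01,goods,120.00\nDE,2009-04-12,services,80.00,extra\n"
grid = structure_data_sets(raw)
```

With this input, `grid` has two rows of five columns each, the first row padded with one empty cell.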
In step 11, the data sets are each scanned in order to ascertain the parameters, in this case coefficients, of the calculation that is to be carried out on at least some of the data items from that data set and a plurality of the data items are transmitted to the server.
The information regarding the correct coefficients to use is ascertained from one or more of the data items. In step 13, a data table on the server containing a plurality of coefficients is queried and in step 15 the appropriate coefficients are retrieved from the data table for use in a data set equation. In step 17, the data set equation is populated with the coefficients and one or more of the data items in the data set and in step 19, the data set equation is executed thereby producing a data set result. The result of the data set equation is populated dynamically into the large data file in the calculation component by adding the result to the other data in spreadsheet format in the calculation component. Effectively, another column containing the result is added to the large data file in the calculation component. The data set equation is executed at a database level and this contributes significantly to speeding up the processing of the large data file.
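Steps 13 to 19 can be sketched as a single parameterised SQL statement that both looks up the coefficients and evaluates the equation inside the database. The sketch below is an assumption-laden illustration: it uses Python's built-in `sqlite3` module as a stand-in for the PostgreSQL server, and the table name `coefficients`, its columns `scale` and `bias`, and the linear form of the equation are all hypothetical, since the specification does not define the data set equation.

```python
import sqlite3


def make_server_db():
    """Create an in-memory stand-in for the server-side data table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE coefficients ("
        "location TEXT PRIMARY KEY, scale REAL, bias REAL)"
    )
    # Invented coefficients, keyed by a data item (here a sensor location).
    conn.executemany(
        "INSERT INTO coefficients VALUES (?, ?, ?)",
        [("nose", 0.5, 1.0), ("tail", 2.0, 0.0)],
    )
    return conn


def data_set_result(conn, location, measurement):
    """Evaluate a hypothetical equation, measurement * scale + bias, in SQL.

    The arithmetic runs inside the database engine; only the data items
    travel to the database and only the result travels back.
    """
    row = conn.execute(
        "SELECT ? * scale + bias FROM coefficients WHERE location = ?",
        (measurement, location),
    ).fetchone()
    return None if row is None else row[0]


conn = make_server_db()
nose_result = data_set_result(conn, "nose", 10.0)
```

Because the query text is held by the caller and passed in with bound parameters, no code needs to be stored in the database itself, which matches the embedded SQL arrangement the description refers to.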
In step 21, a check is made to see if all of the data sets have been processed. If all of the data sets have been processed, the method proceeds to step 23. If all the data sets have not been processed, the steps 11 to 21 are repeated for the remaining data sets in the large data file until all data sets in the large data file have been processed. In step 23, the data set results of all the data set equations are combined into a data file result and the data file result is thereafter exported in step 25.
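The loop formed by steps 11 to 25 can be outlined as follows. This is a sketch only: `process_file` and `fake_server` are hypothetical names, the server call is replaced by a local stub, combining the data set results by summation is an assumption (the specification does not say how the results are combined), and CSV text is used as the export format purely for brevity.

```python
import csv
import io


def process_file(grid, compute_on_server):
    """Process every data set, append its result, then combine and export."""
    results = []
    for row in grid:  # the loop ending plays the role of the step 21 check
        result = compute_on_server(row)  # steps 11 to 19 happen remotely
        row.append(result)  # result goes into a new column of the row
        results.append(result)
    total = sum(results)  # step 23: summation is an assumed combination rule
    out = io.StringIO()
    csv.writer(out).writerows(grid + [["TOTAL", total]])  # step 25: export
    return total, out.getvalue()


def fake_server(row):
    # Hypothetical stand-in for the remote call: multiplies two data items.
    return row[1] * row[2]


grid = [["a", 2, 3], ["b", 4, 5]]
total, exported = process_file(grid, fake_server)
```

Passing the server call in as a function keeps the client-side loop independent of how the data items are actually transmitted.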
Typically, the large data file will be transmitted over a communications network, preferably through the internet, for processing. The large data file will be transmitted by a first party, the client, for partial processing by the second party, the server. The large data file content may be transferred through a web site or other dedicated portal. Once the second party has calculated the data file result, the second party will typically export the data file result to the first party over the communications network or alternatively they will make the result available to the first party by providing a link to the location of the data file result.
In one embodiment of the present invention, the large data file may already be provided in a spreadsheet format in which case, the structuring step may comprise either alone or in combination the steps of changing the provided spreadsheet format to an Excel® spreadsheet format and the step of reconfiguring the large data file into a uniform spreadsheet format suitable for processing.
It will be readily understood that the present invention could be applied in a wide range of activities where it is necessary to process large data files containing predominantly numeric data in an efficient manner. For example, one could envisage that the present invention could be used in an aerodynamics environment where a plurality of sensors is arranged about a body being tested. Each of the sensors periodically gathers measurements such as wind speed, direction, pressure and the like and the measurements from all of the sensors over a period of time are stored as data items in a large data file. In order to evaluate the aerodynamic performance of the body, the data from each of the sensors has to be individually analysed before the overall aerodynamic performance of the body can be determined. Depending on the location of the sensor about the body (a data item could identify the location), specific coefficients may be used in a data set equation to determine the impact of the measurements at that location on the aerodynamic profile of the body. The correct coefficients for the equation could then be obtained from the data table and those coefficients used along with the measurements from the sensor in the data set equation to determine drag or other property at that point on the body. All of the results could then be combined together to provide an overall result specifying the aerodynamic profile of the body.
Alternatively, it could be seen how the invention could be used in other areas where numerous calculations on numeric data must be carried out such as in an instrument to determine refunds to individuals where the individuals have been abroad and are entitled to a refund of certain taxes that have been paid while abroad. Depending on where the taxes were paid (a data item could identify the location) and when the taxes were paid (another data item could identify the date of payment), the individual may be entitled to a refund of different levels of tax and appropriate coefficients could be retrieved from a data table for insertion into a data set equation. Furthermore, different coefficients may apply to different types of goods or services as different tax rates would apply (again, a data item could be indicative of the good or service). The appropriate refund of tax for each purchase could be ascertained by inserting the data items and appropriate coefficients in a data set equation to obtain a data set result before an overall refund is calculated by combining the data set results.
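The refund example can be sketched as a coefficient lookup keyed on several data items. Everything specific in the sketch is an assumption made for illustration: the `RATES` table, the country codes, the dates and the rates are invented, and the data set equation is taken to be a simple product of the tax paid and a refund rate, which the specification does not state.

```python
from datetime import date

# Hypothetical coefficient table; every rate below is invented.
# Key: (country, category). Value: (valid_from, refund_rate) pairs,
# sorted by valid_from in ascending order.
RATES = {
    ("FR", "goods"): [(date(2008, 1, 1), 0.10), (date(2009, 1, 1), 0.12)],
    ("DE", "services"): [(date(2008, 1, 1), 0.05)],
}


def lookup_rate(country, category, paid_on):
    """Return the latest rate in force on the payment date, or 0.0 if none."""
    entries = RATES.get((country, category), [])
    applicable = [rate for start, rate in entries if start <= paid_on]
    return applicable[-1] if applicable else 0.0


def refund_for_purchase(purchase):
    """Assumed data set equation: tax paid multiplied by the refund rate."""
    country, paid_on, category, tax_paid = purchase
    return tax_paid * lookup_rate(country, category, paid_on)


purchases = [
    ("FR", date(2009, 6, 1), "goods", 100.0),
    ("DE", date(2009, 2, 3), "services", 200.0),
]
per_purchase = [refund_for_purchase(p) for p in purchases]
overall_refund = sum(per_purchase)  # combination by summation is assumed
```

Here the location, the payment date and the type of good or service are the data items that select the coefficient, and the per-purchase results are combined into one overall figure.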
Again, the above implementations are only representative of two areas that would particularly benefit from a method according to the invention and many other fields of endeavour would also benefit. The invention has been implemented using PostgreSQL® v8.3.0 for the database on the server side and a Visual Basic® 6 programming environment on the client side incorporating DynamiCube® version 3.0 and Farpoint Spread® Version 3.5 components. The step of transmitting a plurality of data items to the server comprises performing an embedded SQL call. In this way, no software code has to reside on the database server side. In addition to the embedded SQL call, a remote procedure call may be used to transfer one or more data items to the server to reference data stored in the PostgreSQL database. It can be further appreciated that according to the present invention, although the data never resides on the server, a significant portion of the calculations may be carried out on the server side.
Using this implementation, speed-up figures for processing a large data file of the order of a factor of two have been achieved, thereby decreasing the processing time by half.
What is important is that the invention is particularly useful where there are a large number of calculations to be carried out on a large amount of data and the calculations may be carried out in an efficient manner with the minimum of resources available. By arranging the data sets in a spreadsheet format and then carrying out the necessary calculations at a database level dynamically in the spreadsheet format, this is achieved.
In this specification the terms "comprise, comprises, comprised and comprising" and the terms "include, includes, included and including" are all deemed totally interchangeable and should be afforded the widest possible interpretation.
The invention is in no way limited to the embodiment hereinbefore described but may be varied in both construction and detail within the scope of the specification.
GB0910038A 2009-06-11 2009-06-11 Converting a large data file to a spreadsheet format to allow processing using remote procedure calls Withdrawn GB2470943A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
GB0910038A GB2470943A (en) 2009-06-11 2009-06-11 Converting a large data file to a spreadsheet format to allow processing using remote procedure calls

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
GB0910038A GB2470943A (en) 2009-06-11 2009-06-11 Converting a large data file to a spreadsheet format to allow processing using remote procedure calls

Publications (2)

Publication Number Publication Date
GB0910038D0 GB0910038D0 (en) 2009-07-22
GB2470943A true GB2470943A (en) 2010-12-15

Family

ID=40937230

Family Applications (1)

Application Number Title Priority Date Filing Date
GB0910038A Withdrawn GB2470943A (en) 2009-06-11 2009-06-11 Converting a large data file to a spreadsheet format to allow processing using remote procedure calls

Country Status (1)

Country Link
GB (1) GB2470943A (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH0668032A (en) * 1992-08-19 1994-03-11 Toshiba Corp Data base system
US5734889A (en) * 1993-07-29 1998-03-31 Nec Corporation Method and apparatus for retrieving data and inputting retrieved data to spreadsheet including descriptive sentence input means and natural language interface means
US20050039114A1 (en) * 2003-07-16 2005-02-17 Oracle International Corporation Spreadsheet to SQL translation
US20050267868A1 (en) * 1999-05-28 2005-12-01 Microstrategy, Incorporated System and method for OLAP report generation with spreadsheet report within the network user interface
EP1622016A2 (en) * 2004-07-30 2006-02-01 Microsoft Corporation Method, System, and Apparatus for Providing Access to Workbook Models Through Remote Function Calls
EP1672582A1 (en) * 2004-12-20 2006-06-21 Microsoft Corporation Processing real time data using a spreadsheet application on a server

Also Published As

Publication number Publication date
GB0910038D0 (en) 2009-07-22

Similar Documents

Publication Publication Date Title
Avolio et al. A comprehensive approach to analyzing community dynamics using rank abundance curves
Yuan et al. Global-scale latitudinal patterns of plant fine-root nitrogen and phosphorus
Woodall et al. Methods and equations for estimating aboveground volume, biomass, and carbon for trees in the US forest inventory, 2010
Heilmann‐Clausen et al. Communities of wood‐inhabiting bryophytes and fungi on dead beech logs in Europe–reflecting substrate quality or shaped by climate and forest conditions?
Pommier et al. Applying FAIR principles to plant phenotypic data management in GnpIS
Heikkinen et al. Testing hypotheses on shape and distribution of ecological response curves
Walk et al. Modelling applicability of fractal analysis to efficiency of soil exploration by roots
Lai et al. Age-related trends in genetic parameters for Larix kaempferi and their implications for early selection
Schaefer et al. Oribatid mites show that soil food web complexity and close aboveground-belowground linkages emerged in the early Paleozoic
Tavankar et al. Soil natural recovery process and Fagus orientalis lipsky seedling growth after timber extraction by wheeled skidder
CASTELAN‐ESTRADA et al. Allometric Relationships to Estimate Seasonal Above‐ground Vegetative and Reproductive Biomass of Vitis vinifera L.
Martins et al. Fine roots stimulate nutrient release during early stages of leaf litter decomposition in a Central Amazon rainforest
Lindner et al. Proposal of a unified biodiversity impact assessment method
Spinelli et al. Decreasing the fuel consumption and CO2 emissions of excavator-based harvesters with a machine control system
Williams et al. Nitrogen use efficiency in parent vs. hybrid canola under varying nitrogen availabilities
Manschadi et al. Full parameterisation matters for the best performance of crop models: inter-comparison of a simple and a detailed maize model
Yan Estimation of the optimal number of replicates in crop variety trials
McMahon et al. Management intensification maintains wood production over multiple harvests in tropical Eucalyptus plantations
Rotundo et al. Development of a decision-making application for optimum soybean and maize fertilization strategies in Mato Grosso
Melesse et al. Variation in growth potential between hybrid clones of Eucalyptus trees in eastern South Africa
Wang et al. Effects of nutrient heterogeneity on root foraging and plant growth at the individual and community level
Schäfer et al. Modeling root loss reveals impacts on nutrient uptake and crop development
Bueno-López et al. Nonlinear mixed model approaches to estimating merchantable bole volume for Pinus occidentalis
Valadares et al. Modeling rhizosphere carbon and nitrogen cycling in Eucalyptus plantation soil
JP4866090B2 (en) Chart creation device, program

Legal Events

Date Code Title Description
WAP Application withdrawn, taken to be withdrawn or refused ** after publication under section 16(1)