CN104376055B - A large-model data comparison method based on fragmentation - Google Patents

Info

Publication number
CN104376055B
Authority
CN
China
Prior art keywords
burst
record
data
records
num
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201410614042.4A
Other languages
Chinese (zh)
Other versions
CN104376055A (en)
Inventor
王昌频
王飞
季惠英
季学纯
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
State Grid Jiangsu Electric Power Co Ltd
Nari Technology Co Ltd
Original Assignee
State Grid Jiangsu Electric Power Co Ltd
Nari Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by State Grid Jiangsu Electric Power Co Ltd, Nari Technology Co Ltd
Priority to CN201410614042.4A
Publication of CN104376055A
Application granted
Publication of CN104376055B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Fuzzy Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a large-model data comparison method based on fragmentation, comprising the following steps: setting a fragmentation parameter; retrieving all keys of the reference data source, sorting them in ascending order, and storing them in a key array; calculating the number of fragments fragment_num and the number of records in each fragment, then obtaining the first and last key values of each fragment in order from the key array; starting one worker thread per fragment, each of which fetches the corresponding data content from the reference data source and from the data source to be compared; having each worker thread compare the data content assigned to it row by row and record the difference results; and, after all worker threads have finished, obtaining fragment_num difference results and merging them into the final difference result. Applied between two systems or two databases, the invention substantially improves the efficiency of comparing large-model data.

Description

A large-model data comparison method based on fragmentation
Technical field
The present invention relates to a large-model data comparison method based on fragmentation, and belongs to the technical field of distribution network model management in power system automation.
Background art
Distribution network models contain a large amount of data, and a single model table may hold on the order of a million records. For tables of this size, the traditional single-workflow comparison can take a long time.
Summary of the invention
In view of the deficiencies of the prior art, the object of the present invention is to provide a large-model data comparison method based on fragmentation that, when applied between two systems or two databases, substantially improves the efficiency of comparing large-model data.
To achieve the above object, the present invention adopts the following technical solution:
The large-model data comparison method based on fragmentation of the present invention specifically comprises the following steps:
(1) Set the fragmentation parameter. Two setting modes are supported: by record count and by data block size. If the parameter is set by data block size, let the data block size be m, the length of each record in the data source to be compared be k, and the maximum number of records contained in each fragment be n; then n = m/k. If the parameter is set by record count, n is the maximum number of records contained in each fragment;
(2) Retrieve all keys of the reference data source, sort them in ascending order, and store them in a key array whose size is the total number of records record_sum in the reference data source;
(3) Calculate the number of fragments fragment_num and the number of records in each fragment, then obtain the first and last key values of each fragment in order from the key array; this yields the fragment information;
fragment_num = record_sum/n + (record_sum%n != 0)
If the total number of records record_sum is an integer multiple of n, every fragment contains n records;
If record_sum is not an integer multiple of n, each of the first fragment_num-1 fragments contains n records and the remaining records are placed in the last fragment;
(4) Start one worker thread for each fragment; according to its fragment information, each worker thread fetches the corresponding data content from the reference data source and from the data source to be compared;
(5) Each worker thread compares the data content assigned to it row by row and field by field, and records the difference results;
(6) After all worker threads have finished, fragment_num difference results are obtained; these are merged in ascending key order into a single result, which is the final difference result.
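Steps (1) to (3) can be sketched in Python as follows. This is a minimal illustration rather than the patented implementation: the names `Fragment`, `records_per_fragment`, and `plan_fragments` are hypothetical, keys are assumed to be integers held in memory, and n is truncated to an integer as the claims describe.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass(frozen=True)
class Fragment:
    """Fragment information (hypothetical structure, not from the patent)."""
    first_key: int  # smallest key in the fragment
    last_key: int   # largest key in the fragment
    count: int      # number of records in the fragment


def records_per_fragment(record_count: Optional[int] = None,
                         block_size: Optional[int] = None,
                         record_length: Optional[int] = None) -> int:
    """Step (1): return n, the maximum number of records per fragment."""
    if record_count is not None:
        n = int(record_count)              # mode 1: set by record count
    elif block_size is not None and record_length is not None:
        n = block_size // record_length    # mode 2: n = m / k, truncated
    else:
        raise ValueError("give record_count, or block_size and record_length")
    if n < 1:
        raise ValueError("each fragment must hold at least one record")
    return n


def plan_fragments(keys: Sequence[int], n: int) -> List[Fragment]:
    """Steps (2) and (3): sort the keys and derive each fragment's bounds."""
    key_array = sorted(keys)               # ascending key array
    record_sum = len(key_array)
    # Integer division plus 1 when a remainder exists (True counts as 1).
    fragment_num = record_sum // n + (record_sum % n != 0)
    fragments = []
    for i in range(fragment_num):
        chunk = key_array[i * n:(i + 1) * n]
        fragments.append(Fragment(chunk[0], chunk[-1], len(chunk)))
    return fragments
```

For example, ten records with n = 4 give three fragments: two with four records each and a last one holding the remaining two.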
Each difference result comprises a difference flag and a difference content description. The difference flag takes one of three values: insert, update, or delete. If a record is absent from the data source to be compared but present in the reference data source, the flag is insert; if a record is present in the data source to be compared but absent from the reference data source, the flag is delete; if a record has the same key in both data sources but different content, the flag is update. The difference content description consists of the corresponding data records in the reference data source and the data source to be compared.
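The three flags can be illustrated with a short Python sketch. It assumes, purely for illustration, that the records of one fragment have already been loaded into dictionaries mapping each key to a record; `Difference` and `compare_fragment` are hypothetical names.

```python
from typing import Dict, List, NamedTuple, Optional

Record = Dict[str, object]  # field name -> value (assumed record layout)


class Difference(NamedTuple):
    key: int
    flag: str                    # "insert", "update" or "delete"
    reference: Optional[Record]  # record in the reference data source
    target: Optional[Record]     # record in the data source to be compared


def compare_fragment(reference: Dict[int, Record],
                     target: Dict[int, Record]) -> List[Difference]:
    """Compare one fragment and return its differences in key order."""
    differences = []
    for key in sorted(reference.keys() | target.keys()):
        ref_row = reference.get(key)
        tgt_row = target.get(key)
        if tgt_row is None:
            # Present only in the reference: must be inserted into the target.
            differences.append(Difference(key, "insert", ref_row, None))
        elif ref_row is None:
            # Present only in the target: must be deleted from the target.
            differences.append(Difference(key, "delete", None, tgt_row))
        elif ref_row != tgt_row:
            # Same key, at least one field differs.
            differences.append(Difference(key, "update", ref_row, tgt_row))
    return differences
```

Dictionary inequality compares the records field by field, which matches the field-wise comparison described in step (5).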
By offering both setting modes, by record count and by data block size, the invention keeps fragmentation flexible. Fragments are divided by key, which ensures that they are disjoint and complete, and therefore that the comparison is free of redundancy and the difference results are complete. Multiple worker threads read and compare data concurrently, each according to its own fragment information, so the comparison work proceeds in parallel and overall comparison efficiency improves. The difference flag and difference content description record how each record of the data source to be compared differs from the reference data source, so the SQL statements needed for synchronization can be assembled easily from the difference results.
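The patent does not show how the synchronization SQL is assembled; one possible way is sketched below. The function name `to_sql`, the table and column names, and the `?` placeholder style (as used by Python's sqlite3 module) are all assumptions of this sketch.

```python
from typing import Dict, List, Optional, Tuple

Row = Dict[str, object]               # column name -> value
Statement = Tuple[str, List[object]]  # (SQL text, bound parameters)


def to_sql(table: str, key_column: str, key: object, flag: str,
           reference_row: Optional[Row]) -> Statement:
    """Turn one difference into a statement that brings the data source
    to be compared in line with the reference data source.

    Table and column names are interpolated into the SQL text, so they
    must come from trusted configuration; values are bound as parameters.
    """
    if flag == "insert":
        columns = list(reference_row)
        placeholders = ", ".join("?" for _ in columns)
        sql = (f"INSERT INTO {table} ({', '.join(columns)}) "
               f"VALUES ({placeholders})")
        return sql, [reference_row[c] for c in columns]
    if flag == "update":
        columns = [c for c in reference_row if c != key_column]
        assignments = ", ".join(f"{c} = ?" for c in columns)
        sql = f"UPDATE {table} SET {assignments} WHERE {key_column} = ?"
        return sql, [reference_row[c] for c in columns] + [key]
    if flag == "delete":
        return f"DELETE FROM {table} WHERE {key_column} = ?", [key]
    raise ValueError(f"unknown difference flag: {flag}")
```

Only the reference record is needed to build each statement, because the reference data source is the baseline the other data source is synchronized to.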
Brief description of the drawings
Fig. 1 is a workflow diagram of the fragmentation-based large-model data comparison method of the present invention.
Embodiment
To make the technical means, inventive features, objects, and effects of the present invention easy to understand, the invention is further described below with reference to a specific embodiment.
The present invention is a large-model data comparison method based on fragmentation. Model tables in a distribution network system generally have a key, which makes fragment-by-key comparison possible. The invention mainly targets model data tables with many records: a fragmentation parameter is set as needed, fragment contents are fetched from the reference data source and the data source to be compared according to that parameter, multiple fragments are compared simultaneously, and the difference results are finally obtained. Each difference result consists of a difference flag and a difference content description; the flag is one of insert, update, or delete, and the description holds the content of the corresponding records in the reference data source and the data source to be compared. Difference results are generated for the data source to be compared, taking the reference data source as the baseline.
Referring to Fig. 1, the method specifically comprises the following steps:
(1) Specify the data sources to compare and the model table to be compared. Supported data source types include databases and data files; the model table must have a key. Set the fragmentation parameter as needed, either by record count or by data block size.
If the fragmentation parameter is set by data block size, let the data block size be m, the length of each record in the table be k, and the maximum number of records in each fragment be n; then n = m/k. If the parameter is set by record count, n is the configured value.
(2) Obtain all keys of the reference data source, sort them in ascending order, and store them in a key array whose size is the total number of records record_sum in the reference data source.
Using the key array and the fragmentation parameter, obtain the fragment information, which includes the number of fragments and the first and last key values of each fragment.
(3) Calculate the number of fragments and the number of records in each fragment, then obtain the first and last key values of each fragment in order from the key array;
The number of fragments fragment_num is:
fragment_num = record_sum/n + (record_sum%n != 0)
If the total number of records record_sum is an integer multiple of n, every fragment contains n records;
If record_sum is not an integer multiple of n, each of the first fragment_num-1 fragments contains n records and the remaining records are placed in the last fragment.
(4) Start one worker thread for each fragment, which fetches the corresponding content of the reference data source and the data source to be compared according to its fragment information;
(5) Each worker thread compares the data content assigned to it row by row and records the difference results. Each difference result comprises a difference flag and a difference content description; the flag is one of insert, update, or delete, and the description holds the content of the corresponding records in the reference data source and the data source to be compared.
(6) After all worker threads have finished comparing, fragment_num difference results are obtained; these are merged into a single result, which is the final difference result.
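The six steps above can be sketched end to end in Python. Everything named here (`Source`, `plan_ranges`, `fetch`, `compare_range`, `compare_sources`) is hypothetical, and in-memory dictionaries stand in for the two data sources. One assumption goes beyond the text: each fragment's key range is extended to cover the gap up to the next fragment's first key, and the first and last ranges are left open-ended, so that records existing only in the data source to be compared are not missed.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Optional, Tuple

Row = Dict[str, object]
Source = Dict[int, Row]                      # key -> record; stands in for a table
KeyRange = Tuple[Optional[int], Optional[int]]  # [low, high); None = unbounded
Difference = Tuple[int, str]                 # (key, flag)


def plan_ranges(key_array: List[int], n: int) -> List[KeyRange]:
    """Derive one key range per fragment from the sorted reference keys."""
    starts = key_array[::n]                  # first key of every fragment
    ranges = []
    for i, start in enumerate(starts):
        low = None if i == 0 else start
        high = starts[i + 1] if i + 1 < len(starts) else None
        ranges.append((low, high))
    return ranges or [(None, None)]          # empty reference: one open range


def fetch(source: Source, low: Optional[int], high: Optional[int]) -> Source:
    """Read the records of one fragment (a range query in a real database)."""
    return {k: v for k, v in source.items()
            if (low is None or k >= low) and (high is None or k < high)}


def compare_range(reference: Source, target: Source,
                  low: Optional[int], high: Optional[int]) -> List[Difference]:
    """Work done by one thread: fetch both sides and compare row by row."""
    ref_rows = fetch(reference, low, high)
    tgt_rows = fetch(target, low, high)
    result = []
    for key in sorted(ref_rows.keys() | tgt_rows.keys()):
        if key not in tgt_rows:
            result.append((key, "insert"))
        elif key not in ref_rows:
            result.append((key, "delete"))
        elif ref_rows[key] != tgt_rows[key]:
            result.append((key, "update"))
    return result


def compare_sources(reference: Source, target: Source, n: int) -> List[Difference]:
    """Compare two sources with one worker thread per fragment."""
    ranges = plan_ranges(sorted(reference), n)
    with ThreadPoolExecutor(max_workers=len(ranges)) as pool:
        partial_results = list(pool.map(
            lambda r: compare_range(reference, target, r[0], r[1]), ranges))
    # Merge the per-fragment results into one list ordered by key.
    merged = [d for part in partial_results for d in part]
    merged.sort(key=lambda d: d[0])
    return merged
```

In CPython, threads mainly help when fetching is I/O-bound, as it is when the fragments are read from a database; a CPU-bound comparison of in-memory data would need processes instead to run in parallel.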
The working principle of the present invention is as follows:
The invention mainly targets large-model data tables with many records that have the same structure in different systems or different databases, comparing them and obtaining the difference results. Fragment information is derived from the keys of the reference data source; fragment contents are then fetched from the reference data source and the data source to be compared, multiple fragments are compared simultaneously, and the difference results are finally obtained. The method implements a fragment comparison technique that greatly improves the efficiency of comparing large-model data with many records.
Offering both setting modes, by record count and by data block size, ensures a degree of flexibility in fragmentation. Fragments are divided by key, which ensures that they are disjoint and complete, and therefore that the comparison is free of redundancy and the difference results are complete. Multiple worker threads read and compare data concurrently, each according to its own fragment information, so the comparison work proceeds in parallel and overall comparison efficiency improves. The difference flag and difference content description record how each record of the data source to be compared differs from the reference data source, so the SQL statements needed for synchronization can be assembled easily from the difference results.
Using the method for the present invention, large-sized model table is compared using allocation methods, relative efficiency can be greatly improved. In the case of not considering machine performance and resource occupation, burst number is close to speed higher than burst premise after burst.
The basic principles, main features, and advantages of the present invention have been shown and described above. Those skilled in the art should understand that the invention is not limited to the above embodiment; the embodiment and the description merely illustrate the principles of the invention. Various changes and improvements may be made without departing from the spirit and scope of the invention, and all such changes and improvements fall within the scope of the claimed invention. The scope of protection claimed is defined by the appended claims and their equivalents.

Claims (2)

1. A large-model data comparison method based on fragmentation, characterized in that it specifically comprises the following steps:
(1) Set the fragmentation parameter, which supports two setting modes: by record count and by data block size;
If the fragmentation parameter is set by data block size, let the data block size be m, the length of each record in the data source to be compared be k, and the maximum number of records contained in each fragment be n; then n = m/k;
If the fragmentation parameter is set by record count, n is the maximum number of records contained in each fragment;
In both setting modes, the value of n is converted to an integer by truncation;
(2) Retrieve all keys of the reference data source, sort them in ascending order, and store them in a key array whose size is the total number of records record_sum in the reference data source;
(3) Calculate the number of fragments fragment_num and the number of records in each fragment, then obtain the first and last key values of each fragment in order from the key array; this yields the fragment information;
If the total number of records record_sum is an integer multiple of n, every fragment contains n records and the number of fragments is calculated as: fragment_num = record_sum/n;
If record_sum is not an integer multiple of n, each of the first fragment_num-1 fragments contains n records, the remaining records are placed in the last fragment, and the number of fragments is calculated as:
fragment_num = record_sum/n + 1, where the value of record_sum/n is converted to an integer by truncation;
(4) Start one worker thread for each fragment; according to its fragment information, each worker thread fetches the corresponding data content from the reference data source and from the data source to be compared;
(5) Each worker thread compares the data content assigned to it row by row and field by field, and records the difference results;
(6) After all worker threads have finished, fragment_num difference results are obtained; these are merged in ascending key order into a single result, which is the final difference result.
2. The large-model data comparison method based on fragmentation according to claim 1, characterized in that
each difference result comprises a difference flag and a difference content description;
The difference flag takes one of three values: insert, update, or delete. If a record is absent from the data source to be compared but present in the reference data source, the flag is insert; if a record is present in the data source to be compared but absent from the reference data source, the flag is delete; if a record has the same key in the data source to be compared and the reference data source but different content, the flag is update;
The difference content description consists of the corresponding data records in the reference data source and the data source to be compared.
CN201410614042.4A 2014-11-04 2014-11-04 A large-model data comparison method based on fragmentation Active CN104376055B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410614042.4A CN104376055B (en) 2014-11-04 2014-11-04 A large-model data comparison method based on fragmentation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410614042.4A CN104376055B (en) 2014-11-04 2014-11-04 A large-model data comparison method based on fragmentation

Publications (2)

Publication Number Publication Date
CN104376055A CN104376055A (en) 2015-02-25
CN104376055B true CN104376055B (en) 2017-08-29

Family

ID=52554962

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410614042.4A Active CN104376055B (en) 2014-11-04 2014-11-04 A large-model data comparison method based on fragmentation

Country Status (1)

Country Link
CN (1) CN104376055B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106033427A (en) * 2015-03-11 2016-10-19 阿里巴巴集团控股有限公司 A sampling data verification method and device
CN105843886A (en) * 2016-03-21 2016-08-10 国电南瑞科技股份有限公司 Multi-thread based power grid offline model data query method
CN106777337A (en) * 2017-01-13 2017-05-31 山东浪潮商用系统有限公司 The management method of data model

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1652116A (en) * 2005-03-29 2005-08-10 威盛电子股份有限公司 Database synchronous system and method
CN101236554A (en) * 2007-11-29 2008-08-06 中兴通讯股份有限公司 Database mass data comparison process
CN102467570A (en) * 2010-11-17 2012-05-23 日电(中国)有限公司 Connection query system and method for distributed data warehouse

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1708096A1 (en) * 2005-03-31 2006-10-04 Ubs Ag Computer Network System and Method for the Synchronisation of a Second Database with a First Database

Also Published As

Publication number Publication date
CN104376055A (en) 2015-02-25

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant