CN104268136A - Record grouping method and device - Google Patents

Record grouping method and device

Info

Publication number
CN104268136A
Authority
CN
China
Prior art keywords
record
user
logical command
algorithm
original record
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201310341709.3A
Other languages
Chinese (zh)
Inventor
边旭
贾西贝
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Huaao Data Technology Co Ltd
Original Assignee
Shenzhen Huaao Data Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Huaao Data Technology Co Ltd
Priority to CN201310341709.3A
Publication of CN104268136A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G06F16/355 Class or cluster creation or modification

Abstract

The invention provides a record grouping method. The method includes: acquiring an original record set; acquiring a logical command that a user inputs by writing a script; generating intermediate record sets from the original record set according to the logical command; and generating a final record pair set from the intermediate record sets through a record pair generation algorithm. An embodiment of the invention further provides a record grouping device. The user can input, through the MRL (markup recipe language), a logical command better suited to the original record set, so that the original record set is grouped faster according to the logical command and grouping efficiency is improved.

Description

A record grouping method and device
Technical Field
The present application relates to the technical field of data matching, and in particular to a record grouping method and device.
Background
In the course of its production activities an enterprise generates a large amount of business data, yet in practice it is difficult for the enterprise to guarantee the quality of the data it accumulates. The causes are varied: data-entry errors, missing integrity constraints, multiple ways of describing the same information, and so on. More complicated still, separate data sources may differ not only in the values they use to describe the same entity but also in their storage structures and in their basic assumptions about the data. The production activities of an enterprise are in turn based on its data: from market analysis and decision-making down to routine business queries, all are operations on business data. Clearly, if the quality of the accumulated data cannot be guaranteed, neither can the operations performed on it. To enable an enterprise to deduplicate its business data, an efficient, accurate and automatic matching method is needed that quickly and accurately finds the different records describing the same entity.
To solve the above problem, the prior art proposes a record matching method that mainly comprises: first grouping a standardized record set to obtain a record pair set; then passing the record pair set through comparison and decision algorithms to obtain the matched record pairs.
In this record matching method, the grouping process turns the original record set into a record pair set by means of a preset algorithm. That is, the algorithm on which the grouping process relies is fixed in advance in the underlying layer and cannot be changed, so an algorithm optimized for the record sets of users in different industries cannot be configured. This is inconvenient to use and may also reduce grouping efficiency.
Summary of the Invention
The technical problem to be solved by the present application is to provide a record grouping method that can perform the grouping calculation on a record set according to a grouping algorithm input by the user, so as to obtain a record pair set.
Correspondingly, the present application also provides a record grouping device.
To solve the above problem, the present application discloses a record grouping method, comprising:
acquiring an original record set;
acquiring a logical command that a user inputs by writing a script;
generating intermediate record sets from the original record set according to the logical command;
generating a final record pair set from the intermediate record sets through a record pair generation algorithm.
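The four steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation described by the application: the user's logical command is modelled as an ordinary Python callable, and the names `group_records`, `split_by_no` and `pairs_within` are hypothetical.

```python
from itertools import combinations
from typing import Callable, Dict, List, Optional, Set, Tuple

Record = Dict[str, Optional[str]]
Pair = Tuple[int, int]  # indices of two records in the original record set


def group_records(
    records: List[Record],
    logical_command: Callable[[List[Record]], Dict[str, List[int]]],
    pair_generator: Callable[[Dict[str, List[int]]], Set[Pair]],
) -> Set[Pair]:
    """Build the intermediate record sets, then the final record pair set."""
    intermediate = logical_command(records)  # intermediate record sets
    return pair_generator(intermediate)      # final record pair set


def split_by_no(records: List[Record]) -> Dict[str, List[int]]:
    """Hypothetical logical command: split on whether NO is present."""
    return {
        "with_no": [i for i, r in enumerate(records) if r["NO"] is not None],
        "without_no": [i for i, r in enumerate(records) if r["NO"] is None],
    }


def pairs_within(subsets: Dict[str, List[int]]) -> Set[Pair]:
    """Hypothetical pair generator: pair records inside each subset only."""
    pairs: Set[Pair] = set()
    for members in subsets.values():
        pairs.update(combinations(sorted(members), 2))
    return pairs


# Acquire the original record set, then run the user-supplied logic on it.
records = [{"NO": "H1"}, {"NO": None}, {"NO": "H2"}, {"NO": None}]
final_pairs = group_records(records, split_by_no, pairs_within)
```

Passing the logical command in as a parameter mirrors the point of the method: the grouping logic is supplied by the user rather than fixed in the underlying layer.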
Further, acquiring the original record set comprises:
acquiring an original record set that has undergone standardization.
Further, acquiring the logical command that the user inputs by writing a script comprises:
acquiring a logical command that the user states in the form of conditional expressions; the conditional expressions include, but are not limited to, arithmetic operations, comparison operations, Boolean operations and string functions.
Further, acquiring the logical command that the user inputs by writing a script comprises:
acquiring a logical command that the user inputs in the preset MRL language.
Further, generating the final record pair set from the intermediate record sets through the record pair generation algorithm comprises:
generating the final record pair set from the intermediate record sets through set operations, the Block algorithm or the SortedWindow algorithm.
The present application also provides a record grouping device, comprising:
a record acquisition module, configured to acquire an original record set;
a command acquisition module, configured to acquire a logical command that a user inputs by writing a script;
an intermediate record set generation module, configured to generate intermediate record sets from the original record set according to the logical command;
a record pair generation module, configured to generate a final record pair set from the intermediate record sets through a record pair generation algorithm.
Further, the record acquisition module is also configured to:
acquire an original record set that has undergone standardization.
Further, the command acquisition module is also configured to:
acquire a logical command that the user states in the form of conditional expressions; the conditional expressions include, but are not limited to, arithmetic operations, comparison operations, Boolean operations and string functions.
Further, the command acquisition module is also configured to:
acquire a logical command that the user inputs in the preset MRL language.
Further, the record pair generation module is also configured to:
generate the final record pair set from the intermediate record sets through set operations, the Block algorithm or the SortedWindow algorithm.
Compared with the prior art, the present application has the following advantage: the user can input, in the MRL language, a logical command better suited to the current original record set, so that the original record set is grouped faster according to that logical command and grouping efficiency is improved.
Brief Description of the Drawings
Fig. 1 is a schematic flowchart of an embodiment of a record grouping method of the present invention;
Fig. 2 is a logic diagram of an embodiment of a record grouping method of the present invention;
Fig. 3 is a schematic structural diagram of an embodiment of a record grouping device of the present invention.
Detailed Description
To make the above objects, features and advantages of the present application more apparent, the application is described in further detail below with reference to the drawings and specific embodiments.
Referring to Fig. 1, a record grouping method of the present application is shown, comprising:
Step S101: acquire an original record set;
Further, acquiring the original record set may be acquiring an original record set that has undergone standardization.
In the embodiment of the present invention, the initially input record set may pass through a specific data standardization flow to form a standardized original record set, which facilitates the subsequent grouping.
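The source does not describe the standardization flow itself. As a purely illustrative assumption, the sketch below trims whitespace, upper-cases values and maps empty strings to `None`, so that a later emptiness test behaves consistently. The name `standardize` and the cleaning rules are hypothetical.

```python
from typing import Dict, List, Optional

Record = Dict[str, Optional[str]]


def standardize(records: List[Record]) -> List[Record]:
    """Return cleaned copies: trimmed, upper-cased, empty values set to None."""
    cleaned: List[Record] = []
    for record in records:
        out: Record = {}
        for field, value in record.items():
            text = value.strip().upper() if value is not None else ""
            out[field] = text or None  # an empty string becomes None
        cleaned.append(out)
    return cleaned


raw = [
    {"NO": " h2001 ", "DName": "Aspirin"},
    {"NO": "  ", "DName": "aspirin "},
]
standardized = standardize(raw)
```

After this step the two spellings of the drug name compare equal, and the blank authentication code is represented uniformly as an empty value.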
Step S102: acquire the logical command that the user inputs by writing a script;
Further, acquiring the logical command that the user inputs by writing a script may be acquiring a logical command that the user states in the form of conditional expressions; the conditional expressions include, but are not limited to, arithmetic operations, comparison operations, Boolean operations and string functions.
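To illustrate what a conditional expression combining these operation kinds might look like, here is a small Python sketch. It is not MRL syntax, which the source shows only in part; `select` and `condition` are hypothetical names, and the thresholds are arbitrary.

```python
from typing import Callable, Dict, List, Optional

Record = Dict[str, Optional[str]]


def select(records: List[Record], condition: Callable[[Record], bool]) -> List[Record]:
    """Return the subset of records for which the condition holds."""
    return [r for r in records if condition(r)]


def condition(r: Record) -> bool:
    has_no = r["NO"] is not None                        # comparison operation
    is_tablet = (r["DForm"] or "").lower() == "tablet"  # string function
    long_name = len(r["DName"] or "") + 1 > 5           # arithmetic and comparison
    return has_no and (is_tablet or long_name)          # Boolean operations


records = [
    {"NO": "H1", "DName": "Aspirin", "DForm": "Tablet"},
    {"NO": None, "DName": "Aspirin", "DForm": "Tablet"},
    {"NO": "H2", "DName": "Zinc", "DForm": "Syrup"},
]
subset = select(records, condition)
```

Only the first record satisfies the condition: the second has no authentication code, and the third is neither a tablet nor long-named.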
Further, acquiring the logical command that the user inputs by writing a script may be acquiring a logical command that the user inputs in the preset MRL language.
Existing grouping algorithms are essentially customized algorithms. A customized algorithm can express whatever logic the business requires, but it is not general: an algorithm suited to one particular scenario cannot be used directly in another. The following takes grouping for drug name identification as an example. When grouping a drug name record set, the input original record set is a drug name table whose attributes include authentication code (NO), drug name (DName), pharmaceutical company code (DCID), dosage form (DForm) and specification (DSize). The goal is to identify the different records that describe the same drug under the same company.
Existing algorithms such as SortedWindow and Block cannot express the following business logic: records whose authentication code is not empty (isNotNull) are blocked (Block) by authentication code, that is, records with the same authentication code form record pairs and records with different authentication codes do not; records whose authentication code is empty (isNull) form record pairs with all other records. Similar business logic exists in many real scenarios, but the general Block algorithm cannot handle it, mainly because the general Block algorithm operates on the whole original record set, whereas the business scenario requires the Block algorithm to be applied to a subset that satisfies a certain condition.
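The limitation can be seen in a small Python sketch. It assumes a generic blocking routine that simply pairs records with equal key values and treats an empty code (`None`) as just another key. The function `block` here is a hypothetical stand-in, not the Block algorithm of any particular product.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Optional, Set, Tuple

Pair = Tuple[int, int]


def block(keys: List[Optional[str]]) -> Set[Pair]:
    """Generic blocking: pair records whose key values are equal."""
    buckets: Dict[Optional[str], List[int]] = defaultdict(list)
    for index, key in enumerate(keys):
        buckets[key].append(index)
    pairs: Set[Pair] = set()
    for members in buckets.values():
        pairs.update(combinations(members, 2))
    return pairs


# Authentication codes of five records; None marks an empty code.
no_values = ["H1", "H1", "H2", None, None]
whole_set_pairs = block(no_values)
```

Applied to the whole set, blocking pairs the two "H1" records and the two records with empty codes, but it never pairs a record with an empty code against a record that has one, which is exactly the pairing the business logic requires.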
The MRL language adopted in the embodiment of the present invention can express (1) generating subsets of the original record set according to business logic, and (2) generating record pair sets on record sets (including subsets) according to business logic. MRL can therefore express the business logic above, as follows:
A=NO.isNotNull(); // records whose authentication code is not empty form set A
B=NO.isNull(); // records whose authentication code is empty form set B
PL=Block(A.NO); // form record pairs among the records in A by authentication code, giving record pair set PL
PL=PL.Union(A.Cross(B)); // pair each record in A with each record in B and add the result to PL
PL=PL.Union(B.toPair()); // pair the records in B with one another and add the result to PL
return PL; // return the record pair set PL
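For readers unfamiliar with MRL, the script above can be approximated in Python. This is a sketch under assumptions: the semantics of `Block`, `Cross`, `toPair` and `Union` are inferred from the comments in the script, and the helper names `block`, `cross`, `to_pair` and `group` are hypothetical. Record pairs are represented as tuples of record indices.

```python
from collections import defaultdict
from itertools import combinations, product
from typing import Dict, List, Optional, Set, Tuple

Record = Dict[str, Optional[str]]
Pair = Tuple[int, int]


def block(records: List[Record], ids: List[int], field: str) -> Set[Pair]:
    """Pair records in `ids` that share the same value of `field`."""
    buckets: Dict[Optional[str], List[int]] = defaultdict(list)
    for i in ids:
        buckets[records[i][field]].append(i)
    return {p for members in buckets.values() for p in combinations(members, 2)}


def cross(left: List[int], right: List[int]) -> Set[Pair]:
    """Cartesian product: pair every record in `left` with every one in `right`."""
    return {(min(i, j), max(i, j)) for i, j in product(left, right)}


def to_pair(ids: List[int]) -> Set[Pair]:
    """Pair the records in `ids` with one another."""
    return set(combinations(sorted(ids), 2))


def group(records: List[Record]) -> Set[Pair]:
    a = [i for i, r in enumerate(records) if r["NO"] is not None]  # A=NO.isNotNull()
    b = [i for i, r in enumerate(records) if r["NO"] is None]      # B=NO.isNull()
    pl = block(records, a, "NO")  # PL=Block(A.NO)
    pl |= cross(a, b)             # PL=PL.Union(A.Cross(B))
    pl |= to_pair(b)              # PL=PL.Union(B.toPair())
    return pl


records = [{"NO": "H1"}, {"NO": "H1"}, {"NO": "H2"}, {"NO": None}, {"NO": None}]
pairs = group(records)
```

Of the ten possible pairs among five records, eight are produced. The two that are left out, (0, 2) and (1, 2), join records with different non-empty authentication codes.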
Step S103: generate intermediate record sets from the original record set according to the logical command;
As step S102 shows, MRL lets the user clearly and simply produce the intermediate record sets that are needed and combine sets as the business requires. For example, the statement "A=NO.isNotNull();" forms intermediate record set A from the records whose authentication code (NO) is not empty (isNotNull), that is, it uses the logical command input by the user to generate an intermediate record set. The statement "PL=PL.Union(A.Cross(B));" pairs each record in A with each record in B, implemented by the Cartesian product operation Cross, that is, it uses business logic to turn record sets into a record pair set.
In the embodiment of the present invention, the original record set is first processed by the logical command that the user inputs by writing a script, which yields the intermediate record sets; the preset record pair generation algorithm is then applied to the intermediate record sets to obtain the record pair set. The logic is shown in Fig. 2.
Step S104: generate the final record pair set from the intermediate record sets through the record pair generation algorithm.
Further, in the embodiment of the present invention the final record pair set may be generated from the intermediate record sets through set operations, the Block algorithm or the SortedWindow algorithm.
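The source names the SortedWindow algorithm without defining it. Assuming it resembles the widely used sorted-neighbourhood technique (sort the records by a key, then pair each record with the next few records in sorted order), a minimal Python sketch might look like this. The function name `sorted_window` and its parameters are hypothetical.

```python
from typing import Callable, Dict, List, Optional, Set, Tuple

Record = Dict[str, Optional[str]]
Pair = Tuple[int, int]


def sorted_window(
    records: List[Record], key: Callable[[Record], str], window: int
) -> Set[Pair]:
    """Pair each record with the next (window - 1) records in sorted order."""
    if window < 2:
        raise ValueError("window must be at least 2")
    order = sorted(range(len(records)), key=lambda i: key(records[i]))
    pairs: Set[Pair] = set()
    for pos, i in enumerate(order):
        for j in order[pos + 1 : pos + window]:
            pairs.add((min(i, j), max(i, j)))
    return pairs


records = [
    {"DName": "Ibuprofen"},
    {"DName": "Aspirin"},
    {"DName": "Zinc"},
    {"DName": "Paracetamol"},
]
window_pairs = sorted_window(records, key=lambda r: r["DName"] or "", window=2)
```

With a window of 2, only records adjacent in sorted order are paired, so four records yield three pairs instead of six. In the method described here such a routine would be applied to an intermediate record set rather than to the whole original record set.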
The embodiment of the present invention lets the user input, in the MRL language, a logical command better suited to the current original record set, so that the original record set is grouped faster according to that logical command and grouping efficiency is improved.
As the device embodiment is basically similar to the method embodiment, it is described relatively briefly; for related details, see the description of the method embodiment.
The embodiments in this specification are described progressively. Each embodiment focuses on its differences from the others, and identical or similar parts of the embodiments may be cross-referenced.
Fig. 3 is a schematic structural diagram of an embodiment of a record grouping device of the present invention. The device comprises:
a record acquisition module 31, configured to acquire an original record set;
a command acquisition module 32, configured to acquire a logical command that a user inputs by writing a script;
an intermediate record set generation module 33, configured to generate intermediate record sets from the original record set according to the logical command;
a record pair generation module 34, configured to generate a final record pair set from the intermediate record sets through a record pair generation algorithm.
Further, the record acquisition module 31 is also configured to:
acquire an original record set that has undergone standardization.
Further, the command acquisition module 32 is also configured to:
acquire a logical command that the user states in the form of conditional expressions; the conditional expressions include, but are not limited to, arithmetic operations, comparison operations, Boolean operations and string functions.
Further, the command acquisition module 32 is also configured to:
acquire a logical command that the user inputs in the preset MRL language.
Further, the record pair generation module 34 is also configured to:
generate the final record pair set from the intermediate record sets through set operations, the Block algorithm or the SortedWindow algorithm.
In the embodiment of the present invention, the record acquisition module 31 acquires an original record set that has been standardized through a specific data standardization flow, which facilitates the subsequent grouping.
Existing grouping algorithms are essentially customized algorithms. A customized algorithm can express whatever logic the business requires, but it is not general: an algorithm suited to one particular scenario cannot be used directly in another. The following takes grouping for drug name identification as an example. When grouping a drug name record set, the input original record set is a drug name table whose attributes include authentication code (NO), drug name (DName), pharmaceutical company code (DCID), dosage form (DForm) and specification (DSize). The goal is to identify the different records that describe the same drug under the same company.
Existing algorithms such as SortedWindow and Block cannot express the following business logic: records whose authentication code is not empty (isNotNull) are blocked (Block) by authentication code, that is, records with the same authentication code form record pairs and records with different authentication codes do not; records whose authentication code is empty (isNull) form record pairs with all other records. Similar business logic exists in many real scenarios, but the general Block algorithm cannot handle it, mainly because the general Block algorithm operates on the whole original record set, whereas the business scenario requires the Block algorithm to be applied to a subset that satisfies a certain condition.
The MRL language adopted by the command acquisition module 32 in the embodiment of the present invention can express (1) generating subsets of the original record set according to business logic, and (2) generating record pair sets on record sets (including subsets) according to business logic. MRL can therefore express the business logic above, as follows:
A=NO.isNotNull(); // records whose authentication code is not empty form set A
B=NO.isNull(); // records whose authentication code is empty form set B
PL=Block(A.NO); // form record pairs among the records in A by authentication code, giving record pair set PL
PL=PL.Union(A.Cross(B)); // pair each record in A with each record in B and add the result to PL
PL=PL.Union(B.toPair()); // pair the records in B with one another and add the result to PL
return PL; // return the record pair set PL
Using MRL, the intermediate record set generation module 33 can clearly and simply produce the intermediate record sets the user needs and combine sets as the business requires. For example, the statement "A=NO.isNotNull();" forms intermediate record set A from the records whose authentication code (NO) is not empty (isNotNull), that is, it uses the logical command input by the user to generate an intermediate record set. The statement "PL=PL.Union(A.Cross(B));" pairs each record in A with each record in B, implemented by the Cartesian product operation Cross, that is, it uses business logic to turn record sets into a record pair set.
The record grouping method and device provided by the present application have been described in detail above. Specific examples are used herein to explain the principles and implementation of the application, and the description of the embodiments is intended only to help in understanding the method of the application and its core idea. At the same time, a person of ordinary skill in the art may, following the idea of the application, make changes to the specific implementation and scope of application. In summary, the content of this specification should not be construed as limiting the application.

Claims (10)

1. A record grouping method, characterized by comprising:
acquiring an original record set;
acquiring a logical command that a user inputs by writing a script;
generating intermediate record sets from the original record set according to the logical command;
generating a final record pair set from the intermediate record sets through a record pair generation algorithm.
2. The method according to claim 1, characterized in that acquiring the original record set comprises:
acquiring an original record set that has undergone standardization.
3. The method according to claim 1, characterized in that acquiring the logical command that the user inputs by writing a script comprises:
acquiring a logical command that the user states in the form of conditional expressions; the conditional expressions include, but are not limited to, arithmetic operations, comparison operations, Boolean operations and string functions.
4. The method according to claim 1, characterized in that acquiring the logical command that the user inputs by writing a script comprises:
acquiring a logical command that the user inputs in the preset MRL language.
5. The method according to claim 3 or 4, characterized in that generating the final record pair set from the intermediate record sets through the record pair generation algorithm comprises:
generating the final record pair set from the intermediate record sets through set operations, the Block algorithm or the SortedWindow algorithm.
6. A record grouping device, characterized by comprising:
a record acquisition module, configured to acquire an original record set;
a command acquisition module, configured to acquire a logical command that a user inputs by writing a script;
an intermediate record set generation module, configured to generate intermediate record sets from the original record set according to the logical command;
a record pair generation module, configured to generate a final record pair set from the intermediate record sets through a record pair generation algorithm.
7. The device according to claim 6, characterized in that the record acquisition module is also configured to:
acquire an original record set that has undergone standardization.
8. The device according to claim 6, characterized in that the command acquisition module is also configured to:
acquire a logical command that the user states in the form of conditional expressions; the conditional expressions include, but are not limited to, arithmetic operations, comparison operations, Boolean operations and string functions.
9. The device according to claim 6, characterized in that the command acquisition module is also configured to:
acquire a logical command that the user inputs in the preset MRL language.
10. The device according to claim 8 or 9, characterized in that the record pair generation module is also configured to:
generate the final record pair set from the intermediate record sets through set operations, the Block algorithm or the SortedWindow algorithm.
CN201310341709.3A 2013-07-30 2013-07-30 Record grouping method and device Pending CN104268136A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310341709.3A CN104268136A (en) 2013-07-30 2013-07-30 Record grouping method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201310341709.3A CN104268136A (en) 2013-07-30 2013-07-30 Record grouping method and device

Publications (1)

Publication Number Publication Date
CN104268136A true CN104268136A (en) 2015-01-07

Family

ID=52159658

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310341709.3A Pending CN104268136A (en) 2013-07-30 2013-07-30 Record grouping method and device

Country Status (1)

Country Link
CN (1) CN104268136A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101286210A (en) * 2007-04-11 2008-10-15 中国科学院地理科学与资源研究所 Populace space distribution numerical simulation system
US20100017436A1 (en) * 2008-07-18 2010-01-21 Qliktech International Ab Method and Apparatus for Extracting Information From a Database
CN101635001A (en) * 2008-07-18 2010-01-27 QlikTech国际公司 Method and apparatus for extracting information from a database

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
RJ01 Rejection of invention patent application after publication

Application publication date: 20150107