CN105894019A - Database data classification method and apparatus - Google Patents

Database data classification method and apparatus

Info

Publication number
CN105894019A
Authority
CN
China
Prior art keywords
data
parameter
database
module
configuration parameter
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201610190392.1A
Other languages
Chinese (zh)
Inventor
刘朋飞
李爱华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Jingdong Century Trading Co Ltd
Beijing Jingdong Shangke Information Technology Co Ltd
Original Assignee
Beijing Jingdong Century Trading Co Ltd
Beijing Jingdong Shangke Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Jingdong Century Trading Co Ltd and Beijing Jingdong Shangke Information Technology Co Ltd
Priority to CN201610190392.1A
Publication of CN105894019A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Probability & Statistics with Applications (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a database data classification method and apparatus, and relates to the field of data processing. The apparatus comprises a parameter configuration module for setting configuration parameters, including an aggregation parameter and a classification parameter; a general framework module for generating a data classification module from a general computing template according to the configuration parameters, wherein the general computing template has variables corresponding to the configuration parameters; and the data classification module, which calls data in a database and classifies it according to the configuration parameters. Data classification efficiency is improved. Because a general computing template is used, only simple configuration is needed, and the extensibility and generality of the apparatus are substantially improved.

Description

Database data classification apparatus and method
Technical field
The present invention relates to the field of data processing, and in particular to a database data classification apparatus and method.
Background
At present, as database applications become more widespread and the types of business in different fields grow richer, processing massive data on the basis of databases is becoming more and more important. A database usually stores data of many types and dimensions, but with the rapid expansion of data scale, the volume of data accumulated in every industry keeps growing, and personnel with different needs focus on different data dimensions. In a conventional data classification system, a different program has to be written for each dimension when data in the database is called for classification. Autonomous configuration and automated program design are not possible, which leads to problems such as low computational efficiency and poor generality.
Summary of the invention
The technical problem to be solved by the present invention is to provide a database data classification apparatus and method that improve the efficiency and generality of data classification.
According to one aspect of the present invention, a database data classification apparatus is provided, comprising: a parameter configuration module for setting configuration parameters, the configuration parameters including an aggregation parameter and a classification parameter; a general framework module for generating a data classification module from a general computing template according to the configuration parameters, wherein the general computing template has variables corresponding to the configuration parameters; and the data classification module, for calling data in a database and classifying the data according to the configuration parameters.
Optionally, the apparatus further comprises a result verification module for verifying the data classification result against the expected data distribution according to the configuration parameters.
Optionally, the configuration parameters further include a threshold parameter; the general framework module is configured to use the aggregation parameter, the classification parameter and the threshold parameter as variables of the general computing template to generate the data classification module.
Optionally, the data classification module is configured to call the data in the database that corresponds to the aggregation parameter and, according to the classification parameter, classify that data by the threshold parameter.
Optionally, the data classification module is configured to call the data in the database that corresponds to the aggregation parameter, generate a threshold parameter from that data and the classification parameter, and classify the data corresponding to the aggregation parameter by the threshold parameter.
Optionally, the result verification module is configured to judge whether the data classification result satisfies the expected data distribution according to the configuration parameters; if it does, the module sends a data save instruction to the database; otherwise, it sends a parameter adjustment instruction to the parameter configuration module.
Optionally, the apparatus further comprises a data cache layer module, which stores the data classification result before it is judged whether the result satisfies the expected data distribution according to the configuration parameters.
Optionally, the result verification module is configured to perform a mean calculation, a standard deviation calculation and/or a quantile distribution calculation on the data classification result, and to judge whether the calculated data satisfies the expected data distribution according to the configuration parameters.
Optionally, the apparatus further comprises a parameter verification module for checking the validity of the configuration parameters.
According to another aspect of the present invention, a database data classification method is also provided, comprising: receiving configuration parameters set by a user, the configuration parameters including an aggregation parameter and a classification parameter; generating a data classification calculation program from a general computing template according to the configuration parameters, wherein the general computing template has variables corresponding to the configuration parameters; and calling data in a database through the data classification calculation program and classifying the data according to the configuration parameters.
Optionally, the method further comprises verifying the data classification result against the expected data distribution according to the configuration parameters.
Optionally, the configuration parameters further include a threshold parameter; the step of generating the data classification calculation program from the general computing template according to the configuration parameters comprises: using the aggregation parameter, the classification parameter and the threshold parameter as variables of the general computing template to generate the data classification calculation program.
Optionally, the step of calling data in the database through the data classification calculation program and classifying the data according to the configuration parameters comprises: calling, through the data classification calculation program, the data in the database that corresponds to the aggregation parameter; and, according to the classification parameter, classifying that data by the threshold parameter.
Optionally, the step of calling data in the database through the data classification calculation program and classifying the data according to the configuration parameters comprises: calling, through the data classification calculation program, the data in the database that corresponds to the aggregation parameter; generating a threshold parameter from that data and the classification parameter; and classifying the data corresponding to the aggregation parameter by the threshold parameter.
Optionally, the step of verifying the data classification result against the expected data distribution according to the configuration parameters comprises: judging whether the data classification result satisfies the expected data distribution according to the configuration parameters; if it does, saving the data classification result to the database; otherwise, adaptively adjusting the threshold parameter.
Optionally, before the step of judging whether the data classification result satisfies the expected data distribution according to the configuration parameters, the method comprises: saving the data classification result to a data cache layer.
Optionally, the step of judging whether the data classification result satisfies the expected data distribution according to the configuration parameters further comprises: performing a mean calculation, a standard deviation calculation and/or a quantile distribution calculation on the data classification result; and judging whether the calculated data satisfies the expected data distribution according to the configuration parameters.
Optionally, before the step of calling data in the database through the data classification calculation program and classifying the data according to the configuration parameters, the method further comprises: verifying the validity of the configuration parameters.
Compared with the prior art, the present invention provides a parameter configuration module, a general framework module and a data classification module. The general framework module generates the data classification module from a general computing template according to the configuration parameters, and the data classification module calls data in the database and classifies it according to the configuration parameters. This improves the efficiency of data division and, because a general computing template is used, only simple configuration is needed, which greatly improves the extensibility and generality of the apparatus.
Further features and advantages of the present invention will become apparent from the following detailed description of exemplary embodiments with reference to the accompanying drawings.
Brief description of the drawings
The accompanying drawings, which form a part of the specification, illustrate embodiments of the present invention and, together with the specification, serve to explain the principles of the present invention.
The present invention can be understood more clearly from the following detailed description with reference to the accompanying drawings, in which:
Fig. 1 is a structural diagram of one embodiment of the database data classification apparatus of the present invention.
Fig. 2 is a structural diagram of another embodiment of the database data classification apparatus of the present invention.
Fig. 3 is a structural diagram of yet another embodiment of the database data classification apparatus of the present invention.
Fig. 4 is a flow chart of one embodiment of the database data classification method of the present invention.
Fig. 5 is a flow chart of another embodiment of the database data classification method of the present invention.
Fig. 6 is a flow chart of yet another embodiment of the database data classification method of the present invention.
Fig. 7 is a flow chart of still another embodiment of the database data classification method of the present invention.
Detailed description
Various exemplary embodiments of the present invention will now be described in detail with reference to the accompanying drawings. It should be noted that, unless specifically stated otherwise, the relative arrangement of components and steps, the numerical expressions and the numerical values set forth in these embodiments do not limit the scope of the present invention.
It should also be understood that, for ease of description, the dimensions of the parts shown in the drawings are not drawn to actual scale.
The following description of at least one exemplary embodiment is merely illustrative and is in no way intended to limit the present invention, its application or its uses.
Techniques, methods and devices known to those of ordinary skill in the relevant art may not be discussed in detail but, where appropriate, should be regarded as part of the granted specification.
In all examples shown and discussed here, any specific value should be interpreted as merely exemplary and not as a limitation. Other examples of the exemplary embodiments may therefore have different values.
It should be noted that similar reference numerals and letters denote similar items in the following drawings; once an item has been defined in one drawing, it need not be discussed further in subsequent drawings.
To make the objects, technical solutions and advantages of the present invention clearer, the present invention is described in further detail below with reference to specific embodiments and the accompanying drawings.
Fig. 1 is a structural diagram of one embodiment of the database data classification apparatus of the present invention. This general-purpose database data classification apparatus includes a parameter configuration module 110, a general framework module 120 and a data classification module 130.
The parameter configuration module 110 is used to set configuration parameters.
The parameter configuration module 110 can set specific parameters according to business needs. For example, an aggregation parameter and a classification parameter can be set through the parameter configuration module 110. The aggregation parameter may also be called the topic parameter; it specifies the condition or scheme by which data is aggregated in the database. The classification parameter may also be called the indicator parameter; it specifies the indicator by which the data in the database is calculated.
The general framework module 120 is used to generate the data classification module 130 from a general computing template according to the configuration parameters set by the parameter configuration module 110.
The variables of the general computing template correspond to the parameters in the parameter configuration module 110. The general framework module 120 takes each parameter as an input of the general computing template and automatically generates a data calculation program that meets the user's needs.
The data classification module 130 is used to call data in the database and divide the data according to the configuration parameters.
The data calculation program in the data classification module 130 may be a script or a statement in a standardized query language. It calls the data in the database that corresponds to the aggregation parameter and divides the data according to the setting of the classification parameter.
In this embodiment, the configured parameters serve as input variables of the general computing template to generate a calculation program that meets the user's needs, and the generated program calls data in the database and divides it. This improves the efficiency of data division and, because a general computing template is used, only simple configuration is needed, which greatly improves the extensibility and generality of the apparatus.
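As an illustration of how a general computing template with variables might produce a calculation program, the following minimal sketch fills a query template from configuration parameters. The template text, the table and field names, and the function generate_classification_program are hypothetical and are not taken from the patent, which does not disclose a concrete template.

```python
from string import Template

# Hypothetical general computing template: one variable per configuration parameter.
GENERAL_TEMPLATE = Template(
    "SELECT $aggregation_field, $metric_function($metric_field) AS metric_value "
    "FROM $table GROUP BY $aggregation_field"
)


def generate_classification_program(config: dict) -> str:
    """Fill the template variables with the configuration parameters."""
    return GENERAL_TEMPLATE.substitute(config)


config = {
    "table": "orders",               # assumed table name
    "aggregation_field": "user_id",  # aggregation (topic) parameter
    "metric_function": "SUM",        # classification (indicator) parameter
    "metric_field": "amount",
}
program = generate_classification_program(config)
print(program)
```

Because the parameters are substituted into the query text as identifiers, a real system would need to validate them against the database schema before generating the program.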
In another embodiment of the present invention, an aggregation parameter, a classification parameter and a threshold parameter are set in the parameter configuration module 110. The threshold parameter specifies the division criteria for a specific classification indicator and can be set according to specific needs. These parameters can be configured in advance, after developers and business staff have agreed on the business requirements. The general framework module 120 uses the aggregation parameter, the classification parameter and the threshold parameter as variables of the general computing template to generate the data classification module 130. The data classification module 130 calls the data in the database that corresponds to the aggregation parameter and, according to the classification parameter, classifies that data by the thresholds.
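Classification by preset thresholds can be sketched as a lookup of each aggregated value in a sorted threshold list. The function classify, the sample data, and the boundary convention (a value equal to a threshold falls into the upper class) are assumptions for illustration; the patent does not specify how boundary values are handled.

```python
from bisect import bisect_right


def classify(value: float, thresholds: list) -> int:
    """Return the class index of value; thresholds must be sorted ascending.

    Class 0 lies below the first threshold. A value equal to a threshold
    is placed in the upper class (an assumed convention).
    """
    return bisect_right(thresholds, value)


thresholds = [400, 600, 800]                                 # threshold parameter
aggregated = {"user_a": 250, "user_b": 600, "user_c": 990}   # hypothetical aggregated data
classes = {key: classify(value, thresholds) for key, value in aggregated.items()}
print(classes)
```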
In this embodiment, the aggregation parameter, the classification parameter and the threshold parameter are set in advance and serve as input variables of the general computing template to generate a calculation program that meets the user's needs, and the generated program calls data in the database and divides it. This improves the efficiency of data division and, because a general computing template is used, only simple configuration is needed, which greatly improves the extensibility of the apparatus. In addition, the present invention can meet the diverse, personalized and high-frequency computing needs of different users.
In another embodiment of the present invention, only the aggregation parameter and the classification parameter are set in the parameter configuration module 110. The general framework module 120 uses the aggregation parameter and the classification parameter as input variables of the general computing template to generate the data classification module 130. The data classification module 130 calls the data in the database that corresponds to the aggregation parameter, automatically generates a threshold parameter from the classification parameter and the called data, and classifies the data corresponding to the aggregation parameter by the thresholds. The threshold parameter can be calculated automatically using a quantile method. For example, if the maximum value of the data in the database corresponding to the aggregation parameter and the classification parameter is 1000 and the minimum value is 200, the quartile method automatically generates the threshold parameters 400, 600 and 800, so that the data is divided into the value ranges 200 to 400, 400 to 600, 600 to 800 and 800 to 1000. Those skilled in the art will understand that the quartile method is applied in this embodiment only as an example; in a specific application the threshold parameter can be generated automatically by a variety of algorithms.
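The numeric example above can be reproduced with a short sketch. Note that the thresholds 400, 600 and 800 follow from splitting the range between the minimum and the maximum into four equal intervals; quartiles computed from the data themselves (for instance with statistics.quantiles) would generally give different cut points. The function name and the sample values are hypothetical.

```python
def equal_width_thresholds(values, n_classes=4):
    """Split [min, max] into n_classes equal intervals and return the inner cut points."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / n_classes
    return [lo + step * i for i in range(1, n_classes)]


# Hypothetical metric values with minimum 200 and maximum 1000, as in the example.
values = [200, 350, 480, 610, 790, 1000]
thresholds = equal_width_thresholds(values)
print(thresholds)  # [400.0, 600.0, 800.0]
```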
In this embodiment, the aggregation parameter and the classification parameter are set in advance and serve as input variables of the general computing template to generate a calculation program that meets the user's needs. The generated program calls data in the database, automatically generates the threshold parameter, and divides the data by the threshold parameter. This improves the efficiency of data division and, because a general computing template is used, only simple configuration is needed, which greatly improves the extensibility of the apparatus.
In one embodiment, the apparatus may further include a parameter verification module (not shown in the figures), which checks the validity of the configured parameters. For example, when the data in the database is subsequently called, the fields in the database and the upper and lower bounds of each field are scanned in order to check that the parameters are valid and that their values lie within a legal range. This ensures that only authorized personnel can call the data in the database.
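A validity check of this kind might look like the sketch below. It covers only the field and value-range checks; permission checking is not shown. The schema dictionary (field name mapped to lower and upper bounds obtained by scanning the database), the configuration keys and the function validate_config are assumptions for illustration.

```python
def validate_config(config, schema):
    """Return a list of error messages; an empty list means the configuration is valid.

    schema maps each field name to its (lower bound, upper bound).
    """
    errors = []
    for key in ("aggregation_field", "metric_field"):
        if config.get(key) not in schema:
            errors.append(f"unknown field: {config.get(key)!r}")
    field = config.get("metric_field")
    if field in schema:
        lo, hi = schema[field]
        for threshold in config.get("thresholds", []):
            if not lo <= threshold <= hi:
                errors.append(f"threshold {threshold} outside [{lo}, {hi}]")
    return errors


schema = {"user_id": (1, 1_000_000), "amount": (200, 1000)}  # hypothetical scan result
good = {"aggregation_field": "user_id", "metric_field": "amount", "thresholds": [400, 600, 800]}
bad = {"aggregation_field": "user_id", "metric_field": "amount", "thresholds": [1500]}
print(validate_config(good, schema))
print(validate_config(bad, schema))
```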
Fig. 2 is a structural diagram of another embodiment of the database data classification apparatus of the present invention. This database data classification apparatus includes a parameter configuration module 210, a general framework module 220, a data classification module 230 and a result verification module 240.
The parameter configuration module 210, the general framework module 220 and the data classification module 230 have been described in detail in the above embodiments and are not elaborated further here.
The result verification module 240 is used to verify the data classification result against the expected data distribution according to the configuration parameters.
For example, the result verification module 240 judges whether the data classification result satisfies the expected data distribution according to the configuration parameters. If it does, the module sends a data save instruction to the database so that the database saves the data classification result. If it does not, the module sends a parameter adjustment instruction to the parameter configuration module 210.
If a threshold parameter has been preset in the parameter configuration module 210, the threshold parameter is reset in the parameter configuration module until the data classification result satisfies the threshold data distribution according to the aggregation parameter and the classification parameter. If only the aggregation parameter and the classification parameter are set in the parameter configuration module 210, the data classification module 230 repeats the calculation until it finds a suitable threshold parameter, so that the data classification result satisfies the threshold data distribution according to the aggregation parameter and the classification parameter.
In one embodiment, the result verification module 240 is also used to perform a mean calculation, a standard deviation calculation or a quantile distribution calculation on the data classification result, and to judge whether the calculated data satisfies the expected data distribution according to the configuration parameters. This module can provide a front-end interactive page on which staff make a selection; staff can adjust the threshold parameter accordingly, based on experience and on the previous data distribution.
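The patent does not define how the expected data distribution is expressed. As one possible reading, the sketch below computes the summary statistics named above and compares the share of records in each class with expected shares within a tolerance. The functions summarize, class_shares and meets_expected, the tolerance value and the sample data are all assumptions.

```python
from statistics import mean, pstdev, quantiles


def summarize(values):
    """Mean, population standard deviation and quartiles of the metric values."""
    return {"mean": mean(values), "std": pstdev(values), "quartiles": quantiles(values, n=4)}


def class_shares(labels, n_classes):
    """Fraction of records assigned to each class index 0..n_classes-1."""
    return [labels.count(c) / len(labels) for c in range(n_classes)]


def meets_expected(labels, expected_shares, tolerance=0.1):
    """True if every class share is within tolerance of its expected share."""
    shares = class_shares(labels, len(expected_shares))
    return all(abs(s - e) <= tolerance for s, e in zip(shares, expected_shares))


balanced = [0, 0, 1, 1, 2, 2, 3, 3]
skewed = [0, 0, 0, 0, 0, 0, 0, 3]
expected = [0.25, 0.25, 0.25, 0.25]
print(meets_expected(balanced, expected))  # True
print(meets_expected(skewed, expected))    # False
print(summarize([200, 400, 600, 800, 1000])["mean"])
```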
This database data classification apparatus can be packaged as a standard API that interacts with the database and, as an application interface, exposes the final data classification result for consumers to call. The data classification result can be processed into standard structured data and stored in MySQL, or saved as an HDFS file, an HBase file, XML or text. Downstream consumers use it by calling the database or the API directly over a standard Internet data transfer protocol.
In this embodiment, the result verification module judges whether the data classification result is reasonable. If it is, the data classification result can be saved; if it is not, the parameters are adjusted again and the data is divided again. The apparatus thus forms a feedback mechanism that ensures the accuracy of data division.
In another embodiment of the present invention, as shown in Fig. 3, the database data classification apparatus 300 may include a parameter configuration module 310, a general framework module 320, a data classification module 330, a data cache layer module 340 and a result verification module 350.
The parameter configuration module 310, the general framework module 320, the data classification module 330 and the result verification module 350 have been described in detail in the above embodiments and are not elaborated further here.
The data cache layer module 340 is used to store temporary data files. Before verifying the data classification result, the result verification module 350 can directly call the already computed data classification result from the data cache layer module 340. This avoids repeatedly fetching data from the database and waiting for the calculation each time, and improves computational efficiency in both space and time. In addition, after the result verification module 350 has verified that the data classification result is reasonable, the data classification result is sent to the database 360 to be saved. The data held in the data cache layer module 340 can then be released promptly, so that it does not occupy space and resources.
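The interplay of the cache layer, the verification step and the database can be sketched with in-memory stand-ins. The class ClassificationCache, the function verify_and_persist, the job identifier and the use of a plain dictionary as the database are hypothetical; the patent does not describe how the cache layer is implemented.

```python
class ClassificationCache:
    """Minimal stand-in for the data cache layer: holds results until they are verified."""

    def __init__(self):
        self._store = {}

    def put(self, job_id, result):
        self._store[job_id] = result

    def get(self, job_id):
        return self._store.get(job_id)

    def release(self, job_id):
        self._store.pop(job_id, None)


def verify_and_persist(job_id, cache, database, is_reasonable):
    """Verify the cached result; on success save it and free the cache entry."""
    result = cache.get(job_id)  # no repeated fetch or recalculation
    if result is not None and is_reasonable(result):
        database[job_id] = result
        cache.release(job_id)
        return True
    return False


cache, database = ClassificationCache(), {}
cache.put("job-1", {"user_a": 0, "user_b": 2})
saved = verify_and_persist("job-1", cache, database, is_reasonable=lambda r: len(r) > 0)
print(saved, database)
```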
Fig. 4 is a flow chart of one embodiment of the database data classification method of the present invention. This general-purpose database data classification method comprises the following steps:
In step 410, the configuration parameters set by the user are received.
The user can configure specific parameters according to business needs. For example, an aggregation parameter and a classification parameter can be set. The aggregation parameter may also be called the topic parameter; it specifies the condition or scheme by which data is aggregated in the database. The classification parameter may also be called the indicator parameter; it specifies the indicator by which the data in the database is calculated.
In one embodiment, the validity of the configured parameters can be verified. For example, when the data in the database is subsequently called, the fields in the database and the upper and lower bounds of each field are scanned in order to check that the parameters are valid and that their values lie within a legal range.
In step 420, a data classification calculation program is generated from the general computing template according to the configuration parameters.
Each configuration parameter can be used as an input of the general computing template to automatically generate a data classification calculation program that meets the user's needs. This data classification calculation program may be a script or a statement in a standardized query language.
In step 430, the data classification calculation program calls the data in the database and classifies it.
For example, the data classification calculation program calls the data in the database that corresponds to the aggregation parameter and divides the data according to the setting of the classification parameter.
In this embodiment, the configured parameters serve as input variables of the general computing template to generate a data classification calculation program that meets the user's needs, and the generated program calls data in the database and divides it. This improves the efficiency of data division and, because a general computing template is used, only simple configuration is needed, which greatly improves the extensibility of the apparatus.
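Steps 410 to 430 can be walked through against an in-memory SQLite database. This is only a sketch under stated assumptions: the table orders, its columns, the template text and the choice of SUM as the indicator are hypothetical, and SQLite stands in for whatever database the apparatus actually uses.

```python
import sqlite3
from string import Template

# Hypothetical general computing template with variables for the configuration parameters.
TEMPLATE = Template(
    "SELECT $aggregation_field AS group_key, SUM($metric_field) AS total "
    "FROM $table GROUP BY $aggregation_field ORDER BY $aggregation_field"
)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (user_id TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("a", 100), ("a", 150), ("b", 700), ("c", 950)],
)

# Step 410: receive the configuration parameters.
config = {"table": "orders", "aggregation_field": "user_id", "metric_field": "amount"}
# Step 420: generate the calculation program from the template.
sql = TEMPLATE.substitute(config)
# Step 430: the generated program calls the data, grouped by the aggregation parameter.
rows = conn.execute(sql).fetchall()
print(rows)
conn.close()
```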
Fig. 5 is a flow chart of another embodiment of the database data classification method of the present invention. This database data classification method comprises the following steps:
In step 510, the aggregation parameter, classification parameter and threshold parameter set by the user are received.
The aggregation parameter specifies the condition or scheme by which data is aggregated in the database. The classification parameter specifies the indicator by which the calculation is performed. The threshold parameter specifies the division criteria for a specific classification indicator and can be set according to specific needs.
In step 520, the aggregation parameter, the classification parameter and the threshold parameter are used as variables of the general computing template to generate the data classification calculation program.
In step 530, the data classification calculation program calls the data in the database that corresponds to the aggregation parameter and, according to the classification parameter, classifies that data by the thresholds.
In this embodiment, the aggregation parameter, the classification parameter and the threshold parameter are set in advance and serve as input variables of the general computing template to generate a calculation program that meets the user's needs, and the generated program calls data in the database and divides it. This improves the efficiency of data division and, because a general computing template is used, only simple configuration is needed, which greatly improves the extensibility of the apparatus. In addition, the present invention can meet the diverse, personalized and high-frequency computing needs of different users.
Fig. 6 is a flow chart of yet another embodiment of the database data classification method of the present invention. This database data classification method comprises the following steps:
In step 610, the aggregation parameter and classification parameter set by the user are received.
In step 620, the aggregation parameter and the classification parameter are used as input variables of the general computing template to generate the data classification calculation program.
In step 630, the data in the database that corresponds to the aggregation parameter is called, and a threshold parameter is generated automatically from that data and the classification parameter.
In step 640, the data corresponding to the aggregation parameter is classified by the thresholds.
In this embodiment, the aggregation parameter and the classification parameter are set in advance and serve as input variables of the general computing template to generate a calculation program that meets the user's needs. The generated program calls data in the database, automatically generates the threshold parameter, and divides the data by the threshold parameter. This improves the efficiency of data division and, because a general computing template is used, only simple configuration is needed, which greatly improves the extensibility of the apparatus.
Fig. 7 is a schematic flowchart of another embodiment of the database data classification method of the present invention. The method may further include a step of verifying the data classification result against the expected data distribution according to the configuration parameters.
For example, in step 710, the configuration parameters set by the user are received.
In step 720, a data classification calculation program is generated from the configuration parameters based on the general calculation template.
In step 730, the data classification calculation program calls the data in the database and classifies it according to the configuration parameters.
In one embodiment, after the data has been classified, the data classification result may first be stored in a data cache layer. When the result is verified, the already computed result can be read directly from the data cache layer, which avoids fetching the data from the database and waiting for the computation again each time, and so improves both space and time efficiency.
In step 740, it is judged whether the data classification result satisfies the expected data distribution according to the configuration parameters. If it does, step 750 is performed; otherwise, step 710 is performed.
In one embodiment, the mean, standard deviation or quantile distribution of the data classification result may also be calculated, and it is then judged whether the calculated statistics satisfy the expected data distribution according to the configuration parameters.
In step 750, a data save instruction is sent to the database so that the database saves the data classification result.
In one embodiment, once the data classification result has been verified as reasonable, the data held in the data cache layer can be released promptly so that it does not occupy space and resources.
If the data classification result does not satisfy the expected data distribution according to the configuration parameters, the flow returns to step 710 and the configuration parameters are readjusted. If the polymerization parameter, sorting parameter and threshold parameter were all preset, the threshold parameter is reset until the data classification result satisfies the expected data distribution according to the polymerization parameter and the sorting parameter. If only the polymerization parameter and the sorting parameter were set, the data classification calculation program iterates until it finds a suitable threshold parameter, so that the data classification result satisfies the expected data distribution according to the polymerization parameter and the sorting parameter.
In this embodiment, the result verification module judges whether the data classification result is reasonable. If it is, the data classification result can be saved; if not, the parameters are readjusted and the data is divided again. The method uses a flow design with feedback, so the parameters can be adjusted at any time, which ensures the accuracy of the data division.
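The feedback flow of Fig. 7 might be sketched as below, purely as an illustration. The acceptance rule (no class may hold more than `max_share` of the rows), the adjustment rule (split the largest class at its median) and the dictionary standing in for the data cache layer are hypothetical choices for this sketch, not details taken from the embodiment.

```python
from bisect import bisect_right
from collections import Counter
from statistics import median

def classify(amounts, thresholds):
    # Class index = number of thresholds at or below the amount.
    return [bisect_right(thresholds, a) for a in amounts]

def meets_expectation(labels, max_share):
    # Assumed expectation: the most populated class holds at most `max_share` of rows.
    return max(Counter(labels).values()) / len(labels) <= max_share

def refine(amounts, labels, thresholds):
    # Hypothetical adjustment: add a threshold at the median of the largest class.
    biggest = Counter(labels).most_common(1)[0][0]
    members = [a for a, c in zip(amounts, labels) if c == biggest]
    return sorted(set(thresholds) | {median(members)})

def classify_with_feedback(amounts, thresholds, max_share=0.5, max_rounds=10):
    cache = {}  # stands in for the data cache layer
    for _ in range(max_rounds):
        cache["result"] = classify(amounts, thresholds)   # step 730, result cached
        if meets_expectation(cache["result"], max_share):  # step 740, read from cache
            labels = cache.pop("result")                   # release the cached data
            return thresholds, labels                      # step 750 would save these
        thresholds = refine(amounts, cache["result"], thresholds)  # back to step 710
    raise RuntimeError("no acceptable thresholds found")

final_thresholds, final_labels = classify_with_feedback(
    [100, 200, 300, 400, 5000, 9000], [1000]
)
```

The `max_rounds` guard is a design choice of the sketch: it keeps the loop from running forever when no adjustment can satisfy the expectation, for instance when all amounts are identical.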
The database data classification apparatus and method can be applied in many fields. For example, in e-commerce, the operators include brand merchants, shop owners and category operators. Each of them focuses on different dimensions and needs to subdivide the specific user groups they care about along a particular dimension such as brand, category or shop. However, when the number of brands, shops or even categories is very large, reaching the thousands, manual subdivision cannot respond to this demand in time, in terms of either efficiency or speed.
Based on the above application scenario, in one application example of the present invention, polymerization parameters such as brand, category or shop may be set in advance in the parameter configuration module, together with sorting parameters such as order count, order amount, commodity quantity and gross margin. For example, the polymerization parameter is set to brand (Lenovo ThinkPad) and the sorting parameter is set to order amount. The threshold parameters are set to 1000, 5000, 10000 and 20000, i.e. order amounts of 1000 yuan, 5000 yuan, 10000 yuan and 20000 yuan serve as the boundaries between the classes. The pseudocode of the general calculation template is as follows:
Using the above general calculation template, the calculation program in the data classification module can be generated as follows:
Using the above calculation program, concrete data division can be performed on the database, for example dividing users into different amount intervals.
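As a non-limiting illustration of how a template with variables for the configuration parameters might be turned into a runnable program, the sketch below fills a SQL template and runs it against an in-memory SQLite database. It is an illustration only and is not the listing referred to above; the table name `orders`, the column names, the `ALLOWED_COLUMNS` whitelist and the assumption that each row already holds one user's total order amount are all hypothetical.

```python
import sqlite3
from string import Template

# General calculation template with variables for the configuration parameters.
TEMPLATE = Template(
    "SELECT user_id, CASE $cases ELSE $last END AS segment "
    "FROM orders WHERE $polymerization = ? ORDER BY user_id"
)

# Column names are spliced into the SQL text, so they are checked against a whitelist.
ALLOWED_COLUMNS = {"brand", "category", "shop", "order_amount", "order_count"}

def generate_program(polymerization, sorting, thresholds):
    if polymerization not in ALLOWED_COLUMNS or sorting not in ALLOWED_COLUMNS:
        raise ValueError("unknown column")
    # One WHEN branch per threshold, in ascending order.
    cases = " ".join(
        f"WHEN {sorting} < {float(t)} THEN {i}"
        for i, t in enumerate(sorted(thresholds))
    )
    return TEMPLATE.substitute(
        polymerization=polymerization, cases=cases, last=len(thresholds)
    )

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (user_id TEXT, brand TEXT, order_amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [("u1", "ThinkPad", 800), ("u2", "ThinkPad", 6500),
     ("u3", "ThinkPad", 25000), ("u4", "Other", 3000)],
)

# Polymerization parameter: brand; sorting parameter: order amount; preset thresholds.
program = generate_program("brand", "order_amount", [1000, 5000, 10000, 20000])
result = conn.execute(program, ("ThinkPad",)).fetchall()
```

Switching to another brand only changes the bound value, and switching to another dimension or measure only changes the arguments of `generate_program`; the template itself stays untouched.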
The approach can also be extended very easily to any other brand, category or shop, dividing e-commerce users along the different dimensions that the operators care about, so that operators can quickly grasp the characteristics of the user group and prepare for subsequent precision marketing activities.
In addition, only the polymerization parameter and the sorting parameter may be set in advance in the parameter configuration module. For example, the polymerization parameter is set to brand (Lenovo ThinkPad) and the sorting parameter is set to order amount. The polymerization parameter and sorting parameter are used as variables of the general calculation template to generate the data classification calculation program. After the program calls the user data in the database for purchases of the Lenovo ThinkPad brand, it finds that the maximum purchase amount among the users is 10000 yuan and the minimum is 2000 yuan. Using the quartile method, the threshold parameters 4000, 6000 and 8000 are generated automatically, i.e. users are divided by purchase amount into 2000 to 4000 yuan, 4000 to 6000 yuan, 6000 to 8000 yuan and 8000 to 10000 yuan. Those skilled in the art will understand that the quartile method is used in this embodiment only as an example; in a concrete application the threshold parameter can be generated automatically by a variety of algorithms.
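The figures in this example (minimum 2000, maximum 10000, thresholds 4000, 6000 and 8000) are what results from cutting the observed range into four equal-width intervals. The sketch below reproduces them under that reading and, to illustrate that other algorithms can be substituted, shows a generator based on data quantiles next to it. The function names and the sample amounts are assumptions made for illustration.

```python
from statistics import quantiles

def equal_width_thresholds(values, bins=4):
    # Cut [min, max] into `bins` intervals of equal width; return the interior cuts.
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    return [lo + width * i for i in range(1, bins)]

def quantile_thresholds(values, bins=4):
    # Alternative: cut points chosen so each class holds roughly the same number of users.
    return quantiles(values, n=bins, method="inclusive")

# Hypothetical purchase amounts with the minimum and maximum from the example.
amounts = [2000, 3000, 3500, 5000, 7500, 10000]

by_width = equal_width_thresholds(amounts)   # depends only on the minimum and maximum
by_count = quantile_thresholds(amounts)      # depends on how the amounts are distributed
```

Equal-width cuts give intervals that are easy to explain to business personnel, while quantile cuts give classes of similar size; which is preferable depends on the expected data distribution being checked afterwards.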
In addition, business personnel can compare the statistical characteristics of the divided groups against their business goals to determine whether the division is appropriate. For example, if the business personnel intend to apply the "20-80" rule to find the group that contributes the most spending, that group should account for a small share of users (about 20% of the head count, cumulatively) but a large share of consumption (about 80% of the spending amount, cumulatively). On this basis it can be judged whether the group division result meets expectations.
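A check of the "20-80" expectation might look like the following sketch. The function names, the default limits of 20% and 80%, and the sample spending amounts are assumptions chosen for illustration.

```python
def top_group_shares(amounts, threshold):
    # Head-count share and spending share of users whose spending is >= threshold.
    top = [a for a in amounts if a >= threshold]
    return len(top) / len(amounts), sum(top) / sum(amounts)

def satisfies_20_80(amounts, threshold, max_people=0.2, min_spend=0.8):
    # The top group should be small in head count but large in spending.
    people, spend = top_group_shares(amounts, threshold)
    return people <= max_people and spend >= min_spend

# Hypothetical spending amounts: eight small spenders and two large ones.
amounts = [100] * 8 + [3000, 5000]

ok_at_1000 = satisfies_20_80(amounts, 1000)   # top group: 2 of 10 users, 8000 of 8800 spent
ok_at_4000 = satisfies_20_80(amounts, 4000)   # top group: 1 of 10 users, 5000 of 8800 spent
```

With a threshold of 1000 the top group is 20% of users and about 91% of spending, so the division meets the expectation; with 4000 the top group contributes only about 57% of spending, so the threshold would need to be adjusted.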
If the group division result does not meet expectations, the thresholds are adjusted automatically and a new round of iterative calculation is carried out. If it meets expectations, the group division result can be saved and used for targeted promotional activities.
The present invention can easily be extended to any other brand, category or shop, and serves the needs of a very large number of brand operators, category operators and shop operators, allowing them to quickly grasp the characteristics of the user groups they care about and to guide precise promotional activities.
Of course, the above application example describes only one specific application scenario of the technical solution of the present invention and is not intended to limit the scope of protection of the present invention. The technical solution can be used in a variety of database data processing and classification environments, improves the efficiency of data classification, and is general-purpose.
The present invention has now been described in detail. To avoid obscuring the concept of the present invention, some details well known in the art have not been described. Based on the above description, those skilled in the art can fully understand how to implement the technical solutions disclosed herein.
The method and apparatus of the present invention may be implemented in many ways, for example by software, hardware, firmware, or any combination of software, hardware and firmware. The above order of the steps of the method is for illustration only, and the steps of the method of the present invention are not limited to the order specifically described above unless otherwise specifically stated. In addition, in some embodiments, the present invention may also be embodied as programs recorded on a recording medium, the programs including machine-readable instructions for implementing the method according to the present invention. The present invention therefore also covers a recording medium storing a program for executing the method according to the present invention.
Although some specific embodiments of the present invention have been described in detail by way of example, those skilled in the art should understand that the above examples are for illustration only and are not intended to limit the scope of the present invention. Those skilled in the art should also understand that the above embodiments may be modified without departing from the scope and spirit of the present invention. The scope of the present invention is defined by the appended claims.

Claims (18)

1. A database data classification apparatus, characterized by comprising:
a parameter configuration module, configured to set configuration parameters, the configuration parameters including a polymerization parameter and a sorting parameter;
a general framework module, configured to generate a data classification module from the configuration parameters based on a general calculation template, wherein the general calculation template has variables corresponding to the configuration parameters; and
the data classification module, configured to call data in a database and classify the data according to the configuration parameters.
2. The apparatus according to claim 1, characterized by further comprising:
a result verification module, configured to verify the data classification result against an expected data distribution according to the configuration parameters.
3. The apparatus according to claim 1, characterized in that the configuration parameters further include a threshold parameter;
the general framework module is configured to use the polymerization parameter, the sorting parameter and the threshold parameter as variables of the general calculation template to generate the data classification module.
4. The apparatus according to claim 3, characterized in that
the data classification module is configured to call the data in the database corresponding to the polymerization parameter and, according to the sorting parameter, to classify the data corresponding to the polymerization parameter by the threshold parameter.
5. The apparatus according to claim 1, characterized in that
the data classification module is configured to call the data in the database corresponding to the polymerization parameter, to generate a threshold parameter from the data corresponding to the polymerization parameter and the sorting parameter, and to classify the data corresponding to the polymerization parameter by the threshold parameter.
6. The apparatus according to claim 2, characterized in that
the result verification module is configured to judge whether the data classification result satisfies the expected data distribution according to the configuration parameters, and, if it does, to send a data save instruction to the database, and otherwise to send a parameter adjustment instruction to the parameter configuration module.
7. The apparatus according to claim 6, characterized by further comprising a data cache layer module;
the data cache layer module is configured to store the data classification result before it is judged whether the data classification result satisfies the expected data distribution according to the configuration parameters.
8. The apparatus according to claim 6 or 7, characterized in that
the result verification module is configured to perform a mean calculation, a standard deviation calculation and/or a quantile distribution calculation on the data classification result, and to judge whether the calculated data satisfies the expected data distribution according to the configuration parameters.
9. The apparatus according to any one of claims 1-7, characterized by further comprising a parameter verification module;
the parameter verification module is configured to verify the validity of the configuration parameters.
10. A database data classification method, characterized by comprising:
receiving configuration parameters set by a user, the configuration parameters including a polymerization parameter and a sorting parameter;
generating a data classification calculation program from the configuration parameters based on a general calculation template, wherein the general calculation template has variables corresponding to the configuration parameters; and
calling data in a database through the data classification calculation program and classifying the data according to the configuration parameters.
11. The method according to claim 10, characterized by further comprising:
verifying the data classification result against an expected data distribution according to the configuration parameters.
12. The method according to claim 10, characterized in that the configuration parameters further include a threshold parameter;
the step of generating a data classification calculation program from the configuration parameters based on a general calculation template comprises:
using the polymerization parameter, the sorting parameter and the threshold parameter as variables of the general calculation template to generate the data classification calculation program.
13. The method according to claim 12, characterized in that the step of calling data in a database through the data classification calculation program and classifying the data according to the configuration parameters comprises:
calling, through the data classification calculation program, the data in the database corresponding to the polymerization parameter; and
classifying, according to the sorting parameter, the data corresponding to the polymerization parameter by the threshold parameter.
14. The method according to claim 10, characterized in that the step of calling data in a database through the data classification calculation program and classifying the data according to the configuration parameters comprises:
calling, through the data classification calculation program, the data in the database corresponding to the polymerization parameter; and
generating a threshold parameter from the data corresponding to the polymerization parameter and the sorting parameter, and classifying the data corresponding to the polymerization parameter by the threshold parameter.
15. The method according to claim 11, characterized in that the step of verifying the data classification result against an expected data distribution according to the configuration parameters comprises:
judging whether the data classification result satisfies the expected data distribution according to the configuration parameters; and
if it does, saving the data classification result in the database, and otherwise adaptively adjusting the threshold parameter.
16. The method according to claim 15, characterized in that, before the step of judging whether the data classification result satisfies the expected data distribution according to the configuration parameters, the method comprises:
saving the data classification result in a data cache layer.
17. The method according to claim 15 or 16, characterized in that the step of judging whether the data classification result satisfies the expected data distribution according to the configuration parameters further comprises:
performing a mean calculation, a standard deviation calculation and/or a quantile distribution calculation on the data classification result; and
judging whether the calculated data satisfies the expected data distribution according to the configuration parameters.
18. The method according to any one of claims 10-16, characterized in that, before the step of calling data in a database through the data classification calculation program and classifying the data according to the configuration parameters, the method further comprises:
verifying the validity of the configuration parameters.
CN201610190392.1A 2016-03-30 2016-03-30 Database data classification method and apparatus Pending CN105894019A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610190392.1A CN105894019A (en) 2016-03-30 2016-03-30 Database data classification method and apparatus

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610190392.1A CN105894019A (en) 2016-03-30 2016-03-30 Database data classification method and apparatus

Publications (1)

Publication Number Publication Date
CN105894019A true CN105894019A (en) 2016-08-24

Family

ID=57014583

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610190392.1A Pending CN105894019A (en) 2016-03-30 2016-03-30 Database data classification method and apparatus

Country Status (1)

Country Link
CN (1) CN105894019A (en)

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20160824
