WO2017167064A1 - Method and system for data management and control - Google Patents
- Publication number
- WO2017167064A1 (PCT/CN2017/077452)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- feature data
- cluster
- interval
- distribution
- highest
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/02—Marketing; Price estimation or determination; Fundraising
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/02—Marketing; Price estimation or determination; Fundraising
- G06Q30/0201—Market modelling; Market analysis; Collecting market data
- G06Q30/0206—Price or cost determination based on market factors
Definitions
- the present application relates to the field of data processing technologies, and in particular to a data management method, a data management system, a method for layering managed data, and a system for layering managed data.
- the technical problem to be solved by the embodiments of the present application is to provide a data management method and a method for layering managed data, so that data can be managed and controlled more reasonably.
- the embodiments of the present application further provide a data management system and a system for layering managed data, to ensure the implementation and application of the foregoing methods.
- the embodiment of the present application discloses a data management method, and the method includes:
- the feature data of the first management object is controlled within a feature data distribution interval of the corresponding cluster object.
- the first management object has a corresponding second management object
- the step of clustering the plurality of first management objects into one or more cluster objects includes:
- the plurality of first management objects are clustered based on the level information and the key attributes of the first management objects to obtain one or more cluster objects.
- the feature data distribution interval includes, from left to right, a left interval, a highest segment distribution interval, and a right interval;
- the step of determining the feature data distribution interval of the one or more cluster objects based on the preset feature data set includes:
- taking the highest segment distribution interval as a reference, the density distribution region to the left of the highest segment distribution interval is divided into one or more left intervals according to a first preset rule, and the density distribution region to the right of it is divided into one or more right intervals according to a second preset rule.
- the step of dividing the density distribution region to the left of the highest segment distribution interval into one or more left intervals according to the first preset rule, and dividing the density distribution region to the right of it into one or more right intervals according to the second preset rule, includes:
- the density distribution area at the left end of the highest segment distribution interval is divided into N segments according to a first preset ratio to obtain corresponding N left intervals, and the density distribution area at the right end of the highest segment distribution interval is divided into M segments according to a second preset ratio to obtain corresponding M right intervals, where N and M are positive integers;
- the density distribution area at the left end of the highest segment distribution interval is divided into M segments according to a third preset ratio to obtain corresponding M left intervals, and the density distribution area at the right end of the highest segment distribution interval is divided into N segments according to a fourth preset ratio to obtain corresponding N right intervals, where N and M are positive integers.
- the leftmost section of the left section is used as the smallest feature data section of the cluster object; and
- the rightmost interval in the right section is the largest feature data interval of the cluster object;
- the step of controlling the feature data of the first management object in the feature data distribution interval of the corresponding cluster object includes:
- the minimum value of the feature data is controlled within the minimum feature data interval of the cluster object corresponding to the first management object, and the maximum value of the feature data is controlled within the maximum feature data interval of that cluster object.
- the method is applied to an e-commerce platform, wherein the first control object is a commodity object; the cluster object is a commodity cluster; the feature data is a commodity price; and the second control object is a merchant object.
- the embodiment of the present application further discloses a data management system, and the system includes:
- a clustering module configured to cluster multiple first management objects into one or more cluster objects
- a data distribution determining module configured to determine a feature data distribution interval of the one or more cluster objects based on the preset feature data set
- a data control module configured to control feature data of the first management object in a feature data distribution interval of the corresponding cluster object.
- the first control object has a corresponding second control object
- the clustering module includes:
- An attribute information obtaining submodule configured to acquire attribute information of the first management object and attribute information of the second management object
- a key attribute extraction submodule configured to extract a key attribute from attribute information of the first management object
- a level information obtaining sub-module configured to cluster attribute information of all second control objects to obtain a plurality of level information for the second control object
- a cluster object acquisition submodule configured to cluster the plurality of first management objects based on the level information and the key attributes of the first management objects to obtain one or more cluster objects.
- the feature data distribution interval includes, from left to right, a left interval, a highest segment distribution interval, and a right interval;
- the data distribution determining module includes:
- a density distribution estimation submodule configured to separately estimate a density distribution of the feature data of the cluster object based on the preset feature data set
- a first interval obtaining submodule configured to take the highest point of the density distribution as a midpoint and the bounds of the range given by a first preset threshold as endpoints, to form the highest segment distribution interval;
- a second interval obtaining submodule configured to, taking the highest segment distribution interval as a reference, divide the density distribution region to the left of the highest segment distribution interval into one or more left intervals according to a first preset rule, and divide the density distribution region to the right of it into one or more right intervals according to a second preset rule.
- the second interval acquisition submodule is further configured to:
- the density distribution area at the left end of the highest segment distribution interval is divided into N segments according to a first preset ratio to obtain corresponding N left intervals, and the density distribution area at the right end of the highest segment distribution interval is divided into M segments according to a second preset ratio to obtain corresponding M right intervals, where N and M are positive integers;
- the density distribution area at the left end of the highest segment distribution interval is divided into M segments according to a third preset ratio to obtain corresponding M left intervals, and the density distribution area at the right end of the highest segment distribution interval is divided into N segments according to a fourth preset ratio to obtain corresponding N right intervals, where N and M are positive integers.
- the leftmost interval among the left intervals is used as the minimum feature data interval of the cluster object, and the rightmost interval among the right intervals is used as the maximum feature data interval of the cluster object;
- the data control module is further configured to:
- the minimum value of the feature data is controlled within the minimum feature data interval of the cluster object corresponding to the first management object, and the maximum value of the feature data is controlled within the maximum feature data interval of that cluster object.
- the system is applied to an e-commerce platform, wherein the first control object is a commodity object; the cluster object is a commodity cluster; the feature data is a commodity price; and the second control object is a merchant object.
- the embodiment of the present application further discloses a method for layering data of management data, where the method includes:
- the embodiment of the present application further discloses a system for layering data of management data, wherein the system includes:
- a clustering module configured to cluster multiple first management objects into one or more cluster objects
- a data distribution determining module configured to determine a feature data distribution interval of the one or more cluster objects based on the preset feature data set.
- the embodiments of the present application include the following advantages:
- the first control object is clustered to obtain one or more classes.
- the feature data distribution interval of each cluster object is estimated based on the feature data set, and the feature data of the first control object is controlled in the feature data distribution interval of the corresponding cluster object.
- taking the cluster object as the dimension and combining the feature data set in the management platform, a reasonable feature data distribution interval is formulated to achieve feature data layering and to provide data reference support for setting the feature data of the first control object.
- the setting of the feature data of the first control object is controlled within a reasonable range to prevent adverse effects caused by setting the feature data too high or too low.
- FIG. 1 is a flow chart showing the steps of a first embodiment of a data management method according to the present application
- FIG. 2 is a flow chart of steps of a second embodiment of a data management method according to the present application.
- FIG. 3 is a schematic diagram of a price interval in a second embodiment of a data management method of the present application.
- FIG. 4 is a structural block diagram of an embodiment of a data management system of the present application.
- FIG. 5 is a flow chart of steps of an embodiment of a method for data layering of management data according to the present application
- FIG. 6 is a structural block diagram of a system embodiment for performing data layering on management data according to the present application.
- referring to FIG. 1, a flow chart of the steps of a first embodiment of a data management method of the present application is shown.
- the method may include the following steps:
- Step 101 Cluster multiple first control objects into one or more cluster objects
- Step 102 Determine, according to a preset feature data set, a feature data distribution interval of the one or more cluster objects;
- Step 103 Control feature data of the first management object in a feature data distribution interval of the corresponding cluster object.
- the feature data distribution interval of each cluster object may be estimated based on the feature data set, and the feature data of the first control object is controlled within the feature data distribution interval of the corresponding cluster object.
- taking the cluster object as the dimension and combining the feature data set in the management platform, a reasonable feature data distribution interval is formulated to achieve feature data layering and to provide data reference support for setting the feature data of the first control object, so that the setting is controlled within a reasonable range and the adverse effects of setting the feature data too high or too low are prevented.
- referring to FIG. 2, a flow chart of the steps of a second embodiment of the data management method of the present application is shown, which may include the following steps:
- Step 201 Obtain attribute information of the first control object and attribute information of the second control object.
- the management object is a data processing object in the management platform.
- the control object in the embodiment of the present application may include at least a first control object and a second control object, wherein the first control object and the second control object may be multiple.
- the second control object may manage the first control object, and the second control object sets the feature data for the first control object.
- the management platform is an e-commerce platform
- the first management object may be a commodity object
- the second management object may be a merchant object
- the feature data may be a commodity price
- the merchant object may manage the commodity object, for example by setting the commodity price of the commodity object.
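- the first control object has a corresponding first database, and the second control object has a corresponding second database
- the first database stores the attribute information of the plurality of first management objects
- the second database stores the attribute information of the plurality of second management objects, so that the attribute information of the first management object can be extracted from the first database and the attribute information of the second management object can be extracted from the second database.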
- the first control object is a commodity object and the second management object is a merchant object
- the first database is a commodity database
- the second database may be a merchant database
- the item attribute of the item object can be extracted from the item database.
- the item attribute of an item can include the material, size, style, brand, and the like of the item.
- the merchant attribute of the merchant object may be extracted from the merchant database.
- the merchant attribute is an attribute related to the operation capability of the merchant.
- the merchant attributes of a merchant object may include the traffic of the merchant's store, transactions, marketing rate, customer unit price, number of online products, inventory, store type, opening time, etc., as shown in Table 1 below:
- the attribute information of the first control object and the attribute information of the second control object in the management platform are comprehensively considered, so that the data source is more abundant.
- Step 202 Extract key attributes from attribute information of the first control object.
- the first control object has corresponding feature data, and the first control object may have different feature data in different periods.
- Key attributes refer to attribute information that has a large impact on feature data.
- the correlation between each piece of attribute information of the first management object and the feature data may be calculated; the attributes are then ranked by correlation, and the top-ranked attributes are used as the key attributes of the first management object.
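- As an illustration of the ranking just described, here is a minimal Python sketch. It assumes the attributes have already been encoded as numbers and uses the Pearson coefficient as the correlation measure; the source does not name a specific measure, and `pearson`, `key_attributes`, `top_k`, and the sample data are hypothetical.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0  # a constant column carries no information about the price
    return cov / sqrt(vx * vy)

def key_attributes(attributes, prices, top_k):
    """Rank attribute names by |correlation| with price and keep the top_k."""
    ranked = sorted(
        attributes,
        key=lambda name: abs(pearson(attributes[name], prices)),
        reverse=True,
    )
    return ranked[:top_k]

# Hypothetical, numerically encoded attributes of five bags and their prices.
attributes = {
    "size": [1, 2, 3, 4, 5],
    "material_grade": [1, 1, 2, 3, 3],
    "color_code": [3, 1, 2, 1, 3],
}
prices = [100, 210, 290, 405, 500]
print(key_attributes(attributes, prices, top_k=2))
```

- Categorical attributes such as material or brand would need an encoding step first, which this sketch leaves out.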
- the feature data can be a commodity price.
- the correlation between each piece of attribute information and the commodity price can be mined to find the key attributes that determine the price. For example, the price of a women's bag is related to its material, size, style, and brand; these four attributes essentially determine the price of a bag and are therefore the key attributes of women's bags.
- Step 203 Perform clustering on attribute information of all second control objects to obtain multiple level information for the second control object.
- after the attribute information of all second control objects in the control platform is obtained, the second control objects may be clustered based on that attribute information using a clustering method such as k-means, thereby obtaining multiple pieces of level information for the second control objects.
- the level information of the second management object may include a first level, a second level, a third level, and so on, where the first level is higher than the second level, the second level is higher than the third level, and so on.
- for example, a layered model of merchant operation capability can be established, according to which merchants are divided into four levels: top sellers, mid-tier ("waist") sellers, small sellers, and sellers with no traffic over a long period.
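- The level assignment can be sketched with a plain k-means (Lloyd's algorithm) in Python. The deterministic initialisation, the ranking of clusters by centroid sum, the function names, and the two-feature merchant data are all assumptions made for illustration; the source only says that a clustering method such as k-means may be used.

```python
def _dist2(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, iterations=20):
    """Plain Lloyd's k-means (k >= 2); returns (labels, centroids)."""
    # Assumption of this sketch: spread the initial centroids evenly over
    # the points sorted by coordinate sum, so the result is deterministic.
    ordered = sorted(points, key=sum)
    centroids = [ordered[i * (len(ordered) - 1) // (k - 1)] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: _dist2(pt, centroids[c]))
                  for pt in points]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [pt for pt, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members)
                                     for col in zip(*members))
    return labels, centroids

def merchant_levels(points, k):
    """Map each merchant to a level, 1 being the cluster with the largest centroid."""
    labels, centroids = kmeans(points, k)
    order = sorted(range(k), key=lambda c: sum(centroids[c]), reverse=True)
    level_of = {c: rank + 1 for rank, c in enumerate(order)}
    return [level_of[lab] for lab in labels]

# Hypothetical merchants described by (traffic, transactions), already scaled.
merchants = [(100, 90), (95, 100), (50, 45), (55, 50),
             (20, 18), (22, 20), (1, 0), (0, 1)]
print(merchant_levels(merchants, k=4))
```

- In practice the merchant attributes would be normalised before clustering so that no single attribute dominates the distance.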
- Step 204 Perform clustering of the plurality of first management objects based on the level information and key attributes of the first management object to obtain one or more cluster objects;
- a clustering algorithm may be used to cluster the first control objects according to the level information of the second control objects and the key attributes of the first control objects, to obtain one or more cluster objects.
- first management objects whose corresponding second management objects have the same level and whose key attributes are the same may be classified into one class.
- for example, commodities whose merchants have the same operational capability level and whose key attributes are the same can be clustered into one commodity cluster.
- for example, if the key attributes are material, size, and style, commodities with the same material, size, and style whose merchants have the same operational capability form one commodity cluster; for instance, all large top-grain cowhide motorcycle bags sold in KA sellers' shops form one commodity cluster.
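- A minimal sketch of this grouping in Python, assuming exact matching on the merchant level and on each key attribute. The record layout, the function name `cluster_commodities`, and the sample data are hypothetical.

```python
from collections import defaultdict

def cluster_commodities(commodities, merchant_level, key_attrs):
    """Group commodity ids by (merchant level, key attribute values)."""
    clusters = defaultdict(list)
    for item in commodities:
        level = merchant_level[item["merchant"]]
        key = (level,) + tuple(item[attr] for attr in key_attrs)
        clusters[key].append(item["id"])
    return dict(clusters)

# Hypothetical merchant levels and commodity records.
merchant_level = {"m1": 1, "m2": 1, "m3": 3}
commodities = [
    {"id": "c1", "merchant": "m1", "material": "cowhide", "size": "large", "style": "motorcycle"},
    {"id": "c2", "merchant": "m2", "material": "cowhide", "size": "large", "style": "motorcycle"},
    {"id": "c3", "merchant": "m3", "material": "cowhide", "size": "large", "style": "motorcycle"},
    {"id": "c4", "merchant": "m1", "material": "canvas", "size": "small", "style": "tote"},
]
clusters = cluster_commodities(commodities, merchant_level, ["material", "size", "style"])
for key, ids in clusters.items():
    print(key, ids)
```

- Here c1 and c2 share a cluster because their merchants are both level 1 and their key attributes match, while c3 has the same attributes but a level 3 merchant and so lands in a separate cluster.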
- Step 205 Determine, according to the preset feature data set, a feature data distribution interval of the one or more cluster objects;
- the feature data distribution interval of each cluster object may be obtained based on a preset feature data set, where each cluster object may have multiple feature data distribution intervals; for example, the feature data distribution intervals of a cluster object may include, from left to right, left intervals, a highest segment distribution interval, and right intervals.
- step 205 may include the following sub-steps:
- Sub-step S11 estimating a density distribution of the feature data of the cluster object based on the preset feature data set
- the preset feature data set may include feature data of all the first control objects in the management platform.
- the feature data set may further include transaction data of the commodity, and the price density distribution of each commodity cluster may be estimated by using the feature data set of the commodity as a weight.
- a preset feature data set may be used as a sample point set, and a kernel density estimation (KDE) algorithm is used to estimate a density distribution curve of the feature data of each cluster object.
- Kernel density estimation (KDE) is used in probability theory to estimate an unknown density function and is a nonparametric method. The principle is: if a certain number appears in the observations, the probability density at that number can be considered relatively large; the density at numbers close to it will also be relatively large, and the density at numbers far from it will be smaller.
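- The principle can be made concrete with a small Gaussian KDE in Python. The fixed `bandwidth`, the optional `weights` argument (for example transaction counts), and the sample prices are assumptions of this sketch and are not taken from the source.

```python
from math import exp, pi, sqrt

def gaussian_kde(samples, bandwidth, weights=None):
    """Return a function x -> estimated density at x (Gaussian kernel)."""
    if weights is None:
        weights = [1.0] * len(samples)
    total = sum(weights)
    norm = 1.0 / (total * bandwidth * sqrt(2 * pi))

    def density(x):
        # Each sample contributes a bell curve centred on itself, so x values
        # near many samples get a high density and remote ones a low density.
        return norm * sum(
            w * exp(-0.5 * ((x - s) / bandwidth) ** 2)
            for s, w in zip(samples, weights)
        )

    return density

# Hypothetical prices of one commodity cluster.
prices = [100, 105, 110, 108, 300]
density = gaussian_kde(prices, bandwidth=10)
print(density(106) > density(200))  # dense region versus empty region

# Same prices, weighted by hypothetical transaction counts.
weighted = gaussian_kde(prices, bandwidth=10, weights=[1, 1, 1, 1, 10])
print(weighted(300) > weighted(106))
```

- The bandwidth controls how smooth the curve is; choosing it is a separate problem that this sketch does not address.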
- Sub-step S12 taking the highest point of the density distribution as the midpoint, and taking the range of the first preset threshold as the end point to form the highest segment distribution interval;
- the highest point of the density distribution curve can be determined and taken as the midpoint, with the bounds of the range given by the first preset threshold taken as the endpoints, to form the highest segment distribution interval; for example, taking the highest point as the midpoint and extending 15% to each of the left and right gives a 30% interval, which is the highest segment distribution interval [a, b).
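- One possible reading of the 15% example, sketched in Python: the 15% is taken as a share of the samples on each side of the peak, so that [a, b) holds about 30% of the samples. This interpretation, the function name `highest_segment`, and the parameter `half_width_pct` are assumptions; the source does not say whether the percentage refers to samples, probability mass, or the price axis.

```python
from bisect import bisect_right

def highest_segment(samples, peak, half_width_pct=15):
    """Return (a, b) so that [a, b) spans half_width_pct% of samples on each side of peak."""
    s = sorted(samples)
    n = len(s)
    rank = bisect_right(s, peak)       # number of samples <= peak
    step = n * half_width_pct // 100   # 15% of the samples, in whole samples
    lo = max(rank - step, 0)           # clamp at the left edge of the data
    hi = min(rank + step, n - 1)       # clamp at the right edge of the data
    return s[lo], s[hi]

# Hypothetical prices 1..100 with the density peak located at price 50.
prices = list(range(1, 101))
a, b = highest_segment(prices, peak=50)
print(a, b)
```

- Integer arithmetic is used for the sample counts so that the result does not depend on floating-point rounding.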
- Sub-step S13 taking the highest segment distribution interval as a reference, the density distribution region to the left of the highest segment distribution interval is divided into one or more left intervals according to the first preset rule, and the density distribution region to the right of it is divided into one or more right intervals according to the second preset rule.
- the distribution regions to the left and right of the highest segment distribution interval in the density distribution curve may be divided with reference to the highest segment distribution interval to obtain the corresponding left and right intervals, where there may be one or more left intervals and one or more right intervals.
- the feature data distribution section of the cluster object can be obtained.
- the sub-step S13 may further include the following sub-steps:
- Sub-step S131 determining a quantile of the highest point
- Sub-step S132 obtaining a median of the feature data in the cluster object
- a cluster object may include multiple first control objects, and each first control object has one or more pieces of feature data; all feature data of all first control objects in the cluster object may be arranged into a feature data queue, and the median of the queue is taken as the median of the feature data of the cluster object.
- Sub-step S133 it is determined whether the quantile of the highest point is less than or equal to the median, and if so, sub-step S134 is performed, and if not, sub-step S135 is performed.
- after the quantile of the highest point and the median of the feature data of the cluster object are obtained, the two can be compared to determine whether the quantile is less than or equal to the median; if so, sub-step S134 is performed, otherwise sub-step S135 is performed.
- Sub-step S134 dividing the density distribution region to the left of the highest segment distribution interval into N segments according to a first preset ratio to obtain N corresponding left intervals, and dividing the density distribution region to the right of the highest segment distribution interval into M segments according to a second preset ratio to obtain M corresponding right intervals;
- the density distribution region to the left of the highest segment distribution interval is divided into N segments according to the first preset ratio, and N corresponding quantiles are obtained.
- the N quantiles and the left endpoint of the highest segment distribution interval serve as interval endpoints and form the N left intervals.
- for example, if the highest segment distribution interval is [a, b) and the density distribution region to its left is divided into two segments according to the first preset ratio, the corresponding two quantiles are p0 and p1, and the two left intervals are [p0, p1) and [p1, a).
- the density distribution region to the right of the highest segment distribution interval is divided into M segments according to a second preset ratio, and M corresponding quantiles are obtained; the right endpoint of the highest segment distribution interval and the M quantiles serve as interval endpoints and form the M right intervals.
- for example, if the highest segment distribution interval is [a, b) and the density distribution region to its right is divided into three segments according to the second preset ratio, the corresponding three quantiles are p4, p5, and p6, and the right intervals are [b, p4), [p4, p5), and [p5, p6).
- six intervals of the entire density distribution curve are obtained, which are [p0, p1), [p1, a), [a, b), [b, p4), [p4, p5), [p5, p6).
- for example, Gaussian kernel density estimation is applied to the prices of a commodity cluster to obtain a density distribution curve with the commodity price as the abscissa and the transaction ratio as the ordinate. The highest point of the curve is then taken as the midpoint, and 15% is taken to each of the left and right as the endpoints, giving a 30% price segment that is the highest price segment, denoted [a, b). It is then determined whether the quantile of the highest point is less than or equal to the median of the kernel density distribution curve. If it is, the density distribution region to the left of the highest price segment may be divided into two segments according to the ratios 1/5 and 4/5, giving the quantiles p0 and p1, and the density distribution region to the right of the highest price segment may be divided into three segments according to the ratios 5/9, 3/9, and 1/9, giving the quantiles p4, p5, and p6.
- the entire density distribution curve can be divided into six price segments, which are [p0, p1), [p1, a), [a, b), [b, p4), [p4, p5), [p5, p6), thereby achieving price stratification of the commodity cluster.
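- The ratio-based division can be sketched in Python for the left side. The sketch assumes the ratios apply to the number of samples in the region and that each segment ends up with at least one sample; `split_by_ratio`, `left_intervals`, and the sample prices are hypothetical.

```python
from fractions import Fraction

def split_by_ratio(sorted_region, ratios):
    """Cut a sorted list into consecutive chunks whose sizes follow `ratios`."""
    n = len(sorted_region)
    cuts, acc = [0], Fraction(0)
    for r in ratios:
        acc += Fraction(r)          # exact arithmetic avoids float rounding
        cuts.append(int(acc * n))   # truncate to a whole number of samples
    return [sorted_region[cuts[i]:cuts[i + 1]] for i in range(len(ratios))]

def left_intervals(sorted_region, ratios, a):
    """Half-open intervals [p0, p1), ..., [p_last, a) to the left of a."""
    chunks = split_by_ratio(sorted_region, ratios)
    # The left edge of each chunk plays the role of the quantiles p0, p1, ...
    # (this assumes every chunk contains at least one sample).
    edges = [chunk[0] for chunk in chunks] + [a]
    return list(zip(edges[:-1], edges[1:]))

# Ten hypothetical prices that fall below a = 20, split in the ratio 1/5 : 4/5.
left_prices = list(range(10, 20))
print(left_intervals(left_prices, ["1/5", "4/5"], a=20))
```

- The right side can be handled in the same way with its own ratios, using b as the first edge.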
- Sub-step S135 dividing the density distribution region to the left of the highest segment distribution interval into M segments according to a third preset ratio to obtain M corresponding left intervals, and dividing the density distribution region to the right of the highest segment distribution interval into N segments according to a fourth preset ratio to obtain N corresponding right intervals.
- the density distribution region to the left of the highest segment distribution interval may be divided into M segments according to the third preset ratio, and M corresponding quantiles are obtained; the M quantiles and the left endpoint of the highest segment distribution interval serve as interval endpoints and form the M left intervals.
- for example, if the highest segment distribution interval is [a, b) and the density distribution region to its left is divided into three segments according to the third preset ratio, the corresponding three quantiles are P0, P1, and P2, and the three left intervals are [P0, P1), [P1, P2), and [P2, a).
- similarly, if the density distribution region to the right of [a, b) is divided into two segments according to the fourth preset ratio, the corresponding two quantiles are P4 and P5, and the right intervals are [b, P4) and [P4, P5).
- six intervals of the entire density distribution curve are obtained, which are [P0, P1), [P1, P2), [P2, a), [a, b), [b, P4), [P4, P5).
- for example, Gaussian kernel density estimation is applied to the prices of a commodity cluster to obtain a density distribution curve with the price as the abscissa and the transaction ratio as the ordinate. The highest point of the curve is then taken as the midpoint, and 15% is taken to each of the left and right as the endpoints, giving a 30% price segment that is the highest price segment, denoted [a, b). It is then determined whether the quantile of the highest point is greater than the median of the kernel density distribution curve. If it is, the density distribution region to the left of the highest price segment may be divided into three segments according to the ratios 1/9, 3/9, and 5/9, giving the quantiles P0, P1, and P2, and the density distribution region to the right of the highest price segment may be divided into two segments according to the ratios 1/5 and 4/5, giving the quantiles P4 and P5.
- the whole density distribution curve can thus be divided into six price segments, which are [P0, P1), [P1, P2), [P2, a), [a, b), [b, P4), [P4, P5).
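- The choice between sub-steps S134 and S135 can be sketched in Python as follows. The sketch reads "the quantile of the highest point" as the price at which the density peaks and compares it with the median price; that reading is an assumption. The ratios are the ones given in the two examples, and `choose_ratios` is a hypothetical name.

```python
from statistics import median

def choose_ratios(samples, peak):
    """Pick the left and right split ratios depending on where the peak lies."""
    if peak <= median(samples):
        # S134: peak at or below the median, so fewer segments on the left.
        return {"left": ["1/5", "4/5"], "right": ["5/9", "3/9", "1/9"]}
    # S135: peak above the median, so more segments on the left.
    return {"left": ["1/9", "3/9", "5/9"], "right": ["1/5", "4/5"]}

# Hypothetical prices with median 3.
prices = [1, 2, 3, 4, 100]
print(choose_ratios(prices, peak=2))
print(choose_ratios(prices, peak=50))
```

- Either way the curve ends up with six intervals: two on one side, the highest segment in the middle, and three on the other side.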
- the attribute information of the first management objects and of the second management objects in the management platform is comprehensively considered to cluster the first management objects, and the feature data set in the management platform is combined to formulate a reasonable feature data distribution interval, achieving the purpose of feature data layering.
- Step 206 Control feature data of the first control object in a feature data distribution interval of the corresponding cluster object.
- the leftmost section of the left section may be the smallest feature data section of the cluster object.
- the rightmost interval among the right intervals may be used as the largest feature data interval of the cluster object; for example, if the feature data distribution intervals of a certain cluster object are [p0, p1), [p1, a), [a, b), [b, p4), [p4, p5), [p5, p6), the smallest feature data interval of the cluster object is [p0, p1) and the largest feature data interval is [p5, p6).
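- Once the interval edges are known, finding the layer a value falls into is a binary search. The following Python sketch assumes half-open intervals as in the example above; the edge values and the function name `layer_of` are hypothetical.

```python
from bisect import bisect_right

def layer_of(value, edges):
    """Index of the half-open interval [edges[i], edges[i+1]) containing value, or None."""
    if not (edges[0] <= value < edges[-1]):
        return None  # outside [p0, p6)
    return bisect_right(edges, value) - 1

# Hypothetical edges p0, p1, a, b, p4, p5, p6 for one commodity cluster.
edges = [10, 12, 20, 30, 35, 38, 40]
smallest = (edges[0], edges[1])      # [p0, p1)
largest = (edges[-2], edges[-1])     # [p5, p6)
print(smallest, largest)
print(layer_of(10, edges), layer_of(25, edges), layer_of(39, edges), layer_of(40, edges))
```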
- the cluster object to which the first control object belongs may be first determined.
- a similarity algorithm may be used to calculate the similarity between the first control object and each cluster object, and the cluster object whose similarity is less than the preset value is used as the cluster object corresponding to the first management object.
- the feature data distribution interval of the cluster object corresponding to the first management object may be used as data reference support: the minimum value of the feature data is controlled within the minimum feature data interval of the cluster object corresponding to the first control object, and the maximum value of the feature data is controlled within the maximum feature data interval of that cluster object, thereby completing the control of the first control object and ensuring the rationality of the feature data setting.
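- A simple way to apply this control, sketched in Python under one reading of the text: a value is acceptable if it is not below the lower edge of the minimum interval and not at or above the upper edge of the maximum interval. The function name `out_of_range` and the data are hypothetical, and a real system might instead adjust or reject the offending values.

```python
def out_of_range(values, intervals):
    """Return the values that fall outside the cluster's overall range."""
    floor = intervals[0][0]      # lower edge of the minimum feature data interval
    ceiling = intervals[-1][1]   # upper edge of the maximum feature data interval
    return [v for v in values if not (floor <= v < ceiling)]

# Hypothetical intervals of one commodity cluster and prices set by merchants.
intervals = [(10, 12), (12, 20), (20, 30), (30, 35), (35, 38), (38, 40)]
prices = [9, 15, 39, 40]
print(out_of_range(prices, intervals))
```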
- the leftmost interval is the low price range, that is, a lowest price line is set, which prevents underpricing from disrupting normal competition on the platform, reduces the occurrence of counterfeit goods, and improves the overall image of the e-commerce platform.
- the implementation of the embodiment of the present application on the e-commerce platform can achieve the following beneficial effects:
- Reasonable price range It is required to establish a reasonable price range for goods of the same category attribute, to avoid the adverse effects caused by excessive or low price, and to provide data reference support for the pricing strategies of brands and sellers.
- after the first management objects are clustered into cluster objects, the feature data set may be used as the sample point set to estimate the density distribution of each cluster object, and a reasonable feature data distribution interval is determined for each cluster object from that density distribution; this provides data reference support for setting the feature data of the first management object and prevents the adverse effects of setting the feature data too high or too low.
- the system may include the following modules:
- the clustering module 401 is configured to cluster the plurality of first management objects into one or more cluster objects;
- the data distribution determining module 402 is configured to determine, according to the preset feature data set, a feature data distribution interval of the one or more cluster objects;
- the data control module 403 is configured to control feature data of the first management object in a feature data distribution interval of the corresponding cluster object.
- the first management object has a corresponding second management object
- the clustering module 401 may include the following sub-modules:
- An attribute information obtaining submodule configured to acquire attribute information of the first management object and attribute information of the second management object
- a key attribute extraction submodule configured to extract a key attribute from attribute information of the first management object
- a level information obtaining submodule configured to cluster the attribute information of all second management objects to obtain a plurality of pieces of level information for the second management objects
- a cluster object acquisition submodule configured to cluster the plurality of first management objects based on the level information and the key attributes of the first management objects to obtain one or more cluster objects.
- the feature data distribution interval includes, from left to right, a left interval, a highest segment distribution interval, and a right interval;
- the data distribution determining module 402 can include the following sub-modules:
- a density distribution estimation submodule configured to separately estimate a density distribution of the feature data of the cluster object based on the preset feature data set
- a first interval acquisition submodule configured to take the highest point of the density distribution as the midpoint and a range of a first preset threshold on each side as the endpoints, to form the highest segment distribution interval;
- a second interval obtaining submodule configured to, with the highest segment distribution interval as the reference, divide the density distribution region to the left of the highest segment distribution interval into one or more corresponding left intervals according to a first preset rule, and divide the density distribution region to the right of the highest segment distribution interval into one or more corresponding right intervals according to a second preset rule.
- the second interval obtaining submodule is further configured to:
- if the quantile of the highest point is less than or equal to the median, the density distribution region to the left of the highest segment distribution interval is divided into N segments according to a first preset ratio to obtain N corresponding left intervals, and the density distribution region to the right of the highest segment distribution interval is divided into M segments according to a second preset ratio to obtain M corresponding right intervals, where N and M are positive integers;
- if the quantile of the highest point is greater than the median, the density distribution region to the left of the highest segment distribution interval is divided into M segments according to a third preset ratio to obtain M corresponding left intervals, and the density distribution region to the right of the highest segment distribution interval is divided into N segments according to a fourth preset ratio to obtain N corresponding right intervals, where N and M are positive integers.
- if there are multiple left intervals and multiple right intervals, the leftmost left interval is taken as the smallest feature data interval of the cluster object, and the rightmost right interval is taken as the largest feature data interval of the cluster object.
- the data control module 403 is further configured to:
- the system is applied to an e-commerce platform, wherein the first management object is a commodity object; the cluster object is a commodity cluster; the feature data is the commodity price; and the second management object is a merchant object.
- referring to FIG. 5, a flow chart of the steps of an embodiment of a method for data layering of management data according to the present application is shown, which may include the following steps:
- Step 501: cluster a plurality of first management objects into one or more cluster objects;
- Step 502: determine, based on the preset feature data set, the feature data distribution interval of the one or more cluster objects.
- the embodiment of the present invention may further include the following steps:
- the feature data of the first management object is controlled within a feature data distribution interval of the corresponding cluster object.
- the first management object has a corresponding second control object
- the step 501 may further include:
- the plurality of first management objects are clustered based on the level information and the key attributes of the first management objects to obtain one or more cluster objects.
- the feature data distribution interval includes, from left to right, a left interval, a highest segment distribution interval, and a right interval;
- the step 502 further includes:
- with the highest segment distribution interval as the reference, the density distribution region to the left of the highest segment distribution interval is divided into one or more corresponding left intervals according to a first preset rule.
- the step of dividing the density distribution region to the right of the highest segment distribution interval into one or more corresponding right intervals according to the second preset rule includes:
- if the quantile of the highest point is less than or equal to the median, the density distribution region to the left of the highest segment distribution interval is divided into N segments according to a first preset ratio to obtain N corresponding left intervals, and the density distribution region to the right of the highest segment distribution interval is divided into M segments according to a second preset ratio to obtain M corresponding right intervals, where N and M are positive integers;
- if the quantile of the highest point is greater than the median, the density distribution region to the left of the highest segment distribution interval is divided into M segments according to a third preset ratio to obtain M corresponding left intervals, and the density distribution region to the right of the highest segment distribution interval is divided into N segments according to a fourth preset ratio to obtain N corresponding right intervals, where N and M are positive integers.
- if there are multiple left intervals and multiple right intervals, the leftmost left interval is taken as the smallest feature data interval of the cluster object, and the rightmost right interval is taken as the largest feature data interval of the cluster object.
- the step of controlling the feature data of the first management object in the feature data distribution interval of the corresponding cluster object includes:
- when feature data is set for the first management object, the minimum value of the feature data is controlled within the smallest feature data interval of the cluster object corresponding to the first management object, and the maximum value of the feature data is controlled within the largest feature data interval of that cluster object.
- the method is applied to an e-commerce platform, wherein the first management object is a commodity object; the cluster object is a commodity cluster; the feature data is the commodity price; and the second management object is a merchant object.
- referring to FIG. 6, a structural block diagram of an embodiment of a system for data layering of management data according to the present application is shown, and the system may include the following modules:
- the clustering module 601 is configured to cluster the plurality of first management objects into one or more cluster objects;
- the data distribution determining module 602 is configured to determine a feature data distribution interval of the one or more cluster objects based on the preset feature data set.
- the system may further include the following modules:
- a data control module configured to control feature data of the first management object in a feature data distribution interval of the corresponding cluster object.
- the first management object has a corresponding second management object
- the clustering module 601 may include the following sub-modules:
- An attribute information obtaining submodule configured to acquire attribute information of the first management object and attribute information of the second management object
- a key attribute extraction submodule configured to extract a key attribute from attribute information of the first management object
- a level information obtaining submodule configured to cluster the attribute information of all second management objects to obtain a plurality of pieces of level information for the second management objects
- a cluster object acquisition submodule configured to cluster the plurality of first management objects based on the level information and the key attributes of the first management objects to obtain one or more cluster objects.
- the feature data distribution interval includes, from left to right, a left interval, a highest segment distribution interval, and a right interval;
- the data distribution determining module 602 can include the following sub-modules:
- a density distribution estimation submodule configured to separately estimate a density distribution of the feature data of the cluster object based on the preset feature data set
- a first interval acquisition submodule configured to take the highest point of the density distribution as the midpoint and a range of a first preset threshold on each side as the endpoints, to form the highest segment distribution interval;
- a second interval obtaining submodule configured to, with the highest segment distribution interval as the reference, divide the density distribution region to the left of the highest segment distribution interval into one or more corresponding left intervals according to a first preset rule, and divide the density distribution region to the right of the highest segment distribution interval into one or more corresponding right intervals according to a second preset rule.
- the second interval obtaining submodule is further configured to:
- if the quantile of the highest point is less than or equal to the median, the density distribution region to the left of the highest segment distribution interval is divided into N segments according to a first preset ratio to obtain N corresponding left intervals, and the density distribution region to the right of the highest segment distribution interval is divided into M segments according to a second preset ratio to obtain M corresponding right intervals, where N and M are positive integers;
- if the quantile of the highest point is greater than the median, the density distribution region to the left of the highest segment distribution interval is divided into M segments according to a third preset ratio to obtain M corresponding left intervals, and the density distribution region to the right of the highest segment distribution interval is divided into N segments according to a fourth preset ratio to obtain N corresponding right intervals, where N and M are positive integers.
- if there are multiple left intervals and multiple right intervals, the leftmost left interval is taken as the smallest feature data interval of the cluster object, and the rightmost right interval is taken as the largest feature data interval of the cluster object.
- the data control module is further configured to:
- when feature data is set for the first management object, the minimum value of the feature data is controlled within the smallest feature data interval of the cluster object corresponding to the first management object, and the maximum value of the feature data is controlled within the largest feature data interval of that cluster object.
- the system is applied to an e-commerce platform
- the first control object is a commodity object
- the cluster object is a commodity cluster
- the feature data is a commodity price
- the second control object is a merchant object.
- those skilled in the art will appreciate that the embodiments of the present application can be provided as a method, an apparatus, or a computer program product. Therefore, the embodiments of the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware. Moreover, embodiments of the present application can take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) that contain computer-usable program code.
- Embodiments of the present application are described with reference to flowcharts and/or block diagrams of methods, terminal devices (systems), and computer program products according to embodiments of the present application. It will be understood that each flow and/or block of the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program operating instructions.
- These computer program operating instructions can be provided to a processor of a general purpose computer, a special purpose computer, an embedded processor, or other programmable data processing terminal device to produce a machine, such that the operating instructions executed by the processor of the computer or other programmable data processing terminal device produce means for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
- the computer program operating instructions may also be stored in a computer readable memory that can direct a computer or other programmable data processing terminal device to operate in a particular manner, such that the operating instructions stored in the computer readable memory produce an article of manufacture including operating instruction means, and the operating instruction means implement the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
- These computer program operating instructions can also be loaded onto a computer or other programmable data processing terminal device, such that a series of operational steps are performed on the computer or other programmable terminal device to produce computer-implemented processing, and the operating instructions executed on the computer or other programmable terminal device thereby provide steps for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
Landscapes
- Business, Economics & Management (AREA)
- Strategic Management (AREA)
- Engineering & Computer Science (AREA)
- Accounting & Taxation (AREA)
- Development Economics (AREA)
- Finance (AREA)
- Entrepreneurship & Innovation (AREA)
- Economics (AREA)
- Game Theory and Decision Science (AREA)
- Marketing (AREA)
- Physics & Mathematics (AREA)
- General Business, Economics & Management (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Data Mining & Analysis (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
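Embodiments of the present application provide a data management and control method and system. The data management and control method includes: clustering a plurality of first management objects into one or more cluster objects; determining, based on a preset feature data set, the feature data distribution interval of the one or more cluster objects; and controlling the feature data of the first management object within the feature data distribution interval of the corresponding cluster object. Taking the cluster object as the dimension, the embodiments of the present application combine the feature data set in the management platform to formulate reasonable feature data distribution intervals and thereby layer the feature data. This provides data reference support for setting the feature data of the first management object, keeps that setting within a reasonable range, and prevents the adverse effects of feature data being set too high or too low.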
Description
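The present application relates to the field of data processing technologies, and in particular to a data management and control method, a data management and control system, a method for data layering of management data, and a system for data layering of management data.
With the development of information technology, a data processing platform often has to process massive amounts of data, which places higher demands on data management and control.
For example, on e-commerce platforms, more and more consumers purchase commodities online. Whether the commodity prices that the platform shows to consumers are reasonable has gradually become a question that must be considered. Underpriced commodities obtain high traffic through high conversion rates and high sales, but they easily lead to malicious competition; moreover, some unscrupulous merchants sell counterfeit goods on the platform, and because counterfeit goods are usually priced low, this seriously damages the overall image of the platform. However, there is as yet no overall price management and control strategy for commodities on a platform.
Therefore, a technical problem that urgently needs to be solved by those skilled in the art is to propose a platform-based data management and control mechanism that implements data management and control better and more reasonably.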
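Summary of the Invention
The technical problem to be solved by the embodiments of the present application is to provide a data management and control method and a method for data layering of management data, so as to implement data management and control better and more reasonably.
Correspondingly, the embodiments of the present application further provide a data management and control system and a system for data layering of management data, to ensure the implementation and application of the foregoing methods.
To solve the above problem, an embodiment of the present application discloses a data management and control method, the method including:
clustering a plurality of first management objects into one or more cluster objects;
determining, based on a preset feature data set, the feature data distribution interval of the one or more cluster objects;
controlling the feature data of the first management object within the feature data distribution interval of the corresponding cluster object.
Preferably, the first management object has a corresponding second management object, and the step of clustering a plurality of first management objects into one or more cluster objects includes:
acquiring the attribute information of the first management object and the attribute information of the second management object;
extracting key attributes from the attribute information of the first management object;
clustering the attribute information of all second management objects to obtain a plurality of pieces of level information for the second management objects;
clustering the plurality of first management objects based on the level information and the key attributes of the first management objects to obtain one or more cluster objects.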
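Preferably, the feature data distribution interval includes, from left to right, a left interval, a highest segment distribution interval, and a right interval;
the step of determining, based on a preset feature data set, the feature data distribution interval of the one or more cluster objects includes:
estimating, based on the preset feature data set, the density distribution of the feature data of each cluster object;
taking the highest point of the density distribution as the midpoint and a range of a first preset threshold on each side as the endpoints, to form the highest segment distribution interval;
with the highest segment distribution interval as the reference, dividing the density distribution region to the left of the highest segment distribution interval into one or more corresponding left intervals according to a first preset rule, and dividing the density distribution region to the right of the highest segment distribution interval into one or more corresponding right intervals according to a second preset rule.
Preferably, the step of dividing, with the highest segment distribution interval as the reference, the density distribution region to the left of the highest segment distribution interval into one or more corresponding left intervals according to the first preset rule, and dividing the density distribution region to the right of the highest segment distribution interval into one or more corresponding right intervals according to the second preset rule, includes:
determining the quantile of the highest point;
obtaining the median of the feature data in the cluster object;
if the quantile of the highest point is less than or equal to the median, dividing the density distribution region to the left of the highest segment distribution interval into N segments according to a first preset ratio to obtain N corresponding left intervals, and dividing the density distribution region to the right of the highest segment distribution interval into M segments according to a second preset ratio to obtain M corresponding right intervals, where N and M are positive integers;
if the quantile of the highest point is greater than the median, dividing the density distribution region to the left of the highest segment distribution interval into M segments according to a third preset ratio to obtain M corresponding left intervals, and dividing the density distribution region to the right of the highest segment distribution interval into N segments according to a fourth preset ratio to obtain N corresponding right intervals, where N and M are positive integers.
Preferably, if there are multiple left intervals and multiple right intervals, the leftmost left interval is taken as the smallest feature data interval of the cluster object, and the rightmost right interval is taken as the largest feature data interval of the cluster object;
the step of controlling the feature data of the first management object within the feature data distribution interval of the corresponding cluster object includes:
when setting feature data for the first management object, controlling the minimum value of the feature data within the smallest feature data interval of the cluster object corresponding to the first management object, and controlling the maximum value of the feature data within the largest feature data interval of that cluster object.
Preferably, the method is applied to an e-commerce platform, in which case the first management object is a commodity object, the cluster object is a commodity cluster, the feature data is the commodity price, and the second management object is a merchant object.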
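An embodiment of the present application further discloses a data management and control system, the system including:
a clustering module, configured to cluster a plurality of first management objects into one or more cluster objects;
a data distribution determining module, configured to determine, based on a preset feature data set, the feature data distribution interval of the one or more cluster objects;
a data control module, configured to control the feature data of the first management object within the feature data distribution interval of the corresponding cluster object.
Preferably, the first management object has a corresponding second management object, and the clustering module includes:
an attribute information obtaining submodule, configured to acquire the attribute information of the first management object and the attribute information of the second management object;
a key attribute extraction submodule, configured to extract key attributes from the attribute information of the first management object;
a level information obtaining submodule, configured to cluster the attribute information of all second management objects to obtain a plurality of pieces of level information for the second management objects;
a cluster object acquisition submodule, configured to cluster the plurality of first management objects based on the level information and the key attributes of the first management objects to obtain one or more cluster objects.
Preferably, the feature data distribution interval includes, from left to right, a left interval, a highest segment distribution interval, and a right interval;
the data distribution determining module includes:
a density distribution estimation submodule, configured to estimate, based on the preset feature data set, the density distribution of the feature data of each cluster object;
a first interval acquisition submodule, configured to take the highest point of the density distribution as the midpoint and a range of a first preset threshold on each side as the endpoints, to form the highest segment distribution interval;
a second interval obtaining submodule, configured to, with the highest segment distribution interval as the reference, divide the density distribution region to the left of the highest segment distribution interval into one or more corresponding left intervals according to a first preset rule, and divide the density distribution region to the right of the highest segment distribution interval into one or more corresponding right intervals according to a second preset rule.
Preferably, the second interval obtaining submodule is further configured to:
determine the quantile of the highest point;
obtain the median of the feature data in the cluster object;
if the quantile of the highest point is less than or equal to the median, divide the density distribution region to the left of the highest segment distribution interval into N segments according to a first preset ratio to obtain N corresponding left intervals, and divide the density distribution region to the right of the highest segment distribution interval into M segments according to a second preset ratio to obtain M corresponding right intervals, where N and M are positive integers;
if the quantile of the highest point is greater than the median, divide the density distribution region to the left of the highest segment distribution interval into M segments according to a third preset ratio to obtain M corresponding left intervals, and divide the density distribution region to the right of the highest segment distribution interval into N segments according to a fourth preset ratio to obtain N corresponding right intervals, where N and M are positive integers.
Preferably, if there are multiple left intervals and multiple right intervals, the leftmost left interval is taken as the smallest feature data interval of the cluster object, and the rightmost right interval is taken as the largest feature data interval of the cluster object;
the data control module is further configured to:
when setting feature data for the first management object, control the minimum value of the feature data within the smallest feature data interval of the cluster object corresponding to the first management object, and control the maximum value of the feature data within the largest feature data interval of that cluster object.
Preferably, the system is applied to an e-commerce platform, in which case the first management object is a commodity object, the cluster object is a commodity cluster, the feature data is the commodity price, and the second management object is a merchant object.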
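An embodiment of the present application further discloses a method for data layering of management data, the method including:
clustering a plurality of first management objects into one or more cluster objects;
determining, based on a preset feature data set, the feature data distribution interval of the one or more cluster objects.
An embodiment of the present application further discloses a system for data layering of management data, characterized in that the system includes:
a clustering module, configured to cluster a plurality of first management objects into one or more cluster objects;
a data distribution determining module, configured to determine, based on a preset feature data set, the feature data distribution interval of the one or more cluster objects.
Compared with the background art, the embodiments of the present application have the following advantages:
In the embodiments of the present application, after the first management objects are clustered into one or more cluster objects, the feature data distribution interval of each cluster object can be estimated based on the feature data set, and the feature data of the first management object is controlled within the feature data distribution interval of the corresponding cluster object. Taking the cluster object as the dimension, the embodiments of the present application combine the feature data set in the management platform to formulate reasonable feature data distribution intervals and thereby layer the feature data. This provides data reference support for setting the feature data of the first management object, keeps that setting within a reasonable range, and prevents the adverse effects of feature data being set too high or too low.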
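FIG. 1 is a flow chart of the steps of Embodiment 1 of a data management and control method of the present application;
FIG. 2 is a flow chart of the steps of Embodiment 2 of a data management and control method of the present application;
FIG. 3 is a schematic diagram of the price intervals in Embodiment 2 of a data management and control method of the present application;
FIG. 4 is a structural block diagram of an embodiment of a data management and control system of the present application;
FIG. 5 is a flow chart of the steps of an embodiment of a method for data layering of management data of the present application;
FIG. 6 is a structural block diagram of an embodiment of a system for data layering of management data of the present application.
To make the above objects, features, and advantages of the present application clearer and easier to understand, the present application is described in further detail below with reference to the accompanying drawings and specific embodiments.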
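Referring to FIG. 1, a flow chart of the steps of Embodiment 1 of a data management and control method of the present application is shown. The method may include the following steps:
Step 101: cluster a plurality of first management objects into one or more cluster objects;
Step 102: determine, based on a preset feature data set, the feature data distribution interval of the one or more cluster objects;
Step 103: control the feature data of the first management object within the feature data distribution interval of the corresponding cluster object.
In the embodiments of the present application, after the first management objects are clustered into one or more cluster objects, the feature data distribution interval of each cluster object can be estimated based on the feature data set, and the feature data of the first management object is controlled within the feature data distribution interval of the corresponding cluster object. Taking the cluster object as the dimension, the embodiments of the present application combine the feature data set in the management platform to formulate reasonable feature data distribution intervals and thereby layer the feature data. This provides data reference support for setting the feature data of the first management object, keeps that setting within a reasonable range, and prevents the adverse effects of feature data being set too high or too low.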
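Referring to FIG. 2, a flow chart of the steps of Embodiment 2 of a data management and control method of the present application is shown, which may include the following steps:
Step 201: acquire the attribute information of the first management object and the attribute information of the second management object.
In a specific implementation, a management object is a data processing object in the management platform. The management objects in the embodiments of the present application may include at least first management objects and second management objects, and there may be a plurality of each.
A second management object may manage a first management object, which includes the second management object setting feature data for the first management object.
For example, if the management platform is an e-commerce platform, the first management object may be a commodity object, the second management object may be a merchant object, and the feature data may be the commodity price; the merchant object may manage commodity objects, set their commodity prices, and so on.
In the management platform, the first management objects have a corresponding first database and the second management objects have a corresponding second database. The first database stores the attribute information of a plurality of first management objects, and the second database stores the attribute information of a plurality of second management objects. Therefore, the attribute information of the first management object can be extracted from the first database, and the attribute information of the second management object can be extracted from the second database.
For example, on an e-commerce platform, if the first management object is a commodity object and the second management object is a merchant object, then the first database is a commodity database and the second database may be a merchant database.
The commodity attributes of a commodity object can be extracted from the commodity database; for example, the commodity attributes of a commodity may include its material, size, style, and brand.
The merchant attributes of a merchant object can be extracted from the merchant database. In practice, these are attributes related to the merchant's operating capability; for example, the merchant attributes of a merchant object may include the store's traffic, transactions, sell-through rate, average order value, number of online commodities, inventory, store type, and time since the store opened, as shown in Table 1 below:
Table 1
In the embodiments of the present application, the attribute information of both the first management objects and the second management objects in the management platform is considered together, which makes the data sources richer.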
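Step 202: extract key attributes from the attribute information of the first management object.
In a specific implementation, a first management object has corresponding feature data, and one first management object may have different feature data in different periods.
A key attribute is attribute information that has a large influence on the feature data. In practice, the correlation between each piece of attribute information of the first management object and the feature data can be calculated and ranked, and the attribute information corresponding to the top-ranked correlations is taken as the key attributes of the first management object.
It should be noted that the embodiments of the present application do not limit how the correlation is calculated.
For example, on an e-commerce platform the feature data may be the commodity price. Based on the attribute information and prices of all commodity objects on the platform, the correlation between each piece of attribute information and the commodity price can be mined, and thus the key attributes that determine the commodity price. For instance, the price of a women's bag is related to its material, size, style, and brand; these four attributes can be said to essentially determine the price of a bag, so they are the key attributes of women's bags.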
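The ranking described in step 202 can be sketched as follows. The application leaves the correlation measure open, so the use of the absolute Pearson coefficient here is an assumption, as are the function names, the numeric encoding of attributes, and the sample data.

```python
from statistics import mean

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no information about the price
    return cov / (var_x * var_y) ** 0.5

def key_attributes(rows, prices, top_k):
    """Rank attributes by |correlation with price| and keep the top_k names."""
    names = rows[0].keys()
    scores = {n: abs(pearson([r[n] for r in rows], prices)) for n in names}
    return sorted(scores, key=scores.get, reverse=True)[:top_k]

# Hypothetical, numerically encoded commodity attributes and their prices.
rows = [
    {"leather_grade": 1, "size_cm": 20, "listing_hour": 9},
    {"leather_grade": 2, "size_cm": 25, "listing_hour": 20},
    {"leather_grade": 3, "size_cm": 30, "listing_hour": 9},
    {"leather_grade": 4, "size_cm": 35, "listing_hour": 20},
]
prices = [100.0, 210.0, 290.0, 405.0]
selected = key_attributes(rows, prices, top_k=2)
```

Categorical attributes such as brand or style would need an encoding (or a different association measure) before a coefficient like this applies; that choice is outside what the source specifies.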
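Step 203: cluster the attribute information of all second management objects to obtain a plurality of pieces of level information for the second management objects.
After the attribute information of all second management objects in the management platform is obtained, the second management objects can be clustered on that attribute information using a clustering method such as k-means, which yields a plurality of pieces of level information for the second management objects.
In a specific implementation, the level information of the second management objects may include a first level, a second level, a third level, and so on, where the first level ranks higher than the second level, the second level ranks higher than the third level, and so forth.
For example, on an e-commerce platform, a layered model of merchant operating capability can be built from the merchant attributes of all merchants, and according to this model merchants can be divided into four levels: top sellers, mid-tier ("waist") sellers, small sellers, and sellers with no traffic over a long period.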
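A minimal sketch of step 203 follows, assuming each merchant's operating attributes have already been reduced to a single hypothetical capability score. The source only names k-means as one possible method; the one-dimensional simplification, the deterministic initialisation, and all names are illustrative assumptions.

```python
def kmeans_1d(values, k, iterations=50):
    """Plain Lloyd's k-means on scalar scores with deterministic initial centers."""
    ordered = sorted(values)
    # Spread the initial centers evenly over the sorted data (k >= 2 assumed).
    centers = [ordered[i * (len(ordered) - 1) // (k - 1)] for i in range(k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centers[i]))
            groups[nearest].append(v)
        # Keep the old center when a group ends up empty.
        updated = [sum(g) / len(g) if g else centers[i] for i, g in enumerate(groups)]
        if updated == centers:
            break
        centers = updated
    return centers

def merchant_levels(scores, k):
    """Map merchant id -> level, where level 1 is the highest-scoring cluster."""
    centers = kmeans_1d(list(scores.values()), k)
    rank = {c: level for level, c in enumerate(sorted(set(centers), reverse=True), start=1)}
    return {m: rank[min(centers, key=lambda c: abs(s - c))] for m, s in scores.items()}

# Hypothetical capability scores for eight merchants.
scores = {"m1": 98.0, "m2": 95.0, "m3": 60.0, "m4": 55.0,
          "m5": 20.0, "m6": 22.0, "m7": 1.0, "m8": 0.5}
levels = merchant_levels(scores, k=4)
```

With four clusters, the levels can be read as the four seller tiers of the example, from top sellers (level 1) down to sellers with no traffic (level 4).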
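Step 204: cluster the plurality of first management objects based on the level information and the key attributes of the first management objects to obtain one or more cluster objects.
After the level information of the second management objects and the key attributes of the first management objects are obtained, the first management objects can be clustered with a preset clustering algorithm according to that level information and those key attributes, to obtain one or more cluster objects. In one implementation, first management objects whose corresponding second management objects have the same level and whose key attributes are the same are grouped into one class.
For example, commodities whose merchants have the same operating capability and whose key attributes are all the same can be clustered into one commodity cluster. Taking bags as an example, if the key attributes are material, size, and style, then commodities with the same material, size, and style that belong to merchants with the same operating capability form one commodity cluster; for instance, all large, top-grain cowhide motorcycle bags in the stores of all KA sellers form one commodity cluster.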
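The "same merchant level and same key attributes" grouping mentioned as one implementation of step 204 amounts to grouping by a composite key. The sketch below illustrates that reading; the record layout, field names, and sample values are hypothetical.

```python
from collections import defaultdict

def cluster_commodities(commodities, merchant_level, key_attrs):
    """Group commodity ids by (merchant level, key attribute values)."""
    clusters = defaultdict(list)
    for item in commodities:
        key = (merchant_level[item["merchant"]],) + tuple(item[a] for a in key_attrs)
        clusters[key].append(item["id"])
    return dict(clusters)

commodities = [
    {"id": "c1", "merchant": "m1", "material": "cowhide", "size": "large", "style": "motorcycle"},
    {"id": "c2", "merchant": "m2", "material": "cowhide", "size": "large", "style": "motorcycle"},
    {"id": "c3", "merchant": "m3", "material": "cowhide", "size": "large", "style": "motorcycle"},
    {"id": "c4", "merchant": "m1", "material": "canvas", "size": "small", "style": "tote"},
]
merchant_level = {"m1": 1, "m2": 1, "m3": 2}
clusters = cluster_commodities(commodities, merchant_level, ["material", "size", "style"])
```

Here c1 and c2 share a cluster because their merchants are both level 1 and all three key attributes match, while c3 has the same attributes but a level-2 merchant and so forms its own cluster.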
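Step 205: determine, based on a preset feature data set, the feature data distribution interval of the one or more cluster objects.
In a specific implementation, the feature data distribution intervals of each cluster object can be obtained based on the preset feature data set. Each cluster object may have multiple feature data distribution intervals; for example, the feature data distribution intervals of a cluster object may include, from left to right, left intervals, a highest segment distribution interval, and right intervals.
In a preferred embodiment of the embodiments of the present application, step 205 may include the following sub-steps:
Sub-step S11: estimate, based on the preset feature data set, the density distribution of the feature data of each cluster object.
The preset feature data set may include the feature data of all first management objects in the management platform.
On an e-commerce platform, the feature data set may include commodity transaction data in addition to commodity prices, and the price density distribution of each commodity cluster can be estimated with the commodity's feature data set as weights.
In the embodiments of the present application, the preset feature data set can be used as the sample point set, and a Gaussian kernel density estimation (KDE) algorithm can be used to estimate the density distribution curve of the feature data of each cluster object.
Kernel density estimation is used in probability theory to estimate an unknown density function and is a non-parametric method. Its principle is as follows: when estimating the probability distribution of some quantity, if a certain value appears among the observations, the probability density at that value can be considered relatively large, the density at values close to it is also relatively large, and the density at values far from it is relatively small.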
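A weighted Gaussian kernel density estimate of the kind described in sub-step S11 can be sketched in a few lines. The fixed bandwidth, the use of transaction counts as weights, and the sample prices are assumptions for illustration; the source does not state how the bandwidth is chosen.

```python
import math

def gaussian_kde(samples, weights, bandwidth):
    """Return a function f(x): a weighted Gaussian kernel density estimate."""
    total = sum(weights)
    norm = 1.0 / (total * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(
            w * math.exp(-0.5 * ((x - s) / bandwidth) ** 2)
            for s, w in zip(samples, weights)
        )

    return density

# Hypothetical prices in one commodity cluster, weighted by transaction counts.
prices = [95.0, 100.0, 105.0, 180.0]
sales = [10, 30, 10, 2]
f = gaussian_kde(prices, sales, bandwidth=8.0)

# Evaluate on an integer price grid and locate the highest point of the curve.
grid = [p * 1.0 for p in range(60, 221)]
mode = max(grid, key=f)
```

Each observed price contributes a bell-shaped bump scaled by its weight, so prices near heavily traded values get a high density and isolated prices such as 180 contribute only a small bump.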
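Sub-step S12: take the highest point of the density distribution as the midpoint and a range of a first preset threshold on each side as the endpoints, to form the highest segment distribution interval.
From the density distribution curve, its highest point can be determined; with that highest point as the midpoint, a range of the first preset threshold is taken on each side as the endpoints to form the highest segment distribution interval. For example, with the highest point as the midpoint and 15% taken on each side as the endpoints, the resulting interval covering 30% in total is the highest segment distribution interval [a, b).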
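One way to read sub-step S12 is sketched below. The source does not say what the 15% is a percentage of; this sketch assumes it is probability mass, so the interval runs from the point 15% of the mass below the highest point to the point 15% of the mass above it. The grid-based cumulative sum and all names are illustrative.

```python
def highest_segment(xs, density, half_width=0.15):
    """Return (a, b) around the peak of a density sampled on the increasing grid xs.

    a and b are the first grid points whose cumulative share of the density
    reaches the peak's share minus / plus half_width (clamped to [0, 1]).
    """
    total = sum(density)
    cdf, running = [], 0.0
    for d in density:
        running += d / total
        cdf.append(running)
    peak = max(range(len(xs)), key=lambda i: density[i])
    lo_q = max(cdf[peak] - half_width, 0.0)
    hi_q = min(cdf[peak] + half_width, 1.0)
    # Fall back to the last grid point if rounding keeps the sum just below hi_q.
    a = next((x for x, c in zip(xs, cdf) if c >= lo_q), xs[-1])
    b = next((x for x, c in zip(xs, cdf) if c >= hi_q), xs[-1])
    return a, b

# A triangular density on the grid 0..20 with its peak at x = 10.
xs = list(range(21))
density = list(range(11)) + list(range(9, -1, -1))
a, b = highest_segment(xs, density)
```

If the threshold were instead meant as a share of the price range, only the computation of `a` and `b` would change; the rest of the layering would be unaffected.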
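Sub-step S13: with the highest segment distribution interval as the reference, divide the density distribution region to the left of the highest segment distribution interval into one or more corresponding left intervals according to a first preset rule, and divide the density distribution region to the right of the highest segment distribution interval into one or more corresponding right intervals according to a second preset rule.
After the highest segment distribution interval is determined, it can be used as the reference to divide the regions of the density distribution curve that lie to its left and to its right, which yields the corresponding left intervals and right intervals; there may be one or more left intervals and one or more right intervals.
Arranging the left intervals, the highest segment distribution interval, and the right intervals in order gives the feature data distribution intervals of the cluster object.
In a preferred embodiment of the embodiments of the present application, sub-step S13 may further include the following sub-steps:
Sub-step S131: determine the quantile of the highest point.
Let the distribution function of a continuous random variable X be F(x) and its density function be p(x). Then, for any p with 0 < p < 1, the x satisfying F(x) = p is called the quantile (or lower quantile) of this distribution at p. Put simply, a quantile is a point of the continuous distribution function, and that point corresponds to the probability p.
In the embodiments of the present application, the highest point of the estimated density distribution is the feature data value with the greatest probability density. Taking p to be the probability corresponding to the highest point, the quantile x of p can be calculated from F(x) = p.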
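The quantile definition used in sub-step S131 can be written compactly as follows. To avoid the clash between the density p(x) and the probability level p, the block writes the level as q and the highest point as x^*; both symbols are notation introduced here for illustration and do not appear in the application.

```latex
% distribution function of X with density p, and the quantile x_q at level q
F(x) = P(X \le x) = \int_{-\infty}^{x} p(t)\,dt, \qquad
F(x_q) = q \quad (0 < q < 1)

% the median is the quantile at level 1/2
F(x_{0.5}) = \tfrac{1}{2}

% for a strictly increasing F (as with a Gaussian kernel estimate), comparing
% the highest point x^* with the median is equivalent to comparing its level with 1/2
x^{*} \le x_{0.5} \iff F(x^{*}) \le \tfrac{1}{2}
```

Under this reading, the comparison between the highest point and the median that selects the partition ratios can be carried out either on feature data values or on probability levels.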
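Sub-step S132: obtain the median of the feature data in the cluster object.
In a specific implementation, a cluster object may include a plurality of first management objects, each with one or more pieces of feature data. All feature data of all first management objects in the cluster object can be assembled into a feature data queue, and the median of that queue is taken as the median of the feature data of the cluster object.
Sub-step S133: determine whether the quantile of the highest point is less than or equal to the median; if so, perform sub-step S134; if not, perform sub-step S135.
After the quantile of the highest point and the median of the feature data of the cluster object are obtained, the two can be compared to determine whether the quantile is less than or equal to the median; if so, sub-step S134 is performed, and if not, sub-step S135 is performed.
Sub-step S134: divide the density distribution region to the left of the highest segment distribution interval into N segments according to a first preset ratio to obtain N corresponding left intervals, and divide the density distribution region to the right of the highest segment distribution interval into M segments according to a second preset ratio to obtain M corresponding right intervals.
Specifically, if the quantile of the highest point is less than or equal to the median, the density distribution region to the left of the highest segment distribution interval is divided into N segments according to the first preset ratio to obtain N corresponding quantiles, and these N quantiles together with the left endpoint of the highest segment distribution interval serve as interval endpoints to form N left intervals. For example, if the highest segment distribution interval is [a, b) and the density distribution region to its left is divided into two segments according to the first preset ratio, giving the two quantiles p0 and p1, then the two left intervals are [p0, p1) and [p1, a).
The density distribution region to the right of the highest segment distribution interval is divided into M segments according to the second preset ratio to obtain M corresponding quantiles, and the right endpoint of the highest segment distribution interval together with these M quantiles serve as interval endpoints to form M right intervals. For example, if the highest segment distribution interval is [a, b) and the density distribution region to its right is divided into three segments according to the second preset ratio, giving the three quantiles p4, p5, and p6, then the right intervals are [b, p4), [p4, p5), [p5, p6). The whole density distribution curve is thus divided into six intervals: [p0, p1), [p1, a), [a, b), [b, p4), [p4, p5), [p5, p6).
For example, as shown in the price interval diagram of FIG. 3, Gaussian kernel density estimation is performed on the prices of a commodity cluster to obtain a density distribution curve with the commodity price on the horizontal axis and the share of transactions on the vertical axis. Then, with the highest point of the curve as the midpoint and 15% taken on each side as the endpoints, the price segment covering 30% in total is the highest price segment, marked [a, b). Next, it is determined whether the quantile of the highest point is less than or equal to the median of the kernel density distribution curve. If so, the density distribution region to the left of the highest price segment can be divided into two segments in the ratio 1/5, 4/5, giving the quantiles p0 and p1, and the density distribution region to the right of the highest price segment can be divided into three segments in the ratio 5/9, 3/9, 1/9, giving the quantiles p4, p5, and p6. The whole density distribution curve can then be divided into six price segments, [p0, p1), [p1, a), [a, b), [b, p4), [p4, p5), [p5, p6), which realizes the price layering of the commodity cluster.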
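The ratio-based split in sub-step S134 can be illustrated by computing the probability levels at which the interval endpoints fall. This sketch assumes the ratios are shares of the probability mass on each side of [a, b); the levels F(a) = 1/4 and F(b) = 11/20 are hypothetical values, and exact fractions are used only to keep the arithmetic readable.

```python
from fractions import Fraction
from itertools import accumulate

def split_mass(start, end, ratios):
    """Probability levels that split the mass between start and end by the given ratios."""
    total = sum(ratios)
    cuts = [start + (end - start) * c / total for c in accumulate(ratios)]
    return [start] + cuts

def layer_boundaries(F_a, F_b, left_ratios, right_ratios):
    """Levels of all interval endpoints, from the leftmost endpoint to the rightmost."""
    left = split_mass(0, F_a, left_ratios)     # levels of p0, p1, ..., a
    right = split_mass(F_b, 1, right_ratios)   # levels of b, p4, p5, ...
    return left + right

# Example ratios from the text: left 1/5, 4/5 and right 5/9, 3/9, 1/9.
levels = layer_boundaries(Fraction(1, 4), Fraction(11, 20), [1, 4], [5, 3, 1])
intervals = list(zip(levels, levels[1:]))      # six consecutive [low, high) level pairs
```

Mapping each level back to a price with the quantile function of the estimated distribution then yields the six price segments [p0, p1), [p1, a), [a, b), [b, p4), [p4, p5), [p5, p6).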
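Sub-step S135: divide the density distribution region to the left of the highest segment distribution interval into M segments according to a third preset ratio to obtain M corresponding left intervals, and divide the density distribution region to the right of the highest segment distribution interval into N segments according to a fourth preset ratio to obtain N corresponding right intervals.
If the quantile of the highest point is greater than the median, the density distribution region to the left of the highest segment distribution interval can be divided into M segments according to the third preset ratio to obtain M corresponding quantiles, and these M quantiles together with the left endpoint of the highest segment distribution interval serve as interval endpoints to form M left intervals. For example, if the highest segment distribution interval is [a, b) and the density distribution region to its left is divided into three segments according to the third preset ratio, giving the three quantiles P0, P1, and P2, then the three left intervals are [P0, P1), [P1, P2), [P2, a). In addition, the density distribution region to the right of the highest segment distribution interval is divided into N segments according to the fourth preset ratio to obtain N corresponding quantiles, and the right endpoint of the highest segment distribution interval together with these N quantiles serve as interval endpoints to form N right intervals. For example, if the highest segment distribution interval is [a, b) and the density distribution region to its right is divided into two segments according to the fourth preset ratio, giving the two quantiles P4 and P5, then the right intervals are [b, P4), [P4, P5). The whole density distribution curve is thus divided into six intervals: [P0, P1), [P1, P2), [P2, a), [a, b), [b, P4), [P4, P5).
For example, Gaussian kernel density estimation is performed on the prices of a commodity cluster to obtain a density distribution curve with the price on the horizontal axis and the share of transactions on the vertical axis. Then, with the highest point of the curve as the midpoint and 15% taken on each side as the endpoints, the price segment covering 30% in total is the highest price segment, marked [a, b). Next, it is determined whether the quantile of the highest point is greater than the median of the kernel density distribution curve. If so, the density distribution region to the left of the highest price segment can be divided into three segments in the ratio 1/9, 3/9, 5/9, giving the quantiles P0, P1, and P2, and the density distribution region to the right of the highest price segment can be divided into two segments in the ratio 1/5, 4/5, giving the quantiles P4 and P5. The whole density distribution curve can then be divided into six price segments: [P0, P1), [P1, P2), [P2, a), [a, b), [b, P4), [P4, P5).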
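The branch between sub-steps S134 and S135 can be sketched as a lookup that picks the ratio sets from the two worked examples. Comparing the price at the highest point with the median price is an interpretation of "the quantile of the highest point" versus "the median"; the table and function names are hypothetical.

```python
from statistics import median

# Ratio sets taken from the two worked examples in the text.
EXAMPLE_RATIOS = {
    "mode_le_median": {"left": (1, 4), "right": (5, 3, 1)},   # sub-step S134
    "mode_gt_median": {"left": (1, 3, 5), "right": (1, 4)},   # sub-step S135
}

def choose_ratios(mode_price, prices):
    """Select left/right split ratios by comparing the peak price with the median price."""
    branch = "mode_le_median" if mode_price <= median(prices) else "mode_gt_median"
    return EXAMPLE_RATIOS[branch]

prices = [10, 20, 30, 40, 50]
low_peak = choose_ratios(20, prices)    # peak at or below the median of 30
high_peak = choose_ratios(45, prices)   # peak above the median of 30
```

In both branches the side with the longer tail (right when the peak is at or below the median, left otherwise) is the one divided into three segments in these examples.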
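In the embodiments of the present application, the attribute information of both the first management objects and the second management objects in the management platform is considered together when clustering the first management objects, and is combined with the feature data set in the management platform to formulate reasonable feature data distribution intervals, thereby achieving feature data layering.
Step 206: control the feature data of the first management object within the feature data distribution interval of the corresponding cluster object.
In the embodiments of the present application, if there are multiple left intervals, the leftmost of them may be taken as the smallest feature data interval of the cluster object. If there are multiple right intervals, the rightmost of them may be taken as the largest feature data interval of the cluster object. For example, if the feature data distribution intervals of a cluster object are [p0, p1), [p1, a), [a, b), [b, p4), [p4, p5), [p5, p6), then its smallest feature data interval is [p0, p1) and its largest feature data interval is [p5, p6).
Then, when feature data is set for a first management object (including a new first management object), the cluster object to which it belongs can be determined first. In a specific implementation, a similarity algorithm can be used to calculate the similarity between the first management object and each cluster object, and the cluster object whose similarity is less than the preset value is taken as the cluster object corresponding to the first management object.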
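The cluster lookup for a new first management object can be sketched as follows. The source selects the cluster whose score is less than a preset value; this sketch assumes the score behaves like a distance (smaller means closer) and uses the fraction of mismatching key attributes, which is an illustrative choice rather than the application's stated algorithm.

```python
def dissimilarity(item_attrs, cluster_attrs):
    """Fraction of the cluster's key attributes that the item does not match."""
    keys = cluster_attrs.keys()
    return sum(item_attrs.get(k) != cluster_attrs[k] for k in keys) / len(keys)

def assign_cluster(item_attrs, clusters, threshold):
    """Return the closest cluster name if its score is below threshold, else None."""
    best = min(clusters, key=lambda name: dissimilarity(item_attrs, clusters[name]))
    score = dissimilarity(item_attrs, clusters[best])
    return best if score < threshold else None

# Hypothetical cluster descriptions keyed by cluster name.
clusters = {
    "A": {"material": "cowhide", "size": "large", "style": "motorcycle"},
    "B": {"material": "canvas", "size": "small", "style": "tote"},
}
near = assign_cluster({"material": "cowhide", "size": "large", "style": "tote"}, clusters, 0.5)
far = assign_cluster({"material": "nylon", "size": "medium", "style": "backpack"}, clusters, 0.5)
```

An object that matches no cluster closely enough is left unassigned here; how such objects should be handled is not covered by the source.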
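After the cluster object corresponding to the first management object is determined, the feature data distribution intervals of that cluster object can serve as data reference support: the minimum value of the feature data is controlled within the smallest feature data interval of that cluster object, and the maximum value of the feature data is controlled within its largest feature data interval, which completes the control of the first management object and ensures that the feature data is set reasonably.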
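A minimal check for the control step might look like the sketch below, which assumes half-open intervals [low, high) ordered from left to right and treats "controlling" as validating a proposed minimum and maximum price. The function name and the sample intervals are hypothetical.

```python
def check_price_setting(min_price, max_price, intervals):
    """Check a proposed price range against a cluster's layered intervals.

    Returns (min_ok, max_ok): whether the minimum lies in the smallest interval
    and whether the maximum lies in the largest interval.
    """
    lowest, highest = intervals[0], intervals[-1]
    min_ok = lowest[0] <= min_price < lowest[1]
    max_ok = highest[0] <= max_price < highest[1]
    return min_ok, max_ok

# Hypothetical six price segments [p0,p1), [p1,a), [a,b), [b,p4), [p4,p5), [p5,p6).
intervals = [(50, 60), (60, 90), (90, 120), (120, 160), (160, 200), (200, 260)]
accepted = check_price_setting(55, 210, intervals)
rejected = check_price_setting(40, 300, intervals)
```

In the second call the proposed minimum of 40 falls below the minimum price line p0 = 50 and the maximum of 300 exceeds p6 = 260, so both checks fail.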
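For example, in FIG. 3 the leftmost interval is the low-price interval, which sets the minimum price line; this prevents underpricing from disrupting normal competition on the platform, reduces the proliferation of counterfeit goods, and improves the overall image of the e-commerce platform. In detail, implementing the embodiments of the present application on an e-commerce platform can achieve the following beneficial effects:
(1) Reasonable price range: a reasonable price range is established for commodities with the same category attributes, which avoids the adverse effects of prices that are too high or too low and provides data reference support for the pricing strategies of brand owners and sellers.
(2) Minimum price line: a minimum price line is established for commodities with the same category attributes, which prevents underpricing from disrupting normal competition on the platform, or even from causing counterfeit goods to proliferate and damaging the overall image of the platform.
In the embodiments of the present application, after the first management objects are clustered into cluster objects, the feature data set can also be used as the sample point set to estimate the density distribution of each cluster object, and a reasonable feature data distribution interval is determined for each cluster object from that density distribution. This provides data reference support for setting the feature data of the first management object and prevents the adverse effects of feature data being set too high or too low.
It should be noted that, for simplicity of description, the method embodiments are expressed as a series of combined actions, but those skilled in the art should understand that the embodiments of the present application are not limited by the described order of actions, because according to the embodiments of the present application some steps may be performed in other orders or simultaneously. In addition, those skilled in the art should also understand that the embodiments described in the specification are all preferred embodiments, and the actions involved are not necessarily required by the embodiments of the present application.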
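Referring to FIG. 4, a structural block diagram of an embodiment of a data management and control system of the present application is shown. The system may include the following modules:
a clustering module 401, configured to cluster a plurality of first management objects into one or more cluster objects;
a data distribution determining module 402, configured to determine, based on a preset feature data set, the feature data distribution interval of the one or more cluster objects;
a data control module 403, configured to control the feature data of the first management object within the feature data distribution interval of the corresponding cluster object.
In a preferred embodiment of the embodiments of the present application, the first management object has a corresponding second management object, and the clustering module 401 may include the following submodules:
an attribute information obtaining submodule, configured to acquire the attribute information of the first management object and the attribute information of the second management object;
a key attribute extraction submodule, configured to extract key attributes from the attribute information of the first management object;
a level information obtaining submodule, configured to cluster the attribute information of all second management objects to obtain a plurality of pieces of level information for the second management objects;
a cluster object acquisition submodule, configured to cluster the plurality of first management objects based on the level information and the key attributes of the first management objects to obtain one or more cluster objects.
In a preferred embodiment of the embodiments of the present application, the feature data distribution interval includes, from left to right, a left interval, a highest segment distribution interval, and a right interval;
the data distribution determining module 402 may include the following submodules:
a density distribution estimation submodule, configured to estimate, based on the preset feature data set, the density distribution of the feature data of each cluster object;
a first interval acquisition submodule, configured to take the highest point of the density distribution as the midpoint and a range of a first preset threshold on each side as the endpoints, to form the highest segment distribution interval;
a second interval obtaining submodule, configured to, with the highest segment distribution interval as the reference, divide the density distribution region to the left of the highest segment distribution interval into one or more corresponding left intervals according to a first preset rule, and divide the density distribution region to the right of the highest segment distribution interval into one or more corresponding right intervals according to a second preset rule.
In a preferred embodiment of the embodiments of the present application, the second interval obtaining submodule is further configured to:
determine the quantile of the highest point;
obtain the median of the feature data in the cluster object;
if the quantile of the highest point is less than or equal to the median, divide the density distribution region to the left of the highest segment distribution interval into N segments according to a first preset ratio to obtain N corresponding left intervals, and divide the density distribution region to the right of the highest segment distribution interval into M segments according to a second preset ratio to obtain M corresponding right intervals, where N and M are positive integers;
if the quantile of the highest point is greater than the median, divide the density distribution region to the left of the highest segment distribution interval into M segments according to a third preset ratio to obtain M corresponding left intervals, and divide the density distribution region to the right of the highest segment distribution interval into N segments according to a fourth preset ratio to obtain N corresponding right intervals, where N and M are positive integers.
In a preferred embodiment of the embodiments of the present application, if there are multiple left intervals and multiple right intervals, the leftmost left interval is taken as the smallest feature data interval of the cluster object, and the rightmost right interval is taken as the largest feature data interval of the cluster object;
the data control module 403 is further configured to:
when setting feature data for the first management object, control the minimum value of the feature data within the smallest feature data interval of the cluster object corresponding to the first management object, and control the maximum value of the feature data within the largest feature data interval of that cluster object.
In a preferred embodiment of the embodiments of the present application, the system is applied to an e-commerce platform, in which case the first management object is a commodity object, the cluster object is a commodity cluster, the feature data is the commodity price, and the second management object is a merchant object.
Since the system embodiment is basically similar to the method embodiments above, it is described relatively briefly; for related details, refer to the description of the method embodiments.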
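Referring to FIG. 5, a flow chart of the steps of an embodiment of a method for data layering of management data of the present application is shown, which may include the following steps:
Step 501: cluster a plurality of first management objects into one or more cluster objects;
Step 502: determine, based on a preset feature data set, the feature data distribution interval of the one or more cluster objects.
In a preferred embodiment of the embodiments of the present application, the embodiment may further include the following step:
controlling the feature data of the first management object within the feature data distribution interval of the corresponding cluster object.
In a preferred embodiment of the embodiments of the present application, the first management object has a corresponding second management object, and step 501 may further include:
acquiring the attribute information of the first management object and the attribute information of the second management object;
extracting key attributes from the attribute information of the first management object;
clustering the attribute information of all second management objects to obtain a plurality of pieces of level information for the second management objects;
clustering the plurality of first management objects based on the level information and the key attributes of the first management objects to obtain one or more cluster objects.
In a preferred embodiment of the embodiments of the present application, the feature data distribution interval includes, from left to right, a left interval, a highest segment distribution interval, and a right interval;
step 502 further includes:
estimating, based on the preset feature data set, the density distribution of the feature data of each cluster object;
taking the highest point of the density distribution as the midpoint and a range of a first preset threshold on each side as the endpoints, to form the highest segment distribution interval;
with the highest segment distribution interval as the reference, dividing the density distribution region to the left of the highest segment distribution interval into one or more corresponding left intervals according to a first preset rule, and dividing the density distribution region to the right of the highest segment distribution interval into one or more corresponding right intervals according to a second preset rule.
In a preferred embodiment of the embodiments of the present application, the step of dividing, with the highest segment distribution interval as the reference, the density distribution region to the left of the highest segment distribution interval into one or more corresponding left intervals according to the first preset rule, and dividing the density distribution region to the right of the highest segment distribution interval into one or more corresponding right intervals according to the second preset rule, includes:
determining the quantile of the highest point;
obtaining the median of the feature data in the cluster object;
if the quantile of the highest point is less than or equal to the median, dividing the density distribution region to the left of the highest segment distribution interval into N segments according to a first preset ratio to obtain N corresponding left intervals, and dividing the density distribution region to the right of the highest segment distribution interval into M segments according to a second preset ratio to obtain M corresponding right intervals, where N and M are positive integers;
if the quantile of the highest point is greater than the median, dividing the density distribution region to the left of the highest segment distribution interval into M segments according to a third preset ratio to obtain M corresponding left intervals, and dividing the density distribution region to the right of the highest segment distribution interval into N segments according to a fourth preset ratio to obtain N corresponding right intervals, where N and M are positive integers.
In a preferred embodiment of the embodiments of the present application, if there are multiple left intervals and multiple right intervals, the leftmost left interval is taken as the smallest feature data interval of the cluster object, and the rightmost right interval is taken as the largest feature data interval of the cluster object;
the step of controlling the feature data of the first management object within the feature data distribution interval of the corresponding cluster object includes:
when setting feature data for the first management object, controlling the minimum value of the feature data within the smallest feature data interval of the cluster object corresponding to the first management object, and controlling the maximum value of the feature data within the largest feature data interval of that cluster object.
In a preferred embodiment of the embodiments of the present application, the method is applied to an e-commerce platform, in which case the first management object is a commodity object, the cluster object is a commodity cluster, the feature data is the commodity price, and the second management object is a merchant object.
Since the embodiment of FIG. 5 is basically similar to the method embodiment of FIG. 2 above, it is described relatively briefly; for related details, refer to the description of the method embodiments.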
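Referring to FIG. 6, a structural block diagram of an embodiment of a system for data layering of management data of the present application is shown. The system may include the following modules:
a clustering module 601, configured to cluster a plurality of first management objects into one or more cluster objects;
a data distribution determining module 602, configured to determine, based on a preset feature data set, the feature data distribution interval of the one or more cluster objects.
In a preferred embodiment of the embodiments of the present application, the system may further include the following module:
a data control module, configured to control the feature data of the first management object within the feature data distribution interval of the corresponding cluster object.
In a preferred embodiment of the embodiments of the present application, the first management object has a corresponding second management object, and the clustering module 601 may include the following submodules:
an attribute information obtaining submodule, configured to acquire the attribute information of the first management object and the attribute information of the second management object;
a key attribute extraction submodule, configured to extract key attributes from the attribute information of the first management object;
a level information obtaining submodule, configured to cluster the attribute information of all second management objects to obtain a plurality of pieces of level information for the second management objects;
a cluster object acquisition submodule, configured to cluster the plurality of first management objects based on the level information and the key attributes of the first management objects to obtain one or more cluster objects.
In a preferred embodiment of the embodiments of the present application, the feature data distribution interval includes, from left to right, a left interval, a highest segment distribution interval, and a right interval;
the data distribution determining module 602 may include the following submodules:
a density distribution estimation submodule, configured to estimate, based on the preset feature data set, the density distribution of the feature data of each cluster object;
a first interval acquisition submodule, configured to take the highest point of the density distribution as the midpoint and a range of a first preset threshold on each side as the endpoints, to form the highest segment distribution interval;
a second interval obtaining submodule, configured to, with the highest segment distribution interval as the reference, divide the density distribution region to the left of the highest segment distribution interval into one or more corresponding left intervals according to a first preset rule, and divide the density distribution region to the right of the highest segment distribution interval into one or more corresponding right intervals according to a second preset rule.
In a preferred embodiment of the embodiments of the present application, the second interval obtaining submodule is further configured to:
determine the quantile of the highest point;
obtain the median of the feature data in the cluster object;
if the quantile of the highest point is less than or equal to the median, divide the density distribution region to the left of the highest segment distribution interval into N segments according to a first preset ratio to obtain N corresponding left intervals, and divide the density distribution region to the right of the highest segment distribution interval into M segments according to a second preset ratio to obtain M corresponding right intervals, where N and M are positive integers;
if the quantile of the highest point is greater than the median, divide the density distribution region to the left of the highest segment distribution interval into M segments according to a third preset ratio to obtain M corresponding left intervals, and divide the density distribution region to the right of the highest segment distribution interval into N segments according to a fourth preset ratio to obtain N corresponding right intervals, where N and M are positive integers.
In a preferred embodiment of the embodiments of the present application, if there are multiple left intervals and multiple right intervals, the leftmost left interval is taken as the smallest feature data interval of the cluster object, and the rightmost right interval is taken as the largest feature data interval of the cluster object;
the data control module is further configured to:
when setting feature data for the first management object, control the minimum value of the feature data within the smallest feature data interval of the cluster object corresponding to the first management object, and control the maximum value of the feature data within the largest feature data interval of that cluster object.
In a preferred embodiment of the embodiments of the present application, the system is applied to an e-commerce platform, in which case the first management object is a commodity object, the cluster object is a commodity cluster, the feature data is the commodity price, and the second management object is a merchant object.
Since the system embodiment is basically similar to the method embodiments above, it is described relatively briefly; for related details, refer to the description of the method embodiments.
The embodiments in this specification are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and for the parts that are the same or similar, the embodiments may be referred to one another.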
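Those skilled in the art should understand that the embodiments of the present application may be provided as a method, an apparatus, or a computer program product. Therefore, the embodiments of the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware. Moreover, the embodiments of the present application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) that contain computer-usable program code.
The embodiments of the present application are described with reference to flowcharts and/or block diagrams of methods, terminal devices (systems), and computer program products according to the embodiments of the present application. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program operating instructions. These computer program operating instructions can be provided to a processor of a general purpose computer, a special purpose computer, an embedded processor, or other programmable data processing terminal device to produce a machine, such that the operating instructions executed by the processor of the computer or other programmable data processing terminal device produce means for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
These computer program operating instructions may also be stored in a computer readable memory that can direct a computer or other programmable data processing terminal device to operate in a particular manner, such that the operating instructions stored in the computer readable memory produce an article of manufacture including operating instruction means, and the operating instruction means implement the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
These computer program operating instructions may also be loaded onto a computer or other programmable data processing terminal device, such that a series of operational steps are performed on the computer or other programmable terminal device to produce computer-implemented processing, and the operating instructions executed on the computer or other programmable terminal device thereby provide steps for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
Although preferred embodiments of the embodiments of the present application have been described, those skilled in the art can make additional changes and modifications to these embodiments once they learn of the basic inventive concept. Therefore, the appended claims are intended to be interpreted as including the preferred embodiments and all changes and modifications that fall within the scope of the embodiments of the present application.
Finally, it should also be noted that, in this document, relational terms such as first and second are used only to distinguish one entity or operation from another, and do not necessarily require or imply any such actual relationship or order between these entities or operations. Moreover, the terms "include", "comprise", and any other variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or terminal device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or terminal device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the existence of other identical elements in the process, method, article, or terminal device that includes the element.
The data management and control method and system provided by the present application have been described in detail above. Specific examples are used herein to explain the principles and implementations of the present application, and the description of the above embodiments is intended only to help in understanding the method of the present application and its core idea. At the same time, for those of ordinary skill in the art, there will be changes in the specific implementations and the scope of application according to the idea of the present application. In summary, the content of this specification should not be construed as limiting the present application.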
Claims (14)
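- A data management and control method, characterized in that the method comprises: clustering a plurality of first control objects into one or more cluster objects; determining, based on a preset feature data set, feature data distribution intervals of the one or more cluster objects; and controlling the feature data of each first control object to lie within the feature data distribution interval of its corresponding cluster object.
- The method according to claim 1, characterized in that the first control object has a corresponding second control object, and the step of clustering a plurality of first control objects into one or more cluster objects comprises: acquiring attribute information of the first control objects and attribute information of the second control objects; extracting key attributes from the attribute information of the first control objects; clustering the attribute information of all second control objects to obtain a plurality of pieces of level information for the second control objects; and clustering the plurality of first control objects based on the level information and the key attributes of the first control objects to obtain one or more cluster objects.
- The method according to claim 1 or 2, characterized in that the feature data distribution interval comprises, from left to right, a left interval, a highest-segment distribution interval, and a right interval; the step of determining, based on a preset feature data set, feature data distribution intervals of the one or more cluster objects comprises: estimating, based on the preset feature data set, the density distribution of the feature data of each cluster object; taking the highest point of the density distribution as the midpoint and extending a range of a first preset threshold to the left and to the right to obtain the endpoints, thereby forming the highest-segment distribution interval; and, taking the highest-segment distribution interval as a reference, dividing the density distribution region to the left of the highest-segment distribution interval into one or more corresponding left intervals according to a first preset rule, and dividing the density distribution region to the right of the highest-segment distribution interval into one or more corresponding right intervals according to a second preset rule.
- The method according to claim 3, characterized in that the step of dividing, with the highest-segment distribution interval as a reference, the density distribution region to the left of the highest-segment distribution interval into one or more corresponding left intervals according to a first preset rule, and dividing the density distribution region to the right of the highest-segment distribution interval into one or more corresponding right intervals according to a second preset rule comprises: determining the quantile of the highest point; obtaining the median of the feature data in the cluster object; if the quantile of the highest point is less than or equal to the median, dividing the density distribution region to the left of the highest-segment distribution interval into N segments according to a first preset proportion to obtain N corresponding left intervals, and dividing the density distribution region to the right of the highest-segment distribution interval into M segments according to a second preset proportion to obtain M corresponding right intervals, where N and M are positive integers; and if the quantile of the highest point is greater than the median, dividing the density distribution region to the left of the highest-segment distribution interval into M segments according to a third preset proportion to obtain M corresponding left intervals, and dividing the density distribution region to the right of the highest-segment distribution interval into N segments according to a fourth preset proportion to obtain N corresponding right intervals, where N and M are positive integers.
- The method according to claim 3 or 4, characterized in that, if there are multiple left intervals and multiple right intervals, the leftmost of the left intervals is taken as the minimum feature data interval of the cluster object, and the rightmost of the right intervals is taken as the maximum feature data interval of the cluster object; the step of controlling the feature data of the first control object to lie within the feature data distribution interval of the corresponding cluster object comprises: when setting feature data for the first control object, controlling the minimum value of the feature data to lie within the minimum feature data interval of the cluster object corresponding to the first control object, and controlling the maximum value of the feature data to lie within the maximum feature data interval of that cluster object.
- The method according to claim 2, characterized in that the method is applied to an e-commerce platform, in which case the first control object is a commodity object, the cluster object is a commodity cluster, the feature data is the commodity price, and the second control object is a merchant object.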
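- A data management and control system, characterized in that the system comprises: a clustering module, configured to cluster a plurality of first control objects into one or more cluster objects; a data distribution determining module, configured to determine, based on a preset feature data set, feature data distribution intervals of the one or more cluster objects; and a data control module, configured to control the feature data of each first control object to lie within the feature data distribution interval of its corresponding cluster object.
- The system according to claim 7, characterized in that the first control object has a corresponding second control object, and the clustering module comprises: an attribute information acquiring submodule, configured to acquire attribute information of the first control objects and attribute information of the second control objects; a key attribute extracting submodule, configured to extract key attributes from the attribute information of the first control objects; a level information acquiring submodule, configured to cluster the attribute information of all second control objects to obtain a plurality of pieces of level information for the second control objects; and a cluster object acquiring submodule, configured to cluster the plurality of first control objects based on the level information and the key attributes of the first control objects to obtain one or more cluster objects.
- The system according to claim 7 or 8, characterized in that the feature data distribution interval comprises, from left to right, a left interval, a highest-segment distribution interval, and a right interval; the data distribution determining module comprises: a density distribution estimating submodule, configured to estimate, based on the preset feature data set, the density distribution of the feature data of each cluster object; a first interval acquiring submodule, configured to take the highest point of the density distribution as the midpoint and extend a range of a first preset threshold to the left and to the right to obtain the endpoints, thereby forming the highest-segment distribution interval; and a second interval acquiring submodule, configured to, taking the highest-segment distribution interval as a reference, divide the density distribution region to the left of the highest-segment distribution interval into one or more corresponding left intervals according to a first preset rule, and divide the density distribution region to the right of the highest-segment distribution interval into one or more corresponding right intervals according to a second preset rule.
- The system according to claim 9, characterized in that the second interval acquiring submodule is further configured to: determine the quantile of the highest point; obtain the median of the feature data in the cluster object; if the quantile of the highest point is less than or equal to the median, divide the density distribution region to the left of the highest-segment distribution interval into N segments according to a first preset proportion to obtain N corresponding left intervals, and divide the density distribution region to the right of the highest-segment distribution interval into M segments according to a second preset proportion to obtain M corresponding right intervals, where N and M are positive integers; and if the quantile of the highest point is greater than the median, divide the density distribution region to the left of the highest-segment distribution interval into M segments according to a third preset proportion to obtain M corresponding left intervals, and divide the density distribution region to the right of the highest-segment distribution interval into N segments according to a fourth preset proportion to obtain N corresponding right intervals, where N and M are positive integers.
- The system according to claim 9 or 10, characterized in that, if there are multiple left intervals and multiple right intervals, the leftmost of the left intervals is taken as the minimum feature data interval of the cluster object, and the rightmost of the right intervals is taken as the maximum feature data interval of the cluster object; the data control module is further configured to: when setting feature data for the first control object, control the minimum value of the feature data to lie within the minimum feature data interval of the cluster object corresponding to the first control object, and control the maximum value of the feature data to lie within the maximum feature data interval of that cluster object.
- The system according to claim 8, characterized in that the system is applied to an e-commerce platform, in which case the first control object is a commodity object, the cluster object is a commodity cluster, the feature data is the commodity price, and the second control object is a merchant object.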
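- A method for layering control data, characterized in that the method comprises: clustering a plurality of first control objects into one or more cluster objects; and determining, based on a preset feature data set, feature data distribution intervals of the one or more cluster objects.
- A system for layering control data, characterized in that the system comprises: a clustering module, configured to cluster a plurality of first control objects into one or more cluster objects; and a data distribution determining module, configured to determine, based on a preset feature data set, feature data distribution intervals of the one or more cluster objects.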
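The interval-determination steps recited in the claims (density estimation, highest-segment distribution interval, left and right intervals) can be sketched roughly as follows. This is a hedged illustration, not the applicant's implementation. It assumes a plain Gaussian kernel density estimate evaluated on a grid, equal-width segments in place of the unspecified "preset proportions", and it reads the condition "the quantile of the highest point is less than or equal to the median" as comparing the peak's feature value with the median. All function and parameter names (`gaussian_kde`, `split_equal`, `distribution_intervals`, `bandwidth`, `grid_size`) are hypothetical.

```python
import math
import statistics
from typing import List, Sequence, Tuple

Segment = Tuple[float, float]


def gaussian_kde(data: Sequence[float], grid: Sequence[float], bandwidth: float) -> List[float]:
    """Evaluate a Gaussian kernel density estimate at each grid point."""
    norm = len(data) * bandwidth * math.sqrt(2 * math.pi)
    return [
        sum(math.exp(-0.5 * ((x - d) / bandwidth) ** 2) for d in data) / norm
        for x in grid
    ]


def split_equal(low: float, high: float, parts: int) -> List[Segment]:
    """Split [low, high] into `parts` equal-width segments (assumed rule)."""
    if high <= low:
        return []
    step = (high - low) / parts
    edges = [low + i * step for i in range(parts)] + [high]
    return list(zip(edges, edges[1:]))


def distribution_intervals(
    data: Sequence[float],
    threshold: float,
    n: int,
    m: int,
    bandwidth: float = 1.0,
    grid_size: int = 200,
) -> Tuple[List[Segment], Segment, List[Segment]]:
    """Return (left intervals, highest-segment interval, right intervals)."""
    lo, hi = min(data), max(data)
    grid = [lo + (hi - lo) * i / (grid_size - 1) for i in range(grid_size)]
    density = gaussian_kde(data, grid, bandwidth)
    # Highest point of the estimated density distribution.
    peak = grid[max(range(grid_size), key=density.__getitem__)]
    # Highest-segment interval: peak +/- threshold, clipped to the data range.
    top = (max(lo, peak - threshold), min(hi, peak + threshold))
    # Peak at or below the median: N segments on the left, M on the right;
    # otherwise the two counts are swapped.
    if peak <= statistics.median(data):
        left_parts, right_parts = n, m
    else:
        left_parts, right_parts = m, n
    left = split_equal(lo, top[0], left_parts)
    right = split_equal(top[1], hi, right_parts)
    return left, top, right


# Example with made-up prices for one commodity cluster (right-skewed).
prices = [10, 11, 12, 12, 13, 13, 13, 14, 15, 20, 30, 45, 60, 80, 100]
left, top, right = distribution_intervals(prices, threshold=2.0, n=2, m=3)
print(len(left), top, len(right))
```

Under these assumptions, `left[0]` and `right[-1]` play the role of the minimum and maximum feature data intervals used when the feature data of an individual object is controlled.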
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610194515.9 | 2016-03-30 | ||
CN201610194515.9A CN107292641A (zh) | 2016-03-30 | 2016-03-30 | 一种数据管控的方法及系统 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2017167064A1 (zh) | 2017-10-05 |
Family
ID=59963493
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2017/077452 WO2017167064A1 (zh) | 2016-03-30 | 2017-03-21 | 一种数据管控的方法及系统 |
Country Status (3)
Country | Link |
---|---|
CN (1) | CN107292641A (zh) |
TW (1) | TW201737128A (zh) |
WO (1) | WO2017167064A1 (zh) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114371677A (zh) * | 2022-01-05 | 2022-04-19 | 天津大学 | 基于谱半径-区间主成分分析的工业过程状态监测方法 |
CN117595464A (zh) * | 2024-01-18 | 2024-02-23 | 深圳创芯技术股份有限公司 | 一种电池充电器充电检测控制方法及系统 |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111178595B (zh) * | 2019-12-11 | 2023-03-24 | 深圳平安医疗健康科技服务有限公司 | 项目控制参数生成方法、装置、计算机设备和存储介质 |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103034687A (zh) * | 2012-11-29 | 2013-04-10 | 中国科学院自动化研究所 | 一种基于2-类异质网络的关联模块识别方法 |
CN103365969A (zh) * | 2013-06-24 | 2013-10-23 | 北京奇虎科技有限公司 | 一种异常数据检测处理的方法和系统 |
CN104077303A (zh) * | 2013-03-28 | 2014-10-01 | 国际商业机器公司 | 用于呈现数据的方法和装置 |
CN104123465A (zh) * | 2014-07-24 | 2014-10-29 | 中国软件与技术服务股份有限公司 | 一种基于聚类的大数据交叉分析预警方法及系统 |
US20150134410A1 (en) * | 2013-11-12 | 2015-05-14 | Bank Of America Corporation | Predicting economic conditions |
2016
- 2016-03-30 CN CN201610194515.9A patent/CN107292641A/zh active Pending
2017
- 2017-02-21 TW TW106105761A patent/TW201737128A/zh unknown
- 2017-03-21 WO PCT/CN2017/077452 patent/WO2017167064A1/zh active Application Filing
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103034687A (zh) * | 2012-11-29 | 2013-04-10 | 中国科学院自动化研究所 | 一种基于2-类异质网络的关联模块识别方法 |
CN104077303A (zh) * | 2013-03-28 | 2014-10-01 | 国际商业机器公司 | 用于呈现数据的方法和装置 |
CN103365969A (zh) * | 2013-06-24 | 2013-10-23 | 北京奇虎科技有限公司 | 一种异常数据检测处理的方法和系统 |
US20150134410A1 (en) * | 2013-11-12 | 2015-05-14 | Bank Of America Corporation | Predicting economic conditions |
CN104123465A (zh) * | 2014-07-24 | 2014-10-29 | 中国软件与技术服务股份有限公司 | 一种基于聚类的大数据交叉分析预警方法及系统 |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114371677A (zh) * | 2022-01-05 | 2022-04-19 | 天津大学 | 基于谱半径-区间主成分分析的工业过程状态监测方法 |
CN117595464A (zh) * | 2024-01-18 | 2024-02-23 | 深圳创芯技术股份有限公司 | 一种电池充电器充电检测控制方法及系统 |
CN117595464B (zh) * | 2024-01-18 | 2024-04-12 | 深圳创芯技术股份有限公司 | 一种电池充电器充电检测控制方法及系统 |
Also Published As
Publication number | Publication date |
---|---|
TW201737128A (zh) | 2017-10-16 |
CN107292641A (zh) | 2017-10-24 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106157083B (zh) | 挖掘潜在客户的方法和装置 | |
US20160035044A1 (en) | Account processing method and apparatus | |
CN105931065B (zh) | 客户群特征数据的处理方法及装置 | |
EP3279806A1 (en) | Data processing method and apparatus | |
CN109948724A (zh) | 一种基于改进lof算法的电商刷单行为检测方法 | |
WO2017167064A1 (zh) | 一种数据管控的方法及系统 | |
CN107679856A (zh) | 基于交易的业务控制方法和装置 | |
CN107093122B (zh) | 对象分类方法及装置 | |
CN105989146B (zh) | 对象展示方法及装置 | |
Kumar et al. | Cost optimization inventory model for deteriorating items with trapezoidal demand rate under completely backlogged shortages in crisp and fuzzy environment | |
CN111752662B (zh) | 银行交易界面展示方法及装置 | |
US20170032707A1 (en) | Method for determining a fruition score in relation to a poverty alleviation program | |
Chen et al. | Out-of-stock detection based on deep learning | |
CN111737555A (zh) | 热点关键词的选取方法、设备和存储介质 | |
US20170186063A1 (en) | System and method for barter support | |
Insani et al. | Data mining for marketing in telecommunication industry | |
CN107305615A (zh) | 数据表识别方法和系统 | |
CN113506164A (zh) | 一种风控决策方法、装置、电子设备及机器可读存储介质 | |
CN118195666A (zh) | 商品价格自动监控方法、装置、存储介质及计算机设备 | |
Jaggi | An optimal replenishment policy for non-instantaneous deteriorating items with price dependent demand and time-varying holding cost | |
CN109545312B (zh) | 一种药店结算单风险检测方法和装置 | |
JP2020160949A (ja) | 融資情報提供システム、融資情報提供装置、融資情報提供方法及び学習済モデル | |
Preethi et al. | Data Mining In Banking Sector | |
WO2020151619A1 (zh) | 库存管理方法、装置、存储介质及电子设备 | |
JP2017111630A (ja) | 情報提供装置、情報提供方法、及びプログラム |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| NENP | Non-entry into the national phase | Ref country code: DE |
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 17773094 Country of ref document: EP Kind code of ref document: A1 |
| 122 | Ep: pct application non-entry in european phase | Ref document number: 17773094 Country of ref document: EP Kind code of ref document: A1 |