Disclosure of Invention
The application aims to provide a food sales data mining analysis method based on big data, which is used for solving the problem of low mining accuracy of the existing food sales association rules.
In order to solve the technical problems, the application provides a food sales data mining analysis method based on big data, which comprises the following steps:
acquiring food sales data to be mined, performing data mining on the food sales data according to a set confidence coefficient threshold and a set sales period, and acquiring an association rule set of each food under each sales period and the confidence coefficient of each association rule in the association rule set;
determining the association rule complexity of each food in each sales period according to the distribution of the consequent food types across the association rules in the association rule set of that food in that sales period and the confidence of each association rule;
determining the price of each food in each sales period according to the food sales data, and determining the most market potential price of each food according to the prices and the association rule complexity;
determining the association rule related parameters corresponding to each food according to, for every two adjacent sales periods, the difference in the price of the food, the difference in the number of association rules in its association rule set, and the difference in the total number of consequent food types in its association rule set;
determining a dynamic adjustment parameter of each food in each sales period according to the association rule related parameters corresponding to the food, the difference between the price of the food in that sales period and its most market potential price, and the differences between the prices of the food in every two adjacent sales periods;
and adjusting the confidence threshold according to the dynamic adjustment parameter to obtain an adjusted confidence threshold of each food in each sales period, performing data mining on the food sales data according to the adjusted confidence threshold and the sales period, and determining a final association rule set of each food in each sales period.
Further, determining the association rule complexity of each food item at each sales cycle includes:
determining the information entropy of the consequent food types across the association rules in the association rule set of each food in each sales period;
determining the accumulated sum of the confidence coefficient of each association rule in the association rule set of each food under each sales period, thereby obtaining the corresponding accumulated sum of the confidence coefficient of each food under each sales period;
and determining the complexity of the association rule of each food under each sales period according to the corresponding information entropy and confidence coefficient accumulation sum of each food under each sales period, wherein the information entropy and the confidence coefficient accumulation sum are in positive correlation with the complexity of the association rule.
Further, determining association rule related parameters corresponding to each food comprises:
the association rule related parameters comprise a food association rule change coefficient and a food category rule change coefficient, which are calculated as follows:
$$A_n = \frac{1}{T-1}\sum_{t=2}^{T}\frac{\left|K_{n,t}-K_{n,t-1}\right|}{e^{-\left|P_{n,t}-P_{n,t-1}\right|}}, \qquad B_n = \frac{1}{T-1}\sum_{t=2}^{T}\frac{\left|M_{n,t}-M_{n,t-1}\right|}{e^{-\left|P_{n,t}-P_{n,t-1}\right|}}$$
wherein $A_n$ and $B_n$ respectively represent the food association rule change coefficient and the food category rule change coefficient corresponding to the $n$-th food, $K_{n,t}$ and $K_{n,t-1}$ respectively represent the total number of association rules in the association rule set of the $n$-th food in the $t$-th and $(t-1)$-th sales periods, $P_{n,t}$ and $P_{n,t-1}$ respectively represent the price of the $n$-th food in the $t$-th and $(t-1)$-th sales periods, $M_{n,t}$ and $M_{n,t-1}$ respectively represent the total number of consequent food types in the association rule set of the $n$-th food in the $t$-th and $(t-1)$-th sales periods, $T$ represents the total number of sales periods corresponding to the $n$-th food, $|\cdot|$ denotes the absolute value, and $e$ represents the natural constant.
Further, determining the dynamic adjustment parameters for each food product at each sales cycle includes:
determining a price coefficient of each food in each sales period according to the difference between the price of the food in that sales period and its most market potential price;
determining the mean of the price differences of each food over every two adjacent sales periods, thereby obtaining the price difference mean corresponding to each food;
determining a price gap quantized value of each food in each sales period according to the absolute value of the difference between the price of the food in that sales period and its most market potential price and the price difference mean corresponding to the food;
determining the association rule price influence degree of each food in each sales period according to the price gap quantized value of the food in that sales period and the association rule related parameters corresponding to the food;
and performing positive correlation normalization on the association rule price influence degree, and determining the product of the normalization result and the price coefficient of each food in each sales period as the dynamic adjustment parameter of the food in that sales period.
Further, determining a price coefficient for each food product at each sales cycle includes:
judging whether the difference between the price of each food in each sales period and the corresponding price with the most market potential is larger than 0, if so, setting the price coefficient in the corresponding sales period as a first value, otherwise, setting the price coefficient in the corresponding sales period as a second value, wherein the first value is a negative number and the second value is a positive number.
Further, the first value is -1 and the second value is 1.
Further, determining the most market potential price for each food product includes:
determining the sales period with the maximum association rule complexity among all sales periods of each food as the target sales period of the food, and determining the price of the food in the target sales period as the most market potential price of the food.
Further, the confidence threshold is adjusted to obtain the adjusted confidence threshold of each food in each sales period, and the corresponding calculation formula is as follows:
wherein $\delta'_{n,t}$ is the adjusted confidence threshold of the $n$-th food in the $t$-th sales period, $\delta$ is the confidence threshold, and $D_{n,t}$ is the dynamic adjustment parameter of the $n$-th food in the $t$-th sales period.
Further, the food sales data is subjected to data mining by adopting an Apriori algorithm.
Further, the sales cycle is a time period between set times of every two adjacent days, and the set times are determined according to the food price change completion time.
The application has the following beneficial effects. Preliminary data mining is performed on the food sales data according to the set confidence threshold and sales period, which yields the association rule set of each food in each sales period and the confidence of each association rule in the set. To remove the influence of food prices on the association rules, the association rule complexity of each food in each sales period is determined as a measure of the market potential of the food at its price in that period, and the most market potential price of each food is selected accordingly. Meanwhile, the degree to which price differences affect the association rules is measured by analyzing, for every two adjacent sales periods, the difference in the price of each food, the difference in the number of association rules in its association rule set, and the difference in the total number of consequent food types in that set, which gives the association rule related parameters of each food. Finally, based on these parameters, together with the difference between the price of each food in each sales period and its most market potential price and the price differences between every two adjacent sales periods, the dynamic adjustment parameter of each food in each sales period is determined adaptively and used to adjust the confidence threshold, giving the adjusted confidence threshold of each food in each sales period.
Based on the adjusted confidence threshold, the food sales data is mined again, which reduces the influence of price factors on the food association rules, yields more accurate association rules, and supports accurate subsequent food sales decisions.
Detailed Description
In order to further describe the technical means and effects adopted by the present application to achieve the preset purpose, the following detailed description is given below of the specific implementation, structure, features and effects of the technical solution according to the present application with reference to the accompanying drawings and preferred embodiments. In the following description, different "one embodiment" or "another embodiment" means that the embodiments are not necessarily the same. Furthermore, the particular features, structures, or characteristics of one or more embodiments may be combined in any suitable manner.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. In addition, all parameters or indices in the formulas referred to herein are values after normalization that eliminate the dimensional effects.
This embodiment provides a food sales data mining analysis method based on big data. The flow chart of the method is shown in Fig. 1, and the method comprises the following steps:
Step S1: acquiring food sales data to be mined, performing data mining on the food sales data according to a set confidence threshold and a set sales period, and acquiring an association rule set of each food in each sales period and the confidence of each association rule in the association rule set.
This embodiment mines and analyzes the sales data of farm and livestock foods, so the food sales data to be mined must first be acquired. The electronic archive of all consumers' shopping receipts in a supermarket over a certain period is taken as the initial data set. Data cleaning is then performed on the initial data set: records that contain no farm and livestock food data are removed and only records containing farm and livestock foods are retained. The retained data serve as the food sales data to be mined.
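The cleaning step can be sketched as follows. This is a minimal illustration, not the embodiment's implementation: the receipt structure (a list of item names per receipt), the `TARGET_FOODS` set and the function name are all hypothetical, and dropping non-target items from retained receipts is one possible reading of the cleaning rule.

```python
# Hypothetical set of farm and livestock food names (an assumption for illustration).
TARGET_FOODS = {"pork", "beef", "eggs", "milk"}

def clean_receipts(receipts, target_foods=TARGET_FOODS):
    """Keep only receipts that contain at least one target food.

    Each receipt is assumed to be a list of item names. Items outside the
    target set are dropped from the retained receipts (an interpretation).
    """
    cleaned = []
    for items in receipts:
        kept = [item for item in items if item in target_foods]
        if kept:  # receipts with no target food are removed entirely
            cleaned.append(kept)
    return cleaned
```

For example, a receipt containing only non-food items is discarded, while a mixed receipt keeps only its farm and livestock items.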
It should be understood that, in this embodiment, the data mining is described by taking the sales data of farm and livestock foods as an example, but the method provided by the scheme is not limited to the data mining of the sales data of farm and livestock foods, and is also applicable to the data mining of sales data of seafood foods, green vegetables foods and other foods with large fluctuation intervals of prices.
After the food sales data to be mined is obtained, initial association rules are mined under a fixed confidence threshold within a certain period, and the association rules are classified by farm and livestock food. Because the prices of farm and livestock foods in a supermarket are unstable, the initial mining should capture the association rules of farm and livestock foods at as many different prices as possible.
To obtain the association rules of farm and livestock foods at different prices, the period of data mining must be determined. Price changes of farm and livestock foods in a supermarket usually take place at the same time. For example, the food price in the morning often differs from the price in the afternoon because the freshness of the food differs, and the price change is generally applied uniformly at a fixed time around noon. In this embodiment, therefore, all the food sales data are divided at a set time, namely the time at which the daily noon price change is completed, to obtain the sales periods. Each sales period thus spans one day: it is the time period between the set times of two adjacent days, and the set time is determined according to the food price change completion time.
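A minimal sketch of assigning a transaction to its sales period is given below. The cutoff of 12:00 and the function and parameter names are assumptions for illustration; in practice the set time would be the actual price change completion time.

```python
from datetime import datetime, time, timedelta

def sales_period_index(timestamp, first_day, cutoff=time(12, 0)):
    """Return the index of the sales period that contains `timestamp`.

    Period 0 starts at `cutoff` on `first_day` and each period lasts one day,
    i.e. it runs from the set time of one day to the set time of the next.
    The 12:00 cutoff is an assumed value.
    """
    start = datetime.combine(first_day, cutoff)
    # Floor division of two timedeltas yields an integer number of whole days.
    return (timestamp - start) // timedelta(days=1)
```

A purchase at 11:59 on the second day still belongs to period 0, while a purchase at 12:00 on that day opens period 1.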
After the sales periods are determined, a confidence threshold $\delta$ is set. This embodiment uses a fixed preset value for $\delta$, which can be adjusted according to the practitioner's actual needs. Then, taking each sales period as the unit, the Apriori algorithm is used to mine the sales data of the farm and livestock foods under this fixed confidence threshold, yielding a plurality of association rules for the different farm and livestock foods in each sales period together with the confidence of each rule. The specific implementation of data mining with the Apriori algorithm belongs to the prior art and is not described here. The association rules in each sales period are then classified by farm and livestock food to obtain the association rule set of each farm and livestock food in each sales period, each set comprising a plurality of association rules. That is, all association rules that share the same antecedent (front item) food are taken as one class, and the set they form is the association rule set of that food.
For ease of understanding, take the $n$-th farm and livestock food as an example. Its association rule set in the $t$-th sales period is $R_{n,t}=\{r_{n,t,1}, r_{n,t,2}, \ldots, r_{n,t,K_{n,t}}\}$, wherein $r_{n,t,k}$ denotes the $k$-th association rule of the $n$-th farm and livestock food in the $t$-th sales period, a rule whose antecedent is the $n$-th farm and livestock food and whose consequent (rear item) consists of other farm and livestock foods; $t \in [1, T]$, where $T$ is the total number of sales periods; $n \in [1, N]$, where $N$ is the total number of kinds of farm and livestock foods; and $K_{n,t}$ is the total number of association rules of the $n$-th farm and livestock food in the $t$-th sales period.
In the above manner, a set of association rules for each food item at each sales cycle and a confidence level for each association rule in the set of association rules may be obtained.
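The grouping of rules by antecedent food can be illustrated with a deliberately simplified sketch. It is not a full Apriori implementation: it only mines rules with a single antecedent item and a single consequent item, and it omits the minimum support pruning that Apriori normally applies. The function name and data layout are assumptions.

```python
from collections import defaultdict
from itertools import permutations

def mine_pair_rules(transactions, min_confidence):
    """Return {antecedent: [(consequent, confidence), ...]} for one sales period.

    Simplification: only single-item antecedents and consequents are mined,
    and no support threshold is applied.
    """
    item_count = defaultdict(int)
    pair_count = defaultdict(int)
    for items in transactions:
        unique = set(items)
        for item in unique:
            item_count[item] += 1
        for a, b in permutations(unique, 2):
            pair_count[(a, b)] += 1

    rules = defaultdict(list)
    for (a, b), count in pair_count.items():
        # confidence(a -> b) = count(a and b) / count(a)
        confidence = count / item_count[a]
        if confidence >= min_confidence:
            rules[a].append((b, confidence))
    return dict(rules)
```

Each key of the returned dictionary plays the role of one food's association rule set in that sales period.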
Step S2: determining the association rule complexity of each food in each sales period according to the distribution of the consequent food types across the association rules in the association rule set of that food in that sales period and the confidence of each association rule.
The above steps give the association rule set of each farm and livestock food in each sales period. Because each sales period is derived from the price change period of the farm and livestock foods, the rules within one association rule set are rules under an unchanged price. It is therefore necessary to analyze, for the same kind of farm and livestock food, how the association rules in different sales periods vary with price fluctuation, and then to quantify from the analysis result the price-based dynamic adjustment parameter of the data mining confidence for that kind of food.
Based on the above analysis, optionally, the association rule complexity of each food in each sales period is determined according to the distribution of the consequent food types across the association rules in the association rule set and the confidence of each association rule, and the implementation steps comprise:
determining the information entropy of the consequent food types across the association rules in the association rule set of each food in each sales period;
determining the accumulated sum of the confidence coefficient of each association rule in the association rule set of each food under each sales period, thereby obtaining the corresponding accumulated sum of the confidence coefficient of each food under each sales period;
and determining the complexity of the association rule of each food under each sales period according to the corresponding information entropy and confidence coefficient accumulation sum of each food under each sales period, wherein the information entropy and the confidence coefficient accumulation sum are in positive correlation with the complexity of the association rule.
Specifically, taking the $n$-th farm and livestock food in the $t$-th sales period as an example, the association rule complexity is calculated as:
$$F_{n,t} = H_{n,t} \times \sum_{k=1}^{K_{n,t}} c_{n,t,k}$$
wherein $F_{n,t}$ denotes the association rule complexity of the $n$-th farm and livestock food in the $t$-th sales period; $H_{n,t}$ denotes the information entropy of the consequent food types across the association rules in the association rule set of the $n$-th farm and livestock food in the $t$-th sales period, that is, the information entropy of the types of all consequent farm and livestock foods in all $K_{n,t}$ association rules; $K_{n,t}$ denotes the total number of association rules in that association rule set; $c_{n,t,k}$ denotes the confidence of the $k$-th association rule in that association rule set; and $\sum_{k=1}^{K_{n,t}} c_{n,t,k}$ denotes the confidence accumulated sum of the $n$-th farm and livestock food in the $t$-th sales period.
In this formula, the association rule complexity quantifies how complex the association rules of the $n$-th farm and livestock food are in the $t$-th sales period. The larger $F_{n,t}$ is, the more complex the rules are, which indicates a higher market potential of the $n$-th farm and livestock food at its price $P_{n,t}$ in the $t$-th sales period. The logic is as follows. The larger $H_{n,t}$ and $K_{n,t}$ are, the more other kinds of farm and livestock foods the $n$-th farm and livestock food is associated with and the more distinct association rules it has, which in practice means that it can be matched or combined with other farm and livestock foods to meet different consumer needs and preferences. The larger $\sum_{k=1}^{K_{n,t}} c_{n,t,k}$ is, the higher the confidence with which the association rules in the corresponding association rule set were mined. Taking $\sum_{k=1}^{K_{n,t}} c_{n,t,k}$ as the coefficient of $H_{n,t}$ therefore gives a value that increases with the market potential of the $n$-th farm and livestock food.
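The complexity calculation described above, the entropy of the consequent food types multiplied by the confidence accumulated sum, can be sketched as follows. The rule representation (one consequent label per rule) and the base-2 logarithm are assumptions; the text does not fix the logarithm base.

```python
import math
from collections import Counter

def rule_complexity(rules):
    """Association rule complexity of one food in one sales period.

    `rules` is a list of (consequent_food, confidence) pairs. The result is
    the entropy of the consequent types times the sum of the confidences.
    """
    if not rules:
        return 0.0
    counts = Counter(consequent for consequent, _ in rules)
    total = sum(counts.values())
    # Shannon entropy of the consequent type distribution (base 2 assumed).
    entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
    confidence_sum = sum(conf for _, conf in rules)
    return entropy * confidence_sum
```

With four rules whose consequents are eggs, milk, eggs and beef, the entropy is 1.5 bits; if the confidences sum to 3.0, the complexity is 4.5.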
Step S3: determining the price of each food in each sales period according to the food sales data, and determining the most market potential price of each food according to the prices and the association rule complexity.
Through the above steps, the complexity of the association rule sets of each kind of farm and livestock food in the different sales periods is quantified, giving the association rule complexity of each kind of farm and livestock food in each sales period. Based on these complexities, the most market potential price can be obtained, namely: the sales period with the maximum association rule complexity among all sales periods of each food is determined as the target sales period of the food, and the price of the food in the target sales period is determined as the most market potential price of the food. For ease of understanding, for the $n$-th farm and livestock food, the sales period with the greatest association rule complexity among all its sales periods is selected, and the price of the $n$-th farm and livestock food in that sales period is its most market potential price, denoted $P_n^{*}$.
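Selecting the most market potential price amounts to an argmax over sales periods. A minimal sketch, with hypothetical names and with ties resolved in favor of the earliest period (an assumption):

```python
def most_potential_price(prices, complexities):
    """Price of one food in the sales period with the largest rule complexity.

    `prices[t]` and `complexities[t]` refer to the same sales period t.
    On ties, the earliest period wins (an assumption).
    """
    if len(prices) != len(complexities) or not prices:
        raise ValueError("prices and complexities must be aligned and non-empty")
    target_period = max(range(len(complexities)), key=lambda t: complexities[t])
    return prices[target_period]
```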
Step S4: determining the association rule related parameters corresponding to each food according to, for every two adjacent sales periods, the difference in the price of the food, the difference in the number of association rules in its association rule set, and the difference in the total number of consequent food types in its association rule set.
For each farm and livestock food in each sales period, the most important parameters of its association rules are the total number of association rules in the association rule set and the total number of consequent food types across those rules. These two features, which change with price, are therefore extracted to determine the association rule related parameters of each farm and livestock food, and the implementation steps comprise:
the association rule related parameters comprise a food association rule change coefficient and a food category rule change coefficient, which are calculated as follows:
$$A_n = \frac{1}{T-1}\sum_{t=2}^{T}\frac{\left|K_{n,t}-K_{n,t-1}\right|}{e^{-\left|P_{n,t}-P_{n,t-1}\right|}}, \qquad B_n = \frac{1}{T-1}\sum_{t=2}^{T}\frac{\left|M_{n,t}-M_{n,t-1}\right|}{e^{-\left|P_{n,t}-P_{n,t-1}\right|}}$$
wherein $A_n$ and $B_n$ respectively represent the food association rule change coefficient and the food category rule change coefficient corresponding to the $n$-th farm and livestock food, $K_{n,t}$ and $K_{n,t-1}$ respectively represent the total number of association rules in the association rule set of the $n$-th farm and livestock food in the $t$-th and $(t-1)$-th sales periods, $P_{n,t}$ and $P_{n,t-1}$ respectively represent the price of the $n$-th farm and livestock food in the $t$-th and $(t-1)$-th sales periods, $M_{n,t}$ and $M_{n,t-1}$ respectively represent the total number of consequent food types in the association rule set of the $n$-th farm and livestock food in the $t$-th and $(t-1)$-th sales periods, $T$ represents the total number of sales periods corresponding to the $n$-th farm and livestock food, $|\cdot|$ denotes the absolute value, and $e$ represents the natural constant.
In these formulas, the price of the $n$-th farm and livestock food is fixed within the $t$-th sales period, so the corresponding association rule set applies only to the $n$-th farm and livestock food at the current price $P_{n,t}$. The price difference of the $n$-th farm and livestock food between two adjacent sales periods, $\left|P_{n,t}-P_{n,t-1}\right|$, is therefore computed and subjected to negative correlation normalization, giving the result $e^{-\left|P_{n,t}-P_{n,t-1}\right|}$. The difference in the total number of association rules between the two adjacent sales periods, $\left|K_{n,t}-K_{n,t-1}\right|$, and the difference in the total number of consequent food types, $\left|M_{n,t}-M_{n,t-1}\right|$, are then each divided by this normalization result, giving the food association rule change coefficient and the food category rule change coefficient of the $n$-th farm and livestock food under the current price difference. Averaging the food association rule change coefficients over all price differences gives the food association rule change coefficient $A_n$ of the $n$-th farm and livestock food, and averaging the food category rule change coefficients over all price differences gives its food category rule change coefficient $B_n$. The larger $A_n$ and $B_n$ are, the more strongly a price difference affects the association rule set, and vice versa.
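The averaging described above can be sketched in code: for each pair of adjacent sales periods, the absolute change in a count is divided by the negatively normalized price difference, and the ratios are averaged. The function name is hypothetical, and prices are assumed to be normalized values as stated earlier in the description. The same helper serves both coefficients: pass rule counts for the food association rule change coefficient and consequent type counts for the food category rule change coefficient.

```python
import math

def change_coefficient(values, prices):
    """Mean over adjacent periods of |v_t - v_(t-1)| / exp(-|p_t - p_(t-1)|).

    `values[t]` is a per-period count (rule count or consequent type count)
    and `prices[t]` is the (normalized) price in the same period.
    """
    if len(values) != len(prices) or len(values) < 2:
        raise ValueError("need at least two aligned sales periods")
    ratios = [
        abs(values[t] - values[t - 1]) / math.exp(-abs(prices[t] - prices[t - 1]))
        for t in range(1, len(values))
    ]
    return sum(ratios) / len(ratios)
```

When the price does not change, the denominator is 1 and the coefficient reduces to the mean absolute change in the count.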
Step S5: determining the dynamic adjustment parameter of each food in each sales period according to the association rule related parameters corresponding to the food, the difference between the price of the food in that sales period and its most market potential price, and the differences between the prices of the food in every two adjacent sales periods.
After the association rule related parameters of each farm and livestock food are determined through the above steps, the dynamic adjustment parameter of the association rule confidence threshold in each sales period is obtained from the price of each farm and livestock food in each sales period, its most market potential price, and its association rule related parameters, and the implementation steps comprise:
determining a price coefficient of each food in each sales period according to the difference between the price of the food in that sales period and its most market potential price;
determining the mean of the price differences of each food over every two adjacent sales periods, thereby obtaining the price difference mean corresponding to each food;
determining a price gap quantized value of each food in each sales period according to the absolute value of the difference between the price of the food in that sales period and its most market potential price and the price difference mean corresponding to the food;
determining the association rule price influence degree of each food in each sales period according to the price gap quantized value of the food in that sales period and the association rule related parameters corresponding to the food;
and performing positive correlation normalization on the association rule price influence degree, and determining the product of the normalization result and the price coefficient of each food in each sales period as the dynamic adjustment parameter of the food in that sales period.
Optionally, determining a price coefficient for each food item at each sales cycle includes:
judging whether the difference between the price of each food in each sales period and the corresponding price with the most market potential is larger than 0, if so, setting the price coefficient in the corresponding sales period as a first value, otherwise, setting the price coefficient in the corresponding sales period as a second value, wherein the first value is a negative number and the second value is a positive number.
Specifically, taking the $n$-th farm and livestock food in the $t$-th sales period as an example, the dynamic adjustment parameter is calculated as:
$$D_{n,t} = k_{n,t} \times \left(1 - \exp\left(-\frac{\left|P_{n,t}-P_n^{*}\right|}{\frac{1}{T-1}\sum_{s=2}^{T}\left|P_{n,s}-P_{n,s-1}\right| + \varepsilon}\times\left(A_n+B_n\right)\right)\right)$$
wherein $D_{n,t}$ represents the dynamic adjustment parameter of the $n$-th farm and livestock food in the $t$-th sales period, $k_{n,t}$ represents the price coefficient of the $n$-th farm and livestock food in the $t$-th sales period, $P_n^{*}$ represents the most market potential price of the $n$-th farm and livestock food, $P_{n,t}$ represents the price of the $n$-th farm and livestock food in the $t$-th sales period, $P_{n,s}$ and $P_{n,s-1}$ respectively represent its prices in the adjacent $s$-th and $(s-1)$-th sales periods, $T$ represents the total number of sales periods corresponding to the $n$-th farm and livestock food, $A_n$ and $B_n$ respectively represent the food association rule change coefficient and the food category rule change coefficient corresponding to the $n$-th farm and livestock food, $\exp$ is the exponential function with the natural constant $e$ as its base, $|\cdot|$ denotes the absolute value, and $\varepsilon$ represents a very small positive parameter.
In this formula, the gap between the actual price of the $n$-th farm and livestock food in the $t$-th sales period and its most market potential price, $\left|P_{n,t}-P_n^{*}\right|$, is computed first. This gap is then quantized by the mean price difference of the $n$-th farm and livestock food over all sales periods, $\frac{1}{T-1}\sum_{s=2}^{T}\left|P_{n,s}-P_{n,s-1}\right|$, giving the price gap quantized value $\frac{\left|P_{n,t}-P_n^{*}\right|}{\frac{1}{T-1}\sum_{s=2}^{T}\left|P_{n,s}-P_{n,s-1}\right| + \varepsilon}$, in which the parameter $\varepsilon$ prevents the denominator from being 0. The price gap quantized value is then used as a weight and multiplied by the sum of the food association rule change coefficient and the food category rule change coefficient, $A_n+B_n$, giving the association rule price influence degree of the $n$-th farm and livestock food in the $t$-th sales period. The association rule price influence degree characterizes how strongly the price of the $n$-th farm and livestock food in the $t$-th sales period affects the association rules: the larger its value, the stronger the influence. The association rule price influence degree is taken as the basis of the dynamic adjustment parameter of the $n$-th farm and livestock food in the $t$-th sales period and is normalized with an exponential function; the form $1-\exp(-x)$ normalizes the basis $x$ without changing its logic, since the result still increases with $x$. Finally, the product of the positive correlation normalization result and the price coefficient $k_{n,t}$ gives the dynamic adjustment parameter $D_{n,t}$ of the association rule confidence threshold.
The price coefficient $k_{n,t}$ determines whether the dynamic adjustment parameter adjusts the threshold negatively or positively. When $P_{n,t}-P_n^{*}>0$, the price coefficient $k_{n,t}$ takes the first value, -1, and the dynamic adjustment parameter is a negative adjustment. When $P_{n,t}-P_n^{*}\le 0$, the price coefficient $k_{n,t}$ takes the second value, 1, and the dynamic adjustment parameter is a positive adjustment.
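The steps for the dynamic adjustment parameter can be sketched as below. The sketch assumes that the positive correlation normalization takes the form 1 - exp(-x), which maps a non-negative influence degree into [0, 1); the function name, parameter names and the default value of `eps` are likewise assumptions for illustration.

```python
import math

def dynamic_adjustment(price, best_price, prices, coeff_rules, coeff_types, eps=1e-6):
    """Dynamic adjustment parameter of one food in one sales period.

    price       -- price of the food in the current sales period
    best_price  -- most market potential price of the food
    prices      -- prices of the food in all sales periods, in order
    coeff_rules -- food association rule change coefficient
    coeff_types -- food category rule change coefficient
    eps         -- small constant that keeps the denominator non-zero
    """
    if len(prices) < 2:
        raise ValueError("need at least two sales periods")
    diffs = [abs(prices[t] - prices[t - 1]) for t in range(1, len(prices))]
    mean_diff = sum(diffs) / len(diffs)
    # Price gap quantized value: gap to the best price relative to typical change.
    gap = abs(price - best_price) / (mean_diff + eps)
    influence = gap * (coeff_rules + coeff_types)
    # Price coefficient: -1 above the best price, otherwise 1.
    price_coefficient = -1 if price - best_price > 0 else 1
    # Assumed normalization: 1 - exp(-x).
    return price_coefficient * (1 - math.exp(-influence))
```

Under this assumed normalization the result lies strictly between -1 and 1, is 0 when the price equals the most market potential price, and is negative when the price is above it.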
Step S6: adjusting the confidence threshold according to the dynamic adjustment parameter to obtain an adjusted confidence threshold of each food in each sales period, performing data mining on the food sales data according to the adjusted confidence threshold and the sales period, and determining a final association rule set of each food in each sales period.
After the dynamic adjustment parameter of each farm and livestock food in each sales period is determined through the above steps, the confidence threshold is adjusted with the dynamic adjustment parameter to obtain the corresponding adjusted confidence threshold, so that the final association rules of the corresponding sales period can subsequently be mined based on the adjusted confidence threshold. The corresponding calculation formula is as follows:
wherein $\delta'_{n,t}$ is the adjusted confidence threshold of the $n$-th farm and livestock food in the $t$-th sales period, $\delta$ is the confidence threshold, and $D_{n,t}$ is the dynamic adjustment parameter of the $n$-th farm and livestock food in the $t$-th sales period.
After the adjusted confidence threshold of each farm and livestock food in each sales period is determined through the above calculation formula, data mining is performed on the food sales data with the Apriori algorithm according to the adjusted confidence threshold and the sales period. That is, for the sales data of all farm and livestock foods in each sales period, the Apriori algorithm performs the final data mining with the adjusted confidence threshold as the mining confidence threshold, yielding the final association rule set that takes each farm and livestock food as the antecedent. Once the adjusted confidence threshold and the sales period are determined, the specific process of mining the food sales data with the Apriori algorithm to obtain the association rule set of each food in each sales period belongs to the prior art and is not repeated here. The association rules in the final association rule set reduce the influence of food price on the rules as far as possible and are therefore more accurate. The association rules of the different kinds of farm and livestock foods can then be analyzed with an association rule analysis algorithm to formulate corresponding sales strategies.
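One way to picture the final mining pass for a single sales period is to filter candidate rules by each food's own adjusted threshold. The sketch below takes the adjusted thresholds as given inputs and does not compute them. It assumes the candidates were mined with a threshold no higher than the lowest adjusted threshold, so that filtering is equivalent to re-mining; all names are hypothetical.

```python
def final_rule_sets(candidate_rules, adjusted_thresholds, default_threshold):
    """Filter one sales period's candidate rules by per-food thresholds.

    candidate_rules     -- {antecedent: [(consequent, confidence), ...]}
    adjusted_thresholds -- {antecedent: adjusted confidence threshold}
    default_threshold   -- threshold for foods without an adjusted value
    """
    final = {}
    for food, rules in candidate_rules.items():
        threshold = adjusted_thresholds.get(food, default_threshold)
        final[food] = [(c, conf) for c, conf in rules if conf >= threshold]
    return final
```

A food whose threshold was lowered keeps rules that the fixed threshold would have discarded, and a food whose threshold was raised keeps only its strongest rules.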
When mining food sales data, the application takes into account the influence of food price on the association rules and removes that influence as far as possible by dynamically adjusting the confidence threshold used in mining. Compared with the prior art, the resulting association rules are more accurate, the data mining is more thorough, and the robustness is stronger.
It should be noted that: the above embodiments are only for illustrating the technical solution of the present application, and not for limiting the same; although the application has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present application, and are intended to be included in the scope of the present application.