CN108830492B

CN108830492B - Method for determining spot-check merchants based on big data

Info

Publication number: CN108830492B
Application number: CN201810650948.XA
Authority: CN
Inventors: 高嵘; 丁熠; 赵良吉; 秦臻; 张黎; 邓伏虎; 赵洋
Original assignee: Chengdu Boyoi Technology Co ltd; University of Electronic Science and Technology of China
Current assignee: Chengdu Boyoi Technology Co ltd; University of Electronic Science and Technology of China
Priority date: 2018-06-22
Filing date: 2018-06-22
Publication date: 2022-02-08
Anticipated expiration: 2038-06-22
Also published as: CN108830492A

Abstract

The invention discloses a method for determining spot-check merchants based on big data, comprising the following steps: S1, using the data in a market supervision platform, classifying all merchants in the market by inspection item and calculating the predicted failure rate F(i) of each merchant; S2, calculating the final inspection priority F*(i) of each merchant from its predicted failure rate; S3, sorting the merchants in the market in descending order of F*(i), automatically generating a merchant list from the top of this ranking, updating each merchant's inspection status after the on-site inspection is finished, and storing the corresponding data in the market supervision platform. Every day, the method automatically generates the list of merchants to be inspected on site that day from the real-time record of recent market events held in the supervision platform, effectively solving the problems of inaccuracy and low efficiency in traditional spot checks of market merchants.

Description

Method for determining spot-check merchants based on big data

Technical Field

The invention belongs to the technical field of Internet big-data applications, and in particular relates to a method for determining spot-check merchants based on big data.

Background

Strengthening the supervision and management of the quality and safety of edible agricultural products on the market is an important means of ensuring their quality and safety. Centralized trading markets in various regions have therefore begun to build their own agricultural-product market supervision platforms to collect and record the quality and sales information of edible agricultural products. Such platforms typically have three types of users: market supervisors, sellers (merchants) and buyers. Market supervisors include the market's management department and the government's grassroots supervision departments; through the platform they can inspect sellers' basic information and daily purchase and sales records both online and offline (on site), and post inspection results and the related rectification requirements on the platform. Sellers provide their basic information, such as booth name, mobile phone number, ID card number, business license, the market they operate in and booth number, as well as daily purchase and sales records, such as the variety, total weight, unit price, place of origin, purchase certificate and quality-inspection status of incoming agricultural products, and the variety, unit price, weight and amount of the products sold, for the market supervisor to inspect. Buyers are the purchasers of products; on the platform they can rate the seller of a purchased product and can also file complaints against the seller. For example, Fig. 1 shows the platform interface used by a supervisor during an on-site inspection: the inspector fills it in according to the actual situation and reaches a conclusion on whether the merchant passes. If the merchant fails, the platform can automatically generate the rectification requirements from the inspection results, or the inspector can enter them manually.

For a market supervisor, inspecting every booth on site every day involves too much work, is inefficient and causes a lot of repeated labor. The traditional spot-check method is for the supervisor to select some of the booths each day, either in sequence or at random. This is simple and easy to implement, but the supervisor may fail to learn in time about food safety problems that have recently occurred in the market. Big data refers to data sets that cannot be captured, managed and processed with conventional software tools within a given time frame; they are massive, fast-growing and diverse information assets that yield stronger decision-making, insight-discovery and process-optimization capabilities only with new processing modes. The continuously updated data on a market supervision platform constitute such big data. How to integrate the platform's data so that merchants can be spot-checked for quality more efficiently and accurately, with the list of merchants to be checked generated automatically for the supervisor, is therefore a direction worth further study.

Disclosure of Invention

To address these shortcomings of the prior art, the invention provides a method for determining spot-check merchants based on big data that solves the problems that traditional spot checks of market merchants are insufficiently accurate and inefficient, and that real-time food quality and safety conditions cannot be learned in time.

To achieve this purpose, the invention adopts the following technical solution: a method for determining spot-check merchants based on big data, comprising the following steps:

S1: classifying all merchants in the market according to the inspection-item data in the market supervision platform, and calculating the predicted failure rate F(i) of each merchant;

where i = 1, 2, …, m, and m denotes the total number of merchants in the market;

S2: calculating the final inspection priority F*(i) of each merchant from its predicted failure rate;

S3: sorting the merchants in the market in descending order of F*(i), automatically generating a merchant list from the K merchants with the largest F*(i), updating the merchants' inspection status after the on-site inspection is finished, and storing the corresponding data in the market supervision platform;

where K is the number of merchants to be inspected on site that day, set manually by the supervisor.
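Step S3 boils down to a top-K selection over the priorities F*(i). The Python sketch below is illustrative only: it assumes the priorities are kept in a dict keyed by a merchant identifier, and the function name `generate_spot_check_list` is hypothetical rather than part of the described platform.

```python
from typing import Dict, List, Tuple


def generate_spot_check_list(priorities: Dict[str, float], k: int) -> List[Tuple[str, float]]:
    """Return the K merchants with the largest inspection priority F*(i).

    `priorities` maps a (hypothetical) merchant ID to its F*(i);
    `k` is the number of merchants the supervisor plans to inspect today.
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    # Highest priority first; sorted() is stable, so ties keep insertion order.
    ranked = sorted(priorities.items(), key=lambda item: item[1], reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    demo = {"a": 0.026, "b": 0.053, "c": 0.833, "g": 0.955}
    print(generate_spot_check_list(demo, 2))  # [('g', 0.955), ('c', 0.833)]
```

A full sort is used for clarity; for very large markets `heapq.nlargest` would avoid sorting every merchant.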

Further, step S1 specifically comprises:

S11: classifying all merchants in the market according to whether the products they sell share the same inspection items;

S12: for each class of merchants, building a data set D = {D_1, D_2, …, D_L} from the historical inspection records in the market platform;

where L is the number of merchants in the class, i.e. merchants selling the same class of products;

D_i = {(x_1, y_1), (x_2, y_2), …, (x_n, y_n)} denotes the inspection records of merchant i, where x_n records the values of the merchant's attributes at the nth inspection:

$$x_n = \left(x_n^{1}, x_n^{2}, \ldots, x_n^{V}\right)$$

where x_n^v is the value of attribute v at the nth inspection and V is the number of attributes; y_n denotes the result of the nth inspection, with y_n = 0 if the inspection was passed and y_n = 1 otherwise;

the attributes are the inspection items checked by the supervisor during an on-site inspection;

S13: randomly dividing the data set D of each class of merchants into a training set S and a test set T in the ratio N:1;

where the training set S contains N times as many samples as the test set T;
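A minimal sketch of the N:1 random split in S13, assuming each class's records are held in a Python sequence; `split_n_to_1` is a hypothetical helper. When the record count is not a multiple of N + 1, the sketch rounds the test-set size down, so |S| is then only approximately N times |T|.

```python
import random
from typing import List, Optional, Sequence, Tuple, TypeVar

Record = TypeVar("Record")


def split_n_to_1(
    records: Sequence[Record], n: int, seed: Optional[int] = None
) -> Tuple[List[Record], List[Record]]:
    """Randomly split `records` into (training set S, test set T) in ratio n:1."""
    if n < 1:
        raise ValueError("n must be at least 1")
    rng = random.Random(seed)  # seeded generator for a reproducible split
    shuffled = list(records)
    rng.shuffle(shuffled)
    test_size = len(shuffled) // (n + 1)  # T receives 1 part out of n + 1
    return shuffled[test_size:], shuffled[:test_size]


if __name__ == "__main__":
    # 32 records split 3:1, as in the meat-merchant embodiment.
    S, T = split_n_to_1(list(range(1, 33)), n=3, seed=42)
    print(len(S), len(T))  # 24 8
```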

S14: for each class of merchants, training logistic regression models on that class's training set S to obtain a group of failure-rate prediction models {f^(1), f^(2), …, f^(M)};

where M is the number of failure-rate prediction models;

S15: testing the performance of the M failure-rate prediction models {f^(1), f^(2), …, f^(M)} on the test set T, and taking the model with the highest prediction accuracy as the final failure-rate prediction model of the class, denoted F;
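S15 keeps, for each class, the candidate model with the best accuracy on T. A minimal sketch, assuming each model is a (weights, bias) pair for the logistic form used in S14 and that a prediction of at least 0.5 means "unqualified" (the classification threshold the embodiment later uses); all function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# A model is (weights, bias); a sample is (attribute vector, label y).
Model = Tuple[List[float], float]
Sample = Tuple[List[float], int]


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(model: Model, x: Sequence[float]) -> float:
    weights, bias = model
    return sigmoid(sum(w * v for w, v in zip(weights, x)) + bias)


def accuracy(model: Model, test_set: Sequence[Sample], threshold: float = 0.5) -> float:
    # Predict "unqualified" (y = 1) when the failure rate reaches the threshold.
    hits = sum(int((predict(model, x) >= threshold) == bool(y)) for x, y in test_set)
    return hits / len(test_set)


def select_final_model(models: Sequence[Model], test_set: Sequence[Sample]) -> Model:
    # max() returns the first model among ties, i.e. the earliest-trained one.
    return max(models, key=lambda m: accuracy(m, test_set))
```

How ties between equally accurate models are broken is not specified in the text; keeping the first one is just a convenient default.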

S16: calculating the predicted failure rate F(i) (i = 1, 2, …, m, where m is the total number of merchants in the market) of each merchant with the final failure-rate prediction model of its class;

the predicted failure rate F(i) of each merchant is calculated as:

$$F(i) = \frac{1}{1 + \exp\!\left(-\left(\sum_{j} \omega_j^{*} x_i^{j} + b^{*}\right)\right)}$$

where ω_j^* is the weight of attribute j in the final failure-rate prediction model F of the class to which merchant i belongs, b^* is the bias term of F, and x_i^j is the value of attribute j for the current merchant i.
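Because the candidate models are logistic regression models, F(i) is the logistic function of a weighted sum of merchant i's current attribute values plus the bias of the final model F. A minimal sketch, assuming F is given as a list of weights and a bias; the helper name is hypothetical.

```python
import math
from typing import Sequence


def predicted_failure_rate(
    weights: Sequence[float], bias: float, attributes: Sequence[float]
) -> float:
    """F(i) = 1 / (1 + exp(-(sum_j w_j * x_i^j + b)))."""
    if len(weights) != len(attributes):
        raise ValueError("one weight per attribute is required")
    z = sum(w * x for w, x in zip(weights, attributes)) + bias
    # Evaluate the logistic function in a form that cannot overflow math.exp.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)
```

For instance, the embodiment's model f⁽¹⁾ would be evaluated as `predicted_failure_rate([-0.11, -1.11, -0.47, 0.21, -0.45, -1.01, 0.001, 0.80, 0.51], 0.01, current_attributes)`, where `current_attributes` stands for one merchant's row of Table 4 (not reproduced in this text).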

Further, the final inspection priority F*(i) of each merchant in step S2 is calculated as:

$$F^{*}(i) = \begin{cases} F(i), & F(i) \ge \gamma \\ F(i)\left(1 - 2^{-t_i}\right), & F(i) < \gamma \end{cases}$$

where F(i) is the predicted failure rate of each merchant;

t_i is the time elapsed since the last on-site inspection of merchant i;

γ ∈ (0, 1) is the failure-rate threshold set manually by the supervisor.
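The priority rule of step S2 can be read off the worked example later in this description: a merchant above the threshold keeps F(i) (e.g. F*(c) = F(c) = 0.833), while one below it is discounted by 1 - 2^(-t_i) (e.g. F*(a) = F(a)(1 - 2^-2)). A minimal sketch of that rule; treating F(i) = γ as the upper branch is an assumption, and the function name is hypothetical.

```python
def inspection_priority(failure_rate: float, t_since_last: float, gamma: float) -> float:
    """Final inspection priority F*(i) from F(i), t_i and the threshold gamma."""
    if not 0.0 < gamma < 1.0:
        raise ValueError("gamma must lie in (0, 1)")
    if failure_rate >= gamma:
        # High-risk merchants keep their predicted failure rate as priority.
        return failure_rate
    # Low-risk merchants are discounted: 1 - 2^-t is 0 right after an
    # inspection and approaches 1 as time since the last inspection grows.
    return failure_rate * (1.0 - 2.0 ** (-t_since_last))


if __name__ == "__main__":
    print(inspection_priority(0.034, 2, 0.7))  # merchant a: about 0.0255
    print(inspection_priority(0.955, 2, 0.7))  # merchant g: 0.955 (t_i irrelevant)
```

The discount means a low-risk merchant who was just inspected drops to zero priority and then climbs back toward F(i) as time passes, so no merchant is left unchecked indefinitely.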

Further, the merchants' inspection status in step S3 is updated as follows:

t_i is set to 0 for every merchant whose on-site inspection has been completed, and F*(i) of every merchant that failed the inspection is set directly to γ, i.e. that merchant is added directly to the list the next time the spot-check list is generated.
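A minimal sketch of this update rule, assuming per-merchant state is kept in a small record and that a forced priority of γ stays in effect until the merchant's next inspection (when it is cleared is not stated); `MerchantState`, `record_inspection` and `current_priority` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MerchantState:
    t_since_last: float                      # t_i, time since the last on-site inspection
    forced_priority: Optional[float] = None  # F*(i) pinned to gamma after a failure


def record_inspection(state: MerchantState, passed: bool, gamma: float) -> None:
    # Every inspected merchant restarts its clock.
    state.t_since_last = 0.0
    # A failed merchant gets F*(i) = gamma; a passing one is scored normally again.
    state.forced_priority = None if passed else gamma


def current_priority(state: MerchantState, failure_rate: float, gamma: float) -> float:
    if state.forced_priority is not None:
        return state.forced_priority         # no recomputation after a failure
    if failure_rate >= gamma:
        return failure_rate
    return failure_rate * (1.0 - 2.0 ** (-state.t_since_last))
```

With γ = 0.7, recording a failed inspection for a merchant leaves its priority at 0.7 regardless of its predicted failure rate.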

Further, the kth failure-rate prediction model in step S14 is trained as follows:

where k = 1, 2, …, M, and M is the number of failure-rate prediction models;

S141: randomly initializing a group of weights ω^(k) = (ω_1^(k), ω_2^(k), …, ω_V^(k)) and the bias b^(k) of the logistic regression model;

the logistic regression model is:

$$f^{(k)}(x_i) = \frac{1}{1 + \exp\!\left(-\left(\sum_{v} \omega_v^{(k)} x_i^{v} + b^{(k)}\right)\right)}$$

where x_i is the ith sample in the training set, and ω_v^(k) is the weight of the vth attribute in the kth prediction model;

S142: adjusting the weights ω^(k) and the bias b^(k) to optimize the logistic regression model, yielding the kth failure-rate prediction model.

Further, in step S142 the logistic regression model is optimized as follows:

a mean squared error loss function measures the error between the model output f^(k)(x_i) and the true label y_i, and this error is minimized by batch gradient descent to obtain the kth failure-rate prediction model.

The mean squared error loss function Z is calculated as:

$$Z = \frac{1}{|S|} \sum_{i=1}^{|S|} \left(f^{(k)}(x_i) - y_i\right)^2$$

where |S| is the number of samples in the training set S.

Beneficial effects of the invention: every day, the method automatically generates for the supervisor the list of merchants to be inspected on site that day from the real-time record of recent market events in the supervision platform, effectively solving the problems that traditional spot checks of market merchants are inaccurate and inefficient and cannot reveal real-time food quality and safety conditions in time.

Drawings

Fig. 1 shows the platform interface used by a supervisor for on-site inspection, described in the background art of the invention.

Fig. 2 is a flowchart of the method for determining spot-check merchants based on big data according to an embodiment of the invention.

Fig. 3 is a flowchart of the method for calculating the predicted failure rate of each merchant in an embodiment of the invention.

Fig. 4 is a flowchart of the method for training the kth failure-rate prediction model in an embodiment of the invention.

Detailed Description

The embodiments of the invention are described below so that those skilled in the art can understand the invention. It should be understood, however, that the invention is not limited to the scope of these embodiments: for those of ordinary skill in the art, various changes are obvious as long as they fall within the spirit and scope of the invention as defined by the appended claims, and all inventions making use of the inventive concept are protected.

As shown in Fig. 2, a method for determining spot-check merchants based on big data comprises the following steps:

where i = 1, 2, …, m, and m denotes the total number of merchants in the market;

as shown in Fig. 3, step S1 specifically comprises:

D_i = {(x_1, y_1), (x_2, y_2), …, (x_n, y_n)} denotes the inspection records of merchant i, where x_n records the values of the merchant's attributes at the nth inspection, x_n = (x_n^1, x_n^2, …, x_n^V), x_n^v being the value of attribute v;

where M is the number of failure-rate prediction models;

as shown in Fig. 4, the kth failure-rate prediction model in step S14 is trained as follows:

where k = 1, 2, …, M;

S141: randomly initializing a group of weights ω^(k) and the bias b^(k) of the logistic regression model;

the logistic regression model is:

$$f^{(k)}(x_i) = \frac{1}{1 + \exp\!\left(-\left(\sum_{v} \omega_v^{(k)} x_i^{v} + b^{(k)}\right)\right)} \tag{1}$$

where x_i is the ith sample in the training set, and ω_v^(k) is the weight of the vth attribute in the kth prediction model;

S142: adjusting the weights ω^(k) and the bias b^(k) to optimize the logistic regression model, which specifically comprises minimizing, by batch gradient descent, the mean squared error between the model output f^(k)(x_i) and the true label y_i.

The mean squared error loss function Z is calculated as:

$$Z = \frac{1}{|S|} \sum_{i=1}^{|S|} \left(f^{(k)}(x_i) - y_i\right)^2 \tag{2}$$

where |S| is the number of samples in the training set S.

Steps S141 and S142 are repeated for each failure-rate prediction model, each time with a new random setting of ω^(k) and b^(k), yielding M failure-rate prediction models.

The predicted failure rate F(i) of each merchant is calculated as:

$$F(i) = \frac{1}{1 + \exp\!\left(-\left(\sum_{j} \omega_j^{*} x_i^{j} + b^{*}\right)\right)} \tag{3}$$

where ω_j^* is the weight of attribute j in the final failure-rate prediction model F of the class to which merchant i belongs, b^* is the bias term of F, and x_i^j is the value of attribute j for the current merchant i.

S2: calculating the final inspection priority F*(i) of each merchant from its predicted failure rate;

the final inspection priority F*(i) of each merchant in step S2 is calculated as:

$$F^{*}(i) = \begin{cases} F(i), & F(i) \ge \gamma \\ F(i)\left(1 - 2^{-t_i}\right), & F(i) < \gamma \end{cases} \tag{4}$$

where F(i) is the predicted failure rate of each merchant;

t_i is the time elapsed since the last on-site inspection of merchant i;

γ ∈ (0, 1) is the failure-rate threshold set manually by the supervisor.

Step S3 updates the merchants' inspection status as follows: t_i is set to 0 for every merchant whose on-site inspection has been completed, and F*(i) of every merchant that failed is set directly to γ; that is, at the next spot check the priority of a failed merchant is not recalculated, and the merchant goes directly onto the spot-check list.

In one embodiment of the invention, in which the method is applied to an edible agricultural products market, the specific process of generating a spot check of the market's merchants is as follows:

when the merchants in the market are classified by the inspection items of the agricultural products they sell, the classes include, but are not limited to, meat, seafood, aquatic products, vegetables, eggs and fruit. In the data set D built from the historical inspection records of each class of merchants, the attributes and attribute values of each data sample are as shown in Table 1:

Table 1. Attributes and attribute values of each data sample

To generate the spot check, the merchants in the market are divided, according to the statistics in the market supervision platform, into meat, seafood, aquatic products, vegetables, eggs and fruit; 8 of the merchants sell meat products. The data set D of the meat merchants is built from the historical inspection records in the market supervision platform, as shown in Table 2; the failure-rate prediction model of the meat merchants is then trained, the inspection priority of each merchant is calculated, and the inspection list of the meat merchants is generated automatically. Note that in practice the inspection priority is calculated for every merchant, and the spot-check list is generated automatically over the merchants of all classes.

Table 2. Data set built from the historical inspection records of meat merchants

First, the data set D is built from the historical inspection records of the 8 meat merchants, as shown in Table 2; for example, sample 1 has the attribute vector x_1 = (1, 1, 0, 0, 1, 1, 1, 2, 4.3) and y_1 = 0. The data set D is then randomly divided into a training set S and a test set T in the ratio 3:1: S contains 24 samples, {1, 2, 3, 5, 6, 7, 9, 10, 11, 13, 14, 15, 17, 18, 19, 21, 22, 23, 25, 26, 27, 29, 30, 31}, and T contains 8 samples, {4, 8, 12, 16, 20, 24, 28, 32};

Second, a logistic regression model is trained on the training set S to obtain the 1st failure-rate prediction model f⁽¹⁾, with model parameters ω⁽¹⁾ = (-0.11, -1.11, -0.47, 0.21, -0.45, -1.01, 0.001, 0.80, 0.51) and b⁽¹⁾ = 0.01;

the initial values of the model parameters are then reset to obtain the 2nd failure-rate prediction model f⁽²⁾, with ω⁽²⁾ = (-0.08, -0.9, 0.2, 0.1, -0.2, -1.1, 0.2, 0.82, -0.48) and b⁽²⁾ = 0.2;

and likewise the 3rd failure-rate prediction model f⁽³⁾, with ω⁽³⁾ = (-0.2, -0.8, 0.24, 0.05, -0.18, -0.9, -0.2, 0.7, -0.6) and b⁽³⁾ = -0.1;

In this embodiment M is assumed to be 3, i.e. only 3 failure-rate prediction models are trained for the meat merchants. The accuracy of the 3 models is tested on the test set T. Here the classification threshold of the meat merchants' failure-rate prediction models is set to 0.5: a predicted value greater than or equal to 0.5 is judged as unqualified, and a predicted value below 0.5 as qualified. With this setting, the prediction results shown in Table 3 are obtained;

Table 3. Comparison of the results of the 3 failure-rate prediction models

Merchant: a, b, c, d, e, f, g, h

f⁽¹⁾: Unqualified

f⁽²⁾: Qualified, Unqualified, Qualified, Unqualified, Qualified, Unqualified

f⁽³⁾: Qualified, Unqualified, Qualified, Unqualified, Qualified

The accuracies of f⁽¹⁾, f⁽²⁾ and f⁽³⁾ are calculated as 100%, 50% and 40% respectively, so f⁽¹⁾, which has the highest accuracy, is selected as the failure-rate prediction model of the meat merchants and denoted F. Substituting each merchant's current attribute values into F gives its predicted failure rate; Table 4 shows the current attributes of the 8 merchants;

the predicted failure rates of the 8 merchants a to h obtained in this way are, respectively:

0.034, 0.0562, 0.833, 0.0787, 0.0787, 0.722, 0.955, 0.214;

Table 4. Current attributes of the 8 merchants

Next, the inspection priority F*(i) of each merchant is calculated with formula (4). In this embodiment the merchant failure-rate threshold is assumed to be γ = 0.7. The predicted failure rates of merchants c, f and g are all greater than the threshold, so F*(c) = F(c) = 0.833, F*(f) = F(f) = 0.722 and F*(g) = F(g) = 0.955. The predicted failure rates of merchants a, b, d, e and h are all below the threshold, so their priorities are corrected according to the time elapsed since their last inspection:

F*(a) = F(a)(1 - 2^-2) = 0.034 × 0.75 ≈ 0.026

and likewise:

F*(b) = F(b)(1 - 2^-4) ≈ 0.053, F*(d) = 0.006, F*(e) = 0.008, F*(h) = 0.207

Third, the 8 merchants are ranked by inspection priority from high to low: (g, 0.955), (c, 0.833), (f, 0.722), (h, 0.207), (b, 0.053), (a, 0.026), (e, 0.008), (d, 0.006). In this embodiment, assuming that only 3 merchants are inspected that day, the automatically generated spot-check list is {(g, 0.955), (c, 0.833), (f, 0.722)};
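The ranking step of this embodiment can be reproduced from the figures stated above. The sketch recomputes F*(a) and F*(b) from their predicted failure rates and the elapsed times used in the text (2 and 4), passes the merchants above γ through unchanged, takes the stated priorities for d, e and h (their elapsed times are not given), and keeps the top K = 3. Variable and function names are illustrative.

```python
GAMMA = 0.7  # failure-rate threshold assumed in the embodiment
K = 3        # merchants inspected on site that day


def priority(rate: float, t: float, gamma: float = GAMMA) -> float:
    # F*(i): keep F(i) at or above gamma, otherwise discount by 1 - 2^-t.
    return rate if rate >= gamma else rate * (1.0 - 2.0 ** (-t))


priorities = {
    "a": priority(0.034, 2),   # 0.0255, reported as 0.026
    "b": priority(0.0562, 4),  # 0.0527, reported as 0.053
    "c": priority(0.833, 0),   # above gamma, so t has no effect
    "f": priority(0.722, 0),
    "g": priority(0.955, 0),
    # Elapsed times for d, e and h are not given; use the stated F*(i).
    "d": 0.006,
    "e": 0.008,
    "h": 0.207,
}

ranking = sorted(priorities.items(), key=lambda kv: kv[1], reverse=True)
spot_check_list = [merchant for merchant, _ in ranking[:K]]
print(spot_check_list)  # ['g', 'c', 'f']
```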

Finally, the data records in the market supervision platform are updated with the day's inspection results. In this embodiment, assuming that merchant g failed the inspection and merchants c and f passed, the time since the last inspection of all 3 merchants is reset to 0, and F*(g) is set to the failure-rate threshold 0.7, so that merchant g is added directly to the list when the next spot-check list is generated.

Every day, the method for determining spot-check merchants based on big data automatically generates for the supervisor the list of merchants to be inspected on site that day from the real-time record of recent market events in the supervision platform, effectively solving the problems that traditional spot checks of market merchants are inaccurate and inefficient and cannot reveal real-time food quality and safety conditions in time.

Claims

1. A method for determining spot-check merchants based on big data, characterized by comprising the following steps:

where i = 1, 2, …, m, and m denotes the total number of merchants in the market;

S3: sorting the merchants in the market in descending order of F*(i), automatically generating a merchant list from the K merchants with the largest F*(i), updating the merchants' inspection status after the on-site inspection is finished, and storing the corresponding data in the market supervision platform;

where K is the number of merchants to be inspected on site that day, set manually by the supervisor;

step S1 specifically comprises:

where M is the number of failure-rate prediction models;

the predicted failure rate F(i) of each merchant being calculated as

$$F(i) = \frac{1}{1 + \exp\!\left(-\left(\sum_{j} \omega_j^{*} x_i^{j} + b^{*}\right)\right)}$$

where ω_j^* is the weight of attribute j in the final failure-rate prediction model F of the class to which merchant i belongs, b^* is the bias term of F, and x_i^j is the value of attribute j for the current merchant i;

the final inspection priority F*(i) of each merchant in step S2 being calculated as

$$F^{*}(i) = \begin{cases} F(i), & F(i) \ge \gamma \\ F(i)\left(1 - 2^{-t_i}\right), & F(i) < \gamma \end{cases}$$

where F(i) is the predicted failure rate of each merchant;

t_i is the time elapsed since the last on-site inspection of merchant i;

γ ∈ (0, 1) is the failure-rate threshold set manually by the supervisor;

the merchants' inspection status in step S3 being updated as follows:

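2. The big-data-based method for determining spot-check merchants according to claim 1, wherein the kth failure-rate prediction model in step S14 is trained as follows:

where k = 1, 2, …, M, and M is the number of failure-rate prediction models;

S141: randomly initializing a group of weights ω^(k) and the bias b^(k) of the logistic regression model;

the logistic regression model is:

$$f^{(k)}(x_i) = \frac{1}{1 + \exp\!\left(-\left(\sum_{v} \omega_v^{(k)} x_i^{v} + b^{(k)}\right)\right)}$$

where x_i is the ith sample in the training set, and ω_v^(k) is the weight of the vth attribute in the kth prediction model;

S142: adjusting the weights ω^(k) and the bias b^(k) to optimize the logistic regression model, obtaining the kth failure-rate prediction model.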

3. The big-data-based method for determining spot-check merchants according to claim 2, wherein in step S142 the logistic regression model is optimized as follows:

a mean squared error loss function measures the error between the model output f^(k)(x_i) and the true label y_i, and this error is minimized by batch gradient descent to obtain the kth failure-rate prediction model;

the mean squared error loss function Z is calculated as:

$$Z = \frac{1}{|S|} \sum_{i=1}^{|S|} \left(f^{(k)}(x_i) - y_i\right)^2$$

where |S| is the number of samples in the training set S.