CN115239106A - Analysis method based on all-purpose card big data - Google Patents


Publication number
CN115239106A
Authority
CN
China
Prior art keywords
consumption
students
card
student
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210820713.7A
Other languages
Chinese (zh)
Inventor
张滢雪
司占军
卢勇拾
邢斌
李龙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tianjin University of Science and Technology
Original Assignee
Tianjin University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tianjin University of Science and Technology
Priority to CN202210820713.7A
Publication of CN115239106A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0639 Performance analysis of employees; Performance analysis of enterprise or organisation operations
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2465 Query processing support for facilitating data mining operations in structured databases
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases
    • G06F16/285 Clustering or classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9537 Spatial or temporal dependent retrieval, e.g. spatiotemporal queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/06 Buying, selling or leasing transactions
    • G06Q30/0601 Electronic shopping [e-shopping]
    • G06Q30/0631 Item recommendations
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
    • G06Q50/10 Services
    • G06Q50/12 Hotels or restaurants
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
    • G06Q50/10 Services
    • G06Q50/20 Education

Abstract

The invention relates to an analysis method based on all-purpose card big data, comprising the following steps: acquiring an original data set within a preset period; cleaning the original data set and computing statistics on it to obtain a campus life behavior data set for analysis; performing cluster analysis on two-dimensional data consisting of consumption frequency and total consumption amount; deriving the consumption behavior characteristics of students from the clustering result; jointly analyzing the campus life behavior data set and its valid data subset with the Apriori association rule method; and, from the association rule results, deriving the life trajectory behavior characteristics of students and providing meal preparation planning suggestions to canteen managers and dining recommendations to students. On the one hand, the invention provides a profile reflecting each student's living behavior characteristics, which helps students with self-management and self-improvement; on the other hand, it provides school managers with overall management suggestions for students' living and learning environments and early warning of potential problems, and therefore has broader and more multidimensional application scenarios.

Description

Analysis method based on all-purpose card big data
Technical Field
The invention relates to the technical field of data analysis, in particular to an analysis method based on all-purpose card big data.
Background
With the rapid development of information technology and the associated software and hardware, schools can use a single campus card and its data management system to provide students with integrated, efficient information services such as payment, access control, book borrowing, identity authentication and service reservation, so that the campus card is present in every aspect of campus life. Widespread use of the campus card makes student life more convenient and campus management more standardized and intelligent.
In the construction of digital and smart campuses, campus card data is an important foundation. To extract valuable information from the massive raw data generated when students swipe their cards, and to support student behavior analysis and school safety management, a suitable data analysis or mining method is essential. Deriving high-level semantic information from the raw data makes it possible, on the one hand, to offer students living-habit analysis and healthy-living advice and, on the other hand, to give schools an effective reference for daily management and safety work.
At present, research on campus card data analysis is still relatively limited. On the one hand, the analysis focuses mainly on consumption data, covers few data dimensions, and produces conclusions aimed mainly at school managers. With the spread of mobile devices, mobile applications and big data concepts, students have become more interested in the statistical patterns of their own data and more willing to assess their own state and act on suggestions derived from big data analysis; for example, the annual plans currently offered by various mobile applications are popular among young people. On the other hand, the analysis methods used are limited. One of the most common is K-means clustering, which requires the number of clusters to be set in advance; for all-purpose card data of ever-growing volume, the data distribution and the likely number of categories are often hard to determine, so efficient clustering cannot be achieved and the optimal analysis result cannot be obtained. Massive, multidimensional campus card data therefore places more diverse demands on analysis techniques: the analysis should minimize manual intervention, provide efficient and reliable services to both schools and students, and promote the development of related applications.
In view of the above, an analysis method based on campus card data is needed that can efficiently and automatically analyze the massive multidimensional data generated by card swipes and obtain the campus life behavior characteristics of students and the associations among them. On the one hand, such a method provides a profile reflecting each student's living behavior characteristics and helps students with self-management and self-improvement; on the other hand, it provides school managers with overall management suggestions for students' living and learning environments, such as canteens, dormitories and teaching buildings, together with early warning of students' financial and safety problems. By fully mining the rich information contained in all-purpose card data, valuable analysis results are provided to students and schools at the same time.
Disclosure of Invention
To overcome the shortcomings of the prior art, the invention aims to provide an analysis method based on all-purpose card big data. By fully mining the multidimensional information in all-purpose card data, the method provides valuable analysis results to both students and schools: on the one hand, it provides a profile reflecting each student's living behavior characteristics and helps students with self-management and self-improvement; on the other hand, it provides school managers with overall management suggestions for students' living and learning environments, such as canteens, dormitories and teaching buildings, and early warning of potential problems. It therefore has broader and more multidimensional application scenarios.
To achieve this purpose, the invention provides the following scheme:
An analysis method based on all-purpose card big data comprises the following steps:
S1: reading the personal information of a group of students holding campus cards, together with their campus card consumption card-swiping records and access control card-swiping records for the same preset period, to obtain an original data set reflecting campus life, wherein the original data set uses the card number as a unique identifier in one-to-one correspondence with the students;
S2: cleaning the original data set, computing statistics on it, and processing it according to different living behavior characteristics to obtain a campus life behavior data set for analysis and its valid data subset;
S3: obtaining, from the valid data subset, two-dimensional data consisting of consumption frequency and total consumption amount, and performing cluster analysis on the two-dimensional data with the mean shift method to obtain the clustered data;
S4: deriving the consumption behavior characteristics of students from the clustered data, and providing references for management to schools and for self-management to students;
S5: jointly analyzing the campus life behavior data set and its valid data subset with the Apriori association rule method to obtain a joint analysis result, and analyzing the life trajectory preferences of different student groups;
S6: deriving the life trajectory behavior characteristics of students from the joint analysis result, and providing meal preparation planning suggestions to canteen managers and dining recommendations to students.
Preferably, step S1 comprises: reading N consumption card-swiping records of the students' campus cards within a preset period Dur, where each consumption card-swiping record $p_n$ can be expressed as a set of features as follows:

$$p_n = \{(\mathrm{CardNo}_n, \mathrm{Time}_n, \mathrm{Location}_n, \mathrm{Money}_n) \mid n = 1, 2, 3, \dots, N\} \tag{1}$$

where $\mathrm{CardNo}_n$, $\mathrm{Time}_n$, $\mathrm{Location}_n$ and $\mathrm{Money}_n$ are, respectively, the card number, consumption time, consumption place and consumption amount of record $p_n$;
the personal information of a student is obtained by using the card number $\mathrm{CardNo}_n$ in each record to look up and read the access control identification code $\mathrm{Acc}_n$ and the sex information $\mathrm{Sex}_n$ of the student corresponding to that card number;
the student personal information and the consumption card-swiping records $p_n$ together form the consumption place data set, which contains the N card-swiping records and the student personal information within the preset period Dur:

$$l_n = \{(\mathrm{CardNo}_n, \mathrm{Time}_n, \mathrm{Location}_n, \mathrm{Money}_n, \mathrm{Acc}_n, \mathrm{Sex}_n) \mid n = 1, 2, 3, \dots, N\} \tag{2}$$
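As a minimal sketch of the record layouts in formulas (1) and (2), the following Python fragment models a consumption record and the joined consumption place record. The class names, field names and the `personal_info` lookup table are illustrative assumptions and are not defined by the method itself.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class ConsumptionRecord:
    """One record p_n of formula (1)."""
    card_no: str
    time: datetime
    location: str
    money: float


@dataclass(frozen=True)
class PlaceRecord:
    """One record l_n of formula (2): p_n plus personal information."""
    card_no: str
    time: datetime
    location: str
    money: float
    acc: str   # access control identification code Acc_n
    sex: str   # sex information Sex_n


def attach_personal_info(records, personal_info):
    """Join each p_n with (acc, sex) looked up by card number.

    personal_info is a hypothetical dict: card_no -> (acc, sex).
    """
    joined = []
    for p in records:
        acc, sex = personal_info[p.card_no]
        joined.append(PlaceRecord(p.card_no, p.time, p.location, p.money, acc, sex))
    return joined
```

The card number is the only join key, which mirrors its role as the unique identifier of each student.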
Preferably, step S2 comprises:
computing the total consumption amount and the consumption frequency of each student within the preset period, and expressing them as a set of two-dimensional feature vectors of total consumption amount and consumption frequency:

$$\{v_i = (t_i, f_i) \mid i = 1, 2, \dots, I\} \tag{3}$$

where I is the total number of students and $C_i$ is the campus card number of student i. All consumption records $p_n$ ($n \le N$) in the consumption data set with $\mathrm{CardNo}_n = C_i$ form the consumption record subset $P_i$ of that student. The total consumption amount $t_i$ is the sum of the consumption amounts in $P_i$, and the consumption frequency $f_i$ is the number of records in $P_i$:

$$t_i = \sum_{p_n \in P_i} \mathrm{Money}_n \tag{4}$$

$$f_i = |P_i| \tag{5}$$

obtaining the valid consumption place record subset $L_i$ of student i: for each student i with campus card number $C_i$, all records $l_n$ ($n \le N$) in the consumption place data set with $\mathrm{CardNo}_n = C_i$ form the consumption place record subset $L_i$ of that student. Let the time interval threshold be $T_{interv}$. If

[Formula (6): equation image not recovered from the source; a condition on records $l_n$ and $l_{n+m}$ involving $T_{interv}$]

holds, then $l_{n+m}$ and $l_n$ are regarded as card swipes of the same consumption process; only the record $l_n$ is kept and $l_{n+m}$ is removed from $L_i$. For $l_n$, the subsequent records $l_{n+m}$ are traversed; when the condition of formula (6) is no longer satisfied, $l_n$ is set to $l_{n+m+1}$ and the check is repeated until all records in $L_i$ have been examined, which yields the valid consumption place record subset $L_i$ of student i;
the above steps are repeated until the valid consumption place record subset of every student has been obtained;
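The per-student statistics and the de-duplication of repeated swipes can be sketched in Python as follows. Because the condition of formula (6) is not legible in this text, the sketch assumes it means that a later swipe falls within `t_interv` of the retained record; the tuple layout `(card_no, time, location, money)` and the function names are likewise illustrative.

```python
from collections import defaultdict


def consumption_features(records):
    """Return {card_no: (t_i, f_i)}: total amount and number of records."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for card_no, _time, _location, money in records:
        totals[card_no] += money   # running sum of Money_n
        counts[card_no] += 1       # running count, i.e. |P_i|
    return {card: (totals[card], counts[card]) for card in totals}


def dedupe_place_records(records, t_interv):
    """Return {card_no: valid records}, collapsing one consumption process.

    ASSUMPTION: a swipe belongs to the same consumption process as the
    retained record when it occurs less than t_interv after it.
    """
    by_card = defaultdict(list)
    for rec in records:
        by_card[rec[0]].append(rec)

    valid = {}
    for card_no, recs in by_card.items():
        recs.sort(key=lambda r: r[1])          # chronological order
        kept = []
        for rec in recs:
            if kept and rec[1] - kept[-1][1] < t_interv:
                continue                        # same process: drop it
            kept.append(rec)                    # becomes the new anchor
        valid[card_no] = kept
    return valid
```

Each surviving record acts as the anchor against which later swipes are compared, so a burst of swipes at one meal collapses to its first record.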
obtaining the valid access control card-swiping records of student i: for the preset period Dur, the access control card-swiping records of the same group of students as in $l_n$, covering the same period as the consumption data, are read; using the access control identification code $\mathrm{Acc}_n$ and the card-swiping time $\mathrm{Time}_n$ in $l_n$, the access control records whose card-swiping time is adjacent to the consumption time are selected, and the access control card-swiping place $\mathrm{Tbuilding}_n$ in those records is added to the consumption place data records, giving the complete campus life behavior data set $a_n$ of the students:

$$a_n = \{(\mathrm{CardNo}_n, \mathrm{Time}_n, \mathrm{Location}_n, \mathrm{Tbuilding}_n, \mathrm{Money}_n, \mathrm{Acc}_n, \mathrm{Sex}_n) \mid n = 1, 2, 3, \dots, N\} \tag{7}$$

The complete campus life behavior data set $a_n$ of the students is the valid data set available for analysis.
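One possible way to attach the access control place to each consumption record is sketched below in Python. The text does not quantify what "adjacent" means, so the sketch assumes the nearest swipe with the same identification code, accepted only when it lies within a hypothetical `max_gap`; the dict keys and the function name are also assumptions.

```python
from bisect import bisect_left
from collections import defaultdict


def attach_building(place_records, access_records, max_gap):
    """Add a 'tbuilding' key to each place record.

    place_records: list of dicts with at least 'acc' and 'time'.
    access_records: list of (acc, time, building) tuples.
    ASSUMPTION: 'adjacent' = nearest swipe in time, at most max_gap away.
    """
    by_acc = defaultdict(list)
    for acc, time, building in access_records:
        by_acc[acc].append((time, building))
    for swipes in by_acc.values():
        swipes.sort()                          # chronological per student

    result = []
    for rec in place_records:
        swipes = by_acc.get(rec["acc"], [])
        times = [t for t, _ in swipes]
        i = bisect_left(times, rec["time"])
        # Only the swipe just before and just after can be the nearest.
        candidates = swipes[max(i - 1, 0): i + 1]
        best = min(candidates, key=lambda s: abs(s[0] - rec["time"]), default=None)
        near = best is not None and abs(best[0] - rec["time"]) <= max_gap
        result.append({**rec, "tbuilding": best[1] if near else None})
    return result
```

Records with no sufficiently close access control swipe receive `None`, so they can be excluded from the later teaching building analysis without being lost from the consumption analysis.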
Preferably, step S3 comprises:
S3.1: the two-dimensional feature vectors $v_i$ are regarded as a set of points in two-dimensional space with $(t_i, f_i)$ as horizontal and vertical coordinates, where $i = 1, 2, \dots, I$; each point corresponds to the consumption behavior of one student, and each point is taken as an independent initial class to initialize the clustering process;
S3.2: randomly selecting a point $v_x$ as the initial centroid $cen_x$;
S3.3: taking a sliding window of bandwidth r centered on the centroid $cen_x$, denoting the set of all points within the window as W, provisionally marking these points as belonging to class $clu_x$, and increasing by 1 the access count of each of these points within the class;
S3.4: computing the radial basis kernel weighted average offset $M_r$ of all points in the sliding window relative to the centroid $cen_x$ as the mean shift vector:

[Formula (8): equation image not recovered from the source; the definition of the mean shift vector $M_r$]

S3.5: updating the centroid coordinates with the mean shift vector $M_r$:

$$cen_{x+1} = M_r + cen_x \tag{9}$$

S3.6: repeating steps S3.3 to S3.5 until the magnitude of the offset $M_r$ is less than a threshold $T_{conv}$; the centroid $cen_X$ at that point is taken as a cluster center, all points accessed during the iterations of steps S3.3 to S3.5 belong to the class $clu_X$ corresponding to that center, and the current drift has converged;
S3.7: if the distance between the cluster center of the current class $clu_X$ and the center of an existing class is less than a threshold $T_{dis}$, the two classes are merged; otherwise, the current class is kept as a new class;
S3.8: repeating steps S3.1 to S3.7 until all points have been accessed, which ends the mean shift clustering process;
S3.9: assigning every point to its cluster center according to the marks; if a point has been accessed and marked by several classes, it is assigned to the class with the highest access count for that point.
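Steps S3.2 to S3.9 can be sketched in plain Python as follows. Formula (8) is not legible in this text, so the sketch assumes a Gaussian radial basis kernel with scale r; the default `t_dis = r / 2`, the iteration cap `max_iter` and the function name are further assumptions made only for illustration.

```python
import math
import random


def mean_shift(points, r, t_conv=1e-3, t_dis=None, max_iter=100, seed=0):
    """Cluster 2-D points (t_i, f_i); return (centers, labels)."""
    t_dis = r / 2 if t_dis is None else t_dis   # ASSUMED merge threshold
    rng = random.Random(seed)
    n = len(points)
    visited = [False] * n
    centers, visits = [], []     # visits[c][i]: accesses of point i by class c

    while not all(visited):
        # S3.2: random unvisited point as the initial centroid.
        start = rng.choice([i for i in range(n) if not visited[i]])
        cx, cy = points[start]
        counts = [0] * n
        for _ in range(max_iter):
            sx = sy = sw = 0.0
            for i, (x, y) in enumerate(points):
                d2 = (x - cx) ** 2 + (y - cy) ** 2
                if d2 > r * r:
                    continue                     # outside the window W
                visited[i] = True                # S3.3: mark and count
                counts[i] += 1
                w = math.exp(-d2 / (2 * r * r))  # ASSUMED Gaussian kernel
                sx += w * (x - cx)
                sy += w * (y - cy)
                sw += w
            if sw == 0.0:
                break
            mx, my = sx / sw, sy / sw            # S3.4: mean shift vector
            cx, cy = cx + mx, cy + my            # S3.5: formula (9)
            if math.hypot(mx, my) < t_conv:      # S3.6: converged
                break
        # S3.7: merge with a nearby existing class or keep as a new one.
        for c, (ox, oy) in enumerate(centers):
            if math.hypot(ox - cx, oy - cy) < t_dis:
                visits[c] = [a + b for a, b in zip(visits[c], counts)]
                break
        else:
            centers.append((cx, cy))
            visits.append(counts)

    # S3.9: each point goes to the class that accessed it most often.
    labels = [max(range(len(centers)), key=lambda c: visits[c][i])
              for i in range(n)]
    return centers, labels
```

No cluster count is passed in: the number of classes follows from the bandwidth r and the merge threshold, which is the property the method relies on when it prefers mean shift over K-means.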
Preferably, step S4 comprises:
according to the frequency-total clustering result, feeding back to each student his or her position in the frequency-total two-dimensional space and providing an individual consumption behavior report; analyzing the clustering result to obtain abnormal classes and abnormal items, feeding them back to the student management department as early warning information so that the corresponding student groups receive special attention, and at the same time giving the students necessary consumption suggestions.
Preferably, the abnormal classes and abnormal items include:
students potentially in financial difficulty; students potentially away from school; and a potentially high-consumption group;
students potentially in financial difficulty: for the cluster center point set CEN, if $cen_x(t_x, f_x)$ and its corresponding class $clu_x$ satisfy the following condition, the total consumption amount of the students in the class within the preset period is very low while their consumption frequency is relatively high:

[Formula (10): equation image not recovered from the source]

where $t_{CEN}$ is the set of abscissas of the cluster center point set CEN, $f_I$ is the set of ordinates of all points, and mean(), std() and num() are the mean, standard deviation and counting functions, respectively;
students potentially away from school: for the cluster center point set CEN, if $cen_x(t_x, f_x)$ and its corresponding class $clu_x$ satisfy the following condition, both the total consumption amount and the consumption frequency of the students in the class within the preset period are extremely low:

[Formula (11): equation image not recovered from the source]

potentially high-consumption group: for the cluster center point set CEN, if $cen_x(t_x, f_x)$ and its corresponding class $clu_x$ satisfy the following condition, both the total consumption amount and the consumption frequency of the students in the class within the preset period are extremely high:

[Formula (12): equation image not recovered from the source]
Preferably, step S5 comprises:
mining association rules between the gender item set and the consumption place item set in the consumption place record subset and judging whether strong association rules exist; if they exist, the gender of students is clearly associated with canteen selection; if they do not, there is no obvious interaction between the gender of students and canteen selection. This specifically comprises the following substeps:
Step 5.1.1: from each item of the consumption place record subset $l_n$, take the two elements consumption place and gender to form $\tau_n = \{\mathrm{Location}_n, \mathrm{Sex}_n \mid n = 1, 2, \dots, N\}$ as a transaction for association rule mining; the M different places and 2 different genders, $\mathrm{Location}_m$ and $\mathrm{Sex}_q$ ($m = 1, 2, \dots, M$; $q = 1, 2$), are the items in the transactions, and the transaction database is $D_l = \{\tau_1, \tau_2, \dots, \tau_N\}$;
Step 5.1.2: let $Ite = \{\mathrm{Location}_m, \mathrm{Sex}_q \mid m = 1, 2, \dots, M;\ q = 1, 2\}$ be the set of all items in $D_l$; any non-empty subset X of Ite is then an item set in $D_l$. To determine the association rules between gender and consumption place, the support is first computed for each 2-item set $X_k$ ($k = 1, 2, \dots, M \times 2$) consisting of one place item and one gender item from Ite:

[Formula (13): equation image not recovered from the source; the support of $X_k$, defined from the number of transactions in $D_l$ that contain the item set $X_k$ and from N, the total number of transactions in $D_l$]

Step 5.1.3: set a minimum support threshold [symbol image not recovered]. For each 2-item set $X_k$ from step 5.1.2, if [condition image not recovered; the support of $X_k$ compared with the minimum support threshold], then $X_k$ is a frequent item set; the set of all frequent item sets is denoted $X_F$;
Step 5.1.4: to find out whether there is a strong association between student gender and canteen selection, all association rules between gender and consumption place are generated from the frequent item sets in $X_F$; taking rules with gender as the condition and place as the result as an example:

$$\mathrm{Sex}_q \rightarrow \mathrm{Location}_m, \quad m = 1, 2, \dots, M,\ q = 1, 2 \tag{14}$$

Step 5.1.5: compute the confidence of each association rule:

[Formula (15): equation image not recovered from the source; the confidence of the rule $\mathrm{Sex}_q \rightarrow \mathrm{Location}_m$]

Step 5.1.6: set a minimum confidence threshold [symbol image not recovered]. For the frequent item sets in $X_F$ [expression image not recovered], if [condition images not recovered; the confidence of the rule compared with the minimum confidence threshold], then $\mathrm{Sex}_q \rightarrow \mathrm{Location}_m$ is a strong association rule;
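Steps 5.1.1 to 5.1.6 can be sketched in Python for the gender and consumption place case. Since the images of formulas (13) and (15) are not available in this text, the sketch uses the textbook definitions as an assumption: support is the share of transactions containing the 2-item set, and confidence is the pair count divided by the count of the condition item. The function name and the `>=` threshold comparisons are also assumptions.

```python
from collections import Counter


def strong_rules(transactions, min_support, min_confidence):
    """Find strong rules sex -> location among 2-item sets.

    transactions: list of (sex, location) pairs, one per valid record.
    Returns {(sex, location): (support, confidence)}.
    """
    n = len(transactions)
    pair_counts = Counter(transactions)                 # count of {sex, location}
    sex_counts = Counter(sex for sex, _ in transactions)

    rules = {}
    for (sex, location), count in pair_counts.items():
        support = count / n                  # ASSUMED form of formula (13)
        if support < min_support:
            continue                         # not a frequent item set
        confidence = count / sex_counts[sex]  # ASSUMED form of formula (15)
        if confidence >= min_confidence:
            rules[(sex, location)] = (support, confidence)
    return rules
```

Because every transaction holds exactly one gender item and one place item, counting pairs directly is enough here; a general Apriori implementation would additionally grow and prune candidate item sets level by level.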
mining association rules between the access place item set and the consumption place item set in the campus life behavior data set and judging whether strong association rules exist; if they exist, teaching building selection is clearly associated with canteen selection; if they do not, there is no obvious interaction between teaching building selection and canteen selection. This specifically comprises the following substeps:
Step 5.2.1: from each item of the campus life behavior data set $a_n$, take the two elements consumption place and access control place to form $\alpha_n = \{\mathrm{Location}_n, \mathrm{Tbuilding}_n \mid n = 1, 2, \dots, N\}$ as a transaction for association rule mining; the M different canteens and S different teaching buildings, $\mathrm{Location}_m$ and $\mathrm{Tbuilding}_s$ ($m = 1, 2, \dots, M$; $s = 1, 2, \dots, S$), are the items in the transactions, and the transaction database is $D_a = \{\alpha_1, \alpha_2, \dots, \alpha_N\}$;
Step 5.2.2: let $Ite = \{\mathrm{Location}_m, \mathrm{Tbuilding}_s \mid m = 1, 2, \dots, M;\ s = 1, 2, \dots, S\}$ be the set of all items in $D_a$; any non-empty subset X of Ite is then an item set in $D_a$. To determine the association rules between teaching buildings and canteens, the support is first computed for each 2-item set $X_k$ ($k = 1, 2, \dots, S \times M$) consisting of one teaching building item and one canteen item from Ite:

[Formula (16): equation image not recovered from the source; the support of $X_k$, defined from the number of transactions in $D_a$ that contain the item set $X_k$ and from N, the total number of transactions in $D_a$]

Step 5.2.3: set a minimum support threshold [symbol image not recovered]. For each 2-item set $X_k$ from step 5.2.2, if [condition image not recovered; the support of $X_k$ compared with the minimum support threshold], then $X_k$ is a frequent item set; the set of all frequent item sets is denoted $X_F$;
Step 5.2.4: to find out whether there is a strong association between teaching buildings and canteens, all association rules between teaching buildings and canteens are generated from the frequent item sets in $X_F$; taking rules with the teaching building as the condition and the canteen as the result as an example:

$$\mathrm{Tbuilding}_s \rightarrow \mathrm{Location}_m, \quad m = 1, 2, \dots, M,\ s = 1, 2, \dots, S \tag{17}$$

Step 5.2.5: compute the confidence of each association rule:

[Formula (18): equation image not recovered from the source; the confidence of the rule $\mathrm{Tbuilding}_s \rightarrow \mathrm{Location}_m$]

Step 5.2.6: set a minimum confidence threshold [symbol image not recovered]. For the frequent item sets in $X_F$ [expression image not recovered], if [condition images not recovered; the confidence of the rule compared with the minimum confidence threshold], then $\mathrm{Tbuilding}_s \rightarrow \mathrm{Location}_m$ is a strong association rule.
Preferably, step S6 comprises:
using the strong association rules to learn the potential influence of gender on canteen selection and, on that basis, giving canteens suggestions on meal quantity and meal types: for canteens strongly associated with female or male students respectively, adding meal types that better match the preferences of female or male students, and adjusting the meal quantity according to the number of diners of each gender and their food consumption;
using the strong association rules to learn the potential influence of entering and leaving different teaching buildings on students' canteen selection, identifying the canteens strongly associated with each teaching building and, by combining the start and end times of the classes scheduled that day in the corresponding teaching building with the number of students in those classes, giving canteen managers suggestions on meal supply times and meal quantities and giving students suggestions on where to eat and on off-peak dining.
According to the specific embodiment provided by the invention, the invention discloses the following technical effects:
The invention can use the data generated by campus card swipes to automatically analyze multiple dimensions, such as consumption behavior, life behavior trajectories and consumption place preferences, and can provide valuable analysis results to both students and schools. The invention applies two-dimensional clustering to the consumption data and jointly analyzes the total consumption amount and the consumption frequency, which gives a more comprehensive analysis of consumption behavior; it also removes the manual constraint on the number of clusters in the clustering process, so that the results better reflect actual conditions and are more inclusive.
Drawings
To illustrate the embodiments of the present invention or the technical solutions of the prior art more clearly, the drawings required for the embodiments are briefly described below. The drawings described below show only some embodiments of the present invention; those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a flow chart of a method in an embodiment provided by the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be described clearly and completely with reference to the accompanying drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The invention aims to provide an analysis method based on all-purpose card big data. By fully mining the multidimensional information in all-purpose card data, the method provides valuable analysis results to both students and schools: on the one hand, it provides a profile reflecting each student's living behavior characteristics and helps students with self-management and self-improvement; on the other hand, it provides school managers with overall management suggestions for students' living and learning environments, such as canteens, dormitories and teaching buildings, and early warning of potential problems. It therefore has broader and more multidimensional application scenarios.
In order to make the aforementioned objects, features and advantages of the present invention more comprehensible, the present invention is described in detail with reference to the accompanying drawings and the detailed description thereof.
As shown in FIG. 1, the invention provides an analysis method based on all-purpose card big data, which comprises the following steps:
S1: reading the personal information of a group of students holding campus cards, together with their campus card consumption card-swiping records and access control card-swiping records for the same preset period, to obtain an original data set reflecting campus life, wherein the original data set uses the card number as a unique identifier in one-to-one correspondence with the students; the consumption records and the access control card-swiping records all contain card number information, so the three groups of data records can be associated with one another through the card number;
S2: cleaning the original data set, computing statistics on it, and processing it according to different living behavior characteristics to obtain a campus life behavior data set for analysis and its valid data subset;
S3: obtaining, from the valid data subset, two-dimensional data consisting of consumption frequency and total consumption amount, and performing cluster analysis on the two-dimensional data with the mean shift method to obtain the clustered data;
S4: deriving the consumption behavior characteristics of students from the clustered data, and providing references for management to schools and for self-management to students;
S5: jointly analyzing the campus life behavior data set and its valid data subset with the Apriori association rule method to obtain a joint analysis result, and analyzing the life trajectory preferences of different student groups;
S6: deriving the life trajectory behavior characteristics of students from the joint analysis result, and providing meal preparation planning suggestions to canteen managers and dining recommendations to students.
Further, step S1 includes: reading the $N$ campus one-card consumption swipe records of the students within a preset period Dur, where each consumption swipe record $p_n$ can be expressed as a set of features:
$$p_n = \{(\mathrm{CardNo}_n, \mathrm{Time}_n, \mathrm{Location}_n, \mathrm{Money}_n) \mid n = 1, 2, 3, \ldots, N\} \tag{1}$$
where $\mathrm{CardNo}_n$, $\mathrm{Time}_n$, $\mathrm{Location}_n$ and $\mathrm{Money}_n$ are respectively the card number, consumption time, consumption place and consumption amount of swipe record $p_n$. Actual swipe records generally contain more information; only the features relevant to the invention are listed here;
the students' personal information is obtained by using the card number $\mathrm{CardNo}_n$ in each record to look up the access-control identification code $\mathrm{Acc}_n$ and the sex $\mathrm{Sex}_n$ of the corresponding student.
The student personal information and the consumption swipe records $p_n$ together form the consumption place data set, which contains the $N$ swipe records of the preset period Dur with the student personal information:
$$l_n = \{(\mathrm{CardNo}_n, \mathrm{Time}_n, \mathrm{Location}_n, \mathrm{Money}_n, \mathrm{Acc}_n, \mathrm{Sex}_n) \mid n = 1, 2, 3, \ldots, N\} \tag{2}$$
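The association by card number described in step S1 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the `Swipe` tuple, the `students` lookup table and all sample values are hypothetical, and timestamps are assumed to be ISO-8601 strings.

```python
from typing import NamedTuple

class Swipe(NamedTuple):
    """One consumption swipe record p_n of formula (1)."""
    card_no: str
    time: str        # assumed ISO-8601 timestamp
    location: str
    money: float

# Hypothetical lookup table: card number -> (access-control ID, sex)
students = {"C001": ("A17", "F"), "C002": ("A42", "M")}

swipes = [
    Swipe("C001", "2022-03-01T11:55:00", "Canteen A", 12.5),
    Swipe("C002", "2022-03-01T12:03:00", "Canteen B", 9.0),
]

def join_with_student_info(swipes, students):
    """Build the records l_n of formula (2): each swipe plus (Acc_n, Sex_n)."""
    joined = []
    for s in swipes:
        acc, sex = students[s.card_no]   # the card number is the join key
        joined.append((*s, acc, sex))
    return joined

records = join_with_student_info(swipes, students)
```

Each element of `records` is a six-field tuple in the order of formula (2).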
Further, step S2 includes:
counting each student's consumption frequency and total consumption amount within the preset period and expressing them as a set of two-dimensional frequency-total feature vectors:
$$\{v_i = (t_i, f_i) \mid i = 1, 2, \ldots, I\} \tag{3}$$
where $I$ is the total number of students. For each student $i$ with campus card number $C_i$, all consumption records $p_n$ ($n \le N$) in the consumption data set with $\mathrm{CardNo}_n = C_i$ form that student's consumption record subset $P_i$. The total consumption amount $t_i$ is the sum of the consumption amounts in $P_i$, and the consumption frequency $f_i$ is the number of records in $P_i$:
$$t_i = \sum_{p_n \in P_i} \mathrm{Money}_n \tag{4}$$
$$f_i = |P_i| \tag{5}$$
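The per-student statistics $t_i$ and $f_i$ can be computed in a single pass over the records. The sketch below assumes the records have already been reduced to hypothetical `(card_no, money)` pairs; the function name and sample values are illustrative only.

```python
from collections import defaultdict

def frequency_total_vectors(records):
    """Return {card_no: (t_i, f_i)} from an iterable of (card_no, money) pairs."""
    totals = defaultdict(float)   # t_i: sum of Money_n over P_i
    counts = defaultdict(int)     # f_i: number of records in P_i
    for card_no, money in records:
        totals[card_no] += money
        counts[card_no] += 1
    return {c: (totals[c], counts[c]) for c in totals}

sample = [("C001", 12.5), ("C002", 9.0), ("C001", 7.5)]
vectors = frequency_total_vectors(sample)
```

Each value of `vectors` is one feature vector $v_i = (t_i, f_i)$, ready to be used as a point for clustering.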
obtaining the valid consumption place record subset $L_i$ of student $i$: in the consumption swipe data, records with the same card number and the same consumption place whose time interval is smaller than a preset threshold are regarded as one consumption process; only one record is kept to represent that consumption place and the others are removed. For each student $i$ with campus card number $C_i$, all records $l_n$ ($n \le N$) in the consumption place data set with $\mathrm{CardNo}_n = C_i$ form the student's consumption place record subset $L_i$. Let the time interval threshold be $T_{interv}$. If
$$\mathrm{Location}_{n+m} = \mathrm{Location}_n \quad \text{and} \quad \mathrm{Time}_{n+m} - \mathrm{Time}_n < T_{interv} \tag{6}$$
then $l_{n+m}$ and $l_n$ are regarded as swipes of the same consumption process: only $l_n$ is kept and $l_{n+m}$ is removed from $L_i$. For each $l_n$, all subsequent $l_{n+m}$ are traversed; when the condition of formula (6) is no longer satisfied, let $l_n = l_{n+m+1}$ and repeat the check until all records in $L_i$ have been examined, which gives the valid consumption place record subset $L_i$ of student $i$.
These steps are repeated until the consumption place record subset of every student is obtained;
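The deduplication of repeated swipes can be sketched as below. This is one reading of the procedure, stated as an assumption: records of a single student are sorted by time, each record is compared with the most recently retained record, and the first record that fails the condition becomes the next retained record. The default threshold of 60 minutes is the value used later in the embodiment; the function name and sample data are hypothetical.

```python
from datetime import datetime, timedelta

def dedupe_swipes(swipes, t_interv=timedelta(minutes=60)):
    """Keep one record per consumption process.

    swipes: list of (time, location) tuples for ONE student.
    """
    kept = []
    for time, location in sorted(swipes):
        if kept:
            last_time, last_loc = kept[-1]
            # Same place and within the threshold of the retained record:
            # treat as the same consumption process and drop it.
            if location == last_loc and time - last_time < t_interv:
                continue
        kept.append((time, location))
    return kept

day = datetime(2022, 3, 1)
raw = [
    (day.replace(hour=11, minute=50), "Canteen A"),
    (day.replace(hour=11, minute=58), "Canteen A"),  # same meal, removed
    (day.replace(hour=12, minute=10), "Canteen B"),
    (day.replace(hour=18, minute=0), "Canteen A"),
]
valid = dedupe_swipes(raw)
```

Applying the function to each student's records in turn yields the subsets $L_i$.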
acquiring the valid access-control swipe records of student $i$: for the preset period Dur, read in the access-control swipe records of the same group of students over the same period as the consumption data in $l_n$. Using the access-control identification code $\mathrm{Acc}_n$ and the swipe time $\mathrm{Time}_n$ in $l_n$, select from the access-control swipe records those whose swipe time is adjacent to the consumption time, and add the access-control swipe place $\mathrm{Tbuilding}_n$ of those records to the consumption place data records, giving the students' complete campus life behavior data set $a_n$:
$$a_n = \{(\mathrm{CardNo}_n, \mathrm{Time}_n, \mathrm{Location}_n, \mathrm{Tbuilding}_n, \mathrm{Money}_n, \mathrm{Acc}_n, \mathrm{Sex}_n) \mid n = 1, 2, 3, \ldots, N\} \tag{7}$$
The students' complete campus life behavior data set $a_n$ is the valid data set available for analysis.
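The matching of a consumption record to an access-control swipe can be illustrated as follows. The text only requires the two swipe times to be "adjacent"; the sketch assumes this means the access-control swipe nearest in time, which is an interpretation rather than something the source specifies. Names and sample data are hypothetical.

```python
from datetime import datetime

def nearest_building(consume_time, access_swipes):
    """Return the building whose access swipe is closest in time, or None.

    access_swipes: list of (time, building) for the same access-control ID.
    """
    if not access_swipes:
        return None
    # Assumption: "adjacent" = smallest absolute time difference.
    return min(access_swipes, key=lambda s: abs(s[0] - consume_time))[1]

access = [
    (datetime(2022, 3, 1, 8, 0), "Building 2"),
    (datetime(2022, 3, 1, 11, 40), "Building 1"),
]
building = nearest_building(datetime(2022, 3, 1, 11, 55), access)
```

In practice a maximum allowed gap would probably also be needed so that a meal is not linked to a swipe many hours earlier; the source gives no such value.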
Further, step S3 includes:
S3.1: The two-dimensional feature vectors $v_i$ can be regarded as a set of points in two-dimensional space with $(t_i, f_i)$ as the abscissa and ordinate, $i = 1, 2, \ldots, I$, where each point corresponds to the consumption behavior of one student. Each point is taken as a separate initial class to initialize the clustering process;
S3.2: Randomly select a point $v_x$ as the initial centroid $cen_x$;
S3.3: Take a sliding window of bandwidth $r$ centered on the centroid $cen_x$, denote the set of all points inside the window as $W$, temporarily mark these points as belonging to class $clu_x$, and increase by 1 the number of times each of these points has been visited by that class;
S3.4: Compute the radial-basis-kernel-weighted average offset $M_r$ from the centroid $cen_x$ to all points in the sliding window and use it as the mean shift vector:
[Formula (8), defining $M_r$: shown as an image in the original; not reproduced here]
S3.5: Update the centroid coordinates with the mean shift vector $M_r$:
$$cen_{x+1} = M_r + cen_x \tag{9}$$
S3.6: Repeat steps S3.3 to S3.5 until the offset $M_r$ is less than the threshold $T_{conv}$. The centroid $cen_X$ at that moment is taken as a cluster center, all points visited during the iterations of steps S3.3 to S3.5 belong to the class $clu_X$ of that center, and the shift has converged;
S3.7: If the distance between the center of the current class $clu_X$ and the center of an existing class is less than the threshold $T_{dis}$, merge the current class into that existing class; otherwise keep the current class as a new class;
S3.8: Repeat steps S3.1 to S3.7 until all points have been visited, which ends the mean shift clustering process;
S3.9: Assign every point to a cluster center according to its marks; if a point has been marked by several classes, assign it to the class that visited it most often.
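The clustering steps can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the patent's procedure: the kernel is assumed to be Gaussian (the source only says "radial basis kernel"), every point is used as a starting centroid instead of random selection, and the final assignment uses the nearest center rather than the visit-count vote of S3.9. The default `t_dis = r / 2` and the sample points are also assumptions.

```python
import math

def mean_shift(points, r, t_conv=1e-3, t_dis=None, max_iter=100):
    """Simplified 2-D mean shift. Returns (centers, labels)."""
    t_dis = r / 2 if t_dis is None else t_dis
    centers = []
    for start in points:
        cen = start
        for _ in range(max_iter):
            num_x = num_y = den = 0.0
            for p in points:
                d = math.dist(p, cen)
                if d <= r:                         # sliding window W
                    w = math.exp(-(d / r) ** 2)    # assumed Gaussian kernel g
                    num_x += w * (p[0] - cen[0])
                    num_y += w * (p[1] - cen[1])
                    den += w
            if den == 0.0:
                break
            shift = (num_x / den, num_y / den)     # mean shift vector M_r
            cen = (cen[0] + shift[0], cen[1] + shift[1])   # formula (9)
            if math.hypot(*shift) < t_conv:        # converged
                break
        # Merge with an existing center closer than t_dis, else keep as new.
        if not any(math.dist(cen, c) < t_dis for c in centers):
            centers.append(cen)
    labels = [min(range(len(centers)), key=lambda k: math.dist(p, centers[k]))
              for p in points]
    return centers, labels

pts = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.1), (10.0, 10.0), (10.1, 9.8), (9.9, 10.2)]
centers, labels = mean_shift(pts, r=2.0)
```

Because total amount and frequency have very different scales, the two coordinates would normally be standardized before choosing the bandwidth `r`; the source does not say how `r` is chosen.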
Preferably, step S4 comprises:
feeding back each student's position in the frequency-total two-dimensional space according to the frequency-total clustering result and providing the student with an individual consumption behavior report; analyzing the clustering result to obtain abnormal classes and abnormal items, feeding them back to the student management department as early warning information so that the corresponding student groups receive special attention, and at the same time providing students with necessary consumption suggestions.
Further, the abnormal classes and abnormal items include:
students with potential economic difficulties; students potentially absent from school; potential high-consumption groups;
students with potential economic difficulties: for the set CEN of cluster centers, if a center $cen_x(t_x, f_x)$ and its class $clu_x$ satisfy the following condition, the students in that class have a very low total consumption amount but a relatively high consumption frequency in the preset period,
[Formula (10): shown as an image in the original; not reproduced here]
where $t_{CEN}$ is the set of abscissas of the cluster centers in CEN, $f_I$ is the set of ordinates of all points, and mean(), std() and num() are the mean, standard deviation and counting functions respectively;
students potentially absent from school: for the set CEN of cluster centers, if a center $cen_x(t_x, f_x)$ and its class $clu_x$ satisfy the following condition, both the total consumption amount and the consumption frequency of the students in that class are extremely low in the preset period,
[Formula (11): shown as an image in the original; not reproduced here]
potential high-consumption groups: for the set CEN of cluster centers, if a center $cen_x(t_x, f_x)$ and its class $clu_x$ satisfy the following condition, both the total consumption amount and the consumption frequency of the students in that class are extremely high in the preset period,
[Formula (12): shown as an image in the original; not reproduced here]
Further, step S5 includes:
mining association rules between the gender item set and the consumption place item set in the consumption place record subset and judging whether strong association rules exist. If they do, there is a fairly clear association between student gender and canteen choice; if not, there is no obvious interaction between student gender and canteen choice. This specifically comprises the following steps:
Step 5.1.1: From each record of the consumption place record subset $l_n$, take the two elements consumption place and sex to form the transaction $\tau_n = \{\mathrm{Location}_n, \mathrm{Sex}_n\}$, $n = 1, 2, \ldots, N$, for association rule mining. The $M$ different places and 2 different sexes, $\mathrm{Location}_m$ and $\mathrm{Sex}_q$ ($m = 1, 2, \ldots, M$; $q = 1, 2$), are the items of the transactions, and the transaction database is $D_l = \{\tau_1, \tau_2, \ldots, \tau_N\}$;
Step 5.1.2: Let $Ite = \{\mathrm{Location}_m, \mathrm{Sex}_q \mid m = 1, 2, \ldots, M;\ q = 1, 2\}$ be the set of all items in $D_l$; then any non-empty subset $X$ of $Ite$ is an item set in $D_l$. To determine the association rules between gender and consumption place, first compute the support of each 2-item set $X_k$ ($k = 1, 2, \ldots, M \times 2$) consisting of one place item and one sex item from $Ite$:
$$\mathrm{support}(X_k) = \frac{\sigma(X_k)}{N} \tag{13}$$
where $\sigma(X_k)$ is the number of transactions in $D_l$ that contain the item set $X_k$, and $N$ is the total number of transactions in $D_l$;
Step 5.1.3: Set the minimum support threshold to $\mathrm{sup}_{\min}$. For each 2-item set $X_k$ of step 5.1.2, if $\mathrm{support}(X_k) \ge \mathrm{sup}_{\min}$, then $X_k$ is a frequent item set; the set of all frequent item sets is denoted $X_F$.
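Because every transaction here contains exactly one place and one sex, the support of each 2-item set is simply the relative frequency of that (place, sex) pair. A minimal sketch of the frequent item set selection, with hypothetical names, sample data and threshold:

```python
from collections import Counter

def frequent_pairs(transactions, min_support):
    """Return {(location, sex): support} for pairs meeting min_support."""
    n = len(transactions)                 # N: total number of transactions
    counts = Counter(transactions)        # number of transactions per pair
    return {pair: c / n for pair, c in counts.items() if c / n >= min_support}

sample_tx = (
    [("Canteen A", "F")] * 3
    + [("Canteen A", "M")] * 1
    + [("Canteen B", "M")] * 4
)
frequent = frequent_pairs(sample_tx, min_support=0.2)
```

With this sample, the pair `("Canteen A", "M")` has support 1/8 and is filtered out, while the other two pairs are kept.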
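Step 5.1.4: To find out whether student gender and canteen choice are strongly associated, generate from the frequent item sets in $X_F$ all association rules between gender and consumption place. Taking rules with gender as the condition and place as the result as an example:
$$\mathrm{Sex}_q \rightarrow \mathrm{Location}_m, \quad m = 1, 2, \ldots, M;\ q = 1, 2 \tag{14}$$
Step 5.1.5: Compute the confidence of each association rule:
$$\mathrm{confidence}(\mathrm{Sex}_q \rightarrow \mathrm{Location}_m) = \frac{\mathrm{support}(\{\mathrm{Sex}_q, \mathrm{Location}_m\})}{\mathrm{support}(\{\mathrm{Sex}_q\})} \tag{15}$$
where $\mathrm{support}(\cdot)$ is the fraction of transactions in $D_l$ that contain the given item set;
Step 5.1.6: Set the minimum confidence threshold to $\mathrm{conf}_{\min}$. For a frequent item set $\{\mathrm{Sex}_q, \mathrm{Location}_m\}$ in $X_F$, if its support is not less than the minimum support threshold and $\mathrm{confidence}(\mathrm{Sex}_q \rightarrow \mathrm{Location}_m) \ge \mathrm{conf}_{\min}$, then $\mathrm{Sex}_q \rightarrow \mathrm{Location}_m$ is a strong association rule;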
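The rule generation and confidence test can be sketched as follows, assuming the standard Apriori definition of confidence (support of the whole item set divided by support of the condition). Function name, thresholds and sample data are hypothetical.

```python
from collections import Counter

def strong_rules(transactions, min_support, min_confidence):
    """Find strong rules Sex_q -> Location_m over (location, sex) transactions."""
    n = len(transactions)
    pair_counts = Counter(transactions)
    sex_counts = Counter(sex for _, sex in transactions)
    rules = []
    for (location, sex), c in pair_counts.items():
        support = c / n                   # support of {Sex_q, Location_m}
        confidence = c / sex_counts[sex]  # support of pair / support of Sex_q
        if support >= min_support and confidence >= min_confidence:
            rules.append((sex, location, support, confidence))
    return rules

sample_tx = (
    [("Canteen A", "F")] * 3
    + [("Canteen A", "M")] * 1
    + [("Canteen B", "M")] * 4
)
rules = strong_rules(sample_tx, min_support=0.2, min_confidence=0.7)
```

Swapping the roles of the two fields gives the reverse rules of the form place → sex; only the denominator of the confidence changes.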
mining association rules between the access-control place item set and the consumption place item set in the campus life behavior data set and judging whether strong association rules exist. If they do, there is a fairly clear association between teaching building choice and canteen choice; if not, there is no obvious interaction between teaching building choice and canteen choice. This specifically comprises the following sub-steps:
Step 5.2.1: From each record of the campus life behavior data set $a_n$, take the two elements consumption place and teaching building to form the transaction $\alpha_n = \{\mathrm{Location}_n, \mathrm{Tbuilding}_n\}$, $n = 1, 2, \ldots, N$, for association rule mining. The $M$ different canteens and $S$ different teaching buildings, $\mathrm{Location}_m$ and $\mathrm{Tbuilding}_s$ ($m = 1, 2, \ldots, M$; $s = 1, 2, \ldots, S$), are the items of the transactions, and the transaction database is $D_a = \{\alpha_1, \alpha_2, \ldots, \alpha_N\}$;
Step 5.2.2: Let $Ite = \{\mathrm{Location}_m, \mathrm{Tbuilding}_s \mid m = 1, 2, \ldots, M;\ s = 1, 2, \ldots, S\}$ be the set of all items in $D_a$; then any non-empty subset $X$ of $Ite$ is an item set in $D_a$. To determine the association rules between teaching building and canteen, first compute the support of each 2-item set $X_k$ ($k = 1, 2, \ldots, S \times M$) consisting of one teaching building item and one canteen item from $Ite$:
$$\mathrm{support}(X_k) = \frac{\sigma(X_k)}{N} \tag{16}$$
where $\sigma(X_k)$ is the number of transactions in $D_a$ that contain the item set $X_k$, and $N$ is the total number of transactions in $D_a$;
Step 5.2.3: Set the minimum support threshold to $\mathrm{sup}_{\min}$. For each 2-item set $X_k$ of step 5.2.2, if $\mathrm{support}(X_k) \ge \mathrm{sup}_{\min}$, then $X_k$ is a frequent item set; the set of all frequent item sets is denoted $X_F$.
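Step 5.2.4: To find out whether teaching building and canteen choice are strongly associated, generate from the frequent item sets in $X_F$ all association rules between teaching building and canteen. Taking rules with the teaching building as the condition and the canteen as the result as an example:
$$\mathrm{Tbuilding}_s \rightarrow \mathrm{Location}_m, \quad m = 1, 2, \ldots, M;\ s = 1, 2, \ldots, S \tag{17}$$
Step 5.2.5: Compute the confidence of each association rule:
$$\mathrm{confidence}(\mathrm{Tbuilding}_s \rightarrow \mathrm{Location}_m) = \frac{\mathrm{support}(\{\mathrm{Tbuilding}_s, \mathrm{Location}_m\})}{\mathrm{support}(\{\mathrm{Tbuilding}_s\})} \tag{18}$$
where $\mathrm{support}(\cdot)$ is the fraction of transactions in $D_a$ that contain the given item set;
Step 5.2.6: Set the minimum confidence threshold to $\mathrm{conf}_{\min}$. For a frequent item set $\{\mathrm{Tbuilding}_s, \mathrm{Location}_m\}$ in $X_F$, if its support is not less than the minimum support threshold and $\mathrm{confidence}(\mathrm{Tbuilding}_s \rightarrow \mathrm{Location}_m) \ge \mathrm{conf}_{\min}$, then $\mathrm{Tbuilding}_s \rightarrow \mathrm{Location}_m$ is a strong association rule.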
Further, step S6 includes:
using the strong association rules to learn the potential influence of gender on canteen choice and, on that basis, giving canteens suggestions on meal quantity and meal types: for canteens strongly associated with female or male students, add meal types that better match the preferences of that group, and adjust the prepared quantity according to the number of diners of each gender and their meal consumption;
using the strong association rules to learn the potential influence of the teaching buildings that students enter and leave on canteen choice, and identifying the canteens strongly associated with each teaching building; combining the start and end times of that day's classes in the corresponding teaching building with the number of students in those classes, suggestions on meal supply times and meal quantities are provided to canteen managers, and suggestions on dining place and off-peak dining are provided to students.
The invention also provides a specific embodiment:
This embodiment centers on campus one-card consumption swipe data and identifies students' consumption behavior characteristics, the associations and preferences among swipe locations, abnormal consumption behaviors, and so on:
Step 1: Read the personal information of a student group for a preset period, their campus one-card consumption swipe records, and their access-control swipe records for the same period; together these form the original data set reflecting campus life behavior. (Campus one-card data uses the card number as the unique identifier, in one-to-one correspondence with students, and both the consumption and the access-control swipe records contain the card number, so the three groups of records can be associated through it.) This step comprises the following sub-steps:
Step 1.1: Read the $N = 241014$ campus one-card consumption swipe records of 3267 students for the preset period Dur, which in this embodiment is March 1 to March 30. Each consumption swipe record $p_n$ can be expressed as a set of features:
$$p_n = \{(\mathrm{CardNo}_n, \mathrm{Time}_n, \mathrm{Location}_n, \mathrm{Money}_n) \mid n = 1, 2, 3, \ldots, N\} \tag{1}$$
where $\mathrm{CardNo}_n$, $\mathrm{Time}_n$, $\mathrm{Location}_n$ and $\mathrm{Money}_n$ are respectively the card number, consumption time, consumption place and consumption amount of swipe record $p_n$. Actual swipe records generally contain more information; only the features relevant to the invention are listed here.
Step 1.2: The campus card uses the card number as the unique identifier, in one-to-one correspondence with students. Here, obtaining the students' personal information means using the card number $\mathrm{CardNo}_n$ in each record to look up the access-control identification code $\mathrm{Acc}_n$ and the sex $\mathrm{Sex}_n$ of the corresponding student. The student personal information and the consumption swipe records $p_n$ of step 1.1 together form the consumption place data set, which contains the $N = 241014$ swipe records of the preset period Dur with the student personal information:
$$l_n = \{(\mathrm{CardNo}_n, \mathrm{Time}_n, \mathrm{Location}_n, \mathrm{Money}_n, \mathrm{Acc}_n, \mathrm{Sex}_n) \mid n = 1, 2, 3, \ldots, N\} \tag{2}$$
Step 2: Clean the data set and compute statistics. The original data set obtained in step 1 is processed according to the different life behavior characteristics to obtain the campus life behavior data set for analysis, in the following sub-steps:
Step 2.1: For each card number (that is, each student), count the consumption frequency and total consumption amount within the preset period and express them as a set of two-dimensional frequency-total feature vectors:
$$\{v_i = (t_i, f_i) \mid i = 1, 2, \ldots, I\} \tag{3}$$
where $I = 3267$ is the total number of students. For each student $i$ with campus card number $C_i$, all consumption records $p_n$ ($n \le N$) in the consumption data set with $\mathrm{CardNo}_n = C_i$ form that student's consumption record subset $P_i$. The total consumption amount $t_i$ is the sum of the consumption amounts in $P_i$, and the consumption frequency $f_i$ is the number of records in $P_i$:
$$t_i = \sum_{p_n \in P_i} \mathrm{Money}_n \tag{4}$$
$$f_i = |P_i| \tag{5}$$
Step 2.2: In the consumption swipe data, records with the same card number and the same consumption place whose time interval is smaller than a preset threshold are regarded as one consumption process; only one record is kept to represent that consumption place and the others are removed, forming the consumption place record subset. Specifically, for each student $i$ with campus card number $C_i$, all records $l_n$ ($n \le N$) in the consumption place data set with $\mathrm{CardNo}_n = C_i$ form the student's consumption place record subset $L_i$. The time interval threshold $T_{interv}$ can be chosen by a person skilled in the art; in this embodiment $T_{interv} = 60$ min. If
$$\mathrm{Location}_{n+m} = \mathrm{Location}_n \quad \text{and} \quad \mathrm{Time}_{n+m} - \mathrm{Time}_n < T_{interv} \tag{6}$$
then $l_{n+m}$ and $l_n$ are regarded as swipes of the same consumption process: only $l_n$ is kept and $l_{n+m}$ is removed from $L_i$. For each $l_n$, all subsequent $l_{n+m}$ are traversed; when the condition of formula (6) is no longer satisfied, let $l_n = l_{n+m+1}$ and repeat the check until all records in $L_i$ have been examined, which gives the valid consumption place record subset $L_i$ of student $i$.
Step 2.3: Repeat step 2.2 until the consumption place record subsets of all card numbers are obtained. In this embodiment, 133082 valid consumption place records of the 3267 students are finally obtained.
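As a quick check on these figures (simple arithmetic on the embodiment's own counts, not an additional result from the source), the deduplication keeps a little over half of the raw swipe records:

$$\frac{133082}{241014} \approx 0.552, \qquad \frac{133082}{3267 \times 30} \approx 1.36$$

That is, about 55% of the records are retained, which corresponds to roughly 1.36 valid consumption events per student per day over the 30-day period.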
Step 2.4: For the preset period Dur, read in the access-control swipe records of the same group of students over the same period as the consumption data in $l_n$ (that is, March 1 to March 30). Using the access-control identification code $\mathrm{Acc}_n$ and the swipe time $\mathrm{Time}_n$ in $l_n$, select from the access-control swipe records those whose swipe time is adjacent to the consumption time, and add the access-control swipe place $\mathrm{Tbuilding}_n$ of those records to the consumption place data records, giving the students' complete campus life behavior data set $a_n$:
$$a_n = \{(\mathrm{CardNo}_n, \mathrm{Time}_n, \mathrm{Location}_n, \mathrm{Tbuilding}_n, \mathrm{Money}_n, \mathrm{Acc}_n, \mathrm{Sex}_n) \mid n = 1, 2, 3, \ldots, N\} \tag{7}$$
Step 3: Cluster the two-dimensional data formed by consumption frequency and total consumption amount with the mean shift method, in the following sub-steps:
Step 3.1: The two-dimensional feature vectors $v_i$ can be regarded as a set of points in two-dimensional space with $(t_i, f_i)$ as the abscissa and ordinate, $i = 1, 2, \ldots, I$, where each point corresponds to the consumption behavior of one student. Each point is taken as a separate initial class to initialize the clustering process.
Step 3.2: Randomly select a point $v_x$ as the initial centroid $cen_x$.
Step 3.3: Take a sliding window of bandwidth $r$ centered on the centroid $cen_x$, denote the set of all points inside the window as $W$, temporarily mark these points as belonging to class $clu_x$, and increase by 1 the number of times each of these points has been visited by that class.
Step 3.4: Compute the radial-basis-kernel-weighted average offset $M_r$ from the centroid $cen_x$ to all points in the sliding window and use it as the mean shift vector:
[Formula (8), defining $M_r$: shown as an image in the original; not reproduced here]
where $g()$ denotes a radial basis kernel function.
Step 3.5: Update the centroid coordinates with the mean shift vector $M_r$:
$$cen_{x+1} = M_r + cen_x \tag{9}$$
Step 3.6: Repeat steps 3.3 to 3.5 until the offset $M_r$ is less than the threshold $T_{conv}$. The centroid $cen_X$ at that moment is taken as a cluster center, all points visited during the iterations of steps 3.3 to 3.5 belong to the class $clu_X$ of that center, and the shift has converged.
Step 3.7: If the distance between the center of the current class $clu_X$ and the center of an existing class is less than the threshold $T_{dis}$, merge the current class into that existing class; otherwise keep the current class as a new class.
Step 3.8: Repeat steps 3.1 to 3.7 until all points have been visited, which ends the mean shift clustering process.
Step 3.9: Assign every point to a cluster center according to its marks; if a point has been marked by several classes, assign it to the class that visited it most often. The number of clusters finally obtained in this embodiment is 6.
Step 4: From the clustering result, obtain the students' consumption behavior characteristics and provide management and self-management references for the school and the students respectively, in the following sub-steps:
Step 4.1: Feed back each student's position in the frequency-total two-dimensional space according to the frequency-total clustering result and provide the student with an individual consumption behavior report, helping students to understand and improve their own consumption behaviors and habits and to develop financial awareness. For example, if a student falls in a class with relatively low consumption frequency and relatively high total consumption, the student's average amount per purchase is high, which suggests relatively high expectations for consumption or quality of life.
Step 4.2: Analyze the frequency-total clustering result to obtain abnormal classes and abnormal items, feed them back to the student management department as early warning information so that the corresponding student groups receive special attention, and at the same time provide students with necessary consumption suggestions. Specifically, the following classes are marked as potentially abnormal:
(1) For the set CEN of cluster centers, if a center $cen_x(t_x, f_x)$ and its class $clu_x$ satisfy the following condition, the students in that class have a very low total consumption amount but a relatively high consumption frequency in the preset period, and are students with potential economic difficulties:
[Formula (10): shown as an image in the original; not reproduced here]
where $t_{CEN}$ is the set of abscissas of the cluster centers in CEN, $f_I$ is the set of ordinates of all points, and mean(), std() and num() are the mean, standard deviation and counting functions respectively.
(2) For the set CEN of cluster centers, if a center $cen_x(t_x, f_x)$ and its class $clu_x$ satisfy the following condition, both the total consumption amount and the consumption frequency of the students in that class are extremely low in the preset period, and they are students potentially absent from school:
[Formula (11): shown as an image in the original; not reproduced here]
(3) For the set CEN of cluster centers, if a center $cen_x(t_x, f_x)$ and its class $clu_x$ satisfy the following condition, both the total consumption amount and the consumption frequency of the students in that class are extremely high in the preset period, and they belong to a potential high-consumption group:
[Formula (12): shown as an image in the original; not reproduced here]
In this embodiment no class satisfies condition (1) or (2), and 1 class satisfies condition (3). The students in that class form a potential high-consumption group; they can be given rational consumption suggestions through applications associated with the campus card and can be followed continuously as a special-attention group by the student management department.
Step 5: Jointly analyze the campus life behavior data set and its subsets with the Apriori association rule method. Taking the canteen as the center, mine the associations between student attributes, teaching building access and canteen choice, and analyze the life trajectory preferences of different student groups, in the following sub-steps:
Step 5.1: Mine association rules between the gender item set and the consumption place item set in the consumption place record subsets obtained in step 2.3, and judge whether strong association rules exist. If they do, there is a fairly clear association between student gender and canteen choice; if not, there is no obvious interaction between student gender and canteen choice. The specific implementation is as follows:
Step 5.1.1: From each record of the consumption place record subset $l_n$, take the two elements consumption place and sex to form the transaction $\tau_n = \{\mathrm{Location}_n, \mathrm{Sex}_n\}$, $n = 1, 2, \ldots, N$, for association rule mining. In this embodiment the 6 different places and 2 different sexes, $\mathrm{Location}_m$ and $\mathrm{Sex}_q$ ($m = 1, 2, \ldots, 6$; $q = 1, 2$), are the items of the transactions, and the transaction database is $D_l = \{\tau_1, \tau_2, \ldots, \tau_N\}$.
Step 5.1.2: Let $Ite = \{\mathrm{Location}_m, \mathrm{Sex}_q \mid m = 1, 2, \ldots, 6;\ q = 1, 2\}$ be the set of all items in $D_l$; then any non-empty subset $X$ of $Ite$ is an item set in $D_l$. To determine the association rules between gender and consumption place, first compute the support of each 2-item set $X_k$ ($k = 1, 2, \ldots, 12$) consisting of one place item and one sex item from $Ite$:
$$\mathrm{support}(X_k) = \frac{\sigma(X_k)}{N} \tag{13}$$
where $\sigma(X_k)$ is the number of transactions in $D_l$ that contain the item set $X_k$, and $N$ is the total number of transactions in $D_l$.
Step 5.1.3: Set the minimum support threshold to $\mathrm{sup}_{\min}$. In a specific implementation the threshold is chosen by a person skilled in the art; the value used in this embodiment is [shown as an image in the original; not reproduced here]. For each 2-item set $X_k$ of step 5.1.2, if $\mathrm{support}(X_k) \ge \mathrm{sup}_{\min}$, then $X_k$ is a frequent item set; the set of all frequent item sets is denoted $X_F$.
Step 5.1.4: To find out whether student gender and canteen choice are strongly associated, generate from the frequent item sets in $X_F$ all association rules between gender and consumption place. Taking rules with gender as the condition and place as the result as an example:
$$\mathrm{Sex}_q \rightarrow \mathrm{Location}_m, \quad m = 1, 2, \ldots, 6;\ q = 1, 2 \tag{14}$$
Step 5.1.5: Compute the confidence of each association rule:
$$\mathrm{confidence}(\mathrm{Sex}_q \rightarrow \mathrm{Location}_m) = \frac{\mathrm{support}(\{\mathrm{Sex}_q, \mathrm{Location}_m\})}{\mathrm{support}(\{\mathrm{Sex}_q\})} \tag{15}$$
where $\mathrm{support}(\cdot)$ is the fraction of transactions in $D_l$ that contain the given item set.
Step 5.1.6: Set the minimum confidence threshold to $\mathrm{conf}_{\min}$. In a specific implementation the threshold is chosen by a person skilled in the art; the value used in this embodiment is [shown as an image in the original; not reproduced here]. For a frequent item set $\{\mathrm{Sex}_q, \mathrm{Location}_m\}$ in $X_F$, if its support is not less than the minimum support threshold and $\mathrm{confidence}(\mathrm{Sex}_q \rightarrow \mathrm{Location}_m) \ge \mathrm{conf}_{\min}$, then $\mathrm{Sex}_q \rightarrow \mathrm{Location}_m$ is a strong association rule. For association rules of the form $\mathrm{Location}_m \rightarrow \mathrm{Sex}_q$, the support and confidence are computed in the same way, which is not repeated here. The strong association rule found in this embodiment is "Bamboo Garden canteen → female students", with a support of 0.1502 and a confidence of 0.7366.
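The reported figures can be cross-checked against the usual definition of confidence. Assuming the standard Apriori definition, in which the confidence of a rule is the support of the whole item set divided by the support of the condition, the support of the condition (here the Bamboo Garden canteen) follows from the two reported values:

$$\mathrm{support}(\text{canteen}) = \frac{\mathrm{support}(\{\text{canteen}, \text{female}\})}{\mathrm{confidence}(\text{canteen} \rightarrow \text{female})} = \frac{0.1502}{0.7366} \approx 0.204$$

Under that assumption, about 20% of the valid consumption records were made at this canteen, and about 74% of those were made by female students.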
And 5.2, mining association rules aiming at the entrance guard access point (namely the teaching building) item set and the consumption point (namely the ready-to-eat hall) item set in the campus life behavior data set obtained in the step 2.4, and judging whether strong association rules exist or not. If the current association exists, the fact that a relatively clear association exists between the teaching building selection and the canteen selection is reflected; if not, the fact that the interaction between the teaching building selection and the canteen selection is not obvious is indicated. The specific implementation process is as follows:
step 5.2.1, from campus life behavior data set a n Taking two elements of consumption place and gender in each item to form alpha n ={Loction n ,Tbuilding n I N =1,2, \8230;, N } as a matter of association rule mining, including 6 different canteens and 8 different teaching buildings, loction m ,Tbuilding s (m =1,2, \8230;, 6,s =1,2, \8230;, 8) is an entry in a transaction, transaction database D a ={τ 12 ,…,τ N }。
Step 5.2.2, let it = { Location = m ,Tbuilding s I m =1,2, \ 8230 |, 6,s =1,2, \8230;, 8} is D a Then any non-empty subset X of Ite is D a The item set in (1) is a correlation rule between an education building and a canteen, and firstly, a 2-item set X containing 2 items and formed by the education building and the canteen items in Ite k (k =1,2, \8230;, 48) support:
Figure BDA0003742350180000204
wherein the content of the first and second substances,
Figure BDA0003742350180000205
is D a Middle containing item set X k N is D a The total number of transactions in.
Step 5.2.3, set the minimum support threshold to
Figure BDA0003742350180000206
In a specific implementation the threshold is selected by a person skilled in the art; in this embodiment it is set to
Figure BDA0003742350180000207
For each 2-item set X_k in step 5.2.2, if
Figure BDA0003742350180000208
then X_k is a frequent item set, and the set of all frequent item sets is denoted X_F.
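Steps 5.2.2 and 5.2.3 can be sketched as follows. This is a minimal illustration, not the embodiment's implementation: it assumes the usual definition of support (transactions containing the pair divided by N) and a "greater than or equal to" comparison against the threshold, since the corresponding equations are only available as images. The function name `frequent_pairs`, the toy data and the threshold 0.25 are hypothetical.

```python
from collections import Counter


def frequent_pairs(transactions, min_support):
    """Return {(building, canteen): support} for every frequent 2-item set.

    transactions: list of (canteen, building) tuples, one per record alpha_n.
    Support is assumed to be count(pair) / N (standard definition).
    """
    n_total = len(transactions)
    pair_counts = Counter((building, canteen) for canteen, building in transactions)
    supports = {pair: count / n_total for pair, count in pair_counts.items()}
    # Keep only the pairs whose support reaches the minimum support threshold.
    return {pair: s for pair, s in supports.items() if s >= min_support}


# Hypothetical toy data: (canteen, teaching building) per transaction.
toy = [
    ("Bamboo", "T2"), ("Bamboo", "T2"), ("Bamboo", "T5"),
    ("Lan", "T8"), ("Lan", "T8"), ("Lan", "T2"),
    ("Plum", "T5"), ("Plum", "T8"),
]
frequent = frequent_pairs(toy, min_support=0.25)
print(frequent)
```

With 6 canteens and 8 teaching buildings there are at most 48 candidate pairs, so counting every pair directly is sufficient and no candidate pruning is needed at this size.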
Step 5.2.4, to determine whether a strong association exists between teaching building and canteen, generate from the frequent item sets in X_F all association rules between teaching building and canteen. Taking the teaching building as the condition and the canteen as the result as an example:
Tbuilding_s → Location_m, m = 1, 2, …, 6, s = 1, 2, …, 8 (17)
Step 5.2.5, compute the confidence of each association rule:
Figure BDA0003742350180000211
Step 5.2.6, set the minimum confidence threshold to
Figure BDA0003742350180000212
In a specific implementation the threshold is selected by a person skilled in the art; in this embodiment it is set to
Figure BDA0003742350180000213
For a known frequent item set in X_F,
Figure BDA0003742350180000214
if
Figure BDA0003742350180000215
Figure BDA0003742350180000216
then Tbuilding_s → Location_m is a strong association rule. For the association rule Location_m → Tbuilding_s, the support and confidence are calculated in the same way, and the details are not repeated here. The strong association rules obtained by analysis in this embodiment are "Lan Garden canteen → Teaching Building No. 8", with a support of 0.1520 and a confidence of 0.5080, and "Bamboo Garden canteen → Teaching Building No. 2", with a support of 0.1566 and a confidence of 0.6731.
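Steps 5.2.4 to 5.2.6 can be sketched as follows, evaluating both rule directions for each frequent pair. The sketch assumes the standard definition of confidence (count of the pair divided by count of the antecedent) and "greater than or equal to" comparisons; the function name `strong_rules`, the toy data and both thresholds are hypothetical and are not the values used in the embodiment.

```python
from collections import Counter


def strong_rules(transactions, min_support, min_confidence):
    """Return a list of (antecedent, consequent, support, confidence).

    transactions: list of (canteen, building) tuples.
    A rule is kept when the pair is frequent and the rule's confidence
    reaches min_confidence; both directions are checked.
    """
    n_total = len(transactions)
    pair_counts = Counter(transactions)
    canteen_counts = Counter(canteen for canteen, _ in transactions)
    building_counts = Counter(building for _, building in transactions)
    rules = []
    for (canteen, building), count in pair_counts.items():
        support = count / n_total
        if support < min_support:
            continue  # not a frequent item set
        conf_building_to_canteen = count / building_counts[building]
        conf_canteen_to_building = count / canteen_counts[canteen]
        if conf_building_to_canteen >= min_confidence:
            rules.append((building, canteen, support, conf_building_to_canteen))
        if conf_canteen_to_building >= min_confidence:
            rules.append((canteen, building, support, conf_canteen_to_building))
    return rules


# Hypothetical toy data: (canteen, teaching building) per transaction.
toy = [
    ("Bamboo", "T2"), ("Bamboo", "T2"), ("Bamboo", "T5"),
    ("Lan", "T8"), ("Lan", "T8"), ("Lan", "T2"),
    ("Plum", "T5"), ("Plum", "T8"),
]
rules = strong_rules(toy, min_support=0.25, min_confidence=0.6)
for antecedent, consequent, support, confidence in rules:
    print(f"{antecedent} -> {consequent}: support={support:.4f}, confidence={confidence:.4f}")
```

Under the same standard definition, the embodiment's figures for "Bamboo Garden canteen → Teaching Building No. 2" would imply that roughly 0.1566 / 0.6731 ≈ 0.2327 of the transactions involve the Bamboo Garden canteen.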
Step 6, obtain the students' life trajectory behavior characteristics from the association rule analysis results, and provide meal preparation planning suggestions for canteen managers and dining recommendations for students. This comprises the following substeps:
Step 6.1, the strong association rule obtained in step 5.1 shows that diners at the Bamboo Garden canteen are more likely to be girls. The canteen can therefore add meal categories that better match girls' preferences and, considering that girls generally need smaller portions than boys, reduce portion sizes appropriately to avoid waste.
Step 6.2, the strong association rules obtained in step 5.2 show that students dining at the Lan Garden canteen most likely come from Teaching Building No. 8, and students dining at the Bamboo Garden canteen most likely come from Teaching Building No. 2. The canteen management can therefore adjust meal supply times and meal preparation amounts according to the course schedules of Teaching Buildings No. 8 and No. 2 (for example, class start and end times and the number of students enrolled). For students, a forecast of canteen traffic can be provided based on the number of students in each teaching building that day, helping them choose to dine off-peak.
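One possible way to turn the rules of step 6.2 into a traffic forecast is sketched below. It rests on assumptions that are not stated in the original text: the confidence of each rule "teaching building → canteen" is treated as the share of that building's students who dine at the canteen, and only buildings with a strong rule contribute. The function name `expected_diners`, the building labels, the head counts and the confidences are all hypothetical.

```python
def expected_diners(building_headcount, rule_confidence):
    """Estimate the number of diners per canteen for one day.

    building_headcount: {building: number of students in class that day}
    rule_confidence: {(building, canteen): confidence of building -> canteen}
    Assumption: confidence is read as the fraction of a building's students
    who go to the associated canteen.
    """
    flow = {}
    for (building, canteen), confidence in rule_confidence.items():
        students = building_headcount.get(building, 0)
        flow[canteen] = flow.get(canteen, 0.0) + students * confidence
    return flow


# Hypothetical inputs for illustration only.
headcount = {"T2": 400, "T8": 300}
confidence = {("T2", "Bamboo"): 0.5, ("T8", "Lan"): 0.25}
forecast = expected_diners(headcount, confidence)
print(forecast)
```

A canteen whose forecast exceeds its seating capacity could then be flagged to students as a candidate for off-peak dining.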
The invention has the following beneficial effects:
The invention uses the data generated by campus card swipes to automatically analyze multiple dimensions, such as consumption behavior, life trajectories and consumption place preferences, and can provide valuable analysis results for students and schools. The invention applies two-dimensional clustering to the consumption data, so that the total consumption amount and the consumption frequency are analyzed jointly to give a more comprehensive picture of consumption behavior, and it removes the manual restriction on the number of clusters in the clustering process, so that the result better reflects objective conditions and is more inclusive.
The embodiments in the present description are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other.
The principles and embodiments of the present invention have been described herein using specific examples, which are provided only to help understand the method and the core concept of the present invention; meanwhile, for a person skilled in the art, according to the idea of the present invention, the specific embodiments and the application range may be changed. In view of the foregoing, the description is not to be taken in a limiting sense.

Claims (8)

1. An analysis method based on all-purpose card big data, characterized by comprising the following steps:
S1: reading personal information of the campus card student group, the students' consumption card-swiping records within a preset period, and their access control card-swiping records within the same period, to obtain an original data set reflecting campus life, wherein the original data set uses the card number as a unique identifier in one-to-one correspondence with the students;
S2: cleaning and counting the original data set, and processing it according to different life behavior characteristics to obtain a campus life behavior data set available for analysis and an effective data subset thereof;
S3: obtaining two-dimensional data consisting of consumption frequency and total consumption amount from the effective data subset available for analysis, and performing cluster analysis on the two-dimensional data using the mean shift method to obtain cluster-analyzed data;
S4: obtaining the students' consumption behavior characteristics from the cluster-analyzed data, and providing references for management by schools and for self-management by students;
S5: performing joint analysis on the campus life behavior data set available for analysis and the effective data subset thereof using the Apriori association rule analysis method, to obtain joint analysis results and analyze the life trajectory preferences of different student groups;
S6: obtaining the students' life trajectory behavior characteristics from the joint analysis results, and providing meal preparation planning suggestions for canteen managers and dining recommendations for students.
2. The method for analyzing all-purpose card big data according to claim 1, wherein the step S1 comprises: presetting a period Dur, and reading in the N consumption card-swiping records of the students' campus cards within that period, each consumption card-swiping record p_n being expressed as a collection of several features:
p_n = {(CardNo_n, Time_n, Loction_n, Money_n) | n = 1, 2, 3, ..., N} (1)
wherein CardNo_n, Time_n, Loction_n and Money_n are respectively the card number, consumption time, consumption place and consumption amount of the consumption card-swiping record p_n;
the personal information of the students is obtained by using the card number CardNo_n in each record to query and read the access control identification code Acc_n and the gender information Sex_n of the student corresponding to that card number;
the student personal information and the consumption card-swiping records p_n together form the consumption place data set, which contains the N card-swiping records and the student personal information within the preset period Dur:
l_n = {(CardNo_n, Time_n, Loction_n, Money_n, Acc_n, Sex_n) | n = 1, 2, 3, ..., N} (2).
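The construction of the records l_n of formula (2) from the records p_n of formula (1) can be sketched as a lookup by card number. The class and function names below are hypothetical, the field types are assumptions, and skipping records whose card number has no personal information is a design choice of this sketch rather than something the claim specifies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConsumptionRecord:
    """One record p_n of formula (1)."""
    card_no: str
    time: str
    location: str
    money: float


@dataclass(frozen=True)
class PlaceRecord:
    """One record l_n of formula (2): p_n plus Acc_n and Sex_n."""
    card_no: str
    time: str
    location: str
    money: float
    acc: str
    sex: str


def attach_personal_info(records, students):
    """Join consumption records with personal info keyed by card number.

    students: {card_no: (acc, sex)}.
    Records with an unknown card number are skipped (assumption of this sketch).
    """
    joined = []
    for p in records:
        info = students.get(p.card_no)
        if info is None:
            continue
        acc, sex = info
        joined.append(PlaceRecord(p.card_no, p.time, p.location, p.money, acc, sex))
    return joined


# Hypothetical example data.
students = {"1001": ("ACC-1", "F")}
records = [
    ConsumptionRecord("1001", "2022-03-01 12:05", "Bamboo", 8.5),
    ConsumptionRecord("9999", "2022-03-01 12:06", "Lan", 5.0),
]
place_records = attach_personal_info(records, students)
print(place_records)
```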
3. The method for analyzing all-purpose card big data according to claim 2, wherein the step S2 comprises:
counting the consumption frequency and the total consumption amount of each student within the preset period, expressed as a set of two-dimensional feature vectors of total consumption amount and consumption frequency:
{v_i = (t_i, f_i) | i = 1, 2, ..., I} (3)
wherein I is the total number of students; if the campus card number of student i is C_i, then all consumption records p_n (n ≤ N) in the consumption data set whose card number CardNo_n = C_i constitute the consumption record subset P_i of that student; the total consumption amount t_i is the sum of the consumption amounts in P_i, and the consumption frequency f_i is the number of records in P_i:
Figure FDA0003742350170000021
f_i = |P_i| (5)
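The feature vectors v_i = (t_i, f_i) of formula (3) can be computed by grouping the records by card number, as in the following minimal sketch. The function name `consumption_features` and the toy records are hypothetical; t_i is taken to be the plain sum of the amounts in P_i, as the text describes.

```python
from collections import defaultdict


def consumption_features(records):
    """Return {card_no: (t_i, f_i)} from an iterable of (card_no, money).

    t_i: total consumption amount of the student's records P_i.
    f_i: number of records in P_i, i.e. |P_i|.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for card_no, money in records:
        totals[card_no] += money
        counts[card_no] += 1
    return {card_no: (totals[card_no], counts[card_no]) for card_no in totals}


# Hypothetical toy records: (card number, consumption amount).
toy_records = [("A", 10.0), ("A", 5.5), ("B", 8.0)]
features = consumption_features(toy_records)
print(features)
```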
obtaining the valid consumption place record subset L_i of student i: for each student i with campus card number C_i, all records l_n (n ≤ N) in the consumption place data set whose card number CardNo_n = C_i constitute the consumption place record subset L_i of that student; setting a time interval threshold T_interv, if:
Figure FDA0003742350170000022
then l_{n+m} and l_n are regarded as card-swiping records generated in the same consumption process, only l_n is retained, and l_{n+m} is removed from L_i; for l_n, all l_{n+m} are traversed until the condition of formula (6) is no longer satisfied, then l_n = l_{n+m+1} is set and the above check is repeated until all data items in L_i have been examined, giving the valid consumption place record subset L_i of student i;
repeating the above steps until the consumption place record subset of every student has been obtained;
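The merging of swipes that belong to one consumption process can be sketched as below. Formula (6) is only available as an image, so the sketch assumes the condition is "the time elapsed since the retained record l_n does not exceed T_interv"; the function name `dedupe_swipes`, the use of seconds as the time unit and the sample values are hypothetical.

```python
def dedupe_swipes(times, t_interv):
    """Keep one swipe per consumption process for a single student.

    times: swipe times in seconds, sorted ascending.
    A swipe is dropped when it falls within t_interv of the last retained
    swipe (assumed reading of formula (6)); the first swipe outside that
    interval becomes the new reference record.
    """
    kept = []
    for t in times:
        if not kept or t - kept[-1] > t_interv:
            kept.append(t)
    return kept


# Hypothetical swipe times (seconds) and a 10-minute threshold.
kept = dedupe_swipes([0, 30, 100, 700, 720, 2000], t_interv=600)
print(kept)
```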
obtaining the valid access control card-swiping records of student i: for the preset period Dur, reading in the access control card-swiping records of the same batch of students over the same period as the consumption data in l_n; using the access control identification code Acc_n and the card-swiping time Time_n in l_n, screening from the access control card-swiping records those whose card-swiping time is adjacent to the consumption time, and adding the access control card-swiping place Tbuilding_n of those records to the consumption place data record to obtain the complete student campus life behavior data set a_n:
a_n = {(CardNo_n, Time_n, Loction_n, Tbuilding_n, Money_n, Acc_n, Sex_n) | n = 1, 2, 3, ..., N} (7)
The complete student campus life behavior data set a_n is the valid data set available for analysis.
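Matching a consumption record to an "adjacent" access control swipe can be sketched as a nearest-in-time lookup. The claim does not define adjacency precisely, so the sketch assumes the closest swipe in either direction counts, provided it lies within a maximum gap; the function name `nearest_building`, the `max_gap` parameter and the sample values are hypothetical.

```python
from bisect import bisect_left


def nearest_building(access_swipes, consume_time, max_gap):
    """Return the teaching building of the swipe closest to consume_time.

    access_swipes: list of (time, building) for one access code Acc_n,
    sorted by time (seconds). Returns None when there is no swipe within
    max_gap seconds of the consumption time.
    """
    if not access_swipes:
        return None
    times = [t for t, _ in access_swipes]
    i = bisect_left(times, consume_time)
    # Only the neighbours just before and just after can be the closest.
    candidates = access_swipes[max(i - 1, 0): i + 1]
    t, building = min(candidates, key=lambda swipe: abs(swipe[0] - consume_time))
    return building if abs(t - consume_time) <= max_gap else None


# Hypothetical access swipes for one student.
swipes = [(100, "T2"), (500, "T8"), (900, "T5")]
print(nearest_building(swipes, 520, max_gap=600))
```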
4. The method for analyzing all-purpose card big data according to claim 3, wherein the step S3 comprises:
S3.1: regarding the two-dimensional feature vectors v_i as a point set in two-dimensional space with (t_i, f_i) as horizontal and vertical coordinates, where i = 1, 2, ..., I; each point corresponds to the consumption behavior of one student, and each point is treated as an independent initial class to initialize the clustering process;
S3.2: randomly selecting a point v_x as the initial centroid cen_x;
S3.3: taking the centroid cen_x as the center, selecting a sliding window with bandwidth r, denoting the set of all points within the window as W, temporarily marking these points as belonging to class clu_x, and increasing by 1 the access frequency of these points for that class;
S3.4: computing the radial basis kernel weighted average distance M_r from all points within the sliding window to the centroid cen_x, as the mean shift vector:
Figure DEST_PATH_FDA0003827186850000031
S3.5: updating the centroid coordinates with the mean shift vector M_r:
cen_{x+1} = M_r + cen_x (9)
S3.6: repeating steps S3.3 to S3.5 until the offset M_r is smaller than the threshold T_conv, at which point the drift has converged; the centroid cen_x at that time is taken as the cluster center, and all points visited during the iterations of steps S3.3 to S3.5 belong to the class clu_x corresponding to that center;
S3.7: if the distance between the cluster center of the current class clu_x and the center of an existing class is smaller than the threshold T_dis, merging the current class into the existing class; otherwise, retaining the current class as a new class;
S3.8: repeating steps S3.1 to S3.7 until all points have been visited, ending the mean shift clustering process;
S3.9: assigning all points to their corresponding cluster centers according to the marks, and, if a point has been marked by several classes, assigning it to the class with the highest access frequency, to obtain the frequency-total clustering result.
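The procedure of steps S3.1 to S3.9 can be sketched in plain Python as follows. Several choices are assumptions of this sketch: formula (8) is only available as an image, so a Gaussian kernel with the bandwidth r as its scale is used for the weights; seeds are taken in input order from the points not yet visited rather than at random, so that the result is reproducible; and T_dis defaults to r / 2. The function name `mean_shift` and the sample points are hypothetical.

```python
import math


def mean_shift(points, r, t_conv=1e-3, t_dis=None, max_iter=300):
    """Cluster 2-D points; return (centers, labels)."""
    t_dis = r / 2 if t_dis is None else t_dis
    centers = []  # converged cluster centers
    visits = []   # visits[c][i]: how often point i fell in a window of class c
    visited = [False] * len(points)
    for seed, start in enumerate(points):
        if visited[seed]:
            continue
        cen = start
        counts = [0] * len(points)
        for _ in range(max_iter):
            # S3.3: points inside the sliding window of bandwidth r.
            window = [i for i, p in enumerate(points) if math.dist(p, cen) <= r]
            for i in window:
                counts[i] += 1
                visited[i] = True
            # S3.4: kernel-weighted mean offset (Gaussian kernel assumed).
            weights = [math.exp(-math.dist(points[i], cen) ** 2 / (2 * r * r))
                       for i in window]
            total = sum(weights)
            shift = tuple(
                sum(w * (points[i][d] - cen[d]) for w, i in zip(weights, window)) / total
                for d in range(2)
            )
            # S3.5: move the centroid by the mean shift vector.
            cen = (cen[0] + shift[0], cen[1] + shift[1])
            if math.hypot(*shift) < t_conv:  # S3.6: converged
                break
        # S3.7: merge with an existing class if the centers are close.
        for c, existing in enumerate(centers):
            if math.dist(existing, cen) < t_dis:
                visits[c] = [a + b for a, b in zip(visits[c], counts)]
                break
        else:
            centers.append(cen)
            visits.append(counts)
    # S3.9: each point goes to the class that visited it most often.
    labels = [max(range(len(centers)), key=lambda c: visits[c][i])
              for i in range(len(points))]
    return centers, labels


# Hypothetical (total amount, frequency) points forming two groups.
pts = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.1), (8.0, 8.0), (8.1, 7.9), (7.9, 8.2)]
centers, labels = mean_shift(pts, r=2.0)
print(centers, labels)
```

Because t_i and f_i are on different scales in practice, the two coordinates would normally need to be normalized before a single bandwidth r is applied; the sketch leaves that step out.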
5. The method for analyzing all-purpose card big data according to claim 4, wherein the step S4 comprises:
feeding back to each student his or her position in the frequency-total two-dimensional space according to the frequency-total clustering result and providing an individual consumption behavior report; analyzing the result to obtain abnormal classes and abnormal items and feeding them back to the student management department as early warning information, reminding it to pay special attention to the corresponding student groups, while providing the students with necessary consumption suggestions.
6. The method for analyzing all-purpose card big data according to claim 5, wherein the abnormal items and the abnormal classes comprise:
potential students in financial difficulty; potential students absent from school; and a potential high-consumption group;
potential students in financial difficulty: for the cluster center point set CEN, if CEN_x(t_x, f_x) and its corresponding class clus_x satisfy the following conditions, the students in that class have a very low total consumption amount but a relatively high consumption frequency within the preset period,
Figure FDA0003742350170000041
wherein t_CEN is the set of abscissas of the center point set CEN, f_I is the set of ordinates of all points, and mean(), std() and num() are respectively the mean, standard deviation and counting functions;
potential students absent from school: for the cluster center point set CEN, if CEN_x(t_x, f_x) and its corresponding class clus_x satisfy the following conditions, the students in that class have an extremely low total consumption amount and an extremely low consumption frequency within the preset period,
Figure FDA0003742350170000042
potential high-consumption group: for the cluster center point set CEN, if CEN_x(t_x, f_x) and its corresponding class clus_x satisfy the following conditions, the students in that class have an extremely high total consumption amount and an extremely high consumption frequency within the preset period,
Figure FDA0003742350170000043
7. The method for analyzing all-purpose card big data according to claim 6, wherein the step S5 comprises:
performing association rule mining on the gender item set and the consumption place item set in the consumption place record subset, and judging whether strong association rules exist; if they exist, the gender of the students is clearly associated with the choice of canteen; if not, there is no obvious mutual influence between the gender of the students and the choice of canteen; specifically comprising the following substeps:
Step 5.1.1, from the consumption place record subset l_n, taking the two elements consumption place and gender of each item to form τ_n = {Loction_n, Sex_n | n = 1, 2, ..., N} as the transactions for association rule mining, the transactions containing M different places and 2 different genders, wherein Loction_m and Sex_q (m = 1, 2, ..., M; q = 1, 2) are the items in a transaction, and the transaction database is D_l = {τ_1, τ_2, ..., τ_N};
Step 5.1.2, letting Ite = {Loction_m, Sex_q | m = 1, 2, ..., M; q = 1, 2} be the set of all items in D_l, so that any non-empty subset X of Ite is an item set in D_l; to find the association rules between gender and consumption place, first computing the support of each 2-item set X_k (k = 1, 2, ..., M × 2) consisting of one place item and one gender item in Ite:
Figure FDA0003742350170000051
where
Figure FDA0003742350170000052
is the number of transactions in D_l that contain the item set X_k, and N is the total number of transactions in D_l;
Step 5.1.3, setting the minimum support threshold to
Figure FDA0003742350170000053
for each 2-item set X_k in step 5.1.2, if
Figure FDA0003742350170000054
then X_k is a frequent item set, and the set of all frequent item sets is denoted X_F;
Step 5.1.4, to determine whether a strong association exists between student gender and canteen choice, generating from the frequent item sets in X_F all association rules between gender and consumption place, wherein the association rule with gender as the condition and place as the result is:
Sex_q → Location_m, m = 1, 2, ..., M, q = 1, 2 (14)
Step 5.1.5, computing the confidence of each association rule:
Figure FDA0003742350170000055
Step 5.1.6, setting the minimum confidence threshold to
Figure FDA0003742350170000056
for a known frequent item set in X_F,
Figure FDA0003742350170000057
if
Figure FDA0003742350170000058
Figure FDA0003742350170000059
then Sex_q → Location_m is a strong association rule;
performing association rule mining on the entry/exit place item set and the consumption place item set in the campus life behavior data set, and judging whether strong association rules exist; if they exist, the choice of teaching building is clearly associated with the choice of canteen; if not, there is no obvious mutual influence between the choice of teaching building and the choice of canteen; specifically comprising the following substeps:
Step 5.2.1, from the campus life behavior data set a_n, taking the two elements consumption place and teaching building of each item to form α_n = {Loction_n, Tbuilding_n | n = 1, 2, ..., N} as the transactions for association rule mining, the transactions containing M different canteens and S different teaching buildings, wherein Loction_m and Tbuilding_s (m = 1, 2, ..., M; s = 1, 2, ..., S) are the items in a transaction, and the transaction database is D_a = {α_1, α_2, ..., α_N};
Step 5.2.2, letting Ite = {Location_m, Tbuilding_s | m = 1, 2, ..., M; s = 1, 2, ..., S} be the set of all items in D_a, so that any non-empty subset X of Ite is an item set in D_a; to find the association rules between teaching building and canteen, first computing the support of each 2-item set X_k (k = 1, 2, ..., S × M) consisting of one teaching building item and one canteen item in Ite:
Figure FDA0003742350170000061
where
Figure FDA0003742350170000062
is the number of transactions in D_a that contain the item set X_k, and N is the total number of transactions in D_a;
Step 5.2.3, setting the minimum support threshold to
Figure FDA0003742350170000063
for each 2-item set X_k in step 5.2.2, if
Figure FDA0003742350170000064
then X_k is a frequent item set, and the set of all frequent item sets is denoted X_F;
Step 5.2.4, to determine whether a strong association exists between the choice of teaching building and the choice of canteen, generating from the frequent item sets in X_F all association rules between teaching building and canteen, wherein the association rule with the teaching building as the condition and the canteen as the result is:
Tbuilding_s → Location_m, m = 1, 2, ..., M, s = 1, 2, ..., S (17)
Step 5.2.5, computing the confidence of each association rule:
Figure FDA0003742350170000065
Step 5.2.6, setting the minimum confidence threshold to
Figure FDA0003742350170000066
for a known frequent item set in X_F,
Figure FDA0003742350170000067
if
Figure FDA0003742350170000068
Figure FDA0003742350170000069
then Tbuilding_s → Location_m is a strong association rule.
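The support and confidence equations in this claim are only available as images in this text. For orientation, the standard association rule definitions that the surrounding wording appears to describe can be written as below; the symbol σ(·) for the transaction count and the threshold names are assumed notation and may differ from the original figures.

```latex
% sigma(X): number of transactions containing item set X (assumed notation)
\mathrm{support}(X_k) = \frac{\sigma(X_k)}{N}

\mathrm{confidence}(A \rightarrow B)
  = \frac{\mathrm{support}(A \cup B)}{\mathrm{support}(A)}
  = \frac{\sigma(A \cup B)}{\sigma(A)}

% A rule A -> B is strong when both thresholds are met
\mathrm{support}(A \cup B) \ge \mathrm{min\_sup}
\quad\text{and}\quad
\mathrm{confidence}(A \rightarrow B) \ge \mathrm{min\_conf}
```

Here A and B stand for the single items on either side of a rule, for example A = Tbuilding_s and B = Location_m, and N is the total number of transactions in the relevant database (D_l or D_a).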
8. The method for analyzing all-purpose card big data according to claim 7, wherein the step S6 comprises:
using the strong association rules to learn the potential influence of gender on canteen choice and, on that basis, providing the canteens with suggestions on meal preparation amounts and meal types: for canteens strongly associated with female or male students, respectively, adding meal types that better match the preferences of that gender, and adjusting the meal preparation amount according to the number of diners of each gender and their portion sizes;
using the strong association rules to learn the potential influence of the teaching building that students enter and leave on canteen choice; for a canteen strongly associated with a teaching building, combining the class start and end times of the courses scheduled that day in the corresponding teaching building with the number of students in those courses to provide the canteen management with suggestions on meal supply times and meal preparation amounts, and providing students with suggestions on choosing a dining place and on off-peak dining.
CN202210820713.7A 2022-07-12 2022-07-12 Analysis method based on all-purpose card big data Pending CN115239106A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210820713.7A CN115239106A (en) 2022-07-12 2022-07-12 Analysis method based on all-purpose card big data

Publications (1)

Publication Number Publication Date
CN115239106A true CN115239106A (en) 2022-10-25

Family

ID=83673416

Country Status (1)

Country Link
CN (1) CN115239106A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116843377A (en) * 2023-07-25 2023-10-03 河北鑫考科技股份有限公司 Consumption behavior prediction method, device, equipment and medium based on big data

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination