CN112685654A - Student identification method and device, computing equipment and readable computer storage medium - Google Patents

Student identification method and device, computing equipment and readable computer storage medium

Info

Publication number
CN112685654A
Authority
CN
China
Prior art keywords
campus
student
roaming
data
base station
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910990107.8A
Other languages
Chinese (zh)
Other versions
CN112685654B (en)
Inventor
钱慧如
郑欢
许乐静
傅泉辉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Mobile Communications Group Co Ltd
China Mobile Group Zhejiang Co Ltd
Original Assignee
China Mobile Communications Group Co Ltd
China Mobile Group Zhejiang Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Mobile Communications Group Co Ltd and China Mobile Group Zhejiang Co Ltd
Priority to CN201910990107.8A
Publication of CN112685654A
Application granted
Publication of CN112685654B
Legal status: Active
Anticipated expiration

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00: Reducing energy consumption in communication networks
    • Y02D30/70: Reducing energy consumption in communication networks in wireless communication networks

Landscapes

  • Telephonic Communication Services (AREA)
  • Mobile Radio Communication Systems (AREA)

Abstract

An embodiment of the invention relates to the field of information technology and discloses a student identification method and device, a computing device, and a computer-readable storage medium. The method comprises: acquiring a campus roaming number set, wherein the set comprises a plurality of roaming numbers of at least one campus, and the time each roaming number has appeared in the corresponding campus does not exceed a preset value; obtaining attribute data of the plurality of roaming numbers, the attribute data of each number including one or more of: the activity record, within a preset time period, of the user corresponding to the roaming number, the real-time mobile phone data stream of that user, and the behavior data of that user; and identifying students from the campus roaming number set based on the attribute data. In this way, embodiments of the invention can improve the accuracy of student number identification.

Description

Student identification method and device, computing equipment and readable computer storage medium
Technical Field
The embodiment of the invention relates to the technical field of information, in particular to a student identification method and device, a computing device and a readable computer storage medium.
Background
Campus marketing is marketing aimed at students. At the start of each school year, campuses see marketing campaigns for products such as phone cards, daily necessities, and school supplies. Most of these campaigns are offline, but with the development of the Internet of Things, online marketing (e.g., via WeChat) has become increasingly popular and is now an important channel for merchants.
Advertising is a common means of marketing goods: it helps consumers learn about products and encourages them to buy. Many merchants promote products through network and television advertisements, billboards, and similar channels, and pushing advertisements to consumers' mobile phones is another option available to them. When merchants run campus marketing campaigns, they push product information to student mobile phone numbers obtained in advance. How accurately students can be identified by mobile phone number therefore directly affects the marketing result.
In the prior art, student mobile phone numbers can be obtained from identity information. For example, a new number appearing on campus is combined with the identity and age information of the registered owner of the number to judge whether the number belongs to a student on that campus. The drawback is that the registered owner may not be the actual user of the phone, or the identity data of the user may be missing, so the number of student mobile phone numbers that can be obtained this way is limited.
Another approach looks for clues in the historical data of a number, for example whether it has dialed a college entrance hotline. However, the number that dialed the hotline may belong to a parent, so the accuracy of student numbers obtained this way is not high.
A third approach analyzes communication relations in the communication data after new numbers arrive on campus, and judges numbers that frequently contact other students to be student numbers. This approach presupposes that the other students have already been accurately identified, so accurate new data is needed before the method can be started effectively.
Disclosure of Invention
In view of the foregoing problems, embodiments of the present invention provide a student identification method and apparatus, a computing device, and a computer storage medium, which overcome the foregoing problems.
According to an aspect of an embodiment of the present invention, there is provided a student identification method, including: acquiring a campus roaming number set, wherein the campus roaming number set comprises a plurality of roaming numbers of at least one campus, and the time of the roaming numbers appearing in the corresponding campus does not exceed a preset value; obtaining attribute data of the plurality of roaming numbers, each of the attribute data including one or more of: the activity record of the user corresponding to each roaming number in a preset time period, the real-time mobile phone data stream of the corresponding user and the behavior data of the corresponding user; identifying students from the campus roaming number set based on the attribute data.
According to another aspect of an embodiment of the present invention, there is provided a student identification apparatus, including: a set acquisition module, configured to acquire a campus roaming number set, wherein the campus roaming number set comprises a plurality of roaming numbers of at least one campus, and the time the roaming numbers have appeared in the corresponding campus does not exceed a preset value; a data obtaining module, configured to obtain attribute data of the roaming numbers, where each attribute data includes one or more of the following: the activity record of the user corresponding to each roaming number in a preset time period, the real-time mobile phone data stream of the corresponding user, and the behavior data of the corresponding user; and an identification module, configured to identify students from the campus roaming number set based on the attribute data.
According to another aspect of embodiments of the present invention, there is provided a computing device including: the system comprises a processor, a memory, a communication interface and a communication bus, wherein the processor, the memory and the communication interface are communicated with each other through the communication bus;
the memory is used for storing at least one executable instruction, and the executable instruction enables the processor to execute the steps of the student identification method.
According to a further aspect of the embodiments of the present invention, there is provided a computer-readable storage medium, wherein at least one executable instruction is stored in the storage medium, and the executable instruction causes the processor to execute the steps of the student identification method.
According to the embodiment of the invention, the campus roaming number set is obtained firstly, each number in the campus roaming number set is identified based on the activity record of the user of the mobile phone number, the real-time mobile phone data stream, the user behavior data and the like, and the student number list is output, so that the accuracy of student number identification can be improved.
The foregoing is only an overview of the technical solutions of the embodiments of the present invention. So that the technical means of the embodiments can be understood more clearly and implemented according to the content of the description, and so that the foregoing and other objects, features, and advantages of the embodiments are easier to understand, specific embodiments of the present invention are described below.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the invention. Also, like reference numerals are used to refer to like parts throughout the drawings. In the drawings:
Fig. 1 is a schematic flowchart of a student identification method according to a first embodiment of the present invention;
Fig. 2 is a schematic flowchart of step S13 of the student identification method according to the first embodiment of the present invention;
Fig. 3 is a schematic flowchart of step S131 of the student identification method according to the first embodiment of the present invention;
Fig. 4 is a schematic flowchart of step S1311 of the student identification method according to the first embodiment of the present invention;
Fig. 5 is a schematic flowchart of step S42 of the student identification method according to the first embodiment of the present invention;
Fig. 6 is a schematic flowchart of step S1312 of the student identification method according to the first embodiment of the present invention;
Fig. 7 is a schematic flowchart of step S62 of the student identification method according to the first embodiment of the present invention;
Fig. 8 is a schematic flowchart of step S622 of the student identification method according to the first embodiment of the present invention;
Fig. 9 is a schematic flowchart of step S1313 of the student identification method according to the first embodiment of the present invention;
Fig. 10 is a flowchart of a student identification method according to a second embodiment of the present invention;
Fig. 11 is a detailed flowchart of step S103 of the student identification method according to the second embodiment of the present invention;
Fig. 12 is a flowchart of a student identification method according to a third embodiment of the present invention;
Fig. 13 is a detailed flowchart of step S123 of the student identification method according to the third embodiment of the present invention;
Fig. 14 is a flowchart of a student identification method according to a fourth embodiment of the present invention;
Fig. 15 is a detailed flowchart of step S143 of the student identification method according to the fourth embodiment of the present invention;
Fig. 16 is a schematic structural diagram of a student identification device according to a fifth embodiment of the present invention;
Fig. 17 is a schematic structural diagram of a computing device according to a sixth embodiment of the present invention.
Detailed Description
Exemplary embodiments of the present invention will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the invention are shown in the drawings, it should be understood that the invention can be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the invention to those skilled in the art.
In a first embodiment of the present invention, fig. 1 shows a flowchart of a student identification method provided in an embodiment of the present invention. As shown in fig. 1, the student identification method includes:
Step S11: acquiring a campus roaming number set.
Specifically, the campus roaming number set consists of numbers that have entered the area where a campus is located and been captured by a base station associated with that campus. Numbers resident on the campus are removed from the captured numbers, and the remaining numbers form the campus roaming number set. The time each roaming number has appeared on the campus does not exceed a preset value, which can be set according to actual conditions, for example half a year. For instance, the numbers captured by the campus-associated base station during the year before the opening day of school are obtained, the numbers that appeared for half a year or more are removed, and the remaining numbers form the campus roaming number set. A resident number is one whose time of presence at the associated base station reaches half a year or more; such a number can be considered to belong to a person who lives on the campus, whereas a number present for less than half a year can be considered a roaming number. The length of time may be continuous or cumulative; for example, a number whose total time of appearance within one year reaches half a year or more can be regarded as a resident number. It should be noted that the associated base station stores the real-time mobile phone data of the captured numbers, and that one base station may be associated with several schools, so the campus roaming number set may include the roaming number sets of several schools. Preferably, the sets are separated by school, that is, each school has its own campus roaming number list. A roaming number may belong to a parent, classmate, or friend of a student, or to a student at the school.
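The filtering in step S11 can be sketched as follows. This is only an illustrative sketch: the function name `campus_roaming_numbers`, the `(number, day)` input format, and the threshold of 183 days standing in for "half a year" are assumptions, not part of the original disclosure.

```python
from collections import defaultdict
from datetime import date, timedelta


def campus_roaming_numbers(sightings, max_days=183):
    """Return numbers seen on fewer than `max_days` distinct days.

    `sightings` is an iterable of (number, day) pairs captured by the
    campus-associated base stations during the look-back window.
    `max_days=183` (about half a year) is an assumed preset value.
    """
    days_seen = defaultdict(set)
    for number, day in sightings:
        days_seen[number].add(day)  # accumulate distinct days of presence
    # Numbers present for half a year or more are treated as resident.
    return {n for n, days in days_seen.items() if len(days) < max_days}


# Hypothetical data: "A" appears on 200 days, "B" on 10 days.
start = date(2019, 1, 1)
sightings = [("A", start + timedelta(days=k)) for k in range(200)]
sightings += [("B", start + timedelta(days=k)) for k in range(10)]
roaming = campus_roaming_numbers(sightings)
```

Counting distinct days implements the cumulative reading of the time length; a continuous-presence criterion would need a different counter.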
Step S12: acquiring attribute data of the plurality of roaming numbers;
Specifically, attribute data is obtained for each number in the campus roaming number set $L_{SchRoam}$. The attribute data may include one or more of: the activity record of the user corresponding to each roaming number in a preset time period, the real-time mobile phone data stream of the user, and the behavior data of the user. That is, the attribute data may be any one of these three items or all three of them, which is not limited herein. Further, the activity record set $R_{histRoam}$ of all numbers in the campus roaming number set $L_{SchRoam}$ for the last year can be extracted from the big data platform. The activity record $r_{i,t,bs}$ corresponding to each number in the activity record set includes: the number $i \in L_{SchRoam}$, the recording time t, the recording base station bs, the base station longitude and latitude (lon/lat), the call/short message identifier actType (call, SMS, Other), the call/short message opposite-end number conNum, and the incoming/outgoing call identifier direction (in/out).
The real-time mobile phone data stream $r_{i,t,grid}$ of the user is obtained from a base station associated with the campus and includes: the number $i \in L_{SchRoam}$, the recording time t, the recording base station bs, the base station longitude and latitude (lon/lat), the accurate positioning grid number grid, the grid center longitude and latitude (grid/grid), the call/short message identifier actType (call, SMS, Other), the call/short message opposite-end number conNum, and the incoming/outgoing call identifier direction (in/out). The real-time mobile phone data stream $R_{RTAccu}$ corresponding to the campus roaming number set comprises the real-time mobile phone data stream $r_{i,t,grid}$ of each number. The behavior data comes from the real-time mobile phone data stream $R_{RT}$ of the campus-associated base station. $R_{RT}$ is matched against the campus roaming number set $L_{SchRoam}$, and numbers that do not belong to the set are removed, giving the mobile phone data stream $R_{RTSchoam}$. $R_{RTSchoam}$ is then classified and summarized by number i to obtain, for each user $i \in L_{SchRoam}$, the short-term mobile phone data stream $R_{i,RTSchoam}$ accumulated on campus. Once $R_{i,RTSchoam}$ has been obtained for each user i, behavior data coding is performed for each mobile phone number: the mobile phone browsing records $r_{i,t,loc,url}$ are extracted from the data stream $R_{i,RTSchoam}$ of user i, each including: the number $i \in L_{SchRoam}$, the recording time t, the recording base station bsid, the base station longitude and latitude (lon/lat), the accessed page classification (pageTypeId), the APP used (appId), the call/short message identifier actType (call, SMS, Other), the call/short message opposite-end number conNum, and the like.
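As a rough illustration of the record layout and of matching the stream against the roaming set, the sketch below models one record as a Python dataclass and groups the filtered stream by number. The class name `ActivityRecord`, the snake_case field names, the string timestamp, and the helper `group_stream_by_number` are all hypothetical; only the field list itself comes from the description.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ActivityRecord:
    number: str               # i, a member of the campus roaming number set
    time: str                 # t, recording time (ISO 8601 string, assumed)
    bs: str                   # recording base station
    lon: float                # base station longitude
    lat: float                # base station latitude
    act_type: str             # actType: "call", "SMS", or "Other"
    con_num: Optional[str]    # conNum: opposite-end number, if any
    direction: Optional[str]  # "in" or "out", if applicable


def group_stream_by_number(stream, roaming_set):
    """Drop records whose number is not in the roaming set, then group
    the remaining records per number (one short-term stream per user)."""
    per_number = defaultdict(list)
    for rec in stream:
        if rec.number in roaming_set:
            per_number[rec.number].append(rec)
    return dict(per_number)


# Hypothetical records: "A" is a roaming number, "X" is not.
stream = [
    ActivityRecord("A", "2019-09-01T10:00:00", "bs1", 120.1, 30.2, "call", "B", "out"),
    ActivityRecord("X", "2019-09-01T10:05:00", "bs1", 120.1, 30.2, "SMS", "A", "in"),
    ActivityRecord("A", "2019-09-01T11:00:00", "bs2", 120.2, 30.3, "Other", None, None),
]
grouped = group_stream_by_number(stream, {"A"})
```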
Step S13: identifying students from the campus roaming number set based on the attribute data;
Specifically, each number in the campus roaming number set is identified based on the attribute data to obtain a corresponding identification result. The identification result includes at least a student number list and may also include a parent number list and an undetermined list, which is not limited herein.
In this embodiment, a campus roaming number set is first obtained, each number in the campus roaming number set is identified based on an activity record of a user of a mobile phone number, a real-time mobile phone data stream, user behavior data, and the like, and a student number list is output, so that accuracy of student number identification can be improved.
In a preferred embodiment of the present invention, the attribute data includes the activity record of the user corresponding to each roaming number in a preset time period, the real-time mobile phone data stream, and the behavior data of the corresponding user. Referring to fig. 2, step S13 includes:
Step S131, identifying numbers of students from the campus roaming number set respectively based on activity records of corresponding users in preset time periods, real-time mobile phone data streams and behavior data of corresponding users to obtain corresponding student lists;
Specifically, student numbers are identified from the campus roaming number set separately on the basis of the activity records in the preset time period, the real-time mobile phone data streams, and the behavior data of the corresponding users, to obtain the corresponding student lists. The student lists may include an identification result based on the activity records, an identification result based on the mobile phone data streams, and an identification result based on the behavior data.
Step S132, merging the obtained student lists, and outputting student identification results;
Specifically, the identification result based on the activity records, the identification result based on the mobile phone data streams, and the identification result based on the behavior data are input to a merging model, which merges them and outputs the student identification result. This yields the final student number list, and the current student number list is updated accordingly. The student identification result may further include a parent number list.
In this embodiment of the invention, number identification is first carried out separately on the activity records, the mobile phone data streams, and the behavior data, and all the identification results are then input into the merging model for learning and training to obtain the final student number list, which can further improve the accuracy and reliability of identification.
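The description does not specify the merging model at this point. Purely as a stand-in for illustration, the sketch below merges the three lists by simple voting; the function name `merge_student_lists` and the `min_votes` threshold are assumptions, and a trained model as described in the text would replace this rule.

```python
def merge_student_lists(lists, min_votes=2):
    """Keep a number if at least `min_votes` of the input lists contain it.

    Voting is an assumed placeholder for the merging model, which the
    description says is obtained by learning and training.
    """
    votes = {}
    for result in lists:
        for number in set(result):  # a list votes at most once per number
            votes[number] = votes.get(number, 0) + 1
    return sorted(n for n, v in votes.items() if v >= min_votes)


# Hypothetical per-source results (activity record, data stream, behavior).
merged = merge_student_lists([["a", "b"], ["b", "c"], ["b", "a"]])
```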
In the embodiment of the present invention, referring to fig. 3, the step S131 includes:
Step S1311, identifying a student number from the campus roaming number set based on the activity record in the preset time period, and obtaining an identification result based on the activity record;
Specifically, student numbers are identified from the campus roaming number set based on the activity records of each number in a preset time period, giving a student number list based on the activity records. The identification operation is performed on each number in the campus roaming number set to obtain a corresponding identification result, and all the identification results form an identification list, which includes a student number list, an accompanying number list for each number, and the like. The preset time period may be set according to the actual situation and is not limited herein, for example half a year, one year, or two years. Preferably, the preset time period is one year.
Step S1312, identifying numbers of students from the campus roaming number set based on the real-time mobile phone data streams to obtain identification results based on the mobile phone data streams;
Specifically, the real-time mobile phone data stream acquired from the base station is used together with the campus roaming number set to identify each number. The identification results of all numbers form the identification result based on the mobile phone data stream, which includes a student number list and an accompanying number list for each student number.
Step S1313: identifying numbers of students from the campus roaming number set based on the behavior data to obtain identification results based on the behavior data;
Specifically, the behavior data acquired from the base station is used together with the campus roaming number set to identify each number. The identification results of all numbers form the identification result based on the behavior data, which includes a student number list and an accompanying number list for each student number.
It should be noted that the order of steps S1311, S1312, and S1313 is not limited: the three steps may be performed in any order (for example S1311, S1313, S1312, or S1312, S1311, S1313) or simultaneously, which is not limited herein.
In the embodiment of the present invention, as shown in fig. 4, step S1311 specifically includes:
Step S41, obtaining the activity record of each number in the campus roaming number set in the preset time period, and summarizing the activity records into an activity record set;
Specifically, the activity record set $R_{histRoam}$ of all numbers in the campus roaming number set $L_{SchRoam}$ of the campus for the last year is first extracted from a big data platform. The activity record $r_{i,t,bs}$ corresponding to each number in the activity record set includes: the number $i \in L_{SchRoam}$, the recording time t, the recording base station bs, the base station longitude and latitude (lon/lat), the call/short message identifier actType (call, SMS, Other), the call/short message opposite-end number conNum, and the incoming/outgoing call identifier direction (in/out). The activity records of each number i are collected into the activity record set $R_{histRoam}$.
Step S42, acquiring base station data corresponding to each number based on the activity record set;
Specifically, each activity record is analyzed to identify the base station data of the corresponding number, for example the residence base station $BS_{i,r}$ and the work-place base station $BS_{i,p}$ of the user corresponding to the number (for students, the work-place base station is the school base station); that is, the location of the user's residence and the location of the user's work place are identified;
Step S43, acquiring numbers appearing in the same base station based on the base station data corresponding to each number to obtain a number set of corresponding base stations;
Specifically, after the work-place base station and the residence base station of the user corresponding to each number have been obtained, the sets of numbers appearing at the same base station are obtained for the work-place base stations and the residence base stations respectively. The number sets comprise target number sets and accompanying number sets, and the target number sets comprise a number set for each residence base station and a number set for each work-place base station. Based on the residence base station of each number, a residence number set is formed for each residence base station, and the residence accompanying number set of each number is obtained from that residence number set. Likewise, based on the work-place base station of each number, a work-place number set is formed for each work-place base station, and the work-place accompanying number set of each number is obtained from that work-place number set. For example, numbers with the same residence base station $BS_{i,r}$ are collected into the number set of that residence base station, and numbers with the same work-place base station $BS_{i,p}$ are collected into the number set of that work-place base station. According to the work-place base station $BS_{i,p}$, the accompanying number set of each number is obtained:
Figure BDA0002237981260000081
According to the residence base station $BS_{i,r}$, the accompanying number set of each number is obtained:
Figure BDA0002237981260000091
It should be noted that the work-place base station, work-place number set, and work-place accompanying number set may be acquired first and the residence base station, residence number set, and residence accompanying number set afterwards, or both may be acquired simultaneously, which is not limited herein.
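The grouping in step S43 can be sketched as below, assuming each number has already been mapped to a single base station (residence or work place). The helper names `numbers_by_station` and `companion_sets` and the dictionary input are hypothetical, and the accompanying set is taken here to be the other numbers at the same base station.

```python
from collections import defaultdict


def numbers_by_station(station_of):
    """Group numbers by base station; `station_of` maps number -> station id."""
    groups = defaultdict(set)
    for number, bs in station_of.items():
        groups[bs].add(number)
    return dict(groups)


def companion_sets(station_of):
    """For each number, the other numbers sharing its base station."""
    groups = numbers_by_station(station_of)
    return {n: groups[bs] - {n} for n, bs in station_of.items()}


# Hypothetical work-place base stations for three numbers.
work_station_of = {"a": "BS1", "b": "BS1", "c": "BS2"}
work_sets = numbers_by_station(work_station_of)
work_companions = companion_sets(work_station_of)
```

The same two helpers apply unchanged to the residence base stations, which is why the order of the two passes does not matter.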
Step S44, acquiring a student number set corresponding to each school and an accompanying number set of each number based on the number sets;
Specifically, a student number set for each school and an accompanying number set for each number are obtained from the number sets: campus base station data obtained in advance is matched against each work-place number set to form the student number set of the corresponding campus, and the accompanying number set of each student on the campus is obtained from that student number set. For example, the number sets sharing the same work-place base station $BS_{i,p}$ are matched against the base stations associated with each school
Figure BDA0002237981260000092
to obtain the student number set corresponding to each school
Figure BDA0002237981260000093
The school may be a university, a middle school, or a primary school; this is not limiting, and preferably the school is a university or a middle school. From the student number set of each campus, the accompanying number set of each student number is obtained
Figure BDA0002237981260000094
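The matching in step S44 can be illustrated with a small sketch. It assumes the campus base station data is available as a mapping from school name to a set of base station identifiers; the function name `students_by_school` and both input formats are hypothetical.

```python
def students_by_school(work_station_of, school_stations):
    """Assign each number to every school whose base stations include
    the number's work-place base station.

    `work_station_of` maps number -> work-place base station id;
    `school_stations` maps school name -> set of associated station ids.
    """
    result = {school: set() for school in school_stations}
    for number, bs in work_station_of.items():
        for school, stations in school_stations.items():
            if bs in stations:  # one base station may serve several schools
                result[school].add(number)
    return result


# Hypothetical data: BS1 serves both schools, BS9 serves neither.
school_sets = students_by_school(
    {"a": "BS1", "b": "BS2", "c": "BS9"},
    {"school_1": {"BS1", "BS2"}, "school_2": {"BS1"}},
)
```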
In a preferable embodiment of this embodiment, after step S44, the method further includes:
Step S45, acquiring a social relationship set based on the student number set and the accompanying number set;
Specifically, a corresponding social relationship set is obtained from the student number set and the accompanying number set of each student. The social relationship set includes a family number set, a classmate number set, and a friend number set;
Further, step S45 specifically includes:
acquiring the family number set of the student based on the residence accompanying number set corresponding to the student's number and the contact number set corresponding to that number;
for example, for each student $i \in L_{school,1}$, the residence accompanying number set is extracted
Figure BDA0002237981260000095
The contact number set of the number i is then extracted
Figure BDA0002237981260000096
The intersection of the residence accompanying number set and the contact number set is taken to obtain the family number set;
extracting the classmate number set of the student from the work-place number set corresponding to the number;
for example, for each student $i \in L_{school,1}$, the classmate number set is extracted
Figure BDA0002237981260000101
obtaining a friend number set based on the classmate number set and the contact number set corresponding to the number;
for example, for each student $i \in L_{school,1}$, the contact number set is extracted from the call relations
Figure BDA0002237981260000102
The intersection of the classmate number set and the contact number set is then taken to obtain the friend number set $F = \cap\{T_i, Com_i\}$.
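The two intersections in step S45 map directly onto set operations. In the sketch below the function name `social_relations` and the dictionary keys are hypothetical; the inputs are the three sets named in the text for one student.

```python
def social_relations(residence_companions, classmates, contacts):
    """Derive the social relationship sets of one student.

    family  = residence accompanying numbers that are also contacts
    friends = classmates that are also contacts
    """
    return {
        "family": residence_companions & contacts,
        "classmates": set(classmates),
        "friends": classmates & contacts,
    }


# Hypothetical sets for one student.
relations = social_relations(
    residence_companions={"p1", "p2", "n1"},  # numbers at the same residence base station
    classmates={"c1", "c2", "c3"},            # numbers at the same school base station
    contacts={"p1", "p2", "c2", "x9"},        # numbers the student has called or texted
)
```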
Specifically, referring to fig. 5, the step S42 specifically includes:
Step S421, respectively extracting, from the activity record set, the activity records during student vacations and the activity records outside student vacations corresponding to one number in the campus roaming number set;
Specifically, the activity records during student vacations are extracted as follows:
From the activity record set $R_{histRoam}$, for the student vacations (winter vacation and/or summer vacation)
Figure BDA0002237981260000103
all record subsets are extracted
Figure BDA0002237981260000104
and from each
Figure BDA0002237981260000105
all numbers are extracted to form a number set
Figure BDA0002237981260000106
where n is the index of the student vacation, and t1 and t2 are the start and end of the corresponding period (e.g., t1 is July 10 and t2 is August 30). For each number $i \in L_{n,hisRoam}$, all records of that number within student vacation n are acquired and divided into records for working-day daytime, working-day night, and public holidays, respectively:
Figure BDA0002237981260000107
Figure BDA0002237981260000108
it should be noted that the vacation time period VnAre defined with respect to students and weekdays are defined with respect to non-students in order to differentiate public holidays (e.g., weekends, legal holidays, etc.).
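The three-way split of a number's vacation records can be sketched as below. The daytime boundaries (08:00 to 20:00), the treatment of weekends as public holidays, and all function and parameter names are assumptions made for illustration; the source does not specify them.

```python
from datetime import datetime

def time_segment(ts, public_holidays=frozenset(), day_start=8, day_end=20):
    """Return k = 1 (working-day daytime), 2 (working-day night) or 3 (public holiday).

    day_start/day_end and the weekend rule are illustrative assumptions.
    """
    if ts.weekday() >= 5 or ts.date() in public_holidays:
        return 3
    return 1 if day_start <= ts.hour < day_end else 2

def split_records(records, **kwargs):
    """Group (timestamp, base_station) records by time segment k."""
    segments = {1: [], 2: [], 3: []}
    for ts, bs in records:
        segments[time_segment(ts, **kwargs)].append((ts, bs))
    return segments
```

Statutory holidays that fall on weekdays can be passed in through `public_holidays` as a set of `date` objects.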
Step S422, obtaining a residence base station corresponding to the number based on the activity record of the student vacation;
specifically, the residence base station corresponding to the number is obtained based on the activity records of the student vacation. For example: first, the base station data of the number during working-day daytime, working-day night and public holidays of the student vacation are respectively obtained. Then, from each of the three data sets, the base station with the largest number of days of occurrence is extracted (i.e., the base station appearing most often in the working-day daytime period, the one appearing most often in the working-day night period, and the one appearing most often in the public holiday period), giving the target base station data for working-day daytime, working-day night and public holidays. The target base station data are compared with the corresponding preset thresholds to obtain comparison results, and the residence base station corresponding to the number is obtained based on the comparison results. For example, from
Figure BDA0002237981260000111
Figure BDA0002237981260000112
the base station with the most days of occurrence in each set
Figure BDA0002237981260000113
and the corresponding number of days
Figure BDA0002237981260000114
are separately acquired. The numbers of days are compared with the corresponding preset thresholds, which are as follows:
Figure BDA0002237981260000115
The corresponding comparison results are obtained as follows:
Figure BDA0002237981260000116
where $Res = 1$ if $D \geq Thre$ and $Res = 0$ if $D < Thre$; Thre is the preset threshold, Res is the comparison result, and D is the number of days. For each number i and each base station that occurred during the comparison
Figure BDA0002237981260000117
the sum $res_{i,j} = \sum_{n} \sum_{k=1,2,3} res_{n,i,k}$ is computed, where k is the time segment type: the vacation is divided into working-day daytime, working-day night and public holiday periods, for which k takes the values 1, 2 and 3, respectively. Then
Figure BDA0002237981260000118
where $BS_{i,r}$ is the residence base station of number i. Preferably, a residence tag with an associated frequency is added to the number, the frequency taking the value:
Figure BDA0002237981260000121
where $j_0$ is the identified residence base station. The specific values of the preset thresholds
Figure BDA0002237981260000122
may be set according to actual conditions, for example according to the length of the vacation period or according to other conditions, and are not limited herein.
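The thresholding and summation in step S422 might look like the sketch below. The input layout, the function name and the final selection rule (pick the base station with the largest summed result) are assumptions, since the selection formula itself is only given in a figure.

```python
from collections import defaultdict

def residence_base_station(vacations, thresholds):
    """Pick a residence base station for one number.

    vacations:  list with one dict per vacation n, mapping segment k (1, 2, 3)
                to a list of (day, base_station) records (hypothetical layout).
    thresholds: dict mapping k to the minimum number of days (Thre).
    """
    res = defaultdict(int)  # res[j]: summed comparison results for base station j
    for segments in vacations:
        for k, records in segments.items():
            days_by_bs = defaultdict(set)
            for day, bs in records:
                days_by_bs[bs].add(day)
            if not days_by_bs:
                continue
            # Base station seen on the most distinct days in this segment.
            top_bs, days = max(days_by_bs.items(), key=lambda kv: len(kv[1]))
            # Res = 1 if D >= Thre, else 0.
            if len(days) >= thresholds[k]:
                res[top_bs] += 1
    # Assumed selection rule: the base station with the largest summed result.
    return max(res, key=res.get) if res else None
```

Returning `None` when no base station passes any threshold leaves the number without a residence base station rather than forcing a weak guess.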
Step S423, obtaining the corresponding workplace base station based on the activity records of the student non-vacation;
specifically, similar to the method in step S422, the corresponding workplace base station is obtained from the activity records of the number during the non-vacation period. Preferably, the non-vacation activity records are divided into working-day daytime, working-day night and public holiday records; the base station appearing most frequently on the campus in each of the three time periods is obtained from these records and compared with a set value, and the workplace (school) base station is obtained according to the comparison result. The specific implementation is consistent with the process of obtaining the residence base station in step S422 and is not described here again. The set value may be set according to the actual situation and is not limited herein.
After the workplace base station and the residence base station of one number are acquired, steps S421 to S423 are executed again to acquire the workplace base station and the residence base station of another number, until the workplace base station and the residence base station of every number in the campus roaming number set have been acquired. It should be noted that, for students, the workplace base station is a school base station. The list of base stations of each school is usually stored on file at the operator and kept accurate and up to date through routine drive tests, CQT (call quality testing) and the like.
Meanwhile, the above steps are divided according to the student vacations mainly because the activities of students and of faculty and staff differ across the three time periods; differentiating the threshold settings used in the three time periods therefore makes it possible to better identify students rather than faculty and staff.
Specifically, referring to fig. 6, the step S1312 specifically includes:
step S61, acquiring a mobile phone data stream corresponding to a number in the campus roaming number set;
specifically, the mobile phone data stream $r_{i,t,grid}$ is obtained from a base station associated with the campus and comprises: the number $i \in L_{SchRoam}$, the recording time t, the recording base station bs, the base station longitude and latitude (lon/lat), the precise-positioning grid number grid, the grid center longitude and latitude (gridlon/gridlat), the call/short message identifier actType (call, SMS, Other), the call/short message opposite-end number conNum, and the incoming/outgoing call identifier direction (in/out). The real-time mobile phone data stream $R_{RTAccuSchRoam}$ corresponding to the campus roaming number set comprises the real-time mobile phone data stream $r_{i,t,grid}$ of each number, $r_{i,t,grid} \in R_{RTAccuSchRoam}$.
Step S62, acquiring position data of a corresponding user based on the mobile phone data stream, and identifying according to the position data to obtain a corresponding identification result;
specifically, position data of a user is obtained according to the obtained mobile phone data stream, and identification is carried out according to the position data to obtain a corresponding identification result;
obtaining the identification result corresponding to one number through the steps S61 and S62, and then repeating the steps S61 and S62 to obtain the identification result of another number until obtaining the identification result of each number in the campus roaming number set, thereby obtaining the student number list.
Specifically, referring to fig. 7, the step S62 specifically includes:
step S621, acquiring position data of the corresponding number in the bedtime period;
specifically, the location data includes location information of the number acquired by the corresponding base station, and since the user may change different locations in a day, there may be a plurality of grids corresponding to the locations, where the location data includes a plurality of corresponding grids (i.e., a plurality of grids in which the number appears), and the grids include grid numbers, grid center longitude and latitude, and other information.
Step S622, matching the corresponding user based on the position data appearing in the bedtime period, obtaining a first matching result;
specifically, the grids in which the number appears during the bedtime period are analyzed and matched with the grids of each dormitory in the campus to obtain the first matching result, which is the dormitory of the user corresponding to the number.
In a preferred example of this embodiment, as shown in fig. 8, step S622 specifically includes:
step S81, acquiring the occurrence frequency of each grid in the bedtime period;
specifically, the grids in which the number appears during the bedtime period and the corresponding numbers of occurrences are obtained from the mobile phone data stream. The bedtime period is the sleeping time set by the school, for example 10 p.m. to 7 a.m.;
step S82, selecting a preset number of grids from the plurality of grids appearing in the bedtime period, wherein the number of occurrences of any selected grid in the bedtime period is greater than the number of occurrences of any unselected grid in the bedtime period;
specifically, because the grids differ in their numbers of occurrences, the grids are sorted by number of occurrences and a preset number of grids with the most occurrences are selected. The preset number may be set according to the actual situation (e.g., 3 or 5) and is not limited herein. For example, if the number appears in 10 grids with differing numbers of occurrences, the grids are sorted by number of occurrences from high to low and the top five are selected;
step S83, matching the selected grids against the grids of each dormitory of the campus to obtain the first matching result;
specifically, each dormitory has a corresponding outline polygon. Each selected grid is matched against the outline polygons by judging whether the grid center longitude and latitude are enclosed by an outline polygon; if so, the grid matches that dormitory.
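Steps S81 to S83 can be sketched as follows. The ray-casting test is one common way to check whether a point lies inside a polygon; the source does not name a specific algorithm, and the function names and data layout here are hypothetical.

```python
from collections import Counter

def top_grids(grid_ids, k=5):
    """Return the k grid ids that occur most often during the bedtime period."""
    return [g for g, _ in Counter(grid_ids).most_common(k)]

def point_in_polygon(lon, lat, polygon):
    """Ray-casting test; polygon is a list of (lon, lat) vertices in order."""
    inside = False
    n = len(polygon)
    for idx in range(n):
        x1, y1 = polygon[idx]
        x2, y2 = polygon[(idx + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside

def match_dormitories(grid_ids, grid_centers, dorm_polygons, k=5):
    """Count, per dormitory, how many of the top-k grids fall inside its outline."""
    matches = Counter()
    for g in top_grids(grid_ids, k):
        lon, lat = grid_centers[g]
        for dorm, polygon in dorm_polygons.items():
            if point_in_polygon(lon, lat, polygon):
                matches[dorm] += 1
    return matches
```

Points lying exactly on a polygon edge are not handled specially in this sketch; a production system would likely rely on a geometry library for that.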
For ease of understanding, the identification process is described in detail below:
because a number may appear in more than one grid during the bedtime period, with differing numbers of occurrences, for the mobile phone data stream records $r_{i,t,grid} \in R_{RTAccuSchRoam}$ of each number, calculate, within the bedtime period
Figure BDA0002237981260000141
the number of occurrences of each grid, and obtain for each number $i \in L_{SchRoam}$ the corresponding bedtime grid activity vector
Figure BDA0002237981260000142
(u is the grid number), each component of which is the number of times number $i \in L_{SchRoam}$ appears in $grid_u$ during the bedtime period. For each number $i \in L_{SchRoam}$, extract every day from
Figure BDA0002237981260000143
the five highest-valued grid numbers $G_{i,s}(1,\dots,5)$. Then, using the outline polygon of each dormitory
Figure BDA0002237981260000144
an inclusion calculation is performed on the grid center longitude and latitude $G_{i,s}(gridlon_{i,s}, gridlat_{i,s})$ of each $G_{i,s}$ (i.e., whether the center longitude and latitude lie inside the outline polygon). If the center is enclosed by the outline polygon, the grid matches the dormitory, and for each successful match 1 is added to the corresponding entry of the dormitory matching count sequence, namely $Dorm_{i,s} = Dorm_{i,s} + 1$, and if
Figure BDA0002237981260000145
Each day, the dormitory s with the highest $Dorm_{i,s}$ value is the best estimate $Dorm_i$ of the dormitory of the user corresponding to number i on that day. Because people move around (e.g., a student may visit other dormitories), the value of $Dorm_i$ may vary and change repeatedly during an initial period (e.g., the first one or two weeks of term), but it stabilizes after one or two weeks. The stabilized $Dorm_i$ is then taken as the output value, and a tag carrying the dormitory number and the corresponding frequency is added to number i, the frequency value being
Figure BDA0002237981260000151
where s is the dormitory number corresponding to $Dorm_i$.
Step S623, acquiring position data of the corresponding number appearing in at least one department activity time period;
specifically, an activity list of each department in the campus is obtained in advance, the activity list comprising information such as the activity time, grid, activity content and organizing department. Position data of the number (such as grids and numbers of occurrences) are obtained based on the mobile phone data stream and the activity times of each department;
step S624, matching the corresponding department based on the acquired position data to obtain a second matching result;
specifically, the obtained grids are matched against the grids where the department of each activity list is located to obtain the corresponding second matching result, which is the department of the user corresponding to the number. For example, the grid in which the number appears is matched against the corresponding department outline polygon to obtain the second matching result.
Preferably, the step S624 specifically includes:
acquiring grid data of the corresponding number appearing in each department activity time period based on the position data;
specifically, grid data appearing in each department activity time period are obtained based on the position data, the grid data comprising the grids and the corresponding numbers of occurrences;
matching the grid with the most occurrences against the position of the corresponding department to obtain the second matching result;
specifically, because the number may appear in more than one grid during the activity time, with differing numbers of occurrences, the grids are sorted by number of occurrences to obtain the grid with the most occurrences for each department activity. These grids are then matched against the positions of the corresponding departments to obtain the department of the user corresponding to the number.
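Selecting the most frequently visited grid within one department activity window could be sketched like this. The record layout (timestamp, grid id) and the function name are hypothetical, and timestamps only need to be mutually comparable.

```python
from collections import Counter

def most_active_grid(records, t_start, t_end):
    """Return (grid_id, occurrences) for the grid seen most often in [t_start, t_end].

    records is a list of (timestamp, grid_id) pairs for one number.
    Returns None if the number was not seen during the activity.
    """
    counts = Counter(grid for t, grid in records if t_start <= t <= t_end)
    if not counts:
        return None
    return counts.most_common(1)[0]
```

The returned grid would then be checked against the outline of the department that organized the activity, in the same way the bedtime grids are checked against dormitory outlines.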
For ease of understanding, the identification process is described in detail below:
a campus activity plan is obtained, comprising a plurality of department activity lists $Act(ActName_h, t_{h,1}, t_{h,2}, Dept_h)$. In each activity time period $(t_{h,1}, t_{h,2})$, for each number $i \in L_{SchRoam}$, each grid (grid id) in which it appeared during the period and the number of occurrences are calculated, and the grid $G_{i,h}$ in which number i was most active during activity h is obtained. Using the department outline polygon corresponding to activity h
Figure BDA0002237981260000161
an inclusion calculation is performed on the grid center longitude and latitude of $G_{i,h}$
Figure BDA0002237981260000162
(i.e., judging whether $G_{i,h}$ is enclosed by the department outline polygon). Enclosure may also be judged from the difference between the distance from $G_{i,h}$ to the center point of the department outline polygon and the maximum distance from that center point to the edges of the polygon (a difference greater than 0 indicates that the grid is outside the polygon; otherwise it is not outside). If the grid is enclosed, the matching is successful, and for each successfully matched activity 1 is added to the department matching sequence, namely
Figure BDA0002237981260000163
and
Figure BDA0002237981260000164
Each day, the
Figure BDA0002237981260000165
with the highest value gives the department $Dept_h$ that is the best estimate $Dept_i$ of the department of the user corresponding to number i. The value of $Dept_i$ may change repeatedly during an initial period (e.g., one week) but stabilizes after a week, because students' activities settle some time after enrollment. Number i is then tagged with the department number, frequency and so on, the frequency being set to
Figure BDA0002237981260000166
where $Dept_h$ is the department number corresponding to $Dept_i$.
Step S625, obtaining an identification result based on the first matching result and the second matching result;
specifically, the identification result of the number is obtained by combining the first matching result and the second matching result. For example, the dormitory of the user corresponding to the number is obtained from the first matching result and the department of the user from the second matching result; on this basis it is determined whether the user is a student, and the corresponding result is output.
In a preferable scheme of this embodiment, after step S625, the method further includes:
acquiring an accompanying number set of students based on the campus roaming number set;
specifically, after a number has been matched with both a department and a dormitory, the user corresponding to the number is identified as a student, and the accompanying number set corresponding to the students is then obtained based on the campus roaming number set;
further, for each number $i \in L_{SchRoam}$, if its associated values $Dorm_i$ and $Dept_i$ are both non-zero, then $i \in L_{School,2}$, and the other numbers in the campus roaming number set form the accompanying number set
Figure BDA0002237981260000171
Meanwhile, for each number $i \in L_{SchRoam}$, the corresponding $Dorm_i$ and $Dept_i$, together with the dormitory frequency
Figure BDA0002237981260000172
and the department frequency
Figure BDA0002237981260000173
are output as the dormitory and department tags of number i; for the numbers $i \in L_{School,2}$, the corresponding
Figure BDA0002237981260000176
and
Figure BDA0002237981260000174
values are all 0.
Specifically, referring to fig. 9, the step S1313 specifically includes:
step S91, acquiring a mobile phone data stream set from a base station associated with a campus;
specifically, a real-time mobile phone data stream set $R_{RT}$ is acquired from the base station associated with the campus. The mobile phone data stream set comprises data stream records of a plurality of numbers, and each data stream record comprises corresponding behavior data;
step S92, matching the mobile phone data stream set with the campus roaming number set to obtain a mobile phone data stream subset corresponding to the campus;
specifically, records in the mobile phone data stream set that correspond to numbers outside the campus roaming number set $L_{SchRoam}$ are first removed. That is, the mobile phone data stream set is matched with the campus roaming number set, for example by comparing each number in $L_{SchRoam}$ with the mobile phone data stream set to obtain the mobile phone data stream data of each number in $L_{SchRoam}$, which yields the mobile phone data stream subset $R_{RTSchRoam}$.
Step S93, obtaining behavior data corresponding to a number from the mobile phone data stream subset;
specifically, the mobile phone data stream of each number $i \in L_{SchRoam}$ is obtained from $R_{RTSchRoam}$:
Figure BDA0002237981260000175
Behavior data encoding is performed for each number i based on the mobile phone data stream data, and the mobile phone browsing records $r_{i,t,loc,url}$ are extracted. A mobile phone browsing record comprises: the number $i \in L_{SchRoam}$, the recording time t (time of occurrence), the recording base station bsid, the base station longitude and latitude (lon/lat), the accessed page data (pageTypeId), the used APP data (appId), the call/short message identifier actType (call, SMS, Other), the call/short message opposite-end number conNum, and the incoming/outgoing call identifier direction (in/out).
Step S94, substituting the behavior data into a pre-established two-dimensional matrix to obtain a behavior matrix corresponding to the number;
specifically, a two-dimensional matrix is established in advance as follows: all accessed page data (pageTypeId) and used APP data (appId) are arranged in order along one axis (the order depends on how pageTypeId and appId are encoded), and the recording base stations bsid are arranged along the other axis, to construct a two-dimensional matrix of the usage behavior of the user of number i, expressed as:
Figure BDA0002237981260000181
The behavior data are substituted into the two-dimensional matrix, so that for each number $i \in L_{SchRoam}$ a corresponding two-dimensional array is constructed:
Figure BDA0002237981260000182
where the rows (1 … c) represent the c sorted appId and pageTypeId values, the columns (1 … m) represent the m sorted recording base stations bsid, and
Figure BDA0002237981260000183
denotes the number of times the user of number i uses the α-th application, or browses the α-th web content category, under the β-th base station.
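Filling the behavior matrix for one number can be sketched as below, with rows indexed by the ordered appId/pageTypeId values and columns by the ordered base stations. The function name, the string encoding of item ids and the (item, base station) record layout are illustrative assumptions.

```python
def build_behavior_matrix(records, item_ids, base_station_ids):
    """Count, per (item, base station) cell, how often the user used the item there.

    records:          list of (item_id, bsid) pairs for one number
    item_ids:         ordered list of c appId / pageTypeId values (rows)
    base_station_ids: ordered list of m bsid values (columns)
    """
    row = {item: a for a, item in enumerate(item_ids)}
    col = {bs: b for b, bs in enumerate(base_station_ids)}
    matrix = [[0] * len(base_station_ids) for _ in item_ids]
    for item, bs in records:
        # Records outside the predefined axes are ignored in this sketch.
        if item in row and bs in col:
            matrix[row[item]][col[bs]] += 1
    return matrix
```

Keeping the row and column order fixed across all numbers is what makes the resulting matrices directly comparable with one another.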
Step S95, carrying out column normalization processing on the behavior matrix to obtain a processed behavior matrix;
specifically, column normalization is performed on each behavior matrix $Act_i$ to obtain the processed behavior matrix
Figure BDA0002237981260000184
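One possible form of the column normalization in step S95 is sketched below. It assumes each column (base station) is divided by its column sum so that the entries of a column add up to 1; the exact formula is only given in the figure, so this choice is an assumption.

```python
def column_normalize(matrix):
    """Divide every entry by its column sum; all-zero columns stay zero.

    Assumption: sum-to-one normalization per column (not confirmed by the source).
    """
    if not matrix:
        return []
    n_cols = len(matrix[0])
    col_sums = [sum(row[b] for row in matrix) for b in range(n_cols)]
    return [
        [row[b] / col_sums[b] if col_sums[b] else 0.0 for b in range(n_cols)]
        for row in matrix
    ]
```

Normalizing per column removes the effect of how much traffic a given base station carries overall, leaving only the relative mix of applications and page categories used there.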
Step S96, calculating corresponding behavior correlation values based on the processed behavior matrix and the standard student behavior matrix;
specifically, at the start of each school term, each school carries out campus marketing campaigns in which the true identities of a portion of the numbers are confirmed, yielding a student list $L_{student}$ and a parent list $L_{family}$. The two lists are input into the two-dimensional matrix to train a student identification model, and the campus roaming number set is matched against the student list and the parent list to obtain the set of campus numbers with uncertain identity
Figure BDA0002237981260000191
Based on the student list $i \in L_{student}$ and the current processed behavior matrix
Figure BDA0002237981260000192
a student behavior model is established:
Figure BDA0002237981260000193
Column normalization is performed on the behavior model matrix to obtain the standard student behavior model matrix
Figure BDA0002237981260000194
Then, for each number of uncertain (to-be-identified) identity $i \in L_{undefined}$, its behavior correlation value with the standard student is calculated from the processed behavior matrix of the number
Figure BDA0002237981260000195
and the processed behavior matrix of the standard student
Figure BDA0002237981260000196
The behavior correlation value is calculated as follows:
Figure BDA0002237981260000197
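The correlation formula itself is only available as a figure. As a stand-in, the sketch below uses the Pearson correlation coefficient between the two flattened matrices; this is an assumption chosen for illustration, not the formula of the source, and the function name is hypothetical.

```python
import math

def behavior_correlation(matrix_a, matrix_b):
    """Pearson correlation between two equally shaped matrices (assumed measure)."""
    x = [v for row in matrix_a for v in row]
    y = [v for row in matrix_b for v in row]
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((p - mean_x) * (q - mean_y) for p, q in zip(x, y))
    std_x = math.sqrt(sum((p - mean_x) ** 2 for p in x))
    std_y = math.sqrt(sum((q - mean_y) ** 2 for q in y))
    if std_x == 0 or std_y == 0:
        return 0.0  # a constant matrix carries no usable pattern
    return cov / (std_x * std_y)
```

Under this assumption, a number whose usage pattern mirrors the standard student matrix scores close to 1, while an unrelated or opposite pattern scores near 0 or below.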
step S97, comparing the corresponding behavior correlation value with a standard correlation value to obtain an identification result corresponding to the number;
specifically, for all student numbers $i' \in L_{student}$ whose identities have been confirmed, the pre-calculated behavior correlation value of each number $i' \in L_{student}$ is obtained
Figure BDA0002237981260000201
(calculated in the same manner as above); take
Figure BDA0002237981260000202
and use $r_{cutoff}$ as the criterion for confirming the numbers to be identified: the behavior correlation value of each number to be identified is compared with this criterion, whether the user corresponding to the number is a student is determined according to the comparison result, and the identification result of each number is obtained. Finally, a student list and an accompanying person list are obtained, the student list being:
Figure BDA0002237981260000203
The accompanying person list is
Figure BDA0002237981260000204
A behavior tag is then attached to each identified student number, i.e.
Figure BDA0002237981260000205
Specifically, the step S132 specifically includes: inputting the recognition result based on the activity record, the recognition result based on the real-time mobile phone data stream and the recognition result based on the behavior data into a merging model for learning training, and outputting a student number list;
further, the merging model is a two-layer neural network model comprising a first-layer neural network and a second-layer neural network. The first-layer neural network comprises three neurons and the second-layer neural network comprises two neurons. The first-layer neural network receives the three identification results, specifically:
Figure BDA0002237981260000206
the two neurons of the second layer comprise two weighting matrices, respectively:
Figure BDA0002237981260000211
the merging model comprises the following structure:
Figure BDA0002237981260000212
Figure BDA0002237981260000213
Figure BDA0002237981260000214
where i denotes an individual, i.e., the number of a given individual; n is the sample size; k is the index of the neuron in the first-layer network; W is the weight of each neuron; a is the output of a neuron; and b denotes the bias vectors of the two layers, which are standard settings in neuron computation;
Figure BDA0002237981260000215
is the activation function of any neuron, where z is a parameter that adjusts the shape of the activation function σ and is a standard setting of neural network activation functions. The cost function is:
Figure BDA0002237981260000216
Figure BDA0002237981260000217
is the identity of the input (i.e., the user of number i) confirmed in the marketing campaign,
Figure BDA0002237981260000218
are the respective bias vectors,
Figure BDA0002237981260000221
denotes the three identification results input into the three neurons of the first-layer neural network, and
Figure BDA0002237981260000222
is the student identity or accompanying-parent identity result: if the marketing campaign confirms that the user is a student, the vector component is_student is 1; if the marketing campaign confirms that the user is an accompanying parent, the vector component is_family is 1; if there is no marketing campaign confirmation, both are 0 (the sample is discarded); and if the marketing campaign feedback confirms both, both are 1 (the sample is also discarded). If the output result is
Figure BDA0002237981260000223
the cutoff for both component decisions is 0.5.
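A forward pass through such a merging model might be sketched as follows. The sigmoid activation, the fully connected first layer, the use of `>=` at the cutoff and every weight value are assumptions for illustration; the actual structure and parameters are defined only in the figures and learned in training.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def layer(inputs, weights, biases):
    """One fully connected layer with sigmoid activation (assumed form)."""
    return [
        sigmoid(sum(w * x for w, x in zip(ws, inputs)) + b)
        for ws, b in zip(weights, biases)
    ]

def merge_predict(results, w1, b1, w2, b2, cutoff=0.5):
    """Combine three identification results into (is_student, is_family) flags."""
    hidden = layer(results, w1, b1)               # first layer: three neurons
    is_student, is_family = layer(hidden, w2, b2) # second layer: two neurons
    return is_student >= cutoff, is_family >= cutoff

def usable_training_label(is_student, is_family):
    """Keep only samples confirmed as exactly one of student / accompanying parent."""
    return (is_student + is_family) == 1
```

`usable_training_label` mirrors the rule that samples with no confirmation (0, 0) or with both confirmations (1, 1) are discarded before training.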
In this embodiment, the training data of the merging model are derived from the confirmed student list and the confirmed parent list fed back by the marketing campaigns, and the model is trained by backpropagation. During the start of term each year, the student list and the parent list are updated every day, and the merging model is retrained daily on these lists to obtain an updated merging model. Training the model on more base data can improve the reliability of its identification.
The merging process is as follows:
for each unidentified number
Figure BDA0002237981260000224
the corresponding three identification results are input into the merging model, the identification is recalculated, and the result is output:
Figure BDA0002237981260000225
A guessed student list is formed from the output results
Figure BDA0002237981260000226
together with a guessed parent list
Figure BDA0002237981260000231
Note that a number may appear in both the guessed student list and the guessed parent list.
After the guessed lists are obtained, the tags of each number in the lists need to be confirmed: the three identification results are compared with the result obtained by inputting them into the merging model, and the tags of the three identification results are selectively output to the final user tags. The specific process is as follows:
confirmation of the accompanying tag:
Figure BDA0002237981260000232
the accompanying tag is retained;
Figure BDA0002237981260000233
& $i \in L_{School,1}$: the accompanying tag is withheld for the time being, i.e., it is not kept in the list but is stored for later use;
confirmation of the dormitory and department tags:
Figure BDA0002237981260000234
the department and dormitory tags are retained;
Figure BDA0002237981260000235
& $i \in L_{School,2}$: none of its tags is used for the time being, i.e., the tags are not kept in the list but are stored for later use;
after the student list and the parent list confirmed by marketing campaign feedback are updated, the corresponding tags need to be confirmed again for the numbers i newly added to the lists;
confirmation of the accompanying tag:
Figure BDA0002237981260000236
the accompanying tag is retained;
Figure BDA0002237981260000237
the corresponding tag is discarded;
confirmation of the dormitory and department tags:
Figure BDA0002237981260000238
the dormitory and department tags are retained;
Figure BDA0002237981260000239
the corresponding tags are discarded;
in this embodiment, a campus roaming number set is first obtained, each number in the campus roaming number set is identified based on an activity record of a user of a mobile phone number, a real-time mobile phone data stream, user behavior data, and the like, and a student number list is output, so that accuracy of student number identification can be improved.
In a second embodiment of the present invention, as shown in fig. 10, a flow chart of a student identification method provided in an embodiment of the present invention is shown, where the student identification method includes:
step S101, acquiring a campus roaming number set.
Specifically, the campus roaming number set is built from the numbers that enter the area where the campus is located and are captured by a campus-associated base station: numbers resident on the campus are removed from the captured numbers, and the remaining numbers form the campus roaming number set. A roaming number is one whose time of appearance on the campus does not exceed a preset value, which can be set according to actual conditions, for example half a year. Further, the numbers captured by the campus-associated base station within the year before the first day of school are obtained, the numbers that appeared for half a year or more are removed, and the remaining numbers form the campus roaming number set. A number resident on the campus is one that has been present at the associated base station for half a year or more and can be considered to belong to a person who lives on the campus permanently, whereas a number present for less than half a year may be considered a roaming number. The duration may be the cumulative time of appearance; for example, a number whose total time of appearance within a year reaches half a year or more can be regarded as a resident number. It should be noted that the associated base station stores the real-time mobile phone data of the captured numbers, and a base station may be associated with multiple schools, i.e., the campus roaming number set may include the roaming number sets of multiple schools. Preferably, the numbers are separated by school, i.e., each school forms its own campus roaming number list. Roaming numbers may belong to students at the school or to their parents, classmates or friends.
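The filtering rule of step S101 can be sketched as below. The input mapping of each captured number to its cumulative days of presence over the past year, the function name, and the 183-day stand-in for half a year are illustrative assumptions.

```python
def campus_roaming_numbers(days_present, resident_threshold_days=183):
    """Return the numbers whose cumulative presence is below the resident threshold.

    days_present: dict mapping number -> days seen at the campus base station
                  during the past year (hypothetical input).
    """
    return {
        number
        for number, days in days_present.items()
        if days < resident_threshold_days  # half a year or more means resident
    }
```

When one base station serves several schools, the same filter would be applied per school so that each school ends up with its own roaming list.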
Step S102, acquiring attribute data of a plurality of roaming numbers;
specifically, attribute data of each number in the campus roaming number set $L_{SchRoam}$ are obtained. The attribute data may be, for example, the activity records of the user corresponding to each roaming number within a preset time period. Further, the activity record set $R_{histRoam}$ of all numbers in the campus roaming number set $L_{SchRoam}$ over the last year may be extracted from a big data platform. The activity record $r_{i,t,bs}$ of each number in the activity record set comprises: the number $i \in L_{SchRoam}$, the recording time t, the recording base station bs, the base station longitude and latitude (lon/lat), the call/short message identifier actType (call, SMS, Other), the call/short message opposite-end number conNum, and the incoming/outgoing call identifier direction (in/out).
Step S103, identifying a student number from the campus roaming number set based on the activity record of a preset time period, and outputting an identification result based on the activity record;
specifically, each number in the campus roaming number set is identified based on the attribute data to obtain a corresponding identification result, where the identification result at least includes a student number list, and may also include a parent number list and an uncertain list, which is not limited herein.
In this embodiment, a campus roaming number set is first obtained, and each number in the set is identified based on the activity record of the mobile phone number user, so that the accuracy of student number identification can be improved.
In this embodiment, referring to fig. 11, the step S103 specifically includes:
step S111, acquiring activity records of each number in the campus roaming number set in a preset time period, and summarizing the activity records into an activity record set;
specifically, the activity record set R_histRoam of all numbers in the campus roaming number set L_SchRoam of the campus over the last year is first extracted from a big data platform, where the activity record r_{i,t,bs} corresponding to each number in the activity record set comprises: the number i ∈ L_SchRoam, recording time t, recording base station bs, base station longitude and latitude (lon/lat), call/short message identifier actType (call, SMS, Other), call/short message opposite terminal number conNum, and incoming/outgoing call identifier direction (in/out); the activity records of each number i are then collected into the activity record set R_histRoam.
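By way of example only, one activity record and the summarizing of records per number may be represented as in the following sketch. The field names follow the record description above (t, bs, lon/lat, actType, conNum, direction); the class name, the Python field spellings, and the grouping helper are illustrative assumptions.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class ActivityRecord:
    number: str      # number i in the campus roaming number set
    t: datetime      # recording time
    bs: str          # recording base station
    lon: float       # base station longitude
    lat: float       # base station latitude
    act_type: str    # "call", "SMS" or "Other"
    con_num: str     # opposite terminal number of the call/short message
    direction: str   # "in" or "out"


def group_by_number(records):
    """Summarize activity records into one list per number."""
    grouped = defaultdict(list)
    for record in records:
        grouped[record.number].append(record)
    return dict(grouped)


records = [
    ActivityRecord("n1", datetime(2019, 9, 2, 8, 0), "BS1", 120.1, 30.2, "call", "n9", "out"),
    ActivityRecord("n2", datetime(2019, 9, 2, 9, 0), "BS2", 120.2, 30.3, "SMS", "n1", "in"),
    ActivityRecord("n1", datetime(2019, 9, 2, 21, 0), "BS3", 120.3, 30.4, "Other", "", "out"),
]
hist = group_by_number(records)
```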
Step S112, acquiring base station data corresponding to each number based on the activity record set;
specifically, each activity record is analyzed to identify the base station data of the corresponding number, for example, to identify the residence base station BS_{i,r} and the work-site base station BS_{i,p} of the user corresponding to the number (for students, the work-site base station is a school base station); that is, the residence location and the work-site location of the user are identified;
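The embodiment does not state how the residence base station and the work-site base station are derived from the activity records. Purely as an assumed heuristic, the sketch below takes the base station seen most often at night (22:00 to 06:00) as BS_{i,r} and the one seen most often during weekday working hours (09:00 to 17:00) as BS_{i,p}; the time windows and the function name are hypothetical.

```python
from collections import Counter
from datetime import datetime


def residence_and_work_bs(visits):
    """visits: iterable of (recording time, base station id) for one number.

    Returns (BS_r, BS_p); either entry is None when no record falls in the
    corresponding time window. The windows are assumptions, not the patent's.
    """
    night = Counter()
    day = Counter()
    for t, bs in visits:
        if t.hour >= 22 or t.hour < 6:
            night[bs] += 1                      # candidate residence station
        elif t.weekday() < 5 and 9 <= t.hour < 17:
            day[bs] += 1                        # candidate work-site station
    bs_r = night.most_common(1)[0][0] if night else None
    bs_p = day.most_common(1)[0][0] if day else None
    return bs_r, bs_p


visits = [
    (datetime(2019, 9, 2, 23, 0), "A"),
    (datetime(2019, 9, 3, 2, 0), "A"),
    (datetime(2019, 9, 3, 10, 0), "B"),
    (datetime(2019, 9, 3, 14, 0), "B"),
    (datetime(2019, 9, 3, 23, 30), "C"),
]
bs_pair = residence_and_work_bs(visits)
```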
step S113, acquiring numbers appearing in the same base station based on the base station data corresponding to each number to obtain a number set of the corresponding base station;
specifically, after the work-site base station and the residence base station of the user corresponding to each number are obtained, the sets of numbers appearing at the same base station are obtained separately for the work-site base stations and the residence base stations. The number sets comprise target number sets and accompanying number sets, and the target number sets comprise a number set for each residence base station and a number set for each work-site base station. Based on the residence base station corresponding to each number, a corresponding residence number set is formed for each residence base station, and the residence accompanying number set corresponding to each number is obtained from the residence number sets. Then, based on the work-site base station corresponding to each number, a corresponding work-site number set is formed for each work-site base station, and the work-site accompanying number set corresponding to each number is obtained from the work-site number sets. For example, the numbers having the same residence base station BS_{i,r} are collected to form the number set corresponding to each residence base station, and the numbers having the same work-site base station BS_{i,p} are collected to form the number set corresponding to each work-site base station. According to the work-site base station BS_{i,p}, an accompanying number set is obtained for each number:
Figure RE-GDA0002280497990000261
According to the residence base station BS_{i,r}, an accompanying number set is obtained for each number:
Figure RE-GDA0002280497990000262
It should be noted that the work-site base stations, work-site number sets, and work-site accompanying number sets may be acquired first, followed by the residence base stations, residence number sets, and residence accompanying number sets; alternatively, the two may be acquired simultaneously, which is not limited herein.
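The grouping of numbers by shared base station may be sketched as follows. Because the formulas for the accompanying number sets are not reproduced in this text, the sketch assumes that the accompanying number set of a number consists of the other numbers sharing its base station; the function names are hypothetical, and the same code applies to residence or work-site base stations.

```python
from collections import defaultdict


def number_sets_by_bs(bs_of_number):
    """Map each base station to the set of numbers assigned to it."""
    sets = defaultdict(set)
    for number, bs in bs_of_number.items():
        if bs is not None:
            sets[bs].add(number)
    return dict(sets)


def accompanying_sets(bs_of_number):
    """For each number, the other numbers that share its base station."""
    sets = number_sets_by_bs(bs_of_number)
    return {
        number: (sets[bs] - {number}) if bs is not None else set()
        for number, bs in bs_of_number.items()
    }


work_bs = {"a": "BS1", "b": "BS1", "c": "BS2", "d": None}
work_sets = number_sets_by_bs(work_bs)
work_companions = accompanying_sets(work_bs)
```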
Step S114, acquiring a student number set corresponding to each school and an accompanying number set of each number based on the number sets;
specifically, the student number set corresponding to each school and the accompanying number set of each number are obtained based on the number sets: campus base station data obtained in advance is matched with each work-site number set to form the student number set corresponding to the campus, and the accompanying number set of each student on the campus is obtained based on the student number set. For example, the number sets formed above from numbers sharing the same work-site base station BS_{i,p} are matched with the base stations associated with the school
Figure BDA0002237981260000263
Obtaining the student number set corresponding to each school
Figure BDA0002237981260000264
The school may be a university, middle school, or primary school, although this is not a limitation and preferably the school is a university or middle school. Obtaining an accompanying number set of each student number according to the student number set and the accompanying number set of each student on the campus
Figure BDA0002237981260000265
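As one possible illustration of step S114, the matching of campus base station data with the work-site number sets may be sketched as below. The sketch assumes that a school's student number set is the union of the work-site number sets of the base stations associated with that school; the dictionary layouts and the function name are hypothetical.

```python
def student_sets_by_school(work_sets, school_bs):
    """work_sets: base station id -> set of numbers working at that station.
    school_bs: school name -> set of base station ids associated with it.
    """
    result = {}
    for school, stations in school_bs.items():
        students = set()
        for bs in stations:
            # Stations with no work-site number set contribute nothing.
            students |= work_sets.get(bs, set())
        result[school] = students
    return result


work_sets = {"BS1": {"a", "b"}, "BS2": {"c"}, "BS3": {"d"}}
school_bs = {"School X": {"BS1", "BS2"}, "School Y": {"BS9"}}
students_by_school = student_sets_by_school(work_sets, school_bs)
```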
Step S115, acquiring a social relationship set based on the student number set and the accompanying number set;
specifically, a corresponding social relationship set is obtained according to the student number set and the accompanying number set of each student. The social relationship set includes: a family number set, a college number set, and a friend number set;
it should be noted that, steps S111 to S115 in this embodiment are the same as steps S41 to S45 shown in fig. 4 in the first preferred embodiment, and the implementation processes and technical effects of the steps are the same, and specific reference is made to the above description, which is not described herein again.
In a third embodiment of the present invention, fig. 12 shows a flow chart of a student identification method provided in an embodiment of the present invention, where the student identification method includes:
step S121, acquiring a campus roaming number set.
Specifically, the campus roaming number set consists of the numbers that enter the area where a campus is located and are captured by a campus-associated base station, excluding the numbers resident on the campus. A roaming number is one whose time of appearance on the campus does not exceed a preset value, which can be set according to actual conditions, for example half a year. Further, the numbers captured by the campus-associated base station within one year before the day school starts are obtained, the numbers that appeared for half a year or more are removed, and the remaining numbers form the campus roaming number set. A number resident on the campus is one whose time of presence at the associated base station reaches half a year or more; such a number can be considered to belong to a person resident on the campus, whereas a number present for less than half a year may be considered a roaming number. The length of time may be a cumulative time of appearance; for example, if the total time of appearance within a year reaches half a year or more, the number can be regarded as a resident number. It should be noted that the associated base station stores the real-time mobile phone data of the captured mobile phone numbers, and that a base station may be associated with multiple schools; that is, the campus roaming number set may include the roaming number sets of multiple schools. Preferably, the sets are distinguished by school, so that each school forms its own campus roaming number list. A roaming number may belong to a parent, classmate, or friend of a student, or to a student at the school.
Step S122, acquiring attribute data of the plurality of roaming numbers;
specifically, attribute data of each number in the campus roaming number set L_SchRoam is obtained, where the attribute data includes the real-time mobile phone data stream of the user. The real-time mobile phone data stream r_{i,t,grid} of the user is obtained from a base station associated with the campus and comprises: the number i ∈ L_SchRoam, recording time t, recording base station bs, base station longitude and latitude (lon/lat), accurate positioning grid number grid, grid center longitude and latitude (grid lon/grid), call/short message identifier actType (call, SMS, Other), call/short message opposite terminal number conNum, and incoming/outgoing call identifier direction (in/out);
step S123, identifying numbers of students from the campus roaming number set based on the real-time mobile phone data streams to obtain identification results based on the mobile phone data streams;
specifically, each number in the campus roaming number set is identified based on the real-time mobile phone data stream, so as to obtain a corresponding identification result, where the identification result at least includes a student number list, and may also include a parent number list and an uncertain list, which is not limited herein.
In this embodiment, a campus roaming number set is first obtained, each number in the campus roaming number set is identified based on a real-time mobile phone data stream, and a student number list is output, so that the accuracy of student number identification can be improved.
Referring to fig. 13, the step S123 specifically includes:
step S1301, acquiring a mobile phone data stream corresponding to a number in the campus roaming number set;
step S1302, acquiring position data of a corresponding user based on the mobile phone data stream, and identifying according to the position data to obtain a corresponding identification result;
An identification result corresponding to one number is obtained through steps S1301 and S1302; steps S1301 and S1302 are then repeated to obtain the identification result of another number, until the identification result of each number in the campus roaming number set has been obtained, thereby obtaining a student number list.
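The repetition of steps S1301 and S1302 over the whole roaming number set may be sketched as a simple loop. The two callables stand in for the stream acquisition and the position-based identification, whose details are given in the first preferred embodiment rather than here; their names and the "student" label are hypothetical.

```python
def identify_all(roaming_set, get_stream, identify):
    """Apply steps S1301 and S1302 to every number in the roaming set."""
    results = {}
    for number in sorted(roaming_set):
        stream = get_stream(number)         # step S1301 (caller-supplied)
        results[number] = identify(stream)  # step S1302 (caller-supplied)
    students = [n for n, label in results.items() if label == "student"]
    return results, students


# Stub callables used only to exercise the loop.
streams = {"n1": ["dormitory", "library"], "n2": ["gate"]}
results, students = identify_all(
    {"n1", "n2"},
    get_stream=lambda n: streams[n],
    identify=lambda s: "student" if "dormitory" in s else "uncertain",
)
```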
It should be noted that, in this embodiment, the specific processes of step S1301 and step S1302 are the same as the specific implementation processes of step S61 and step S62 in the first preferred embodiment, and specific reference may be made to the description of the above embodiments, and details are not repeated here.
In a fourth embodiment of the present invention, fig. 14 shows a flow chart of a student identification method provided in an embodiment of the present invention, where the student identification method includes:
step S141, a campus roaming number set is obtained.
Specifically, the campus roaming number set consists of the numbers that enter the area where a campus is located and are captured by a campus-associated base station, excluding the numbers resident on the campus. A roaming number is one whose time of appearance on the campus does not exceed a preset value, which can be set according to actual conditions, for example half a year. Further, the numbers captured by the campus-associated base station within one year before the day school starts are obtained, the numbers that appeared for half a year or more are removed, and the remaining numbers form the campus roaming number set. A number resident on the campus is one whose time of presence at the associated base station reaches half a year or more; such a number can be considered to belong to a person resident on the campus, whereas a number present for less than half a year may be considered a roaming number. The length of time may be a cumulative time of appearance; for example, if the total time of appearance within a year reaches half a year or more, the number can be regarded as a resident number. It should be noted that the associated base station stores the real-time mobile phone data of the captured mobile phone numbers, and that a base station may be associated with multiple schools; that is, the campus roaming number set may include the roaming number sets of multiple schools. Preferably, the sets are distinguished by school, so that each school forms its own campus roaming number list. A roaming number may belong to a parent, classmate, or friend of a student, or to a student at the school.
Step S142, acquiring attribute data of the plurality of roaming numbers;
specifically, attribute data of each number in the campus roaming number set L_SchRoam is obtained, where the attribute data comprises the behavior data of the user. The behavior data comes from the real-time mobile phone data stream R_RT of the campus-associated base station. The real-time mobile phone data stream R_RT is matched with the campus roaming number set L_SchRoam, and the records of numbers that do not belong to the campus roaming number set are removed to obtain the mobile phone data stream R_RTSchoam. The mobile phone data stream R_RTSchoam is then classified and summarized by number i to obtain, for the user of each number i ∈ L_SchRoam, the short-term mobile phone data stream R_{i,RTSchoam} accumulated on the campus. After the mobile phone data stream R_{i,RTSchoam} of each user i is obtained, behavior data coding is performed for each mobile phone number, and the mobile phone browsing record r_{i,t,loc,url} is extracted from the mobile phone data stream R_{i,RTSchoam} of user i, comprising: the number i ∈ L_SchRoam, recording time t, recording base station bsid, base station longitude and latitude (lon/lat), access page classification (pageTypeId), APP used (appId), call/short message identifier actType (call, SMS, Other), call/short message opposite terminal number conNum, and the like.
Step S143, identifying numbers of students from the campus roaming number set based on the behavior data, and obtaining identification results based on the behavior data;
specifically, the behavior data acquired from the base station is identified with the campus roaming number set to obtain the identification result of each number, the identification results of all numbers form the identification result based on the behavior data, and the identification result comprises a student number list and an accompanying number list of each student number.
In this embodiment, the campus roaming number set is first obtained, each number in the campus roaming number set is identified based on the behavior data of the user, and the student number list is output, so that the accuracy of student number identification can be improved.
Referring to fig. 15, the step S143 specifically includes:
step S151, acquiring a mobile phone data stream set from a base station associated with a campus;
specifically, a real-time mobile phone data stream set R_RT is acquired from a base station associated with the campus, where the mobile phone data stream set comprises data stream records of a plurality of numbers, and each data stream record comprises corresponding behavior data;
step S152, matching the mobile phone data stream set with the campus roaming number set to obtain a mobile phone data stream subset corresponding to the campus;
specifically, the records in the mobile phone data stream set that correspond to numbers not belonging to the campus roaming number set L_SchRoam are first removed, and the mobile phone data stream set is matched with the campus roaming number set; for example, each number in the campus roaming number set L_SchRoam is compared with the mobile phone data stream set to obtain the mobile phone data stream data corresponding to each number in L_SchRoam, thereby obtaining the mobile phone data stream subset R_RSchRoamT.
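The matching in step S152 amounts to keeping only the data stream records whose number belongs to the campus roaming number set. A minimal sketch follows, assuming each record is a dictionary with a "number" key; the record layout and the function name are hypothetical.

```python
def stream_subset(stream_records, roaming_set):
    """Keep the data stream records whose number is in the roaming set."""
    return [rec for rec in stream_records if rec["number"] in roaming_set]


r_rt = [
    {"number": "n1", "bsid": "BS1", "appId": "app_7"},
    {"number": "n5", "bsid": "BS1", "appId": "app_2"},   # not a roaming number
    {"number": "n2", "bsid": "BS2", "pageTypeId": "page_3"},
]
subset = stream_subset(r_rt, roaming_set={"n1", "n2"})
```

Holding the roaming numbers in a set keeps each membership check at constant expected cost, which matters when the data stream is large.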
Step S153, acquiring behavior data corresponding to a number from the mobile phone data stream subset;
specifically, the mobile phone data stream data R_{i,RSchRoamT} of each number i ∈ L_SchRoam is obtained from R_RSchRoamT, behavior data coding is performed for each number i based on the mobile phone data stream data, and the mobile phone browsing record r_{i,t,loc,url} is extracted, where the mobile phone browsing record comprises: the number i ∈ L_SchRoam, recording time t (occurrence time), recording base station bsid, base station longitude and latitude (lon/lat), access page data (pageTypeId), APP usage data (appId), call/short message identifier actType (call, SMS, Other), call/short message opposite terminal number conNum, and incoming/outgoing call identifier direction (in/out).
Step S154, substituting the behavior data into a pre-established two-dimensional matrix to obtain a behavior matrix corresponding to the number;
specifically, a two-dimensional matrix is established in advance as follows: all access page data (pageTypeId) and APP usage data (appId) are arranged in order along one axis (the arrangement depends on the encoding of pageTypeId and appId), and the recording base stations bsid are arranged along the other axis, so as to construct a two-dimensional matrix of the usage behavior of the user with number i, expressed as:
Figure BDA0002237981260000302
By substituting the behavior data into the two-dimensional matrix, a corresponding two-dimensional array is constructed for each number i ∈ L_SchRoam:
Figure BDA0002237981260000303
where the rows (1 … n) represent the n sorted appId and pageTypeId values, the columns (1 … m) represent the m sorted base stations bsid,
Figure BDA0002237981260000304
indicates the number of times that the user of number i uses the α-th application or browses the α-th web content category under the β-th base station.
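The construction of the behavior matrix may be illustrated by the counting sketch below, in which rows are the ordered appId and pageTypeId features and columns are the ordered base stations. Representing a feature as a ("app", id) or ("page", id) pair is an assumption made for the example, as are the function and variable names.

```python
def behavior_matrix(records, features, stations):
    """Count feature usage per base station for one number.

    records: iterable of (feature, bsid) pairs extracted from browsing records.
    features: ordered row labels (appId and pageTypeId values).
    stations: ordered column labels (base station ids).
    """
    row = {f: a for a, f in enumerate(features)}
    col = {b: k for k, b in enumerate(stations)}
    act = [[0] * len(stations) for _ in features]
    for feature, bsid in records:
        # Records with an unlisted feature or station are ignored.
        if feature in row and bsid in col:
            act[row[feature]][col[bsid]] += 1
    return act


features = [("app", "chat"), ("page", "news")]
stations = ["BS1", "BS2"]
records = [
    (("app", "chat"), "BS1"),
    (("app", "chat"), "BS1"),
    (("page", "news"), "BS2"),
    (("app", "unlisted"), "BS1"),
]
act_i = behavior_matrix(records, features, stations)
```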
Step S155, normalizing the behavior matrix to obtain a processed behavior matrix;
specifically, column normalization is performed on each behavior matrix Act_i to obtain a processed behavior matrix
Figure BDA0002237981260000311
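The normalization formula itself is not reproduced in this text. As an assumed reading of "column normalization", the sketch below divides every entry by the sum of its column, so that each non-empty column sums to 1, and leaves all-zero columns at zero; other normalizations (for example by column maximum) would fit the wording equally well.

```python
def column_normalize(act):
    """Divide each entry by its column sum; all-zero columns stay zero."""
    n_rows = len(act)
    n_cols = len(act[0]) if act else 0
    sums = [sum(act[a][b] for a in range(n_rows)) for b in range(n_cols)]
    return [
        [act[a][b] / sums[b] if sums[b] else 0.0 for b in range(n_cols)]
        for a in range(n_rows)
    ]


normalized = column_normalize([[1, 0], [3, 0]])
```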
Step S156, calculating corresponding behavior correlation values based on the processed behavior matrix and the standard student behavior matrix;
specifically, at the start of the school term, a campus marketing campaign is carried out at each school, in which the true identity of a portion of the numbers is confirmed, and a student list L_student and a parent list L_family are derived from the confirmed numbers. The two lists are input into the two-dimensional matrix for training the student identification model, and the campus roaming number set is matched with the student list and the parent list to obtain the set of numbers of uncertain identity on the campus
Figure BDA0002237981260000312
Based on the student list, i ∈ L_student, and the corresponding processed behavior matrices
Figure BDA0002237981260000313
Establishing a student behavior model:
Figure BDA0002237981260000314
the behavior matrix model is subjected to column normalization processing to obtain a standard student behavior model matrix
Figure BDA0002237981260000315
Then, for each number of uncertain (to be identified) identity i ∈ L_undefined, the behavior correlation value between the number and the standard student is calculated from the processed behavior matrix of the number
Figure BDA0002237981260000316
and the processed behavior matrix of the standard student
Figure BDA0002237981260000317
The behavior correlation value is calculated as follows:
Figure BDA0002237981260000321
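Since the correlation formula is not reproduced in this text, the following sketch substitutes an assumed measure: the Pearson correlation coefficient between the two processed matrices, each flattened row by row. It is given only to show how a single correlation value can be obtained from two matrices of the same shape; the measure actually used in the embodiment may differ.

```python
import math


def behavior_correlation(m1, m2):
    """Pearson correlation of two equally shaped matrices (assumed measure)."""
    x = [v for row in m1 for v in row]
    y = [v for row in m2 for v in row]
    if not x or len(x) != len(y):
        raise ValueError("matrices must be non-empty and of the same shape")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant matrix carries no usable pattern
    return cov / math.sqrt(var_x * var_y)


same = behavior_correlation([[1, 0], [0, 1]], [[1, 0], [0, 1]])
opposite = behavior_correlation([[1, 0], [0, 1]], [[0, 1], [1, 0]])
```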
step S157, comparing the corresponding behavior correlation value with the standard correlation value to obtain an identification result corresponding to the number;
specifically, for all student numbers i' ∈ L_student whose identities have been confirmed, the pre-calculated behavior correlation value corresponding to each number i' ∈ L_student is obtained
Figure BDA0002237981260000322
(calculated in the same manner as described above); take
Figure BDA0002237981260000323
With r_cutoff as the reference for confirming the student numbers to be confirmed, the behavior correlation value corresponding to each number to be identified is compared with this reference, whether the user corresponding to the number is a student is identified according to the comparison result, and the identification result corresponding to each number is obtained, finally yielding a student list and an accompanying-person list, where the student list is specifically:
Figure BDA0002237981260000324
the list of accompanying persons is
Figure BDA0002237981260000325
Then attaching a behavior tag to each recognized student number, i.e.
Figure BDA0002237981260000326
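The comparison against r_cutoff may be sketched as follows. The expression defining r_cutoff is not reproduced in this text; the sketch assumes, for illustration only, that r_cutoff is the smallest correlation value among the confirmed student numbers, and that a number to be identified is placed in the student list when its value is at least r_cutoff and in the accompanying-person list otherwise.

```python
def classify_by_cutoff(confirmed_r, candidate_r):
    """confirmed_r: correlation values of confirmed student numbers.
    candidate_r: correlation values of numbers of uncertain identity.
    """
    # Assumed cutoff rule: the weakest confirmed student sets the bar.
    r_cutoff = min(confirmed_r.values())
    students = sorted(n for n, r in candidate_r.items() if r >= r_cutoff)
    companions = sorted(n for n, r in candidate_r.items() if r < r_cutoff)
    return r_cutoff, students, companions


r_cutoff, student_list, companion_list = classify_by_cutoff(
    {"s1": 0.8, "s2": 0.6},
    {"x": 0.7, "y": 0.5, "z": 0.6},
)
```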
It should be noted that the specific implementation process of step S151 to step S157 in this embodiment is the same as the specific process described in the embodiment corresponding to fig. 9, and is not described herein again, and reference may be made to the above description.
In this embodiment, the campus roaming number set is first obtained, each number in the campus roaming number set is identified based on user behavior data, and a student number list is output, so that the accuracy of student number identification can be improved.
Fig. 16 is a schematic structural diagram of a student identification device according to a fifth embodiment of the present invention, and as shown in fig. 16, the device includes: the number acquisition module 161, the data acquisition module 162 connected with the number acquisition module 161, and the identification module 163 connected with the data acquisition module 162, wherein:
the number obtaining module 161 is configured to obtain a campus roaming number set.
Specifically, the campus roaming number set consists of the numbers that enter the area where a campus is located and are captured by a campus-associated base station, excluding the numbers resident on the campus. A roaming number is one whose time of appearance on the campus does not exceed a preset value, which can be set according to actual conditions, for example half a year. Further, the numbers captured by the campus-associated base station within one year before the day school starts are obtained, the numbers that appeared for half a year or more are removed, and the remaining numbers form the campus roaming number set. A number resident on the campus is one whose time of presence at the associated base station reaches half a year or more; such a number can be considered to belong to a person resident on the campus, whereas a number present for less than half a year may be considered a roaming number. The length of time may be a cumulative time of appearance; for example, if the total time of appearance within a year reaches half a year or more, the number can be regarded as a resident number. It should be noted that the associated base station stores the real-time mobile phone data of the captured mobile phone numbers, and that a base station may be associated with multiple schools; that is, the campus roaming number set may include the roaming number sets of multiple schools. Preferably, the sets are distinguished by school, so that each school forms its own campus roaming number list. A roaming number may belong to a parent, classmate, or friend of a student, or to a student at the school.
A data obtaining module 162, configured to obtain attribute data of the multiple roaming numbers;
specifically, attribute data of each number in the campus roaming number set L_SchRoam is obtained. The attribute data may include one or more of: the activity record, within a preset time period, of the user corresponding to each roaming number; the real-time mobile phone data stream of the user; and the behavior data of the user. That is, the attribute data may be any one of these three items or may include all three, which is not limited herein. Further, the activity record set R_histRoam of all numbers in the campus roaming number set L_SchRoam over the last year can be extracted from the big data platform, where the activity record r_{i,t,bs} corresponding to each number in the activity record set comprises: the number i ∈ L_SchRoam, recording time t, recording base station bs, base station longitude and latitude (lon/lat), call/short message identifier actType (call, SMS, Other), call/short message opposite terminal number conNum, and incoming/outgoing call identifier direction (in/out).
The real-time mobile phone data stream r_{i,t,grid} of the user is obtained from a base station associated with the campus and comprises: the number i ∈ L_SchRoam, recording time t, recording base station bs, base station longitude and latitude (lon/lat), accurate positioning grid number grid, grid center longitude and latitude (grid/grid), call/short message identifier actType (call, SMS, Other), call/short message opposite terminal number conNum, and incoming/outgoing call identifier direction (in/out). The real-time mobile phone data stream R_RTAccu corresponding to the campus roaming number set comprises the real-time mobile phone data stream r_{i,t,grid} corresponding to each number. The behavior data comes from the real-time mobile phone data stream R_RT of the campus-associated base station. The real-time mobile phone data stream R_RT is matched with the campus roaming number set L_SchRoam, and the records of numbers that do not belong to the campus roaming number set are removed to obtain the mobile phone data stream R_RTSchoam. The mobile phone data stream R_RTSchoam is then classified and summarized by number i to obtain, for the user of each number i ∈ L_SchRoam, the short-term mobile phone data stream R_{i,RTSchoam} accumulated on the campus. After the mobile phone data stream R_{i,RTSchoam} of each user i is obtained, the mobile phone number is encoded, and the mobile phone browsing record r_{i,t,loc,url} is extracted from the mobile phone data stream R_{i,RTSchoam} of user i, comprising: the number i ∈ L_SchRoam, recording time t, recording base station bsid, base station longitude and latitude (lon/lat), access page classification (pageTypeId), APP used (appId), call/short message identifier actType (call, SMS, Other), call/short message opposite terminal number conNum, and the like.
The identification module 163 is used for identifying students from the campus roaming number set based on the attribute data and outputting identification results;
specifically, each number in the campus roaming number set is identified based on the attribute data to obtain a corresponding identification result, where the identification result at least includes a student number list, and may also include a parent number list and an uncertain list, which is not limited herein.
In this embodiment, a campus roaming number set is first obtained, each number in the campus roaming number set is identified based on an activity record of a user of a mobile phone number, a real-time mobile phone data stream, user behavior data, and the like, and a student number list is output, so that accuracy of student number identification can be improved.
In a preferred scheme of this embodiment, a plurality of databases are preset in the apparatus, including: a middle school campus database (such as campus base station data), a college campus building database (such as dormitory and institution), and a confirmed-identity database (such as identification number).
In an alternative form, the attribute data includes: the activity record, within a preset time period, of the user corresponding to each roaming number; the real-time mobile phone data stream; and the behavior data of the corresponding user. The identification module 163 specifically includes an identification unit and a merging unit, wherein the identification unit specifically comprises a long-term identification model, an activity matching model, and a behavior learning model, and the merging unit includes a merging model connected to each of the long-term identification model, the activity matching model, and the behavior learning model, wherein:
the long-term identification model is used for identifying numbers of students from the campus roaming number set based on the activity records in the preset time period to obtain an identification result based on the activity records;
specifically, numbers of students are identified from the campus roaming number set based on the activity records of each number in a preset time period, and a student number list based on the activity records is obtained. The identification operation is executed on each number in the campus roaming number set to obtain a corresponding identification result, and all the identification results form an identification list, which comprises a student number identification list, an accompanying number list of each number, and the like. The specific value of the preset time period may be set according to the actual situation and is not limited herein, for example half a year, one year, or two years. Preferably, the preset time period is one year. The data that this model requires to be prepared includes: mapping the middle school campuses, establishing data blocks of the middle school campuses, and establishing a list of base station data covering each campus;
the activity matching model is used for identifying numbers of students from the campus roaming number set based on the real-time mobile phone data stream to obtain an identification result based on the mobile phone data stream;
specifically, the identification result of each number is obtained by performing identification on the real-time mobile phone data stream acquired from the base station together with the campus roaming number set, and the identification results of all numbers form the identification result based on the mobile phone data stream, where the identification result comprises a student number list and an accompanying number list of each student number. This model requires the buildings of colleges and universities to be mapped to form data for all college campus buildings, a corresponding list of indoor base station data covering each building to be established, and all activity records during the study period on all college campuses to be stored;
the behavior learning model is used for identifying numbers of students from the campus roaming number set based on behavior data to obtain identification results based on the behavior data;
specifically, identification is performed on the behavior data acquired from the base station together with the campus roaming number set to obtain the identification result of each number, and the identification results of all numbers form the identification result based on the behavior data, where the identification result comprises a student number list and an accompanying number list of each student number. The behavior learning model requires the confirmed-identity information to be supplied in real time;
the merging model is used for merging the obtained identification result based on the activity record, the identification result based on the mobile phone data stream, and the identification result based on the behavior data, outputting the student identification result, obtaining a final student number list, and updating the current student number list; in addition, the student identification result can also comprise a list of parent numbers.
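The embodiment does not specify how the merging model combines the three identification results. As one hypothetical rule, the sketch below keeps a number in the final student number list when at least two of the three models identified it as a student; the voting threshold and the function name are assumptions, and a union, an intersection, or a weighted combination could be substituted.

```python
from collections import Counter


def merge_results(long_term, activity, behavior, min_votes=2):
    """Merge three student number lists by simple voting (assumed rule)."""
    votes = Counter()
    for result in (long_term, activity, behavior):
        votes.update(set(result))  # each model votes at most once per number
    return {number for number, count in votes.items() if count >= min_votes}


final_students = merge_results(
    long_term=["a", "b"],
    activity=["b", "c"],
    behavior=["b", "a"],
)
```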
Preferably, the long-term identification model is specifically used for:
acquiring activity records of each number in the campus roaming number set in a preset time period, and summarizing the activity records into an activity record set;
Specifically, the activity record set R_{histRoam} of all numbers in the campus roaming number set L_{SchRoam} over the last year is first extracted from the big data platform. Each activity record r_{i,t,bs} in the set comprises: the number i ∈ L_{SchRoam}, the recording time t, the recording base station bs, the base station longitude and latitude (lon/lat), the call/short message identifier actType (call, SMS, Other), the call/short message opposite-end number conNum, and the incoming/outgoing call identifier direction (in/out). The activity records of every number i are aggregated into the activity record set R_{histRoam}.
Acquiring base station data corresponding to each number based on the activity record set;
Specifically, each activity record is analyzed to identify the base station data of the corresponding number, for example the residence base station BS_{i,r} and the work place base station BS_{i,p} of the user corresponding to the number (for students, the work place base station is the school base station); that is, the residence location and the work place location of the user are identified;
acquiring numbers appearing in the same base station based on the base station data corresponding to each number to obtain a number set of the corresponding base station;
Specifically, after the work place base station and the residence base station of the user corresponding to each number are obtained, the sets of numbers appearing at the same base station are obtained from them. The number sets comprise target number sets and companion number sets, the target number sets being the number set of each residence base station and the number set of each work place base station. A residence number set is formed for each residence base station from the residence base station of each number, and the residence companion number set of each number is obtained from that residence number set; then a work place number set is formed for each work place base station from the work place base station of each number, and the work place companion number set of each number is obtained from that work place number set. For example, the numbers having the same residence base station BS_{i,r} are collected to form the number set of each residence base station, and the numbers having the same work place base station BS_{i,p} are collected to form the number set of each work place base station. According to the work place base station BS_{i,p}, a companion number set is obtained for each number
Figure BDA0002237981260000361
According to the residence base station BS_{i,r}, a companion number set is obtained for each number
Figure BDA0002237981260000362
It should be noted that, the working base station and the working number set and the working accompanying number set may be acquired first, and then the residential base station, the residential number set and the residential accompanying number set may be acquired, or both may be performed simultaneously, which is not limited herein.
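The grouping described above can be illustrated with a minimal sketch. It assumes that each number has already been assigned a single base station (work place or residence); the function name `companion_sets` and the sample identifiers are hypothetical and do not come from the original text.

```python
from collections import defaultdict


def companion_sets(base_station_of):
    """Map each number to the other numbers that share its base station.

    base_station_of: dict {number: base_station_id} (hypothetical input shape).
    """
    members = defaultdict(set)
    for number, bs in base_station_of.items():
        members[bs].add(number)          # number set of each base station
    # The companion set of a number is its base station's set minus itself.
    return {number: members[bs] - {number}
            for number, bs in base_station_of.items()}


# Illustrative data only: A and B share a campus base station.
work_bs = {"A": "bs_campus_1", "B": "bs_campus_1", "C": "bs_office_9"}
work_companions = companion_sets(work_bs)
```

The same function can be applied once to the work place base stations and once to the residence base stations, in either order, which matches the note that the order of the two acquisitions is not limited.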
Acquiring a student number set corresponding to each school and an accompanying number set of each number based on the number sets;
Specifically, the student number set of each school and the companion number set of each number are obtained from the number sets: campus base station data obtained in advance are matched against each work place number set to form the student number set of the campus, and the companion number set of each student on the campus is obtained from the student number set. For example, the number sets sharing the same work place base station BS_{i,p} described above are matched against the base stations associated with each school
Figure BDA0002237981260000371
Obtaining the student number set corresponding to each school
Figure BDA0002237981260000372
The school may be a university, middle school, or primary school, although this is not a limitation and preferably the school is a university or middle school. Obtaining an accompanying number set of each student number according to the student number set and the accompanying number set of each student on the campus
Figure BDA0002237981260000373
In a preferred aspect of this embodiment, the method is further configured to:
acquiring a social relationship set based on the student number set and the accompanying number set;
Specifically, a corresponding social relationship set is obtained from the student number set and the companion number set of each student. The social relationship set includes: a family number set, a classmate number set and a friend number set;
in a further preferred embodiment of this embodiment, a specific implementation process for acquiring base station data corresponding to each number based on the active record set is as follows:
respectively extracting the activity record of the student vacation period and the activity record of the student non-vacation period corresponding to one number in the campus roaming number set from the activity record set;
specifically, the activity record of the student vacation and the activity record of the student non-vacation corresponding to one number in the campus roaming number set are respectively extracted from the activity record set, and the activity record process of the student vacation is extracted as follows:
From the activity record set R_{histRoam}, for each student vacation period (winter vacation and/or summer vacation)
Figure BDA0002237981260000374
the subset of all records falling within it is extracted:
Figure BDA0002237981260000375
From each subset
Figure BDA0002237981260000376
all numbers are extracted to form a number set
Figure BDA0002237981260000377
For each number
Figure BDA0002237981260000378
all records of the number during the student vacation are obtained, and the records are divided into records for workday daytime, workday night and public holidays, respectively:
Figure BDA0002237981260000381
It should be noted that the vacation time periods V_n are defined with respect to students, whereas workdays are defined with respect to non-students, so as to distinguish them from public holidays (e.g., weekends and statutory holidays).
Obtaining a residence base station corresponding to the number based on the activity record of the student vacation;
Specifically, the residence base station corresponding to the number is obtained from the activity records of the student vacation as follows: first, the base station data of the number for workday daytime, workday night and public holidays during the vacation are acquired; then, for each of these three periods, the base station on which the number appeared on the largest number of days is extracted as the target base station of that period; the number of days of each target base station is compared with the corresponding preset threshold to obtain a comparison result; and the residence base station of the number is obtained from the comparison results. For example, separately acquire, from
Figure BDA0002237981260000382
Base station with the most days in each set
Figure BDA0002237981260000383
And the corresponding number of days
Figure BDA0002237981260000384
Comparing the corresponding days with corresponding preset thresholds respectively, wherein the corresponding preset thresholds are as follows:
Figure BDA0002237981260000385
the corresponding comparison results are obtained as follows:
Figure BDA0002237981260000386
wherein Res = 1 if D ≥ Thre and Res = 0 if D < Thre, where Thre is the preset threshold, Res is the comparison result, and D is the number of days. For each number i and all base stations that occurred during the comparison
Figure BDA0002237981260000387
Summing
Figure BDA0002237981260000388
Then
Figure BDA0002237981260000389
wherein BS_{i,r} is the residence base station of number i. Preferably, a residence frequency tag is also attached to the number, its value being:
Figure BDA0002237981260000391
The specific values of the preset thresholds
Figure BDA0002237981260000392
may be set according to actual conditions, for example according to the length of the vacation or according to other conditions, and are not limited here.
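The threshold comparison can be sketched as follows. The exact summation and selection formulas are given only in the figures, so this sketch assumes that the comparison results Res are summed per base station over the three periods and that the base station with the largest sum is taken as the residence base station. The function name, period keys and sample values are hypothetical.

```python
from collections import Counter


def residence_base_station(days_by_period, thresholds):
    """Pick a residence base station from per-period day counts.

    days_by_period: {period: {base_station: days_seen}} for vacation records.
    thresholds:     {period: Thre}, the preset threshold of each period.
    Assumption: Res values are summed per base station and the largest wins.
    """
    votes = Counter()
    for period, days in days_by_period.items():
        if not days:
            continue
        # Base station with the most days in this period, and its day count D.
        top_bs, top_days = max(days.items(), key=lambda kv: kv[1])
        res = 1 if top_days >= thresholds[period] else 0   # Res = 1 iff D >= Thre
        votes[top_bs] += res
    if not votes or max(votes.values()) == 0:
        return None                      # no period passed its threshold
    return max(votes.items(), key=lambda kv: kv[1])[0]


# Illustrative data only.
vacation_days = {
    "workday_day": {"bs1": 12, "bs2": 3},
    "workday_night": {"bs1": 20},
    "holiday": {"bs3": 5, "bs1": 4},
}
thresholds = {"workday_day": 10, "workday_night": 10, "holiday": 6}
home = residence_base_station(vacation_days, thresholds)
```

Returning `None` when no period reaches its threshold is a design choice of this sketch; the original text does not state what happens in that case.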
Obtaining a corresponding work place base station based on the activity record of the non-vacation period of the student;
Specifically, similarly to the acquisition of the residence base station, the work place base station is obtained from the non-vacation activity records of the number. Preferably, the non-vacation activity records are divided into three parts, namely the activity records of workday daytime, workday night and public holidays; for each part, the base station on which the number appeared most frequently on campus is obtained; the obtained base stations are then compared with the respective set values, and the work place (school) base station is obtained from the comparison results. The set values may be chosen according to the actual situation and are not limited here.
After the work place base station and the residence base station of one number are obtained, the same acquisition is performed for the user of the next number, until the work place base station and the residence base station of every number in the campus roaming number set are obtained. It should be noted that, for students, the work place base station is the school base station.
In a further preferred embodiment of this embodiment, a specific implementation process for obtaining the social relationship set based on the student number set and the companion number set is as follows:
acquiring a family number set of the student based on the residence companion number set corresponding to the student's number and the contact number set corresponding to the number;
for example, for each student i ∈ L_{School,1}, the residence companion number set is extracted
Figure BDA0002237981260000393
The contact number set of the number i is then extracted
Figure BDA0002237981260000394
The intersection of the residence companion number set and the contact number set is taken to obtain the family number set;
extracting a classmate number set of the student from a workplace number set corresponding to the number;
for example, for each student i ∈ L_{School,1}, the classmate number set is extracted
Figure BDA0002237981260000401
Obtaining a friend number set based on the classmate number set and a contact number set corresponding to the number;
for example, for each student i ∈ L_{School,1}, the contact number set is extracted using the call relationships
Figure BDA0002237981260000402
The intersection of the classmate number set and the contact number set is taken to obtain the friend number set F = ∩{T_i, Com_i}.
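The two intersections can be illustrated with a minimal sketch using Python sets. The function name `social_relations` and the sample numbers are hypothetical; the input sets are assumed to have been computed as described above.

```python
def social_relations(residence_companions, classmates, contacts):
    """Derive the social relationship set of one student from three number sets."""
    return {
        # Family: numbers that share the residence base station AND are contacts.
        "family": residence_companions & contacts,
        # Classmates: taken directly from the work place (school) number set.
        "classmates": set(classmates),
        # Friends: classmates that are also contacts.
        "friends": classmates & contacts,
    }


# Illustrative data only.
relations = social_relations(
    residence_companions={"P1", "P2", "N9"},
    classmates={"C1", "C2", "C3"},
    contacts={"P1", "C2", "X7"},
)
```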
In a preferred embodiment of this embodiment, the activity matching model is specifically configured to: acquiring a mobile phone data stream corresponding to a number in a campus roaming number set;
Specifically, the mobile phone data stream r_{i,t,grid} is obtained from a base station associated with the campus and comprises: the number i ∈ L_{SchRoam}, the recording time t, the recording base station bs, the base station longitude and latitude (lon/lat), the precise positioning grid number grid, the grid center longitude and latitude (gridlon/gridlat), the call/short message identifier actType (call, SMS, Other), the call/short message opposite-end number conNum, and the incoming/outgoing call identifier direction (in/out). The real-time mobile phone data stream R_{RTAccuSchRoam} corresponding to the campus roaming number set comprises the real-time mobile phone data stream of each number, r_{i,t,grid} ∈ R_{RTAccuSchRoam}
Acquiring position data of a corresponding user based on the mobile phone data stream, and identifying according to the position data to obtain a corresponding identification result;
specifically, position data of a user is obtained according to the obtained mobile phone data stream, and identification is carried out according to the position data to obtain a corresponding identification result;
and after the identification result of one number is obtained, executing the same identification operation on the next number to obtain a corresponding identification result until the identification result of each number in the campus roaming number set is obtained, and obtaining a student number list.
In a further preferred embodiment of this embodiment, the position data of the corresponding user is obtained based on the mobile phone data stream, and the identification is performed according to the position data, and the specific implementation process of obtaining the corresponding identification result is as follows:
acquiring position data of the corresponding number in a sleeping time period;
specifically, the location data includes location information of the number acquired by the corresponding base station, and since the user may change different locations in a day, there may be a plurality of grids corresponding to the locations, where the location data includes a plurality of corresponding grids (i.e., a plurality of grids in which the number appears), and the grids include grid numbers, grid center longitude and latitude, and other information.
Matching the corresponding user based on the position data appearing in the bedtime period to obtain a first matching result;
Specifically, the grids in which the number appeared during the bedtime period are analyzed and matched against the grids of each dormitory on the campus to obtain the first matching result, which is the dormitory of the user corresponding to the number.
Matching the corresponding department based on the acquired position data to obtain a second matching result;
specifically, the obtained grids are matched against the grids in which the department of each activity entry is located to obtain the second matching result, which is the department of the user corresponding to the number; for example, the grids in which the number appeared are matched against the corresponding department outline polygon.
Obtaining an identification result based on the first matching result and the second matching result;
specifically, the identification result of the number is obtained by combining the first matching result and the second matching result, for example, the dormitory where the user corresponding to the number is located is obtained according to the first matching result, the institution where the user is located is obtained by combining the second matching result, whether the user is a student is determined, and the corresponding result is output.
Acquiring a companion number set of students based on the campus roaming number set;
Specifically, after the number has been matched with a department and a dormitory, the user corresponding to the number is identified as a student, and the companion number set of that student is then obtained from the campus roaming number set;
Further, the associated values Dorm_i and Dept_i of each number i ∈ L_{SchRoam} are collated; if neither value is 0, then i ∈ L_{School,2}, and the other numbers in the campus roaming number set are combined into a companion number set
Figure BDA0002237981260000411
At the same time, for each number i ∈ L_{SchRoam}, the corresponding Dorm_i and Dept_i, together with the dormitory frequency
Figure BDA0002237981260000412
and the department frequency
Figure BDA0002237981260000413
Two values are output as dormitory and institution labels of number i, i belongs to LSchool,2Corresponding to the number of (1)
Figure BDA0002237981260000414
And
Figure BDA0002237981260000415
the values are all 0.
In a further preferred embodiment of this embodiment, the specific implementation process of obtaining the first matching result based on matching the corresponding user with the location data appearing in the bedtime period is as follows:
acquiring the occurrence frequency of each grid in a sleeping time period;
Specifically, the grids in which the number appears during the bedtime period and the corresponding numbers of appearances are obtained from the mobile phone data stream; the bedtime period is the bedtime set by the school, for example from 10 o'clock at night to 7 o'clock in the morning;
selecting a preset number of grids from a plurality of grids appearing in a bedtime period, wherein the frequency of the appearance of any selected grid in the bedtime period is more than the frequency of the appearance of any grid which is not selected in the plurality of grids in the bedtime period;
Specifically, because the grids differ in their numbers of appearances, the grids are sorted by number of appearances and a preset number of the most frequent grids is selected. The preset number, such as 3 or 5, may be set according to the actual situation and is not limited here. For example, if the number appeared in 10 grids, the grids are sorted by number of appearances from high to low and the top five are selected;
Matching the selected grids against the grids of each dormitory on the campus to obtain the first matching result;
specifically, because each dormitory location has a corresponding outline polygon, each selected grid is matched against the outline polygon by judging whether the grid's center longitude and latitude is enclosed by the polygon; if so, the grid matches the dormitory.
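The selection of the most frequent grids and the containment judgment can be sketched as below. The original does not say how containment is computed, so the ray-casting test used here is an assumption, as are the function names, the planar treatment of longitude/latitude and the sample coordinates.

```python
from collections import Counter


def point_in_polygon(lon, lat, polygon):
    """Ray-casting containment test; polygon is a list of (lon, lat) vertices."""
    inside = False
    n = len(polygon)
    for k in range(n):
        x1, y1 = polygon[k]
        x2, y2 = polygon[(k + 1) % n]
        # Does this edge cross the horizontal line through the point?
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def match_dormitories(grid_hits, grid_centers, dorm_polygons, top_n=5):
    """Return the dormitories whose outline polygon encloses a top-N grid center.

    grid_hits:     list of grid ids observed in the bedtime period (one per record).
    grid_centers:  {grid_id: (gridlon, gridlat)}.
    dorm_polygons: {dorm_id: [(lon, lat), ...]} outline polygons.
    """
    top_grids = [g for g, _ in Counter(grid_hits).most_common(top_n)]
    matched = set()
    for g in top_grids:
        lon, lat = grid_centers[g]
        for dorm, polygon in dorm_polygons.items():
            if point_in_polygon(lon, lat, polygon):
                matched.add(dorm)
    return matched


# Illustrative data only: two adjacent unit squares as dormitory outlines.
hits = ["g1", "g1", "g2", "g3", "g1", "g2"]
centers = {"g1": (0.5, 0.5), "g2": (5.0, 5.0), "g3": (1.5, 0.5)}
dorms = {
    "dorm_A": [(0, 0), (1, 0), (1, 1), (0, 1)],
    "dorm_B": [(1, 0), (2, 0), (2, 1), (1, 1)],
}
top2 = match_dormitories(hits, centers, dorms, top_n=2)
top3 = match_dormitories(hits, centers, dorms, top_n=3)
```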
For ease of understanding, the identification process is described in detail below:
Because a number may appear in more than one grid during the bedtime period, with differing numbers of appearances, the mobile phone signal stream records r_{i,t,grid} ∈ R_{RTAccuSchRoam} of each number are used to count, within the bedtime period
Figure BDA0002237981260000421
the number of times the number appears in each grid, and for each number i ∈ L_{SchRoam} a corresponding bedtime grid activity vector is obtained:
Figure BDA0002237981260000422
The activity vector gives, for number i ∈ L_{SchRoam}, the number of occurrences in each grid_u during the bedtime period (u denotes the grid number). For each number i ∈ L_{SchRoam}, every day, from
Figure BDA0002237981260000423
the five grid numbers with the highest values, G_{i,s}(1,...,5), are extracted. Then, using the outline polygon of each dormitory
Figure BDA0002237981260000431
a containment calculation is performed on the grid center longitude and latitude (gridlon_{i,s}, gridlat_{i,s}) of each G_{i,s}, i.e. whether the center longitude and latitude lies inside the outline polygon. If the center longitude and latitude is enclosed by the outline polygon, the grid matches the dormitory, and the match counter of each successfully matched dormitory is incremented by 1, i.e. Dorm_{i,s} = Dorm_{i,s} + 1, and
Figure BDA0002237981260000432
Each day, the dormitory with the highest Dorm_{i,s} value is the best estimate Dorm_i of the dormitory of the user corresponding to number i for that day. Because people move about (for example, a student may visit other dormitories), the value of Dorm_i may fluctuate for a period of time (e.g., the first one or two weeks of term), but it settles within one or two weeks, at which point Dorm_i is taken as the output value and a label is added to number i; the label carries the dormitory number and the corresponding frequency, the frequency value being
Figure BDA0002237981260000433
where s is the dormitory number corresponding to Dorm_i.
In a further preferred embodiment of this embodiment, the specific implementation process of obtaining the second matching result based on the obtained location data matching the corresponding department is as follows:
acquiring grid data of the corresponding number for each department activity time period based on the position data;
specifically, grid data appearing in each department activity time period is obtained based on position data, wherein the grid data comprises grids and corresponding appearance times;
matching the grid with the largest number of appearances against the position of the corresponding department to obtain a second matching result;
specifically, because more than one grid may appear during an activity time period, with differing numbers of appearances, the grids are sorted by number of appearances to obtain the most frequent grid for each department activity; each such grid is then matched against the position of the corresponding department to obtain the department of the user corresponding to the number.
For ease of understanding, the identification process is described in detail below:
A campus activity schedule is obtained, comprising a plurality of department activity entries Act(ActName_h, t_{h,1}, t_{h,2}, Dept_h). In each activity time period (t_{h,1}, t_{h,2}), for each number i ∈ L_{SchRoam}, each grid gridId in which the number appeared during the period and the corresponding number of appearances are calculated, and the grid G_{i,h} in which number i was most active during activity h is extracted. Using the department outline polygon corresponding to activity h
Figure BDA0002237981260000441
for the grid center longitude and latitude of G_{i,h}
Figure BDA0002237981260000442
the containment relationship is calculated to judge whether G_{i,h} is enclosed by the department outline polygon. Alternatively, the judgment can be made from the difference between the distance from G_{i,h} to the center point of the department outline polygon and the maximum distance from that center point to the edges of the polygon (a difference greater than 0 indicates that the grid is outside the polygon; otherwise it is not outside). If the grid is enclosed, the match is successful, and the department match counter of each successfully matched activity is incremented by 1, namely
Figure BDA0002237981260000443
and
Figure BDA0002237981260000444
Each day, the department Dept_h with the highest value of
Figure BDA0002237981260000445
is the best estimate Dept_i of the department of the user corresponding to number i. The value of Dept_i may fluctuate during an initial period, such as the first week, but settles after about a week, because students' activities stabilize after a period of time. A label is then added to number i; the label carries the department identifier, the frequency and the like, and the frequency takes the value
Figure BDA0002237981260000446
where Dept_h is the department number corresponding to Dept_i.
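The alternative distance-based judgment mentioned above can be sketched as follows. This is only one reading of the text: the center point is assumed to be the mean of the polygon vertices, distances are computed in plain coordinate units, and the maximum distance to the edges is taken at the farthest vertex. The function name and sample polygon are hypothetical.

```python
import math


def outside_by_distance(point, polygon):
    """Return True when the point lies farther from the polygon's center
    than the farthest vertex does (difference > 0 means "outside")."""
    cx = sum(x for x, _ in polygon) / len(polygon)   # assumed center: vertex mean
    cy = sum(y for _, y in polygon) / len(polygon)
    max_radius = max(math.hypot(x - cx, y - cy) for x, y in polygon)
    difference = math.hypot(point[0] - cx, point[1] - cy) - max_radius
    return difference > 0


# Illustrative data only: a 2 x 2 square department outline.
square = [(0, 0), (2, 0), (2, 2), (0, 2)]
far_point_outside = outside_by_distance((5.0, 5.0), square)
center_outside = outside_by_distance((1.0, 1.0), square)
near_edge_outside = outside_by_distance((2.3, 1.0), square)
```

Note that this check is coarser than the containment calculation: a point just beyond an edge, such as (2.3, 1.0) here, still falls within the maximum radius and is therefore reported as not outside, which is consistent with the wording "otherwise it is not outside" but not equivalent to being enclosed.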
In a preferred aspect of this embodiment, the behavior learning model is specifically configured to:
acquiring a mobile phone data stream set from a base station associated with a campus;
Specifically, a real-time mobile phone data stream set R_{RT} is acquired from the base stations associated with the campus; the set comprises the data stream records of a plurality of numbers, and each data stream record comprises the corresponding behavior data;
matching the mobile phone data stream set with the campus roaming number set to obtain a mobile phone data stream subset corresponding to the campus;
Specifically, the records in the mobile phone data stream set that correspond to numbers outside the campus roaming number set L_{SchRoam} are first removed; that is, the mobile phone data stream set is matched against the campus roaming number set, for example by comparing each number in L_{SchRoam} with the mobile phone data stream set, so that the data stream records corresponding to each number in L_{SchRoam} are retained and the mobile phone data stream subset R_{RTSchRoam} is obtained.
Acquiring behavior data corresponding to a number from a mobile phone data stream subset;
Specifically, the mobile phone data stream of each number i ∈ L_{SchRoam} is obtained from R_{RTSchRoam}
Figure BDA0002237981260000451
Behavior data coding is performed for each number i based on the mobile phone data stream, and the mobile phone browsing records r_{i,t,loc,url} are extracted. Each browsing record comprises: the number i ∈ L_{SchRoam}, the recording (occurrence) time t, the recording base station bsid, the base station longitude and latitude (lon/lat), the access page data (pageTypeId), the used APP data (appId), the call/short message identifier actType (call, SMS, Other), the call/short message opposite-end number conNum, and the incoming/outgoing call identifier direction (in/out).
Substituting the behavior data into a pre-established two-dimensional matrix to obtain a behavior matrix corresponding to the number;
Specifically, a two-dimensional matrix is established in advance as follows: all access page data (pageTypeId) and used APP data (appId) are arranged in sequence along one axis (the order depends on how pageTypeId and appId are encoded), and the recording base stations bsid are arranged along the other axis, thereby constructing the two-dimensional usage behavior matrix of the user of number i, expressed as:
Figure BDA0002237981260000452
The behavior data are substituted into the two-dimensional matrix, so that a corresponding two-dimensional array is constructed for each number i ∈ L_{SchRoam}:
Figure BDA0002237981260000453
where the rows (1 … c) represent the c sorted appId and pageTypeId values, the columns (1 … m) represent the m sorted recording base stations bsid, and
Figure BDA0002237981260000461
denotes the number of times the user of number i uses the α-th application, or browses the α-th web page content category, under the β-th base station.
Normalizing the behavior matrix to obtain a processed behavior matrix;
Specifically, column normalization is performed on each behavior matrix Act_i to obtain the processed behavior matrix
Figure BDA0002237981260000462
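Building the behavior matrix and normalizing it by column can be sketched as below. The normalization formula is given only in the figure, so dividing each column by its sum is an assumption of this sketch; the function names, the category identifiers and the sample records are hypothetical.

```python
def build_behavior_matrix(records, categories, base_stations):
    """Count usage per (category, base station).

    records:       iterable of (category_id, bsid) pairs for one number, where
                   category_id is an appId or pageTypeId.
    categories:    ordered list of the c appId/pageTypeId values (rows).
    base_stations: ordered list of the m recording base stations (columns).
    """
    row = {c: k for k, c in enumerate(categories)}
    col = {b: k for k, b in enumerate(base_stations)}
    matrix = [[0] * len(base_stations) for _ in categories]
    for category, bsid in records:
        if category in row and bsid in col:
            matrix[row[category]][col[bsid]] += 1
    return matrix


def normalize_columns(matrix):
    """Divide every column by its sum (assumed form of column normalization)."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    totals = [sum(matrix[r][c] for r in range(n_rows)) for c in range(n_cols)]
    return [[matrix[r][c] / totals[c] if totals[c] else 0.0
             for c in range(n_cols)]
            for r in range(n_rows)]


# Illustrative data only.
records = [("app:chat", "bs1"), ("app:chat", "bs1"),
           ("page:news", "bs1"), ("page:news", "bs2")]
act = build_behavior_matrix(records, ["app:chat", "page:news"], ["bs1", "bs2"])
act_normalized = normalize_columns(act)
```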
Calculating corresponding behavior correlation values based on the processed behavior matrix and the standard student behavior matrix;
Specifically, at the start of the school term each school runs a campus marketing campaign, through which the true identity of part of the numbers is confirmed, yielding a student list L_{student} and a parent list L_{family}. The two lists are input into the two-dimensional matrix to train the student identification model, and the campus roaming number set is matched against the student list and the parent list to obtain the school's set of numbers of uncertain identity
Figure BDA0002237981260000463
Based on the student list (i ∈ L_{student}) and the current processed behavior matrices
Figure BDA0002237981260000464
Establishing a student behavior model:
Figure BDA0002237981260000465
the behavior matrix model is subjected to column normalization processing to obtain a standard student behavior model matrix
Figure BDA0002237981260000466
Then, for each number of uncertain identity (to be identified) i ∈ L_{undefined}, a behavior correlation value with the standard student is calculated from the processed behavior matrix of the number
Figure BDA0002237981260000471
and the standard student behavior model matrix
Figure BDA0002237981260000472
The behavior correlation value is calculated as follows:
Figure BDA0002237981260000473
comparing the corresponding behavior correlation value with a standard correlation value to obtain an identification result corresponding to the number;
Specifically, for each student number i' ∈ L_{student} whose identity has been confirmed, the corresponding behavior correlation value is calculated in advance:
Figure BDA0002237981260000474
A cutoff value is then calculated: for all student numbers i' ∈ L_{student} with confirmed identities, the behavior correlation value corresponding to each number i' ∈ L_{student} is obtained
Figure BDA0002237981260000475
Get
Figure BDA0002237981260000476
r_{cutoff} is used as the criterion for confirming the numbers to be identified: the behavior correlation value of each number to be identified is compared with this criterion, whether the user corresponding to the number is a student is determined from the comparison result, an identification result is obtained for each number, and finally a student list and an accompanying person list are obtained. The student list is specifically:
Figure BDA0002237981260000477
the accompanying person list is
Figure BDA0002237981260000478
Then attaching a behavior tag to each recognized student number, i.e.
Figure BDA0002237981260000479
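The comparison against r_{cutoff} can be sketched as follows. The correlation formula and the cutoff rule appear only in the figures, so this sketch makes three labeled assumptions: the correlation is the Pearson correlation of the flattened matrices, the cutoff is the minimum correlation among confirmed students, and every number that does not reach the cutoff goes to the accompanying person list. All function names and sample matrices are hypothetical.

```python
import math


def flatten(matrix):
    return [value for row in matrix for value in row]


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (assumed measure)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def classify(unknown, confirmed_students, standard):
    """Split numbers of uncertain identity into students and accompanying persons.

    unknown, confirmed_students: {number: processed behavior matrix}.
    standard: standard student behavior model matrix.
    """
    std = flatten(standard)
    # Assumed cutoff rule: the lowest correlation among confirmed students.
    cutoff = min(pearson(flatten(m), std) for m in confirmed_students.values())
    students = {n for n, m in unknown.items()
                if pearson(flatten(m), std) >= cutoff}
    return students, set(unknown) - students, cutoff


# Illustrative data only.
standard = [[0.7, 0.2], [0.3, 0.8]]
confirmed = {"s1": [[0.8, 0.1], [0.2, 0.9]], "s2": [[0.6, 0.3], [0.4, 0.7]]}
unknown = {"u1": [[0.75, 0.15], [0.25, 0.85]], "u2": [[0.1, 0.9], [0.9, 0.1]]}
students, companions, r_cutoff = classify(unknown, confirmed, standard)
```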
Further, the merging module 153 is specifically configured to: inputting the recognition result based on the activity record, the recognition result based on the real-time mobile phone data stream and the recognition result based on the behavior data into a merging model for learning training, and outputting a student number list;
Further, the merged model is a two-layer neural network model consisting of a first-layer neural network and a second-layer neural network. The first layer comprises three neurons and the second layer comprises two neurons. The first layer receives the three recognition results, specifically:
Figure BDA0002237981260000481
the two neurons of the second layer comprise two weighted matrices, respectively:
Figure BDA0002237981260000482
the merged model comprises the following structure:
Figure BDA0002237981260000483
Figure BDA0002237981260000484
Figure BDA0002237981260000485
wherein
Figure BDA0002237981260000486
the cost function (cost function) of the activation function of any neuron is:
Figure BDA0002237981260000487
for the identity of the input function (i.e. the user of number i) confirmed in the marketing campaign,
Figure BDA0002237981260000491
are the bias vectors, respectively,
Figure BDA0002237981260000492
the three recognition results above are input into the three neurons of the first-layer neural network,
Figure BDA0002237981260000493
the result output by the first-layer neural network is input into the second-layer neural network for learning and training, and the result is output; the result indicates student identity or accompanying-parent identity. If the marketing campaign confirmed the number as a student, the vector component is_student is 1; if it confirmed the number as an accompanying parent, the component is_family is 1; if there is no marketing campaign confirmation, both are 0 (the sample is discarded); and if the marketing campaign feedback confirmed both, both are 1 (the sample is likewise discarded). For the output result
Figure BDA0002237981260000494
The cutoff for both component decisions corresponds to a value of 0.5.
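The forward pass of such a merged model can be sketched as below. The activation and cost functions are given only in the figures, so the sigmoid activation, the 0/1 encoding of the three recognition results, the weight values and all function names are assumptions made for illustration; only the layer sizes (three neurons, then two) and the 0.5 cutoff come from the text.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def dense(inputs, weights, biases):
    """One fully connected layer with sigmoid activation (assumed activation)."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


def merge_predict(results, w1, b1, w2, b2, cutoff=0.5):
    """results: the three recognition results, here encoded as 0/1 (assumption)."""
    hidden = dense(results, w1, b1)      # first layer: three neurons
    output = dense(hidden, w2, b2)       # second layer: two neurons
    return {"is_student": output[0] >= cutoff, "is_family": output[1] >= cutoff}


def usable_training_label(is_student, is_family):
    """Samples with no confirmation (0, 0) or both confirmations (1, 1) are discarded."""
    return is_student + is_family == 1


# Hand-picked illustrative weights, not trained values.
w1 = [[2.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 2.0]]
b1 = [-1.0, -1.0, -1.0]
w2 = [[2.0, 2.0, -2.0], [-2.0, -2.0, 2.0]]
b2 = [-1.0, 0.0]
prediction = merge_predict([1, 1, 0], w1, b1, w2, b2)
```

In practice the weights and biases would be fitted by back-propagation on the confirmed student and parent lists, as the following paragraph describes; the values above only make the sketch executable.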
In this embodiment, the training data of the merged model are derived from the confirmed student list and the confirmed parent list fed back by the marketing campaign, and the model is trained by back-propagation. During the start of term each year, the student list and the parent list are updated every day, and the merged model is retrained every day with the updated lists to obtain an updated merged model; the more base data the model is trained on, the more accurate its recognition can become.
The merging process is as follows:
for each unidentified number
Figure BDA0002237981260000495
the corresponding three recognition results are fed into the merging model, recognition is recalculated, and the result is output
Figure BDA0002237981260000501
A guess student list is formed from the output results
Figure BDA0002237981260000502
and a guess parent list
Figure BDA0002237981260000503
Note that the same number may appear in both the guess student list and the guess parent list.
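As a sketch of how the two guess lists could be derived from the model outputs, assuming each unidentified number maps to an output pair (is_student, is_family) and that a component counts as positive when it is strictly greater than the 0.5 cutoff (the strictness and the name `build_guess_lists` are assumptions):

```python
from typing import Dict, Set, Tuple


def build_guess_lists(outputs: Dict[str, Tuple[float, float]],
                      cutoff: float = 0.5) -> Tuple[Set[str], Set[str], Set[str]]:
    """Return (guess students, guess parents, numbers present in both)."""
    students = {n for n, (s, _) in outputs.items() if s > cutoff}
    parents = {n for n, (_, f) in outputs.items() if f > cutoff}
    # The two components are thresholded independently, so overlap is possible.
    return students, parents, students & parents
```

Because each component is decided on its own, a number whose two outputs both exceed the cutoff lands in both lists, which is the overlap case noted above.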
After the guess lists are obtained, the tags of each number in the lists must be confirmed: the three recognition results are compared with the result obtained by inputting them into the merging model, and the tags from the three recognition results are selectively output to the final user tags. The specific process is as follows:
confirmation of the companion tag:
Figure BDA0002237981260000504
retaining its companion tag;
Figure BDA0002237981260000505
& $i \in L_{School,1}$, then the companion tag is set aside temporarily, i.e. it is not kept in the list but is stored for later use;
confirmation of the dormitory and department tags:
Figure BDA0002237981260000506
& $i \in L_{School,2}$, then the department and dormitory tags are retained;
Figure BDA0002237981260000507
& $i \in L_{School,2}$, then none of its tags is used for the time being, i.e. the tags are not kept in the list but are stored for later use;
after the student list and the parent list confirmed by marketing campaign feedback are updated, the corresponding tags must be confirmed again for every number i newly added to these lists;
confirmation of the companion tag:
Figure BDA0002237981260000508
retaining its companion tag;
Figure BDA0002237981260000509
discarding the corresponding tag;
confirmation of the dormitory and department tags:
Figure BDA00022379812600005010
the dormitory and department tags are retained;
Figure BDA0002237981260000511
discarding the corresponding tag;
in this embodiment, a campus roaming number set is first obtained, each number in the campus roaming number set is identified based on an activity record of a user of a mobile phone number, a real-time mobile phone data stream, user behavior data, and the like, and a student number list is output, so that accuracy of student number identification can be improved.
In another variation of the present invention, the number may be identified by using only the long-term identification model, the activity matching model, or the behavior learning model to obtain the corresponding student list, and the identification process is consistent with the identification process of each model in the identification module 163 shown in fig. 16, which may specifically refer to the above description and is not repeated here.
The embodiment of the invention provides a nonvolatile readable computer storage medium, wherein the computer storage medium stores at least one executable instruction, and the computer executable instruction can execute the student identification method in any method embodiment.
Embodiments of the present invention provide a computer program product comprising a computer program stored on a computer storage medium, the computer program comprising program instructions which, when executed by a computer, cause the computer to perform a student identification method in any of the above-mentioned method embodiments.
In this embodiment, a campus roaming number set is first obtained, each number in the campus roaming number set is identified based on an activity record of a user of a mobile phone number, a real-time mobile phone data stream, user behavior data, and the like, and a student number list is output, so that accuracy of student number identification can be improved.
Fig. 17 is a schematic structural diagram of an embodiment of the apparatus according to the present invention, and the specific embodiment of the present invention does not limit the specific implementation of the apparatus.
As shown in fig. 17, the apparatus may include: a processor 1702, a communication interface 1704, a memory 1706, and a communication bus 1708.
Wherein: the processor 1702, the communication interface 1704, and the memory 1706 communicate with one another via the communication bus 1708. The communication interface 1704 is used to communicate with network elements of other devices, such as clients or other servers. The processor 1702 is configured to execute the program 1710, and may specifically perform the relevant steps of the student identification method embodiments described above.
In particular, the program 1710 may include program code that includes computer operating instructions.
The processor 1702 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement an embodiment of the present invention. The device includes one or more processors, which may be of the same type, such as one or more CPUs, or of different types, such as one or more CPUs and one or more ASICs.
The memory 1706 is used to store the program 1710. The memory 1706 may include high-speed RAM and may also include non-volatile memory, such as at least one disk memory.
The program 1710 may specifically be configured to cause the processor 1702 to perform the following:
acquiring a campus roaming number set, wherein the campus roaming number set comprises a plurality of roaming numbers of at least one campus, and the time of the roaming numbers appearing in the corresponding campus does not exceed a preset value;
obtaining attribute data of the plurality of roaming numbers, each of the attribute data including one or more of: the activity record of the user corresponding to each roaming number in a preset time period, the real-time mobile phone data stream of the corresponding user and the behavior data of the corresponding user;
and identifying students from the campus roaming number set based on the attribute data, and outputting an identification result, wherein the identification result at least comprises a student list.
In an optional manner, the attribute data includes the activity record of the user corresponding to each roaming number in a preset time period, the real-time mobile phone data stream, and the behavior data of the corresponding user, and the program 1710 causes the processor 1702 to perform the following operations:
identifying numbers of students from the campus roaming number set respectively based on activity records of the corresponding users in a preset time period, real-time mobile phone data streams and behavior data of the corresponding users to obtain corresponding student lists;
and merging the obtained student lists, and outputting student identification results.
In an alternative approach, the program 1710 causes the processor 1702 to:
identifying numbers of students from the campus roaming number set based on the activity records of the preset time period to obtain an identification result based on the activity records;
identifying numbers of students from the campus roaming number set based on the real-time mobile phone data streams to obtain identification results based on the mobile phone data streams;
and identifying numbers of students from the campus roaming number set based on the behavior data to obtain identification results based on the behavior data.
In an alternative approach, the program 1710 causes the processor 1702 to: and identifying the number of the student from the campus roaming number set based on the activity record of the preset time period, and outputting an identification result based on the activity record.
In an alternative approach, the program 1710 causes the processor 1702 to:
acquiring activity records of each number in the campus roaming number set in a preset time period, and summarizing the activity records into an activity record set;
acquiring base station data corresponding to each number based on the activity record set, wherein the base station data comprises a residence base station and a workplace base station;
acquiring numbers appearing in the same base station based on base station data corresponding to each number to obtain a number set corresponding to the base station, wherein the number set comprises a target number set and an accompanying number set;
and acquiring a student number set corresponding to each school and a companion number set of each number based on the number sets.
In an alternative approach, the program 1710 causes the processor 1702 to:
respectively extracting the activity record of the student vacation and the activity record of the student non-vacation corresponding to one number in the campus roaming number set from the activity record set;
obtaining a residence base station corresponding to the number based on the activity record of the student vacation;
obtaining a workplace base station corresponding to the number based on the activity record of the student non-vacation period;
and repeating the steps until the residence base station and the workplace base station of each number in the campus roaming number set are obtained.
In an alternative approach, the program 1710 causes the processor 1702 to:
respectively acquiring the base station data of the number for workday daytime, workday nighttime, and public holidays within the student vacation;
extracting, from the base station data acquired for workday daytime, workday nighttime, and public holidays respectively, the base station that appears on the most days, to obtain the target base station data corresponding to each of these three periods;
comparing the obtained target base station data with corresponding preset threshold values respectively to obtain corresponding comparison results;
and obtaining the residence base station corresponding to the number based on the comparison results.
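The steps above can be sketched roughly as follows. The record layout `(date, period, base_station_id)`, the period names, the priority order used to combine the three comparison results, and all function names are assumptions for illustration; the text only states that the most-frequent base station of each period is compared with a preset threshold.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Tuple

Record = Tuple[str, str, str]  # (date, period, base_station_id), hypothetical layout
# Assumed order in which the three comparison results are consulted.
PERIODS = ("workday_night", "public_holiday", "workday_day")


def top_station_per_period(records: Iterable[Record]) -> Dict[str, Tuple[str, int]]:
    """For each period, return the station seen on the most distinct days."""
    days = defaultdict(lambda: defaultdict(set))
    for date, period, station in records:
        days[period][station].add(date)
    result = {}
    for period, stations in days.items():
        # Ties are broken by station id so the outcome is deterministic.
        station, dates = max(stations.items(), key=lambda kv: (len(kv[1]), kv[0]))
        result[period] = (station, len(dates))
    return result


def residence_station(records: Iterable[Record],
                      thresholds: Dict[str, int]) -> Optional[str]:
    """Return the first period's top station whose day count meets its threshold."""
    top = top_station_per_period(records)
    for period in PERIODS:
        if period in top and top[period][1] >= thresholds[period]:
            return top[period][0]
    return None
```

The same counting step would apply to the workplace base station by feeding in non-vacation records instead of vacation records.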
In an alternative approach, the program 1710 causes the processor 1702 to:
forming a corresponding residence number set for each residence base station based on the residence base station corresponding to each number;
obtaining a residence accompanying number set corresponding to each number from the obtained residence number set;
forming a corresponding workplace number set for each workplace base station based on the workplace base station corresponding to each number;
and obtaining a workplace accompanying number set corresponding to each number from the workplace number set.
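A minimal sketch of the grouping described above, assuming each number has already been assigned a single base station (residence or workplace). The function names are hypothetical, and defining a number's accompanying set as "all other numbers at the same base station" is an assumption.

```python
from collections import defaultdict
from typing import Dict, Set


def group_by_station(station_of: Dict[str, str]) -> Dict[str, Set[str]]:
    """Build the number set of each base station."""
    groups: Dict[str, Set[str]] = defaultdict(set)
    for number, station in station_of.items():
        groups[station].add(number)
    return dict(groups)


def companion_sets(station_of: Dict[str, str]) -> Dict[str, Set[str]]:
    """For each number, the other numbers that share its base station."""
    groups = group_by_station(station_of)
    return {n: groups[s] - {n} for n, s in station_of.items()}
```

Calling `companion_sets` once with residence base stations and once with workplace base stations would yield the two kinds of accompanying number sets.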
In an alternative approach, the set of social relationships includes: a family number set, a classmate number set and a friend number set; the program 1710 causes the processor 1702 to obtain a set of social relationships based on the set of student numbers and the set of companion numbers, by:
acquiring a family number set of the student based on the residence accompanying number set corresponding to the student's number and the contact number set corresponding to that number;
extracting a classmate number set of the student from a workplace number set corresponding to the number;
and obtaining a friend number set based on the college number set and the contact number set corresponding to the numbers.
In an alternative embodiment, where the attribute data is real-time cell phone data stream, the program 1710 causes the processor 1702 to: and identifying the numbers of the students from the campus roaming number set based on the real-time mobile phone data stream, and outputting an identification result based on the mobile phone data stream.
In an alternative approach, the program 1710 causes the processor 1702 to:
acquiring a mobile phone data stream corresponding to a number in the campus roaming number set;
acquiring position data of a corresponding user based on the mobile phone data stream, and identifying according to the position data to obtain a corresponding identification result;
and repeating the steps until an identification result corresponding to each number in the campus roaming number set is obtained, and obtaining a student number list.
In an alternative approach, the program 1710 causes the processor 1702 to:
acquiring position data of a corresponding number appearing in a sleeping time period, wherein the position data comprises a plurality of corresponding grids;
matching the corresponding user based on the position data appearing in the sleeping time period to obtain a first matching result;
acquiring position data of the corresponding number in at least one department activity time period;
matching the corresponding department based on the acquired position data to obtain a second matching result;
and obtaining an identification result based on the first matching result and the second matching result.
In an alternative approach, the program 1710 causes the processor 1702 to:
acquiring the number of occurrences of each grid in the sleeping time period;
selecting a preset number of grids from the grids appearing in the sleeping time period, such that every selected grid occurs more often in the sleeping time period than any unselected grid;
and matching the grids of each dormitory of the corresponding campus against the selected grids to obtain a first matching result.
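The selection and matching above might look like the following sketch. The text does not state how a dormitory is scored against the selected grids, so choosing the dormitory with the largest overlap is an assumption, as are the function names and the default of three grids.

```python
from collections import Counter
from typing import Dict, Iterable, Optional, Set


def top_grids(observed: Iterable[str], k: int) -> Set[str]:
    """The k grids that occur most often in the sleeping time period."""
    return {grid for grid, _ in Counter(observed).most_common(k)}


def match_dormitory(observed: Iterable[str],
                    dorm_grids: Dict[str, Set[str]],
                    k: int = 3) -> Optional[str]:
    """Return the dormitory whose grids overlap the selected grids the most."""
    selected = top_grids(observed, k)
    best, best_overlap = None, 0
    for dorm, grids in dorm_grids.items():
        overlap = len(selected & grids)
        if overlap > best_overlap:
            best, best_overlap = dorm, overlap
    return best  # None when no dormitory shares a grid with the selection
```

Note that `most_common` keeps tied grids in first-seen order, so when counts tie at the boundary the strict "more often than any unselected grid" condition from the text is only approximated.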
In an alternative approach, the program 1710 causes the processor 1702 to:
acquiring grid data of the corresponding number for each department activity time period based on the position data, wherein the grid data comprises grids and their corresponding numbers of occurrences;
and matching the grid with the most occurrences against the locations of the departments to obtain a second matching result.
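For one department activity time period, the second match can be sketched as below. The mapping `dept_of_grid` from grid id to department and the function name are hypothetical; the text only says that the most frequent grid is matched against department locations.

```python
from collections import Counter
from typing import Dict, Iterable, Optional


def match_department(observed: Iterable[str],
                     dept_of_grid: Dict[str, str]) -> Optional[str]:
    """Map the most frequently observed grid to a department, if any."""
    counts = Counter(observed)
    if not counts:
        return None
    grid, _ = counts.most_common(1)[0]
    # Returns None when the most frequent grid is not a department location.
    return dept_of_grid.get(grid)
```

Running this once per activity time period gives one candidate department per period, which can then be combined with the dormitory match.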
In an alternative, where the attribute data is behavior data, the program 1710 causes the processor 1702 to: and identifying students from the campus roaming number set based on the behavior data, and outputting identification results.
In an alternative approach, the program 1710 causes the processor 1702 to:
acquiring a mobile phone data stream set from a base station associated with a campus, wherein the mobile phone data stream comprises a plurality of numbers and corresponding behavior data;
matching the mobile phone data stream set with the campus roaming number set to obtain a mobile phone data stream subset corresponding to the campus;
acquiring behavior data corresponding to a number from the mobile phone data stream subset, wherein the behavior data comprises access page data, APP data, occurrence time and a base station which correspondingly appears;
substituting the behavior data into a pre-established two-dimensional matrix to obtain a behavior matrix corresponding to the number;
performing column normalization processing on the behavior matrix to obtain a processed behavior matrix;
calculating corresponding behavior correlation values based on the processed behavior matrix and a standard student behavior matrix;
comparing the corresponding behavior correlation value with a standard correlation value to obtain an identification result corresponding to the number;
and acquiring behavior data corresponding to another number from the mobile phone data stream subset again, and repeating the steps until the numbers corresponding to the mobile phone data stream subset are all identified to obtain an identification result corresponding to the campus.
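The matrix steps above can be illustrated with the sketch below. The matrix layout (for example rows as time slots and columns as behavior categories), normalizing each column by its sum, and using the Pearson correlation of the flattened matrices as the "behavior correlation value" are all assumptions; the text names the steps but not the formulas.

```python
import math
from typing import List

Matrix = List[List[float]]


def column_normalize(m: Matrix) -> Matrix:
    """Scale each column so that it sums to 1 (all-zero columns stay zero)."""
    cols = len(m[0])
    sums = [sum(row[j] for row in m) for j in range(cols)]
    return [[row[j] / sums[j] if sums[j] else 0.0 for j in range(cols)]
            for row in m]


def correlation(a: Matrix, b: Matrix) -> float:
    """Pearson correlation between two equally shaped matrices, flattened."""
    xs = [v for row in a for v in row]
    ys = [v for row in b for v in row]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def is_student(behavior: Matrix, standard: Matrix, threshold: float) -> bool:
    """Compare the correlation with the standard student matrix to a threshold."""
    return correlation(column_normalize(behavior), standard) >= threshold
```

In this sketch `standard` is assumed to be already column-normalized, so that both matrices are on the same scale before they are compared.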
In an alternative approach, the program 1710 causes the processor 1702 to: inputting the recognition result based on the activity record, the recognition result based on the real-time mobile phone data stream and the recognition result based on the behavior data into a merging model for learning training, and outputting a student number list; the merging model comprises: the neural network comprises a first layer of neural network and a second layer of neural network, wherein the first layer of neural network comprises three neurons, and the second layer of neural network comprises two neurons.
The embodiment of the invention firstly obtains the campus roaming number set, identifies each number of the campus roaming number set based on the activity record of the user of the mobile phone number, the real-time mobile phone data stream, the user behavior data and the like, outputs the student number list, and can improve the accuracy of student number identification.
The algorithms or displays presented herein are not inherently related to any particular computer, virtual system, or other apparatus. Various general purpose systems may also be used with the teachings herein. The required structure for constructing such a system will be apparent from the description above. In addition, embodiments of the present invention are not directed to any particular programming language. It is appreciated that a variety of programming languages may be used to implement the teachings of the present invention as described herein, and any descriptions of specific languages are provided above to disclose the best mode of the invention.
In the description provided herein, numerous specific details are set forth. It is understood, however, that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.
Similarly, it should be appreciated that in the foregoing description of exemplary embodiments of the invention, various features of the embodiments are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more of the various inventive aspects. However, the disclosed method should not be interpreted as reflecting an intention that the claimed invention requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of this invention.
Those skilled in the art will appreciate that the modules in the device in an embodiment may be adaptively changed and placed in one or more devices different from the embodiment. The modules or units or components of the embodiments may be combined into one module or unit or component, and furthermore they may be divided into a plurality of sub-modules or sub-units or sub-components. All of the features disclosed in this specification (including any accompanying claims, abstract and drawings), and all of the processes or elements of any method or apparatus so disclosed, may be combined in any combination, except combinations where at least some of such features and/or processes or elements are mutually exclusive. Each feature disclosed in this specification (including any accompanying claims, abstract and drawings) may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise.
Furthermore, those of skill in the art will understand that while some embodiments herein include some features included in other embodiments, rather than others, combinations of features of different embodiments are meant to be within the scope of the invention and form different embodiments. For example, in the following claims, any of the claimed embodiments may be used in any combination.
It should be noted that the above-mentioned embodiments illustrate rather than limit the invention, and that those skilled in the art will be able to design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In the unit claims enumerating several means, several of these means may be embodied by one and the same item of hardware. The use of the words first, second, third, etc. does not indicate any ordering. These words may be interpreted as names. The steps in the above embodiments should not be construed as limiting the order of execution unless specified otherwise.

Claims (10)

1. A student identification method, the method comprising:
acquiring a campus roaming number set, wherein the campus roaming number set comprises a plurality of roaming numbers of at least one campus, and the time of the roaming numbers appearing in the corresponding campus does not exceed a preset value;
obtaining attribute data of the plurality of roaming numbers, each of the attribute data including one or more of: the activity record of the user corresponding to each roaming number in a preset time period, the real-time mobile phone data stream of the corresponding user and the behavior data of the corresponding user;
identifying students from the set of campus roaming numbers based on the attribute data.
2. The method of claim 1, wherein identifying the student from the collection of campus roaming numbers based on the attribute data and outputting the identification result comprises:
identifying numbers of students from the campus roaming number set based on the activity records of the preset time period to obtain an identification result based on the activity records;
identifying numbers of students from the campus roaming number set based on the real-time mobile phone data streams to obtain identification results based on the mobile phone data streams;
and identifying numbers of students from the campus roaming number set based on the behavior data to obtain identification results based on the behavior data.
3. The method of claim 2, wherein identifying the student's number from the campus roaming number set based on the activity record for the preset time period, and obtaining an identification result based on the activity record comprises:
acquiring activity records of each number in the campus roaming number set in a preset time period, and summarizing the activity records into an activity record set;
acquiring base station data corresponding to each number based on the activity record set, wherein the base station data comprises a residential area base station and a working area base station;
acquiring numbers appearing in the same base station based on base station data corresponding to each number to obtain a number set of the corresponding base station, wherein the number set comprises a target number set and an accompanying number set;
and acquiring a student number set corresponding to each school and a companion number set of each number based on the number sets.
4. The method of claim 2, wherein identifying the student's number from the campus roaming number set based on the real-time mobile phone data stream, and obtaining a mobile phone data stream-based identification result comprises:
acquiring a mobile phone data stream corresponding to a number in the campus roaming number set;
acquiring position data of a corresponding user based on the mobile phone data stream, and identifying according to the position data to obtain a corresponding identification result;
and repeating the steps until the identification result corresponding to each number in the campus roaming number set is obtained, and obtaining a student number list.
5. The method of claim 4, wherein the obtaining location data of a corresponding user based on the real-time mobile phone data stream, and performing identification according to the location data to obtain a corresponding identification result comprises:
acquiring position data of a corresponding number appearing in a sleeping time period, wherein the position data comprises a plurality of corresponding grids;
matching corresponding users based on the position data appearing in the bedtime period to obtain a first matching result;
acquiring position data of the corresponding number in at least one department activity time period;
matching the corresponding department based on the acquired position data to obtain a second matching result;
and obtaining an identification result based on the first matching result and the second matching result.
6. The method of claim 2, wherein identifying the student from the campus roaming number set based on the behavior data, and obtaining the identification result based on the behavior data comprises:
acquiring a mobile phone data stream set from a base station associated with a campus, wherein the mobile phone data stream comprises a plurality of numbers and corresponding behavior data;
matching the mobile phone data stream set with the campus roaming number set to obtain a mobile phone data stream subset corresponding to the campus;
acquiring behavior data corresponding to a number from the mobile phone data stream subset, wherein the behavior data comprises access page data, occurrence time and a base station which correspondingly appears;
substituting the behavior data into a pre-established two-dimensional matrix to obtain a behavior matrix corresponding to the number;
performing column normalization processing on the behavior matrix to obtain a processed behavior matrix;
calculating corresponding behavior correlation values based on the processed behavior matrix and a standard student behavior matrix;
comparing the corresponding behavior correlation value with a standard correlation value to obtain an identification result corresponding to the number;
and acquiring behavior data corresponding to another number from the mobile phone data stream subset again, and repeating the steps until the numbers corresponding to the mobile phone data stream subset are all identified to obtain an identification result corresponding to the campus.
7. The method of claim 2, wherein the student is identified from the set of campus roaming numbers based on the attribute data, further comprising:
inputting the recognition result based on the activity record, the recognition result based on the real-time mobile phone data stream and the recognition result based on the behavior data into a merging model for learning training, and outputting a student number list; the merging model comprises: the neural network comprises a first layer of neural network and a second layer of neural network, wherein the first layer of neural network comprises three neurons, and the second layer of neural network comprises two neurons.
8. A student identification device, the device comprising:
a set acquisition module, configured to acquire a campus roaming number set, wherein the campus roaming number set comprises a plurality of roaming numbers of at least one campus, and the time of the roaming numbers appearing in the corresponding campus does not exceed a preset value;
a data obtaining module, configured to obtain attribute data of the roaming numbers, where each attribute data includes one or more of the following: the activity record of the user corresponding to each roaming number in a preset time period, the real-time mobile phone data stream of the corresponding user and the behavior data of the corresponding user;
an identification module, configured to identify students from the set of campus roaming numbers based on the attribute data.
9. A computing device, comprising: a processor, a memory, a communication interface and a communication bus, wherein the processor, the memory and the communication interface communicate with one another through the communication bus;
the memory is configured to store at least one executable instruction that causes the processor to perform the steps of the student identification method according to any one of claims 1 to 7.
10. A computer-readable storage medium having stored therein at least one executable instruction for causing a processor to perform the steps of the student identification method according to any one of claims 1 to 7.
CN201910990107.8A 2019-10-17 2019-10-17 Student identification method and device, computing equipment and readable computer storage medium Active CN112685654B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910990107.8A CN112685654B (en) 2019-10-17 2019-10-17 Student identification method and device, computing equipment and readable computer storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910990107.8A CN112685654B (en) 2019-10-17 2019-10-17 Student identification method and device, computing equipment and readable computer storage medium

Publications (2)

Publication Number Publication Date
CN112685654A true CN112685654A (en) 2021-04-20
CN112685654B CN112685654B (en) 2023-04-07

Family

ID=75444648

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910990107.8A Active CN112685654B (en) 2019-10-17 2019-10-17 Student identification method and device, computing equipment and readable computer storage medium

Country Status (1)

Country Link
CN (1) CN112685654B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114979958A (en) * 2022-06-08 2022-08-30 中国联合网络通信集团有限公司 Juvenile user identification method, juvenile user identification platform, computer equipment and storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102047697A (en) * 2008-05-27 2011-05-04 高通股份有限公司 Methods and apparatus for generating user profile based on periodic location fixes
US20130090980A1 (en) * 2011-10-10 2013-04-11 Brett Patrick Hummel System & method for tracking members of an affinity group
CN106658394A (en) * 2015-11-04 2017-05-10 中国移动通信集团公司 High-speed rail user separation method and apparatus thereof
CN107155214A (en) * 2016-03-02 2017-09-12 中国移动通信集团河北有限公司 A kind of number determines method and apparatus
CN108537909A (en) * 2018-03-23 2018-09-14 广州米度信息科技有限公司 A kind of the personnel's detection method and big data analysis system of unaware
CN109949063A (en) * 2017-12-20 2019-06-28 中移(苏州)软件技术有限公司 A kind of address determines method, apparatus, electronic equipment and readable storage medium storing program for executing

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
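Zhang Lin: "Analysis and Design of the University Market Comprehensive Analysis Subsystem in the Hebei Mobile Business Analysis System", China Master's Theses Full-text Database, Information Science and Technology *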

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114979958A (en) * 2022-06-08 2022-08-30 China United Network Communications Group Co., Ltd. Juvenile user identification method, juvenile user identification platform, computer equipment and storage medium

Also Published As

Publication number Publication date
CN112685654B (en) 2023-04-07

Similar Documents

Publication Publication Date Title
Cheng et al. Is tourism development a catalyst of economic recovery following natural disaster? An analysis of economic resilience and spatial variability
Hasan et al. Urban activity pattern classification using topic models from online geo-location data
JP6693502B2 (en) Information processing apparatus, information processing method, and program
Hecht et al. Automatic identification of building types based on topographic databases–a comparison of different data sources
CN105532030B (en) For analyzing the devices, systems, and methods of the movement of target entity
CN111444952B (en) Sample recognition model generation method, device, computer equipment and storage medium
Bicego et al. On the distinctiveness of the electricity load profile
US10445386B2 (en) Search result refinement
KR20210073819A (en) Method and server for recommending personalized real estate information using the consumer choice model
Harter et al. Address-based sampling
Mia et al. Registration status prediction of students using machine learning in the context of Private University of Bangladesh
CN112685654B (en) Student identification method and device, computing equipment and readable computer storage medium
CN110472057B (en) Topic label generation method and device
Ogunmokun et al. The effect of manufacturing flexibility on export performance in China
CN108647189B (en) Method and device for identifying user crowd attributes
CN108133296B (en) Event attendance prediction method combining environmental data under social network based on events
Zöllig et al. A conceptual, agent-based model of land development for UrbanSim
US11501100B1 (en) Computer processes for clustering properties into neighborhoods and generating neighborhood-specific models
CN113743838A (en) Target user identification method and device, computer equipment and storage medium
Pan et al. A data mining approach to the analysis of a catering lean service project
Subramanian et al. Predictive Modeling and Mobility Pattern Analysis
Kats et al. Twitter Activity Timeline as a Signature of Urban Neighborhood
Stalidis Discovering marketing rules for the tourist sector in visitor service quality surveys
US20230027774A1 (en) Smart real estate evaluation system
Kanani Predicting Log Error for House Price Using Machine Learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant