CN109977265A - IPTV log user identification method based on user behavior characteristics

Info

Publication number
CN109977265A
Authority
CN
China
Prior art keywords: user, period, channel, similarity, IPTV
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910254105.2A
Other languages
Chinese (zh)
Other versions
CN109977265B (en)
Inventor
杨灿
谢伟锟
袁启虎
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
South China University of Technology SCUT
Original Assignee
South China University of Technology SCUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by South China University of Technology SCUT
Priority to CN201910254105.2A
Publication of CN109977265A
Application granted
Publication of CN109977265B
Legal status: Active
Anticipated expiration

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/22 - Matching criteria, e.g. proximity measures
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/23 - Clustering techniques
    • G06F18/232 - Non-hierarchical techniques
    • G06F18/2321 - Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213 - Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06Q - INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 - Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10 - Services
    • G06Q50/26 - Government or public services

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Tourism & Hospitality (AREA)
  • Development Economics (AREA)
  • Educational Administration (AREA)
  • Economics (AREA)
  • General Business, Economics & Management (AREA)
  • General Health & Medical Sciences (AREA)
  • Strategic Management (AREA)
  • Health & Medical Sciences (AREA)
  • Human Resources & Organizations (AREA)
  • Primary Health Care (AREA)
  • Marketing (AREA)
  • Probability & Statistics with Applications (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Testing, Inspecting, Measuring Of Stereoscopic Televisions And Televisions (AREA)

Abstract

The invention discloses an IPTV log user identification method based on user behavior characteristics, comprising the following steps: (1) select the logs of several consecutive days as raw data; (2) split the raw data into several time periods with a clustering algorithm, analyze each user's scoring of the channels within each period, and merge periods with similar features to generate feature data; (3) repeat step 2 on the data to be identified to obtain pre-matching data; (4) match the pre-matching data against the feature data and output the user that occurs most often among the matches. The proposed method can identify anonymous users with relatively high accuracy and, in practice, can help investigative authorities identify anonymous suspects more quickly and accurately.

Description

An IPTV log user identification method based on user behavior characteristics
Technical field
The present invention relates to the intersection of IPTV technology and user identification, and specifically to an IPTV log user identification method based on user behavior characteristics.
Background
Crimes occur every day, and fugitive suspects often anonymize their digital behavior, which makes identifying and tracking them very difficult. With the rapid development of Internet Protocol Television (IPTV), IPTV has spread to ordinary households, and the rich IPTV user behavior logs make it possible to identify suspects based on IPTV user behavior characteristics.
In an IPTV system, a user's viewing behavior characteristics are features such as the duration, frequency, and time period with which different people watch channels. These features can distinguish different users accurately, and IPTV log data can be extracted from the set-top box in the user's home, which provides the hardware conditions and the data source for running the system. In typical IPTV usage, however, several users share one IPTV device. The log therefore mixes the features of multiple users, so traditional frequency analysis is not very accurate.
Summary of the invention
The main object of the present invention is to provide a user identification method based on user behavior characteristics that addresses the IPTV multi-user identification problem. With this method, a suspect's identity can be located with relatively high accuracy and the suspect's geographic location can be determined.
An IPTV log user identification method based on user behavior characteristics comprises the following steps:
1. Select the logs of several consecutive days as raw data;
2. Split the raw data into several time periods with a clustering algorithm, analyze the user's scoring of the channels within each period, and merge periods with similar features to obtain feature data;
3. Repeat step 2 on the data to be identified to obtain pre-matching data;
4. Match the pre-matching data against the feature data, and output the user that occurs most often among the matches as the identification result.
Further, the raw data includes the following fields: the device number or the user's smart card number, the ID of the channel the user is currently watching, the time viewing starts, and the time viewing ends or the channel is switched.
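The record layout described above can be sketched as a small data type. This is only an illustration: the comma-separated input layout, the ISO 8601 timestamps, and the names `ViewRecord` and `parse_record` are assumptions, not part of the disclosed method.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class ViewRecord:
    device_id: str        # set-top box device number or smart card number
    channel_id: str       # ID of the channel being watched
    begin_time: datetime  # time viewing starts
    end_time: datetime    # time viewing ends or the channel is switched

    @property
    def duration_seconds(self) -> float:
        return (self.end_time - self.begin_time).total_seconds()


def parse_record(line: str) -> ViewRecord:
    # Hypothetical input layout: deviceID,channelID,beginTime,endTime
    device_id, channel_id, begin, end = line.strip().split(",")
    return ViewRecord(device_id, channel_id,
                      datetime.fromisoformat(begin),
                      datetime.fromisoformat(end))


# Made-up sample lines, sorted chronologically by start time.
records = sorted(
    (parse_record(line) for line in [
        "stb-001,ch-5,2019-03-01T20:30:00,2019-03-01T21:00:00",
        "stb-001,ch-2,2019-03-01T08:00:00,2019-03-01T08:15:00",
    ]),
    key=lambda r: r.begin_time,
)
```

Viewing duration is derived from the two timestamps rather than stored, since the scoring step only needs total time per channel.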
Further, generating the feature data in step 2 includes the following steps:
2.1 Cluster the viewing start times in the raw data with the k-means algorithm (k-means clustering) to obtain k time periods, denoted $\{t_1, t_2, t_3, \ldots, t_n, \ldots, t_k\}$;
2.2 For time period $t_n$, compute the user's scoring vector $A_{(user,\,channel)}$ over the channels with the scoring formula:
$$A_{(user,\,channel)} = \frac{d_{(user,\,channel)}}{\sum_{c \in C_{user}^{t_n}} d_{(user,\,c)}} \tag{1}$$
where $C_{user}^{t_n}$ denotes the list of channels the user watched during period $t_n$, $d_{(user,\,channel)}$ denotes the total time the user spent watching the channel, $c$ ranges over the channels in $C_{user}^{t_n}$, and $d_{(user,\,c)}$ denotes the total time the user spent watching channel $c$;
2.3 For two different time periods $t_a$ and $t_b$, compute their similarity with the cosine formula:
$$\cos(A, B) = \frac{A \cdot B}{\lVert A \rVert \, \lVert B \rVert} \tag{2}$$
where $A$ and $B$ denote the scoring vectors extracted from periods $t_a$ and $t_b$ respectively.
2.4 Define a threshold β, merge the periods whose similarity is less than β, and compute the scoring vector after merging; the resulting scoring vectors are the feature data.
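Steps 2.2 and 2.3 can be sketched as follows. This is a minimal illustration that assumes the scoring formula is each channel's share of the total viewing time in the period, as the symbol definitions suggest; the function names and the sample data are hypothetical.

```python
import math
from collections import defaultdict


def scoring_vector(views):
    """views: iterable of (channel_id, duration_seconds) for one device
    within one time period. Returns {channel_id: share of viewing time}."""
    totals = defaultdict(float)
    for channel, duration in views:
        totals[channel] += duration
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {ch: d / grand_total for ch, d in totals.items()}


def cosine_similarity(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Made-up viewing data for two periods of one device.
evening = scoring_vector([("news", 1800), ("sports", 600), ("news", 600)])
morning = scoring_vector([("news", 900), ("kids", 900)])
similarity = cosine_similarity(evening, morning)
```

Storing vectors as dictionaries keeps channels that were never watched out of the computation; a missing channel simply contributes zero to the dot product.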
The merging process of step 2.4 is as follows:
i. Compute the similarity between all pairs of periods with formula (2); taking the periods as nodes and the similarity between nodes as edge weights, connect the nodes pairwise to form a complete graph;
ii. In descending order of edge weight, successively merge pairs of periods whose similarity is less than β, and use formula (1) to compute the scoring vector of the merged period, together with the similarity between the merged period and the other periods;
iii. Repeat step ii until no pair of periods has a similarity less than β.
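The merging loop in steps i to iii can be sketched as below. The text states the merge condition as "similarity less than β" while also describing the merged periods as similar, so the comparison is passed in as a parameter: the default follows the stated condition literally, and a caller who reads similarity in the usual larger-is-closer sense can flip it. All names and sample values here are hypothetical.

```python
import math
from itertools import combinations


def cosine(a, b):
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def normalize(durations):
    """Formula (1) as assumed here: each channel's share of total time."""
    total = sum(durations.values())
    return {ch: d / total for ch, d in durations.items()} if total else {}


def merge_periods(periods, beta, should_merge=lambda sim, beta: sim < beta):
    """periods: list of {channel: total_seconds}, one dict per time period.
    Repeatedly merges the qualifying pair with the largest edge weight."""
    nodes = [dict(p) for p in periods]
    while len(nodes) > 1:
        # Edges of the complete graph, weighted by cosine similarity.
        edges = [(cosine(normalize(nodes[i]), normalize(nodes[j])), i, j)
                 for i, j in combinations(range(len(nodes)), 2)]
        candidates = [e for e in edges if should_merge(e[0], beta)]
        if not candidates:
            break
        _, i, j = max(candidates)
        # Merge raw durations so the scoring vector can be recomputed.
        merged = dict(nodes[i])
        for ch, d in nodes[j].items():
            merged[ch] = merged.get(ch, 0.0) + d
        nodes = [n for k, n in enumerate(nodes) if k not in (i, j)] + [merged]
    return [normalize(n) for n in nodes]


periods = [{"news": 100.0}, {"news": 90.0, "sports": 10.0}, {"kids": 100.0}]
literal = merge_periods(periods, beta=0.5)
flipped = merge_periods(periods, beta=0.5,
                        should_merge=lambda sim, beta: sim > beta)
```

Rebuilding every edge after each merge is quadratic per iteration, which is acceptable for the small number of periods k-means produces per device.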
Further, the matching process of step 4 includes the following steps:
4.1 For each item of pre-matching data to be matched, compute its similarity to the feature data obtained in step 2 with formula (2); after sorting by similarity, select the n feature data items with the highest similarity and extract their device numbers $u_n$ to obtain the sequence $\{u_1, u_2, u_3, \ldots, u_n\}$;
4.2 Count the occurrences in the sequence $\{u_1, u_2, u_3, \ldots, u_n\}$ and output the user that occurs most often as the identification result.
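The matching in steps 4.1 and 4.2 amounts to a top-n selection followed by a majority vote. The sketch below assumes that all (query, feature) pairs are pooled into one ranking before the top n are taken; the text does not say whether the ranking is global or per query item. The function name `identify` and the sample vectors are hypothetical.

```python
import math
from collections import Counter


def cosine(a, b):
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def identify(query_vectors, feature_db, n):
    """query_vectors: scoring vectors of the data to be identified.
    feature_db: list of (device_id, scoring_vector) feature data.
    Returns the device_id that occurs most often among the n best matches."""
    scored = [(cosine(q, vec), device_id)
              for q in query_vectors
              for device_id, vec in feature_db]
    top = sorted(scored, key=lambda t: t[0], reverse=True)[:n]
    votes = Counter(device_id for _, device_id in top)
    return votes.most_common(1)[0][0] if votes else None


# Made-up feature data for two devices and two query vectors.
feature_db = [
    ("A", {"news": 0.9, "sports": 0.1}),
    ("A", {"kids": 1.0}),
    ("B", {"movies": 1.0}),
    ("B", {"news": 0.5, "movies": 0.5}),
]
query = [{"news": 1.0}, {"kids": 0.8, "news": 0.2}]
result = identify(query, feature_db, n=3)
```

Voting over the top n rather than taking the single best match makes the result less sensitive to one period that happens to resemble another household.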
Further, the pre-matching data and the feature data are both scoring vectors.
Compared with the prior art, the feature extraction method proposed by the invention needs only the user's channel viewing records. Feature data can be stored long term in units of days, and features are sliced by time, so the recognition rate does not decline because the data time window is too long.
The proposed user identification system and method performs user matching after slicing the raw data, which effectively separates the features of multiple users on the same device and avoids the drop in recognition rate caused by multiple users blurring the feature data.
Brief description of the drawings
Fig. 1 is a flow chart of user identification using user log files;
Fig. 2 is a schematic diagram of the method for locating a suspect using IPTV log file features.
Detailed description of the embodiments
To make the objects, technical solutions, and advantages of the embodiments of the invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. The described embodiments are some, not all, of the embodiments of the invention. All other embodiments obtained by a person of ordinary skill in the art from these embodiments without creative effort fall within the scope of protection of the invention.
As shown in Fig. 1, the IPTV log user identification method based on user behavior characteristics first preprocesses the user log files: the usage of each IPTV device is sliced by time period, and time slices with similar features are merged. The user data to be identified is then sliced in the same way and matched against the processed feature data. The specific steps are as follows:
1. Sort the user log files in chronological order and select the user logs of several consecutive days as raw data. Each raw record has the structure <deviceID, channelID, beginTime, endTime>, where deviceID is the set-top box device number or the user's smart card number, channelID is the ID of the channel the user is currently watching, beginTime is the time viewing starts, and endTime is the time viewing ends or the channel is switched.
2. Split the raw data into several time slices with a clustering algorithm, analyze the user's scoring of the channels within each time slice, and merge time slices with similar features to obtain the feature data. Specifically:
2.1 Cluster the viewing start times in the raw data, for example with the k-means algorithm (k-means clustering), to obtain k time periods, denoted $\{t_1, t_2, t_3, \ldots, t_k\}$;
2.2 For time period $t_n$, compute the user's score for each channel with the scoring formula:
$$A_{(user,\,channel)} = \frac{d_{(user,\,channel)}}{\sum_{c \in C_{user}^{t_n}} d_{(user,\,c)}} \tag{1}$$
where $C_{user}^{t_n}$ denotes the list of channels the user watched during period $t_n$, $d_{(user,\,channel)}$ denotes the total time the user spent watching the channel, $c$ ranges over the channels in $C_{user}^{t_n}$, and $d_{(user,\,c)}$ denotes the total time the user spent watching channel $c$;
2.3 For two different time periods $t_a$ and $t_b$, compute their similarity with the cosine formula:
$$\cos(A, B) = \frac{A \cdot B}{\lVert A \rVert \, \lVert B \rVert} \tag{2}$$
where $A$ and $B$ denote the scoring vectors extracted from periods $t_a$ and $t_b$ respectively;
2.4 Define a threshold β, merge the periods whose similarity is less than β, and compute the merged scoring vector with formula (1). The detailed process is as follows:
i. Compute the similarity between all pairs of periods with formula (2) and connect them pairwise to form a complete graph;
ii. In descending order of edge weight, successively merge pairs of periods whose similarity is less than β, and use formula (1) to compute the scoring vector of the merged node, together with the similarity between the merged node and the other nodes;
iii. Repeat step ii until no pair of periods has a similarity less than β.
3. Repeat step 2 on the data to be identified to obtain pre-matching data;
4. Match the pre-matching data against the feature data to determine the original user device number:
4.1 For each item of pre-matching data to be matched, compute its similarity to the feature data obtained in step 2. The similarity is the cosine similarity:
$$\cos(A, B) = \frac{A \cdot B}{\lVert A \rVert \, \lVert B \rVert}$$
where $A$ and $B$ denote the scoring vectors extracted from any two sets of user data;
4.2 After sorting by similarity, select the n feature data items with the highest similarity and extract their device numbers $u_n$ to obtain the sequence $\{u_1, u_2, u_3, \ldots, u_n\}$. Output the user that occurs most often in the sequence as the identification result.
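Step 2.1 above clusters viewing start times into k periods. A minimal one-dimensional k-means sketch is shown below, assuming start times are represented as seconds since midnight and ignoring wrap-around at midnight; the deterministic initialization, the function names, and the sample times are choices made for this illustration only.

```python
def kmeans_1d(values, k, iterations=100):
    """Lloyd's algorithm on scalars. Returns (centroids, labels)."""
    if not values or k < 1:
        raise ValueError("need at least one value and k >= 1")
    ordered = sorted(values)
    # Deterministic start: centroids at evenly spaced positions of the data.
    centroids = [ordered[(2 * i + 1) * len(ordered) // (2 * k)]
                 for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assignment step: each value goes to its nearest centroid.
        labels = [min(range(k), key=lambda c: abs(v - centroids[c]))
                  for v in values]
        # Update step: each centroid moves to the mean of its members.
        updated = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            updated.append(sum(members) / len(members) if members
                           else centroids[c])
        if updated == centroids:
            break
        centroids = updated
    return centroids, labels


def seconds_of_day(hhmm):
    hours, minutes = map(int, hhmm.split(":"))
    return hours * 3600 + minutes * 60


# Made-up start times: three in the morning, three in the evening.
starts = [seconds_of_day(t)
          for t in ["07:50", "08:10", "08:05", "20:30", "21:00", "20:45"]]
centroids, labels = kmeans_1d(starts, k=2)
```

Each label identifies the time period a viewing record belongs to, so records with the same label on the same device feed one scoring vector.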
Fig. 2 shows a method for locating a suspect using IPTV log file features, which specifically includes the following steps:
1. Delimit the identification range: after the region where the suspect is located has been preliminarily determined, select the user logs of that region for several consecutive days as raw data;
2. Preprocess the feature data: split the raw data into several time slices with a clustering algorithm, analyze the user's scoring of the channels within each time slice, and merge time slices with similar features to obtain the feature data;
3. Process the suspect data: extract the IPTV viewing logs from the suspect's place of residence, split them into several time slices with the same clustering algorithm, and merge time slices with similar features to obtain the suspect data;
4. Compare the suspect data with the feature data. After sorting the similarities in descending order, take the top-ranked results to obtain the user list $\{u_1, u_2, u_3, \ldots, u_n\}$; the suspect is likely to be hiding among these users. Output the user that occurs most often in the sequence $\{u_1, u_2, u_3, \ldots, u_n\}$ as the identification result. In Fig. 2, for example, the analysis yields two top-ranked suspect devices, whose similarities to the suspect data are 81.8% and 18.2% respectively.
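The text does not say how the 81.8% and 18.2% figures in Fig. 2 are derived. One possible reading, offered purely as an assumption, is that they are each device's share of the top-n list: with a hypothetical n = 11, nine occurrences and two occurrences give exactly those percentages. The sketch below shows that arithmetic; the function name and the device labels are made up.

```python
from collections import Counter


def vote_shares(top_devices):
    """Percentage of the top-n list held by each device, rounded to 0.1."""
    counts = Counter(top_devices)
    total = len(top_devices)
    return {device: round(100 * count / total, 1)
            for device, count in counts.most_common()}


# Hypothetical top-11 list: device "A" appears 9 times, device "B" twice.
shares = vote_shares(["A"] * 9 + ["B"] * 2)
```

Under this reading the device with the largest share is the one reported as the identification result.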
The above are only preferred embodiments of the present application and are not intended to limit it. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present application shall fall within its scope of protection.

Claims (6)

1. An IPTV log user identification method based on user behavior characteristics, characterized by comprising the following steps:
(1) select the logs of several consecutive days as raw data;
(2) split the raw data into several time periods with a clustering algorithm, analyze the user's scoring of the channels within each period, and merge periods with similar scoring vectors to generate feature data;
(3) repeat step (2) on the data to be identified to obtain pre-matching data;
(4) match the pre-matching data against the feature data, and output the user that occurs most often among the matches as the identification result.
2. The IPTV log user identification method according to claim 1, characterized in that the raw data includes: the device number or the user's smart card number, the ID of the channel the user is currently watching, the time viewing starts, and the time viewing ends or the channel is switched.
3. The IPTV log user identification method according to claim 1, characterized in that generating the feature data in step (2) includes the following steps:
2.1 cluster the viewing start times in the raw data with the k-means algorithm (k-means clustering) to obtain k time periods, denoted $\{t_1, t_2, t_3, \ldots, t_n, \ldots, t_k\}$;
2.2 for time period $t_n$, compute the user's scoring vector $A_{(user,\,channel)}$ over the channels with the scoring formula:
$$A_{(user,\,channel)} = \frac{d_{(user,\,channel)}}{\sum_{c \in C_{user}^{t_n}} d_{(user,\,c)}} \tag{1}$$
where $C_{user}^{t_n}$ denotes the list of channels the user watched during period $t_n$, $d_{(user,\,channel)}$ denotes the total time the user spent watching the channel, $c$ ranges over the channels in $C_{user}^{t_n}$, and $d_{(user,\,c)}$ denotes the total time the user spent watching channel $c$;
2.3 for two different time periods $t_a$ and $t_b$, compute their similarity with the cosine formula:
$$\cos(A, B) = \frac{A \cdot B}{\lVert A \rVert \, \lVert B \rVert} \tag{2}$$
where $A$ and $B$ denote the scoring vectors extracted from periods $t_a$ and $t_b$ respectively;
2.4 define a threshold β, merge the periods whose similarity is less than β, and compute the scoring vector after merging; the resulting scoring vectors are the feature data.
4. The IPTV log user identification method according to claim 3, characterized in that the merging process of step 2.4 is as follows:
i. compute the similarity between all pairs of periods with formula (2); taking the periods as nodes and the similarity between nodes as edge weights, connect the nodes pairwise to form a complete graph;
ii. in descending order of edge weight, successively merge pairs of periods whose similarity is less than β, and use formula (1) to compute the scoring vector of the merged period, together with the similarity between the merged period and the other periods;
iii. repeat step ii until no pair of periods has a similarity less than β.
5. The IPTV log user identification method according to claim 1, characterized in that the matching process of step (4) includes the following steps:
4.1 for each item of pre-matching data to be matched, compute its similarity to the feature data obtained in step (2) with formula (2); after sorting by similarity, select the n feature data items with the highest similarity and extract their device numbers $u_n$ to obtain the sequence $\{u_1, u_2, u_3, \ldots, u_n\}$;
4.2 count the occurrences in the sequence $\{u_1, u_2, u_3, \ldots, u_n\}$ and output the user that occurs most often as the identification result.
6. The IPTV log user identification method according to claim 1, characterized in that the pre-matching data and the feature data are both scoring vectors.
CN201910254105.2A 2019-03-30 2019-03-30 IPTV log user identification method based on user behavior characteristics Active CN109977265B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910254105.2A CN109977265B (en) 2019-03-30 2019-03-30 IPTV log user identification method based on user behavior characteristics

Publications (2)

Publication Number Publication Date
CN109977265A true CN109977265A (en) 2019-07-05
CN109977265B CN109977265B (en) 2022-12-16

Family

ID=67081997

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910254105.2A Active CN109977265B (en) 2019-03-30 2019-03-30 IPTV log user identification method based on user behavior characteristics

Country Status (1)

Country Link
CN (1) CN109977265B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102957969A (en) * 2012-05-18 2013-03-06 华东师范大学 Device and method for recommending program to IPTV (Internet protocol television) terminal user
CN103297853A (en) * 2013-06-07 2013-09-11 华东师范大学 IPTV (internet protocol television) program recommendation method based on context recognition for multiple users
CN105430504A (en) * 2015-11-27 2016-03-23 中国科学院深圳先进技术研究院 Family member mix identification method and system based on television watching log mining
CN109450882A (en) * 2018-10-26 2019-03-08 安徽继远软件有限公司 A kind of security management and control system and method for the internet behavior merging artificial intelligence and big data

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
STOLERMAN A ET AL.: "Breaking the closed-world assumption in stylometric authorship attribution", 《INTERNATIONAL CONFERENCE ON DIGITAL FORENSICS》 *
李红波等: "Web访问挖掘中的匿名用户识别算法研究", 《西南师范大学学报(自然科学版)》 *

Also Published As

Publication number Publication date
CN109977265B (en) 2022-12-16

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant