CN107239573A - Data filtering method - Google Patents

Publication number
CN107239573A
Authority
CN
China
Prior art keywords
data
access
object data
target
threshold value
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201710509189.0A
Other languages
Chinese (zh)
Inventor
王加锋
冯方方
孙健
刘斌
付强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Universal Wisdom Technology Beijing Co Ltd
Original Assignee
Universal Wisdom Technology Beijing Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Universal Wisdom Technology Beijing Co Ltd
Priority to CN201710509189.0A
Publication of CN107239573A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Embodiments of the present invention relate to a data filtering method, comprising: obtaining target data to be screened for a first user; determining the access log of each item of target data to be screened, the access log including the playing duration of the target data, the IP address of the terminal that accessed the target data, and the timestamp of each access action; verifying the format of the user ID and the target object ID in the access log of first target data; when the verification passes, counting the timestamps of access actions by the same terminal IP address on the same target data, and calculating the access frequency of the first target data; determining whether the access frequency of the first target data exceeds a preset frequency threshold; when the access frequency of the first target data exceeds the preset frequency threshold, adding a first data attribute to the first target data, the first data attribute indicating that the first target data is invalid data; and deleting the first target data from the target data to be screened.

Description

Data filtering method
Technical field
The present invention relates to the technical field of data processing, and in particular to a data filtering method.
Background art
With the rapid development of the Internet, ever-expanding network data is causing Internet users to gradually lose their way in an ocean of information. Various personalized service technologies have therefore been proposed, providing different services to different users to meet different needs. Collaborative filtering recommendation is rapidly becoming a popular technology in information filtering and information systems. Unlike traditional content-based filtering, which analyzes content directly to make recommendations, collaborative filtering analyzes user interests, finds users in the user population whose interests are similar to those of a specified user, and combines those similar users' evaluations of a given item of information to form the system's prediction of how much the specified user will like that information.
However, some invalid data is often mixed into the input, which makes the collaborative filtering result inaccurate and causes the prediction to deviate from reality.
Summary of the invention
An object of the present invention is to provide a data filtering method that can identify and screen data and filter out invalid data, thereby ensuring the validity of the data subsequently used in data computation.
To achieve the above object, the present invention provides a data filtering method, comprising:
obtaining target data to be screened for a first user;
determining the access log of each item of target data to be screened, the access log including the playing duration of the target data, the IP address of the terminal that accessed the target data, and the timestamp of each access action;
verifying the format of the user ID and the target object ID in the access log of first target data;
when the verification passes, counting the timestamps of access actions by the same terminal IP address on the same target data, and calculating the access frequency of the first target data;
determining whether the access frequency of the first target data exceeds a preset frequency threshold;
when the access frequency of the first target data exceeds the preset frequency threshold, adding a first data attribute to the first target data, the first data attribute indicating that the first target data is invalid data;
deleting the first target data from the target data to be screened.
Preferably, the method further comprises:
when the verification fails, adding the first data attribute to the first target data;
according to the first data attribute, deleting the first target data from the target data to be screened.
Preferably, the access log further includes the format of the user ID of the first user and of the target object ID of the target data; before the first data attribute is added to the first target data, the method further comprises:
determining whether the playing duration of the first target data exceeds an effective playing time threshold;
when the playing duration of the first target data does not exceed the effective playing time threshold, adding the first data attribute to the first target data.
Further preferably, the method further comprises:
when the playing duration of the first target data exceeds the effective playing time threshold, adding the first target data to a valid data set.
Preferably, verifying the format of the user ID and the target object ID specifically comprises:
checking the format of the user ID and the target object ID by means of regular expressions.
Preferably, when the access frequency of the first target data does not exceed the preset frequency threshold, the first target data is added to the valid data set.
The data filtering method provided by the embodiments of the present invention filters out invalid data by filtering on the data format and the access frequency of the target data, and determines the valid data, thereby ensuring the validity of the data subsequently used in data computation.
Brief description of the drawings
Fig. 1 is a flowchart of the data filtering method provided by an embodiment of the present invention.
Detailed description
The technical solution of the present invention is described in further detail below with reference to the drawing and embodiments.
The data filtering method provided by the embodiments of the present invention can be used for automated filtering and screening of data validity.
With reference to the flowchart shown in Fig. 1, the data filtering method provided by the embodiments of the present invention is explained below, taking a user-oriented data filtering service as the example application scenario.
As shown in Fig. 1, the data filtering method of the present invention comprises the following steps:
Step 110: obtain the target data to be screened for the first user.
Specifically, in this embodiment, target data is stored according to user attributes. Each user has a database of target data in which that user's target data is stored.
As a specific example, in a scenario where data filtering is applied to a user's film-viewing preferences, the target data may be information about the films the user has watched, such as the film title, film ID, and names of the lead actors; in a scenario where data filtering is applied to a user's shopping preferences, the target data may be information about the commodities the user follows, such as the commodity name and commodity ID. The target data may differ between application scenarios, but the method of the present invention is applicable to many scenarios.
Because target data is stored by user ID, the target data that needs to be screened can be retrieved by user ID.
Step 120: determine the access log of each item of target data to be screened.
Specifically, an access log is generated whenever target data is accessed or viewed.
The access log may include the playing duration of the target data, the format of the user ID of the first user and of the target object ID of the target data, and so on.
Here, the playing duration of the target data is not limited to the literal meaning of "playing". For example, when the target data is information about films the user has watched, the playing duration may be the time the user spent watching the film; when the target data is information about commodities the user follows, the playing duration may be the time the user stayed on the commodity page, or the cumulative viewing time within a certain period.
The user ID of the user and the target object ID of the viewed target data are likewise recorded in the corresponding access log. The target object ID of the target data refers to the unique identifier of the target data, such as a commodity ID or a film ID.
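As a rough illustration of the access log described in step 120, the following Python sketch models one log entry and groups entries by target object ID. The class name, field names, and units are assumptions made for the example; the patent does not prescribe a storage layout.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class AccessLog:
    """One access-log entry; all field names are illustrative assumptions."""
    user_id: str         # ID of the user who accessed the target data
    target_id: str       # target object ID, e.g. a film ID or commodity ID
    terminal_ip: str     # IP address of the accessing terminal
    timestamp: float     # time of the access action, in seconds
    play_seconds: float  # playing duration (viewing or page-dwell time)


def logs_by_target(logs: Iterable[AccessLog]) -> Dict[str, List[AccessLog]]:
    """Group log entries by target object ID, preserving input order."""
    grouped: Dict[str, List[AccessLog]] = defaultdict(list)
    for entry in logs:
        grouped[entry.target_id].append(entry)
    return dict(grouped)
```

Grouping by target object ID is one convenient way to obtain "the access log of each item of target data"; a real system might instead query a log store by user ID and target object ID.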
Step 130: verify the format of the user ID and the target object ID in the access log of the first target data, and determine whether the data format verification passes.
Specifically, in this example, the data format is verified first when data filtering is performed, to determine whether the data format of the target data to be screened is correct.
In a specific implementation, data format verification can be carried out with regular expressions.
When the data format verification passes, step 140 is performed; when it fails, step 170 is performed.
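The regular-expression check in step 130 could look roughly like the sketch below. The two ID patterns are hypothetical: the patent does not say what a well-formed user ID or target object ID looks like, so they would need to be replaced with the formats actually used by the system.

```python
import re

# Hypothetical ID formats, chosen only for illustration; the patent does not
# specify what a well-formed user ID or target object ID looks like.
USER_ID_PATTERN = re.compile(r"u\d{1,12}")
TARGET_ID_PATTERN = re.compile(r"(film|item)-\d{1,12}")


def format_is_valid(user_id: str, target_id: str) -> bool:
    """Return True only if both IDs match their expected patterns in full."""
    return (
        USER_ID_PATTERN.fullmatch(user_id) is not None
        and TARGET_ID_PATTERN.fullmatch(target_id) is not None
    )
```

Using `fullmatch` rather than `match` means that trailing characters cause the check to fail. A record that passes would continue to step 140, while a record that fails would go to step 170.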
Step 140: count the timestamps of access actions by the same terminal IP address on the same target data, and calculate the access frequency of the first target data.
Specifically, one class of invalid data may be mixed into the target data to be screened, such as data injected through user simulation or data reported through frequent calls; such data needs to be filtered out.
This can be judged from the terminal IP address associated with the target data and the timestamps of the access actions. For example, the timestamps of access actions by the same terminal IP address on the same target data can be counted to determine whether the access frequency of that target data exceeds the preset frequency threshold.
Each access to the target data adds a timestamp, so the number of timestamps within a period of time can be counted to calculate the average access frequency over that period. If the access frequency is too high, the data is very likely data reported through frequent calls, that is, abnormal-access data, and therefore needs to be rejected.
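The frequency calculation in step 140 can be sketched as counting timestamps per (terminal IP, target) pair inside a time window and dividing by the window length. The tuple layout, the half-open window, and the unit of accesses per second are assumptions for the example; the patent only says that timestamps within a period are counted to obtain an average frequency.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Access = Tuple[str, str, float]  # (terminal_ip, target_id, timestamp in seconds)


def access_frequencies(
    accesses: Iterable[Access], window_start: float, window_end: float
) -> Dict[Tuple[str, str], float]:
    """Average accesses per second for each (terminal IP, target) pair.

    Only timestamps inside [window_start, window_end) are counted.
    """
    if window_end <= window_start:
        raise ValueError("window_end must be greater than window_start")
    counts: Dict[Tuple[str, str], int] = defaultdict(int)
    for ip, target_id, ts in accesses:
        if window_start <= ts < window_end:
            counts[(ip, target_id)] += 1
    span = window_end - window_start
    return {pair: n / span for pair, n in counts.items()}


def exceeds_threshold(frequency: float, threshold: float) -> bool:
    """Strictly greater than the preset threshold, matching 'exceeds' in the text."""
    return frequency > threshold
```

For instance, 30 accesses from one terminal to one film within a 60-second window give a frequency of 0.5 accesses per second, which would exceed a threshold of 0.1. The threshold value itself is a tuning choice that the patent leaves open.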
Step 150: determine whether the access frequency of the first target data exceeds the preset frequency threshold.
When the access frequency of the first target data exceeds the preset frequency threshold, step 170 is performed.
When the access frequency of the first target data does not exceed the preset frequency threshold, step 160 is performed.
Step 160: determine that the first target data is valid data.
Specifically, a data attribute may be added to the first target data determined to be valid, to indicate that it is valid data. Alternatively, the first target data may be added to a data list of valid data, so that subsequent data processing can obtain the valid data directly by reading the data list.
Step 170: add the first data attribute to the first target data.
Specifically, the first data attribute indicates that the first target data is invalid data. Adding this data attribute to the target data marks it as invalid data.
Step 180: according to the first data attribute, delete the first target data from the target data to be screened.
Of course, step 170 may also be skipped, and the invalid first target data deleted directly from the target data to be screened.
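Steps 170 and 180 amount to tagging a record and then filtering on the tag. A minimal sketch follows, assuming each item of target data is held as a dictionary; the `data_attribute` key and the `"invalid"` value are placeholder names standing in for the "first data attribute".

```python
from typing import Dict, List

INVALID = "invalid"  # stands in for the "first data attribute"; the name is an assumption


def mark_invalid(record: Dict[str, object]) -> None:
    """Step 170: tag a record as invalid data in place."""
    record["data_attribute"] = INVALID


def remove_invalid(records: List[Dict[str, object]]) -> List[Dict[str, object]]:
    """Step 180: return only the records not tagged as invalid."""
    return [r for r in records if r.get("data_attribute") != INVALID]
```

Separating marking from deletion keeps a record of why an item was dropped until the deletion pass runs; skipping step 170, as the text allows, would simply mean removing the record at the point where it is found to be invalid.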
In addition, an effective playing time threshold can be set so that data with too short a playing time is screened out. If the user's viewing time is too short, then although the target data was accessed, the access cannot objectively reflect the user's real interest; a very short viewing time usually occurs precisely because the user is not interested.
For example, when watching a film, a viewing time of less than 1 minute, or less than 3 minutes, cannot reflect the user's interest.
When browsing commodities, if the user stays on the commodity page for less than 5 seconds, the user can be considered not interested in the commodity, so 5 seconds can be set as the playing time threshold.
To filter data more accurately, different effective playing time thresholds can be set for different types of target data. The corresponding effective playing time threshold can be determined from the target object ID of the target data.
When the playing duration of the first target data exceeds the effective playing time threshold, the first target data is determined to be valid data.
When the playing duration of the first target data does not exceed the effective playing time threshold, the first data attribute is added to the first target data, and, according to the first data attribute, the first target data is deleted from the target data to be screened.
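The per-type playing time check can be sketched as a threshold lookup keyed on the kind of target data. The threshold values below come from the examples in the text (1 minute for films, 5 seconds for commodity pages); deriving the type from a `film-` or `item-` prefix of the target object ID is an assumption for illustration, since the patent only says the threshold can be determined from the target object ID.

```python
from typing import Dict

# Example thresholds in seconds, taken from the figures mentioned in the text.
# The "film"/"item" ID prefixes used to pick a threshold are an assumption.
EFFECTIVE_PLAY_THRESHOLDS: Dict[str, float] = {"film": 60.0, "item": 5.0}


def effective_threshold(target_id: str) -> float:
    """Look up the threshold from the type prefix of the target object ID."""
    kind = target_id.split("-", 1)[0]
    return EFFECTIVE_PLAY_THRESHOLDS[kind]


def is_effective_play(target_id: str, play_seconds: float) -> bool:
    """Valid only if the playing duration strictly exceeds the threshold."""
    return play_seconds > effective_threshold(target_id)
```

In this sketch an unknown prefix raises `KeyError`; a production system would more likely fall back to a default threshold or treat the record as failing format verification.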
The data filtering method provided by the embodiments of the present invention filters out invalid data by filtering on the data format and the access frequency of the target data, and determines the valid data, thereby ensuring the validity of the data subsequently used in data computation.
Those skilled in the art should further appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of the two. To clearly illustrate the interchangeability of hardware and software, the composition and steps of each example have been described above generally in terms of function. Whether these functions are performed in hardware or in software depends on the particular application and the design constraints of the technical solution. Skilled persons may implement the described functions in different ways for each particular application, but such implementations should not be considered beyond the scope of the present invention.
The steps of the method or algorithm described in connection with the embodiments disclosed herein can be implemented in hardware, in a software module executed by a processor, or in a combination of the two. The software module may reside in random access memory (RAM), main memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
The specific embodiments described above further explain the purpose, technical solution, and beneficial effects of the present invention in detail. It should be understood that the foregoing are only specific embodiments of the present invention and are not intended to limit its scope of protection. Any modification, equivalent substitution, or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims (6)

1. A data filtering method, characterized in that the method comprises:
obtaining target data to be screened for a first user;
determining the access log of each item of target data to be screened, the access log including the playing duration of the target data, the IP address of the terminal that accessed the target data, and the timestamp of each access action;
verifying the format of the user ID and the target object ID in the access log of first target data;
when the verification passes, counting the timestamps of access actions by the same terminal IP address on the same target data, and calculating the access frequency of the first target data;
determining whether the access frequency of the first target data exceeds a preset frequency threshold;
when the access frequency of the first target data exceeds the preset frequency threshold, adding a first data attribute to the first target data, the first data attribute indicating that the first target data is invalid data;
deleting the first target data from the target data to be screened.
2. The data filtering method according to claim 1, characterized in that the method further comprises:
when the verification fails, adding the first data attribute to the first target data;
according to the first data attribute, deleting the first target data from the target data to be screened.
3. The data filtering method according to claim 1, characterized in that the access log further includes the format of the user ID of the first user and of the target object ID of the target data; before the first data attribute is added to the first target data, the method further comprises:
determining whether the playing duration of the first target data exceeds an effective playing time threshold;
when the playing duration of the first target data does not exceed the effective playing time threshold, adding the first data attribute to the first target data.
4. The data filtering method according to claim 3, characterized in that the method further comprises:
when the playing duration of the first target data exceeds the effective playing time threshold, adding the first target data to a valid data set.
5. The data filtering method according to claim 1, characterized in that verifying the format of the user ID and the target object ID specifically comprises:
checking the format of the user ID and the target object ID by means of regular expressions.
6. The data filtering method according to claim 1, characterized in that when the access frequency of the first target data does not exceed the preset frequency threshold, the first target data is added to the valid data set.
CN201710509189.0A 2017-06-28 2017-06-28 Data filtering method Pending CN107239573A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710509189.0A CN107239573A (en) 2017-06-28 2017-06-28 Data filtering method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710509189.0A CN107239573A (en) 2017-06-28 2017-06-28 Data filtering method

Publications (1)

Publication Number Publication Date
CN107239573A true CN107239573A (en) 2017-10-10

Family

ID=59990075

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710509189.0A Pending CN107239573A (en) 2017-06-28 2017-06-28 Data filtering method

Country Status (1)

Country Link
CN (1) CN107239573A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109063007A (en) * 2018-07-10 2018-12-21 阿里巴巴集团控股有限公司 A kind of exchange medium cleaning method and device

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100318546A1 (en) * 2009-06-16 2010-12-16 Microsoft Corporation Synopsis of a search log that respects user privacy
CN104135475A (en) * 2014-07-18 2014-11-05 国家电网公司 Safety protection method of electric power information for mobile Internet
CN105718545A (en) * 2016-01-18 2016-06-29 合一网络技术(北京)有限公司 Recommendation method and device of multimedia resources
CN106021609A (en) * 2016-06-24 2016-10-12 武汉斗鱼网络科技有限公司 Method and device for intelligently recommending website videos

Similar Documents

Publication Publication Date Title
CN107220382A (en) Data analysing method
US9305145B2 (en) Site directed management of audio components of uploaded video files
US10643250B2 (en) Controlling effectiveness of online video advertisement campaign
CA3112126A1 (en) Methods and apparatus to monitor media presentations
US20110231522A1 (en) Distributed digital media metering & reporting system
CN105897671A (en) Anti-hotlinking method and system
CN109982068A (en) Synthetic video method for evaluating quality, device, equipment and medium
CN113680074B (en) Service information pushing method and device, electronic equipment and readable medium
CN107239573A (en) Data filtering method
CN107220383A (en) Data filtering method
US9438610B2 (en) Anti-tampering server
CN106294765A (en) Process the method and device of news data
KR102626741B1 (en) Method of recommanding products based on user activities
CN106934708B (en) Event recording method and device
CN113268690B (en) Method and system for safely filtering website short video playing information
CN110648156A (en) Advertisement processing method, device and equipment
CN107798134A (en) A kind of data filtering method, device, equipment and storage medium
CN104506892B (en) Data adjustment method and device
CN108629610B (en) Method and device for determining popularization information exposure
Nasution et al. Investigating Social Media User Activity on Android Smartphone
CN112819434A (en) Data content auditing method and device
CN107609926B (en) Digital resource transaction system and method for multiple channel users
WO2023237665A1 (en) System and method for calculating a distributor quality score
CN111145354B (en) BIM data model identification method and device
CN110516084A (en) Multimedia related information determines method, apparatus, storage medium and equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20171010
