CN107239573A - Data filtering method - Google Patents
Data filtering method
- Publication number
- CN107239573A
- Authority
- CN
- China
- Prior art keywords
- data
- access
- object data
- target
- threshold value
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
- G06F16/9535—Search customisation based on user profiles and personalisation
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The embodiments of the present invention relate to a data filtering method, comprising: obtaining target data to be screened for a first user; determining the access log of each item of target data to be screened, the access log including the playing duration of the target data, the IP address of the terminal that accessed the target data, and the timestamp of each access action; verifying the format of the user ID and the target object ID in the access log of first target data; when the verification passes, counting the timestamps of access actions by the same terminal IP address on the same target data and calculating the access frequency of the first target data; determining whether the access frequency of the first target data exceeds a preset frequency threshold; when it does, adding a first data attribute to the first target data, the first data attribute indicating that the first target data is invalid data; and deleting the first target data from the target data to be screened.
Description
Technical field
The present invention relates to the technical field of data processing, and in particular to a data filtering method.
Background
With the rapid development of the Internet, the ever-growing volume of network data leaves Internet users increasingly lost in an ocean of information. Various personalized service technologies have therefore been proposed to provide different services to different users and meet their differing needs. Collaborative filtering recommendation is rapidly becoming a popular technique in information filtering and information systems. Unlike traditional content-based recommendation, which analyzes content directly, collaborative filtering analyzes user interests: it finds users whose interests are similar to those of a specified user, aggregates those similar users' evaluations of a given piece of information, and from them predicts how much the specified user will like that information.
However, invalid data is often mixed into the input, which makes the collaborative filtering results inaccurate and causes the predictions to deviate from reality.
Summary of the invention
An object of the invention is to provide a data filtering method that identifies and screens data and filters out invalid data, thereby ensuring the validity of the data subsequently used for data computation.
To achieve the above object, the invention provides a data filtering method, including:
Obtaining target data to be screened for a first user;
Determining the access log of each item of target data to be screened, the access log including the playing duration of the target data, the IP address of the terminal accessing the target data, and the timestamp of each access action;
Verifying the format of the user ID and the target object ID in the access log of first target data;
When the verification passes, counting the timestamps of the access actions by the same terminal IP address on the same target data, and calculating the access frequency of the first target data;
Determining whether the access frequency of the first target data exceeds a preset frequency threshold;
When the access frequency of the first target data exceeds the preset frequency threshold, adding a first data attribute to the first target data, the first data attribute indicating that the first target data is invalid data;
Deleting the first target data from the target data to be screened.
Preferably, the method further includes:
When the verification fails, adding the first data attribute to the first target data;
Deleting the first target data from the target data to be screened according to the first data attribute.
Preferably, the access log further includes the user ID of the first user and the target object ID of the target data; before the first data attribute is added to the first target data, the method further includes:
Determining whether the playing duration of the first target data exceeds a valid playing-time threshold;
When the playing duration of the first target data does not exceed the valid playing-time threshold, adding the first data attribute to the first target data.
Further preferably, the method also includes:
When the playing duration of the first target data exceeds the valid playing-time threshold, adding the first target data to a valid data set.
Preferably, verifying the format of the user ID and the target object ID specifically includes:
Checking the format of the user ID and the target object ID by means of regular expressions.
Preferably, when the access frequency of the first target data does not exceed the preset frequency threshold, the first target data is added to the valid data set.
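The summary steps and the preferred variants above can be chained into a single screening routine. The Python sketch below is one possible reading, not the patented implementation: the ID patterns, the `TargetData` class, the fixed observation window, and the rule that an item must pass all three checks (format, per-IP access frequency, playing duration) to enter the valid data set are all assumptions made for illustration.

```python
import re
from dataclasses import dataclass, field
from typing import List, Tuple

# Hypothetical ID formats; the patent only says the check uses regular expressions.
USER_ID_RE = re.compile(r"u[0-9]{6}")
OBJECT_ID_RE = re.compile(r"(film|item)-[0-9]+")


@dataclass
class TargetData:
    user_id: str
    object_id: str
    playing_seconds: float
    # One (terminal IP, timestamp) pair per access action in the observation window.
    accesses: List[Tuple[str, float]] = field(default_factory=list)
    invalid: bool = False  # stands in for the "first data attribute"


def access_frequency(item, window_seconds):
    """Highest per-IP access count divided by the window length (accesses/s)."""
    counts = {}
    for ip, _ts in item.accesses:
        counts[ip] = counts.get(ip, 0) + 1
    return max(counts.values(), default=0) / window_seconds


def screen(items, freq_threshold, play_threshold, window_seconds):
    """Mark invalid items, delete them in place, and return the valid data set."""
    valid = []
    for item in items:
        if not (USER_ID_RE.fullmatch(item.user_id) and OBJECT_ID_RE.fullmatch(item.object_id)):
            item.invalid = True  # format verification failed (claim 2)
        elif access_frequency(item, window_seconds) > freq_threshold:
            item.invalid = True  # access frequency too high (claim 1)
        elif item.playing_seconds <= play_threshold:
            item.invalid = True  # played too briefly (claim 3)
        else:
            valid.append(item)  # valid data set (claims 4 and 6)
    # Delete every item carrying the invalid attribute from the data to be screened.
    items[:] = [item for item in items if not item.invalid]
    return valid
```

For example, with a one-hour window and a threshold of 0.1 accesses per second, an item accessed 500 times from one IP address in that hour (about 0.14 accesses per second) is marked invalid and deleted, while an item accessed twice and played for 600 seconds is kept.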
The data filtering method provided by the embodiments of the present invention filters target data on its data format and its access frequency, thereby filtering out invalid data and identifying valid data, which ensures the validity of the data subsequently used for data computation.
Brief description of the drawings
Fig. 1 is a flowchart of the data filtering method provided by an embodiment of the present invention.
Detailed description of the embodiments
The technical solution of the present invention is described in further detail below with reference to the drawings and embodiments.
The data filtering method provided by the embodiments of the present invention can be used to filter and screen data for validity automatically.
Taking a user-oriented data filtering service as the application scenario, the data filtering method is described with reference to the flowchart shown in Fig. 1.
As shown in Fig. 1, the data filtering method of the present invention includes the following steps:
Step 110: obtain the target data to be screened for the first user.
Specifically, in this embodiment, target data is stored by user: each user has a database of target data in which that user's target data is stored.
In a specific example, in a scenario where data is filtered according to a user's film-viewing preferences, the target data may be film information about the films the user has watched, such as the film title, film ID, and names of the lead actors. In a scenario where data is filtered according to a user's shopping preferences, the target data may be product information about the products the user follows, such as the product name and product ID. The target data differs between application scenarios, but the method of the present invention applies to many scenarios.
Because target data is stored by user ID, the target data to be screened can be retrieved using the user ID.
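A minimal in-memory sketch of the per-user storage and the user-ID lookup in step 110 follows. The `PerUserStore` class is hypothetical; a real deployment would query whatever database holds each user's target data.

```python
from collections import defaultdict


class PerUserStore:
    """Hypothetical in-memory stand-in for the per-user target-data databases."""

    def __init__(self):
        self._by_user = defaultdict(list)

    def add(self, user_id, target):
        # Target data is stored under the ID of the user it belongs to.
        self._by_user[user_id].append(target)

    def to_screen(self, user_id):
        # Step 110: everything stored under this user ID is a candidate for screening.
        # dict.get does not trigger defaultdict's factory, so unknown users stay absent.
        return list(self._by_user.get(user_id, []))
```

Returning a copy keeps later deletions during screening from silently altering the stored data.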
Step 120: determine the access log of each item of target data to be screened.
Specifically, an access log entry is generated whenever the target data is accessed or viewed.
The access log may include the playing duration of the target data, the user ID of the first user, the target object ID of the target data, and so on, as well as the terminal IP address and the timestamp of each access action used in step 140.
Here, the playing duration of the target data is not restricted to the literal meaning of playback. For example, when the target data is film information about a film the user watched, the playing duration may be the time the user spent watching the film; when the target data is product information about a product the user follows, the playing duration may be the time the user stayed on the product page, or the cumulative viewing time within a certain period.
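Where the playing duration is the cumulative viewing time within a certain period, it can be computed by clipping each viewing interval to that period and summing the overlaps. The function below is a hypothetical sketch; the `(start, end)` event format is an assumption, since the patent does not describe how viewing intervals are recorded.

```python
def accumulated_view_seconds(view_events, period_start, period_end):
    """Sum the dwell time of view events that fall within [period_start, period_end).

    view_events: iterable of (start_ts, end_ts) pairs in seconds (hypothetical format).
    Only the part of each event that lies inside the period is counted.
    """
    total = 0.0
    for start, end in view_events:
        overlap = min(end, period_end) - max(start, period_start)
        if overlap > 0:
            total += overlap
    return total
```

For instance, visits of 4 s and 3 s plus a 15 s visit that starts 5 s before the period ends contribute 4 + 3 + 5 = 12 s to a 100-second period.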
The user ID and the target object ID of the viewed target data are likewise recorded in the corresponding access log. The target object ID of target data is its unique identifier, such as a product ID or film ID.
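The fields that steps 120 to 140 rely on can be modelled as one record per access. This sketch assumes one entry per access action holding the user ID, target object ID, terminal IP address, timestamp, and playing duration; `AccessLogEntry` and `record_access` are illustrative names, not part of the patent.

```python
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessLogEntry:
    user_id: str            # ID of the user who accessed the target data
    object_id: str          # target object ID: unique identifier of the target data
    terminal_ip: str        # IP address of the accessing terminal
    timestamp: float        # time of the access action (seconds since the epoch)
    playing_seconds: float  # playing duration, in the scenario's own sense


def record_access(log, user_id, object_id, terminal_ip, playing_seconds, now=None):
    """Append one entry each time target data is accessed or viewed (step 120)."""
    ts = time.time() if now is None else now
    entry = AccessLogEntry(user_id, object_id, terminal_ip, ts, playing_seconds)
    log.append(entry)
    return entry
```

The optional `now` argument exists only so the timestamp can be fixed in tests; otherwise the current time is used.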
Step 130: verify the format of the user ID and the target object ID in the access log of the first target data, and determine whether the format verification passes.
Specifically, when filtering data, this example first verifies the data format to determine whether the format of the target data to be screened is correct.
In a specific implementation, the format verification can be performed with regular expressions.
If the format verification passes, step 140 is performed; if it fails, step 170 is performed.
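Step 130 only states that the check uses regular expressions. The sketch below assumes example formats (a `u` followed by six digits for user IDs, `film-` or `item-` followed by digits for target object IDs) purely for illustration; real patterns depend on how IDs are issued. `fullmatch` is used so that a valid prefix followed by extra characters still fails.

```python
import re

# Hypothetical formats: the patent only says verification uses regular expressions.
USER_ID_PATTERN = re.compile(r"u[0-9]{6}")
OBJECT_ID_PATTERN = re.compile(r"(film|item)-[0-9]+")


def format_ok(user_id, object_id):
    """Step 130: True when both IDs match their expected formats in full."""
    return (USER_ID_PATTERN.fullmatch(user_id) is not None
            and OBJECT_ID_PATTERN.fullmatch(object_id) is not None)


def next_step(user_id, object_id):
    """Route to step 140 on success, step 170 (mark invalid) on failure."""
    return 140 if format_ok(user_id, object_id) else 170
```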
Step 140: count the timestamps of the access actions by the same terminal IP address on the same target data, and calculate the access frequency of the first target data.
Specifically, the target data to be screened may contain a class of invalid data, such as data reported through simulated injection by a user or through frequent repeated calls; such data needs to be filtered out.
This can be judged from the terminal IP address associated with the target data and the timestamps of the access actions. For example, the timestamps of access actions by the same terminal IP address on the same target data can be counted to determine whether the access frequency for that target data exceeds the preset frequency threshold.
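One way to write the per-IP frequency test, assuming a fixed counting window $[t_0, t_0 + \Delta t)$ and writing $T_{ip,d}$ for the timestamps of access actions by terminal IP address $ip$ on target data $d$ (notation introduced here, not in the patent):

$$
f_{ip,d} = \frac{\left|\{\, t \in T_{ip,d} : t_0 \le t < t_0 + \Delta t \,\}\right|}{\Delta t},
\qquad
d \text{ is invalid} \iff \max_{ip} f_{ip,d} > F_{\max}
$$

Here $F_{\max}$ is the preset frequency threshold, and treating $d$ as invalid when any single terminal IP exceeds it is an assumption about how the per-IP counts are combined. For example, 120 timestamps from one IP address within a 60-second window give $f_{ip,d} = 120 / 60 = 2$ accesses per second, which exceeds $F_{\max} = 1$.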
Each access to the target data adds a timestamp, so the number of timestamps within a period of time can be counted to calculate the average access frequency over that period. An excessively high access frequency strongly suggests that the data was reported through frequent calls, that is, that it reflects abnormal access, so it needs to be rejected.
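A sketch of the counting in step 140 follows, assuming each access is available as a `(terminal_ip, object_id, timestamp)` tuple and that frequency is measured as accesses per second over a fixed window; the function names are hypothetical.

```python
from collections import defaultdict


def access_frequencies(accesses, window_start, window_seconds):
    """Average access frequency (accesses per second) per (terminal IP, object ID).

    accesses: iterable of (terminal_ip, object_id, timestamp) tuples.
    Only timestamps inside [window_start, window_start + window_seconds) are counted.
    """
    counts = defaultdict(int)
    window_end = window_start + window_seconds
    for ip, object_id, ts in accesses:
        if window_start <= ts < window_end:
            counts[(ip, object_id)] += 1
    return {key: n / window_seconds for key, n in counts.items()}


def exceeds_threshold(freqs, object_id, threshold):
    """True if any single terminal IP accessed object_id faster than threshold."""
    return any(f > threshold for (_ip, oid), f in freqs.items() if oid == object_id)
```

Grouping by the (IP, object) pair keeps one busy terminal from being diluted by many legitimate terminals viewing the same item.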
Step 150: determine whether the access frequency of the first target data exceeds the preset frequency threshold.
If the access frequency of the first target data exceeds the preset frequency threshold, step 170 is performed.
If it does not exceed the preset frequency threshold, step 160 is performed.
Step 160: determine that the first target data is valid data.
Specifically, a data attribute can be added to first target data determined to be valid, marking it as valid data. Alternatively, the first target data can be added to a data list of valid data, so that subsequent data processing can obtain the valid data directly from that list.
Step 170: add the first data attribute to the first target data.
Specifically, the first data attribute indicates that the first target data is invalid data; adding this attribute to the target data marks it as invalid.
Step 180: according to the first data attribute, delete the first target data from the target data to be screened.
Step 170 can of course be skipped, and the invalid first target data deleted directly from the target data to be screened.
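Steps 150 to 180 amount to a two-way branch followed by a deletion pass. The sketch below represents target data as dicts and the first data attribute as an `"attribute"` key set to `"invalid"`; both representations, and the `frequency_of` callable, are assumptions for illustration.

```python
INVALID = "invalid"  # hypothetical value standing in for the first data attribute
VALID = "valid"


def route_by_frequency(items, frequency_of, threshold):
    """Steps 150-180 on a list of dicts.

    frequency_of: callable returning the access frequency of one item.
    Returns (remaining_items, valid_list).
    """
    valid_list = []
    for item in items:
        if frequency_of(item) > threshold:
            item["attribute"] = INVALID  # step 170: mark as invalid
        else:
            item["attribute"] = VALID    # step 160: mark as valid...
            valid_list.append(item)      # ...or keep it in a valid-data list
    # Step 180: delete everything carrying the invalid attribute.
    remaining = [item for item in items if item["attribute"] != INVALID]
    return remaining, valid_list
```

Skipping step 170, as the text permits, would reduce this to keeping only the items whose frequency does not exceed the threshold, without writing any attribute.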
Furthermore, a valid playing-time threshold can be set so that data with too short a playing time is screened out. If the user's viewing time is too short, then although the target data was accessed, the access does not objectively reflect the user's real interest; a very short viewing time often occurs precisely because the user is not interested.
For example, when watching films, a viewing time of less than 1 minute, or less than 3 minutes, does not reflect the user's interest.
When browsing products, if the user stays on a product page for less than 5 seconds, the user can be considered not interested in that product, so 5 seconds can be set as the playing-time threshold.
To filter data more accurately, different valid playing-time thresholds can be set for different types of target data. The corresponding valid playing-time threshold can be determined from the target object ID of the target data.
When the playing duration of the first target data exceeds the valid playing-time threshold, the first target data is determined to be valid data.
When the playing duration of the first target data does not exceed the valid playing-time threshold, the first data attribute is added to the first target data, and the first target data is deleted from the target data to be screened according to the first data attribute.
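The per-type threshold can be sketched as a lookup keyed on the target object ID. Here the type is assumed to be the ID prefix before the first `-`, the 60-second film threshold and 5-second product threshold echo the examples above, and the 10-second default is a hypothetical fallback.

```python
# Hypothetical mapping from target-object type to a valid playing-time threshold (s).
PLAY_THRESHOLDS = {"film": 60.0, "item": 5.0}


def play_threshold_for(object_id, default=10.0):
    """Pick the threshold from the target object ID (here: its prefix before '-')."""
    prefix = object_id.split("-", 1)[0]
    return PLAY_THRESHOLDS.get(prefix, default)


def passes_play_time(object_id, playing_seconds):
    """True (valid) only when the playing duration exceeds the threshold."""
    return playing_seconds > play_threshold_for(object_id)
```

A duration equal to the threshold is treated as not exceeding it, matching the "does not exceed" wording of the invalid branch.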
The data filtering method provided by the embodiments of the present invention filters target data on its data format and its access frequency, thereby filtering out invalid data and identifying valid data, which ensures the validity of the data subsequently used for data computation.
Those skilled in the art should further appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of the two. To clearly illustrate this interchangeability of hardware and software, the composition and steps of each example have been described above generally in terms of function. Whether these functions are performed in hardware or software depends on the specific application and the design constraints of the technical solution. Skilled practitioners may implement the described functions in different ways for each specific application, but such implementations should not be regarded as going beyond the scope of the present invention.
The steps of the methods or algorithms described in connection with the embodiments disclosed herein may be implemented in hardware, in a software module executed by a processor, or in a combination of the two. The software module may reside in random access memory (RAM), memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
The specific embodiments described above explain the objectives, technical solutions, and beneficial effects of the present invention in further detail. It should be understood that the above are merely specific embodiments of the present invention and are not intended to limit its scope of protection; any modification, equivalent substitution, improvement, or the like made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.
Claims (6)
1. A data filtering method, characterized in that the method comprises:
Obtaining target data to be screened for a first user;
Determining the access log of each item of target data to be screened, the access log including the playing duration of the target data, the IP address of the terminal accessing the target data, and the timestamp of each access action;
Verifying the format of the user ID and the target object ID in the access log of first target data;
When the verification passes, counting the timestamps of the access actions by the same terminal IP address on the same target data, and calculating the access frequency of the first target data;
Determining whether the access frequency of the first target data exceeds a preset frequency threshold;
When the access frequency of the first target data exceeds the preset frequency threshold, adding a first data attribute to the first target data, the first data attribute indicating that the first target data is invalid data;
Deleting the first target data from the target data to be screened.
2. The data filtering method according to claim 1, characterized in that the method further comprises:
When the verification fails, adding the first data attribute to the first target data;
Deleting the first target data from the target data to be screened according to the first data attribute.
3. The data filtering method according to claim 1, characterized in that the access log further includes the user ID of the first user and the target object ID of the target data; and before the first data attribute is added to the first target data, the method further comprises:
Determining whether the playing duration of the first target data exceeds a valid playing-time threshold;
When the playing duration of the first target data does not exceed the valid playing-time threshold, adding the first data attribute to the first target data.
4. The data filtering method according to claim 3, characterized in that the method further comprises:
When the playing duration of the first target data exceeds the valid playing-time threshold, adding the first target data to a valid data set.
5. The data filtering method according to claim 1, characterized in that verifying the format of the user ID and the target object ID specifically comprises:
Checking the format of the user ID and the target object ID by means of regular expressions.
6. The data filtering method according to claim 1, characterized in that when the access frequency of the first target data does not exceed the preset frequency threshold, the first target data is added to the valid data set.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710509189.0A CN107239573A (en) | 2017-06-28 | 2017-06-28 | Data filtering method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710509189.0A CN107239573A (en) | 2017-06-28 | 2017-06-28 | Data filtering method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN107239573A (en) | 2017-10-10 |
Family ID: 59990075
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710509189.0A Pending CN107239573A (en) | 2017-06-28 | 2017-06-28 | Data filtering method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107239573A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109063007A (en) * | 2018-07-10 | 2018-12-21 | 阿里巴巴集团控股有限公司 | A kind of exchange medium cleaning method and device |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20100318546A1 (en) * | 2009-06-16 | 2010-12-16 | Microsoft Corporation | Synopsis of a search log that respects user privacy |
CN104135475A (en) * | 2014-07-18 | 2014-11-05 | 国家电网公司 | Safety protection method of electric power information for mobile Internet |
CN105718545A (en) * | 2016-01-18 | 2016-06-29 | 合一网络技术(北京)有限公司 | Recommendation method and device of multimedia resources |
CN106021609A (en) * | 2016-06-24 | 2016-10-12 | 武汉斗鱼网络科技有限公司 | Method and device for intelligently recommending website videos |
- 2017-06-28 CN CN201710509189.0A patent/CN107239573A/en active Pending
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109672939B (en) | Method and device for marking video content popularity | |
CN107220382A (en) | Data analysing method | |
US9305145B2 (en) | Site directed management of audio components of uploaded video files | |
US10643250B2 (en) | Controlling effectiveness of online video advertisement campaign | |
CN108304426B (en) | Identification obtaining method and device | |
US20110231522A1 (en) | Distributed digital media metering & reporting system | |
CN105897671A (en) | Anti-hotlinking method and system | |
WO2023237665A1 (en) | System and method for calculating a distributor quality score | |
CN109982068A (en) | Synthetic video method for evaluating quality, device, equipment and medium | |
CN113680074B (en) | Service information pushing method and device, electronic equipment and readable medium | |
CN107239573A (en) | Data filtering method | |
CN107220383A (en) | Data filtering method | |
US9438610B2 (en) | Anti-tampering server | |
KR102626741B1 (en) | Method of recommanding products based on user activities | |
CN106934708B (en) | Event recording method and device | |
CN108073640A (en) | Page push method and system | |
CN113268690B (en) | Method and system for safely filtering website short video playing information | |
CN110648156A (en) | Advertisement processing method, device and equipment | |
CN107798134A (en) | A kind of data filtering method, device, equipment and storage medium | |
CN104506892B (en) | Data adjustment method and device | |
CN108629610B (en) | Method and device for determining popularization information exposure | |
Nasution et al. | Investigating social media user activity on android smartphone | |
CN107609926B (en) | Digital resource transaction system and method for multiple channel users | |
CN111145354B (en) | BIM data model identification method and device | |
CN110968785B (en) | Target account identification method and device, storage medium and electronic device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20171010 |