Disclosure of Invention
The present invention is directed to a computer data processing system based on big data analysis, so as to solve the problems mentioned in the background art.
In order to solve the above technical problems, the invention provides the following technical solution: a computer data processing system based on big data analysis comprises a storage space analysis module, a target data selection module and a state information analysis module. The storage space analysis module acquires the video download data stored on the current computer as analysis data; if the storage space occupied by the analysis data is larger than a storage space threshold, the target data selection module is enabled to analyze the analysis data. If, for a given video download data in the analysis data, the time interval between the time at which it was last triggered and the current time is greater than a duration threshold, that video download data is target data. The state information analysis module analyzes the state information of the target data and judges whether the target data is to be deleted.
The state information analysis module comprises a sorting module, a demarcation data selection module, a set dividing module, a set analysis module and a deletion control module. The sorting module acquires the download initiation time of the video download data historically downloaded by the computer and sorts the video download data from front to back according to the download initiation time to obtain a sort order. The demarcation data selection module selects demarcation data in the sort order: if the time interval between the download initiation times of two adjacent video download data in the sort order is greater than an interval threshold, the one of the two that is positioned at the front in the sort order is demarcation data. The set dividing module divides the sort order into a plurality of download sets according to the positions of the demarcation data, wherein the video download data in one download set are the video download data located between two adjacent demarcation data in the sort order together with the rear one of the two adjacent demarcation data. The set analysis module sets the download set in which the target data is located as a center set, wherein a download set located in front of the center set in the sort order is a reference set and a download set located behind the center set in the sort order is an influence set, and analyzes the center set, the reference set and the influence set to judge the type of the target data. When the target data is first data, the deletion control module pushes inquiry information to the user asking whether the target data is to be deleted; and when the target data is second data, the target data is deleted directly.
Further, the set analysis module comprises an influence threshold comparison module, a preferred data selection module, a reference index calculation module, a first index comparison module and an effective analysis module. The influence threshold comparison module counts the number of influence sets; if the number of influence sets is smaller than an influence threshold, the target data is first data. Otherwise, the preferred data selection module acquires the effective triggering history of each video download data; if the time interval between the latest effective triggering time of a given video download data and the current time is less than or equal to the duration threshold, that video download data is preferred data, wherein a video download data is effectively triggered each time it is opened and viewed by the user. The reference index calculation module calculates a reference index of the center set, wherein m is the number of reference sets, C_i is the number of video download data in the i-th reference set, F_i is the number of preferred data among the video download data in the i-th reference set, and H_i is the number of video download data in the i-th reference set. The first index calculation module calculates a first index P = u/v of the center set, wherein u is the number of preferred data in the center set and v is the number of video download data in the center set. The first index comparison module compares the first index of the center set with the reference index; if the first index of the center set is smaller than the reference index, the target data is first data; otherwise, the effective analysis module analyzes how the videos of the reference set are effectively triggered.
Further, the effective analysis module comprises a passive data judgment module, a passive index calculation module and an attention passive index calculation and comparison module. The passive data judgment module judges that, if within the latest preset time period a given preferred data in the reference set is effectively triggered on some occasion after a video download data in the influence set has been effectively triggered, the preferred data is attention data, and the attention data effectively triggered on that occasion is passive data. The passive index calculation module takes the number of video download data in the influence set that were effectively triggered consecutively before the attention data was effectively triggered, on an occasion when the attention data is passive data, as the influence factor of that occasion, and calculates a passive index of the attention data, wherein e is the number of times the attention data is passive data in the latest preset time period, N is the number of times the attention data is effectively triggered in the latest preset time period, and w is the average of the influence factors over the occasions when the attention data is passive data in the latest preset time period. The attention passive index calculation and comparison module calculates an attention passive index of the preferred data of the reference set, wherein S is the number of attention data among the preferred data, T_x is the average value of the passive indexes of all the attention data, and R is the number of preferred data. If the attention passive index of the preferred data of the reference set is smaller than a passive threshold, the target data is first data; otherwise, the target data is second data.
Further, the data processing system adopts a data processing method, and the data processing method comprises the following steps:
acquiring the video download data stored on the current computer as analysis data, and, if the storage space occupied by the analysis data is larger than the storage space threshold, analyzing the analysis data,
if, for a given video download data in the analysis data, the time interval between the time at which it was last triggered and the current time is greater than the duration threshold, that video download data is target data,
and analyzing the state information of the target data and judging whether the target data needs to be deleted.
Further, analyzing the state information of the target data includes:
acquiring the download initiation time of the video download data historically downloaded by the computer, and sorting the video download data from front to back according to the download initiation time to obtain a sort order,
in the sort order, if the time interval between the download initiation times of two adjacent video download data is greater than the interval threshold, the one of the two that is positioned at the front in the sort order is demarcation data,
dividing the sort order into a plurality of download sets according to the positions of the demarcation data, wherein the video download data included in one download set are the video download data located between two adjacent demarcation data in the sort order together with the rear one of the two adjacent demarcation data,
setting the download set in which the target data is located as a center set, wherein a download set located in front of the center set in the sort order is a reference set and a download set located behind the center set in the sort order is an influence set,
analyzing the center set, the reference set and the influence set, judging the type of the target data,
if the target data is the first data, pushing inquiry information about whether the target data is deleted or not to a user;
and if the target data is the second data, directly deleting the target data.
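The deletion control step can be illustrated with a minimal Python sketch. The names `DataType`, `handle_target`, `ask_user` and `delete` are hypothetical and do not appear in the specification; the sketch also assumes that first data is deleted only when the user confirms the inquiry, which the text does not state explicitly.

```python
from enum import Enum
from typing import Callable


class DataType(Enum):
    FIRST = "first"    # first data: the user is asked before anything is deleted
    SECOND = "second"  # second data: deleted directly


def handle_target(video: str,
                  data_type: DataType,
                  ask_user: Callable[[str], bool],
                  delete: Callable[[str], None]) -> bool:
    """Apply the deletion control step; return True if the video was deleted."""
    if data_type is DataType.SECOND:
        delete(video)
        return True
    # Assumption: for first data, deletion happens only on user confirmation.
    if ask_user(video):
        delete(video)
        return True
    return False
```

Passing the inquiry and the deletion as callables keeps the sketch independent of any particular user interface or file system.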
Further, the analyzing the center set, the reference set, and the influence set includes:
if the number of impact sets is less than the impact threshold, then the target data is the first data,
otherwise, acquiring the effective triggering history of each video download data; if the time interval between the latest effective triggering time of a given video download data and the current time is less than or equal to the duration threshold, that video download data is preferred data, wherein a video download data is effectively triggered each time it is opened and viewed by the user,
calculating a reference index of the center set, wherein m is the number of reference sets, C_i is the number of video download data in the i-th reference set, F_i is the number of preferred data among the video download data in the i-th reference set, and H_i is the number of video download data in the i-th reference set,
calculating a first index P = u/v of the center set, wherein u is the number of preferred data in the center set, and v is the number of video downloading data in the center set;
if the first index of the central set is less than the reference index, then the target data is the first data,
otherwise, analyzing the condition that the video of the reference set is effectively triggered.
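The first two checks of the set analysis can be sketched in Python as follows. This is an illustrative sketch only: the `Video` record, the function names and the use of plain numeric timestamps are assumptions, and the reference index is treated as a precomputed input rather than derived here.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Video:
    name: str
    last_viewed: Optional[float]  # latest effective trigger time; None if never viewed


def is_preferred(video: Video, now: float, duration_threshold: float) -> bool:
    """Preferred data: last effectively triggered no longer ago than the threshold."""
    return video.last_viewed is not None and now - video.last_viewed <= duration_threshold


def first_index(center_set: List[Video], now: float, duration_threshold: float) -> float:
    """First index P = u / v of the center set."""
    u = sum(is_preferred(v, now, duration_threshold) for v in center_set)
    v = len(center_set)
    return u / v


def early_classification(num_influence_sets: int, influence_threshold: int,
                         p: float, reference_index: float) -> Optional[str]:
    """Return "first" if either early check applies, else None."""
    if num_influence_sets < influence_threshold:
        return "first"
    if p < reference_index:
        return "first"
    return None  # undecided: the effective triggering of the reference set must be analyzed
```

A return value of `None` corresponds to the "otherwise" branch, in which the analysis continues with the effective triggering of the reference set.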
Further, analyzing how the videos of the reference set are effectively triggered includes:
if, within the latest preset time period, a given preferred data in the reference set is effectively triggered on some occasion after a video download data in the influence set has been effectively triggered, the preferred data is attention data, and the attention data effectively triggered on that occasion is passive data,
taking the number of video download data in the influence set that were effectively triggered consecutively before the attention data was effectively triggered, on an occasion when the attention data is passive data, as the influence factor of that occasion, and calculating a passive index of the attention data,
wherein e is the number of times the attention data is passive data in the latest preset time period, N is the number of times the attention data is effectively triggered in the latest preset time period, and w is the average of the influence factors over the occasions when the attention data is passive data in the latest preset time period,
calculating an attention passive index of the preferred data of the reference set,
wherein S is the number of attention data among the preferred data, T_x is the average value of the passive indexes of all the attention data, and R is the number of preferred data;
if the attention passive index of the preferred data of the reference set is smaller than the passive threshold, the target data is first data; otherwise, the target data is second data.
Further, the time at which a given video download data was last triggered is determined as follows:
if the video download data has been viewed by the user, the time at which it was last triggered is the time at which it was most recently viewed,
otherwise, the time at which it was last triggered is the time at which its download finished.
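The selection of target data can be sketched in Python as follows. The `Download` record, its field names and the units (bytes, seconds) are assumptions made for illustration and are not taken from the specification.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Download:
    name: str
    size: int                              # occupied storage, e.g. in bytes (assumed unit)
    downloaded_at: float                   # time at which the download finished
    last_viewed_at: Optional[float] = None  # None if the user never viewed the video


def last_triggered(d: Download) -> float:
    """Latest viewing time if the video was viewed, else its download time."""
    return d.last_viewed_at if d.last_viewed_at is not None else d.downloaded_at


def select_targets(downloads: List[Download], now: float,
                   storage_threshold: int, duration_threshold: float) -> List[Download]:
    """Return target data, or an empty list if storage use is within the threshold."""
    if sum(d.size for d in downloads) <= storage_threshold:
        return []
    return [d for d in downloads if now - last_triggered(d) > duration_threshold]
```

The storage check comes first, so that no video is considered for deletion while the occupied storage space stays at or below the threshold.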
Compared with the prior art, the invention has the following beneficial effects: the invention analyzes videos that have not been watched for a long time, estimates the probability that the user will subsequently watch such a video again, and deletes the video directly when that probability is low, thereby reducing the storage space occupied on the computer by idle video data, ensuring the normal operation of the computer and improving its operating efficiency.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to fig. 1, the present invention provides the following technical solution: a computer data processing system based on big data analysis comprises a storage space analysis module, a target data selection module and a state information analysis module. The storage space analysis module acquires the video download data stored on the current computer as analysis data; if the storage space occupied by the analysis data is larger than a storage space threshold, the target data selection module is enabled to analyze the analysis data. If, for a given video download data in the analysis data, the time interval between the time at which it was last triggered and the current time is greater than a duration threshold, that video download data is target data. The state information analysis module analyzes the state information of the target data and judges whether the target data is to be deleted.
The state information analysis module comprises a sorting module, a demarcation data selection module, a set dividing module, a set analysis module and a deletion control module. The sorting module acquires the download initiation time of the video download data historically downloaded by the computer and sorts the video download data from front to back according to the download initiation time to obtain a sort order. The demarcation data selection module selects demarcation data in the sort order: if the time interval between the download initiation times of two adjacent video download data in the sort order is greater than an interval threshold, the one of the two that is positioned at the front in the sort order is demarcation data. The set dividing module divides the sort order into a plurality of download sets according to the positions of the demarcation data, wherein the video download data in one download set are the video download data located between two adjacent demarcation data in the sort order together with the rear one of the two adjacent demarcation data. The set analysis module sets the download set in which the target data is located as a center set, wherein a download set located in front of the center set in the sort order is a reference set and a download set located behind the center set in the sort order is an influence set, and analyzes the center set, the reference set and the influence set to judge the type of the target data. When the target data is first data, the deletion control module pushes inquiry information to the user asking whether the target data is to be deleted; and when the target data is second data, the target data is deleted directly.
The set analysis module comprises an influence threshold comparison module, a preferred data selection module, a reference index calculation module, a first index comparison module and an effective analysis module. The influence threshold comparison module counts the number of influence sets; if the number of influence sets is smaller than an influence threshold, the target data is first data. Otherwise, the preferred data selection module acquires the effective triggering history of each video download data; if the time interval between the latest effective triggering time of a given video download data and the current time is less than or equal to the duration threshold, that video download data is preferred data, wherein a video download data is effectively triggered each time it is opened and viewed by the user. The reference index calculation module calculates a reference index of the center set, wherein m is the number of reference sets, C_i is the number of video download data in the i-th reference set, F_i is the number of preferred data among the video download data in the i-th reference set, and H_i is the number of video download data in the i-th reference set. The first index calculation module calculates a first index P = u/v of the center set, wherein u is the number of preferred data in the center set and v is the number of video download data in the center set. The first index comparison module compares the first index of the center set with the reference index; if the first index of the center set is smaller than the reference index, the target data is first data; otherwise, the effective analysis module analyzes how the videos of the reference set are effectively triggered.
The effective analysis module comprises a passive data judgment module, a passive index calculation module and an attention passive index calculation and comparison module. The passive data judgment module judges that, if within the latest preset time period a given preferred data in the reference set is effectively triggered on some occasion after a video download data in the influence set has been effectively triggered, the preferred data is attention data, and the attention data effectively triggered on that occasion is passive data. The passive index calculation module takes the number of video download data in the influence set that were effectively triggered consecutively before the attention data was effectively triggered, on an occasion when the attention data is passive data, as the influence factor of that occasion, and calculates a passive index of the attention data, wherein e is the number of times the attention data is passive data in the latest preset time period, N is the number of times the attention data is effectively triggered in the latest preset time period, and w is the average of the influence factors over the occasions when the attention data is passive data in the latest preset time period. The attention passive index calculation and comparison module calculates an attention passive index of the preferred data of the reference set, wherein S is the number of attention data among the preferred data, T_x is the average value of the passive indexes of all the attention data, and R is the number of preferred data. If the attention passive index of the preferred data of the reference set is smaller than a passive threshold, the target data is first data; otherwise, the target data is second data.
The data processing system adopts a data processing method, and the data processing method comprises the following steps:
acquiring the video download data stored on the current computer as analysis data, and, when the storage space occupied by the analysis data is larger than the storage space threshold, analyzing the analysis data,
if, for a given video download data in the analysis data, the time interval between the time at which it was last triggered and the current time is greater than the duration threshold, that video download data is target data, wherein the time at which a video download data was last triggered is determined as follows: if the video download data has been viewed by the user, it is the time at which the user most recently viewed it; otherwise, it is the time at which its download finished; if a video download data has never been opened for viewing since it was downloaded, or was last opened for viewing long ago, the user probably no longer needs it;
and analyzing the state information of the target data and judging whether the target data needs to be deleted.
Analyzing the state information of the target data includes:
acquiring the download initiation time of the video download data historically downloaded by the computer, and sorting the video download data from front to back according to the download initiation time to obtain a sort order,
in the sort order, if the time interval between the download initiation times of two adjacent video download data is greater than the interval threshold, the one of the two that is positioned at the front in the sort order is demarcation data; for example, the sort order is video 1, video 2, video 3, video 4, wherein the time interval between the download initiation times of video 1 and video 2 is smaller than the interval threshold, the time interval between the download initiation times of video 2 and video 3 is smaller than the interval threshold, and the time interval between the download initiation times of video 3 and video 4 is greater than the interval threshold, so that video 3 is demarcation data;
dividing the sort order into a plurality of download sets according to the positions of the demarcation data, wherein the video download data included in one download set are the video download data located between two adjacent demarcation data in the sort order together with the rear one of the two adjacent demarcation data,
setting the download set in which the target data is located as a center set, wherein a download set located in front of the center set in the sort order is a reference set and a download set located behind the center set in the sort order is an influence set; for example, the sort order is video 1, video 2, video 3, video 4, video 5, video 6, video 7, video 8, video 9, the target data is video 6, and the demarcation data are video 3 and video 6,
so that video 1, video 2 and video 3 form one download set, video 4, video 5 and video 6 form one download set, and video 7, video 8 and video 9 form one download set,
in which case video 1, video 2 and video 3 are the reference set, video 4, video 5 and video 6 are the center set, and video 7, video 8 and video 9 are the influence set,
analyzing the center set, the reference set and the influence set, judging the type of the target data,
if the target data is first data, the probability that the user will use the target data later is high, so inquiry information asking whether the target data is to be deleted is pushed to the user;
and if the target data is second data, the probability that the user will use the target data later is very small, and the target data is deleted directly.
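The division into download sets and the assignment of the center, reference and influence sets can be sketched in Python as follows. The function names and the `(name, download initiation time)` tuple layout are assumptions made for illustration; the sketch also assumes that the last item in the sort order closes the final download set, which the nine-video example implies but the text does not state.

```python
from typing import List, Tuple


def split_into_download_sets(items: List[Tuple[str, float]],
                             interval_threshold: float) -> List[List[str]]:
    """Sort by download initiation time and cut after each demarcation data."""
    ordered = sorted(items, key=lambda it: it[1])
    sets: List[List[str]] = []
    current: List[str] = []
    for idx, (name, started) in enumerate(ordered):
        current.append(name)
        is_last = idx == len(ordered) - 1
        # The front item of a pair whose gap exceeds the threshold is demarcation
        # data; it is the last member of its download set.
        if is_last or ordered[idx + 1][1] - started > interval_threshold:
            sets.append(current)
            current = []
    return sets


def classify_sets(sets: List[List[str]],
                  target: str) -> Tuple[List[List[str]], List[str], List[List[str]]]:
    """Return (reference sets, center set, influence sets) for the target data."""
    center_idx = next(i for i, s in enumerate(sets) if target in s)
    return sets[:center_idx], sets[center_idx], sets[center_idx + 1:]
```

Applied to nine videos whose download initiation times form three clusters, with video 6 as the target data, the sketch yields videos 1 to 3 as the reference set, videos 4 to 6 as the center set and videos 7 to 9 as the influence set, matching the example above.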
Analyzing the center set, the reference set and the influence set includes:
if the number of influence sets is less than the influence threshold, little video download data has been downloaded since the target data, so the user may still remember the target data and may still use it; the target data is then first data,
otherwise, acquiring the effective triggering history of each video download data; if the time interval between the latest effective triggering time of a given video download data and the current time is less than or equal to the duration threshold, that video download data is preferred data, wherein a video download data is effectively triggered each time it is opened and viewed by the user,
calculating a reference index of the center set, wherein m is the number of reference sets, C_i is the number of video download data in the i-th reference set, F_i is the number of preferred data among the video download data in the i-th reference set, and H_i is the number of video download data in the i-th reference set,
calculating a first index P = u/v of the center set, wherein u is the number of preferred data in the center set, and v is the number of video downloading data in the center set;
if the first index of the center set is smaller than the reference index, the target data is first data; the likelihood that the user will watch the target data is judged from how the user watches previously downloaded video download data: if the user still watches previously downloaded video download data often, even though new video download data has been stored on the computer, the probability that the user will later watch the video download data of the center set is higher, and the user therefore needs to be asked whether to delete the data, so as to prevent mistaken deletion; the more data a reference set contains, the more it should contribute to the reference, so each reference set is weighted accordingly, which makes the reference index more reasonable and thereby improves the judgment accuracy; in practice, a threshold may also be set according to the reference index, and if the first index of the center set is smaller than that threshold, the target data is first data;
otherwise, analyzing how the videos of the reference set are effectively triggered, which includes:
if, within the latest preset time period, a given preferred data in the reference set is effectively triggered on some occasion after a video download data in the influence set has been effectively triggered, the preferred data is attention data, and the attention data effectively triggered on that occasion is passive data; for example, video 1, video 2 and video 3 are the reference set and video 7, video 8 and video 9 are the influence set; if, on some occasion within the latest preset time period, video 2 is watched after video 7 and video 9 have been watched, video 2 is attention data and, on that occasion, passive data; if no video in any influence set is watched before video 2 is watched within the latest preset time period, video 2 is not passive data;
taking the number of video download data in the influence set that were effectively triggered consecutively before the attention data was effectively triggered, on an occasion when the attention data is passive data, as the influence factor of that occasion, and calculating a passive index of the attention data,
wherein e is the number of times the attention data is passive data in the latest preset time period, N is the total number of times the attention data is effectively triggered in the latest preset time period, and w is the average of the influence factors over the occasions when the attention data is passive data in the latest preset time period; for example, video 2 is watched 3 times in the latest preset time period, two of which follow the watching of videos in the influence set, so e = 2 and N = 3; a smaller e/N indicates that the attention data is more often watched actively and that the user's initiative in watching it is stronger; a larger w indicates that the user watches the videos of the reference set only after watching many videos of the influence set, whereas a smaller w indicates that the user readily thinks of watching the videos of the reference set; hence, the smaller the passive index, the stronger the user's initiative in watching the videos of the reference set and the higher the watching probability;
calculating an attention passive index of the preferred data of the reference set, wherein S is the number of attention data among the preferred data, T_x is the average value of the passive indexes of all the attention data, and R is the number of preferred data;
the fewer attention data there are among the preferred data, the more video download data is watched actively, and the greater the probability that the user will subsequently watch the target data on their own initiative;
if the attention passive index of the preferred data of the reference set is smaller than the passive threshold, the target data is first data, since a smaller attention passive index indicates a higher probability that the user actively watches previously downloaded videos; otherwise, the target data is second data.
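The quantities e, N and w for one attention data can be gathered from a chronological viewing log, as in the following Python sketch. The function name `passive_stats` and the log representation (a list of video names viewed within the latest preset time period) are assumptions. The sketch further assumes that an occasion counts as passive when the views immediately preceding it are of influence-set videos, with the length of that consecutive run taken as the influence factor. It stops at e, N and w and does not compute the passive index itself.

```python
from typing import List, Set, Tuple


def passive_stats(log: List[str], video: str, influence: Set[str]) -> Tuple[int, int, float]:
    """Return (e, N, w) for one reference-set video.

    e: number of passive occasions, N: total number of views,
    w: mean influence factor over the passive occasions (0.0 if there are none).
    """
    factors: List[int] = []
    n = 0
    for pos, name in enumerate(log):
        if name != video:
            continue
        n += 1
        # Count influence-set videos viewed consecutively just before this view.
        run = 0
        back = pos - 1
        while back >= 0 and log[back] in influence:
            run += 1
            back -= 1
        if run > 0:  # assumption: passive means directly preceded by influence views
            factors.append(run)
    e = len(factors)
    w = sum(factors) / e if e else 0.0
    return e, n, w
```

For the log video 7, video 9, video 2, video 2, video 8, video 2, the sketch gives e = 2 and N = 3 for video 2, in line with the example above, with influence factors 2 and 1 and therefore w = 1.5.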
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.
Finally, it should be noted that: although the present invention has been described in detail with reference to the foregoing embodiments, it will be apparent to those skilled in the art that changes may be made in the embodiments and/or equivalents thereof without departing from the spirit and scope of the invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.