CN112818032B

CN112818032B - Data screening method and data analysis server for serving big data mining analysis

Info

Publication number: CN112818032B
Application number: CN202110104856.3A
Authority: CN
Inventors: 龚世燕
Original assignee: Sino Parsons Technology Beijing Co ltd
Current assignee: SINO-PARSONS TECHNOLOGY (BEIJING) Co.,Ltd.
Priority date: 2021-01-26
Filing date: 2021-01-26
Publication date: 2022-03-01
Anticipated expiration: 2041-01-26
Also published as: CN112818032A; CN114595367A

Abstract

According to the data screening method and data analysis server for serving big data mining analysis of the embodiments of the present application, whether candidate business big data has potential value can be determined by analyzing the business big data collected before it (namely, the reference business big data). No excessive amount of reference business big data needs to be acquired for data value analysis; only the reference business big data and the candidate business big data need to be processed. This reduces the data processing pressure on the data analysis server and, by working at the time-sequence level, improves the credibility and efficiency of screening business big data with potential value. It thereby addresses the technical problems in the related art that determining whether business big data has potential value is complicated, inaccurate and of low credibility, and achieves the technical effect of improving the efficiency and reliability of determining and screening business big data with potential value.

Description

Data screening method and data analysis server for serving big data mining analysis

Technical Field

The application relates to the technical field of big data mining analysis, in particular to a data screening method and a data analysis server for serving big data mining analysis.

Background

Big data is a product of the rapid development of information technology. Put simply, its emergence directly reflects the improvement in people's capacity to store and use data. Besides its huge volume, the value carried by big data is also continuously changing; in other words, big data corresponds to big value.

At present, with the application of a new generation of information technology, applications such as the mobile Internet, the Internet of Things (IoT), social networking services (SNS), the digital home (home network), and electronic commerce (e-commerce) continuously generate big data. This big data can provide very useful services for subsequent work and life, so performing data mining and analysis on it is highly necessary.

Related big data mining techniques generally collect the relevant big data from a data server or service platform and then mine it. As the scale of big data keeps expanding, however, a considerable part of it may be duplicated, so duplicated or highly similar big data may be mined repeatedly, which reduces data mining efficiency. How to screen out valuable big data so as to improve data mining efficiency and meet real-time big data mining service requirements is therefore a technical problem that needs to be addressed.

Disclosure of Invention

In order to solve the above technical problem, the embodiments of the present application provide the following solutions.

One of the embodiments of the present application provides a data screening method for serving big data mining analysis, including:

receiving a data screening instruction, wherein the data screening instruction is used for instructing to screen business big data with potential value;

in response to the data screening instruction, acquiring, as candidate service big data, a data stream that is uninterrupted in time sequence and has a time sequence interval of T, by applying a data interception thread with a time sequence interval of T to platform service big data collected by a target cloud service platform in a first service processing period, wherein the duration of the first service processing period is a preset duration value, and T is a positive integer;

and judging whether the candidate service big data is service big data with potential value or not based on reference service big data, wherein the reference service big data is platform service big data collected by the target cloud service platform in a second service processing period, the failure time of the second service processing period is not later than the activation time of the first service processing period, and the duration of the second service processing period is greater than or equal to the preset duration value.

Optionally, the determining whether the candidate service big data is service big data with potential value based on the reference service big data includes:

determining that the candidate service big data is not the service big data with potential value on the premise that a first user portrait updating state of the platform service big data in the candidate service big data in the first service processing period is highly correlated with a second user portrait updating state of the platform service big data in the reference service big data in the second service processing period;

and on the premise that the first user portrait updating state of the platform service big data in the candidate service big data in the first service processing period is not highly correlated with the second user portrait updating state of the platform service big data in the reference service big data in the second service processing period, determining that the candidate service big data is the service big data with potential value.

Optionally, the number of the reference service big data is n, where determining whether the candidate service big data is service big data with potential value based on the reference service big data includes:

on the premise that a first user portrait updating state of platform service big data in the candidate service big data in the first service processing period is highly correlated with a second user portrait updating state of platform service big data in at least m reference service big data in the second service processing period, determining that the candidate service big data is not service big data with potential value, wherein the integer m is not more than the integer n;

determining the candidate service big data as service big data with potential value on the premise that the first user portrait updating state is not highly correlated with the second user portrait updating state of the platform service big data in at least m reference service big data.
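The m-of-n decision described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the function name `has_potential_value` and the caller-supplied predicate `is_highly_correlated` are hypothetical, and the sketch reads the second branch as the complement of the first (fewer than m highly correlated references).

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def has_potential_value(
    candidate: T,
    references: Sequence[T],
    is_highly_correlated: Callable[[T, T], bool],
    m: int,
) -> bool:
    """Return False when the candidate is highly correlated with at least
    m of the n references, and True otherwise (assumed complement)."""
    n = len(references)
    if not 1 <= m <= n:
        raise ValueError("m must satisfy 1 <= m <= n")
    # Count the references whose update state is highly correlated
    # with the candidate's update state.
    hits = sum(1 for ref in references if is_highly_correlated(candidate, ref))
    return hits < m
```

With n = 1 and m = 1 this reduces to the single-reference decision: a candidate that is highly correlated with its reference is not treated as having potential value.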

Optionally, it is determined whether the first user profile update status of the platform service big data in the candidate service big data and the second user profile update status of the platform service big data in the reference service big data are highly correlated according to the following manner:

acquiring platform service big data of t first effective service time sequence nodes in the candidate service big data, and acquiring platform service big data of t second effective service time sequence nodes in the reference service big data, wherein the first service processing time period is a service processing time period in a first cloud service activation period, the second service processing time period is a service processing time period in a second cloud service activation period, time sequence position information of the first effective service time sequence nodes in the first cloud service activation period is the same as time sequence position information of a corresponding one of the second effective service time sequence nodes in the second cloud service activation period, and t is an integer greater than 1;

and determining whether the first user portrait updating state of the platform service big data in the candidate service big data is highly correlated with the second user portrait updating state of the platform service big data in the reference service big data or not according to the platform service big data of the t first effective service time sequence nodes and the platform service big data of the t second effective service time sequence nodes.
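The pairing of first and second effective service time sequence nodes by identical position within their activation periods might look like the sketch below. The representation is an assumption: each data set is modeled as a mapping from an offset inside its cloud service activation period to the data observed at that offset, and `aligned_nodes` is a hypothetical helper name.

```python
from typing import Dict, Hashable, List, Sequence, Tuple


def aligned_nodes(
    candidate: Dict[Hashable, float],
    reference: Dict[Hashable, float],
    offsets: Sequence[Hashable],
) -> List[Tuple[Hashable, float, float]]:
    """Pair candidate and reference data at the same offset within their
    respective activation periods, keeping only offsets present in both."""
    usable = [o for o in offsets if o in candidate and o in reference]
    if len(usable) < 2:
        # The text requires t to be an integer greater than 1.
        raise ValueError("need at least two aligned nodes (t > 1)")
    return [(o, candidate[o], reference[o]) for o in usable]
```

The returned list has length t, and each entry holds one first node and its corresponding second node.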

Optionally, determining whether the first user portrait update status of the platform service big data in the candidate service big data and the second user portrait update status of the platform service big data in the reference service big data are highly correlated according to the platform service big data of the t first effective service time sequence nodes and the platform service big data of the t second effective service time sequence nodes includes:

on the premise that, among t first user portrait data sets, the number of user portrait data sets whose portrait category correlation degree lies within a first correlation degree interval reaches k, determining that the first user portrait updating state of the platform service big data in the candidate service big data is highly correlated with the second user portrait updating state of the platform service big data in the reference service big data, wherein:

each first user portrait data set is a user portrait data set formed by combining a first graph data track segment, a second graph data track segment, a third graph data track segment and a fourth graph data track segment;

the first graph data track segment is the graph data track between a third effective service time sequence node and a fourth effective service time sequence node on a first graph data track;

the second graph data track segment is the graph data track between the fourth effective service time sequence node on the first graph data track and a fifth effective service time sequence node on a second graph data track;

the third graph data track segment is the graph data track between the fifth effective service time sequence node and a sixth effective service time sequence node on the second graph data track;

the fourth graph data track segment is the graph data track between the third effective service time sequence node on the first graph data track and the sixth effective service time sequence node on the second graph data track;

the first graph data track represents the platform service big data on a plurality of effective service time sequence nodes in the first cloud service activation period included in the candidate service big data, and the second graph data track represents the platform service big data on a plurality of effective service time sequence nodes in the second cloud service activation period included in the reference service big data;

the t first effective service time sequence nodes include the third effective service time sequence node and the fourth effective service time sequence node, which are uninterrupted, and the t second effective service time sequence nodes include the fifth effective service time sequence node and the sixth effective service time sequence node, which are uninterrupted;

the time sequence position information of the fifth effective service time sequence node in the second cloud service activation period is the same as the time sequence position information of the third effective service time sequence node in the first cloud service activation period, the time sequence position information of the sixth effective service time sequence node in the second cloud service activation period is the same as the time sequence position information of the fourth effective service time sequence node in the first cloud service activation period, and the integer k is not more than the integer t;

and on the premise that, among the t first user portrait data sets, the number of user portrait data sets whose portrait category correlation degree lies within the first correlation degree interval does not reach k, determining that the first user portrait updating state of the platform service big data in the candidate service big data is not highly correlated with the second user portrait updating state of the platform service big data in the reference service big data.

determining that the first user portrait updating state of the platform service big data in the candidate service big data is highly correlated with the second user portrait updating state of the platform service big data in the reference service big data on the premise that the number of seventh effective service time sequence nodes in the t first effective service time sequence nodes reaches i, wherein a difference of portrait category correlation degrees between the platform service big data of the seventh effective service time sequence node and the platform service big data of an eighth effective service time sequence node in the t second effective service time sequence nodes is within a second correlation degree interval, and the time sequence position information of the eighth effective service time sequence node in the second cloud service activation period is the same as the time sequence position information of the seventh effective service time sequence node in the first cloud service activation period, the integer i is not more than the integer t;

and on the premise that the number of seventh effective service time sequence nodes among the t first effective service time sequence nodes does not reach i, determining that the first user portrait updating state of the platform service big data in the candidate service big data is not highly correlated with the second user portrait updating state of the platform service big data in the reference service big data.
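The node-counting criterion above can be sketched as follows. The sketch assumes that each effective service time sequence node already carries a numeric portrait category correlation degree (the text does not say how it is computed) and that the second correlation degree interval is a closed interval on the absolute difference; the name `highly_correlated_by_count` is hypothetical.

```python
from typing import Sequence, Tuple


def highly_correlated_by_count(
    cand_scores: Sequence[float],
    ref_scores: Sequence[float],
    interval: Tuple[float, float],
    i: int,
) -> bool:
    """Return True when at least i aligned node pairs have a score
    difference that falls inside the given (closed) interval."""
    if len(cand_scores) != len(ref_scores):
        raise ValueError("both sequences must contain t aligned nodes")
    if not 0 < i <= len(cand_scores):
        raise ValueError("i must satisfy 0 < i <= t")
    lo, hi = interval
    # A "seventh node" is a candidate node whose difference from its
    # aligned "eighth node" lies within the interval.
    matches = sum(
        1 for c, r in zip(cand_scores, ref_scores) if lo <= abs(c - r) <= hi
    )
    return matches >= i
```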

determining that there is no high correlation between the first user portrait update status of the platform service big data in the candidate service big data and the second user portrait update status of the platform service big data in the reference service big data on the premise that there are at least i uninterrupted seventh effective service time sequence nodes in the t first effective service time sequence nodes, wherein a difference between portrait category correlations between the platform service big data of the seventh effective service time sequence node and platform service big data of an eighth effective service time sequence node in the t second effective service time sequence nodes is not within a second correlation interval, and time sequence position information of the eighth effective service time sequence node in the second cloud service activation period is the same as time sequence position information of the seventh effective service time sequence node in the first cloud service activation period, the integer i is not more than the integer t;

determining that the first user portrait update status of the platform service big data in the candidate service big data is highly correlated with the second user portrait update status of the platform service big data in the reference service big data on the premise that the number of uninterrupted seventh effective service time sequence nodes in the t first effective service time sequence nodes is less than i;

determining that the candidate service big data is service big data with potential value comprises the following steps:

and determining that the platform service big data positioned on the uninterrupted seventh effective service time sequence node in the candidate service big data is service big data with potential value.
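The uninterrupted-run variant can be sketched in the same spirit. As before, the per-node numeric scores and the closed interval on the absolute difference are assumptions, and `divergent_runs` is a hypothetical name. The function returns the index runs of at least i consecutive nodes whose difference falls outside the interval; an empty result corresponds to the "highly correlated" outcome, and a non-empty result identifies the nodes whose data would be kept as having potential value.

```python
from typing import List, Sequence, Tuple


def divergent_runs(
    cand_scores: Sequence[float],
    ref_scores: Sequence[float],
    interval: Tuple[float, float],
    i: int,
) -> List[List[int]]:
    """Return every maximal run of at least i consecutive node indices
    whose score difference lies outside the given (closed) interval."""
    if len(cand_scores) != len(ref_scores):
        raise ValueError("both sequences must contain t aligned nodes")
    lo, hi = interval
    runs: List[List[int]] = []
    current: List[int] = []
    for idx, (c, r) in enumerate(zip(cand_scores, ref_scores)):
        if lo <= abs(c - r) <= hi:
            # Difference is within the interval: the run is interrupted.
            if len(current) >= i:
                runs.append(current)
            current = []
        else:
            current.append(idx)
    if len(current) >= i:
        runs.append(current)
    return runs
```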

Optionally, the obtaining of candidate service big data includes: acquiring platform service big data collected in a first service processing period in a first cloud service activation period as the candidate service big data, wherein the time sequence position information of the activation time of the first service processing period in the first cloud service activation period is the same as the time sequence position information of the activation time of the second service processing period in the second cloud service activation period, and the time sequence position information of the failure time of the first service processing period in the first cloud service activation period is the same as the time sequence position information of the failure time of the second service processing period in the second cloud service activation period.
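One way to read this alignment requirement is that the two service processing periods start and end at the same offsets inside their respective cloud service activation periods. The sketch below illustrates that reading only; `aligned_reference_window` and its parameters are hypothetical names.

```python
from datetime import datetime
from typing import Tuple


def aligned_reference_window(
    cand_start: datetime,
    cand_end: datetime,
    cand_cycle_start: datetime,
    ref_cycle_start: datetime,
) -> Tuple[datetime, datetime]:
    """Map the candidate window onto the reference activation period by
    reusing its start and end offsets within the candidate's own period."""
    start_offset = cand_start - cand_cycle_start
    end_offset = cand_end - cand_cycle_start
    return ref_cycle_start + start_offset, ref_cycle_start + end_offset
```

For example, if the activation period is one day, a candidate window of 09:00 to 10:00 today maps to 09:00 to 10:00 in an earlier day's period.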

Optionally, the method further includes:

responding to a data mining request sent by a target service terminal, performing data mining on the candidate service big data based on a preset convolutional neural network to obtain a data mining result based on the user interest tendency, and feeding the data mining result back to the target service terminal; the target service terminal is a terminal corresponding to the service provider platform;

wherein performing data mining on the candidate service big data based on the preset convolutional neural network, in response to the data mining request sent by the target service terminal, to obtain the data mining result based on the user interest tendency comprises:

acquiring data characteristic content to be subjected to data mining corresponding to the candidate service big data based on a service requirement label in the data mining request, inputting the data characteristic content to be subjected to data mining into an updated data characteristic identification degree analysis model for analysis to obtain a current data characteristic identification degree, wherein the updated data characteristic identification degree analysis model is a convolutional neural network model obtained after an initial data characteristic identification degree analysis model is subjected to iterative updating;

determining a user interest heat value interval corresponding to a target data feature identification degree corresponding to the current data feature identification degree from a user interest heat value interval corresponding to an updated data feature identification degree, wherein the user interest heat value interval corresponding to the updated data feature identification degree has a corresponding relation with an associated data feature identification degree, the associated data feature identification degree is determined according to an interest tendency label of the user interest heat value interval corresponding to an initial data feature identification degree associated with the user interest heat value interval corresponding to the updated data feature identification degree, a configuration strategy of the user interest heat value interval corresponding to the updated data feature identification degree is determined according to statistical result information of the initial data feature identification degree and the updated data feature identification degree, and the initial data feature identification degree is obtained by inputting a preset data feature content sample into the initial data feature identification degree analysis model, the updated data feature identification is obtained by inputting the preset data feature content sample into the updated data feature identification analysis model;

determining the target associated data feature identification degree corresponding to the user interest heat value interval corresponding to the target data feature identification degree according to the corresponding relation;

and determining a data mining result based on the user interest tendency corresponding to the data feature content to be subjected to data mining based on the global interest tendency content set corresponding to the target associated data feature identification degree and the preset associated data feature identification degree.
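The interval lookup in the middle of these steps (current identification degree, then its user interest heat value interval, then the associated identification degree) can be illustrated with a simple table lookup. Everything here is assumed for illustration: the text does not specify how intervals are represented, so the sketch models them as half-open ranges between sorted boundaries, and the names `find_interval` and `associated_degree` are hypothetical. The convolutional neural network that produces the identification degree is outside the scope of the sketch.

```python
from bisect import bisect_right
from typing import Dict, Sequence


def find_interval(boundaries: Sequence[float], degree: float) -> int:
    """Return k such that boundaries[k] <= degree < boundaries[k + 1].
    The boundaries must be sorted in ascending order."""
    if not boundaries[0] <= degree < boundaries[-1]:
        raise ValueError("degree lies outside all configured intervals")
    return bisect_right(boundaries, degree) - 1


def associated_degree(
    boundaries: Sequence[float],
    interval_to_associated: Dict[int, float],
    degree: float,
) -> float:
    """Look up the associated identification degree recorded for the
    interval that contains the current identification degree."""
    return interval_to_associated[find_interval(boundaries, degree)]
```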

One of the embodiments of the present application provides a data analysis server, which includes a processing engine, a network module and a memory; the processing engine and the memory communicate through the network module, and the processing engine reads the computer program from the memory and operates to perform the above-described method.

In the embodiments of the present invention, when the data screening instruction is obtained, candidate service big data to be screened can be acquired, where the candidate service big data is platform service big data collected by a target cloud service platform in a first service processing period whose duration is a preset duration value. Whether the candidate service big data has potential value is then determined based on the reference service big data, which is platform service big data collected by the target cloud service platform in the second service processing period. In other words, the candidate service big data can be assessed by analyzing the service big data that precedes it (namely, the reference service big data). No excessive amount of reference service big data needs to be acquired for data value analysis, and only the reference service big data and the candidate service big data need to be processed. This reduces the data processing pressure on the data analysis server and, by working at the time-sequence level, improves the credibility and efficiency of screening service big data with potential value. It thereby addresses the technical problems in the related art that the process of determining whether service big data has potential value is complicated, inaccurate and of low credibility, and achieves the technical effect of improving the efficiency and reliability of determining and screening service big data with potential value.
In addition, the screening of the business big data with the potential value is carried out fully according to the business processing time interval, so that the screened business big data with the potential value can meet the real-time big data mining business requirement, and accurate and reliable big data raw materials are provided for subsequent big data mining analysis.

In the description that follows, additional features will be set forth, in part, in the description. These features will be in part apparent to those skilled in the art upon examination of the following and the accompanying drawings, or may be learned by production or use. The features of the present application may be realized and attained by practice or use of various aspects of the methodologies, instrumentalities and combinations particularly pointed out in the detailed examples that follow.

Drawings

The present application will be further explained by way of exemplary embodiments, which will be described in detail by way of the accompanying drawings. These embodiments are not intended to be limiting, and in these embodiments like numerals are used to indicate like structures, wherein:

FIG. 1 is a block diagram illustrating an exemplary data screening system serving big data mining analysis, in accordance with some embodiments of the present invention;

FIG. 2 is a flow diagram illustrating an exemplary data screening method and/or process for serving big data mining analysis, according to some embodiments of the invention;

FIG. 3 is a block diagram illustrating an exemplary data screening apparatus serving big data mining analysis in accordance with some embodiments of the present invention; and

FIG. 4 is a diagram illustrating the hardware and software components of an exemplary data analysis server, according to some embodiments of the invention.

Detailed Description

In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings used in the description of the embodiments will be briefly introduced below. It is obvious that the drawings in the following description are only examples or embodiments of the application, from which the application can also be applied to other similar scenarios without inventive effort for a person skilled in the art. Unless otherwise apparent from the context, or otherwise indicated, like reference numbers in the figures refer to the same structure or operation.

It should be understood that "system", "device", "unit" and/or "module" as used herein is a method for distinguishing different components, elements, parts, portions or assemblies at different levels. However, other words may be substituted by other expressions if they accomplish the same purpose.

As used in this application and the appended claims, the terms "a," "an," and/or "the" are not limited to the singular and may also include the plural, unless the context clearly dictates otherwise. In general, the terms "comprises" and "comprising" merely indicate that the explicitly identified steps and elements are included; these steps and elements do not form an exclusive list, and a method or apparatus may include other steps or elements.

Flow charts are used herein to illustrate operations performed by systems according to embodiments of the present application. It should be understood that the operations are not necessarily performed in the exact order shown. Rather, the various steps may be processed in reverse order or simultaneously. Other operations may also be added to the processes, and one or more steps may be removed from them.

According to an aspect of embodiments of the present invention, there is provided a method embodiment of a data screening method for serving big data mining analysis.

Optionally, in this embodiment, the data screening method for serving big data mining analysis described above may be applied to a hardware environment formed by the data analysis server 110 and the big data service device 120, as shown in FIG. 1. The data analysis server 110 is connected to the big data service device 120 through a communication network and may be used to provide various big data services (such as enterprise business services, online entertainment services, and e-commerce shopping services) for the big data service device 120 or for an application client installed on it. A relational database 130 may be provided on the data analysis server 110, or separately from it, to provide data storage services for the data analysis server 110. The communication network includes but is not limited to a wide area network, a metropolitan area network, or a local area network, and the big data service device 120 includes but is not limited to a PC, a mobile phone, a tablet computer, and the like.

The data screening method for serving big data mining analysis according to the embodiment of the present invention may be performed by the data analysis server 110. FIG. 2 is a flow diagram of an optional data screening method for serving big data mining analysis according to an embodiment of the invention. As shown in FIG. 2, the method may include the following steps.

Step S202, the data analysis server receives a data screening instruction, and the data screening instruction is used for instructing to screen the service big data with potential value.

Step S204, in response to the data screening instruction, the data analysis server obtains candidate service big data, wherein the candidate service big data is platform service big data collected by the target cloud service platform in a first service processing period, and the duration of the first service processing period is a preset duration value.

Optionally, the candidate service big data may be obtained by means of a data interception thread (e.g., a data crawler), which selects the service big data within a fixed time period. For example, a data interception thread with a time sequence interval of size T selects a data stream that is uninterrupted in time sequence and spans a time sequence interval of size T. Once the time sequence interval of the data interception thread has been chosen, it is kept unchanged, and service big data in different fixed time periods can be selected simply by changing the initial time sequence position of the thread. In this embodiment, the time sequence interval size may be understood as a duration; for example, if T is 10, the duration may be 10 s, 10 min, or 10 h.
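The fixed-width interception described here behaves like a sliding window whose width stays at T while only its start position changes. A minimal sketch follows, assuming records are `(timestamp, payload)` pairs with integer timestamps in a single unit; the function name `intercept` is hypothetical and no actual threading or crawling is modeled.

```python
from typing import Any, List, Sequence, Tuple

Record = Tuple[int, Any]


def intercept(records: Sequence[Record], start: int, width: int) -> List[Record]:
    """Return the records whose timestamp lies in [start, start + width).
    The width plays the role of T and must be a positive integer."""
    if not isinstance(width, int) or width <= 0:
        raise ValueError("T must be a positive integer")
    return [(ts, payload) for ts, payload in records if start <= ts < start + width]
```

Calling `intercept(records, 5, 10)` and then `intercept(records, 10, 10)` selects two different fixed-length periods with the same T, which mirrors changing only the initial position of the interception thread.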

The platform service big data is service big data collected during the operation of the cloud service platform. The types of cloud service platform include, but are not limited to, an enterprise service cloud service platform, an online shopping cloud service platform, a scientific research cloud service platform, an industrial internet cloud service platform, a remote office cloud service platform, and the like.

Step S206, the data analysis server judges whether the candidate service big data is service big data with potential value or not based on the reference service big data, the reference service big data is platform service big data collected by the target cloud service platform in a second service processing time period, the failure time of the second service processing time period is not later than the activation time of the first service processing time period, and the duration of the second service processing time period is more than or equal to a preset duration value.
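The timing constraints stated in steps S204 and S206 can be checked with a few comparisons. The sketch below is illustrative only: the `Period` type and `is_valid_reference_period` are hypothetical names, and "failure time" is read as the end of a period.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Period:
    activation: datetime  # start ("activation time") of the period
    failure: datetime     # end ("failure time") of the period

    @property
    def duration(self) -> timedelta:
        return self.failure - self.activation


def is_valid_reference_period(
    first: Period, second: Period, preset: timedelta
) -> bool:
    """Check the three stated conditions: the first period lasts exactly
    the preset duration, the second period ends no later than the first
    begins, and the second period lasts at least the preset duration."""
    return (
        first.duration == preset
        and second.failure <= first.activation
        and second.duration >= preset
    )
```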

The candidate service big data and the reference service big data may be recorded in the form of a time sequence (time series). A time sequence is a data stream formed by arranging the data content of the same kind of service big data (such as the platform service big data) in the order in which it was generated. Using the time sequence form, current or future candidate service big data can be assessed against existing historical service big data (the reference service big data) to determine whether the candidate service big data has potential value. Service big data with potential value is service big data that can be used for subsequent big data mining analysis: it contains more user portrait information and more information about users' potential interest tendencies, so mining it allows users and service providers to be served better. Of course, users' personal privacy must be protected when mining service big data with potential value, so such data should not contain information that is overly private to the user.

The above embodiment takes execution by the data analysis server 110 as an example. The data screening method for serving big data mining analysis according to the embodiment of the present invention may also be performed by the big data service device 120, the only difference from the above scheme being that the execution subject is changed from the data analysis server 110 to the big data service device 120. The method may also be performed jointly by the data analysis server 110 and the big data service device 120; for example, the big data service device 120 provides the candidate service big data to the data analysis server 110, which determines whether it is service big data with potential value, or the data analysis server 110 provides the reference service big data to the big data service device 120, which determines whether the candidate service big data is service big data with potential value. When the big data service device 120 performs the method, it may also do so through an application client installed on it, which is not limited herein.

Through steps S202 to S206, when a data screening instruction is obtained, candidate service big data to be subjected to data screening may be obtained, where the candidate service big data is platform service big data collected by a target cloud service platform in a first service processing period, and the duration of the first service processing period is a preset duration value. Whether the candidate service big data is service big data with potential value is then judged based on the reference service big data, which is platform service big data collected by the target cloud service platform in a second service processing period. In other words, whether the candidate service big data is service big data with potential value can be judged by analyzing the service big data that precedes it (namely the reference service big data). There is no need to obtain excessive reference service big data for data value analysis, and only the reference service big data and the candidate service big data need to be processed. The data processing pressure of the data analysis server can therefore be reduced, and the credibility and efficiency of screening service big data with potential value can be improved as much as possible at the time sequence level. This solves the technical problems in the related technology that the process of judging whether service big data is service big data with potential value is complicated, inaccurate and of low reliability, and achieves the technical effect of improving the efficiency and reliability of judging and screening service big data with potential value.

It can be understood that K-means and other clustering algorithms based on multidimensional features group the business big data that can be classified into one class and discard outliers as business big data that does not meet the requirement. Such methods are usually used to analyze large-scale business big data, and the business big data they process is usually not formed on the basis of a time sequence (that is, it is not a data stream); put simply, the business big data processed by clustering algorithms has no time dimension. Consequently, using a common clustering algorithm to cluster service big data and screen out service big data with potential value places a large computational load on the data analysis server and may slow it down. On this basis, the present scheme takes the time sequence interval size of the data interception thread as the business big data processing index and uses a portrait category correlation degree calculation to determine under what conditions business big data can be identified as business big data with potential value. In this way, the data processing pressure of the data analysis server can be reduced, and the reliability and efficiency of screening business big data with potential value can be improved as much as possible at the time sequence level. This solves the technical problems in the related technology that the process of judging whether business big data is business big data with potential value is complicated, inaccurate and of low reliability, and achieves the technical effect of improving the efficiency and reliability of judging and screening business big data with potential value.
The technical solution of the present application is further detailed below with reference to the steps shown in fig. 2.

In the technical solution provided in step S202, various technology service products, such as APPs and online cloud service platforms, usually record service behavior data or service feedback data of service users or transactors. These data can become an important basis for a big data mining party to measure service product operation data, the value of user-side service data, and associated data information. For example, in a remote office cloud service platform, the big data mining party can analyze data including service operation data, office software response data, office network status data, and the like. A data screening instruction can be triggered during the analysis; the data analysis server 110 obtains the triggered data screening instruction and screens service big data according to it, thereby discovering service big data with potential value and providing data material that is as complete and authentic as possible for subsequent big data mining.

For example, the service range coverage index corresponding to the service operation data of a certain service user has remained at about index50 over the last t hours, and at time t+1 the service range coverage index corresponding to the service operation data reaches index55. The present invention can be used to identify whether index55 corresponds to a user portrait update with potential value, that is, whether the service user has a valuable user portrait update. A valuable user portrait update means that the service big data sequence can provide more valuable user portrait information. The scheme provided by the application can be used to screen out service big data with potential value for subsequent big data service mining.

In the technical solution provided in step S204, in response to the data screening instruction, the data analysis server 110 obtains candidate service big data, where the candidate service big data is platform service big data collected by the target cloud service platform in a first service processing period, and a duration of the first service processing period is a preset duration value.

Optionally, in order to determine completely and accurately whether the candidate service big data is service big data with potential value, the time sequence interval size of the candidate service big data may be smaller than or equal to the time sequence interval size of the reference service big data. For example, different service big data acquisition cycles may be determined, and each acquisition cycle may include different time points. Taking an acquisition cycle of 1 hour as an example, each cycle has 60 minutes, and the service big data between, for example, the 15th minute and the 45th minute is acquired. If the time sequence interval size of the candidate service big data is smaller than or equal to that of the reference service big data, the activation time of the candidate service big data in the first service big data acquisition cycle period-1 should be greater than or equal to (that is, not earlier than) the activation time of the reference service big data in the second service big data acquisition cycle period-2, for example, the activation time of the candidate service big data is the 15th minute or the 18th minute; and the failure time of the candidate service big data in period-1 should be less than or equal to (that is, not later than) the failure time of the reference service big data in period-2, for example, the failure time of the candidate service big data is the 40th minute or the 45th minute. The service big data acquisition cycle may also be in units of days or weeks and, in some special cases, in units of seconds, which is not limited herein.
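
As a non-limiting illustration, the interval condition described above may be sketched as follows. The function name `candidate_within_reference` and the representation of activation and failure times as minute offsets within an acquisition cycle are assumptions introduced for this example only.

```python
def candidate_within_reference(cand_start, cand_end, ref_start, ref_end):
    """Check that the candidate interval lies inside the reference interval.

    All arguments are minute offsets inside their own acquisition cycle
    (a hypothetical representation chosen for this sketch).
    """
    if cand_start > cand_end or ref_start > ref_end:
        raise ValueError("activation time must not be later than failure time")
    # Activation not earlier than the reference activation,
    # failure not later than the reference failure.
    return cand_start >= ref_start and cand_end <= ref_end


# Reference interval: 15th to 45th minute of a 60-minute cycle.
print(candidate_within_reference(18, 40, 15, 45))
print(candidate_within_reference(10, 40, 15, 45))
```

When both intervals are set to the same size, the check reduces to equality of the two activation offsets and of the two failure offsets.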

In order to improve the efficiency of judging whether the candidate service big data is service big data with potential value and to reduce the computer resources consumed by the judgment, the time sequence interval size of the candidate service big data may be set equal to that of the reference service big data. In that case, when the candidate service big data is obtained, the platform service big data collected in the first service processing period within the first cloud service activation period cycle-1 is taken as the candidate service big data. The time sequence position information of the activation time of the first service processing period within cycle-1 is the same as the time sequence position information of the activation time of the second service processing period within the second cloud service activation period cycle-2 (for example, both are the 15th minute), and the time sequence position information of the failure time of the first service processing period within cycle-1 is the same as the time sequence position information of the failure time of the second service processing period within cycle-2 (for example, both are the 45th minute).
In this embodiment, the time sequence position information may be understood as the relative position of a certain time within a certain time period. Taking a time period of 10 seconds as an example, the time sequence position information corresponding to the 1st second may be 1 and that corresponding to the 5th second may be 5. The time sequence position information may also be expressed in other ways; for example, again taking a time period of 10 seconds, the time sequence position information corresponding to the 1st second may be 10 and that corresponding to the 8th second may be 3. That is, the time sequence position information may be understood as one form of recording relative time, which is not limited herein.
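
The two recording forms mentioned above may be sketched as follows. The function names `forward_position` and `countdown_position` are hypothetical and serve only to reproduce the 10-second examples given in this paragraph.

```python
def forward_position(second):
    """Position counted from the start of the period (1st second -> 1)."""
    return second


def countdown_position(second, period_length):
    """Position counted from the end of the period (1st of 10 seconds -> 10)."""
    if not 1 <= second <= period_length:
        raise ValueError("second must lie inside the period")
    return period_length - second + 1


print(forward_position(5))        # 5th second, forward form
print(countdown_position(1, 10))  # 1st second, countdown form
print(countdown_position(8, 10))  # 8th second, countdown form
```

Either form identifies the same relative time; what matters for the comparison between periods is that both periods use the same form.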

The candidate service big data may be acquired by a data interception thread (such as a data crawler s). The time sequence interval size of the data interception thread may be v, and the time sequence interval size of the service big data acquired each time (namely the candidate service big data) may be (s + 2); that is, service big data of one unit time sequence interval size may be acquired forward from the initial time sequence position information corresponding to the data interception thread, and service big data of one unit time sequence interval size may be acquired backward.
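
A minimal sketch of such an acquisition is given below. It assumes that the data acquired each time is the interval of the data interception thread extended by one unit on each side; this reading, the function name `acquire_window`, and the list-based data representation are assumptions made for illustration and are not stated in this form in the embodiment.

```python
def acquire_window(series, start, v, unit=1):
    """Return the thread's interval plus one unit before and one unit after.

    series: platform service big data ordered by generation time (hypothetical list)
    start:  index of the initial time sequence position of the thread
    v:      time sequence interval size of the thread, in units
    """
    lo = max(0, start - unit)                # one unit before the initial position
    hi = min(len(series), start + v + unit)  # one unit after the thread interval
    return series[lo:hi]


samples = list(range(10))
print(acquire_window(samples, 3, 4))
```

The window is clipped at the boundaries of the available data, so fewer samples are returned when the thread starts at the very beginning or end of the series.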

In the related technology, business big data with potential value is generally identified with algorithms such as clustering algorithms: after clustering, the clustered objects or the noise objects are classified as data with potential value. Such algorithms generally require a large amount of business big data and do not consider the role of time factors during processing. For a cloud business platform, however, the business big data of interest is usually the most recent business big data samples, which are small in volume and generally related to time factors; they can be understood as business big data with time sequence characteristics or timeliness characteristics. The technical scheme of the application can meet this requirement.

In the technical solution provided in step S206, the data analysis server 110 determines whether the candidate service big data is service big data with potential value based on the reference service big data, where the reference service big data is platform service big data collected by the target cloud service platform in a second service processing period, the failure time of the second service processing period is not later than the activation time of the first service processing period, and the duration of the second service processing period is greater than or equal to the preset duration value.

In the above technical solution, taking a single piece of reference service big data as an example, determining whether the candidate service big data is service big data with potential value based on the reference service big data may include the following two aspects:

on the one hand, on the premise that a first user portrait updating state of platform service big data in candidate service big data in a first service processing period is highly correlated with a second user portrait updating state of platform service big data in reference service big data in a second service processing period, determining that the candidate service big data is not service big data with potential value, wherein the user portrait updating state can be understood as a user portrait updating trend;

on the other hand, on the premise that the first user portrait updating state of the platform service big data in the candidate service big data in the first service processing period is not highly correlated with the second user portrait updating state of the platform service big data in the reference service big data in the second service processing period, the candidate service big data is determined to be service big data with potential value.

In general, high correlation means that when one series of variables changes, the probability that the corresponding other series of variables increases (or decreases) is very high. In this embodiment, high correlation can be understood as a matching relationship between different user portrait updating states. If different user portrait updating states are highly correlated, the difference between them is small and they may be regarded as similar or identical user portrait updating states; accordingly, the difference between the service big data corresponding to those user portrait updating states is also relatively small. If the difference between the later service big data (namely the candidate service big data) and the earlier service big data (namely the reference service big data) is relatively small, the two can be understood to be almost the same. Since the reference business big data may already have undergone data mining, continuing to mine the candidate business big data could result in repeated mining. Conversely, if the user portrait updating states corresponding to the candidate service big data and the reference service big data are not highly correlated, the two differ to some extent, so the candidate service big data may have mining potential and may be determined to be service big data with potential value.
It can be understood that, in the embodiment, whether the candidate service big data is the service big data with potential value is determined based on the high correlation between the user portrait updating states, and the correlation between the data can be considered based on the time sequence level, that is, the candidate service big data and the reference service big data have at least a certain correlation even if there is no high correlation, so that it can be ensured that the data mining value corresponding to the candidate service big data is associated with the reference service big data, and thus, the global matching during the subsequent data service mining can be ensured.

In the above technical solution, there may be multiple pieces of reference service big data, for example n pieces. Similarly to the previous embodiment, determining whether the candidate service big data is service big data with potential value based on the reference service big data may also include the following two aspects:

on the one hand, on the premise that a first user portrait updating state of platform service big data in candidate service big data in a first service processing period is highly correlated with a second user portrait updating state of platform service big data in at least m pieces of reference service big data in a second service processing period, the candidate service big data is determined not to be service big data with potential value; in other words, as long as the number of pieces of reference service big data, among the n pieces, whose platform service big data has a user portrait updating trend highly correlated with that of the platform service big data in the candidate service big data reaches m, the candidate service big data can be determined not to be service big data with potential value, where the integer m is not greater than the integer n;

on the other hand, on the premise that the first user portrait updating state is not highly correlated with the second user portrait updating state of the platform service big data in the at least m pieces of reference service big data, in other words, if the user portrait updating tendency of the platform service big data in the n pieces of reference service big data is not highly correlated with the user portrait updating tendency of the platform service big data of the candidate service big data, or the number of the reference service big data highly correlated with the user portrait updating tendency of the platform service big data of the candidate service big data is less than m, the candidate service big data can be determined to be service big data with potential value.
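
The decision over n pieces of reference service big data may be sketched as follows. The function name `has_potential_value` and the representation of the per-reference comparison results as a list of booleans are assumptions introduced for illustration; how each boolean is obtained is described in the steps that follow. The single-reference case corresponds to n = m = 1.

```python
def has_potential_value(highly_correlated_flags, m):
    """Decide potential value from per-reference correlation results.

    highly_correlated_flags: one boolean per piece of reference data, True when
        its user portrait updating trend is highly correlated with the candidate's
    m: number of highly correlated references needed to reject the candidate
    """
    n = len(highly_correlated_flags)
    if not 1 <= m <= n:
        raise ValueError("m must satisfy 1 <= m <= n")
    matches = sum(1 for flag in highly_correlated_flags if flag)
    # Reaching m highly correlated references means the candidate repeats
    # already-seen trends, so it is not treated as potentially valuable.
    return matches < m


print(has_potential_value([True, False, False], m=2))
print(has_potential_value([True, False, True], m=2))
```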

In both of the foregoing technical solutions, whether there is one piece of reference service big data or several, there is a step of determining whether a first user portrait updating state of the platform service big data in the candidate service big data in the first service processing period is highly correlated with a second user portrait updating state of the platform service big data in the reference service big data in the second service processing period. An embodiment of the present application provides an optional implementation of this step, which may specifically include the following steps 1 to 2:

step 1, obtaining platform service big data of t first effective service time sequence nodes in candidate service big data, and obtaining platform service big data of t second effective service time sequence nodes in reference service big data, wherein the first service processing time interval is a service processing time interval in a first cloud service activation period, the second service processing time interval is a service processing time interval in a second cloud service activation period, time sequence position information of the first effective service time sequence nodes in the first cloud service activation period is the same as time sequence position information of a corresponding second effective service time sequence node in the second cloud service activation period, and t is an integer greater than 1.

For example, an optional way to select the first valid service timing node and the second valid service timing node may be selecting at equal time intervals, such as obtaining one valid service timing node every 2 hours; another optional way to select the second valid service timing node may be to select a node where the data content of the reference service big data corresponds to a user portrait update trend with frequent user portrait updates, such as a node changing from a first portrait track to a second portrait track, or a node changing from the second portrait track to the first portrait track, where the first portrait track and the second portrait track are different portrait tracks.
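
The two selection modes described above may be sketched as follows. The function names, the use of list indices as time sequence nodes, and the string labels standing in for portrait tracks are assumptions made for this example only.

```python
def equal_interval_nodes(node_count, step):
    """Select node indices at equal intervals (e.g. step=2 for hourly samples
    taken every 2 hours)."""
    if step <= 0:
        raise ValueError("step must be positive")
    return list(range(0, node_count, step))


def track_change_nodes(track_labels):
    """Select node indices at which the portrait track differs from the
    previous node (a change between a first and a second portrait track)."""
    return [
        idx
        for idx in range(1, len(track_labels))
        if track_labels[idx] != track_labels[idx - 1]
    ]


print(equal_interval_nodes(7, 2))
print(track_change_nodes(["A", "A", "B", "B", "A"]))
```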

Step 2, determining whether the first user portrait updating state of the platform service big data in the candidate service big data is highly correlated with the second user portrait updating state of the platform service big data in the reference service big data according to the platform service big data of the t first effective service time sequence nodes and the platform service big data of the t second effective service time sequence nodes.

Generally speaking, normal service operation by a service user basically does not cause a user portrait update; the data content capable of affecting the user portrait update corresponding to the candidate service big data or the reference service big data is the amplification or loss of data content caused by an abrupt change in the number of service users. In other words, the difference in portrait category correlation degree between the data content of service big data at the same time sequence position information in different periods should be consistent with the corresponding amplified or lost data content, that is, it should lie within a certain correlation degree interval (namely the second correlation degree interval described below). For the selection mode of the first effective service time sequence node and the selection mode of the second effective service time sequence node, whether the two user portrait updating trends are highly correlated can be determined as follows:

on the premise that the number of seventh effective service time sequence nodes in t first effective service time sequence nodes reaches i, determining that a first user portrait updating state of platform service big data in candidate service big data and a second user portrait updating state of platform service big data in reference service big data are highly correlated, wherein the difference of portrait category correlation degrees between the platform service big data of the seventh effective service time sequence node and the platform service big data of an eighth effective service time sequence node in the t second effective service time sequence nodes is in a second correlation degree interval, time sequence position information of the eighth effective service time sequence node in a second cloud service activation period is the same as the time sequence position information of the seventh effective service time sequence node in the first cloud service activation period, and an integer i is not greater than an integer t;

and on the premise that the number of seventh effective service time sequence nodes in the t first effective service time sequence nodes does not reach i, determining that the first user portrait updating state of the platform service big data in the candidate service big data is not highly correlated with the second user portrait updating state of the platform service big data in the reference service big data.
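
The node-counting determination above may be sketched as follows. The embodiment does not fix how the portrait category correlation degree is computed; the sketch assumes, purely for illustration, that each node carries a numeric value and that the difference is the absolute difference of the two values at the same time sequence position. The function name and parameters are hypothetical.

```python
def highly_correlated_by_count(candidate, reference, interval, i):
    """Count nodes whose difference lies in the second correlation interval.

    candidate, reference: values at t nodes sharing the same time sequence
        positions in their respective cloud service activation periods
    interval: (low, high) bounds of the second correlation degree interval
    i: number of in-interval nodes required for high correlation (i <= t)
    """
    if len(candidate) != len(reference):
        raise ValueError("both sequences must contain t nodes")
    t = len(candidate)
    if not 1 <= i <= t:
        raise ValueError("i must satisfy 1 <= i <= t")
    low, high = interval
    in_interval = sum(
        1 for c, r in zip(candidate, reference) if low <= abs(c - r) <= high
    )
    return in_interval >= i


print(highly_correlated_by_count([50, 51, 49, 55], [50, 50, 50, 50], (0, 2), i=3))
```

A candidate judged not highly correlated by this check would then be treated as service big data with potential value.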

Optionally, in the above embodiment, if, for a plurality of the first effective service time sequence nodes, the difference in portrait category correlation degree between their platform service big data and the platform service big data of the corresponding nodes of the reference service big data is not in the second correlation degree interval, it may further be determined that the time range in which the service big data with potential value exists in the candidate service big data is the service processing period with potential value (namely the service processing period covered by those first effective service time sequence nodes whose difference in portrait category correlation degree from the platform service big data of the reference service big data is not in the second correlation degree interval).

In other words, the above-described modes can be changed to:

on the premise that the number of uninterrupted seventh effective service time sequence nodes in t first effective service time sequence nodes is at least i, determining that the first user portrait updating state of the platform service big data in the candidate service big data is not highly correlated with the second user portrait updating state of the platform service big data in the reference service big data, wherein the difference of portrait category correlation degrees between the platform service big data of the seventh effective service time sequence node and the platform service big data of an eighth effective service time sequence node in the t second effective service time sequence nodes is not in a second correlation degree interval, the time sequence position information of the eighth effective service time sequence node in a second cloud service activation period is the same as the time sequence position information of the seventh effective service time sequence node in the first cloud service activation period, and the integer i is not greater than the integer t;

and on the premise that the number of uninterrupted seventh effective service time sequence nodes in the t first effective service time sequence nodes is less than i, determining that the first user portrait updating state of the platform service big data in the candidate service big data is highly related to the second user portrait updating state of the platform service big data in the reference service big data.

By adopting the method, the platform service big data located on the uninterrupted seventh effective service time sequence nodes in the candidate service big data can be determined to be the service big data with potential value. It should be noted that, when the second effective service time sequence nodes are nodes with a sudden change in portrait state update (which may be understood as key points), such nodes reflect the user portrait update caused by the service operation behavior of the service user and therefore reflect the user portrait updating trend of the platform service big data. Analysis based on these nodes can thus yield more accurate and more reliable results.
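
The uninterrupted-node variant may be sketched as follows. As before, treating the difference in portrait category correlation degree as an absolute difference of numeric node values is an assumption made only for this example, and the function names are hypothetical.

```python
def longest_outside_run(candidate, reference, interval):
    """Return (start_index, length) of the longest uninterrupted run of nodes
    whose difference is NOT in the second correlation degree interval."""
    low, high = interval
    best_start, best_length = 0, 0
    run_start = None
    for idx, (c, r) in enumerate(zip(candidate, reference)):
        if low <= abs(c - r) <= high:
            run_start = None          # run is interrupted
            continue
        if run_start is None:
            run_start = idx           # a new run begins here
        length = idx - run_start + 1
        if length > best_length:
            best_start, best_length = run_start, length
    return best_start, best_length


def potential_value_nodes(candidate, reference, interval, i):
    """Node indices judged potentially valuable, or [] if highly correlated."""
    start, length = longest_outside_run(candidate, reference, interval)
    if length >= i:
        return list(range(start, start + length))
    return []


print(potential_value_nodes([50, 55, 56, 57, 50], [50] * 5, (0, 2), i=3))
```

The returned indices correspond to the service processing period in which the candidate data departs from the reference data without interruption.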

Optionally, in the above embodiment, whether the user portrait updating trends are highly correlated is determined from a plurality of time sequence nodes that are not continuous. Although the processing efficiency is relatively high, the operation is relatively fast and a certain accuracy can be ensured, an element of chance remains: the trends may happen to be highly correlated at the sampled time sequence nodes while not being highly correlated at the remaining time sequence nodes. To overcome this problem, and considering that the difference in portrait category correlation degree between the data content of service big data at the same time sequence position information in different periods should be consistent with the corresponding amplified or lost data content, for a plurality of consecutive time sequence nodes the portrait category correlation degree corresponding to the accumulated amplified or lost data content should also lie within a certain correlation degree interval (namely the first correlation degree interval). The following method can therefore be adopted for the determination; it is suitable for the scheme corresponding to the selection mode of the first effective service time sequence node:

on the premise that the number of user portrait data sets, among t first user portrait data sets, whose portrait category correlation degree is in a first correlation degree interval reaches k, determining that a first user portrait updating state of platform service big data in candidate service big data is highly correlated with a second user portrait updating state of platform service big data in reference service big data, wherein: the first user portrait data set is a user portrait data set formed by combining a first graph data track segment, a second graph data track segment, a third graph data track segment and a fourth graph data track segment; the first graph data track segment p3p4 is the graph data track between the node p3 where a third effective service time sequence node is located on the first graph data track and the node p4 where a fourth effective service time sequence node is located; the second graph data track segment p4p1 is the graph data track between the node p4 where the fourth effective service time sequence node is located on the first graph data track and the node p1 where a fifth effective service time sequence node is located on the second graph data track; the third graph data track segment p1p2 is the graph data track between the node p1 where the fifth effective service time sequence node is located on the second graph data track and the node p2 where a sixth effective service time sequence node is located; the fourth graph data track segment p2p3 is the graph data track between the node p3 where the third effective service time sequence node is located on the first graph data track and the node p2 where the sixth effective service time sequence node is located on the second graph data track; the first graph data track L1 is used for representing platform service big data m on a plurality of effective service time sequence nodes in a first cloud service activation period in the candidate service big data, and the second graph data track L2 is used for representing platform service big data m on a plurality of effective service time sequence nodes in a second cloud service activation period in the reference service big data; the t first effective service time sequence nodes include the uninterrupted third effective service time sequence node and fourth effective service time sequence node, and the t second effective service time sequence nodes include the uninterrupted fifth effective service time sequence node and sixth effective service time sequence node; the time sequence position information of the fifth effective service time sequence node in the second cloud service activation period is the same as the time sequence position information of the fourth effective service time sequence node in the first cloud service activation period, and the time sequence position information of the sixth effective service time sequence node in the second cloud service activation period is the same as the time sequence position information of the third effective service time sequence node in the first cloud service activation period; and the integer k is not greater than the integer t;

and on the premise that the number of user portrait data sets, among the t first user portrait data sets, whose portrait category correlation degree is in the first correlation degree interval does not reach k, determining that the first user portrait updating state of the platform service big data in the candidate service big data is not highly correlated with the second user portrait updating state of the platform service big data in the reference service big data.
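
One possible way to picture the user portrait data sets is as the closed figure p3, p4, p1, p2 lying between the two graph data tracks at two uninterrupted nodes. The sketch below assumes, purely for illustration, that the portrait category correlation degree of such a set is taken to be the area of that figure; the embodiment does not specify this measure, and the function names and the use of node indices as horizontal coordinates are likewise hypothetical. The sketch forms one set per pair of adjacent nodes.

```python
def quad_area(p3, p4, p1, p2):
    """Shoelace area of the figure p3 -> p4 -> p1 -> p2.

    Note: if the two tracks cross between the nodes, the figure
    self-intersects and this returns the net (signed) area magnitude.
    """
    points = [p3, p4, p1, p2]
    twice_area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:] + points[:1]):
        twice_area += x0 * y1 - x1 * y0
    return abs(twice_area) / 2.0


def highly_correlated_by_sets(track_l1, track_l2, interval, k):
    """Count adjacent-node data sets whose assumed measure is in the first
    correlation degree interval and compare the count with k."""
    low, high = interval
    count = 0
    for j in range(len(track_l1) - 1):
        p3, p4 = (j, track_l1[j]), (j + 1, track_l1[j + 1])  # on L1 (candidate)
        p2, p1 = (j, track_l2[j]), (j + 1, track_l2[j + 1])  # on L2 (reference)
        if low <= quad_area(p3, p4, p1, p2) <= high:
            count += 1
    return count >= k


print(highly_correlated_by_sets([2, 2, 2], [0, 0, 0], (0, 3), k=2))
print(highly_correlated_by_sets([2, 2, 10], [0, 0, 0], (0, 3), k=2))
```

Because each set spans two uninterrupted nodes, a deviation that builds up between sampled nodes contributes to the measure, which is the motivation given above for using consecutive nodes.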

Alternatively, the above-described manner may be changed to: on the premise that the continuous number of second user portrait data sets in the t first user portrait data sets reaches k, determining that the first user portrait updating state of the platform service big data in the candidate service big data is not highly correlated with the second user portrait updating state of the platform service big data in the reference service big data, wherein the second user portrait data sets are user portrait data sets of which the portrait category correlation degrees are not in a first correlation degree interval in the t first user portrait data sets; and on the premise that the continuous number of the second user portrait data sets in the t first user portrait data sets does not reach k, determining that the first user portrait updating state of the platform service big data in the candidate service big data is highly related to the second user portrait updating state of the platform service big data in the reference service big data.

By adopting the method, the service big data with potential value can be judged to be the platform service big data corresponding to the uninterrupted second user portrait data set.

With the above technical scheme, the initial business big data with potential value and its corresponding time sequence information (such as the generation or recording time) can be determined accurately and reliably. Real-time processing does not require large-scale business big data: only the business big data generated over a continuous period of time is used to judge whether future business big data is business big data with potential value. The processing efficiency is therefore high, the reliability of the judgment result can be ensured, and the screened business big data with potential value can be highly matched with the actual business. In addition, the threshold (namely the first correlation degree interval or the second correlation degree interval) can be set freely, so that the size of the threshold U can flexibly accommodate the potential value judgment conditions of various kinds of service big data.

In some optional embodiments, after the business big data with potential value is determined, it can be migrated and stored for subsequent data mining. In an actual application, the data analysis server 120 may also operate as a data mining server, and its mining task may mainly be specified by a terminal corresponding to the service provider platform. For example, on the basis of the above contents, the method may further include a technical scheme corresponding to a data mining part, specifically as follows: responding to a data mining request sent by a target service terminal, performing data mining on the candidate service big data based on a preset convolutional neural network to obtain a data mining result based on the user interest tendency, and feeding the data mining result back to the target service terminal, wherein the target service terminal is a terminal corresponding to the service provider platform.

It can be understood that the data mining result based on the user interest tendency can provide the service provider platform with a basis for service updates or product updates. Therefore, to ensure the accuracy and timeliness of the data mining result, the identification degree of the business big data needs to be considered, which facilitates reasonable classification prediction in the mining process. To this end, responding to the data mining request sent by the target service terminal and performing data mining on the candidate business big data based on the preset convolutional neural network to obtain the data mining result based on the user interest tendency further includes the following contents: acquiring, based on a service requirement label in the data mining request, data feature content to be subjected to data mining corresponding to the candidate service big data, and inputting the data feature content to be subjected to data mining into an updated data feature identification degree analysis model for analysis to obtain a current data feature identification degree, wherein the updated data feature identification degree analysis model is a convolutional neural network model obtained after an initial data feature identification degree analysis model is iteratively updated; determining, from a user interest heat value interval corresponding to an updated data feature identification degree, a user interest heat value interval corresponding to a target data feature identification degree that corresponds to the current data feature identification degree, wherein the user interest heat value interval corresponding to the updated data feature identification degree has a corresponding relation with an associated data feature identification degree, the associated data feature identification degree is determined according to an interest tendency label of the user interest heat value interval corresponding to an initial data feature identification degree associated with the user interest heat value interval corresponding to the updated data feature identification degree, a configuration strategy of the user interest heat value interval corresponding to the updated data feature identification degree is determined according to statistical result information of the initial data feature identification degree and the updated data feature identification degree, the initial data feature identification degree is obtained by inputting a preset data feature content sample into the initial data feature identification degree analysis model, and the updated data feature identification degree is obtained by inputting the preset data feature content sample into the updated data feature identification degree analysis model; determining, according to the corresponding relation, the target associated data feature identification degree corresponding to the user interest heat value interval corresponding to the target data feature identification degree; and determining the data mining result based on the user interest tendency corresponding to the data feature content to be subjected to data mining, based on the global interest tendency content set corresponding to the target associated data feature identification degree and the preset associated data feature identification degree. In this way, the data feature content can be analyzed by using the data feature identification degree analysis model, so that different data feature identification degrees, and thus the identification degree of the business big data, are taken into account. This facilitates reasonable classification prediction in the mining process and further ensures the accuracy and timeliness of the data mining result.

It can be understood that the data mining result based on the user interest tendency determined as described above can guide the corresponding service provider platform in updating its business products. For example, for a game product, the skins of different heroes can be updated according to "want more game skins" in the data mining result; for an online office product, the related office software can be optimized according to "the overall replacement requirement of the individual selected target" in the data mining result. In this way, the obtained data mining result can serve both the user and the service provider platform, and the value of the big data can be realized as far as possible.

Next, corresponding to the above data screening method for serving big data mining analysis, an embodiment of the present invention further provides an exemplary data screening apparatus for serving big data mining analysis. As shown in fig. 3, the data screening apparatus 30 for serving big data mining analysis may include the following functional modules.

The receiving module 31 is configured to receive a data screening instruction, where the data screening instruction is used to instruct to screen the service big data with the potential value.

The obtaining module 32 is configured to, in response to the data screening instruction, use a data interception thread with a time sequence interval size of T to obtain, from the platform service big data collected by the target cloud service platform in a first service processing period, a data stream that is uninterrupted in time sequence and has a time sequence interval size of T as candidate service big data, where the duration of the first service processing period is a preset duration value, and T is a positive integer.

The determining module 33 is configured to determine whether the candidate service big data is service big data with a potential value based on reference service big data, where the reference service big data is platform service big data collected by the target cloud service platform in a second service processing period, a failure time of the second service processing period is not later than an activation time of the first service processing period, and a duration of the second service processing period is greater than or equal to the preset duration value.

For the description of the functional modules, reference may be made to the description of the embodiment of the method shown in fig. 2.
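The constraints handled by the obtaining module 32 and the determining module 33 can be illustrated with a minimal sketch. The sketch assumes, beyond what the application states, that time is measured in integer ticks, that the platform service big data is represented only by its sorted timestamps, and that "uninterrupted" means adjacent timestamps are exactly one tick apart; `ServicePeriod`, `reference_period_is_valid` and `intercept_window` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass(frozen=True)
class ServicePeriod:
    activation: int  # tick at which the service processing period starts
    failure: int     # tick at which the service processing period ends

    @property
    def duration(self) -> int:
        return self.failure - self.activation


def reference_period_is_valid(first: ServicePeriod, second: ServicePeriod,
                              preset_duration: int) -> bool:
    """The second (reference) period must end no later than the first period
    starts, and must last at least the preset duration value."""
    return (second.failure <= first.activation
            and second.duration >= preset_duration)


def intercept_window(timestamps: Sequence[int], T: int,
                     step: int = 1) -> Optional[List[int]]:
    """Return the first run of T timestamps with no gap between neighbours,
    or None when no such uninterrupted run exists."""
    if T <= 0:
        raise ValueError("T must be a positive integer")
    start = 0
    for i in range(len(timestamps)):
        if i > 0 and timestamps[i] - timestamps[i - 1] != step:
            start = i  # a gap interrupts the stream; restart the window here
        if i - start + 1 == T:
            return list(timestamps[start:i + 1])
    return None
```

Returning the first qualifying window is only one possible design; the application does not say which uninterrupted segment is chosen when several exist.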

Then, on the basis of the above, the data screening system for serving the big data mining analysis may also be the following architecture: the system comprises a data analysis server and a cloud service platform which are communicated with each other. Further, a description is given below with respect to a system formed by the data analysis server and the cloud service platform.

A data screening system serving big data mining analysis comprises a data analysis server and a cloud service platform which are communicated with each other;

the cloud service platform collects platform service big data during its operation;

the data analysis server receives a data screening instruction, wherein the data screening instruction is used for instructing screening of business big data with potential value; in response to the data screening instruction, uses a data interception thread with a time sequence interval size of T to obtain, from the platform service big data collected by a target cloud service platform in a first service processing period, a data stream that is uninterrupted in time sequence and has a time sequence interval size of T as candidate service big data, wherein the duration of the first service processing period is a preset duration value, and T is a positive integer; and judges whether the candidate service big data is service big data with potential value based on reference service big data, wherein the reference service big data is platform service big data collected by the target cloud service platform in a second service processing period, the failure time of the second service processing period is not later than the activation time of the first service processing period, and the duration of the second service processing period is greater than or equal to the preset duration value.

Reference may be made to the description of the method embodiment shown in fig. 2 with respect to the above-described system.

Further, referring to fig. 4 in conjunction, the data analysis server 110 may include a processing engine 111, a network module 112 and a memory 113, the processing engine 111 and the memory 113 communicating through the network module 112.

Processing engine 111 may process the relevant information and/or data to perform one or more of the functions described herein. For example, in some embodiments, processing engine 111 may include at least one processing engine (e.g., a single-core processing engine or a multi-core processor). By way of example only, processing engine 111 may include a Central Processing Unit (CPU), an Application-Specific Integrated Circuit (ASIC), an Application-Specific Instruction Set Processor (ASIP), a Graphics Processing Unit (GPU), a Physical Processing Unit (PPU), a Digital Signal Processor (DSP), a Field Programmable Gate Array (FPGA), a Programmable Logic Device (PLD), a controller, a microcontroller unit, a Reduced Instruction Set Computer (RISC), a microprocessor, or the like, or any combination thereof.

Network module 112 may facilitate the exchange of information and/or data. In some embodiments, the network module 112 may be any type of wired or wireless network or combination thereof. Merely by way of example, network module 112 may include a cable network, a wired network, a fiber optic network, a telecommunications network, an intranet, the Internet, a Local Area Network (LAN), a Wide Area Network (WAN), a Wireless Local Area Network (WLAN), a Metropolitan Area Network (MAN), a Public Switched Telephone Network (PSTN), a Bluetooth network, a wireless personal area network, a Near Field Communication (NFC) network, and the like, or any combination thereof. In some embodiments, the network module 112 may include at least one network access point. For example, the network module 112 may include a wired or wireless network access point, such as a base station and/or a network access point.

The memory 113 may be, but is not limited to, a Random Access Memory (RAM), a Read-Only Memory (ROM), a Programmable Read-Only Memory (PROM), an Erasable Programmable Read-Only Memory (EPROM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), and the like. The memory 113 is configured to store a program, and the processing engine 111 executes the program after receiving the execution instruction.

It will be appreciated that the configuration shown in fig. 4 is merely illustrative, and that the data analysis server 110 may include more or fewer components than shown in fig. 4, or have a different configuration than shown in fig. 4. The components shown in fig. 4 may be implemented in hardware, software, or a combination thereof.

Based on the foregoing disclosure, those skilled in the art can clearly understand the embodiments of the present invention. It should be understood that those skilled in the art can derive and analyze the technical terms that are not explained above from the contents described in the present application, and thus the above contents should not be taken as a basis for judging the inventiveness of the overall scheme.

It should be appreciated that the system and its modules shown above may be implemented in a variety of ways. For example, in some embodiments, the system and its modules may be implemented in hardware, software, or a combination of software and hardware. The hardware portion may be implemented using dedicated logic; the software portion may be stored in a memory for execution by a suitable instruction execution system, such as a microprocessor or specially designed hardware. Those skilled in the art will appreciate that the methods and systems described above may be implemented using computer executable instructions and/or embodied in processor control code, such code being provided, for example, on a carrier medium such as a diskette, CD- or DVD-ROM, a programmable memory such as read-only memory (firmware), or a data carrier such as an optical or electronic signal carrier. The system and its modules of the present application may be implemented not only by hardware circuits such as very large scale integrated circuits or gate arrays, semiconductors such as logic chips and transistors, or programmable hardware devices such as field programmable gate arrays and programmable logic devices, but also by software executed by various types of processors, or by a combination of the above hardware circuits and software (e.g., firmware).

It is to be noted that different embodiments may produce different advantages, and in different embodiments, any one or combination of the above advantages may be produced, or any other advantages may be obtained.

Having thus described the basic concept, it will be apparent to those skilled in the art that the foregoing detailed disclosure is to be considered merely illustrative and not restrictive of the broad application. Various modifications, improvements and adaptations to the present application may occur to those skilled in the art, although not explicitly described herein. Such modifications, improvements and adaptations are proposed in the present application and thus fall within the spirit and scope of the exemplary embodiments of the present application.

Also, this application uses specific language to describe embodiments of the application. Reference throughout this specification to "one embodiment," "an embodiment," and/or "some embodiments" means that a particular feature, structure, or characteristic described in connection with at least one embodiment of the present application is included in at least one embodiment of the present application. Therefore, it is emphasized and should be appreciated that two or more references to "an embodiment" or "one embodiment" or "an alternative embodiment" in various places throughout this specification are not necessarily all referring to the same embodiment. Furthermore, some features, structures, or characteristics of one or more embodiments of the present application may be combined as appropriate.

Moreover, those skilled in the art will appreciate that aspects of the present application may be illustrated and described in terms of several patentable species or situations, including any new and useful combination of processes, machines, manufacture, or materials, or any new and useful improvement thereon. Accordingly, various aspects of the present application may be embodied entirely in hardware, entirely in software (including firmware, resident software, micro-code, etc.) or in a combination of hardware and software. The above hardware or software may be referred to as a "data block," "module," "engine," "unit," "component," or "system." Furthermore, aspects of the present application may be represented as a computer product, including computer readable program code, embodied in one or more computer readable media.

The computer storage medium may comprise a propagated data signal with the computer program code embodied therewith, for example, on baseband or as part of a carrier wave. The propagated signal may take any of a variety of forms, including electromagnetic, optical, etc., or any suitable combination. A computer storage medium may be any computer-readable medium that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code located on a computer storage medium may be propagated over any suitable medium, including radio, cable, fiber optic cable, RF, or the like, or any combination of the preceding.

Computer program code required for the operation of various portions of the present application may be written in any one or more programming languages, including an object oriented programming language such as Java, Scala, Smalltalk, Eiffel, JADE, Emerald, C++, C#, VB.NET, Python, and the like, a conventional programming language such as C, Visual Basic, Fortran 2003, Perl, COBOL 2002, PHP, ABAP, a dynamic programming language such as Python, Ruby, and Groovy, or other programming languages, and the like. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any form of network, such as a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet), or in a cloud computing environment, or as a service, such as Software as a Service (SaaS).

Additionally, the order in which elements and sequences of the processes described herein are processed, the use of alphanumeric characters, or the use of other designations, is not intended to limit the order of the processes and methods described herein, unless explicitly claimed. While various presently contemplated embodiments of the invention have been discussed in the foregoing disclosure by way of example, it is to be understood that such detail is solely for that purpose and that the appended claims are not limited to the disclosed embodiments, but, on the contrary, are intended to cover all modifications and equivalent arrangements that are within the spirit and scope of the embodiments herein. For example, although the system components described above may be implemented by hardware devices, they may also be implemented by software-only solutions, such as installing the described system on an existing server or mobile device.

Similarly, it should be noted that in the preceding description of embodiments of the application, various features are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more of the embodiments. This method of disclosure, however, is not intended to require more features than are expressly recited in the claims. Indeed, the embodiments may be characterized as having fewer than all of the features of a single embodiment disclosed above.

Numerals describing the number of components, attributes, etc. are used in some embodiments, it being understood that such numerals used in the description of the embodiments are modified in some instances by the use of the modifier "about", "approximately" or "substantially". Unless otherwise indicated, "about", "approximately" or "substantially" indicates that the numbers allow for adaptive variation. Accordingly, in some embodiments, the numerical parameters used in the specification and claims are approximations that may vary depending upon the desired properties of the individual embodiments. In some embodiments, the numerical parameter should take into account the specified significant digits and employ a general digit preserving approach. Notwithstanding that the numerical ranges and parameters setting forth the broad scope of the range are approximations, in the specific examples, such numerical values are set forth as precisely as possible within the scope of the application.

The entire contents of each patent, patent application, patent application publication, and other material cited in this application, such as articles, books, specifications, publications, documents, and the like, are hereby incorporated by reference into this application, except for application history documents that are inconsistent with or contrary to the contents of this application, and except for documents (whether currently or later appended to this application) that limit the broadest scope of the claims of this application. It is noted that if the descriptions, definitions, and/or use of terms in the material attached to this application are inconsistent with or contrary to those stated in this application, the descriptions, definitions, and/or use of terms in this application shall control.

Finally, it should be understood that the embodiments described herein are merely illustrative of the principles of the embodiments of the present application. Other variations are also possible within the scope of the present application. Thus, by way of example, and not limitation, alternative configurations of the embodiments of the present application can be viewed as being consistent with the teachings of the present application. Accordingly, the embodiments of the present application are not limited to only those embodiments explicitly described and depicted herein.

Claims

1. A data screening method for serving big data mining analysis is characterized by comprising the following steps:

judging whether the candidate service big data is service big data with potential value or not based on reference service big data, wherein the reference service big data is platform service big data collected by the target cloud service platform in a second service processing period, the failure time of the second service processing period is not later than the activation time of the first service processing period, and the duration of the second service processing period is more than or equal to the preset duration value;

wherein judging whether the candidate service big data is service big data with potential value based on the reference service big data comprises:

determining that the candidate service big data is service big data with potential value on the premise that a first user portrait updating state of the platform service big data in the candidate service big data in the first service processing period is not highly correlated with a second user portrait updating state of the platform service big data in the reference service big data in the second service processing period;

wherein whether the first user portrait update status of the platform service big data in the candidate service big data and the second user portrait update status of the platform service big data in the reference service big data are highly correlated is determined as follows:

determining whether the first user portrait updating state of the platform service big data in the candidate service big data is highly correlated with the second user portrait updating state of the platform service big data in the reference service big data according to the platform service big data of the t first effective service time sequence nodes and the platform service big data of the t second effective service time sequence nodes;

wherein determining whether the first user portrait update status of the platform service big data in the candidate service big data and the second user portrait update status of the platform service big data in the reference service big data are highly correlated according to the platform service big data of the t first effective service time sequence nodes and the platform service big data of the t second effective service time sequence nodes comprises:

determining that the first user portrait update status of the platform service big data in the candidate service big data is not highly correlated with the second user portrait update status of the platform service big data in the reference service big data on the premise that the number of user portrait data sets with portrait category correlation in the first correlation interval in the t first user portrait data sets is not more than k;

wherein the method further comprises:

2. The method of claim 1, wherein the number of the reference service big data is n, and wherein determining whether the candidate service big data is potentially valuable service big data based on the reference service big data comprises:

3. The method of claim 1, wherein determining whether the first user portrait update status of the platform service big data in the candidate service big data and the second user portrait update status of the platform service big data in the reference service big data are highly correlated according to the platform service big data of the t first effective service time sequence nodes and the platform service big data of the t second effective service time sequence nodes comprises:

4. The method of claim 1, wherein determining whether the first user portrait update status of the platform service big data in the candidate service big data and the second user portrait update status of the platform service big data in the reference service big data are highly correlated according to the platform service big data of the t first effective service time sequence nodes and the platform service big data of the t second effective service time sequence nodes comprises:

determining that the first user portrait update status of the platform service big data in the candidate service big data is not highly correlated with the second user portrait update status of the platform service big data in the reference service big data on the premise that the number of uninterrupted seventh effective service time sequence nodes in the t first effective service time sequence nodes is at least i, wherein an integer i is not greater than an integer t;

5. The method of any of claims 1-4, wherein obtaining the candidate service big data comprises: acquiring platform service big data collected in a first service processing period in a first cloud service activation period as the candidate service big data, wherein the time sequence position information of the activation time of the first service processing period in the first cloud service activation period is the same as the time sequence position information of the activation time of the second service processing period in the second cloud service activation period, and the time sequence position information of the failure time of the first service processing period in the first cloud service activation period is the same as the time sequence position information of the failure time of the second service processing period in the second cloud service activation period.

6. A data analysis server comprising a processing engine, a network module and a memory; the processing engine and the memory communicate through the network module, the processing engine reading a computer program from the memory and operating to perform the method of any of claims 1-5.