CN118132965B - Social platform user intelligent analysis method based on big data - Google Patents
- Publication number
- CN118132965B (application CN202410557734.3A)
- Authority
- CN
- China
- Prior art keywords
- data
- social
- distribution
- interaction
- interaction type
- Legal status
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/18—Complex mathematical operations for evaluating statistical data, e.g. average values, frequency distributions, probability functions, regression analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
- G06F18/232—Non-hierarchical techniques
- G06F18/2321—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2413—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/26—Discovering frequent patterns
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/28—Determining representative reference patterns, e.g. by averaging or distorting; Generating dictionaries
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Life Sciences & Earth Sciences (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Evolutionary Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Computation (AREA)
- Artificial Intelligence (AREA)
- Probability & Statistics with Applications (AREA)
- Mathematical Analysis (AREA)
- Mathematical Optimization (AREA)
- Computational Mathematics (AREA)
- Pure & Applied Mathematics (AREA)
- Mathematical Physics (AREA)
- Operations Research (AREA)
- Algebra (AREA)
- Databases & Information Systems (AREA)
- Software Systems (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention relates to the technical field of multidimensional data processing, and in particular to a social platform user intelligent analysis method based on big data. In the method, a characterization weight is obtained for each interaction type from how its social data are distributed over their numerical range; a distribution weight is obtained for each social datum from the aggregation degree of its quantity distribution relative to the other social data of the same interaction type; and a data reference value is obtained for each interaction type by combining these weights with the data values. A time sequence rule coefficient is obtained from how regularly each social datum is distributed in time within its interaction type. The highlighting degree of each data bar is then obtained from the characterization weights of the different interaction types in the bar, the time sequence rule coefficients of its social data, and the deviation of each social datum from the corresponding data reference value; highlighted data bars are determined on that basis and used for user analysis. Because the method considers both the characterization capability of social data in different dimensions and the regularity of user behavior, it obtains social information with stronger characterization, allowing more accurate user analysis and more reliable user characteristic analysis results.
Description
Technical Field
The invention relates to the technical field of multidimensional data processing, in particular to a social platform user intelligent analysis method based on big data.
Background
With the rapid development of social platforms, the amount of data generated by users grows exponentially, and analyzing and processing these data effectively has become an important problem. Big data technology provides strong support for collecting, storing, and processing social platform user data; it can handle massive, diverse, and continuously updated data, which makes feature analysis of social platform users possible. By comprehensively analyzing large amounts of information in the social platform, such as user relationships and interaction behavior, certain behavioral characteristics of users in the social network can be characterized, reflecting their behavior and needs and providing strong support for the platform's business decisions, advertisement delivery, and the like.
The social information of a user has many complex elements, and a user's interest in different events or activity hotspots varies; more prominent hotspot information tends to cause abrupt changes in the user's overall interaction behavior. Shifts in a user's focus of attention are therefore generally represented by the prominent points of the social information in a multidimensional space. However, because social information is multi-source and heterogeneous, screening users only by the prominence of data in the multidimensional space ignores both the different distribution characteristics of different data and the regularity of user behavior. As a result, the prominent social data obtained have weak characterization, subsequent user analysis based on them has large errors, and the user characteristic analysis results are unreliable.
Disclosure of Invention
To solve the technical problems in the prior art that highlighted interaction data have weak characterization, that subsequent user analysis based on the highlighted interaction information has large errors, and that user characteristic analysis results are unreliable, the invention aims to provide a social platform user intelligent analysis method based on big data. The technical solution adopted is as follows:
The invention provides a social platform user intelligent analysis method based on big data, which comprises the following steps:
Obtaining the social data of each interaction type in each data bar in a social platform database, the interaction types including user, time, interaction event, click frequency, interaction frequency, and browsing duration; and obtaining the characterization weight of each interaction type according to how uniformly the social data of that interaction type in all data bars are distributed over their numerical range;
obtaining the distribution weight of each social datum according to the aggregation degree of its quantity distribution relative to the other social data of its interaction type; and obtaining the data reference value of each interaction type according to the distribution weights and the numerical distribution of the social data of that interaction type in all data bars;
obtaining the time sequence rule coefficient of each social datum according to how regularly it is distributed in time among all social data of its interaction type; and obtaining the highlighting degree of each data bar according to the characterization weights of the different interaction types in the data bar, the deviation of each social datum from the data reference value of its interaction type, and the time sequence rule coefficient of each social datum;
Determining highlighted data bars from all data bars according to the highlighting degree, and performing user analysis using the highlighted data bars.
Further, the method for obtaining the characterization weight comprises the following steps:
Taking each interaction type in turn as the analysis type and presetting different division scales; for any one division scale, uniformly dividing the data range of the social data of the analysis type in all data bars into as many intervals as the division scale, to obtain the region ranges of the analysis type under that division scale;
Counting the number of social data of the analysis type falling in each region range under the division scale to obtain the distribution quantity of each region range; calculating the variance of the distribution quantities of all region ranges under the division scale and normalizing it to obtain the uniform distribution index of the analysis type under that division scale;
Taking the mean of the uniform distribution indexes of the analysis type over all division scales as the characterization weight of the analysis type.
Further, the method for acquiring the distribution weight comprises the following steps:
For any interaction type, counting the number of occurrences of each value among all social data of the interaction type; mapping the value and occurrence count of all social data of the interaction type into the quantity distribution coordinate system of the interaction type to obtain distribution data points, where the horizontal axis represents the value of the social data of the interaction type and the vertical axis represents the number of occurrences of that value; and clustering the distribution data points in the quantity distribution coordinate system of the interaction type to obtain distribution clusters;
Taking each distribution data point in turn as the reference data point, calculating the distance between the reference data point and each other distribution data point in its distribution cluster, and averaging these distances to obtain the intra-cluster distribution index of the reference data point; taking the other distribution cluster closest to the cluster of the reference data point as its adjacent cluster, calculating the distance between the reference data point and each distribution data point in the adjacent cluster, and averaging these distances to obtain the out-of-cluster distribution index of the reference data point;
Taking the difference between the intra-cluster distribution index and the out-of-cluster distribution index of the reference data point as its aggregation distribution index; taking the ratio of the aggregation distribution index to the intra-cluster distribution index as the distribution compactness of the reference data point; and calculating the product of the distribution compactness of the reference data point and its number of occurrences and normalizing it to obtain the distribution weight of the reference data point.
Further, taking the other distribution cluster closest to the distribution cluster of the reference data point as the adjacent cluster of the reference data point comprises:
Acquiring the center point of each distribution cluster; calculating the distance between the center point of the distribution cluster containing the reference data point and the center point of each other distribution cluster, to obtain the spacing distances of that distribution cluster;
Taking the other distribution cluster with the smallest spacing distance as the adjacent cluster of the reference data point.
Further, the method for acquiring the data reference value includes:
For any interaction type, taking the product of the value of each distribution data point of the interaction type and its distribution weight as the adjustment value of that distribution data point; and calculating the mean of the adjustment values of all distribution data points of the interaction type to obtain the data reference value of the interaction type.
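The data reference value step is short enough to show directly. The sketch below follows the wording literally: it averages the products of value and distribution weight over the distribution data points, which is not the same as a weighted mean (that would divide by the sum of the weights instead of the number of points). The function name is hypothetical.

```python
import numpy as np


def data_reference_value(point_values, point_weights):
    """Mean of (value * distribution weight) over all distribution data points.

    Follows the claim wording literally; a conventional weighted mean would be
    np.average(point_values, weights=point_weights) instead.
    """
    v = np.asarray(point_values, dtype=float)
    w = np.asarray(point_weights, dtype=float)
    adjustment = v * w  # adjustment value of each distribution data point
    return float(adjustment.mean())


print(data_reference_value([1, 2, 3], [1.0, 0.5, 0.0]))
```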
Further, the method for acquiring the time sequence rule coefficient comprises the following steps:
For any interaction type, ordering all social data of the interaction type by time and performing curve fitting to obtain the time sequence social curve of the interaction type; and acquiring the period of the time sequence social curve as the period length of the interaction type;
For any social datum, in the time sequence social curve of its interaction type, taking the social data at moments separated from it by integer multiples of the period length as its equal-period data; and calculating the variance of the values of all equal-period data of the social datum, then applying a negative correlation mapping and normalization to obtain the time sequence rule coefficient of the social datum.
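The equal-period idea can be sketched as below: samples an integer number of periods apart share the same phase, so the variance within each phase measures how regular that position in the cycle is. This is a minimal sketch under stated assumptions, not the patent's implementation: the function name is hypothetical, the series is assumed to be evenly sampled and already time-ordered (the curve-fitting step is skipped), the period is an integer number of samples, and `exp(-variance)` is assumed as the "negative correlation mapping and normalization", giving values in (0, 1].

```python
import numpy as np


def timing_rule_coefficients(series, period):
    """Time sequence rule coefficient for every sample of one interaction type.

    ASSUMPTIONS: evenly sampled, time-ordered series; integer `period`;
    exp(-variance) as the negative-correlation mapping into (0, 1].
    """
    x = np.asarray(series, dtype=float)
    if not 1 <= period <= x.size:
        raise ValueError("period must be between 1 and the series length")
    phase = np.arange(x.size) % period
    # Equal-period data: all samples that share a phase with a given sample.
    phase_var = np.array([x[phase == p].var() for p in range(period)])
    # Highly regular (low-variance) positions get coefficients close to 1.
    return np.exp(-phase_var[phase])


print(timing_rule_coefficients([1, 5, 1, 5, 1, 5], period=2))
```

A perfectly periodic series yields a coefficient of 1 everywhere; irregular positions in the cycle are pushed toward 0.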
Further, the expression of the degree of highlighting is:
[Formula image not reproduced in the source text.]
where $H_i$ denotes the highlighting degree of the $i$-th data bar; $K$ denotes the total number of interaction types; $w_k$ denotes the characterization weight of the $k$-th interaction type; $B_k$ denotes the data reference value of the $k$-th interaction type; $R_{i,k}$ denotes the time sequence rule coefficient of the social data of the $k$-th interaction type in the $i$-th data bar; $x_{i,k}$ denotes the value of the social data of the $k$-th interaction type in the $i$-th data bar; $|\cdot|$ denotes the absolute value function; $\exp(\cdot)$ denotes the exponential function with the natural constant as its base; and $\mathrm{Norm}(\cdot)$ denotes a normalization function.
Further, determining the highlighted data bars from all data bars according to the highlighting degree comprises:
When the highlighting degree of a data bar is greater than a preset highlighting threshold, taking that data bar as a highlighted data bar.
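The text lists the ingredients of the highlighting degree (characterization weight, data reference value, time sequence rule coefficient, the absolute deviation of each social datum from its reference value, a natural exponential, and a normalization) but does not spell out how they are combined. The sketch below assumes one combination consistent with the described effects, namely that larger deviations and weights raise the degree while more regular data lower it: `H_i = Norm(sum_k w_k * |x_ik - B_k| * exp(-R_ik))`, with min-max normalization across data bars. Treat it as a hypothetical illustration, not the patented formula; the function names and the threshold value in the example are also hypothetical.

```python
import numpy as np


def highlight_degrees(X, R, w, B):
    """Highlighting degree of each data bar (one row per bar).

    X: (n_bars, K) social data values; R: (n_bars, K) time sequence rule
    coefficients; w: (K,) characterization weights; B: (K,) data reference values.
    ASSUMPTION: H = Norm(sum_k w_k * |X_ik - B_k| * exp(-R_ik)), min-max Norm.
    """
    X = np.asarray(X, dtype=float)
    R = np.asarray(R, dtype=float)
    deviation = np.abs(X - np.asarray(B, dtype=float))
    # Regular data (large R) are damped; weakly characterizing types (small w) too.
    raw = (np.asarray(w, dtype=float) * deviation * np.exp(-R)).sum(axis=1)
    span = raw.max() - raw.min()
    return (raw - raw.min()) / span if span > 0 else np.zeros_like(raw)


def highlight_bars(H, threshold):
    """Indices of data bars whose highlighting degree exceeds the preset threshold."""
    return np.flatnonzero(np.asarray(H) > threshold)


H = highlight_degrees([[1, 1], [1, 1], [9, 1]], np.zeros((3, 2)), [1, 1], [1, 1])
print(highlight_bars(H, threshold=0.5))  # hypothetical threshold
```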
Further, acquiring the period of the time sequence social curve of the interaction type as the period length of the interaction type comprises:
Decomposing the time sequence social curve of the interaction type with the STL time series decomposition algorithm to obtain the periodic (seasonal) component, and obtaining the period of that component through the autocorrelation function as the period length of the interaction type.
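The autocorrelation part of this step can be sketched with NumPy alone. This sketch skips the STL decomposition (statsmodels provides it as `statsmodels.tsa.seasonal.STL` if a full decomposition is wanted) and works on the mean-removed series directly. The peak-picking rule, taking the highest autocorrelation value after the autocorrelation first turns negative, is an assumed heuristic that avoids selecting the trivially high values at very small lags; the function name is hypothetical.

```python
import numpy as np


def period_length(series, max_lag=None):
    """Estimate the dominant period (in samples) from the autocorrelation function.

    ASSUMPTIONS: STL step skipped; period = lag of the highest ACF value after
    the ACF first becomes negative.
    """
    x = np.asarray(series, dtype=float)
    x = x - x.mean()
    denom = np.dot(x, x)
    if denom == 0:
        raise ValueError("constant series has no period")
    max_lag = max_lag or x.size // 2
    # acf[i] is the (biased) autocorrelation at lag i + 1.
    acf = np.array([np.dot(x[:-k], x[k:]) / denom for k in range(1, max_lag + 1)])
    negative = np.flatnonzero(acf < 0)
    if negative.size == 0:
        raise ValueError("no periodicity detected")
    start = negative[0]
    return int(start + np.argmax(acf[start:]) + 1)  # +1 converts index to lag


t = np.arange(120)
print(period_length(np.sin(2 * np.pi * t / 12)))
```

The biased estimator (dividing every lag by the same total energy) makes later repeats of the period score slightly lower, so the first full period tends to win.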
Further, the distance is calculated using a Euclidean distance.
The invention has the following beneficial effects:
The method considers that, in a multidimensional setting, social information has different characterization capability in different dimensions. The characterization weight is obtained from how the social data of each interaction type are distributed over their numerical range, so that the characterization capability of each interaction type is reflected by the uniformity of its data distribution: for an interaction type with poorer uniformity, the aggregation of the data can represent distinct information features, so different interaction types need to be given different characterization weights. Meanwhile, highlighted information is usually identified by its deviation from the general situation. If the baseline for the general situation is computed simply as a mean, ignoring how the information is distributed, some salient data values influence the reference value excessively and introduce errors. Therefore, the distribution weight is obtained from the aggregation degree of the quantity distribution of each social datum relative to the other social data of its interaction type, and a more accurate data reference value for each interaction type is obtained from the distribution weights and the values of all social data of that interaction type.
In the subsequent saliency analysis, the greater the deviation of a social datum from the data reference value, the higher its saliency. However, much of the social information a user generates during interaction follows fairly regular patterns, and such information deserves less consideration as an anomaly; therefore, the time sequence rule coefficient of each social datum is obtained from how regularly it is distributed in time within its interaction type. Finally, the highlighting degree of each data bar is obtained from the characterization weights of the different interaction types in the bar, the time sequence rule coefficient of each social datum, and the deviation of each social datum from the corresponding data reference value; the highlighted data bars determined in this way have stronger characterization capability, which facilitates subsequent user analysis. By considering the characterization of social data in different dimensions and the regularity of user behavior, the method obtains social information with stronger characterization capability, so that user analysis based on the prominent social information is more accurate and the user characteristic analysis results are more reliable.
Drawings
In order to more clearly illustrate the embodiments of the invention or the technical solutions and advantages of the prior art, the following description will briefly explain the drawings used in the embodiments or the description of the prior art, and it is obvious that the drawings in the following description are only some embodiments of the invention, and other drawings can be obtained according to the drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flowchart of a social platform user intelligent analysis method based on big data according to an embodiment of the present invention.
Detailed Description
In order to further describe the technical means and effects adopted by the invention to achieve its intended purpose, the specific implementation, structure, features, and effects of the social platform user intelligent analysis method based on big data according to the invention are described in detail below with reference to the accompanying drawings and the preferred embodiment. In the following description, references to "one embodiment" or "another embodiment" do not necessarily refer to the same embodiment. Furthermore, particular features, structures, or characteristics of one or more embodiments may be combined in any suitable manner.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.
The invention provides a social platform user intelligent analysis method based on big data, which is specifically described below with reference to the accompanying drawings.
Referring to fig. 1, a flowchart of a social platform user intelligent analysis method based on big data according to an embodiment of the present invention is shown, and the method includes the following steps:
S1: Obtaining the social data of each interaction type in each data bar in a social platform database, the interaction types including user, time, interaction event, click frequency, interaction frequency, and browsing duration; and obtaining the characterization weight of each interaction type according to how uniformly the social data of that interaction type in all data bars are distributed over their numerical range.
In the embodiment of the invention, each time a user interacts with the social platform, a data bar reflecting the social information is generated, and historical data bars are collected in the social platform database. Each data bar contains social data of each interaction type, an interaction type being a kind of data recorded in each data bar. User and time are mandatory interaction types; the other interaction types include interaction event, click frequency, interaction frequency, browsing duration, and so on, and implementers may add or adjust types according to the specific situation. Because the data are diverse, the data of each interaction type must be converted into numerical form to allow joint analysis; in the embodiment of the invention, the data are encoded with One-Hot encoding to obtain the corresponding social data. It should be noted that, in the embodiment of the invention, the collection of user information data is authorized by the relevant users and does not violate relevant laws and regulations; data collection and One-Hot encoding are technical means well known to those skilled in the art and are not described herein.
A data bar contains several interaction types, but not every interaction type represents salient anomalies in the data equally well. For example, time and browsing duration are spread across a large amount of interaction information, so their ability to reveal salient anomalies in a data bar is far lower than that of interaction frequency, click frequency, and the like. The ability of each interaction type to characterize information therefore has to be considered first, and the characterization weight of each interaction type is obtained according to how uniformly its social data in all data bars are distributed over the numerical range.
Preferably, each interaction type is taken in turn as the analysis type. Different division scales are preset; for any one division scale, the data range of the social data of the analysis type in all data bars is uniformly divided into as many intervals as the division scale, giving the region ranges of the analysis type under that division scale. Dividing the data range in this way makes the distribution of the values observable. The data range runs from the minimum to the maximum value of all social data of the analysis type. In the embodiment of the invention, the preset division scales are 3, 4, 5, 6, 7, 8, and 9. For example, when the data range of the analysis type is 1-15 and the division scale is 3, the data range is divided into 3 intervals, namely the region ranges 1-5, 6-10, and 11-15, and the social data of the analysis type fall into different region ranges according to their values.
The number of social data of the analysis type falling in each region range under the division scale is counted to obtain the distribution quantity of each region range; the distribution quantity reflects how densely the values are concentrated locally. The variance of the distribution quantities of all region ranges under the division scale is calculated and normalized to obtain the uniform distribution index of the analysis type under that division scale. The variance reflects how uniform the value distribution is: the smaller the variance, and hence the smaller the uniform distribution index, the more uniform the values of the analysis type and the weaker its ability to reflect salient features. In other embodiments of the invention, the uniformity of the distribution may instead be reflected by the sum of squares of the distribution quantities; the more uniform the distribution, the smaller the sum of squares. This is not limited herein.
The mean of the uniform distribution indexes of the analysis type over all division scales is taken as its characterization weight, which integrates the value distribution under the different division scales. When the uniform distribution indexes are small overall, the analysis type has a weaker ability to characterize salient anomalies and should be given less consideration in subsequent analysis, so its characterization weight is lower.
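The characterization-weight procedure can be sketched with NumPy as follows. This is a minimal sketch, not the patent's implementation: the function names are hypothetical, the division scales default to the 3 to 9 listed above, and because the normalization is not specified, the variance of the interval counts is divided by its largest possible value (all n values in one interval, which gives n² (s − 1) / s² for s intervals) so that the index lies in [0, 1].

```python
import numpy as np


def uniform_distribution_index(values, scale):
    """Normalized variance of the counts in `scale` equal-width region ranges.

    ASSUMPTION: normalization divides by the maximum possible variance,
    n**2 * (scale - 1) / scale**2, reached when all values share one interval.
    """
    values = np.asarray(values, dtype=float)
    n = values.size
    # Equal-width region ranges spanning [min, max] of the analysis type.
    counts, _ = np.histogram(values, bins=scale, range=(values.min(), values.max()))
    max_var = n**2 * (scale - 1) / scale**2
    return counts.var() / max_var  # population variance of the distribution quantities


def characterization_weight(values, scales=range(3, 10)):
    """Mean uniform distribution index over all preset division scales."""
    return float(np.mean([uniform_distribution_index(values, s) for s in scales]))


evenly_spread = list(range(1, 16))  # at scale 3: ranges 1-5, 6-10, 11-15, five values each
clumped = [1] * 10 + [15]
print(characterization_weight(evenly_spread, scales=[3]))  # 0.0
print(characterization_weight(clumped) > characterization_weight(evenly_spread))  # True
```

The evenly spread example reproduces the 1-15 case above: each of the three region ranges holds five values, so the variance and hence the weight are zero, while the clumped data receive a high weight.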
S2: obtaining the distribution weight of each social data according to the quantity distribution aggregation degree of each social data and other social data of the interaction type; and obtaining the data reference value of each interaction type according to the distribution weight and the numerical distribution condition of the social data of each interaction type in all the data bars.
The method can further combine the characterization weights to obtain the highlighted data bar according to the distribution deviation condition of social data in the data bar in the interaction type, and because the highlighted information is usually obtained based on the deviation condition of the information and the general condition, if the distribution characteristic problem of the information characterization is ignored only according to the calculation of the average value in the reference line calculation of the general condition, some data values with the saliency have excessive influence on the reference value and cause errors, and therefore the reference value of each interaction type needs to be adjusted before the saliency analysis is carried out. Firstly, according to the quantity distribution aggregation degree of each social data and other social data of the interaction type, the distribution weight of each social data is obtained.
Preferably, for any interaction type, counting the occurrence number of each numerical value in all social data corresponding to the interaction type, and analyzing the distribution situation of the numerical values according to the occurrence number of the numerical values. And mapping the numerical value and the occurrence frequency of all the social data corresponding to the interaction type into a quantity distribution coordinate system of the interaction type to obtain distribution data points, wherein in the quantity distribution coordinate system of the interaction type, the horizontal axis represents the numerical value of the social data of the interaction type, and the vertical axis represents the occurrence frequency of the numerical value of the social data. It should be noted that the transformation of the midpoint of the two-dimensional coordinate system is a technical means well known to those skilled in the art, and will not be described herein.
The method comprises the steps of analyzing the distribution characteristics under the interaction type through the aggregation degree of numerical distribution, and clustering the distribution data points in the quantity distribution coordinate system of the interaction type to obtain a distribution cluster.
And taking each distributed data point as a reference data point in turn, calculating the distance between the reference data point and each other distributed data point in the distributed cluster, and averaging to obtain an intra-cluster distribution index of the reference data point, and reflecting the aggregation of the reference data point in the self cluster. In the embodiment of the invention, the distance between the points is calculated by using Euclidean distance. In the embodiment of the invention, the center point of each distribution cluster is obtained, the distance between the center point of the distribution cluster at the reference data point and the center point of each other distribution cluster is calculated, each interval distance of the distribution clusters at the reference data point is obtained, the distribution distance between the distribution clusters is reflected, the other distribution clusters corresponding to the smallest interval distance of the distribution clusters at the reference data point are used as the adjacent clusters of the reference data point, namely, the other distribution clusters closest to the center point are the adjacent clusters of the distribution clusters at the reference data point. It should be noted that, the acquisition of the cluster center point and the euclidean distance calculation are all technical means well known to those skilled in the art, and are not described herein.
Further, the distances between the reference data point and each distribution data point in the adjacent cluster are calculated and averaged. This gives the out-of-cluster distribution index of the reference data point, which reflects how separable the point is from other clusters.

The absolute value of the difference between the intra-cluster distribution index and the out-of-cluster distribution index is taken as the aggregation distribution index of the reference data point. The larger the aggregation distribution index, the better the reference data point fits the local cluster structure. The ratio of the aggregation distribution index to the intra-cluster distribution index is taken as the distribution compactness of the reference data point. When the reference data point is well clustered locally and aggregates tightly within its cluster, its distribution compactness is higher. This indicates that the local distribution of social data represented by the reference data point is highly consistent.
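Given cluster labels and the adjacent cluster of a reference data point, the out-of-cluster distribution index, the aggregation distribution index and the distribution compactness can be sketched as below. The absolute difference follows the absolute value function listed for the distribution weight expression. The function names and sample points are hypothetical.

```python
import math


def mean_distance(p, group):
    """Average Euclidean distance from point p to every point in group."""
    return sum(math.dist(p, q) for q in group) / len(group)


def distribution_compactness(points, labels, i, adjacent_label):
    """Return (a, b, aggregation index, compactness) for reference data point i."""
    own = [q for j, (q, lab) in enumerate(zip(points, labels))
           if lab == labels[i] and j != i]
    neighbours = [q for q, lab in zip(points, labels) if lab == adjacent_label]
    a = mean_distance(points[i], own)          # intra-cluster distribution index
    b = mean_distance(points[i], neighbours)   # out-of-cluster distribution index
    aggregation = abs(a - b)                   # aggregation distribution index
    return a, b, aggregation, aggregation / a  # last item: distribution compactness


points = [(0, 0), (0, 2), (10, 0), (10, 2)]
labels = [0, 0, 1, 1]
a, b, agg, comp = distribution_compactness(points, labels, 0, adjacent_label=1)
print(round(a, 3), round(b, 3), round(comp, 3))
```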
To account for how the occurrence count affects representativeness, the product of the distribution compactness of the reference data point and its occurrence count is calculated and normalized. The result is the distribution weight of the reference data point. In the embodiment of the invention, the distribution weight is expressed as:
$$w_j = \mathrm{Norm}\left( n_j \times \frac{\left| a_j - b_j \right|}{a_j} \right)$$
where:
- $w_j$ denotes the distribution weight of the $j$-th distribution data point;
- $n_j$ denotes the number of occurrences corresponding to the $j$-th distribution data point;
- $a_j$ denotes the intra-cluster distribution index of the $j$-th distribution data point;
- $b_j$ denotes the out-of-cluster distribution index of the $j$-th distribution data point;
- $|\cdot|$ denotes the absolute value function;
- $\mathrm{Norm}(\cdot)$ denotes a normalization function.

Normalization is a technical means well known to those skilled in the art. Linear normalization or standard normalization may be chosen, and the specific normalization method is not limited herein.
Here, $|a_j - b_j|$ is the aggregation distribution index of the $j$-th distribution data point, and $\frac{|a_j - b_j|}{a_j}$ is its distribution compactness. It should be noted that the intra-cluster distribution index is an average of distances to other, distinct distribution data points. Therefore $a_j$ cannot be zero, and the denominator never makes the formula meaningless.

The greater the distribution compactness and the larger the occurrence count, the more locally consistent the corresponding distribution data point is and the more data it represents. Such a point serves as a stronger reference, so its value receives a higher weight in subsequent calculations.
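A minimal sketch of the distribution weight follows. It assumes linear (min-max) normalization over the points of one interaction type, which is one of the options the text allows. The equal-weight fallback when all products coincide, and the function name, are assumptions of this sketch.

```python
def distribution_weights(compactness, counts):
    """w_j = Norm(n_j * C_j), using min-max normalization over one interaction type."""
    raw = [n * c for n, c in zip(counts, compactness)]
    lo, hi = min(raw), max(raw)
    if hi == lo:
        # Assumed fallback: identical products give every point the same weight.
        return [1.0] * len(raw)
    return [(r - lo) / (hi - lo) for r in raw]


print(distribution_weights(compactness=[2.0, 1.0, 4.0], counts=[3, 2, 1]))
# [1.0, 0.0, 0.5]
```

With min-max normalization, the point with the smallest product receives weight 0. Its value then contributes nothing to later weighted sums. Standard normalization would behave differently.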
The data reference value of each interaction type is then obtained from the distribution weights and the numerical distribution of the social data of that interaction type across all data bars. The product of each distribution data point's value and its distribution weight is taken as its adjustment value. The adjustment values of all distribution data points of the interaction type are then averaged to obtain the data reference value, so each value contributes in proportion to its weight. In the embodiment of the invention, the data reference value is expressed as:
$$B_k = \frac{1}{M_k} \sum_{j=1}^{M_k} w_j \, v_{k,j}$$
where:
- $B_k$ denotes the data reference value of the $k$-th interaction type;
- $M_k$ denotes the total number of distribution data points of the $k$-th interaction type;
- $w_j$ denotes the distribution weight of the $j$-th distribution data point;
- $v_{k,j}$ denotes the value corresponding to the $j$-th distribution data point of the $k$-th interaction type.

The product $w_j \, v_{k,j}$ is the adjustment value of the $j$-th distribution data point of the $k$-th interaction type.
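The data reference value translates directly into code. As the text describes it, the adjustment values are averaged over the number of distribution data points rather than divided by the sum of the weights. The sketch keeps that form. The function name and sample numbers are hypothetical.

```python
def data_reference_value(values, weights):
    """Mean of the adjustment values w_j * v_j over all distribution data points of one type."""
    if not values or len(values) != len(weights):
        raise ValueError("expected one distribution weight per distribution data point")
    adjustments = [w * v for w, v in zip(weights, values)]  # adjustment values
    return sum(adjustments) / len(adjustments)


print(data_reference_value(values=[30, 45, 60], weights=[1.0, 0.5, 0.0]))  # 17.5
```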
Thus, the reference level of each interaction type is analyzed and adjusted according to the data distribution of that interaction type.
S3: obtain the time sequence rule coefficient of each social data item according to how regularly it is distributed in time among all social data of its interaction type. Then obtain the highlighting degree of each data bar from three inputs: the characterization weights of the different interaction types in the data bar, the deviation of each social data item from the data reference value of its interaction type, and the time sequence rule coefficient of each social data item.
The prominence of a data bar can be judged by how far its social data deviate from the data reference values of their interaction types. However, some of a user's social information follows a fairly regular pattern, and such information deserves little consideration as an anomaly. For example, many users check the weather forecast every morning. This interaction is concentrated in one period of the day, yet it recurs regularly from day to day, so its highlighting degree should be reduced. Therefore, the time sequence rule coefficient of each social data item is obtained according to how regularly it is distributed in time among all social data of its interaction type.
Preferably, for any interaction type, all social data of the interaction type are sorted in chronological order and curve-fitted. This gives the time sequence social curve of the interaction type, which reflects the temporal distribution of that interaction type.

In one embodiment of the invention, the time sequence social curve is decomposed with the STL (Seasonal and Trend decomposition using Loess) algorithm to obtain the periodic term of the interaction type. The periodic term reflects periodic variation in the interaction type, which may be caused by factors such as users' daily habits, holiday effects and periodic activities. The period of the periodic term is obtained through the autocorrelation function and used as the period length of the interaction type. The autocorrelation function measures the correlation of the time series with itself at different time lags, so the periodicity of the data can be determined by locating its peaks. It should be noted that the STL decomposition algorithm and the autocorrelation function are technical means well known to those skilled in the art and are not described herein.
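The period-finding step can be sketched in standard-library Python. Because the standard library has no STL implementation, this sketch skips the decomposition and applies the autocorrelation function directly to a series assumed to be the periodic term already. A full pipeline would first run an STL implementation, such as the `STL` class in statsmodels. The function names are hypothetical. The rule of choosing the highest local ACF peak is one reasonable reading of "searching the peak value".

```python
def autocorrelation(series, lag):
    """Sample autocorrelation of series at the given lag (biased estimator)."""
    n = len(series)
    mean = sum(series) / n
    denom = sum((x - mean) ** 2 for x in series)
    if denom == 0:
        return 0.0  # a constant series has no measurable periodicity
    num = sum((series[t] - mean) * (series[t + lag] - mean) for t in range(n - lag))
    return num / denom


def period_length(series, max_lag=None):
    """Lag of the highest local peak of the autocorrelation function, or None."""
    max_lag = max_lag or len(series) // 2
    acf = [autocorrelation(series, lag) for lag in range(max_lag + 1)]
    peaks = [lag for lag in range(1, max_lag)
             if acf[lag - 1] < acf[lag] >= acf[lag + 1]]
    return max(peaks, key=lambda lag: acf[lag]) if peaks else None


# Hypothetical periodic term with a period of 4 samples.
print(period_length([0, 1, 0, -1] * 6))  # 4
```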
For any social data item, the social data on the time sequence social curve of its interaction type at moments separated from it by integer multiples of the period length are taken as its equal-period data. These are the data one, two or more periods before and after the item, and they show whether its distribution is regular.

The variance of the values of all equal-period data of the social data item is calculated, then passed through a negative correlation mapping and normalized. The result is the time sequence rule coefficient of the social data item. The variance reflects how much the data fluctuate: the smaller the variance, the more consistent the social data are across periods. A smaller variance therefore gives a larger time sequence rule coefficient, reflecting higher regularity.
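The text leaves the negative correlation mapping and normalization open. The sketch below assumes $R = e^{-\sigma^2}$, which decreases as the variance $\sigma^2$ grows and already lies in (0, 1]. It also treats the equal-period data as including the social data item itself. Both choices, and the function names, are assumptions.

```python
import math
from statistics import pvariance


def equal_period_data(series, t, period):
    """Values at positions t + m * period for every integer m that stays inside the series."""
    return series[t % period::period]


def time_sequence_rule_coefficient(series, t, period):
    """Assumed mapping R = exp(-variance): low variance across periods gives R close to 1."""
    return math.exp(-pvariance(equal_period_data(series, t, period)))


regular = [5, 1, 5, 1, 5, 1]
irregular = [5, 1, 9, 1, 1, 1]
print(time_sequence_rule_coefficient(regular, 2, period=2))    # 1.0
print(time_sequence_rule_coefficient(irregular, 0, period=2))  # close to 0
```

Because $e^{-\sigma^2}$ depends on the units of the data, a practical version might standardize the values first. That is a design choice the text does not specify.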
A prominence analysis, that is, an analysis of deviation, is performed on each data bar. It combines the characterization capability of each interaction type with the temporal regularity of the social data. The highlighting degree of each data bar is obtained from the characterization weights of the different interaction types in the data bar, the deviation of each social data item from the data reference value of its interaction type, and the time sequence rule coefficient of each social data item. The highlighting degree is preferably expressed as:
$$H_i = \mathrm{Norm}\left( \frac{1}{K} \sum_{k=1}^{K} \alpha_k \times e^{-R_{i,k}} \times \left| x_{i,k} - B_k \right| \right)$$
where:
- $H_i$ denotes the highlighting degree of the $i$-th data bar;
- $K$ denotes the total number of interaction types;
- $\alpha_k$ denotes the characterization weight of the $k$-th interaction type;
- $B_k$ denotes the data reference value of the $k$-th interaction type;
- $R_{i,k}$ denotes the time sequence rule coefficient of the social data of the $k$-th interaction type in the $i$-th data bar;
- $x_{i,k}$ denotes the value of the social data of the $k$-th interaction type in the $i$-th data bar;
- $|\cdot|$ denotes the absolute value function;
- $e^{(\cdot)}$ denotes the exponential function with the natural constant $e$ as its base;
- $\mathrm{Norm}(\cdot)$ denotes a normalization function.
Each term of the expression plays a specific role:
- $|x_{i,k} - B_k|$ is the difference between the social data of the $k$-th interaction type in the $i$-th data bar and the corresponding data reference value.
- $e^{-R_{i,k}}$ is a negative correlation adjustment by the time sequence rule coefficient. The more regularly the social data of the $k$-th interaction type in the $i$-th data bar are distributed, the larger the coefficient, and the less weight their difference from the data reference value receives.
- Multiplying by $\alpha_k$ adjusts for the characterization capability of the interaction type. For interaction types with weak characterization capability, the difference counts for less, and its influence on the highlighting degree is reduced.
- The average $\frac{1}{K}\sum_{k=1}^{K}$ combines the differences over all interaction types into the highlighting degree of the data bar.

The larger this average, the more strongly the data bar deviates overall across the interaction types, and thus the higher its highlighting degree.
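The highlighting degree can be sketched as follows for a small batch of data bars. The text does not say over which set the normalization function acts. This sketch assumes min-max scaling across all data bars. The function names and numbers are hypothetical.

```python
import math


def raw_highlighting(values, rule_coeffs, char_weights, ref_values):
    """(1/K) * sum over k of alpha_k * exp(-R_k) * |x_k - B_k| for one data bar."""
    K = len(char_weights)
    return sum(char_weights[k] * math.exp(-rule_coeffs[k]) * abs(values[k] - ref_values[k])
               for k in range(K)) / K


def highlighting_degrees(bars, rules, char_weights, ref_values):
    """Highlighting degree of every data bar, min-max normalized across the bars (assumed Norm)."""
    raw = [raw_highlighting(v, r, char_weights, ref_values) for v, r in zip(bars, rules)]
    lo, hi = min(raw), max(raw)
    return [0.0 if hi == lo else (x - lo) / (hi - lo) for x in raw]


# Two interaction types and three data bars, with no regularity discount (all R = 0).
bars = [[10, 4], [14, 4], [10, 8]]
rules = [[0.0, 0.0]] * 3
print(highlighting_degrees(bars, rules, char_weights=[1.0, 0.5], ref_values=[10, 4]))
# [0.0, 1.0, 0.5]
```

Raising a bar's time sequence rule coefficient shrinks its contribution through the $e^{-R}$ factor. This is how regular behaviour, such as a daily weather check, is discounted.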
S4: determine highlighted data bars from all data bars according to the highlighting degree, and perform user analysis through the highlighted data bars.
Finally, highlighted data bars are determined from all data bars according to the highlighting degree. Preferably, when the highlighting degree of a data bar is greater than a preset highlighting threshold, the data bar is considered markedly prominent. It is then taken as a highlighted data bar with distinct characterization features. In the embodiment of the invention, the preset highlighting threshold is set to 0.38. The specific value can be adjusted to the actual situation and is not limited herein.
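Selecting highlighted data bars is a strict comparison against the preset threshold, which is 0.38 in this embodiment. A bar exactly at the threshold is therefore not selected. A one-function sketch, with a hypothetical name:

```python
def highlighted_bars(degrees, threshold=0.38):
    """Indices of data bars whose highlighting degree is strictly greater than the threshold."""
    return [i for i, d in enumerate(degrees) if d > threshold]


print(highlighted_bars([0.10, 0.38, 0.52, 0.90]))  # [2, 3]
```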
This completes the analysis of how the data bars characterize users. Social information with prominent characterization is screened out, and user analysis can then be performed on the highlighted data bars. In the embodiment of the invention, the highlighted data bars can be fed, as user-feature analysis data, into an analysis system that outputs user behavior characteristics. In other embodiments, the correlation and influence relationships among users can be analyzed from the highlighted data bars to enable user-based information recommendation and similar applications. The subsequent user analysis process is not repeated here.
In summary, the invention recognizes that, across multiple dimensions, social information reflects information characteristics to different degrees. A characterization weight is obtained from how the social data of each interaction type are distributed over their numerical range. The uniformity of that distribution reflects the characterization capability of the interaction type: the data of a less uniform type aggregate, and those aggregates characterize distinct information features. Different characterization weights must therefore be assigned.

Meanwhile, prominent information is usually identified by its deviation from the general case. If the baseline for the general case were computed simply as a mean, ignoring how the information is distributed, a few salient values could exert too much influence on the reference value and introduce errors. The distribution weight is therefore obtained from the quantity distribution aggregation degree of each social data item relative to the other social data of its interaction type. A more accurate reference value for each interaction type is then obtained from the distribution weights and the values of all social data of that type.
When the social data are further analyzed for prominence, a larger deviation from the reference value reflects higher prominence. However, some of a user's social information follows a fairly regular pattern and deserves less consideration as an anomaly. The time sequence rule coefficient of each social data item is therefore obtained according to how regularly it is distributed in time within its interaction type.

Finally, the highlighting degree of each data bar is obtained from three inputs: the characterization weights of the different interaction types in the data bar, the time sequence rule coefficient of each social data item, and the deviation of each social data item from the corresponding reference value. The highlighted data bars are determined from this degree, and their stronger characterization capability facilitates subsequent user analysis. By considering both how social data characterize users along different dimensions and how regular user behavior is, the method obtains social information with stronger characterization capability. User analysis based on this prominent social information is therefore more accurate, and the user feature analysis results are more reliable.
It should be noted that the order of the embodiments of the present invention is for description only and does not indicate that any embodiment is better or worse than another. The processes depicted in the accompanying drawings do not necessarily require the particular order shown, or any sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.
In this specification, the embodiments are described progressively. For identical or similar parts, the embodiments may refer to one another, and each embodiment focuses on its differences from the others.
Claims (5)
1. An intelligent analysis method for social platform users based on big data, characterized by comprising the following steps:
obtaining the social data of each interaction type in each data bar of a social platform database, the interaction types including: user, time, interaction event, click frequency, interaction frequency and browsing duration; and obtaining the characterization weight of each interaction type according to how uniformly the social data of that interaction type in all data bars are distributed over their numerical range;
obtaining the distribution weight of each social data item according to the quantity distribution aggregation degree between that social data item and the other social data of its interaction type; and obtaining the data reference value of each interaction type according to the distribution weights and the numerical distribution of the social data of that interaction type in all data bars;
obtaining the time sequence rule coefficient of each social data item according to how regularly it is distributed in time among all social data of its interaction type; and obtaining the highlighting degree of each data bar according to the characterization weights of the different interaction types in the data bar, the deviation of each social data item from the data reference value of its interaction type, and the time sequence rule coefficient of each social data item;
determining highlighted data bars from all data bars according to the highlighting degree, and performing user analysis through the highlighted data bars;
wherein the characterization weight is obtained through the following steps:
taking each interaction type in turn as the analysis type and presetting different division scales; for any division scale, evenly dividing the data range of the social data of the analysis type in all data bars into as many intervals as the division scale specifies, to obtain the region ranges of the analysis type under that division scale;
counting the number of social data of the analysis type falling in each region range under the division scale, to obtain the distribution quantity of each region range; calculating the variance of the distribution quantities of all region ranges under the division scale and normalizing it, to obtain the uniform distribution index of the analysis type under that division scale;
taking the average of the uniform distribution indexes of the analysis type over all division scales as the characterization weight of the analysis type;
the distribution weight is obtained through the following steps:
for any interaction type, counting the number of occurrences of each numerical value among all social data of the interaction type; mapping the value and occurrence count of all social data of the interaction type into the quantity distribution coordinate system of the interaction type to obtain distribution data points, wherein the horizontal axis represents the numerical value of the social data of the interaction type and the vertical axis represents the number of occurrences of that value; and clustering the distribution data points in the quantity distribution coordinate system of the interaction type to obtain distribution clusters;
taking each distribution data point in turn as a reference data point, calculating the distance between the reference data point and each other distribution data point in its distribution cluster, and averaging to obtain the intra-cluster distribution index of the reference data point; taking the other distribution cluster closest to the cluster containing the reference data point as the adjacent cluster of the reference data point, calculating the distance between the reference data point and each distribution data point in the adjacent cluster, and averaging to obtain the out-of-cluster distribution index of the reference data point;
taking the difference between the intra-cluster distribution index and the out-of-cluster distribution index of the reference data point as its aggregation distribution index; taking the ratio of the aggregation distribution index to the intra-cluster distribution index as the distribution compactness of the reference data point; and calculating the product of the distribution compactness and the occurrence count of the reference data point and normalizing it, to obtain the distribution weight of the reference data point;
the data reference value is obtained through the following steps:
for any interaction type, taking the product of the value of each distribution data point of the interaction type and its distribution weight as the adjustment value of that distribution data point; and calculating the average of the adjustment values of all distribution data points of the interaction type, to obtain the data reference value of the interaction type;
the time sequence rule coefficient is obtained through the following steps:
for any interaction type, sorting all social data of the interaction type in chronological order and performing curve fitting, to obtain the time sequence social curve of the interaction type; and obtaining the period of the time sequence social curve of the interaction type as the period length of the interaction type;
for any social data item, taking the social data on the time sequence social curve of its interaction type at moments separated from it by integer multiples of the period length as its equal-period data; calculating the variance of the values of all equal-period data of the social data item, and performing negative correlation mapping and normalization, to obtain the time sequence rule coefficient of the social data item;
the expression of the highlighting degree is:
$$H_i = \mathrm{Norm}\left( \frac{1}{K} \sum_{k=1}^{K} \alpha_k \times e^{-R_{i,k}} \times \left| x_{i,k} - B_k \right| \right)$$
where $H_i$ denotes the highlighting degree of the $i$-th data bar; $K$ denotes the total number of interaction types; $\alpha_k$ denotes the characterization weight of the $k$-th interaction type; $B_k$ denotes the data reference value of the $k$-th interaction type; $R_{i,k}$ denotes the time sequence rule coefficient of the social data of the $k$-th interaction type in the $i$-th data bar; $x_{i,k}$ denotes the value of the social data of the $k$-th interaction type in the $i$-th data bar; $|\cdot|$ denotes the absolute value function; $e^{(\cdot)}$ denotes the exponential function with the natural constant $e$ as its base; and $\mathrm{Norm}(\cdot)$ denotes a normalization function.
2. The intelligent analysis method for social platform users based on big data according to claim 1, wherein taking the other distribution cluster closest to the cluster containing the reference data point as the adjacent cluster of the reference data point comprises:
obtaining the center point of each distribution cluster; calculating the distance between the center point of the cluster containing the reference data point and the center point of each other distribution cluster, to obtain the separation distances of that cluster;
taking the other distribution cluster with the smallest separation distance as the adjacent cluster of the reference data point.
3. The intelligent analysis method for social platform users based on big data according to claim 1, wherein determining the highlighted data bars from all data bars according to the highlighting degree comprises:
when the highlighting degree of a data bar is greater than a preset highlighting threshold, taking the data bar as a highlighted data bar.
4. The intelligent analysis method for social platform users based on big data according to claim 1, wherein obtaining the period of the time sequence social curve of the interaction type as the period length of the interaction type comprises:
decomposing the time sequence social curve of the interaction type with the STL time series decomposition algorithm to obtain a periodic term; and obtaining the period of the periodic term through the autocorrelation function as the period length of the interaction type.
5. The intelligent analysis method for social platform users based on big data according to claim 1, wherein the distances are Euclidean distances.
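Claim 1 specifies the characterization weight only at the level of steps. The sketch below is one reading of those steps under several assumptions:
- the division scales (4, 8 and 16 intervals) are hypothetical;
- each division scale is read as the number of equal-width intervals;
- the variance is the population variance of the interval counts;
- the normalization is min-max scaling across interaction types at each scale.

A larger result means a less uniform distribution. The function names and sample data are hypothetical.

```python
from statistics import pvariance


def count_variance(values, scale):
    """Variance of the counts in `scale` equal-width intervals spanning [min, max]."""
    lo, hi = min(values), max(values)
    counts = [0] * scale
    for v in values:
        # Interval index of v; the maximum value is placed in the last interval.
        idx = scale - 1 if hi == lo else min(int((v - lo) / (hi - lo) * scale), scale - 1)
        counts[idx] += 1
    return pvariance(counts)


def characterization_weights(data_by_type, scales=(4, 8, 16)):
    """Average over division scales of the min-max-normalized count variance per type."""
    per_scale = []
    for scale in scales:
        raw = {name: count_variance(vals, scale) for name, vals in data_by_type.items()}
        lo, hi = min(raw.values()), max(raw.values())
        per_scale.append({name: 0.0 if hi == lo else (x - lo) / (hi - lo)
                          for name, x in raw.items()})
    return {name: sum(d[name] for d in per_scale) / len(per_scale) for name in data_by_type}


data = {
    "browsing duration": [0, 1, 2, 3, 4, 5, 6, 7],  # spread evenly
    "click frequency": [0, 0, 0, 0, 0, 0, 0, 7],    # concentrated
}
print(characterization_weights(data, scales=(4, 8)))
# {'browsing duration': 0.0, 'click frequency': 1.0}
```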
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410557734.3A CN118132965B (en) | 2024-05-08 | 2024-05-08 | Social platform user intelligent analysis method based on big data |
Publications (2)
Publication Number | Publication Date |
---|---|
CN118132965A (en) | 2024-06-04 |
CN118132965B (en) | 2024-07-16 |
Family
ID=91233077
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202410557734.3A Active CN118132965B (en) | 2024-05-08 | 2024-05-08 | Social platform user intelligent analysis method based on big data |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN118132965B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN118378193B (en) * | 2024-06-20 | 2024-08-27 | 山东征途信息科技股份有限公司 | Intelligent community data analysis method and system based on big data |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107341270A (en) * | 2017-07-28 | 2017-11-10 | 东北大学 | Towards the user feeling influence power analysis method of social platform |
CN111581522A (en) * | 2020-06-05 | 2020-08-25 | 预见你情感(北京)教育咨询有限公司 | Social analysis method based on user identity identification |
Family Cites Families (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2016012493A1 (en) * | 2014-07-24 | 2016-01-28 | Agt International Gmbh | System and method for social event detection |
WO2016142906A1 (en) * | 2015-03-11 | 2016-09-15 | Iou Concepts Inc. | System and method for generating a user status and authenticating social interactions in a computer network |
JP2017117306A (en) * | 2015-12-25 | 2017-06-29 | ルネサスエレクトロニクス株式会社 | Marking analysis system and marking analysis method |
US10713588B2 (en) * | 2016-02-23 | 2020-07-14 | Salesforce.Com, Inc. | Data analytics systems and methods with personalized sentiment models |
CN114077705A (en) * | 2021-09-24 | 2022-02-22 | 中国科学院计算技术研究所 | Method and system for portraying media account on social platform |
CN116303663A (en) * | 2023-02-08 | 2023-06-23 | 启明信息技术股份有限公司 | User affinity calculation method and system based on content social platform |
CN116705337B (en) * | 2023-08-07 | 2023-10-27 | 山东第一医科大学第一附属医院(山东省千佛山医院) | Health data acquisition and intelligent analysis method |
CN117522758B (en) * | 2024-01-04 | 2024-03-26 | 深圳对对科技有限公司 | Smart community resource management method and system based on big data |
CN117875501B (en) * | 2024-01-12 | 2024-07-02 | 深圳振华数据信息技术有限公司 | Social media user behavior prediction system and method based on big data |
- 2024-05-08: CN202410557734.3A (CN118132965B), filed in CN, status: active
Also Published As
Publication number | Publication date |
---|---|
CN118132965A (en) | 2024-06-04 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN118132965B (en) | Social platform user intelligent analysis method based on big data | |
CN114676883A (en) | Power grid operation management method, device and equipment based on big data and storage medium | |
CN113377568A (en) | Abnormity detection method and device, electronic equipment and storage medium | |
CN114997321A (en) | Transformer area user change relationship identification method and device, electronic equipment and storage medium | |
CN103714135A (en) | MapReduce recommendation method and system of second-degree interpersonal relationships of massive users | |
CN116431931A (en) | Real-time incremental data statistical analysis method | |
CN111898637B (en) | Feature selection algorithm based on Relieff-DDC | |
CN118134539B (en) | User behavior prediction method based on intelligent kitchen multi-source data fusion | |
CN112418485A (en) | Household load prediction method and system based on load characteristics and power consumption behavior mode | |
CN117971625B (en) | Performance data intelligent monitoring system based on computer cloud platform | |
CN118171900A (en) | Canned fruit and vegetable information traceability management system based on block chain | |
CN111858245A (en) | Abnormal data analysis method and device, electronic equipment and storage medium | |
CN109241320A (en) | The division methods of teenage crime area cluster based on Time Series Clustering | |
CN114757722B (en) | Sales predicting method and device for electronic equipment | |
CN118132860B (en) | Intelligent processing method for personal office information technology software data | |
CN112540819A (en) | Method for automatically generating recommended detailed page and form page according to query page | |
Davarzani et al. | Study of missing meter data impact on domestic load profiles clustering and characterization | |
CN112884192A (en) | High-quality power value-added service product decision method based on multi-index bilateral matching | |
CN117114761B (en) | Supply chain information management system based on integral data analysis | |
CN117076990B (en) | Load curve identification method, device and medium based on curve dimension reduction and clustering | |
CN118350855B (en) | User intelligent matching method and system based on user acquisition information | |
JP2001066377A (en) | Weather system | |
CN116781984B (en) | Set top box data optimized storage method | |
CN116719665B (en) | Intelligent judging and identifying method for abnormal state of meteorological numerical mode | |
CN113780862A (en) | Power load comprehensive change rate evaluation method and system and computer equipment |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |