CN117614845A - Communication information processing method and device based on big data analysis - Google Patents

Communication information processing method and device based on big data analysis

Info

Publication number
CN117614845A
Authority
CN
China
Prior art keywords
data
communication
communication data
feature
distance
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202311513146.1A
Other languages
Chinese (zh)
Other versions
CN117614845B (en)
Inventor
康波峰
黄明金
周雯
熊刚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Weichuang Software Wuhan Co ltd
Original Assignee
Weichuang Software Wuhan Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Weichuang Software Wuhan Co ltd
Priority to CN202311513146.1A
Publication of CN117614845A
Application granted
Publication of CN117614845B
Legal status: Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/14 Network analysis or design
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/50 Network service management, e.g. ensuring proper service fulfilment according to agreements
    • H04L41/5061 Network service management, e.g. ensuring proper service fulfilment according to agreements characterised by the interaction between service providers and their network customers, e.g. customer relationship management
    • H04L41/507 Filtering out customers affected by service problems
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/50 Network services
    • H04L67/60 Scheduling or organising the servicing of application requests, e.g. requests for application data transmissions using the analysis and optimisation of the required network resources

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Business, Economics & Management (AREA)
  • General Business, Economics & Management (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)
  • Telephonic Communication Services (AREA)

Abstract

The invention discloses a communication information processing method and a device based on big data analysis, which relate to the technical field of communication information processing, and the method comprises the following steps: collecting initial communication data of a user, and storing the initial communication data into a communication database according to a data type; preprocessing the acquired initial communication data to obtain standard communication data; designing a feature combination of standard communication data, and extracting features of the standard communication data according to the feature combination to obtain a first feature set; screening the first feature set according to a screening rule to obtain a second feature set; acquiring a communication task list, and analyzing and processing the second feature set according to the communication tasks in the communication task list to obtain analysis results corresponding to the communication tasks; and visualizing the analysis result by using a visualization tool. The invention utilizes the advantage of big data analysis to carry out personalized analysis on the communication data, and provides personalized, intelligent and accurate service for users.

Description

Communication information processing method and device based on big data analysis
Technical Field
The invention belongs to the technical field of communication information processing, and particularly relates to a communication information processing method and device based on big data analysis.
Background
With the development and popularization of communication technology, a great deal of data is generated by the communication behavior of people, including short messages, telephone call records, social media messages and the like. The communication data contains rich information, and can be used in various fields such as personal communication habit analysis, social relationship mining, business marketing and the like. However, conventional communication data processing methods often face many challenges and limitations, including large data volume, various data types, and uneven data quality.
At present, conventional database management systems and data mining technologies are often used for processing communication data. These methods perform well in processing structured data, but have limitations in processing unstructured communication data. For example, conventional database management systems may not be able to efficiently process large-scale text data, while data mining techniques may face inefficiencies in feature extraction and task analysis. In addition, the traditional communication data processing method often lacks deep understanding and mining of the communication behaviors of the user, and is difficult to provide personalized and accurate service for the user.
The invention patent with the Chinese application number of 202210491734.9 discloses a communication information automatic analysis system and equipment based on big data, wherein the field of special interest of a user is summarized according to the history search record of the corresponding user, and then a search information analysis unit searches by using a search engine according to the previously summarized field of interest of the user and a keyword input by the user. In the prior art, rough recommendation is performed according to local files, online browsing records, searching interests and the like of users, deep mining and analysis of communication information are not performed, user feedback is usually required to be obtained according to irregular pushing, follow-up recommendation optimization is performed, and the method is not intelligent enough, and the experience of the users is poor.
Disclosure of Invention
In view of the above, the invention provides a communication information processing method and device based on big data analysis, which performs personalized analysis on communication data by utilizing the advantages of big data analysis, provides personalized, intelligent and precise services for users, improves the efficiency and quality of communication data processing, realizes deeper and comprehensive analysis on the communication data, and improves the utilization value of the communication data.
The technical purpose of the invention is realized as follows:
In one aspect, the invention provides a communication information processing method based on big data analysis, which comprises the following steps:
S1, collecting initial communication data of a user, and storing the initial communication data into a communication database according to data type;
S2, preprocessing the collected initial communication data to obtain standard communication data, and storing the standard communication data into the communication database;
S3, designing feature combinations of the standard communication data, and performing feature extraction on the standard communication data according to the feature combinations to obtain a first feature set;
S4, screening the first feature set according to a screening rule to obtain a second feature set;
S5, acquiring a communication task list, and analyzing the second feature set according to the communication tasks in the communication task list to obtain an analysis result corresponding to each communication task;
S6, visualizing the analysis results by using a visualization tool, and displaying them to the user according to the user's requirements.
Based on the above technical solution, preferably, step S2 includes:
S21, encoding the collected initial communication data to obtain a unique identifier of the initial communication data;
S22, judging whether the initial communication data is duplicated according to the unique identifier, and if the unique identifier is duplicated, merging the corresponding initial communication data to obtain first communication data;
S23, setting a first threshold $\delta_1$, a second threshold $\delta_2$ and a third threshold $\delta_3$, calculating the number $N_1$ of missing characters in the first communication data, and comparing $N_1$ with $\delta_1$, $\delta_2$ and $\delta_3$:
if the number of missing characters satisfies $\delta_1 < N_1 \le \delta_2$, classifying the first communication data as first data to be repaired;
if the number of missing characters satisfies $\delta_2 < N_1 \le \delta_3$, classifying the first communication data as second data to be repaired;
if the number of missing characters satisfies $\delta_3 < N_1$, classifying the first communication data as third data to be repaired;
S24, processing the first data to be repaired, the second data to be repaired and the third data to be repaired respectively to obtain second communication data;
S25, performing anomaly identification on the second communication data by using an anomaly detection method to obtain abnormal data, and repairing the abnormal data to obtain third communication data;
S26, performing format conversion and normalization on the third communication data to obtain standard communication data.
Based on the above technical solution, preferably, step S24 includes:
deleting the first data to be repaired from the first communication data;
acquiring the timestamp of the second data to be repaired as a target timestamp; taking the target timestamp as the origin, searching forward in the first communication data for Y non-missing communication data and searching backward for Y non-missing communication data; calculating the weighted average of the two groups of non-missing communication data, and filling the second data to be repaired with the weighted average to obtain second repair data, wherein the weight of the forward-searched data is less than the weight of the backward-searched data;
predicting the missing values of the third data to be repaired by using a pre-trained random forest model, and filling the missing values in the third data to be repaired according to the prediction result of the random forest model to obtain third repair data;
updating the first communication data with the second repair data and the third repair data to obtain second communication data.
On the basis of the above technical solution, preferably, step S25 includes:
S251, traversing the second communication data, calculating a first distance between each second communication data and every other second communication data, sorting the first distances from small to large to form a distance sorting table, selecting the first m communication data in the table as the neighbor set of the current second communication data, and taking the second communication data together with its neighbor set as a relation set;
S252, traversing the relation set, calculating a second distance $d_2$ between each second communication data and each of its neighbors according to the first distance, and updating the second distance into the relation set, wherein the calculation formula of the second distance is as follows:
where the second distance is defined between the i-th neighbor in the neighbor set of the current second communication data and the current second communication data, and the formula takes two first distances as inputs: the first distance between that i-th neighbor and the current second communication data, and the first distance between that i-th neighbor and the m-th second communication data in the distance sorting table of the current second communication data;
S253, traversing the relation set, calculating the distance density of each second communication data according to the second distance, and updating the distance density into the relation set, wherein the calculation formula of the distance density is as follows:
where $\rho_d$ is the distance density of the current second communication data, the summed term is the sum of the second distances of the current second communication data, and m is the number of neighbors in the neighbor set of the current second communication data;
S254, setting a density threshold, calculating the local density of each second communication data according to the distance density, and taking the second communication data whose local density is greater than the density threshold as abnormal data, wherein the calculation formula of the local density is as follows:
where the result is the local density of the current second communication data, $\sum \rho_d$ is the sum of the distance densities of all second communication data, and A is the number of second communication data;
S255, acquiring the timestamp of the abnormal data, setting a time interval, taking the timestamp of the abnormal data as the origin, acquiring forward in sequence n normal data at that time interval from the second communication data, and repairing the abnormal data by using the n normal data to obtain third communication data.
Based on the above technical solution, preferably, in step S255, the formula of anomaly repair is as follows:
where the left-hand side of the formula represents the repaired abnormal data, $x_k(t)$ represents the normal data of the k-th time interval, and $\lambda_k$ is the weighting coefficient of the normal data of the k-th time interval;
wherein the calculation formula of $\lambda_k$ is as follows:
where the formula combines the initially assigned weight of the normal data of the k-th time interval, a time decay term, and a correlation coefficient; r is the decay factor, $t_k$ is the time value of the normal data of the k-th time interval, $t_x$ is the time value corresponding to the abnormal data, and $g_{k\sim x}$ is the correlation coefficient between the normal data of the k-th time interval and the corresponding abnormal data.
Based on the above technical solution, preferably, step S3 includes:
S31, determining B feature types according to the characteristics of the standard communication data, and performing feature recombination on the B determined feature types to obtain C feature combinations, wherein the feature recombination modes comprise feature operation, feature crossing and feature transformation;
S32, according to the C designed feature combinations and the B determined feature types, performing the corresponding feature extraction on each standard communication data, namely extracting B+C features from each standard communication data, taking the B+C features as first features, and forming a first feature set from the first features of all the standard communication data.
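As an illustration of S31 and S32, the following is a minimal sketch with B = 3 base features and C = 3 feature combinations. The field names (`call_count`, `total_duration_s`, `message_count`) and the three concrete combinations are hypothetical; the source only names the three recombination modes.

```python
import math

def extract_first_features(record):
    """Return B + C = 6 first features for one standard communication record."""
    # B = 3 base feature types (hypothetical field names, not from the source).
    base = {
        "call_count": record["call_count"],
        "total_duration_s": record["total_duration_s"],
        "message_count": record["message_count"],
    }
    calls = base["call_count"]
    combos = {
        # Feature operation: arithmetic on two base features.
        "avg_call_duration_s": base["total_duration_s"] / calls if calls else 0.0,
        # Feature crossing: product of two base features.
        "call_x_message": calls * base["message_count"],
        # Feature transformation: log scaling of one base feature.
        "log_total_duration": math.log1p(base["total_duration_s"]),
    }
    return {**base, **combos}

def build_first_feature_set(records):
    """The first feature set is the list of first features of all records."""
    return [extract_first_features(r) for r in records]
```

Calling `build_first_feature_set` on a list of records yields one dictionary of B+C values per record, which can then be stacked into the first feature matrix.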
Based on the above technical solution, preferably, step S4 includes:
S41, forming the first features into a matrix of B+C dimensions as a first feature matrix;
S42, performing redundancy removal on the first feature matrix by using a selection function to obtain second features, wherein the selection function is as follows:
F=Sigmoid(conv(fc(AP(X))))
where F is the feature screening function, Sigmoid represents the activation function, conv represents convolution processing, fc represents fully connected layer processing, AP represents adaptive pooling processing, and X is the first feature matrix;
S43, forming all the second features into a second feature set.
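The order of operations in F can be sketched in NumPy as follows. This is only an illustration under several assumptions that the source does not state: X has one row per record and one column per first feature, adaptive pooling is a mean over records, the fully connected weights and the convolution kernel are supplied by the caller (in practice they would be learned), and features are kept when their gate value exceeds a hypothetical threshold of 0.5.

```python
import numpy as np

def selection_gate(X, fc_weight, conv_kernel):
    """Compute F = Sigmoid(conv(fc(AP(X)))) as one gate value per feature."""
    pooled = X.mean(axis=0)                                  # AP: (D,)
    hidden = fc_weight @ pooled                              # fc: (D, D) @ (D,)
    mixed = np.convolve(hidden, conv_kernel, mode="same")    # conv: (D,)
    return 1.0 / (1.0 + np.exp(-mixed))                      # Sigmoid

def select_features(X, gate, threshold=0.5):
    """Keep the columns whose gate exceeds the (assumed) threshold."""
    keep = gate > threshold
    return X[:, keep], keep
```

With an identity `fc_weight` and a one-element kernel `[1.0]`, the gate reduces to the sigmoid of each column mean, which makes the data flow easy to check by hand.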
Based on the above technical solution, preferably, step S5 includes:
S51, acquiring a communication task list, wherein the communication tasks comprise a classification task, a clustering task and a recommendation task;
S52, when the communication task is a classification task, determining a classification target, selecting the features corresponding to the required data type from the second feature set according to the classification target as classification features, and performing classification prediction on the classification features by adopting a plurality of SVMs to obtain a classification result;
S53, when the communication task is a clustering task, determining a clustering target, determining the number of clusters according to the clustering target, selecting the features corresponding to the required data type from the second feature set as clustering features, performing k-means++ clustering analysis on the clustering features according to the number of clusters to obtain the cluster label of each cluster, and taking the clusters and their cluster labels as the clustering result;
S54, when the communication task is a recommendation task, determining a recommendation target, selecting the features corresponding to the required data type from the second feature set according to the recommendation target as recommendation features, and processing the recommendation features with a recommendation algorithm to obtain a recommendation result.
Based on the above technical solution, preferably, in step S54, the recommendation algorithm includes:
step one, setting each recommendation feature as a node of the recommendation algorithm and putting it into an open list, and evaluating the nodes in the open list according to the correlation index of each recommendation feature to obtain evaluation scores, wherein the correlation index of a recommendation feature is obtained by mining the correlation between the recommendation feature and adjacent recommendation features according to a correlation rule;
step two, sorting the evaluation scores from large to small, selecting the node with the highest evaluation score as the start node of the recommendation algorithm, and putting the start node into a closed list;
step three, calculating weighted values of the evaluation scores of each node in the open list and all nodes in the closed list, selecting the node in the open list with the largest weighted value, putting it into the closed list, and updating the weighted values of the open list;
step four, repeating step three until the open list is empty.
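The four steps can be sketched as a greedy ordering over an open list and a closed list. The source does not give the weighting formula, so the one used here is an assumption: a blend, controlled by a hypothetical `alpha`, of a node's own evaluation score and its mean pairwise `affinity` to the nodes already in the closed list. The `affinity` table stands in for whatever the correlation mining produces.

```python
def recommend_order(scores, affinity, alpha=0.5):
    """Return nodes in the order they enter the closed list.

    scores:   {node: evaluation score} (step one is assumed already done)
    affinity: {frozenset((a, b)): correlation between a and b}, assumed input
    """
    open_list = set(scores)
    # Step two: the highest-scoring node starts the closed list.
    start = max(open_list, key=lambda n: (scores[n], n))
    closed = [start]
    open_list.remove(start)
    # Steps three and four: repeat until the open list is empty.
    while open_list:
        def weighted(node):
            link = sum(affinity.get(frozenset((node, c)), 0.0) for c in closed)
            return alpha * scores[node] + (1 - alpha) * link / len(closed)
        best = max(open_list, key=lambda n: (weighted(n), n))
        closed.append(best)
        open_list.remove(best)
    return closed
```

Ties are broken by node name only to keep the result deterministic; the resulting closed list can be read as a ranked recommendation.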
In another aspect, the present invention also provides a communication information processing apparatus based on big data analysis, the apparatus being configured to perform the method of any one of the above, the apparatus comprising:
the data acquisition module is internally provided with a communication database and is used for acquiring initial communication data of a user and storing the initial communication data into the communication database according to data types;
the data processing module is used for preprocessing initial communication data to obtain standard communication data, and storing the standard communication data into a communication database, wherein the preprocessing comprises repeated data deletion, missing data identification, missing value filling, abnormal data identification and repair;
the feature storage module is used for designing a feature combination of the standard communication data, extracting features of the standard communication data according to the feature combination to obtain a first feature set, screening the first feature set according to a screening rule to obtain a second feature set, and storing the first feature set and the second feature set;
the data analysis module is used for acquiring a communication task list, analyzing and processing the second feature set according to the communication tasks in the communication task list to obtain analysis results corresponding to the communication tasks, and storing the analysis results;
The visual display module is used for visualizing the analysis result and displaying the visualized analysis result and corresponding communication data to the user according to the user demand.
Compared with the prior art, the method has the following beneficial effects:
(1) According to the method, the communication data is deeply preprocessed, the usability of the communication data is improved, so that the communication data with low quality originally can be used after being processed, the waste of resources is avoided, customized analysis is performed based on task driving, corresponding analysis results are obtained according to different communication tasks, the pertinence and the practicability of the analysis are improved, visual display is performed, and a user is helped to better use the communication data;
(2) According to the invention, the standard communication data is finally obtained by encoding, de-duplication, missing value processing, anomaly detection and repair of the initial communication data, and the standard communication data is stored in the communication database, so that the quality and consistency of the data are improved, the integrity and accuracy of the communication data are improved, and the reliability of subsequent analysis and application is ensured;
(3) According to the invention, through feature recombination and extraction, feature types can be determined according to the characteristics of standard communication data, and features are recombined to obtain a new feature combination, which is helpful for extracting more representative and effective features, so that the features and modes of the communication data are better represented, and then redundancy removal is performed on the first feature matrix by using a selection function to obtain a second feature, which is helpful for reducing the dimension of the feature matrix, removing redundant information, improving the compactness and effectiveness of the features, and facilitating subsequent data processing and analysis;
(4) According to the invention, classification prediction, cluster analysis and recommendation effects are carried out according to different communication task types, so that a plurality of different data processing and analysis modes are provided for users, the application field of communication data is expanded, and the application value and practicability of the data are improved;
(5) The recommendation algorithm provided by the invention is beneficial to realizing personalized recommendation, selecting high-value nodes, dynamically updating the recommendation process and optimizing the recommendation result, improves the recommendation efficiency and accuracy, and enhances the satisfaction degree of users on the recommendation result, thereby improving the application value and user experience of data;
(6) The processing device provided by the invention utilizes the advantages of big data analysis to perform personalized analysis on the communication data, and provides personalized and accurate service for users, so that the device is expected to improve the efficiency of communication data processing, improve the processing capacity of unstructured data, and provide more intelligent and personalized communication service for users.
Drawings
In order to more clearly illustrate the embodiments of the invention or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, it being obvious that the drawings in the following description are only some embodiments of the invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of a method according to an embodiment of the present invention;
FIG. 2 is a block diagram of an apparatus according to an embodiment of the present invention.
Detailed Description
The following description of the embodiments of the present invention will clearly and fully describe the technical aspects of the embodiments of the present invention, and it is apparent that the described embodiments are only some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the present invention without making any inventive effort, are intended to fall within the scope of the present invention.
As shown in FIG. 1, the present invention first provides a communication information processing method based on big data analysis, which includes the following steps:
S1, collecting initial communication data of a user, and storing the initial communication data into a communication database according to data type;
S2, preprocessing the collected initial communication data to obtain standard communication data, and storing the standard communication data into the communication database;
S3, designing feature combinations of the standard communication data, and performing feature extraction on the standard communication data according to the feature combinations to obtain a first feature set;
S4, screening the first feature set according to a screening rule to obtain a second feature set;
S5, acquiring a communication task list, and analyzing the second feature set according to the communication tasks in the communication task list to obtain an analysis result corresponding to each communication task;
S6, visualizing the analysis results by using a visualization tool, and displaying them to the user according to the user's requirements.
Specifically, in an embodiment of the present invention, step S1 includes:
Collecting the initial communication data of a user means collecting the user's communication activities on communication devices, including telephone call records, short messages, mails, social media messages and the like. The data may take the form of text, audio, video and so on, and must be stored according to its specific data type.
The communication database is a database specially used for storing user communication data, and needs a reasonable data structure to store different types of communication data. For example, the phone call record may include information such as call time, call duration, opposite party number, etc.; the short message can comprise information such as sending time, receiving time, content and the like; mail may include information about sender, recipient, subject, text, etc.
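A minimal sketch of storing records by data type, using Python's built-in `sqlite3` module. The table and column names are hypothetical and simply mirror the example fields listed above; the source does not prescribe a database engine or schema.

```python
import sqlite3

# Hypothetical tables, one per data type, using the example fields from the text.
SCHEMA = {
    "call_record": ["call_time", "duration_s", "peer_number"],
    "short_message": ["sent_time", "received_time", "content"],
    "mail": ["sender", "recipient", "subject", "body"],
}

def open_database(path=":memory:"):
    """Create the communication database with one table per data type."""
    conn = sqlite3.connect(path)
    for table, columns in SCHEMA.items():
        column_sql = ", ".join(f"{name} TEXT" for name in columns)
        conn.execute(
            f"CREATE TABLE IF NOT EXISTS {table} (id INTEGER PRIMARY KEY, {column_sql})"
        )
    return conn

def store(conn, data_type, record):
    """Store one record in the table that matches its data type."""
    if data_type not in SCHEMA:
        raise ValueError(f"unknown data type: {data_type}")
    columns = SCHEMA[data_type]
    placeholders = ", ".join("?" for _ in columns)
    values = [record.get(name) for name in columns]
    conn.execute(
        f"INSERT INTO {data_type} ({', '.join(columns)}) VALUES ({placeholders})",
        values,
    )
    conn.commit()
```

Fields absent from a record are stored as NULL, which is one simple way to leave missing data visible for the later preprocessing step.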
When initial communication data of a user is collected, the integrity and the accuracy of the data are required to be ensured, and meanwhile, the privacy information of the user is required to be protected, so that relevant laws and regulations and privacy policies are complied with. When storing communication data, the backup and recovery of the data, and the safety and reliability of the data are also required to be considered.
Specifically, in an embodiment of the present invention, step S2 includes:
S21, encoding the collected initial communication data to obtain a unique identifier of the initial communication data.
Specifically, the unique identifier is a combination of the device ID that collected the initial communication data and the time stamp at the time of collection.
S22, judging whether the initial communication data is repeated or not according to the unique identifier, and if the unique identifier is repeated, merging the corresponding initial communication data to obtain the first communication data.
Whether duplicate data exists is determined by comparing the combination of the device ID and the timestamp. If a unique identifier is repeated, duplicate communication data exists, and the corresponding data must be merged to obtain the first communication data.
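Steps S21 and S22 can be sketched as follows. The record keys `device_id` and `timestamp` and the merge policy (keep the first record and fill its empty fields from later duplicates) are assumptions; the source only says that duplicates are merged.

```python
def unique_identifier(record):
    """Combine the collecting device ID and the collection timestamp."""
    return f"{record['device_id']}|{record['timestamp']}"

def merge_duplicates(records):
    """Merge records that share a unique identifier into first communication data."""
    merged = {}
    for record in records:
        key = unique_identifier(record)
        if key not in merged:
            merged[key] = dict(record)
            continue
        # Assumed merge policy: fill empty fields of the kept record.
        for field, value in record.items():
            if merged[key].get(field) in (None, ""):
                merged[key][field] = value
    return list(merged.values())
```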
S23, setting a first threshold $\delta_1$, a second threshold $\delta_2$ and a third threshold $\delta_3$, calculating the number $N_1$ of missing characters in the first communication data, and comparing $N_1$ with $\delta_1$, $\delta_2$ and $\delta_3$:
if the number of missing characters satisfies $\delta_1 < N_1 \le \delta_2$, classifying the first communication data as first data to be repaired;
if the number of missing characters satisfies $\delta_2 < N_1 \le \delta_3$, classifying the first communication data as second data to be repaired;
if the number of missing characters satisfies $\delta_3 < N_1$, classifying the first communication data as third data to be repaired.
Specifically, the three thresholds $\delta_1$, $\delta_2$ and $\delta_3$ are set according to the specific situation, and are usually determined relative to the total number of characters of the communication data; for example, $\delta_1$ corresponds to 10% of the total, $\delta_2$ to 30%, and $\delta_3$ to 50%.
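The classification in S23 can be sketched as below, using the example proportions of 10%, 30% and 50%. Marking missing characters with a placeholder character and returning `"none"` when the proportion does not exceed the first threshold are assumptions; the source does not say how missing characters are detected or how that remaining case is handled.

```python
def classify_missing(text, d1=0.10, d2=0.30, d3=0.50, placeholder="?"):
    """Classify one record by its proportion of missing characters."""
    if not text:
        raise ValueError("record is empty")
    ratio = text.count(placeholder) / len(text)
    if ratio > d3:
        return "third"    # third data to be repaired
    if ratio > d2:
        return "second"   # second data to be repaired
    if ratio > d1:
        return "first"    # first data to be repaired
    return "none"         # assumed: no repair class applies
```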
S24, the first data to be repaired, the second data to be repaired and the third data to be repaired are respectively processed to obtain second communication data.
Specifically, step S24 includes:
deleting the first data to be repaired from the first communication data;
acquiring the timestamp of the second data to be repaired as a target timestamp; taking the target timestamp as the origin, searching forward in the first communication data for Y non-missing communication data and searching backward for Y non-missing communication data; calculating the weighted average of the two groups of non-missing communication data, and filling the second data to be repaired with the weighted average to obtain second repair data, wherein the weight of the forward-searched data is less than the weight of the backward-searched data;
predicting the missing values of the third data to be repaired by using a pre-trained random forest model, and filling the missing values in the third data to be repaired according to the prediction result of the random forest model to obtain third repair data;
updating the first communication data with the second repair data and the third repair data to obtain second communication data.
Specifically, the first data to be repaired has a small proportion of missing data, which has little influence on subsequent analysis, so the records containing the missing data can be deleted directly.
The second data to be repaired has a moderate proportion of missing data, which has a certain influence on subsequent analysis. In this case a filling value, for example a weighted average, is calculated from the adjacent non-missing communication data. Communication data is usually timestamped, because it records the time at which a communication event occurred. The timestamp may be a combination of date and time marking the point in time of the communication event, and can be used to analyze patterns of communication behavior, timing, association with other variables, and the like. Because timeliness is important for communication data, non-missing data whose timestamp is later than the target timestamp should be given a larger weight.
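A minimal sketch of the weighted-average filling for numeric values. The weights 0.4 and 0.6 are illustrative assumptions that only respect the stated rule that data later than the target timestamp gets the larger weight; averaging each group before weighting and falling back to a single group when the other is empty are also assumptions.

```python
def weighted_fill(series, target_index, y=2, w_earlier=0.4, w_later=0.6):
    """Fill the value at target_index from up to y non-missing values per side.

    series: list of (timestamp, value) pairs sorted by timestamp, where a
    missing value is represented by None.
    """
    earlier = [v for _, v in series[:target_index] if v is not None][-y:]
    later = [v for _, v in series[target_index + 1:] if v is not None][:y]
    if not earlier and not later:
        raise ValueError("no non-missing neighbors available")
    if not earlier:
        return sum(later) / len(later)
    if not later:
        return sum(earlier) / len(earlier)
    mean_earlier = sum(earlier) / len(earlier)
    mean_later = sum(later) / len(later)
    return w_earlier * mean_earlier + w_later * mean_later
```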
When more data is missing or the pattern of missing data is irregular, i.e., for the third data to be repaired, a pre-trained random forest model is used to predict the missing values, which may be implemented as follows:
Data preparation: first, a data set containing communication data is prepared. Both the feature values and the target values in the data set must be numerical. The data set is divided into a training set and a test set.
Feature selection and engineering: suitable features are selected for model training according to the characteristics of the communication data and domain knowledge. Feature engineering, such as feature scaling, feature combination, and feature dimension reduction, can be performed to improve the performance of the model.
Model training: a random forest model is trained on the training set. A random forest is an ensemble learning algorithm consisting of multiple decision trees. During training, the random forest randomly selects features and samples, which improves the generalization capability of the model.
Model evaluation: the trained random forest model is evaluated on the test set. The predictive performance of the model may be evaluated using metrics such as mean square error and accuracy. If the model performs poorly, its parameters are adjusted.
Missing value prediction: the trained random forest model is used to predict the missing values in the communication data. For each missing value, the information of the other relevant fields is used as input, and the random forest model predicts an estimate of the missing value.
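The steps above can be sketched with scikit-learn, which is an assumed choice of library since the source names no toolkit. The function name `fill_missing_with_forest`, the 25% hold-out split, the 50 trees, and the synthetic records are all hypothetical, and the sketch assumes that only the target column contains missing values.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split


def fill_missing_with_forest(records, target_col, random_state=0):
    """Fill NaN entries of one numeric column from the other columns."""
    data = np.asarray(records, dtype=float)
    feature_cols = [c for c in range(data.shape[1]) if c != target_col]
    missing = np.isnan(data[:, target_col])

    # Data preparation: complete records are split into training and test sets.
    known_x = data[~missing][:, feature_cols]
    known_y = data[~missing, target_col]
    x_train, x_test, y_train, y_test = train_test_split(
        known_x, known_y, test_size=0.25, random_state=random_state)

    # Model training and evaluation (mean square error on the held-out set).
    model = RandomForestRegressor(n_estimators=50, random_state=random_state)
    model.fit(x_train, y_train)
    mse = mean_squared_error(y_test, model.predict(x_test))

    # Missing value prediction: the other fields are the model input.
    filled = data.copy()
    if missing.any():
        filled[missing, target_col] = model.predict(data[missing][:, feature_cols])
    return filled, mse


# Synthetic records: two feature columns and one target column with two gaps.
rows = [[float(i), float(i % 5), 2.0 * i + (i % 5)] for i in range(40)]
rows[3][2] = float("nan")
rows[17][2] = float("nan")
filled, mse = fill_missing_with_forest(rows, target_col=2)
```

The returned error gives a rough check on whether the model is good enough to trust for filling; if it is too large, the model parameters would be adjusted as described.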
S25, carrying out anomaly identification on the second communication data by adopting an anomaly detection method to obtain anomaly data, and repairing the anomaly data to obtain third communication data.
Specifically, step S25 includes:
S251, traversing the second communication data, calculating a first distance between each second communication data and the other second communication data, arranging the first distances from small to large to form a distance sorting table, selecting the first m communication data as the neighbor set of the current second communication data, and taking the second communication data and its corresponding neighbor set as a relationship set.
Specifically, the first distance may be based on Jaccard similarity, a commonly used measure suitable for set-valued data. For set-valued features in the communication data, Jaccard similarity may be used to calculate the distance between data points.
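Step S251 can be sketched as follows. The sketch assumes that each record is represented as a set of tokens and that the first distance is the Jaccard distance, i.e., one minus the Jaccard similarity; the source mentions Jaccard similarity but does not spell out this conversion. The function names and sample records are hypothetical.

```python
def jaccard_distance(a, b):
    """Jaccard distance: 1 - |A intersect B| / |A union B|."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0  # two empty sets are treated as identical
    return 1.0 - len(a & b) / len(union)


def build_relationship_set(records, m):
    """Map each record index to its m nearest neighbors as (distance, index)."""
    relationship = {}
    for i, current in enumerate(records):
        # Distance sorting table: first distances arranged from small to large.
        table = sorted(
            (jaccard_distance(current, other), j)
            for j, other in enumerate(records) if j != i
        )
        relationship[i] = table[:m]
    return relationship


records = [{"a", "b"}, {"a", "b", "c"}, {"a", "c"}, {"x", "y"}]
relationship = build_relationship_set(records, m=2)
```

Here record 3 shares no token with any other record, so every entry in its distance sorting table has distance 1.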
S252, traversing the relationship set, calculating a second distance $d_2$ between each second communication data and its neighbors according to the first distance, and updating the second distance into the relationship set, wherein the calculation formula of the second distance is as follows:
In the formula, the quantities defined are: the second distance between the i-th neighbor in the neighbor set of the current second communication data and the current second communication data; the first distance between the i-th neighbor in the neighbor set of the current second communication data and the current second communication data; and the first distance between the i-th neighbor in the neighbor set of the current second communication data and the m-th second communication data in the distance sorting table of the current second communication data;
S253, traversing the relationship set, calculating the distance density of each second communication data according to the second distance, and updating the distance density into the relationship set, wherein the calculation formula of the distance density is as follows:
where $\rho_d$ is the distance density of the current second communication data, the summation term is the sum of the second distances of the current second communication data, and $m$ is the number of neighbors in the neighbor set of the current second communication data;
S254, setting a density threshold, calculating the local density of each second communication data according to the distance density, and taking the second communication data whose local density is greater than the density threshold as abnormal data, wherein the calculation formula of the local density is as follows:
In the formula, the left-hand side is the local density of the current second communication data, $\sum \rho_d$ is the sum of the distance densities of all the second communication data, and $A$ represents the number of second communication data.
Specifically, the larger the local density of a second communication data, the farther that data lies from the other data; if the local density is greater than the density threshold, the second communication data is considered to be abnormal data.
S255, acquiring the time stamp of the abnormal data, setting a time interval, sequentially and forwards acquiring n time-interval normal data in the second communication data by taking the time stamp of the abnormal data as an origin, and repairing the abnormal data by utilizing the n time-interval normal data to obtain third communication data. The formula for anomaly repair is as follows:
In the formula, the left-hand side represents the repaired abnormal data, $x_k(t)$ represents the normal data of the k-th time interval, and $\lambda_k$ is the weighting coefficient of the normal data of the k-th time interval;
wherein $\lambda_k$ is calculated as follows:
In the formula, the first factor is the initially assigned weight of the normal data of the k-th time interval, the second factor is the time attenuation term, $r$ is the attenuation factor, $t_k$ is the time value of the normal data of the k-th time interval, $t_x$ is the time value corresponding to the abnormal data, and $g_{k\sim x}$ is the correlation coefficient between the normal data of the k-th time interval and the corresponding abnormal data.
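The repair formulas themselves are not legible in this text, so the following Python sketch assumes one plausible form built only from the quantities defined above. The repaired value is taken as the weighted sum of the n normal values, and each weighting coefficient is taken as the product of the initial weight, an exponential time attenuation term exp(-r·|t_x - t_k|), and the correlation coefficient, normalized so that the coefficients sum to 1. The exponential form, the normalization, and all names are assumptions, not the patent's formula.

```python
import math


def repair_anomaly(normal_points, t_x, r):
    """Repair one abnormal value from n normal values at earlier intervals.

    `normal_points` is a list of (t_k, x_k, w_k, g_k) tuples: time value,
    normal data value, initial weight, and correlation coefficient.
    The weighting form below is an assumed reconstruction.
    """
    raw = [
        w_k * math.exp(-r * abs(t_x - t_k)) * g_k
        for t_k, _x_k, w_k, g_k in normal_points
    ]
    total = sum(raw)
    if total <= 0:
        raise ValueError("weighting coefficients must have a positive sum")
    # Normalized coefficients play the role of lambda_k in the text.
    return sum(
        (weight / total) * x_k
        for weight, (_t_k, x_k, _w_k, _g_k) in zip(raw, normal_points)
    )


points = [(9.0, 10.0, 1.0, 1.0), (8.0, 20.0, 1.0, 1.0)]
repaired = repair_anomaly(points, t_x=10.0, r=math.log(2))
```

With this attenuation factor, each additional time step halves a point's influence, so the nearer value 10 receives twice the weight of the value 20.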
S26, carrying out format conversion and normalization on the third communication data to obtain standard communication data.
The data is formatted as required to ensure its consistency and normalization, yielding standard communication data. Such standard communication data can be more easily stored, analyzed, and applied.
Specifically, in an embodiment of the present invention, step S3 includes:
S31, determining B feature types according to the characteristics of the standard communication data, and performing feature recombination according to the determined B feature types to obtain C feature combinations, wherein the feature recombination modes include feature operation, feature crossing and feature transformation.
S32, according to the designed C feature combinations and the determined B feature types, carrying out corresponding feature extraction on each standard communication data, namely extracting B+C features from each standard communication data, taking the B+C features as first features, and forming a first feature set by the first features of all the standard communication data.
Taking b=4 and c=3 as an example, step S3 will be described:
Determining the feature types: suitable feature types are selected according to the characteristics and requirements of the communication data. In this embodiment, the feature types include:
Time domain features: features extracted from the time-series information of the communication data. The extraction methods include: mean, obtained by averaging the communication data time series; variance, obtained by computing the variance of the communication data time series; maximum, the largest value in the communication data time series; and minimum, the smallest value in the communication data time series.
Frequency domain features: features extracted from the spectrum information of the communication data. The extraction methods include: spectral energy, where the communication data is converted into the frequency domain by Fourier transform and the spectral energy is calculated; spectral mean, the mean value of the spectrum; and spectral peak, the peak frequency in the spectrum and its corresponding energy value.
Statistical features: features extracted from the statistical distribution information of the communication data. Statistics such as the mean, variance, skewness, and kurtosis can be calculated directly from the statistical distribution of the communication data.
Filtering features: features extracted from the filtering result of the communication data. A filtering operation is performed on the communication data, and the corresponding statistics, such as the filtered mean and variance, are then calculated.
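The time domain and frequency domain features listed above can be illustrated with the Python standard library. This is a sketch under stated assumptions: the series is a plain list of equally spaced numeric samples, the spectrum is a naive one-sided discrete Fourier transform, and "spectral energy" is taken as the sum of squared magnitudes. The function names are hypothetical.

```python
import cmath
import statistics


def time_domain_features(series):
    """Mean, variance, maximum and minimum of the time series."""
    return {
        "mean": statistics.fmean(series),
        "variance": statistics.pvariance(series),
        "max": max(series),
        "min": min(series),
    }


def dft_power(series):
    """Squared magnitude of each bin of a naive one-sided DFT."""
    n = len(series)
    return [
        abs(sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                for t, x in enumerate(series))) ** 2
        for k in range(n // 2 + 1)
    ]


def frequency_domain_features(series):
    """Spectral energy, spectral mean, and the peak bin with its energy."""
    power = dft_power(series)
    peak_bin = max(range(len(power)), key=power.__getitem__)
    return {
        "spectral_energy": sum(power),
        "spectral_mean": statistics.fmean(power),
        "peak_bin": peak_bin,
        "peak_energy": power[peak_bin],
    }


# A sampled sine wave with a period of 4 samples, repeated 4 times.
series = [0.0, 1.0, 0.0, -1.0] * 4
time_features = time_domain_features(series)
freq_features = frequency_domain_features(series)
```

For 16 samples with a period of 4, the spectral peak falls in bin 16 / 4 = 4, which is what `peak_bin` reports.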
Suitable features are then selected from the four determined feature types for feature recombination, specifically:
Feature operation: dividing the communication frequency feature by the communication duration feature to obtain an average communication frequency feature.
Feature crossing: crossing the communication frequency feature with the communication duration feature to obtain a total communication volume feature.
Feature transformation: applying a logarithmic transformation to the communication frequency feature to obtain a logarithmic communication frequency feature.
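The three recombination modes can be sketched as simple arithmetic on two scalar features. The sketch assumes that "crossing" means multiplying the two features and that the logarithmic transformation uses log(1 + x) so that a zero frequency stays defined; neither detail is stated in the source, and the function and key names are hypothetical.

```python
import math


def recombine(frequency, duration):
    """Derive the three recombined features from frequency and duration."""
    if duration <= 0:
        raise ValueError("communication duration must be positive")
    if frequency < 0:
        raise ValueError("communication frequency must be non-negative")
    return {
        # Feature operation: frequency divided by duration.
        "average_frequency": frequency / duration,
        # Feature crossing, assumed here to be a product.
        "total_volume": frequency * duration,
        # Feature transformation, assumed here to be log(1 + x).
        "log_frequency": math.log1p(frequency),
    }


combined = recombine(frequency=12.0, duration=3.0)
```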
The 7 extracted features can be combined into a feature vector, the 7 features are used as first features, and the first features of all standard communication data form a first feature set.
Specifically, in an embodiment of the present invention, step S4 includes:
S41, forming the first features into a matrix of B+C dimensions as a first feature matrix.
First, all the extracted first features are combined into a matrix according to a specific rule. For example, with 4 preliminary features and 3 recombined features, the features are arranged in sequence to form a 7-dimensional matrix.
S42, performing redundancy removal on the first feature matrix by using a selection function to obtain a second feature, wherein the selection function is as follows:
F=Sigmoid(conv(fc(AP(X))))
In the formula, F is the feature screening function, Sigmoid represents the activation function, conv represents convolution processing, fc represents fully connected layer processing, AP represents adaptive pooling processing, and X is the first feature matrix.
In this step, the first feature matrix is processed using a selection function to remove redundant information and extract more representative features. The function of the selection function is to process the input feature matrix through operations such as convolution, full connection layer and self-adaptive pooling, and finally to output the screened features through the Sigmoid activation function.
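The composition F = Sigmoid(conv(fc(AP(X)))) can be traced step by step in plain Python. This is only a sketch of the order of operations: the layer weights below are fixed illustrative values rather than trained parameters, the pooling is assumed to average over the sample axis, and keeping features whose score exceeds 0.5 is an assumed screening rule that the source does not specify.

```python
import math


def adaptive_avg_pool(matrix):
    """AP: collapse the sample axis to one average per feature column."""
    n = len(matrix)
    return [sum(row[c] for row in matrix) / n for c in range(len(matrix[0]))]


def fully_connected(vector, weights, bias):
    """fc: one weighted sum plus bias per output unit."""
    return [sum(w * v for w, v in zip(row, vector)) + b
            for row, b in zip(weights, bias)]


def conv1d_same(vector, kernel):
    """conv: 1-D convolution with zero padding (odd kernel length)."""
    pad = len(kernel) // 2
    padded = [0.0] * pad + list(vector) + [0.0] * pad
    return [sum(k * padded[i + j] for j, k in enumerate(kernel))
            for i in range(len(vector))]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def selection_scores(matrix, fc_weights, fc_bias, kernel):
    """Apply AP, fc, conv and Sigmoid in the order given by the formula."""
    pooled = adaptive_avg_pool(matrix)
    hidden = fully_connected(pooled, fc_weights, fc_bias)
    return [sigmoid(v) for v in conv1d_same(hidden, kernel)]


# Two samples with three features; identity weights keep the example traceable.
X = [[1.0, 0.0, -2.0], [3.0, 0.0, -2.0]]
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
scores = selection_scores(X, identity, [0.0, 0.0, 0.0], kernel=[0.0, 1.0, 0.0])
kept = [i for i, s in enumerate(scores) if s > 0.5]  # assumed screening rule
```

With identity weights the scores reduce to the Sigmoid of each column average, so only the first feature, whose average is positive, passes the assumed threshold.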
S43, forming all second features into a second feature set.
Specifically, in an embodiment of the present invention, step S5 includes:
S51, acquiring a communication task list, wherein the communication tasks include a classification task, a clustering task and a recommendation task;
S52, when the communication task is a classification task, determining a classification target, selecting the features corresponding to the required data type from the second feature set as classification features according to the classification target, and performing classification prediction on the classification features by using a plurality of SVMs to obtain a classification result;
S53, when the communication task is a clustering task, determining a clustering target, determining the number of clusters according to the clustering target, selecting the features corresponding to the required data type from the second feature set as clustering features, performing k-means++ cluster analysis on the clustering features according to the number of clusters to obtain the cluster label of each cluster, and taking the clusters and their cluster labels as the clustering result;
S54, when the communication task is a recommendation task, confirming a recommendation target, selecting the features corresponding to the required data type from the second feature set as recommendation features according to the recommendation target, and processing the recommendation features with a recommendation algorithm to obtain a recommendation result.
In this embodiment, the communication tasks include a classification task, a clustering task and a recommendation task, and the task is used as a driver to perform analysis and processing, and the specific implementation process is as follows:
(1) The communication task is a classification task
Determining a classification target:
First, the classification target of the communication task is determined, for example classifying communication types according to the characteristics of the communication data, such as speaker identification in voice communication or emotion classification in text communication.
Selecting classification characteristics:
According to the determined classification target, the features corresponding to the required data type are selected from the second feature set as classification features. These should be features that discriminate the classification target. They may be selected by a fuzzy selection algorithm, which can be combined with domain knowledge to avoid missing features of potential importance to the classification task. The fuzzy selection algorithm includes: 1) calculating the importance of each feature using fuzzy sets and membership functions from fuzzy set theory, where a fuzzy set describes the degree to which a feature belongs to the classification target and thereby measures its importance; 2) determining, for each feature, its degree of membership in the classification target, i.e., the degree of influence of that feature on the classification target; 3) setting a feature selection threshold, determined empirically or from domain knowledge, for screening features whose membership is higher than the threshold; 4) selecting the features whose membership is higher than the set threshold as the final classification features.
Training a plurality of SVM classifiers:
The prepared data set is divided into a training set and a test set, and training and evaluation are performed by cross-validation.
The plurality of SVM classifiers perform multi-class classification in a one-vs-one or one-vs-rest manner.
Each SVM classifier is trained on the training set, and its parameters are adjusted to obtain the best performance.
The trained SVM classifiers are used to classify the data in the test set.
The classification result is evaluated using the F1 score.
According to the evaluation result, the selection of classification features and the parameters of the SVM classifiers are adjusted and optimized to improve classification performance.
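The training procedure above can be sketched with scikit-learn, an assumed choice of library since the source names none. The synthetic three-class data set stands in for the selected classification features, and the one-vs-rest wrapper, the RBF kernel, the candidate values of C, and the 5-fold cross-validation are illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import f1_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC

# Synthetic stand-in for the classification features (7 features, 3 classes).
X, y = make_classification(
    n_samples=300, n_features=7, n_informative=4, n_redundant=0,
    n_classes=3, random_state=0)

# Divide the prepared data set into a training set and a test set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# One SVM per class (one-vs-rest); C is tuned by 5-fold cross-validation.
search = GridSearchCV(
    OneVsRestClassifier(SVC(kernel="rbf")),
    param_grid={"estimator__C": [0.1, 1.0, 10.0]},
    scoring="f1_macro",
    cv=5,
)
search.fit(X_train, y_train)

# Classify the test set and evaluate with the macro-averaged F1 score.
predictions = search.predict(X_test)
macro_f1 = f1_score(y_test, predictions, average="macro")
```

If the resulting F1 score is unsatisfactory, the feature selection and the parameter grid would be revisited, as the last step in the list describes.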
Classification prediction:
The selected classification features are used as input data and fed into the pre-trained SVM classifiers to obtain a classification result.
Specifically, the purpose of the classification task is to identify spam messages, telephone nuisances and the like, so that the labels of the classification result are whether spam messages, telephone nuisances and the like.
(2) The communication task is a clustering task
Determining clustering targets and clustering quantity:
According to the task requirements and the data characteristics, the clustering target and the number of clusters into which the data set is to be divided are determined.
Selecting a clustering feature:
The features corresponding to the required data type are selected from the second feature set as clustering features. These features are the input of the clustering algorithm.
k-means++ cluster analysis:
Cluster analysis is performed on the selected clustering features using the k-means++ clustering algorithm. k-means++ iteratively divides the samples into k clusters such that the sum of the squared distances from each sample point to the center of the cluster to which it belongs is minimized.
Obtaining a clustering result:
The k-means++ clustering algorithm yields a cluster label for each sample, and the clusters and their cluster labels are taken as the clustering result.
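A compact pure-Python sketch of k-means++ is given below for illustration. It follows the standard algorithm: the first center is drawn at random, each further center is drawn with probability proportional to its squared distance from the nearest existing center, and Lloyd iterations then alternate assignment and center updates. The function names, the iteration cap, the fixed seed, and the sample points are assumptions, and the sketch assumes the points are not all identical.

```python
import random


def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans_pp_init(points, k, rng):
    """k-means++ seeding: favor points far from the centers chosen so far."""
    centers = [rng.choice(points)]
    while len(centers) < k:
        weights = [min(squared_distance(p, c) for c in centers) for p in points]
        centers.append(rng.choices(points, weights=weights, k=1)[0])
    return centers


def kmeans_pp(points, k, iterations=50, seed=0):
    """Return (labels, centers) after Lloyd iterations from k-means++ seeds."""
    rng = random.Random(seed)
    centers = kmeans_pp_init(points, k, rng)
    labels = []
    for _ in range(iterations):
        # Assignment step: each point takes the label of its nearest center.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centers[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the clustering has converged
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centers


points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
labels, centers = kmeans_pp(points, k=2)
```

The two groups of three points are far apart, so the labels separate them regardless of which points are drawn as initial centers.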
Specifically, the purpose of the clustering task is to find similar groups, such as similar pictures, similar contacts, similar senders of messages, etc., so that the clustering result is a description of the similar groups and their similar content.
(3) The communication task is a recommendation task
Confirming a recommendation target:
According to the service requirements and the user characteristics, the recommendation target is confirmed, such as recommending movies, commodities, or music.
Selecting recommendation features:
The features corresponding to the required data type are selected from the second feature set as recommendation features. These features are the input of the recommendation algorithm.
Recommendation algorithm:
Step one, setting the recommendation features as nodes of the recommendation algorithm and putting them into an open list, and evaluating the nodes in the open list according to the relevance index of each recommendation feature to obtain evaluation scores, wherein the relevance index of a recommendation feature is obtained by mining the association between the recommendation feature and its adjacent recommendation features according to association rules.
Specifically, the association rule mining employs the Apriori algorithm.
Step two, sorting the evaluation scores from large to small, selecting the node corresponding to the first evaluation score as the starting node of the recommendation algorithm, and placing the starting node into a closed list.
Step three, calculating, for each node in the open list, a weighted value from its evaluation score and the evaluation scores of all nodes in the closed list, sorting the weighted values from large to small, moving the node in the open list corresponding to the first weighted value into the closed list, and updating the weighted values of the open list.
Specifically, the weighted value of a single node i in the open list is calculated as follows:
where $E_i$ is the weighted value of node i in the open list, $y_j$ is the evaluation score of the j-th node in the closed list, $x_i$ is the evaluation score of node i in the open list, $z$ is the number of nodes in the closed list, and $\alpha_j$ and $\beta_i$ are weights.
Step four, repeating step three until the open list is empty.
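The four steps can be sketched as a greedy loop over an open list and a closed list. Because the weighted-value formula is not legible in this text, the sketch takes it as a pluggable function; the default `assumed_weighted_value`, which adds a weighted open-list score to the weighted mean of the closed-list scores, is purely an assumption. The evaluation scores are supplied directly rather than mined with Apriori, and all names and values are hypothetical.

```python
def assumed_weighted_value(x_i, closed_scores, alpha=0.5, beta=0.5):
    """Assumed stand-in for E_i; the patent's actual formula is not shown."""
    return beta * x_i + alpha * sum(closed_scores) / len(closed_scores)


def rank_recommendation_features(scores, weighted_value=assumed_weighted_value):
    """Return the feature names in the order they enter the closed list."""
    open_list = dict(scores)  # step one: every feature starts in the open list

    # Step two: the node with the highest evaluation score is the start node.
    start = max(open_list, key=open_list.get)
    closed_list = [start]
    del open_list[start]

    # Steps three and four: move the best weighted node until the open list is empty.
    while open_list:
        closed_scores = [scores[name] for name in closed_list]
        best = max(
            open_list,
            key=lambda name: weighted_value(open_list[name], closed_scores),
        )
        closed_list.append(best)
        del open_list[best]
    return closed_list


evaluation_scores = {"genre": 0.9, "artist": 0.4, "tempo": 0.7}
order = rank_recommendation_features(evaluation_scores)
```

Passing a different `weighted_value` function changes the ordering rule without touching the list handling, which keeps the unknown formula isolated from the procedure described in the steps.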
Obtaining a recommended result:
The recommendation algorithm yields recommended content or products as the recommendation result. These results may be personalized based on the interests and needs of the user.
According to the embodiment, the features corresponding to the required data types are selected from the second feature set according to the recommendation targets, and the recommendation algorithm is adopted to recommend the recommendation features, so that recommendation results are obtained. These recommendation results may be used to provide personalized recommendation services to the user, improving user experience and satisfaction.
In addition, referring to fig. 2, the present invention further provides a communication information processing apparatus based on big data analysis, where the apparatus is configured to perform the method described in any one of the foregoing, and the apparatus includes:
The data acquisition module is internally provided with a communication database, and is used for acquiring initial communication data of a user and storing the initial communication data into the communication database according to data type.
The communication data may include various forms of communication records such as short messages, phone call records, social media messages, and the like.
The data processing module is used for preprocessing the initial communication data to obtain standard communication data, and storing the standard communication data into the communication database, wherein the preprocessing comprises repeated data deletion, missing data identification, missing value filling, abnormal data identification and repair. Through these preprocessing steps, the accuracy and integrity of the communication data can be ensured.
The feature storage module is used for designing a feature combination of the standard communication data, extracting features of the standard communication data according to the feature combination to obtain a first feature set, screening the first feature set according to screening rules to obtain a second feature set, and storing the first feature set and the second feature set.
The data analysis module is used for acquiring a communication task list, analyzing and processing the second feature set according to the communication tasks in the communication task list, obtaining analysis results corresponding to the communication tasks, and storing the analysis results. These analysis results may include patterns of communication behavior, trends, anomalies, and so forth.
The visual display module is used for visualizing the analysis result and displaying the visualized analysis result and corresponding communication data to the user according to the user demand. Including presentation in the form of charts, statistics, trend analysis, etc., to help the user better understand the meaning and pattern of the communication data.
Specifically, the device has the functions of communication data collection, preprocessing, feature extraction and analysis and visual display, and provides a comprehensive communication data processing and analysis platform for users.
The foregoing description of the preferred embodiments of the invention is not intended to be limiting, but rather is intended to cover all modifications, equivalents, alternatives, and improvements that fall within the spirit and scope of the invention.

Claims (10)

1. The communication information processing method based on big data analysis is characterized by comprising the following steps:
S1, collecting initial communication data of a user and storing the initial communication data into a communication database according to data type;
S2, preprocessing the collected initial communication data to obtain standard communication data, and storing the standard communication data into the communication database;
S3, designing feature combinations of the standard communication data, and performing feature extraction on the standard communication data according to the feature combinations to obtain a first feature set;
S4, screening the first feature set according to a screening rule to obtain a second feature set;
S5, acquiring a communication task list, and analyzing and processing the second feature set according to the communication tasks in the communication task list to obtain analysis results corresponding to the communication tasks;
S6, visualizing the analysis results by using a visualization tool, and displaying the results to the user according to the user's requirements.
2. The communication information processing method based on big data analysis according to claim 1, wherein step S2 includes:
S21, encoding the collected initial communication data to obtain a unique identifier of the initial communication data;
S22, judging whether the initial communication data is repeated according to the unique identifier, and if the unique identifier is repeated, merging the corresponding initial communication data to obtain first communication data;
S23, setting a first threshold $\delta_1$, a second threshold $\delta_2$ and a third threshold $\delta_3$, calculating the number $N_1$ of missing characters in the first communication data, and comparing $N_1$ with the first threshold $\delta_1$, the second threshold $\delta_2$ and the third threshold $\delta_3$:
if the number of missing characters of the first communication data satisfies $\delta_1 < N_1 \le \delta_2$, classifying the first communication data as first data to be repaired;
if the number of missing characters of the first communication data satisfies $\delta_2 < N_1 \le \delta_3$, classifying the first communication data as second data to be repaired;
if the number of missing characters of the first communication data satisfies $\delta_3 < N_1$, classifying the first communication data as third data to be repaired;
S24, respectively processing the first data to be repaired, the second data to be repaired and the third data to be repaired to obtain second communication data;
S25, performing anomaly identification on the second communication data by using an anomaly detection method to obtain abnormal data, and repairing the abnormal data to obtain third communication data;
S26, carrying out format conversion and normalization on the third communication data to obtain standard communication data.
3. The communication information processing method based on big data analysis according to claim 2, wherein step S24 includes:
deleting the first data to be repaired from the first communication data;
acquiring a time stamp of the second data to be repaired, taking the time stamp as a target time stamp, and, with the target time stamp as the origin, searching forward in the first communication data for Y non-missing communication data and searching backward for Y non-missing communication data; calculating the weighted average of the two groups of non-missing communication data, and filling the second data to be repaired with the weighted average to obtain second repair data, wherein the weight of the non-missing communication data whose time stamps precede the target time stamp is less than the weight of the non-missing communication data whose time stamps follow it;
predicting the missing value of the third data to be repaired by adopting a pre-trained random forest model, and filling the missing value in the third data to be repaired according to the prediction result of the random forest model to obtain third repair data;
and updating the first communication data by using the second repair data and the third repair data to obtain second communication data.
4. The communication information processing method based on big data analysis according to claim 2, wherein step S25 includes:
S251, traversing the second communication data, calculating a first distance between each second communication data and other second communication data, forming a distance sorting table according to the arrangement of the first distances from small to large, selecting the first m communication data as a neighbor set of the current second communication data, and taking the second communication data and the corresponding neighbor set as a relationship set;
S252, traversing the relationship set, calculating a second distance $d_2$ between each second communication data and its neighbors according to the first distance, and updating the second distance into the relationship set, wherein the calculation formula of the second distance is as follows:
in the formula, the quantities defined are: the second distance between the i-th neighbor in the neighbor set of the current second communication data and the current second communication data; the first distance between the i-th neighbor in the neighbor set of the current second communication data and the current second communication data; and the first distance between the i-th neighbor in the neighbor set of the current second communication data and the m-th second communication data in the distance sorting table of the current second communication data;
S253, traversing the relationship set, calculating the distance density of each second communication data according to the second distance, and updating the distance density into the relationship set, wherein the calculation formula of the distance density is as follows:
wherein $\rho_d$ is the distance density of the current second communication data, the summation term is the sum of the second distances of the current second communication data, and $m$ is the number of neighbors in the neighbor set of the current second communication data;
S254, setting a density threshold, calculating the local density of each second communication data according to the distance density, and taking the second communication data whose local density is greater than the density threshold as abnormal data, wherein the calculation formula of the local density is as follows:
in the formula, the left-hand side is the local density of the current second communication data, $\sum \rho_d$ is the sum of the distance densities of all the second communication data, and $A$ represents the number of second communication data;
S255, acquiring the time stamp of the abnormal data, setting a time interval, sequentially acquiring forward, with the time stamp of the abnormal data as the origin, the normal data of n time intervals in the second communication data, and repairing the abnormal data by using the normal data of the n time intervals to obtain third communication data.
5. The method for processing communication information based on big data analysis according to claim 4, wherein in step S255, the formula for anomaly repair is as follows:
in the formula, the left-hand side represents the repaired abnormal data, $x_k(t)$ represents the normal data of the k-th time interval, and $\lambda_k$ is the weighting coefficient of the normal data of the k-th time interval;
wherein $\lambda_k$ is calculated as follows:
in the formula, the first factor is the initially assigned weight of the normal data of the k-th time interval, the second factor is the time attenuation term, $r$ is the attenuation factor, $t_k$ is the time value of the normal data of the k-th time interval, $t_x$ is the time value corresponding to the abnormal data, and $g_{k\sim x}$ is the correlation coefficient between the normal data of the k-th time interval and the corresponding abnormal data.
6. The communication information processing method based on big data analysis according to claim 1, wherein step S3 includes:
S31, determining B feature types according to the characteristics of the standard communication data, and performing feature recombination according to the determined B feature types to obtain C feature combinations, wherein the feature recombination modes include feature operation, feature crossing and feature transformation;
S32, according to the designed C feature combinations and the determined B feature types, performing corresponding feature extraction on each standard communication data, namely extracting B+C features from each standard communication data, taking the B+C features as first features, and forming a first feature set from the first features of all the standard communication data.
7. The method for processing communication information based on big data analysis according to claim 6, wherein step S4 comprises:
S41, forming the first features into a matrix of B+C dimensions as a first feature matrix;
S42, performing redundancy removal on the first feature matrix using a selection function to obtain second features, wherein the selection function is:
F=Sigmoid(conv(fc(AP(X))))
where F is the feature screening function, Sigmoid represents the activation function, conv represents convolution processing, fc represents fully connected layer processing, AP represents adaptive pooling processing, and X is the first feature matrix;
S43, forming all the second features into a second feature set.
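The composition F = Sigmoid(conv(fc(AP(X)))) can be sketched in plain Python as below. Several points are assumptions rather than statements from the claim: AP is taken as average pooling over the sample axis, conv is a one-dimensional convolution with zero padding and an odd-length kernel, the weights are set by hand (in practice they would be learned), and a feature is kept when its score reaches a hypothetical threshold of 0.5.

```python
import math

def adaptive_avg_pool(X):
    """AP: collapse the sample axis, leaving one mean per feature column."""
    n = len(X)
    return [sum(row[j] for row in X) / n for j in range(len(X[0]))]

def fully_connected(v, W, b):
    """fc: one dense layer, y = W v + b."""
    return [sum(w * x for w, x in zip(row, v)) + bias for row, bias in zip(W, b)]

def conv1d_same(v, kernel):
    """conv: 1-D cross-correlation with zero padding (odd kernel length assumed)."""
    pad = len(kernel) // 2
    padded = [0.0] * pad + list(v) + [0.0] * pad
    return [sum(k * padded[i + j] for j, k in enumerate(kernel)) for i in range(len(v))]

def sigmoid(v):
    return [1.0 / (1.0 + math.exp(-x)) for x in v]

def selection_scores(X, W, b, kernel):
    """F = Sigmoid(conv(fc(AP(X)))): one score in (0, 1) per feature column."""
    return sigmoid(conv1d_same(fully_connected(adaptive_avg_pool(X), W, b), kernel))

def select_features(X, scores, threshold=0.5):
    """Keep the columns whose score reaches the (assumed) threshold."""
    keep = [j for j, s in enumerate(scores) if s >= threshold]
    return keep, [[row[j] for j in keep] for row in X]

# Hypothetical first feature matrix: 2 samples, 4 features.
X = [[1.0, -2.0, 0.5, -1.0],
     [3.0, -4.0, 1.5, -3.0]]
W = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]  # identity weights
b = [0.0] * 4
kernel = [0.0, 1.0, 0.0]  # pass-through kernel, for illustration only

scores = selection_scores(X, W, b, kernel)
kept_columns, second_features = select_features(X, scores)
```

With the identity weights and pass-through kernel used here, the score of each column reduces to the sigmoid of its mean, which makes the selection easy to check by hand.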
8. The communication information processing method based on big data analysis according to claim 1, wherein step S5 includes:
S51, acquiring a communication task list, wherein the communication tasks comprise a classification task, a clustering task and a recommendation task;
S52, when the communication task is a classification task, determining a classification target, selecting the features corresponding to the required data types from the second feature set according to the classification target as classification features, and performing classification prediction on the classification features using a plurality of SVMs to obtain a classification result;
S53, when the communication task is a clustering task, determining a clustering target, determining the number of clusters according to the clustering target, selecting the features corresponding to the required data types from the second feature set as clustering features, performing k-means++ clustering analysis on the clustering features according to the number of clusters to obtain the cluster label of each cluster, and taking the clusters and their cluster labels as the clustering result;
S54, when the communication task is a recommendation task, determining a recommendation target, selecting the features corresponding to the required data types from the second feature set according to the recommendation target as recommendation features, and processing the recommendation features using a recommendation algorithm to obtain a recommendation result.
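The clustering branch in S53 can be illustrated with a minimal k-means++ sketch in plain Python: centers are seeded with probability proportional to the squared distance from the nearest existing center, followed by standard Lloyd iterations. The sample points, the fixed iteration count and the seed are hypothetical; the multi-SVM classification of S52 is not shown.

```python
import random

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeanspp_init(points, k, rng):
    """k-means++ seeding: each new center is drawn with probability
    proportional to its squared distance from the nearest chosen center."""
    centers = [list(rng.choice(points))]
    while len(centers) < k:
        d2 = [min(sq_dist(p, c) for c in centers) for p in points]
        threshold = rng.random() * sum(d2)
        acc = 0.0
        for p, d in zip(points, d2):
            acc += d
            if acc >= threshold:
                centers.append(list(p))
                break
        else:
            # Guard against floating-point rounding in the running sum.
            centers.append(list(points[-1]))
    return centers

def kmeans(points, k, iters=20, seed=0):
    """Lloyd iterations starting from k-means++ seeds."""
    rng = random.Random(seed)
    centers = kmeanspp_init(points, k, rng)
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster becomes empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers

# Hypothetical clustering features: two well-separated groups, so k = 2.
clustering_features = [
    (0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
    (10.0, 10.0), (10.1, 10.0), (10.0, 10.1),
]
labels, centers = kmeans(clustering_features, k=2)
```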
9. The method for processing communication information based on big data analysis according to claim 8, wherein in step S54, the recommendation algorithm includes:
Step one, setting the recommendation features as nodes of the recommendation algorithm, placing the nodes in an open list, and evaluating the nodes in the open list according to the correlation index of each recommendation feature to obtain evaluation scores, wherein the correlation index of a recommendation feature is obtained by mining the correlation between the recommendation feature and its adjacent recommendation features according to a correlation rule;
Step two, sorting the evaluation scores in descending order, selecting the node corresponding to the highest evaluation score as the starting node of the recommendation algorithm, and placing the starting node in a closed list;
Step three, calculating the weighted value of the evaluation score of each node in the open list with respect to all nodes in the closed list, sorting the weighted values in descending order, selecting the node in the open list corresponding to the highest weighted value, placing that node in the closed list, and updating the weighted values of the open list;
Step four, repeating step three until the open list is empty.
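The four steps above can be sketched as a greedy ordering over an open list and a closed list. The claim does not give the weighting formula, so the one used here is an assumption: the weighted value of an open node is `alpha` times its evaluation score plus `(1 - alpha)` times its mean correlation with the nodes already in the closed list. The node names, scores, correlations and `alpha` are hypothetical.

```python
def recommend_order(scores, correlation, alpha=0.5):
    """Return nodes in recommendation order.

    scores:      dict mapping node -> evaluation score (step one).
    correlation: dict mapping frozenset({a, b}) -> correlation of a and b.
    alpha:       assumed trade-off between own score and link to the closed list.
    """
    open_list = dict(scores)
    closed_list = []

    # Step two: the node with the highest evaluation score is the starting node.
    start = max(open_list, key=open_list.get)
    closed_list.append(start)
    del open_list[start]

    # Steps three and four: move the best weighted node until the open list is empty.
    while open_list:
        def weighted(node):
            link = sum(
                correlation.get(frozenset((node, c)), 0.0) for c in closed_list
            ) / len(closed_list)
            return alpha * open_list[node] + (1 - alpha) * link

        best = max(open_list, key=weighted)
        closed_list.append(best)
        del open_list[best]
    return closed_list

# Hypothetical recommendation features and pairwise correlations.
scores = {"a": 0.9, "b": 0.5, "c": 0.6}
correlation = {
    frozenset(("a", "b")): 0.8,
    frozenset(("a", "c")): 0.1,
    frozenset(("b", "c")): 0.3,
}
order = recommend_order(scores, correlation)
```

In this example a plain sort by score would place "c" before "b"; the weighting moves "b" ahead because it is more strongly correlated with the starting node "a".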
10. A communication information processing apparatus based on big data analysis, characterized in that the apparatus is adapted to perform the method of any of claims 1-9, the apparatus comprising:
the data acquisition module is internally provided with a communication database and is used for acquiring initial communication data of a user and storing the initial communication data into the communication database according to data types;
the data processing module is used for preprocessing initial communication data to obtain standard communication data, and storing the standard communication data into a communication database, wherein the preprocessing comprises repeated data deletion, missing data identification, missing value filling, abnormal data identification and repair;
the feature storage module is used for designing a feature combination of the standard communication data, extracting features of the standard communication data according to the feature combination to obtain a first feature set, screening the first feature set according to a screening rule to obtain a second feature set, and storing the first feature set and the second feature set;
the data analysis module is used for acquiring a communication task list, analyzing and processing the second feature set according to the communication tasks in the communication task list to obtain analysis results corresponding to the communication tasks, and storing the analysis results;
the visual display module is used for visualizing the analysis results and displaying the visualized analysis results and the corresponding communication data to the user according to user requirements.
CN202311513146.1A 2023-11-13 2023-11-13 Communication information processing method and device based on big data analysis Active CN117614845B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311513146.1A CN117614845B (en) 2023-11-13 2023-11-13 Communication information processing method and device based on big data analysis

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311513146.1A CN117614845B (en) 2023-11-13 2023-11-13 Communication information processing method and device based on big data analysis

Publications (2)

Publication Number Publication Date
CN117614845A (en) 2024-02-27
CN117614845B (en) 2024-05-10

Family

ID=89947250

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311513146.1A Active CN117614845B (en) 2023-11-13 2023-11-13 Communication information processing method and device based on big data analysis

Country Status (1)

Country Link
CN (1) CN117614845B (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112256964A (en) * 2020-10-22 2021-01-22 重庆邮电大学 Financial institution potential customer recommendation method based on multi-dimensional data learning
WO2021169116A1 (en) * 2020-02-29 2021-09-02 平安科技(深圳)有限公司 Intelligent missing data filling method, apparatus and device, and storage medium
CN113657545A (en) * 2021-08-30 2021-11-16 平安医疗健康管理股份有限公司 Method, device and equipment for processing user service data and storage medium
CN114186121A (en) * 2021-02-02 2022-03-15 寿带鸟信息科技(苏州)有限公司 Mixed recommendation algorithm system based on service record
CN115712780A (en) * 2022-11-04 2023-02-24 深圳数字动能信息技术有限公司 Information pushing method and device based on cloud computing and big data
CN116401459A (en) * 2023-04-20 2023-07-07 陕西中睿荣晟信息科技有限公司 Internet information processing method, system and recording medium
CN116578761A (en) * 2023-05-18 2023-08-11 图林科技(深圳)有限公司 Deep learning-based big data intelligent analysis method
CN116579842A (en) * 2023-07-13 2023-08-11 南开大学 Credit data analysis method and system based on user behavior data

Also Published As

Publication number Publication date
CN117614845B (en) 2024-05-10

Similar Documents

Publication Publication Date Title
JP2005523533A (en) Processing mixed numeric and / or non-numeric data
Koh et al. Rare association rule mining and knowledge discovery: technologies for infrequent and critical event detection: Technologies for infrequent and critical event detection
CN108733791B (en) Network event detection method
CN113887219B (en) Hot line public opinion identification and early warning method and system for competent department
GB2418499A (en) Information analysis arrangement
CN111738843B (en) Quantitative risk evaluation system and method using running water data
CN111951104A (en) Risk conduction early warning method based on associated graph
CN113205134A (en) Network security situation prediction method and system
CN116823496A (en) Intelligent insurance risk assessment and pricing system based on artificial intelligence
Pednekar et al. Crime rate prediction using KNN
CN114528405A (en) Public opinion monitoring method based on network burst hotspot
Wang et al. CPB: a classification-based approach for burst time prediction in cascades
Khatun et al. Data mining technique to analyse and predict crime using crime categories and arrest records
CN116702059B (en) Intelligent production workshop management system based on Internet of things
CN117614845B (en) Communication information processing method and device based on big data analysis
CN112199388A (en) Strange call identification method and device, electronic equipment and storage medium
CN108763242B (en) Label generation method and device
Keneshloo et al. Predicting the shape and peak time of news article views
Sukhija et al. Spatial and temporal trends reveal: Hotspot identification of crimes using machine learning approach
Al-Shalabi Perceptions of crime behavior and relationships: rough set based approach
Acosta et al. Characterization of disaster related tweets according to its urgency: a pattern recognition
Long et al. Automated crisis content categorization for covid-19 tweet streams
Punjabi et al. Forensic Intelligence-Combining Artificial Intelligence with Digital Forensics
Prasad et al. Analysis and prediction of crime against woman using machine learning techniques
Kumar Social media analytics for crisis response

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant