CN118152383A - Big data real-time analysis processing method - Google Patents

Info

Publication number
CN118152383A
Authority
CN
China
Prior art keywords
data
real
processing
analysis
time
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202410202029.1A
Other languages
Chinese (zh)
Inventor
陈星栋
郭浩哲
蒙圣光
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Fastersoft Software Co ltd
Original Assignee
Guangdong Fastersoft Software Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2465 Query processing support for facilitating data mining operations in structured databases
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/26 Visual data mining; Browsing structured data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/243 Classification techniques relating to the number of classes
    • G06F18/24323 Tree-organised classifiers
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/0464 Convolutional networks [CNN, ConvNet]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00 Computing arrangements using knowledge-based models
    • G06N5/01 Dynamic search techniques; Heuristics; Dynamic trees; Branch-and-bound
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00 Image enhancement or restoration
    • G06T5/20 Image enhancement or restoration using local operators
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/10 Segmentation; Edge detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Databases & Information Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computational Linguistics (AREA)
  • Medical Informatics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • Molecular Biology (AREA)
  • Multimedia (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Fuzzy Systems (AREA)
  • Quality & Reliability (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to the technical field of data processing and discloses a real-time analysis processing method for big data. A large amount of structured and unstructured data is collected from various data sources, then cleaned and preprocessed, including removing duplicate data, handling missing values, and converting data formats. The preprocessed data is stored in a data storage system and transmitted from the data source to a data processing engine by a stream processing engine. Real-time analysis and mining algorithms are applied in the data processing engine to extract useful information from the data. Finally, data visualization tools present the analysis results to the relevant personnel in a form that is easy to understand and act on, thereby supporting real-time decision-making and execution. The technical scheme offers real-time performance, accuracy, visual display, comprehensive application, and scalability; it provides efficient, accurate, and visual big data analysis and mining support, and helps users make better-informed decisions and optimize business processes.

Description

Big data real-time analysis processing method
Technical Field
The invention relates to the technical field of data analysis and processing, in particular to a real-time big data analysis and processing method.
Background
With the continuous development of society and the economy, big data applications are becoming increasingly common. Big data analysis and processing technology allows businesses and organizations to extract useful information from big data to aid decision-making, optimize business processes, and the like. Existing big data analysis and processing methods fall into batch processing and real-time processing.
Batch processing has the following disadvantages. Latency: batch processing must wait for data to accumulate to a certain amount before processing it, so analysis results are delayed and real-time feedback on live data is impossible. Data transmission and storage overhead: batch processing transfers large amounts of data from the data source to the data processing engine and stores the processing results, which burdens the network and wastes storage resources. Poor fit for dynamic scenarios: batch processing handles dynamically changing data poorly and cannot serve application scenarios with strict real-time requirements.
Real-time processing has the following disadvantages. High processing complexity: real-time processing must compute and mine within a data stream, which places high demands on computing resources and algorithms. Data quality problems: real-time processing depends on timely and accurate data, but in practice sensor data, text data, and the like often contain noise and missing values. Results that are difficult to visualize: the output of real-time processing is usually itself a real-time data stream, and presenting it to users intuitively and communicating it effectively is challenging.
In summary, the big data real-time analysis processing method of the present invention overcomes these defects of the prior art by combining several technical means, so that real-time analysis and mining can be performed efficiently in a complex, changing big data environment and the results can be displayed visually, thereby providing users with more timely, accurate, and intuitive data analysis support.
Disclosure of Invention
In order to achieve the above purpose, the present invention provides the following technical solutions:
A real-time analysis processing method for big data comprises the following steps:
S1: collecting a large amount of data from various data sources, including structured data and unstructured data;
S2: cleaning and preprocessing the collected data, including removing duplicate data, processing missing values, and converting data formats;
S3: storing the preprocessed data in a data storage system;
S4: transmitting data from the data source to a data processing engine using a stream processing engine;
S5: data analysis and mining: applying appropriate real-time analysis and mining algorithms in the data processing engine to extract useful information from the data;
S6: creating an interactive chart or dashboard using a data visualization tool to visually present the analysis results to the user.
As a preferable technical scheme of the invention, the structured data in S1 comprises data in a database and sensor data, and the unstructured data comprises text data, image data, and audio data.
As a preferable technical scheme of the invention, the data storage system in S3 adopts a distributed file system or a distributed database.
As a preferable technical scheme of the invention, the stream processing engine in S4 adopts one or more of Apache Kafka, Apache Flink, or Storm.
As a preferable technical scheme of the invention, the real-time analysis and mining algorithm in S5 adopts a machine learning algorithm, an image processing algorithm, or a text analysis algorithm.
As a preferable technical scheme of the invention, the data visualization tool in S6 adopts one or more of Tableau or Power BI.
As a preferable technical scheme of the invention, in S2:
Removing the duplicate data:
the formula used is: data = data.drop_duplicates();
Processing the missing values:
deleting a record/row containing a missing value uses the formula: data = data.dropna();
replacing the missing value with a specified value uses the formula: data = data.fillna(value);
filling the missing values by interpolation, estimating them from the trend of the known data, uses the formula: data = data.interpolate();
Converting the data format:
converting the data type of the designated column into a new type new_type uses the formula: data[column] = data[column].astype(new_type).
As a preferred technical solution of the present invention, the machine learning algorithm comprises:
linear regression: the formula is $y = w_1 x_1 + w_2 x_2 + \dots + w_n x_n + b$, where $y$ is the predicted variable, $x_1, x_2, \dots, x_n$ are the input variables, $w_1, w_2, \dots, w_n$ are the weights, and $b$ is the bias;
decision tree: constructing a decision tree model according to the information gain or the Gini index of the features, and using the decision tree model for classification and regression tasks;
support vector machine (SVM): the formula is $y = \operatorname{sign}(w^{T} x + b)$, where $y$ is the prediction result, $x$ is the input sample, $w$ is the weight vector, and $b$ is the bias;
K-means clustering: iteratively assigning data points to K clusters so that the distance between each data point and the centroid of the cluster to which it belongs is minimized;
deep learning algorithm: including a combination of linear combinations and various activation functions, the activation functions being one or more of ReLU, Sigmoid, or Softmax;
the image processing algorithm comprises:
image filtering: convolving the image with a filter;
image segmentation: one or more of threshold segmentation, edge detection, or region growing are employed;
feature extraction: extracting image features using one or more of the SIFT, HOG, or CNN methods;
image recognition and classification: classifying the extracted image features by using a classifier, wherein the algorithm comprises one or more of a support vector machine or a convolutional neural network;
the text analysis algorithm includes one or more of a bag-of-words model, TF-IDF, a topic model, or a text classification algorithm.
As a preferable technical scheme of the invention, the filter comprises one or more of a Gaussian filter and a median filter.
Advantageous effects
Compared with the prior art, the invention provides a big data real-time analysis processing method, the beneficial effects of which can be summarized as follows:
1. Real-time performance: the technical scheme can process a large amount of structured and unstructured data in real time, realizes the real-time analysis and mining of the real-time data, greatly shortens the response time, and improves the efficiency of decision making and service optimization.
2. Accuracy: through the steps of data cleaning, preprocessing, real-time analysis and the like, the technical scheme can process the data quality problems such as noise and missing values, improves the accuracy and the reliability of data, and ensures the reliability of analysis results.
3. Visual display: by utilizing data visualization tools, such as interactive charts or dashboards, the technical scheme can display analysis results to users in an intuitive and easy-to-understand manner, and help the users to better understand and apply the analysis results, thereby supporting decision making and business optimization.
4. Comprehensive application: according to the technical scheme, the stream processing engine, the big data storage system and various analysis and mining algorithms are comprehensively utilized, so that data analysis and mining can be efficiently realized in a big data environment with complex changes, and comprehensive and deep data insight is provided.
5. Scalability: the technical scheme adopts an open architecture and popular technical components, such as Apache Kafka, Apache Flink, and machine learning algorithms, so it scales well and can support ever-increasing data volumes and application requirements.
The technical scheme has the beneficial effects of real-time performance, accuracy, visual display, comprehensive application and expansibility, provides high-efficiency, accurate and visual big data analysis and mining support for users, and helps the users make more intelligent decisions and optimize business processes.
Drawings
Fig. 1 is a flow chart of a real-time analysis processing method for big data according to the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Referring to fig. 1, a real-time analysis processing method for big data involves the following specific steps:
① Data collection: a large amount of data is collected from various data sources, including structured data (e.g., databases, sensor data, etc.) and unstructured data (e.g., text, images, audio, etc.).
② Data cleaning and preprocessing: the collected data is cleaned and preprocessed, including removing duplicate data, processing missing values, converting data formats, etc., to ensure data quality and consistent data formats.
Data cleaning and preprocessing are important steps in the real-time analysis and processing of big data. The following are examples of common cleaning and preprocessing operations, written in the style of pandas DataFrame calls on a table named data:
Removing duplicate data:
The formula used is: data = data.drop_duplicates()
This operation removes duplicate records from the dataset, leaving only unique records.
Processing the missing values:
The formula used is: data = data.dropna()
This operation deletes the records (rows) that contain missing values.
The formula used is: data = data.fillna(value)
This operation replaces each missing value with the specified value (value).
The formula used is: data = data.interpolate()
This operation fills in missing values by interpolation, estimating them from the trend of the known data.
Converting the data format:
The formula used is: data[column] = data[column].astype(new_type)
This operation converts the data type of the specified column (column) into a new type (new_type), for example converting a character string into an integer, or converting a date into a timestamp.
Data formatting and normalization:
The formula used is: data[column] = data[column].apply(function)
This operation formats or normalizes the data in the specified column (column) using a custom function (function), for example converting text to lower case or removing special characters.
The formula used is: data[column] = (data[column] - mean) / std
This operation standardizes the column (column) by subtracting its mean (mean) and dividing by its standard deviation (std), so that the values have a mean of 0 and a standard deviation of 1; the shape of the distribution itself is unchanged.
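The effect of these operations can be illustrated without any library. The following is a minimal, dependency-free sketch of what the calls above do on a small list of records; the record layout, the field names `id` and `temp`, and the helper function names are assumptions made purely for illustration and are not part of the described method.

```python
from statistics import mean, pstdev

def drop_duplicates(rows):
    """Keep the first occurrence of each record, preserving order."""
    seen, unique = set(), []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique

def interpolate(values):
    """Fill interior None gaps by linear interpolation between known neighbours."""
    filled = list(values)
    known = [i for i, v in enumerate(filled) if v is not None]
    for left, right in zip(known, known[1:]):
        step = (filled[right] - filled[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = filled[left] + step * (i - left)
    return filled

def standardize(values):
    """Z-score: subtract the mean and divide by the standard deviation.

    Uses the population standard deviation; a library may default to the
    sample standard deviation instead, which gives slightly different values.
    """
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]

# Hypothetical sensor records: one exact duplicate, one missing reading.
rows = [
    {"id": 1, "temp": "20.0"},
    {"id": 1, "temp": "20.0"},
    {"id": 2, "temp": None},
    {"id": 3, "temp": "24.0"},
]
rows = drop_duplicates(rows)
# Type conversion: strings to floats, keeping missing values as None.
temps = [float(r["temp"]) if r["temp"] is not None else None for r in rows]
temps = interpolate(temps)
scaled = standardize(temps)
```

In this sketch the duplicate record is dropped, the missing reading is estimated as the midpoint of its neighbours, and the standardized values are centred on zero.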
③ Data storage: a data storage system suitable for real-time processing, such as a distributed file system (e.g., HDFS) or a distributed database (e.g., HBase, Cassandra), is selected to accommodate large-scale data.
④ Real-time data stream processing: data is transferred from the data source to the data processing engine using a stream processing engine (e.g., Apache Kafka, Apache Flink, Storm, etc.).
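Step ④ does not describe how records flow through the engine. As a rough illustration only, the sketch below uses Python generators in place of a real stream processing engine and aggregates events in fixed-size (tumbling) time windows. The event layout `(timestamp, value)`, the function names, and the 10-second window are all assumptions; none of this reflects the actual APIs of Kafka, Flink, or Storm.

```python
from typing import Iterable, Iterator, Tuple

Event = Tuple[float, float]  # (timestamp in seconds, measured value)

def source(events: Iterable[Event]) -> Iterator[Event]:
    """Stand-in for a data source; a real system would read from a broker."""
    for event in events:
        yield event

def tumbling_window_mean(stream: Iterable[Event], size: float) -> Iterator[Tuple[float, float]]:
    """Emit (window_start, mean value) each time a window closes.

    Assumes events arrive in timestamp order.
    """
    window_start, total, count = None, 0.0, 0
    for ts, value in stream:
        start = (ts // size) * size
        if window_start is not None and start != window_start:
            yield window_start, total / count
            total, count = 0.0, 0
        window_start = start
        total += value
        count += 1
    if count:
        yield window_start, total / count  # flush the last, partial window

# Hypothetical events spanning three 10-second windows.
events = [(0.0, 1.0), (4.0, 3.0), (11.0, 10.0), (19.0, 20.0), (25.0, 7.0)]
results = list(tumbling_window_mean(source(events), size=10.0))
```

Because both stages are generators, each result is produced as soon as its window closes rather than after the whole input has been read, which is the property that distinguishes this style from batch processing.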
⑤ Data analysis and mining: appropriate real-time analysis and mining algorithms, such as machine learning algorithms, image processing algorithms, and text analysis algorithms, are applied in the data processing engine to extract useful information from the data. These are very broad fields, and the specific formulas and algorithms vary from case to case. The following briefly describes some common algorithms in these fields and their associated formulas:
Machine learning algorithm:
Linear regression: the formula is $y = w_1 x_1 + w_2 x_2 + \dots + w_n x_n + b$, where $y$ is the predicted variable, $x_1, x_2, \dots, x_n$ are the input variables, $w_1, w_2, \dots, w_n$ are the weights, and $b$ is the bias (intercept).
Decision tree: a decision tree model is constructed according to the information gain or the Gini index of the features and is used for classification and regression tasks.
Support vector machine (SVM): the formula is $y = \operatorname{sign}(w^{T} x + b)$, where $y$ is the prediction result, $x$ is the input sample, $w$ is the weight vector, and $b$ is the bias (intercept).
K-means clustering: the data points are iteratively assigned to K clusters such that the distance of each data point from the centroid of the cluster to which it belongs is minimized.
Deep learning algorithms (such as neural networks): multiple layers of neurons and weights are involved, where the output of each neuron is obtained by passing the weighted sum of its inputs plus a bias through an activation function. The formulas combine linear combinations with activation functions such as ReLU, Sigmoid, and Softmax.
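The formulas in this list are short enough to evaluate directly. The following is a minimal sketch in plain Python, not a training procedure: it only evaluates the linear regression and SVM prediction rules, the Gini index and information gain used to choose decision tree splits, and a single neuron with the named activation functions. All function names and the sample numbers are assumptions chosen for illustration.

```python
import math

def linear_predict(w, x, b):
    """Linear regression prediction: y = w1*x1 + ... + wn*xn + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def svm_predict(w, x, b):
    """Linear SVM decision rule y = sign(w^T x + b); a score of 0 maps to +1 here."""
    return 1 if linear_predict(w, x, b) >= 0 else -1

def gini(labels):
    """Gini index of a set of class labels: 1 - sum(p_k^2)."""
    n = len(labels)
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def entropy(labels):
    """Shannon entropy in bits, the basis of information gain."""
    n = len(labels)
    return -sum((labels.count(c) / n) * math.log2(labels.count(c) / n) for c in set(labels))

def information_gain(parent, children):
    """Entropy of the parent minus the size-weighted entropy of the child splits."""
    n = len(parent)
    return entropy(parent) - sum(len(ch) / n * entropy(ch) for ch in children)

def relu(z):
    return max(0.0, z)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def softmax(zs):
    """Softmax with the maximum subtracted first for numerical stability."""
    m = max(zs)
    exps = [math.exp(z - m) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]

def neuron(w, x, b, activation):
    """One neuron: activation applied to the weighted sum of inputs plus bias."""
    return activation(linear_predict(w, x, b))

y = linear_predict([2.0, -1.0], [3.0, 4.0], 0.5)    # 2*3 - 1*4 + 0.5
label = svm_predict([1.0, 1.0], [-2.0, 0.5], 0.0)    # w^T x + b = -1.5
split_gain = information_gain(["a", "a", "b", "b"], [["a", "a"], ["b", "b"]])
```

A split that separates the two classes perfectly, as in the last line, recovers the full entropy of the parent set, which is why a tree builder would prefer it.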
Image processing algorithm:
Image filtering: the image is processed with a filter; common filters include the Gaussian filter, which is applied by convolution, and the median filter.
Image segmentation: common algorithms include threshold segmentation, edge detection, and region growing, which divide an image into different regions or objects.
Feature extraction: features in the image are extracted for further analysis and classification, for example using SIFT, HOG, or CNN methods.
Image recognition and classification: the extracted image features are classified using a classifier; common algorithms include the support vector machine (SVM) and the convolutional neural network (CNN).
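To make the filtering and segmentation items concrete, here is a small sketch on a grayscale image stored as a list of lists. The 3x3 Gaussian approximation, the choice to leave border pixels unchanged, the sample patch, and the function names are assumptions for illustration; production code would normally rely on an image processing library.

```python
from statistics import median

def convolve3x3(image, kernel):
    """Apply a 3x3 kernel to interior pixels; border pixels are left unchanged.

    The kernel is not flipped, which is equivalent to true convolution
    only for symmetric kernels such as the one below.
    """
    h, w = len(image), len(image[0])
    out = [row[:] for row in image]
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            out[r][c] = sum(
                kernel[i][j] * image[r + i - 1][c + j - 1]
                for i in range(3) for j in range(3)
            )
    return out

def median3x3(image):
    """Replace each interior pixel with the median of its 3x3 neighbourhood."""
    h, w = len(image), len(image[0])
    out = [row[:] for row in image]
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            out[r][c] = median(
                image[r + i][c + j] for i in (-1, 0, 1) for j in (-1, 0, 1)
            )
    return out

def threshold(image, t):
    """Threshold segmentation: 1 where the pixel exceeds t, else 0."""
    return [[1 if p > t else 0 for p in row] for row in image]

# Common 3x3 approximation of a Gaussian kernel (weights sum to 1).
GAUSSIAN = [[1 / 16, 2 / 16, 1 / 16],
            [2 / 16, 4 / 16, 2 / 16],
            [1 / 16, 2 / 16, 1 / 16]]

# Hypothetical 3x3 grayscale patch with one bright outlier in the centre.
patch = [[10, 10, 10],
         [10, 90, 10],
         [10, 10, 10]]
smoothed = convolve3x3(patch, GAUSSIAN)
denoised = median3x3(patch)
mask = threshold(patch, 50)
```

On this patch the Gaussian filter only dampens the outlier, while the median filter removes it entirely, which is the usual reason for choosing a median filter against impulse noise.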
Text analysis algorithm:
Bag-of-words model: text is represented as a vector or matrix of the number of occurrences of each word; it is commonly used for text classification and sentiment analysis.
TF-IDF (term frequency-inverse document frequency): the importance of each word is quantified by combining its term frequency with its inverse document frequency.
Topic model (e.g., LDA): text data is treated as a mixture of topics, and the topic distribution of each document and the word distribution of each topic are inferred by statistical methods.
Text classification algorithms (e.g., naive Bayes, support vector machines): a model is trained on text features and class labels and then classifies new text into predefined classes.
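The first two items can be sketched in a few lines. The example below builds bag-of-words counts and TF-IDF weights with the standard library only. It assumes whitespace tokenization, tf = count / document length, and idf = log(N / df), which is one common variant; the function names and sample sentences are hypothetical, and libraries often apply smoothing that changes the exact numbers.

```python
import math
from collections import Counter

def bag_of_words(doc):
    """Count how often each lower-cased, whitespace-separated token occurs."""
    return Counter(doc.lower().split())

def tf_idf(docs):
    """Return one {term: weight} dict per document.

    Uses tf = count / document length and idf = log(N / df).
    """
    bags = [bag_of_words(d) for d in docs]
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for bag in bags for term in bag)
    weights = []
    for bag in bags:
        length = sum(bag.values())
        weights.append({
            term: (count / length) * math.log(n_docs / df[term])
            for term, count in bag.items()
        })
    return weights

# Hypothetical two-document corpus.
docs = ["stream data arrives fast", "batch data arrives late"]
scores = tf_idf(docs)
```

Terms that occur in every document ("data", "arrives") receive a weight of zero under this variant, while terms unique to one document keep a positive weight.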
It should be noted that these algorithms and formulas are just some common examples in these fields, more complex algorithms and formulas may be used in practice, and each algorithm may be subject to different variations and modifications. Specific algorithm selections and formulas should be determined based on specific questions and data, in conjunction with corresponding machine learning, image processing, or text analysis libraries.
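Of the algorithms listed under step ⑤, K-means clustering is described procedurally rather than by a formula, so a short sketch may help. The version below follows the standard alternation of assignment and centroid update (often called Lloyd's algorithm). The initial centroids are supplied by the caller, and the sample points, the iteration cap, and the function name are assumptions for illustration.

```python
import math

def kmeans(points, centroids, iterations=10):
    """Alternate assignment and centroid update steps until nothing changes.

    `centroids` are the initial centres; choosing them is outside this sketch.
    """
    centroids = [tuple(c) for c in centroids]
    assignment = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        assignment = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        updated = []
        for k, old in enumerate(centroids):
            members = [p for p, a in zip(points, assignment) if a == k]
            if members:
                updated.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                updated.append(old)  # keep an empty cluster's centroid in place
        if updated == centroids:
            break  # converged
        centroids = updated
    return centroids, assignment

# Hypothetical 2-D points forming two obvious groups.
points = [(1.0, 1.0), (1.0, 2.0), (8.0, 8.0), (9.0, 8.0)]
centres, labels = kmeans(points, centroids=[(0.0, 0.0), (10.0, 10.0)])
```

The result depends on the initial centroids; with poorly chosen starting points the same procedure can settle on a worse clustering.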
⑥ Visualization and reporting: the analysis results are visually presented to the user so that the user can understand and use them. Interactive charts, dashboards, etc. may be created using data visualization tools (e.g., Tableau, Power BI).
⑦ Real-time feedback and decision support: the analysis results are fed back to the relevant personnel in real time to support real-time decision-making. For example, in the e-commerce field, real-time analysis results may be used to implement personalized recommendations, anti-fraud measures, and the like.
To achieve real-time decisions, the following steps may be considered:
Real-time data acquisition: ensure that data sources deliver data in real time, including data collected from sensors, system logs, user interactions, and the like. Data may be collected and processed in real time using streaming data processing technologies such as Apache Kafka and Apache Flink.
Real-time data processing and analysis: analyze the collected data in real time using real-time data processing technologies such as streaming data processing tools and complex event processing (CEP) engines. These technologies can apply various machine learning, image processing, and text analysis algorithms and provide real-time insight.
Constructing a decision model: build a decision model based on the real-time data and the analysis results. The model may be a prediction model based on a machine learning algorithm, a real-time monitoring system, an intelligent recommendation system, or the like. The accuracy and timeliness of the decision model should be ensured.
Decision feedback and real-time push: feed back the real-time analysis results and the output of the decision model to the relevant personnel, for example through real-time reports, dashboards, or mobile applications. Notification and push technologies such as e-mail, SMS, and instant messaging can also deliver real-time decision results.
Automated execution and integration: decisions that can be executed automatically can be integrated into a real-time decision system or workflow so that they are carried out in real time. This may be achieved with automation tools, process management software, robotic process automation (RPA), and the like.
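The decision steps above can be sketched as a small loop: score each event as it arrives, derive an action, and push a notification when the action needs attention. Everything specific here is hypothetical: the `Decision` record, the 0.8 threshold, the "block"/"allow" actions, and the toy scoring rule stand in for whatever decision model and notification channel a real deployment would use.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List

@dataclass
class Decision:
    event_id: str
    score: float
    action: str

def decide(events: Iterable[dict],
           model: Callable[[dict], float],
           notify: Callable[[Decision], None],
           threshold: float = 0.8) -> List[Decision]:
    """Score each event as it arrives and push high-score decisions immediately."""
    decisions = []
    for event in events:
        score = model(event)
        action = "block" if score >= threshold else "allow"
        decision = Decision(event["id"], score, action)
        if action == "block":
            notify(decision)  # e.g. dashboard update, e-mail, or instant message
        decisions.append(decision)
    return decisions

# Hypothetical scoring rule standing in for a trained model:
# larger order amounts are treated as riskier, capped at 1.0.
def toy_model(event: dict) -> float:
    return min(event["amount"] / 1000.0, 1.0)

alerts: List[Decision] = []
orders = [{"id": "o1", "amount": 120.0}, {"id": "o2", "amount": 950.0}]
decisions = decide(orders, toy_model, alerts.append)
```

Passing the model and the notifier in as callables keeps the loop independent of any particular algorithm or push channel, so either can be swapped without touching the decision logic.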
⑧ Monitoring and optimizing: the performance and effect of the data processing flow are continuously monitored, and optimization and adjustment are carried out according to the requirement, so that a better real-time analysis processing effect is achieved.
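One simple way to monitor the processing flow is to track recent per-record latencies and flag when their rolling mean exceeds a limit. The sketch below is only an illustration of that idea; the class name, the window length, and the 200 ms limit are assumptions and do not come from the described method.

```python
from collections import deque

class LatencyMonitor:
    """Track the most recent processing latencies and flag degradation."""

    def __init__(self, window: int = 100, limit_ms: float = 200.0):
        self.samples = deque(maxlen=window)  # old samples drop out automatically
        self.limit_ms = limit_ms

    def record(self, latency_ms: float) -> None:
        self.samples.append(latency_ms)

    def mean(self) -> float:
        return sum(self.samples) / len(self.samples) if self.samples else 0.0

    def needs_tuning(self) -> bool:
        """True when the rolling mean latency exceeds the configured limit."""
        return self.mean() > self.limit_ms

# Hypothetical latencies in milliseconds; the window keeps only the last three.
monitor = LatencyMonitor(window=3, limit_ms=200.0)
for latency in (50.0, 60.0, 400.0, 500.0):
    monitor.record(latency)
```

A check such as `monitor.needs_tuning()` could then trigger the optimization and adjustment that this step calls for.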
It is noted that relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
Although embodiments of the present invention have been shown and described, it will be understood by those skilled in the art that various changes, modifications, substitutions and alterations can be made therein without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.

Claims (9)

1. A real-time analysis processing method for big data, characterized in that the method comprises the following steps:
S1: collecting a large amount of data from various data sources, including structured data and unstructured data;
S2: cleaning and preprocessing the collected data, including removing duplicate data, processing missing values, and converting data formats;
S3: storing the preprocessed data in a data storage system;
S4: transmitting data from the data source to a data processing engine using a stream processing engine;
S5: data analysis and mining: applying real-time analysis and mining algorithms in the data processing engine to extract useful information from the data;
S6: creating an interactive chart or dashboard using a data visualization tool to visually present the analysis results to the user.
2. The real-time analysis processing method for big data according to claim 1, wherein the structured data in S1 includes data in a database and sensor data, and the unstructured data includes text data, image data, and audio data.
3. The real-time analysis processing method for big data according to claim 1, wherein the data storage system in S3 adopts a distributed file system or a distributed database.
4. The real-time analysis processing method for big data according to claim 1, wherein the stream processing engine in S4 adopts one or more of Apache Kafka, Apache Flink, or Storm.
5. The real-time analysis processing method for big data according to claim 1, wherein the real-time analysis and mining algorithm in S5 adopts a machine learning algorithm, an image processing algorithm, or a text analysis algorithm.
6. The real-time analysis processing method for big data according to claim 1, wherein the data visualization tool in S6 adopts one or more of Tableau and Power BI.
7. The real-time analysis processing method for big data according to claim 1, wherein in S2:
Removing the duplicate data:
the formula used is: data = data.drop_duplicates();
Processing the missing values:
deleting a record/row containing a missing value uses the formula: data = data.dropna();
replacing the missing value with a specified value uses the formula: data = data.fillna(value);
filling the missing values by interpolation, estimating them from the trend of the known data, uses the formula: data = data.interpolate();
Converting the data format:
converting the data type of the designated column into a new type new_type uses the formula: data[column] = data[column].astype(new_type).
8. The big data real-time analysis processing method according to claim 5, wherein the machine learning algorithm comprises:
linear regression: the formula is y = w1·x1 + w2·x2 + ... + wn·xn + b, where y is the predicted variable, x1, x2, ..., xn are the input variables, w1, w2, ..., wn are the weights, and b is the bias;
decision tree: constructing a decision tree model according to the information gain or the Gini index of the features, and using the decision tree model for classification and regression tasks;
support vector machine (SVM): the formula is y = sign(w^T x + b), where y is the prediction result, x is the input sample, w is the weight vector, and b is the bias;
K-means clustering: iteratively assigning data points to K clusters so that the distance between each data point and the centroid of the cluster to which it belongs is minimized;
deep learning algorithm: composed of linear combinations and activation functions, the activation functions being one or more of ReLU, Sigmoid, or Softmax;
the image processing algorithm comprises:
image filtering: convolving the image with a filter;
image segmentation: using one or more of threshold segmentation, edge detection, or region growing;
feature extraction: extracting image features using one or more of the SIFT, HOG, or CNN methods;
image recognition and classification: classifying the extracted image features with a classifier, the classifier being one or more of a support vector machine or a convolutional neural network;
the text analysis algorithm comprises one or more of a bag-of-words model, TF-IDF, a topic model, or a text classification algorithm.
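The linear regression and SVM formulas and the K-means iteration recited in claim 8 can be illustrated with a short pure-Python sketch. The function names, the fixed iteration count, and the sample points are assumptions made for illustration; the claim does not specify how weights are trained or how centroids are initialized.

```python
from math import dist


def linear_predict(w, b, x):
    """Linear regression prediction: y = w1*x1 + ... + wn*xn + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def svm_decide(w, b, x):
    """SVM decision rule y = sign(w^T x + b); a score of 0 is mapped to +1 by convention here."""
    return 1 if linear_predict(w, b, x) >= 0 else -1


def kmeans(points, centroids, iterations=10):
    """K-means: alternate nearest-centroid assignment and centroid recomputation."""
    centroids = [tuple(c) for c in centroids]
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda k: dist(p, centroids[k]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster (keep it if the cluster is empty).
        centroids = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster)) if cluster else centroids[k]
            for k, cluster in enumerate(clusters)
        ]
    return centroids, clusters


# Hypothetical data: two well-separated groups and one initial centroid in each.
points = [(0, 0), (0, 2), (10, 10), (10, 12)]
centroids, clusters = kmeans(points, [(0, 0), (10, 10)])
```

In practice the weights w and bias b would come from a training procedure, and K-means is usually run until the assignments stop changing rather than for a fixed number of iterations.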
9. The big data real-time analysis processing method according to claim 8, wherein the filter comprises one or both of a Gaussian filter and a median filter.
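The two filters named in claim 9 can be sketched in pure Python on an image stored as a list of rows. The function names, the 3x3 default size, and the edge handling (border pixels are replicated) are assumptions for illustration and are not taken from the patent. Note that the median filter is a rank-order operation over a neighbourhood rather than a convolution in the strict sense.

```python
from math import exp
from statistics import median


def gaussian_kernel(size=3, sigma=1.0):
    """Return a size x size Gaussian kernel normalized to sum to 1 (size must be odd)."""
    half = size // 2
    kernel = [
        [exp(-(dx * dx + dy * dy) / (2.0 * sigma * sigma)) for dx in range(-half, half + 1)]
        for dy in range(-half, half + 1)
    ]
    total = sum(sum(row) for row in kernel)
    return [[value / total for value in row] for row in kernel]


def window(image, r, c, half):
    """Flattened neighbourhood around (r, c); out-of-range indices are clamped to the border."""
    rows, cols = len(image), len(image[0])
    return [
        image[min(max(r + dy, 0), rows - 1)][min(max(c + dx, 0), cols - 1)]
        for dy in range(-half, half + 1)
        for dx in range(-half, half + 1)
    ]


def convolve(image, kernel):
    """Convolve the image with a square kernel (the kernel is flipped, as convolution requires)."""
    half = len(kernel) // 2
    flipped = [value for row in reversed(kernel) for value in reversed(row)]
    return [
        [sum(k * p for k, p in zip(flipped, window(image, r, c, half)))
         for c in range(len(image[0]))]
        for r in range(len(image))
    ]


def median_filter(image, size=3):
    """Replace each pixel with the median of its size x size neighbourhood."""
    half = size // 2
    return [
        [median(window(image, r, c, half)) for c in range(len(image[0]))]
        for r in range(len(image))
    ]


# Hypothetical 3x3 image with a single noisy pixel in the centre.
spike = [[0, 0, 0], [0, 9, 0], [0, 0, 0]]
denoised = median_filter(spike)
smoothed = convolve(spike, gaussian_kernel(3, 1.0))
```

The Gaussian filter spreads the spike across its neighbours, while the median filter removes it entirely, which is why median filtering is commonly preferred for impulse noise.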
CN202410202029.1A 2024-02-23 2024-02-23 Big data real-time analysis processing method Pending CN118152383A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410202029.1A CN118152383A (en) 2024-02-23 2024-02-23 Big data real-time analysis processing method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410202029.1A CN118152383A (en) 2024-02-23 2024-02-23 Big data real-time analysis processing method

Publications (1)

Publication Number Publication Date
CN118152383A true CN118152383A (en) 2024-06-07

Family

ID=91297858

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410202029.1A Pending CN118152383A (en) 2024-02-23 2024-02-23 Big data real-time analysis processing method

Country Status (1)

Country Link
CN (1) CN118152383A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107704545A (en) * 2017-11-08 2018-02-16 华东交通大学 Stream processing method for massive railway power distribution network information based on Storm and Kafka message communication
CN117151345A (en) * 2023-10-30 2023-12-01 智唐科技(北京)股份有限公司 Enterprise management intelligent decision platform based on AI technology
CN117493412A (en) * 2023-10-26 2024-02-02 北京红山信息科技研究院有限公司 Digital base operation management system

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107704545A (en) * 2017-11-08 2018-02-16 华东交通大学 Stream processing method for massive railway power distribution network information based on Storm and Kafka message communication
CN117493412A (en) * 2023-10-26 2024-02-02 北京红山信息科技研究院有限公司 Digital base operation management system
CN117151345A (en) * 2023-10-30 2023-12-01 智唐科技(北京)股份有限公司 Enterprise management intelligent decision platform based on AI technology

Similar Documents

Publication Publication Date Title
CN109767255B (en) Method for realizing intelligent operation and accurate marketing through big data modeling
CN106407278B (en) Architecture design system of big data platform
CN111240662A (en) Spark machine learning system and learning method based on task visual dragging
CN103678670B (en) Micro-blog hot word and hot topic mining system and method
CN117009524B (en) Internet big data analysis method and system based on public opinion emotion analysis
CN111026870A (en) ICT system fault analysis method integrating text classification and image recognition
Zhang Application of data mining technology in digital library.
CN116501779A (en) Big data mining analysis system for real-time feedback
CN111221881B (en) User characteristic data synthesis method and device and electronic equipment
Kumar et al. Knowledge discovery from data mining techniques
CN116468460A (en) Consumer finance customer image recognition system and method based on artificial intelligence
CN115809229A (en) Evaluation management method and system based on multi-dimensional data attributes
CN116756688A (en) Public opinion risk discovery method based on multi-mode fusion algorithm
Sulhi Data mining technology used in an Internet of Things-based decision support system for information processing intelligent manufacturing
CN113674846A (en) Hospital intelligent service public opinion monitoring platform based on LSTM network
CN111708919A (en) Big data processing method and system
CN116883065A (en) Merchant risk prediction method and device
CN118152383A (en) Big data real-time analysis processing method
CN112905845B (en) Multi-source unstructured data cleaning method for discrete intelligent manufacturing application
CN115080636A (en) Big data analysis system based on network service
CN113379529A (en) Collaborative decision engine application framework
CN107358494A (en) Customer demand information mining method based on big data
Mouyassir et al. Business intelligence model to analyze social media through big data analytics
Sinha Big Data Analysis: Concepts, Challenges and Opportunities
Sayeed et al. Smartic: A smart tool for Big Data analytics and IoT

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination