CN117971808A - Intelligent construction method for enterprise data standard hierarchical relationship

Info

Publication number
CN117971808A
Authority
CN
China
Prior art keywords
data
model
classification
enterprise
node
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202410235935.1A
Other languages
Chinese (zh)
Other versions
CN117971808B (en)
Inventor
段效亮
段莹莹
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shandong Hanruan Information Technology Co ltd
Original Assignee
Shandong Hanruan Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shandong Hanruan Information Technology Co ltd
Priority to CN202410235935.1A
Publication of CN117971808A
Application granted
Publication of CN117971808B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/211 Schema design and management
    • G06F16/212 Schema design and management with details for data modelling support
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/217 Database tuning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/23 Updating
    • G06F16/2365 Ensuring data consistency and integrity
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/242 Query formulation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases
    • G06F16/285 Clustering or classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/042 Knowledge-based neural networks; Logical representations of neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G06N3/0442 Recurrent networks, e.g. Hopfield networks characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G06N3/0455 Auto-encoder networks; Encoder-decoder networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/0464 Convolutional networks [CNN, ConvNet]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Security & Cryptography (AREA)
  • Quality & Reliability (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to the technical field of data management, and in particular to an intelligent construction method for enterprise data standard hierarchical relationships, comprising the following steps: S1: collecting raw data from various data sources within the enterprise; S2: cleaning and standardizing the data from step S1; S3: performing deep classification on the data to ensure the accuracy and fine granularity of the classification result; S4: dynamically constructing and adjusting the hierarchical relationship between the data; S5: evaluating the data hierarchy constructed in step S4 against preset performance indexes; S6: outputting the optimized data standard hierarchical relationship in a standardized format. By introducing deep learning technology and a graph convolutional network, the invention realizes intelligent management of enterprise data, improves the efficiency and accuracy of data processing, supports dynamic optimization of the data structure, and provides enterprises with an efficient, highly adaptable solution that supports data management and decision making.

Description

Intelligent construction method for enterprise data standard hierarchical relationship
Technical Field
The invention relates to the technical field of data management, in particular to an intelligent construction method for enterprise data standard hierarchical relationship.
Background
In today's data-driven business environment, enterprises continuously accumulate large amounts of data from a variety of sources, including but not limited to customer transaction records, operation logs and market analysis reports. Effective management and utilization of this data is critical to an enterprise's strategy making, operational optimization and decision support. However, because the data sources are diverse and the data volume is vast, enterprises face the challenge of how to organize, manage and use such data effectively. Traditional data management methods often fail to meet the demands of dynamic, complex and hierarchically managed data, which leads to the formation of data islands and reduces the accessibility and utilization efficiency of the data.
Against this background, the present invention needs to solve several key technical problems: first, how to automatically extract and identify useful information from multi-source heterogeneous data and classify and organize that information effectively; second, as the data volume keeps growing, how to dynamically construct and adjust the hierarchical relationships between data to adapt to the changing demands of the enterprise; and third, how to output the data hierarchy in a standardized manner and integrate it into the enterprise data management system so as to support enterprise data management, analysis and decision processes.
Disclosure of Invention
Based on the above purpose, the invention provides an intelligent construction method for the hierarchical relationship of the enterprise data standard.
The intelligent construction method of the enterprise data standard hierarchical relationship comprises the following steps:
S1: collecting original data from various data sources in an enterprise through an efficient data interface technology, and identifying the content and the attribute of the original data by utilizing a natural language processing technology;
S2: cleaning and standardizing the data from S1, and increasing the diversity of the data by adopting a data enhancement technology;
S3: performing deep classification on the data with an improved deep learning algorithm, based on the data attributes identified in S1 and the data features enhanced in S2, so as to ensure the accuracy and fine granularity of the classification result;
S4: based on a graph convolutional network model, dynamically constructing and adjusting the hierarchical relationship between the data according to the classification result of S3 and the associations between the data;
S5: evaluating the data hierarchy constructed in the step S4 through a preset performance index, and continuously optimizing the hierarchy according to feedback of the performance index evaluation;
S6: outputting the optimized data standard hierarchical relationship in a standardized format, integrating the data standard hierarchical relationship into an enterprise data management system through an API interface, and supporting enterprise data management, analysis and decision making processes.
Further, the S1 specifically includes:
S11: using a RESTful API as the data interface technology to collect raw data from various data sources inside the enterprise, the data sources comprising databases, log files and an online transaction processing system; specifically, for a database, data is obtained by SQL queries issued through the RESTful API; for log files, a log management system is called through the API to collect data; for the online transaction processing system, real-time transaction data is captured through the API;
S12: during the data collection of S11, serializing the data in JSON format so that data collected from different data sources keeps a uniform format during transmission and processing;
S13: applying natural language processing technology, specifically named entity recognition, to automatically recognize and extract entities such as person names, places, dates and technical terms from the uniformly formatted text data, providing a basis for further processing and classification of the data;
S14: classifying the collected data by topic according to the extracted keywords and entities, using a preset deep learning text classification model.
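As an illustration of the uniform format described in S12, the sketch below wraps records from the three source types named in S11 in a common JSON envelope. The envelope fields (`source_type`, `source_name`, `collected_at`, `payload`) and the sample records are assumptions made for this example; the patent does not specify a schema.

```python
import json
from datetime import datetime, timezone

# Source types named in S11; the string labels themselves are hypothetical.
SOURCE_TYPES = {"database", "log_file", "oltp"}

def to_envelope(source_type, source_name, payload, collected_at=None):
    """Wrap one raw record in a uniform, JSON-serializable envelope."""
    if source_type not in SOURCE_TYPES:
        raise ValueError(f"unknown source type: {source_type}")
    timestamp = collected_at or datetime.now(timezone.utc)
    return {
        "source_type": source_type,
        "source_name": source_name,
        "collected_at": timestamp.isoformat(),
        "payload": payload,
    }

def serialize(records):
    """Serialize a list of envelopes to a JSON string with stable key order."""
    return json.dumps(records, ensure_ascii=False, sort_keys=True)

# Hypothetical records, one per source type.
fixed_time = datetime(2024, 1, 1, tzinfo=timezone.utc)
records = [
    to_envelope("database", "crm.customers", {"id": 7, "name": "Acme"}, fixed_time),
    to_envelope("log_file", "app.log", {"level": "ERROR", "msg": "timeout"}, fixed_time),
    to_envelope("oltp", "orders", {"order_id": "A-1", "amount": 19.5}, fixed_time),
]
encoded = serialize(records)
decoded = json.loads(encoded)
```

Every record shares the same top-level keys regardless of origin, so downstream steps such as entity recognition can treat all sources alike.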
Further, the step S14 specifically includes:
S141: model selection and training data preparation, specifically selecting BERT, developed by Google, as the deep learning model for text classification, and preparing a training dataset consisting of text data labeled with the keywords and entities extracted in step S13;
S142: model training, specifically fine-tuning the BERT model on the prepared dataset, with the loss function defined as the cross-entropy loss, which measures the difference between the topic prediction probabilities output by the model and the true topic label; parameters are optimized during training by back propagation and gradient descent, the loss function being: $L = -\sum_{i=1}^{N} y_i \log(\hat{y}_i)$, where $L$ is the loss value; $N$ is the number of topic classes; $y_i$ is the value of the sample's true label distribution for the $i$-th topic; and $\hat{y}_i$ is the model's predicted probability for the $i$-th topic;
S143: classifying the collected data by topic using the BERT model trained in S142, wherein the input data is text composed of the keywords and entities extracted in step S13, and the model outputs the topic category of the data.
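The cross-entropy loss of S142 can be checked numerically on a single sample. The following is a minimal sketch in plain Python; the three-topic label and probability vectors are made-up values, and the small `eps` clamp is an added safeguard against `log(0)` that the source does not mention.

```python
import math

def cross_entropy(y_true, y_pred, eps=1e-12):
    """L = -sum_i y_i * log(y_hat_i) for one sample over N topics."""
    if len(y_true) != len(y_pred):
        raise ValueError("label and prediction vectors must have equal length")
    # Clamp predictions away from zero so the logarithm is always defined.
    return -sum(y * math.log(max(p, eps)) for y, p in zip(y_true, y_pred))

# Hypothetical sample: the true topic is the second of three.
y_true = [0.0, 1.0, 0.0]
y_pred = [0.1, 0.7, 0.2]
loss = cross_entropy(y_true, y_pred)
```

With a one-hot label only the term for the true topic survives, so the loss here reduces to $-\log(0.7)$.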
Further, the step S2 specifically includes:
S21: first, cleaning the raw data collected in step S1, including removing duplicate records, correcting obviously erroneous values and handling missing values, where missing values are replaced by interpolation or by the average of similar records;
S22: standardizing the cleaned data to eliminate deviations between different data sources and different orders of magnitude, the standardization including Z-score standardization and min-max standardization, which ensures the consistency and effectiveness of model training;
S23: applying data enhancement techniques to increase data diversity and reduce the risk of overfitting, specifically:
for text data, generating new data samples by synonym replacement, sentence recombination, random insertion and random deletion;
for numerical data, increasing the variability of the data by adding random noise and perturbing feature values;
S24: verifying the quality of the enhanced dataset to ensure that the data enhancement introduces no misleading bias, specifically by using automated scripts to check the consistency and integrity of the data.
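The numerical side of S21 to S23 can be sketched with the Python standard library. The example below covers mean imputation, Z-score and min-max standardization, and Gaussian noise as one possible form of "random noise addition"; the function names, the sample column and the noise level are assumptions for illustration, and text augmentation is not shown.

```python
import random
import statistics

def fill_missing_with_mean(values):
    """S21: replace missing entries (None) with the mean of the present ones."""
    present = [v for v in values if v is not None]
    mean = statistics.fmean(present)
    return [mean if v is None else v for v in values]

def z_score(values):
    """S22: Z-score standardization using the population standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]

def min_max(values):
    """S22: min-max standardization onto the interval [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def add_noise(values, sigma, rng):
    """S23: perturb numerical features with zero-mean Gaussian noise."""
    return [v + rng.gauss(0.0, sigma) for v in values]

# Hypothetical numeric column with one missing value.
raw = [10.0, None, 30.0, 20.0]
cleaned = fill_missing_with_mean(raw)
standardized = z_score(cleaned)
scaled = min_max(cleaned)
augmented = add_noise(scaled, sigma=0.01, rng=random.Random(0))
```

Seeding the random generator keeps the augmented samples reproducible, which makes the consistency checks of S24 easier to automate.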
Further, the improved deep learning algorithm in S3 combines a convolutional neural network (CNN) with a long short-term memory (LSTM) network, and is applied as follows:
S31: first, features are extracted from the data enhanced in step S2 using the convolutional neural network, which automatically learns local features of the data through its convolutional layers and is used to extract effective information from image and text data; the convolutional neural network model is denoted $f_{\mathrm{CNN}}(X; \Theta_{\mathrm{CNN}})$, where $X$ is the input dataset and $\Theta_{\mathrm{CNN}}$ denotes the parameters of the convolutional neural network model;
S32: the features extracted in step S31 are input into the long short-term memory network for sequence processing; the LSTM network can process and memorize information in long sequences and is used to capture long-distance dependencies in time series or text data; the LSTM model is denoted $f_{\mathrm{LSTM}}(Y; \Theta_{\mathrm{LSTM}})$, where $Y = f_{\mathrm{CNN}}(X; \Theta_{\mathrm{CNN}})$ denotes the features extracted by the convolutional neural network and $\Theta_{\mathrm{LSTM}}$ denotes the parameters of the LSTM model;
S33: the data features processed by the convolutional neural network and the LSTM network are combined for deep data classification, which is implemented by adding a fully connected layer FC that maps the sequence-processed features to specific categories; the classification function is denoted $f_{\mathrm{FC}}(Z; \Theta_{\mathrm{FC}})$, where $Z = f_{\mathrm{LSTM}}(Y; \Theta_{\mathrm{LSTM}})$ and $\Theta_{\mathrm{FC}}$ denotes the parameters of the fully connected layer;
S34: the model parameters $\Theta_{\mathrm{CNN}}$, $\Theta_{\mathrm{LSTM}}$ and $\Theta_{\mathrm{FC}}$ are adjusted through training; specifically, a cross-entropy loss function measures classification accuracy and the parameters are optimized by the back propagation algorithm, improving the accuracy and fine granularity of the classification result.
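Substituting the definitions of S31 to S33 into one another gives the classifier as a single composed function. The symbol $\hat{y}$ for the predicted class scores is introduced here for illustration and does not appear in the source.

```latex
\begin{aligned}
Y &= f_{\mathrm{CNN}}(X;\, \Theta_{\mathrm{CNN}}) \\
Z &= f_{\mathrm{LSTM}}(Y;\, \Theta_{\mathrm{LSTM}}) \\
\hat{y} &= f_{\mathrm{FC}}(Z;\, \Theta_{\mathrm{FC}})
        = f_{\mathrm{FC}}\bigl(f_{\mathrm{LSTM}}\bigl(f_{\mathrm{CNN}}(X;\, \Theta_{\mathrm{CNN}});\, \Theta_{\mathrm{LSTM}}\bigr);\, \Theta_{\mathrm{FC}}\bigr)
\end{aligned}
```

Because the three stages are composed, a single loss on $\hat{y}$ yields gradients for all three parameter sets, which is what allows S34 to train them jointly by back propagation.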
Further, the step S34 specifically includes:
S341: during model training, the cross-entropy loss function is used to measure the difference between the classification predicted by the model and the actual label; for the multi-class problem, the cross-entropy loss is defined as: $L = -\frac{1}{N}\sum_{i=1}^{N}\sum_{c=1}^{M} y_{ic}\log(p_{ic})$, where $N$ is the number of data samples, $M$ is the number of classes, $y_{ic}$ is an indicator variable equal to 1 when sample $i$ belongs to class $c$ and 0 otherwise, and $p_{ic}$ is the probability predicted by the model that sample $i$ belongs to class $c$;
S342: the model parameters are updated by the back propagation algorithm according to the gradient of the cross-entropy loss so as to reduce the loss value, the update formula being: $\theta \leftarrow \theta - \alpha \frac{\partial L}{\partial \theta}$, where $\theta$ denotes a model parameter, $\alpha$ is the learning rate, and $\frac{\partial L}{\partial \theta}$ is the gradient of the loss function $L$ with respect to the parameter $\theta$;
S343: during training, the learning rate $\alpha$ is adjusted dynamically according to the performance of the model on the validation set, specifically using a learning rate decay strategy, to avoid an excessively large step size in the early stage of training and to avoid becoming trapped in a local minimum in the later stage;
S344: to prevent the model from overfitting, an early stopping strategy is used: training stops when the performance of the model on the validation set has not improved significantly over several consecutive training epochs.
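The interplay of the update rule in S342, learning rate decay in S343 and early stopping in S344 can be sketched on a toy one-parameter problem. The exponential decay schedule, the patience counter and all default values below are assumptions chosen for illustration; the source does not fix a particular schedule or stopping threshold.

```python
def train(grad, val_loss, theta, alpha=0.4, decay=0.9, patience=3,
          min_delta=1e-6, max_epochs=200):
    """Gradient descent with exponential learning rate decay and early stopping."""
    best_loss = val_loss(theta)
    best_theta = theta
    stale_epochs = 0
    history = []
    for _ in range(max_epochs):
        theta = theta - alpha * grad(theta)   # S342: theta <- theta - alpha * dL/dtheta
        alpha *= decay                        # S343: shrink the learning rate each epoch
        current = val_loss(theta)
        history.append(current)
        if best_loss - current > min_delta:   # significant improvement on validation loss
            best_loss, best_theta, stale_epochs = current, theta, 0
        else:
            stale_epochs += 1
            if stale_epochs >= patience:      # S344: stop after `patience` stale epochs
                break
    return best_theta, best_loss, len(history)

# Toy problem: L(theta) = (theta - 3)^2, so dL/dtheta = 2 * (theta - 3).
theta, loss, epochs = train(
    grad=lambda t: 2.0 * (t - 3.0),
    val_loss=lambda t: (t - 3.0) ** 2,
    theta=0.0,
)
```

In a real training run `grad` would come from back propagation through the network and `val_loss` from a held-out validation set; here both are closed-form so the loop can be followed by hand.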
Further, the step S4 specifically includes:
S41: constructing a data association graph based on the classification result of S3 and the known associations between the data, in which nodes represent data items and edges represent associations between data items, the associations being defined by shared attributes and similarity measures; the data association graph is constructed as follows:
first, each data item is taken as a node in the graph, based on the data attributes identified in S1 and the data features enhanced in S2;
then, associations between the data items are determined by analyzing the data attributes and features; the associations define the edges between nodes, and the weight of each edge is set according to the strength of the association or the similarity measure;
S42: processing the data association graph with a graph convolutional network model, which learns node representations on graph-structured data from the information of each node and its neighbors; the model is expressed as: $H^{(l+1)} = \sigma\left(\tilde{D}^{-1/2}\tilde{A}\tilde{D}^{-1/2}H^{(l)}W^{(l)} + b^{(l)}\right)$, where $\tilde{A} = A + I_N$; $A$ is the adjacency matrix of the data association graph constructed in S41; $I_N$ is the identity matrix, which adds self-connections and strengthens the expression of each node's own features; $\tilde{D}$ is the degree matrix of $\tilde{A}$, whose element $\tilde{D}_{ii} = \sum_j \tilde{A}_{ij}$ represents the degree of node $i$; $H^{(l)}$ is the node feature matrix of the $l$-th layer, and for $l = 0$, $H^{(0)}$ is the initial feature matrix of the nodes; $W^{(l)}$ and $b^{(l)}$ are the weight matrix and bias vector of the $l$-th layer, obtained through training; and $\sigma$ is a nonlinear activation function;
S43: using the node representations produced by the graph convolutional network model, combined with the classification result and the original features of the nodes, dynamically constructing and adjusting the hierarchical relationship between the data items through a hierarchical clustering algorithm.
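One propagation step of the graph convolution in S42 can be worked through in plain Python on a three-node path graph. This is a sketch under stated assumptions: ReLU stands in for the unspecified activation $\sigma$, the weight matrix is the identity, the bias is zero, and the graph and features are made up. The hierarchical clustering of S43 is not shown.

```python
import math

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def normalized_adjacency(adj):
    """Compute D~^(-1/2) (A + I) D~^(-1/2) for a square adjacency matrix."""
    n = len(adj)
    # A~ = A + I_N adds a self-connection to every node.
    a_tilde = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
               for i in range(n)]
    # D~_ii is the row sum of A~.
    deg = [sum(row) for row in a_tilde]
    return [[a_tilde[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]

def gcn_layer(adj, h, w, b):
    """H_next = ReLU(A_hat @ H @ W + b), with the bias added to every row."""
    a_hat = normalized_adjacency(adj)
    z = matmul(matmul(a_hat, h), w)
    return [[max(0.0, z[i][j] + b[j]) for j in range(len(b))]
            for i in range(len(z))]

# Hypothetical data association graph: node 1 is linked to nodes 0 and 2.
adj = [[0.0, 1.0, 0.0],
       [1.0, 0.0, 1.0],
       [0.0, 1.0, 0.0]]
h0 = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # initial node features H^(0)
w0 = [[1.0, 0.0], [0.0, 1.0]]               # identity weights for readability
b0 = [0.0, 0.0]
h1 = gcn_layer(adj, h0, w0, b0)
```

With identity weights the output rows are just degree-normalized averages of each node's own features and its neighbors' features, which makes the smoothing effect of the layer easy to see.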
Further, the step S5 specifically includes:
S51: first, defining a set of preset performance indexes for evaluating the effectiveness and efficiency of the data hierarchy, comprising data access efficiency, data management convenience and hierarchy accuracy, wherein data access efficiency is evaluated by measuring the time required to retrieve information in the constructed hierarchy; data management convenience is evaluated by the operational complexity of maintaining and updating data; and hierarchy accuracy is evaluated by comparing the predetermined classification of the data items with their actual classification in the hierarchy;
S52: evaluating the data hierarchy constructed in step S4 against the performance indexes defined in S51, specifically using quantitative analysis to calculate the data access time and the number of operation steps, and verifying the accuracy of the hierarchy through expert evaluation;
S53: collecting feedback information according to the result of the performance index evaluation and identifying points for improvement in the hierarchy, the optimization process being expressed as an adjustment function: $H_{\mathrm{new}} = \mathrm{Optimize}(H_{\mathrm{old}}, \mathrm{Feedback}, \alpha)$, where $H_{\mathrm{new}}$ is the optimized hierarchy, $H_{\mathrm{old}}$ is the current hierarchy, $\mathrm{Feedback}$ is the feedback based on the performance index evaluation, and $\alpha$ is the learning rate, indicating the degree of response to the feedback in the optimization process;
S54: taking the optimization result of step S53 as the new input, and repeating steps S52 and S53 until the performance indexes show that the hierarchy has reached the preset target.
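The evaluate-and-adjust loop of S52 to S54 can be sketched as follows. The source leaves `Optimize` abstract, so the rule used here (reassign a share $\alpha$ of the mismatched items per round) is purely hypothetical, as are the item names; the `reference` mapping stands in for the expert-verified classification, and only the hierarchy accuracy index is modeled.

```python
import math

def accuracy(hierarchy, reference):
    """Share of items whose category in the hierarchy matches the reference."""
    matches = sum(1 for item, cat in reference.items() if hierarchy.get(item) == cat)
    return matches / len(reference)

def optimize(h_old, feedback, alpha):
    """Hypothetical Optimize(H_old, Feedback, alpha): fix a share alpha of mismatches."""
    n_fix = math.ceil(alpha * len(feedback))
    h_new = dict(h_old)
    for item, expected in sorted(feedback.items())[:n_fix]:
        h_new[item] = expected
    return h_new

def refine(hierarchy, reference, alpha=0.5, target=1.0, max_rounds=20):
    """Repeat evaluation (S52) and adjustment (S53) until the target is met (S54)."""
    rounds = 0
    while accuracy(hierarchy, reference) < target and rounds < max_rounds:
        # Feedback: every item whose current category disagrees with the reference.
        feedback = {k: v for k, v in reference.items() if hierarchy.get(k) != v}
        hierarchy = optimize(hierarchy, feedback, alpha)
        rounds += 1
    return hierarchy, rounds

# Hypothetical item-to-category assignments.
reference = {"invoice": "finance", "payroll": "finance", "ticket": "support", "lead": "sales"}
initial = {"invoice": "finance", "payroll": "support", "ticket": "sales", "lead": "finance"}
final, rounds = refine(initial, reference)
```

A smaller $\alpha$ makes each round more conservative and increases the number of rounds needed, which matches the source's description of $\alpha$ as the degree of response to feedback.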
Further, the step S6 specifically includes:
S61: first, defining a standardized data format for representing the optimized data hierarchy, specifically JSON or XML, the format containing the information of each node in the hierarchy, including the node name, node type, parent node, list of child nodes and node attributes;
S62: converting the optimized data hierarchy into the standardized format defined in S61;
S63: developing a RESTful API for outputting the serialized data hierarchy and integrating it into the enterprise data management system, the API comprising an upload interface, a query interface and an update interface, wherein the upload interface allows a user to upload the serialized data hierarchy file; the query interface allows a user to query node information by node name or attribute; and the update interface allows a user to update node information in the data hierarchy;
S64: integrating the API of S63 into the enterprise data management system to support uploading, querying and updating the data hierarchy through the API, after which system users can operate and manage the data hierarchy directly through the user interface of the data management system.
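The node format of S61 and the conversion of S62 can be sketched with a dataclass serialized to JSON. The field names and sample nodes are assumptions, and `query_by_name` and `update_attributes` are in-memory stand-ins for the query and update interfaces of S63, not an actual RESTful service.

```python
import json
from dataclasses import dataclass, field, asdict
from typing import Optional

@dataclass
class Node:
    """One node of the hierarchy with the fields listed in S61."""
    name: str
    node_type: str
    parent: Optional[str] = None
    children: list = field(default_factory=list)
    attributes: dict = field(default_factory=dict)

def to_json(nodes):
    """S62: convert the hierarchy into the standardized JSON format."""
    return json.dumps([asdict(n) for n in nodes], ensure_ascii=False, indent=2)

def from_json(document):
    """Rebuild Node objects from a serialized hierarchy (e.g. after upload)."""
    return [Node(**item) for item in json.loads(document)]

def query_by_name(nodes, name):
    """Stand-in for the query interface: look a node up by name."""
    return next((n for n in nodes if n.name == name), None)

def update_attributes(nodes, name, **changes):
    """Stand-in for the update interface: modify a node's attributes."""
    node = query_by_name(nodes, name)
    if node is None:
        raise KeyError(name)
    node.attributes.update(changes)
    return node

# Hypothetical two-level hierarchy.
hierarchy = [
    Node("Customer data", "category", None, ["Contact information"]),
    Node("Contact information", "standard", "Customer data", [], {"owner": "CRM"}),
]
document = to_json(hierarchy)
restored = from_json(document)
update_attributes(restored, "Contact information", owner="Sales")
```

Storing the parent and the child list on each node duplicates the edge information, but it lets a consumer walk the tree in either direction without building an index first.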
The invention has the beneficial effects that:
By introducing an improved deep learning algorithm, namely a model combining a convolutional neural network (CNN) with a long short-term memory (LSTM) network, together with a graph-based deep learning model (a graph convolutional network, GCN), the invention can automatically extract and identify key information from large-scale, multi-source enterprise data and classify and organize that information effectively.
The invention uses the graph convolutional network (GCN) to dynamically construct and adjust the hierarchical relationship between the data, so that the data standards and hierarchy can adapt flexibly to the changing requirements of the enterprise. This dynamic optimization capability means that an enterprise can quickly adjust how its data is organized according to actual application scenarios and management requirements, ensuring the long-term effectiveness and flexibility of the data management system.
By outputting the optimized data standard hierarchical relationship in a standardized format and integrating it into the enterprise data management system through an API, the invention improves the accessibility and usability of data, supports the enterprise's data management work, including the definition of data access permissions and the safeguarding of data security, and provides a solid foundation for data analysis and decision making, helping the enterprise discern business trends in its large data resources and make better-informed decisions.
Drawings
In order to illustrate the technical solutions of the invention or of the prior art more clearly, the drawings used in the description of the embodiments or of the prior art are briefly introduced below. Obviously, the drawings described below show only embodiments of the invention, and a person of ordinary skill in the art can obtain other drawings from them without inventive effort.
Fig. 1 is a schematic diagram of an intelligent construction method for an enterprise data standard hierarchical relationship according to an embodiment of the present invention.
Detailed Description
The present invention will be further described in detail with reference to specific embodiments in order to make the objects, technical solutions and advantages of the present invention more apparent.
It is to be noted that, unless otherwise defined, technical or scientific terms used herein have the ordinary meaning understood by one of ordinary skill in the art to which the present invention belongs. The terms "first," "second," and the like, as used herein, do not denote any order, quantity, or importance, but are used to distinguish one element from another. The word "comprising" or "comprises", and the like, means that the element or item preceding the word encompasses the elements or items listed after the word and their equivalents, without excluding other elements or items.
As shown in fig. 1, the method for intelligently constructing the enterprise data standard hierarchical relationship comprises the following steps:
S1: collecting raw data from various data sources in the enterprise (including databases, log files, online transaction processing systems and the like) through an efficient data interface technology, identifying the content and attributes of the raw data using natural language processing technology, and providing a preliminary basis for data classification in the subsequent steps;
S2: cleaning and standardizing the data from step S1, and applying data enhancement techniques to increase data diversity, reduce the risk of overfitting during model training, and improve the generalization capability of the system;
S3: performing deep classification on the data with an improved deep learning algorithm, based on the data attributes identified in S1 and the data features enhanced in S2, so as to ensure the accuracy and fine granularity of the classification result;
S4: based on a graph convolutional network model, dynamically constructing and adjusting the hierarchical relationship between the data according to the classification result of S3 and the associations between the data, realizing the intelligent generation of data standards and hierarchies;
S5: evaluating the data hierarchy constructed in step S4 against preset performance indexes (such as data access efficiency and management convenience), and continuously optimizing the hierarchical relationship according to the feedback from the evaluation;
S6: outputting the optimized data standard hierarchical relationship in a standardized format, integrating the data standard hierarchical relationship into an enterprise data management system through an API interface, and supporting enterprise data management, analysis and decision making processes.
S1 specifically comprises:
S11: a RESTful API is used as the data interface technology to collect raw data from various data sources in the enterprise, the data sources comprising databases, log files and an online transaction processing system; specifically, for a database, data is obtained by SQL queries issued through the RESTful API; for log files, a log management system is called through the API to collect data; for the online transaction processing system, real-time transaction data is captured through the API;
S12: during the data collection of S11, the data is serialized in JSON format, which ensures that data collected from different data sources keeps a uniform format during transmission and processing and improves the speed and accuracy of subsequent data processing;
S13: natural language processing technology, specifically named entity recognition (NER), is applied to automatically recognize and extract entities such as person names, places, dates and technical terms from the uniformly formatted text data, providing a basis for further processing and classification of the data;
S14: a preset deep learning text classification model is used to classify the collected data by topic according to the extracted keywords and entities, realizing a preliminary classification of the data content;
Steps S11 to S14 ensure efficient and accurate collection of raw data from the various data sources in the enterprise, and the content and attributes of the data are identified and classified by natural language processing technology, providing a solid foundation for the subsequent construction of the data standard hierarchical relationship.
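To make the hand-off from S13 to S14 concrete, the sketch below extracts a few entities and composes the text that would be passed to the topic classifier. It deliberately swaps in a rule-based stand-in (a date regular expression and a small term list) for a trained NER model; the pattern, the term list, the sample sentence and the keyword are all assumptions for illustration.

```python
import re

# Rule-based stand-in for NER: ISO-style dates plus a hypothetical term list.
DATE_PATTERN = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")
TERM_GAZETTEER = {"invoice", "purchase order"}

def extract_entities(text):
    """Return (label, value) pairs for dates and known technical terms."""
    entities = [("DATE", match.group()) for match in DATE_PATTERN.finditer(text)]
    lowered = text.lower()
    for term in sorted(TERM_GAZETTEER):
        if term in lowered:
            entities.append(("TERM", term))
    return entities

def compose_classifier_input(keywords, entities):
    """Join keywords and entity values into one text for the topic classifier."""
    parts = list(keywords) + [value for _, value in entities]
    return " ".join(parts)

text = "Invoice 4471 was issued on 2024-03-01 for the purchase order."
entities = extract_entities(text)
model_input = compose_classifier_input(["billing"], entities)
```

A trained NER model would replace `extract_entities` in practice; the composition step stays the same because the classifier only sees the resulting text.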
S14 specifically comprises the following steps:
S141: model selection and training data preparation, namely selecting BERT (Bidirectional Encoder Representations from Transformers), developed by Google, as the deep learning model for text classification, and preparing a training data set consisting of text data labeled with the keywords and entities extracted in step S13, each sample in the data set comprising a series of keywords, entities and the corresponding topic label;
S142: model training, namely fine-tuning the BERT model on the prepared data set, with the loss function defined as the cross-entropy loss (CrossEntropyLoss), which measures the difference between the topic prediction probabilities output by the model and the true topic label; parameters are optimized during training by back propagation and gradient descent, and the loss function is: $L = -\sum_{i=1}^{N} y_i \log(\hat{y}_i)$, wherein: $L$ is the loss function value; $N$ is the number of topic classes; $y_i$ is the value of the sample's true label distribution for the $i$-th topic (typically one-hot encoded, i.e. 1 at the position of the actual topic and 0 elsewhere); $\hat{y}_i$ is the probability the model predicts for the $i$-th topic;
S143: performing topic classification on the collected data by using the BERT model trained in the S142, wherein the input data is a text composed of the keywords and the entities extracted in the step S13, and the model outputs topic categories of the data;
Through the steps S141-S143, the advanced BERT model is utilized, and the keywords and the entities extracted from the data are combined, so that the accurate subject classification of the data collected in the enterprise is realized, the accuracy of the data classification is improved, and effective support is provided for the subsequent management and application of the data.
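As an illustrative sketch only, the cross-entropy loss of S142 can be evaluated for a single sample as follows. The topic names and logit values are hypothetical, and the softmax step is an assumption about how the model's raw outputs are turned into prediction probabilities.

```python
import math


def softmax(logits):
    """Convert raw scores to probabilities (numerically stable form)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(y_true, y_pred, eps=1e-12):
    """L = -sum_i y_i * log(y_hat_i) for one sample."""
    # eps guards against log(0) when a predicted probability is exactly zero.
    return -sum(y * math.log(max(p, eps)) for y, p in zip(y_true, y_pred))


topics = ["finance", "hr", "sales"]   # hypothetical topic labels
logits = [2.0, 0.5, -1.0]             # hypothetical model outputs
probs = softmax(logits)
label = [1, 0, 0]                     # one-hot: the true topic is "finance"
loss = cross_entropy(label, probs)
```

With a one-hot label the sum collapses to the negative log-probability of the true topic, so the loss falls toward zero as the model assigns that topic more probability.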
S2 specifically comprises:
S21: firstly, carrying out data cleaning on the original data collected in the step S1, including removing repeated records, correcting obvious error values and processing missing values, and adopting interpolation or average value substitution based on similar records for processing the missing values to ensure the integrity and accuracy of the data;
S22: the cleaned data is standardized to eliminate scale differences between data sources and between orders of magnitude, specifically using Z-score standardization (transforming the data to a distribution with mean 0 and standard deviation 1) and min-max standardization (scaling the data to a specified range, such as 0 to 1), ensuring the consistency and effectiveness of model training;
S23: data enhancement techniques are applied to increase data diversity and reduce the risk of overfitting, the specific techniques comprising:
for text data, generating new data samples by synonym replacement, sentence recombination, and random insertion and deletion;
for numerical data, increasing the variability of the data by adding random noise and perturbing feature values;
S24: quality verification is carried out on the enhanced data set to ensure that the data enhancement has not introduced misleading bias, specifically by using automated scripts to check the consistency and integrity of the data.
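The missing-value handling of S21 and the two standardizations of S22 can be sketched, under simplifying assumptions, as follows. Here a missing value is replaced by the mean of all present values in the column rather than by the mean of similar records, and the sample column is hypothetical.

```python
import statistics


def impute_mean(values):
    """Replace None with the mean of the present values (simplified S21)."""
    present = [v for v in values if v is not None]
    mean = statistics.fmean(present)
    return [mean if v is None else v for v in values]


def z_score(values):
    """Rescale to mean 0 and (population) standard deviation 1."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return [0.0 for _ in values]  # constant column: nothing to scale
    return [(v - mean) / std for v in values]


def min_max(values, lo=0.0, hi=1.0):
    """Rescale linearly into the range [lo, hi]."""
    vmin, vmax = min(values), max(values)
    if vmax == vmin:
        return [lo for _ in values]
    return [lo + (v - vmin) * (hi - lo) / (vmax - vmin) for v in values]


raw = [10.0, None, 30.0, 20.0]   # hypothetical numeric column with a gap
clean = impute_mean(raw)         # the gap becomes the column mean, 20.0
z = z_score(clean)
scaled = min_max(clean)          # [0.0, 0.5, 1.0, 0.5]
```

Z-score standardization suits features fed to gradient-based training, while min-max scaling is convenient when a bounded range is required; which one applies to a given field is left open by the method.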
The improved deep learning algorithm in S3 is a combination of a convolutional neural network (CNN) and a long short-term memory (LSTM) network, and its specific application comprises:
S31: first, features are extracted from the data enhanced in step S2 using the convolutional neural network, which automatically learns local features of the data through its convolutional layers and is used to extract effective information from image and text data; the convolutional neural network model is denoted $f_{CNN}(X; \Theta_{CNN})$, wherein $X$ is the input data set and $\Theta_{CNN}$ are the parameters of the convolutional neural network model;
S32: the features extracted in step S31 are input into the long short-term memory network for sequence processing; the LSTM network can process and retain information over long sequences and is used to capture long-distance dependencies in time-series or text data; the LSTM model is denoted $f_{LSTM}(Y; \Theta_{LSTM})$, wherein $Y = f_{CNN}(X; \Theta_{CNN})$ is the set of features extracted by the convolutional neural network and input to the LSTM network, and $\Theta_{LSTM}$ are the parameters of the LSTM model;
S33: the data features processed by the convolutional neural network and the LSTM network are combined for deep data classification, realized by adding a fully connected layer FC that maps the sequence-processed features to specific categories; the classification function is denoted $f_{FC}(Z; \Theta_{FC})$, wherein $Z = f_{LSTM}(Y; \Theta_{LSTM})$ and $\Theta_{FC}$ are the parameters of the fully connected layer;
S34: the model parameters $\Theta_{CNN}$, $\Theta_{LSTM}$ and $\Theta_{FC}$ are adjusted through training; specifically, a cross-entropy loss function measures classification accuracy and the parameters are optimized by the back propagation algorithm, improving the accuracy and fine granularity of the classification result;
Through steps S31-S34, the ability of the CNN to extract spatial features of the data is combined with the strength of the LSTM in processing sequence data, realizing deep classification based on the data attributes identified in S1 and the data features enhanced in S2, and ensuring the accuracy and fine granularity of the classification result.
S34 specifically includes:
S341: during model training, a cross-entropy loss function measures the difference between the classification predicted by the model and the actual label; for a multi-class problem the cross-entropy loss is defined as: $L = -\sum_{i=1}^{N} \sum_{c=1}^{M} y_{ic} \log(p_{ic})$, wherein $N$ is the number of data samples, $M$ is the number of classes, $y_{ic}$ is an indicator variable equal to 1 when sample $i$ belongs to class $c$ and 0 otherwise, and $p_{ic}$ is the probability predicted by the model that sample $i$ belongs to class $c$;
S342: the model parameters are updated by the back propagation algorithm according to the gradient of the cross-entropy loss function, so as to reduce the loss value and improve classification accuracy; the update formula is: $\theta \leftarrow \theta - \alpha \nabla_{\theta} L$, wherein $\theta$ denotes the model parameters, $\alpha$ is the learning rate, and $\nabla_{\theta} L$ is the gradient of the loss function $L$ with respect to the parameters $\theta$;
S343: during training, the learning rate $\alpha$ is adjusted dynamically according to the model's performance on the validation set, specifically using a learning rate decay strategy, so that the step size is not too large early in training and the model does not become trapped in a local minimum late in training;
S344: to prevent overfitting, an early stopping strategy is used: training is stopped when the model's performance on the validation set shows no significant improvement over several consecutive training epochs.
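The interaction of the update rule (S342), learning rate decay (S343) and early stopping (S344) can be sketched on a toy one-parameter problem as follows. The fixed exponential decay schedule, the `patience` and `min_delta` values, and the quadratic loss standing in for the validation loss are all assumptions made for this sketch; the method only requires that the learning rate be adjusted according to validation performance.

```python
def train(grad, val_loss, theta, alpha=0.3, decay=0.9,
          patience=3, min_delta=1e-6, max_epochs=200):
    """Gradient descent with decaying learning rate and early stopping."""
    best_loss = float("inf")
    best_theta = theta
    stale = 0  # consecutive epochs without significant improvement
    for epoch in range(max_epochs):
        theta = theta - alpha * grad(theta)   # S342: theta <- theta - alpha * dL/dtheta
        alpha *= decay                        # S343: assumed exponential decay
        loss = val_loss(theta)
        if loss < best_loss - min_delta:
            best_loss, best_theta, stale = loss, theta, 0
        else:
            stale += 1
            if stale >= patience:             # S344: stop early
                break
    return best_theta, best_loss, epoch + 1


# Toy problem: L(theta) = (theta - 3)^2, with gradient 2 * (theta - 3).
theta, loss, epochs = train(
    grad=lambda t: 2.0 * (t - 3.0),
    val_loss=lambda t: (t - 3.0) ** 2,
    theta=0.0,
)
```

Returning the best parameters seen, rather than the last ones, is a common companion to early stopping and is likewise an assumption here.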
S4 specifically comprises the following steps:
S41: constructing a data association graph based on the S3 classification result and the known association between the data, wherein nodes represent data items, edges represent the association between the data items, the association of the data items is defined based on shared attributes and similarity measures, and the specific process for constructing the data association graph comprises the following steps:
first, each data item is considered as a node in the graph based on the data attributes identified by S1 and the S2 enhanced data features;
Then, determining associations, such as similarities or logical relationships, between the data items by analyzing the data attributes and features, the associations defining edges between the nodes, the weights of the edges being defined according to the strength or similarity measure of the associations; the principle of associative graphs is therefore to transform data items and their interrelationships into a graphical representation, facilitating the analysis and processing of the inherent structure of data using graph algorithms;
S42: the data association graph is processed using a graph convolutional network (GCN) model, which learns node representations (feature vectors) on graph-structured data from the information of each node and its neighbors; the layer function of the graph convolutional network model is: $H^{(l+1)} = \sigma\left(\tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2} H^{(l)} W^{(l)} + b^{(l)}\right)$, wherein $\tilde{A} = A + I_N$, $A$ is the adjacency matrix of the data association graph constructed in S41, $I_N$ is the identity matrix, which adds self-connections and strengthens the expression of each node's own features, $\tilde{D}$ is the degree matrix of $\tilde{A}$, whose element $\tilde{D}_{ii} = \sum_j \tilde{A}_{ij}$ is the degree of node $i$ (including the self-connection), $H^{(l)}$ is the node feature matrix of the $l$-th layer, with $H^{(0)}$ the initial node feature matrix for $l = 0$, $W^{(l)}$ and $b^{(l)}$ are respectively the weight matrix and bias vector of the $l$-th layer, learned through training, and $\sigma$ is a nonlinear activation function such as ReLU;
S43: using the node representations produced by the graph convolutional network model, combined with the classification results and the original features of the nodes, the hierarchical relationship between data items is dynamically constructed and adjusted by a hierarchical clustering algorithm, as follows:
Step 1: based on the node representation $H^{(L)}$ produced by the graph convolutional network model in S42 ($L$ being the last layer), a distance matrix between nodes is calculated, the distance between nodes being measured by Euclidean distance or cosine similarity;
Step 2: the distance matrix is clustered using a hierarchical clustering algorithm, which builds the hierarchy of data items by progressively merging the closest clusters until a preset termination condition is met, the termination condition being a target number of clusters or a distance threshold;
Step 3: each cluster represents one level of data items, and the hierarchical relationship between data items is dynamically adjusted by analyzing the cluster structure, forming the final data standard hierarchical relationship.
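Steps S42 and S43 can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions: a single graph convolution layer with the symmetric normalization commonly used for GCNs, untrained identity weights, single-linkage merging on Euclidean distance, and a hypothetical four-node association graph. It clusters on the node representations alone and does not combine them with the classification results or original features as S43 also contemplates.

```python
import math


def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def gcn_layer(adj, h, w, b):
    """One layer: ReLU(D~^-1/2 (A + I) D~^-1/2 H W + b)."""
    n = len(adj)
    # A~ = A + I_N adds a self-connection to every node.
    a_tilde = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
               for i in range(n)]
    # D~_ii is the row sum of A~; keep only D~^-1/2 on the diagonal.
    d_inv_sqrt = [1.0 / math.sqrt(sum(row)) for row in a_tilde]
    a_hat = [[d_inv_sqrt[i] * a_tilde[i][j] * d_inv_sqrt[j] for j in range(n)]
             for i in range(n)]
    z = matmul(matmul(a_hat, h), w)
    return [[max(0.0, v + bias) for v, bias in zip(row, b)] for row in z]


def agglomerate(points, n_clusters):
    """Single-linkage agglomerative clustering; returns clusters and merges."""
    clusters = [[i] for i in range(len(points))]
    merges = []

    def dist(c1, c2):
        return min(math.dist(points[i], points[j]) for i in c1 for j in c2)

    while len(clusters) > n_clusters:
        pairs = ((i, j) for i in range(len(clusters))
                 for j in range(i + 1, len(clusters)))
        a, b = min(pairs, key=lambda p: dist(clusters[p[0]], clusters[p[1]]))
        merges.append((clusters[a], clusters[b]))
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return clusters, merges


# Hypothetical graph: items 0-1 are associated, and items 2-3 are associated.
adj = [[0, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]]
h0 = [[1.0, 0.0], [0.8, 0.2], [0.2, 0.8], [0.0, 1.0]]
w = [[1.0, 0.0], [0.0, 1.0]]   # identity weights stand in for trained ones
b = [0.0, 0.0]

h1 = gcn_layer(adj, h0, w, b)
clusters, merges = agglomerate(h1, n_clusters=2)
```

The recorded `merges` list gives the order in which clusters were joined, which is the information from which parent-child levels of a hierarchy could be read off.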
S5 specifically comprises the following steps:
S51: firstly, defining a set of preset performance indexes for evaluating the effectiveness and efficiency of a data hierarchical structure, wherein the set of preset performance indexes comprise data access efficiency, data management convenience and hierarchical structure accuracy, and the data access efficiency is evaluated by measuring the time required for retrieving information in a constructed hierarchical structure; evaluating operational complexity in maintaining and updating data for data management convenience; evaluating the accuracy of the hierarchical structure by comparing the consistency of the predetermined classification and the actual classification of the data items in the hierarchical structure;
S52: the data hierarchy constructed in step S4 is evaluated against the performance indexes defined in S51, specifically using quantitative analysis to calculate the data access time and the number of operation steps, and using expert evaluation to verify the accuracy of the hierarchy;
S53: according to the results of the performance index evaluation, feedback information is collected and points for improvement in the hierarchy are identified; the optimization process is expressed as an adjustment function: $H_{new} = \mathrm{Optimize}(H_{old}, \mathrm{Feedback}, \alpha)$, wherein $H_{new}$ is the optimized hierarchy, $H_{old}$ is the current hierarchy, $\mathrm{Feedback}$ is the feedback from the performance index evaluation, and $\alpha$ is the learning rate, indicating how strongly the optimization responds to the feedback;
S54: the optimization result of step S53 is taken as new input and steps S52 and S53 are repeated until the performance indexes show that the hierarchy has reached the preset target, realizing continuous optimization of the hierarchical relationship;
According to the steps, the data hierarchical structure constructed by the method can be systematically evaluated and optimized based on the preset performance indexes, and can be dynamically adjusted according to feedback of the evaluation result, so that the high efficiency, convenience and accuracy of the data hierarchical structure in practical application are ensured.
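The control flow of S52-S54 can be sketched as the loop below. The method does not define how `Optimize` works internally, so everything concrete here is hypothetical: the hierarchy is stood in for by a single quality score between 0 and 1, the feedback is the remaining gap to an ideal score of 1.0, and the toy `optimize` closes a fraction `alpha` of that gap.

```python
def optimize_until_target(hierarchy, evaluate, optimize, target,
                          alpha=0.5, max_rounds=20):
    """Repeat evaluate -> feedback -> optimize until the target is met."""
    history = []
    for _ in range(max_rounds):
        score = evaluate(hierarchy)        # S52: measure performance indexes
        history.append(score)
        if score >= target:                # S54: preset target reached
            break
        feedback = 1.0 - score             # S53: hypothetical scalar feedback
        hierarchy = optimize(hierarchy, feedback, alpha)   # H_new
    return hierarchy, history


# Toy stand-ins: the "hierarchy" is just its own quality score.
final, history = optimize_until_target(
    hierarchy=0.4,
    evaluate=lambda h: h,
    optimize=lambda h_old, feedback, alpha: h_old + alpha * feedback,
    target=0.9,
)
```

A larger `alpha` responds more strongly to each round of feedback and reaches the target in fewer rounds; the `max_rounds` cap is an added safeguard for the case where the target is never reached.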
S6 specifically comprises the following steps:
S61: firstly defining a standardized data format for representing an optimized data hierarchical structure, and specifically using a JSON or XML format, wherein the format comprises information of each node in the hierarchical structure, and the information of the nodes comprises node names, node types, father nodes, child node lists and attributes of the nodes;
S62: the optimized data hierarchy is converted into the standardized format defined in S61, which involves traversing all nodes in the data hierarchy and serializing the node information according to the standardized format;
S63: a RESTful API is developed for outputting the serialized data hierarchy and integrating it into the enterprise data management system, the API comprising an upload interface, a query interface and an update interface, wherein: the upload interface allows a user to upload a serialized data hierarchy file; the query interface allows a user to query node information by node name or attribute; the update interface allows a user to update node information in the data hierarchy;
S64: the API interface in S63 is integrated in the enterprise data management system to support uploading, querying and updating the data hierarchy via the API interface, after which the system user will be able to operate and manage the data hierarchy directly via the user interface of the data management system.
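The node format of S61 and the traversal of S62 can be sketched as follows, using JSON as the output format. The `Node` class, the JSON key names and the sample hierarchy are assumptions for illustration; only the five kinds of node information (name, type, parent, child list, attributes) come from the method.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    node_type: str
    attributes: dict = field(default_factory=dict)
    children: list = field(default_factory=list)


def serialize(node, parent=None):
    """Depth-first traversal emitting one flat record per node."""
    records = [{
        "name": node.name,
        "type": node.node_type,
        "parent": parent,                            # None for the root
        "children": [c.name for c in node.children],
        "attributes": node.attributes,
    }]
    for child in node.children:
        records.extend(serialize(child, parent=node.name))
    return records


# Hypothetical three-level hierarchy.
root = Node("Customer", "domain", children=[
    Node("Contact", "category", children=[
        Node("email", "data_item", {"format": "string"}),
    ]),
    Node("Billing", "category"),
])

document = json.dumps(serialize(root), indent=2)
```

Storing both the parent and the child list on each record is redundant but lets a query interface answer lookups by node name without rebuilding the tree.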
Those of ordinary skill in the art will appreciate that: the discussion of any of the embodiments above is merely exemplary and is not intended to suggest that the scope of the invention is limited to these examples; the technical features of the above embodiments or in the different embodiments may also be combined within the idea of the invention, the steps may be implemented in any order and there are many other variations of the different aspects of the invention as described above, which are not provided in detail for the sake of brevity.
The present invention is intended to embrace all such alternatives, modifications and variances which fall within the broad scope of the appended claims. Therefore, any omission, modification, equivalent replacement, improvement, etc. of the present invention should be included in the scope of the present invention.

Claims (9)

1. The intelligent construction method for the enterprise data standard hierarchical relationship is characterized by comprising the following steps:
S1: collecting original data from various data sources in an enterprise through an efficient data interface technology, and identifying the content and the attribute of the original data by utilizing a natural language processing technology;
S2: cleaning and standardizing the data from S1, and increasing the diversity of the data by data enhancement techniques;
S3: performing deep classification of the data, based on the data attributes identified in S1 and the data features enhanced in S2, using an improved deep learning algorithm, ensuring the accuracy and fine granularity of the classification result;
S4: based on a graph convolutional network model, dynamically constructing and adjusting the hierarchical relationship between the data according to the classification result of S3 and the associations between the data;
S5: evaluating the data hierarchy constructed in the step S4 through a preset performance index, and continuously optimizing the hierarchy according to feedback of the performance index evaluation;
S6: outputting the optimized data standard hierarchical relationship in a standardized format, integrating the data standard hierarchical relationship into an enterprise data management system through an API interface, and supporting enterprise data management, analysis and decision making processes.
2. The method for intelligently constructing the hierarchical relationship of the enterprise data standard according to claim 1, wherein the step S1 specifically comprises:
S11: collecting raw data from the various data sources within the enterprise using a RESTful API as the data interface technology, the data sources comprising databases, log files and online transaction processing systems; specifically, for a database, data is acquired by issuing SQL queries through the RESTful API; for log files, a log management system is called through the API to collect the data; for an online transaction processing system, real-time transaction data is captured through the API;
S12: in the S11 data collection process, the JSON format is used for carrying out data serialization, so as to ensure that data collected from different data sources keep a uniform format in the transmission and processing processes;
S13: the method comprises the steps of applying a natural language processing technology, specifically adopting an entity recognition technology, automatically recognizing and extracting entities of names, places, dates and special terms from text data in a unified format, and providing basis for further processing and classification of the data;
S14: and classifying the subject of the collected data according to the extracted keywords and the entity by using a preset deep learning text classification model.
3. The method for intelligently constructing the hierarchical relationship of the enterprise data standard according to claim 2, wherein the step S14 specifically comprises:
S141: model selection and training data preparation, specifically selecting BERT, developed by Google, as the deep learning model for text classification, and preparing a training data set consisting of text data labeled with the keywords and entities extracted in step S13;
S142: model training, namely fine-tuning the BERT model on the prepared data set, with the loss function defined as the cross-entropy loss, which measures the difference between the topic prediction probabilities output by the model and the true topic label; parameters are optimized during training by back propagation and gradient descent, and the loss function is: $L = -\sum_{i=1}^{N} y_i \log(\hat{y}_i)$, wherein: $L$ is the loss function value; $N$ is the number of topic classes; $y_i$ is the value of the sample's true label distribution for the $i$-th topic; $\hat{y}_i$ is the probability the model predicts for the $i$-th topic;
S143: and (3) classifying the topics of the collected data by using the BERT model trained in the step (S142), wherein the input data are texts composed of the keywords and the entities extracted in the step (S13), and the model outputs the topic categories of the data.
4. The method for intelligently constructing the hierarchical relationship of the enterprise data standard according to claim 3, wherein the step S2 specifically comprises:
S21: firstly, carrying out data cleaning on the original data collected in the step S1, including removing repeated records, correcting obvious error values and processing missing values, and adopting an interpolation method or an average value based on similar records to replace the missing values;
S22: standardizing the cleaned data to eliminate scale differences between data sources and between orders of magnitude, the standardization comprising Z-score standardization and min-max standardization, ensuring the consistency and effectiveness of model training;
S23: applying data enhancement techniques to increase data diversity and reduce the risk of overfitting, the specific techniques comprising:
for text data, generating new data samples by synonym replacement, sentence recombination, and random insertion and deletion;
for numerical data, increasing the variability of the data by adding random noise and perturbing feature values;
S24: carrying out quality verification on the enhanced data set to ensure that the data enhancement has not introduced misleading bias, specifically by using automated scripts to check the consistency and integrity of the data.
5. The method for intelligently constructing the hierarchical relationship of the enterprise data standard according to claim 4, wherein the improved deep learning algorithm in S3 is a combination of a convolutional neural network and a long short-term memory network, and its specific application comprises:
S31: first, extracting features from the data enhanced in step S2 using the convolutional neural network, which automatically learns local features of the data through its convolutional layers and is used to extract effective information from image and text data; the convolutional neural network model is denoted $f_{CNN}(X; \Theta_{CNN})$, wherein $X$ is the input data set and $\Theta_{CNN}$ are the parameters of the convolutional neural network model;
S32: inputting the features extracted in step S31 into the long short-term memory network for sequence processing, the long short-term memory network being able to process and retain information over long sequences and being used to capture long-distance dependencies in time-series or text data; the long short-term memory network model is denoted $f_{LSTM}(Y; \Theta_{LSTM})$, wherein $Y = f_{CNN}(X; \Theta_{CNN})$ is the set of features extracted by the convolutional neural network and input to the long short-term memory network, and $\Theta_{LSTM}$ are the parameters of the long short-term memory network model;
S33: combining the data features processed by the convolutional neural network and the long short-term memory network for deep data classification, realized by adding a fully connected layer FC that maps the sequence-processed features to specific categories; the classification function is denoted $f_{FC}(Z; \Theta_{FC})$, wherein $Z = f_{LSTM}(Y; \Theta_{LSTM})$ and $\Theta_{FC}$ are the parameters of the fully connected layer;
S34: adjusting the model parameters $\Theta_{CNN}$, $\Theta_{LSTM}$ and $\Theta_{FC}$ through training, specifically using a cross-entropy loss function to measure classification accuracy and optimizing the parameters by the back propagation algorithm, improving the accuracy and fine granularity of the classification result.
6. The method for intelligently constructing the hierarchical relationship of the enterprise data standard according to claim 5, wherein the step S34 specifically comprises:
S341: during model training, a cross-entropy loss function measures the difference between the classification predicted by the model and the actual label; for a multi-class problem the cross-entropy loss is defined as: $L = -\sum_{i=1}^{N} \sum_{c=1}^{M} y_{ic} \log(p_{ic})$, wherein $N$ is the number of data samples, $M$ is the number of classes, $y_{ic}$ is an indicator variable equal to 1 when sample $i$ belongs to class $c$ and 0 otherwise, and $p_{ic}$ is the probability predicted by the model that sample $i$ belongs to class $c$;
S342: updating the model parameters by the back propagation algorithm according to the gradient of the cross-entropy loss function, so as to reduce the loss value; the update formula is: $\theta \leftarrow \theta - \alpha \nabla_{\theta} L$, wherein $\theta$ denotes the model parameters, $\alpha$ is the learning rate, and $\nabla_{\theta} L$ is the gradient of the loss function $L$ with respect to the parameters $\theta$;
S343: during training, dynamically adjusting the learning rate $\alpha$ according to the model's performance on the validation set, specifically using a learning rate decay strategy, so that the step size is not too large early in training and the model does not become trapped in a local minimum late in training;
S344: to prevent overfitting, using an early stopping strategy, wherein training is stopped when the model's performance on the validation set shows no significant improvement over several consecutive training epochs.
7. The method for intelligently constructing the hierarchical relationship of the enterprise data standard according to claim 6, wherein the step S4 specifically comprises:
S41: constructing a data association graph based on the S3 classification result and the known association between the data, wherein nodes represent data items, edges represent the association between the data items, the association of the data items is defined based on shared attributes and similarity measures, and the specific process for constructing the data association graph comprises the following steps:
first, each data item is considered as a node in the graph based on the data attributes identified by S1 and the S2 enhanced data features;
then, determining an association between the data items by analyzing the data attributes and features, the association defining edges between the nodes, the weights of the edges being defined according to the strength or similarity measure of the association;
S42: processing the data association graph using a graph convolutional network model, which learns node representations on graph-structured data from the information of each node and its neighbors; the layer function of the graph convolutional network model is: $H^{(l+1)} = \sigma\left(\tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2} H^{(l)} W^{(l)} + b^{(l)}\right)$, wherein $\tilde{A} = A + I_N$, $A$ is the adjacency matrix of the data association graph constructed in S41, $I_N$ is the identity matrix, which adds self-connections and strengthens the expression of each node's own features, $\tilde{D}$ is the degree matrix of $\tilde{A}$, whose element $\tilde{D}_{ii} = \sum_j \tilde{A}_{ij}$ is the degree of node $i$, $H^{(l)}$ is the node feature matrix of the $l$-th layer, with $H^{(0)}$ the initial node feature matrix for $l = 0$, $W^{(l)}$ and $b^{(l)}$ are respectively the weight matrix and bias vector of the $l$-th layer, learned through training, and $\sigma$ is a nonlinear activation function;
S43: using the node representations produced by the graph convolutional network model, combined with the classification results and the original features of the nodes, dynamically constructing and adjusting the hierarchical relationship between data items by a hierarchical clustering algorithm.
8. The method for intelligently constructing the hierarchical relationship of the enterprise data standard according to claim 7, wherein the step S5 specifically comprises:
S51: firstly, defining a set of preset performance indexes for evaluating the effectiveness and efficiency of a data hierarchical structure, wherein the set of preset performance indexes comprise data access efficiency, data management convenience and hierarchical structure accuracy, and the data access efficiency is evaluated by measuring the time required for retrieving information in a constructed hierarchical structure; evaluating operational complexity in maintaining and updating data for data management convenience; evaluating the accuracy of the hierarchical structure by comparing the consistency of the predetermined classification and the actual classification of the data items in the hierarchical structure;
S52: evaluating the data hierarchy constructed in step S4 against the performance indexes defined in S51, specifically using quantitative analysis to calculate the data access time and the number of operation steps, and using expert evaluation to verify the accuracy of the hierarchy;
S53: according to the results of the performance index evaluation, collecting feedback information and identifying points for improvement in the hierarchy, the optimization process being expressed as an adjustment function: $H_{new} = \mathrm{Optimize}(H_{old}, \mathrm{Feedback}, \alpha)$, wherein $H_{new}$ is the optimized hierarchy, $H_{old}$ is the current hierarchy, $\mathrm{Feedback}$ is the feedback from the performance index evaluation, and $\alpha$ is the learning rate, indicating how strongly the optimization responds to the feedback;
S54: taking the optimization result of step S53 as new input and repeating steps S52 and S53 until the performance indexes show that the hierarchy has reached the preset target.
9. The method for intelligently constructing the hierarchical relationship of the enterprise data standard according to claim 8, wherein the step S6 specifically comprises:
S61: firstly defining a standardized data format for representing an optimized data hierarchical structure, and specifically using a JSON or XML format, wherein the format comprises information of each node in the hierarchical structure, and the information of the nodes comprises node names, node types, father nodes, child node lists and attributes of the nodes;
S62: converting the optimized data hierarchy into a standardized format defined in S61;
S63: developing a RESTful API for outputting the serialized data hierarchy and integrating it into the enterprise data management system, the API comprising an upload interface, a query interface and an update interface, wherein: the upload interface allows a user to upload a serialized data hierarchy file; the query interface allows a user to query node information by node name or attribute; the update interface allows a user to update node information in the data hierarchy;
S64: the API interface in S63 is integrated in the enterprise data management system to support uploading, querying and updating the data hierarchy via the API interface, after which the system user will be able to operate and manage the data hierarchy directly via the user interface of the data management system.
CN202410235935.1A 2024-03-01 2024-03-01 Intelligent construction method for enterprise data standard hierarchical relationship Active CN117971808B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410235935.1A CN117971808B (en) 2024-03-01 2024-03-01 Intelligent construction method for enterprise data standard hierarchical relationship

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410235935.1A CN117971808B (en) 2024-03-01 2024-03-01 Intelligent construction method for enterprise data standard hierarchical relationship

Publications (2)

Publication Number Publication Date
CN117971808A true CN117971808A (en) 2024-05-03
CN117971808B CN117971808B (en) 2024-08-30

Family

ID=90856277

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410235935.1A Active CN117971808B (en) 2024-03-01 2024-03-01 Intelligent construction method for enterprise data standard hierarchical relationship

Country Status (1)

Country Link
CN (1) CN117971808B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118229032A (en) * 2024-05-22 2024-06-21 Shandong Zhonghan Software Co., Ltd. (山东中翰软件有限公司) Self-adaptive enterprise data management method and system based on business dynamic change

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9928448B1 (en) * 2016-09-23 2018-03-27 International Business Machines Corporation Image classification utilizing semantic relationships in a classification hierarchy
CN108875826A (en) * 2018-06-15 2018-11-23 Wuhan University A kind of multiple-limb method for checking object based on the compound convolution of thickness granularity

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
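Chen Junying: "Research and Application of Fine-Grained Image Classification Based on Regional Information Enhancement", CNKI Outstanding Master's Theses Full-text Database, Information Science and Technology Series, 15 January 2022 (2022-01-15), pages 1-64 *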

Also Published As

Publication number Publication date
CN117971808B (en) 2024-08-30

Similar Documents

Publication Publication Date Title
CN111324642A (en) Model algorithm type selection and evaluation method for power grid big data analysis
CN117971808B (en) Intelligent construction method for enterprise data standard hierarchical relationship
CN116894152B (en) Multisource data investigation and real-time analysis method
CN117151870B (en) Portrait behavior analysis method and system based on guest group
CN117372165A (en) Data supervision method and system based on financial wind control business
CN115358481A (en) Early warning and identification method, system and device for enterprise ex-situ migration
CN111625578A (en) Feature extraction method suitable for time sequence data in cultural science and technology fusion field
CN117453805B (en) Visual analysis method for uncertainty data
CN113920366A (en) Comprehensive weighted main data identification method based on machine learning
CN118115098A (en) Big data analysis and processing system based on deep learning
Li et al. An improved genetic-XGBoost classifier for customer consumption behavior prediction
Liu Discussion on the Enterprise Financial Risk Management Framework Based on AI Fintech
CN114820074A (en) Target user group prediction model construction method based on machine learning
CN114528378A (en) Text classification method and device, electronic equipment and storage medium
Ortega-Bastida et al. Regional gross domestic product prediction using twitter deep learning representations
Wang Big Data Mining Method of Marketing Management Based on Deep Trust Network Model
Li et al. [Retracted] A K-Means Clustering Algorithm for Early Warning of Financial Risks in Agricultural Industry
CN113742472B (en) Data mining method and device based on customer service marketing scene
CN112819205B (en) Method, device and system for predicting working hours
CN117593044B (en) Dual-angle marketing campaign effect prediction method, medium and system
Su et al. Research and Comparison of Random Forests and Neural Networks in Shanghai and Shenzhen Financial 20 Index Prediction
CN118035794A (en) Training method and system of small classification model based on cross-modal migration knowledge data
CN118037451A (en) Prediction method of financial transaction price
CN118797273A (en) Data analysis system, method, equipment and medium based on government industry large model
Uma et al. EVALUATION OF ENSEMBLE LEARNING APPROACH FOR OPTIMISED STOCK PREDICTION

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant