CN118052577A - Cloud platform-based data acquisition, processing and analysis system and method


Info

Publication number
CN118052577A
Authority
CN
China
Prior art keywords
feature vector
data
feature
transaction
transaction data
Legal status
Pending
Application number
CN202410453218.6A
Other languages
Chinese (zh)
Inventor
刘晓东
童沐雨
王立桥
冯思雨
Current Assignee
Information Technology Nanjing Co ltd
Original Assignee
Information Technology Nanjing Co ltd
Application filed by Information Technology Nanjing Co ltd filed Critical Information Technology Nanjing Co ltd
Priority to CN202410453218.6A
Publication of CN118052577A

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application relates to the field of cloud platform technology, and particularly discloses a cloud platform-based data acquisition, processing and analysis system and method. Transaction data are captured from the e-commerce platform in real time or at fixed intervals, features such as order details, transaction amount, commodity information and store evaluation are extracted, and an associated feature vector is constructed. The associated feature vector is then classified by a classifier to identify abnormal transaction behavior. The scheme can effectively monitor the large volume of transaction data on an e-commerce platform, discover illegal transactions such as fraud and deception in a timely manner, and ensure the transaction security of the e-commerce platform.

Description

Cloud platform-based data acquisition, processing and analysis system and method
Technical Field
The application relates to the field of cloud platform technology, in particular to a data acquisition, processing and analysis system and method based on a cloud platform.
Background
Electronic commerce (e-commerce) refers to carrying out transaction activities and related service activities in an electronic form over the Internet, intranets and value-added networks; it is the electronization and networking of every link of traditional business activities. Electronic commerce includes electronic money exchange, supply chain management, electronic trading markets, network marketing, online transactions, electronic data interchange (EDI), inventory management and automated data collection systems.
When e-commerce transactions are carried out, the platform needs to monitor transaction activity to prevent illegal transactions such as fraud and deception. With the rapid development of the Internet, e-commerce platforms have grown quickly and must now examine transaction activity at big-data scale; traditional detection methods are no longer applicable, and a monitoring system that meets big-data requirements and provides comprehensive monitoring of e-commerce transactions is needed.
Therefore, a cloud platform-based data acquisition, processing and analysis system and method are desired.
Disclosure of Invention
The present application has been made to solve the above technical problems. Embodiments of the application provide a cloud platform-based data acquisition, processing and analysis system and method.
According to one aspect of the present application, there is provided a cloud platform-based data acquisition, processing and analysis system, including:
The data acquisition module is used for acquiring transaction data on the electronic commerce platform and extracting order details, transaction amount, commodity information and store evaluation from the transaction data on the electronic commerce platform;
The data processing module is used for respectively encoding the order details, the transaction amount, the commodity information and the store evaluation to obtain order detail feature vectors, transaction amount feature vectors, commodity information feature vectors and store evaluation feature vectors;
The data analysis module is used for constructing transaction data text association feature vectors among the order detail feature vector, the transaction amount feature vector, the commodity information feature vector and the store evaluation feature vector, and carrying out coherent interference correction based on a category probability value on the transaction data text association feature vectors to obtain corrected transaction data text association feature vectors;
And the data result classification module is used for enabling the corrected transaction data text association feature vector to pass through a classifier to obtain a classification result, and the classification result is used for indicating whether the transaction data is abnormal or not.
In the data acquisition, processing and analysis system based on the cloud platform, the data processing module includes: an order detail feature extraction unit, configured to pass the order details through a semantic encoder that includes an embedded layer to obtain the order detail feature vector; a transaction amount feature extraction unit, configured to obtain the transaction amount feature vector by passing the transaction amount through the semantic encoder including the embedded layer; the commodity information feature extraction unit is used for enabling the commodity information to pass through the semantic encoder comprising the embedded layer so as to obtain the commodity information feature vector; and a store evaluation feature extraction unit for passing the store evaluation through the semantic encoder containing the embedded layer to obtain the store evaluation feature vector.
In the above data acquisition, processing and analysis system based on a cloud platform, the order detail feature extraction unit includes: the word segmentation processing subunit is used for segmenting the order details to obtain word sequences; a word cleaning subunit, configured to clean the word sequence to obtain a cleaned word sequence, where cleaning the word sequence includes at least one of: deleting repeated words, eliminating punctuation marks and deleting special symbols; a word embedding subunit, configured to map each word in the cleaned word sequence into a word vector by using an embedding layer of the semantic encoder to obtain a word vector sequence; and the semantic coding subunit is used for semantically coding the word vector sequence by using a Bert model of the semantic coder so as to generate the order detail feature vector.
In the data acquisition, processing and analysis system based on the cloud platform, the data analysis module includes: the data fusion unit is used for fusing the order detail feature vector, the transaction amount feature vector, the commodity information feature vector and the store evaluation feature vector to obtain a transaction data key feature matrix; the data convolution encoding unit is used for carrying out convolution encoding on the transaction data key feature matrix to obtain the transaction data text association feature vector; and the data optimization unit is used for carrying out coherent interference correction based on the category probability value on the transaction data text association characteristic vector so as to obtain the corrected transaction data text association characteristic vector.
In the data acquisition, processing and analysis system based on the cloud platform, the data fusion unit is configured to: and arranging the order detail feature vector, the transaction amount feature vector, the commodity information feature vector and the store evaluation feature vector according to parameter dimensions to obtain the transaction data key feature matrix.
In the above data acquisition processing analysis system based on a cloud platform, the data convolution encoding unit includes: the text convolution coding subunit is used for enabling the transaction data key feature matrix to pass through a text convolution neural network model serving as a feature extractor to obtain a transaction data text association feature map; and the associated feature map dimension reduction subunit is used for carrying out global average pooling on each feature matrix of the transaction data text associated feature map along the channel dimension so as to obtain the transaction data text associated feature vector.
In the above data acquisition, processing and analysis system based on a cloud platform, the data optimization unit includes: the pre-classifier subunit is used for enabling the transaction data text associated feature vector to pass through a pre-classifier to obtain a category probability feature vector; a covariance matrix calculating subunit, configured to calculate a covariance matrix between the transaction data text association feature vector and the category probability feature vector; the sub-unit of calculating the autocorrelation covariance matrix is used for calculating the autocorrelation covariance matrix of the text association feature vector of the transaction data; a computation interference correction matrix subunit configured to compute an interference correction matrix based on the covariance matrix and the autocorrelation covariance matrix; and the correction subunit is used for correcting the transaction data text association characteristic vector based on the interference correction matrix to obtain the corrected transaction data text association characteristic vector.
In the data acquisition, processing and analysis system based on the cloud platform, the data result classification module is configured to: processing the corrected transaction data text associated feature vector by using the classifier in the following formula to obtain the classification result;
Wherein, the formula is: $O=\operatorname{softmax}\{(W_n,B_n):\cdots:(W_1,B_1)\mid V\}$, wherein $W_1$ to $W_n$ are weight matrices, $B_1$ to $B_n$ are bias vectors, $V$ is the corrected transaction data text association feature vector, $\operatorname{softmax}$ represents the softmax function, and $O$ represents the classification result.
According to another aspect of the present application, there is also provided a data acquisition processing analysis method based on a cloud platform, including:
acquiring transaction data on an electronic commerce platform, and extracting order details, transaction amount, commodity information and store evaluation from the transaction data on the electronic commerce platform;
Encoding the order details, the transaction amount, the commodity information and the store evaluation to obtain order detail feature vectors, transaction amount feature vectors, commodity information feature vectors and store evaluation feature vectors;
constructing a transaction data text association feature vector among the order detail feature vector, the transaction amount feature vector, the commodity information feature vector and the store evaluation feature vector, and performing coherent interference correction based on a class probability value on the transaction data text association feature vector to obtain a corrected transaction data text association feature vector;
and passing the corrected transaction data text associated feature vector through a classifier to obtain a classification result, wherein the classification result is used for indicating whether the transaction data is abnormal or not.
In the above data acquisition, processing and analysis method based on a cloud platform, the step of obtaining an order detail feature vector, a transaction amount feature vector, a commodity information feature vector and a store evaluation feature vector by encoding the order detail, the transaction amount, the commodity information and the store evaluation respectively includes: passing the order details through a semantic encoder comprising an embedded layer to obtain the order detail feature vector; passing the transaction amount through the semantic encoder comprising an embedded layer to obtain the transaction amount feature vector; passing the commodity information through the semantic encoder comprising an embedded layer to obtain the commodity information feature vector; passing the store assessment through the semantic encoder comprising an embedded layer to obtain the store assessment feature vector.
Compared with the prior art, the cloud platform-based data acquisition, processing and analysis system and method provided by the application capture transaction data from the e-commerce platform in real time or at fixed intervals, extract features such as order details, transaction amount, commodity information and store evaluation, and construct an associated feature vector. The associated feature vector is then classified by a classifier to identify abnormal transaction behavior. The scheme can effectively monitor the large volume of transaction data on an e-commerce platform, discover illegal transactions such as fraud and deception in a timely manner, and ensure the transaction security of the e-commerce platform.
Drawings
The above and other objects, features and advantages of the present application will become more apparent from the following detailed description of embodiments of the present application with reference to the accompanying drawings. The accompanying drawings are included to provide a further understanding of embodiments of the application and are incorporated in and constitute a part of this specification; together with the embodiments of the application, they serve to illustrate the application and do not constitute a limitation of the application. In the drawings, like reference numerals generally refer to like parts or steps.
Fig. 1 is a schematic block diagram of a data acquisition, processing and analysis system based on a cloud platform according to an embodiment of the present application.
Fig. 2 is a schematic block diagram of a data processing module in a cloud platform-based data acquisition, processing and analysis system according to an embodiment of the present application.
Fig. 3 is a schematic block diagram of an order detail feature extraction unit in a cloud platform-based data acquisition, processing and analysis system according to an embodiment of the present application.
Fig. 4 is a flowchart of a data acquisition, processing and analysis method based on a cloud platform according to an embodiment of the present application.
Detailed Description
Various exemplary embodiments, features and aspects of the application will be described in detail below with reference to the drawings. In the drawings, like reference numbers indicate identical or functionally similar elements. Although various aspects of the embodiments are illustrated in the accompanying drawings, the drawings are not necessarily drawn to scale unless specifically indicated.
The word "exemplary" is used herein to mean "serving as an example, embodiment, or illustration. Any embodiment described herein as "exemplary" is not necessarily to be construed as preferred or advantageous over other embodiments.
Furthermore, in the following detailed description, numerous specific details are set forth in order to provide a better illustration of the application. It will be understood by those skilled in the art that the present application may be practiced without some of these specific details. In some instances, well known methods, procedures, components, and circuits have not been described in detail so as not to obscure the present application.
Furthermore, the terms "first," "second," and the like, are used for descriptive purposes only and are not to be construed as indicating or implying a relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defining "a first" or "a second" may explicitly or implicitly include one or more such feature. In the description of the present application, the meaning of "a plurality" is two or more, unless explicitly defined otherwise.
Fig. 1 illustrates a block diagram schematic of a cloud platform based data acquisition processing analysis system in accordance with an embodiment of the present application. As shown in fig. 1, a cloud platform-based data acquisition processing analysis system 100 according to an embodiment of the present application includes: the data acquisition module 110 is used for acquiring transaction data on an e-commerce platform and extracting order details, transaction amount, commodity information and store evaluation from the transaction data on the e-commerce platform; a data processing module 120, configured to encode the order details, the transaction amount, the commodity information, and the store evaluation to obtain an order details feature vector, a transaction amount feature vector, a commodity information feature vector, and a store evaluation feature vector, respectively; the data analysis module 130 is configured to construct a transaction data text association feature vector among the order detail feature vector, the transaction amount feature vector, the commodity information feature vector and the store evaluation feature vector, and perform coherent interference correction based on a category probability value on the transaction data text association feature vector to obtain a corrected transaction data text association feature vector; the data result classification module 140 is configured to pass the corrected transaction data text-related feature vector through a classifier to obtain a classification result, where the classification result is used to indicate whether the transaction data is abnormal.
In the embodiment of the present application, the data acquisition module 110 is configured to acquire transaction data on an e-commerce platform, and extract order details, transaction amount, commodity information, and store evaluation from the transaction data on the e-commerce platform. It should be appreciated that order details, transaction amounts, commodity information, and store evaluations are important features describing a transaction. By extracting and encoding these features, order detail feature vectors, transaction amount feature vectors, commodity information feature vectors, and store evaluation feature vectors can be obtained. These feature vectors are then used to construct the transaction data text association feature vector, which contains relevant information about all aspects of the transaction and can reflect the overall situation of the transaction. Finally, classifying the transaction data text association feature vector through a classifier makes it possible to identify abnormal transaction behaviors such as fraudulent transactions, deceptive transactions and illicit transactions. Therefore, obtaining the transaction data on the e-commerce platform and extracting order details, transaction amount, commodity information and store evaluation from it is the basis for constructing the transaction data text association feature vector and identifying abnormal transaction behaviors.
In the embodiment of the present application, the data processing module 120 is configured to encode the order details, the transaction amount, the commodity information, and the store evaluation to obtain an order details feature vector, a transaction amount feature vector, a commodity information feature vector, and a store evaluation feature vector, respectively. It will be appreciated that unifying the different types of data in the form of feature vectors helps simplify the data structure and make the data easier to process and analyze. Machine learning models typically require numeric data as input, which can be processed by encoding text data into feature vectors. By converting the text information into feature vectors, key features in the text, such as keywords in order details, the numerical value of transaction amount, the category of commodity information and the like, can be extracted, which helps the model to better understand the data. After the text data is encoded into the feature vector, the high-dimension text data can be subjected to dimension reduction processing, so that the computational complexity is reduced, and the model efficiency is improved. Machine learning models require numerical features as inputs for training, and by encoding text data into feature vectors, the model can learn the relationships and patterns between the data.
Specifically, in one embodiment of the present application, fig. 2 illustrates a schematic block diagram of a data processing module in a cloud platform-based data acquisition processing analysis system according to an embodiment of the present application. As shown in fig. 2, in the cloud platform-based data collection, processing and analysis system 100, the data processing module 120 includes: an order detail feature extraction unit 121, configured to pass the order details through a semantic encoder that includes an embedded layer to obtain the order detail feature vector; a transaction amount feature extraction unit 122, configured to pass the transaction amount through the semantic encoder including the embedded layer to obtain the transaction amount feature vector; a commodity information feature extraction unit 123 for passing the commodity information through the semantic encoder including an embedded layer to obtain the commodity information feature vector; a store evaluation feature extraction unit 124 for passing the store evaluation through the semantic encoder including the embedded layer to obtain the store evaluation feature vector.
Accordingly, in a specific example of the present application, the order details feature extracting unit 121 is configured to pass the order details through a semantic encoder including an embedded layer to obtain the order details feature vector. It will be appreciated that by using an embedded semantic encoder, textual information in order details may be converted into a continuous, dense vector representation that captures semantic relationships between words, helping to extract hidden semantic information in order details. The order details are encoded into low-dimensional vector representation through the embedded layer, so that the dimension of data is reduced, key information is reserved, the complexity of subsequent processing is reduced, and the calculation efficiency is improved. The semantic similarity between different order details can be captured through the order detail feature vector generated by the semantic encoder, which is very useful for subsequent similarity calculation, recommendation systems or text classification tasks. The order detail feature vectors generated by the semantic encoder can improve the generalization capability of the model, so that the model is better adapted to new order detail data, and the risk of overfitting is reduced.
Further, fig. 3 illustrates a schematic block diagram of an order detail feature extraction unit in the cloud platform-based data acquisition, processing and analysis system according to an embodiment of the present application. As shown in fig. 3, in the data processing module 120 of the cloud platform-based data acquisition, processing and analysis system 100, the order detail feature extraction unit 121 includes: a word segmentation processing subunit 1211, configured to segment the order details to obtain a word sequence; a word cleaning subunit 1212, configured to clean the word sequence to obtain a cleaned word sequence, where cleaning the word sequence includes at least one of the following operations: deleting repeated words, eliminating punctuation marks and deleting special symbols; a word embedding subunit 1213, configured to map each word in the cleaned word sequence into a word vector using an embedding layer of the semantic encoder to obtain a word vector sequence; and a semantic coding subunit 1214, configured to semantically encode the word vector sequence using a BERT model of the semantic encoder to generate the order detail feature vector.
Specifically, the word segmentation processing subunit 1211 is configured to segment the order details to obtain a word sequence. It should be understood that word segmentation is one of the basic steps of text preprocessing. The segmentation of the order details may divide the continuous text sequence into words or phrases, helping to eliminate noise and redundant information in some text. The word sequence after word segmentation can better represent the semantics and structure of the text, and is beneficial to the subsequent feature extraction work. Each word serves as a feature that can help the system better understand the meaning of the order details. Through word segmentation, the relation between words in order details can be captured better, and subsequent semantic analysis and understanding are facilitated. This is critical to identifying critical information and important features in an order. In natural language processing tasks, most models require as input a word or phrase. Therefore, the order details are segmented to provide satisfactory input data for subsequent model training and data analysis. The segmentation can convert text data into a simpler form, reduces the complexity of text processing, and is beneficial to the subsequent text cleaning and feature extraction processes.
Specifically, the word cleaning subunit 1212 is configured to clean the word sequence to obtain a cleaned word sequence, where cleaning the word sequence includes at least one of the following operations: deleting duplicates, eliminating punctuation marks, and deleting special symbols. It should be appreciated that some noise may be included in the text data, such as punctuation marks, special symbols, and repeated words, which are not significant for text processing and feature extraction. The washing operation helps to reduce the effects of these noise, making the text cleaner. Deleting the repeated words can reduce redundant information in the data, so that the text is more concise. This helps to reduce the amount of data, improve processing efficiency, and reduce the complexity of subsequent processing. Eliminating punctuation and special symbols can make the text more normalized and easier to process. This helps to improve the quality of the data and reduces problems that may occur in subsequent processing. The cleaning operation can keep the consistency of the text data and avoid confusion of words or symbols in different forms for subsequent processing. This helps to improve the accuracy of text processing and feature extraction. Clean text data can improve the performance and generalization ability of the model. The cleaned word sequence is more beneficial to the model to learn the effective characteristics of text data, so that the performance of the model on tasks is improved.
Further, the word embedding subunit 1213 is configured to map each word in the cleaned word sequence into a word vector using an embedding layer of the semantic encoder to obtain a word vector sequence. It should be appreciated that converting words into word vectors may better capture semantic relationships between words. Word vectors are a dense representation of values that effectively express the semantic information of words, thereby helping the system to better understand and process text data. Converting words into word vectors may map high-dimensional text data into a low-dimensional vector space, thereby reducing the dimensionality and complexity of the data. This helps to improve the efficiency of text processing and model training. Most natural language processing models and machine learning models require as input a numerical vector. Converting words into word vectors may provide satisfactory input data for subsequent model training and data analysis. The word vector contains semantic information of words and can be input into the model as text features. By learning word vectors, the system can better understand text data, supporting subsequent feature extraction and model training. Based on the representation of the word vectors, the similarity between words can be calculated, thereby supporting semantic analysis and understanding of the text. This is very helpful for identifying key information in text and making text similarity comparisons.
Still further, the semantic coding subunit 1214 is configured to use a BERT model of the semantic encoder to semantically encode the word vector sequence to generate the order detail feature vector. It should be appreciated that BERT is a pre-trained deep bidirectional Transformer model that is capable of capturing rich contextual information in text. By encoding the word vector sequence with BERT, the associations and context between words in the text data can be better understood, thereby generating feature vectors with richer semantics. BERT is pre-trained on a large-scale corpus and learns rich semantic information; encoding a sequence of word vectors with BERT transforms words into vector representations with high semantic content, helping to extract important features from the text data. BERT performs well on various natural language processing tasks and has strong modeling and generalization capabilities, so encoding the word vector sequence with BERT can improve the quality of the feature representation and thereby the performance and accuracy of subsequent models. BERT is a general pre-trained language model that can be tailored to a specific task by fine-tuning; inputting the word vector sequence of the order details into the BERT model for fine-tuning allows the model to better understand the semantics and characteristics of the order data and generate more meaningful feature vectors. The BERT model can extract rich feature representations from the text data, including semantic relationships between words and contextual information; by encoding the word vector sequence with BERT, important features in the order details can be extracted, providing strong support for subsequent feature engineering and modeling.
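As an illustration of the processing performed by the word segmentation, word cleaning, word embedding and semantic coding subunits, a minimal Python sketch of the order detail encoding pipeline is given below. It assumes a publicly available Chinese BERT checkpoint ("bert-base-chinese") loaded through the Hugging Face transformers library and uses mean pooling over the last hidden states to obtain a single order detail feature vector; the checkpoint name, the cleaning rules and the pooling choice are illustrative assumptions rather than details specified by the application.

```python
# A minimal sketch of the order-detail encoding pipeline (segmentation/cleaning,
# embedding, BERT-based semantic coding). The checkpoint name, cleaning rules and
# mean pooling are illustrative assumptions, not details from the application.
import re

import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")  # assumed checkpoint
encoder = BertModel.from_pretrained("bert-base-chinese")
encoder.eval()


def clean_text(text: str) -> str:
    """Drop punctuation and special symbols, collapse repeated whitespace."""
    text = re.sub(r"[^\w\u4e00-\u9fff]+", " ", text)
    return re.sub(r"\s+", " ", text).strip()


@torch.no_grad()
def encode_field(text: str) -> torch.Tensor:
    """Clean a text field, tokenize it, and return one feature vector for it."""
    inputs = tokenizer(clean_text(text), return_tensors="pt",
                       truncation=True, max_length=128)
    hidden = encoder(**inputs).last_hidden_state   # (1, seq_len, 768)
    return hidden.mean(dim=1).squeeze(0)           # mean-pool -> (768,)


order_detail_vec = encode_field("订单编号20240101，2件商品，顺丰快递")  # hypothetical order text
```

In practice the same encode_field sketch could be reused for the transaction amount, commodity information and store evaluation fields, matching the shared semantic encoder described above.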
Accordingly, in a specific example of the present application, the transaction amount feature extraction unit 122 is configured to pass the transaction amount through the semantic encoder including the embedding layer to obtain the transaction amount feature vector. It should be appreciated that the transaction amount, as a numerical feature, is combined with the text information (which is encoded into word vectors through the embedding layer), so that different types of information are fused together and the model can comprehensively consider the influence of both the text information and the transaction amount on the task. This provides a more comprehensive feature representation and improves the performance of the model. Although the transaction amount is a numerical feature, it also carries certain semantic information; for example, the amount of a transaction may reflect the importance of the transaction or the consumer's spending habits. By converting the transaction amount into a feature vector through the semantic encoder, this latent semantic information can be better captured and the model's understanding of the transaction amount improved. After the transaction amount is expressed as a feature vector, it can be combined with the text information to form a richer and more diverse feature expression, which helps the model learn more representative features and improves its characterization and generalization ability. By converting the transaction amount into a feature vector, the model can also better learn the association between the text information and the transaction amount, improving its understanding and modeling of the data and its prediction accuracy in the task.
Accordingly, in a specific example of the present application, the commodity information feature extraction unit 123 is configured to pass the commodity information through the semantic encoder including an embedded layer to obtain the commodity information feature vector. It should be appreciated that merchandise information generally contains rich semantic and contextual information such as merchandise names, descriptions, categories, and the like. By inputting the commodity information into the semantic encoder of the embedded layer, the text information can be converted into a vector representation with more semantic information, which is helpful for capturing important features in the commodity information. The semantic encoder such as BERT models are pre-trained on large-scale text data, and can learn rich semantic information and feature representations. By encoding commodity information by using a semantic encoder, important features in the commodity information can be extracted, and powerful support is provided for subsequent feature engineering and modeling. The merchandise information may relate to a plurality of text fields such as a merchandise name, description, label, etc. The semantic encoder can capture semantic association among different text fields, is helpful for comprehensively considering information of different parts, and generates more comprehensive and accurate commodity feature vectors. By using the semantic encoder to generate the commodity information feature vector, the quality of the feature representation can be improved, and the understanding capability of the model to commodity information can be enhanced, so that the performance and effect of subsequent tasks (such as commodity recommendation, classification and the like) are improved. The semantic encoder such as BERT has strong migration learning capability, and can achieve good effects on various natural language processing tasks. Commodity information is input into a semantic encoder for feature extraction, and knowledge and representation capacity of a pre-trained model can be utilized to improve generalization capacity of the model.
Accordingly, in a specific example of the present application, the store evaluation feature extraction unit 124 is configured to pass the store evaluation through the semantic encoder including the embedded layer to obtain the store evaluation feature vector. It should be appreciated that store ratings typically contain rich semantic information such as customer ratings for goods, services, emotional colors, etc. By inputting the store evaluation text into the semantic encoder, semantic information in the text can be extracted, converted into a vector representation, and key features in the evaluation captured. The semantic encoder can learn rich semantic information and feature representation through the pre-training of large-scale text data. By inputting store evaluations into the semantic encoder, feature vectors that are more semantic information can be obtained, helping to distinguish differences and meanings between different evaluations. Store evaluations often include emotional tendencies of customers, such as satisfaction, frustration, and the like. The semantic encoder can help capture the emotion information, so that emotion analysis and emotion classification tasks are realized, and stores are helped to know the evaluation and attitude of customers to services of the stores. After the shop evaluation is converted into the feature vector, the feature vector can be used for various natural language processing tasks such as text classification, emotion analysis, similarity calculation and the like. These tasks are of great significance to both store operators and platforms, helping to understand user needs, improve quality of service, etc. The semantic encoder has strong generalization capability in terms of text understanding, and can process text data in different fields. The store evaluation is input into the semantic encoder, and the generalization capability of the model on store evaluation data can be improved by using the learned general semantic information.
In the embodiment of the present application, the data analysis module 130 is configured to construct the transaction data text association feature vector among the order detail feature vector, the transaction amount feature vector, the commodity information feature vector and the store evaluation feature vector, and perform coherent interference correction based on the category probability value on the transaction data text association feature vector to obtain a corrected transaction data text association feature vector. It will be appreciated that combining information in different dimensions, such as order details, transaction amounts, commodity information and store evaluations, provides a more comprehensive and richer characterization of the data. By constructing the transaction data text association feature vector, the associations and potential patterns among different data can be captured, further improving the expressive power of the data. Transaction data are generally correlated, and there are certain associations between order details, transaction amounts, commodity information and store evaluations. By constructing the association feature vector, the system can better understand the contextual information between the data, helping the model understand and predict transaction behavior more accurately. Data of different dimensions contain different types of information, such as text information, numerical information and category information; combining this information into an association feature vector provides a richer and more diverse data characterization, which helps improve the generalization ability and performance of the model. Integrating information from different data sources into the association feature vector helps the model learn more representative features, improving its characterization and prediction capabilities and its effectiveness on complex tasks. In some tasks, information from different data sources must be considered comprehensively to complete the task effectively; constructing the transaction data text association feature vector provides more comprehensive information support for these tasks and helps the system better understand and process the data.
Accordingly, in one embodiment of the present application, the data analysis module 130 includes: the data fusion unit is used for fusing the order detail feature vector, the transaction amount feature vector, the commodity information feature vector and the store evaluation feature vector to obtain a transaction data key feature matrix; the data convolution encoding unit is used for carrying out convolution encoding on the transaction data key feature matrix to obtain the transaction data text association feature vector; and the data optimization unit is used for carrying out coherent interference correction based on the category probability value on the transaction data text association characteristic vector so as to obtain the corrected transaction data text association characteristic vector.
Accordingly, in a specific example of the present application, the data fusion unit is configured to fuse the order details feature vector, the transaction amount feature vector, the commodity information feature vector, and the store evaluation feature vector to obtain a transaction data key feature matrix. It should be appreciated that feature vectors such as order details, transaction amounts, merchandise information, store evaluations, etc. contain information in different ways that can be fused to provide a more comprehensive and richer representation of data. By fusing the feature vectors, the relevance and potential modes between different information can be captured, and richer input is provided for the model. The feature vectors are fused into a transaction data key feature matrix, so that the dimension of data can be effectively reduced, and the data representation is simplified. This helps reduce model complexity, reduces computational costs, and improves model efficiency and performance. Certain association relation may exist between different feature vectors, and information interaction and information sharing between different features can be promoted by fusing the feature vectors. The method is helpful for the model to better understand the relevance between the data, and improves the generalization capability and the prediction accuracy of the model. Fusing different feature vectors can help the model learn more representative features, thereby improving the characterization capability and the prediction capability of the model. This helps the model better understand and utilize the transaction data, improving the performance and effectiveness of the model. In some tasks, information from different data sources needs to be comprehensively considered to effectively complete the task. The order detail feature vector, the transaction amount feature vector, the commodity information feature vector and the store evaluation feature vector are fused, so that more comprehensive information support can be provided for the tasks, and the system is helped to better understand and process the data.
Specifically, the data fusion unit is configured to: and arranging the order detail feature vector, the transaction amount feature vector, the commodity information feature vector and the store evaluation feature vector according to parameter dimensions to obtain the transaction data key feature matrix.
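As a brief illustration of this arrangement step, the four field feature vectors may be stacked along the parameter dimension to form the transaction data key feature matrix, as in the following sketch; the assumption that all four vectors share the same dimension (for example 768, as produced by a BERT-style encoder) is made only for illustration.

```python
# A minimal sketch of the arrangement step: stack the four field feature vectors
# along the parameter dimension to form the transaction data key feature matrix.
# The shared dimension d = 768 is an illustrative assumption.
import torch


def fuse_features(order_vec: torch.Tensor, amount_vec: torch.Tensor,
                  goods_vec: torch.Tensor, shop_vec: torch.Tensor) -> torch.Tensor:
    """Arrange the four feature vectors into a (4, d) key feature matrix."""
    return torch.stack([order_vec, amount_vec, goods_vec, shop_vec], dim=0)


d = 768
key_matrix = fuse_features(*(torch.randn(d) for _ in range(4)))  # random stand-ins, shape (4, 768)
```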
Accordingly, in a specific example of the present application, the data convolution encoding unit is configured to convolutionally encode the transaction data key feature matrix to obtain the transaction data text associated feature vector. It should be appreciated that Convolutional Neural Networks (CNNs) have good feature extraction capabilities when processing text data, enabling key features to be extracted from the data. By applying the transaction data key feature matrix to convolutional coding, the model can be helped to learn important features in the data, and the characterization capability of the data is improved. The convolution operation is able to capture local correlations in the input data, which is very helpful for understanding the local patterns and structures in the text data. The model can better capture local associated information in the transaction data key feature matrix through convolutional coding, so that the understanding capability of the model on text data is improved. The convolutional neural network reduces the number of parameters to be learned through a parameter sharing mode, and improves the efficiency and generalization capability of the model. This feature enables convolutional encoding to efficiently extract features when processing text data and helps avoid over-fitting problems. The convolution operation has the property of translational invariance, i.e. it can recognize the same pattern regardless of the position of the feature in the input. This is very helpful in handling the order invariance of words in the text data and helps to improve the modeling ability of the model on the text data.
Specifically, the data convolution encoding unit includes: the text convolution coding subunit is used for enabling the transaction data key feature matrix to pass through a text convolution neural network model serving as a feature extractor to obtain a transaction data text association feature map; and the associated feature map dimension reduction subunit is used for carrying out global average pooling on each feature matrix of the transaction data text associated feature map along the channel dimension so as to obtain the transaction data text associated feature vector.
Further, the convolution encoding subunit is configured to pass the transaction data key feature matrix through a text convolution neural network model serving as a feature extractor to obtain a transaction data text association feature map. It should be appreciated that convolutional neural networks are capable of efficiently extracting local features and structural information in text when processing text data. By inputting the key feature matrix of the transaction data into the text convolutional neural network model, the feature extraction capability of the convolutional neural network model can be utilized to extract important features in the transaction data text, including features in terms of vocabulary, semantics, grammar and the like. The convolutional neural network model may progressively extract abstract feature representations of the data through multiple convolutional layers and pooling layers. This manner of hierarchical representation facilitates the model to learn progressively more complex feature representations of the data, thereby better characterizing the relevance and features of the transaction data text. The convolutional neural network model can capture local relevance among words in text processing, and is helpful for understanding local patterns and structures in text data. The characteristics extracted by the text convolutional neural network model can better represent the association relation between different parts in the transaction data text, and the understanding capability of the model on the text data is improved. The parameter sharing mechanism in the convolutional neural network model can reduce the number of parameters to be learned, and improve the efficiency and generalization capability of the model. By using the convolutional neural network model as a feature extractor, key features of the transaction data text can be effectively extracted, and the problem of overfitting is avoided. The text convolutional neural network model obtains good performance in a natural language processing task, and by applying the text convolutional neural network model to the extraction of the text characteristics of transaction data, the training process of the model can be quickened and the performance of the model can be improved by means of the knowledge and the representation capacity of the trained text convolutional neural network model.
Further, the associated feature map dimension reduction subunit is configured to perform global average pooling on each feature matrix of the transaction data text associated feature map along the channel dimension to obtain the transaction data text associated feature vector. It should be appreciated that in practical applications the number of channels may be very large, and global average pooling converts the feature matrix of each channel into a single value, thereby reducing the dimensionality of the data. This helps reduce the complexity of the data, improves computational efficiency, and can reduce the risk of model overfitting. Global average pooling preserves the important feature information in each channel: by averaging the feature matrix of each channel, the average feature within each channel is extracted, so that key information is retained and the effectiveness of the data is maintained. The global average pooling operation is translation-invariant, i.e., it recognizes the same pattern regardless of the position of the feature in the input, which helps the model capture the overall features in the text data independently of their specific location. Global average pooling also reduces the number of parameters of the model and helps avoid overfitting: by reducing the dimensionality of the feature map, the complexity of the model can be effectively reduced and its generalization ability improved while important information is retained, improving the model's performance on unseen data. Finally, global average pooling helps the model capture global information of the entire feature map, not just local features, which helps the model better understand the relevance and characteristics of the entire transaction data text and improves its performance on text data tasks.
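A minimal sketch of the convolutional encoding and global average pooling steps is given below. It treats the (4, d) key feature matrix as a length-4 sequence of d-dimensional field vectors and applies a one-dimensional text convolution followed by global average pooling of each channel; the kernel size, the channel count and the one-dimensional formulation are assumptions made for illustration.

```python
# A minimal sketch of the convolutional encoding and global-average-pooling steps.
# The (4, d) key feature matrix is treated as a length-4 sequence of field vectors;
# kernel size, channel count and the 1-D formulation are assumptions.
import torch
import torch.nn as nn


class TextConvEncoder(nn.Module):
    """Text CNN feature extractor followed by global average pooling."""

    def __init__(self, d: int = 768, channels: int = 256, kernel_size: int = 3):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(d, channels, kernel_size, padding=kernel_size // 2),
            nn.ReLU(),
            nn.Conv1d(channels, channels, kernel_size, padding=kernel_size // 2),
            nn.ReLU(),
        )

    def forward(self, key_matrix: torch.Tensor) -> torch.Tensor:
        x = key_matrix.t().unsqueeze(0)            # (4, d) -> (1, d, 4) for Conv1d
        feature_map = self.conv(x)                 # (1, channels, 4)
        return feature_map.mean(dim=2).squeeze(0)  # global average pool -> (channels,)


assoc_vec = TextConvEncoder()(torch.randn(4, 768))  # transaction data text association feature vector
```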
Accordingly, in a specific example of the present application, the data optimization unit is configured to perform coherent interference correction based on a class probability value on the transaction data text associated feature vector to obtain the corrected transaction data text associated feature vector. It should be appreciated that, in particular, in the above technical solution, the transaction data text is encoded into the transaction data text associated feature vector, and then the classification judgment is performed by the classifier. However, the transaction data text may contain information unrelated to the classification task, such as noise or interference. Such extraneous information may negatively impact the performance of the classifier, reducing the accuracy of the classification. In particular, extraneous information (such as noise or interference) may interfere with the learning process of the classifier, resulting in erroneous decisions by the classifier during training and prediction. This may reduce the accuracy of the classification, making the classification result unreliable. If the transaction data text-related feature vectors contain a large amount of information unrelated to the classification task, the model may overfit the noise data rather than actually classifying the task-related features. This results in the model performing well on the training set, but not on the unseen data. Text-related feature vectors for transaction data containing extraneous information may prevent the model from generalizing to new, unseen data. Models may not be accurately classified in the face of new data because the model is overly dependent on noise rather than real features. The text-related feature vectors of transaction data containing a large amount of irrelevant information increase the computational complexity of the model, increasing the time cost of model training and reasoning, and increasing the storage requirements of the model. In order to solve the problem, in the technical scheme of the application, the transaction data text associated feature vector is subjected to coherent interference correction based on the class probability value to obtain a corrected transaction data text associated feature vector.
Specifically, the data optimization unit includes: the pre-classifier subunit is used for enabling the transaction data text associated feature vector to pass through a pre-classifier to obtain a category probability feature vector; a covariance matrix calculating subunit, configured to calculate a covariance matrix between the transaction data text association feature vector and the category probability feature vector; the sub-unit of calculating the autocorrelation covariance matrix is used for calculating the autocorrelation covariance matrix of the text association feature vector of the transaction data; a computation interference correction matrix subunit configured to compute an interference correction matrix based on the covariance matrix and the autocorrelation covariance matrix; and the correction subunit is used for correcting the transaction data text association characteristic vector based on the interference correction matrix to obtain the corrected transaction data text association characteristic vector.
Further, based on the covariance matrix and the autocorrelation covariance matrix, the interference correction matrix is calculated in the following formula:

$M = \left(\Sigma_{VV} + \lambda I\right)^{-1}\Sigma_{VP}$

wherein $V$ represents the transaction data text association feature vector, $P$ represents the category probability feature vector, $\Sigma_{VP}$ represents the covariance matrix between the transaction data text association feature vector and the category probability feature vector, $\Sigma_{VV}$ represents the autocorrelation covariance matrix of the transaction data text association feature vector, $I$ represents the identity matrix, $\lambda$ represents a predetermined hyper-parameter used to ensure the invertibility of the covariance matrix, and $M$ represents the interference correction matrix.
In order to solve the technical problems, in the technical scheme of the application, the related feature vector of the transaction data text is subjected to coherent interference correction based on the class probability value to obtain the related feature vector of the corrected transaction data text, the information irrelevant to classification tasks in the class probability value is removed through the coherent interference correction, and the class probability feature vector is obtained through a pre-classifier, namely, the probability of each class is predicted. And then, calculating a covariance matrix between the transaction data text association feature vector and the category probability feature vector and an autocorrelation covariance matrix of the transaction data text association feature vector so as to know the relation and the change degree between the transaction data text association feature vector and the category probability feature vector. Based on the covariance matrix and the autocorrelation covariance matrix, an interference correction matrix is further calculated. The correction matrix is used for removing information irrelevant to classification tasks in the class probability values, and retaining and highlighting features relevant to the classification tasks. Further, an interference correction matrix is applied to the transaction data text associated feature vector to obtain a corrected transaction data text associated feature vector, and the corrected transaction data text associated feature vector only contains information related to classification tasks, so that the transaction data text associated feature vector can concentrate on truly important features, and the performance of the classification model is improved.
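A hedged sketch of this correction procedure is given below. It follows the steps listed above (pre-classifier, covariance calculation, interference correction matrix, correction); the correction-matrix form $M=(\Sigma_{VV}+\lambda I)^{-1}\Sigma_{VP}$ follows the reconstruction given earlier, while the single-sample outer-product covariance estimates and the final correction rule are plausible assumptions rather than details specified by the application.

```python
# A hedged sketch of the coherent interference correction. The correction-matrix
# form M = (Σ_VV + λI)^{-1} Σ_VP follows the reconstruction above; the single-sample
# outer-product covariance estimates and the final correction rule (M @ p) are
# plausible assumptions, not details specified by the application.
import torch
import torch.nn.functional as F


def coherent_interference_correction(v: torch.Tensor,
                                     pre_classifier: torch.nn.Module,
                                     lam: float = 1e-3) -> torch.Tensor:
    """Correct the text association feature vector v (shape (d,)) using the
    category probability feature vector produced by a pre-classifier."""
    p = F.softmax(pre_classifier(v), dim=-1)     # category probability feature vector, (k,)
    v_c, p_c = v - v.mean(), p - p.mean()        # center before covariance estimates
    cov_vp = torch.outer(v_c, p_c)               # covariance between v and p, (d, k)
    cov_vv = torch.outer(v_c, v_c)               # autocorrelation covariance of v, (d, d)
    eye = torch.eye(v.numel())
    m = torch.linalg.solve(cov_vv + lam * eye, cov_vp)  # M = (Σ_VV + λI)^{-1} Σ_VP
    return m @ p                                 # corrected feature vector, (d,)


pre_cls = torch.nn.Linear(256, 2)                # hypothetical 2-class pre-classifier
corrected_vec = coherent_interference_correction(torch.randn(256), pre_cls)
```

The regularization term $\lambda I$ keeps the autocorrelation covariance matrix invertible even when it is estimated from a single feature vector.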
In this embodiment of the present application, the data result classification module 140 is configured to pass the corrected transaction data text associated feature vector through a classifier to obtain a classification result, where the classification result is used to indicate whether the transaction data is abnormal. It should be appreciated that the classifier may be used for anomaly detection tasks and that by inputting the transaction data text-related feature vectors into the classifier for classification, a determination may be made as to whether the transaction data is normal. Anomaly detection is a very important task in the financial field that can help detect potential fraudulent behavior or anomalous transactions. The classifier is able to learn and identify patterns and rules in the data. By training the classifier, the classifier can learn the characteristics of normal transaction data, so that abnormal transactions can be distinguished. The classifier can help to identify transaction data which is inconsistent with the normal mode, so that the accuracy of anomaly detection is improved. By using the classifier to detect the abnormality, the automatic processing can be realized, the efficiency is improved, and the manual intervention is reduced. The classifier can quickly classify a large amount of transaction data, so that potential abnormal conditions can be found in time. The trained classifier has certain generalization capability and can adapt to different transaction data conditions. This means that the classifier can make a reasonable classification judgment also in the face of new transaction data, thereby improving the reliability of anomaly detection.
Accordingly, in one embodiment of the present application, the data result classification module is configured to: process the corrected transaction data text associated feature vector with the classifier according to the following formula to obtain the classification result;
wherein the formula is: $O=\mathrm{softmax}\{(W_{n},B_{n}):\cdots:(W_{1},B_{1})\mid X\}$, where $W_{1}$ to $W_{n}$ are weight matrices, $B_{1}$ to $B_{n}$ are bias vectors, $X$ is the corrected transaction data text associated feature vector, $\mathrm{softmax}$ represents the softmax function, and $O$ represents the classification result.
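The formula above corresponds to one or more fully connected layers followed by a softmax over the two classes (normal / abnormal). The sketch below is a minimal NumPy rendering of that computation; the number of layers, the dimensions, and the random weights are placeholder assumptions for illustration.

```python
import numpy as np

def softmax(z: np.ndarray) -> np.ndarray:
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def classify(x: np.ndarray, weights, biases) -> np.ndarray:
    """Apply the stacked fully connected layers (W1, B1) ... (Wn, Bn) to the
    corrected transaction data text associated feature vector x, then softmax."""
    h = x
    for W, B in zip(weights, biases):
        h = h @ W.T + B          # one fully connected projection
    return softmax(h)            # class probabilities: [P(normal), P(abnormal)]

# Illustrative usage with assumed dimensions (feature dim 256, hidden 64, 2 classes)
rng = np.random.default_rng(0)
x = rng.normal(size=(1, 256))
weights = [rng.normal(size=(64, 256)) * 0.05, rng.normal(size=(2, 64)) * 0.05]
biases = [np.zeros(64), np.zeros(2)]
probs = classify(x, weights, biases)
label = "abnormal" if probs[0, 1] > 0.5 else "normal"
print(label, probs)
```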
In summary, the cloud platform-based data acquisition, processing and analysis system and method provided by the embodiments of the application capture transaction data from the e-commerce platform in real time or at fixed intervals, extract features such as order details, transaction amount, commodity information and store evaluation, and construct the associated feature vector from them. The associated feature vector is then classified by a classifier to identify abnormal transaction behavior. The scheme can effectively monitor the large volume of transaction data on the e-commerce platform, discover fraudulent, deceptive and other illegal transactions in time, and safeguard the transaction security of the e-commerce platform.
As described above, the cloud platform-based data acquisition, processing and analysis system 100 according to the embodiment of the present application may be implemented in various terminal devices, for example a server running the cloud platform-based data acquisition, processing and analysis system. In one example, the cloud platform-based data acquisition, processing and analysis system 100 may be integrated into the terminal device as a software module and/or a hardware module. For example, the system 100 may be a software module in the operating system of the terminal device, or an application developed for the terminal device; of course, the cloud platform-based data acquisition, processing and analysis system 100 may also be one of many hardware modules of the terminal device.
Alternatively, in another example, the cloud platform-based data acquisition, processing and analysis system 100 and the terminal device may be separate devices, in which case the system 100 may be connected to the terminal device through a wired and/or wireless network and transmit interaction information in an agreed data format.
Fig. 4 is a flowchart of a data acquisition, processing and analysis method based on a cloud platform according to an embodiment of the present application. As shown in fig. 4, the data acquisition, processing and analysis method based on the cloud platform according to the embodiment of the application includes the following steps: s110, acquiring transaction data on an electronic commerce platform, and extracting order details, transaction amount, commodity information and store evaluation from the transaction data on the electronic commerce platform; s120, the order details, the transaction amount, the commodity information and the store evaluation are respectively encoded to obtain order detail feature vectors, transaction amount feature vectors, commodity information feature vectors and store evaluation feature vectors; s130, constructing a transaction data text association feature vector among the order detail feature vector, the transaction amount feature vector, the commodity information feature vector and the store evaluation feature vector, and performing coherent interference correction based on a category probability value on the transaction data text association feature vector to obtain a corrected transaction data text association feature vector; and S140, the corrected transaction data text association feature vector is passed through a classifier to obtain a classification result, wherein the classification result is used for indicating whether the transaction data is abnormal or not.
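For orientation, the sketch below chains steps S110-S140 with simple stand-in functions; every component (the hash-based "encoder", the mean-pooling "text CNN", the identity "correction") is a hypothetical placeholder used only to show the data flow, not the modules described in the embodiments.

```python
import numpy as np

DIM = 128  # assumed feature dimension for this sketch

def encode(text: str) -> np.ndarray:
    """Stand-in for the semantic encoder with embedding layer (S120)."""
    local = np.random.default_rng(abs(hash(text)) % (2 ** 32))
    return local.normal(size=DIM)

def fuse(vectors) -> np.ndarray:
    """S130: arrange the four field vectors along the parameter dimension."""
    return np.stack(vectors, axis=0)          # (4, DIM) key feature matrix

def conv_encode(matrix: np.ndarray) -> np.ndarray:
    """Stand-in for the text CNN feature extractor plus global average pooling (S130)."""
    return matrix.mean(axis=0)                # (DIM,) associated feature vector

def correct(vector: np.ndarray) -> np.ndarray:
    """Stand-in for the coherent interference correction of S130 (identity here)."""
    return vector

def classify(vector: np.ndarray) -> np.ndarray:
    """Stand-in for the S140 classifier: one linear layer plus softmax over {normal, abnormal}."""
    w = np.random.default_rng(0).normal(size=(2, DIM)) * 0.05
    logits = w @ vector
    e = np.exp(logits - logits.max())
    return e / e.sum()

def analyze_transaction(order_details, amount, item_info, store_review):
    vectors = [encode(t) for t in (order_details, amount, item_info, store_review)]  # S120
    associated = conv_encode(fuse(vectors))                                          # S130
    corrected = correct(associated)                                                  # S130
    probs = classify(corrected)                                                      # S140
    return {"abnormal": bool(probs[1] > 0.5), "probabilities": probs.tolist()}

print(analyze_transaction("order #123, 2 items", "transaction amount 89.00",
                          "commodity: headphones", "store rating 4.8, 1200 reviews"))
```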
Here, it will be understood by those skilled in the art that the specific operations of the respective steps in the above-described cloud platform-based data acquisition processing analysis method have been described in detail in the above description of the cloud platform-based data acquisition processing analysis system with reference to fig. 1 to 3, and thus, repetitive descriptions thereof will be omitted.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
It will be clear to those skilled in the art that, for convenience and brevity of description, specific working procedures of the above-described systems, apparatuses and units may refer to corresponding procedures in the foregoing method embodiments, and are not repeated herein.
In the several embodiments provided by the present application, it should be understood that the disclosed systems, devices, and methods may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative, e.g., the division of the units is merely a logical function division, and there may be additional divisions when actually implemented, e.g., multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or units, which may be in electrical, mechanical or other form.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present invention may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit.
If the functions are implemented in the form of software functional units and sold or used as a stand-alone product, they may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention, or the part of it that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product stored in a storage medium and comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes: a USB flash disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, or other media capable of storing program code.
The foregoing is merely a specific embodiment of the present invention, and the present invention is not limited thereto; any person skilled in the art can readily conceive of variations or substitutions within the technical scope disclosed herein, and such variations or substitutions shall fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (10)

1. A cloud platform based data acquisition, processing and analysis system, characterized by comprising:
The data acquisition module is used for acquiring transaction data on the electronic commerce platform and extracting order details, transaction amount, commodity information and store evaluation from the transaction data on the electronic commerce platform;
The data processing module is used for respectively encoding the order details, the transaction amount, the commodity information and the store evaluation to obtain order detail feature vectors, transaction amount feature vectors, commodity information feature vectors and store evaluation feature vectors;
The data analysis module is used for constructing transaction data text association feature vectors among the order detail feature vector, the transaction amount feature vector, the commodity information feature vector and the store evaluation feature vector, and carrying out coherent interference correction based on a category probability value on the transaction data text association feature vectors to obtain corrected transaction data text association feature vectors;
And the data result classification module is used for enabling the corrected transaction data text association feature vector to pass through a classifier to obtain a classification result, and the classification result is used for indicating whether the transaction data is abnormal or not.
2. The cloud platform based data acquisition, processing and analysis system of claim 1, wherein the data processing module comprises:
An order detail feature extraction unit, configured to pass the order details through a semantic encoder that includes an embedded layer to obtain the order detail feature vector;
A transaction amount feature extraction unit, configured to obtain the transaction amount feature vector by passing the transaction amount through the semantic encoder including the embedded layer;
the commodity information feature extraction unit is used for enabling the commodity information to pass through the semantic encoder comprising the embedded layer so as to obtain the commodity information feature vector;
and a store evaluation feature extraction unit for passing the store evaluation through the semantic encoder containing the embedded layer to obtain the store evaluation feature vector.
3. The cloud platform based data acquisition, processing and analysis system of claim 2, wherein the order detail feature extraction unit comprises:
the word segmentation processing subunit is used for segmenting the order details to obtain word sequences;
A word cleaning subunit, configured to clean the word sequence to obtain a cleaned word sequence, where cleaning the word sequence includes at least one of: deleting repeated words, eliminating punctuation marks and deleting special symbols;
A word embedding subunit, configured to map each word in the cleaned word sequence into a word vector by using an embedding layer of the semantic encoder to obtain a word vector sequence;
and a semantic coding subunit, configured to semantically encode the word vector sequence by using a BERT model of the semantic encoder to generate the order detail feature vector.
4. The cloud platform based data acquisition processing analysis system of claim 3, wherein the data analysis module comprises:
The data fusion unit is used for fusing the order detail feature vector, the transaction amount feature vector, the commodity information feature vector and the store evaluation feature vector to obtain a transaction data key feature matrix;
The data convolution encoding unit is used for carrying out convolution encoding on the transaction data key feature matrix to obtain the transaction data text association feature vector;
And the data optimization unit is used for carrying out coherent interference correction based on the category probability value on the transaction data text association characteristic vector so as to obtain the corrected transaction data text association characteristic vector.
5. The cloud platform based data acquisition, processing and analysis system of claim 4, wherein the data fusion unit is configured to: and arranging the order detail feature vector, the transaction amount feature vector, the commodity information feature vector and the store evaluation feature vector according to parameter dimensions to obtain the transaction data key feature matrix.
6. The cloud platform based data acquisition processing analysis system of claim 5, wherein the data convolution encoding unit comprises:
The text convolution coding subunit is used for enabling the transaction data key feature matrix to pass through a text convolution neural network model serving as a feature extractor to obtain a transaction data text association feature map;
and the associated feature map dimension reduction subunit is used for carrying out global average pooling on each feature matrix of the transaction data text associated feature map along the channel dimension so as to obtain the transaction data text associated feature vector.
7. The cloud platform based data acquisition processing analysis system of claim 6, wherein the data optimization unit comprises:
The pre-classifier subunit is used for enabling the transaction data text associated feature vector to pass through a pre-classifier to obtain a category probability feature vector;
A covariance matrix calculating subunit, configured to calculate a covariance matrix between the transaction data text association feature vector and the category probability feature vector;
an autocorrelation covariance matrix calculating subunit, configured to calculate the autocorrelation covariance matrix of the transaction data text association feature vector;
an interference correction matrix calculating subunit, configured to calculate an interference correction matrix based on the covariance matrix and the autocorrelation covariance matrix;
And the correction subunit is used for correcting the transaction data text association characteristic vector based on the interference correction matrix to obtain the corrected transaction data text association characteristic vector.
8. The cloud platform based data acquisition, processing and analysis system of claim 7, wherein the data result classification module is configured to: process the corrected transaction data text associated feature vector with the classifier according to the following formula to obtain the classification result;
wherein the formula is: $O=\mathrm{softmax}\{(W_{n},B_{n}):\cdots:(W_{1},B_{1})\mid X\}$, where $W_{1}$ to $W_{n}$ are weight matrices, $B_{1}$ to $B_{n}$ are bias vectors, $X$ is the corrected transaction data text associated feature vector, $\mathrm{softmax}$ represents the softmax function, and $O$ represents the classification result.
9. A cloud platform based data acquisition, processing and analysis method, characterized by comprising the following steps:
acquiring transaction data on an electronic commerce platform, and extracting order details, transaction amount, commodity information and store evaluation from the transaction data on the electronic commerce platform;
Encoding the order details, the transaction amount, the commodity information and the store evaluation to obtain order detail feature vectors, transaction amount feature vectors, commodity information feature vectors and store evaluation feature vectors;
constructing a transaction data text association feature vector among the order detail feature vector, the transaction amount feature vector, the commodity information feature vector and the store evaluation feature vector, and performing coherent interference correction based on a class probability value on the transaction data text association feature vector to obtain a corrected transaction data text association feature vector;
and passing the corrected transaction data text associated feature vector through a classifier to obtain a classification result, wherein the classification result is used for indicating whether the transaction data is abnormal or not.
10. The cloud platform based data acquisition, processing and analysis method of claim 9, wherein encoding the order details, the transaction amount, the commodity information and the store evaluation respectively to obtain an order detail feature vector, a transaction amount feature vector, a commodity information feature vector and a store evaluation feature vector comprises:
Passing the order details through a semantic encoder comprising an embedded layer to obtain the order detail feature vector;
passing the transaction amount through the semantic encoder comprising an embedded layer to obtain the transaction amount feature vector;
passing the commodity information through the semantic encoder comprising an embedded layer to obtain the commodity information feature vector;
Passing the store assessment through the semantic encoder comprising an embedded layer to obtain the store assessment feature vector.
CN202410453218.6A 2024-04-16 2024-04-16 Cloud platform-based data acquisition, processing and analysis system and method Pending CN118052577A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410453218.6A CN118052577A (en) 2024-04-16 2024-04-16 Cloud platform-based data acquisition, processing and analysis system and method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410453218.6A CN118052577A (en) 2024-04-16 2024-04-16 Cloud platform-based data acquisition, processing and analysis system and method

Publications (1)

Publication Number Publication Date
CN118052577A true CN118052577A (en) 2024-05-17

Family

ID=91048731

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410453218.6A Pending CN118052577A (en) 2024-04-16 2024-04-16 Cloud platform-based data acquisition, processing and analysis system and method

Country Status (1)

Country Link
CN (1) CN118052577A (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111461741A (en) * 2020-03-20 2020-07-28 深圳市优达智胜科技有限公司 E-commerce big data monitoring system based on cloud platform
US20220027915A1 (en) * 2020-07-21 2022-01-27 Shopify Inc. Systems and methods for processing transactions using customized transaction classifiers
CN115983984A (en) * 2023-02-20 2023-04-18 杭银消费金融股份有限公司 Multi-model fusion client risk rating method
CN117114705A (en) * 2023-08-03 2023-11-24 复旦大学 Continuous learning-based e-commerce fraud identification method and system
CN117372144A (en) * 2023-10-08 2024-01-09 杭银消费金融股份有限公司 Wind control strategy intelligent method and system applied to small sample scene
CN117672532A (en) * 2024-01-31 2024-03-08 吉林大学 Hospitalized patient nursing risk assessment early warning monitoring system and method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
卫昆, 李想 (Wei Kun, Li Xiang): "Construction of a C2C e-commerce fraud identification model based on a random forest classifier", 中小企业管理与科技 (下旬刊), no. 08, 25 August 2018 (2018-08-25), pages 171-173 *

Similar Documents

Publication Publication Date Title
CN105426356B (en) A kind of target information recognition methods and device
CN109284372B (en) User operation behavior analysis method, electronic device and computer readable storage medium
CN111882446A (en) Abnormal account detection method based on graph convolution network
Zipfel et al. Anomaly detection for industrial quality assurance: A comparative evaluation of unsupervised deep learning models
Rajamohana et al. An effective hybrid cuckoo search with harmony search for review spam detection
CN111523421A (en) Multi-user behavior detection method and system based on deep learning and fusion of various interaction information
CN111462752A (en) Client intention identification method based on attention mechanism, feature embedding and BI-L STM
CN115378629A (en) Ether mill network anomaly detection method and system based on graph neural network and storage medium
CN113032525A (en) False news detection method and device, electronic equipment and storage medium
CN116861924A (en) Project risk early warning method and system based on artificial intelligence
CN116485406A (en) Account detection method and device, storage medium and electronic equipment
CN112308148A (en) Defect category identification and twin neural network training method, device and storage medium
CN114881173A (en) Resume classification method and device based on self-attention mechanism
CN113343701B (en) Extraction method and device for text named entities of power equipment fault defects
CN113868422A (en) Multi-label inspection work order problem traceability identification method and device
CN114399367A (en) Insurance product recommendation method, device, equipment and storage medium
Oliveira-Santos et al. Combining classifiers with decision templates for automatic fault diagnosis of electrical submersible pumps
CN117078007A (en) Multi-scale wind control system integrating scale labels and method thereof
Fursov et al. Sequence embeddings help to identify fraudulent cases in healthcare insurance
CN117114705A (en) Continuous learning-based e-commerce fraud identification method and system
CN114119191A (en) Wind control method, overdue prediction method, model training method and related equipment
CN114969334B (en) Abnormal log detection method and device, electronic equipment and readable storage medium
CN118052577A (en) Cloud platform-based data acquisition, processing and analysis system and method
CN116089605A (en) Text emotion analysis method based on transfer learning and improved word bag model
Hamad et al. Sentiment analysis of restaurant reviews in social media using naïve bayes

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination