CN117014382A - Data stream processing system and method based on convergence and distribution equipment - Google Patents


Info

Publication number
CN117014382A
CN117014382A (application CN202311276910.8A)
Authority
CN
China
Prior art keywords
data stream
data
feature
semantic understanding
vector
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202311276910.8A
Other languages
Chinese (zh)
Other versions
CN117014382B (en)
Inventor
关朋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Zhongke Network Core Technology Co ltd
Original Assignee
Beijing Zhongke Network Core Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Zhongke Network Core Technology Co ltd
Priority to CN202311276910.8A
Publication of CN117014382A
Application granted
Publication of CN117014382B
Active legal status
Anticipated expiration legal status

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 47/00 Traffic control in data switching networks
    • H04L 47/10 Flow control; Congestion control
    • H04L 47/20 Traffic policing
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 47/00 Traffic control in data switching networks
    • H04L 47/10 Flow control; Congestion control
    • H04L 47/24 Traffic characterised by specific attributes, e.g. priority or QoS
    • H04L 47/2475 Traffic characterised by specific attributes, e.g. priority or QoS, for supporting traffic characterised by the type of applications
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 2101/00 Indexing scheme associated with group H04L61/00
    • H04L 2101/60 Types of network addresses
    • H04L 2101/604 Address structures or formats
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 2101/00 Indexing scheme associated with group H04L61/00
    • H04L 2101/60 Types of network addresses
    • H04L 2101/69 Types of network addresses using geographic information, e.g. room number

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The application discloses a data stream processing system based on a convergence and distribution device, and a method thereof, which perform matching processing on a data stream to be processed based on its feature data and address data, together with a feature rule identification code generated through a feature encoding-decoding structure.

Description

Data stream processing system and method based on convergence and distribution equipment
Technical Field
The present application relates to the field of data stream processing and, more particularly, to a data stream processing system based on a convergence and distribution device and a method thereof.
Background
With the rapid growth of network traffic, a single network device can no longer process the traffic of an entire network, so a service processing system composed of multiple network devices is usually used to provide services externally. To achieve effective processing and distribution of network traffic, network convergence and distribution devices have been introduced.
The network convergence and distribution device is responsible for distributing the converged network messages to the different devices of the service system. It must parse the link-layer address and payload protocol information in each message and distribute the messages according to certain rules, ensuring that messages belonging to the same data flow are ultimately delivered to the same processing endpoint, that is, achieving same-source, same-destination delivery of messages.
At present, convergence and distribution devices adopt message distribution schemes that are simple and easy to implement, but these schemes have inherent defects. During data aggregation and distribution, a data flow being matched must be processed according to preset compound rules. For example, IP matching and feature code matching may serve as the source data of a compound rule: the IP matching is executed by an independent IP matching module to generate an IP matching result, and the feature code matching is executed by a feature code matching module to generate a feature code matching result. The two results are then passed to a compound rule module for processing to obtain the final matching result. However, because feature code matching and IP matching are performed under different compound rules, a data stream may be discarded, resulting in data stream mis-matching.
Accordingly, an optimized data stream processing system based on a convergence and distribution device is desired.
Disclosure of Invention
The present application has been made to solve the above-mentioned technical problems. Embodiments of the application provide a data stream processing system based on a convergence and distribution device, and a data stream processing method, which perform matching processing on a data stream to be processed based on its feature data and address data, together with a feature rule identification code generated through a feature encoding-decoding structure.
According to one aspect of the present application, there is provided a data stream processing system based on a convergence and distribution device, comprising:
a data stream acquisition module, configured to acquire a data stream to be processed;
a data analysis module, configured to perform data analysis on the data stream to be processed to obtain a data stream semantic understanding feature;
a feature rule identification code generation module, configured to generate a feature rule identification code based on the data stream semantic understanding feature; and
a matching module, configured to perform matching processing on the data stream to be processed based on the feature rule identification code.
According to another aspect of the present application, there is provided a data stream processing method based on a convergence and distribution device, comprising:
acquiring a data stream to be processed;
performing data analysis on the data stream to be processed to obtain a data stream semantic understanding feature;
generating a feature rule identification code based on the data stream semantic understanding feature; and
performing matching processing on the data stream to be processed based on the feature rule identification code.
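The four claimed steps can be sketched as a minimal pipeline. This is an illustrative toy, not the patent's implementation: the function names, the "semantic feature" (the set of protocol tags), the rule-code format, and the crude substring match are all assumptions made for the example.

```python
# Illustrative sketch of the claimed pipeline; all names and the
# rule format are hypothetical, not taken from the patent.

def analyze(stream):
    """Data analysis: derive a crude 'semantic understanding feature'.
    Here it is simply the sorted set of protocol tags in the stream."""
    return tuple(sorted({pkt["proto"] for pkt in stream}))

def generate_rule_id(feature):
    """Generate a feature rule identification code from the feature."""
    return "RULE-" + "-".join(feature)

def match(stream, rule_id):
    """Match each packet of the stream against the rule code
    (a deliberately crude substring test for illustration)."""
    return [pkt for pkt in stream if pkt["proto"] in rule_id]

stream = [{"proto": "TCP", "src": "10.0.0.1"},
          {"proto": "UDP", "src": "10.0.0.2"}]
feature = analyze(stream)
rule_id = generate_rule_id(feature)
matched = match(stream, rule_id)
print(rule_id)       # RULE-TCP-UDP
print(len(matched))  # 2
```

The point of the sketch is only the data flow between the four modules: acquisition, analysis, rule-code generation, and matching.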
Compared with the prior art, the data stream processing system and method based on the convergence and distribution device provided by the application perform matching processing on the data stream based on the feature data and address data of the data stream to be processed, together with a feature rule identification code generated through a feature encoding-decoding structure. This improves the efficiency and accuracy of data stream processing, avoids data stream mis-matching, and improves the performance and stability of the network service processing system as a whole.
Drawings
The above and other objects, features, and advantages of the present application will become more apparent from the following detailed description of embodiments of the present application with reference to the accompanying drawings. The accompanying drawings are included to provide a further understanding of the embodiments of the application; they are incorporated in and constitute a part of this specification, illustrate the application together with its embodiments, and do not limit the application. In the drawings, like reference numerals generally refer to like parts or steps.
FIG. 1 is a block diagram of a data stream processing system based on a convergence and distribution device according to an embodiment of the present application;
FIG. 2 is a system architecture diagram of a data stream processing system based on a convergence and distribution device according to an embodiment of the present application;
FIG. 3 is a block diagram of the data analysis module in a data stream processing system based on a convergence and distribution device according to an embodiment of the present application;
FIG. 4 is a block diagram of the data semantic association encoding unit in a data stream processing system based on a convergence and distribution device according to an embodiment of the present application;
FIG. 5 is a block diagram of the feature rule identification code generation module in a data stream processing system based on a convergence and distribution device according to an embodiment of the present application;
FIG. 6 is a block diagram of the feature distribution optimization unit in a data stream processing system based on a convergence and distribution device according to an embodiment of the present application;
FIG. 7 is a flowchart of a data stream processing method based on a convergence and distribution device according to an embodiment of the present application.
Detailed Description
Hereinafter, exemplary embodiments according to the present application will be described in detail with reference to the accompanying drawings. It should be apparent that the described embodiments are only some embodiments of the present application and not all embodiments of the present application, and it should be understood that the present application is not limited by the example embodiments described herein.
As used in the specification and claims, the terms "a," "an," "the," and/or "said" do not refer specifically to the singular and may include the plural, unless the context clearly dictates otherwise. In general, the terms "comprises" and "comprising" merely indicate that the explicitly identified steps and elements are included; they do not constitute an exclusive list, and a method or apparatus may also include other steps or elements.
Although the present application makes various references to certain modules in a system according to embodiments of the present application, any number of different modules may be used and run on a user terminal and/or server. The modules are merely illustrative, and different aspects of the systems and methods may use different modules.
A flowchart is used in the present application to describe the operations performed by a system according to embodiments of the present application. It should be understood that the preceding or following operations are not necessarily performed in order precisely. Rather, the various steps may be processed in reverse order or simultaneously, as desired. Also, other operations may be added to or removed from these processes.
At present, convergence and distribution devices adopt message distribution schemes that are simple and easy to implement, but these schemes have inherent defects. During data aggregation and distribution, a data flow being matched must be processed according to preset compound rules. For example, IP matching and feature code matching may serve as the source data of a compound rule: the IP matching is executed by an independent IP matching module to generate an IP matching result, and the feature code matching is executed by a feature code matching module to generate a feature code matching result. The two results are then passed to a compound rule module for processing to obtain the final matching result. However, because feature code matching and IP matching are performed under different compound rules, a data stream may be discarded, resulting in data stream mis-matching. Accordingly, an optimized data stream processing system based on a convergence and distribution device is desired.
In the technical scheme of the application, a data stream processing system based on a convergence and distribution device is provided. Fig. 1 is a block diagram of a data stream processing system based on a convergence and distribution device according to an embodiment of the present application. Fig. 2 is a system architecture diagram of the same system. As shown in fig. 1 and fig. 2, a data stream processing system 300 based on a convergence and distribution device according to an embodiment of the present application includes: a data stream acquisition module 310, configured to acquire a data stream to be processed; a data analysis module 320, configured to perform data analysis on the data stream to be processed to obtain a data stream semantic understanding feature; a feature rule identification code generation module 330, configured to generate a feature rule identification code based on the data stream semantic understanding feature; and a matching module 340, configured to perform matching processing on the data stream to be processed based on the feature rule identification code.
In particular, the data stream acquisition module 310 is configured to acquire a data stream to be processed. A data stream refers to the process of data flowing from one place to another in a computer system: it describes the way data is transferred and processed in the system and the flow paths of the data between different components. A data stream may be real-time or batch. The concept of the data stream is widely used in computer science and information technology, as it helps describe and design the data transmission and processing processes in computer systems and improve their efficiency and performance.
In particular, the data analysis module 320 is configured to perform data analysis on the data stream to be processed to obtain a semantic understanding feature of the data stream. In particular, in one specific example, as shown in fig. 3, the data analysis module 320 includes: a multi-class data extraction unit 321 for extracting feature data and address data from a data stream to be processed; and a data semantic association encoding unit 322, configured to perform semantic association encoding on the feature data and the address data to obtain the data stream semantic understanding feature.
Specifically, the multi-class data extraction unit 321 is configured to extract feature data and address data from the data stream to be processed, these being two types of data common in data stream analysis. Feature data refers to data describing and representing the attributes or features of an object or event; these features may be numeric, categorical, or of other types. In data stream analysis, feature data is typically used to build models and make predictions: for example, in a house price prediction task, the feature data may include the area of the house, the number of rooms, the geographic location, and so on. Address data refers to data related to a geographic location. It may include country, city, street, and postal code information for identifying and locating a place. In data stream analysis, address data is commonly used with Geographic Information Systems (GIS) and in location analysis: for example, in business analysis, address data may be used to determine the number of potential customers or the market coverage of a region.
Accordingly, in one possible implementation, the feature data and the address data may be extracted from the data stream to be processed as follows. Raw data is obtained from the data stream; it may come from a variety of sources such as databases, sensors, or log files. The raw data is cleaned and preprocessed: invalid or duplicate data is removed, and missing and abnormal values are handled; for address data, normalization and error correction of the address format may be required. Meaningful features are then extracted from the cleaned data according to task requirements and data characteristics. Feature extraction may be based on statistical methods, machine learning methods, or domain knowledge: for text data, features such as word frequencies and TF-IDF values may be extracted; for image data, color histograms, texture features, and so on. The extracted features may be transformed or reduced in dimension to better represent the data; common methods include Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA). For data containing address information, the address data may be extracted using an address resolution tool or a geocoding service, which parse address information into country, city, street, and similar components. Modeling and analysis are then performed with the extracted feature data and address data, using machine learning algorithms, statistical models, and similar methods, to realize tasks such as prediction, classification, and clustering. The established model is evaluated and verified, for example with cross-validation or a confusion matrix, to check its performance and accuracy. Finally, the extracted feature data can be interpreted and analyzed to understand the relationship between the features and the target variable, which helps explain the meaning of the data and the behavior of the model.
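A minimal sketch of the cleaning and extraction steps above, with word-frequency features for text and a regex as a stand-in for an address resolution tool. The record format and helper names are assumptions for illustration only.

```python
import re
from collections import Counter

def extract_features(record):
    """Clean a raw text record and extract word-frequency features."""
    # Cleaning: split on non-word characters, drop empties, lowercase.
    tokens = [t.lower() for t in re.split(r"\W+", record) if t]
    # Feature extraction: simple word-frequency counts.
    return dict(Counter(tokens))

def extract_address(record):
    """Extract address data: here, the first dotted-quad IPv4 address
    (a crude stand-in for a real address resolution service)."""
    m = re.search(r"\b(\d{1,3}(?:\.\d{1,3}){3})\b", record)
    return m.group(1) if m else None

rec = "GET /index.html from 192.168.1.7 GET /img.png"
print(extract_features(rec)["get"])  # 2
print(extract_address(rec))          # 192.168.1.7
```

In a real system, the cleaned features would then feed the transformation, modeling, and evaluation steps described above.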
Specifically, the data semantic association encoding unit 322 is configured to perform semantic association encoding on the feature data and the address data to obtain the data stream semantic understanding feature. In particular, in one specific example, as shown in fig. 4, the data semantic association encoding unit 322 includes: an address data one-hot encoding subunit 3221, configured to one-hot encode the address data to obtain an address one-hot encoding vector; a feature data embedding subunit 3222, configured to pass each data item in the feature data through an embedding layer to obtain a plurality of feature data item embedding vectors; and a multi-data context semantic association subunit 3223, configured to pass the plurality of feature data item embedding vectors and the address one-hot encoding vector through a context encoder based on a converter module to obtain a data stream semantic understanding feature vector as the data stream semantic understanding feature.
More specifically, the address data one-hot encoding subunit 3221 is configured to one-hot encode the address data to obtain an address one-hot encoding vector. It should be appreciated that in a data stream processing system, address data may take many different values, such as different IP addresses and port numbers. To enable efficient semantic understanding of the address data, it must therefore be translated into a computer-recognizable vector representation; specifically, the address data is one-hot encoded to obtain an address one-hot encoding vector. One-hot encoding converts a discrete categorical variable into a binary vector: for a categorical variable with N different values, it generates a binary vector of length N in which only the position corresponding to the value is 1 and all other positions are 0. This encoding preserves the category information of the address data and makes the data easier to handle in numerical computation.
Accordingly, in one possible implementation, the address data may be one-hot encoded to obtain an address one-hot encoding vector as follows: collect data containing address information from the source data; clean the address data to remove unnecessary characters or erroneous addresses; extract the address information from the cleaned data, for example with regular expressions or natural language processing; create an address encoding dictionary mapping each address to a corresponding one-hot encoding vector; encode each address as a vector containing only 0s and a single 1; and use the one-hot encoded vectors for further data analysis and mining, such as clustering and classification.
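The encoding step above can be shown concretely. This is a minimal sketch assuming a fixed address dictionary; for a variable with N values it produces the length-N binary vector described in the text.

```python
def one_hot(value, vocabulary):
    """One-hot encode one categorical value against a fixed vocabulary:
    a length-N binary vector with a single 1 at the value's index."""
    vec = [0] * len(vocabulary)
    vec[vocabulary.index(value)] = 1
    return vec

# Hypothetical address "encoding dictionary" of three IP addresses.
addresses = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]
print(one_hot("10.0.0.2", addresses))  # [0, 1, 0]
```

Each distinct address thus maps to exactly one position in the vector, preserving its category information.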
More specifically, the feature data embedding subunit 3222 is configured to pass each data item in the feature data through an embedding layer to obtain a plurality of feature data item embedding vectors. Specifically, each data item in the feature data is encoded by an embedding layer to convert the original feature data into low-dimensional dense feature vector representations, thereby obtaining the plurality of feature data item embedding vectors. This encoding helps extract and represent information hidden in the feature data while reducing its dimensionality. In particular, in a data stream processing system based on a convergence and distribution device, encoding the feature data through the embedding layer maps discrete feature data to a continuous low-dimensional vector space, so that the semantic information of the feature data is better captured and represented, and so that it can be combined with other feature data items and address data for better data stream matching. An embedding layer (Embedding Layer) is a common layer type in deep learning models, used to map discrete data (such as text or categorical labels) into continuous low-dimensional vector representations; it transforms high-dimensional discrete features into low-dimensional continuous features that better represent the relationships between features.
Accordingly, in one possible implementation, each data item in the feature data may be passed through an embedding layer to obtain a plurality of feature data item embedding vectors as follows: the extracted feature data is first converted into a form suitable for input to the embedding layer; for example, categorical data may be one-hot encoded into binary vectors. The converted feature data is then input into the embedding layer, which maps each feature data item to an embedding vector, a low-dimensional continuous vector representing that item. A corresponding embedding vector is thus obtained for each item of feature data and can be used for subsequent data analysis and mining tasks.
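An embedding layer can be sketched as a trainable lookup table. This toy version uses random (untrained) weights; in practice one would use a framework such as PyTorch's `nn.Embedding`, and the vocabulary of protocol tags here is a hypothetical example.

```python
import random

class EmbeddingLayer:
    """Toy embedding layer: a lookup table mapping each discrete
    feature-data item to a low-dimensional dense vector.
    Real implementations learn these vectors during training."""
    def __init__(self, vocab, dim, seed=0):
        rng = random.Random(seed)
        self.table = {item: [rng.uniform(-1.0, 1.0) for _ in range(dim)]
                      for item in vocab}

    def __call__(self, item):
        return self.table[item]

emb = EmbeddingLayer(vocab=["TCP", "UDP", "ICMP"], dim=4)
vec = emb("TCP")
print(len(vec))  # 4
```

The lookup replaces a sparse one-hot representation with a dense 4-dimensional vector, which is the dimensionality reduction described above.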
More specifically, the multi-data context semantic association subunit 3223 is configured to pass the plurality of feature data item embedding vectors and the address one-hot encoding vector through a context encoder based on a converter module to obtain a data stream semantic understanding feature vector as the data stream semantic understanding feature. That is, the plurality of feature data item embedding vectors and the address one-hot encoding vector are processed in the converter-based context encoder so that the feature data and the address data are semantically encoded in combination, extracting global context semantic association feature information of the combination and thereby obtaining a data stream semantic understanding feature vector that better represents the semantics of the data stream to be processed. In particular, using the converter-based context encoder for semantic encoding captures the contextual relationships within the feature data item embedding vectors and the address one-hot encoding vector, allowing different feature data items and the address data to interact and be integrated. The resulting vector represents the semantic understanding features of the data stream: it encodes the association and importance between each feature data item and the address data, and thus expresses the semantic information of the data stream more comprehensively.
In particular, in one specific example, the multi-data context semantic association subunit 3223 includes: a context encoding secondary subunit, configured to pass the plurality of feature data item embedding vectors and the address one-hot encoding vector through the converter-based context encoder to obtain a plurality of context semantic associated encoding feature vectors; and a cascade secondary subunit, configured to cascade the plurality of context semantic associated encoding feature vectors to obtain the data stream semantic understanding feature vector.
The context encoding secondary subunit is configured to pass the plurality of feature data item embedding vectors and the address one-hot encoding vector through the converter-based context encoder to obtain a plurality of context semantic associated encoding feature vectors. Specifically: arrange the plurality of feature data item embedding vectors and the address one-hot encoding vector one-dimensionally to obtain a global feature vector; calculate the product of the global feature vector and the transpose of each of the feature data item embedding vectors and the address one-hot encoding vector to obtain a plurality of self-attention correlation matrices; normalize each of the self-attention correlation matrices to obtain a plurality of normalized self-attention correlation matrices; pass each normalized self-attention correlation matrix through a Softmax function to obtain a plurality of probability values; weight each of the feature data item embedding vectors and the address one-hot encoding vector by the corresponding probability value to obtain a plurality of context semantic feature vectors; and concatenate the context semantic feature vectors to obtain the plurality of context semantic associated encoding feature vectors.
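The attention steps above can be sketched in simplified form. This is a hypothetical single-head attention over equal-length vectors: using the mean of the inputs as the query is an assumption made to keep the example small, not the patent's actual correlation computation.

```python
import math

def context_encode(vectors):
    """Simplified self-attention following the described steps:
    correlation scores, normalization, softmax weights,
    per-vector weighting, then concatenation."""
    dim = len(vectors[0])
    # Query: mean of all input vectors (a simplifying assumption).
    query = [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]
    # Correlation score of each vector with the query.
    scores = [sum(q * x for q, x in zip(query, v)) for v in vectors]
    # Normalize for numerical stability, then softmax.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Weight each vector by its probability and concatenate.
    out = []
    for w, v in zip(weights, vectors):
        out.extend(w * x for x in v)
    return out, weights

vecs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
encoded, weights = context_encode(vecs)
print(len(encoded))  # 6
```

The softmax weights express the relative importance of each feature data item, and the concatenation preserves every weighted item in the output.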
The cascade secondary subunit is configured to cascade the plurality of context semantic associated encoding feature vectors to obtain the data stream semantic understanding feature vector. That is, in the technical solution of the present application, after the plurality of context semantic associated encoding feature vectors are obtained, they are fused with the following cascade formula to obtain the data stream semantic understanding feature vector:

V_c = Concat[V_1, V_2, ..., V_n]

wherein V_1, V_2, ..., V_n represent the plurality of context semantic associated encoding feature vectors, Concat[·] represents the cascade function, and V_c represents the data stream semantic understanding feature vector.
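The cascade function is plain vector concatenation; a minimal sketch (the function name `cascade` is illustrative):

```python
def cascade(*vectors):
    """Cascade (concatenate) context semantic associated encoding
    feature vectors into one data stream semantic understanding
    feature vector."""
    out = []
    for v in vectors:
        out.extend(v)
    return out

print(cascade([1, 2], [3], [4, 5]))  # [1, 2, 3, 4, 5]
```

The output length is the sum of the input lengths, so no per-vector information is lost in the fusion.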
It is worth mentioning that in other specific examples of the present application, the plurality of feature data item embedding vectors and the address one-hot encoding vector may be passed through the converter-based context encoder to obtain the data stream semantic understanding feature vector as follows: from the previous steps, a plurality of feature data item embedding vectors are obtained, each feature data item having a corresponding low-dimensional continuous vector representation; the address data is one-hot encoded, converting the address information into a binary vector, with each address corresponding to one one-hot encoding vector; the feature data item embedding vectors and the address one-hot encoding vector are then encoded with the converter-based context encoder, the converter module being a common deep learning model that learns the relationships and context information between input sequences; through this processing, a data stream semantic understanding feature vector is obtained that contains the semantic understanding of the feature data and the address data and captures the associations and context information between them.
It should be noted that, in other specific examples of the present application, the feature data and the address data may be semantically association-encoded in other manners to obtain the data stream semantic understanding feature, for example: the feature data and the address data are first preprocessed, including noise removal, missing-value handling, and normalization, to ensure the quality and consistency of the data; the feature data and the address data are then semantically association-encoded to capture the semantic relationships between them. The following methods may be used: word embedding: the vocabulary in the feature data and the address data is represented with embeddings. Common word embedding models include Word2Vec and GloVe; they convert vocabulary into continuous vector representations and capture the semantic associations between words; geocoding: the address data is geocoded, converting each address into geographic coordinates or another form of geographic representation. Geocoding may be performed using a geocoding service or a Geographic Information System (GIS) tool; feature encoding: the feature data is encoded, converting the features into vector representations. Categorical features can be converted into numerical representations using methods such as one-hot encoding and label encoding; feature selection: according to the task requirements and data characteristics, the features with higher information content and importance are selected for further analysis. Feature selection may be performed using statistical methods, machine learning methods, and the like; data modeling: data modeling is performed using the semantically association-encoded feature data. Modeling can be performed using machine learning algorithms, deep learning models, and the like to realize tasks such as prediction, classification, and clustering; model evaluation: the established model is evaluated and verified to check its performance and accuracy. Evaluation can be performed using methods such as cross-validation and the confusion matrix; finally, the semantically association-encoded feature data can be interpreted and analyzed to understand the relationship between the features and the target variable, which helps in understanding the meaning of the data and the interpretability of the model.
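The one-hot and label-encoding operations mentioned above can be sketched in plain Python/NumPy; the address strings and protocol labels below are hypothetical examples, not values from the application:

```python
import numpy as np

def one_hot_encode(addresses):
    """Map each distinct address string to a binary indicator vector."""
    vocab = sorted(set(addresses))
    index = {a: i for i, a in enumerate(vocab)}
    eye = np.eye(len(vocab))
    return {a: eye[index[a]] for a in vocab}

def label_encode(categories):
    """Convert categorical feature values into integer codes."""
    vocab = sorted(set(categories))
    codes = {c: i for i, c in enumerate(vocab)}
    return [codes[c] for c in categories]

addr_vectors = one_hot_encode(["10.0.0.1", "10.0.0.2", "10.0.0.1"])
proto_codes = label_encode(["tcp", "udp", "tcp"])
```

In practice a fixed address vocabulary (or a geocoded representation) would replace the ad-hoc vocabulary built from the sample.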
It should be noted that, in other specific examples of the present application, the data to be processed may also be analyzed in other manners to obtain the data stream semantic understanding feature, for example: data collection: samples or instances of the data stream to be processed are collected, e.g., by means of sensors, log files, or database queries; data cleaning: the collected data is cleaned and preprocessed to remove noise, handle missing values, handle outliers, and so on. This may include steps such as data cleansing, data conversion, and data integration; feature extraction: features are extracted from the cleaned data. The features may be statistics of the data, time-series features, frequency-domain features, spatial features, and the like. The goal of feature extraction is to distill, from the raw data, useful information capable of representing the data semantics; feature selection: the most representative features are selected according to indicators such as relevance, importance, and interpretability. This may be done with statistical methods, machine learning algorithms, domain knowledge, and so on; feature conversion: the selected features are converted to meet the input requirements of the model or to improve their expressive power. Common feature conversion methods include standardization, normalization, and dimensionality reduction; data modeling: the transformed features are modeled using an appropriate machine learning algorithm or statistical method. This may include classification, clustering, regression, time-series analysis, and similar methods, with an appropriate model selected according to the particular task; model evaluation: the built model is evaluated for performance and accuracy, e.g., using cross-validation, index evaluation, or the confusion matrix; feature interpretation: the feature weights, importances, and related information obtained from the model are interpreted to understand the semantics of the data stream, e.g., via feature-importance ranking or visualization.
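A minimal sketch of the collection-to-normalization part of that pipeline, assuming a simple numeric stream in which `None` marks a missing value (an assumption of this sketch):

```python
import math

def clean(samples):
    """Drop missing values (None) from a raw data-stream sample list."""
    return [x for x in samples if x is not None]

def extract_features(samples):
    """Simple statistical features: mean, standard deviation, min, max."""
    n = len(samples)
    mean = sum(samples) / n
    var = sum((x - mean) ** 2 for x in samples) / n
    return {"mean": mean, "std": math.sqrt(var),
            "min": min(samples), "max": max(samples)}

def min_max_scale(values):
    """Normalize a feature column into the [0, 1] range."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) if hi > lo else 0.0 for v in values]

raw = [3.0, None, 5.0, 7.0]
feats = extract_features(clean(raw))
scaled = min_max_scale([3.0, 5.0, 7.0])
```

Real streams would add frequency-domain or time-series features at the same stage; the structure of clean → extract → scale stays the same.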
In particular, the feature rule identification code generation module 330 is configured to generate a feature rule identification code based on the data stream semantic understanding feature. In particular, in one specific example of the present application, as shown in fig. 5, the feature rule identification code generation module 330 includes: the feature distribution optimizing unit 331 is configured to perform feature distribution optimization on the data stream semantic understanding feature vector to obtain an optimized data stream semantic understanding feature vector; and an identification code generating unit 332, configured to pass the optimized data stream semantic understanding feature vector through a decoder-based identification code generator to obtain a feature rule identification code.
Specifically, the feature distribution optimizing unit 331 is configured to perform feature distribution optimization on the data stream semantic understanding feature vector to obtain an optimized data stream semantic understanding feature vector. In particular, in one specific example of the present application, as shown in fig. 6, the feature distribution optimizing unit 331 includes: a context data stream semantic feature optimization subunit 3311, configured to perform feature distribution optimization on the respective context semantic association encoded feature vectors based on the data stream semantic understanding feature vector, so as to obtain a plurality of optimized context semantic association encoded feature vectors; and a data stream semantic understanding feature optimization subunit 3312, configured to concatenate the plurality of optimized context semantic association encoded feature vectors to obtain the optimized data stream semantic understanding feature vector.
More specifically, the context data stream semantic feature optimization subunit 3311 is configured to perform feature distribution optimization on the respective context semantic association encoded feature vectors based on the data stream semantic understanding feature vector to obtain a plurality of optimized context semantic association encoded feature vectors. In particular, in one specific example of the present application, the context data stream semantic feature optimization subunit 3311 comprises: a weighting factor calculation secondary subunit, configured to calculate, based on the data stream semantic understanding feature vector, the quantized transferable sensing factors of the transferable features of the respective context semantic association encoded feature vectors, so as to obtain a plurality of weighting factors; and a weighted optimization secondary subunit, configured to perform weighted optimization on the respective context semantic association encoded feature vectors with the plurality of weighting factors as weighting coefficients, so as to obtain the plurality of optimized context semantic association encoded feature vectors.
The weighting factor calculation secondary subunit is configured to calculate, based on the data stream semantic understanding feature vector, the quantized transferable sensing factors of the transferable features of the respective context semantic association encoded feature vectors to obtain a plurality of weighting factors. In particular, in the technical solution of the present application, after the plurality of feature data item embedding vectors and the address one-hot encoding vector pass through the context encoder based on the converter module, they are context-associatively encoded, which promotes the intrinsic feature-distribution consistency between each feature data item embedding vector and the address one-hot encoding vector. However, considering the intrinsic differences between the encoding forms of the feature data item embedding vectors and the address one-hot encoding vector, the respective context semantic association encoded feature vectors obtained after the context encoder still exhibit certain explicit differences, so that when the data stream semantic understanding feature vector is obtained by concatenation, a domain transfer difference toward the fused feature domain exists, which affects the expression effect of the data stream semantic understanding feature vector. Based on this, for each context semantic association encoded feature vector, denoted $V_i$ with $i \in \{1, \dots, N\}$, where $N$ is the number of context semantic association encoded feature vectors, and for the concatenated data stream semantic understanding feature vector, denoted $S$, the applicant of the present application calculates the quantized transferable sensing factor of its transferable features:

$$ w_i = \alpha \sum_{j} v_{i,j} \log_2 \frac{v_{i,j}}{s_j} $$

wherein $V_i$ and $S$ are, respectively, each context semantic association encoded feature vector and the data stream semantic understanding feature vector, $v_{i,j}$ is the feature value of the $j$-th position of the $i$-th of the plurality of context semantic association encoded feature vectors, $s_j$ is the feature value of the $j$-th position of the data stream semantic understanding feature vector, $\log_2$ denotes the base-2 logarithm, $\alpha$ is a weighted hyperparameter, and $w_i$ is the $i$-th of the plurality of weighting factors. Here, the quantized transferable sensing factor of the transferable features estimates the domain uncertainty from the feature space domain to the decoding target domain through an uncertainty measure under domain transfer. Since the domain uncertainty estimate can be used to identify feature representations that have been transferred between domains, weighting each of the context semantic association encoded feature vectors with this factor makes it possible to identify, through the cross-domain alignment of the feature space domain to the decoding target domain, whether the feature mapping is effectively transferred between domains, thereby quantitatively perceiving the transferability of the transferable features in the different feature vectors, realizing inter-domain adaptive feature fusion, and improving the expression effect of the data stream semantic understanding feature vector.
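As a hedged numerical sketch of a weighting factor of this divergence-like form, the snippet below applies a softmax to each vector so the base-2 logarithm stays defined, and uses the mean of the encoded vectors as a same-length stand-in for the cascaded vector; both choices are illustrative assumptions, not part of the original formulation:

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax, used here to make vectors positive
    and normalized so the log-ratio is well defined."""
    e = np.exp(x - x.max())
    return e / e.sum()

def transferable_sensing_factors(vectors, fused, alpha=1.0):
    """Base-2 divergence between each (normalized) context semantic
    encoded vector and the (normalized) fused semantic vector."""
    s = softmax(fused)
    factors = []
    for v in vectors:
        p = softmax(v)
        factors.append(alpha * float(np.sum(p * np.log2(p / s))))
    return np.array(factors)

rng = np.random.default_rng(1)
vectors = [rng.normal(size=6) for _ in range(3)]
fused = np.mean(vectors, axis=0)   # illustrative stand-in for the cascade
factors = transferable_sensing_factors(vectors, fused)
```

With softmax-normalized inputs the factor is a KL divergence in bits, so it is zero exactly when a vector already matches the fused distribution and grows with the domain mismatch.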
Therefore, the feature rule identification code can be generated based on the global semantic information of the data to be processed, and the matching processing of the data stream can be performed based on the feature rule identification code, thereby improving the efficiency and accuracy of data stream processing, avoiding data stream mismatching, and improving the performance and stability of the overall network service processing system.
The weighted optimization secondary subunit is configured to perform weighted optimization on the respective context semantic association encoded feature vectors with the plurality of weighting factors as weighting coefficients, so as to obtain the plurality of optimized context semantic association encoded feature vectors. Here, a weighting coefficient is a parameter used to weight the feature data: by assigning a weight to each feature, the features can be weighted according to their relative importance, thereby influencing the degree to which each feature contributes to the final result.
Accordingly, in one possible implementation, the plurality of weighting factors may be used as weighting coefficients to respectively weight and optimize the respective context semantic association encoded feature vectors to obtain the plurality of optimized context semantic association encoded feature vectors, for example: determining the weighting coefficients: a plurality of weighting coefficients is determined according to the requirements and domain knowledge, each weighting coefficient corresponding to a feature or a combination of features; calculating the weighted feature vectors: for each context semantic association encoded feature vector, a weighting calculation is performed based on the corresponding weighting coefficient. Specifically: each feature vector is multiplied by its corresponding weighting coefficient to obtain a weighted feature vector; for feature combinations, the weighted feature vectors of the individual features can be added using a linear combination or another combination method to obtain a final weighted feature vector; optimizing the weighted feature vectors: each weighted feature vector may be further optimized to increase its expressive power and discriminability. The specific steps are as follows: feature vector analysis: the weighted feature vectors are analyzed to understand their distribution and characteristics; feature preprocessing: the weighted feature vectors are preprocessed, e.g., normalized or denoised; feature dimensionality reduction: if the feature vector dimensionality is high, it can be reduced by a dimensionality reduction method (such as principal component analysis) to improve computational efficiency and reduce redundant information; feature selection: the most representative feature subset is selected according to the relevance and importance of the features; feature mapping: the feature vectors are mapped into a representation space better suited to the task, e.g., using an autoencoder or another mapping method; feature optimization: according to a specific optimization target, the feature vectors are optimized and adjusted, e.g., to enhance the discrimination between features or to reduce the redundancy between features; evaluation and verification: the optimized feature vectors are evaluated through evaluation indicators and verification methods to ensure their effectiveness and accuracy in the task; obtaining the plurality of optimized context semantic association encoded feature vectors: according to the above steps, weighted optimization is performed on each context semantic association encoded feature vector, yielding a plurality of optimized feature vectors.
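The core weight-then-concatenate step described above can be sketched as follows, assuming the weighting factors have already been computed upstream:

```python
import numpy as np

def weighted_fuse(vectors, factors):
    """Scale each context semantic encoded vector by its weighting
    factor, then concatenate the weighted vectors into one optimized
    data-stream semantic understanding vector."""
    weighted = [w * v for w, v in zip(factors, vectors)]
    return np.concatenate(weighted)

vecs = [np.ones(4), 2.0 * np.ones(4)]     # two toy encoded vectors
fused = weighted_fuse(vecs, [0.5, 0.25])  # illustrative factors
```

The concatenation preserves each vector's positions, so downstream decoders can still attribute positions to their source vector while benefiting from the per-vector rescaling.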
More specifically, the data stream semantic understanding feature optimization subunit 3312 is configured to concatenate the plurality of optimized context semantic association encoded feature vectors to obtain the optimized data stream semantic understanding feature vector. That is, after the plurality of optimized context semantic association encoded feature vectors is obtained, they are further concatenated so that the information in the different feature vectors is mutually fused, thereby providing a more comprehensive feature description.
It should be noted that, in other specific examples of the present application, the feature distribution optimization may be performed on the data stream semantic understanding feature vector in other manners to obtain the optimized data stream semantic understanding feature vector, for example: feature vector analysis: the data stream semantic understanding feature vector is analyzed to understand its distribution, the correlations among features, and possible problems or defects; statistical methods, visualization tools, or feature selection algorithms can be used for this analysis and exploration; feature preprocessing: the feature vector is preprocessed, including normalization, standardization, and outlier removal, to ensure that its data range and distribution meet the optimization requirements; the preprocessing method can be chosen according to the specific situation and requirements, e.g., min-max scaling or Z-score standardization; feature dimensionality reduction: if the feature vector dimensionality is high, a dimensionality reduction method can be used to convert it into a low-dimensional representation so as to reduce the complexity and redundancy of the data; common dimensionality reduction methods include Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA), and an appropriate method can be selected for the dimensionality reduction operation; feature selection: the most representative and discriminative feature subset is selected according to the importance and relevance of the features, so as to reduce the dimensionality and redundant information of the feature vector while retaining the key features; feature selection methods include filter, wrapper, and embedded methods, and may be chosen based on statistical indicators, machine learning algorithms, or domain knowledge; feature mapping: a nonlinear mapping method can be used to map the original feature vector into a higher-dimensional feature space so as to extract richer feature information; commonly used nonlinear mapping methods include kernel methods and neural network models in deep learning; feature optimization: an appropriate optimization method is selected according to the distribution of the feature vector and the optimization target; the feature vector can be optimized and adjusted according to the requirements of the specific task using methods such as clustering algorithms, decoders, or regression models; evaluation and verification: the optimized data stream semantic understanding feature vector is evaluated and verified, comparing its performance and effect on different tasks; the accuracy, robustness, and interpretability of the optimized feature vector can be evaluated using methods such as cross-validation and index evaluation.
Specifically, the identification code generating unit 332 is configured to pass the optimized data stream semantic understanding feature vector through a decoder-based identification code generator to obtain a feature rule identification code. The feature rule identification code is generated from the global semantic information of the data stream to be processed, so that the data stream to be processed can be matched against the feature rule identification code, thereby avoiding data stream mismatching and improving the performance and stability of the overall network service processing system.
Accordingly, in one possible implementation, the optimized data stream semantic understanding feature vector may be passed through a decoder-based identification code generator to derive the feature rule identification code, for example, as follows: a data set for training the decoder is collected and prepared; this data set should contain the input feature vectors and the corresponding feature rule identification codes; a decoder model capable of converting feature vectors into feature rule identification codes is trained on the data set. The decoder may be a neural network model, such as an autoencoder or a variational autoencoder; for a data stream to be processed, the corresponding feature extraction method is first used to obtain its semantic understanding feature vector; the obtained feature vector is then input into the trained decoder model, which decodes it and generates the corresponding feature rule identification code; the generated feature rule identification code is used in further data stream analysis tasks. These identification codes can be used for tasks such as data stream classification, clustering, and anomaly detection, to extract important features and rules from the data stream.
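As a hedged stand-in for the trained decoder model described above, the sketch below uses a fixed random linear projection whose quantized sign pattern is hashed into a short hexadecimal identification code; the random projection, the sign quantization, and the SHA-256 truncation are all assumptions of this sketch rather than the generator of the application:

```python
import hashlib
import numpy as np

def generate_identification_code(feature_vec, seed=0, code_len=16):
    """Project the semantic understanding vector through a fixed linear
    'decoder', quantize the output signs, and hash them into a short
    hexadecimal feature rule identification code."""
    rng = np.random.default_rng(seed)
    w = rng.normal(size=(8, feature_vec.shape[0]))  # stand-in decoder weights
    logits = w @ feature_vec
    bits = (logits > 0).astype(np.uint8)            # quantize decoder output
    return hashlib.sha256(bits.tobytes()).hexdigest()[:code_len]

vec = np.arange(6, dtype=float)
code = generate_identification_code(vec)
```

Because the projection is fixed, the same semantic vector always yields the same code, which is the property the matching stage relies on.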
It should be noted that, in other specific examples of the present application, the feature rule identification code may also be generated from the data stream semantic understanding feature in other manners, for example: data preprocessing: the original data stream is collected and subjected to the necessary data cleaning and processing to ensure its integrity and accuracy; the feature data and the address data are extracted, and the features required for generating the rule identification code are determined according to the structure and content of the data stream; the address data is one-hot encoded and converted into a format suitable for processing. Feature data embedding: the feature data is mapped into continuous low-dimensional vector representations using an embedding layer. The embedding layer can learn the representation of the feature data by training a neural network model, or a pre-trained embedding vector can be used for the mapping; different embedding layer structures and parameters can be chosen as needed to obtain better feature representations; context encoder: semantic understanding of the feature data and the address data is performed using a context encoder based on a converter module. The converter module is a neural network model based on a self-attention mechanism that can capture the associations and context information within the data; with the feature data and the address data as input, the semantically understood feature representation is obtained through the encoding process of the multi-layer converter module; feature rule identification code generation: based on the feature representation processed by the context encoder, different algorithms and methods may be used to generate the feature rule identification code; the generation method can be chosen according to the specific needs and application scenario, e.g., one based on a hash function, an encoding algorithm, or other specific rules; identification code application and verification: the generated feature rule identification code is applied to data stream analysis and mining tasks, e.g., feature matching, data classification, and anomaly detection; the validity and accuracy of the generated identification code are verified, and its performance and effect on different tasks can be compared through testing and evaluation.
In particular, the matching module 340 is configured to perform matching processing on the data stream to be processed based on the feature rule identification code. That is, the data stream to be processed is matched against the feature rule identification code generated by the decoder.
Accordingly, in one possible implementation, the matching processing may be performed on the data stream to be processed based on the feature rule identification code, for example, as follows: defining the feature rule identification codes: a set of feature rule identification codes is defined based on the specific requirements and the characteristics of the data stream. These identification codes can be predefined rules or can be generated automatically by a machine learning algorithm; extracting the data stream to be processed: the data segments to be matched are extracted from the incoming data stream. These data segments may be of different types, such as text, images, or audio; encoding the data stream to be processed: the extracted data segments are encoded and converted into feature vectors, using feature extraction algorithms such as bag-of-words models or convolutional neural networks; matching against the feature rule identification codes: the encoded data stream to be processed is matched against the defined feature rule identification codes. Similarity measures, such as cosine similarity or Euclidean distance, can be used to quantify the similarity between the data stream to be processed and the feature rule identification codes; performing the matching operation: the corresponding processing operation is executed according to the matching result. Depending on the application scenario and requirements, this may be data filtering, classification, clustering, prediction, or another operation; outputting the processing result: the processed data stream is output, e.g., saved to a file, sent to another system, or presented to a user.
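The cosine-similarity matching step can be sketched as below; the rule vectors and the 0.9 threshold are hypothetical values chosen for illustration:

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def match_stream(stream_vec, rule_vectors, threshold=0.9):
    """Return the identifiers of all rules whose vectors are
    cosine-similar enough to the encoded data stream."""
    return [rid for rid, rv in rule_vectors.items()
            if cosine_similarity(stream_vec, rv) >= threshold]

rules = {"rule_a": np.array([1.0, 0.0]),
         "rule_b": np.array([0.0, 1.0])}
hits = match_stream(np.array([0.95, 0.05]), rules)
```

A production matcher would index the rule vectors (e.g., with an approximate-nearest-neighbor structure) instead of scanning them linearly.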
In particular, in the technical solution of the present application, the quantized transferable sensing factor of the transferable features of the respective context semantic association encoded feature vectors estimates the domain uncertainty from the feature space domain to the decoding target domain through an uncertainty measure under domain transfer. Since the domain uncertainty estimate can be used to identify feature representations that have been transferred between domains, weighting each of the context semantic association encoded feature vectors with this factor makes it possible to identify, through the cross-domain alignment of the feature space domain to the decoding target domain, whether the feature mapping is effectively transferred between domains, thereby quantitatively perceiving the transferability of the transferable features in the different feature vectors, realizing inter-domain adaptive feature fusion, and improving the expression effect of the data stream semantic understanding feature vector. However, while this improves the feature expression of the data stream semantic understanding feature vector in the full semantic space, it also enriches the label distribution corresponding to the diversified feature distribution in the probability distribution domain of the decoding result when the optimized data stream semantic understanding feature vector is decoded by the decoder, thereby affecting the convergence of the decoder's weight matrix during decoding.
Based on the above, when the optimized data stream semantic understanding feature vector is decoded by the decoder, a weight space exploration constraint based on class matrix regularization is applied to it at each iteration of the weight matrix.
Accordingly, in one possible implementation, the data stream processing system based on the convergence and distribution device further includes a training module, configured to, during the training of the decoder, perform the weight space exploration constraint based on class matrix regularization on the optimized data stream semantic understanding feature vector at each iteration of the weight matrix when the vector passes through the decoder, so as to obtain a constrained optimized data stream semantic understanding feature vector.
Wherein the training module is configured to: perform the weight space exploration constraint based on class matrix regularization on the optimized data stream semantic understanding feature vector using the following constraint formula, so as to obtain the constrained optimized data stream semantic understanding feature vector;
wherein the constraint formula is:

$$ V'' = W'^{\top} \otimes V', \qquad W' = M \otimes W $$

wherein $V'$ is the optimized data stream semantic understanding feature vector, specifically expressed as a column vector, $V''$ is the constrained optimized data stream semantic understanding feature vector, $V'^{\top}$ is its row-vector transpose, $M \in \mathbb{R}^{L \times L}$ is a learnable domain transfer matrix, which may initially be set to the diagonal matrix formed from the diagonal elements of the last-iteration weight matrix, $\mathbb{R}$ denotes the real number field, $L$ is the length of the optimized data stream semantic understanding feature vector, $W$ is the weight matrix being iterated, $W'$ is the weight matrix after domain mapping, $(\cdot)^{\top}$ denotes the transpose of a vector or matrix, and $\otimes$ denotes matrix multiplication.

Here, considering the domain gap between the weight space domain of the weight matrix and the probability distribution domain of the decoding result of the optimized data stream semantic understanding feature vector, the domain-mapped weight matrix $W'$ acts, relative to the optimized data stream semantic understanding feature vector $V'$, as an inter-domain transferring agent that transfers the probability distribution of valuable label constraints into the weight space. This avoids excessive exploration of the weight distribution in the weight space by the richly labeled probability distribution domain during weight-space-based decoding, thereby improving the convergence of the weight matrix and the training effect of decoding the optimized data stream semantic understanding feature vector through the decoder.
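Under the reading that the constraint multiplies the optimized feature vector by the transposed, domain-mapped weight matrix, with the domain transfer matrix initialized from the diagonal of the last-iteration weights, one iteration can be sketched in NumPy; the square weight matrix and the diagonal initialization are assumptions of this sketch:

```python
import numpy as np

def constrain(v, w):
    """Class-matrix-regularized weight-space exploration constraint:
    initialize the domain transfer matrix M from the diagonal of the
    last-iteration weight matrix, map the weights, and project the
    optimized semantic understanding vector through them."""
    m = np.diag(np.diag(w))   # learnable M, initialized as diag(W)
    w_mapped = m @ w          # W' = M ⊗ W
    return w_mapped.T @ v     # V'' = W'ᵀ ⊗ V'

rng = np.random.default_rng(2)
w = rng.normal(size=(5, 5))   # toy square decoder weight matrix
v = rng.normal(size=5)        # toy optimized semantic vector
v_constrained = constrain(v, w)
```

In training, `m` would be a learnable parameter updated alongside `w`; the sketch only shows its initialization and the forward projection.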
As described above, the data stream processing system 300 based on the convergence and distribution device according to the embodiment of the present application may be implemented in various wireless terminals, for example, a server having a data stream processing algorithm based on the convergence and distribution device. In one possible implementation, the data stream processing system 300 based on the convergence and distribution device according to an embodiment of the present application may be integrated into a wireless terminal as a software module and/or a hardware module. For example, the system 300 may be a software module in the operating system of the wireless terminal, or may be an application developed for the wireless terminal; of course, the system 300 may equally be one of many hardware modules of the wireless terminal.

Alternatively, in another example, the data stream processing system 300 based on the convergence and distribution device and the wireless terminal may be separate devices, with the system 300 connected to the wireless terminal through a wired and/or wireless network, transmitting interaction information in an agreed data format.
Further, a data stream processing method based on the convergence and distribution equipment is also provided.
Fig. 7 is a flowchart of a data flow processing method based on a convergence and splitting device according to an embodiment of the present application. As shown in fig. 7, a data stream processing method based on a convergence and splitting device according to an embodiment of the present application includes: s1, acquiring a data stream to be processed; s2, carrying out data analysis on the data stream to be processed to obtain semantic understanding characteristics of the data stream; s3, generating a characteristic rule identification code based on the semantic understanding characteristics of the data stream; and S4, carrying out matching processing on the data stream to be processed based on the characteristic rule identification code.
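Steps S1-S4 can be sketched as a small orchestration function; the analyzer, code generator, and matcher callables below are toy placeholders for the modules described above, not the patented implementations:

```python
def process_stream(raw_stream, analyzer, code_generator, matcher):
    """End-to-end sketch of steps S1-S4: acquire the stream (S1),
    analyze it into semantic understanding features (S2), generate the
    feature rule identification code (S3), and match (S4)."""
    features = analyzer(raw_stream)     # S2: semantic understanding
    code = code_generator(features)     # S3: feature rule id code
    return matcher(raw_stream, code)    # S4: matching processing

result = process_stream(
    [1, 2, 3],                              # S1: acquired toy stream
    analyzer=lambda s: sum(s),              # placeholder feature analysis
    code_generator=lambda f: f"code-{f}",   # placeholder code generator
    matcher=lambda s, c: (c, len(s)),       # placeholder matcher
)
```

Each placeholder would be replaced by the corresponding module (data analysis module 320, identification code generation module 330, matching module 340) in an actual deployment.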
In summary, the data stream processing method based on the convergence and distribution device according to the embodiment of the present application has been described; it generates the feature rule identification code from the feature data and the address data of the data stream to be processed through a feature encoding-decoding structure, and performs the matching processing of the data stream based on that feature rule identification code.
The foregoing description of the embodiments of the present disclosure has been presented for purposes of illustration and description, and is not intended to be exhaustive or limited to the embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the various embodiments described. The terminology used herein was chosen in order to best explain the principles of the embodiments, the practical application, or the improvement of technology in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.

Claims (10)

1. A data stream processing system based on a convergence and distribution device, comprising:
the data stream acquisition module is used for acquiring a data stream to be processed;
the data analysis module is used for carrying out data analysis on the data stream to be processed to obtain semantic understanding characteristics of the data stream;
the characteristic rule identification code generation module is used for generating characteristic rule identification codes based on the semantic understanding characteristics of the data stream; and
and the matching module is used for carrying out matching processing on the data stream to be processed based on the characteristic rule identification code.
2. The data stream processing system based on a convergence and distribution device as set forth in claim 1, wherein the data analysis module comprises:
a multi-class data extraction unit for extracting feature data and address data from the data stream to be processed; and
a data semantic association coding unit for performing semantic association coding on the feature data and the address data to obtain the data stream semantic understanding features.
3. The data stream processing system based on a convergence and distribution device as set forth in claim 2, wherein the data semantic association coding unit comprises:
an address data one-hot encoding subunit for performing one-hot encoding on the address data to obtain an address one-hot encoded vector;
a feature data embedding subunit for passing each data item in the feature data through an embedding layer to obtain a plurality of feature data item embedding vectors; and
a multi-data context semantic association subunit for passing the plurality of feature data item embedding vectors and the address one-hot encoded vector through a context encoder based on a Transformer module to obtain a data stream semantic understanding feature vector as the data stream semantic understanding feature.
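The encoding path of claim 3 — one-hot encoding the address, embedding the feature data items, and context-associating all vectors — can be illustrated with toy code. The mean-pooling "context encoder" below is only a stand-in for the claim's Transformer-based context encoder, and all item names and dimensions are assumptions.

```python
# Sketch of claim 3's encoding path in pure Python; sizes, item names and
# the context-mixing rule are toy assumptions, not the patented design.

def one_hot(index, size):
    # one-hot encode an address index into a sparse binary vector
    v = [0.0] * size
    v[index] = 1.0
    return v

def embed(item, dim=4):
    # toy embedding layer: deterministic pseudo-embedding per data item
    return [((hash(item) >> (8 * i)) % 100) / 100.0 for i in range(dim)]

def context_encode(tokens):
    # stand-in for the Transformer-based context encoder: each output vector
    # mixes its own token with the mean of all tokens (global context)
    n = len(tokens)
    mean = [sum(t[i] for t in tokens) / n for i in range(len(tokens[0]))]
    return [[(a + b) / 2 for a, b in zip(t, mean)] for t in tokens]

addr_vec = one_hot(2, 4)                      # address one-hot encoded vector
items = ["proto=tcp", "len=512", "port=443"]  # hypothetical feature data items
tokens = [embed(it) for it in items] + [addr_vec]
ctx = context_encode(tokens)                  # context semantic associated vectors
stream_feat = [x for v in ctx for x in v]     # cascade -> semantic understanding vector
assert len(stream_feat) == 4 * len(tokens)
```

The final concatenation line corresponds to the cascade step of claim 4.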
4. The data stream processing system based on a convergence and distribution device as set forth in claim 3, wherein the multi-data context semantic association subunit comprises:
a context encoding secondary subunit for passing the plurality of feature data item embedding vectors and the address one-hot encoded vector through the context encoder based on the Transformer module to obtain a plurality of context semantic associated coding feature vectors; and
a cascade secondary subunit for cascading the plurality of context semantic associated coding feature vectors to obtain the data stream semantic understanding feature vector.
5. The data stream processing system based on a convergence and distribution device as set forth in claim 4, wherein the feature rule identification code generation module comprises:
a feature distribution optimization unit for performing feature distribution optimization on the data stream semantic understanding feature vector to obtain an optimized data stream semantic understanding feature vector; and
an identification code generation unit for passing the optimized data stream semantic understanding feature vector through a decoder-based identification code generator to obtain the feature rule identification code.
6. The data stream processing system based on a convergence and distribution device as set forth in claim 5, wherein the feature distribution optimization unit comprises:
a context data stream semantic feature optimization subunit for performing feature distribution optimization on the respective context semantic associated coding feature vectors based on the data stream semantic understanding feature vector, so as to obtain a plurality of optimized context semantic associated coding feature vectors; and
a data stream semantic understanding feature optimization subunit for cascading the plurality of optimized context semantic associated coding feature vectors to obtain the optimized data stream semantic understanding feature vector.
7. The data stream processing system based on a convergence and distribution device as set forth in claim 6, wherein the context data stream semantic feature optimization subunit comprises:
a weighting factor calculation secondary subunit for calculating, based on the data stream semantic understanding feature vector, quantized transferable sensing factors of the transferable features of the respective context semantic associated coding feature vectors, so as to obtain a plurality of weighting factors; and
a weighted optimization secondary subunit for performing weighted optimization on the respective context semantic associated coding feature vectors using the plurality of weighting factors as weighting coefficients, so as to obtain the plurality of optimized context semantic associated coding feature vectors.
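Claim 7's two-step reweighting (per-vector weight, then scalar weighting of each vector) can be illustrated as follows. The patent's exact "quantized transferable sensing factor" formula is not reproduced in the source text, so the log2-based agreement score below is a placeholder assumption; only the overall flow follows the claim.

```python
import math

def weighting_factor(ctx_vec, global_vec, alpha=1.0):
    # Placeholder for the patent's "quantized transferable sensing factor":
    # the actual formula is not reproduced in the source, so we use a simple
    # log2-based agreement score between a context vector and the global vector.
    dot = sum(a * b for a, b in zip(ctx_vec, global_vec))
    return alpha * math.log2(1.0 + abs(dot))

# toy context semantic associated coding feature vectors
ctx_vecs = [[0.1, 0.4], [0.3, 0.2]]
# toy global summary standing in for the data stream semantic understanding vector
global_vec = [sum(v[i] for v in ctx_vecs) / len(ctx_vecs) for i in range(2)]

weights = [weighting_factor(v, global_vec) for v in ctx_vecs]
# weighted optimization: scale each context vector by its weighting factor
optimized = [[w * x for x in v] for w, v in zip(weights, ctx_vecs)]
assert len(weights) == len(ctx_vecs) and all(w >= 0 for w in weights)
```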
8. The data stream processing system based on a convergence and distribution device as set forth in claim 7, wherein the weighting factor calculation secondary subunit is configured to: calculate, based on the data stream semantic understanding feature vector, the quantized transferable sensing factors of the transferable features of the respective context semantic associated coding feature vectors, so as to obtain the plurality of weighting factors;
wherein the optimization formula is:
wherein V1 and V2 respectively denote each context semantic associated coding feature vector and the data stream semantic understanding feature vector, v1(j) is the feature value at the j-th position of the i-th vector among the plurality of context semantic associated coding feature vectors, v2(j) is the feature value at the j-th position of the data stream semantic understanding feature vector, log denotes the base-2 logarithm, α is a weighting hyperparameter, and wi is the i-th weighting factor among the plurality of weighting factors.
9. The data stream processing system based on a convergence and distribution device as set forth in claim 8, further comprising: a training module for, during training of the decoder, performing weight space exploration constraint on the optimized data stream semantic understanding feature vector based on class-matrix regularization at each iteration of the weight matrix as the optimized data stream semantic understanding feature vector passes through the decoder, so as to obtain a constrained optimized data stream semantic understanding feature vector;
wherein the training module is configured to: perform weight space exploration constraint on the optimized data stream semantic understanding feature vector based on class-matrix regularization using the following constraint formula, so as to obtain the constrained optimized data stream semantic understanding feature vector;
wherein the constraint formula is:
wherein V is the optimized data stream semantic understanding feature vector, V' is the constrained optimized data stream semantic understanding feature vector, M is a learnable domain transfer matrix in R^(d×d), R denotes the real number field, d is the length of the optimized data stream semantic understanding feature vector, W is the iterative weight matrix, W' is the weight matrix after domain mapping, the superscript T denotes the transpose of a vector, and × denotes matrix multiplication.
10. A data stream processing method based on a convergence and distribution device, comprising:
acquiring a data stream to be processed;
performing data analysis on the data stream to be processed to obtain data stream semantic understanding features;
generating a feature rule identification code based on the data stream semantic understanding features; and
performing matching processing on the data stream to be processed based on the feature rule identification code.
CN202311276910.8A 2023-10-07 2023-10-07 Data stream processing system and method based on convergence and distribution equipment Active CN117014382B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311276910.8A CN117014382B (en) 2023-10-07 2023-10-07 Data stream processing system and method based on convergence and distribution equipment

Publications (2)

Publication Number Publication Date
CN117014382A true CN117014382A (en) 2023-11-07
CN117014382B CN117014382B (en) 2023-12-29

Family

ID=88563947

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311276910.8A Active CN117014382B (en) 2023-10-07 2023-10-07 Data stream processing system and method based on convergence and distribution equipment

Country Status (1)

Country Link
CN (1) CN117014382B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100325359A1 (en) * 2009-06-23 2010-12-23 Microsoft Corporation Tracing of data flow
CN110147421A (en) * 2019-05-10 2019-08-20 腾讯科技(深圳)有限公司 A kind of target entity link method, device, equipment and storage medium
US10715570B1 (en) * 2018-06-25 2020-07-14 Intuit Inc. Generic event stream processing for machine learning
CN115529232A (en) * 2021-06-08 2022-12-27 中国移动通信有限公司研究院 Control method and device for convergence and distribution equipment and storage medium
CN116010713A (en) * 2023-03-27 2023-04-25 日照职业技术学院 Innovative entrepreneur platform service data processing method and system based on cloud computing
CN116828087A (en) * 2023-06-25 2023-09-29 北京中科网芯科技有限公司 Information security system based on block chain connection


Also Published As

Publication number Publication date
CN117014382B (en) 2023-12-29


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant