CN115712614B - Information processing method and system based on data transmission flow control - Google Patents

Publication number
CN115712614B
Authority
CN
China
Prior art keywords
user
streaming data
grouping
data
knowledge
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202211384421.XA
Other languages
Chinese (zh)
Other versions
CN115712614A (en
Inventor
莫峰华
李嘉斌
杨超文
赖仕年
徐道广
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangke Guangruan Guangzhou Digital Technology Co ltd
Original Assignee
Hangke Guangruan Guangzhou Digital Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangke Guangruan Guangzhou Digital Technology Co ltd
Priority to CN202211384421.XA
Publication of CN115712614A
Application granted
Publication of CN115712614B
Legal status: Active
Anticipated expiration

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The information processing method and system based on data transmission flow control use a data packet network to group a user big data stream log multiple times until the log meets a set requirement; group the user data in the log according to each group representative knowledge field; and clear the data in the log that is isolated from the resulting streaming data groups. Knowledge training is integrated into the grouping process, so that knowledge training and unsupervised grouping are fused and influence each other. The scattering characteristics of the groups are therefore taken into account during knowledge training, which reinforces the accuracy of knowledge training. As a result, the user big data stream log can be preprocessed quickly and accurately without an additional analysis process, which safeguards the accuracy and timeliness of subsequent streaming data processing.

Description

Information processing method and system based on data transmission flow control
Technical Field
The application relates to the field of big data and artificial intelligence, in particular to an information processing method and system based on data transmission flow control.
Background
In application scenarios such as the Internet and the Internet of Things, complex business requirements such as personalized services, improved user experience, intelligent analysis and in-process decision making place higher demands on big data processing technology. To meet these demands, big data processing systems must return results within milliseconds or even microseconds. This gave rise to the concept of streaming data (or real-time big data). Data mining on streaming data cannot do without preprocessing, such as removing disturbance information from the streaming data, which ensures the accuracy of subsequent data analysis. Because streaming data is real-time in nature, preprocessing must not take too long, so improving the efficiency of streaming data cleaning is a technical problem that needs to be considered.
Disclosure of Invention
The invention aims to provide an information processing method and system based on data transmission flow control so as to improve the efficiency and accuracy of flow data cleaning.
In order to achieve the above object, embodiments of the present application are implemented as follows:
An embodiment of the present application provides an information processing method based on data transmission flow control, applied to an information processing system, where the method includes:
obtaining a user big data stream log, and grouping the user big data stream log multiple times through a data packet network until the log meets the set requirement, wherein each grouping comprises the following processing flow:
each of a plurality of groups of user streaming data in the user big data streaming log is subjected to data quantization, a plurality of corresponding user behavior knowledge fields are obtained according to quantization results, and knowledge expression error results are obtained;
grouping the plurality of user behavior knowledge fields, and determining a grouping error result according to a grouping result;
optimizing network coefficients of the data packet network when the data packet network does not meet preset requirements according to the knowledge expression error result and the packet error result;
outputting a plurality of group representative knowledge fields covered by a group result obtained by the last grouping, wherein each group representative knowledge field points to a corresponding user behavior classification;
grouping the user data in the user big data stream log according to each grouping representative knowledge field to obtain P stream data groups; wherein P is the number of the packet representative knowledge fields and P is a positive integer greater than or equal to 1;
and clearing the data in the user big data stream log that is isolated from the P streaming data groups.
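The final two steps above (grouping by group representative knowledge fields, then clearing isolated data) can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: records and representative fields are taken to be plain numeric vectors, cosine similarity stands in for the commonality score, and `min_score` is a hypothetical threshold that the source does not specify.

```python
import math


def cosine(u, v):
    """Cosine similarity, used here as a stand-in commonality score (assumption)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def group_and_clear(records, representatives, min_score=0.8):
    """Split records into P groups plus a list of isolated records.

    min_score is a hypothetical threshold; the source does not state one.
    """
    groups = [[] for _ in representatives]  # one group per representative field
    isolated = []                           # treated as disturbance information
    for record in records:
        scores = [cosine(record, rep) for rep in representatives]
        best = scores.index(max(scores))
        if scores[best] >= min_score:
            groups[best].append(record)
        else:
            isolated.append(record)
    return groups, isolated
```

Only the `groups` part would be passed on to later analysis; the `isolated` list corresponds to the data that is cleared.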
As a possible implementation manner, each of the plurality of groups of user streaming data in the user big data streaming log adopts data quantization, and a plurality of corresponding user behavior knowledge fields are obtained according to quantization results, including: for the multiple groups of user streaming data, mining knowledge fields for one time or multiple times respectively to acquire corresponding initial knowledge fields; respectively carrying out knowledge refinement on each obtained initial knowledge field to obtain user behavior knowledge fields corresponding to each group of user streaming data;
before grouping multiple times through the data packet network according to the user big data stream log, the method further comprises: determining a plurality of user streaming data subsets according to the user big data streaming log, wherein each user streaming data subset comprises at least two groups of user streaming data and a commonality scoring tag between the at least two groups of user streaming data;
the determining knowledge representation error results comprises: for the plurality of user stream data subsets, the following processing flows are carried out: determining a commonality score reasoning result between at least two groups of user streaming data according to user behavior knowledge fields of at least two groups of user streaming data included in one user streaming data subset of the plurality of user streaming data subsets, and performing error calculation on the obtained commonality score reasoning result and a corresponding commonality score label to obtain a corresponding error calculation result; and determining the knowledge expression error result through each error calculation result obtained for the plurality of groups of user stream data subsets.
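As a hedged sketch of the label-based variant just described, the knowledge expression error can be computed by comparing an inferred commonality score with its commonality scoring tag for each subset and averaging the results. Cosine similarity as the inferred score, a squared-error comparison, and labels in the range 0 to 1 are all assumptions made for illustration; the function name `pairwise_knowledge_error` is hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity, used here as the inferred commonality score (assumption)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def pairwise_knowledge_error(subsets):
    """Average error over subsets of the form (field_a, field_b, label).

    label is the commonality scoring tag, assumed to lie in [0, 1].
    """
    errors = []
    for field_a, field_b, label in subsets:
        inferred = cosine(field_a, field_b)     # commonality score inference result
        errors.append((inferred - label) ** 2)  # error against the tag
    return sum(errors) / len(errors)
```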
As a possible implementation manner, before the multiple packets are performed through the data packet network according to the user big data stream log, the method further includes: determining a plurality of user streaming data subsets according to the user big data streaming log, wherein each user streaming data subset comprises first user streaming data, second user streaming data and third user streaming data, the commonness score between the first user streaming data and the second user streaming data in each user streaming data subset is larger than or equal to a preset commonness score reference value, and the commonness score between the first user streaming data and the third user streaming data is smaller than the commonness score reference value;
the determining knowledge representation error results comprises: for the plurality of user stream data subsets, the following processing flows are carried out: in one of the one or more user streaming data subsets, acquiring a first commonality score between the first user streaming data and the second user streaming data, acquiring a second commonality score between the first user streaming data and the third user streaming data, and acquiring a preset error result corresponding to the user streaming data subset according to the first commonality score and the second commonality score; and obtaining the knowledge expression error result according to the preset error results corresponding to the obtained multiple groups of user streaming data subsets.
As a possible implementation manner, the grouping the plurality of user behavior knowledge fields includes: screening one or more user behavior knowledge fields corresponding to first user streaming data and third user streaming data in the plurality of user streaming data subsets from the plurality of user behavior knowledge fields; grouping one or more of the user behavior knowledge fields;
the grouping the plurality of user behavior knowledge fields and determining a grouping error result from the grouping result includes: for the plurality of user behavior knowledge fields, the following processing flow is carried out: each user behavior knowledge field in the plurality of user behavior knowledge fields is respectively associated with each user behavior classification, and a corresponding user behavior classification field table is obtained; wherein, each component in the user behavior classification field table is matched with one user behavior classification, and the corresponding value of each component indicates whether the user streaming data corresponding to the user behavior knowledge field can be summarized to the corresponding user behavior classification; performing error calculation on the user behavior classification field table and the user behavior classification field table determined by the user flow data corresponding to the user behavior knowledge field in the previous grouping to obtain an error calculation result; and determining the grouping error result according to a plurality of error calculation results obtained for the plurality of user behavior knowledge fields.
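The user behavior classification field table and the grouping error can be illustrated with a small sketch. It assumes a one-hot table (each knowledge field is summarized to the single best-matching classification), a dot product as the matching score, and the fraction of changed components as the error measure; none of these choices is fixed by the source, and the function names are hypothetical.

```python
def dot(u, v):
    """Dot product, used here as a stand-in matching score (assumption)."""
    return sum(a * b for a, b in zip(u, v))


def classification_table(field, representatives):
    """One 0/1 component per user behavior classification (one-hot assumption)."""
    scores = [dot(field, rep) for rep in representatives]
    best = scores.index(max(scores))
    return [1 if i == best else 0 for i in range(len(scores))]


def grouping_error(current_tables, previous_tables):
    """Average fraction of table components that changed since the previous grouping."""
    per_field = []
    for current, previous in zip(current_tables, previous_tables):
        changed = sum(1 for c, p in zip(current, previous) if c != p)
        per_field.append(changed / len(current))
    return sum(per_field) / len(per_field)
```

A low grouping error under this measure means that the current grouping assigns most knowledge fields to the same classification as the previous grouping did.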
As a possible implementation manner, before determining the packet error result according to a plurality of error calculation results obtained for the plurality of user behavior knowledge fields, the method further includes:
screening one or more user behavior knowledge fields corresponding to first user streaming data and third user streaming data in the plurality of user streaming data subsets from the plurality of user behavior knowledge fields;
determining the group error result according to a plurality of error calculation results obtained for the plurality of user behavior knowledge fields, including:
and determining the grouping error result according to the error calculation result corresponding to one or more user behavior knowledge fields.
As a possible implementation manner, associating one of the plurality of user behavior knowledge fields with each user behavior classification, to obtain a corresponding user behavior classification field table, including: determining the commonality scores between the user behavior knowledge fields and the group representative knowledge fields determined by the current grouping respectively; according to the obtained commonality scores, determining whether the user streaming data corresponding to the user behavior knowledge field can be summarized to the user behavior classification indicated by the corresponding grouping representative knowledge field, and obtaining a summary judgment result; obtaining the user behavior classification field table according to the induction judgment result;
before optimizing the network coefficients of the data packet network when the data packet network does not meet the preset requirement according to the knowledge expression error result and the grouping error result, the method further comprises: determining an importance index of the grouping error result according to a difference result between the knowledge expression error result and the knowledge expression error result of the previous grouping, wherein the importance index is inversely related to the difference result; determining a total network error calculation result of the data packet network according to the knowledge expression error result, the grouping error result and the importance index; and determining whether the data packet network meets the preset condition according to the total network error calculation result.
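One way to realize the inverse relationship between the importance index and the difference result is sketched below. The specific form `1 / (1 + |difference|)` and the additive combination of the two errors are assumptions chosen for illustration; the source only requires that the index shrinks as the knowledge expression error changes more between groupings.

```python
def importance_index(knowledge_error, previous_knowledge_error):
    """Weight for the grouping error; shrinks as the knowledge error changes more."""
    difference = abs(knowledge_error - previous_knowledge_error)
    return 1.0 / (1.0 + difference)  # hypothetical inverse relationship


def total_network_error(knowledge_error, grouping_error, previous_knowledge_error):
    """Combine both errors, weighting the grouping error by the importance index."""
    weight = importance_index(knowledge_error, previous_knowledge_error)
    return knowledge_error + weight * grouping_error
```

Under this sketch, the grouping error counts fully once knowledge training has stabilized and is damped while the knowledge expression error is still changing quickly.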
As a possible implementation manner, the data packet network includes a data quantization network, and before grouping the plurality of user behavior knowledge fields and determining a grouping error result according to the grouping result, the method further includes:
when the data quantization network is determined to not meet the preset requirement according to the knowledge expression error result, performing coefficient optimization on the data quantization network;
Performing one or more times of adjustment and optimization on the optimized data quantization network until the current grouping meets the preset requirement; for each adjustment optimization, respectively carrying out data quantization on a plurality of groups of user streaming data according to the data quantization network, determining a knowledge expression error result according to a plurality of user behavior knowledge fields obtained by quantization, carrying out coefficient optimization on the data quantization network according to the knowledge expression error result, wherein the preset requirement comprises that the number of data quantization rounds of the current grouping accords with the preset number of rounds, or the data quantization network converges;
grouping the plurality of user behavior knowledge fields, comprising: grouping according to a plurality of user behavior knowledge fields obtained by quantification of a data quantification network after last adjustment and optimization in the current grouping.
As a possible implementation manner, the method further comprises:
determining a first grouping factor according to the P streaming data groups, wherein the first grouping factor indicates the grouping compactness of streaming data in the P streaming data groups;
for any selected streaming data set among the P streaming data sets, splitting the selected streaming data set into two streaming data sets to obtain Q comparison streaming data sets, and using the comparison grouping factor determined from the Q comparison streaming data sets as the second grouping factor of the selected streaming data set, the second grouping factor indicating the grouping compactness of the streaming data in the Q comparison streaming data sets; wherein Q = P + 1;
when the largest of the second grouping factors of the P streaming data sets is greater than or equal to the first grouping factor, splitting the selected streaming data set corresponding to the largest second grouping factor into two streaming data sets to obtain Q streaming data sets;
for any selected streaming data set among the Q streaming data sets, splitting the selected streaming data set into two streaming data sets to obtain R comparison streaming data sets, and using the comparison grouping factor determined from the R comparison streaming data sets as the third grouping factor of the selected streaming data set, the third grouping factor indicating the grouping compactness of the R comparison streaming data sets; wherein R = P + 2;
when the largest of the third grouping factors of the Q streaming data sets is greater than or equal to the largest second grouping factor, splitting the selected streaming data set corresponding to the largest third grouping factor into two streaming data sets to obtain R streaming data sets, and so on until the largest of the grouping factors obtained by the current splitting is smaller than the grouping factor before splitting.
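The repeated split-and-compare procedure above can be sketched as a greedy loop. The function `refine_groups` is hypothetical and takes the grouping factor and the splitting rule as parameters, since the source does not fix either; `split_at_mean` and `toy_factor` are toy stand-ins for one-dimensional data, included only so the sketch can be run.

```python
def refine_groups(groups, grouping_factor, split):
    """Greedily split one group per pass while the grouping factor does not drop."""
    current = grouping_factor(groups)  # first grouping factor, for P groups
    while True:
        best_factor, best_candidate = None, None
        for i, group in enumerate(groups):
            left, right = split(group)
            if not left or not right:
                continue  # this group cannot be split further
            candidate = groups[:i] + [left, right] + groups[i + 1:]
            factor = grouping_factor(candidate)  # comparison grouping factor
            if best_factor is None or factor > best_factor:
                best_factor, best_candidate = factor, candidate
        if best_candidate is None or best_factor < current:
            return groups  # largest factor fell below the pre-split factor
        groups, current = best_candidate, best_factor


# Toy stand-ins for one-dimensional data (hypothetical, for illustration only).
def split_at_mean(group):
    mean = sum(group) / len(group)
    return [x for x in group if x <= mean], [x for x in group if x > mean]


def toy_factor(groups):
    spread = 0.0
    for group in groups:
        mean = sum(group) / len(group)
        spread += sum((x - mean) ** 2 for x in group)
    return -spread - len(groups)  # tighter and fewer groups score higher
```

Each accepted split increases the number of groups by one, so the loop ends at the latest when every group holds a single item.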
As a possible implementation manner, the determining the first grouping factor according to the P streaming data sets includes:
for each piece of streaming data in the P streaming data sets, determining a pooling factor and a tearing factor corresponding to the streaming data according to the first streaming data knowledge of the streaming data, the first streaming data knowledge of the remaining streaming data in the streaming data set to which the streaming data belongs, and the first streaming data knowledge of the streaming data in the remaining streaming data sets, the pooling factor indicating a dissimilarity score between the streaming data and the remaining streaming data in its own streaming data set, and the tearing factor indicating a dissimilarity score between the streaming data and the streaming data in the remaining streaming data sets;
determining a sub-grouping factor corresponding to the streaming data according to the pooling factor and the tearing factor, wherein the sub-grouping factor is inversely related to the pooling factor and positively related to the tearing factor;
and determining the first grouping factor according to the sub-grouping factor corresponding to each streaming data.
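The pooling factor, tearing factor and sub-grouping factor resemble the quantities used in the well-known silhouette coefficient, so a silhouette-style sketch is given below as one possible reading. Euclidean distance as the dissimilarity score, the nearest other group as the basis for the tearing factor, and the formula `(tearing - pooling) / max(pooling, tearing)` are assumptions; the source only states the direction of each relationship.

```python
import math


def euclidean(u, v):
    """Euclidean distance, used here as the dissimilarity score (assumption)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def mean_distance(item, others):
    return sum(euclidean(item, other) for other in others) / len(others)


def sub_grouping_factor(item, own_rest, other_groups):
    """Silhouette-style score: falls with pooling, rises with tearing."""
    if not own_rest or not other_groups:
        return 0.0  # silhouette convention for a singleton or a single group
    pooling = mean_distance(item, own_rest)                      # within own group
    tearing = min(mean_distance(item, g) for g in other_groups)  # nearest other group
    largest = max(pooling, tearing)
    return (tearing - pooling) / largest if largest else 0.0


def first_grouping_factor(groups):
    """Average the sub-grouping factors of all streaming data items."""
    scores = []
    for gi, group in enumerate(groups):
        others = [g for j, g in enumerate(groups) if j != gi and g]
        for ii, item in enumerate(group):
            rest = group[:ii] + group[ii + 1:]
            scores.append(sub_grouping_factor(item, rest, others))
    return sum(scores) / len(scores) if scores else 0.0
```

Values near 1 indicate compact, well-separated groups, while negative values suggest that items sit closer to another group than to their own.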
A second aspect of the embodiments of the present application provides an information processing system, including a processor and a memory, where the memory stores a computer program, and when the processor runs the program, the method described above is implemented.
According to the information processing method and the system based on the data transmission flow control, through obtaining the user large data flow log, the user large data flow log is grouped for multiple times by adopting a data grouping network until the user large data flow log meets the set requirement, wherein each grouping comprises the steps of respectively carrying out data quantization on multiple groups of user flow data in the user large data flow log, obtaining a plurality of corresponding user behavior knowledge fields according to the quantization result, and obtaining a knowledge expression error result; grouping a plurality of user behavior knowledge fields, and determining a grouping error result according to the grouping result; optimizing network coefficients of the data packet network when the data packet network does not meet preset requirements according to the knowledge expression error result and the packet error result; outputting a plurality of group representative knowledge fields covered by a group result obtained by the last grouping, wherein each group representative knowledge field points to a corresponding user behavior classification; grouping user data in the user big data stream log according to each group representative knowledge field to obtain P stream data groups; wherein P is the number of the grouping representative knowledge fields and P is a positive integer greater than or equal to 1; and removing the data isolated from the P streaming data groups in the user big data streaming log. Therefore, the streaming data in the user big data streaming log is grouped, isolated data outside the grouping is determined as disturbance information and cleared, so that the user big data streaming log can be rapidly and accurately preprocessed, no additional analysis process is needed, and the accuracy and timeliness of subsequent streaming data processing are ensured. 
In addition, the user big data stream log is grouped through the configured data packet network, and knowledge training is integrated into the grouping process, so that knowledge training and unsupervised grouping are fused and influence each other. The scattering characteristics of the groups are taken into account during knowledge training, so the acquired user behavior knowledge fields also carry those scattering characteristics and the accuracy of knowledge training is reinforced. Furthermore, each grouping stays strongly consistent with the previous grouping, so that the knowledge fields of the same behavior classification are highly similar within a group and the resulting group representative knowledge fields are more accurate. Grouping the user data in the user big data stream log according to these group representative knowledge fields is therefore more accurate, and isolated streaming data can be identified as disturbance information more reliably.
In the following description, other features will be partially set forth. Upon review of the ensuing disclosure and the accompanying figures, those skilled in the art will in part discover these features or will be able to ascertain them through production or use thereof. The features of the present application may be implemented and obtained by practicing or using the various aspects of the methods, tools, and combinations that are set forth in the detailed examples described below.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings that are required to be used in the description of the embodiments of the present application will be briefly described below.
Fig. 1 is a flowchart of an information processing method based on data transmission flow control according to an embodiment of the present application.
Fig. 2 is a schematic diagram of a functional module architecture of an information processing apparatus according to an embodiment of the present application.
Fig. 3 is a schematic diagram of the composition of an information processing system according to an embodiment of the present application.
Detailed Description
Embodiments of the present application are described below with reference to the accompanying drawings in the embodiments of the present application. The terminology used in the description of the embodiments of the application is for the purpose of describing particular embodiments of the application only and is not intended to be limiting of the application.
The information processing method based on data transmission flow control in the embodiments of the present application is executed by an information processing system, which may take the form of, but is not limited to, a server, a personal computer, a notebook computer and the like. When the system is a server, it may be a single network server, a server group formed by multiple network servers, or a cloud formed by a large number of computers or network servers, where cloud computing is a form of distributed computing in which a set of loosely coupled computers acts as one super virtual computer. The network in which the system is located includes, but is not limited to, the Internet, a wide area network, a metropolitan area network, a local area network, a VPN and the like. In the embodiments of the present application, the information processing system is communicatively connected to at least one terminal device. The terminal device may be a user client, such as a smartphone or a tablet computer, or an Internet of Things sensing device, such as an Internet of Vehicles device, and can collect behavior data of a user. After the terminal device collects streaming data, it uploads the data to the information processing system, which executes the information processing method based on data transmission flow control provided in the embodiments of the present application.
The embodiment of the application provides an information processing method based on data transmission flow control, which is applied to an information processing system, as shown in fig. 1, and comprises the following steps:
Step S1: Acquire a user big data stream log.
As an implementation of the embodiments of the present application, the user big data stream log is a log file formed by collecting user data in real time or near real time and processing it as a time-sensitive big data stream. Streaming data processing is typically built on frameworks such as Spark, Storm and Samza and can complete a job within a short time, for example milliseconds to seconds, which makes it suitable for application scenarios with strict timeliness requirements, such as online personalized recommendation, real-time collection and analysis of website user behavior, real-time analysis of Internet of Things machine logs, real-time anti-fraud for financial consumption, and real-time identification of abnormal personnel. In one implementation, the user big data stream log includes multiple groups of user streaming data, for example data classified by user behavior, such as click behavior data, session behavior data and feedback behavior data generated when the user browses an Internet platform (such as an e-commerce, reading, video or review platform). Analyzing this real-time streaming data helps to obtain latent behavior-oriented information from the user behavior data, such as user portraits, and to make corresponding operational decisions, such as service recommendations, based on that information.
Data purity matters before the big data stream log is analyzed. At present, a user big data stream log is usually analyzed with an AI model, which must first be debugged (that is, trained) to obtain a converged application model; this depends on a large volume of data and is computationally expensive. If the data contains a large amount of disturbance information (noise), convergence efficiency suffers considerably and the inference accuracy of the resulting model may be badly affected. Therefore, after the user big data stream log is obtained, it needs to be preprocessed, that is, denoised, before analysis.
Step S2: Group the user big data stream log multiple times through the data packet network (also referred to as the data grouping network) until the log meets the set requirement.
The information processing method based on data transmission flow control provided by the embodiments of the present application is implemented with an artificial intelligence model, for example a data packet network. The data packet network may be any feasible machine learning model or deep learning network and may comprise a data quantization network and a packet network. The data quantization network performs quantization coding of the data, and the packet network groups the data.
As an implementation of the embodiments of the present application, each of the multiple groupings in step S2 includes the following processing flow: data quantization is performed on each of the multiple groups of user streaming data extracted from the user big data stream log, which completes the feature coding of the data; a plurality of corresponding user behavior knowledge fields (knowledge vectors mined through an expert-system branch of AI technology) are obtained from the quantization results, and a knowledge expression error result is obtained. Meanwhile, the user behavior knowledge fields are grouped and a grouping error result is determined from the grouping result. Whether the data packet network meets the preset condition is determined from the knowledge expression error result and the grouping error result; when it does not, the network coefficients of the data packet network are optimized and the next grouping begins.
Step S3: Output the plurality of group representative knowledge fields covered by the grouping result of the last grouping, wherein each group representative knowledge field points to a corresponding user behavior classification.
As an implementation of the embodiments of the present application, grouping ends when the data packet network converges (for example, a preset inference accuracy is met, a preset number of debugging rounds is reached, and the weight change between two successive tuning rounds meets a preset value). In other words, the multiple groups of user streaming data in the user big data stream log have been summarized into corresponding user behavior classifications (such as click data, evaluation data and collection data). The grouping result indicates the user behavior knowledge field of each classification, and that field can be taken as the group representative knowledge field of the classification. This yields a plurality of group representative knowledge fields, each pointing to its corresponding user behavior classification and serving as the center of the corresponding group, that is, the reference point around which data is gathered. Steps S1 to S3 constitute the debugging of the data packet network. When the data packet network has already been debugged, or the received user big data stream log calls for no further improvement and optimization of the network, steps S1 to S3 can be skipped and the subsequent steps S4 to S5 executed directly.
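The convergence examples named above can be written as a simple stopping test. This is only a sketch: the function name, the default thresholds, and the choice to stop as soon as any one criterion holds are assumptions, since the source lists the criteria as examples without fixing values or how they combine.

```python
def has_converged(accuracy, rounds, weight_change, *,
                  min_accuracy=0.95, max_rounds=100, max_weight_change=1e-4):
    """Return True when any of the example stopping criteria is met.

    All thresholds are hypothetical defaults, not values from the source.
    """
    return (
        accuracy >= min_accuracy                # preset inference accuracy met
        or rounds >= max_rounds                 # preset number of rounds reached
        or weight_change <= max_weight_change   # weights barely changed
    )
```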
As an implementation manner of the embodiment of the present application, a process of grouping a data packet network is a process of performing debugging on the data packet network, when the data packet network is at the end of grouping, the data packet network reaches convergence, and when the data packet network is subjected to debugging optimization, the data packet network is iteratively debugged to enable the data packet network to slowly meet a set convergence condition, and each round of debugging, or each round of grouping is consistent, each round of grouping includes the following steps:
step S10: and determining a plurality of user streaming data subsets according to the user big data streaming log, wherein each user streaming data subset comprises at least two groups of user streaming data and a commonality grading label between the at least two groups of user streaming data.
As an implementation of the embodiment of the present application, the determined user streaming data subsets are used for knowledge training (knowledge learning), and the form of the subsets depends on the data quantization network adopted. For example, when the data quantization network is a supervised learning network, each determined user streaming data subset includes at least two groups of user streaming data and a commonality score label (recorded information indicating the degree of similarity) between the at least two groups of user streaming data.
The data quantization network may also be a network that performs inference based on a data commonality score. In that case each user streaming data subset may include two groups of user streaming data, together with recorded information indicating whether the two groups are close or not close, and supervised knowledge training is carried out according to the recorded information. As another case, the data quantization network may be a network based on triplet loss, in which case each determined user streaming data subset contains a reference sample, a positive sample, and a negative sample.
Step S20: performing data quantization on the multiple groups of user streaming data respectively, and acquiring a plurality of corresponding user behavior knowledge fields according to the quantization results.
As an implementation of the embodiment of the present application, in each grouping process, the plurality of user streaming data subsets used for the current grouping are loaded into the data grouping network. The user streaming data included in these subsets may be all of the user streaming data in the user big data streaming log, or may be screened from the user big data streaming log. In the latter case, the screening can follow a predetermined screening quantity or a predetermined screening strategy. The network architecture of the data quantization network employed for data quantization is not limited; common choices include CNN and ResNet.
As an implementation, the architecture of the data quantization network may include a knowledge field mining module and a knowledge extraction module. The knowledge field mining module performs knowledge field mining on the multiple groups of user streaming data one or more times to obtain corresponding initial knowledge fields. The knowledge extraction module performs knowledge extraction on each initial knowledge field to obtain the user behavior knowledge fields corresponding to the multiple groups of user streaming data; that is, the knowledge extraction module compresses the initial knowledge fields so as to save computation overhead. As an implementation of the embodiment of the present application, the knowledge field mining module may be a residual module. As an implementation, a pre-trained knowledge field mining module is used to improve the overall training speed of the network. The structure of the knowledge field mining module can comprise a plurality of cascaded convolution units whose convolution kernels, channels and strides differ, so that the sizes of the resulting knowledge relation networks also differ. The specific parameters are determined according to actual conditions and are not limited herein.
As one embodiment, the structure of the knowledge extraction module includes a pooling module and a plurality of FC (fully connected) modules. The pooling module performs pooling on the initial knowledge field generated by the knowledge field mining module to obtain a basic knowledge field. The basic knowledge field is then refined multiple times by the plurality of FC modules arranged in sequence, which reduces the dimension of the finally obtained user behavior knowledge field. It is easy to understand that the parameters of each module are not limited and are determined according to practice.
Step S30: determining a knowledge expression error result.
As an implementation of the embodiment of the present application, after the user behavior knowledge fields of the multiple groups of user streaming data are obtained, the knowledge expression error result of the data quantization network may be determined according to the user behavior knowledge fields. Because the determined user streaming data subsets carry previously recorded labels, the knowledge expression error result may be determined from those records. For example, for the user behavior knowledge fields of the at least two groups of user streaming data included in each user streaming data subset, a commonality score inference result between the at least two groups of user streaming data is determined, and error calculation is performed between the commonality score inference result and the corresponding commonality score label to obtain the error calculation result of that subset. The knowledge expression error result is then determined from the error calculation results obtained for the plurality of user streaming data subsets.
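The label-based error calculation described above can be illustrated with a minimal sketch. The source does not fix the commonality measure or the error function; the cosine-based `commonality_score`, the squared error, and the function names below are assumptions chosen for illustration only.

```python
import math

def commonality_score(field_a, field_b):
    """Assumed commonality measure: cosine similarity rescaled to [0, 1]."""
    dot = sum(a * b for a, b in zip(field_a, field_b))
    norm_a = math.sqrt(sum(a * a for a in field_a))
    norm_b = math.sqrt(sum(b * b for b in field_b))
    return (dot / (norm_a * norm_b) + 1.0) / 2.0

def subset_error(field_a, field_b, label):
    """Squared error between the inferred score and the recorded label (assumed error form)."""
    return (commonality_score(field_a, field_b) - label) ** 2

def pairwise_knowledge_error(subsets):
    """Average the per-subset errors into one knowledge expression error result."""
    errors = [subset_error(a, b, label) for a, b, label in subsets]
    return sum(errors) / len(errors)
```

Each subset here is a tuple of two user behavior knowledge fields and their commonality score label; averaging is one possible way to combine the per-subset results.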
As another implementation of the embodiment of the present application, each user streaming data subset takes the form of a triplet: any user streaming data subset includes first user streaming data (reference user streaming data), second user streaming data (positive user streaming data), and third user streaming data (negative user streaming data). In each subset, the commonality score between the first and second user streaming data is greater than or equal to a preset commonality score reference value, and the commonality score between the first and third user streaming data is less than that reference value. In other words, among the three, the first and second user streaming data are close, and the first and third are not close. If each user streaming data subset is a triplet, then in determining the knowledge expression error result, for each of the user streaming data subsets a first commonality score between the first and second user streaming data and a second commonality score between the first and third user streaming data are obtained, and a preset error result corresponding to that subset is obtained according to the first and second commonality scores. The knowledge expression error result is then obtained according to the preset error results corresponding to the respective subsets.
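A minimal sketch of the triplet case follows. The source only states that the preset error result is derived from the first and second commonality scores; the hinge form with a `margin`, the cosine similarity, and all names below are assumptions borrowed from the commonly used triplet loss, not the patented formula.

```python
import math

def cosine_similarity(u, v):
    """Assumed commonality score between two user behavior knowledge fields."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def triplet_error(reference, positive, negative, margin=0.2):
    """Hinge-style error: zero once the first score exceeds the second by the margin."""
    first_score = cosine_similarity(reference, positive)    # reference vs. positive
    second_score = cosine_similarity(reference, negative)   # reference vs. negative
    return max(0.0, second_score - first_score + margin)

def triplet_knowledge_error(triplets, margin=0.2):
    """Average the per-triplet errors into the knowledge expression error result."""
    errors = [triplet_error(r, p, n, margin) for r, p, n in triplets]
    return sum(errors) / len(errors)
```

A well-separated triplet contributes no error, while a triplet whose negative sample is closer to the reference than its positive sample contributes a positive error that training would reduce.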
Step S40: grouping the plurality of user behavior knowledge fields and determining a grouping error result according to the grouping result.
After the user behavior knowledge fields corresponding to the user streaming data and the corresponding knowledge expression error result are obtained, grouping is carried out according to the user behavior knowledge fields to obtain a grouping result. The data grouping network performs this process through a grouping module, which performs knowledge mapping (or projection) on the user behavior knowledge fields of the multiple groups of user streaming data. The grouping module stores the grouping representative knowledge field of each user behavior classification and associates the user behavior knowledge fields with the centroids of P user behavior classifications, where P is the number of user behavior classifications. The grouping module may include an FC module, a linear activation module, a convolution module, and the like. Grouping amounts to learning the grouping representative knowledge fields: they are transformed into network coefficients and trained, optimized and refined repeatedly, and the final grouping representative knowledge fields are obtained when the data grouping network reaches a preset training cut-off condition.
If the triplet error is used for knowledge training, only user streaming data belonging to different classifications within the triplets is screened for grouping in each grouping process, for example the first and third user streaming data, or the second and third user streaming data. If the first and third user streaming data are screened, then in the grouping process the user behavior knowledge fields corresponding to the first and third user streaming data in the user streaming data subsets are screened from the obtained user behavior knowledge fields and grouped. The second user streaming data is either assigned to the classification into which the first user streaming data of the same subset is collected, or, after grouping, its commonality score with each grouping representative knowledge field is obtained and the classification with the maximum commonality score is taken as its classification. In each grouping process, user streaming data of different classifications is screened and grouped, and a grouping error result is obtained: each of the user behavior knowledge fields is associated with the user behavior classifications, and a corresponding user behavior classification field table is obtained.
The user behavior classification field table includes a plurality of components, each component matching one user behavior classification. The value of each component indicates whether the user streaming data corresponding to a user behavior knowledge field can be summarized into the corresponding user behavior classification. For example, if the value of the second component in the user behavior classification field table is Y, the user streaming data belongs to the user behavior classification corresponding to the second component; if the value of the fourth component is N, the user streaming data does not belong to the user behavior classification corresponding to the fourth component.
For example, take a user behavior knowledge field 1. To obtain its user behavior classification field table, the commonality scores between user behavior knowledge field 1 and each grouping representative knowledge field obtained by the current grouping are determined. According to each commonality score, it is determined whether the user streaming data corresponding to user behavior knowledge field 1 can be summarized into the user behavior classification indicated by the corresponding grouping representative knowledge field, and the user behavior classification field table is obtained from these summary judgment results. For instance, if the commonality score between user behavior knowledge field 1 and the grouping representative knowledge field of user behavior classification 2 is larger than the set commonality score, the user streaming data corresponding to user behavior knowledge field 1 is determined to belong to user behavior classification 2, and the component corresponding to user behavior classification 2 in the table takes the value indicating attribution to that classification. Repeating this for the commonality score of every classification yields the user behavior classification field table of user behavior knowledge field 1.
Alternatively, after the commonality scores between user behavior knowledge field 1 and the grouping representative knowledge field of each classification are determined, the maximum commonality score is selected to determine the user behavior classification of the user streaming data corresponding to user behavior knowledge field 1, and the corresponding component in the user behavior classification field table is set to the value indicating attribution to that classification, which yields the user behavior classification field table of user behavior knowledge field 1. The user behavior classification field tables can be used to determine the user behavior classifications of the multiple groups of user streaming data after the current grouping. A table can be regarded as a label of the user streaming data, but because it is generated through grouping and is not the actual label of the user streaming data, it is a pseudo grouping label. When the data grouping network approaches convergence, the pseudo grouping labels of the multiple groups of user streaming data no longer change, so whether the data grouping network has reached the preset condition can be judged from the change of the pseudo grouping labels.
Error calculation is then performed. For each user behavior knowledge field, the user behavior classification field table obtained in the current grouping is compared, by error calculation, with the user behavior classification field table obtained for the same user behavior knowledge field in the previous grouping. This yields an error calculation result for the user behavior knowledge field of each group of user streaming data, and the grouping error result is determined according to the error calculation results obtained for the plurality of user behavior knowledge fields.
As an implementation of the embodiment of the present application, if knowledge training is performed with the triplet error, each grouping may screen only user streaming data of different classifications to obtain the grouping error. Alternatively, all user streaming data may be grouped first, after which user streaming data of different classifications is screened to obtain the grouping error. For example, when the first and third user streaming data are screened, after grouping, the error calculation results of the user behavior knowledge fields corresponding to the first and third user streaming data in the user streaming data subsets are selected to determine the grouping error result.
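The construction of a user behavior classification field table by maximum commonality score, and its comparison with the table from the previous grouping, can be sketched as follows. The negative Euclidean distance used as the commonality score and the "fraction of changed tables" used as the grouping error are assumptions for illustration; the source does not specify either formula.

```python
import math

def classification_field_table(field, representative_fields):
    """Mark the classification with the highest commonality score as 'Y', all others as 'N'."""
    # Assumed commonality score: the closer the field is to a centroid, the higher the score.
    scores = [-math.dist(field, rep) for rep in representative_fields]
    best = max(range(len(scores)), key=scores.__getitem__)
    return ["Y" if i == best else "N" for i in range(len(representative_fields))]

def grouping_error(previous_tables, current_tables):
    """Assumed grouping error: share of knowledge fields whose table changed since the last grouping."""
    changed = sum(1 for prev, curr in zip(previous_tables, current_tables) if prev != curr)
    return changed / len(current_tables)
```

Under this sketch, a grouping error of zero means that no pseudo grouping label changed between two successive groupings, which matches the convergence signal described in the text.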
Step S50: determining a total network error calculation result of the data grouping network according to the knowledge expression error result and the grouping error result.
As an implementation of the embodiment of the present application, all coefficients of the network are regarded as optimization objects. During training, the loaded user streaming data is processed to obtain an inference result, the knowledge expression error result and the grouping error result are obtained, and the total network error calculation result of the current grouping of the data grouping network is obtained according to the two. The grouping process optimizes the user behavior classification field table of each user streaming data, and in the next round of training the target referred to by the grouping module becomes the optimized user behavior classification field table. However, the grouping error result increases after each grouping optimization of the user behavior classification field table, which can interfere with the data quantization. Therefore, during training a corresponding importance index (adjustment factor or weight value) is configured for the grouping error, and its specific value can be adjusted adaptively according to the state of knowledge training. Specifically, the importance index of the grouping error result can be determined according to the difference result between the current knowledge expression error result and the knowledge expression error result of the previous grouping, and the total error of the data grouping network is then determined according to the knowledge expression error result, the grouping error result and the importance index. The importance index and the difference result are inversely associated: the larger the difference result, the smaller the importance index, and the smaller the difference result, the larger the importance index.
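One way to realize the inverse association between the importance index and the difference result is sketched below. The reciprocal form `1 / (1 + scale * difference)`, the `scale` parameter, and the additive combination of the two errors are assumptions; the source only requires that the index shrink as the difference grows.

```python
def importance_index(current_error, previous_error, scale=1.0):
    """Assumed mapping: a larger change in knowledge expression error gives a smaller index."""
    difference = abs(current_error - previous_error)
    return 1.0 / (1.0 + scale * difference)

def total_network_error(knowledge_error, grouping_error, previous_knowledge_error):
    """Combine both errors, down-weighting the grouping error while quantization is still changing fast."""
    weight = importance_index(knowledge_error, previous_knowledge_error)
    return knowledge_error + weight * grouping_error
```

With this choice the grouping error has little influence while the knowledge expression error is still moving quickly, and gains full weight once the quantization has stabilized.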
Step S60: determining whether the data grouping network reaches a preset condition according to the total network error calculation result.
If the preset condition is reached, the grouping process is not repeated; if the preset condition is not reached, the coefficients of the data grouping network continue to be optimized.
As an implementation of the embodiment of the present application, whether the data grouping network reaches the preset condition may be determined by whether the total network error calculation result is smaller than an error result threshold. When the total network error calculation result is smaller than the error result threshold, the data grouping network reaches the preset condition and the repeated grouping process ends. If the total network error calculation result is greater than or equal to the error result threshold, the data grouping network has not reached the preset condition: an optimization gradient of the network coefficients is obtained from the total network error calculation result, the network coefficients of the data grouping network are optimized, and the next grouping is started based on the optimized data grouping network.
As a specific example, the user big data streaming log includes multiple groups of user streaming data, which may include user click operation data, user collection operation data, user comment data, and the like. The multiple groups of user streaming data are grouped by the information processing method based on data transmission flow control of the embodiment of the present application and divided into a plurality of groups, each group containing streaming data of the same user behavior classification; for example, all user streaming data that is user collection operation data is divided into one group. From each group, the user streaming data that exhibits the characteristics of the group is screened out, and its user behavior knowledge field is determined as the grouping representative knowledge field of the group.
As an implementation of the embodiment of the present application, according to the above process the grouping error result increases each time the user behavior classification field table is optimized by grouping, so for the sake of training speed the number of grouping optimizations should not be too large. One grouping optimization may therefore be performed after the data quantization network has been optimized over several repetitions, for example one grouping after every 3 repetitions. As another grouping manner, the following may be referred to:
Step S100: determining a plurality of user streaming data subsets according to the user big data streaming log.
Step S200: performing data quantization on the multiple groups of user streaming data respectively, and acquiring a plurality of corresponding user behavior knowledge fields according to the quantization results.
Step S300: determining a knowledge expression error result according to the obtained plurality of user behavior knowledge fields.
Step S400: judging whether the current grouping meets the preset requirement; if not, performing coefficient optimization on the data quantization network and returning to step S200; if so, executing step S500.
The preset requirement may include that the number of rounds of data quantization in the current grouping reaches a preset number of rounds, or that the data quantization network has converged. For example, if one grouping task is set to follow every 3 rounds of repeated optimization, the preset requirement is met when the number of rounds of data quantization reaches 3.
Step S500: grouping a plurality of user behavior knowledge fields obtained through the last data quantization in the current grouping, and determining a grouping error result according to the grouping result.
Step S600: determining a total network error calculation result of the data grouping network according to the knowledge expression error result and the grouping error result.
Step S700: determining whether the data grouping network reaches a preset condition according to the total network error calculation result; if so, ending the repeated grouping process; if not, optimizing the coefficients of the data grouping network.
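The control flow of steps S200 to S700 can be sketched as a loop that runs several quantization rounds before each grouping. The callbacks `quantize_round` and `group_round`, the plain sum used for the total error, and the default values are hypothetical placeholders standing in for the actual network operations.

```python
def run_grouping_schedule(quantize_round, group_round,
                          rounds_per_grouping=3,
                          error_threshold=0.01,
                          max_groupings=50):
    """Alternate several quantization rounds (S200-S400) with one grouping (S500-S700)."""
    history = []
    for _ in range(max_groupings):
        knowledge_error = None
        for _ in range(rounds_per_grouping):
            # One round: quantize, compute the knowledge expression error, optimize coefficients.
            knowledge_error = quantize_round()
        grouping_error = group_round()                   # S500: group once per schedule cycle
        total_error = knowledge_error + grouping_error   # S600: assumed unweighted sum
        history.append(total_error)
        if total_error < error_threshold:                # S700: preset condition reached
            break
    return history


# Toy stand-ins: the knowledge expression error halves each round,
# and the grouping error is always zero.
calls = {"quantize": 0, "group": 0}
state = {"error": 1.0}

def toy_quantize_round():
    calls["quantize"] += 1
    state["error"] *= 0.5
    return state["error"]

def toy_group_round():
    calls["group"] += 1
    return 0.0

history = run_grouping_schedule(toy_quantize_round, toy_group_round)
```

In this toy run the grouping step executes once for every three quantization rounds, which is the ratio given as an example in the text.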
As an implementation of the embodiment of the present application, when the data grouping network converges, the grouping representative knowledge fields can be obtained, namely the plurality of grouping representative knowledge fields covered by the grouping result of the last grouping. The user behavior knowledge field of each classification indicated in the grouping result is determined as the grouping representative knowledge field of that classification, so that a plurality of grouping representative knowledge fields are obtained. In addition, the user behavior knowledge fields of the multiple groups of user streaming data can be obtained.
Continuing the step S3, the information processing method based on the data transmission flow control provided in the embodiment of the present application further includes:
Step S4: grouping the user data in the user big data streaming log according to each grouping representative knowledge field to obtain P streaming data groups.
Here P is the number of grouping representative knowledge fields, and P is a positive integer greater than or equal to 1. Each grouping representative knowledge field is the centroid of its group. After new streaming data is acquired, whether it can be collected into the group of a grouping representative knowledge field can be judged according to that field. For example, it is calculated whether the distance between the knowledge field of the new streaming data and the grouping representative knowledge field is smaller than a grouping distance threshold. If so, the proximity between the new streaming data and the grouping representative knowledge field meets the grouping requirement, and the new streaming data is collected into the group of that grouping representative knowledge field. If the vector distance between the new streaming data and every grouping representative knowledge field fails to meet the grouping distance threshold, the new streaming data becomes isolated streaming data.
Step S5: removing, from the user big data streaming log, the data isolated from the P streaming data groups.
Data isolated from the P streaming data sets does not belong to any streaming data set, in other words, does not belong to any user behavior classification data, and can be regarded as disturbance information (noise data) for cleaning.
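Steps S4 and S5 can be sketched as a nearest-centroid assignment with a distance threshold. The Euclidean distance, the rule of assigning each item only to its nearest centroid, and the function and parameter names are assumptions for illustration.

```python
import math

def group_and_clean(knowledge_fields, representative_fields, distance_threshold):
    """Assign each knowledge field to its nearest centroid, or mark it isolated (noise)."""
    groups = [[] for _ in representative_fields]
    isolated = []
    for index, field in enumerate(knowledge_fields):
        distances = [math.dist(field, rep) for rep in representative_fields]
        nearest = min(range(len(distances)), key=distances.__getitem__)
        if distances[nearest] < distance_threshold:
            groups[nearest].append(index)   # S4: collected into the matching group
        else:
            isolated.append(index)          # S5: belongs to no group, to be removed
    return groups, isolated
```

The function returns indices rather than data so that the caller can drop the isolated entries from the original log.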
In summary, according to the information processing method and system based on data transmission flow control provided by the embodiments of the present application, a user big data flow log is obtained, and is grouped for multiple times by adopting a data grouping network until meeting a set requirement, where each grouping includes respectively performing data quantization on multiple groups of user flow data in the user big data flow log, obtaining multiple corresponding user behavior knowledge fields according to the quantization result, and obtaining a knowledge expression error result; grouping a plurality of user behavior knowledge fields, and determining a grouping error result according to the grouping result; optimizing network coefficients of the data packet network when the data packet network does not meet preset requirements according to the knowledge expression error result and the packet error result; outputting a plurality of group representative knowledge fields covered by a group result obtained by the last grouping, wherein each group representative knowledge field points to a corresponding user behavior classification; grouping user data in the user big data stream log according to each group representative knowledge field to obtain P stream data groups; wherein P is the number of the grouping representative knowledge fields and P is a positive integer greater than or equal to 1; and removing the data isolated from the P streaming data groups in the user big data streaming log. Therefore, the streaming data in the user big data streaming log is grouped, isolated data outside the grouping is determined to be disturbance information and is cleared, so that the user big data streaming log can be preprocessed rapidly and accurately, and the accuracy and timeliness of subsequent streaming data processing are ensured. 
In addition, the user big data streaming log is grouped through the configured data grouping network, and knowledge training and grouping are integrated in the grouping process, achieving an organic fusion of knowledge training and unsupervised grouping in which the two influence each other. During knowledge training the distribution characteristics of the groups are taken into account at the same time, so the acquired user behavior knowledge fields also carry these distribution characteristics, which in turn improves the accuracy of knowledge training. Further, each grouping maintains strong consistency with the previous grouping, so that the knowledge fields within a group, which belong to the same behavior classification, are highly similar. As a result, the finally obtained grouping representative knowledge fields are more accurate, the grouping of the user data in the user big data streaming log according to these fields is more accurate, and the judgment that isolated streaming data is disturbance information is more reliable.
As a further embodiment, after obtaining P streaming data sets, the packet rationality may be further verified, which may specifically include the following steps:
Step S6: determining a first grouping factor according to the P streaming data groups.
The first grouping factor indicates the grouping compactness of the streaming data in the P streaming data groups: the greater the first grouping factor, the higher the compactness, the higher the similarity between the individual streaming data within a streaming data group, and the greater the differentiation between streaming data groups. As one embodiment, determining the first grouping factor from the P streaming data groups includes: for each streaming data in the P streaming data groups, determining a pooling factor and a tearing factor corresponding to the streaming data according to the first streaming data knowledge of the streaming data, the first streaming data knowledge of the remaining streaming data in the streaming data group to which the streaming data belongs, and the first streaming data knowledge of the streaming data in the remaining streaming data groups, wherein the pooling factor indicates a dissimilarity score between the streaming data and the remaining streaming data in its own streaming data group, and the tearing factor indicates a dissimilarity score between the streaming data and the streaming data in the remaining streaming data groups; determining a sub-grouping factor corresponding to the streaming data according to the pooling factor and the tearing factor, wherein the sub-grouping factor is inversely associated with the pooling factor and positively associated with the tearing factor; and determining the first grouping factor according to the sub-grouping factor corresponding to each streaming data.
The pooling factor corresponding to the streaming data is determined according to the first streaming data knowledge of the streaming data and that of the remaining streaming data in the streaming data group to which the streaming data belongs. The tearing factor corresponding to the streaming data is determined according to the first streaming data knowledge of the streaming data and that of the streaming data in the remaining streaming data groups, that is, the groups other than the one to which the streaming data belongs. Specifically, for each remaining streaming data group, a pending tearing factor between the streaming data and that group is determined according to the first streaming data knowledge of the streaming data and of the streaming data in that group; once the pending tearing factors for all remaining streaming data groups are obtained, the minimum pending tearing factor is taken as the tearing factor corresponding to the streaming data. For the pooling factor, the distance between the streaming data and each remaining streaming data in its own group is obtained from their first streaming data knowledge (for example a vector distance, calculated by a common distance measure such as the Minkowski distance or the Euclidean distance), and the average of these distances is used as the pooling factor corresponding to the streaming data.
The smaller the distance between the streaming data and the remaining streaming data in the same streaming data group, the larger the commonality score between them, the smaller the pooling factor corresponding to the streaming data, the lower the dissimilarity score between the streaming data and the rest of its group, and the stronger the grouping compactness of the streaming data. In one embodiment, the distances between the streaming data and the streaming data in the remaining streaming data groups are determined from their first streaming data knowledge, and the average of these distances is used as the tearing factor corresponding to the streaming data. The larger the distance between the streaming data and the streaming data in the remaining streaming data groups, the smaller the commonality score between them, the larger the tearing factor corresponding to the streaming data, and the stronger the grouping compactness of the streaming data.
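The pooling factor, tearing factor and sub-grouping factor resemble the quantities used in the well-known silhouette coefficient, so a silhouette-style formula is used in the sketch below as a stand-in. The formula `(tearing - pooling) / max(pooling, tearing)`, the score of zero for single-member groups, and the assumption of at least two non-empty groups are illustrative choices, not taken from the source.

```python
import math

def mean(values):
    return sum(values) / len(values)

def pooling_factor(point, own_group_rest):
    """Average distance to the other streaming data in the same group."""
    return mean([math.dist(point, other) for other in own_group_rest])

def tearing_factor(point, other_groups):
    """Minimum, over the remaining groups, of the average distance to that group."""
    pending = [mean([math.dist(point, other) for other in group]) for group in other_groups]
    return min(pending)

def sub_grouping_factor(pooling, tearing):
    """Assumed silhouette-style form: falls as pooling grows, rises as tearing grows."""
    largest = max(pooling, tearing)
    return 0.0 if largest == 0 else (tearing - pooling) / largest

def first_grouping_factor(groups):
    """Mean sub-grouping factor over all streaming data; assumes at least two non-empty groups."""
    factors = []
    for g, group in enumerate(groups):
        other_groups = groups[:g] + groups[g + 1:]
        for i, point in enumerate(group):
            rest = group[:i] + group[i + 1:]
            if not rest:
                factors.append(0.0)  # single-member group: no pooling factor, scored as neutral
                continue
            factors.append(sub_grouping_factor(pooling_factor(point, rest),
                                               tearing_factor(point, other_groups)))
    return mean(factors)


tight = [[(0, 0), (0, 1)], [(10, 0), (10, 1)]]   # compact, well-separated groups
mixed = [[(0, 0), (10, 0)], [(0, 1), (10, 1)]]   # same points, poorly grouped
```

Calling `first_grouping_factor(tight)` gives a value close to 1, while `first_grouping_factor(mixed)` gives a negative value, reflecting the lower grouping compactness of the second arrangement.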
Step S7: for any selected streaming data group among the P streaming data groups, split the selected streaming data group into two streaming data groups to obtain Q comparison streaming data groups, and take the comparison grouping factor determined from the Q comparison streaming data groups as the second grouping factor of the selected streaming data group, where the second grouping factor indicates the grouping compactness of the streaming data in the Q comparison streaming data groups and Q = P + 1.
Each of the P streaming data groups may serve as the selected streaming data group. For any selected group, the group is split into two streaming data groups; these two groups, together with the P - 1 streaming data groups other than the selected one, are taken as the comparison streaming data groups, which gives Q comparison streaming data groups. A comparison grouping factor is determined from the Q comparison streaming data groups and taken as the second grouping factor of the selected group, where the second grouping factor indicates the grouping compactness of the streaming data in the Q comparison streaming data groups. Performing the processing of step S7 for each of the P streaming data groups yields a second grouping factor for each group, in other words P second grouping factors.
Step S8: when the largest of the second grouping factors of the P streaming data groups is greater than or equal to the first grouping factor, split the selected streaming data group corresponding to the largest second grouping factor into two streaming data groups, obtaining Q streaming data groups.
Because the second grouping factor indicates the grouping compactness of the streaming data in the Q comparison streaming data groups, a larger second grouping factor means stronger grouping compactness. The maximum is therefore selected from the P second grouping factors: splitting the selected streaming data group that corresponds to this maximum yields the Q comparison streaming data groups with the strongest grouping compactness. In other words, if one of the P streaming data groups has to be split in two, splitting the group corresponding to the maximum second grouping factor gives the best grouping compactness. The maximum second grouping factor is then compared with the first grouping factor. When the maximum is greater than or equal to the first grouping factor, the corresponding selected streaming data group is split into two streaming data groups. When the maximum is smaller than the first grouping factor, splitting that group would leave the grouping compactness of the streaming data lower than that of the original P streaming data groups, so the process of splitting the P streaming data groups stops.
A second grouping factor is thus determined for each of the P streaming data groups, describing the grouping that would result if that group were split in two. If the maximum second grouping factor is greater than or equal to the first grouping factor obtained before splitting, splitting the corresponding streaming data group into two new groups can increase the grouping compactness of the streaming data, so the split is performed and Q streaming data groups are obtained. The P streaming data groups thereby receive a further level of grouping, so that easily confused streaming data can be distinguished; to a certain extent this also avoids streaming data that belongs to a group being regarded as disturbance information and removed.
As an embodiment, the method may further include the following steps. For any selected streaming data group among the Q streaming data groups, the selected group is again split into two streaming data groups to obtain R comparison streaming data groups, and the comparison grouping factor determined from the R comparison streaming data groups is taken as the third grouping factor of the selected group, where the third grouping factor indicates the grouping compactness of the R comparison streaming data groups and R = P + 2. When the largest of the third grouping factors of the Q streaming data groups is greater than or equal to the largest second grouping factor, the selected streaming data group corresponding to the largest third grouping factor is split into two streaming data groups, obtaining R streaming data groups. This continues until the largest of the grouping factors obtained by the current round of splitting is smaller than the grouping factor before splitting.
After the R streaming data groups are obtained, any selected streaming data group among them is split again, the grouping factor is determined again, and whether to split the R streaming data groups into P + 3 streaming data groups is decided from the value of that grouping factor. In other words, the above process is iterated. If the largest of the grouping factors obtained in the current round is smaller than the grouping factor before splitting, then splitting any of the current streaming data groups would give a grouping factor smaller than the one before splitting, that is, a grouping compactness lower than before. In that case the iteration ends and the streaming data groups are not split any further.
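The iteration of steps S7 and S8 can be sketched as a greedy loop. The following Python sketch is illustrative only: it assumes that each streaming data item is represented by a numeric vector, that the grouping factor is the mean of a silhouette-style score over all items (with lone items scoring 0), and that a group is split around its two mutually farthest items. None of these choices is specified by the text, and all function names are hypothetical.

```python
import math
from statistics import mean


def grouping_factor(groups):
    """ASSUMPTION: mean silhouette-style score over all items; lone items score 0."""
    scores = []
    for g, group in enumerate(groups):
        rest = [grp for k, grp in enumerate(groups) if k != g]
        for i, item in enumerate(group):
            others = [v for j, v in enumerate(group) if j != i]
            if not others or not rest:
                scores.append(0.0)
                continue
            a = mean(math.dist(item, v) for v in others)                    # collection factor
            b = min(mean(math.dist(item, v) for v in grp) for grp in rest)  # tearing factor
            scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return mean(scores)


def split_in_two(group):
    """ASSUMPTION: hypothetical split rule around the two farthest items."""
    if len(group) < 2:
        return None
    p, q = max(((a, b) for a in group for b in group),
               key=lambda pair: math.dist(pair[0], pair[1]))
    left = [v for v in group if math.dist(v, p) <= math.dist(v, q)]
    right = [v for v in group if math.dist(v, p) > math.dist(v, q)]
    return (left, right) if right else None


def refine_groups(groups):
    """Split one group per round while the best split does not lower the factor."""
    current = grouping_factor(groups)          # first grouping factor
    while True:
        candidates = []
        for idx, group in enumerate(groups):   # step S7: try splitting each group
            halves = split_in_two(group)
            if halves is None:
                continue
            comparison = groups[:idx] + groups[idx + 1:] + list(halves)
            candidates.append((grouping_factor(comparison), comparison))
        if not candidates:
            return groups
        best_score, best_groups = max(candidates, key=lambda c: c[0])
        if best_score < current:               # step S8 stop condition
            return groups
        groups, current = best_groups, best_score


start = [[(0, 0), (0, 1), (10, 0), (10, 1)], [(50, 0), (50, 1)]]
final = refine_groups(start)  # the first group is split once; no further split helps
```

Each accepted split increases the number of groups by one, so the loop terminates at the latest when every group holds a single item.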
Based on the same principle as the method shown in fig. 1, there is also provided an information processing apparatus 10 in the embodiment of the present application, as shown in fig. 2, the apparatus 10 includes:
an acquisition module 11, configured to obtain the user big data stream log;
a quantization module 12, configured to perform data quantization on each of the multiple groups of user streaming data in the user big data stream log, obtain the corresponding user behavior knowledge fields according to the quantization results, and obtain a knowledge expression error result;
an error determining module 13, configured to group the plurality of user behavior knowledge fields and determine a grouping error result according to the grouping result;
an optimizing module 14, configured to optimize the network coefficients of the data grouping network when it is determined, according to the knowledge expression error result and the grouping error result, that the data grouping network does not meet the preset requirement;
a centroid determining module 15, configured to output the group representative knowledge fields covered by the grouping result obtained by the last grouping, where each group representative knowledge field points to a corresponding user behavior classification;
a grouping module 16, configured to group the user data in the user big data stream log according to each group representative knowledge field to obtain P streaming data groups, where P is the number of group representative knowledge fields and is a positive integer greater than or equal to 1;
a clearing module 17, configured to clear the data in the user big data stream log that is isolated from the P streaming data groups.
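As a rough illustration of what the grouping module 16 and the clearing module 17 do together, the following Python sketch assigns each record to its nearest group representative knowledge field and sets aside records that are far from every representative. The vector representation, the Euclidean distance, the `max_distance` threshold, and all names are assumptions for illustration; the text does not state how isolation is decided.

```python
import math


def group_and_clear(records, representatives, max_distance):
    """Group records by nearest representative; collect isolated records.

    records         -- list of numeric vectors (hypothetical knowledge vectors)
    representatives -- dict mapping a behavior classification label to its vector
    max_distance    -- ASSUMPTION: records farther than this from every
                       representative are treated as isolated data
    """
    groups = {label: [] for label in representatives}
    isolated = []
    for record in records:
        label, dist = min(
            ((lbl, math.dist(record, rep)) for lbl, rep in representatives.items()),
            key=lambda pair: pair[1],
        )
        if dist <= max_distance:
            groups[label].append(record)
        else:
            isolated.append(record)  # candidate disturbance information
    return groups, isolated


representatives = {"browse": (0.0, 0.0), "purchase": (10.0, 10.0)}
records = [(0.5, 0.2), (9.0, 10.5), (30.0, -5.0)]
groups, isolated = group_and_clear(records, representatives, max_distance=3.0)
```

Here the number of groups equals the number of representatives (P), and the third record is set aside because it is not close to any representative.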
The above embodiment describes the information processing apparatus 10 from the viewpoint of a virtual module, and the following describes an information processing system from the viewpoint of a physical module, specifically as follows:
An embodiment of the present application provides an information processing system. As shown in fig. 3, the information processing system 100 includes a processor 101 and a memory 103, where the processor 101 is coupled to the memory 103, for example via a bus 102. Optionally, the information processing system 100 may also include a transceiver 104. It should be noted that, in practical applications, the transceiver 104 is not limited to one, and the structure of the information processing system 100 does not constitute a limitation on the embodiments of the present application.
The processor 101 may be a CPU, a general purpose processor, a GPU, a DSP, an ASIC, an FPGA or other programmable logic device, a transistor logic device, a hardware component, or any combination thereof. It may implement or perform the various exemplary logic blocks, modules, and circuits described in connection with this disclosure. The processor 101 may also be a combination that implements computing functionality, for example a combination of one or more microprocessors, or a combination of a DSP and a microprocessor.
The bus 102 may include a path that transfers information between the aforementioned components. The bus 102 may be a PCI bus, an EISA bus, or the like, and may be classified as an address bus, a data bus, a control bus, and so on. For ease of illustration, only one thick line is shown in fig. 3, but this does not mean that there is only one bus or only one type of bus.
The memory 103 may be, but is not limited to, a ROM or other type of static storage device that can store static information and instructions, a RAM or other type of dynamic storage device that can store information and instructions, an EEPROM, a CD-ROM or other optical disc storage (including compact discs, laser discs, digital versatile discs, Blu-ray discs, etc.), magnetic disk storage media or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer.
The memory 103 stores the application program code for executing the solution of the present application, and its execution is controlled by the processor 101. The processor 101 is configured to execute the application program code stored in the memory 103 to implement what is shown in any of the method embodiments described above.
An embodiment of the present application provides an information processing system that includes one or more processors, a memory, and one or more computer programs. The computer programs are stored in the memory and configured to be executed by the one or more processors, and when executed they implement the information processing method based on data transmission flow control provided above. According to this technical scheme, a user big data streaming log is obtained and grouped multiple times by a data grouping network until the set requirement is met. Each grouping comprises: performing data quantization on each of the multiple groups of user streaming data in the user big data streaming log, obtaining the corresponding user behavior knowledge fields according to the quantization results, and obtaining a knowledge expression error result; grouping the user behavior knowledge fields and determining a grouping error result according to the grouping result; and optimizing the network coefficients of the data grouping network when it is determined, according to the knowledge expression error result and the grouping error result, that the data grouping network does not meet the preset requirement. The group representative knowledge fields covered by the grouping result of the last grouping are then output, each pointing to a corresponding user behavior classification. The user data in the user big data streaming log is grouped according to the group representative knowledge fields to obtain P streaming data groups, where P is the number of group representative knowledge fields and is a positive integer greater than or equal to 1. Finally, the data in the user big data streaming log that is isolated from the P streaming data groups is cleared.
In this way, the streaming data in the user big data streaming log is grouped, and isolated data outside the groups is determined to be disturbance information and cleared, so the user big data streaming log can be preprocessed quickly and accurately without an additional analysis process, which ensures the accuracy and timeliness of subsequent streaming data processing. In addition, the user big data streaming log is grouped by the configured data grouping network, which integrates knowledge training and grouping into one process and thereby fuses knowledge training with unsupervised grouping. The two influence each other: knowledge training takes the scattering characteristics of the groups into account, so the acquired user behavior knowledge fields also carry these scattering characteristics, which in turn improves the accuracy of knowledge training. Furthermore, each grouping keeps strong consistency with the previous grouping, so the knowledge of the same behavior classification within a group is highly similar and the finally obtained group representative knowledge fields are more accurate. The result of grouping the user data in the user big data streaming log according to the group representative knowledge fields is therefore more accurate, and isolated streaming data can be identified as disturbance information more reliably.
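How the knowledge expression error and the grouping error might interact can be sketched as follows. The claims state that an importance index for the grouping error is inversely related to the change in the knowledge expression error between the previous and the current grouping, and that a total network error is determined from the two errors and the index. The concrete formulas below (a one-hot classification field table, a mismatch rate as grouping error, `1 / (1 + |difference|)` as importance index, and a weighted sum as total error) are assumptions chosen for illustration, not formulas given in the text.

```python
def classification_table(commonality_scores):
    """One-hot row: 1 for the behavior classification with the highest
    commonality score, 0 elsewhere (ASSUMPTION: hard assignment)."""
    best = max(range(len(commonality_scores)), key=commonality_scores.__getitem__)
    return [1 if k == best else 0 for k in range(len(commonality_scores))]


def grouping_error(current_tables, previous_tables):
    """ASSUMPTION: share of records whose table differs from the previous grouping."""
    changed = sum(cur != prev for cur, prev in zip(current_tables, previous_tables))
    return changed / len(current_tables)


def importance_index(knowledge_error, previous_knowledge_error):
    """ASSUMPTION: inverse relation to the change in knowledge expression error."""
    return 1.0 / (1.0 + abs(knowledge_error - previous_knowledge_error))


def total_error(knowledge_error, previous_knowledge_error, group_err):
    """ASSUMPTION: knowledge error plus importance-weighted grouping error."""
    weight = importance_index(knowledge_error, previous_knowledge_error)
    return knowledge_error + weight * group_err


current = [classification_table([0.2, 0.9, 0.1]), classification_table([0.8, 0.1, 0.3])]
previous = [[0, 1, 0], [0, 0, 1]]
err = grouping_error(current, previous)   # one of two records changed group
loss = total_error(0.5, 1.5, err)         # the knowledge error is still changing
```

With this weighting, the grouping error counts for less while the knowledge expression error is still changing quickly, and for more once it has stabilized.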
Embodiments of the present application provide a computer readable storage medium having a computer program stored thereon, which when executed on a processor, enables the processor to perform the corresponding content of the foregoing method embodiments.
It should be understood that, although the steps in the flowcharts of the figures are shown in order as indicated by the arrows, these steps are not necessarily performed in order as indicated by the arrows. The steps are not strictly limited in order and may be performed in other orders, unless explicitly stated herein. Moreover, at least some of the steps in the flowcharts of the figures may include a plurality of sub-steps or stages that are not necessarily performed at the same time, but may be performed at different times, the order of their execution not necessarily being sequential, but may be performed in turn or alternately with other steps or at least a portion of the other steps or stages.
The foregoing describes only some embodiments of the present application. It should be noted that a person skilled in the art can make several improvements and modifications without departing from the principle of the present application, and these improvements and modifications should also be regarded as falling within the protection scope of the present application.

Claims (10)

1. An information processing method based on data transmission flow control, characterized by being applied to an information processing system, the method comprising:
obtaining a user big data stream log, and grouping for multiple times through a data grouping network according to the user big data stream log until the user big data stream log meets the set requirement, wherein each grouping comprises the following processing flow:
each of a plurality of groups of user streaming data in the user big data streaming log is subjected to data quantization, a plurality of corresponding user behavior knowledge fields are obtained according to quantization results, and knowledge expression error results are obtained;
grouping the plurality of user behavior knowledge fields, and determining a grouping error result according to a grouping result;
optimizing network coefficients of the data grouping network when it is determined, according to the knowledge expression error result and the grouping error result, that the data grouping network does not meet preset requirements;
outputting a plurality of group representative knowledge fields covered by a grouping result obtained by the last grouping, wherein each group representative knowledge field points to a corresponding user behavior classification;
grouping the user data in the user big data stream log according to each group representative knowledge field to obtain P streaming data groups; wherein P is the number of the group representative knowledge fields and P is a positive integer greater than or equal to 1;
and clearing the data isolated from the P streaming data groups in the user big data stream log.
2. The method of claim 1, wherein each of the plurality of sets of user streaming data in the user big data streaming log is quantized, and the obtaining the corresponding plurality of user behavior knowledge fields according to the quantization result comprises: for the multiple groups of user streaming data, mining knowledge fields for one time or multiple times respectively to acquire corresponding initial knowledge fields; respectively carrying out knowledge refinement on each obtained initial knowledge field to obtain user behavior knowledge fields corresponding to each group of user streaming data;
before grouping multiple times through the data packet network according to the user big data stream log, the method further comprises: determining a plurality of user streaming data subsets according to the user big data streaming log, wherein each user streaming data subset comprises at least two groups of user streaming data and a commonality scoring tag between the at least two groups of user streaming data;
the obtaining the knowledge expression error result comprises the following steps: for the plurality of user stream data subsets, the following processing flows are carried out: determining a commonality score reasoning result between at least two groups of user streaming data according to user behavior knowledge fields of at least two groups of user streaming data included in one user streaming data subset of the plurality of user streaming data subsets, and performing error calculation on the obtained commonality score reasoning result and a corresponding commonality score label to obtain a corresponding error calculation result; and determining the knowledge expression error result through each error calculation result obtained for the plurality of groups of user stream data subsets.
3. The method of claim 1, wherein before grouping multiple times through the data grouping network according to the user big data stream log, the method further comprises: determining a plurality of user streaming data subsets according to the user big data stream log, wherein each user streaming data subset comprises first user streaming data, second user streaming data and third user streaming data, the commonality score between the first user streaming data and the second user streaming data in each user streaming data subset is greater than or equal to a preset commonality score reference value, and the commonality score between the first user streaming data and the third user streaming data is smaller than the commonality score reference value;
the obtaining the knowledge expression error result comprises the following steps: for the plurality of user stream data subsets, the following processing flows are carried out: in one of the one or more user streaming data subsets, acquiring a first commonality score between the first user streaming data and the second user streaming data, acquiring a second commonality score between the first user streaming data and the third user streaming data, and acquiring a preset error result corresponding to the user streaming data subset according to the first commonality score and the second commonality score; and obtaining the knowledge expression error result according to the preset error results corresponding to the obtained multiple groups of user streaming data subsets.
4. The method of claim 3, wherein said grouping said plurality of user behavior knowledge fields comprises: screening one or more user behavior knowledge fields corresponding to first user streaming data and third user streaming data in the plurality of user streaming data subsets from the plurality of user behavior knowledge fields; grouping one or more of the user behavior knowledge fields;
the grouping the plurality of user behavior knowledge fields and determining a grouping error result from the grouping result includes: for the plurality of user behavior knowledge fields, the following processing flow is carried out: each user behavior knowledge field in the plurality of user behavior knowledge fields is respectively associated with each user behavior classification, and a corresponding user behavior classification field table is obtained; wherein, each component in the user behavior classification field table is matched with one user behavior classification, and the corresponding value of each component indicates whether the user streaming data corresponding to the user behavior knowledge field can be summarized to the corresponding user behavior classification; performing error calculation on the user behavior classification field table and the user behavior classification field table determined by the user flow data corresponding to the user behavior knowledge field in the previous grouping to obtain an error calculation result; and determining the grouping error result according to a plurality of error calculation results obtained for the plurality of user behavior knowledge fields.
5. The method of claim 4, wherein prior to determining the group error result from a plurality of error calculations derived for the plurality of user behavior knowledge fields, the method further comprises:
screening one or more user behavior knowledge fields corresponding to first user streaming data and third user streaming data in the plurality of user streaming data subsets from the plurality of user behavior knowledge fields;
determining the group error result according to a plurality of error calculation results obtained for the plurality of user behavior knowledge fields, including:
and determining the grouping error result according to the error calculation result corresponding to one or more user behavior knowledge fields.
6. The method of claim 4, wherein associating one of the plurality of user behavior knowledge fields to each user behavior classification, respectively, to obtain a corresponding user behavior classification field table, comprises: determining the commonality scores between the user behavior knowledge fields and the group representative knowledge fields determined by the current grouping respectively; according to the obtained commonality scores, determining whether the user streaming data corresponding to the user behavior knowledge field can be summarized to the user behavior classification indicated by the corresponding grouping representative knowledge field, and obtaining a summary judgment result; obtaining the user behavior classification field table according to the induction judgment result;
before optimizing the network coefficients of the data grouping network when it is determined, according to the knowledge expression error result and the grouping error result, that the data grouping network does not meet the preset requirement, the method further comprises: determining an importance index of the grouping error result according to a difference result between the knowledge expression error result and the knowledge expression error result of the previous grouping, wherein the importance index and the difference result have an inverse association relationship; determining a total network error calculation result of the data grouping network according to the knowledge expression error result, the grouping error result and the importance index; and determining whether the data grouping network reaches a preset condition according to the total network error calculation result.
7. The method of claim 6, wherein the data grouping network comprises a data quantization network, and wherein before grouping the plurality of user behavior knowledge fields and determining a grouping error result according to the grouping result, the method further comprises:
when it is determined according to the knowledge expression error result that the data quantization network does not meet the preset requirement, performing coefficient optimization on the data quantization network;
performing one or more rounds of adjustment and optimization on the optimized data quantization network until the current grouping meets the preset requirement; wherein, for each round of adjustment and optimization, data quantization is performed on the multiple groups of user streaming data according to the data quantization network, a knowledge expression error result is determined according to the user behavior knowledge fields obtained by quantization, and coefficient optimization is performed on the data quantization network according to the knowledge expression error result; and wherein the preset requirement comprises that the number of data quantization rounds of the current grouping reaches a preset number of rounds, or that the data quantization network converges;
grouping the plurality of user behavior knowledge fields, comprising: grouping according to a plurality of user behavior knowledge fields obtained by quantification of a data quantification network after last adjustment and optimization in the current grouping.
8. The method according to claim 1, wherein the method further comprises:
determining a first grouping factor according to the P streaming data groups, wherein the first grouping factor indicates the grouping compactness of streaming data in the P streaming data groups;
for any one selected streaming data set among the P streaming data sets, splitting the selected streaming data set into two streaming data sets to obtain Q comparison streaming data sets, wherein a comparison grouping factor determined according to the Q comparison streaming data sets is used as a second grouping factor of the selected streaming data set, and the second grouping factor indicates grouping compactness of streaming data in the Q comparison streaming data sets; wherein Q = P + 1;
when the largest second grouping factor among the second grouping factors of the P streaming data sets is greater than or equal to the first grouping factor, splitting the selected streaming data set corresponding to the largest second grouping factor into two streaming data sets, obtaining Q streaming data sets;
for any one selected streaming data set among the Q streaming data sets, splitting the selected streaming data set again into two streaming data sets to obtain R comparison streaming data sets, wherein a comparison grouping factor determined according to the R comparison streaming data sets is used as a third grouping factor of the selected streaming data set, and the third grouping factor indicates grouping compactness of the R comparison streaming data sets; wherein R = P + 2;
when the largest third grouping factor among the third grouping factors of the Q streaming data sets is greater than or equal to the largest second grouping factor, splitting the selected streaming data set corresponding to the largest third grouping factor into two streaming data sets, obtaining R streaming data sets, until the largest grouping factor among the plurality of grouping factors obtained by the current splitting is smaller than the grouping factor before splitting.
9. The method of claim 8, wherein determining the first grouping factor from the P streaming data sets comprises:
for each streaming data in the P streaming data sets, determining a collection factor and a tearing factor corresponding to the streaming data according to the first streaming data knowledge of the streaming data, the first streaming data knowledge of the remaining streaming data in the streaming data set to which the streaming data belongs, and the first streaming data knowledge of the streaming data in the remaining streaming data sets, the collection factor indicating a dissimilarity score between the streaming data and the remaining streaming data in the streaming data set to which the streaming data belongs, and the tearing factor indicating a dissimilarity score between the streaming data and the streaming data in the remaining streaming data sets;
determining a sub-grouping factor corresponding to the streaming data according to the collection factor and the tearing factor, wherein the sub-grouping factor and the collection factor are in an inverse association relationship, and the sub-grouping factor and the tearing factor are in a positive association relationship;
and determining the first grouping factor according to the sub-grouping factor corresponding to each streaming data.
10. An information processing system comprising a processor and a memory, the memory storing a computer program, which when run by the processor, implements the method of any one of claims 1-9.
CN202211384421.XA 2022-11-07 2022-11-07 Information processing method and system based on data transmission flow control Active CN115712614B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211384421.XA CN115712614B (en) 2022-11-07 2022-11-07 Information processing method and system based on data transmission flow control

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211384421.XA CN115712614B (en) 2022-11-07 2022-11-07 Information processing method and system based on data transmission flow control

Publications (2)

Publication Number Publication Date
CN115712614A CN115712614A (en) 2023-02-24
CN115712614B true CN115712614B (en) 2023-07-07

Family

ID=85232356

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211384421.XA Active CN115712614B (en) 2022-11-07 2022-11-07 Information processing method and system based on data transmission flow control

Country Status (1)

Country Link
CN (1) CN115712614B (en)

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10698926B2 (en) * 2017-04-20 2020-06-30 Microsoft Technology Licensing, Llc Clustering and labeling streamed data
WO2020112658A1 (en) * 2018-11-27 2020-06-04 Xaxar Inc. Systems and methods of data flow classification
CN110647900B (en) * 2019-04-12 2022-04-22 中国人民解放军战略支援部队信息工程大学 Intelligent safety situation prediction method, device and system based on deep neural network
CN110896381B (en) * 2019-11-25 2021-10-29 中国科学院深圳先进技术研究院 Deep neural network-based traffic classification method and system and electronic equipment
US20210303984A1 (en) * 2020-03-24 2021-09-30 Fortinet, Inc. Machine-learning based approach for classification of encrypted network traffic
CN111506610B (en) * 2020-04-20 2023-05-26 浙江中烟工业有限责任公司 Real-time stream data preprocessing method oriented to tobacco industry production site
CN113361559B (en) * 2021-03-12 2023-10-17 华南理工大学 Multi-mode data knowledge information extraction method based on deep-width combined neural network
CN115270003B (en) * 2022-09-27 2023-01-06 创域智能(常熟)网联科技有限公司 Information recommendation method and system based on Internet of things platform behavior data mining

Also Published As

Publication number Publication date
CN115712614A (en) 2023-02-24

Similar Documents

Publication Publication Date Title
CN111177792B (en) Method and device for determining target business model based on privacy protection
CN112784092B (en) Cross-modal image text retrieval method of hybrid fusion model
CN109902222B (en) Recommendation method and device
CN108121795B (en) User behavior prediction method and device
CN107423613B (en) Method and device for determining device fingerprint according to similarity and server
CN106936781A (en) Method and device for determining user operation behavior
CN107291672A (en) Method and apparatus for processing data tables
WO2023169274A1 (en) Data processing method and device, and storage medium and processor
CN113873330B (en) Video recommendation method and device, computer equipment and storage medium
CN111970400A (en) Crank call identification method and device
CN112839014A (en) Method, system, device and medium for establishing model for identifying abnormal visitor
CN111160783A (en) Method and system for evaluating digital asset value and electronic equipment
CN114221991B (en) Session recommendation feedback processing method based on big data and deep learning service system
CN111639230A (en) Similar video screening method, device, equipment and storage medium
CN114359787A (en) Target attribute identification method and device, computer equipment and storage medium
CN115712614B (en) Information processing method and system based on data transmission flow control
CN115599873B (en) Data acquisition method and system based on artificial intelligence Internet of things and cloud platform
CN110413682A (en) Method and system for classified display of data
CN113761282B (en) Video duplicate checking method and device, electronic equipment and storage medium
CN114898273A (en) Video monitoring abnormity detection method, device and equipment
CN112529637B (en) Service demand dynamic prediction method and system based on context awareness
CN113946717A (en) Sub-map index feature obtaining method, device, equipment and storage medium
CN114417739A (en) Method and device for recommending process parameters under abnormal working conditions
CN113298504A (en) Service big data grouping identification method and system based on artificial intelligence
CN116108291B (en) Mobile internet traffic service recommendation method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant