CN117149457A - Information self-adaptive distribution strategy and flow automatic arrangement system of message middleware - Google Patents
- Publication number
- CN117149457A (application CN202311096337.2A)
- Authority
- CN
- China
- Prior art keywords
- word
- information
- words
- user
- text
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/54—Interprogram communication
- G06F9/546—Message passing systems or structures, e.g. queues
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/36—Creation of semantic tools, e.g. ontology or thesauri
- G06F16/367—Ontology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
- G06F16/9535—Search customisation based on user profiles and personalisation
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/02—Standardisation; Integration
- H04L41/0246—Exchanging or transporting network management information using the Internet; Embedding network management web servers in network elements; Web-services-based protocols
- H04L41/026—Exchanging or transporting network management information using the Internet; Embedding network management web servers in network elements; Web-services-based protocols using e-messaging for transporting management information, e.g. email, instant messaging or chat
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2209/00—Indexing scheme relating to G06F9/00
- G06F2209/54—Indexing scheme relating to G06F9/54
- G06F2209/547—Messaging middleware
Abstract
The invention discloses an information self-adaptive distribution strategy and a flow automatic arrangement method for message middleware, belonging to the field of data processing. Guided by user demand, the method designs an information-distribution self-adaptive strategy that aims to change the traditional passive message-subscription mode so that users actively receive the information they are interested in. User interest labels are acquired based on LDA topic modeling, and a user interest knowledge graph is constructed based on Neo4j; based on an attention mechanism, a mapping relation is constructed between the global semantic information of the user interest knowledge graph and that of the message text to be distributed, realizing semantic self-adaptive distribution of information content; self-adaptive distribution of information modalities is constructed with an immune optimization algorithm so as to improve the channel utilization rate of information distribution; and finally, an information self-adaptive distribution system and automatic flow arrangement are built on flow-engine technology, realizing modularized customization of the distribution data nodes and automatic generation and arrangement of the distribution flow.
Description
Technical Field
The invention relates to the technical field of data processing, and in particular to an information self-adaptive distribution strategy and flow automatic arrangement system for message middleware.
Background
Data sharing and data distribution among the nodes of a system can be realized in several technical ways, of which the three most important are federated database systems, data warehouses, and middleware technology. Middleware is currently the most popular: it can shield the differences between heterogeneous data sources, realize interconnection, intercommunication, and interoperation between nodes, and provide users a unified interface for accessing heterogeneous data. However, as the volume of information and data on the network keeps growing, sending information in the conventional subscription/distribution manner raises two problems: passive reception at the subscribing end leads to an ever-increasing amount of junk information; and point-to-point topic matching in the data link domain between publishing and subscribing users makes inefficient use of channel resources. The invention therefore provides an information self-adaptive distribution strategy and flow automatic arrangement system for message middleware, aiming to solve the problems that, under the traditional subscription/distribution mechanism, users passively receive message content that cannot adapt to their demands, making on-demand information acquisition difficult, and that the information distribution strategy is inflexible.
Disclosure of Invention
The invention aims to solve two problems in the prior art: passive reception at the subscribing end causes a continual increase of junk information, and point-to-point topic matching in the data link domain between publishing and subscribing users makes inefficient use of channel resources.
To solve these problems, the invention provides an information self-adaptive distribution strategy and flow automatic arrangement system for message middleware, which can realize self-adaptive distribution of messages based on user interests and demands, adaptively distribute different message modalities according to network conditions, and realize automatic flow arrangement of data distribution. The specific steps of the self-adaptive information distribution strategy are as follows:
step S10: constructing a topic content extraction model based on the TF-IDF, TextRank, and LDA algorithms, used to acquire the topic content of interest from a user's received historical message texts, and further performing feature screening based on the AdaBoost algorithm so as to acquire the user's topic words of interest;
step S20: constructing a triplet <user, relation, interest tag phrase> from the user and the interest topic group, and constructing a user interest knowledge graph based on the Neo4j tool;
step S30: based on the BERT model and the attention mechanism, constructing a semantic mapping relation between the user interest knowledge graph and the messages to be received, realizing self-adaptive distribution of information content to interested users;
step S40: in scenarios of network change and limitation, based on user priority and network link state, realizing an information modality (text, picture, video, voice, etc.) self-adaptive distribution strategy via an immune optimization algorithm, and selecting the optimal combination of information distribution modalities so as to improve channel utilization efficiency;
step S50: designing an automatic arrangement system that meets the requirements of the data self-adaptive distribution module based on ETL and a flow engine, and constructing the information distribution flow.
In the above technical solution, the specific steps of step S10 are as follows:
step S101, the TF-IDF algorithm uses a word segmentation tool to segment the input document, removes stop words and low-frequency words, and reserves only words with noun part of speech as the candidate word set. The TF-IDF value of each candidate word is calculated, where the TF value (Term Frequency) is:
TF(w) = m / n
where m is the number of times the word w appears in the text and n is the total number of words in the text. The IDF value (Inverse Document Frequency) of the candidate word is calculated by dividing the total number of documents N by the number of documents N_w containing the word and taking the logarithm of the quotient:
IDF(w) = log(N / N_w)
and the TF-IDF value is TF(w) × IDF(w).
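As an illustrative sketch (not the patent's implementation) of the step S101 scoring, assuming the documents have already been segmented and filtered down to noun candidates, e.g. by a Chinese segmenter such as jieba:

```python
import math
from collections import Counter

def tf_idf(docs):
    """Score every word of every document by TF-IDF.

    TF(w)  = m / n        (m: count of w in the doc, n: words in the doc)
    IDF(w) = log(N / N_w) (N: total docs, N_w: docs containing w)
    """
    n_docs = len(docs)
    doc_freq = Counter()                 # N_w for each word
    for doc in docs:
        doc_freq.update(set(doc))
    scores = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        scores.append({
            w: (m / total) * math.log(n_docs / doc_freq[w])
            for w, m in counts.items()
        })
    return scores

# Toy "already segmented, stop words removed" documents.
docs = [
    ["message", "middleware", "queue"],
    ["message", "topic", "queue"],
    ["graph", "interest", "topic"],
]
scores = tf_idf(docs)
# "middleware" occurs in only one document, so it outscores "message" there.
assert scores[0]["middleware"] > scores[0]["message"]
```

Words that occur in every document receive IDF = 0 and are thereby excluded from the candidate topic words.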
step S102, the TextRank algorithm uses a word segmentation tool to segment the input document, removes stop words and low-frequency words, and reserves only words with noun part of speech as the candidate word set. The reserved words are built into an undirected semantic-relation graph, and the TextRank value of each candidate word is calculated:
TR(V_i) = (1 − d) + d × Σ_{V_j ∈ In(V_i)} ( w_ji / Σ_{V_k ∈ Out(V_j)} w_jk ) × TR(V_j)
where d is the damping coefficient, typically set to 0.85, In(V_i) is the set of words pointing to word i, Out(V_i) is the set of words that word i points to, and w_ij is the weight of the edge between word nodes V_i and V_j.
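The TextRank iteration described above can be sketched as follows; an unweighted co-occurrence graph (all edge weights w_ij = 1) is assumed for brevity:

```python
def textrank(neighbors, d=0.85, iterations=50):
    """Iteratively compute TR(V_i) = (1 - d) + d * sum over in-neighbors
    of TR(V_j) / deg(V_j), on an unweighted undirected word graph."""
    scores = {v: 1.0 for v in neighbors}
    for _ in range(iterations):
        new = {}
        for v in neighbors:
            new[v] = (1 - d) + d * sum(
                scores[u] / len(neighbors[u]) for u in neighbors[v]
            )
        scores = new
    return scores

# Toy co-occurrence graph: "topic" co-occurs with every other word.
graph = {
    "topic":    ["message", "queue", "interest"],
    "message":  ["topic", "queue"],
    "queue":    ["topic", "message"],
    "interest": ["topic"],
}
scores = textrank(graph)
# The most connected word receives the highest TextRank value.
assert max(scores, key=scores.get) == "topic"
```

In practice the graph would be built from a sliding co-occurrence window over the segmented text, and iteration would stop once the scores converge.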
step S103, the LDA algorithm uses a word segmentation tool to segment the input document, removes stop words and low-frequency words, and reserves only words with noun part of speech as the candidate word set. P(Word|Topic) is calculated from the probability of a word appearing in a topic, P(Topic|Text) from the probability of a topic appearing in the document, and the probability of a word appearing in the text from the two:
P(Word|Text) = Σ_Topic P(Word|Topic) × P(Topic|Text)
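A worked instance of the mixture formula above; the toy P(Word|Topic) and P(Topic|Text) tables below are hand-picked illustrations standing in for the distributions a real LDA library (e.g. gensim) would estimate:

```python
# P(Word|Topic) for two toy topics over a small vocabulary (assumed values).
p_word_topic = {
    "tech": {"queue": 0.5, "message": 0.4, "graph": 0.1, "user": 0.0},
    "user": {"queue": 0.0, "message": 0.1, "graph": 0.3, "user": 0.6},
}
# P(Topic|Text): this document is mostly about the "tech" topic.
p_topic_text = {"tech": 0.8, "user": 0.2}

def p_word_text(word):
    # P(Word|Text) = sum over topics of P(Word|Topic) * P(Topic|Text)
    return sum(p_word_topic[t][word] * p_topic_text[t] for t in p_topic_text)

assert abs(p_word_text("queue") - 0.40) < 1e-9  # 0.5*0.8 + 0.0*0.2
assert abs(p_word_text("user") - 0.12) < 1e-9   # 0.0*0.8 + 0.6*0.2
```

Words with high P(Word|Text) for a user's message history become that user's candidate topic words.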
step S104, the AdaBoost algorithm combines the weak classifiers (the TF-IDF, TextRank, and LDA classifiers) into a strong classifier; the idea is to adjust the weights of the training samples in the data set to learn several classifiers and integrate them according to a certain rule so as to improve classification performance. Finally, the obtained subject terms are taken as the user's interest tag words.
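AdaBoost proper learns its classifier weights by iteratively re-weighting the training samples; as a simplified, hypothetical stand-in for the step S104 combination, a fixed weighted vote over the three extractors' normalised scores illustrates the ensemble idea:

```python
def combine_keyword_rankings(rankings, weights):
    """Weighted vote over the candidate keywords of several weak extractors.

    rankings: list of dicts word -> score (e.g. from TF-IDF, TextRank, LDA);
    weights:  one weight per extractor, playing the role of the classifier
              weights AdaBoost would learn (assumed values here).
    """
    combined = {}
    for ranking, weight in zip(rankings, weights):
        top = max(ranking.values())  # normalise so the weights are comparable
        for word, score in ranking.items():
            combined[word] = combined.get(word, 0.0) + weight * score / top
    return sorted(combined, key=combined.get, reverse=True)

tfidf    = {"queue": 0.9, "message": 0.5, "user": 0.1}
textrank = {"queue": 0.7, "message": 0.8, "user": 0.2}
lda      = {"queue": 0.6, "message": 0.4, "interest": 0.5}

tags = combine_keyword_rankings([tfidf, textrank, lda], [0.5, 0.3, 0.2])
assert tags[0] == "queue"
```

The top-ranked words of the combined list are the interest tag words attached to the user.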
In the above technical solution, the specific steps of step S20 are as follows:
step S201, extracting named entity words (users and the users' subject words of interest) and relation words, adding them into the knowledge graph database, and importing them into Neo4j to realize visualized construction of the knowledge graph.
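One way step S201 could be sketched is by rendering each <user, relation, interest tag> triplet as a Cypher MERGE statement. The node labels, relation name, and user names below are illustrative assumptions; a real pipeline would execute the statements through the Neo4j Python driver (session.run) rather than collect them as strings:

```python
def triple_to_cypher(user, relation, tag):
    """Render one <user, relation, interest tag> triple as an idempotent
    Cypher MERGE statement (labels and relation name are illustrative)."""
    return (
        f"MERGE (u:User {{name: '{user}'}}) "
        f"MERGE (t:InterestTag {{name: '{tag}'}}) "
        f"MERGE (u)-[:{relation}]->(t)"
    )

triples = [
    ("alice", "INTERESTED_IN", "message middleware"),
    ("alice", "INTERESTED_IN", "knowledge graph"),
]
statements = [triple_to_cypher(*t) for t in triples]
assert "MERGE (u:User {name: 'alice'})" in statements[0]
assert "[:INTERESTED_IN]" in statements[0]
```

MERGE (rather than CREATE) keeps the import idempotent when a user's interest tags are refreshed.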
In the above technical solution, the specific steps of step S30 are as follows:
step S301, the message text is represented as I = {w_1, …, w_n}, where n is the number of words in the text, and the interest-tag entity words of the knowledge graph are represented as U = {w_1, …, w_m}. The pre-trained language model BERT is used to obtain the sentence context feature vector representations I_h = {h_CLS, h_1, …, h_n} and U_h = {h_CLS, h_1, …, h_m}.
step S302, based on the attention mechanism, the cosine similarity between the message text and the interest-tag entity words of the knowledge graph is calculated as:
sim(U_i, I_j) = (U_i · I_j) / (‖U_i‖ × ‖I_j‖)
where U_i denotes the user interest feature representation and I_j denotes the message text feature representation.
step S303, based on the semantic mapping model between user interest words and message text, a cosine similarity threshold β is set, and the user–message pairs whose similarity exceeds β form the to-be-distributed list (U_i, I_j).
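A minimal sketch of steps S302–S303, with short hand-made vectors standing in for the BERT [CLS] sentence representations (the threshold value and vectors are illustrative):

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def to_distribute(user_vecs, msg_vecs, beta=0.8):
    """Return all (user, message) pairs whose similarity exceeds beta.
    The vectors stand in for BERT [CLS] sentence embeddings."""
    return [
        (u, m)
        for u, uv in user_vecs.items()
        for m, mv in msg_vecs.items()
        if cosine(uv, mv) > beta
    ]

users = {"U1": [1.0, 0.0, 0.2], "U2": [0.0, 1.0, 0.0]}
msgs  = {"I1": [0.9, 0.1, 0.2], "I2": [0.1, 0.9, 0.1]}
pairs = to_distribute(users, msgs, beta=0.8)
assert ("U1", "I1") in pairs and ("U2", "I2") in pairs
assert ("U1", "I2") not in pairs
```

Each message is thus pushed only to the users whose interest representation it matches, instead of to every subscriber of a topic.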
In the above technical solution, the specific steps of step S40 are as follows:
step S401, the user channel communication status is acquired with a network probe, and, according to the to-be-distributed list (U_i, I_j), the modality (text, picture, video, etc.) in which the information to be distributed carries its data is acquired, together with its size.
step S402, the above users and data to be distributed are defined as [user, text size, picture size, video size, voice size] = [U_i, T_i, I_i, V_i].
step S403, the optimal transmit-data list [U_j, T_j, I_j, V_j] is selected based on the immune algorithm.
In the above technical solution, the specific steps of step S50 are as follows:
step S501, an ETL tool is used to process the data conversion flow, including data extraction, cleaning and conversion, and loading.
step S502, the flow definition file is designed to contain the information required to visually display each node (node position, size, shape, etc.).
step S503, the processing nodes and the circulation mode are defined based on the workflow engine.
step S504, the required flow definition file is stored and deployed into the runtime environment of the workflow engine, and the execution flow decides the subsequent data processing flow according to the identification information parsed from the currently converted data.
The beneficial effects of the invention are as follows:
firstly, the topic content of interest in the user's historical messages is acquired based on LDA topic modeling, and a user interest knowledge graph is constructed based on Neo4j; secondly, a mapping relation between the global semantic information of the user interest knowledge graph and that of the message text to be distributed is constructed based on an attention mechanism, realizing semantic self-adaptive distribution of information content; furthermore, considering the scenario of a limited network state, self-adaptive distribution of information modalities (text, picture, video, voice, etc.) is constructed with an immune optimization algorithm so as to improve the channel utilization rate of information distribution; and finally, an information self-adaptive distribution system and automatic flow arrangement are constructed based on flow-engine technology, realizing modularized customization of the distribution data nodes and automatic generation and arrangement of the distribution flow.
Drawings
FIG. 1 is a flow chart of the overall technology of the invention.
Fig. 2 is a technical roadmap for implementing the information adaptive distribution strategy of the present invention.
FIG. 3 is a technical roadmap for implementing the automatic flow layout of information distribution according to the invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Referring to fig. 1, the present invention provides an overall technical solution:
The information source text is input into the information self-adaptive distribution strategy; the strategy performs content-adaptive selection according to user interests and modality-adaptive selection according to network conditions; finally, the automatic information distribution system is constructed by flow arrangement technology, and the system is ultimately used for the message distribution management of message middleware.
Referring to fig. 2, the detailed steps of the information adaptation strategy are as follows:
Step 1: constructing a topic content extraction model based on the TF-IDF, TextRank, and LDA algorithms, acquiring the topic content of interest from the user's received historical message texts, and further performing feature screening based on the AdaBoost algorithm so as to acquire the user's topic words of interest;
Step 1-1, the TF-IDF algorithm performs word segmentation on the input document with a word segmentation tool, removes stop words and low-frequency words, and reserves only words with noun part of speech as the candidate word set. The TF-IDF value of each candidate word is calculated, where the TF value (Term Frequency) is:
TF(w) = m / n
where m is the number of times the word w appears in the text and n is the total number of words in the text. The IDF value (Inverse Document Frequency) of the candidate word is calculated by dividing the total number of documents N by the number of documents N_w containing the word and taking the logarithm of the quotient:
IDF(w) = log(N / N_w)
and the TF-IDF value is TF(w) × IDF(w).
Step 1-2, the TextRank algorithm performs word segmentation on the input document with a word segmentation tool, removes stop words and low-frequency words, and reserves only words with noun part of speech as the candidate word set. The reserved words are built into an undirected semantic-relation graph, and the TextRank value of each candidate word is calculated:
TR(V_i) = (1 − d) + d × Σ_{V_j ∈ In(V_i)} ( w_ji / Σ_{V_k ∈ Out(V_j)} w_jk ) × TR(V_j)
where d is the damping coefficient, typically set to 0.85, In(V_i) is the set of words pointing to word i, Out(V_i) is the set of words that word i points to, and w_ij is the weight of the edge between word nodes V_i and V_j.
Step 1-3, the LDA algorithm performs word segmentation on the input document with a word segmentation tool, removes stop words and low-frequency words, and reserves only words with noun part of speech as the candidate word set. P(Word|Topic) is calculated from the probability of a word appearing in a topic, P(Topic|Text) from the probability of a topic appearing in the document, and the probability of a word appearing in the text from the two:
P(Word|Text) = Σ_Topic P(Word|Topic) × P(Topic|Text)
Step 1-4, the AdaBoost algorithm combines the weak classifiers (the TF-IDF, TextRank, and LDA classifiers) into a strong classifier; the idea is to adjust the weights of the training samples in the data set to learn several classifiers and integrate them according to a certain rule so as to improve classification performance. Finally, the obtained subject terms are taken as the user's interest tag words.
Step 2: constructing the triplet <user, relation, interest tag phrase> from the user and the interest topic group, and constructing the user interest knowledge graph based on the Neo4j tool;
Step 2-1, extracting named entity words (users and the users' subject words of interest) and relation words, adding them into the knowledge graph database, and importing them into Neo4j to realize visualized construction of the knowledge graph.
Step 3: based on the BERT model and the attention mechanism, constructing a semantic mapping relation between the user interest knowledge graph and the messages to be received, realizing self-adaptive distribution of information content to interested users;
Step 3-1, the message text is represented as I = {w_1, …, w_n}, where n is the number of words in the text, and the interest-tag entity words of the knowledge graph are represented as U = {w_1, …, w_m}. The pre-trained language model BERT is used to obtain the sentence context feature vector representations I_h = {h_CLS, h_1, …, h_n} and U_h = {h_CLS, h_1, …, h_m}.
Step 3-2, based on the attention mechanism, the cosine similarity between the message text and the interest-tag entity words of the knowledge graph is calculated as:
sim(U_i, I_j) = (U_i · I_j) / (‖U_i‖ × ‖I_j‖)
where U_i denotes the user interest feature representation and I_j denotes the message text feature representation.
Step 3-3, based on the semantic mapping model between user interest words and message text, a cosine similarity threshold β is set, and the user–message pairs whose similarity exceeds β form the to-be-distributed list (U_i, I_j).
Step 4: in scenarios of network change and limitation, based on user priority and network link state, an information modality (text, picture, video, voice, etc.) self-adaptive distribution strategy is realized with an immune optimization algorithm, selecting the optimal combination of information distribution modalities so as to improve channel utilization efficiency;
Step 4-1, the user channel communication state is acquired with a network probe, and, according to the to-be-distributed list (U_i, I_j), the modality (text, picture, video, etc.) in which the information to be distributed carries its data is acquired, together with its size.
Step 4-2, the above users and data to be distributed are defined as [user, text size, picture size, video size, voice size] = [U_i, T_i, I_i, V_i].
Step 4-3, the optimal transmit-data list [U_j, T_j, I_j, V_j] is selected based on the immune algorithm, with the following detailed steps:
(1) Antigen recognition: the objective function and the various constraints are input to the immune algorithm as the antigen.
(2) Initial antibody production: the initial antibody population is generated randomly.
(3) Affinity calculation: the fitness value of each antibody is calculated.
(4) Immune treatment: includes immune selection, cloning, mutation, and suppression.
(5) Immune selection: antibodies with higher affinity are selected according to their affinity.
(6) Cloning: the selected higher-affinity antibodies are replicated.
(7) Mutation: crossover and mutation operations are applied to the cloned individuals so as to change their affinity.
(8) Suppression: the mutated antibodies are screened and those with higher affinity are retained.
(9) Population refresh: the immune-selected and immune-suppressed antibodies form a collection from which the higher-affinity antibodies are retained and enter the new population; any shortfall in the new population is filled with randomly generated antibodies to increase diversity.
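The nine steps above can be sketched as a small clonal-selection loop. The modality sizes, values, and channel capacity below are illustrative assumptions; affinity is simply the total value of a selection that fits the channel, and infeasible antibodies score zero:

```python
import random

def immune_select(items, capacity, pop=30, generations=60, seed=7):
    """Clonal-selection sketch: choose a subset of (size, value) items whose
    total size fits the channel capacity while maximising total value.
    Antibodies are bit strings over the items."""
    rng = random.Random(seed)
    n = len(items)

    def affinity(ab):
        size = sum(items[i][0] for i in range(n) if ab[i])
        value = sum(items[i][1] for i in range(n) if ab[i])
        return value if size <= capacity else 0.0

    population = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop)]
    for _ in range(generations):
        population.sort(key=affinity, reverse=True)
        elite = population[: pop // 3]                 # immune selection
        clones = []
        for ab in elite:
            for _ in range(3):                         # cloning
                clone = ab[:]
                clone[rng.randrange(n)] ^= 1           # mutation
                clones.append(clone)
        merged = sorted(elite + clones, key=affinity, reverse=True)
        population = merged[: pop - 5]                 # suppression
        population += [[rng.randint(0, 1) for _ in range(n)]
                       for _ in range(5)]              # population refresh
    return max(population, key=affinity)

# Modalities of one message: (size in KB, value to the user) - assumed data.
modalities = [(5, 3), (40, 6), (900, 10), (60, 5)]  # text, picture, video, voice
best = immune_select(modalities, capacity=120)
chosen_size = sum(s for (s, _), b in zip(modalities, best) if b)
assert chosen_size <= 120   # the selected combination fits the channel
```

Because the elite always survives the merge, the best feasible combination found so far is never lost between generations.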
Referring to fig. 3, the detailed steps for constructing the information adaptive distribution system based on the process engine technology are as follows:
step 5: and designing an automatic arrangement system meeting the requirement of the data self-adaptive distribution module based on the ETL and the flow engine, and constructing an information distribution flow.
Step 5-1, an ETL tool is used to process the data conversion flow, including data extraction, cleaning and conversion, and loading.
Step 5-2, the flow definition file is designed to contain the information required to visually display each node (node position, size, shape, etc.).
Step 5-3, defining a processing node and a circulation mode based on a workflow engine, and specifically comprising the following steps:
(1) Start node: indicates the start of a process flow; there can be only one start node.
(2) End node: indicates the end of a process flow; there can be several, or even none (a flow without an end node can still run, but it is not reasonable in practice).
(3) Task node: the core node, representing an approval step; the process can pause at task nodes, and the API must be called to push the process forward. Task nodes include automatic tasks and manual tasks; since we mainly perform data processing, automatic tasks are primarily used.
(4) Gateway node: a node for flow control; for example, a gateway node can hold the flow until all of its incoming branches have been approved, and only when every branch is complete does it push the flow forward; a gateway node can be designed with multiple outlets.
Step 5-4, the required flow definition file is stored and deployed into the runtime environment of the workflow engine, and the execution flow decides the subsequent data processing flow according to the identification information parsed from the currently converted data.
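The node types of step 5-3 can be sketched as a toy flow walker. A real engine would read a BPMN-style flow definition file; the node names and the 'kind' routing field here are hypothetical:

```python
def run_flow(nodes, data):
    """Walk a tiny flow definition: task nodes transform the record, the
    gateway picks a branch from the identification field in the data."""
    current = "start"
    while True:
        node = nodes[current]
        if node["type"] == "end":
            return data
        if node["type"] == "task":
            data = node["handler"](data)
            current = node["next"]
        elif node["type"] == "gateway":
            # Route on the identification information in the record.
            current = node["routes"][data["kind"]]
        else:  # start node
            current = node["next"]

flow = {
    "start":   {"type": "start", "next": "clean"},
    "clean":   {"type": "task", "next": "route",
                "handler": lambda d: {**d, "clean": True}},
    "route":   {"type": "gateway",
                "routes": {"text": "to_text", "video": "end"}},
    "to_text": {"type": "task", "next": "end",
                "handler": lambda d: {**d, "channel": "text"}},
    "end":     {"type": "end"},
}

out = run_flow(flow, {"kind": "text"})
assert out["clean"] and out["channel"] == "text"
```

Deploying a new distribution flow then amounts to editing the flow definition rather than changing code, which is what makes the data nodes modular and the arrangement automatic.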
An information self-adaptive distribution strategy and flow automatic arrangement system of message middleware can be used for message distribution management of the message middleware.
Although the present invention has been described with reference to the foregoing embodiments, it will be apparent to those skilled in the art that modifications may be made to the described embodiments or equivalents substituted for elements thereof; any modifications, equivalents, improvements, and changes that do not depart from the spirit and principles of the present invention are intended to fall within its scope.
Claims (6)
1. An information self-adaptive distribution strategy and flow automatic arrangement system of message middleware is characterized in that the system comprises the following steps:
step S10: constructing a topic content extraction model based on the TF-IDF, TextRank, and LDA algorithms, used to acquire the topic content of interest from a user's received historical message texts, and further performing feature screening based on the AdaBoost algorithm so as to acquire the user's topic words of interest;
step S20: constructing a triplet <user, relation, interest tag phrase> from the user and the interest topic group, and constructing a user interest knowledge graph based on the Neo4j tool;
step S30: based on the BERT model and the attention mechanism, constructing a semantic mapping relation between the user interest knowledge graph and the messages to be received, realizing self-adaptive distribution of information content to interested users;
step S40: in a scene of network change and limitation, based on the user priority and the network link state, an information modal self-adaptive distribution strategy is realized based on an immune optimization algorithm, an information distribution modal optimal combination is selected, and the utilization efficiency of a channel is improved;
step S50: designing an automatic arrangement system that meets the requirements of the data self-adaptive distribution module based on ETL and a flow engine, and constructing the information distribution flow.
2. The information self-adaptive distribution strategy and flow automatic arrangement system of message middleware according to claim 1, wherein: the specific steps of the step S10 are as follows:
step S101, the TF-IDF algorithm uses a word segmentation tool to segment the input document, stop words and low-frequency words are removed, only words with noun part of speech are reserved as the candidate word set, and the TF-IDF value of each candidate word is calculated, the TF value being:
TF(w) = m / n
wherein m is the number of times the word w appears in the text and n is the total number of words in the text; the IDF value of the candidate word, namely the inverse document frequency, is calculated by dividing the total number of documents N by the number of documents N_w containing the word and taking the logarithm of the quotient:
IDF(w) = log(N / N_w)
step S102, the TextRank algorithm uses a word segmentation tool to segment the input document, removes stop words and low-frequency words, reserves only words with noun part of speech as the candidate word set, builds the reserved words into an undirected semantic-relation graph, and calculates the TextRank value of each candidate word:
TR(V_i) = (1 − d) + d × Σ_{V_j ∈ In(V_i)} ( w_ji / Σ_{V_k ∈ Out(V_j)} w_jk ) × TR(V_j)
where d is the damping coefficient, set to 0.85, In(V_i) is the set of words pointing to word i, Out(V_i) is the set of words that word i points to, and w_ij is the weight of the edge between word nodes V_i and V_j;
step S103, the LDA algorithm uses a word segmentation tool to segment the input document, removes stop words and low-frequency words, reserves only words with noun part of speech as the candidate word set, calculates P(Word|Topic) from the probability of a word appearing in a topic, calculates P(Topic|Text) from the probability of a topic appearing in the document, and calculates the probability of a word appearing in the text from the two:
P(Word|Text) = Σ_Topic P(Word|Topic) × P(Topic|Text)
step S104, the AdaBoost algorithm combines the weak classifiers into a strong classifier, adjusting the weights of the training samples in the data set to learn several classifiers and integrating them according to a certain rule so as to improve classification performance; finally, the obtained subject terms are taken as the user's interest tag words.
3. The information self-adaptive distribution strategy and flow automatic arrangement system of message middleware according to claim 1, wherein: the specific steps of the step S20 are as follows:
step S201, extracting named entity words and related words, adding them to the knowledge graph database, and importing into Neo4j to realize visual construction of the knowledge graph, wherein the named entity words include the users and the subject words of interest to each user.
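A hypothetical sketch of step S201: turning user-to-interest-word mappings into (subject, relation, object) triples and the Cypher statements that could load them into Neo4j (the labels `User`/`Word` and relation name `INTERESTED_IN` are assumptions, not from the patent):

```python
def build_triples(users_interests):
    """users_interests: dict user -> list of interest tag words.
    Returns the triples and one Cypher MERGE statement per triple."""
    triples = [(user, "INTERESTED_IN", word)
               for user, words in users_interests.items()
               for word in words]
    cypher = [
        f"MERGE (u:User {{name: '{s}'}}) "
        f"MERGE (w:Word {{text: '{o}'}}) "
        f"MERGE (u)-[:{r}]->(w)"
        for s, r, o in triples
    ]
    return triples, cypher
```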
4. The information self-adaptive distribution strategy and flow automatic arrangement system of message middleware according to claim 1, wherein: the specific steps of the step S30 are as follows:
step S301, the information text is initially represented as I = {w_1, …, w_n}, where n is the number of words in the text, and the interest tag entity words of the knowledge graph are represented as U = {w_1, …, w_m}; the pre-trained language model BERT is used to obtain the sentence context feature vector representations I_h = {h_CLS, h_1, …, h_n} and U_h = {h_CLS, h_1, …, h_m};
step S302, based on the attention mechanism, the cosine similarity between the message text and the interest tag entity words of the knowledge graph is calculated according to the following formula:
sim(U_i, I_j) = (U_i · I_j) / (|U_i| × |I_j|)
wherein U_i represents the user interest feature representation and I_j represents the message text feature representation;
step S303, based on the semantic mapping model of user interest words and message text, a cosine similarity threshold β is set, and the user-message pairs whose similarity is greater than β are composed into the to-be-distributed list (U_i, I_j).
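Steps S302 and S303 amount to a cosine-similarity filter; a minimal sketch, with small vectors standing in for the BERT feature representations of step S301:

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def to_distribute(user_vecs, msg_vecs, beta):
    """Keep (user, message) pairs whose similarity exceeds threshold beta."""
    return [(ui, mj)
            for ui, u in user_vecs.items()
            for mj, v in msg_vecs.items()
            if cosine(u, v) > beta]
```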
5. The information self-adaptive distribution strategy and flow automatic arrangement system of message middleware according to claim 4, wherein: the specific steps of the step S40 are as follows:
step S401, acquiring the user channel communication status based on the network probe, and according to the to-be-distributed list (U_i, I_j), acquiring the schema of the information to be distributed, which contains the data, and the specification of the schema;
step S402, defining the user and the data to be distributed as [user, text size, picture size, video size, voice size] = [U_i, T_i, I_i, V_i];
step S403, selecting the optimal transmit data list [U_j, T_j, I_j, V_j] based on an immune algorithm.
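The patent does not specify its immune algorithm's affinity function; as one assumed formulation, a minimal clonal-selection sketch that picks a subset of (size, value) items to send under a channel capacity:

```python
import random

def immune_select(items, capacity, generations=30, pop_size=12, seed=0):
    """Clonal-selection sketch: antibodies are 0/1 inclusion vectors over
    items; affinity rewards total value and rejects over-capacity sets."""
    rng = random.Random(seed)
    n = len(items)

    def affinity(antibody):
        size = sum(s for (s, _), keep in zip(items, antibody) if keep)
        value = sum(v for (_, v), keep in zip(items, antibody) if keep)
        return value if size <= capacity else 0.0   # infeasible -> worst

    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=affinity, reverse=True)
        clones = []
        for ab in pop[:pop_size // 2]:              # clone the fittest half
            for _ in range(2):
                clone = ab[:]
                clone[rng.randrange(n)] ^= 1        # hypermutation: flip a bit
                clones.append(clone)
        pop = sorted(pop + clones, key=affinity, reverse=True)[:pop_size]
    return pop[0]
```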
6. The information self-adaptive distribution strategy and flow automatic arrangement system of message middleware according to claim 1, wherein: the specific steps of the step S50 are as follows:
step S501, the data conversion flow is processed using ETL: data extraction, cleaning and conversion, and loading;
step S502, the flow definition file is designed to include the information required for visual display of each node, including the node position, size, and shape;
step S503, the processing nodes and the circulation mode are defined based on the workflow engine;
step S504, the required flow definition file is stored and deployed into the runtime environment of the workflow engine, and the execution flow decides the subsequent data processing flow according to the identification information parsed from the currently converted data.
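The dispatch described in step S504 can be sketched as a lookup from a record's identification field to a processing node (the field name `type` and the handler shapes are assumptions for illustration):

```python
def route(record, handlers, default=None):
    """Hand a converted record to the processing node matching its
    identification field, falling back to a default handler if any."""
    handler = handlers.get(record.get("type"), default)
    if handler is None:
        raise ValueError(f"no processing node for {record.get('type')!r}")
    return handler(record)
```

A workflow engine would typically build `handlers` from the deployed flow definition file, so adding a node means editing the definition rather than the dispatch code.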
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311096337.2A CN117149457A (en) | 2023-08-29 | 2023-08-29 | Information self-adaptive distribution strategy and flow automatic arrangement system of message middleware |
Publications (1)
Publication Number | Publication Date |
---|---|
CN117149457A true CN117149457A (en) | 2023-12-01 |
Family
ID=88886162
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117149457A (en) |
Legal Events
Date | Code | Title | Description
---|---|---|---
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||