CN112711587A - Data processing method and device, electronic equipment and storage medium - Google Patents

Data processing method and device, electronic equipment and storage medium

Info

Publication number
CN112711587A
Authority
CN
China
Prior art keywords
keyword
target
message data
candidate
candidate aggregation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201911019829.5A
Other languages
Chinese (zh)
Other versions
CN112711587B (en)
Inventor
朱杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Lazas Network Technology Shanghai Co Ltd
Original Assignee
Lazas Network Technology Shanghai Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Lazas Network Technology Shanghai Co Ltd filed Critical Lazas Network Technology Shanghai Co Ltd
Priority to CN201911019829.5A priority Critical patent/CN112711587B/en
Publication of CN112711587A publication Critical patent/CN112711587A/en
Application granted granted Critical
Publication of CN112711587B publication Critical patent/CN112711587B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22Indexing; Data structures therefor; Storage structures

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

The embodiment of the disclosure discloses a data processing method and device, electronic equipment and a storage medium. The method comprises the following steps: receiving first message data; matching according to a preset mapping relation to obtain a candidate aggregation node group corresponding to the target keyword; the candidate aggregation node groups comprise at least one candidate aggregation node, and different target keywords correspond to different candidate aggregation node groups; determining a target aggregation node for the first message data from the candidate aggregation node group; and sending the first message data to the target aggregation node so that the target aggregation node can aggregate the first message data according to the target keyword. The embodiment of the disclosure can send the same kind of message data containing the target keywords to one or more special target aggregation nodes for aggregation processing, so that the aggregation processing result of the message data can meet the aggregation effect and the uniform fragmentation effect.

Description

Data processing method and device, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to a data processing method and apparatus, an electronic device, and a storage medium.
Background
With the development of internet technology, more and more message data is generated and used in more and more ways; for example, message data is stored in a database, or forwarded to data subscribers in order of receiving time. To reduce the number of times message data must be processed, such as the number of storage or forwarding operations, data is generally aggregated at the receiving end; for example, multiple pieces of data with the same key value are aggregated into one piece of data before being stored or forwarded. The efficiency of message data aggregation therefore directly affects the efficiency of message data processing, and improving the efficiency of data aggregation is one of the main problems that those skilled in the art are currently working to solve.
Disclosure of Invention
The embodiment of the disclosure provides a data processing method and device, electronic equipment and a storage medium.
In a first aspect, an embodiment of the present disclosure provides a data processing method.
Specifically, the data processing method includes:
receiving first message data; wherein, the first message data comprises a target keyword;
matching according to a preset mapping relation to obtain a candidate aggregation node group corresponding to the target keyword; the candidate aggregation node groups comprise at least one candidate aggregation node, and different target keywords correspond to different candidate aggregation node groups;
determining a target aggregation node for the first message data from the candidate aggregation node group;
and sending the first message data to the target aggregation node so that the target aggregation node can aggregate the first message data according to the target keyword.
With reference to the first aspect, in a first implementation manner of the first aspect, the number of candidate aggregation nodes in the candidate aggregation node group is related to an occurrence probability of the target keyword; the probability of occurrence of the target keyword is a proportion of the number of second message data containing the target keyword in second message data received within a preset time period to the total number of the second message data.
With reference to the first aspect and/or the first implementation manner of the first aspect, in a second implementation manner of the first aspect, the present disclosure further includes:
and allocating the candidate aggregation node group to the target keyword according to the occurrence probability of the target keyword.
With reference to the first aspect, the first implementation manner of the first aspect, and/or the second implementation manner of the first aspect, in a third implementation manner of the first aspect, the allocating the candidate aggregation node group to the target keyword according to the occurrence probability of the target keyword includes:
counting keywords contained in the second message data received within a preset time period to obtain a keyword set;
determining, for the keywords in the set of keywords, a first amount of the second message data containing the same keyword;
determining the occurrence probability of the keywords according to the first number and the total number of the second message data received in the preset time period;
and determining the candidate aggregation node grouping of the keywords according to the occurrence probability.
With reference to the first aspect, the first implementation manner of the first aspect, the second implementation manner of the first aspect, and/or the third implementation manner of the first aspect, in a fourth implementation manner of the first aspect, the determining, according to the occurrence probability, a candidate aggregation node group of the keyword includes:
determining a target computing power of the candidate aggregation nodes capable of being allocated to the keywords according to the occurrence probability;
determining the candidate aggregation node grouping from available candidate aggregation nodes having remaining computing power not assigned to any of the keywords in accordance with the target computing power.
With reference to the first aspect, the first implementation manner of the first aspect, the second implementation manner of the first aspect, the third implementation manner of the first aspect, and/or the fourth implementation manner of the first aspect, in a fifth implementation manner of the first aspect, the determining, according to the occurrence probability, a target computation capacity that can be allocated to the candidate aggregation node of the keyword includes:
when the candidate aggregation nodes are allocated to the keywords in units of whole available candidate aggregation nodes, determining the total number of the available candidate aggregation nodes according to the occurrence probabilities of the keywords in the keyword set, and determining a second number of the candidate aggregation nodes that can be allocated to the keyword as the occurrence probability of the keyword multiplied by the total number;
determining the candidate aggregation node grouping from the available candidate aggregation nodes having remaining computing power not assigned to any of the keywords in accordance with the target computing power, comprising:
selecting a second number of unassigned available candidate aggregation nodes to join the group of candidate aggregation nodes for the keyword.
With reference to the first aspect, the first implementation manner of the first aspect, the second implementation manner of the first aspect, the third implementation manner of the first aspect, the fourth implementation manner of the first aspect, and/or the fifth implementation manner of the first aspect, in a sixth implementation manner of the first aspect, the determining, according to the occurrence probability, a target computation capacity that can be allocated to the candidate aggregation node of the keyword includes:
determining a target computing power of the candidate aggregation nodes that can be assigned to the keyword according to the probability of occurrence and the total computing power of the available candidate aggregation nodes.
With reference to the first aspect, the first implementation manner of the first aspect, the second implementation manner of the first aspect, the third implementation manner of the first aspect, the fourth implementation manner of the first aspect, the fifth implementation manner of the first aspect, and/or the sixth implementation manner of the first aspect, in a seventh implementation manner of the first aspect, the determining, according to the target computation capacity, the candidate aggregation node group from available candidate aggregation nodes having a remaining computation capacity that is not allocated to any of the keywords, includes:
determining the candidate aggregation node group in a manner of preferentially allocating the same available candidate aggregation node to the same keyword.
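The capacity-based allocation of the fourth, sixth and seventh implementation manners can be sketched as follows. This is only one possible reading, not the method as claimed: computing power is modelled as hypothetical integer units per node, the target computing power of a keyword is its occurrence probability multiplied by the total computing power, and "preferentially allocating the same node to the same keyword" is read as filling a keyword's target from as few nodes as possible. The function name, the capacity units and the minimum share of one unit are all assumptions.

```python
from typing import Dict, List, Tuple


def allocate_by_capacity(
    counts: Dict[str, int], node_capacity: Dict[str, int]
) -> Dict[str, List[Tuple[str, int]]]:
    """Give each keyword capacity shares as (node, units) pairs.

    counts: messages per keyword observed in the last statistical period.
    node_capacity: hypothetical integer capacity units per available node.
    """
    total_msgs = sum(counts.values())
    if total_msgs == 0:
        return {}
    total_cap = sum(node_capacity.values())
    remaining = dict(node_capacity)  # capacity not yet assigned to any keyword
    result: Dict[str, List[Tuple[str, int]]] = {}
    # Serve the most frequent keywords first; break ties by keyword name.
    for keyword, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        # Target computing power = occurrence probability * total computing power.
        # The minimum of one unit is an assumption so that rare keywords get a share.
        need = max(1, n * total_cap // total_msgs)
        shares: List[Tuple[str, int]] = []
        for node in remaining:
            if need == 0:
                break
            take = min(need, remaining[node])
            if take > 0:
                # Exhaust one node before touching the next one.
                shares.append((node, take))
                remaining[node] -= take
                need -= take
        result[keyword] = shares
    return result


shares = allocate_by_capacity(
    {"a": 40, "b": 30, "c": 30}, {"n1": 100, "n2": 100}
)
```

With these inputs, keyword "a" fits entirely on n1, "b" takes the rest of n1 plus part of n2, and "c" takes what remains of n2, so a node's capacity may be split between keywords.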
With reference to the first aspect, the first implementation manner of the first aspect, the second implementation manner of the first aspect, the third implementation manner of the first aspect, the fourth implementation manner of the first aspect, the fifth implementation manner of the first aspect, the sixth implementation manner of the first aspect, and/or the seventh implementation manner of the first aspect, in an eighth implementation manner of the first aspect, the determining a target aggregation node of the first message data from the candidate aggregation node group includes:
determining the target aggregation node in a manner of uniformly distributing message data including the target keyword to the candidate aggregation nodes in the candidate aggregation node group corresponding to the target keyword.
In a second aspect, a data processing apparatus is provided in an embodiment of the present disclosure.
Specifically, the data processing apparatus includes:
a receiving module configured to receive first message data; wherein, the first message data comprises a target keyword;
the matching module is configured to obtain a candidate aggregation node group corresponding to the target keyword according to a preset mapping relation; the candidate aggregation node groups comprise at least one candidate aggregation node, and different target keywords correspond to different candidate aggregation node groups;
a determining module configured to determine a target aggregation node of the first message data from the candidate aggregation node group;
and the sending module is configured to send the first message data to the target aggregation node so that the target aggregation node performs aggregation processing on the first message data according to the target keyword.
With reference to the second aspect, in a first implementation manner of the second aspect, the number of candidate aggregation nodes in the candidate aggregation node group is related to the occurrence probability of the target keyword; the probability of occurrence of the target keyword is a proportion of the number of second message data containing the target keyword in second message data received within a preset time period to the total number of the second message data.
With reference to the second aspect and/or the first implementation manner of the second aspect, in a second implementation manner of the second aspect, the present disclosure further includes:
an assignment module configured to assign the candidate aggregation node group to the target keyword according to the probability of occurrence of the target keyword.
With reference to the second aspect, the first implementation manner of the second aspect, and/or the second implementation manner of the second aspect, in a third implementation manner of the second aspect, the allocating module includes:
the statistic submodule is configured to count keywords contained in the second message data received within a preset time period to obtain a keyword set;
a first determining submodule configured to determine, for the keywords in the keyword set, a first amount of the second message data containing the same keyword;
a second determining submodule configured to determine a probability of occurrence of the keyword according to the first number and a total number of the second message data received within the preset time period;
a third determining submodule configured to determine a candidate aggregation node group of the keyword according to the occurrence probability.
With reference to the second aspect, the first implementation manner of the second aspect, the second implementation manner of the second aspect, and/or the third implementation manner of the second aspect, in a fourth implementation manner of the second aspect, the third determining submodule includes:
a fourth determining submodule configured to determine a target computing power of the candidate aggregation nodes that can be allocated to the keyword according to the occurrence probability;
a fifth determining sub-module configured to determine the group of candidate aggregation nodes from available candidate aggregation nodes having a remaining computational power not allocated to any of the keywords in accordance with the target computational power.
With reference to the second aspect, the first implementation manner of the second aspect, the second implementation manner of the second aspect, the third implementation manner of the second aspect, and/or the fourth implementation manner of the second aspect, in a fifth implementation manner of the second aspect, the fourth determining submodule includes:
a sixth determining sub-module configured to determine, when the total number of the available candidate aggregation nodes is determined according to the occurrence probability of the keyword in the keyword set and the candidate aggregation nodes are allocated to the keyword in units of the whole of the available candidate aggregation nodes, a second number of the candidate aggregation nodes that can be allocated to the keyword according to the occurrence probability of the keyword multiplied by the total number;
the fifth determination submodule includes:
a selecting sub-module configured to select a second number of the unassigned available candidate aggregation nodes to join the group of candidate aggregation nodes of the keyword.
With reference to the second aspect, the first implementation manner of the second aspect, the second implementation manner of the second aspect, the third implementation manner of the second aspect, the fourth implementation manner of the second aspect, and/or the fifth implementation manner of the second aspect, in a sixth implementation manner of the second aspect, the fourth determining submodule includes:
a seventh determining sub-module configured to determine a target computation power of the candidate aggregation nodes that can be assigned to the keyword according to the occurrence probability and the total computation power of the available candidate aggregation nodes.
With reference to the second aspect, the first implementation manner of the second aspect, the second implementation manner of the second aspect, the third implementation manner of the second aspect, the fourth implementation manner of the second aspect, the fifth implementation manner of the second aspect, and/or the sixth implementation manner of the second aspect, in a seventh implementation manner of the second aspect, the fifth determining sub-module includes:
an eighth determining submodule configured to determine the candidate aggregation node group in a manner that the same available candidate aggregation node is preferentially allocated to the same keyword.
With reference to the second aspect, the first implementation manner of the second aspect, the second implementation manner of the second aspect, the third implementation manner of the second aspect, the fourth implementation manner of the second aspect, the fifth implementation manner of the second aspect, the sixth implementation manner of the second aspect, and/or the seventh implementation manner of the second aspect, in an eighth implementation manner of the second aspect, the determining module includes:
a ninth determining sub-module configured to determine the target aggregation node in a manner of uniformly distributing the message data including the target keyword to the candidate aggregation nodes in the candidate aggregation node group corresponding to the target keyword.
The functions can be realized by hardware, and the functions can also be realized by executing corresponding software by hardware. The hardware or software includes one or more modules corresponding to the above-described functions.
In one possible design, the data processing apparatus includes a memory and a processor, the memory is used for storing one or more computer instructions for supporting the data processing apparatus to execute the data processing method in the first aspect, and the processor is configured to execute the computer instructions stored in the memory. The data processing apparatus may further comprise a communication interface for the data processing apparatus to communicate with other devices or a communication network.
In a third aspect, an embodiment of the present disclosure provides an electronic device, including a memory and a processor; wherein the memory is configured to store one or more computer instructions, wherein the one or more computer instructions are executed by the processor to implement any of the methods described above.
In a fourth aspect, the disclosed embodiments provide a computer-readable storage medium for storing computer instructions for a data processing apparatus, which contains computer instructions for performing any of the methods described above.
The technical scheme provided by the embodiment of the disclosure can have the following beneficial effects:
according to the embodiment of the disclosure, a candidate aggregation node group corresponding to a target keyword of message data is obtained by matching according to a preset mapping relation for the received message data, and a target aggregation node is determined from the candidate node group, wherein the target aggregation node is a back-end processing node which is specially allocated to the target keyword and specially performs aggregation processing on the message data containing the target keyword. Therefore, in this way, the embodiment of the present disclosure can send the same kind of message data including the target keyword to one or more special target aggregation nodes for aggregation processing, so that the aggregation processing result of the message data can satisfy both the aggregation effect and the uniform fragmentation effect.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
Other features, objects, and advantages of the present disclosure will become more apparent from the following detailed description of non-limiting embodiments when taken in conjunction with the accompanying drawings. In the drawings:
FIG. 1 shows a flow diagram of a data processing method according to an embodiment of the present disclosure;
FIG. 2 shows a flow diagram of the part that determines the candidate aggregation node group according to an embodiment of the present disclosure;
FIG. 3 shows a flowchart of step S204 according to the embodiment shown in FIG. 2;
FIG. 4 shows a block diagram of a data processing apparatus according to an embodiment of the present disclosure;
FIG. 5 shows a block diagram of the part that determines the candidate aggregation node group according to an embodiment of the present disclosure;
FIG. 6 illustrates a block diagram of a third determination submodule 504 according to the embodiment illustrated in FIG. 5;
FIG. 7 is a schematic structural diagram of an electronic device suitable for implementing a data processing method according to an embodiment of the present disclosure.
Detailed Description
Hereinafter, exemplary embodiments of the present disclosure will be described in detail with reference to the accompanying drawings so that those skilled in the art can easily implement them. Also, for the sake of clarity, parts not relevant to the description of the exemplary embodiments are omitted in the drawings.
In the present disclosure, it is to be understood that terms such as "including" or "having," etc., are intended to indicate the presence of the disclosed features, numbers, steps, behaviors, components, parts, or combinations thereof, and are not intended to preclude the possibility that one or more other features, numbers, steps, behaviors, components, parts, or combinations thereof may be present or added.
It should be further noted that the embodiments and features of the embodiments in the present disclosure may be combined with each other without conflict. The present disclosure will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.
In the prior art, after receiving message data, a message receiving system generally forwards the message data to one of a plurality of back-end processing nodes for aggregation processing in two ways: one is a forwarding mode satisfying aggregation effect, and the other is a forwarding mode satisfying uniform fragmentation.
However, the two methods have the following defects:
First, the forwarding mode that satisfies the aggregation effect:
Message data that can be aggregated is sent to the same back-end processing node by means of a hash algorithm. This scheme is prone to hot spots: if there are 10 back-end processing nodes and one type of aggregatable message accounts for 40% of the total, a single back-end processing node must process that 40% of the message data, which degrades the aggregation performance of that node and, in turn, the overall data processing efficiency;
Second, the forwarding mode that satisfies uniform fragmentation:
This mode forwards message data by polling (round-robin), that is, received message data is forwarded evenly to each back-end processing node. Its aggregation effect is poor, however, because similar message data that could be aggregated is scattered across different back-end processing nodes, while the message data handled by any one back-end processing node often cannot be aggregated.
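The two defects can be reproduced with a small simulation. The sketch below is illustrative only: the workload (100 messages, 40 of which share the key "hot"), the toy hash function and all names are assumptions chosen to mirror the 10-node, 40% example above.

```python
from collections import Counter
from itertools import cycle

NUM_NODES = 10  # mirrors the 10-node example in the text

# Hypothetical workload: 40% of the messages share one key, the rest are unique.
messages = ["hot"] * 40 + [f"key-{i}" for i in range(60)]


def stable_hash(key: str) -> int:
    """Deterministic toy hash (sum of byte values); stands in for any hash."""
    return sum(key.encode("utf-8"))


# Mode 1: hash routing. All messages with the same key land on one node.
hash_load = Counter(stable_hash(k) % NUM_NODES for k in messages)
hot_node = stable_hash("hot") % NUM_NODES

# Mode 2: round-robin (polling). Load is even, but equal keys are scattered.
rr_nodes = cycle(range(NUM_NODES))
rr_load = Counter()
hot_nodes_rr = set()
for k in messages:
    node = next(rr_nodes)
    rr_load[node] += 1
    if k == "hot":
        hot_nodes_rr.add(node)

print("hash routing, load on the hot node:", hash_load[hot_node])
print("round-robin, nodes that received the hot key:", len(hot_nodes_rr))
```

Under hash routing the node chosen for "hot" receives at least 40 of the 100 messages, while round-robin gives every node exactly 10 messages but spreads the "hot" key over all 10 nodes, so no single node can aggregate it fully.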
To sum up, the prior-art schemes cannot satisfy the aggregation effect and the uniform fragmentation effect at the same time: satisfying the aggregation effect causes hot spots, while satisfying uniform fragmentation yields a poor aggregation effect. Therefore, the embodiment of the present disclosure provides a data processing method in which, for received message data, a candidate aggregation node group corresponding to the target keyword of the message data is obtained by matching according to a preset mapping relationship, a target aggregation node is determined from the candidate aggregation node group, and the message data is forwarded to the target aggregation node, where the target aggregation node is a back-end processing node that is specially allocated to the target keyword and dedicated to aggregating message data containing the target keyword. In this way, the embodiment of the present disclosure can send message data of the same kind, containing the target keyword, to one or more dedicated target aggregation nodes for aggregation processing, so that the aggregation processing result of the message data satisfies both the aggregation effect and the uniform fragmentation effect.
Fig. 1 shows a flow diagram of a data processing method according to an embodiment of the present disclosure. As shown in fig. 1, the data processing method includes the steps of:
in step S101, first message data is received; wherein, the first message data comprises a target keyword;
in step S102, a candidate aggregation node group corresponding to the target keyword is obtained by matching according to a preset mapping relationship; the candidate aggregation node group comprises at least one candidate aggregation node, and different target keywords correspond to different candidate aggregation node groups;
in step S103, determining a target aggregation node of the first message data from the candidate aggregation node group;
in step S104, the first message data is sent to the target aggregation node, so that the target aggregation node performs aggregation processing on the first message data according to the target keyword.
In this embodiment, the message data may be data received by a message receiving system; for example, the message receiving system may be a streaming computing system, such as a Kafka system. A piece of message data may be, for example, a log record in a database or an order record for a certain user on a website.
The message receiving system can have a data forwarding function, and can forward the received data to the back-end processing node for aggregation processing. The back-end processing nodes may be server clusters, device clusters, processor clusters, and the like. The message receiving system receives the message data from the message data generating system and forwards the message data to the back-end processing node, the back-end processing node aggregates the message data capable of being aggregated and then returns the aggregated message data to the message receiving system, and the message receiving system can store the aggregated data in a database or forward the aggregated data to a message subscriber such as a user or other systems.
In this embodiment of the present disclosure, the first message data may be any message data received by the message receiving system at the current stage, and in this embodiment of the present disclosure, aggregation is performed through keywords included in the message data, that is, multiple pieces of message data with the same keyword are aggregated into the same piece of message data. For example, when the message data is a log record, a plurality of log records with the same key word field can be aggregated into the same log record and stored in the database; for another example, when the message data is user data, multiple pieces of message data with the same user ID may be aggregated into the same piece of data, and then forwarded to the user or another system.
The message receiving system may establish in advance a preset mapping relationship between each keyword appearing in the message data and a candidate aggregation node group. Different keywords may correspond to different candidate aggregation node groups, and the candidate aggregation nodes in the two groups corresponding to two different keywords may be entirely different, partially the same, or completely the same; this may be set according to the actual situation and is not limited herein. After receiving the first message data, the message receiving system may parse the first message data to obtain the keyword it contains, that is, the target keyword, and then determine, according to the preset mapping relationship, the candidate aggregation node group corresponding to the target keyword. The candidate aggregation node group may include one or more candidate aggregation nodes, one of which may then be selected as the target aggregation node.
For received message data, the message receiving system may forward the message data containing the same keyword to the one or more aggregation nodes specifically allocated to that keyword for processing, that is, to one of the candidate aggregation nodes in the group that has a preset mapping relationship with the keyword. The message receiving system may allocate different candidate aggregation node groups to the keywords from all selectable aggregation nodes according to a certain allocation rule, for example, uniform allocation. An aggregation node is a back-end processing node as mentioned above and may include, but is not limited to, a server cluster, a device cluster, a processor cluster, and the like.
After receiving the first message data, the message receiving system parses the message data to obtain the target keyword it contains, obtains the candidate aggregation node group corresponding to the target keyword by matching according to the preset mapping relationship, selects one candidate aggregation node from the group as the target aggregation node, and then sends the first message data to the target aggregation node for aggregation processing. In this way, the embodiment of the present disclosure can send message data of the same kind, containing the target keyword, to one or more dedicated target aggregation nodes for aggregation processing, so that the aggregation processing result of the message data satisfies both the aggregation effect and the uniform fragmentation effect.
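Steps S101 to S103 can be sketched as a small router. This is a minimal illustration under stated assumptions, not the disclosed implementation: the class name `KeywordRouter`, the message shape (a dict with a `"keyword"` field), the node names, and the use of round-robin to pick a node inside the group are all hypothetical. Sending the message to the returned node (step S104) is left to the caller.

```python
from itertools import cycle
from typing import Dict, Iterator, List


class KeywordRouter:
    """Routes a message to one node of the group mapped to its keyword."""

    def __init__(self, mapping: Dict[str, List[str]]) -> None:
        # mapping is the preset mapping relationship: keyword -> node group.
        for keyword, group in mapping.items():
            if not group:
                raise ValueError(f"group for {keyword!r} must not be empty")
        # One round-robin cursor per keyword spreads load evenly in the group.
        self._cursors: Dict[str, Iterator[str]] = {
            keyword: cycle(list(group)) for keyword, group in mapping.items()
        }

    def route(self, message: dict) -> str:
        keyword = message["keyword"]  # S101: extract the target keyword
        if keyword not in self._cursors:  # S102: look up the node group
            raise KeyError(f"no candidate aggregation node group for {keyword!r}")
        return next(self._cursors[keyword])  # S103: pick the target node


router = KeywordRouter({"order": ["node-1", "node-2"], "log": ["node-3"]})
order_targets = [router.route({"keyword": "order"}) for _ in range(4)]
log_target = router.route({"keyword": "log"})
```

Messages with the keyword "order" alternate between node-1 and node-2 only, so they stay within a small dedicated group while the load inside that group remains even.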
In an optional implementation manner of this embodiment, the number of the candidate aggregation nodes in the candidate aggregation node group is related to the occurrence probability of the target keyword; the probability of occurrence of the target keyword is a proportion of the number of second message data containing the target keyword in second message data received within a preset time period to the total number of the second message data.
In this alternative implementation, the message receiving system may allocate the candidate aggregation node group according to the occurrence probability of the target keyword. The message receiving system may periodically compute the occurrence probability of every keyword occurring in the received message data. The occurrence probability of a keyword is the ratio of the number of message data containing that keyword, among all the message data received in the statistical period (that is, the second message data), to the total number of the second message data. The preset time period may be the previous statistical period, and the occurrence probability of the target keyword of the currently received first message data may be the value obtained from the statistics of the previous statistical period. For example, if the message receiving system computes statistics every 10 s, the occurrence probability of each keyword is the ratio of the number of second message data containing that keyword, among the second message data received in the previous 10 s, to the total number of second message data received in that 10 s. In this way, the embodiment of the present disclosure can send the same kind of message data containing the same keyword to one or more special aggregation nodes for aggregation processing, so that the aggregation processing result of the message data can satisfy both the aggregation effect and the uniform fragmentation effect.
In an optional implementation manner of this embodiment, the method further includes the following steps:
and allocating the candidate aggregation node group to the target keyword according to the occurrence probability of the target keyword.
In this optional implementation manner, as described above, according to the occurrence probability of each keyword counted in the previous statistical period, a candidate aggregation node may be assigned to each keyword, and a preset mapping relationship may be established between each keyword and a candidate aggregation node group formed by the candidate aggregation nodes assigned to the keyword. Therefore, the candidate aggregation nodes corresponding to the target keywords in the first message data are also allocated in advance according to the occurrence probability of the target keywords. In this way, the embodiment of the present disclosure can send the same kind of message data containing the same keyword to one or more special aggregation nodes for aggregation processing, so that the aggregation processing result of the message data can satisfy both the aggregation effect and the uniform fragmentation effect.
In an optional implementation manner of this embodiment, as shown in fig. 2, the step of allocating the candidate aggregation node group to the target keyword according to the occurrence probability of the target keyword further includes the following steps:
in step S201, counting keywords included in the second message data received within a preset time period to obtain a keyword set;
in step S202, for the keywords in the keyword set, determining a first amount of the second message data containing the same keyword;
in step S203, determining the probability of occurrence of the keyword according to the first quantity and the total quantity of the second message data received in the preset time period;
in step S204, a candidate aggregation node group of the keyword is determined according to the occurrence probability.
In this optional implementation manner, when the message receiving system determines the candidate aggregation nodes allocated to the keywords in advance, all the keywords included in the second message data received in the period may be periodically counted to obtain a keyword set including all the keywords. Then, for each keyword in the keyword set, a first quantity of all second message data including the keyword is determined, and the occurrence probability of the keyword is determined according to the ratio of the first quantity and the total quantity of the second message data. After the occurrence probabilities of all keywords in the keyword set are determined, for all available candidate aggregation nodes currently used for aggregating message data, one or more candidate aggregation nodes are selected from the available candidate aggregation nodes according to the occurrence probabilities of the keywords as candidate aggregation nodes allocated to the keywords, and the candidate aggregation nodes are specially used for aggregating message data containing the keywords received in the next period. It can be understood that the occurrence probability of the keyword is proportional to the number and/or the computing power of the candidate aggregation nodes allocated to the keyword, that is, if the occurrence probability of the keyword is large, the number or the computing power of the candidate aggregation nodes allocated to the keyword is large, and if the occurrence probability of the keyword is small, the number or the computing power of the candidate aggregation nodes allocated to the keyword is small. In this way, the embodiment of the present disclosure can send the same kind of message data containing the same keyword to one or more special aggregation nodes for aggregation processing, so that the aggregation processing result of the message data can satisfy both the aggregation effect and the uniform fragmentation effect.
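Steps S201 to S203 amount to a frequency count over one statistical window. The sketch below assumes each piece of second message data is a (keyword, value) pair carrying exactly one keyword; the function name `occurrence_probabilities` is illustrative only. Exact fractions are used so that later multiplications by a node count do not pick up floating-point error.

```python
from collections import Counter
from fractions import Fraction


def occurrence_probabilities(messages):
    """Return keyword -> occurrence probability for one statistical window."""
    # S201/S202: collect the keyword set and the first quantity per keyword.
    counts = Counter(keyword for keyword, _ in messages)
    total = len(messages)
    if total == 0:
        return {}
    # S203: first quantity divided by the total quantity of second message data.
    return {keyword: Fraction(n, total) for keyword, n in counts.items()}
```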
In an optional implementation manner of this embodiment, as shown in fig. 3, the step S204, that is, the step of determining the candidate aggregation node group of the keyword according to the occurrence probability, further includes the following steps:
in step S301, determining a target computation capacity of the candidate aggregation nodes that can be assigned to the keyword according to the occurrence probability;
in step S302, the candidate aggregation node group is determined from available candidate aggregation nodes having a remaining computing capacity not allocated to any of the keywords according to the target computing capacity.
In this alternative implementation, the candidate aggregation nodes may be assigned to the keywords in units of the computing capacity of the candidate aggregation nodes. First, the target computing capacity that can be allocated to a keyword is determined according to the occurrence probability of the keyword; for example, if the total computing capacity of the available candidate aggregation nodes for message data aggregation is N and the occurrence probability of a certain keyword is p%, the target computing capacity that can be allocated to the keyword is N × p%. After the target computing capacity is determined, one or more available candidate aggregation nodes may be selected, from the available candidate aggregation nodes that still have remaining computing capacity, as the candidate aggregation nodes assigned to the keyword, such that the total remaining computing capacity of the selected nodes is not less than the target computing capacity. In this way, the embodiment of the present disclosure can send the same kind of message data containing the same keyword to one or more special aggregation nodes for aggregation processing, so that the aggregation processing result of the message data can satisfy both the aggregation effect and the uniform fragmentation effect.
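As a rough sketch of steps S301 and S302, the code below computes a target computing capacity and then picks nodes that still have remaining capacity until the target is covered. The function names and the greedy, in-order selection are assumptions made for illustration; the text only requires that the selected nodes together cover the target.

```python
def target_capacity(probability, total_capacity):
    """S301: occurrence probability multiplied by the total computing capacity."""
    return probability * total_capacity


def select_nodes(target, remaining):
    """S302 (greedy assumption): take nodes with remaining capacity, in order,
    until their combined remaining capacity covers the target."""
    chosen, covered = [], 0
    for node, capacity in remaining.items():
        if covered >= target:
            break
        if capacity > 0:          # skip nodes with nothing left to give
            chosen.append(node)
            covered += capacity
    if covered < target:
        raise ValueError("not enough remaining computing capacity")
    return chosen
```

For instance, with four nodes of unit capacity of which one is already fully used, a keyword with occurrence probability 0.5 has a target of 2 capacity units and receives the first two nodes that still have capacity.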
In an optional implementation manner of this embodiment, the step S301, namely, the step of determining the target computing capacity of the candidate aggregation node that can be allocated to the keyword according to the occurrence probability, further includes the following steps:
when it is determined, according to the occurrence probabilities of the keywords in the keyword set, that the total number of the available candidate aggregation nodes is sufficient for allocating the candidate aggregation nodes to the keywords in units of whole available candidate aggregation nodes, determining a second number of the candidate aggregation nodes that can be allocated to the keyword as the occurrence probability of the keyword multiplied by the total number;
the step S302 of determining the candidate aggregation node group from the available candidate aggregation nodes having a remaining computing capacity not allocated to any of the keywords according to the target computing capacity further comprises the steps of:
selecting a second number of unassigned available candidate aggregation nodes to join the group of candidate aggregation nodes for the keyword.
In this alternative implementation, when the number of available candidate aggregation nodes for aggregating message data is sufficiently large and the number of keywords in the keyword set is small, so that available candidate aggregation nodes can be allocated to keywords in units of whole nodes, the second number of candidate aggregation nodes that can be allocated to a keyword may be determined as the occurrence probability of the keyword multiplied by the total number of available candidate aggregation nodes, and that many available candidate aggregation nodes not yet allocated to any keyword may be selected as the candidate aggregation nodes allocated to the keyword. For example, if the number of available candidate aggregation nodes for message data aggregation is M and the occurrence probability of a certain keyword is p%, the second number of candidate aggregation nodes that can be allocated to the keyword is M × p%, so M × p% unallocated candidate aggregation nodes may be selected as the candidate aggregation nodes allocated to the keyword; if M × p% is not an integer, it may be rounded up. In this way, the embodiment of the present disclosure can send the same kind of message data containing the same keyword to one or more special aggregation nodes for aggregation processing, so that the aggregation processing result of the message data can satisfy both the aggregation effect and the uniform fragmentation effect.
In some embodiments, a reference ratio may be preset based on experience or data analysis. When the actual ratio of the number of keywords in the keyword set to the number of available candidate aggregation nodes is smaller than the reference ratio, the total number of available candidate aggregation nodes may be considered sufficient for allocating candidate aggregation nodes to the keywords in units of whole available candidate aggregation nodes.
In other embodiments, the determination may instead be based on whether the occurrence probability of each keyword in the keyword set multiplied by the total number of available candidate aggregation nodes is an integer. When this product is an integer for every keyword in the keyword set, the total number of available candidate aggregation nodes may be considered sufficient for allocating candidate aggregation nodes to the keywords in units of whole available candidate aggregation nodes.
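The whole-node case can be illustrated with a short sketch. The helper names `second_number` and `assign_whole_nodes` are hypothetical, and handing out nodes to keywords in descending order of probability is an assumption. Note that rounding up can make the requested counts add up to more than the available nodes, in which case the later keywords in this sketch simply receive fewer nodes.

```python
import math
from fractions import Fraction


def second_number(probability, total_nodes):
    """Occurrence probability times node count, rounded up when not an integer."""
    return math.ceil(probability * total_nodes)


def assign_whole_nodes(probabilities, nodes):
    """Give each keyword its second number of not-yet-allocated nodes."""
    free = list(nodes)
    groups = {}
    ordered = sorted(probabilities.items(), key=lambda kv: kv[1], reverse=True)
    for keyword, probability in ordered:
        n = second_number(probability, len(nodes))
        groups[keyword], free = free[:n], free[n:]
    return groups
```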
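The two alternative conditions for whole-node allocation can be written as simple predicates. Both function names are illustrative, and the occurrence probabilities are assumed to be exact `Fraction` values so that the integer test is reliable.

```python
from fractions import Fraction


def meets_reference_ratio(num_keywords, num_nodes, reference_ratio):
    """First condition: keywords-to-nodes ratio is below a preset reference ratio."""
    return Fraction(num_keywords, num_nodes) < reference_ratio


def all_products_integral(probabilities, num_nodes):
    """Second condition: probability * node count is an integer for every keyword."""
    return all((p * num_nodes).denominator == 1 for p in probabilities.values())
```

With 10 nodes, probabilities of 30% and 70% pass the second test, whereas a keyword at 5% fails it because 10 × 5% = 0.5 is not a whole number of nodes.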
In an optional implementation manner of this embodiment, the step S301 of determining, according to the occurrence probability, a target computing capacity of the candidate aggregation node that can be allocated to the keyword further includes the following steps:
determining a target computing power of the candidate aggregation nodes that can be assigned to the keyword according to the probability of occurrence and the total computing power of the available candidate aggregation nodes.
In this alternative implementation, the available candidate aggregation nodes may be assigned to each keyword in the keyword set in units of computing capacity. Thus, the target computing capacity for a keyword may be obtained by multiplying the occurrence probability of the keyword by the total computing capacity of all available candidate aggregation nodes for aggregating message data. In this way, when the number of aggregation nodes is small, the available candidate aggregation nodes can be allocated to each keyword at a finer granularity.
In an optional implementation manner of this embodiment, the step S302 of determining the candidate aggregation node group from the available candidate aggregation nodes having the remaining computing capacity not allocated to any of the keywords according to the target computing capacity further includes the following steps:
determining the grouping of candidate aggregation nodes in a manner that preferentially assigns the remaining computing power of the same available candidate aggregation node to the same keyword.
In this alternative implementation, the candidate aggregation nodes allocated to a keyword may be determined, from the available candidate aggregation nodes that still have remaining computing capacity, in such a way that the remaining computing capacity of one available candidate aggregation node is preferentially allocated to one keyword. That is, the candidate aggregation node for the keyword may be preferentially selected from the available candidate aggregation nodes whose remaining computing capacity is not less than the target computing capacity, while also taking into account that each of the other keywords in the keyword set should likewise be allocated to a single available candidate aggregation node as far as possible.
In an implementation, for a keyword whose target computing capacity is greater than the total computing capacity of one available candidate aggregation node, the number of complete available candidate aggregation nodes to be allocated to the keyword is determined by dividing the target computing capacity by the total computing capacity of one node and rounding down, and that number of available candidate aggregation nodes is selected and allocated to the keyword. The part of the target computing capacity that remains unallocated is then placed into a waiting queue, together with the unallocated remainders of other keywords and the target computing capacities that are smaller than the total computing capacity of one available candidate aggregation node. The computing capacities in the waiting queue are then matched against the unallocated available candidate aggregation nodes. The matching principle is to preferentially occupy all the computing capacity of the same available candidate aggregation node and/or to allocate each computing capacity in the waiting queue to a single available candidate aggregation node as far as possible. The candidate aggregation nodes allocated to each keyword are finally obtained from the available candidate aggregation nodes in this way.
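One possible reading of the waiting-queue procedure is sketched below. Several points are assumptions rather than part of the disclosure: all nodes are taken to have the same capacity `node_capacity`, capacities are exact `Fraction` values, and the queue is matched with a first-fit rule over the entries sorted by size, which tends to fill one node completely before opening the next and never splits a queue entry across nodes.

```python
from fractions import Fraction


def allocate(targets, nodes, node_capacity=Fraction(1)):
    """targets maps keyword -> target computing capacity; returns keyword -> nodes."""
    free = list(nodes)
    groups = {}
    waiting = []  # (keyword, capacity still to be placed)
    for keyword, target in targets.items():
        whole = int(target // node_capacity)       # complete nodes, rounded down
        if whole > len(free):
            raise ValueError("not enough unallocated nodes")
        groups[keyword], free = free[:whole], free[whole:]
        leftover = target - whole * node_capacity
        if leftover > 0:
            waiting.append((keyword, leftover))    # remainder or sub-node target
    remaining = {node: node_capacity for node in free}
    # First fit, largest entries first: earlier nodes are filled up before later ones.
    for keyword, need in sorted(waiting, key=lambda kv: kv[1], reverse=True):
        for node, capacity in remaining.items():
            if capacity >= need:
                remaining[node] = capacity - need
                groups[keyword].append(node)
                break
        else:
            raise ValueError("no single node can hold the queued capacity")
    return groups
```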
The process of assigning candidate aggregation nodes for a key in the embodiments of the present disclosure is illustrated below by way of an illustrative example.
For example, the message receiving system receives the following 20 key value pairs of message data in the last statistical period, and these message data need to be forwarded to the 10 back-end compute service nodes for aggregation processing:
{"a",1}{"a",1}{"a",1}{"a",1}{"a",1}{"a",1}
{"b",1}{"b",1}{"b",1}{"b",1}
{"c",1}{"c",1}
{"d",1}{"e",1}{"f",1}{"g",1}{"h",1}{"i",1}{"j",1}{"k",1}
Here, 11 keywords appear in total, forming the keyword set {a, b, c, d, e, f, g, h, i, j, k}; 6 message data contain a, 4 contain b, 2 contain c, and 1 message data contains each of d through k.
Statistics give the following occurrence probabilities: a: 6/20 = 30%; b: 4/20 = 20%; c: 2/20 = 10%; each of d through k: 1/20 = 5%.
Then, according to the occurrence probability, the following algorithm can be designed to allocate the back-end processing nodes to different keywords:
Assume the 10 back-end nodes are N1, N2, ..., N10. The number of candidate back-end nodes allocated to a is 30% × 10 = 3, namely N1, N2, N3. The number of candidate back-end nodes allocated to b is 20% × 10 = 2; since N1, N2, N3 are already allocated to a, N4 and N5 can be allocated to b. Similarly, the candidate back-end node allocated to c is N6. For d through k, since each keyword accounts for only 5%, two keywords may share one node: the candidate back-end node allocated to d and e is N7, that allocated to f and g is N8, that allocated to h and i is N9, and that allocated to j and k is N10.
This allocation scheme avoids the hot spot problem that arises when all messages with the same keyword are forwarded to one node, while message data with the same keyword is still forwarded to the same node as far as possible, which keeps the aggregation effect as good as possible.
In an optional implementation manner of this embodiment, the step S103 of determining the target aggregation node of the first message data from the candidate aggregation node group further includes the following steps:
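The worked example can be reproduced with a short script. The approach used here, laying the keywords' node shares end to end along the node list and giving each keyword the nodes its interval touches, is one assumed way of arriving at the same assignment, not necessarily the algorithm intended by the disclosure; `MESSAGES`, `NODES` and `allocate_by_share` are illustrative names. A keyword whose interval straddles a node boundary would be given both neighbouring nodes under this scheme.

```python
import math
from collections import Counter
from fractions import Fraction

# The 20 key-value pairs of the example and the 10 back-end nodes.
MESSAGES = ([("a", 1)] * 6 + [("b", 1)] * 4 + [("c", 1)] * 2
            + [(k, 1) for k in "defghijk"])
NODES = ["N%d" % i for i in range(1, 11)]


def allocate_by_share(messages, nodes):
    """Map each keyword to the nodes covered by its share of the node list."""
    counts = Counter(keyword for keyword, _ in messages)  # order of first appearance
    total = len(messages)
    groups, start = {}, Fraction(0)
    for keyword, n in counts.items():
        end = start + Fraction(n, total) * len(nodes)     # probability * node count
        groups[keyword] = nodes[math.floor(start):math.ceil(end)]
        start = end
    return groups
```

Running `allocate_by_share(MESSAGES, NODES)` assigns N1 to N3 to a, N4 and N5 to b, N6 to c, and pairs d through k on N7 to N10, matching the allocation described above.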
determining the target aggregation node in a manner of uniformly distributing message data including the target keyword to the candidate aggregation nodes in the candidate aggregation node group corresponding to the target keyword.
In this optional implementation manner, when the candidate aggregation node group corresponding to the target keyword includes a plurality of candidate aggregation nodes, in some embodiments one of them may be selected as the target aggregation node in a manner that keeps fragmentation uniform. That is, the first message data may be distributed evenly, by polling, among the candidate aggregation nodes of the target keyword, so that the number of message data currently processed by each candidate aggregation node in the group is as balanced as possible.
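Polling within a group is commonly implemented as round-robin selection. The sketch below keeps one independent rotation per keyword; the class name `RoundRobinSelector` is hypothetical, and every group is assumed to be non-empty and fixed for the duration of a statistical period.

```python
from itertools import cycle


class RoundRobinSelector:
    """Hands out the nodes of each keyword's group in turn."""

    def __init__(self, mapping):
        # One endless iterator per keyword, so keywords rotate independently.
        self._cycles = {keyword: cycle(group) for keyword, group in mapping.items()}

    def select(self, keyword):
        """Return the next candidate aggregation node for this keyword."""
        return next(self._cycles[keyword])
```

With the group N1, N2, N3 for keyword a, four consecutive messages containing a go to N1, N2, N3 and then N1 again.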
The following are embodiments of the disclosed apparatus that may be used to perform embodiments of the disclosed methods.
Fig. 4 shows a block diagram of a data processing apparatus according to an embodiment of the present disclosure, which may be implemented as part or all of an electronic device by software, hardware, or a combination of both. As shown in fig. 4, the data processing apparatus includes:
a receiving module 401 configured to receive first message data; wherein, the first message data comprises a target keyword;
a matching module 402, configured to obtain a candidate aggregation node group corresponding to the target keyword according to a preset mapping relationship; the candidate aggregation node groups comprise at least one candidate aggregation node, and different target keywords correspond to different candidate aggregation node groups;
a determining module 403 configured to determine a target aggregation node of the first message data from the candidate aggregation node group;
a sending module 404, configured to send the first message data to the target aggregation node, so that the target aggregation node performs aggregation processing on the first message data according to the target keyword.
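For orientation, the four modules can be pictured as methods of one object. This is only a structural sketch under stated assumptions: the class name, the (keyword, value) message shape, the `send` callback standing in for the network transport, and the per-keyword rotation used by the determining step are all hypothetical.

```python
class DataProcessingApparatus:
    """Toy counterpart of modules 401 to 404."""

    def __init__(self, mapping, send):
        self.mapping = mapping   # preset mapping: keyword -> candidate node group
        self.send = send         # hypothetical transport callback: send(node, message)
        self._next = {}          # per-keyword rotation position

    def receive(self, message):
        """Receiving module 401: accept first message data and drive the flow."""
        keyword = message[0]
        group = self.match(keyword)
        node = self.determine(keyword, group)
        self.send(node, message)  # sending module 404
        return node

    def match(self, keyword):
        """Matching module 402: look up the candidate aggregation node group."""
        return self.mapping[keyword]

    def determine(self, keyword, group):
        """Determining module 403: pick the target node by simple rotation."""
        index = self._next.get(keyword, 0)
        self._next[keyword] = (index + 1) % len(group)
        return group[index]
```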
In this embodiment, the message data may be data received by a message receiving system; for example, the message receiving system may be a streaming computing system, such as a Kafka system. A piece of message data may be, for example, a log record in a database or a piece of order data for a user of a website.
The message receiving system can have a data forwarding function, and can forward the received data to the back-end processing node for aggregation processing. The back-end processing nodes may be server clusters, device clusters, processor clusters, and the like. The message receiving system receives the message data from the message data generating system and forwards the message data to the back-end processing node, the back-end processing node aggregates the message data capable of being aggregated and then returns the aggregated message data to the message receiving system, and the message receiving system can store the aggregated data in a database or forward the aggregated data to a message subscriber such as a user or other systems.
In this embodiment of the present disclosure, the first message data may be any message data received by the message receiving system at the current stage. Aggregation is performed on the keywords included in the message data, that is, multiple pieces of message data with the same keyword are aggregated into one piece of message data. For example, when the message data is a log record, a plurality of log records with the same keyword field can be aggregated into one log record and stored in the database; for another example, when the message data is user data, multiple pieces of message data with the same user ID may be aggregated into one piece of data and then forwarded to the user or another system.
The message receiving system may establish in advance a preset mapping relationship between each keyword appearing in the message data and a candidate aggregation node group. Different keywords may correspond to different candidate aggregation node groups, and the candidate aggregation nodes in the two groups corresponding to two different keywords may be entirely different, partially the same, or completely the same; this may be set according to the actual situation and is not limited herein. After receiving the first message data, the message receiving system may parse it to obtain the keyword it contains, that is, the target keyword, and then determine the candidate aggregation node group corresponding to the target keyword according to the preset mapping relationship. The candidate aggregation node group may include one or more candidate aggregation nodes, one of which may then be selected as the target aggregation node.
For the received message data, the message receiving system may forward message data containing the same keyword to the one or more aggregation nodes specifically allocated to that keyword, that is, to one of the candidate aggregation nodes in the candidate aggregation node group that has the preset mapping relationship with the keyword. The message receiving system may allocate candidate aggregation node groups to all keywords from all selectable aggregation nodes according to a certain allocation rule, for example, uniform allocation. The aggregation node is the back-end processing node mentioned above and may include, but is not limited to, a server cluster, a device cluster, a processor cluster, and the like.
After receiving the first message data, the message receiving system parses it to obtain the target keyword it contains, obtains the candidate aggregation node group corresponding to the target keyword by matching against the preset mapping relationship, selects one candidate aggregation node from that group as the target aggregation node, and then sends the first message data to the target aggregation node for aggregation processing. In this way, the embodiment of the present disclosure can send the same kind of message data containing the target keyword to one or more special target aggregation nodes for aggregation processing, so that the aggregation processing result of the message data can satisfy both the aggregation effect and the uniform fragmentation effect.
In an optional implementation manner of this embodiment, the number of the candidate aggregation nodes in the candidate aggregation node group is related to the occurrence probability of the target keyword; the probability of occurrence of the target keyword is a proportion of the number of second message data containing the target keyword in second message data received within a preset time period to the total number of the second message data.
In this alternative implementation, the message receiving system may allocate the candidate aggregation node group according to the occurrence probability of the target keyword. The message receiving system may periodically compute the occurrence probability of every keyword occurring in the received message data. The occurrence probability of a keyword is the ratio of the number of message data containing that keyword, among all the message data received in the statistical period (that is, the second message data), to the total number of the second message data. The preset time period may be the previous statistical period, and the occurrence probability of the target keyword of the currently received first message data may be the value obtained from the statistics of the previous statistical period. For example, if the message receiving system computes statistics every 10 s, the occurrence probability of each keyword is the ratio of the number of second message data containing that keyword, among the second message data received in the previous 10 s, to the total number of second message data received in that 10 s. In this way, the embodiment of the present disclosure can send the same kind of message data containing the same keyword to one or more special aggregation nodes for aggregation processing, so that the aggregation processing result of the message data can satisfy both the aggregation effect and the uniform fragmentation effect.
In an optional implementation manner of this embodiment, the apparatus further includes:
an assignment module configured to assign the candidate aggregation node group to the target keyword according to the probability of occurrence of the target keyword.
In this optional implementation manner, as described above, according to the occurrence probability of each keyword counted in the previous statistical period, a candidate aggregation node may be assigned to each keyword, and a preset mapping relationship may be established between each keyword and a candidate aggregation node group formed by the candidate aggregation nodes assigned to the keyword. Therefore, the candidate aggregation nodes corresponding to the target keywords in the first message data are also allocated in advance according to the occurrence probability of the target keywords. In this way, the embodiment of the present disclosure can send the same kind of message data containing the same keyword to one or more special aggregation nodes for aggregation processing, so that the aggregation processing result of the message data can satisfy both the aggregation effect and the uniform fragmentation effect.
In an optional implementation manner of this embodiment, as shown in fig. 5, the allocating module includes:
a statistics submodule 501 configured to count keywords included in the second message data received within a preset time period, so as to obtain a keyword set;
a first determining submodule 502 configured to determine, for the keywords in the keyword set, a first amount of the second message data containing the same keyword;
a second determining submodule 503 configured to determine the occurrence probability of the keyword according to the first quantity and the total quantity of the second message data received in the preset time period;
a third determining submodule 504 configured to determine a candidate aggregation node group of the keyword according to the occurrence probability.
In this optional implementation manner, when the message receiving system determines the candidate aggregation nodes allocated to the keywords in advance, all the keywords included in the second message data received in the period may be periodically counted to obtain a keyword set including all the keywords. Then, for each keyword in the keyword set, a first quantity of all second message data including the keyword is determined, and the occurrence probability of the keyword is determined according to the ratio of the first quantity and the total quantity of the second message data. After the occurrence probabilities of all keywords in the keyword set are determined, for all available candidate aggregation nodes currently used for aggregating message data, one or more candidate aggregation nodes are selected from the available candidate aggregation nodes according to the occurrence probabilities of the keywords as candidate aggregation nodes allocated to the keywords, and the candidate aggregation nodes are specially used for aggregating message data containing the keywords received in the next period. It can be understood that the occurrence probability of the keyword is proportional to the number and/or the computing power of the candidate aggregation nodes allocated to the keyword, that is, if the occurrence probability of the keyword is large, the number or the computing power of the candidate aggregation nodes allocated to the keyword is large, and if the occurrence probability of the keyword is small, the number or the computing power of the candidate aggregation nodes allocated to the keyword is small. In this way, the embodiment of the present disclosure can send the same kind of message data containing the same keyword to one or more special aggregation nodes for aggregation processing, so that the aggregation processing result of the message data can satisfy both the aggregation effect and the uniform fragmentation effect.
In an optional implementation manner of this embodiment, as shown in fig. 6, the third determining sub-module 504 includes:
a fourth determining sub-module 601 configured to determine a target computing power of the candidate aggregation nodes that can be assigned to the keyword according to the occurrence probability;
a fifth determining sub-module 602 configured to determine the group of candidate aggregation nodes from available candidate aggregation nodes having a remaining computational power not allocated to any of the keywords in accordance with the target computational power.
In this alternative implementation, the candidate aggregation nodes may be assigned to the keywords in units of the computing capacity of the candidate aggregation nodes. First, the target computing capacity that can be allocated to a keyword is determined according to the occurrence probability of the keyword; for example, if the total computing capacity of the available candidate aggregation nodes for message data aggregation is N and the occurrence probability of a certain keyword is p%, the target computing capacity that can be allocated to the keyword is N × p%. After the target computing capacity is determined, one or more available candidate aggregation nodes may be selected, from the available candidate aggregation nodes that still have remaining computing capacity, as the candidate aggregation nodes assigned to the keyword, such that the total remaining computing capacity of the selected nodes is not less than the target computing capacity. In this way, the embodiment of the present disclosure can send the same kind of message data containing the same keyword to one or more special aggregation nodes for aggregation processing, so that the aggregation processing result of the message data can satisfy both the aggregation effect and the uniform fragmentation effect.
In an optional implementation manner of this embodiment, the fourth determining sub-module 601 includes:
a sixth determining sub-module configured to, when it is determined according to the occurrence probabilities of the keywords in the keyword set that the total number of the available candidate aggregation nodes is sufficient for the candidate aggregation nodes to be allocated to the keywords in units of whole available candidate aggregation nodes, determine a second number of the candidate aggregation nodes that can be allocated to the keyword according to the occurrence probability of the keyword multiplied by the total number;
the fifth determining sub-module 602 includes:
a selecting sub-module configured to select a second number of the unassigned available candidate aggregation nodes to join the group of candidate aggregation nodes of the keyword.
In this alternative implementation, when the number of available candidate aggregation nodes for aggregating message data is sufficiently large and the number of keywords in the keyword set is small, so that available candidate aggregation nodes can be allocated to the keywords in units of whole nodes, a second number of candidate aggregation nodes that can be allocated to a keyword may be determined according to the occurrence probability of the keyword multiplied by the total number of available candidate aggregation nodes, and available candidate aggregation nodes that have not yet been allocated to any keyword may be selected as the candidate aggregation nodes allocated to the keyword. For example, if the number of available candidate aggregation nodes for message data aggregation is M and the occurrence probability of a certain keyword is p%, the second number of candidate aggregation nodes that can be allocated to the keyword is M × p%; M × p% of the unallocated candidate aggregation nodes can then be selected as the candidate aggregation nodes allocated to the keyword, and if M × p% is not an integer, it can be rounded up. In this way, the embodiment of the present disclosure can send the same kind of message data containing the same keyword to one or more special aggregation nodes for aggregation processing, so that the aggregation processing result of the message data can satisfy both the aggregation effect and the uniform fragmentation effect.
In some embodiments, a reference ratio may be preset based on experience or data analysis, and when the actual ratio between the number of keywords in the keyword set and the number of available candidate aggregation nodes is smaller than the reference ratio, the total number of available candidate aggregation nodes may be considered to satisfy the condition for allocating candidate aggregation nodes to the keywords in units of whole available candidate aggregation nodes.
In other embodiments, the determination may instead be made according to whether the value obtained by multiplying the occurrence probability of each keyword in the keyword set by the total number of the available candidate aggregation nodes is an integer. If this value is an integer for every keyword in the keyword set, the total number of the available candidate aggregation nodes may be considered to satisfy the condition for allocating candidate aggregation nodes to the keywords in units of whole available candidate aggregation nodes.
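The whole-node variant and its two eligibility checks can be sketched as follows. All function names and the sample probabilities are assumptions made for illustration, and the reference ratio is simply passed in as a parameter because the text leaves its value to experience or data analysis.

```python
import math
from fractions import Fraction


def whole_node_counts(probabilities, total_nodes):
    """Second number of nodes per keyword: probability x total, rounded up."""
    return {kw: math.ceil(p * total_nodes) for kw, p in probabilities.items()}


def ratio_allows_whole_nodes(num_keywords, total_nodes, reference_ratio):
    """First check: keywords-to-nodes ratio is below a preset reference ratio."""
    return Fraction(num_keywords, total_nodes) < reference_ratio


def products_are_integers(probabilities, total_nodes):
    """Second check: probability x total is an integer for every keyword."""
    return all((p * total_nodes).denominator == 1 for p in probabilities.values())


# Hypothetical probabilities for three keywords.
probabilities = {"a": Fraction(3, 10), "b": Fraction(1, 5), "c": Fraction(1, 2)}
print(whole_node_counts(probabilities, 10))
print(products_are_integers(probabilities, 10))
print(products_are_integers(probabilities, 5))
```

Note that rounding up can request more nodes in total than are available (with 5 nodes the counts above sum to 6), which is one reason a check such as `products_are_integers` is useful before choosing whole-node allocation.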
In an optional implementation manner of this embodiment, the fourth determining sub-module 601 includes:
a seventh determining sub-module configured to determine a target computation power of the candidate aggregation nodes that can be assigned to the keyword according to the occurrence probability and the total computation power of the available candidate aggregation nodes.
In this alternative implementation, the available candidate aggregation nodes may be assigned to each keyword in the keyword set in units of assignment of computational power. Thus, the target computing power for a key may be obtained by multiplying the probability of occurrence of the key by the total computing power of all available candidate aggregation nodes for aggregating message data. In this way, for the case that the number of aggregation nodes is small, each keyword can be more finely allocated with a corresponding available candidate aggregation node.
In an optional implementation manner of this embodiment, the fifth determining sub-module 602 includes:
an eighth determining sub-module configured to determine the candidate aggregation node group in such a manner that the remaining computing power of the same available candidate aggregation node is preferentially allocated to the same keyword.
In this alternative implementation, the candidate aggregation nodes allocated to a keyword may be determined, from the available candidate aggregation nodes that have remaining computing power, in such a manner that the remaining computing power of one available candidate aggregation node is preferentially allocated to one keyword. That is, the candidate aggregation node for a keyword may be preferentially selected from the available candidate aggregation nodes whose remaining computing capacity is greater than the target computing capacity, while also taking care that each of the other keywords in the keyword set is likewise allocated to a single available candidate aggregation node as far as possible.
In the implementation process, for a keyword whose target computing capacity is greater than the total computing capacity of one available candidate aggregation node, the number of complete available candidate aggregation nodes to be allocated to the keyword is determined by dividing the target computing capacity by the total computing capacity of one node and rounding down, and a corresponding number of available candidate aggregation nodes are selected and allocated to the keyword. The part of the target computing capacity that remains unallocated is then placed into a waiting queue, together with the unallocated remainders of other keywords and the target computing capacities that are smaller than the total computing capacity of one available candidate aggregation node. The computing capacities in the waiting queue are then matched against the unallocated available candidate aggregation nodes. In the matching process, the matching principle may be to preferentially occupy the entire computing capacity of one available candidate aggregation node and/or to allocate one computing capacity entry in the waiting queue to a single available candidate aggregation node as far as possible. The candidate aggregation nodes allocated to each keyword are finally obtained from the available candidate aggregation nodes in this way.
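One way to realize this preference is sketched below. It is an assumption-laden illustration: the "best fit" tie-breaking rule (choose the sufficient node with the least remaining capacity) and the largest-first fallback are choices made for the example and are not stated in the source, and the sketch accepts a node whose remaining capacity equals the target.

```python
def pick_nodes(remaining, target):
    """Choose nodes for one keyword, preferring a single node that can hold it.

    `remaining` maps node name -> remaining capacity and is updated in place.
    Returns {node: capacity taken}.
    """
    if sum(remaining.values()) < target:
        raise ValueError("not enough remaining computing capacity")

    # Preferred case: one node alone can cover the whole target.
    fitting = [node for node, cap in remaining.items() if cap >= target]
    if fitting:
        # Assumed tie-break: smallest sufficient node, keeping big gaps free.
        best = min(fitting, key=lambda node: remaining[node])
        remaining[best] -= target
        return {best: target}

    # Fallback: spread over as few nodes as possible, largest remaining first.
    grant, need = {}, target
    for node in sorted(remaining, key=lambda n: remaining[n], reverse=True):
        if need <= 0:
            break
        take = min(remaining[node], need)
        if take > 0:
            grant[node] = take
            remaining[node] -= take
            need -= take
    return grant


# Hypothetical remaining capacities.
remaining = {"N1": 3, "N2": 10, "N3": 5}
first = pick_nodes(remaining, 4)    # fits on a single node
second = pick_nodes(remaining, 12)  # must be spread over several nodes
print(first, second, remaining)
```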
The process of assigning candidate aggregation nodes for a key in the embodiments of the present disclosure is illustrated below by way of an illustrative example.
For example, the message receiving system receives the following 20 key value pairs of message data in the last statistical period, and these message data need to be forwarded to the 10 back-end compute service nodes for aggregation processing:
{"a",1}{"a",1}{"a",1}{"a",1}{"a",1}{"a",1}
{"b",1}{"b",1}{"b",1}{"b",1}
{"c",1}{"c",1}
{"d",1}{"e",1}{"f",1}{"g",1}{"h",1}{"i",1}{"j",1}{"k",1}
wherein 11 keywords appear in total, and the resulting keyword set is { a, b, c, d, e, f, g, h, i, j, k }; there are 6 message data containing a, 4 message data containing b, 2 message data containing c, and 1 message data containing each of d to k.
Statistics show that the occurrence probability of keyword a is 6/20 = 30%, that of b is 4/20 = 20%, that of c is 2/20 = 10%, and that of each of d to k is 1/20 = 5%.
Then, according to the occurrence probability, the following algorithm can be designed to allocate the back-end processing nodes to different keywords:
Assume the 10 back-end nodes are N1, N2, ..., N10. The number of candidate back-end nodes allocated to a is 30% × 10 = 3 nodes, i.e., N1, N2 and N3. The number of candidate back-end nodes allocated to b is 20% × 10 = 2 nodes; since N1, N2 and N3 are already allocated to a, N4 and N5 can be allocated to b. Similarly, the candidate back-end node allocated to c is N6 (10% × 10 = 1 node). For d to k, since each keyword accounts for only 5%, i.e., half a node, two keywords may share one node: the candidate back-end node allocated to d and e is N7, that allocated to f and g is N8, that allocated to h and i is N9, and that allocated to j and k is N10.
This allocation scheme avoids the hot spot problem that arises when all messages with the same keyword are forwarded to a single node, while still forwarding message data with the same keyword to the same node as far as possible, so that the aggregation effect is optimal.
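The allocation in this example can be reproduced with a short sketch. It assumes every back-end node has the same capacity (one "node share"), assigns whole nodes first, and then packs the fractional remainders first-fit into the remaining nodes; the function name `assign_nodes` and the first-fit rule are illustrative choices, not taken from the source.

```python
import math
from collections import Counter
from fractions import Fraction


def assign_nodes(messages, nodes):
    """Map each keyword to a list of nodes, proportionally to its frequency."""
    total = len(messages)
    counts = Counter(kw for kw, _ in messages)  # keeps first-seen order
    free = list(nodes)   # nodes not yet touched by any keyword
    groups = {}
    waiting = []         # (keyword, fractional node share still to place)

    for kw, n in counts.items():
        share = Fraction(n, total) * len(nodes)   # e.g. 30% x 10 = 3 nodes
        whole = math.floor(share)
        groups[kw] = [free.pop(0) for _ in range(whole)]
        if share > whole:
            waiting.append((kw, share - whole))

    partial = []         # [node, share still free on that node]
    for kw, frac in waiting:
        for slot in partial:
            if slot[1] >= frac:           # fits on an already opened node
                slot[1] -= frac
                groups[kw].append(slot[0])
                break
        else:                             # open a fresh node for this keyword
            node = free.pop(0)
            partial.append([node, 1 - frac])
            groups[kw].append(node)
    return groups


# The 20 messages and 10 back-end nodes of the example.
messages = (
    [("a", 1)] * 6 + [("b", 1)] * 4 + [("c", 1)] * 2
    + [(kw, 1) for kw in "defghijk"]
)
nodes = [f"N{i}" for i in range(1, 11)]
groups = assign_nodes(messages, nodes)
for kw, assigned in groups.items():
    print(kw, assigned)
```

The sketch raises `IndexError` if the remainders cannot be packed into the nodes that are left; a production allocator would need an explicit policy for that case.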
In an optional implementation manner of this embodiment, the determining module 403 includes:
a ninth determining sub-module configured to determine the target aggregation node in a manner of uniformly distributing the message data including the target keyword to the candidate aggregation nodes in the candidate aggregation node group corresponding to the target keyword.
In this optional implementation manner, when the candidate aggregation node group corresponding to the target keyword includes a plurality of candidate aggregation nodes, in some embodiments, one of the candidate aggregation nodes may be selected as the target aggregation node in a fragmentation-uniform manner. That is, the first message data may be sent to the candidate aggregation nodes of the target keyword in turn, in a polling (round-robin) manner, so that the number of message data currently processed by each candidate aggregation node in the candidate aggregation node group corresponding to the target keyword is as balanced as possible.
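The lookup of the candidate aggregation node group followed by polling can be sketched as below. The class name `KeywordRouter`, the mapping contents, and the `(keyword, value)` message shape are hypothetical; handling of keywords that are absent from the preset mapping is deliberately left out and surfaces as a `KeyError`.

```python
from itertools import cycle


class KeywordRouter:
    """Pick a target aggregation node for each message by round-robin."""

    def __init__(self, mapping):
        # mapping: keyword -> list of candidate aggregation nodes
        # (stands in for the preset mapping relation).
        self._cycles = {kw: cycle(nodes) for kw, nodes in mapping.items()}

    def route(self, message):
        keyword, _value = message
        # Each keyword has its own cursor, so load is balanced per group.
        return next(self._cycles[keyword])


router = KeywordRouter({"a": ["N1", "N2", "N3"], "d": ["N7"]})
targets = [router.route(("a", 1)) for _ in range(4)]
print(targets)
print(router.route(("d", 1)))
```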
The embodiment of the present disclosure also provides an electronic device, as shown in fig. 7, including at least one processor 701; and a memory 702 communicatively coupled to the at least one processor 701; wherein the memory 702 stores instructions executable by the at least one processor 701 to perform, by the at least one processor 701, the steps of:
receiving first message data; wherein, the first message data comprises a target keyword;
matching according to a preset mapping relation to obtain a candidate aggregation node group corresponding to the target keyword; the candidate aggregation node groups comprise at least one candidate aggregation node, and different target keywords correspond to different candidate aggregation node groups;
determining a target aggregation node for the first message data from the candidate aggregation node group;
and sending the first message data to the target aggregation node so that the target aggregation node can aggregate the first message data according to the target keyword.
Wherein the number of candidate aggregation nodes in the group of candidate aggregation nodes is related to the probability of occurrence of the target keyword; the probability of occurrence of the target keyword is a proportion of the number of second message data containing the target keyword in second message data received within a preset time period to the total number of the second message data.
Wherein, the steps further include:
and allocating the candidate aggregation node group to the target keyword according to the occurrence probability of the target keyword.
Wherein assigning the candidate aggregation node group to the target keyword according to the probability of occurrence of the target keyword comprises:
counting keywords contained in the second message data received within a preset time period to obtain a keyword set;
determining, for the keywords in the set of keywords, a first number of the second message data containing the same keyword;
determining the occurrence probability of the keywords according to the first number and the total number of the second message data received in the preset time period;
and determining the candidate aggregation node grouping of the keywords according to the occurrence probability.
Determining the candidate aggregation node group of the keyword according to the occurrence probability comprises the following steps:
determining a target computing power of the candidate aggregation nodes capable of being allocated to the keywords according to the occurrence probability;
determining the candidate aggregation node grouping from available candidate aggregation nodes having remaining computing power not assigned to any of the keywords in accordance with the target computing power.
Wherein determining a target computing power of the candidate aggregation nodes that can be assigned to the keyword according to the probability of occurrence comprises:
when it is determined according to the occurrence probabilities of the keywords in the keyword set that the total number of the available candidate aggregation nodes is sufficient for the candidate aggregation nodes to be allocated to the keywords in units of whole available candidate aggregation nodes, determining a second number of the candidate aggregation nodes capable of being allocated to the keyword according to the occurrence probability of the keyword multiplied by the total number;
determining the candidate aggregation node grouping from the available candidate aggregation nodes having remaining computing power not assigned to any of the keywords in accordance with the target computing power, comprising:
selecting a second number of unassigned available candidate aggregation nodes to join the group of candidate aggregation nodes for the keyword.
Wherein determining a target computing power of the candidate aggregation nodes that can be assigned to the keyword according to the probability of occurrence comprises:
determining a target computing power of the candidate aggregation nodes that can be assigned to the keyword according to the probability of occurrence and the total computing power of the available candidate aggregation nodes.
Wherein determining the candidate aggregation node group from available candidate aggregation nodes having a remaining computing capacity not assigned to any of the keywords in accordance with the target computing capacity comprises:
determining the candidate aggregation node group in a manner of preferentially allocating the same available candidate aggregation node to the same keyword.
Wherein determining a target aggregation node for the first message data from the group of candidate aggregation nodes comprises:
determining the target aggregation node in a manner of uniformly distributing message data including the target keyword to the candidate aggregation nodes in the candidate aggregation node group corresponding to the target keyword.
Specifically, the processor 701 and the memory 702 may be connected by a bus or by other means, and fig. 7 illustrates an example of connection by a bus. Memory 702, which is a non-volatile computer-readable storage medium, may be used to store non-volatile software programs, non-volatile computer-executable programs, and modules. The processor 701 executes various functional applications of the device and data processing by executing nonvolatile software programs, instructions, and modules stored in the memory 702, that is, implements the above-described method in the embodiments of the present disclosure.
The memory 702 may include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store historical data of shipping network traffic, and the like. Further, the memory 702 may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid state storage device. In some embodiments, the electronic device optionally includes a communications component 703 and the memory 702 optionally includes memory remotely located from the processor 701, which may be connected to an external device through the communications component 703. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
One or more modules are stored in the memory 702, and when executed by the one or more processors 701, perform the above-described methods in the embodiments of the present disclosure.
The product can execute the method provided by the embodiment of the disclosure, has corresponding functional modules and beneficial effects of the execution method, and reference can be made to the method provided by the embodiment of the disclosure for technical details which are not described in detail in the embodiment.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowcharts or block diagrams may represent a module, a program segment, or a portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units or modules described in the embodiments of the present disclosure may be implemented by software or hardware. The units or modules described may also be provided in a processor, and the names of the units or modules do not in some cases constitute a limitation of the units or modules themselves.
As another aspect, the present disclosure also provides a computer-readable storage medium, which may be the computer-readable storage medium included in the apparatus in the above-described embodiment; or it may be a separate computer readable storage medium not incorporated into the device. The computer readable storage medium stores one or more programs for use by one or more processors in performing the methods described in the present disclosure.
The foregoing description is only exemplary of the preferred embodiments of the disclosure and is illustrative of the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the invention in the present disclosure is not limited to the specific combination of the above-mentioned features, but also encompasses other embodiments formed by any combination of the above-mentioned features or their equivalents without departing from the inventive concept, for example, technical solutions formed by replacing the above features with (but not limited to) features having similar functions disclosed in this disclosure.

Claims (10)

1. A data processing method, comprising:
receiving first message data; wherein, the first message data comprises a target keyword;
matching according to a preset mapping relation to obtain a candidate aggregation node group corresponding to the target keyword; the candidate aggregation node groups comprise at least one candidate aggregation node, and different target keywords correspond to different candidate aggregation node groups;
determining a target aggregation node for the first message data from the candidate aggregation node group;
and sending the first message data to the target aggregation node so that the target aggregation node can aggregate the first message data according to the target keyword.
2. The method according to claim 1, wherein the number of the candidate aggregation nodes in the candidate aggregation node group is related to the occurrence probability of the target keyword; the probability of occurrence of the target keyword is a proportion of the number of second message data containing the target keyword in second message data received within a preset time period to the total number of the second message data.
3. The method of claim 2, further comprising:
and allocating the candidate aggregation node group to the target keyword according to the occurrence probability of the target keyword.
4. The method of claim 3, wherein assigning the candidate aggregation node group to the target keyword according to the probability of occurrence of the target keyword comprises:
counting keywords contained in the second message data received within a preset time period to obtain a keyword set;
determining, for the keywords in the set of keywords, a first number of the second message data containing the same keyword;
determining the occurrence probability of the keywords according to the first number and the total number of the second message data received in the preset time period;
and determining the candidate aggregation node grouping of the keywords according to the occurrence probability.
5. The method of claim 4, wherein determining the candidate aggregation node grouping for the keyword according to the probability of occurrence comprises:
determining a target computing power of the candidate aggregation nodes capable of being allocated to the keywords according to the occurrence probability;
determining the candidate aggregation node grouping from available candidate aggregation nodes having remaining computing power not assigned to any of the keywords in accordance with the target computing power.
6. The method of claim 5, wherein determining a target computing power of the candidate aggregation nodes that can be assigned to the keyword according to the probability of occurrence comprises:
when it is determined according to the occurrence probabilities of the keywords in the keyword set that the total number of the available candidate aggregation nodes is sufficient for the candidate aggregation nodes to be allocated to the keywords in units of whole available candidate aggregation nodes, determining a second number of the candidate aggregation nodes capable of being allocated to the keyword according to the occurrence probability of the keyword multiplied by the total number;
determining the candidate aggregation node grouping from the available candidate aggregation nodes having remaining computing power not assigned to any of the keywords in accordance with the target computing power, comprising:
selecting a second number of unassigned available candidate aggregation nodes to join the group of candidate aggregation nodes for the keyword.
7. The method of claim 5, wherein determining a target computing power of the candidate aggregation nodes that can be assigned to the keyword according to the probability of occurrence comprises:
determining a target computing power of the candidate aggregation nodes that can be assigned to the keyword according to the probability of occurrence and the total computing power of the available candidate aggregation nodes.
8. A data processing apparatus, comprising:
a receiving module configured to receive first message data; wherein, the first message data comprises a target keyword;
the matching module is configured to obtain a candidate aggregation node group corresponding to the target keyword according to a preset mapping relation; the candidate aggregation node groups comprise at least one candidate aggregation node, and different target keywords correspond to different candidate aggregation node groups;
a determining module configured to determine a target aggregation node of the first message data from the candidate aggregation node group;
and the sending module is configured to send the first message data to the target aggregation node so that the target aggregation node performs aggregation processing on the first message data according to the target keyword.
9. An electronic device comprising a memory and a processor; wherein,
the memory is to store one or more computer instructions, wherein the one or more computer instructions are to be executed by the processor to implement the method of any one of claims 1-7.
10. A computer-readable storage medium having computer instructions stored thereon, wherein the computer instructions, when executed by a processor, implement the method of any one of claims 1-7.
CN201911019829.5A 2019-10-24 2019-10-24 Data processing method and device, electronic equipment and storage medium Active CN112711587B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911019829.5A CN112711587B (en) 2019-10-24 2019-10-24 Data processing method and device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN112711587A (en) 2021-04-27
CN112711587B (en) 2022-10-28

Family

ID=75540881

Country Status (1)

Country Link
CN (1) CN112711587B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant