CN111325254B - Method and device for constructing conditional relational network and processing conditional service - Google Patents

Publication number
CN111325254B
CN111325254B CN202010089190.4A CN202010089190A CN111325254B CN 111325254 B CN111325254 B CN 111325254B CN 202010089190 A CN202010089190 A CN 202010089190A CN 111325254 B CN111325254 B CN 111325254B
Authority
CN
China
Prior art keywords
network
node
conditional
attribute
relationship
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010089190.4A
Other languages
Chinese (zh)
Other versions
CN111325254A (en)
Inventor
吴歈
何建杉
王太峰
褚崴
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alipay Hangzhou Information Technology Co Ltd
Original Assignee
Alipay Hangzhou Information Technology Co Ltd
Application filed by Alipay Hangzhou Information Technology Co Ltd
Priority to CN202010089190.4A
Publication of CN111325254A
Application granted
Publication of CN111325254B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/29 Graphical models, e.g. Bayesian networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate

Abstract

The embodiments of this specification provide a method and a device for constructing a conditional relational network and for processing conditional services with the constructed network, introducing a distributed architecture into the data processing of the conditional relational network. When the network is constructed, the connecting edges of an initial relational network are updated on the basis of the joint probability distribution of the attribute categories of the service states; for this update the network is split into a plurality of local networks for distributed processing, so that the data handled by a single task comprises only the joint probability distribution data and the data of the local network built around one node. Furthermore, when service data is processed with the conditional relational network, a plurality of subtasks for sampling the attribute categories of the nodes to be predicted are distributed to a plurality of distributed devices. This reduces the amount of data processed by a single task and relieves the data-volume bottleneck encountered when conditional relational networks are applied in practice.

Description

Method and device for constructing conditional relational network and processing conditional service
Technical Field
One or more embodiments of the present disclosure relate to the field of computer technologies, and in particular, to a method and an apparatus for constructing a conditional relationship network using sample service data, and a method and an apparatus for predicting an attribute type of a service state using the constructed conditional relationship network.
Background
With the development of computer technology, artificial intelligence is applied more and more widely, and more and more scenarios can be handled by machine learning models. A relational network is a graph structure generally composed of nodes, which may correspond to entities, objects, etc., and connecting edges, which represent the association relationships between the nodes. The relational network can be processed by a machine learning model to predict attributes of the corresponding nodes. A conditional relational network, such as a Bayesian network, is a probabilistic graph model that can be used to describe the probability of hops between nodes, for example a graph model that describes the probability of jumps between different pages. Conditional relationship networks can therefore be widely used for causal relationship mining and inference on networks.
Disclosure of Invention
The method and the device described in one or more embodiments of the present specification can solve one or more problems mentioned in the background art.
According to a first aspect, a method for constructing a conditional relationship network is provided, where the conditional relationship network is configured to describe a conditional relationship among a plurality of predetermined service states, and includes nodes respectively corresponding to the service states in service data and directed connection edges indicating the conditional relationship among the nodes, and a single service state corresponds to at least one attribute category; the method comprises the following steps: determining the joint probability distribution of the attribute categories of the service states according to the attribute categories of the samples on the service states, wherein each sample corresponds to one piece of sample service data; based on an initial relationship network obtained by initializing the conditional relationship among the nodes in a predetermined manner, splitting out a local network with each node as a reference, and generating a conditional relationship change task for each local network to distribute to a plurality of distributed devices, so that a single distributed device, after receiving the conditional relationship change task of the corresponding local network, obtains, based on the joint probability distribution, the network score gain of the local network with the changed conditional relationship compared with the initial local network; iteratively updating the conditional relationship in the initial relationship network according to each network score gain obtained from the plurality of distributed devices until no network score gain meeting a predetermined gain condition exists, and obtaining a current relationship network; and determining each conditional probability distribution corresponding to each node for the current relationship network based on the current relationship network and the joint probability distribution, so as to obtain a conditional relationship network formed by the nodes corresponding to the plurality of service states, wherein the conditional relationship network is used for service processing that predicts the attribute categories of other nodes given the current attribute categories of a plurality of nodes, and the conditional probability distribution describes the probability that the service state of the corresponding node is in each attribute category given the service states corresponding to the parent nodes of the corresponding node.
In one embodiment, the joint probability distribution includes probabilities respectively corresponding to various attribute class combinations on the plurality of traffic states, and a single attribute class combination is a combination formed by taking one attribute class on each of the plurality of traffic states; determining the joint probability distribution of the attribute types of each service state according to the attribute types of each sample on the plurality of service states respectively comprises: generating a first statistical task aiming at a first attribute category combination in various attribute category combinations; distributing the first statistical task to first distributed equipment so that the first distributed equipment can count the number of first samples corresponding to the first attribute category combination and feed back the first samples; and taking the number of the first samples or the ratio of the number of the first samples to the total number of the samples as the probability corresponding to the first attribute class combination.
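The per-combination statistical tasks described above can be sketched as follows. This is a minimal illustration, not the patented implementation: a thread pool stands in for the distributed devices, and the names `SAMPLES`, `STATE_VALUES`, `count_combination` and `joint_distribution`, as well as the toy data, are assumptions introduced here.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product

# Hypothetical toy samples: one tuple of state values per sample,
# ordered as (lightning, rain, ground_wet).
SAMPLES = [(1, 0, 0), (1, 1, 1), (0, 1, 1), (0, 0, 0), (1, 1, 1), (0, 0, 0)]
# Possible state values of each service state (assumed binary here).
STATE_VALUES = [(0, 1), (0, 1), (0, 1)]


def count_combination(combo):
    """One statistical task: count the samples matching a single combination."""
    return combo, sum(1 for sample in SAMPLES if sample == combo)


def joint_distribution(max_workers=4):
    """Run one counting task per attribute category combination."""
    combos = list(product(*STATE_VALUES))
    # The thread pool is only a stand-in for the distributed devices.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        counts = dict(pool.map(count_combination, combos))
    total = len(SAMPLES)
    # Use the ratio of the count to the total number of samples as probability.
    return {combo: n / total for combo, n in counts.items()}


JOINT = joint_distribution()
```

Each task touches only the samples and one combination, so the work can be partitioned by combination without any task needing the full result.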
In one embodiment, the attribute classes in a single traffic state are described by state values on a single node corresponding to the single traffic state.
In one embodiment, the predetermined manner includes randomly adding a predetermined number of directed connection edges between nodes, the plurality of service states include a first service state corresponding to a first node, and splitting out each local network with each node as a reference, based on the initial relationship network obtained by initializing the conditional relationship among the nodes in the predetermined manner, includes: determining, from the initial relationship network, a local network formed by the first node and its parent nodes, and taking that local network as a first local network split out with the first node as a reference.
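As a rough sketch of this splitting step, each local network can be represented as a node together with the list of its parents. The edge list and the function name `split_local_networks` are hypothetical and serve only to illustrate the idea.

```python
# Hypothetical initial relationship network: directed edges as (parent, child).
NODES = ["lightning", "rain", "ground_wet"]
EDGES = [
    ("lightning", "rain"),
    ("rain", "ground_wet"),
    ("lightning", "ground_wet"),
]


def split_local_networks(nodes, edges):
    """Return {node: sorted list of its parents}, one local network per node."""
    local = {node: [] for node in nodes}
    for parent, child in edges:
        local[child].append(parent)
    return {node: sorted(parents) for node, parents in local.items()}


LOCAL_NETWORKS = split_local_networks(NODES, EDGES)
```

A node without parents still yields a local network consisting of the node alone, so one change task can be generated per node.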
In one embodiment, the iteratively updating the conditional relationships in the initial relationship network according to the respective network score gains until there are no network score gains that satisfy a predetermined gain condition includes: updating the condition relation change item of the local network corresponding to the maximum network score gain to the initial relation network to obtain an intermediate relation network; determining each candidate condition change item and each network score gain corresponding to each candidate condition change item for the intermediate relationship network, wherein under the condition that a single candidate condition change item is consistent with the condition relationship change items of the local networks fed back by the plurality of distributed devices, the network score gain of the condition relationship change item of the corresponding local network is used as the network score gain of the single candidate condition change item, and under the condition that the single candidate condition change item is inconsistent with the condition relationship change items of the local networks fed back by the plurality of distributed devices, the network score gain is determined by comparing the relationship network added with the single candidate condition change item in the intermediate relationship network with the current intermediate relationship network; the intermediate relationship network is updated with the candidate conditional alteration that maximizes the gain of the network score.
In one embodiment, the predetermined gain condition is that the network score gain is positive.
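The iterative update with a positive-gain stopping condition resembles a greedy search, which can be sketched as below. This is a simplified illustration under several assumptions: only edge additions are considered as change items, the gain function `toy_gain` is a placeholder rather than a real network score, and all names are hypothetical.

```python
from itertools import permutations


def creates_cycle(edges, new_edge):
    """True if adding new_edge (parent, child) would create a directed cycle."""
    parent, child = new_edge
    stack, seen = [child], set()
    while stack:
        node = stack.pop()
        if node == parent:
            return True  # a path child -> ... -> parent already exists
        if node in seen:
            continue
        seen.add(node)
        stack.extend(c for p, c in edges if p == node)
    return False


def greedy_update(nodes, edges, gain_fn):
    """Repeatedly apply the change with the largest gain while it is positive."""
    edges = set(edges)
    while True:
        candidates = [e for e in permutations(nodes, 2)
                      if e not in edges and not creates_cycle(edges, e)]
        gains = {e: gain_fn(edges, e) for e in candidates}
        if not gains:
            return edges
        best = max(gains, key=gains.get)
        if gains[best] <= 0:  # no gain satisfies the "positive" condition
            return edges
        edges.add(best)


# Placeholder gains: a real system would derive them from the joint distribution.
TOY_GAINS = {("lightning", "rain"): 2.0, ("rain", "ground_wet"): 3.0}


def toy_gain(edges, edge):
    return TOY_GAINS.get(edge, -1.0)


RESULT = greedy_update(["lightning", "rain", "ground_wet"], [], toy_gain)
```

In the setting described in the text, the values returned by `gain_fn` would come from the distributed devices where a candidate matches a fed-back change item, and would be recomputed against the intermediate network otherwise.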
In an embodiment, the iteratively updating the conditional relationship in the initial relationship network according to each network score gain until there is no network score gain satisfying a predetermined gain condition, and obtaining the current relationship network includes: comparing an intermediate relationship network obtained by iteratively updating the condition relationship in the initial relationship network with a plurality of other intermediate relationship networks obtained by iteratively updating the condition relationship based on a plurality of other initial relationship networks obtained by initializing the condition relationship among the nodes in other predetermined manners; and selecting the intermediate relationship network with the highest network score as the current relationship network.
In an embodiment, the plurality of service states include a second service state, the second service state corresponds to a second node, each parent node of the second node corresponds to a second attribute type combination, and the conditional probability distribution corresponding to the second node includes probabilities that the second node corresponds to each attribute type of the second service state respectively under the second attribute type combination.
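One way to obtain such a conditional probability distribution from the joint probability distribution is to divide the joint mass of (parents, node) by the mass of the parents. The sketch below assumes a two-state toy joint distribution; `JOINT`, `conditional_distribution` and the index convention are illustrative names, not taken from the source.

```python
from collections import defaultdict

# Hypothetical joint distribution over (rain, ground_wet) state values.
JOINT = {(1, 1): 0.27, (1, 0): 0.03, (0, 1): 0.007, (0, 0): 0.693}


def conditional_distribution(joint, child_idx, parent_idxs):
    """Return {parent_values: {child_value: P(child | parents)}}."""
    pair_mass = defaultdict(float)    # P(parents, child)
    parent_mass = defaultdict(float)  # P(parents)
    for combo, p in joint.items():
        key = tuple(combo[i] for i in parent_idxs)
        pair_mass[(key, combo[child_idx])] += p
        parent_mass[key] += p
    cpt = defaultdict(dict)
    for (key, child_val), p in pair_mass.items():
        if parent_mass[key] > 0:  # skip parent combinations never observed
            cpt[key][child_val] = p / parent_mass[key]
    return dict(cpt)


# Conditional distribution of ground_wet (index 1) given rain (index 0).
CPT_WET = conditional_distribution(JOINT, child_idx=1, parent_idxs=[0])
```

With these toy numbers the result is roughly 0.9 for wet ground given rain and 0.01 given no rain.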
According to a second aspect, there is provided a conditional traffic handling method of predicting an attribute class of a traffic state by means of a conditional relationship network determined by means of the first aspect, the method comprising: acquiring a plurality of attribute categories respectively corresponding to the service states from service data to be processed; taking other nodes except the nodes corresponding to the plurality of service states in the conditional relationship network as nodes to be predicted, and generating a plurality of subtasks for performing attribute category sampling on the nodes to be predicted; distributing each subtask to a plurality of distributed devices, so that each distributed device can sample the attribute class of at least one node to be predicted according to the condition relation network and each attribute class corresponding to the plurality of service states; and determining each attribute type corresponding to each node to be predicted respectively based on each sampling result.
In one embodiment, the nodes to be predicted include a first node to be predicted, and the sampling result of the first node to be predicted is obtained by sampling from the conditional probability distribution corresponding to that node according to the attribute categories of its parent nodes.
In one embodiment, the sampling results of the first node to be predicted comprise a first sampling result, the first sampling result corresponds to a first confidence, the first confidence is the product of the conditional probability of the first node to be predicted given the attribute categories of its parent nodes and a predetermined weight, and the predetermined weight is an initial weight of the first sampling result.
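A single sampling subtask with its confidence might look like the following sketch. The table `CPT_RAIN`, the default initial weight of 1.0 and the function name are assumptions made for illustration; the source does not specify how the initial weight is chosen.

```python
import random

# Illustrative table P(rain | lightning): {parent value: {class: probability}}.
CPT_RAIN = {1: {1: 0.6, 0: 0.4}, 0: {1: 0.7, 0: 0.3}}


def sample_with_confidence(cpt, parent_value, initial_weight=1.0, rng=random):
    """Sample an attribute class and attach confidence = P(class | parent) * weight."""
    dist = cpt[parent_value]
    classes = list(dist)
    sampled = rng.choices(classes, weights=[dist[c] for c in classes], k=1)[0]
    confidence = dist[sampled] * initial_weight
    return sampled, confidence


RNG = random.Random(0)  # seeded only to make the sketch repeatable
SAMPLED, CONFIDENCE = sample_with_confidence(CPT_RAIN, parent_value=1, rng=RNG)
```

Because each call depends only on the table and the given parent category, many such calls can run independently on different devices.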
In an embodiment, the attribute categories corresponding to the first node to be predicted include a first attribute category, and determining, based on the respective sampling results, the attribute category corresponding to each node to be predicted includes: determining a probability distribution of the first node to be predicted over the candidate attribute categories based on the sampling results, wherein the probability distribution comprises a first probability that the predicted attribute category of the first node to be predicted is the first attribute category, and the first probability is the ratio of the sum of the confidences of the sampling results in which the first node to be predicted takes the first attribute category to the sum of the confidences of all sampling results; and determining the current attribute category of the first node to be predicted as the first attribute category if the first probability satisfies a predetermined probability condition.
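The confidence-weighted ratio described above can be illustrated with a short sketch. The sampling results are made-up values, and the threshold of 0.5 is only an assumed stand-in for the unspecified predetermined probability condition.

```python
from collections import defaultdict

# Hypothetical sampling results for one node: (sampled class, confidence).
RESULTS = [("rain", 0.6), ("no_rain", 0.4), ("rain", 0.6), ("rain", 0.7)]


def class_probabilities(results):
    """Probability of a class = its confidence sum / confidence sum of all results."""
    totals = defaultdict(float)
    for attr_class, confidence in results:
        totals[attr_class] += confidence
    overall = sum(totals.values())
    return {c: t / overall for c, t in totals.items()}


def predict(results, threshold=0.5):
    """Return the most probable class if it exceeds the assumed threshold."""
    probs = class_probabilities(results)
    best = max(probs, key=probs.get)
    return best if probs[best] > threshold else None


PROBS = class_probabilities(RESULTS)
PREDICTED = predict(RESULTS)
```

Here the "rain" results carry a confidence sum of 1.9 out of 2.3 in total, so "rain" is selected under the assumed threshold.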
According to a third aspect, there is provided an apparatus for constructing a conditional relationship network, where the conditional relationship network is configured to describe a conditional relationship among a plurality of predetermined service states, and includes nodes corresponding to the service states in service data, and a directed connection edge indicating the conditional relationship among the nodes, where a single service state corresponds to at least one attribute category;
the device comprises:
the statistical unit is configured to determine the joint probability distribution of the attribute types of the service states according to the attribute types of the samples on the service states, wherein the samples correspond to the service data of the samples respectively;
the splitting unit is configured to split each local network respectively by taking each node as a reference based on an initial relationship network obtained by initializing a conditional relationship among each node in a predetermined manner, and generate each conditional relationship change task for each local network respectively to distribute to a plurality of distributed devices, so that after a single distributed device receives the conditional change task of the corresponding local network, the network score gain of the local network with the changed conditional relationship compared with the initial local network is obtained under the condition that the conditional relationship of the corresponding local network is changed based on the joint probability distribution;
a network structure determining unit configured to iteratively update the conditional relationship in the initial relationship network according to each network score gain obtained from the plurality of distributed devices until no network score gain satisfying a predetermined gain condition exists, so as to obtain a current relationship network;
and a probability distribution determining unit configured to determine, based on the current relationship network and the joint probability distribution, each conditional probability distribution corresponding to each node for the current relationship network, so as to obtain a conditional relationship network formed by each node corresponding to the multiple service states, so that the conditional relationship network is used to predict service processing of attribute classes of other nodes under the condition of the current attribute classes of the given nodes, and the conditional probability distribution is used to describe the probability that the service state of the corresponding node is in each attribute class under the condition of the service state corresponding to each parent node of the corresponding node.
According to a fourth aspect, there is provided a conditional service processing apparatus for predicting an attribute class of a service state via a conditional relationship network determined by the apparatus of the third aspect, the service processing apparatus comprising:
the acquisition unit is configured to acquire each attribute type corresponding to a plurality of service states from service data to be processed;
the generating unit is configured to take other nodes except the nodes corresponding to the plurality of service states in the conditional relational network as nodes to be predicted, and generate a plurality of subtasks for performing attribute category sampling on the nodes to be predicted;
the distribution unit is configured to distribute each subtask to a plurality of distributed devices, so that each distributed device samples attribute categories of at least one node to be predicted according to the conditional relationship network and each attribute category corresponding to the plurality of service states;
and the determining unit is configured to determine each attribute category corresponding to each node to be predicted respectively based on each sampling result.
According to a fifth aspect, there is provided a computer readable storage medium having stored thereon a computer program which, when executed in a computer, causes the computer to perform the method of the first or second aspect.
According to a sixth aspect, there is provided a computing device comprising a memory and a processor, wherein the memory has stored therein executable code, and the processor when executing the executable code implements the method of the first or second aspect.
Through the method and the device for constructing the conditional relational network and processing the conditional service by using the constructed conditional relational network, the distributed architecture is introduced into the data processing process of the conditional relational network. When a conditional relational network is constructed, on the basis of joint probability distribution of attribute types of all service states, when a connecting edge in an initial relational network is updated, a plurality of local networks are split for distributed data processing, so that data processed by a single task only comprises joint probability distribution data and local network data taking one node as a reference. Furthermore, in the process of processing the service data by using the conditional relational network, based on the concept of distributed data processing, a plurality of subtasks for performing attribute class sampling on the node to be predicted are distributed to a plurality of distributed devices for processing. The concept can reduce the data processing amount of a single task and solve the problem of large data amount in the practice of conditional relational network application.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings used in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the description below are only some embodiments of the present invention, and it is obvious for those skilled in the art that other drawings can be obtained according to the drawings without creative efforts.
FIG. 1 illustrates a schematic diagram of an implementation scenario of one embodiment disclosed herein;
FIG. 2 illustrates a flow diagram of a method of building a conditional relational network, according to one embodiment;
FIG. 3 illustrates a flow diagram of a method of conditional business processing for predicting attribute categories for business states through a conditional relationship network, according to one embodiment;
FIG. 4 shows a schematic block diagram of an apparatus for building a conditional relationship network according to one embodiment;
FIG. 5 shows a schematic block diagram of an apparatus for conditional traffic handling for predicting attribute classes of traffic states over a conditional relationship network according to one embodiment.
Detailed Description
The scheme provided by the specification is described below with reference to the accompanying drawings.
For convenience of explanation, a specific application scenario of the embodiments of the present specification, shown in fig. 1, is described first. Fig. 1 is a schematic view of an implementation scenario of an embodiment disclosed in this specification. In the implementation scenario, a conditional relationship network is established by preprocessing observation sample data of rainfall. The state value of an observation sample in each service state describes the actually observed attribute category of that sample in the corresponding service state. A service state may be a condition that is satisfied or a state that holds, for example, the condition that the ground is wet or the state that it is raining. As shown in fig. 1, one observation sample may contain the following service states: 1. lightning strike; 2. raining; 3. ground wet. A service state may have multiple attribute categories; for example, the "raining" service state may have the attribute categories: no rain, light rain, medium rain, heavy rain, etc.
In an alternative, each service state may correspond to at least one state value that describes an attribute category of the service state. For example, in the example of fig. 1, each service state has 2 state values: when the attribute category of the service state is "occurring", the state value of the observation sample for that service state is a first preset value (1 in fig. 1); otherwise, the state value is a second preset value (0 in fig. 1). In fig. 1, samples are recorded in units of days; in practice, they may be recorded in units of hours or the like. In sample 1, the state value of lightning strike is 1, which indicates that a lightning strike event occurred on January 1, and the state values of rain and ground wetness are 0, which indicates that on January 1 it did not rain and the ground was not wet. By analogy, each observation sample corresponds to attribute categories expressed as state values.
Under the framework of the embodiments of the present specification, a joint probability distribution of the attribute categories of the service states can be determined by performing statistics on the observation samples. The joint probability distribution describes how likely the different attribute categories of the service states are to occur together in a sample, for example, the probability that rain and/or ground wetting occurs when a lightning strike occurs, the probability that a lightning strike and/or ground wetting occurs when rain occurs, the probability that a lightning strike and/or rain occurs when ground wetting occurs, and the like. Further, when the different attribute categories of the service states are represented by state values, for three service states each having 2 attribute categories, the combinations of attribute categories may be represented as: (1, 0, 0), (1, 1, 0), (1, 0, 1), (1, 1, 1), (0, 0, 0), (0, 1, 0), (0, 0, 1), and the like. The probabilities with which these combinations occur in all observation samples constitute the joint probability distribution. Thereafter, based on the joint probability distribution, a conditional relationship network may be established.
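For the three binary service states of this example, the combinations and their empirical probabilities can be enumerated as in the sketch below. The observations are invented for illustration and are not the data of fig. 1; the names `CLASS_COUNTS`, `COMBOS` and `OBSERVATIONS` are likewise assumptions.

```python
from collections import Counter
from itertools import product
from math import prod

# Number of attribute categories per service state (binary in this example).
CLASS_COUNTS = {"lightning": 2, "rain": 2, "ground_wet": 2}

# All combinations of state values, ordered as (lightning, rain, ground_wet).
COMBOS = list(product(*(range(n) for n in CLASS_COUNTS.values())))

# Hypothetical daily observations, one tuple of state values per day.
OBSERVATIONS = [(1, 0, 0), (1, 1, 1), (0, 1, 1), (0, 0, 0)]

counts = Counter(OBSERVATIONS)
# Empirical joint distribution: relative frequency of each combination.
JOINT = {combo: counts[combo] / len(OBSERVATIONS) for combo in COMBOS}
```

The number of combinations equals the product of the per-state category counts, here 2 × 2 × 2 = 8, which is why the table grows quickly as states or categories are added.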
The conditional relationship network may be a relationship network representing causal logic between nodes. In the implementation scenario shown in fig. 1, the conditional relationship network may include nodes corresponding to the service states and conditional relationships between the nodes represented by directed connecting edges. Each node has a conditional probability distribution from the parent node to the current node.
In the case shown in fig. 1, the nodes corresponding to rain and ground wetting have parent nodes, and each has a conditional probability distribution with respect to its parent node. If lightning strikes and rain have a causal relationship, the arrow direction of the connecting edge represents the causal order: a lightning strike can cause rain. When a lightning strike event occurs, the probability that a rain event occurs is 60% and the probability that it does not occur is 40%; when no lightning strike event occurs, the probability that a rain event occurs is 70% and the probability that it does not occur is 30%. Similarly, rain may cause the ground to be wet: when a rain event occurs, the probability of the ground being wet is 90% and the probability of the ground not being wet is 10% (e.g., the amount of rain is very small); when no rain event occurs, the probability of the ground being wet is 1% and the probability of the ground not being wet is 99%. Fig. 1 shows only the case of a single parent node; in practical implementations, one node may have multiple parent nodes, forming a probability distribution from the multiple parent nodes jointly to the current node.
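As a worked illustration of how these conditional probabilities combine, the sketch below computes the probability of wet ground given the lightning state by summing over the rain state. It assumes the chain lightning → rain → ground wet, reads the 1% / 99% figures as the no-rain case, and uses hypothetical table and function names.

```python
# P(rain | lightning) and P(ground_wet | rain) as {parent: {child: probability}}.
P_RAIN_GIVEN_LIGHTNING = {1: {1: 0.6, 0: 0.4}, 0: {1: 0.7, 0: 0.3}}
P_WET_GIVEN_RAIN = {1: {1: 0.9, 0: 0.1}, 0: {1: 0.01, 0: 0.99}}


def p_wet_given_lightning(lightning):
    """Marginalize over rain: sum_r P(rain=r | lightning) * P(wet=1 | rain=r)."""
    return sum(
        P_RAIN_GIVEN_LIGHTNING[lightning][rain] * P_WET_GIVEN_RAIN[rain][1]
        for rain in (0, 1)
    )


P_WET_IF_LIGHTNING = p_wet_given_lightning(1)      # 0.6*0.9 + 0.4*0.01
P_WET_IF_NO_LIGHTNING = p_wet_given_lightning(0)   # 0.7*0.9 + 0.3*0.01
```

Under these assumptions the probabilities come out at about 0.544 with a lightning strike and about 0.633 without one.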
It should be noted that the observation samples and the service states may be set according to the actual scenario. For example, in a page click scenario, the service states may be the browsing of each page containing a hyperlink, the clicking of the corresponding hyperlink to jump to another page, and so on, which is not described in detail here. The observation samples, the number of service states (the number of nodes in the corresponding conditional relationship network), and the number of state values are only examples; in practice, any number may be set as needed, which is not limited herein. Although a conditional relationship network has a sound theoretical basis, the calculation involved in constructing or using it is rather complicated and time-consuming. Especially when the network scale or the number of samples is very large, the calculation of the conditional relationship network faces a huge bottleneck that the calculation methods of the conventional technology cannot overcome. For example, in the scenario shown in fig. 1, when the number of observation samples, the number of service states, and the number of state values are all large, the amount of calculation is very large.
The following describes in detail a method for constructing a conditional relationship network from sample service data.
FIG. 2 illustrates a flow diagram of a method of building a conditional relationship network, according to one embodiment. The execution subject of the method can be any system, device, apparatus, platform or server with computing and processing capabilities. A conditional relationship network may be used to describe logical conditional relationships between multiple business states. A conditional relationship network based on the conditional relationship described by the Bayesian principle is also called a Bayesian network. Each node in the conditional relational network corresponds to each service state respectively, and the conditional logic between the service states is indicated through the directed connecting edges between the nodes. The service status may be any condition, event, property, etc. related to the current service, such as "thunder", "rain" in the scenario of fig. 1, "sales", "turnover" in the business scenario, and "browse", "click" in the page conversion scenario.
As an example, if a directed connection edge points from node A to node B, the service state corresponding to node A may be the cause or condition of the service state corresponding to node B, and the service state corresponding to node B may be the result of the service state corresponding to node A. One sample may correspond to one piece of sample service data. A piece of sample service data describes the attribute category of each service state in an actual situation observed in the past. For example, if the service state corresponding to a node is "raining", the attribute category of that service state may be "occurring" or "not occurring", or may further distinguish the degree of occurrence, such as "light rain", "medium rain", "heavy rain", and the like.
As shown in fig. 2, the method for constructing the conditional relationship network may include the following steps:
step 201, determining the joint probability distribution of the attribute types of each service state according to the attribute types of each sample on a plurality of service states, wherein each sample corresponds to each sample service data;
step 202, based on an initial relationship network obtained by initializing a conditional relationship among nodes in a predetermined manner, splitting each local network based on each node, and generating each conditional relationship change task for each local network to distribute to a plurality of distributed devices, so that after a single distributed device receives a conditional relationship change task of a corresponding local network, based on joint probability distribution, under the condition that the conditional relationship of the corresponding local network is changed, a network score gain of the local network after the conditional relationship is changed compared with the initial local network is obtained;
step 203, iteratively updating the condition relationship in the initial relationship network according to each network score gain until no network score gain meeting a preset gain condition exists, and obtaining a current relationship network;
step 204, determining each conditional probability distribution corresponding to each node for the current relationship network based on the current relationship network and the joint probability distribution, so as to obtain a conditional relationship network formed by each node corresponding to a plurality of service states, wherein the conditional relationship network is used for predicting service processing of attribute classes of other nodes under the condition of the current attribute classes of a plurality of given nodes, and the conditional probability distribution is used for describing the probability that the service state of the corresponding node is in each attribute class under the condition of the service state corresponding to each parent node of the corresponding node.
First, in step 201, a joint probability distribution of the attribute classes of the service states is determined according to the attribute class of each sample on the plurality of service states. It is understood that a sample may correspond to one piece of sample service data, and that for each predetermined service state, each sample has an attribute class that is explicitly determined from its sample service data.
To mine the association relationships between the service states, the frequencies with which the different attribute classes of the service states occur together across all samples can be used. These frequencies of occurrence constitute a joint probability distribution over the nodes. It is easy to understand that each service state may correspond to two or more attribute classes. For example, the service state "thunder" shown in fig. 1 may correspond to the two attribute classes "thunder" and "no thunder", or to several attribute classes defined by the frequency, loudness, and so on of the thunder within a day. Optionally, these attribute classes may be described by different values: for example, 0 for the attribute class without thunder and 1 to 4 for the attribute classes with thunder, where 1 denotes lower frequency and lower loudness, 2 denotes higher frequency and lower loudness, 3 denotes lower frequency and higher loudness, and 4 denotes higher frequency and higher loudness. Expressed as values, the attribute class combinations of the service states take forms such as (1, 0, 0, ...) and (1, 1, 0, ...).
In theory, the number of possible attribute class combinations can be described by the product of the numbers of attribute classes of the individual service states. Suppose the number of service states is $m$ and the number of attribute classes of the $i$-th service state is $n_i$, where $i$ is a natural number from 1 to $m$. In a sample, the $i$-th service state takes one of its $n_i$ attribute classes, so the number of attribute class combinations over all service states can be written as
$$\prod_{i=1}^{m} n_i = n_1 \cdot n_2 \cdot n_3 \cdots n_m.$$
Thus, when each service state has 2 attribute classes, 3 service states yield $2 \times 2 \times 2 = 8$ attribute class combinations. As another example, take 2 service states, one with 2 attribute classes represented by the values 0 and 1, and the other with 3 attribute classes represented by the values 0, 1, and 2. The attribute class combinations are then (0, 0), (0, 1), (0, 2), (1, 0), (1, 1), and (1, 2), for a total of $2 \times 3 = 6$.
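The counting argument above can be checked with a few lines of Python. This is only an illustrative sketch; the variable names `class_counts` and `combinations` are chosen here and do not come from the specification.

```python
from itertools import product
from math import prod

# Hypothetical example: one service state with 2 attribute classes
# (values 0, 1) and another with 3 attribute classes (values 0, 1, 2).
class_counts = [2, 3]

# Each attribute class combination is one tuple of state values.
combinations = list(product(*(range(n) for n in class_counts)))

# The number of combinations equals the product n_1 * n_2 * ... * n_m.
assert len(combinations) == prod(class_counts)
print(combinations)
# [(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2)]
```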
For each attribute class combination, a probability can be determined from the samples, and these probabilities constitute the joint probability distribution over the service states. According to one embodiment, the probability of each attribute class combination may be determined statistically. In the implementation scenario shown in fig. 1, the probability that thunder, rain, and wet ground all occur is $P(1, 1, 1) = n(1, 1, 1)/N$, where $N$ is the total number of samples and $n(1, 1, 1)$ is the number of samples in which thunder, rain, and wet ground all occur.
It is understood that the theoretical number of attribute class combinations grows dramatically when the number of service states, the number of attribute classes per service state, or both are large. For example, with 10 service states of 4 attribute classes each, the number of possible attribute class combinations is $4^{10}$ (about one million); when the number of service states increases to 100, it grows to $4^{100}$ (on the order of $10^{60}$). In practice, the samples do not exhaust these combinations, so the number of attribute class combinations that actually occur may be much smaller than the theoretical number. In the worst case, each sample corresponds to a distinct attribute class combination, and the number of combinations that actually occur equals the number of samples (for example, 100,000). Thus, even if the number of service states or the number of attribute classes per service state is large, the number of combinations that actually occur is far smaller than the number of possible combinations. In one embodiment, the joint probability distribution may be counted only over the attribute class combinations that actually occur in the samples. Optionally, this joint probability distribution can be described in the form of a histogram: each attribute class combination that actually occurs is one category of the histogram, and the value of that category is either the number of samples with that combination or the ratio of that number to the total number of samples.
According to one possible design, under the implementation architecture of this specification, the joint probability distribution over the different attribute class combinations may be counted by distributed computing. For example, a first statistical task is generated for a first attribute class combination and distributed to a first distributed device, so that the first distributed device counts the number of first samples corresponding to the first attribute class combination and feeds the count back. The number of first samples, or the ratio of that number to the total number of samples, is then used as the probability of the first attribute class combination. Here, the first attribute class combination is any attribute class combination that actually occurs in the samples. In this way, counting the samples of each attribute class combination can be treated as one task, and a plurality of distributed tasks can be generated. The generated tasks are distributed to computing devices (also referred to as distributed devices) in the distributed system, and each computing device counts and feeds back the corresponding number of samples. The probability distribution over a large number of samples can thus be counted quickly and efficiently. Data expansion is also easier: for example, when a new attribute class combination appears, a new task can be generated for it. Optionally, the distributed counting may be performed by means of MapReduce. For example, the various combination states can be mapped through a map structure, and the number of samples corresponding to each combination state is obtained through distributed reduce calls.
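The counting scheme can be sketched in a map/reduce style as follows. This is a single-process illustration under stated assumptions, not the distributed implementation of the specification: each "shard" stands in for the samples held by one distributed device, and the function names `map_count`, `reduce_counts`, and `joint_distribution` are hypothetical.

```python
from collections import Counter
from functools import reduce


def map_count(shard):
    """Map step: count attribute class combinations within one shard."""
    return Counter(tuple(sample) for sample in shard)


def reduce_counts(left, right):
    """Reduce step: merge two partial count tables."""
    return left + right


def joint_distribution(shards):
    """Return {combination: probability} for combinations that occur."""
    totals = reduce(reduce_counts, (map_count(s) for s in shards), Counter())
    n_samples = sum(totals.values())
    return {combo: count / n_samples for combo, count in totals.items()}


# Hypothetical samples encoded as (thunder, rain, ground wet),
# split across two workers.
shards = [
    [(1, 1, 1), (1, 1, 1), (0, 0, 0)],
    [(0, 1, 1), (0, 0, 0), (1, 0, 0), (0, 0, 0), (1, 1, 1)],
]
joint = joint_distribution(shards)
print(joint[(1, 1, 1)])  # 0.375, i.e. 3 of 8 samples
```

Only combinations that actually occur receive an entry, so the table size is bounded by the number of samples rather than by the theoretical number of combinations.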
Next, in step 202, based on the initial relationship network obtained by initializing the conditional relationship among the nodes in a predetermined manner, each local network is split based on each node, and each conditional relationship change task is generated for each local network to distribute to a plurality of distributed devices.
The initial relationship network is a relationship network obtained by initializing the conditional relationships between the nodes in a predetermined manner. The predetermined manner may be, for example, forming an empty network structure without connecting edges from the nodes corresponding to the service states, or forming a random relationship network by randomly adding a number of directed connecting edges between the nodes. When the initial relationship network is an empty network structure, all nodes are independent. In that case, any two nodes can be split out as a local network; alternatively, a connecting edge can be randomly added to the empty network structure, and the possible local networks are determined by expanding from that connecting edge.
Specifically, each node may be used in turn as a reference node for generating local networks, and the initial relationship network is traversed to generate the corresponding local networks. The reference node currently in use is referred to as the current node, and the number of local networks corresponding to the current node may be any natural number. For example, when the current node has a parent node in the initial relationship network, the network formed by the current node and its parent node may be used as the local network corresponding to the current node. When the current node has no parent node in the initial relationship network, the number of local networks corresponding to the current node may be 0 or more; for example, the current node may form local networks with the other nodes. When the relationship network shown in fig. 1 is used as the initial relationship network, "thunder-raining", "raining-ground wet", and so on can be used as local networks.
Further, the local networks may be distributed to a plurality of distributed devices. After receiving the conditional relationship change task of a local network, a single distributed device may obtain, based on the joint probability distribution determined in step 201, the network score gain of the local network with a changed conditional relationship relative to the local network before the change. The distributed devices here may be various devices with a certain computing capability, such as tablet computers, desktop computers, and smartphones. It will be appreciated that changing the conditional relationship of a local network means making one possible modification to the connection state of the connecting edge between two nodes. By way of example, the possible connection states of the connecting edge between nodes A and B include: A points to B, B points to A, and no connection. One possible modification may be adding a connecting edge, deleting a connecting edge, or changing the direction of a connecting edge, for example changing "A points to B" into "B points to A" or into "no connecting edge".
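The set of single modifications available in a local network can be enumerated as in the following sketch. The representation (a set of `(parent, child)` pairs) and the function names `candidate_changes` and `apply_change` are assumptions made for illustration only.

```python
def candidate_changes(nodes, edges):
    """List every single-edge change: add, delete, or reverse one edge.

    `edges` is a set of (parent, child) pairs; each unordered node pair
    is in one of three states: a -> b, b -> a, or not connected.
    """
    changes = []
    for i, a in enumerate(nodes):
        for b in nodes[i + 1:]:
            if (a, b) in edges:
                changes += [("delete", a, b), ("reverse", a, b)]
            elif (b, a) in edges:
                changes += [("delete", b, a), ("reverse", b, a)]
            else:
                changes += [("add", a, b), ("add", b, a)]
    return changes


def apply_change(edges, change):
    """Return a new edge set with one change applied."""
    op, parent, child = change
    new_edges = set(edges)
    if op == "add":
        new_edges.add((parent, child))
    elif op == "delete":
        new_edges.discard((parent, child))
    else:  # reverse the direction of an existing edge
        new_edges.discard((parent, child))
        new_edges.add((child, parent))
    return new_edges


local_nodes = ["thunder", "rain"]
local_edges = {("thunder", "rain")}
changes = candidate_changes(local_nodes, local_edges)
print(changes)
# [('delete', 'thunder', 'rain'), ('reverse', 'thunder', 'rain')]
```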
According to one embodiment, the conditional relationship change task generated for each local network may include determining the possible conditional relationship change items of the local network and determining the network score gain obtained when the local network is changed according to each such item. After receiving the task, the distributed device may modify the connection structure of the local network and score the result, thereby obtaining the network score gain for each single possible modification. Each time a possible modification is tried, the corresponding network score gain can be recorded and fed back. Optionally, the distributed device may feed back only the modifications whose network score gain is greater than 0, together with a record of the modification, for example: deleting the connecting edge from node A to node B, network score gain 0.3. In this way, the possible modifications in each local network and their network score gains are recorded, which supports the subsequent overall modification of the conditional relationships of the global network. Optionally, the distributed device may also record eliminated states. For example, if the current modification deletes the connecting edge between nodes A and B and yields a local network with a higher network score, the state in which that connecting edge exists is an eliminated state and is not considered subsequently, which avoids repeated calculation.
According to another embodiment, the conditional relationship change task generated for each local network may be a task of generating a locally optimal connection structure for that local network. After receiving the task, the distributed device may try modifying the connection state of the connecting edge between every two nodes of the local network. After traversing all currently possible modifications, it selects the modification with the largest network score gain and changes the local network structure accordingly. It then traverses the connecting edges between every two nodes again and applies the modification with the largest network score gain, repeating until no modification has a score gain greater than 0. In this case, the distributed device may feed back the locally optimal relationship network and the corresponding network score gain. It will be appreciated that when the local network contains more than 2 nodes, reaching the locally optimal connection structure may take several modifications. The network score gain of the local network may then be the sum of the network score gains of those modifications.
The network score may be obtained based on a network scoring model such as Heckerman, BIC score (Bayesian Information Criterion), or the like. Taking Heckerman as an example, assuming that the number of nodes is N, Heckerman can be described as:
Figure BDA0002383140080000151
Figure BDA0002383140080000152
Figure BDA0002383140080000153
Here $r_i$ is the number of state values (attribute classes) of node $i$, $q_i$ is the total number of state value combinations (attribute class combinations) of the parent nodes of node $i$, $N_{ijk}$ is the number of samples in which node $i$ takes its $k$-th state value and its parent nodes take their $j$-th state value combination, $N_{ij}$ is the number of samples in which the parent nodes of node $i$ take their $j$-th state value combination, $\Gamma$ is the gamma function (the Euler integral of the second kind), and $C$ is a constant.
Thus, each changed local network can be scored according to the joint probability distribution obtained in step 201, the score being the computed value of $P_H$ (hereinafter denoted $S$).
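As a rough illustration of how such a score is computed from the counts $N_{ij}$ and $N_{ijk}$, the sketch below uses the K2 metric (Cooper and Herskovits), a member of the Bayesian Dirichlet family of scores with uniform priors, evaluated in log space. It is an assumption that this form is close to the formula referred to above; the exact expression in the figures is not reproduced here, and the function names are hypothetical.

```python
from collections import Counter
from math import lgamma


def log_k2_family_score(samples, child, parents, arity):
    """Log K2 score of one node given its parents.

    samples: list of dicts mapping node name -> state value
    arity:   dict mapping node name -> number of state values r_i
    """
    r = arity[child]
    # N_ijk: count per (parent combination j, child value k).
    n_ijk = Counter((tuple(s[p] for p in parents), s[child]) for s in samples)
    # N_ij: count per parent combination j.
    n_ij = Counter(tuple(s[p] for p in parents) for s in samples)
    score = 0.0
    for j, count in n_ij.items():
        # log of Gamma(r_i) / Gamma(N_ij + r_i)
        score += lgamma(r) - lgamma(count + r)
        for k in range(r):
            # log of Gamma(N_ijk + 1), i.e. log(N_ijk!)
            score += lgamma(n_ijk[(j, k)] + 1)
    # Parent combinations that never occur contribute exactly 0.
    return score


def log_network_score(samples, parent_map, arity):
    """Sum the per-node scores; parent_map maps node -> tuple of parents."""
    return sum(
        log_k2_family_score(samples, child, parents, arity)
        for child, parents in parent_map.items()
    )


# Hypothetical data in which rain always co-occurs with thunder.
samples = [{"thunder": 1, "rain": 1}] * 4 + [{"thunder": 0, "rain": 0}] * 4
arity = {"thunder": 2, "rain": 2}
empty = log_network_score(samples, {"thunder": (), "rain": ()}, arity)
linked = log_network_score(samples, {"thunder": (), "rain": ("thunder",)}, arity)
gain = linked - empty  # positive: the edge thunder -> rain improves the score
```

Because the score decomposes into one term per node, changing a single connecting edge only requires rescoring the nodes whose parent sets change, which is what makes per-local-network scoring practical.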
The network score gain quantifies how much the network score increases. With the local network score before modification denoted $S_1$ and the score after modification denoted $S_2$, the network score gain can be expressed as the change in the network score, $S_2 - S_1$, or as the rate of change of the network score, $(S_2 - S_1)/S_1$, and so on. The larger the network score gain, the more strongly the network is changing for the better. If the network score gain is negative, the score has dropped and the change does not improve the relationship network.
In this step, local networks are split out according to the network structure and distributed across devices, so that the conditional relationship change items of the initial relationship network are evaluated in a distributed manner. This greatly reduces the amount of calculation on any single device and addresses the bottleneck of excessive computation in constructing the conditional relationship network.
Next, in step 203, iteratively update the condition relationship in the initial relationship network according to each network score gain obtained from the multiple distributed devices until there is no network score gain satisfying a predetermined gain condition, so as to obtain a current relationship network. It will be appreciated that the purpose of this step is to make the network structure adjust in the direction of higher network scores. Therefore, each time a network modification is made, a conditional relationship changing term with a network score gain as high as possible can be selected.
According to one possible design, the conditional relationship change item with the largest network score gain is selected from the conditional relationship change items and network score gains fed back by the distributed devices, and the initial relationship network is updated with it to obtain an intermediate relationship network. The intermediate relationship network is then updated iteratively. A single iteration of updating the intermediate relationship network may be: determining, for the intermediate relationship network, each candidate condition change item and its corresponding network score gain, and updating the intermediate relationship network with the candidate condition change item that maximizes the network score gain.
Here, the candidate condition change items determined for the intermediate relationship network may be all possible conditional relationship changes in the current intermediate relationship network, such as changes within a local network and changes between local networks. Notably, the candidate condition change items at least exclude any change that would undo a modification already made. For example, if the previous step changed the connecting edge pointing from node A to node B into a connecting edge pointing from node B to node A, the subsequent process does not change it back. The reason is that a selected modification yields a positive gain relative to the network structure before it: if deleting the connecting edge from node A to node B was selected, the deletion yields a positive gain compared with the structure in which the edge exists, so adding the edge back necessarily yields a negative gain. A change that restores the original state can therefore be excluded.
When a single candidate condition change item coincides with a conditional relationship change item of a local network fed back by the distributed devices, the network score gain of that conditional relationship change item may be used as the network score gain of the candidate condition change item. This facilitates local optimization of the conditional relationship network. For example, if the conditional relationship change task generated for each local network includes determining the possible conditional relationship change items of the local network and the network score gain obtained under each of them, then for a candidate condition change item that lies inside a certain local network, the network score gain can be looked up among the gains already obtained for that local network.
Under the condition that a single candidate condition change item is inconsistent with a condition relation change item of any local network fed back by a plurality of distributed devices, a first network score of a current intermediate relation network and a second network score of a relation network after the single candidate condition change item is added to the intermediate relation network can be respectively determined, and corresponding network score gains are determined based on comparison between the second network score and the first network score. In general, the network score gain may be determined in this manner for candidate condition alterations involving two nodes respectively belonging between two local networks.
It can be understood that when the obtained relationship network is closer to the real relationship network, the network score gain will not be greatly increased, and at this time, the continuous modification of the network structure may increase the amount of calculation and affect the accuracy of the relationship network. Thus, in one embodiment, the predetermined gain condition that determines whether to stop iteratively updating the initial relationship network may be that the network score gain corresponding to the current candidate condition change is less than a predetermined threshold. The predetermined threshold may be a value set according to manual experience (e.g. 0 or a positive number close to 0, such as 0.0001), or may be a value determined by a machine learning method, which is not described herein again.
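The iterative update with a gain threshold can be sketched as a greedy search. The sketch below runs in a single process and takes the scoring function as a parameter; the names `hill_climb`, `neighbours`, and `is_acyclic`, the explicit acyclicity check, and the toy scoring function in the usage example are all assumptions introduced for illustration, not details given in the specification.

```python
def is_acyclic(nodes, edges):
    """Kahn-style check that the directed graph contains no cycle."""
    indegree = {n: 0 for n in nodes}
    for _, child in edges:
        indegree[child] += 1
    frontier = [n for n in nodes if indegree[n] == 0]
    visited = 0
    while frontier:
        node = frontier.pop()
        visited += 1
        for parent, child in edges:
            if parent == node:
                indegree[child] -= 1
                if indegree[child] == 0:
                    frontier.append(child)
    return visited == len(nodes)


def neighbours(nodes, edges):
    """Yield every edge set reachable by one add, delete, or reverse."""
    for a in nodes:
        for b in nodes:
            if a == b:
                continue
            if (a, b) in edges:
                yield edges - {(a, b)}                # delete
                yield (edges - {(a, b)}) | {(b, a)}   # reverse
            elif (b, a) not in edges:
                yield edges | {(a, b)}                # add


def hill_climb(nodes, edges, score_fn, threshold=1e-4):
    """Apply the best-gain change until no gain exceeds the threshold."""
    edges = frozenset(edges)
    current = score_fn(edges)
    while True:
        best_gain, best_edges = threshold, None
        for candidate in neighbours(nodes, edges):
            if not is_acyclic(nodes, candidate):
                continue
            gain = score_fn(candidate) - current
            if gain > best_gain:
                best_gain, best_edges = gain, frozenset(candidate)
        if best_edges is None:
            return edges
        edges, current = best_edges, current + best_gain


# Toy stand-in for a network score: closeness to a known target structure.
target = {("thunder", "rain"), ("rain", "wet")}
toy_score = lambda e: -len(set(e) ^ target)
result = hill_climb(["thunder", "rain", "wet"], set(), toy_score)
```

Because every accepted change has a strictly positive gain, the search never returns to a structure it has already left, which is consistent with excluding changes that undo earlier modifications.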
In the process of iteratively updating the conditional relationships in the initial relationship network according to the network score gains, each update may use the candidate condition change item with the currently largest network score gain, which yields a relatively good result relationship network for the given initial relationship network. Notably, the result relationship network only records whether a conditional relationship exists between nodes, not a quantified conditional relationship. The result relationship network can be used directly as the current relationship network for the subsequent determination of quantified conditional relationships. Alternatively, it can be compared with other result relationship networks, and a better one is taken as the current relationship network, for example the relationship network with the highest network score, or a network obtained by voting on the connecting edge between every two nodes. The other result relationship networks can each be determined by the method of steps 202 and 203. Optionally, different initialization manners may be used to obtain the initial relationship networks on which the other result relationship networks are based, which reduces the influence of the initialization manner on the result relationship network.
Then, in step 204, according to the finally determined current network structure and the joint probability distribution, each conditional probability distribution corresponding to each node is determined for the current network structure, so as to obtain a conditional relationship network formed by a plurality of nodes corresponding to the plurality of service states.
It can be understood that the finally updated current network structure is a relationship network optimized by network scoring, which describes the conditional relationship between nodes through directed connecting edges, and is equivalent to the determination of the network structure, and for the relationship network for service processing, the parameters of the network structure need to be further determined. For a conditional relational network, the parameter of its network structure may be a conditional probability distribution.
In the conventional technology, the network parameters of the graph model are usually described by the weights of the connecting edges, and for the conditional relationship network, the network parameters may describe the conditional probability of the current node under the condition of at least one parent node. Since each node has at least 2 values, the conditional probability includes a plurality of situations, constituting a conditional probability distribution. For each node for which a parent exists, its corresponding conditional probability distribution may be determined. It is understood that in the current network structure determined in step 203, there must be a node having a parent node (superordinate node). The following describes a specific case of conditional probability distribution, taking as an example any node (hereinafter referred to as a second node) in which a parent node exists.
The second node corresponds to a second service state. A parent node whose connecting edge points to the second node has a certain causal relationship with the second node; in other words, the attribute class of the parent node may affect the attribute class of the second node. The conditional probability distribution corresponding to the second node is referred to as the second probability distribution. It can be understood that the second probability distribution describes the probability of each attribute class of the second service state under the condition of the attribute classes of the parent nodes of the second node.
For a pair of nodes, the Bayesian principle can be described as follows. For random events A and B, the conditional probability $P(A \mid B)$ is the probability that A occurs given that B occurs. If the prior probability of event B is $P(B)$, the probability that A occurs together with B is $P(A, B) = P(B)\,P(A \mid B)$. When event B has multiple mutually exclusive states, with any one state denoted by $i$, the probability that A occurs can be written as $P(A) = \sum_i P(B_i)\,P(A \mid B_i)$.
By way of example, referring to fig. 1, take thunder as event B and rain as event A, and write each sample count as $n(\cdot,\cdot,\cdot)$ over the values of (thunder, rain, ground wet). Over all samples, the number of times thunder occurs is $n_d = n(1,1,1) + n(1,0,1) + n(1,1,0) + n(1,0,0)$, and the number of times rain occurs together with thunder is $n_{dx} = n(1,1,1) + n(1,1,0)$. The prior probability of thunder is then $P_d = n_d/N$, and the conditional probability of rain given thunder is $P(x \mid d) = n_{dx}/n_d$. For example, $P(x \mid d) = 1$ means that thunder inevitably brings rain, and $P(x \mid d) = 0$ means that thunder never brings rain. Similarly, the conditional probabilities that rain does or does not occur when there is no thunder can be determined. The number of times thunder does not occur is $n_{\sim d} = n(0,1,1) + n(0,0,1) + n(0,1,0) + n(0,0,0)$, and the number of times rain occurs without thunder is $n_{\sim dx} = n(0,1,1) + n(0,1,0)$. The conditional probability of rain given no thunder is $P(x \mid \sim d) = n_{\sim dx}/n_{\sim d}$. The conditional probability distribution corresponding to the node "raining" therefore includes: $P(\text{rain}=1 \mid \text{thunder}=1) = a = P(x \mid d)$, $P(\text{rain}=0 \mid \text{thunder}=1) = 1 - a$, $P(\text{rain}=1 \mid \text{thunder}=0) = b = P(x \mid \sim d)$, and $P(\text{rain}=0 \mid \text{thunder}=0) = 1 - b$, where $a$ and $b$ are obtained by the above calculation. The counts $n(1,1,1)$, $n(1,0,1)$, $n(1,1,0)$, $n(1,0,0)$, and so on are available from the joint probability distribution counted in step 201.
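The calculation above can be reproduced from a table of joint counts, as in the following sketch. The counts are invented for illustration, tuples are ordered as (thunder, rain, ground wet), and the function name `conditional_distribution` is hypothetical.

```python
def conditional_distribution(counts, child_idx, parent_idx):
    """Return {parent_value: {child_value: P(child | parent)}}.

    counts maps each attribute class combination (a tuple) to its
    number of samples.
    """
    pair_counts, parent_counts = {}, {}
    for combo, n in counts.items():
        p, c = combo[parent_idx], combo[child_idx]
        parent_counts[p] = parent_counts.get(p, 0) + n
        pair_counts[(p, c)] = pair_counts.get((p, c), 0) + n
    return {
        p: {c: n / parent_counts[p]
            for (pp, c), n in pair_counts.items() if pp == p}
        for p in parent_counts
    }


# Hypothetical counts n(thunder, rain, ground wet) over N = 100 samples.
counts = {(1, 1, 1): 30, (1, 0, 0): 10, (0, 1, 1): 12, (0, 0, 0): 48}
cpd = conditional_distribution(counts, child_idx=1, parent_idx=0)

a = cpd[1][1]  # P(rain=1 | thunder=1) = 30 / 40
b = cpd[0][1]  # P(rain=1 | thunder=0) = 12 / 60
print(a, b)    # 0.75 0.2
```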
When the second node has multiple parent nodes, its conditional probability distribution consists of the probabilities of each attribute class of the second service state under each possible attribute class combination of the parent nodes. For example, suppose nodes A and B are the parent nodes of node C, node A takes the values 0 and 1, node B takes the values 0 and 1, and node C takes the values 0, 1, and 2. The conditional probability distribution corresponding to node C then includes: {P(C = 0 | A = 0, B = 0), P(C = 1 | A = 0, B = 0), P(C = 2 | A = 0, B = 0)}, {P(C = 0 | A = 0, B = 1), P(C = 1 | A = 0, B = 1), P(C = 2 | A = 0, B = 1)}, {P(C = 0 | A = 1, B = 0), P(C = 1 | A = 1, B = 0), P(C = 2 | A = 1, B = 0)}, and {P(C = 0 | A = 1, B = 1), P(C = 1 | A = 1, B = 1), P(C = 2 | A = 1, B = 1)}. Each probability is calculated in the same way as above.
Optionally, a certain perturbation may be added when determining the prior probability, so that the prior probability still receives a value when the sample size is too small or too few samples match a certain state. For example, the prior probability of thunder may be $P_d = (n_d + s)/(N + t)$, where $s$ and $t$ are preset perturbation values, for example both integers close to 0, with $t$ greater than $s$.
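A minimal sketch of this perturbed estimate follows. The default values `s=1` and `t=2` are assumptions chosen for illustration (they correspond to add-one smoothing for a two-valued state); the specification only requires that both be small and that `t` exceed `s`.

```python
def smoothed_prior(n_state, n_total, s=1, t=2):
    """Prior probability with perturbation: (n_state + s) / (n_total + t)."""
    if not 0 <= s < t:
        raise ValueError("expected 0 <= s < t")
    return (n_state + s) / (n_total + t)


print(smoothed_prior(0, 0))   # 0.5: still defined with no samples at all
print(smoothed_prior(3, 8))   # 0.4 instead of the raw 3 / 8
```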
In one possible design, the conditional relationship network may be used not only for service processing that predicts an effect from its cause, but also for service processing that predicts a cause from its effect. For example, the weather condition, the daily business turnover of a clothing merchant, and the daily business turnover of a rain gear merchant are taken as service states corresponding to different nodes. Normally, the daily business turnover of the clothing merchant and of the rain gear merchant is predicted from the weather condition; conversely, the weather condition can be inferred from the daily business turnover of the two merchants. In this case, with the cause denoted A and the effect denoted B, the Bayesian principle is as follows:
$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B \mid A)\,P(A) + P(B \mid \sim A)\,P(\sim A)}$$
where $\sim A$ denotes non-A, that is, A does not occur (for example, thunder does not occur). $P(B \mid A)$, $P(A)$, $P(B \mid \sim A)$, and $P(\sim A)$ can be determined from the joint probability distribution table and are not described again here.
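For a two-valued cause, reverse inference with Bayes' rule reduces to a few arithmetic steps, sketched below. The function name `posterior_cause` and the numbers in the usage example are illustrative assumptions.

```python
def posterior_cause(p_b_given_a, p_b_given_not_a, p_a):
    """P(A | B) for a binary cause A, given P(B | A), P(B | ~A), and P(A)."""
    # Total probability of observing the effect B.
    evidence = p_b_given_a * p_a + p_b_given_not_a * (1.0 - p_a)
    if evidence == 0.0:
        raise ValueError("the observed effect has zero probability")
    return p_b_given_a * p_a / evidence


# Hypothetical values: P(rain | thunder) = 0.75, P(rain | no thunder) = 0.2,
# P(thunder) = 0.4. Observing rain raises the probability of thunder.
p = posterior_cause(0.75, 0.2, 0.4)
print(round(p, 4))  # 0.7143
```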
That is, for a third node where a child node exists, the reverse conditional probability distribution may also be determined by the third node and its child nodes.
According to an embodiment, when determining each conditional probability distribution corresponding to each node, the determining may be performed by a distributed task, for example, a task is generated by determining the conditional probability distribution of the first node, and is distributed to the first distributed device, so that the first distributed device determines the first conditional probability distribution of the first node based on the joint probability distribution determined in step 201, and performs feedback.
In this way, a conditional relationship network comprising the connection relationships and the conditional probability distributions (which correspond to the network parameters) is obtained. The conditional relationship network can be used for service processing that predicts the attribute classes of other nodes given the current states of several nodes.
Referring to fig. 3, a flow of a method for predicting node states by the conditional relationship network obtained by the embodiment shown in fig. 2 is described. As can be seen from fig. 2, the conditional relationship network includes nodes corresponding to a plurality of predetermined service states, and describes the conditional relationship through directed connection edges between the nodes. The parent node is the condition of the child node, and the child node corresponds to the conditional probability distribution under the condition of various attribute category combinations of the service state corresponding to the parent node.
As shown in fig. 3, a method according to one embodiment for service processing that predicts the attribute classes of service states through a conditional relationship network includes the following steps: step 301, acquiring the attribute classes of a plurality of service states from the service data to be processed; step 302, taking the nodes of the conditional relationship network other than the nodes corresponding to the plurality of service states as nodes to be predicted, and generating a plurality of subtasks for sampling the attribute classes of the nodes to be predicted; step 303, distributing the subtasks to a plurality of distributed devices, so that each distributed device samples the attribute class of at least one node to be predicted according to the conditional relationship network and the attribute classes corresponding to the plurality of service states; step 304, determining the attribute class corresponding to each node to be predicted based on the sampling results.
First, in step 301, attribute categories corresponding to a plurality of service states are obtained from service data to be processed. The service data may be data describing at least one service state in a corresponding scenario, for example, in the scenario shown in fig. 1, the service data may be "rained," and the attribute category of the service state that may correspond to "raining" is "occurrence. Alternatively, the node corresponding to the traffic state of "raining" in the conditional relationship network may have a state value of 1.
Next, in step 302, nodes other than the nodes corresponding to the plurality of service states in the conditional access network are used as nodes to be predicted, and a plurality of subtasks for performing attribute type sampling on the nodes to be predicted are generated. It can be understood that in the conditional relationship network, except for the node with the known attribute category, the attribute categories of other nodes to be determined may be all used as the nodes to be predicted.
One subtask may be a task of sampling for one node to be predicted for multiple times, or may be a task of completing sampling for all nodes to be predicted once. The subtask can sample the attribute type of at least one node to be predicted according to the conditional relationship network and the attribute types of the nodes determined by the service data. Alternatively, the sampling result of the node to be predicted may be determined under its corresponding conditional probability distribution based on the attribute categories of its respective parent nodes.
As an example, assume that a conditional relationship network includes four nodes A, B, C and D corresponding to four service states. If the service data includes description information of the service states corresponding to A and C, the attribute categories of nodes A and C can be determined, and nodes B and D are the nodes to be predicted. The attribute categories of nodes A and C may be represented by state values, such as A=1 and C=0. One subtask may be a task of sampling the attribute categories of nodes B and D once based on the conditional probability distribution of each node in the conditional relationship network. Since the sampling is performed according to the conditional probability distributions, the sampling results of nodes B and D follow the corresponding conditional probability distributions over a large number of samplings. Assuming that the parent node of node B is A, and the candidate attribute categories of node B correspond to the state values 0 and 1, the sampling result of node B satisfies the following probability distribution: the attribute category corresponding to 0 is taken with probability P(B=0 | A=1), and the attribute category corresponding to 1 is taken with probability P(B=1 | A=1). Assuming that the parent nodes of node D are nodes A and C, and the candidate attribute categories of node D are represented by the state values 0 and 1, the sampling result of node D satisfies the following probability distribution: the attribute category corresponding to 0 is taken with probability P(D=0 | A=1, C=0), and the attribute category corresponding to 1 is taken with probability P(D=1 | A=1, C=0).
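The single-sampling subtask in this example can be sketched as follows. This is a minimal illustration, not the implementation of the embodiment: the probability values in `CPT_B` and `CPT_D`, and the helper names `sample_node` and `run_subtask`, are assumptions introduced here.

```python
import random

# Hypothetical conditional probability tables (illustrative values only).
# Keys are tuples of parent state values; values map a child state value
# to its probability.
CPT_B = {(1,): {0: 0.3, 1: 0.7}, (0,): {0: 0.6, 1: 0.4}}        # P(B | A)
CPT_D = {(1, 0): {0: 0.8, 1: 0.2}, (1, 1): {0: 0.5, 1: 0.5},
         (0, 0): {0: 0.4, 1: 0.6}, (0, 1): {0: 0.1, 1: 0.9}}    # P(D | A, C)


def sample_node(cpt, parent_values, rng):
    """Draw one state value from the distribution selected by the parent values."""
    dist = cpt[tuple(parent_values)]
    states = list(dist)
    return rng.choices(states, weights=[dist[s] for s in states], k=1)[0]


def run_subtask(evidence, rng):
    """One subtask: sample B and D once, given the known values of A and C."""
    b = sample_node(CPT_B, [evidence["A"]], rng)
    d = sample_node(CPT_D, [evidence["A"], evidence["C"]], rng)
    return {"B": b, "D": d}


rng = random.Random(0)
evidence = {"A": 1, "C": 0}
samples = [run_subtask(evidence, rng) for _ in range(10000)]

# Over many samplings the empirical frequency approaches P(B=1 | A=1).
freq_b1 = sum(s["B"] for s in samples) / len(samples)
print(f"empirical P(B=1 | A=1) ~ {freq_b1:.3f}")
```

With the assumed tables, the printed frequency should lie close to 0.7, which illustrates the statement that the sampling results follow the conditional distribution only over a large number of samplings.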
Then, through step 303, each subtask is distributed to a plurality of distributed devices, so that each distributed device performs attribute class sampling on at least one node to be predicted according to the conditional relationship network and each attribute class corresponding to each of the plurality of service states. A distributed device may handle one or more sampling tasks.
Each time the distributed device samples a node to be predicted, a corresponding sampling result is obtained. Because the sampling results of a node to be predicted follow the corresponding conditional probability distribution only over a large number of samplings, recording the sampling results alone may not be accurate enough. In an alternative implementation, the confidence of each sampling result may be recorded at the same time. The confidence may be the product of a predetermined weight and the conditional probability of the node to be predicted under the given attribute categories of its respective parent nodes. For a parent node whose state value is given in the service data, the corresponding weight is the conditional probability corresponding to that value. The predetermined weight is an initial weight set for the sampling result of the node to be predicted, and is, for example, 1.
Still taking the conditional relationship network including nodes A, B, C and D in step 302 as an example, assuming that the current sampling result of node D is the attribute category corresponding to 0, the corresponding confidence is: P(D=0 | A=1, C=0) × predetermined weight. If the conditional relationship network further includes a node E, and node D is the only parent node of node E, then when the current sampling result of node E is 0, the confidence may be: P(D=0 | A=1, C=0) × P(E=0 | D=0) × predetermined weight. It should be noted that the sampling may be performed from top to bottom along the direction of the connecting edges in the conditional relationship network, so that for each node to be predicted, the values of all its parent nodes have already been determined. For each node whose attribute category can be obtained from the service data to be processed, that attribute category can be used directly as input.
In an optional implementation manner, one confidence may be determined for the values of all nodes to be predicted in a single sampling, where the confidence is, for example, the product of the confidences corresponding to the nodes at the last stage after all nodes have been traversed from top to bottom. For example, in the above example where B, D and E are the nodes to be predicted, multiple complete sampling results of the nodes to be predicted are obtained as follows:
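The two confidence expressions above can be evaluated as in the following sketch. The probability values and the function names `confidence_d` and `confidence_e` are assumptions made for illustration; only the form of the product comes from the text.

```python
PREDETERMINED_WEIGHT = 1.0  # initial weight of a sampling result, 1 as in the text

# Hypothetical conditional probabilities (illustrative values only).
P_D_GIVEN_A1_C0 = {0: 0.8, 1: 0.2}                       # P(D | A=1, C=0)
P_E_GIVEN_D = {0: {0: 0.6, 1: 0.4}, 1: {0: 0.3, 1: 0.7}}  # P(E | D)


def confidence_d(d):
    """Confidence of sampling D=d: P(D=d | A=1, C=0) x predetermined weight."""
    return P_D_GIVEN_A1_C0[d] * PREDETERMINED_WEIGHT


def confidence_e(d, e):
    """Confidence of sampling E=e after D=d was sampled:
    P(D=d | A=1, C=0) x P(E=e | D=d) x predetermined weight."""
    return P_D_GIVEN_A1_C0[d] * P_E_GIVEN_D[d][e] * PREDETERMINED_WEIGHT


conf_d0 = confidence_d(0)
conf_e0 = confidence_e(0, 0)
print(conf_d0, conf_e0)
```

Because D is sampled before E in top-down order, the factor for D is already available when the confidence for E is computed.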
[Table image BDA0002383140080000231 not reproduced: sampling results of nodes B, D and E with the confidence of each sampling.]
The first three columns correspond to the sampling results of nodes B, D and E to be predicted, each row corresponds to one sampling, and the last column gives the confidence of that sampling result.
Through steps 302 and 303, sampling of the nodes to be predicted can be performed in a distributed manner, so that service processing that predicts attribute categories of service states through a conditional relationship network remains feasible when the amount of data to be processed is large, which addresses the data volume bottleneck of the conditional relationship network.
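The distribution of sampling subtasks in steps 302 and 303 can be sketched with a local worker pool. This is only a stand-in under stated assumptions: a thread pool takes the place of the distributed devices, and the value `P_B1_GIVEN_A1` and the names `sampling_subtask` and `distribute` are hypothetical.

```python
import random
from concurrent.futures import ThreadPoolExecutor

P_B1_GIVEN_A1 = 0.7  # hypothetical value of P(B=1 | A=1)


def sampling_subtask(seed, draws):
    """One subtask: sample node B `draws` times given A=1."""
    rng = random.Random(seed)
    return [1 if rng.random() < P_B1_GIVEN_A1 else 0 for _ in range(draws)]


def distribute(num_subtasks, draws_per_subtask, num_workers=4):
    """Submit the subtasks to a pool of workers and gather all sampling results."""
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        futures = [pool.submit(sampling_subtask, seed, draws_per_subtask)
                   for seed in range(num_subtasks)]
        results = []
        for future in futures:
            results.extend(future.result())
    return results


all_samples = distribute(num_subtasks=8, draws_per_subtask=500)
print(len(all_samples), sum(all_samples) / len(all_samples))
```

Each subtask needs only the conditional distributions and the known attribute categories, so a single worker never has to hold all sampling results.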
Further, in step 304, based on each sampling result, each attribute category corresponding to each node to be predicted is determined.
In one embodiment, for a node to be predicted, the attribute class with the highest frequency of occurrence in each sampling result may be determined as the corresponding attribute class. For example, in the example in step 303, the attribute class of the node B to be predicted is the attribute class corresponding to the state value 1.
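A minimal sketch of this frequency-based rule follows. The list `b_samples` is an assumption: it is inferred from the probability calculation for node B given later in this description (three samplings with state value 1 and one with state value 0), since the table itself is not reproduced here.

```python
from collections import Counter


def most_frequent_class(sampled_values):
    """Return the state value that occurs most often among the sampling results."""
    return Counter(sampled_values).most_common(1)[0][0]


# Sampling results of node B, inferred from the later worked example.
b_samples = [1, 0, 1, 1]
predicted_b = most_frequent_class(b_samples)
print(predicted_b)
```

When two values occur equally often, `Counter.most_common` returns the one encountered first, so a real implementation would need an explicit tie-breaking rule.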
In another embodiment, the attribute category corresponding to the sampling result with the highest confidence coefficient may be determined as the current attribute category of the corresponding node to be predicted.
In another embodiment, for a node to be predicted, the probability distribution of the node over its candidate attribute categories may be determined based on the sampling results, and an attribute category that satisfies a predetermined probability condition in the probability distribution is then selected as the current attribute category of the node. The predetermined probability condition is, for example, that the probability value is the maximum in the probability distribution and/or is greater than a predetermined threshold.
For example, in the example of step 303, the probability distribution of the node B on each candidate attribute category may be:
P(B=1)=(0.1+0.4+0.3)/(0.1+0.2+0.3+0.4)=0.8
P(B=0)=(0.2)/(0.1+0.2+0.3+0.4)=0.2;
the probability distribution of node D over each candidate attribute class may be:
P(D=1)=0.2/(0.1+0.2+0.3+0.4)=0.2
P(D=0)=(0.1+0.3+0.4)/(0.1+0.2+0.3+0.4)=0.8;
the probability distribution of node E over each candidate attribute class may be:
P(E=1)=(0.1+0.3)/(0.1+0.2+0.3+0.4)=0.4
P(E=0)=(0.2+0.4)/(0.1+0.2+0.3+0.4)=0.6.
Assuming that the predetermined probability condition is the maximum probability value in the corresponding probability distribution, the predicted state values of node D and node E are both 0, and their attribute categories are the attribute categories corresponding to the state value 0; the predicted state value of node B is 1, and its attribute category is the attribute category corresponding to the state value 1.
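The confidence-weighted calculation above can be reproduced with the following sketch. The rows in `SAMPLES` are an assumption: they are inferred from the numerators of the formulas above (their order is arbitrary), and the function names are introduced here for illustration.

```python
from collections import defaultdict

# Four samplings of nodes B, D, E with their confidences, inferred from the
# formulas in the text (the original table image is not available).
SAMPLES = [
    {"B": 1, "D": 0, "E": 1, "confidence": 0.1},
    {"B": 0, "D": 1, "E": 0, "confidence": 0.2},
    {"B": 1, "D": 0, "E": 1, "confidence": 0.3},
    {"B": 1, "D": 0, "E": 0, "confidence": 0.4},
]


def weighted_distribution(samples, node):
    """Probability of each state value: sum of the confidences of the samplings
    with that value, divided by the sum of all confidences."""
    totals = defaultdict(float)
    for sample in samples:
        totals[sample[node]] += sample["confidence"]
    norm = sum(totals.values())
    return {state: weight / norm for state, weight in totals.items()}


def predict(samples, node):
    """Apply the maximum-probability condition to the weighted distribution."""
    dist = weighted_distribution(samples, node)
    return max(dist, key=dist.get)


predictions = {node: predict(SAMPLES, node) for node in ("B", "D", "E")}
print(predictions)
```

A threshold-based condition could be added by checking the selected probability against a predetermined threshold before accepting the prediction.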
In more embodiments, the attribute categories corresponding to the nodes to be predicted may also be determined based on the sampling results in other manners, which are not described herein again.
Reviewing the above process, in the process of constructing the conditional relational network, based on the joint probability distribution of the attribute types of each service state, when the connection edge in the initial relational network is updated, a plurality of local networks are split to perform distributed data processing, so that the data processed by a single task only comprises the joint probability distribution data and the local network data with one node as the reference. Furthermore, in the process of processing the service data by using the conditional relational network, based on the concept of distributed data processing, a plurality of subtasks for performing attribute type sampling on the node to be predicted are distributed to a plurality of distributed devices for processing, so that the data processing amount of a single subtask is greatly reduced. In summary, embodiments provided herein can address the data volume bottleneck problem in the practice of conditional relational network applications.
According to an embodiment of another aspect, an apparatus for constructing a conditional relationship network is also provided. The conditional relationship network is used for describing conditional relationships among a plurality of preset service states, and comprises nodes respectively corresponding to the service states in the service data and directed connection edges indicating the conditional relationships among the nodes, wherein a single service state corresponds to at least one attribute category. Fig. 4 shows a schematic block diagram of an apparatus for building a conditional relationship network according to one embodiment. As shown in fig. 4, the apparatus 400 for constructing a conditional relationship network includes:
a statistical unit 41 configured to determine a joint probability distribution of the attribute types of each service state according to the attribute types of each sample in the plurality of service states, where each sample corresponds to each sample service data;
a splitting unit 42 configured to split each local network based on an initial relationship network obtained by initializing a conditional relationship between each node in a predetermined manner, and generate each conditional relationship change task for each local network, respectively, to distribute to a plurality of distributed devices, so that a single distributed device receives the conditional relationship change task of the corresponding local network, and obtains a network score gain of the local network after changing the conditional relationship compared with the initial local network based on joint probability distribution under a condition that the conditional relationship of the corresponding local network is changed;
A network structure determining unit 43 configured to iteratively update the conditional relationship in the initial relationship network according to each network score gain obtained from the multiple distributed devices until there is no network score gain satisfying a predetermined gain condition, so as to obtain a current relationship network;
a probability distribution determining unit 44 configured to determine, based on the current relationship network and the joint probability distribution, each conditional probability distribution corresponding to each node for the current relationship network, so as to obtain a conditional relationship network formed by each node corresponding to a plurality of service states, so that the conditional relationship network is used for predicting service processing of attribute classes of other nodes under the condition of the current attribute classes of the given plurality of nodes, and the conditional probability distribution is used for describing the probability that the service state of the corresponding node is in each attribute class under the condition of the service state corresponding to each parent node of the corresponding node.
According to one embodiment, the joint probability distribution includes probabilities respectively corresponding to various attribute class combinations in a plurality of service states, and a single attribute class combination is a combination formed by taking one attribute class in each of the plurality of service states;
The statistical unit 41 may be further configured to:
generating a first statistical task aiming at a first attribute category combination in various attribute category combinations;
distributing the first statistical task to a first distributed device, so that the first distributed device counts the number of first samples corresponding to the first attribute category combination and feeds back that number;
and taking the number of the first samples or the ratio of the number of the first samples to the number of the whole samples as the probability corresponding to the first attribute type combination.
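The counting performed by the statistical unit can be sketched as follows. This is a simplified single-process illustration: the text assigns each attribute category combination to its own statistical task on a distributed device, whereas the sketch counts all combinations in one pass. The state names `rain` and `wet_road` and the sample data are hypothetical.

```python
from collections import Counter


def joint_distribution(samples, states):
    """Estimate the probability of each attribute category combination as the
    ratio of its sample count to the total number of samples."""
    counts = Counter(tuple(sample[name] for name in states) for sample in samples)
    total = len(samples)
    return {combo: count / total for combo, count in counts.items()}


# Hypothetical sample service data with two service states.
samples = [
    {"rain": 1, "wet_road": 1},
    {"rain": 1, "wet_road": 1},
    {"rain": 0, "wet_road": 0},
    {"rain": 0, "wet_road": 1},
]
joint = joint_distribution(samples, ["rain", "wet_road"])
print(joint)
```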
In one embodiment, the attribute classes in a single traffic state are each described by a respective state value on a single node corresponding to the single traffic state.
In an embodiment, the predetermined manner includes that a predetermined number of directed connection edges are randomly added between nodes, the plurality of service states include a first service state, the first service state corresponds to a first node, and the splitting unit 42 is further configured to:
and determining a local network formed by the first node and the father node thereof from the initial relation network, and taking the local network as a first local network split by taking the first node as a reference.
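Splitting the initial relationship network into local networks, each formed by a node and its parent nodes, can be sketched as follows. The edge list reuses the node names of the earlier example and, like the function name `split_local_networks`, is an assumption made for illustration.

```python
def split_local_networks(nodes, edges):
    """For each node, build the local network made of the node and its parents.
    `edges` is a list of directed (parent, child) pairs."""
    parents = {node: [] for node in nodes}
    for parent, child in edges:
        parents[child].append(parent)
    return {node: {"node": node, "parents": sorted(parents[node])}
            for node in nodes}


# Hypothetical initial relationship network: A -> B, A -> D, C -> D, D -> E.
edges = [("A", "B"), ("A", "D"), ("C", "D"), ("D", "E")]
local_networks = split_local_networks(["A", "B", "C", "D", "E"], edges)
print(local_networks["D"])
```

Each local network could then be packaged into one conditional relationship change task, so that a task carries only the joint probability data and the local network of a single node.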
According to one embodiment, the network structure determining unit 43 may be further configured to:
Updating the condition relation change item of the local network corresponding to the maximum network score gain to the initial relation network to obtain an intermediate relation network;
determining each candidate condition change item aiming at the intermediate relationship network, and each network score gain corresponding to each candidate condition change item respectively, wherein under the condition that a single candidate condition change item is consistent with the condition relationship change item of the local network fed back by the plurality of distributed devices, the network score gain of the condition relationship change item of the corresponding local network is used as the network score gain of the single candidate condition change item, and under the condition that the single candidate condition change item is inconsistent with the condition relationship change item of the local network fed back by the plurality of distributed devices, the corresponding network score gain is determined by comparing the relationship network added with the single candidate condition change item in the intermediate relationship network with the current intermediate relationship network;
the intermediate relationship network is updated with the candidate conditional alteration that maximizes the gain of the network score.
In one embodiment, the predetermined gain condition is that the network score gain is positive.
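The iterative update with a positive-gain stopping condition can be sketched as a greedy loop. This is a toy illustration under loud assumptions: `toy_gain` stands in for the network score gain that the text obtains from the distributed devices, the `TARGET` edge set exists only to make the toy gain computable, gains are recomputed locally instead of being reused from device feedback, and acyclicity of the network is not checked.

```python
def candidate_changes(edges, possible_edges):
    """List every single-edge addition or removal available from the current edges."""
    adds = [("add", edge) for edge in possible_edges if edge not in edges]
    removes = [("remove", edge) for edge in edges]
    return adds + removes


def greedy_update(initial_edges, possible_edges, gain_of):
    """Repeatedly apply the change with the largest gain until no gain is positive."""
    edges = set(initial_edges)
    while True:
        scored = [(gain_of(edges, change), change)
                  for change in candidate_changes(edges, possible_edges)]
        if not scored:
            return edges
        best_gain, (op, edge) = max(scored, key=lambda item: item[0])
        if best_gain <= 0:  # predetermined gain condition: gain must be positive
            return edges
        if op == "add":
            edges.add(edge)
        else:
            edges.discard(edge)


# Toy stand-in for the score gain: changes that move toward TARGET gain +1.
TARGET = {("A", "B"), ("A", "D"), ("C", "D")}


def toy_gain(edges, change):
    op, edge = change
    if op == "add":
        return 1.0 if edge in TARGET else -1.0
    return -1.0 if edge in TARGET else 1.0


possible = [("A", "B"), ("A", "C"), ("A", "D"), ("C", "D"), ("B", "D")]
initial = {("A", "C"), ("B", "D")}
final_edges = greedy_update(initial, possible, toy_gain)
print(sorted(final_edges))
```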
In a further embodiment, the network structure determining unit 43 may be further configured to:
Comparing an intermediate relationship network obtained by iteratively updating the condition relationship in the initial relationship network with a plurality of other intermediate relationship networks obtained by iteratively updating the condition relationship based on a plurality of other initial relationship networks obtained by initializing the condition relationship among the nodes in other predetermined manners;
and selecting the intermediate relationship network with the highest network score as the current relationship network.
According to one possible design, the plurality of service states include a second service state, the second service state corresponds to a second node, each parent node of the second node corresponds to a second attribute type combination, and the conditional probability distribution corresponding to the second node includes each probability that the second node corresponds to each attribute type of the second service state, respectively, under the second attribute type combination.
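Deriving the conditional probability distribution of a node from the joint probability distribution can be sketched as follows. The joint probabilities and the function name `conditional_distribution` are assumptions; the calculation is ordinary marginalization followed by normalization per parent combination, which is one plausible reading of the design above.

```python
from collections import defaultdict


def conditional_distribution(joint, states, child, parents):
    """Compute P(child | parents) from a joint distribution whose keys are
    tuples of state values ordered as in `states`."""
    child_idx = states.index(child)
    parent_idx = [states.index(p) for p in parents]
    numer = defaultdict(float)   # P(parents = u, child = x)
    denom = defaultdict(float)   # P(parents = u)
    for combo, prob in joint.items():
        key = tuple(combo[i] for i in parent_idx)
        numer[(key, combo[child_idx])] += prob
        denom[key] += prob
    cpt = defaultdict(dict)
    for (key, value), prob in numer.items():
        cpt[key][value] = prob / denom[key]
    return dict(cpt)


# Hypothetical joint distribution over two service states (A, B).
states = ["A", "B"]
joint = {(1, 1): 0.35, (1, 0): 0.15, (0, 1): 0.2, (0, 0): 0.3}
cpt_b = conditional_distribution(joint, states, "B", ["A"])
print(cpt_b)
```

Parent combinations with zero total probability would need separate handling, for example smoothing, which the sketch omits.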
It should be noted that the apparatus 400 shown in fig. 4 is an apparatus embodiment corresponding to the method embodiment shown in fig. 2, and the corresponding description in the method embodiment shown in fig. 2 is also applicable to the apparatus 400, and is not repeated herein.
According to an embodiment of another aspect, an apparatus for conditional business processing for predicting attribute categories of business states through a conditional relationship network is also provided. The conditional relationship network therein may be determined by the apparatus 400 shown in fig. 4. Fig. 5 shows a schematic block diagram of a conditional traffic handling apparatus predicting an attribute class of a traffic state over a conditional relational network according to one embodiment. As shown in fig. 5, the apparatus 500 includes:
An obtaining unit 51, configured to obtain, from service data to be processed, attribute categories corresponding to a plurality of service states respectively;
the generating unit 52 is configured to use, as nodes to be predicted, other nodes in the conditional relationship network except the nodes corresponding to the service states, and generate a plurality of subtasks for performing attribute class sampling on the nodes to be predicted;
the distribution unit 53 is configured to distribute each sub-task to a plurality of distributed devices, so that each distributed device performs attribute type sampling on at least one node to be predicted according to the conditional relationship network and each attribute type corresponding to a plurality of service states;
and the determining unit 54 is configured to determine, based on the sampling results, attribute categories corresponding to the nodes to be predicted respectively.
In one embodiment, the node to be predicted comprises a first node to be predicted, and the sampling result of the first node to be predicted is the sampling result under the corresponding conditional probability distribution according to the attribute category of each parent node of the first node to be predicted.
In a further embodiment, the sampling result of the first node to be predicted comprises a first sampling result, the first sampling result corresponds to a first confidence coefficient, the first confidence coefficient is a product of a conditional probability of the first node to be predicted under the given attribute category of each parent node and a predetermined weight, and the predetermined weight is an initial weight of the first sampling result.
According to one possible design, the determining unit 54 may be further configured to:
determining the probability distribution of the first node to be predicted over the candidate attribute categories based on the sampling results, wherein the probability distribution comprises a first probability that the predicted attribute category of the first node to be predicted is the first attribute category, and the first probability is: the ratio of the sum of the confidences of the sampling results in which the attribute category of the first node to be predicted is the first attribute category to the sum of the confidences of all sampling results;
and determining the current attribute category of the first node to be predicted as the first attribute category under the condition that the first probability meets a predetermined probability condition.
It should be noted that the apparatus 500 shown in fig. 5 is an apparatus embodiment corresponding to the method embodiment shown in fig. 3, and the corresponding description in the method embodiment shown in fig. 3 is also applicable to the apparatus 500, and is not repeated herein.
According to an embodiment of another aspect, there is also provided a computer-readable storage medium having stored thereon a computer program which, when executed in a computer, causes the computer to perform the method described in connection with fig. 2 or fig. 3.
According to an embodiment of yet another aspect, there is also provided a computing device comprising a memory and a processor, the memory having stored therein executable code, the processor, when executing the executable code, implementing the method described in connection with fig. 2 or fig. 3.
Those skilled in the art will recognize that the functions described in the embodiments of this specification may be implemented in hardware, software, firmware, or any combination thereof, in one or more of the examples described above. When implemented in software, the functions may be stored on or transmitted over as one or more instructions or code on a computer-readable medium.
The above-mentioned embodiments are intended to explain the technical idea, technical solutions and advantages of the present specification in further detail, and it should be understood that the above-mentioned embodiments are merely specific embodiments of the technical idea of the present specification, and are not intended to limit the scope of the technical idea of the present specification, and any modification, equivalent replacement, improvement, etc. made on the basis of the technical solutions of the embodiments of the present specification should be included in the scope of the technical idea of the present specification.

Claims (15)

1. A method for constructing a conditional relationship network, wherein the conditional relationship network is used for describing the conditional relationship among a plurality of predetermined service states, and comprises nodes respectively corresponding to the service states in service data and directed connection edges indicating the conditional relationship among the nodes, and a single service state corresponds to at least one attribute category;
The method comprises the following steps:
determining the joint probability distribution of the attribute types of each service state according to the attribute types of each sample on the plurality of service states, wherein each sample corresponds to each sample service data;
based on an initial relationship network obtained by initializing the conditional relationship among the nodes in a preset mode, splitting each local network by taking each node as a reference, and generating each conditional relationship change task for each local network respectively to distribute to a plurality of distributed devices, so that after a single distributed device receives the conditional relationship change task of the corresponding local network, based on the joint probability distribution, under the condition that the conditional relationship of the corresponding local network is changed, a network score gain of the local network with the changed conditional relationship compared with the initial local network is obtained;
iteratively updating the condition relation in the initial relation network according to each network score gain obtained from the plurality of distributed devices until no network score gain meeting a preset gain condition exists, and obtaining a current relation network;
and determining each conditional probability distribution corresponding to each node for the current relationship network based on the current relationship network and the joint probability distribution, so as to obtain a conditional relationship network formed by each node corresponding to the plurality of service states, wherein the conditional relationship network is used for predicting service processing of attribute classes of other nodes under the condition of the current attribute classes of the given plurality of nodes, and the conditional probability distribution is used for describing the probability that the service state of the corresponding node is each attribute class under the condition of the service state corresponding to each father node of the corresponding node.
2. The method of claim 1, wherein the joint probability distribution comprises respective probabilities corresponding to respective combinations of attribute classes across the plurality of traffic states, a single attribute class combination being a combination of one attribute class across the plurality of traffic states;
determining the joint probability distribution of the attribute categories of each service state according to the attribute categories of each sample on the plurality of service states respectively comprises:
generating a first statistical task aiming at a first attribute category combination in various attribute category combinations;
distributing the first statistical task to a first distributed device, so that the first distributed device counts the number of first samples corresponding to the first attribute category combination and feeds back that number;
and taking the number of the first samples or the ratio of the number of the first samples to the number of the whole samples as the probability corresponding to the first attribute type combination.
3. The method of claim 1, wherein each attribute class in a single traffic state is described by a respective state value on a single node corresponding to the single traffic state.
4. The method according to claim 1, wherein the predetermined manner includes randomly adding a predetermined number of directed connection edges between nodes, the plurality of service states include a first service state, the first service state corresponds to a first node, and splitting each local network based on an initial relationship network obtained by initializing a conditional relationship between nodes in a predetermined manner, with each node as a reference includes:
And determining a local network formed by the first node and a parent node thereof from the initial relationship network, and taking the local network as a first local network split by taking the first node as a reference.
5. The method of claim 1, wherein iteratively updating the conditional relationships in the initial relationship network as a function of each network score gain until there are no network score gains that satisfy a predetermined gain condition comprises:
updating the condition relation change item of the local network corresponding to the maximum network score gain to the initial relation network to obtain an intermediate relation network;
determining each candidate condition change item aiming at the intermediate relationship network, and each network score gain corresponding to each candidate condition change item respectively, wherein under the condition that a single candidate condition change item is consistent with the condition relationship change item of the local network fed back by the plurality of distributed devices, the network score gain of the condition relationship change item of the corresponding local network is used as the network score gain of the single candidate condition change item, and under the condition that the single candidate condition change item is inconsistent with the condition relationship change item of the local network fed back by the plurality of distributed devices, the corresponding network score gain is determined by comparing the relationship network added with the single candidate condition change item in the intermediate relationship network with the current intermediate relationship network;
The intermediate relationship network is updated with the candidate conditional alteration that maximizes the gain of the network score.
6. The method of claim 1 or 5, wherein the predetermined gain condition is that the network score gain is positive.
7. The method of claim 5, wherein iteratively updating the conditional relationships in the initial relationship network according to each network score gain until there is no network score gain that satisfies a predetermined gain condition, resulting in a current relationship network comprises:
comparing the intermediate relationship network obtained by iteratively updating the conditional relationship in the initial relationship network with a plurality of other intermediate relationship networks, the other intermediate relationship networks being obtained by iteratively updating the conditional relationship based on a plurality of other initial relationship networks that are obtained by initializing the conditional relationship among the nodes in other predetermined manners;
and selecting the intermediate relationship network with the highest network score as the current relationship network.
8. The method of claim 1, wherein the plurality of traffic states includes a second traffic state, the second traffic state corresponds to a second node, each parent node of the second node corresponds to a second attribute class combination, and the conditional probability distribution corresponding to the second node includes each probability that the second node corresponds to each attribute class of the second traffic state, respectively, under the second attribute class combination.
9. A conditional service processing method of predicting an attribute class of a service state by a conditional relationship network determined by the manner of claim 1, the method comprising:
acquiring a plurality of attribute categories respectively corresponding to the service states from service data to be processed;
taking other nodes except the nodes corresponding to the plurality of service states in the conditional relationship network as nodes to be predicted, and generating a plurality of subtasks for performing attribute category sampling on the nodes to be predicted;
distributing each subtask to a plurality of distributed devices, so that each distributed device samples at least one node to be predicted according to the conditional relationship network and each attribute category corresponding to the plurality of service states, and the sampling result of a single node to be predicted is determined under the corresponding conditional probability distribution based on the attribute category of each father node of the single node to be predicted;
and determining each attribute type corresponding to each node to be predicted respectively based on each sampling result.
10. The method of claim 9, wherein the node to be predicted comprises a first node to be predicted, the sampled result of the first node to be predicted comprises a first sampled result, the first sampled result corresponds to a first confidence level, the first confidence level is a product of a conditional probability of the first node to be predicted under a given attribute category of each parent node and a predetermined weight, and the predetermined weight is an initial weight of the first sampled result.
11. The method according to claim 10, wherein the attribute class corresponding to the first node to be predicted comprises a first attribute class, and the determining, based on the respective sampling results, each attribute class corresponding to each node to be predicted comprises:
determining a probability distribution of the first node to be predicted over the candidate attribute classes based on the sampling results, wherein the probability distribution comprises a first probability that the predicted attribute class of the first node to be predicted is the first attribute class, the first probability being the ratio of the sum of the confidence levels of the sampling results in which the attribute class of the first node to be predicted is the first attribute class to the sum of the confidence levels of all sampling results;
determining that the current attribute category of the first node to be predicted is the first attribute category if the first probability satisfies a predetermined probability condition.
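The confidence-weighted aggregation of claims 10 and 11 can be illustrated with a short sketch. The function names, the default initial weight of 1.0 and the threshold used as the "predetermined probability condition" are assumptions made for illustration; the claims do not fix any of them.

```python
from collections import defaultdict


def confidence(cond_prob: float, initial_weight: float = 1.0) -> float:
    """Confidence of one sampling result: conditional probability of the
    sampled class given the parents' classes, times an initial weight."""
    return cond_prob * initial_weight


def class_probabilities(samples):
    """samples: iterable of (attribute_class, confidence) pairs.

    Returns, per class, the sum of confidences of the samples with that
    class divided by the sum of confidences of all samples.
    """
    totals = defaultdict(float)
    for attribute_class, conf in samples:
        totals[attribute_class] += conf
    grand_total = sum(totals.values())
    if grand_total == 0:
        raise ValueError("total confidence is zero; no class can be chosen")
    return {c: t / grand_total for c, t in totals.items()}


def decide(samples, threshold: float = 0.5):
    """Return the best class if its probability exceeds `threshold`
    (a hypothetical probability condition), otherwise None."""
    probs = class_probabilities(samples)
    best = max(probs, key=probs.get)
    return best if probs[best] > threshold else None
```

For example, two sampling results of class `high` with confidence 0.8 each and one of class `low` with confidence 0.2 give `high` a probability of 1.6 / 1.8, which passes a 0.5 threshold.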
12. A device for constructing a conditional relationship network, wherein the conditional relationship network is used for describing the conditional relationships among a plurality of predetermined service states, and comprises nodes respectively corresponding to the service states in service data and directed connection edges indicating the conditional relationships among the nodes, wherein a single service state corresponds to at least one attribute category;
the device comprising:
a statistical unit configured to determine a joint probability distribution of the attribute categories of the service states according to the attribute categories of a plurality of samples on the respective service states, wherein each sample corresponds to a piece of service data;
a splitting unit configured to split out local networks, each taking one node as a reference, from an initial relationship network obtained by initializing the conditional relationships among the nodes in a predetermined manner, and to generate a conditional relationship change task for each local network for distribution to a plurality of distributed devices, so that a single distributed device, after receiving the conditional relationship change task of a corresponding local network, obtains, based on the joint probability distribution, a network score gain of the local network after its conditional relationship is changed relative to the initial local network;
a network structure determining unit configured to iteratively update the conditional relationships in the initial relationship network according to the network score gains obtained from the plurality of distributed devices until no network score gain satisfying a predetermined gain condition exists, so as to obtain a current relationship network;
and a probability distribution determining unit configured to determine, based on the current relationship network and the joint probability distribution, the conditional probability distribution corresponding to each node of the current relationship network, so as to obtain a conditional relationship network formed by the nodes corresponding to the plurality of service states, so that the conditional relationship network is used for service processing that predicts the attribute classes of other nodes given the current attribute classes of some nodes, wherein the conditional probability distribution describes the probability that the service state of the corresponding node is in each attribute class given the service states corresponding to the parent nodes of the corresponding node.
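One way to picture the probability distribution determining unit is to derive each node's conditional distribution from the joint distribution by marginalization, using P(node | parents) = P(node, parents) / P(parents). The sketch below assumes the joint distribution is stored as a dictionary over full assignments; that representation and all names and values are hypothetical, not taken from the patent.

```python
from collections import defaultdict


def conditional_distribution(joint, variables, node, parents):
    """Derive P(node | parents) from a joint distribution.

    joint:     dict mapping a full assignment tuple (ordered as `variables`)
               to its probability.
    variables: list of variable names giving the tuple order.
    Returns a dict: parent class combination -> {node class: probability}.
    """
    index = {v: i for i, v in enumerate(variables)}
    family = defaultdict(float)       # P(node, parents)
    parent_marg = defaultdict(float)  # P(parents)
    for assignment, p in joint.items():
        pa = tuple(assignment[index[v]] for v in parents)
        family[(pa, assignment[index[node]])] += p
        parent_marg[pa] += p

    cpt = defaultdict(dict)
    for (pa, value), p in family.items():
        if parent_marg[pa] > 0:  # skip parent combinations never observed
            cpt[pa][value] = p / parent_marg[pa]
    return dict(cpt)
```

With a joint distribution over `(device, risk)` in which `("known", "low")` has probability 0.45 and `("known", "high")` has 0.05, the row for parent combination `("known",)` assigns 0.9 to `low`.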
13. A conditional service processing apparatus for predicting attribute categories of service states by means of a conditional relationship network determined by the device of claim 12, the apparatus comprising:
an acquisition unit configured to acquire, from service data to be processed, attribute categories respectively corresponding to a plurality of service states;
a generating unit configured to take the nodes in the conditional relationship network other than the nodes corresponding to the plurality of service states as nodes to be predicted, and to generate a plurality of subtasks for performing attribute category sampling on the nodes to be predicted;
a distribution unit configured to distribute the subtasks to a plurality of distributed devices, so that each distributed device samples at least one node to be predicted according to the conditional relationship network and the attribute categories corresponding to the plurality of service states, wherein the sampling result of a single node to be predicted is determined under the corresponding conditional probability distribution based on the attribute category of each parent node of that node;
and a determining unit configured to determine the attribute category corresponding to each node to be predicted based on the sampling results.
14. A computer-readable storage medium, on which a computer program is stored which, when executed in a computer, causes the computer to carry out the method of any one of claims 1-11.
15. A computing device comprising a memory and a processor, wherein the memory has stored therein executable code that, when executed by the processor, performs the method of any of claims 1-11.
CN202010089190.4A 2020-02-12 2020-02-12 Method and device for constructing conditional relational network and processing conditional service Active CN111325254B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010089190.4A CN111325254B (en) 2020-02-12 2020-02-12 Method and device for constructing conditional relational network and processing conditional service

Publications (2)

Publication Number Publication Date
CN111325254A CN111325254A (en) 2020-06-23
CN111325254B CN111325254B (en) 2022-06-28

Family

ID=71172726

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010089190.4A Active CN111325254B (en) 2020-02-12 2020-02-12 Method and device for constructing conditional relational network and processing conditional service

Country Status (1)

Country Link
CN (1) CN111325254B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111723872B (en) * 2020-06-24 2023-04-07 浙江大华技术股份有限公司 Pedestrian attribute identification method and device, storage medium and electronic device
CN113256275B (en) * 2021-07-14 2021-11-02 支付宝(杭州)信息技术有限公司 Expert system updating method, service processing method and device
WO2023054112A1 (en) * 2021-10-01 2023-04-06 株式会社日立製作所 Travel pattern generation device

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107071846A (en) * 2017-04-01 2017-08-18 西安邮电大学 The distributed quick common recognition method of Ad Hoc one way link networks non-stop layer
CN107103000A (en) * 2016-02-23 2017-08-29 广州启法信息科技有限公司 It is a kind of based on correlation rule and the integrated recommended technology of Bayesian network
CN108512765A (en) * 2017-02-28 2018-09-07 中国科学院声学研究所 A kind of Web content method of diffusion based on network node distribution Pagerank
US10445170B1 (en) * 2018-11-21 2019-10-15 Fmr Llc Data lineage identification and change impact prediction in a distributed computing environment

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170372212A1 (en) * 2016-06-28 2017-12-28 Ca, Inc. Model based root cause analysis

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
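U-2-Tree: A Universal Two-Layer Distributed Indexing Scheme for Cloud Storage System; Gao, XF, et al.; IEEE-ACM TRANSACTIONS ON NETWORKING; 20190228; full text *
Survey of Blockchain Technology Research: Principles, Progress and Applications; 曾诗钦; Survey of Blockchain Technology Research: Principles, Progress and Applications; 20200108; full text *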

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant