CN111325246A - Region selection method and device, computer equipment and storage medium

Info

Publication number: CN111325246A
Application number: CN202010081902.8A
Authority: CN
Inventors: 刘志煌
Original assignee: Tencent Technology Shenzhen Co Ltd
Current assignee: Tencent Technology Shenzhen Co Ltd
Priority date: 2020-02-06
Filing date: 2020-02-06
Publication date: 2020-06-23
Anticipated expiration: 2040-02-06
Also published as: CN111325246B

Abstract

The application relates to a region selection method, a region selection device, computer equipment and a storage medium. The method comprises the following steps: acquiring a regional network graph comprising regional nodes and edges, wherein a portion of the regional nodes carry target values; dividing the regional network graph according to the regional similarity between the regional nodes corresponding to the edges to obtain at least one sub-area network, wherein each sub-area network comprises area nodes clustered into the same class; determining, for each sub-area network, the proportion of area nodes whose carried target values are greater than or equal to a target value threshold; selecting a target sub-area network from the sub-area networks according to the proportion, wherein the proportion in the target sub-area network is greater than the proportion in a non-target sub-area network; and selecting a target area node from the target sub-area network. The scheme of the application can save cost.

Description

Region selection method and device, computer equipment and storage medium

Technical Field

The present invention relates to the field of computer technology and machine learning technology, and in particular, to a region selection method, apparatus, computer device, and storage medium.

Background

With the rapid development of science and technology, machine learning is receiving more and more attention. It is applied in a growing number of scenarios, such as automatically selecting a target region.

In the conventional method, when a target area is selected by a machine learning technique, a large number of areas where delivery has already taken place, together with their target values, must be prepared as training samples and sample labels, and a machine learning model is trained with supervision. The model is then used to predict which of the areas without delivery should be selected. Using a large number of target values as sample labels in this way is costly.

Disclosure of Invention

In view of the above, it is necessary to provide a region selection method, an apparatus, a computer device and a storage medium for solving the problem of high cost of the conventional method.

A method of region selection, the method comprising:

acquiring a regional network graph comprising regional nodes and edges; a portion of the regional nodes carry target values;

dividing the regional network graph according to the regional similarity between the regional nodes corresponding to the edges to obtain at least one sub-area network; each sub-area network comprises area nodes clustered into the same class;

determining, for each sub-area network, the proportion of area nodes whose carried target values are greater than or equal to a target value threshold;

selecting a target sub-area network from the sub-area networks according to the proportion; the proportion in the target sub-area network is greater than the proportion in a non-target sub-area network;

and selecting a target area node from the target sub-area network.

In one embodiment, the obtaining the area network graph including the area nodes and the edges includes:

acquiring a region feature vector of an initial region;

mapping the region feature vector into region nodes in a space, and determining the region similarity between every two region nodes according to the region feature vector;

and establishing edges between the area nodes with the area similarity larger than or equal to the similarity threshold value to generate an area network graph.

In one embodiment, the obtaining the region feature vector of the initial region includes:

acquiring region attribute characteristics of an initial region and resource transfer characteristics generated in the initial region;

and fusing the region attribute features and the resource transfer features corresponding to the same initial region to generate region feature vectors of each initial region.

In an embodiment, the dividing the area network graph according to the area similarity between the area nodes corresponding to the edges to obtain at least one sub-area network includes:

determining the transition probability between the region nodes according to the region similarity between the region nodes corresponding to each edge; the transition probability is positively correlated with the region similarity;

carrying out random walk in the area network graph according to the transition probability, and determining a first occurrence probability of each area node and a second occurrence probability of a cluster type in the random walk process;

determining the shortest average coding length of a sequence generated by coding the result of random walk according to the first occurrence probability and the second occurrence probability;

and clustering each region node in the region network graph by minimizing the shortest average coding length to obtain at least one subregion network.
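The first two steps above can be sketched as follows. This is a minimal illustration, assuming the transition probability is the edge similarity divided by the sum of similarities on a node's edges (one simple choice that is positively correlated with similarity; the text does not fix the formula), and estimating the occurrence probability of each region node by repeatedly applying the walk. The function names and the three-node similarity values are hypothetical.

```python
def transition_probabilities(similarity):
    """similarity: {node: {neighbor: region similarity}} for an undirected graph.

    Assumption: every node has at least one edge, so each row sum is non-zero.
    """
    probs = {}
    for node, neighbors in similarity.items():
        total = sum(neighbors.values())
        probs[node] = {nbr: s / total for nbr, s in neighbors.items()}
    return probs


def visit_probabilities(probs, steps=200):
    """Estimate how often a long random walk visits each node (power iteration)."""
    nodes = list(probs)
    current = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(steps):
        nxt = {n: 0.0 for n in nodes}
        for node, neighbors in probs.items():
            for nbr, p in neighbors.items():
                nxt[nbr] += current[node] * p
        current = nxt
    return current


# Hypothetical similarities between three region nodes.
similarity = {
    "a": {"b": 0.9, "c": 0.6},
    "b": {"a": 0.9, "c": 0.7},
    "c": {"a": 0.6, "b": 0.7},
}
P = transition_probabilities(similarity)
visits = visit_probabilities(P)
```

For an undirected graph like this one, the estimated visit probability of a node converges to its share of the total edge similarity, so strongly connected region nodes are visited more often.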

In one embodiment, the determining, according to the first occurrence probability and the second occurrence probability, a shortest average encoding length of a sequence generated by encoding a result of the random walk includes:

determining a first shortest average coding length of region nodes in the same cluster category in a sequence generated by coding a random walk result according to the first occurrence probability;

determining a second shortest average coding length of the cluster categories in the sequence according to the second occurrence probability;

and determining the shortest average coding length of the sequence according to the first shortest average coding length and the second shortest average coding length.
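The two-part coding length described above resembles the two-level map equation used by Infomap-style community detection. The sketch below assumes that standard formulation, which the text itself does not spell out: the first length comes from per-cluster codebooks built on node visit probabilities, and the second from a cluster-level codebook built on the probability of the walk leaving each cluster. The function names and the example graph are hypothetical.

```python
from math import log2


def plogp(p):
    """p * log2(p), with the convention 0 * log2(0) = 0."""
    return p * log2(p) if p > 0 else 0.0


def coding_length(edges, cluster_of):
    """edges: {(u, v): weight} for an undirected graph; cluster_of: {node: cluster id}."""
    total = 2.0 * sum(edges.values())
    node_p = {}  # visit probability of each node
    exit_p = {}  # probability of the walk leaving each cluster
    for (u, v), w in edges.items():
        node_p[u] = node_p.get(u, 0.0) + w / total
        node_p[v] = node_p.get(v, 0.0) + w / total
        if cluster_of[u] != cluster_of[v]:
            for c in (cluster_of[u], cluster_of[v]):
                exit_p[c] = exit_p.get(c, 0.0) + w / total
    # Second length: codebook that names clusters when the walk switches cluster.
    q = sum(exit_p.values())
    between = plogp(q) - sum(plogp(x) for x in exit_p.values())
    # First length: one codebook per cluster naming its nodes plus an exit codeword.
    within = 0.0
    for c in set(cluster_of.values()):
        members = [n for n in node_p if cluster_of[n] == c]
        q_c = exit_p.get(c, 0.0)
        usage = q_c + sum(node_p[n] for n in members)
        within += plogp(usage) - plogp(q_c) - sum(plogp(node_p[n]) for n in members)
    return within + between


# Two triangles joined by a single edge, all weights 1.
edges = {(0, 1): 1, (1, 2): 1, (0, 2): 1, (3, 4): 1, (4, 5): 1, (3, 5): 1, (2, 3): 1}
merged = coding_length(edges, {n: 0 for n in range(6)})
split = coding_length(edges, {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1})
```

Here `split` is shorter than `merged`, so a search that minimizes the coding length would prefer placing the two triangles in separate clusters.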

In one embodiment, in the regional network graph, edges are established between regional nodes with regional similarity greater than or equal to a similarity threshold;

before determining, for each sub-area network, the proportion of the area node whose carried target value is greater than or equal to the target value threshold, the method further includes:

step-by-step adjusting the similarity threshold, and updating the edges in the area network graph according to the similarity threshold after each step-by-step adjustment;

aiming at the updated regional network graph each time, executing the step of dividing the regional network graph according to the regional similarity between the regional nodes corresponding to the edges to obtain at least one sub-regional network until the step stop condition is met;

selecting a target subarea network division result from the subarea network division results obtained by each division to obtain at least one final subarea network; in the target sub-area network division result, the distribution of the area nodes with the target value larger than or equal to the target value threshold satisfies a centralized distribution condition.

In one embodiment, the selecting a target sub-area network from each of the sub-area networks according to the proportion includes:

screening the sub-area networks whose proportion is greater than or equal to a proportion threshold from the sub-area networks to obtain the target sub-area network; or

sorting the sub-area networks by proportion in descending order, and selecting the sub-area networks ranked within a preset number of top positions as target sub-area networks.

In one embodiment, the selecting a target area node from the target sub-area network comprises:

selecting area nodes which do not carry the target value from the target sub-area network to obtain candidate area nodes;

and determining a target area node according to the candidate area node.

In one embodiment, the determining a target area node according to the candidate area node includes:

acquiring geographical position information corresponding to the candidate area nodes;

according to the geographical position information, identifying area nodes adjacent to the position from the candidate area nodes to obtain adjacent area nodes;

when the number of the adjacent area nodes is larger than or equal to a preset number threshold, screening the area nodes meeting the preset number threshold from the adjacent area nodes;

and taking the screened area nodes and the area nodes except the adjacent area nodes in the candidate area nodes as target area nodes in the target sub-area network.

In one embodiment, the number of target sub-area networks is at least two; the method further comprises the following steps:

obtaining the region nodes which are not screened out in the adjacent region nodes to obtain the re-divided region nodes;

determining the associated region nodes which have an edge establishing relationship with the re-divided region nodes and are in different sub-region networks with the re-divided region nodes;

determining a related region node with the highest region similarity with a re-divided region node from all related region nodes, and re-dividing the re-divided region node into the determined sub-region network to which the related region node belongs;

and when the sub-area network into which a re-divided area node has been placed is a target sub-area network, adding the re-divided area node to the candidate area nodes of that target sub-area network, and executing again the step of acquiring the geographical position information corresponding to the candidate area nodes and the subsequent steps.

In one embodiment, the area corresponding to the area node carrying the target value is the area where the target object is delivered; the target value is a profit index value generated after the target object is put;

the method further comprises the following steps:

and determining the area corresponding to the target area node as a target area to be launched with the target object.

An area selection apparatus, the apparatus comprising:

the acquisition module is used for acquiring a regional network graph comprising regional nodes and edges; a portion of the regional nodes carry target values;

the dividing module is used for dividing the regional network graph according to the regional similarity between the regional nodes corresponding to the edges to obtain at least one sub-area network; each sub-area network comprises area nodes clustered into the same class;

the target area determining module is used for determining, for each sub-area network, the proportion of area nodes whose carried target values are greater than or equal to a target value threshold; selecting a target sub-area network from the sub-area networks according to the proportion, the proportion in the target sub-area network being greater than the proportion in a non-target sub-area network; and selecting a target area node from the target sub-area network.

A computer device comprising a memory and a processor, the memory having stored therein a computer program that, when executed by the processor, causes the processor to perform the steps of the region selection method of embodiments of the present application.

A computer-readable storage medium having stored thereon a computer program which, when executed by a processor, causes the processor to perform the steps of a region selection method as described in embodiments of the present application.

The area selection method, apparatus, computer device and storage medium described above acquire an area network graph comprising area nodes and edges, and cluster and divide the area network graph according to the area similarity between the area nodes corresponding to the edges to obtain at least one sub-area network, so that the area nodes in the same sub-area network share common characteristics. For each sub-area network, the proportion of area nodes whose carried target values are greater than or equal to the target value threshold, that is, the proportion of area nodes with high target values, is determined, and the target sub-area network is selected according to this proportion. The target sub-area network is therefore, to a certain extent, a cluster of area nodes with high target values, and the target area nodes selected from it are likely to be area nodes capable of generating high target values. Consequently, by combining the clustering of the area network graph with only a small number of area nodes that carry target values as samples, area nodes capable of generating high target values can be selected accurately. Compared with the traditional method, which needs a large number of labeled samples, this reduces the number of samples that must be labeled and saves cost.

Drawings

FIG. 1 is a diagram of an application scenario of a region selection method in one embodiment;

FIG. 2 is a flow diagram illustrating a method for region selection in one embodiment;

FIG. 3 is a diagram illustrating a clustering result of a regional network graph in an embodiment;

FIG. 4 is a diagram illustrating a clustering result of a regional network graph in an embodiment;

FIGS. 5-7 are schematic diagrams illustrating the results of stepwise adjustment in one embodiment;

FIG. 8 is a simplified flowchart of a method for region selection in one embodiment;

FIG. 9 is a block diagram of a region selection apparatus in one embodiment;

FIG. 10 is a block diagram of a region selection apparatus in another embodiment;

FIG. 11 is a block diagram of a computer device in one embodiment.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.

Fig. 1 is a diagram of an application scenario of a region selection method in one embodiment. Referring to fig. 1, the application scenario includes a network-connected server 110 and a terminal 120. The terminal 120 is a desktop computer or a mobile terminal, which may include at least one of a mobile phone, a tablet computer, a notebook computer, a personal digital assistant, a wearable device, and the like. The server 110 may be implemented as a stand-alone server or as a server cluster comprised of a plurality of physical servers. It is understood that in other embodiments, the server 110 may be replaced by a terminal capable of performing the area selection method in the embodiments of the present application.

The server 110 may acquire an area network graph comprising area nodes and edges, in which a portion of the area nodes carry target values. The server 110 may divide the area network graph according to the area similarity between the area nodes corresponding to the edges to obtain at least one sub-area network, each of which comprises area nodes clustered into the same class. For each sub-area network, the server 110 may determine the proportion of area nodes whose carried target values are greater than or equal to the target value threshold. It may then select a target sub-area network from the sub-area networks according to the proportion, the proportion in the target sub-area network being greater than the proportion in a non-target sub-area network, and select a target area node from the target sub-area network. Further, the server 110 may send the target area identifier corresponding to the target area node to the terminal 120, so that the user can learn the target area through the terminal 120.

It can be understood that the region selection method in the embodiments of the present application is equivalent to using an artificial intelligence technique to automatically analyze and determine the target region.

Artificial Intelligence (AI) is a theory, method, technique and application system that uses a digital computer or a machine controlled by a digital computer to simulate, extend and expand human Intelligence, perceive the environment, acquire knowledge and use the knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive technique of computer science that attempts to understand the essence of intelligence and produce a new intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence is the research of the design principle and the realization method of various intelligent machines, so that the machines have the functions of perception, reasoning and decision making.

Artificial intelligence technology is a comprehensive discipline that covers a wide range of fields, including both hardware-level and software-level technologies. Basic artificial intelligence technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operation/interaction systems, mechatronics, and the like. Artificial intelligence software technologies mainly include computer vision, speech processing, natural language processing, and machine learning/deep learning.

It is understood that the region selection method in the embodiments of the present application is equivalent to using an unsupervised machine learning process. Machine Learning (ML) is a multi-disciplinary field that draws on probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory and other disciplines. It specifically studies how a computer simulates or implements human learning behavior in order to acquire new knowledge or skills and to reorganize existing knowledge structures so as to continuously improve its own performance. Machine learning is the core of artificial intelligence, is the fundamental way to give computers intelligence, and is applied across all fields of artificial intelligence. Machine learning and deep learning generally include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and learning from demonstration.

Fig. 2 is a flowchart illustrating a region selection method according to an embodiment. The area selection method in this embodiment may be applied to a computer device, and is mainly illustrated by taking the computer device as the server 110 in fig. 1. Referring to fig. 2, the method specifically includes the following steps:

S202, acquiring a regional network graph comprising regional nodes and edges.

The area network graph is a graph that represents the relationships between areas. It comprises area nodes and edges. Each area node represents an area, and different area nodes represent different areas.

In one embodiment, in the area network graph, edges may be created between area nodes whose area similarity is greater than or equal to a preset similarity threshold. That is, when the area similarity of two area nodes is smaller than the preset similarity threshold, no edge is created between them. The area similarity refers to the similarity between the areas represented by the area nodes. For example, suppose there are 30 area nodes, where the area similarity between area node A and 10 of them is greater than or equal to the similarity threshold, and the area similarity between area node A and the remaining 20 is less than the similarity threshold. An edge is then created between area node A and each of the 10 area nodes, and no edge is created between area node A and the remaining 20.
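As a rough illustration of threshold-based edge creation, the sketch below builds the edge set from per-area feature vectors. Cosine similarity is an assumption made for this example, since the passage above does not name a similarity measure, and `build_area_graph` and the sample vectors are hypothetical.

```python
from itertools import combinations
from math import sqrt


def cosine_similarity(u, v):
    """Cosine of the angle between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def build_area_graph(features, threshold):
    """features: {area id: feature vector}. Returns {(area, area): similarity}."""
    edges = {}
    for a, b in combinations(sorted(features), 2):
        sim = cosine_similarity(features[a], features[b])
        if sim >= threshold:  # an edge exists only at or above the threshold
            edges[(a, b)] = sim
    return edges


# Hypothetical two-dimensional feature vectors for three areas.
features = {"A": [1.0, 0.0], "B": [0.9, 0.1], "C": [0.0, 1.0]}
edges = build_area_graph(features, threshold=0.8)
```

With these values only areas A and B are similar enough to be connected; area C stays isolated until the threshold is lowered.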

In other embodiments, in the area network graph, an edge may also be established between every two area nodes, or an edge may also be established between area nodes that satisfy other conditions (for example, that satisfy a geographic location proximity condition, etc.).

It is understood that a portion of the regional nodes in the regional network graph carry target values.

The target value is an index value used as a reference when selecting an area, and it carries significant weight in that selection. Generally, how suitable a selected area is can be measured by the magnitude of the target value. For example, in a scenario where a merchant selects a site, or where a target object capable of generating revenue is delivered, the target value may be the sales amount, which serves as a revenue index and is an important reference when selecting a delivery area.

A part of the regional nodes carry the target values, that is, only a part of the regional nodes in the regional network graph carry the target values, and the rest of the regional nodes do not carry the target values. According to the area selection method in the embodiment of the present application, the target area in the area with unknown target values can be determined under the condition that only part of the area nodes are known to carry the target values.

It can be understood that an area node carrying a target value indicates that the area it represents has been selected and used before, and that the target value was generated in the area during that use. For example, an area node carrying a sales amount indicates that the area it represents was previously selected for delivering the target object and that sales were generated in the area after delivery. The area node then carries the generated sales amount.

In one embodiment, the target value carried by an area node may be an initial target value, or may be a target value obtained by preprocessing the initial target value. The initial target value is the original, unprocessed target value. The computer device can acquire the initial target value, preprocess it by removing null values and abnormal values and performing summary calculation, and then normalize the preprocessed target value to obtain the final target value.

In one embodiment, the initial target value comprises a revenue index value. The profit index value is an index value for measuring the profit. In one embodiment, the revenue indicator value may include sales and user conversion. Then, the final target value may be a value obtained by normalizing the total sales and the sales of each area. Wherein, the total sales amount is obtained by summarizing the sales amount of each area.
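The preprocessing and normalization described above might look like the following sketch. It makes two assumptions that the text leaves open: a negative sales figure is treated as an abnormal value, and normalization means dividing each area's summed sales by the total sales. The function name and the records are hypothetical.

```python
def normalized_targets(records):
    """records: iterable of (area id, sales) pairs; sales may be None.

    Returns {area id: share of total sales} for areas with valid records.
    """
    per_area = {}
    for area, sales in records:
        # Drop null values and (by assumption) negative values as abnormal.
        if sales is None or sales < 0:
            continue
        per_area[area] = per_area.get(area, 0.0) + sales  # summary per area
    total = sum(per_area.values())
    if total == 0:
        return {}
    return {area: sales / total for area, sales in per_area.items()}


# Hypothetical raw sales records.
records = [("A", 120.0), ("A", 80.0), ("B", None), ("B", 300.0), ("C", -5.0)]
targets = normalized_targets(records)
```

Area C has no valid record in this example, so it ends up without a target value, like an area where nothing has been delivered.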

In one embodiment, the initial target value may include at least one of a target value generated by the target object in an area where it has been delivered and a target value generated by an object of the same type as the target object in an area where it has been delivered.

The target object is the object to be delivered.

For example, if the target object is a brand A beverage vending machine, the target values generated by brand A beverage vending machines, or by beverage vending machines of other brands, in the areas where they have been delivered may be used as the initial target values.

It should be noted that the computer device may directly acquire an already constructed area network graph, or may construct the area network graph itself.

S204, dividing the regional network graph according to the regional similarity between the regional nodes corresponding to each edge to obtain at least one sub-area network; each sub-area network comprises area nodes clustered into the same class.

A sub-area network is a sub-graph, that is, a part of the area network graph, and comprises area nodes and edges. The connections between area nodes within the same sub-area network are dense (the relationship strength is relatively strong), while the connections between different sub-area networks are relatively sparse (the relationship strength between area nodes in different sub-area networks is relatively weak).

It is understood that an edge represents a relationship between two area nodes, and the area similarity between the area nodes corresponding to the edge represents the strength of the relationship between the corresponding areas.

Specifically, the computer device may perform cluster division on the area network graph according to the area similarity between the area nodes corresponding to each edge (i.e., according to the relationship strength between the areas), so as to cluster the area nodes in the area network graph, thereby obtaining at least one sub-area network.

In one embodiment, the computer device may perform community discovery processing on the area network graph to divide the area nodes into communities, obtaining at least one sub-area network. It is understood that each sub-area network corresponds to one community.

Community discovery (community detection) processing refers to the process of discovering the community structure in the area network graph. It is equivalent to a clustering process: each community comprises area nodes clustered into the same class, that is, one sub-area network.

S206, aiming at each sub-area network, determining the occupation ratio of the area nodes with the target values larger than or equal to the target value threshold.

It can be understood that since the partial area nodes carry the target values, the divided sub-area network may include area nodes carrying the target values and area nodes not carrying the target values.

Specifically, for each sub-area network, the computer device may screen the area nodes carrying the target values from the sub-area network, compare the target values carried by the screened area nodes with preset target value thresholds, and determine, according to the comparison result, the number of the area nodes carrying target values greater than or equal to the target value thresholds. The computer device can determine the occupation ratio of the area nodes with the target value larger than or equal to the target value threshold according to the ratio of the number to the total number of the area nodes in the sub-area network.

For example, sub-area network A includes 20 area nodes, of which 10 carry target values, and 8 of those 10 carry target values greater than or equal to the preset target value threshold. The proportion of area nodes in sub-area network A whose carried target values are greater than or equal to the target value threshold is then 8/20.
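The calculation in step S206 can be sketched as below, reproducing the 8/20 example. The function name, node identifiers, target values and the 0.5 threshold are all hypothetical.

```python
def high_value_proportion(sub_network, target_values, threshold):
    """Share of nodes in a sub-area network whose target value is >= threshold.

    sub_network: list of node ids; target_values: {node id: value}, labeled nodes only.
    """
    if not sub_network:
        return 0.0
    high = sum(
        1
        for node in sub_network
        if node in target_values and target_values[node] >= threshold
    )
    # The denominator counts every node, labeled or not, as in the 8/20 example.
    return high / len(sub_network)


# Sub-area network A: 20 nodes, 10 labeled, 8 of them at or above the threshold.
nodes_a = [f"a{i}" for i in range(20)]
values_a = {f"a{i}": 0.9 for i in range(8)}
values_a.update({"a8": 0.1, "a9": 0.2})
proportion_a = high_value_proportion(nodes_a, values_a, threshold=0.5)
```

Because unlabeled nodes count toward the denominator, a sub-area network with few labeled nodes can only reach a high proportion if most of its labeled nodes have high target values.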

S208, selecting a target sub-area network from the sub-area networks according to the proportion; the proportion in the target sub-area network is greater than the proportion in a non-target sub-area network.

Here, the proportion refers to the proportion of area nodes in a sub-area network whose carried target values are greater than or equal to the target value threshold. The target sub-area network is the sub-area network from which target area nodes are to be screened. A non-target sub-area network is any sub-area network obtained by the division other than a target sub-area network.

In one embodiment, the computer device may sort the sub-area networks by proportion in descending order, and select the sub-area networks ranked within a preset number of top positions as the target sub-area networks. It can be understood that the sub-area networks not ranked within the preset number of top positions are non-target sub-area networks. In this embodiment, since the proportion in each target sub-area network is ranked ahead of the proportion in each non-target sub-area network, the proportion in the target sub-area network is greater than the proportion in the non-target sub-area network. It should be noted that the preset number may be any positive integer.

In one embodiment, the computer device may also screen, from the sub-area networks, the sub-area networks whose proportion is greater than or equal to a preset proportion threshold to obtain the target sub-area networks. It can be understood that the sub-area networks whose proportion is smaller than the preset proportion threshold are non-target sub-area networks. In this embodiment, since the proportion in a target sub-area network is greater than or equal to the proportion threshold and the proportion in a non-target sub-area network is less than the proportion threshold, the proportion in the target sub-area network is greater than the proportion in the non-target sub-area network.

For example, sub-area network A includes 20 area nodes, of which 10 carry target values, and 8 of those 10 carry target values greater than or equal to the preset target value threshold. The proportion in sub-area network A is then 8/20. Sub-area network B includes 22 area nodes, of which 12 carry target values, and 4 of those 12 carry target values greater than or equal to the preset target value threshold. The proportion in sub-area network B is then 4/22. Sub-area network C includes 15 area nodes, and the proportion of area nodes in sub-area network C whose carried target values are greater than or equal to the target value threshold is 3/15. A proportion threshold may then be preset, and the sub-area networks among A to C whose proportion is greater than or equal to the proportion threshold are selected as target sub-area networks. Alternatively, the sub-area networks are sorted by proportion in descending order and those ranked within the preset number of top positions are selected. For example, if the preset number is 2, sub-area networks A and C, which rank in the top 2 (8/20 > 3/15 > 4/22), are selected as the target sub-area networks.
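Both selection rules can be sketched with the proportions from this example. The function names and the 0.3 proportion threshold are hypothetical.

```python
def select_by_threshold(proportions, proportion_threshold):
    """Keep sub-area networks whose proportion is at or above the threshold."""
    return [name for name, p in proportions.items() if p >= proportion_threshold]


def select_top_k(proportions, k):
    """Keep the k sub-area networks with the largest proportions."""
    ranked = sorted(proportions, key=proportions.get, reverse=True)
    return ranked[:k]


# Proportions from the example: A = 8/20, B = 4/22, C = 3/15.
proportions = {"A": 8 / 20, "B": 4 / 22, "C": 3 / 15}
by_threshold = select_by_threshold(proportions, proportion_threshold=0.3)
top_two = select_top_k(proportions, k=2)
```

With a threshold of 0.3 only sub-area network A qualifies, while the top-2 rule selects A and C, matching the example.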

S210, selecting target area nodes from the target sub-area network.

The target area node is the area node that is selected.

It will be appreciated that a carried target value greater than or equal to the target value threshold indicates that the area node carries a high target value. In the target sub-area network, the proportion of area nodes whose carried target values are greater than or equal to the target value threshold is relatively high, and each sub-area network comprises area nodes clustered into the same class, so the target sub-area network can, to a certain extent, be regarded as a sub-area network obtained by clustering area nodes with high target values.

The computer device can select all or some of the area nodes in the target sub-area network as the target area nodes.

In one embodiment, since the area represented by an area node carrying a target value has already been used to generate that target value, such an area node need not be selected as a target area node. The computer device may therefore select the area nodes that do not carry a target value from the target sub-area network, and select the target area nodes from among them. It can be understood that, since the target sub-area network is equivalent to a sub-area network obtained by clustering area nodes with high target values, the area nodes in it that do not carry target values are also likely to be area nodes capable of generating high target values. Therefore, all or some of the area nodes that do not carry a target value may be selected as the target area nodes.
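A minimal sketch of this embodiment, assuming labeled nodes are stored in a dictionary keyed by node id: the candidates are simply the nodes of the target sub-area network that have no entry. The function name and data are hypothetical.

```python
def candidate_nodes(target_sub_network, target_values):
    """Nodes of the target sub-area network that do not carry a target value."""
    return [node for node in target_sub_network if node not in target_values]


# Hypothetical target sub-area network with two labeled nodes.
target_sub_network = ["n1", "n2", "n3", "n4"]
target_values = {"n1": 0.8, "n3": 0.7}
candidates = candidate_nodes(target_sub_network, target_values)
```

All of the candidates, or a subset chosen by further criteria, can then be taken as the target area nodes.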

The area selection method acquires an area network graph comprising area nodes and edges, and clusters and divides the area network graph according to the region similarity between the area nodes corresponding to the edges to obtain at least one sub-area network, so that the area nodes in the same sub-area network share common characteristics. For each sub-area network, the proportion of area nodes whose target values are greater than or equal to the target value threshold, that is, the proportion of area nodes with high target values, is determined, and the target sub-area network is selected according to this proportion. The target sub-area network is thus equivalent to a sub-area network obtained by clustering area nodes that carry high target values, and the target area nodes selected from it are likely to be area nodes capable of generating high target values. Therefore, by using only the small portion of area nodes that carry target values as samples, in combination with the clustering of the area network graph, area nodes that can generate high target values can be accurately selected. Compared with the traditional method, which needs a large number of labeled samples, this reduces the number of samples to be labeled and saves cost.

In addition, the clustering based on the area network graph takes into account the differences between regions, which the traditional method ignores during learning. The method is therefore applicable to regions with different characteristics, which improves generalization capability and applicability.

In one embodiment, step S202 includes: acquiring a region feature vector of an initial region; mapping the region feature vectors into region nodes in the space, and determining the region similarity between every two region nodes according to the region feature vectors; and establishing edges between the area nodes with the area similarity larger than or equal to the similarity threshold value to generate an area network graph.

The initial region is a candidate region prepared in advance. The region feature vector is a vectorized representation of the region feature. And the region node is a visual representation of the region feature vector in the space and is used for representing the region. The region similarity refers to the similarity between regions represented by region nodes. The similarity threshold is a preset threshold for region similarity.

Specifically, the computer device may directly obtain the region feature vector of each initial region, or may obtain features of different dimensions of each initial region, and fuse the obtained features to obtain the region feature vector of the initial region.

The computer device may map each region feature vector to a region node in space, so that the region nodes correspond one-to-one to the initial regions described by the region feature vectors. The computer device may calculate the similarity between the region feature vectors corresponding to every two region nodes as the region similarity between those two region nodes. The computer device may then compare the region similarity between every two region nodes with a preset similarity threshold and create an edge between region nodes whose region similarity is greater than or equal to the similarity threshold, that is, connect those region nodes. Finally, the regional network graph is generated from the region nodes and the connecting edges.

It is to be appreciated that the area network graph can be a directed graph. Each initial area corresponds to an area node in the area network graph. And the edges in the regional network graph are used for representing regional similarity between two regional nodes. The weight of the edge is positively correlated with the region similarity between the region nodes.
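A minimal sketch of this graph construction follows. The text does not name a similarity measure, so cosine similarity is used here purely as an assumption, and `build_region_graph`, the node names and the feature values are hypothetical. Because cosine similarity is symmetric, each node pair is stored once as a simplification.

```python
import math
from itertools import combinations

def cosine_similarity(u, v):
    """Cosine of the angle between two feature vectors (assumed similarity measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def build_region_graph(features, similarity_threshold):
    """Return {(i, j): similarity} for every node pair at or above the threshold."""
    edges = {}
    for i, j in combinations(sorted(features), 2):
        sim = cosine_similarity(features[i], features[j])
        if sim >= similarity_threshold:
            edges[(i, j)] = sim  # edge weight grows with region similarity
    return edges

features = {"r1": [1.0, 0.0], "r2": [0.9, 0.1], "r3": [0.0, 1.0]}
edges = build_region_graph(features, similarity_threshold=0.8)
print(edges)  # only the pair ("r1", "r2") is similar enough to be connected
```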

In one embodiment, obtaining the region feature vector of the initial region comprises: acquiring region attribute characteristics of an initial region and resource transfer characteristics generated in the initial region; respectively constructing a region attribute feature vector and a resource transfer feature vector of an initial region according to the region attribute feature and the resource transfer feature; and fusing the region attribute feature vectors and the resource transfer feature vectors corresponding to the same initial region to generate the region feature vectors of each initial region.

The region attribute feature refers to an attribute feature of the initial region itself. The resource transfer feature refers to feature information corresponding to the resource transfers generated in the initial region.

In one embodiment, the region attribute features include at least one of region native attribute features and user attribute features in the region. In one embodiment, the region attribute feature may further include at least one of a competitive product distribution feature in the region, a product preference user number distribution feature in the region, and the like.

In one embodiment, the resource transfer characteristics include a resource transfer period distribution characteristic in the region. In one embodiment, the resource transfer period distribution characteristics in the region include at least one of the number of users in different resource transfer periods, the number of resource transfers in different resource transfer periods, and the distribution of the number of resource transfers in the region.

Specifically, the computer device may construct a region attribute feature vector for the initial region based on the region attribute features and construct a resource transfer feature vector based on the resource transfer features. The computer device may perform splicing and fusion on the region attribute feature vector and the resource transfer feature vector corresponding to the same initial region, to generate a region feature vector of each initial region.

In one embodiment, the computer device may perform feature engineering, applying feature filtering, missing value filling, feature derivation, feature encoding, and the like to the region attribute features and the resource transfer features to construct the corresponding region attribute feature vectors and resource transfer feature vectors.

In one embodiment, the computer device may filter the features by discarding features with too many missing values, deleting single-valued features and abnormal values, deleting useless features and interference information, and discarding values that fall within a preset top position of the feature distribution.

In one embodiment, the computer device may obtain a preset feature missing value threshold. When the number of missing values in a column of feature data exceeds the feature missing value threshold, the computer device may discard that column. In one embodiment, the feature missing value threshold is the sample size (i.e., the number of initial regions) multiplied by a preset factor (e.g., 0.4).

In one embodiment, the computer device may fill missing values with the mean value for continuous features and with a constant value for discrete features.

In one embodiment, the computer device may derive correlated features by linear combination.

In one embodiment, the computer device may discretize continuous features by binning and apply one-hot encoding to the discretized features.
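The preprocessing steps above can be sketched as follows. The column names, the fill constant `"unknown"` and the bin edges are hypothetical, and missing values are represented by `None` as an assumption; only the 0.4 factor comes from the text.

```python
def drop_sparse_columns(rows, factor=0.4):
    """Drop columns whose missing-value count exceeds sample_size * factor."""
    limit = len(rows) * factor
    keep = [c for c in rows[0] if sum(r[c] is None for r in rows) <= limit]
    return [{c: r[c] for c in keep} for r in rows]

def fill_missing(rows, continuous, constant="unknown"):
    """Mean-fill continuous columns and constant-fill discrete columns."""
    filled = [dict(r) for r in rows]
    for c in filled[0]:
        if c in continuous:
            present = [r[c] for r in rows if r[c] is not None]
            default = sum(present) / len(present)
        else:
            default = constant
        for r in filled:
            if r[c] is None:
                r[c] = default
    return filled

def one_hot_bin(value, bin_edges):
    """Discretize a continuous value into bins, then one-hot encode the bin index."""
    index = sum(value >= e for e in bin_edges)
    return [1 if k == index else 0 for k in range(len(bin_edges) + 1)]

rows = [
    {"population": 100.0, "type": "urban", "sparse": None},
    {"population": None, "type": None, "sparse": None},
    {"population": 300.0, "type": "rural", "sparse": 1.0},
]
clean = fill_missing(drop_sparse_columns(rows), continuous={"population"})
print(clean[1])                              # {'population': 200.0, 'type': 'unknown'}
print(one_hot_bin(200.0, [150.0, 250.0]))    # [0, 1, 0]
```

With three samples the missing value threshold is 3 × 0.4 = 1.2, so the column with two missing values is dropped and the columns with one missing value are kept and filled.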

It can be understood that the accuracy of the region feature vector can be improved by preprocessing the features, so that the accuracy of similarity calculation is improved, and the accuracy of subregion network division is improved. Finally, the accuracy of target area selection can be improved.

In one embodiment, dividing the area network graph according to the area similarity between the area nodes corresponding to each edge to obtain at least one sub-area network includes: determining the transition probability among the region nodes according to the region similarity among the region nodes corresponding to each edge; the transition probability is positively correlated with the region similarity; carrying out random walk in the area network graph according to the transition probability, and determining a first occurrence probability of each area node and a second occurrence probability of a cluster type in the random walk process; determining the shortest average coding length of a sequence generated by coding the result of random walk according to the first occurrence probability and the second occurrence probability; and clustering each region node in the region network graph by minimizing the shortest average coding length to obtain at least one subregion network.

The transition probability refers to the probability of jumping from one area node to another area node in the area network graph. The transition probability is positively correlated with the region similarity: the greater the region similarity between two area nodes, the greater the transition probability between them, and conversely, the smaller the region similarity, the smaller the transition probability. It can be understood that the edges characterize the region similarity between area nodes, and the transition probability is equivalent to the weight of the edge. A cluster category is a category produced by the clustering division. It can be understood that different clustering results belong to different cluster categories. Different sub-area networks are different clustering results, so different sub-area networks belong to different cluster categories.

In one embodiment, when the area network graph is divided into sub-area networks through community discovery processing, the divided sub-area networks are the discovered communities, and the cluster categories to which the sub-area networks belong are the community categories.

It will be appreciated that the computer device may calculate regional similarities between each regional node in the regional network graph and other regional nodes, respectively. The computer device may obtain a preset similarity threshold, and connect the region nodes having the region similarity greater than or equal to the similarity threshold to form an edge. The computer equipment can also establish edges between the area nodes needing to be connected according to other conditions.

In one embodiment, the computer device may directly use the region similarity between the region nodes corresponding to the respective edges as the transition probability between the region nodes. In other embodiments, the region similarity may also be normalized to obtain the transition probability.
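One way to normalize region similarities into transition probabilities is to divide each edge weight by the total weight of the edges leaving the same node. The text does not specify the normalization, so this row normalization is an assumption, and the node names and similarity values are hypothetical.

```python
def transition_probabilities(similarity):
    """Row-normalize edge similarities so each node's outgoing probabilities sum to 1."""
    probs = {}
    for node, neighbours in similarity.items():
        total = sum(neighbours.values())
        probs[node] = {nb: w / total for nb, w in neighbours.items()} if total else {}
    return probs

similarity = {
    "a": {"b": 0.9, "c": 0.6},
    "b": {"a": 0.9},
    "c": {"a": 0.6},
}
P = transition_probabilities(similarity)
print(P["a"])  # the more similar neighbour "b" receives the larger probability
```

The normalized values remain positively correlated with the region similarity, as the method requires.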

The random walk refers to a process of jumping in a regional node in a regional network graph according to the transition probability corresponding to the edge.

Specifically, the computer device may perform random walk in the area network graph according to the transition probability, that is, start from a start area node in the area network graph, jump to a next area node according to the transition probability of an edge corresponding to the start point, start from the jumped area node, continue to jump to the next area node according to the transition probability of the edge corresponding to the jumped area node, and repeat the process, thereby implementing random walk in the area network graph.
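The jumping process can be sketched as below. The transition table `P` and the function name are hypothetical, and stopping at a node without outgoing edges is an assumption made for this sketch.

```python
import random

def random_walk(P, start, steps, rng):
    """Jump `steps` times, choosing each next node by the current node's transition probabilities."""
    path = [start]
    for _ in range(steps):
        neighbours = P[path[-1]]
        if not neighbours:  # isolated node: the walk cannot continue
            break
        nodes, weights = zip(*neighbours.items())
        path.append(rng.choices(nodes, weights=weights, k=1)[0])
    return path

P = {"a": {"b": 0.6, "c": 0.4}, "b": {"a": 1.0}, "c": {"a": 1.0}}
walk = random_walk(P, "a", steps=10, rng=random.Random(0))
print(walk)
```

Passing a seeded `random.Random` instance keeps the walk reproducible between runs.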

It can be understood that, in the process of performing random walk in the area network graph, jump is performed in the area node, so that the area node occurs in the process of random walk, and then the area node has a corresponding occurrence probability in the process of random walk. Moreover, the region nodes that jump in the random walk process may belong to different cluster categories, so that the cluster categories have corresponding probability of occurrence in the random walk process. Therefore, the computer device can determine a first occurrence probability of each region node and a second occurrence probability of the cluster category in the random walk process according to the transition probability.

It can be understood that the result of the random walk may be encoded according to the probabilities in the random walk process to generate a sequence, and the sequence may be hierarchically encoded. Specifically, the hierarchical coding includes: inserting a cluster category mark in front of the area nodes of the same cluster category, and inserting a termination mark at the end of the cluster category. The cluster category marks are represented by a separate set of codes (for example, 000, 001, 002), while the area nodes within a cluster category and the termination mark are represented by another set of codes. Because the cluster category marks already distinguish the categories, the area nodes in different cluster categories can reuse the same set of codes (for example, all can be represented by 000, 001, 010, 011, 100).
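The structure of the hierarchical sequence can be illustrated with symbolic tokens instead of actual bit codes. The token names `"cluster"`, `"node"` and `"exit"` and the cluster assignment are hypothetical.

```python
def hierarchical_tokens(walk, cluster_of):
    """Insert a cluster mark on entering a cluster category and a termination mark on leaving it."""
    tokens, current = [], None
    for node in walk:
        cluster = cluster_of[node]
        if cluster != current:
            if current is not None:
                tokens.append(("exit", current))
            tokens.append(("cluster", cluster))
            current = cluster
        tokens.append(("node", node))
    if current is not None:
        tokens.append(("exit", current))
    return tokens

cluster_of = {"a": 0, "b": 0, "c": 1}
tokens = hierarchical_tokens(["a", "b", "c", "a"], cluster_of)
for token in tokens:
    print(token)
```

A walk that stays inside one cluster category for a long time pays for the cluster mark only once, which is why a good division shortens the encoded sequence.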

The computer device may determine a shortest average encoding length of a sequence generated by encoding the result of the random walk according to the first occurrence probability and the second occurrence probability. The computer equipment can cluster each region node in the region network graph by minimizing the shortest average coding length of the sequence to obtain at least one subregion network.

It can be understood that in the information theory, the shortest average length of the code is the information entropy. The information entropy is equivalent to the shortest code length, and a good classification scheme should satisfy the principle of minimum entropy, which can make the information entropy of the system lower. This is the essential optimization goal of the area network graph, and the optimal clustering scheme is sought by minimizing the information entropy.
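As a small worked example of this relation, the sketch below computes the information entropy of a three-symbol distribution. The distribution is a hypothetical illustration: with probabilities 0.5, 0.25 and 0.25, the prefix codes 0, 10 and 11 give an average length of 0.5·1 + 0.25·2 + 0.25·2 = 1.5 bits, which equals the entropy.

```python
import math

def entropy_bits(probabilities):
    """Shannon entropy in bits, the lower bound on the average code length."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

print(entropy_bits([0.5, 0.25, 0.25]))  # 1.5
```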

Therefore, the computer device can cluster each region node in the region network graph by minimizing the shortest average coding length of the sequence to obtain at least one sub-region network.

It is to be understood that minimizing the shortest average coding length of the sequence is an iterative process, which specifically comprises the following steps. During initialization, each area node is regarded as an independent sub-area network. A sequence of area nodes in the area network graph is then sampled by random walk, and an attempt is made to assign each area node in turn to the sub-area network in which one of its neighboring area nodes is located. If the assignment reduces the average code length (in bits), the area node is assigned to that sub-area network; otherwise, the sub-area network of the area node remains unchanged. These steps are iterated until the shortest average coding length of the sequence is minimized, at which point the final sub-area network division result of the clustering is obtained.

Fig. 3 and fig. 4 are schematic diagrams of clustering results of the area network diagrams in different embodiments, respectively. Referring to fig. 3, 302-306 are 3 sub-area networks obtained by division, and referring to fig. 4, 9 sub-area networks are obtained by division. In fig. 3 and 4, each of the sub-area networks includes area nodes grouped into one type. The line connecting the area nodes is an edge.

In one embodiment, the preset similarity threshold is S. The computer device may connect the area nodes whose region similarity is greater than or equal to the threshold S to form edges, and normalize the weight of the edge between area node α and area node β into the transition probability $P_{\alpha\to\beta}$. The computer device may determine a first occurrence probability of each area node during the random walk based on the transition probabilities.

In one embodiment, assume that the first occurrence probability of area node α is $p_\alpha$, the first occurrence probability of area node β is $p_\beta$, the transition probability is $P_{\alpha\to\beta}$, and the crossing probability is τ. The crossing probability τ is a hyper-parameter. It is introduced to avoid the unreasonable situation in which the result of the random walk depends on the initial value of the iteration, and to avoid the situation in which the random walk enters an isolated area node and cannot leave.

Then, the first occurrence probability of each area node in the random walk process can be determined according to the following formula:

$$p_\beta = \sum_{\alpha} p_\alpha \left[ (1-\tau)\, P_{\alpha\to\beta} + \frac{\tau}{n} \right]$$

It will be appreciated that if the crossing probability is not considered, the formula reduces to

$$p_\beta = \sum_{\alpha} p_\alpha P_{\alpha\to\beta}$$

where $P_{\alpha\to\beta}$ represents the transition probability of jumping from area node α to area node β. The term $(1-\tau)\, P_{\alpha\to\beta} + \tau/n$ represents jumping according to $P_{\alpha\to\beta}$ with probability 1−τ and jumping to a randomly selected node of the area network graph with probability τ, where n is the number of area nodes in the area network graph.
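The first occurrence probabilities can be approximated by repeatedly applying the update until it stabilizes. The sketch below assumes the update has the form p_β = Σ_α p_α[(1−τ)P_α→β + τ/n] and that every node's outgoing transition probabilities sum to 1; the default τ = 0.15, the iteration count and the transition table are illustrative assumptions.

```python
def first_occurrence_probabilities(P, tau=0.15, iterations=200):
    """Iterate p_beta = sum_alpha p_alpha * ((1 - tau) * P[alpha][beta] + tau / n)."""
    nodes = sorted(P)
    n = len(nodes)
    p = {v: 1.0 / n for v in nodes}  # start from the uniform distribution
    for _ in range(iterations):
        # With probability tau, jump to any of the n nodes uniformly.
        nxt = {v: tau / n * sum(p.values()) for v in nodes}
        # With probability 1 - tau, follow an edge by its transition probability.
        for alpha in nodes:
            for beta, prob in P[alpha].items():
                nxt[beta] += (1 - tau) * p[alpha] * prob
        p = nxt
    return p

P = {"a": {"b": 0.6, "c": 0.4}, "b": {"a": 1.0}, "c": {"a": 1.0}}
p = first_occurrence_probabilities(P)
print(p)
```

For this three-node graph the fixed point can be checked by hand: p_a = τ/3 + (1−τ)(1−p_a), which gives p_a = 0.9/1.85 when τ = 0.15.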

In one embodiment, the second occurrence probability of the cluster category may be determined according to the following formula:

$$q_{i\curvearrowright} = \sum_{\alpha \in i} \sum_{\beta \notin i} p_\alpha P_{\alpha\to\beta}$$

where $q_{i\curvearrowright}$ is the second occurrence probability of the i-th cluster category; $p_\alpha$ is the first occurrence probability of area node α; and $P_{\alpha\to\beta}$ indicates the transition probability of jumping from area node α to area node β.

In one embodiment, determining the shortest average code length of the sequence generated by encoding the result of the random walk according to the first occurrence probability and the second occurrence probability comprises: determining a first shortest average coding length of region nodes in the same cluster category in a sequence generated by coding the result of random walk according to the first occurrence probability; determining a second shortest average coding length of the cluster categories in the sequence according to the second occurrence probability; and determining the shortest average coding length of the sequence according to the first shortest average coding length and the second shortest average coding length. It should be noted that the area nodes in the same cluster category are the area nodes in the same sub-area network.

It will be appreciated that the clustering categories in the area network graph and the area nodes within the same clustering category use two different sets of codes. Therefore, a first shortest average coding length of the region nodes in the same cluster category and a second shortest average coding length of the cluster category are respectively calculated, and the total average coding length of the sequence is determined according to the first shortest average coding length and the second shortest average coding length.

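In one embodiment, the first shortest average encoding length of area nodes within the same cluster category may be determined according to the following formula:

$$H(P^i) = -\frac{q_{i\curvearrowright}}{q_{i\curvearrowright} + \sum_{\beta \in i} p_\beta} \log \frac{q_{i\curvearrowright}}{q_{i\curvearrowright} + \sum_{\beta \in i} p_\beta} - \sum_{\alpha \in i} \frac{p_\alpha}{q_{i\curvearrowright} + \sum_{\beta \in i} p_\beta} \log \frac{p_\alpha}{q_{i\curvearrowright} + \sum_{\beta \in i} p_\beta}$$

where i denotes the i-th cluster category; $H(P^i)$ is the first shortest average encoding length of the area nodes in the i-th cluster category; $q_{i\curvearrowright}$ is the second occurrence probability of the i-th cluster category; and $p_\alpha$ is the first occurrence probability of area node α.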

In one embodiment, the second shortest average encoding length of the cluster categories may be determined according to the following formula:

$$H(Q) = -\sum_{i} \frac{q_{i\curvearrowright}}{q_{\curvearrowright}} \log \frac{q_{i\curvearrowright}}{q_{\curvearrowright}}, \qquad q_{\curvearrowright} = \sum_{i} q_{i\curvearrowright}$$

where i denotes the i-th cluster category; $H(Q)$ is the second shortest average encoding length of the cluster categories; $q_{i\curvearrowright}$ is the second occurrence probability of the i-th cluster category; and $q_{\curvearrowright}$ is the sum of the second occurrence probabilities of all cluster categories.

In one embodiment, the shortest average code length of the sequence may be determined according to the following formula:

$$L = q_{\curvearrowright} H(Q) + \sum_{i} \left( q_{i\curvearrowright} + \sum_{\alpha \in i} p_\alpha \right) H(P^i)$$

where L is the shortest average code length of the sequence; $H(Q)$ is the second shortest average encoding length of the cluster categories; $q_{\curvearrowright}$ is the sum of the second occurrence probabilities of all cluster categories; i denotes the i-th cluster category; $q_{i\curvearrowright}$ is the second occurrence probability of the i-th cluster category; $p_\alpha$ is the first occurrence probability of area node α; and $H(P^i)$ is the first shortest average encoding length of the area nodes in the i-th cluster category.
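To show how the two code lengths combine, the sketch below evaluates the average code length of a given division. It assumes the standard two-level map equation, in which the second occurrence probability of a cluster category is the probability of leaving it in one step (without the crossing probability τ), the cluster-level entropy is weighted by the total exit probability, and each within-cluster entropy is weighted by the exit probability plus the first occurrence probabilities of its nodes. The toy graph, the symbol names and the base-2 logarithm are assumptions.

```python
import math

def plogp(x):
    return x * math.log2(x) if x > 0 else 0.0

def map_equation(p, P, cluster_of):
    """Average code length: q * H(Q) + sum_i (q_i + sum of p_alpha in i) * H(P^i)."""
    clusters = sorted(set(cluster_of.values()))
    q = {i: 0.0 for i in clusters}  # second occurrence probability per cluster
    for alpha, row in P.items():
        for beta, prob in row.items():
            if cluster_of[alpha] != cluster_of[beta]:
                q[cluster_of[alpha]] += p[alpha] * prob
    q_total = sum(q.values())
    h_q = -sum(plogp(q[i] / q_total) for i in clusters) if q_total > 0 else 0.0
    length = q_total * h_q
    for i in clusters:
        members = [a for a in p if cluster_of[a] == i]
        weight = q[i] + sum(p[a] for a in members)
        h_p = -plogp(q[i] / weight) - sum(plogp(p[a] / weight) for a in members)
        length += weight * h_p
    return length

# Toy graph: two tightly linked pairs joined by one weak edge (hypothetical values).
w = {("a", "b"): 1.0, ("b", "c"): 0.1, ("c", "d"): 1.0}
strength = {}
for (u, v), s in w.items():
    strength[u] = strength.get(u, 0.0) + s
    strength[v] = strength.get(v, 0.0) + s
P = {u: {} for u in strength}
for (u, v), s in w.items():
    P[u][v] = s / strength[u]
    P[v][u] = s / strength[v]
total = sum(strength.values())
p = {u: strength[u] / total for u in strength}  # stationary distribution for tau = 0

good = map_equation(p, P, {"a": 0, "b": 0, "c": 1, "d": 1})
single = map_equation(p, P, {"a": 0, "b": 0, "c": 0, "d": 0})
wrong = map_equation(p, P, {"a": 0, "c": 0, "b": 1, "d": 1})
print(good < single < wrong)  # True
```

The division that keeps each tightly linked pair together yields the shortest code, while the division that separates them forces the walk to cross cluster boundaries at almost every step and yields the longest.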

It can be understood that the above embodiment is equivalent to performing community discovery with the InfoMap algorithm (a network clustering algorithm based on the map equation). The area nodes are clustered on the basis of the area network graph, so machine learning training with a large number of samples and labels prepared in advance is not needed, which saves cost. In addition, the differences in characteristics between regions are taken into account when the area network graph is clustered, which improves clustering accuracy compared with the traditional method. Moreover, because the clustering takes these regional differences into account, it generalizes to all regions, which improves applicability.

In one embodiment, in the area network graph, edges are created between area nodes with area similarity greater than or equal to a similarity threshold. Before determining, for each sub-area network in step S206, the proportion of the area node whose carried target value is greater than or equal to the target value threshold, the method further includes: step-by-step adjusting the similarity threshold, and updating the edges in the regional network graph according to the similarity threshold after each step-by-step adjustment; aiming at the updated regional network graph each time, executing the step of dividing the regional network graph according to the regional similarity between the regional nodes corresponding to each edge to obtain at least one sub-regional network until the step stop condition is met; selecting a target subarea network division result from the subarea network division results obtained by each division to obtain at least one final subarea network; in the target sub-area network division result, the distribution of the area nodes of which the target values are greater than or equal to the target value threshold satisfies the centralized distribution condition.

It can be understood that the clustering result can be adjusted by a method of adjusting the similarity threshold step by step without specifying the number of categories in advance.

Wherein, stepping is to move forward or backward step by step according to a preset amplitude. The step-by-step adjustment of the similarity threshold refers to a step-by-step adjustment of the similarity threshold according to a preset adjustment range. It is understood that the similarity threshold may be adjusted according to S ± preset adjustment magnitude values (e.g., 0.05).

It can be understood that, in the area network graph, since edges are created between area nodes whose area similarity is greater than or equal to the similarity threshold, after the similarity threshold is adjusted, the relationship between the edges in the area network graph also changes, that is, after the similarity threshold is adjusted step by step, the area network graph is also updated. Then, the computer device may execute step S204 to divide the area network graph according to the area similarity between the area nodes corresponding to each edge, so as to obtain at least one sub-area network, until the step stop condition is satisfied.

It can be understood that, each time the similarity threshold is adjusted in a stepping manner, a sub-area network division result is obtained through division, and each sub-area network division result includes at least one sub-area network obtained through division.
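The stepping loop can be sketched as follows. For brevity, the division of step S204 is replaced here by a simple connected-components grouping, which is only a stand-in and not the clustering described in the method. The function names, the similarity values and the choice of stepping upward only (the text allows S ± the adjustment amplitude) are assumptions, and the loop stops after a preset number of adjustments.

```python
def connected_components(nodes, edges):
    """Stand-in division: group nodes linked by edges (not the clustering of step S204)."""
    parent = {v: v for v in nodes}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for u, v in edges:
        parent[find(u)] = find(v)
    groups = {}
    for v in nodes:
        groups.setdefault(find(v), []).append(v)
    return sorted(sorted(g) for g in groups.values())

def stepped_partitions(similarity, nodes, start, step=0.05, max_steps=3):
    """Re-create the edges and re-divide the graph after each stepwise threshold adjustment."""
    results = {}
    for k in range(max_steps + 1):
        threshold = round(start + k * step, 10)  # rounding avoids float drift
        edges = [pair for pair, s in similarity.items() if s >= threshold]
        results[threshold] = connected_components(nodes, edges)
    return results

similarity = {("a", "b"): 0.92, ("b", "c"): 0.83, ("c", "d"): 0.95}
results = stepped_partitions(similarity, ["a", "b", "c", "d"], start=0.80)
for threshold in sorted(results):
    print(threshold, results[threshold])
```

Each threshold yields one sub-area network division result, so the number of sub-area networks changes without a category count being specified in advance.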

The step stop condition is a condition for stopping step adjustment.

In one embodiment, the step stop condition may include the number of step adjustments reaching a preset number threshold.

In another embodiment, the step stop condition may include that, in the sub-area network division result obtained by the division, the distribution of the high target value area nodes satisfies the centralized distribution condition. A high target value area node is an area node carrying a target value greater than or equal to the preset target value threshold. The centralized distribution condition is a condition under which the high target value area nodes are distributed in a relatively concentrated manner, that is, as many high target value area nodes as possible are contained in a few of the sub-area networks in the division result. The centralized distribution condition may include that the number of high target value area nodes contained in a sub-area network is greater than or equal to a first preset threshold, or that the proportion of high target value area nodes is greater than or equal to a second preset threshold.

Further, the computer device may select a target sub-area network division result from the sub-area network division results obtained by each division to obtain at least one final sub-area network; in the target sub-area network division result, the distribution of the area nodes of which the target values are greater than or equal to the target value threshold satisfies the centralized distribution condition.

The centralized distribution refers to a concentrated distribution of the area nodes whose target values are greater than or equal to the target value threshold.

Specifically, the computer device may determine, for each sub-area network division result obtained by division, a distribution situation of a high target value area node (i.e., an area node having a target value greater than or equal to a target value threshold), compare the distribution situation corresponding to each sub-area network division result, and select, from the sub-area network division results obtained by division, a sub-area network division result in which the distribution of the high target value area node (i.e., an area node having a target value greater than or equal to a target value threshold) satisfies a centralized distribution condition, as the target sub-area network division result, thereby obtaining at least one sub-area network in the target sub-area network division result.
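The comparison of division results can be sketched as below. The text leaves open how the results are compared, so ranking them by the count and then the proportion of high target value nodes in their best sub-area network is an assumption, as are the function names, the labels S1 to S3 and the sample data.

```python
def best_group_stats(partition, high_value_nodes):
    """(count, proportion) of high-target-value nodes in the sub-area network holding the most."""
    stats = []
    for group in partition:
        count = len(set(group) & high_value_nodes)
        stats.append((count, count / len(group)))
    return max(stats)

def select_partition(results, high_value_nodes, min_count, min_proportion):
    """Among results meeting the centralized distribution condition, pick the most concentrated."""
    qualifying = {}
    for key, partition in results.items():
        count, proportion = best_group_stats(partition, high_value_nodes)
        if count >= min_count or proportion >= min_proportion:
            qualifying[key] = (count, proportion)
    return max(qualifying, key=qualifying.get) if qualifying else None

results = {
    "S1": [["a", "b", "c", "d", "e", "f"]],
    "S2": [["a", "b", "c"], ["d", "e", "f"]],
    "S3": [["a", "b"], ["c", "d"], ["e", "f"]],
}
high = {"a", "b", "c"}
print(select_partition(results, high, min_count=3, min_proportion=0.9))  # S2
```

Here S1 gathers all three high target value nodes but dilutes them among ordinary nodes, S3 splits them apart, and S2 holds all three in one sub-area network that contains nothing else.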

Further, the computer device may execute steps S206 to S208 for each sub-area network in the final sub-area network division result to select a target sub-area network from each sub-area network.

Fig. 5 to 7 are schematic diagrams illustrating the step adjustment result in one embodiment. Referring to fig. 5 to 7, the network partition results of different sub-areas obtained by 3 partitions are shown. Fig. 6 is a result of dividing the sub-area network obtained by re-dividing the sub-area network after adjusting the preset amplitude step by step on the basis of the similarity threshold S1 in fig. 5. Fig. 7 is a result of dividing the sub-area network obtained by re-dividing the sub-area network after adjusting the preset amplitude step by step on the basis of the similarity threshold S2 in fig. 6. It should be noted that in fig. 5 to fig. 7, black dots are used to indicate high target value area nodes (i.e., area nodes carrying target values greater than or equal to a preset target value threshold), and white dots are used to indicate common area nodes. As can be seen from fig. 5 to 7, the high target value area nodes in fig. 6 are distributed more intensively, that is, as many high target value area nodes as possible are included in a certain sub-area network in fig. 6. Therefore, the sub-area network division result of fig. 6 may be selected as the final sub-area network division result.

In the above embodiment, the sub-area network division result is adjusted step by step in light of both the clustering result and the distribution of high target values, so that a divided sub-area network contains as many high target value area nodes as possible, that is, the high target value area nodes are concentrated in the same sub-area network as far as possible. This improves clustering accuracy. Furthermore, based on a sub-area network in which the high target value area nodes are concentrated, the target area nodes capable of generating high target values can be selected more accurately.

In one embodiment, selecting the target area node from the target sub-area network at step S210 includes: selecting area nodes which do not carry target values from a target sub-area network to obtain candidate area nodes; and determining the target area node according to the candidate area node.

It can be understood that, since only a part of the area nodes in the area network graph carry the target values, the target sub-area network includes a part of the area nodes carrying the target values and a part of the area nodes not carrying the target values. The area node not carrying the target value refers to an area node corresponding to an area which is not used to generate the target value. The candidate area node is an area node from which a target area node is selected.

Specifically, the computer device may select a region node not carrying the target value from the target sub-region network to obtain a candidate region node. The computer device may directly determine all candidate region nodes as target region nodes. The computer device may also select a partial region node from the candidate region nodes as a target region node.
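A minimal sketch of the candidate selection follows. The node identifiers and target values are hypothetical, and representing the carried target values as a dictionary keyed by node is an assumption.

```python
def candidate_nodes(target_subnetwork, target_values):
    """Area nodes of the target sub-area network that carry no target value."""
    return [node for node in target_subnetwork if node not in target_values]

target_subnetwork = ["r1", "r2", "r3", "r4"]
target_values = {"r1": 120.0, "r3": 95.0}  # only some nodes carry a target value
candidates = candidate_nodes(target_subnetwork, target_values)
print(candidates)  # ['r2', 'r4']
```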

In one embodiment, the computer device may select target area nodes from the candidate area nodes according to a preset adjacent position constraint condition. The adjacent position constraint condition is a condition that constrains those candidate area nodes whose geographical positions are adjacent to one another.

In one embodiment, determining the target area node based on the candidate area nodes comprises: acquiring geographical position information corresponding to candidate region nodes; according to the geographical position information, identifying area nodes adjacent to the position from the candidate area nodes to obtain adjacent area nodes; when the number of the adjacent area nodes is larger than or equal to a preset number threshold, screening the area nodes meeting the preset number threshold from the adjacent area nodes; and taking the screened area nodes and the area nodes except the adjacent area nodes in the candidate area nodes as target area nodes in the target sub-area network.

The geographic location information refers to the real-world geographic location of the area corresponding to a candidate area node. An adjacent area node is a candidate area node whose geographical position is adjacent to that of another candidate area node.

Specifically, the computer device may obtain geographic location information corresponding to the candidate area node; and according to the geographical position information, identifying the area nodes adjacent to the position from the candidate area nodes to obtain the adjacent area nodes. The computer device may determine the number of the nodes in the neighboring area, compare the number with a preset number threshold, and screen the nodes in the neighboring area that satisfy the preset number threshold when the number of the nodes in the neighboring area is greater than or equal to the preset number threshold. Further, the computer device may use the screened area nodes and the area nodes except for the adjacent area node in the candidate area nodes as the target area nodes in the target sub-area network.

It is understood that the computer device may randomly select the area nodes satisfying the preset number threshold from the adjacent area nodes. The computer device may also select the area nodes meeting the preset number threshold according to a preset rule.
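The screening described above can be sketched as follows. This is a minimal illustration rather than the method of the application: the function names, the use of great-circle distance, and the `radius_km` cutoff that decides when two candidates count as "adjacent" are all assumptions, since the text does not fix a definition of adjacency. Random selection is used for the screening step.

```python
import math
import random
from itertools import combinations


def haversine_km(a, b):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def apply_adjacency_constraint(locations, number_threshold, radius_km=1.0, rng=None):
    """Return the target nodes after limiting geographically adjacent candidates.

    locations: dict mapping candidate node id -> (lat, lon).
    A candidate is treated as "adjacent" when it lies within radius_km of
    another candidate (an assumed definition, not taken from the text).
    """
    rng = rng or random.Random(0)
    adjacent = set()
    for u, v in combinations(locations, 2):
        if haversine_km(locations[u], locations[v]) <= radius_km:
            adjacent.update((u, v))
    non_adjacent = [n for n in locations if n not in adjacent]
    if len(adjacent) >= number_threshold:
        # Keep only number_threshold of the adjacent nodes, chosen at random.
        kept = rng.sample(sorted(adjacent), number_threshold)
    else:
        kept = sorted(adjacent)
    # Screened adjacent nodes plus all non-adjacent candidates are the targets.
    return sorted(kept + non_adjacent)
```

The adjacent nodes that are not kept are the ones that a later step may re-divide into other sub-area networks.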

In one embodiment, the computer device may determine the region similarity between the adjacent area nodes and, in descending order of region similarity, select from the adjacent area nodes a number of area nodes equal to the preset number threshold. That is, the adjacent area nodes whose features are most similar are selected, up to the preset number threshold. For example, if the preset number threshold is N and there are M adjacent area nodes, with M > N, the computer device may select, from the M adjacent area nodes, the N area nodes ranked highest by region similarity. It will be appreciated that the features of these top N area nodes are more closely related.
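One possible reading of this ranking is sketched below. The text does not say how a single node is scored from pairwise similarities, so the sketch assumes each adjacent node is scored by its mean similarity to the other adjacent nodes; the function name and the `similarity` dictionary layout are hypothetical.

```python
def top_n_by_similarity(adjacent_nodes, similarity, n):
    """Keep the n adjacent nodes with the highest mean similarity to the others.

    similarity: dict mapping frozenset({u, v}) -> region similarity.
    Scoring by mean pairwise similarity is an assumption of this sketch.
    """
    def mean_similarity(u):
        others = [v for v in adjacent_nodes if v != u]
        if not others:
            return 0.0
        return sum(similarity.get(frozenset((u, v)), 0.0) for v in others) / len(others)

    # Sort from high to low similarity; ties are broken by node id for determinism.
    ranked = sorted(adjacent_nodes, key=lambda u: (-mean_similarity(u), str(u)))
    return ranked[:n]
```

With M = 4 adjacent nodes and N = 2, the call returns the two nodes whose features are closest to the rest of the adjacent group.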

In one embodiment, the computer device may further re-partition the area nodes that are not screened out of the adjacent area nodes into other sub-area networks except the target sub-area network.

It can be understood that, when there is only one target sub-area network, the area nodes not screened out from the adjacent area nodes need not be re-divided; it is sufficient to use the screened area nodes and the candidate area nodes other than the adjacent area nodes as the target area nodes in the target sub-area network. This is because, in that case, whether or not the unscreened adjacent area nodes are re-divided does not affect the selection result of the target area nodes in the target sub-area network. When there are a plurality of target sub-area networks, the area nodes not screened out from the adjacent area nodes can be re-divided into other sub-area networks. When the sub-area network into which such a node is re-divided is also a target sub-area network, target area nodes can again be selected from that new target sub-area network according to the adjacent position constraint condition, using the same method as was applied to the target sub-area network to which the node belonged before the re-division.

In this embodiment, the need to avoid excessive concentration of positions in certain areas is taken into account, and the adjacent position constraint condition is added, so that the geographic distribution of the target area nodes is kept from being too concentrated. This makes the scheme practical for industrial use and improves the accuracy of target area selection.

In one embodiment, the number of target sub-area networks is at least two. The method further comprises the following steps: obtaining the region nodes which are not screened out in the adjacent region nodes to obtain the re-divided region nodes; determining associated region nodes which have an edge establishing relationship with the re-divided region nodes and are in different sub-region networks with the re-divided region nodes; determining the associated region node with the highest region similarity between the associated region nodes and the re-divided region node from the associated region nodes, and re-dividing the re-divided region node into the sub-region network to which the determined associated region node belongs; and when the sub-area network which is newly divided is the target sub-area network, adding the nodes of the newly divided area to the nodes of the candidate area in the target sub-area network, and executing the steps of obtaining the geographical position information corresponding to the nodes of the candidate area and the subsequent steps.

The re-divided region node is an area node that is to be re-divided into another sub-area network. The area nodes that are not screened out from the adjacent area nodes of the current target sub-area network are the area nodes to be re-divided into a new sub-area network, namely the re-divided region nodes. The associated region node is a region node that has an edge-establishing relationship with the re-divided region node (that is, shares an edge with it) and is in a different sub-area network from the re-divided region node. It is to be understood that there may be at least one re-divided region node.

Specifically, the computer device may acquire, as the re-divided region nodes, the region nodes that are not screened out from the adjacent area nodes. Further, the computer device can determine the associated region nodes that have an edge-establishing relationship with a re-divided region node and are in a different sub-area network from that re-divided region node. It is understood that the sub-area network in which the re-divided region node is located after the re-division is different from the one in which it was located before the re-division. For each re-divided region node, the computer device can determine the region similarity between the re-divided region node and each of its associated region nodes. The computer device may then determine, from these associated region nodes, the associated region node having the highest region similarity with the re-divided region node. Further, the computer device may re-divide the re-divided region node into the sub-area network to which that most similar associated region node belongs.
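The re-division step can be sketched as follows, assuming the graph is held as a dictionary of weighted edges and each node has a sub-area network id. The function name and both data layouts are hypothetical, and the sketch assumes the graph has no self-loops.

```python
def repartition_node(node, membership, edges):
    """Move `node` to the sub-area network of its most similar associated node.

    membership: dict mapping node id -> sub-area network id (updated in place).
    edges: dict mapping frozenset({u, v}) -> region similarity of that edge.
    Returns the new sub-area network id, or None when the node has no
    associated node (no neighbour in a different sub-area network).
    """
    best_node, best_similarity = None, float("-inf")
    for pair, similarity in edges.items():
        if node not in pair:
            continue
        (other,) = pair - {node}
        # Associated nodes must lie in a different sub-area network.
        if membership[other] == membership[node]:
            continue
        if similarity > best_similarity:
            best_node, best_similarity = other, similarity
    if best_node is None:
        return None
    membership[node] = membership[best_node]
    return membership[node]
```

If the returned sub-area network is itself a target sub-area network, the caller would add the node to that network's candidate nodes and apply the adjacent position constraint again.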

When the repartitioned sub-area network is the target sub-area network, the computer device may add repartitioned area nodes to candidate area nodes in the repartitioned target sub-area network. It can be understood that since the re-divided region node belongs to the region node not carrying the target value in the previous target sub-region network, the re-divided region node is divided into a new target sub-region network, and still belongs to the region node not carrying the target value, which can still be used as the candidate region node. Therefore, the computer device may add the repartitioned region node to the candidate region node in the network of target sub-regions to which it is repartitioned.

Further, the computer device may execute, for the target sub-area network newly divided, acquiring geographic location information corresponding to the candidate area node; according to the geographical position information, identifying area nodes adjacent to the position from the candidate area nodes to obtain adjacent area nodes; when the number of the adjacent area nodes is larger than or equal to a preset number threshold, screening the area nodes meeting the preset number threshold from the adjacent area nodes; and taking the screened area nodes and the area nodes except the adjacent area nodes in the candidate area nodes as the target area nodes in the target sub-area network.

Because the constraint processing is carried out iteratively according to the adjacent position constraint condition, the geographic distribution of the target area nodes is kept from being too concentrated, and the principle of maximizing the overall target value is also satisfied, namely the principle that the target sub-area network includes as many high-target-value area nodes as possible.

FIG. 8 is a simplified flowchart of the region selection method in one embodiment. Referring to fig. 8, the initial target values may first be preprocessed. Then, the region attribute feature and the payment feature (i.e., the resource transfer feature) of each initial region are constructed and fused to generate a region feature vector. The region feature vectors are mapped into region nodes, a region network graph is constructed with the regions as nodes, and the region network graph is clustered using the network clustering algorithm InfoMap. Next, according to the clustering result and the distribution of the high-target-value regions, the region nodes are divided into sub-area networks, a target sub-area network is selected from the sub-area networks, and target area nodes are selected from the target sub-area network. Finally, a model constraint condition (namely, the adjacent position constraint condition) is added to constrain position-adjacent target area nodes in the target sub-area network, thereby preventing the target areas from being excessively concentrated.

In one embodiment, the area corresponding to an area node carrying a target value is an area where the target object has been delivered; the target value is a profit index value generated after the target object is delivered. The method further comprises: determining the area corresponding to the target area node as the target area in which the target object is to be delivered.

The target object refers to an object to be delivered. The target object may include at least one of a virtual object and a real object. A virtual object is an object that has no physical entity. A real object is an object that exists as a physical entity in the real world.

It is understood that the target object may be delivered in an online or offline manner. For example, when the target object is a virtual object, the virtual object may be delivered in an online manner. When the target object is a real object, the real object can be delivered in an offline manner.

In one embodiment, the virtual object may include at least one of online promotional information (e.g., internet advertisements) and virtual products.

In one embodiment, the real object may include at least one of a physical merchant store, a physical product, and offline promotional information (e.g., offline billboard), among others.

It is to be understood that the virtual object and the real object are not limited to those listed above, and all objects that can be used for delivery may be target objects.

The profit index value is an index value for measuring the profit.

In one embodiment, the profit index value may include at least one of sales and user conversion rate, among others. In that case, the final target value may be a value obtained by normalizing the sales of each area with respect to the total sales, where the total sales are obtained by summing the sales of all areas. It will be appreciated that the profit index value may also include other index values that measure profit.
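As a small worked illustration of such a normalization, the sketch below expresses each area's sales as a share of the total. The text does not specify the normalization formula, so share-of-total is only one plausible choice, and the function name is hypothetical.

```python
def normalize_sales(sales_by_region):
    """Return each region's sales as a fraction of the total sales.

    sales_by_region: dict mapping region id -> sales amount.
    Share-of-total normalization is an assumption of this sketch.
    """
    total = sum(sales_by_region.values())
    if total == 0:
        return {region: 0.0 for region in sales_by_region}
    return {region: sales / total for region, sales in sales_by_region.items()}
```

For example, sales of 50, 30 and 20 in three areas give target values of 0.5, 0.3 and 0.2, which can then be compared against a single target value threshold.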

Sales refers to the amount of resource transfer generated by the user using the target object. The user conversion rate is the ratio of the number of users using the target object to the number of users accessing the target object delivery area. It is to be appreciated that the user's use of the target object can include at least one of use through a purchase behavior and use through a rental behavior.

Specifically, the computer device may determine the area corresponding to the target area node as the target area in which the target object is to be delivered. That is, the area corresponding to a target area node selected according to the method in the embodiments of the present application may be used to deliver the target object. The process of selecting an area for delivering the target object therefore corresponds to a site selection process, for example, site selection for a merchant (e.g., a physical store), for product placement, or for advertisement placement.

In the above embodiment, the target area node is selected through unsupervised processing, and then the area corresponding to the target area node is used as the area to be delivered, so that the accuracy of object delivery can be improved.

As shown in fig. 9, in one embodiment, a region selection apparatus 900 is provided, which is disposed in a computer device. The computer device may be a terminal or a server. The apparatus 900 includes: an obtaining module 902, a dividing module 904, and a target area determining module 906, wherein:

an obtaining module 902, configured to obtain a regional network graph including regional nodes and edges; a portion of the regional nodes carry target values.

A dividing module 904, configured to divide the area network graph according to the area similarity between the area nodes corresponding to each edge, so as to obtain at least one sub-area network; the same sub-area network comprises area nodes grouped into one type.

A target area determining module 906, configured to determine, for each sub-area network, an occupation ratio of area nodes whose carried target values are greater than or equal to a target value threshold; selecting a target sub-area network from each sub-area network according to the occupation ratio; the ratio in the target sub-area network is greater than the ratio in the non-target sub-area network; and selecting a target area node from the target sub-area network.

In one embodiment, the obtaining module 902 is further configured to obtain a region feature vector of the initial region; mapping the region feature vectors into region nodes in the space, and determining the region similarity between every two region nodes according to the region feature vectors; and establishing edges between the area nodes with the area similarity larger than or equal to the similarity threshold value to generate an area network graph.
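The graph construction described above can be sketched as follows. Cosine similarity is assumed here as the region similarity measure purely for illustration; the function names and the edge dictionary layout are hypothetical.

```python
import math
from itertools import combinations


def cosine_similarity(x, y):
    """Cosine similarity of two equal-length numeric vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y) if norm_x and norm_y else 0.0


def build_region_graph(feature_vectors, similarity_threshold):
    """Create an edge between every pair of regions whose similarity meets the threshold.

    feature_vectors: dict mapping region id -> feature vector (list of floats).
    Returns a dict mapping frozenset({u, v}) -> similarity for each created edge.
    """
    edges = {}
    for u, v in combinations(feature_vectors, 2):
        similarity = cosine_similarity(feature_vectors[u], feature_vectors[v])
        if similarity >= similarity_threshold:
            edges[frozenset((u, v))] = similarity
    return edges
```

Raising `similarity_threshold` removes weaker edges and tends to split the graph into more, smaller groups, which is what a stepwise adjustment of the threshold exploits.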

In one embodiment, the obtaining module 902 is further configured to obtain a region attribute feature of the initial region and a resource transfer feature generated in the initial region; and fusing the region attribute features and the resource transfer features corresponding to the same initial region to generate region feature vectors of the initial regions.
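A minimal sketch of the fusion step is given below, assuming that "fusing" means concatenating the two feature lists of the same initial region. The text does not fix the fusion operation, so concatenation and the function name are assumptions.

```python
def fuse_features(attribute_features, transfer_features):
    """Concatenate attribute and resource transfer features region by region.

    Both arguments map region id -> list of numeric features. Only regions
    present in both inputs receive a fused region feature vector.
    """
    return {
        region: list(attribute_features[region]) + list(transfer_features[region])
        for region in attribute_features
        if region in transfer_features
    }
```

In practice the two feature groups would usually be scaled to comparable ranges before concatenation so that neither dominates the similarity computation.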

In one embodiment, the dividing module 904 is further configured to determine transition probabilities between region nodes according to the region similarity between the region nodes corresponding to each edge; the transition probability is positively correlated with the region similarity; carrying out random walk in the area network graph according to the transition probability, and determining a first occurrence probability of each area node and a second occurrence probability of the cluster type in the random walk process; determining the shortest average coding length of a sequence generated by coding the result of random walk according to the first occurrence probability and the second occurrence probability; and clustering each region node in the region network graph by minimizing the shortest average coding length to obtain at least one subregion network.
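The first part of this procedure can be sketched as follows. The sketch assumes that the transition probability out of a node is proportional to the similarity on each of its edges, which is one way of making it positively correlated with region similarity. For an undirected connected graph, the long-run visit probability of a node under such a walk equals its total edge weight divided by the sum of all nodes' total edge weights, so it is computed in closed form instead of by simulation. Function names and the edge layout are hypothetical.

```python
def transition_probabilities(edges):
    """Return node -> {neighbour: probability}, proportional to edge similarity.

    edges: dict mapping frozenset({u, v}) -> positive region similarity.
    """
    weights = {}
    for pair, similarity in edges.items():
        u, v = tuple(pair)
        weights.setdefault(u, {})[v] = similarity
        weights.setdefault(v, {})[u] = similarity
    return {
        u: {v: w / sum(neighbours.values()) for v, w in neighbours.items()}
        for u, neighbours in weights.items()
    }


def visit_probabilities(edges):
    """Long-run probability that the random walk is at each node.

    Uses the closed form for undirected graphs: node strength / total strength.
    """
    strength = {}
    for pair, similarity in edges.items():
        for node in pair:
            strength[node] = strength.get(node, 0.0) + similarity
    total = sum(strength.values())
    return {node: s / total for node, s in strength.items()}
```

These visit probabilities play the role of the first occurrence probability of each region node; the occurrence probability of each cluster category additionally depends on the candidate clustering being evaluated and is not computed here.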

In one embodiment, the dividing module 904 is further configured to determine, according to the first occurrence probability, a first shortest average encoding length of the region nodes in the same cluster category in the sequence generated by encoding the result of the random walk; determining a second shortest average coding length of the cluster categories in the sequence according to the second occurrence probability; and determining the shortest average coding length of the sequence according to the first shortest average coding length and the second shortest average coding length.
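For reference, the standard InfoMap objective (the map equation) combines two such coding lengths in the following way. The notation below is that of the general InfoMap literature and is not necessarily the notation of this application: p_α is the visit probability of region node α, q_i is the probability that the random walk leaves cluster category i, and m is the number of cluster categories.

```latex
L(M) = q \, H(\mathcal{Q}) + \sum_{i=1}^{m} r_i \, H(\mathcal{P}^{i}),
\qquad q = \sum_{i=1}^{m} q_i,
\qquad r_i = q_i + \sum_{\alpha \in i} p_\alpha

H(\mathcal{Q}) = -\sum_{i=1}^{m} \frac{q_i}{q} \log_2 \frac{q_i}{q}

H(\mathcal{P}^{i}) = -\frac{q_i}{r_i} \log_2 \frac{q_i}{r_i}
  - \sum_{\alpha \in i} \frac{p_\alpha}{r_i} \log_2 \frac{p_\alpha}{r_i}
```

Here H(P^i) corresponds to the shortest average coding length of the region nodes within cluster category i, H(Q) corresponds to the shortest average coding length of the cluster categories, and L(M) is the overall quantity that the clustering minimizes.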

In one embodiment, in the area network graph, edges are created between area nodes with area similarity greater than or equal to a similarity threshold. As shown in fig. 10, the apparatus 900 further includes:

a step adjustment module 905, configured to adjust the similarity threshold step by step, and update an edge in the regional network graph according to the similarity threshold after each step adjustment; and for the area network graph updated each time, notifying the dividing module 904 to execute the step of dividing the area network graph according to the area similarity between the area nodes corresponding to the edges to obtain at least one sub-area network until the step stop condition is met.

The target area determining module 906 is further configured to select a target sub-area network division result from the sub-area network division results obtained by the respective division to obtain at least one final sub-area network; in the target sub-area network division result, the distribution of the area nodes of which the target values are greater than or equal to the target value threshold satisfies the centralized distribution condition.

In one embodiment, the target area determining module 906 is further configured to screen, from the sub-area networks, the sub-area networks whose ratio is greater than or equal to a ratio threshold to obtain the target sub-area networks; or to select, from the sub-area networks, a preset number of top-ranked sub-area networks in descending order of ratio as the target sub-area networks.
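Both selection strategies can be sketched as follows. The sketch assumes the ratio is taken over all area nodes in a sub-area network, including those that carry no target value; the text does not state the denominator explicitly. All function names are hypothetical.

```python
def high_value_ratio(subnetworks, target_values, value_threshold):
    """Ratio of nodes whose target value meets the threshold, per sub-area network.

    subnetworks: dict mapping sub-area network id -> list of node ids.
    target_values: dict mapping node id -> target value, only for nodes that carry one.
    """
    ratios = {}
    for network_id, nodes in subnetworks.items():
        high = sum(
            1 for n in nodes
            if n in target_values and target_values[n] >= value_threshold
        )
        ratios[network_id] = high / len(nodes) if nodes else 0.0
    return ratios


def select_by_ratio_threshold(ratios, ratio_threshold):
    """Strategy 1: keep every sub-area network whose ratio meets the threshold."""
    return sorted(nid for nid, ratio in ratios.items() if ratio >= ratio_threshold)


def select_top_k(ratios, k):
    """Strategy 2: keep the k sub-area networks with the highest ratios."""
    return sorted(ratios, key=lambda nid: (-ratios[nid], nid))[:k]
```

Either strategy guarantees that every selected target sub-area network has a ratio at least as high as any sub-area network that was not selected.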

In one embodiment, the target area determining module 906 is further configured to select an area node not carrying a target value from the target sub-area network to obtain a candidate area node; and determining the target area node according to the candidate area node.

In one embodiment, the target area determining module 906 is further configured to obtain geographic location information corresponding to the candidate area node; according to the geographical position information, identifying area nodes adjacent to the position from the candidate area nodes to obtain adjacent area nodes; when the number of the adjacent area nodes is larger than or equal to a preset number threshold, screening the area nodes meeting the preset number threshold from the adjacent area nodes; and taking the screened area nodes and the area nodes except the adjacent area nodes in the candidate area nodes as target area nodes in the target sub-area network.

In one embodiment, the number of target sub-area networks is at least two; the target area determining module 906 is further configured to obtain the area nodes that are not screened out from the adjacent area nodes to obtain re-divided region nodes; determine associated region nodes that have an edge-establishing relationship with a re-divided region node and are in a different sub-area network from the re-divided region node; determine, from the associated region nodes, the associated region node having the highest region similarity with the re-divided region node; and notify the dividing module 904 to re-divide the re-divided region node into the sub-area network to which the determined associated region node belongs;

the target area determining module 906 is further configured to, when the sub-area network newly divided is the target sub-area network, add the newly divided area node to the candidate area node in the newly divided target sub-area network, and perform the subsequent steps of obtaining the geographic location information corresponding to the candidate area node.

In one embodiment, the area corresponding to the area node carrying the target value is the area to which the target object has been delivered. The target value is a profit index value generated after the target object is put; the target area determining module 906 is further configured to determine an area corresponding to the target area node as a target area of the target object to be delivered.

FIG. 11 is a block diagram of a computer device in one embodiment. Referring to fig. 11, the computer device may be a terminal or a server. The computer device includes a processor, a memory, and a network interface connected by a system bus. Wherein the memory includes a non-volatile storage medium and an internal memory. The non-volatile storage medium of the computer device may store an operating system and a computer program. The computer program, when executed, may cause a processor to perform a region selection method. The processor of the computer device is used for providing calculation and control capability and supporting the operation of the whole computer device. The internal memory may have stored therein a computer program which, when executed by the processor, causes the processor to perform a region selection method. The network interface of the computer device is used for network communication.

Those skilled in the art will appreciate that the architecture shown in fig. 11 is merely a block diagram of some of the structures associated with the disclosed aspects and is not intended to limit the computing devices to which the disclosed aspects apply, as particular computing devices may include more or less components than those shown, or may combine certain components, or have a different arrangement of components.

In one embodiment, the region selection apparatus provided in the present application may be implemented in the form of a computer program that is executable on a computer device as shown in fig. 11, and a non-volatile storage medium of the computer device may store the program modules constituting the region selection apparatus, such as the obtaining module 902, the dividing module 904, and the target area determining module 906 shown in fig. 9. The computer program composed of the respective program modules is used for causing the computer device to execute the steps in the region selection method of the embodiments of the present application described in the present specification.

For example, the computer device may obtain the area network graph including the area nodes and edges through the obtaining module 902 in the region selection apparatus 900 shown in fig. 9; a portion of the area nodes carry target values. The computer device may divide the area network graph according to the area similarity between the area nodes corresponding to each edge through the dividing module 904 to obtain at least one sub-area network; the same sub-area network comprises area nodes grouped into one type. The computer device may determine, through the target area determining module 906, for each of the sub-area networks, the ratio of area nodes whose carried target values are greater than or equal to the target value threshold; select a target sub-area network from the sub-area networks according to the ratio, the ratio in the target sub-area network being greater than the ratio in a non-target sub-area network; and select a target area node from the target sub-area network.

In one embodiment, a computer device is provided, comprising a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to perform the steps of the above-described region selection method. Here, the steps of the region selection method may be the steps in the region selection methods of the above-described respective embodiments.

In one embodiment, a computer-readable storage medium is provided, in which a computer program is stored, which, when executed by a processor, causes the processor to perform the steps of the above-described region selection method. Here, the steps of the region selection method may be the steps in the region selection methods of the above-described respective embodiments.

It should be noted that "first" and "second" in the embodiments of the present application are used only for distinction, and are not used for limitation in terms of size, order, dependency, and the like.

It should be understood that the individual steps in the embodiments of the present application are not necessarily performed in the order indicated by the step numbers. Unless explicitly stated otherwise, the steps are not strictly limited to the order shown and described, and may be performed in other orders. Moreover, at least some of the steps in the various embodiments may include multiple sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times, and which are not necessarily performed sequentially but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.

It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program, which can be stored in a non-volatile computer-readable storage medium and which, when executed, can include the processes of the embodiments of the methods described above. Any reference to memory, storage, database, or other medium used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).

The technical features of the embodiments described above may be arbitrarily combined, and for the sake of brevity, all possible combinations of the technical features in the embodiments described above are not described, but should be considered as being within the scope of the present specification as long as there is no contradiction between the combinations of the technical features.

The above-mentioned embodiments express only several implementations of the present invention, and although their description is relatively specific and detailed, they should not be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and modifications without departing from the inventive concept, all of which fall within the scope of the present invention. Therefore, the protection scope of the present patent shall be subject to the appended claims.

Claims

1. A method of region selection, the method comprising:

and selecting a target area node from the target sub-area network.

2. The method of claim 1, wherein obtaining the area network graph comprising area nodes and edges comprises:

acquiring a region feature vector of an initial region;

3. The method of claim 2, wherein the obtaining the region feature vector of the initial region comprises:

4. The method according to claim 1, wherein the dividing the area network graph according to the area similarity between the area nodes corresponding to the edges to obtain at least one sub-area network comprises:

5. The method of claim 4, wherein determining the shortest average encoding length of the sequence generated by encoding the result of the random walk according to the first and second probabilities of occurrence comprises:

6. The method of claim 1, wherein in the area network graph, edges are created between area nodes having an area similarity greater than or equal to a similarity threshold;

7. The method of claim 1, wherein the selecting a target sub-area network from each of the sub-area networks according to the fraction comprises:

8. The method of claim 1, wherein the selecting a target area node from the target sub-area network comprises:

and determining a target area node according to the candidate area node.

9. The method of claim 8, wherein determining a target region node based on the candidate region nodes comprises:

10. The method of claim 9, wherein the number of target sub-area networks is at least two; the method further comprises the following steps:

11. The method according to any one of claims 1 to 10, wherein the area corresponding to the area node carrying the target value is an area to which the target object has been delivered; the target value is a profit index value generated after the target object is put;

the method further comprises the following steps:

12. An area selection apparatus, the apparatus comprising:

13. The apparatus according to claim 12, wherein in the regional network graph, edges are created between regional nodes whose regional similarity is greater than or equal to a similarity threshold;

the device further comprises:

the step adjustment module is used for adjusting the similarity threshold step by step and updating the edge in the area network graph according to the similarity threshold after each step adjustment; for the updated regional network graph each time, informing the dividing module to execute the step of dividing the regional network graph according to the regional similarity between the regional nodes corresponding to the edges to obtain at least one sub-regional network until the step stop condition is met;

the target area determining module is further used for selecting a target sub-area network division result from the sub-area network division results obtained by each division to obtain at least one final sub-area network; in the target sub-area network division result, the distribution of the area nodes with the target value larger than or equal to the target value threshold satisfies a centralized distribution condition.

14. A computer arrangement comprising a memory and a processor, the memory having stored therein a computer program which, when executed by the processor, causes the processor to carry out the steps of the method of any one of claims 1 to 11.

15. A computer-readable storage medium, having stored thereon a computer program which, when executed by a processor, causes the processor to carry out the steps of the method according to any one of claims 1 to 11.