CN111274345A

CN111274345A - Similar area retrieval method and system based on grid division and value taking

Info

Publication number: CN111274345A
Application number: CN202010070771.3A
Authority: CN
Inventors: 郑理; 李为民
Original assignee: Chengdu Zhiku 2861 Information Technology Co ltd
Current assignee: Chengdu Zhiku 2861 Information Technology Co ltd
Priority date: 2020-01-21
Filing date: 2020-01-21
Publication date: 2020-06-12

Abstract

The invention discloses a similar area retrieval method and system based on grid division and value taking. The method comprises the following steps: grid division, namely dividing any geographic area into a plurality of identical grids d; similar area search, namely acquiring the coordinates of the equivalent grid sets of an input grid set and calculating the similarity between the input grid set and each equivalent grid set; and retrieval result judgment, namely judging the input grid set and an equivalent grid set to be similar if the similarity between them is greater than a threshold similarity, and dissimilar otherwise. The invention aims to provide a similar area retrieval method and system based on grid division and value taking.

Description

Similar area retrieval method and system based on grid division and value taking

Technical Field

The invention relates to the technical field of data quantitative analysis application, in particular to a similar area retrieval method and system based on grid division and value taking.

Background

Any geographic area can be regarded as a set of one or more grids of the same specification. Once the grid size is fixed, the area covered by each grid is fixed, and so is the set of grids that covers a given geographic area. Conversely, an arbitrarily designated set of grids can approximately cover a geographic area of arbitrary shape.

Each grid is quantized along the same set of dimensions, i.e., the features of the grid. Such a set of features forms a multi-dimensional data item (also called a vector). The quantified strength, size, or degree of a given feature of a grid is the value at the corresponding feature position in that grid's vector.

Similar area retrieval means that, given an area and its multi-dimensional feature vectors, areas that are identical or similar to the given area in both shape and feature vectors are searched for and retrieved within the geographic area. Retrieving such areas involves three core problems:

(1) the basic unit forming an area can be a single grid or a grid set, and the area formed by a grid set is not necessarily regular in shape;

(2) two groups of grids are each described by a group of multi-dimensional feature vectors, and the similarity between the two groups must be quantified;

(3) the two areas being compared may differ in orientation yet still be considered similar geographically, and such differences in orientation may include rotation and mirror symmetry.

In summary, similar geographic area retrieval requires a richer and more comprehensive retrieval mode in terms of area granularity, quantization dimensions, and area orientation. However, existing similar geographic area retrieval and analysis methods, particularly with respect to multi-dimensional quantization and area orientation, provide no quantitative method that calculates similarity by combining multi-dimensional data similarity with transformations of area orientation.

Disclosure of Invention

The invention aims to provide a similar region retrieval method and a similar region retrieval system based on grid division and value taking.

The invention is realized by the following technical scheme:

A similar area retrieval method based on grid division and value taking comprises the following steps:

grid division, namely dividing an arbitrary geographic area into a plurality of grids d of the same size and shape;

D = {d_1, d_2, ..., d_m}

where D represents a geographic area containing m grids, and d_m represents the m-th grid in the geographic area;

searching a similar area, acquiring equivalent grid set coordinates of an input grid set, and calculating the similarity between the input grid set and the equivalent grid set;

judging a retrieval result, if the similarity between the input grid set and the equivalent grid set is greater than the threshold similarity, judging that the area where the input grid set is located is similar to the area where the equivalent grid set is located; otherwise, the area of the input grid set is judged to be dissimilar to the area of the equivalent grid set.

In this technical scheme, the geographic area is divided into grids, and each grid is quantized into a feature vector. An input target area is defined as the grid set covering it, and other grid sets whose feature vectors are similar to those of the target grid set are searched for across the whole geographic area, so that similar geographic areas can be found accurately. In addition, because the geographic area is divided into grids, an area of any shape can be represented by grids, which widens the applicability of the scheme.

Each grid d contains two sets of attribute information:

(1) the coordinates (a, b) of the grid d in the entire geographic area;

(2) the feature vector v_x of the grid d, v_x = [v_x1, v_x2, ..., v_xn], where v_xn represents the n-th element of the feature vector v_x, and each element of v_x is characteristic information of the geographic location where the grid d is located. This characteristic information includes, for example, the population, the greening rate, and the average restaurant price at that location.
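The grid model described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation of the invention: the `Grid` class and the `divide_area` helper are hypothetical names introduced here, the area is assumed to be rectangular, and the feature values are placeholders.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Grid:
    """One grid d: its coordinates (a, b) and its n-dimensional feature vector v_x."""
    coord: Tuple[int, int]
    features: Tuple[float, ...]


def divide_area(rows: int, cols: int, n_features: int) -> List[Grid]:
    """Divide a rectangular area into rows * cols grids of equal size.

    Feature values are set to 0.0 as placeholders (an assumption of this
    sketch); a real system would fill in measured values such as the
    population or the greening rate of each grid.
    """
    return [
        Grid(coord=(a, b), features=(0.0,) * n_features)
        for a in range(rows)
        for b in range(cols)
    ]


# D = {d_1, ..., d_m} for a 7 x 7 area, so m = 49
D = divide_area(rows=7, cols=7, n_features=3)
print(len(D), D[0])
```

Making `Grid` immutable is a design choice of this sketch; it lets grids be used as set members or dictionary keys when grid sets are compared.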

Acquiring the equivalent grid set coordinates of the input grid set specifically includes:

arbitrarily selecting one grid d in the geographic area as a reference point d_y, the coordinates of the reference point d_y being (i, j);

arbitrarily selecting one grid set from the remaining grids d in the geographic area, excluding the reference point d_y, as an original point set;

rotating the original point set around the reference point d_y, or reflecting the original point set about the horizontal axis or the vertical axis through the reference point d_y, to obtain a plurality of equivalent grid sets of the original point set and the coordinates of these equivalent grid sets.

Because special cases that can occur in geographic scenes, such as area rotation and mirroring, are taken into account, similar areas can be retrieved effectively and accurately.
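One way to generate equivalent grid sets is sketched below. The source names rotation and reflection about the horizontal or vertical axis but does not fix the rotation angles; this sketch assumes rotations in 90-degree steps and also includes the two diagonal reflections that arise from combining a rotation with an axis reflection, which gives up to seven equivalents, consistent with the seven equivalent grids of the embodiment. The names `TRANSFORMS` and `equivalent_grid_sets` are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Coord = Tuple[int, int]

# The eight symmetries of a square grid, acting on an offset (x, y)
# measured from the reference point: four rotations and four reflections.
TRANSFORMS: List[Callable[[int, int], Coord]] = [
    lambda x, y: (x, y),     # identity (the original point set)
    lambda x, y: (-y, x),    # rotation by 90 degrees
    lambda x, y: (-x, -y),   # rotation by 180 degrees
    lambda x, y: (y, -x),    # rotation by 270 degrees
    lambda x, y: (x, -y),    # reflection that negates the second offset
    lambda x, y: (-x, y),    # reflection that negates the first offset
    lambda x, y: (y, x),     # reflection about one diagonal
    lambda x, y: (-y, -x),   # reflection about the other diagonal
]


def equivalent_grid_sets(origin_set: Sequence[Coord], reference: Coord) -> List[List[Coord]]:
    """Return the distinct equivalent grid sets of origin_set about reference.

    Each result keeps the grids in the same order as origin_set, so that
    position k of an equivalent set corresponds to position k of the original.
    """
    i, j = reference
    seen = {frozenset(origin_set)}
    result: List[List[Coord]] = []
    for transform in TRANSFORMS[1:]:  # skip the identity
        moved = []
        for a, b in origin_set:
            dx, dy = transform(a - i, b - j)
            moved.append((i + dx, j + dy))
        key = frozenset(moved)
        if key not in seen:  # symmetric shapes can map onto themselves
            seen.add(key)
            result.append(moved)
    return result


# Hypothetical example: reference point at the centre of a 7 x 7 area
reference = (3, 3)
origin_set = [(4, 5), (4, 6), (5, 6)]
equivalents = equivalent_grid_sets(origin_set, reference)
for label, grid_set in enumerate(equivalents, start=2):
    print(label, grid_set)
```

A point set that is itself symmetric about the reference point produces fewer than seven distinct equivalents, which is why duplicates are filtered out.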

Calculating the similarity between the input grid set and the equivalent grid set comprises:

quantizing each grid d in the input grid set and in any equivalent grid set with an n-dimensional vector to obtain two m × n matrices, where m represents the number of grids d in one grid set;

counting the number A of positions at which the two matrices hold the same element value, the ratio of A to the total number of elements in one matrix being the similarity of the two grid sets.
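The matching-ratio calculation can be sketched as follows. The function name `matching_ratio` and the example values are hypothetical, and the total number of elements is taken to be m × n, the size of one matrix, which is an interpretation of the text rather than something it states explicitly.

```python
from typing import Sequence

Matrix = Sequence[Sequence[float]]


def matching_ratio(input_matrix: Matrix, candidate_matrix: Matrix) -> float:
    """Share of positions at which two m x n matrices hold the same value."""
    rows = len(input_matrix)
    cols = len(input_matrix[0])
    same_shape = (
        len(candidate_matrix) == rows
        and all(len(row) == cols for row in input_matrix)
        and all(len(row) == cols for row in candidate_matrix)
    )
    if not same_shape:
        raise ValueError("both grid sets must yield matrices of the same shape")
    # A: the number of corresponding positions with identical element values
    same = sum(
        1
        for row_in, row_cand in zip(input_matrix, candidate_matrix)
        for u, v in zip(row_in, row_cand)
        if u == v
    )
    return same / (rows * cols)


# Hypothetical 3 x 3 example: 7 of the 9 positions agree
x = [[1, 0, 2], [3, 1, 0], [2, 2, 1]]
y = [[1, 0, 2], [3, 0, 0], [2, 1, 1]]
ratio = matching_ratio(x, y)
print(ratio)
```

Exact equality is most meaningful when feature values are discrete or have been binned beforehand; for continuous values, two grids would rarely match exactly.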

Alternatively, calculating the similarity between the input grid set and the equivalent grid set comprises:

vectorizing the two matrices and calculating the distance between the two resulting vectors, the similarity of the two grid sets being inversely related to that distance.
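A distance-based similarity can be sketched as follows. The source only states that similarity decreases as distance grows; the mapping 1 / (1 + d) used here is an assumption of this sketch, as are the function names, the row-by-row vectorization, and the choice of the Euclidean distance.

```python
import math
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def flatten(matrix: Matrix) -> List[float]:
    """Vectorize an m x n matrix row by row into one vector of length m * n."""
    return [value for row in matrix for value in row]


def euclidean_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """Euclidean distance between two vectors of equal length."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(u, v)))


def distance_similarity(input_matrix: Matrix, candidate_matrix: Matrix) -> float:
    """Map a distance d in [0, inf) to a similarity in (0, 1].

    The mapping 1 / (1 + d) is an assumed choice: it equals 1 for identical
    matrices and decreases monotonically as the distance grows.
    """
    d = euclidean_distance(flatten(input_matrix), flatten(candidate_matrix))
    return 1.0 / (1.0 + d)


# Hypothetical 2 x 2 example: the distance is sqrt(3^2 + 4^2) = 5
similarity = distance_similarity([[0, 0], [0, 0]], [[3, 0], [0, 4]])
print(similarity)
```

Because features such as population and greening rate are on very different scales, normalizing each feature before computing a distance is usually advisable, although the source does not discuss this.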

A similar area retrieval system based on grid division and value taking comprises:

the dividing unit is used for dividing any geographical area into a plurality of grids d with the same size and shape;

D = {d_1, d_2, ..., d_m}

the retrieval unit is used for acquiring equivalent grid set coordinates of the input grid set according to the grid information divided by the dividing unit, calculating the similarity between the input grid set and the equivalent grid set and transmitting the calculated similarity to the judging unit;

the judging unit is used for receiving the similarity between the input grid set and the equivalent grid set transmitted by the retrieval unit and comparing the similarity with the threshold similarity; if the similarity between the input grid set and the equivalent grid set is greater than the threshold similarity, judging that the area where the input grid set is located is similar to the area where the equivalent grid set is located; otherwise, the area of the input grid set is judged to be dissimilar to the area of the equivalent grid set.

Each grid d contains two sets of attribute information:

(1) the coordinates (a, b) of the grid d in the entire geographic area;

(2) the feature vector v_x of the grid d, v_x = [v_x1, v_x2, ..., v_xn], where v_xn represents the n-th element of the feature vector v_x, and each element of v_x is characteristic information of the geographic location where the grid d is located, including, for example, the population, the greening rate, and the average restaurant price at that location.

Each grid d in the input grid set and in any equivalent grid set is quantized with an n-dimensional vector to obtain two m × n matrices, where m represents the number of grids in one grid set.

Compared with the prior art, the invention has the following advantages and beneficial effects:

1. The geographic area is divided into grids and each grid d is quantized into a feature vector. An input target area is defined as the grid set covering it, and other grid sets whose feature vectors are similar to those of the target grid set are searched for across the whole geographic area, so that similar geographic areas are found accurately.

2. The method supports regions of any shapes, and can perform effective and accurate similar region retrieval under special conditions of region rotation, mirror image and the like which may occur in a geographical region scene.

Drawings

The accompanying drawings, which are included to provide a further understanding of the embodiments of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the principles of the invention. In the drawings:

FIG. 1 is a schematic diagram of the location of the reference point in the grid according to an embodiment of the present invention;

FIG. 2 is a schematic diagram of the location of each grid within the gridded geographic area according to an embodiment of the present invention;

FIG. 3 is a schematic diagram of the locations of the grid sets in the geographic area according to an embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail below with reference to examples and accompanying drawings, and the exemplary embodiments and descriptions thereof are only used for explaining the present invention and are not meant to limit the present invention.

Examples

As shown in the figures 1-3 of the drawings,

D = {d_1, d_2, ..., d_m}

Each grid d contains two sets of attribute information:

(1) the coordinates (a, b) of the grid d in the entire geographic area;

(2) the feature vector v_x of the grid d, v_x = [v_x1, v_x2, ..., v_xn], where v_xn represents the n-th element of the feature vector v_x. Each element of v_x is characteristic information of the geographic location where the grid d is located, such as the population, the greening rate, or the average restaurant price at that location, and each element is a value quantifying one dimension of the grid.

The original point set is rotated around the reference point d_y, or reflected about the horizontal axis or the vertical axis through the reference point d_y, to obtain a plurality of equivalent grid sets of the original point set and the coordinates of these equivalent grid sets.

For ease of understanding, the description is made with specific examples:

As shown in FIG. 1, a geographic area is divided into a grid set composed of 49 grids. Grid 0 in the figure is the reference point, with coordinates (i, j). A grid d near the reference point 0 is chosen arbitrarily and labeled 1, with coordinates (a, b). Rotating grid 1 clockwise or counterclockwise around the reference point 0 yields several equivalent grids, and reflecting it about the horizontal axis or the vertical axis through the reference point 0 yields several more. Equivalent grids can also be obtained in other ways besides these two, and the number of equivalent grids can be chosen according to the actual situation. In this embodiment, processing grid 1 yields 7 grids equivalent to grid 1, denoted 2, 3, 4, 5, 6, 7, and 8, as shown in FIG. 2.

Thus, the equivalent grids of grid 1 with respect to reference point 0 are grid 2, grid 3, grid 4, grid 5, grid 6, grid 7, and grid 8. The equivalent position coordinates of each equivalent grid are shown in Table 1:

TABLE 1

It should be noted that grid 1 can be extended to a grid set of arbitrary shape. In this embodiment, grid set 1 is composed of three adjacent grids with coordinates [(a, b), (a, b+1), (a+1, b+1)], and the equivalent grid sets of grid set 1 relative to the reference point 0 are grid sets 2, 3, 4, 5, 6, 7, and 8, as shown in FIG. 3. The equivalent position coordinates of each equivalent grid set are shown in Table 2:

TABLE 2

Therefore, for any grid set, once the reference grid and the original grid set are given, the equivalent grid sets with respect to the reference grid can be obtained directly.

After the coordinates of the equivalent grid set of the input grid set are obtained, the similarity between the input grid set and the equivalent grid set can be calculated.

In this embodiment, taking grid set 1 as an example, grid set 1 includes three grids, each quantized with an n-dimensional vector, which forms a 3 × n matrix. Each equivalent grid set is likewise quantized with n-dimensional vectors, which gives 7 different 3 × n matrices. Each element of a matrix represents one item of characteristic information of a grid at its geographic location, such as the population, the greening rate, the average restaurant price, or another parameter, and the value of n is chosen according to the actual situation.

After all the grid sets are quantized, the similarity between the input grid set and each equivalent grid set can be calculated by either of two methods:

(1) Counting the number A of positions at which the matrix of the input grid set and the matrix of the equivalent grid set hold the same element value; the ratio of A to the total number of elements is the similarity of the two grid sets.

(2) Vectorizing the two matrices and calculating the distance between the two resulting vectors: the smaller the distance, the greater the similarity of the two grid sets, and the larger the distance, the smaller the similarity. The distance may be the Euclidean distance, the Manhattan distance, the Mahalanobis distance, or another measure.
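For reference, the three distances named in method (2) have the following standard definitions for two vectors u and v of length k, where k = m × n after vectorization. The symbols u, v, k, and the covariance matrix S are notation introduced here for illustration, and the source does not say how S would be estimated.

```latex
\begin{aligned}
d_{\text{Euclidean}}(u, v) &= \sqrt{\sum_{t=1}^{k} (u_t - v_t)^2} \\
d_{\text{Manhattan}}(u, v) &= \sum_{t=1}^{k} \lvert u_t - v_t \rvert \\
d_{\text{Mahalanobis}}(u, v) &= \sqrt{(u - v)^{\top} S^{-1} (u - v)}
\end{aligned}
```

When S is the identity matrix, the Mahalanobis distance reduces to the Euclidean distance.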

No matter which method is used for similarity calculation, as long as the similarity between the input grid set and the equivalent grid set is greater than the threshold similarity, the area where the input grid set is located is judged to be similar to the area where the equivalent grid set is located; otherwise, the judgment is not similar.
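The threshold judgment can be sketched end to end as follows. This is an illustration under assumptions rather than the implementation of the invention: the dictionary-based feature store, the function names, the example feature values, and the threshold of 0.7 are all hypothetical, and the matching ratio of method (1) is used as the similarity.

```python
from typing import Dict, List, Sequence, Tuple

Coord = Tuple[int, int]
FeatureMap = Dict[Coord, Sequence[float]]


def to_matrix(grid_set: Sequence[Coord], features: FeatureMap) -> List[List[float]]:
    """Stack the feature vectors of a grid set into an m x n matrix."""
    return [list(features[coord]) for coord in grid_set]


def matching_ratio(x: List[List[float]], y: List[List[float]]) -> float:
    """Method (1): share of corresponding positions with equal values."""
    total = sum(len(row) for row in x)
    same = sum(u == v for rx, ry in zip(x, y) for u, v in zip(rx, ry))
    return same / total


def judge(
    input_set: Sequence[Coord],
    candidate_sets: Sequence[Sequence[Coord]],
    features: FeatureMap,
    threshold: float,
) -> List[Tuple[List[Coord], float]]:
    """Return the candidate sets whose similarity exceeds the threshold.

    Grids are compared position by position, so each candidate must list
    its grids in the order corresponding to input_set.
    """
    target = to_matrix(input_set, features)
    hits = []
    for candidate in candidate_sets:
        # Skip candidates that fall partly outside the divided area.
        if not all(coord in features for coord in candidate):
            continue
        score = matching_ratio(target, to_matrix(candidate, features))
        if score > threshold:  # strictly greater, as in the text
            hits.append((list(candidate), score))
    return hits


# Hypothetical feature values for a small area with n = 2
features = {
    (0, 0): (1, 2), (0, 1): (3, 4),
    (1, 0): (1, 2), (1, 1): (3, 0),
    (2, 0): (9, 9), (2, 1): (9, 9),
}
candidates = [[(1, 0), (1, 1)], [(2, 0), (2, 1)], [(3, 0), (3, 1)]]
hits = judge([(0, 0), (0, 1)], candidates, features, threshold=0.7)
print(hits)
```

Candidates that extend beyond the divided area are skipped in this sketch; the source does not say how such cases should be handled.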

The above-mentioned embodiments are intended to illustrate the objects, technical solutions and advantages of the present invention in further detail, and it should be understood that the above-mentioned embodiments are merely exemplary embodiments of the present invention, and are not intended to limit the scope of the present invention, and any modifications, equivalent substitutions, improvements and the like made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims

1. A similar area retrieval method based on grid division and value taking is characterized by comprising the following steps:

D = {d_1, d_2, ..., d_m}

2. The similar area retrieval method based on grid division and value taking according to claim 1, wherein each grid d comprises two sets of attribute information:

(1) the coordinates (a, b) of the grid d in the entire geographic area;

(2) the feature vector v_x of the grid d, v_x = [v_x1, v_x2, ..., v_xn], where v_xn represents the n-th element of the feature vector v_x, and each element of v_x is characteristic information of the geographic location where the grid d is located.

3. The method for retrieving similar regions based on grid division and value taking according to any one of claims 1 or 2, wherein the obtaining of the equivalent grid set coordinates of the input grid set specifically includes:

4. The method of claim 3, wherein calculating the similarity between the input grid set and the equivalent grid set comprises:

5. The method of claim 3, wherein calculating the similarity between the input grid set and the equivalent grid set comprises:

6. A similar area retrieval system based on meshing and value taking is characterized by comprising:

D = {d_1, d_2, ..., d_m}

7. The system according to claim 6, wherein each grid d comprises two sets of attribute information:

(1) the coordinates (a, b) of the grid d in the entire geographic area;

(2) the feature vector v_x of the grid d, v_x = [v_x1, v_x2, ..., v_xn], where v_xn represents the n-th element of the feature vector v_x, and each element of v_x is characteristic information of the geographic location where the grid d is located.

8. The system according to any one of claims 6 or 7, wherein the obtaining of the equivalent grid set coordinates of the input grid set specifically comprises:

9. The system of claim 8, wherein calculating the similarity between the input grid set and the equivalent grid set comprises:

10. The system of claim 8, wherein calculating the similarity between the input grid set and the equivalent grid set comprises: