CN109656967B

CN109656967B - Big data mining processing method, device, medium and electronic equipment based on space

Info

Publication number: CN109656967B
Application number: CN201811334367.1A
Authority: CN
Inventors: 刘朋飞; 陈晓建; 于国梁; 赵汗青
Original assignee: Beijing Jingdong Financial Technology Holding Co Ltd
Current assignee: JD Digital Technology Holdings Co Ltd; Jingdong Technology Holding Co Ltd
Priority date: 2018-11-09
Filing date: 2018-11-09
Publication date: 2021-08-17
Anticipated expiration: 2038-11-09
Also published as: CN109656967A

Abstract

The embodiment of the invention provides a method and a device for mining and processing big data based on space, a computer readable medium and electronic equipment, and relates to the technical field of computers, wherein the method comprises the following steps: performing spatial autocorrelation calculation according to a preset spatial granularity, an index and a preset time granularity to obtain spatial autocorrelation data; carrying out statistical test on the spatial autocorrelation data to obtain a confidence coefficient and a confidence interval; distinguishing spatial distribution modes of different spaces according to the autocorrelation data, the confidence and the confidence interval to obtain spatial distribution mode data; outputting the spatially distributed pattern data for a spatially dependent operation. In the technical scheme of the embodiment of the invention, the spatial distribution modes of different spaces are judged based on the spatial statistics and confidence test method to obtain the spatial distribution mode data, and the operation related to the spaces is carried out.

Description

Big data mining processing method, device, medium and electronic equipment based on space

Technical Field

The invention relates to the technical field of computers, in particular to a method and a device for mining and processing big data based on space, a computer readable medium and electronic equipment.

Background

With the advent of the big data age, objective connections or rules between things can be discovered or utilized through big data mining, thereby contributing to improving information utilization efficiency or improving performance.

The use of location-based information is an active area, and the related art often mines the intrinsic connections among different kinds of information at the same location or in the same area. With the continued growth of e-commerce and the spread of boundaryless retail, e-commerce has broken through the boundary between online and offline. Nationwide, large-scale e-commerce has a profound influence on merchants in different geographic locations and regions, and the connections and interactions between regions are becoming ever closer.

Traditional statistics generally assumes that variables are independent and random. Modern business activities, especially e-commerce activities, have broken through the boundary between online and offline and even the limits of specific geographic locations, so the variables that affect e-commerce operations and marketing are no longer spatially independent: e-commerce activity in one geographic location may affect sales or marketing elsewhere. Information in different geographic locations and areas therefore exhibits spatial correlation and non-randomness.

How to output different spatial distribution mode data according to the correlation characteristics of the spatial regions to improve the information utilization efficiency is a problem to be solved at present.

It is to be noted that the information disclosed in the above background section is only for enhancement of understanding of the background of the present invention and therefore may include information that does not constitute prior art known to a person of ordinary skill in the art.

Disclosure of Invention

The embodiment of the invention aims to provide a space-based big data mining processing method and device, a computer readable medium and electronic equipment, so as to overcome the problem that a merchant marketing scheme cannot be formulated according to the correlation characteristics of a space region at least to a certain extent.

Additional features and advantages of the invention will be set forth in the detailed description which follows, or may be learned by practice of the invention.

According to a first aspect of the embodiments of the present invention, there is provided a method for processing large data mining based on space, the method including: performing spatial autocorrelation calculation according to preset spatial granularity, indexes and time granularity to obtain spatial autocorrelation data; carrying out statistical test on the spatial autocorrelation data to obtain a confidence coefficient and a confidence interval; distinguishing spatial distribution modes of different spaces according to the spatial autocorrelation data, the confidence degrees and the confidence intervals to obtain spatial distribution mode data; outputting the spatially distributed pattern data for a spatially dependent operation.

In the foregoing solution, before performing the spatial autocorrelation calculation according to the preset spatial granularity, the index, and the time granularity, the method further includes: presetting geographical grading information; preset indexes and time granularity.

In the foregoing solution, the performing spatial autocorrelation calculation according to preset spatial granularity, index, and temporal granularity includes: the spatial weight W is calculated according to the following formula:

wherein 1 ≤ i ≤ n, 1 ≤ j ≤ n, w_ij indicates the proximity of areas i and j, and

calculating a global Moran index I according to the spatial weight, the spatial granularity, the index, the temporal granularity, and the following formula:

wherein, x is an index,

in the foregoing solution, the performing statistical test on the spatial autocorrelation data includes: calculating confidence and confidence intervals according to the global Moran index I and the following formula:

wherein E (I) is the I mean, and VAR (I) is the I variance.

In the foregoing solution, the determining spatial distribution patterns of different spaces according to the spatial autocorrelation data, the confidence level, and the confidence interval to obtain spatial distribution pattern data includes: the local Moran index is calculated according to the following formula:

calculating a normalized statistic for the local Moran index test according to the following formula:

and drawing a Moran scatter diagram with (Wz, z) as coordinate points, wherein Wz is the spatial lag factor, z is the standardized statistic of the current area, and the spatial lag factor is the weighted average of the standardized statistics of the areas adjacent to the area where z is located.

In the above solution, the outputting the spatial distribution pattern data to perform a spatial correlation operation includes: generating a regional accurate marketing scheme according to the spatial distribution mode data; and generating an address selection scheme according to the address preselected by the user and the spatial distribution mode data.

In the above solution, the generating an address selection scheme according to the user preselected address and the spatial distribution pattern data includes: receiving a user preselected address, the preselected address comprising an address granularity; when the address granularity is less than or equal to the set address granularity, generating an address selection scheme according to the user preselected address and the spatial distribution mode data; and when the address granularity is larger than the set address granularity, generating a set number of address selection schemes under the set address granularity for a user to select.

According to a second aspect of the embodiments of the present invention, there is provided a large data mining processing apparatus based on space, the apparatus including: the autocorrelation calculating unit is used for carrying out spatial autocorrelation calculation according to preset spatial granularity, indexes and time granularity to obtain spatial autocorrelation data; the statistical test unit is used for carrying out statistical test on the spatial autocorrelation data to obtain confidence and a confidence interval; the spatial distribution mode judging unit is used for judging spatial distribution modes of different spaces according to the spatial autocorrelation data, the confidence coefficient and the confidence interval to obtain spatial distribution mode data; an output unit for outputting the spatially distributed pattern data for a spatially dependent operation.

In the above scheme, the apparatus further comprises: the first preset unit is used for presetting geographical grading information; and the second preset unit is used for presetting the index and the time granularity.

In the foregoing solution, the autocorrelation calculating unit includes: a first calculation subunit for calculating a spatial weight W according to the following formula;

wherein, w_ij indicates the proximity of the areas i and j, and

a second calculating subunit, configured to calculate a global Moran index I according to the spatial weight, the spatial granularity, the index, the temporal granularity, and the following formula:

wherein, x is the marketing index,

in the foregoing solution, the statistical test unit is further configured to: calculating confidence and confidence intervals according to the global Moran index I and the following formula:

wherein E (I) is the I mean, and VAR (I) is the I variance.

In the foregoing solution, the spatial distribution pattern determining unit includes:

a local index calculation subunit, configured to calculate a local Moran index according to the following formula:

a normalization subunit for calculating a normalization statistic of the local Moran index test according to the following formula:

and the scatter diagram drawing subunit is used for drawing a Moran scatter diagram taking (Wz, z) as a coordinate point, wherein Wz is a spatial lag factor, z is the standardized statistic of the current area, and the spatial lag factor is the weighted average value of the standardized statistic of the area adjacent to the area where z is located.

In the above scheme, the output unit includes: the marketing scheme generating subunit is used for generating a regional accurate marketing scheme according to the spatial distribution mode data; and the address selection scheme generating subunit is used for generating an address selection scheme according to the user pre-selected address and the spatial distribution mode data.

In the foregoing solution, the marketing scheme generating subunit is further configured to: receiving a user preselected address, the preselected address comprising an address granularity; when the address granularity is less than or equal to the set address granularity, generating an address selection scheme according to the user preselected address and the spatial distribution mode data; and when the address granularity is larger than the set address granularity, generating a set number of address selection schemes under the set address granularity for a user to select.

According to a third aspect of the embodiments of the present invention, there is provided a computer readable medium, on which a computer program is stored, which when executed by a processor, implements the method for processing large data mining based on space as described in the first aspect of the embodiments above.

According to a fourth aspect of embodiments of the present invention, there is provided an electronic apparatus, including: one or more processors; a storage device for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the method for processing large data mining based on space as described in the first aspect of the embodiments above.

The technical scheme provided by the embodiment of the invention has the following beneficial effects:

in the technical scheme provided by some embodiments of the invention, the spatial distribution modes of different spaces are distinguished based on a spatial statistics and confidence test method to obtain spatial distribution mode data, and operations related to the spaces are carried out based on the spatial distribution mode data.

It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention, as claimed.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and together with the description, serve to explain the principles of the invention. It is obvious that the drawings in the following description are only some embodiments of the invention, and that for a person skilled in the art, other drawings can be derived from them without inventive effort. In the drawings:

FIG. 1 schematically shows a flow diagram of a method for space-based big data mining processing according to an embodiment of the invention.

FIG. 2 schematically illustrates a Moran scatter plot in an embodiment of the present invention;

FIG. 3 schematically illustrates a flow diagram of one embodiment of the space-based big data mining processing method according to the present invention;

FIG. 4 is a block diagram schematically illustrating a spatial-based big data mining processing apparatus according to an embodiment of the present invention;

FIG. 5 schematically illustrates a block diagram of another space-based big data mining processing apparatus according to the present invention;

FIG. 6 illustrates a schematic structural diagram of a computer system suitable for use with the electronic device to implement an embodiment of the invention.

Detailed Description

Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments may, however, be embodied in many different forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of example embodiments to those skilled in the art.

Furthermore, the described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to provide a thorough understanding of embodiments of the invention. One skilled in the relevant art will recognize, however, that the invention may be practiced without one or more of the specific details, or with other methods, components, devices, steps, and so forth. In other instances, well-known methods, devices, implementations or operations have not been shown or described in detail to avoid obscuring aspects of the invention.

The block diagrams shown in the figures are functional entities only and do not necessarily correspond to physically separate entities. I.e. these functional entities may be implemented in the form of software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor means and/or microcontroller means.

The flow charts shown in the drawings are merely illustrative and do not necessarily include all of the contents and operations/steps, nor do they necessarily have to be performed in the order described. For example, some operations/steps may be decomposed, and some operations/steps may be combined or partially combined, so that the actual execution sequence may be changed according to the actual situation.

The core of the spatial statistical analysis is to recognize spatial dependence, spatial correlation or spatial autocorrelation among data related to geographic positions and establish statistical relationship among the data through the spatial positions.

Spatial autocorrelation refers to the potential interdependence among observed data of several variables within the same distribution area. Everything is related to everything else, but near things are more related than distant things. Spatial dependence refers to the degree to which data at one location depend on data at other locations. Because of spatial interaction and spatial diffusion, geographic data may no longer be independent of one another but correlated. For example, consider a set of spatially separated markets: if the markets are close enough for commodities to be exchanged and moved between them, the prices and supplies of the commodities may be spatially correlated rather than independent. In fact, the shorter the distance between markets, the closer and the more correlated the commodity prices.

Any city on the earth surface cannot exist independently, and in order to guarantee normal operation of production and life, the exchange of materials, energy, personnel and information is always carried out between cities, cities and areas, and the exchange is called spatial interaction. It is this interaction that combines spatially separated cities into an organic whole with a certain structure and function.

Based on the above spatial statistical analysis theory, the invention provides a spatial-based big data mining processing scheme in the embodiment of the disclosure.

FIG. 1 schematically illustrates a space-based big data mining processing method of an exemplary embodiment of the present disclosure. Referring to fig. 1, the space-based big data mining processing method may include the following steps:

Step S102: performing spatial autocorrelation calculation according to the preset spatial granularity, index and time granularity to obtain spatial autocorrelation data.

Step S104: performing a statistical test on the spatial autocorrelation data to obtain a confidence level and a confidence interval.

Step S106: discriminating the spatial distribution patterns of different spaces according to the spatial autocorrelation data, the confidence level and the confidence interval to obtain spatial distribution pattern data.

Step S108: outputting the spatial distribution pattern data for space-related operations.

In this technical scheme, the spatial distribution pattern data are obtained through spatial statistics and confidence testing, which provides an integrated solution for spatial correlation, avoids the neglect of spatial correlation and non-randomness found in traditional statistical methods, and provides a better solution for precision marketing and spatial site selection. In addition, combining preset information such as spatial granularity, index and time granularity with user-defined functions yields an automated, personalized and adaptive process and apparatus, which improves the efficiency of use and deployment and plays an important role in improving the application results.

Before step S102, it is also necessary to preset geographical classification information, and preset indexes and time granularity.

When the geographical grading information is preset, the user-defined function is supported. Specifically, the user selects the spatial granularity from the preset 6-level regional information of regions, provinces, cities, counties, towns and villages on geographical and administrative divisions, and the business information of cells, business circles and the like; the user self-defining function supports the user to carry out operations such as screening, combination, filtering and the like according to the existing preset information, and also supports the user to interactively select a designated area according to longitude and latitude or a map.

User-defined functions are also supported when the index and the time granularity are preset. Specifically, the user selects the index from preset indexes commonly used in e-commerce services, such as order volume, transaction amount and number of users, and can also add custom indexes according to business needs. The time granularity is selected by the user from a preset range such as year, quarter, calendar month or week, or from a user-defined range.
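The three preset inputs described above can be sketched as a small validated configuration object. This is only an illustration: the class name `AnalysisConfig`, the option tuples and the string spellings are hypothetical and do not come from the patent; only the option categories (six administrative levels, cells and business circles, common e-commerce indexes, time ranges) follow the text.

```python
from dataclasses import dataclass

# Hypothetical option lists, modeled on the examples named in the text.
SPATIAL_LEVELS = ("region", "province", "city", "county", "town", "village")
BUSINESS_AREAS = ("cell", "business_circle")
PRESET_INDEXES = ("order_volume", "amount", "user_count")
TIME_GRANULARITIES = ("year", "quarter", "month", "week", "custom")


@dataclass(frozen=True)
class AnalysisConfig:
    """Preset inputs for one spatial autocorrelation run (illustrative only)."""
    spatial_granularity: str
    index: str
    time_granularity: str
    custom_index: bool = False  # True when the user adds an index of their own

    def __post_init__(self):
        if self.spatial_granularity not in SPATIAL_LEVELS + BUSINESS_AREAS:
            raise ValueError(f"unknown spatial granularity: {self.spatial_granularity}")
        # Preset indexes are checked; user-defined indexes are accepted as given.
        if not self.custom_index and self.index not in PRESET_INDEXES:
            raise ValueError(f"unknown preset index: {self.index}")
        if self.time_granularity not in TIME_GRANULARITIES:
            raise ValueError(f"unknown time granularity: {self.time_granularity}")


config = AnalysisConfig("city", "order_volume", "month")
```

A user-defined index would be passed with `custom_index=True`, which skips the preset-index check but still validates the spatial and time granularity.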

And then, performing spatial statistics autocorrelation calculation on the data of the selected spatial granularity, the index and the time granularity. Specifically, in step S102, the spatial weight W is first calculated according to the following formula:

and then calculating a global Moran index I according to the spatial weight, the spatial granularity, the index, the time granularity and the following formula:

wherein x can be the index of order quantity, amount and the like,

The spatial weight W corresponds to a binary symmetric spatial weight matrix W that expresses the proximity relations among the spatial regions at the n locations. w_ij is measured according to an adjacency criterion or a distance criterion. This scheme uses distance as the measure, computed as the Euclidean distance by default.

The Euclidean distance is the straight-line distance between two points and is defined as:

$$\rho = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2}$$

where ρ is the Euclidean distance between the points (x₂, y₂) and (x₁, y₁).
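A binary symmetric weight matrix built from Euclidean distances might look like the following minimal sketch. The cutoff rule (two regions count as neighbors when their representative points lie within `threshold` of each other) and the function names are assumptions made for illustration; the text states only that the matrix is binary and symmetric and that distance is the default measure.

```python
import math


def euclidean(p, q):
    """Straight-line distance between two (x, y) points."""
    return math.hypot(q[0] - p[0], q[1] - p[1])


def binary_distance_weights(points, threshold):
    """Return an n x n binary symmetric weight matrix.

    w[i][j] = 1 when regions i and j are within `threshold` of each other
    (an assumed cutoff rule), otherwise 0. The diagonal stays 0.
    """
    n = len(points)
    w = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if euclidean(points[i], points[j]) <= threshold:
                w[i][j] = w[j][i] = 1
    return w


centroids = [(0.0, 0.0), (3.0, 4.0), (10.0, 10.0)]
weights = binary_distance_weights(centroids, threshold=5.0)
```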

Here, the Moran index is affected by the size of the aggregation region, and the Moran index is significantly enlarged as the spatial aggregation range is expanded. The value of the Moran index I is generally between [ -1, 1], less than 0 represents negative correlation, equal to 0 represents no correlation, and greater than 0 represents positive correlation. The global Moran index is adopted to measure the spatial correlation, which reflects the similarity degree of the attribute values of the spatial adjacent or spatial neighboring area units, emphasizes the covariability of the area statistic value and the mean value difference, and provides a more global index value.
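For orientation, the global Moran index can be computed as in the sketch below. It uses the common textbook definition, I = (n / S0) · Σ_i Σ_j w_ij (x_i − x̄)(x_j − x̄) / Σ_i (x_i − x̄)², where S0 is the sum of all weights; the patent's own formula is not reproduced in this text, so this form and the function name are assumptions.

```python
def global_morans_i(x, w):
    """Global Moran's I for index values x and weight matrix w (textbook form)."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    s0 = sum(sum(row) for row in w)  # sum of all spatial weights
    cross = sum(w[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    total = sum(d * d for d in dev)
    if s0 == 0 or total == 0:
        raise ValueError("weights or index values carry no variation")
    return (n / s0) * cross / total


# Four regions in a row; each region neighbors the one next to it.
chain = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
trend = global_morans_i([1, 2, 3, 4], chain)          # similar neighbors
alternating = global_morans_i([1, -1, 1, -1], chain)  # dissimilar neighbors
```

With these toy inputs, the smoothly increasing values give a positive index and the alternating values give a negative one, matching the sign interpretation in the paragraph above.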

Next, a statistical test is performed on the spatial autocorrelation data, and a confidence level and a confidence interval are output for the user's subsequent decision-making. Specifically, a result can be adopted only when it is reliable at the given confidence level; otherwise the result is largely random and provides no reliable basis for decisions. In step S104, the confidence level and the confidence interval are calculated from the global Moran index I and the following formula:

wherein E (I) is the I mean, and VAR (I) is the I variance.

Here, the standardized statistic Z may be used to test whether spatial autocorrelation exists among the n regions. Specifically,

where En(I) is E(I) and Var(I) is VAR(I).

When the Z value is positive and significant, it indicates that there is a positive spatial autocorrelation, that is, similar observations (high or low) tend to be spatially clustered; when the Z value is negative and significant, the existence of negative spatial autocorrelation is indicated, and similar observed values tend to be distributed dispersedly; when the Z value is zero, the observed values are independently and randomly distributed.

If the Z value is zero or not statistically significant, this indicates that the tested spatial data are randomly distributed and uncorrelated. In this case, no subsequent calculation is performed, the program exits, and no spatial distribution pattern data are output. The user may reselect the index, the time granularity and the spatial granularity and run the calculation again.
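The significance check and early exit described above can be sketched as follows. The sketch standardizes I as Z = (I − E(I)) / sqrt(VAR(I)), using E(I) = −1/(n − 1) and the variance under the normality assumption as given in standard spatial statistics texts; the patent's exact expressions are not reproduced here, so these formulas, the two-sided normal p-value and the default `alpha` of 0.05 are assumptions.

```python
import math


def moran_z_test(i_value, w, alpha=0.05):
    """Return (z, p_value, significant) for a global Moran's I value.

    Uses the textbook mean and normality-assumption variance of I.
    """
    n = len(w)
    s0 = sum(w[i][j] for i in range(n) for j in range(n))
    s1 = 0.5 * sum((w[i][j] + w[j][i]) ** 2 for i in range(n) for j in range(n))
    s2 = sum(
        (sum(w[i]) + sum(w[j][i] for j in range(n))) ** 2 for i in range(n)
    )
    expected = -1.0 / (n - 1)
    variance = (n * n * s1 - n * s2 + 3 * s0 * s0) / ((n * n - 1) * s0 * s0)
    variance -= expected ** 2
    z = (i_value - expected) / math.sqrt(variance)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return z, p_value, p_value < alpha


chain = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
z, p_value, significant = moran_z_test(1.0 / 3.0, chain)
if not significant:
    # Mirrors the text: stop here and let the user reselect the inputs.
    pattern_data = None
```

With only four regions the toy example is not significant at the assumed 5% level, so the sketch takes the early-exit branch.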

In step S106, before outputting the spatial distribution pattern data, the local Moran index and the normalized statistics of the local Moran index test need to be calculated. Specifically, in step S106, the local Moran index is first calculated according to the following formula:

wherein I_i is the local Moran index of region i, and w_ij is the weight matrix.

And calculating the normalized statistic of the local Moran index test according to the following formula:

Finally, a Moran scatter diagram is drawn with (Wz, z) as coordinate points, wherein Wz is the spatial lag factor (W being the weight matrix), z is the standardized statistic of the current area, and the spatial lag factor is the weighted average of the standardized statistics of the areas adjacent to the area where z is located.

The Moran scatter diagram is a two-dimensional visualization of the data pairs formed by the spatial lag factor Wz and z, and is often used to study local spatial instability. Because the data pairs (Wz, z) are standardized, outliers can easily be identified visually with the 2-sigma rule.
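The data pairs behind the scatter diagram can be produced as in this sketch. It standardizes the index values, takes the spatial lag as the weighted average of the neighbors' standardized values (as the text describes), and uses the common local Moran form I_i = z_i · (Wz)_i with row-standardized weights; that form, the function name and the dictionary keys are assumptions, since the patent's formulas are not reproduced here.

```python
import math


def moran_scatter_points(x, w):
    """Return one dict per region with z, spatial lag wz, local I and outlier flag."""
    n = len(x)
    mean = sum(x) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in x) / n)
    if std == 0:
        raise ValueError("index values are constant")
    z = [(v - mean) / std for v in x]
    points = []
    for i in range(n):
        row_sum = sum(w[i])
        # Weighted average of the neighbors' standardized values.
        wz = sum(w[i][j] * z[j] for j in range(n)) / row_sum if row_sum else 0.0
        points.append({
            "z": z[i],
            "wz": wz,
            "local_i": z[i] * wz,
            "outlier": abs(z[i]) > 2 or abs(wz) > 2,  # 2-sigma rule
        })
    return points


chain = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
points = moran_scatter_points([1, 2, 3, 4], chain)
```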

As shown in fig. 2, the scheme draws a local Moran scatter diagram and divides it into four regions that are studied separately. The 4 quadrants of the Moran scatter diagram correspond to 4 types of local spatial relationship between an area unit and its neighbors, specifically:

Quadrant 1 represents the spatial relationship in which an area unit with a high observed value is surrounded by high-value areas. This type is an aggregation area of high-value users and a prime marketing area; when the budget is sufficient or the brand marketing value is high, this area can be selected for precision marketing or site selection.

Quadrant 2 represents the spatial relationship in which an area unit with a low observed value is surrounded by areas with high observed values. Such low-value areas are potential marketing-value depressions and market gaps, and most markets that still need to be cultivated fall into this type. Marketing campaigns and brands that need to develop markets and users can tentatively choose these areas for placement and small-scale site selection.

Quadrant 3 represents the spatial relationship in which an area unit with a low observed value is surrounded by areas with low observed values. This type is a vast undeveloped market. It is of great value to long-term strategic customers and brands, but the possible short-term return is limited. Marketing, large-scale site selection and placement are not recommended for small and medium-sized brand customers, for consumer goods that are strongly regional, inconvenient to transport or short in shelf life, or for expensive high-value goods.

Quadrant 4 represents the spatial relationship in which an area unit with a high observed value is surrounded by areas with low observed values. This type is an area awaiting diffusion: the value and dividends of the high-value area have yet to diffuse or spill over into the surrounding low-value areas. It is a good place to test new businesses and commodities, and trial or strategic deployment can be carried out there.

The four-class aggregation mode represented by the above 4 quadrants has strong economic meaning and marketing value, and by taking the aggregation mode represented by the quadrant 1 as an example, the aggregation mode has three layers of meanings:

First, viewed as a whole, the area is a high-high aggregation area (comprising the area itself and its surroundings): areas whose own values are high and whose surrounding values are high cluster together. If purchasing power is the measure, this is an aggregation area of strong purchasing power and a high-value key marketing area, although it may also be somewhat saturated and costly to market in. Second, the area itself is a high-value area surrounded by a dense cluster of areas of the same type, so the overall area is highly homogeneous. Third, there may be a spatial diffusion or spillover effect between the area itself and its surroundings (that is, within the high-high aggregation area), and the spatial difference between the two tends to shrink.

Based on the analysis, according to the characteristics of different quadrants, marketing personnel or users can read the data by combining the service characteristics and economic indicators of the marketing personnel or users, and then corresponding marketing strategies are formulated to find marketing opportunities.
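The quadrant reading described above can be sketched as a small classifier. The quadrant numbering follows the descriptions in the text (1: high surrounded by high, 2: low surrounded by high, 3: low surrounded by low, 4: high surrounded by low); the function name, the short labels and the handling of points lying exactly on an axis are assumptions.

```python
# Short labels paraphrasing the four types described in the text.
QUADRANT_LABELS = {
    1: "high-high: aggregation area of high-value users",
    2: "low-high: potential market to cultivate",
    3: "low-low: undeveloped market",
    4: "high-low: area awaiting diffusion",
}


def classify_quadrant(z, wz):
    """Map a region's standardized value z and spatial lag wz to a quadrant.

    Returns 0 when the point lies on an axis (an assumed convention).
    """
    if z > 0 and wz > 0:
        return 1
    if z < 0 and wz > 0:
        return 2
    if z < 0 and wz < 0:
        return 3
    if z > 0 and wz < 0:
        return 4
    return 0


quadrants = [classify_quadrant(z, wz) for z, wz in
             [(1.2, 0.8), (-0.5, 0.9), (-1.0, -1.0), (1.0, -0.3), (0.0, 1.0)]]
```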

In step S108, outputting the spatial distribution pattern data includes: generating a regional accurate marketing scheme according to the spatial distribution mode data; and generating an address selection scheme according to the address preselected by the user and the spatial distribution mode data.

The method for generating the address selection scheme according to the user preselected address and the spatial distribution mode data comprises the following steps: receiving a user preselected address, the preselected address comprising an address granularity; when the address granularity is less than or equal to the set address granularity, generating an address selection scheme according to the user preselected address and the spatial distribution mode data; and when the address granularity is larger than the set address granularity, generating a set number of address selection schemes under the set address granularity for the user to select.

When the regional precision marketing scheme is output, a corresponding precision marketing strategy is formulated according to the area type determined in step S106. In addition, user information within the area can be linked so that, provided data security and user privacy are guaranteed, personalized marketing and promotion messages are tailored to each individual user and delivered precisely through a precision marketing outreach system.

When a site selection scheme is output in response to the user's site selection requirement, the comparison is meaningful only at a very fine granularity, so site selection needs to be refined to the cell or even the street level. When the preset spatial granularity is fine, the module is triggered to compute in real time and return a result. When the preset spatial granularity is coarse (the village-and-town level or above), all addresses at the finest granularity under the selected address are scored and sorted by score, and the top-ranked addresses are taken; by default, those with a rank below 10. This setting can be adjusted manually. For example, when province-level granularity is selected, index scores such as order volume are computed for all cells and business circles under the province, city and county levels, and the top ten are taken. Because this computation is very expensive, the service is off by default and is triggered only after manual confirmation.
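The branching between fine and coarse granularity might be sketched as follows. The integer ranks (smaller meaning finer), the `scores` mapping, the `confirmed` flag and the function name are all hypothetical; only the control flow (direct result for fine granularity, ranked top-N for coarse granularity, manual confirmation before the expensive path) follows the text.

```python
def site_selection(preselected, granularity_rank, set_rank, scores,
                   top_n=10, confirmed=False):
    """Return candidate addresses for site selection.

    granularity_rank / set_rank: hypothetical integer ranks, smaller = finer.
    scores: mapping of finest-granularity addresses under the preselected
            address to an index score (for example order volume).
    """
    if granularity_rank <= set_rank:
        # Fine enough already: evaluate the preselected address directly.
        return [preselected]
    if not confirmed:
        # The expensive ranking is off by default and needs manual confirmation.
        raise RuntimeError("coarse-granularity ranking requires confirmation")
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:top_n]


fine = site_selection("Cell A", granularity_rank=1, set_rank=2, scores={})
coarse = site_selection(
    "Province P", granularity_rank=6, set_rank=2,
    scores={"Cell A": 5, "Cell B": 9, "Cell C": 7},
    top_n=2, confirmed=True,
)
```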

As shown in fig. 3, in an embodiment of the method for processing large data mining based on space according to the present invention, step S301 and step S302 are executed first, and the selection of space granularity and the selection of index and time are performed. Then, step S303 is performed to perform spatial autocorrelation calculation, specifically, step S303 includes calculation of a weight matrix and calculation of a global Moran index. Step S304 is then performed to perform a confidence check calculation. Then, step S305 is executed to perform spatial mode discrimination. And finally, executing a step S306, and outputting a scheme, wherein the scheme specifically comprises an accurate marketing scheme output and a spatial addressing scheme output.

In the space-based big data mining processing method provided by the embodiment of the invention, the space distribution modes of different spaces are judged based on the space statistics and confidence test method to obtain the space distribution mode data, and the space-related operation is carried out based on the space distribution mode data.

The following describes an embodiment of the apparatus of the present invention, which can be used to implement the above-mentioned space-based big data mining processing method of the present invention. Specifically, referring to fig. 4, the space-based big data mining processing apparatus 400 includes:

An autocorrelation calculating unit 402, configured to perform spatial autocorrelation calculation according to a preset spatial granularity, an index, and a time granularity, so as to obtain spatial autocorrelation data.

A statistical test unit 404, configured to perform a statistical test on the spatial autocorrelation data to obtain a confidence level and a confidence interval.

A spatial distribution pattern discriminating unit 406, configured to discriminate spatial distribution patterns of different spaces according to the spatial autocorrelation data, the confidence and the confidence interval, so as to obtain spatial distribution pattern data.

An output unit 408 for outputting the spatial distribution pattern data for a spatial correlation operation.

Specifically, the autocorrelation calculating unit 402 includes a first calculating subunit and a second calculating subunit. When spatial statistical autocorrelation and related operations are performed on the data for the selected time, index, and spatial region, the first calculating subunit calculates the spatial weight W according to the following formula:

wherein w_ij indicates the proximity of regions i and j, and

the second calculating subunit calculates the global Moran index I according to the spatial weight, the spatial granularity, the index, the temporal granularity, and the following formula:

wherein, x is the marketing index,
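The two calculations above can be illustrated with a short sketch. Because the formulas themselves are not reproduced in this text, the sketch assumes a binary contiguity weight (w_ij = 1 when regions i and j are adjacent, 0 otherwise, with a zero diagonal) and the commonly used textbook form of the global Moran index, I = (n / S0) * sum_ij w_ij (x_i - xbar)(x_j - xbar) / sum_i (x_i - xbar)^2, where S0 is the sum of all weights. The function names and the `neighbors` input are hypothetical.

```python
from typing import Dict, List, Sequence


def contiguity_weights(regions: Sequence[str],
                       neighbors: Dict[str, Sequence[str]]) -> List[List[float]]:
    """Build an n x n binary weight matrix W = [w_ij].

    w_ij is 1.0 when regions i and j are listed as adjacent and 0.0
    otherwise; the diagonal stays 0.0 and adjacency is made symmetric.
    """
    index = {name: k for k, name in enumerate(regions)}
    n = len(regions)
    w = [[0.0] * n for _ in range(n)]
    for name, adjacent in neighbors.items():
        i = index[name]
        for other in adjacent:
            j = index[other]
            if i != j:
                w[i][j] = 1.0
                w[j][i] = 1.0
    return w


def global_moran(x: Sequence[float], w: Sequence[Sequence[float]]) -> float:
    """Global Moran index of index values x under weight matrix w."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    s0 = sum(sum(row) for row in w)      # sum of all weights
    ss = sum(d * d for d in dev)         # sum of squared deviations
    if s0 == 0 or ss == 0:
        raise ValueError("weights and index values must not be degenerate")
    cross = sum(w[i][j] * dev[i] * dev[j]
                for i in range(n) for j in range(n))
    return (n / s0) * cross / ss
```

For four regions in a row, smoothly increasing values give a positive index, while strictly alternating values give a negative one, which matches the interpretation of clustering versus dispersion.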

Specifically, a result can be relied upon only if it is reliable in terms of confidence; otherwise, it is highly random and provides no reliable basis for decisions. Therefore, a statistical test on the spatial autocorrelation data is also needed. The statistical test unit 404 statistically tests the calculated autocorrelation and outputs a confidence level and a confidence interval, which facilitates decision-making by subsequent users.

Specifically, the statistical test unit 404 calculates the confidence and confidence interval according to the global Moran index I and the following formula:

wherein E(I) is the mean of I, and VAR(I) is the variance of I.
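A minimal sketch of this test follows, assuming the usual standardized form Z = (I - E(I)) / sqrt(VAR(I)) with E(I) = -1 / (n - 1) under the hypothesis of no spatial autocorrelation. The variance is taken as an input because its exact expression is not reproduced in this text. The two-sided p-value from the normal distribution is an added illustration; how it maps to the confidence and confidence interval of the embodiment is not specified here. The function names are hypothetical.

```python
import math
from typing import Tuple


def expected_moran(n: int) -> float:
    """Expected global Moran index for n regions with no spatial autocorrelation."""
    return -1.0 / (n - 1)


def z_test(i_value: float, n: int, variance: float) -> Tuple[float, float]:
    """Return (Z, two-sided p-value) for an observed global Moran index."""
    if variance <= 0:
        raise ValueError("variance must be positive")
    z = (i_value - expected_moran(n)) / math.sqrt(variance)
    # Two-sided tail probability of a standard normal variable.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p
```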

Here, the standardized statistic Z may be used to test whether a spatial autocorrelation relation exists among the n regions. When the Z value is positive and significant, there is positive spatial autocorrelation, that is, similar observations (high or low) tend to cluster in space; when the Z value is negative and significant, there is negative spatial autocorrelation, and similar observations tend to be dispersed; when the Z value is zero, the observations are independently and randomly distributed. If the Z value is zero or the statistic is not significant, the tested spatial data are marked as randomly distributed, with no correlation among them. In this case, no subsequent calculation is performed, the program exits, and no spatial distribution pattern data is output. The user may then reselect the index, the time granularity, and the spatial granularity and recalculate.
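The decision rule above can be sketched as a small classifier. The critical value of 1.96 (the two-sided 5% threshold of the standard normal distribution) and the labels returned are assumptions made for illustration; the embodiment does not state a specific significance level.

```python
def classify_global_pattern(z: float, critical: float = 1.96) -> str:
    """Classify the global spatial pattern from the standardized statistic Z.

    Returns "clustered" for significant positive autocorrelation,
    "dispersed" for significant negative autocorrelation, and "random"
    when Z is zero or not significant.
    """
    if abs(z) < critical:
        return "random"
    return "clustered" if z > 0 else "dispersed"


def should_continue(z: float, critical: float = 1.96) -> bool:
    """Subsequent calculation proceeds only when the pattern is not random."""
    return classify_global_pattern(z, critical) != "random"
```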

The spatial distribution pattern determining unit 406 includes a local index calculating subunit, a normalizing subunit, and a scatter diagram drawing subunit.

Wherein the local index calculation subunit is configured to calculate the local Moran index according to the following formula:

the normalization subunit is for calculating a normalization statistic for the local Moran index test according to the following formula:

and the scatter diagram drawing subunit is used for drawing the Moran scatter diagram with (Wz, z) as coordinate points, wherein z is the standardized statistic of the current region and Wz is the spatial lag factor, that is, the weighted average of the standardized statistics of the regions adjacent to the current region.
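The points of such a scatter diagram can be computed as in the following sketch. Since the formulas are not reproduced in this text, it assumes that z is the index value standardized with the population standard deviation, that Wz is the row-normalized weighted average of the neighbors' z values, and that the local Moran index takes the common form I_i = z_i * (Wz)_i. The quadrant labels (HH, LL, HL, LH) follow the usual reading of a Moran scatter diagram and, like the function name, are not taken from the embodiment.

```python
import math
from typing import List, Sequence, Tuple


def moran_scatter(x: Sequence[float],
                  w: Sequence[Sequence[float]]
                  ) -> List[Tuple[float, float, float, str]]:
    """Return (Wz_i, z_i, I_i, quadrant) for every region i."""
    n = len(x)
    mean = sum(x) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in x) / n)
    if std == 0:
        raise ValueError("index values must not all be equal")
    z = [(v - mean) / std for v in x]
    points = []
    for i in range(n):
        row_sum = sum(w[i])
        # Spatial lag: weighted average of the neighbors' standardized values.
        lag = sum(w[i][j] * z[j] for j in range(n)) / row_sum if row_sum else 0.0
        local_i = z[i] * lag
        if z[i] >= 0:
            quadrant = "HH" if lag >= 0 else "HL"   # high value, high or low neighbors
        else:
            quadrant = "LL" if lag < 0 else "LH"    # low value, low or high neighbors
        points.append((lag, z[i], local_i, quadrant))
    return points
```

Regions in the HH and LL quadrants resemble their neighbors, while HL and LH regions stand out from them, which is the kind of contrast a user would look for when reading the diagram.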

Based on the drawn scatter diagram, marketing personnel or users can interpret the results in light of their own business characteristics and economic indexes, and formulate corresponding marketing strategies to find marketing opportunities.

The output unit 408 comprises a marketing scheme generation subunit and an addressing scheme generation subunit, wherein the marketing scheme generation subunit is used for generating a regional accurate marketing scheme according to the spatial distribution pattern data; and the address selection scheme generating subunit is used for generating an address selection scheme according to the user pre-selected address and the spatial distribution mode data.

The marketing scheme generation subunit is further configured to receive a user preselected address, the preselected address comprising an address granularity; when the address granularity is less than or equal to the set address granularity, generating an address selection scheme according to the user preselected address and the spatial distribution mode data; and when the address granularity is larger than the set address granularity, generating a set number of address selection schemes under the set address granularity for the user to select.

According to an exemplary embodiment of the present disclosure, referring to fig. 5, compared with the space-based big data mining processing apparatus 400, the space-based big data mining processing apparatus 500 includes not only the autocorrelation calculating unit 402, the statistical test unit 404, the spatial distribution pattern discriminating unit 406, and the output unit 408, but also the first preset unit 502 and the second preset unit 504.

The first preset unit 502 is used for presetting geographical grading information, and the second preset unit 504 is used for presetting the index and the time granularity. The first preset unit 502 presets standard geographical grading information and supports user customization, providing the basic spatial dimension selection for the whole calculation framework. The second preset unit 504 presets common indexes and time windows, which can also be customized by the user, providing the basic index and time window selection for the whole calculation framework.

Other functions of the space-based big data mining processing device 500 are the same as those of the space-based big data mining processing device 400, and a description thereof will not be repeated.

For details that are not disclosed in the embodiment of the apparatus of the present invention, please refer to the embodiment of the space-based big data mining processing method of the present invention.

In the space-based big data mining processing device provided by the embodiment of the invention, the space distribution modes of different spaces are judged based on the space statistics and confidence test method to obtain the space distribution mode data, and the space-related operation is carried out based on the space distribution mode data.

Referring now to FIG. 6, shown is a block diagram of a computer system 600 suitable for use with the electronic device implementing an embodiment of the present invention. The computer system 600 of the electronic device shown in fig. 6 is only an example, and should not bring any limitation to the function and the scope of the use of the embodiments of the present invention.

As shown in fig. 6, the computer system 600 includes a Central Processing Unit (CPU)601 that can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM)602 or a program loaded from a storage section 608 into a Random Access Memory (RAM) 603. In the RAM 603, various programs and data necessary for system operation are also stored. The CPU 601, ROM 602, and RAM 603 are connected to each other via a bus 604. An input/output (I/O) interface 605 is also connected to bus 604.

The following components are connected to the I/O interface 605: an input portion 606 including a keyboard, a mouse, and the like; an output portion 607 including a display such as a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), and the like, and a speaker; a storage section 608 including a hard disk and the like; and a communication section 609 including a network interface card such as a LAN card, a modem, or the like. The communication section 609 performs communication processing via a network such as the internet. The drive 610 is also connected to the I/O interface 605 as needed. A removable medium 611 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 610 as necessary, so that a computer program read out therefrom is installed into the storage section 608 as necessary.

In particular, according to an embodiment of the present invention, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the invention include a computer program product comprising a computer program embodied on a computer-readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 609, and/or installed from the removable medium 611. The above-described functions defined in the system of the present application are executed when the computer program is executed by the Central Processing Unit (CPU) 601.

It should be noted that the computer readable medium shown in the present invention can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present invention, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present invention, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.

The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

The units described in the embodiments of the present invention may be implemented by software or by hardware, and the described units may also be disposed in a processor. In some cases, the names of the units do not constitute a limitation on the units themselves.

As another aspect, the present application also provides a computer-readable medium, which may be contained in the electronic device described in the above embodiments, or may exist separately without being assembled into the electronic device. The computer-readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to implement the space-based big data mining processing method described in the above embodiments.

For example, the electronic device may implement the following as shown in fig. 1: step S102, performing spatial autocorrelation calculation according to preset spatial granularity, indexes and time granularity to obtain spatial autocorrelation data; step S104, carrying out statistical test on the spatial autocorrelation data to obtain confidence and a confidence interval; step S106, distinguishing spatial distribution modes of different spaces according to the spatial autocorrelation data, the confidence degrees and the confidence intervals to obtain spatial distribution mode data; in step S108, the spatial distribution pattern data is output for the spatial correlation operation.

As another example, the electronic device may implement the steps shown in fig. 3.

It should be noted that although several modules or units of the device for performing actions are mentioned in the above detailed description, such a division is not mandatory. Indeed, according to embodiments of the invention, the features and functions of two or more modules or units described above may be embodied in one module or unit. Conversely, the features and functions of one module or unit described above may be further divided so as to be embodied by a plurality of modules or units.

Through the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments described herein may be implemented by software, or by software in combination with necessary hardware. Therefore, the technical solution according to the embodiment of the present invention can be embodied in the form of a software product, which can be stored in a non-volatile storage medium (which can be a CD-ROM, a USB flash drive, a removable hard disk, etc.) or on a network, and which includes several instructions to enable a computing device (which can be a personal computer, a server, a touch terminal, or a network device, etc.) to execute the method according to the embodiment of the present invention.

Other embodiments of the invention will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the invention and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the invention being indicated by the following claims.

It will be understood that the invention is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the invention is limited only by the appended claims.

Claims

1. A big data mining processing method based on space is characterized by comprising the following steps:

performing spatial autocorrelation calculation by using an autocorrelation calculating unit according to a spatial weight, a preset spatial granularity, an index and a time granularity to obtain spatial autocorrelation data, wherein the spatial weight is calculated according to a proximity relation between regions;

performing statistical test on the spatial autocorrelation data by using a statistical test unit to obtain confidence and a confidence interval;

adopting a spatial distribution mode distinguishing unit to distinguish spatial distribution modes of different spaces according to the spatial autocorrelation data, the confidence coefficient and the confidence interval to obtain spatial distribution mode data;

outputting the spatial distribution mode data by adopting an output unit, wherein the spatial distribution mode data represents the correlation between the current region and the adjacent region;

performing spatial correlation operation according to the spatial distribution mode data;

the spatial autocorrelation calculation according to the spatial weight, the preset spatial granularity, the index and the time granularity includes:

the spatial weight W is calculated according to the following formula:

n is the number of regions;

wherein, x is an index,

the performing a statistical test on the spatial autocorrelation data comprises:

calculating confidence and confidence intervals according to the global Moran index I and the following formula:

wherein E (I) is the I mean, VAR (I) is the I variance, and the normalized statistic Z is used for testing whether the n regions have the spatial autocorrelation relation;

the distinguishing of the spatial distribution modes of different spaces according to the spatial autocorrelation data, the confidence and the confidence interval to obtain spatial distribution mode data comprises:

the local Moran index is calculated according to the following formula:

I_i is the local Moran index for region i;

2. The method of claim 1, wherein before the performing the spatial autocorrelation calculation according to the spatial weight, the preset spatial granularity, the index and the temporal granularity, the method further comprises:

presetting geographical grading information;

preset indexes and time granularity.

3. The method of claim 1, wherein outputting the spatially distributed pattern data with an output unit comprises:

generating a regional accurate marketing scheme according to the spatial distribution mode data;

and generating an address selection scheme according to the address preselected by the user and the spatial distribution mode data.

4. The method of claim 3, wherein generating an addressing scheme based on the user preselected address and the spatial distribution pattern data comprises:

receiving a user preselected address, the preselected address comprising an address granularity;

when the address granularity is less than or equal to the set address granularity, generating an address selection scheme according to the user preselected address and the spatial distribution mode data;

and when the address granularity is larger than the set address granularity, generating a set number of address selection schemes under the set address granularity for a user to select.

5. A big data mining processing device based on space, which is characterized by comprising:

the self-correlation calculation unit is used for performing spatial self-correlation calculation according to the spatial weight, the preset spatial granularity, the index and the time granularity to obtain spatial self-correlation data, wherein the spatial weight is calculated according to the proximity relation between the regions;

the statistical test unit is used for carrying out statistical test on the spatial autocorrelation data to obtain confidence and a confidence interval;

the spatial distribution mode judging unit is used for judging spatial distribution modes of different spaces according to the spatial autocorrelation data, the confidence coefficient and the confidence interval to obtain spatial distribution mode data;

the output unit is used for outputting the spatial distribution mode data, wherein the spatial distribution mode data represents the correlation between the current region and the adjacent region;

an operation unit, configured to perform a spatial correlation operation according to the spatial distribution pattern data;

the device further comprises:

the first preset unit is used for presetting geographical grading information;

the second preset unit is used for presetting indexes and time granularity;

the autocorrelation calculating unit includes:

a first calculating subunit for calculating the spatial weight W according to the following formula:

wherein w_ij indicates the proximity of regions i and j, and

n is the number of regions;

wherein, x is an index,

the statistical test unit is further configured to:

the spatial distribution pattern discrimination unit includes:

I_i is the local Moran index for region i;

6. The apparatus of claim 5, wherein the output unit comprises:

the marketing scheme generating subunit is used for generating a regional accurate marketing scheme according to the spatial distribution mode data;

and the address selection scheme generating subunit is used for generating an address selection scheme according to the user pre-selected address and the spatial distribution mode data.

7. The apparatus of claim 6, wherein the marketing plan generation subunit is further configured to:

8. A computer-readable medium on which a computer program is stored, the program, when executed by a processor, implementing the big data mining processing method based on space according to any one of claims 1 to 4.

9. An electronic device, comprising:

one or more processors;

a storage device for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the big data mining processing method based on space according to any one of claims 1 to 4.