CN116662839A

CN116662839A - Associated big data cluster analysis method and device based on multidimensional intelligent acquisition

Info

Publication number: CN116662839A
Application number: CN202310516513.7A
Authority: CN
Inventors: 张煇; 刘俊龙; 杨勇
Original assignee: Shanxi Changhe Technology Co ltd
Current assignee: Shanxi Changhe Technology Co ltd
Priority date: 2023-05-09
Filing date: 2023-05-09
Publication date: 2023-08-29

Abstract

The invention relates to the technical field of data cluster analysis and discloses an associated big data cluster analysis method and device based on multidimensional intelligent acquisition. The method comprises the following steps: performing attribute analysis on the target data to obtain data attributes; performing linear transformation on the data attributes to obtain attribute linear values, performing normal distribution processing on each of the data attributes to obtain an attribute normal distribution map, and calculating the probability density of each graph in the attribute normal distribution map; determining the expected values corresponding to the data attributes, constructing a covariance matrix of the attribute covariances, and determining a covariance structure of the target data; and performing sharpening and noise-reduction processing on the target data to obtain noise-reduced data, performing feature extraction on the noise-reduced data to obtain feature data, constructing a transformation matrix of the feature data, performing sharpening projection on the feature data to obtain a data projection, and performing cluster analysis on the feature data to obtain a clustering result of the associated big data. The invention aims to improve the accuracy of cluster analysis of associated big data collected through multidimensional intelligent acquisition.

Description

Associated big data cluster analysis method and device based on multidimensional intelligent acquisition

Technical Field

The invention relates to the technical field of data cluster analysis, and in particular to an associated big data cluster analysis method and device based on multidimensional intelligent acquisition.

Background

At present, the rapid development of cloud computing, intelligent technology and sensing technology has driven explosive growth in data, and data processing and analysis have become important to modern society. In the big data era, research fields of different dimensions, such as government services, biology, medicine and astronomy, generate large amounts of data every day. Because these data span many dimensions, mining their latent information is particularly important, and cluster analysis is the main method used for such mining.

Existing cluster analysis methods mainly calculate the relevance between data items by combining data semantics with data keywords, and then cluster the data according to that relevance. However, uncertain data can arise at every stage of data acquisition, transmission and processing, and the latent relations within the data are not mined or analyzed, which reduces the accuracy of the cluster analysis. A method that improves the accuracy of cluster analysis of associated big data collected through multidimensional intelligent acquisition is therefore needed.

Disclosure of Invention

The invention provides an associated big data cluster analysis method and device based on multidimensional intelligent acquisition, whose main aim is to improve the accuracy of cluster analysis of associated big data collected through multidimensional intelligent acquisition.

To achieve the above purpose, the associated big data cluster analysis method based on multidimensional intelligent acquisition provided by the invention comprises the following steps:

acquiring associated big data to be analyzed, performing data filtering on the associated big data to obtain target data, and performing attribute analysis on the target data to obtain data attributes;

performing linear transformation on the data attributes to obtain attribute linear values, performing normal distribution processing on each attribute in the data attributes according to the attribute linear values to obtain an attribute normal distribution map, and calculating the probability density of each graph in the attribute normal distribution map through the following formula;

where F represents the probability density of each graph in the attribute normal distribution map, β represents the attribute mean of the data attribute, exp represents the exponential function, B_j represents the random variable of the j-th attribute normal distribution map, and C_j represents the graph parameter corresponding to the j-th attribute normal distribution map;

determining the expected values corresponding to the data attributes according to the probability density, calculating the covariance between the data attributes according to the expected values to obtain the attribute covariance, constructing a covariance matrix of the attribute covariance, and determining a covariance structure of the target data according to the covariance matrix;

and performing sharpening and noise-reduction processing on the target data according to the covariance structure to obtain noise-reduced data, performing feature extraction on the noise-reduced data to obtain feature data, constructing a transformation matrix of the feature data, performing sharpening projection on the feature data in combination with the transformation matrix to obtain a data projection, and performing cluster analysis on the feature data according to the data projection to obtain a clustering result of the associated big data.

Optionally, the performing data filtering on the associated big data to obtain target data includes:

carrying out standardization processing on the associated big data to obtain standardized data;

vectorizing the standardized data to obtain standardized vectors, and calculating the cosine of the angle between the standardized vectors;

and performing de-duplication processing on the standardized data according to the angle cosine values to obtain target data.

Optionally, the performing attribute analysis on the target data to obtain a data attribute includes:

extracting the data tag corresponding to each item of data in the target data, and calculating the weight of each tag in the data tags through the following formula to obtain a tag weight;

where D_i represents the tag weight of the i-th tag in the data tags, B_i represents the tag vector of the i-th tag, a separate symbol represents the vector covariance corresponding to the tag vector of the i-th tag, and trace() represents a spatial filtering function;

and extracting feature tags from the data tags according to the tag weights, and performing attribute analysis on the feature tags to obtain data attributes.

Optionally, calculating the covariance between the data attributes according to the expected values to obtain the attribute covariance includes:

the covariance between each of the data attributes is calculated by the following formula:

$$\mathrm{Cov}(m, m+1) = E[m, m+1] - E[m]\,E[m+1]$$

where Cov(m, m+1) represents the covariance between the m-th and (m+1)-th data attributes, m and m+1 represent the sequence numbers of the data attributes, E[m, m+1] represents the expected value of the product of the m-th and (m+1)-th data attributes, E[m] represents the expected value corresponding to the m-th data attribute, and E[m+1] represents the expected value corresponding to the (m+1)-th data attribute.

Optionally, performing sharpening and noise-reduction processing on the target data according to the covariance structure to obtain noise-reduced data includes:

according to the covariance structure, carrying out feature decomposition on the covariance matrix to obtain matrix features;

calculating the feature weight of each feature in the matrix features, and screening the target data according to the feature weights to obtain screening data;

performing dimension reduction processing on the screening data to obtain dimension-reduced data;

and performing sharpening processing on the dimension-reduced data to obtain sharpened data, and performing noise reduction processing on the sharpened data to obtain noise-reduced data.

Optionally, the performing feature decomposition on the covariance matrix according to the covariance structure to obtain a matrix feature includes:

performing feature decomposition on the covariance matrix through the following formula:

where G represents the matrix features of the covariance matrix, Cov^z represents the covariance structure, Q represents the orthonormal matrix, Σ represents the diagonal matrix corresponding to the covariance matrix, and Q^-1 represents the inverse of the orthogonal matrix.

Optionally, the constructing a transformation matrix of the feature data includes:

calculating the feature value of each item of data in the feature data, and sorting the feature values to obtain sorted feature values;

counting the feature values to obtain the feature quantity, and identifying the data dimension of the feature data;

setting a retention coefficient for the sorted feature values according to the feature quantity and the data dimension;

filtering the sorted feature values according to the retention coefficient to obtain target feature values;

and performing vector conversion on the feature data corresponding to the target feature values to obtain feature vectors, and constructing the transformation matrix of the feature data according to the target feature values and the feature vectors.

Optionally, the performing cluster analysis on the feature data according to the sharpened projection to obtain a cluster result of the associated big data includes:

obtaining the coordinates of each projection in the sharpened projections to obtain projection coordinates, and calculating the projection similarity of each projection according to the projection coordinates;

determining potential association degrees of the characteristic data according to the projection similarity;

and performing cluster analysis on the feature data according to the potential association degree to obtain a clustering result of the associated big data.

Optionally, the calculating the projection similarity of each projection according to the projection coordinates includes:

calculating the projection similarity of each projection by the following formula:

where S represents the projection similarity of each projection, k represents the distance parameter, l and l+1 represent the sequence numbers of the projections, w represents the total number of projections, X_l and Y_l represent the projection coordinates of the l-th projection, and X_{l+1} and Y_{l+1} represent the projection coordinates of the (l+1)-th projection.

An associated big data cluster analysis device based on multidimensional intelligent acquisition, characterized in that the device comprises:

the attribute analysis module is used for acquiring associated big data to be analyzed, carrying out data filtering on the associated big data to obtain target data, and carrying out attribute analysis on the target data to obtain data attributes;

The matrix construction module is used for carrying out linear transformation on the data attributes to obtain attribute linear values, carrying out normal distribution processing on each attribute in the data attributes according to the attribute linear values to obtain an attribute normal distribution diagram, and calculating the probability density of each graph in the attribute normal distribution diagram through the following formula;

where F represents the probability density of each graph in the attribute normal distribution diagram, β represents the attribute mean of the data attribute, exp represents the exponential function, B_j represents the random variable of the j-th attribute normal distribution diagram, and C_j represents the graph parameter corresponding to the j-th attribute normal distribution diagram;

the feature extraction module is used for determining the expected values corresponding to the data attributes according to the probability density, calculating the covariance between the data attributes according to the expected values to obtain the attribute covariance, constructing a covariance matrix of the attribute covariance, and determining a covariance structure of the target data according to the covariance matrix;

and the cluster analysis module is used for performing sharpening and noise-reduction processing on the target data according to the covariance structure to obtain noise-reduced data, performing feature extraction on the noise-reduced data to obtain feature data, constructing a transformation matrix of the feature data, performing sharpening projection on the feature data in combination with the transformation matrix to obtain a data projection, and performing cluster analysis on the feature data according to the data projection to obtain a clustering result of the associated big data.

According to the invention, acquiring the associated big data to be analyzed and filtering it removes unimportant or incomplete data from the associated big data, which facilitates subsequent data processing. Therefore, the associated big data cluster analysis method and device based on multidimensional intelligent acquisition can improve the accuracy of cluster analysis of associated big data collected through multidimensional intelligent acquisition.

Drawings

FIG. 1 is a schematic flow chart of an associated big data cluster analysis method based on multidimensional intelligent acquisition according to an embodiment of the present invention;

FIG. 2 is a functional block diagram of an associated big data cluster analysis device based on multidimensional intelligent acquisition according to an embodiment of the present application;

FIG. 3 is a schematic structural diagram of an electronic device for implementing the associated big data cluster analysis method based on multidimensional intelligent acquisition according to an embodiment of the present application.

The achievement of the objects, functional features and advantages of the present application will be further described with reference to the accompanying drawings, in conjunction with the embodiments.

Detailed Description

It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the application.

The embodiment of the application provides an associated big data cluster analysis method based on multidimensional intelligent acquisition. In this embodiment, the execution subject of the method includes at least one electronic device, such as a server or a terminal, that can be configured to execute the method provided by the embodiment of the application. In other words, the method can be executed by software or hardware installed in a terminal device or a server device, and the software can be a blockchain platform. The server side includes, but is not limited to, a single server, a server cluster, a cloud server or a cloud server cluster. The server may be an independent server, or a cloud server that provides cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communications, middleware services, domain name services, security services, a content delivery network (CDN), and basic cloud computing services such as big data and artificial intelligence platforms.

Referring to FIG. 1, a flow chart of an associated big data cluster analysis method based on multidimensional intelligent acquisition according to an embodiment of the present invention is shown. In this embodiment, the method includes steps S1 to S4.

S1, acquiring associated big data to be analyzed, performing data filtering on the associated big data to obtain target data, and performing attribute analysis on the target data to obtain data attributes.

According to the invention, acquiring the associated big data to be analyzed and filtering it removes unimportant or incomplete data and facilitates subsequent processing. Associated big data are data that bear a certain correlation with one another. For example, social management in government services covers public security, traffic, institutions and other areas, each of which generates a large amount of data; because these data are associated, they can be retrieved and used across areas, which makes them easier to manage. In public security management, for instance, vehicle information held by traffic management can be retrieved to compile vehicle statistics, which improves the efficiency of data processing. The target data are the data obtained after the associated big data have been filtered, deleted and otherwise processed. Further, the associated big data to be analyzed can be acquired through a data collector, which can be implemented as a script.

According to the invention, the invalid data in the associated big data can be removed by carrying out data filtering on the associated big data so as to improve the efficiency of subsequent data processing, wherein the target data is the data obtained by removing the invalid data in the associated big data.

As an embodiment of the present invention, performing data filtering on the associated big data to obtain target data includes: performing standardization processing on the associated big data to obtain standardized data, vectorizing the standardized data to obtain standardized vectors, calculating the cosine of the angle between the standardized vectors, and de-duplicating the standardized data according to the angle cosine values to obtain target data.

The standardized data are the data obtained after the associated big data have been unified in format, and the standardized vectors are the vector representations of the standardized data. The angle cosine value is the cosine of the angle between two standardized vectors: the closer the angle is to zero (that is, the closer the cosine is to 1), the more similar the two vectors are.

Further, the standardization of the associated big data can be implemented by a standard deviation (z-score) standardization method, the vectorization of the standardized data can be implemented by the word2vec algorithm, the angle cosine between the standardized vectors can be calculated with a cosine function, and the de-duplication of the standardized data can be implemented by a de-duplication tool written in a scripting language.

According to the invention, performing attribute analysis on the target data reveals the relevant attribute information of each item in the target data, which deepens the understanding of the target data and provides a basis for the subsequent calculation of the attribute covariance; the data attribute is the property corresponding to each item in the target data.
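A minimal sketch of the filtering step, assuming each record is already a numeric vector (the word2vec vectorization is skipped) and assuming a hypothetical cosine threshold of 0.999 for treating two records as duplicates; the patent specifies neither. The function names are illustrative only.

```python
import math
from statistics import mean, pstdev


def zscore_columns(rows):
    """Standard-deviation (z-score) standardization of each column."""
    stats = [(mean(col), pstdev(col)) for col in zip(*rows)]
    return [
        [(x - mu) / sd if sd > 0 else 0.0 for x, (mu, sd) in zip(row, stats)]
        for row in rows
    ]


def cosine(u, v):
    """Cosine of the angle between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0 and nv == 0:
        return 1.0  # two zero vectors are treated as identical
    if nu == 0 or nv == 0:
        return 0.0
    return dot / (nu * nv)


def deduplicate(rows, threshold=0.999):
    """Keep a record only if its cosine with every kept record is below threshold."""
    vectors = zscore_columns(rows)
    kept = []
    for i, v in enumerate(vectors):
        if all(cosine(v, vectors[j]) < threshold for j in kept):
            kept.append(i)
    return [rows[i] for i in kept]
```

For example, `deduplicate([[1, 2], [1, 2], [3, 1], [2, 5]])` drops the repeated `[1, 2]`. Because cosine similarity compares direction only, two distinct records whose standardized vectors point the same way would also be merged; a stricter variant could additionally compare vector norms.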

As an embodiment of the present invention, the performing attribute analysis on the target data to obtain a data attribute includes: extracting a data tag corresponding to each data in the target data, calculating the weight of each tag in the data tag to obtain a tag weight, extracting a characteristic tag in the data tag according to the tag weight, and carrying out attribute analysis on the characteristic tag to obtain a data attribute.

The data labels are information such as identifiers or marks corresponding to each piece of data in the target data, the label weight represents the importance degree of each label in the data labels, and the characteristic labels are representative labels in the data labels.

Further, the data tag corresponding to each item in the target data can be extracted by a tag extractor written in Java; the feature tag can be obtained by using an extraction function, such as the LEFT function, to extract the tag with the largest tag weight; and the attribute analysis of the feature tags can be implemented by an attribute analysis method, such as funnel analysis.
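The tag-weight formula is not reproduced in this text, so the sketch below substitutes a plainly different stand-in, the normalized frequency of each tag across records, and then applies the selection rule described above (keep the tag or tags with the largest weight). The function names and sample tags are hypothetical.

```python
from collections import Counter


def frequency_weights(records):
    """Stand-in tag weight: share of all tag occurrences (NOT the patent's trace()-based formula)."""
    counts = Counter(tag for tags in records for tag in set(tags))
    total = sum(counts.values())
    return {tag: n / total for tag, n in counts.items()}


def select_feature_tags(tag_weights, top_n=1):
    """Return the top_n tags by weight; ties are broken alphabetically."""
    ranked = sorted(tag_weights.items(), key=lambda kv: (-kv[1], kv[0]))
    return [tag for tag, _ in ranked[:top_n]]
```

With records `[["vehicle", "traffic"], ["vehicle", "public_security"], ["vehicle"]]`, the tag `vehicle` receives the largest weight and is selected as the feature tag.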

Further, as an optional embodiment of the present invention, the calculating the weight of each tag in the data tag to obtain a tag weight includes:

the weight of each of the data tags is calculated by the following formula:

where D_i represents the tag weight of the i-th tag in the data tags, B_i represents the tag vector of the i-th tag, a separate symbol represents the vector covariance corresponding to the tag vector of the i-th tag, and trace() represents the spatial filtering function.

S2, carrying out linear transformation on the data attributes to obtain attribute linear values, carrying out normal distribution processing on each attribute in the data attributes according to the attribute linear values to obtain an attribute normal distribution diagram, and calculating the probability density of each graph in the attribute normal distribution diagram.

By performing linear transformation on the data attributes, the invention obtains linear values of the data attributes, which provides the basis for the subsequent normal distribution processing. The attribute linear value is the numerical representation of a data attribute; the attribute normal distribution map is the frequency distribution of the variable corresponding to each data attribute; the probability density describes how likely each attribute value is, the area under each curve of the normal distribution map giving the occurrence probability of the corresponding attribute; and the expected value is the mean of the output values corresponding to the data attribute.

Further, as an optional embodiment of the present invention, the linear transformation of the data attributes may be implemented by a linear function (for example, a first-degree function of the form y = ax + b), the normal distribution processing of each data attribute may be implemented by a Gaussian function, and the expected value may be obtained by integrating each attribute value weighted by its probability density.

As one embodiment of the present invention, the calculating the probability density of each graph in the attribute normal distribution map includes:

the probability density of each graph in the attribute normal distribution graph is calculated by the following formula:

where F represents the probability density of each graph in the attribute normal distribution map, β represents the attribute mean of the data attribute, exp represents the exponential function, B_j represents the random variable of the j-th attribute normal distribution map, and C_j represents the graph parameter corresponding to the j-th attribute normal distribution map.
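Because the probability density formula is not reproduced in this text, the sketch assumes the standard Gaussian form F(B_j) = exp(-(B_j - β)² / (2·C_j²)) / (√(2π)·C_j), treating the graph parameter C_j as a standard deviation. It also computes the expected value as the integral of b·F(b), approximated with the trapezoidal rule over ±8 standard deviations (an arbitrary but ample range).

```python
import math


def normal_density(b, beta, c):
    """Gaussian density at b, assuming beta is the mean and c (C_j) the standard deviation."""
    return math.exp(-((b - beta) ** 2) / (2 * c * c)) / (math.sqrt(2 * math.pi) * c)


def expected_value(beta, c, n=4000, span=8.0):
    """E[B] = integral of b * F(b) db, approximated with the trapezoidal rule."""
    lo, hi = beta - span * c, beta + span * c
    h = (hi - lo) / n
    total = 0.0
    for i in range(n + 1):
        b = lo + i * h
        weight = 0.5 if i in (0, n) else 1.0  # trapezoid end-point weights
        total += weight * b * normal_density(b, beta, c)
    return total * h
```

For a Gaussian the integral simply recovers the mean β, so `expected_value(3.0, 2.0)` is approximately 3.0; the numerical route only becomes informative for non-Gaussian densities.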

S3, determining expected values corresponding to the data attributes according to the probability density, calculating covariance among each attribute in the data attributes according to the expected values to obtain attribute covariance, constructing a covariance matrix of the attribute covariance, and determining a covariance structure of the target data according to the covariance matrix.

The expected value corresponding to each data attribute is determined from the probability density in order to reveal the differences among the data attributes; the expected value is the mean of the output values corresponding to the data attribute. Further, the expected value corresponding to a data attribute can be determined by calculating the mean value of the data attribute.

By calculating the covariance between the data attributes according to the expected values, the invention reveals how the data attributes vary with respect to one another, which supports the subsequent construction of the covariance matrix of the attribute covariance. The attribute covariance is the overall error between the data attributes: a positive attribute covariance indicates a larger error in the target data, and a negative attribute covariance indicates a smaller error.

As an embodiment of the present invention, the calculating the covariance between each attribute in the data attributes according to the expected value, to obtain an attribute covariance includes:

$$\mathrm{Cov}(m, m+1) = E[m, m+1] - E[m]\,E[m+1]$$
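A minimal sketch of the covariance formula, reading E[m, m+1] as the expected value of the product of the two attributes and using population (divide-by-n) expectations. `adjacent_covariances` follows the patent's pairing of attribute m with attribute m+1, while `covariance_matrix` is the usual all-pairs generalization; the function names are illustrative.

```python
def expectation(values):
    """Sample mean used as the expected value."""
    return sum(values) / len(values)


def covariance(x, y):
    """Cov(X, Y) = E[XY] - E[X]E[Y] (population form)."""
    if len(x) != len(y):
        raise ValueError("attributes must have the same length")
    exy = expectation([a * b for a, b in zip(x, y)])
    return exy - expectation(x) * expectation(y)


def adjacent_covariances(attributes):
    """Cov(m, m+1) for consecutive attributes, as in the formula above."""
    return [covariance(attributes[m], attributes[m + 1]) for m in range(len(attributes) - 1)]


def covariance_matrix(attributes):
    """Full symmetric covariance matrix; attributes is a list of equal-length columns."""
    return [[covariance(a, b) for b in attributes] for a in attributes]
```

For x = [1, 2, 3, 4] and y = 2x, E[XY] = 15 and E[X]E[Y] = 12.5, so Cov = 2.5. The E[XY] - E[X]E[Y] form is algebraically exact but can lose precision for large values with small spread; a two-pass computation of the mean of (x - E[X])(y - E[Y]) is numerically safer.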

By constructing the covariance matrix of the attribute covariances, the invention shows how the covariance matrix is structured and what differences exist within the target data. Further, the covariance matrix can be constructed with a matrix function written in a programming language, and the covariance structure of the target data can be determined from the structure type of the covariance matrix.

S4, performing sharpening and noise-reduction processing on the target data according to the covariance structure to obtain noise-reduced data, performing feature extraction on the noise-reduced data to obtain feature data, constructing a transformation matrix of the feature data, performing sharpening projection on the feature data in combination with the transformation matrix to obtain a data projection, and performing cluster analysis on the feature data according to the data projection to obtain a clustering result of the associated big data.

Performing sharpening and noise-reduction processing on the target data according to the covariance structure removes uncertain and redundant data from the target data and improves the accuracy of the subsequent feature extraction; the noise-reduced data are the data obtained after the redundant and uncertain data have been removed from the target data.

As an embodiment of the present invention, the sharpening noise reduction processing is performed on the target data according to the covariance structure, to obtain noise reduction data, including: according to the covariance structure, carrying out feature decomposition on the covariance matrix to obtain matrix features, calculating feature weights of each feature in the matrix features, screening the target data according to the feature weights to obtain screening data, carrying out dimension reduction processing on the screening data to obtain dimension reduction data, carrying out sharpening processing on the dimension reduction data to obtain sharpening data, and carrying out noise reduction processing on the sharpening data to obtain noise reduction data.

The matrix features are the features of the covariance matrix; the feature weights represent the importance of the matrix features; the screening data are the data obtained after the target data have been screened according to the magnitude of the matrix feature values; the dimension-reduced data are the data obtained after the screening data have been reduced from a high dimension to a low dimension; and the sharpened data are the data obtained after the clarity of the dimension-reduced data has been improved.

Further, as an optional embodiment of the present invention, the feature weight of each matrix feature may be calculated by the analytic hierarchy process; the target data may be screened with a screening function, such as the LOOKUP function; the dimension reduction of the screening data may be implemented by a naive Bayes method; the sharpening of the dimension-reduced data may be implemented by a sharpening tool written in a scripting language; and the noise reduction of the sharpened data may be implemented by mean filtering.
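The patent names only a sharpening tool and mean filtering. A minimal sketch, assuming unsharp masking (adding back the detail that a blur removes) as the sharpening step and a moving-average window of radius 1 as the mean filter, applied to one numeric sequence; the function names and the `amount` parameter are hypothetical.

```python
def mean_filter(signal, radius=1):
    """Moving-average (mean) filter; the window is truncated at the edges."""
    n = len(signal)
    out = []
    for i in range(n):
        window = signal[max(0, i - radius): min(n, i + radius + 1)]
        out.append(sum(window) / len(window))
    return out


def sharpen(signal, amount=1.0, radius=1):
    """Unsharp masking: x + amount * (x - blur(x))."""
    blurred = mean_filter(signal, radius)
    return [x + amount * (x - b) for x, b in zip(signal, blurred)]


def sharpen_then_denoise(signal, amount=1.0, radius=1):
    """Sharpening followed by mean-filter noise reduction, in the order the text describes."""
    return mean_filter(sharpen(signal, amount, radius), radius)
```

A spike such as `[0, 0, 3, 0, 0]` becomes `[0, -1, 5, -1, 0]` after sharpening, while a constant sequence passes through unchanged. Since the mean filter partly undoes the sharpening, `amount` and `radius` would need tuning in practice.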

Further, as an optional embodiment of the present invention, the performing feature decomposition on the covariance matrix according to the covariance structure to obtain matrix features includes:

where G represents the matrix features of the covariance matrix, Cov^z represents the covariance structure, Q represents the orthonormal matrix, and Σ() is the diagonal matrix corresponding to the covariance matrix.

Performing feature extraction on the noise-reduced data yields its characteristic part and provides the basis for the subsequent construction of the transformation matrix. The feature data are the representative data in the noise-reduced data; further, the feature extraction can be implemented by principal component analysis.

Constructing the transformation matrix of the feature data makes it possible to apply sharpening projection to the feature data through that matrix, which improves the accuracy of the subsequent cluster analysis; the transformation matrix is a square matrix used to transform the feature data.
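The decomposition formula is not reproduced in this text, but the symbols defined above (an orthonormal matrix Q and a diagonal matrix Σ) match the standard eigendecomposition of a symmetric matrix, Cov = QΣQ^-1 with Q^-1 = Q^T. A minimal pure-Python sketch using the cyclic Jacobi eigenvalue algorithm; taking each eigenvalue's share of the total as its feature weight is an assumption that stands in for the analytic hierarchy process mentioned in the text.

```python
import math


def jacobi_eigen(a, tol=1e-20, max_sweeps=50):
    """Cyclic Jacobi algorithm for a symmetric matrix: a ~= q * diag(vals) * q^T."""
    n = len(a)
    a = [list(map(float, row)) for row in a]  # work on a copy
    q = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for r in range(p + 1, n):
                if a[p][r] == 0.0:
                    continue
                # Rotation chosen so that the (p, r) entry becomes zero.
                theta = (a[r][r] - a[p][p]) / (2.0 * a[p][r])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # a <- a * P (columns p and r)
                    akp, akr = a[k][p], a[k][r]
                    a[k][p], a[k][r] = c * akp - s * akr, s * akp + c * akr
                for k in range(n):  # a <- P^T * a (rows p and r)
                    apk, ark = a[p][k], a[r][k]
                    a[p][k], a[r][k] = c * apk - s * ark, s * apk + c * ark
                for k in range(n):  # q <- q * P (accumulate eigenvectors as columns)
                    qkp, qkr = q[k][p], q[k][r]
                    q[k][p], q[k][r] = c * qkp - s * qkr, s * qkp + c * qkr
    return [a[i][i] for i in range(n)], q


def decompose_covariance(cov):
    """Eigenvalues (descending), matching eigenvector columns, and eigenvalue-share weights."""
    vals, q = jacobi_eigen(cov)
    order = sorted(range(len(vals)), key=lambda i: vals[i], reverse=True)
    vals = [vals[i] for i in order]
    q = [[row[i] for i in order] for row in q]
    total = sum(vals)
    return vals, q, [v / total for v in vals]


def reconstruct(vals, q):
    """Rebuild Q * diag(vals) * Q^T to check the decomposition."""
    n = len(vals)
    return [[sum(q[i][k] * vals[k] * q[j][k] for k in range(n)) for j in range(n)] for i in range(n)]
```

For `[[2, 1], [1, 2]]` the eigenvalues are 3 and 1, giving weights 0.75 and 0.25. For real workloads, `numpy.linalg.eigh` performs the same symmetric decomposition far more efficiently.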

As an embodiment of the present invention, the constructing a transformation matrix of the feature data includes: calculating the characteristic value of each data in the characteristic data, sorting the characteristic values to obtain sorted characteristic values, counting the number of the characteristic values to obtain characteristic numbers, identifying the data dimension of the characteristic data, setting a retention coefficient of the sorted characteristic values according to the characteristic numbers and the data dimension, filtering the sorted characteristic values according to the retention coefficient to obtain a target characteristic value, carrying out vector conversion on the characteristic data corresponding to the target characteristic value to obtain a characteristic vector, and constructing a conversion matrix of the characteristic data according to the target characteristic value and the characteristic vector.

The feature value is a feature score value corresponding to each data in the feature data, the sorting feature value is obtained after sorting according to the numerical value of the feature value, the feature quantity is the total number of the feature values, the data dimension is a space dimension of the feature data, such as a two-dimensional space, a three-dimensional space and the like, the retention coefficient is a proportion of retention of the sorting feature value, the target feature value is obtained after filtering the sorting feature value according to the retention coefficient, and the feature vector is a vector expression form corresponding to the feature data.

Further, calculating the characteristic value of each data in the characteristic data can be achieved through a characteristic value calculator, sorting of the characteristic values can be achieved through a bubbling sorting algorithm, statistics of the number of the characteristic values can be achieved through a moving weighted average method, identification of the data dimension of the characteristic data can be achieved through key values of the characteristic data, filtering of the sorting characteristic values can be achieved through a bloom filter, vector conversion of the characteristic data corresponding to the target characteristic values can be achieved through a Word2vec algorithm, and construction of the conversion matrix of the characteristic data can be achieved through the matrix function.
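One plausible reading of the retention-coefficient steps is the PCA-style one, sketched below under that assumption: the feature values are eigenvalues, the feature vectors are the matching eigenvectors, and the retention coefficient is the cumulative share of the total that must be kept. The text instead derives the coefficient from the feature quantity and data dimension, so the fixed `retention` parameter and the function names here are hypothetical.

```python
def build_transformation_matrix(eigvals, eigvecs, retention=0.9):
    """Keep the fewest leading components whose cumulative share reaches `retention`.

    eigvecs[d][i] is component d of the i-th eigenvector (eigenvectors are columns).
    Returns the retained eigenvalues (descending) and the d x k matrix W.
    """
    if not 0.0 < retention <= 1.0:
        raise ValueError("retention must be in (0, 1]")
    order = sorted(range(len(eigvals)), key=lambda i: eigvals[i], reverse=True)
    total = sum(eigvals)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += eigvals[i] / total
        if cumulative >= retention - 1e-12:  # small tolerance for float round-off
            break
    w = [[row[i] for i in kept] for row in eigvecs]
    return [eigvals[i] for i in kept], w


def project(rows, w):
    """Project each data row onto the retained components: y = x . W."""
    k = len(w[0])
    return [[sum(x * w[d][c] for d, x in enumerate(row)) for c in range(k)] for row in rows]
```

With eigenvalues `[0.5, 3.0, 1.5]` and a retention of 0.9, the two largest components (0.6 + 0.3 of the total) are kept. Note that this W is d x k rather than square; keeping every column would give the square matrix described above.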

Through the sharpening projection, the method and the device project the feature data onto coordinates so that cluster analysis can conveniently be performed on the sharpened data. The data projection is the image obtained after the feature data are projected onto the corresponding coordinates. Further, the sharpening projection of the feature data can be implemented with a projection tool, such as a data projector.
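The matrix construction and projection steps above resemble a principal-component-style procedure. The sketch below assumes, as an interpretation not confirmed by the text, that the "characteristic values" are eigenvalues of the feature covariance matrix, that the retention coefficient is a fraction of the sorted eigenvalues to keep, and that the sharpening projection is a multiplication of the centered feature data by the retained eigenvectors. It uses NumPy's `eigh` in place of the bubble sort, Bloom filter and Word2vec tools named in the text, and the function names are hypothetical.

```python
import math
import numpy as np

def build_transformation_matrix(features: np.ndarray, retention: float = 0.5):
    """Hypothetical sketch: rows are samples, columns are feature dimensions.
    Returns the retained (target) eigenvalues and the matrix of their eigenvectors."""
    centered = features - features.mean(axis=0)
    # atleast_2d guards against np.cov squeezing a 1x1 result to a scalar.
    cov = np.atleast_2d(np.cov(centered, rowvar=False))
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]        # sort characteristic values, largest first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    count = eigvals.size                     # characteristic number
    dim = features.shape[1]                  # data dimension
    # Assumed rule: keep ceil(retention * count) values, at least 1, at most dim.
    keep = max(1, min(dim, math.ceil(retention * count)))
    return eigvals[:keep], eigvecs[:, :keep]

def project(features: np.ndarray, matrix: np.ndarray) -> np.ndarray:
    """Project the centered feature data onto the retained eigenvectors."""
    return (features - features.mean(axis=0)) @ matrix
```

For example, with three features whose spreads differ strongly, a retention coefficient of 0.3 keeps a single eigenvector aligned with the widest feature, so each sample is reduced to one projection coordinate.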

Cluster analysis is performed according to the data projection so that the sharpened data can be aggregated and classified, completing the classification of data of different dimensions. The clustering result is the result obtained after cluster analysis of the sharpened data.

As an embodiment of the present invention, performing cluster analysis on the feature data according to the sharpening projection to obtain the clustering result of the associated big data includes: obtaining the coordinates of each projection in the sharpening projection to obtain projection coordinates; calculating the projection similarity of each projection according to the projection coordinates; determining the potential association degree of the feature data according to the projection similarity; and performing cluster analysis on the feature data according to the potential association degree to obtain the clustering result of the associated big data.

The projection coordinates are the point coordinates of each projection in the sharpening projection; the projection similarity is the degree of similarity between projections and represents the correlation among them; and the potential association degree is the hidden association between items of feature data, which is not readily apparent from the surface of the data.
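The patent's own formula for projection similarity is not reproduced in this text, so the following is only an illustration under stated assumptions. It uses a Gaussian (RBF) kernel on the Euclidean distance between projection coordinates as a stand-in similarity measure, and defines a hypothetical "potential association degree" for each projection as its mean similarity to all other projections. The kernel width `sigma` and both function names are assumptions.

```python
import numpy as np

def projection_similarity(coords: np.ndarray, sigma: float = 1.0) -> np.ndarray:
    """Pairwise RBF similarity between projection coordinates (rows = projections).
    Stand-in measure: exp(-||p_i - p_j||^2 / (2 * sigma^2))."""
    diff = coords[:, None, :] - coords[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=-1)
    return np.exp(-sq_dist / (2.0 * sigma ** 2))

def association_degree(sim: np.ndarray) -> np.ndarray:
    """Hypothetical potential association degree: mean similarity of each
    projection to every other projection, excluding self-similarity."""
    n = sim.shape[0]
    return (sim.sum(axis=1) - np.diag(sim)) / (n - 1)
```

A projection that lies far from all others receives a low association degree, which is one way such a quantity could inform the subsequent clustering step.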

Further, as an optional embodiment of the present invention, the coordinates of each projection in the sharpening projection can be obtained with a coordinate identifier written in the C language, and the cluster analysis of the feature data can be implemented with the K-means algorithm.
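Since the text names K-means for the clustering step, a minimal Lloyd's-algorithm version over projection coordinates is sketched below. The number of clusters `k`, the random initialization from the data points and the iteration cap are assumptions for illustration; a production system would more likely use an existing library implementation.

```python
import numpy as np

def kmeans(points: np.ndarray, k: int, n_iter: int = 100, seed: int = 0):
    """Plain Lloyd's K-means on projection coordinates (rows = projections).
    Returns (labels, centroids)."""
    rng = np.random.default_rng(seed)
    # Initialize centroids with k distinct data points.
    centroids = points[rng.choice(len(points), size=k, replace=False)].copy()
    for _ in range(n_iter):
        # Assign each projection to its nearest centroid (squared Euclidean distance).
        dists = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
        labels = dists.argmin(axis=1)
        # Recompute centroids; keep the previous one if a cluster becomes empty.
        new_centroids = np.array([
            points[labels == c].mean(axis=0) if np.any(labels == c) else centroids[c]
            for c in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break  # converged: assignments no longer move the centroids
        centroids = new_centroids
    return labels, centroids
```

On two well-separated groups of projections, the function assigns each group its own label.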

Further, as an optional embodiment of the present invention, calculating the projection similarity of each projection according to the projection coordinates includes:

According to the method, by obtaining the associated big data to be analyzed and filtering it, unimportant or incomplete data in the associated big data can be removed, which facilitates subsequent processing of the data. Therefore, the associated big data cluster analysis method based on multidimensional intelligent acquisition can improve the accuracy of cluster analysis of associated big data obtained through multidimensional intelligent acquisition.
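The data filtering step is described only as removing unimportant or incomplete data. As a minimal sketch, assuming that "incomplete" means a record containing a missing value (`None` or NaN) and treating exact duplicate records as "unimportant" (an assumption not stated in the text), filtering could look like this. The function names are hypothetical.

```python
import math
from typing import Iterable, List, Optional, Sequence

def is_complete(record: Sequence[Optional[float]]) -> bool:
    """A record is complete if none of its fields is None or NaN."""
    return all(
        v is not None and not (isinstance(v, float) and math.isnan(v))
        for v in record
    )

def filter_data(records: Iterable[Sequence[Optional[float]]]) -> List[tuple]:
    """Drop incomplete records and exact duplicates, preserving input order."""
    seen = set()
    kept = []
    for rec in records:
        t = tuple(rec)
        if is_complete(t) and t not in seen:
            seen.add(t)
            kept.append(t)
    return kept
```

For instance, `filter_data([(1.0, 2.0), (None, 3.0), (1.0, 2.0), (4.0, 5.0)])` keeps only `(1.0, 2.0)` and `(4.0, 5.0)`.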

Fig. 2 is a functional block diagram of an associated big data cluster analysis device based on multidimensional intelligent acquisition according to an embodiment of the present invention.

The associated big data cluster analysis device 100 based on multidimensional intelligent acquisition can be installed in an electronic device. Depending on the functions implemented, the associated big data cluster analysis device 100 based on multidimensional intelligent acquisition may include an attribute analysis module 101, a matrix construction module 102, a feature extraction module 103 and a cluster analysis module 104. A module in the present invention, which may also be referred to as a unit, is a series of computer program segments that are stored in the memory of the electronic device, can be executed by the processor of the electronic device, and perform a fixed function.

In the present embodiment, the functions concerning the respective modules/units are as follows:

the attribute analysis module 101 is configured to obtain associated big data to be analyzed, perform data filtering on the associated big data to obtain target data, and perform attribute analysis on the target data to obtain a data attribute;

the matrix construction module 102 is configured to perform linear transformation on the data attributes to obtain attribute linear values, perform normal distribution processing on each attribute in the data attributes according to the attribute linear values to obtain an attribute normal distribution map, and calculate probability density of each graph in the attribute normal distribution map according to the following formula;

Wherein $F$ denotes the probability density of each graph in the attribute normal distribution map, $\beta$ denotes the attribute mean value of the data attributes, $\exp$ denotes the exponential function, $B_j$ denotes the random variable of the $j$-th attribute normal distribution map, and $C_j$ denotes the graph parameter corresponding to the $j$-th attribute normal distribution map;

the feature extraction module 103 is configured to determine the expected value corresponding to the data attributes according to the probability density, calculate the covariance between the attributes in the data attributes according to the expected value to obtain attribute covariances, construct a covariance matrix of the attribute covariances, and determine the covariance structure of the target data according to the covariance matrix;

the cluster analysis module 104 is configured to perform sharpening noise reduction processing on the target data according to the covariance structure to obtain noise reduction data, perform feature extraction on the noise reduction data to obtain feature data, construct a transformation matrix of the feature data, perform sharpening projection on the feature data in combination with the transformation matrix to obtain data projection, and perform cluster analysis on the feature data according to the data projection to obtain a clustering result of the associated big data.
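The probability density and covariance steps handled by modules 102 and 103 can be illustrated as follows. The density formula itself is not reproduced in this text; the sketch assumes, as an interpretation only, that it is the ordinary normal density with $\beta$ as the mean and $C_j$ as the standard deviation of the $j$-th attribute. For the covariance matrix it uses the standard identity Cov(X, Y) = E[XY] - E[X]E[Y], with expected values estimated as sample means (also an assumption). Function names are hypothetical.

```python
import math
import numpy as np

def normal_density(b_j: float, beta: float, c_j: float) -> float:
    """Normal density at b_j, assuming beta is the attribute mean and
    c_j the standard deviation of the j-th attribute (assumed reading)."""
    if c_j <= 0:
        raise ValueError("c_j must be positive")
    z = (b_j - beta) / c_j
    return math.exp(-0.5 * z * z) / (c_j * math.sqrt(2.0 * math.pi))

def covariance_matrix(data: np.ndarray) -> np.ndarray:
    """Population covariance matrix built from Cov(X, Y) = E[XY] - E[X]E[Y].
    Rows are samples, columns are attributes."""
    n = data.shape[0]
    means = data.mean(axis=0)              # E[X] for each attribute
    second_moments = (data.T @ data) / n   # E[XY] for every attribute pair
    return second_moments - np.outer(means, means)
```

The resulting matrix matches NumPy's population covariance (`np.cov(data, rowvar=False, bias=True)`). For large-magnitude data, the two-pass form (centering before multiplying) is numerically safer than the E[XY] - E[X]E[Y] form used here for clarity.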

In detail, each module of the associated big data cluster analysis device 100 based on multidimensional intelligent acquisition in the embodiment of the present application adopts the same technical means as the associated big data cluster analysis method based on multidimensional intelligent acquisition described with reference to Fig. 1 and produces the same technical effects, which are not repeated here.

Fig. 3 is a schematic structural diagram of an electronic device 1 for implementing a multidimensional intelligent acquisition-based associated big data cluster analysis method according to an embodiment of the present application.

The electronic device 1 may comprise a processor 10, a memory 11, a communication bus 12 and a communication interface 13, and may further comprise a computer program stored in the memory 11 and executable on the processor 10, such as an associated big data cluster analysis method program based on multidimensional intelligent acquisition.

In some embodiments, the processor 10 may consist of integrated circuits, for example a single packaged integrated circuit, or multiple packaged integrated circuits with the same or different functions, including one or more central processing units (Central Processing Unit, CPU), microprocessors, digital processing chips, graphics processors, combinations of various control chips, and so on. The processor 10 is the control unit (Control Unit) of the electronic device 1. It connects the components of the entire electronic device through various interfaces and lines, runs or executes the programs or modules stored in the memory 11 (for example, the associated big data cluster analysis method program based on multidimensional intelligent acquisition), and invokes data stored in the memory 11 to perform the various functions of the electronic device and process data.

The memory 11 includes at least one type of readable storage medium, including flash memory, a removable hard disk, a multimedia card, a card-type memory (e.g., SD or DX memory), a magnetic memory, a magnetic disk, an optical disk, and the like. In some embodiments, the memory 11 may be an internal storage unit of the electronic device, such as a removable hard disk of the electronic device. In other embodiments, the memory 11 may be an external storage device of the electronic device, such as a plug-in removable hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash memory card (Flash Card) provided on the electronic device. Further, the memory 11 may include both an internal storage unit and an external storage device of the electronic device. The memory 11 may be used not only to store application software installed in the electronic device and various data, such as the code of the associated big data cluster analysis method program based on multidimensional intelligent acquisition, but also to temporarily store data that has been output or is to be output.

The communication bus 12 may be a peripheral component interconnect standard (Peripheral Component Interconnect, PCI) bus, or an extended industry standard architecture (Extended Industry Standard Architecture, EISA) bus, among others. The bus may be classified as an address bus, a data bus, a control bus, etc. The bus is arranged to enable a connection communication between the memory 11 and at least one processor 10 etc.

The communication interface 13 is used for communication between the electronic device 1 and other devices and includes a network interface and a user interface. Optionally, the network interface may include a wired interface and/or a wireless interface (e.g., a Wi-Fi interface, a Bluetooth interface, etc.) and is typically used to establish a communication connection between the electronic device and other electronic devices. The user interface may be a display (Display) or an input unit such as a keyboard (Keyboard); optionally, it may also be a standard wired interface or a wireless interface. Optionally, in some embodiments, the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode) touch display, or the like. The display may also be referred to as a display screen or display unit, as appropriate, and is used to display information processed in the electronic device and to display a visual user interface.

Fig. 3 shows only an electronic device with some of its components. Those skilled in the art will understand that the structure shown in Fig. 3 does not limit the electronic device 1, which may include fewer or more components than shown, combine certain components, or use a different arrangement of components.

For example, although not shown, the electronic device 1 may further include a power supply (such as a battery) that powers each component. Preferably, the power supply may be logically connected to the at least one processor 10 through a power management device, so that charge management, discharge management, power consumption management and similar functions are implemented through the power management device. The power supply may also include any one or more of a direct-current or alternating-current power supply, a recharging device, a power failure detection circuit, a power converter or inverter, a power status indicator, and the like. The electronic device 1 may further include various sensors, Bluetooth modules, Wi-Fi modules, etc., which are not described here.

It should be understood that the embodiments described are for illustrative purposes only, and the scope of the patent application is not limited to this configuration.

The associated big data cluster analysis method program based on multidimensional intelligent acquisition stored in the memory 11 in the electronic device 1 is a combination of a plurality of instructions, which when run in the processor 10 can realize:

In particular, the specific implementation method of the above instructions by the processor 10 may refer to the description of the relevant steps in the corresponding embodiment of the drawings, which is not repeated herein.

Further, the modules/units integrated in the electronic device 1 may be stored in a computer readable storage medium if implemented in the form of software functional units and sold or used as separate products. The computer readable storage medium may be volatile or nonvolatile. For example, the computer readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a U disk, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a Read-only memory (ROM).

The present invention also provides a computer readable storage medium storing a computer program which, when executed by a processor of an electronic device, can implement:

In the several embodiments provided in the present invention, it should be understood that the disclosed apparatus, device and method may be implemented in other manners. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the modules is merely a logical function division, and there may be other manners of division when actually implemented.

The modules described as separate components may or may not be physically separate, and components shown as modules may or may not be physical units, may be located in one place, or may be distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

In addition, the functional modules in the embodiments of the present invention may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated units may be implemented in the form of hardware, or in the form of hardware plus software functional modules.

It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential characteristics thereof.

The present embodiments are, therefore, to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference signs in the claims shall not be construed as limiting the claim concerned.

The embodiments of the present application can acquire and process related data based on artificial intelligence technology. Artificial intelligence (Artificial Intelligence, AI) is the theory, method, technology and application system that uses a digital computer or a machine controlled by a digital computer to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge, and use knowledge to obtain optimal results.

Furthermore, it is evident that the word "comprising" does not exclude other elements or steps, and the singular does not exclude the plural. Multiple units or means recited in the system claims may also be implemented by a single unit or means through software or hardware. The terms first, second, etc. are used to denote names and do not denote any particular order.

Finally, it should be noted that the above-mentioned embodiments are merely for illustrating the technical solution of the present application and not for limiting the same, and although the present application has been described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications and equivalents may be made to the technical solution of the present application without departing from the spirit and scope of the technical solution of the present application.

Claims

1. The associated big data cluster analysis method based on multidimensional intelligent acquisition is characterized by comprising the following steps of:

2. The multi-dimensional intelligent acquisition-based associated big data cluster analysis method of claim 1, wherein the performing data filtering on the associated big data to obtain target data comprises:

3. The multi-dimensional intelligent acquisition-based associated big data cluster analysis method according to claim 1, wherein the performing attribute analysis on the target data to obtain data attributes comprises:

4. The multi-dimensional intelligent acquisition-based associated big data cluster analysis method according to claim 1, wherein calculating the covariance of each attribute in the data attributes according to the expected value to obtain an attribute covariance comprises:

$$\mathrm{Cov}(m, m+1) = E[m, m+1] - E[m]\,E[m+1]$$

5. The multi-dimensional intelligent acquisition-based associated big data cluster analysis method according to claim 1, wherein the sharpening noise reduction processing is performed on the target data according to the covariance structure to obtain noise reduction data, and the method comprises the following steps:

6. The method for clustering analysis of associated big data based on multidimensional intelligent acquisition according to claim 5, wherein the performing feature decomposition on the covariance matrix according to the covariance structure to obtain matrix features comprises:

7. The multi-dimensional intelligent acquisition-based associated big data cluster analysis method according to claim 1, wherein constructing the transformation matrix of the feature data comprises:

8. The multi-dimensional intelligent acquisition-based associated big data clustering analysis method according to claim 1, wherein the performing cluster analysis on the feature data according to the sharpening projection to obtain a clustering result of the associated big data comprises:

9. The multi-dimensional intelligent acquisition-based associated big data cluster analysis method according to claim 8, wherein calculating the projection similarity of each projection according to the projection coordinates comprises:

10. Associated big data cluster analysis device based on multidimensional intelligent acquisition, which is characterized in that the device comprises: