CN109583712B - Data index analysis method and device and storage medium - Google Patents

Info

Publication number
CN109583712B
Authority
CN
China
Prior art keywords
data index
information
data
index
industry
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811348360.5A
Other languages
Chinese (zh)
Other versions
CN109583712A (en
Inventor
乔磊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
MIGU Culture Technology Co Ltd
Original Assignee
MIGU Culture Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by MIGU Culture Technology Co Ltd filed Critical MIGU Culture Technology Co Ltd
Priority to CN201811348360.5A priority Critical patent/CN109583712B/en
Publication of CN109583712A publication Critical patent/CN109583712A/en
Application granted granted Critical
Publication of CN109583712B publication Critical patent/CN109583712B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/06Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063Operations research, analysis or management
    • G06Q10/0639Performance analysis of employees; Performance analysis of enterprise or organisation operations
    • G06Q10/06393Score-carding, benchmarking or key performance indicator [KPI] analysis

Landscapes

  • Business, Economics & Management (AREA)
  • Human Resources & Organizations (AREA)
  • Engineering & Computer Science (AREA)
  • Strategic Management (AREA)
  • Development Economics (AREA)
  • Economics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Educational Administration (AREA)
  • Operations Research (AREA)
  • Marketing (AREA)
  • Game Theory and Decision Science (AREA)
  • Quality & Reliability (AREA)
  • Tourism & Hospitality (AREA)
  • Physics & Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiment of the invention discloses a data index analysis method, which comprises the following steps: acquiring industry information; acquiring corresponding data index information according to the industry information and an index clustering model, wherein the index clustering model is obtained by training on historical data index information corresponding to historical industry information; and presenting the data index information. The embodiment of the invention also discloses a data index analysis device and a storage medium.

Description

Data index analysis method and device and storage medium
Technical Field
The invention relates to big data technology in the field of data analysis, in particular to a data index analysis method and device and a storage medium.
Background
As big data technologies mature and spread, data analysis is becoming increasingly valuable in guiding enterprise management, production, operation and the like. Business Intelligence (BI), an important data analysis solution, provides rich, convenient and flexible interactive analysis modes for self-service data analysis and helps users complete the corresponding data index analysis. Data indexes are the key to analyzing business data and the core of data modeling with BI tools in every industry.
Currently, existing BI tools offer only reference examples of some data indexes, so data index analysis has to rely on professionals. As a result, existing BI tools provide only a single function for data index analysis, and because the analysis depends on the experience and ability of those professionals, its accuracy cannot be guaranteed.
Disclosure of Invention
In order to solve the above technical problems, embodiments of the present invention aim to provide a data index analysis method and device, and a storage medium, which extend the data index analysis functions of a data index analysis device and ensure the accuracy of data index analysis.
The technical scheme of the invention is realized as follows:
In a first aspect, an embodiment of the present invention provides a data index analysis method, where the method includes:
acquiring industry information;
acquiring corresponding data index information according to the industry information and an index clustering model, wherein the index clustering model is acquired by training according to historical data index information corresponding to historical industry information;
and presenting the data index information.
In the above scheme, before obtaining the corresponding data index information according to the industry information and the index clustering model, the method further includes:
acquiring historical data index information corresponding to the historical industry information;
carrying out normalization processing on the historical data index information to obtain normalized data index information;
and training according to the normalized data index information and at least one preset training model to obtain the index clustering model.
In the foregoing solution, the normalizing the historical data index information to obtain normalized data index information includes:
carrying out hierarchical processing on the historical data index information to obtain hierarchical data index information;
and carrying out data processing on the hierarchical data index information to obtain the normalized data index information.
In the foregoing scheme, the performing data processing on the hierarchical data index information to obtain the normalized data index information includes:
carrying out numerical processing on the hierarchical data index information to obtain data index numerical information;
and carrying out normalization processing on the data index numerical value information to obtain normalized data index information.
In the above scheme, the training according to the normalized data index information and at least one preset training model to obtain the index clustering model includes:
constructing training parameter information based on the normalized data index information and the at least one preset training model;
training the normalized data index information according to the training parameter information to obtain a training result;
and determining the index clustering model according to the evaluation information corresponding to the training result.
In the above scheme, when the industry information is an industry identifier, the obtaining, according to the industry information and the index clustering model, corresponding data index information includes:
and matching to obtain the corresponding data index information from the index clustering model according to the industry identification.
In the above scheme, when the industry information is a data index, the obtaining of corresponding data index information according to the industry information and the index clustering model includes:
calculating the similarity between the data indexes and each cluster in the index cluster model;
and according to the similarity, using the cluster determined from the index cluster model as the data index information.
In the above aspect, the method further includes:
and when the latest historical industry information is detected, updating the index clustering model according to the data index information corresponding to the latest historical industry information.
In a second aspect, an embodiment of the present invention provides a data index analysis apparatus, where the apparatus includes: a processor, a display, a memory, and a communication bus through which the display and the memory communicate with the processor, the memory storing instructions executable by the processor, the instructions when executed, performing the method as described above by the processor.
In a third aspect, the present invention provides a computer-readable storage medium, on which instructions are stored, and when executed by a processor, the instructions implement the method as described above.
The embodiment of the invention provides a data index analysis method, a data index analysis device and a storage medium. First, industry information is obtained; then, corresponding data index information is obtained according to the industry information and an index clustering model, wherein the index clustering model is obtained by training on historical data index information corresponding to historical industry information; finally, the data index information is presented. With this technical scheme, the data index information used for data index analysis is obtained from the index clustering model based on the industry information that the user provides for analysis, and the index clustering model is trained by the data index analysis device on the historical data index information corresponding to historical industry information. The data index analysis device therefore performs data index analysis from the user-provided industry information and the historical data index information, which extends its data index analysis functions. Meanwhile, because the index clustering model is trained on historical data index information corresponding to historical industry information, the analysis is objective, which ensures the accuracy of data index analysis.
Drawings
Fig. 1 is a flowchart illustrating an implementation of a data index analysis method according to an embodiment of the present invention;
Fig. 2 is a schematic diagram of an exemplary data index analysis method according to an embodiment of the present invention;
Fig. 3 is a flowchart of another implementation of a data index analysis method according to an embodiment of the present invention;
Fig. 4 is a schematic diagram of exemplary hierarchical data index information according to an embodiment of the present invention;
Fig. 5 is a diagram illustrating an exemplary implementation of index clustering model training according to an embodiment of the present invention;
Fig. 6 is a first schematic structural diagram of a data index analysis apparatus according to an embodiment of the present invention;
Fig. 7 is a second schematic structural diagram of a data index analysis apparatus according to an embodiment of the present invention.
Detailed Description
The technical solution in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention.
Example one
An embodiment of the present invention provides a data index analysis method, and fig. 1 is a flowchart illustrating an implementation of the data index analysis method provided in the embodiment of the present invention, as shown in fig. 1, the data index analysis method includes:
S101, acquiring industry information.
In the embodiment of the present invention, the data index is the core of data analysis and is also the key for analyzing the business data, and when the data index analyzing device performs data analysis or business data analysis, information related to the industry field in which the data index analysis is to be performed needs to be determined, that is, industry information is obtained.
It should be noted that the industry information represents information related to an industry field, such as an industry identifier, an industry name, an industry field, and an industry index.
Here, the industry information may be obtained from information the user enters into the data index analysis device, from a selection instruction, or from received information sent by another device; this is not particularly limited in the embodiment of the present invention.
It can be understood that the industry information acquired by the data index analysis device provides conditions and basis for the subsequent processing of the data index analysis by the data index analysis device.
It should be noted that the data index analyzing apparatus is a tool capable of analyzing the data index, such as a BI tool.
S102, obtaining corresponding data index information according to the industry information and the index clustering model, wherein the index clustering model is obtained by training on historical data index information corresponding to historical industry information.
In the embodiment of the invention, a trained index clustering model is preset in the data index analysis device. The index clustering model outputs the corresponding data index information for the input industry information, which completes the work of sorting out (combing) the data indexes. Therefore, when the data index analysis device acquires the industry information, it inputs the industry information to the index clustering model and thereby obtains the data index information corresponding to the industry information.
Here, the data index information is information composed of a series of data indexes, that is, the data index information is a data index system. Specifically, the data index system is an organic whole consisting of a plurality of relatively independent and interconnected statistical indexes which reflect the overall quantity characteristics of the social and economic phenomena. In statistical studies, if a global picture is to be described, it is often not sufficient to use only one index, since it can only reflect quantitative features of one aspect of the population; at this time, a plurality of related indexes need to be used at the same time, and a unified whole formed by the plurality of related and mutually independent indexes forms a data index system, namely the data index information in the embodiment of the invention.
It should be noted that the index clustering model is obtained by training on historical data index information corresponding to historical industry information. The historical data index information corresponding to the historical industry information is data index information that exists before the data index analysis is carried out; that is, it is information that has already been successfully sorted out in each industry. Using the historical data index information as training samples, the index clustering model is preferably trained with a self-organizing map (SOM) neural network.
The SOM is one of the most attractive research areas in neural networks. By learning from input samples, it detects the regularities in those samples and the relationships among them, and it adaptively adjusts the network according to this information so that the network's later responses fit the input samples.
Further, when the industry information obtained by the data index analysis device is an industry identifier, obtaining the corresponding data index information according to the industry information and the index clustering model in S102 specifically includes: the data index analysis device obtains the corresponding data index information from the index clustering model by matching on the industry identifier.
It should be noted that the industry identifier is information that represents an industry field, such as the e-commerce industry, the catering industry, the green plant industry, and the like. The index clustering model stores the clustering training results of each industry. When an industry identifier is input into the index clustering model, the data index analysis device matches it against the industry identifiers of the stored clustering training results, and when a match succeeds, takes the clustering training result of the matched industry identifier as the data index information.
Illustratively, when the obtained industry information is the catering industry, the data index analysis device inputs the catering industry to the trained index clustering model, and obtains a clustering result matched with the catering industry from the trained clustering training result stored in the index clustering model, so that the data index information corresponding to the catering industry is obtained.
It can be understood that when a user needs to comb a data index of a certain industry, the accurate and comprehensive data index information can be obtained only by inputting the industry identification of the industry into the data index analysis device, so that the efficiency of obtaining the data index information is improved, and the cost of obtaining the data index information is reduced.
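The identifier-based matching described above can be pictured as a keyed lookup into the stored clustering results. The following is a minimal sketch under stated assumptions: the name `MODEL_LIBRARY`, the function `match_by_industry`, the identifiers, and the index names are all hypothetical and are not taken from the patent, which does not specify how the results are stored.

```python
# Hypothetical model library: industry identifier -> stored clustering result.
# All identifiers and index names here are invented for illustration only.
MODEL_LIBRARY = {
    "catering": {
        "cluster_0": ["table turnover rate", "average spend per customer"],
        "cluster_1": ["food cost ratio", "waste rate"],
    },
    "e-commerce": {
        "cluster_0": ["conversion rate", "cart abandonment rate"],
    },
}


def match_by_industry(industry_id, library=MODEL_LIBRARY):
    """Return the clustering result stored for an industry identifier, or None."""
    key = industry_id.strip().lower()  # tolerate case and surrounding whitespace
    return library.get(key)


result = match_by_industry("Catering")
```

An unmatched identifier returns `None` in this sketch; how the device should react in that case is not described in the source.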
In addition, further, when the industry information obtained by the data index analysis device is a data index, obtaining corresponding data index information according to the industry information and the index clustering model, specifically including: the data index analysis device calculates the similarity between the data index and each cluster in the index cluster model; and according to the similarity, using the cluster determined from the index cluster model as data index information.
It should be noted that the data index is information for characterizing a service index in the industry information, such as a certain service index a. Specifically, the number of the data indexes may be one or more, and this is not particularly limited in the embodiment of the present invention. The index clustering model stores clustering training results representing data indexes, and when the data indexes are input into the index clustering model, the data index analyzing device can automatically find out clustering index clusters similar to the data indexes according to the clustering training results stored in the index clustering model, so that a user is helped to obtain more comprehensive index analysis information.
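The similarity-based lookup can be sketched as follows. The source does not name a similarity measure, so cosine similarity against one centroid vector per cluster is an assumption made purely for illustration; the function names and the centroid values are likewise hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector is treated as similar to nothing
    return dot / (norm_a * norm_b)


def most_similar_cluster(index_vector, centroids):
    """Return (cluster_name, similarity) for the best-matching centroid."""
    scored = ((name, cosine_similarity(index_vector, c))
              for name, c in centroids.items())
    return max(scored, key=lambda pair: pair[1])


# Hypothetical centroids over normalized attribute vectors.
centroids = {
    "cluster_0": [0.9, 0.1, 0.0],
    "cluster_1": [0.1, 0.8, 0.6],
}
name, score = most_similar_cluster([0.2, 0.7, 0.5], centroids)
```

With several input data indexes, one option would be to score each one separately and present every cluster that wins at least once; the source leaves this choice open.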
S103, presenting the data index information.
In the embodiment of the invention, after obtaining the data index information (the analysis result) from the index clustering model according to the acquired industry information, the data index analysis device presents the data index information so that the user can see the analysis result.
Here, when the data index analyzing device presents the data index information, the data index information may be presented on an output interface, for example.
It should be noted that, when the data index information is obtained according to the industry identifier, because the data index information is a clustering result formed by a plurality of clustering clusters, when the data index information is presented, the information such as the plurality of clustering clusters and the corresponding relationship thereof is presented at the same time. For example, the association between the basic information and the overall index of each cluster is presented through an interactive two-dimensional row-column table for the user to refer to and select: when a certain clustering cell is clicked, an index list under the clustering cluster can be checked, and meanwhile, detailed information of a data index corresponding to each element can be checked by clicking each list element.
In addition, when the data index information is obtained according to the data index, because the data index information is a similar clustering index cluster to the data index, the determined similar clustering index cluster is presented when the data index information is presented. Similarly, for the similar cluster index cluster, the index list under the cluster can be viewed, and clicking each list element can view the detailed information of the data index corresponding to the element.
Further, in the embodiment of the present invention, after the data index analyzing apparatus presents the data index information in S103, the data index analyzing method further includes S104, where:
S104, when the latest historical industry information is detected, updating the index clustering model according to the data index information corresponding to the latest historical industry information.
In the embodiment of the invention, the index clustering model is continuously updated, and this continuous updating improves its generalization ability. Specifically, the data index analysis device detects in real time whether new historical industry information exists, and when it detects the latest historical industry information, it updates the index clustering model according to the corresponding data index information.
It should be noted that generalization ability refers to the adaptability of a machine learning algorithm to fresh samples. Generalization ability is acquired through continuous learning: the aim of learning is to capture the rule hidden behind the data, so that the trained network also gives appropriate output for data outside the training set that follows the same rule. A network trained on training samples is generally expected to have strong generalization ability, that is, the ability to respond reasonably to new input; however, more training iterations do not necessarily yield a more correct input/output mapping.
Exemplarily, fig. 2 is a schematic diagram of an exemplary data index analysis method provided in an embodiment of the present invention. As shown in fig. 2, the data index analysis device is a BI tool. It obtains the industry information input by a BI user, inputs the industry information into the index clustering model, and determines the corresponding data index information from the model library of the index clustering model for the user's reference and guidance. The data index information differs according to the specific content of the input industry information: it may be data index information composed of a plurality of clusters, any of which can be clicked to view more detailed information, or it may be a cluster similar to the input data index.
It can be understood that the data index analysis device can carry out normalization processing and warehousing on the data index information corresponding to the newly-entered historical industry information in real time or periodically, and update the index clustering model according to the data index information after the normalization processing; along with the increase of the amount of data index information corresponding to historical industry information, a data index library obtained by training is continuously enriched and improved, so that the generalization capability of the index clustering model is greatly improved.
Example two
An embodiment of the present invention provides another data index analysis method, and fig. 3 is a flowchart illustrating another data index analysis method according to an embodiment of the present invention, where as shown in fig. 3, the data index analysis method includes:
S201, acquiring historical data index information corresponding to historical industry information.
In the embodiment of the present invention, before the data index analysis device performs data index analysis, the index clustering model needs to be trained, and when the index clustering model is obtained, first, sample data for training the index clustering model needs to be obtained, that is, historical data index information corresponding to historical industry information is obtained.
It should be noted that, in the field of data index analysis, every industry already has data index information that has been successfully sorted out; this successfully sorted data index information is the historical data index information corresponding to the historical industry information.
S202, conducting normalization processing on the historical data index information to obtain normalized data index information.
In the embodiment of the present invention, after obtaining the historical data index information, the data index analysis device needs to normalize the historical data index information to normalized data index information that can be used for training the index clustering model before performing the training of the index clustering model according to the historical data index information.
Further, the data index analyzing device in S202 performs normalization processing on the historical data index information to obtain normalized data index information, which specifically includes S202a-S202b, where:
S202a, carrying out hierarchical processing on the historical data index information to obtain hierarchical data index information.
It should be noted that, with the rapid development of business intelligence and big data, data indexes are used in many fields, such as performance, monitoring, tracking, and the like. Faced with numerous and complicated data indexes, the data index analysis device can extract what the data indexes have in common to form hierarchical data index information that can be processed. This extraction of commonalities is the hierarchical processing; the historical data index information is high-dimensional spatial data, and the hierarchical data index information is low-dimensional spatial data.
Preferably, the data index analysis device uses the feature extraction capability of the SOM: after feature extraction, the high-dimensional spatial data is presented in a low-dimensional space.
Exemplarily, in the industry B, after hierarchical processing of the historical data index information, the topological sorting property of the SOM algorithm is used to map the high-dimensional spatial data onto the two-dimensional space, and the obtained hierarchical data index information is shown in fig. 4, where the historical data index information of the industry B is presented in the form of a two-dimensional table; the historical data index information is divided into six layers, namely a first data index, a second data index, a time dimension, an area dimension, a data index calculation mode and a display mode, wherein the corresponding attribute of each layer is shown in fig. 4.
It can be understood that the data index analysis device performs hierarchical processing on the historical data index information, so that the effect of data dimension reduction is achieved, the convenience of data processing is improved, and the data processing efficiency is improved.
S202b, carrying out data processing on the hierarchical data index information to obtain normalized data index information.
In the embodiment of the present invention, the low-dimensional hierarchical data index information obtained so far contains only attribute tags. The data index analysis device therefore needs to determine the data values of those attribute tags, that is, to perform data processing on the hierarchical data index information, in order to obtain the normalized data index information. Here, each data index in the hierarchical data index information is one attribute.
Specifically, performing data processing on the hierarchical data index information to obtain the normalized data index information includes: the data index analysis device carries out numerical processing on the hierarchical data index information to obtain data index numerical information, and carries out normalization processing on the data index numerical information to obtain the normalized data index information. In other words, after obtaining the hierarchical data index information, the data index analysis device performs attribute numericalization and data normalization on it.
It should be noted that attribute numericalization is the process of assigning values to attribute tags. Take noun attributes as an example: because every noun attribute is enumerable, a noun attribute C takes values [c1, c2, ..., cm]; mapping these values onto the interval [0,1] yields the corresponding data values [d1, d2, ..., dm], each of which lies between 0 and 1. As another example, referring to fig. 4, suppose the area dimension of a certain data index takes the values [China, Sichuan Province, Chengdu], each denoted by 1, and has no values for continent and district, each denoted by 0. Then, according to the definition of the area dimension in fig. 4, the values on [0,1] corresponding to the data index are [0,1,1,1,0] (from left to right, the values for the levels from continent to district), with every dimension level that has no value replaced by 0.
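The two numericalization examples above can be sketched in a few lines. The level names in `REGION_LEVELS` are inferred from the five-element example vector, and the even spacing used for enumerated values is an assumption: the source only requires that the mapped values fall within [0,1]. All function names are hypothetical.

```python
# Level order inferred from the example vector (continent through district).
REGION_LEVELS = ["continent", "country", "province", "city", "district"]


def encode_levels(present_levels, all_levels=REGION_LEVELS):
    """Return 1 for each dimension level that has a value, 0 otherwise."""
    present = set(present_levels)
    return [1 if level in present else 0 for level in all_levels]


def encode_noun_attribute(values):
    """Map enumerated values c1..cm to numbers d1..dm in [0, 1].

    Even spacing is an illustrative assumption, not the source's rule.
    """
    m = len(values)
    if m == 1:
        return {values[0]: 0.0}
    return {value: i / (m - 1) for i, value in enumerate(values)}


# A data index with values for country, province and city only.
region_vector = encode_levels(["country", "province", "city"])
# Hypothetical display-mode attribute with three enumerated values.
display_codes = encode_noun_attribute(["table", "bar chart", "line chart"])
```

Here `region_vector` reproduces the [0,1,1,1,0] pattern from the text, and each enumerated display mode receives a distinct value between 0 and 1.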
Here, when the value of a data index is a noun attribute, numericalization and normalization are accomplished together; when the value is a numerical attribute, the data index must be further normalized after numerical processing. Specifically, discrete numerical attributes are normalized using formula (1):
$$x' = \frac{x - \min(x)}{\max(x) - \min(x)} \qquad (1)$$
wherein, x' is the numerical value of the data index after normalization processing; x is the value of the current data index; min (x) is the minimum value in the value range of the data index; max (x) is the maximum value in the range of values of the data index.
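Assuming formula (1) is the usual min-max scaling that these symbol definitions imply, a minimal sketch looks like this. The function name and the handling of a constant attribute (where max(x) equals min(x)) are assumptions; the source does not address that case.

```python
def min_max_normalize(values):
    """Scale values to [0, 1] using x' = (x - min(x)) / (max(x) - min(x))."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: a constant attribute carries no information, so map it to 0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


normalized = min_max_normalize([10, 20, 40])
```

The smallest value maps to 0, the largest to 1, and every other value keeps its relative position within the range.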
In addition, for the continuous numerical value attribute, discretization is performed according to a preset interval division algorithm, and the discretized data are subjected to normalization processing by adopting a formula (1).
Preferably, the preset interval division algorithm can adopt an unsupervised discretization technology, and the interval division algorithm to be adopted is determined to be equal-width binning, equal-frequency binning, minimum entropy method and user by specifically combining with data characteristicsAnd algorithms such as a custom interval method and the like. Here, the equal-width binning algorithm is explained as an example: in the equal-width box-dividing algorithm, firstly, according to the specified interval number K and the value range [ X ] of the attributemin,Xmax]Is divided into K sections, and the width of each section is equal, namely the width is (X)max-Xmin) K, wherein XmaxIs the maximum value in the data, XminIs the minimum value in the data; secondly, all values in the bin are replaced by the median or average value of each interval, so that the discretization of the data is realized.
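The equal-width binning steps can be sketched as below. This version replaces each value with the mean of its bin (the text allows either the median or the mean); the function name and the choice to return the input unchanged when all values are equal are assumptions.

```python
def equal_width_bins(values, k):
    """Discretize values into k equal-width bins, replacing each by its bin mean."""
    x_min, x_max = min(values), max(values)
    width = (x_max - x_min) / k
    if width == 0:
        return list(values)  # all values identical: nothing to discretize
    bins = [[] for _ in range(k)]
    assignments = []
    for v in values:
        # The maximum value would fall just past the last bin, so clamp it.
        idx = min(int((v - x_min) / width), k - 1)
        bins[idx].append(v)
        assignments.append(idx)
    means = [sum(b) / len(b) if b else None for b in bins]
    return [means[i] for i in assignments]


smoothed = equal_width_bins([1.0, 2.0, 3.0, 7.0, 8.0, 10.0], k=3)
```

With K = 3 over the range [1, 10], each bin is 3 wide, so the first three values share one bin and the last three share another, while the middle bin stays empty. The discretized values can then be passed through min-max normalization.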
It should be noted that, at this point, the data normalization processing is complete, and the sample data for training the index clustering model, namely the normalized data index information, has been obtained.
S203, training according to the normalized data index information and at least one preset training model to obtain an index clustering model.
In the embodiment of the present invention, after obtaining training sample data, the data index analysis device selects at least one preset training model to train the input normalized data index information, so as to obtain an index clustering model.
It can be understood that, in the embodiment of the present invention, at least one preset training model is used to train the index clustering model. For example, if only the SOM is used, the SOM performs well on high-dimensional data and can carry out effective adaptive classification owing to its own characteristics, but it has no objective function that can be optimized. For this reason, at least one preset training model is selected, and several preset training models may be used together to obtain a better index clustering model; specifically, other related machine learning algorithms, such as a recommendation learning algorithm, can be combined to improve the overall training effect on the data.
In addition, when the SOM receives an external input pattern, it divides itself into different corresponding regions, each region responds to the input patterns with different characteristics, and this process is completed automatically. Therefore, the SOM can be used to perform unsupervised clustering on the input data, where "unsupervised" means that the number of clusters of the data samples is not specified in advance; instead, the samples cluster by themselves during the training of the neural network. In view of the successful application of the SOM to natural language and data semantic correlation, the embodiment of the present invention uses the SOM to organize data indexes automatically.
Further, in the embodiment of the present invention, the data index analyzing device in S203 performs training according to the normalized data index information and at least one preset training model to obtain an index clustering model, which specifically includes S203a-S203c, where:
S203a, constructing training parameter information based on the normalized data index information and at least one preset training model.
In the embodiment of the present invention, when the data index analyzing apparatus obtains the normalized data index information for training the index clustering model and determines at least one preset training model for training the index clustering model, first, training parameter information needs to be constructed based on the normalized data index information and the at least one preset training model.
It should be noted that the training parameter information refers to the parameters involved when the data index analysis device trains the index clustering model. For example: each piece of data in a sample is represented by an N-dimensional vector, where N is the number of attributes of the sample; the topological arrangement of the neurons is a rectangular structure; the number of neurons is L, with L1 neurons per row and L2 neurons per column (L = L1 × L2); the maximum number of training iterations is n (n >= 10000); the initial learning rate is a (a < 1); the weights are initialized as the center vector of the input samples plus a random decimal fraction (the center vector consists of the average of the input samples over each dimension); and so on.
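As an illustration, the parameters listed above could be collected in a small structure and used to initialize the weights. This is a sketch under stated assumptions: the class and field names are hypothetical, the default learning rate of 0.5 is only one value satisfying a < 1, and the size of the random perturbation (`noise`) is not given in the text.

```python
import random
from dataclasses import dataclass


@dataclass
class SomParams:
    n_attributes: int                    # N: dimension of each sample vector
    per_row: int                         # L1: neurons in each row
    per_column: int                      # L2: neurons in each column
    max_iterations: int = 10000          # n: maximum number of training iterations
    initial_learning_rate: float = 0.5   # a: must be below 1

    @property
    def n_neurons(self):
        """Total number of neurons, L = L1 * L2."""
        return self.per_row * self.per_column


def init_weights(samples, params, noise=0.01, seed=0):
    """Start every neuron at the sample center vector plus a small random offset."""
    rng = random.Random(seed)
    count = len(samples)
    center = [sum(s[d] for s in samples) / count for d in range(params.n_attributes)]
    return [
        [c + rng.uniform(-noise, noise) for c in center]
        for _ in range(params.n_neurons)
    ]


params = SomParams(n_attributes=2, per_row=3, per_column=2)
weights = init_weights([[0.0, 1.0], [1.0, 0.0]], params)
```

Starting all neurons near the center of the data, rather than at arbitrary random points, keeps the initial weights inside the region where samples actually occur.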
S203b, training the normalized data index information according to the training parameter information to obtain a training result.
In the embodiment of the present invention, after obtaining the training parameter information, the data index analysis device trains the normalized data index information according to the training parameter information to obtain the training result. Here, the training result is a model that has been trained but has not yet been determined whether the data index analysis can be accurately performed.
It should be noted that the specific training operation differs according to the training parameter information determined by the chosen preset training model. For example, when the SOM is selected, one input sample is randomly selected for SOM training at each iteration; the whole process is carried out in two stages, namely a self-organizing (ordering) stage covering the first n1 iterations and a convergence stage covering the last n2 iterations (n1 < n2, and n1 + n2 = n). After n iterations of network training, the SOM maps the pattern features of the index data input in the high-dimensional space onto a two-dimensional output plane in an orderly manner. As the training result, the L neurons arranged in the L1 × L2 rectangular structure finally form K valid clusters.
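The two-stage training loop can be sketched as follows. This is a generic self-organizing map written for illustration, not the patented implementation: the Gaussian neighborhood function, the linear decay schedules, and the default values of `n1`, `n2`, and `a` are all assumptions (the defaults are kept small so the example runs quickly, whereas the text suggests n >= 10000).

```python
import math
import random


def best_matching_unit(weights, x):
    """Index of the neuron whose weight vector is closest to x."""
    return min(
        range(len(weights)),
        key=lambda i: sum((w - v) ** 2 for w, v in zip(weights[i], x)),
    )


def train_som(samples, per_row, per_column, n1=200, n2=800, a=0.5, seed=0):
    """Train a rectangular SOM: n1 ordering iterations, then n2 convergence iterations."""
    rng = random.Random(seed)
    dim = len(samples[0])
    coords = [(r, c) for r in range(per_column) for c in range(per_row)]
    center = [sum(s[d] for s in samples) / len(samples) for d in range(dim)]
    weights = [[v + rng.uniform(-0.01, 0.01) for v in center] for _ in coords]
    sigma0 = max(per_row, per_column) / 2.0
    for t in range(n1 + n2):
        if t < n1:
            # Ordering stage: large learning rate and wide neighborhood, both shrinking.
            frac = t / n1
            lr = a * (1.0 - 0.9 * frac)
            sigma = sigma0 + (0.5 - sigma0) * frac
        else:
            # Convergence stage: small learning rate, narrow fixed neighborhood.
            lr = 0.1 * a * (1.0 - (t - n1) / n2)
            sigma = 0.5
        x = rng.choice(samples)  # one randomly selected input sample per iteration
        br, bc = coords[best_matching_unit(weights, x)]
        for w, (r, c) in zip(weights, coords):
            h = math.exp(-((r - br) ** 2 + (c - bc) ** 2) / (2.0 * sigma ** 2))
            for d in range(dim):
                w[d] += lr * h * (x[d] - w[d])
    return weights


samples = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [1.0, 1.0], [0.9, 1.0], [1.0, 0.9]]
weights = train_som(samples, per_row=2, per_column=2)
```

After training, samples that are close in the input space share a best-matching neuron or neighboring neurons, which is how the neurons on the output plane come to form clusters.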
S203c, determining an index clustering model according to the evaluation information corresponding to the training result.
In the embodiment of the present invention, after each training operation is completed, the data index analysis device makes the final determination of the index clustering model according to the evaluation information corresponding to the training result.
Illustratively, when the SOM is selected for training, the training effect of the SOM is evaluated by checking the inter-class gap and the intra-class cohesion of the clusters; training is repeated several times with modified network and training parameters, and the better training result is selected as the index clustering model.
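One possible way to turn these two checks into evaluation information is sketched below. The source does not define a specific metric, so the measures here are assumptions: cohesion is taken as the mean distance of members to their cluster centroid, the gap as the smallest distance between centroids, and `separation_score` is a hypothetical ratio of the two (higher is better).

```python
import math


def centroid(points):
    """Component-wise mean of a list of equal-length vectors."""
    return [sum(p[d] for p in points) / len(points) for d in range(len(points[0]))]


def intra_cluster_distance(cluster):
    """Mean distance from each member to the cluster centroid (lower means tighter)."""
    c = centroid(cluster)
    return sum(math.dist(p, c) for p in cluster) / len(cluster)


def inter_cluster_distance(clusters):
    """Smallest distance between any two cluster centroids (higher means better separated)."""
    cents = [centroid(c) for c in clusters]
    return min(
        math.dist(cents[i], cents[j])
        for i in range(len(cents))
        for j in range(i + 1, len(cents))
    )


def separation_score(clusters):
    """Hypothetical score: inter-cluster gap divided by the worst intra-cluster spread."""
    worst = max(intra_cluster_distance(c) for c in clusters)
    if worst == 0:
        return math.inf
    return inter_cluster_distance(clusters) / worst


clusters = [[[0.0, 0.0], [0.0, 2.0]], [[10.0, 0.0], [10.0, 2.0]]]
score = separation_score(clusters)
```

Several training runs with different parameters could then be compared by this score, keeping the run with the highest value.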
S204, acquiring industry information.
It should be noted that the description of the implementation process of S204 is consistent with the description of the implementation process of S101 in the first embodiment, and details of this implementation are not repeated in this embodiment of the present invention.
S205, obtaining corresponding data index information according to the industry information and the index clustering model, wherein the index clustering model is obtained by training according to historical data index information corresponding to historical industry information.
It should be noted that the description of the implementation process of S205 is consistent with the description of the implementation process of S102 in the first embodiment, and details of this implementation are not repeated in this embodiment of the present invention.
S206, presenting data index information.
It should be noted that the description of the implementation process of S206 is consistent with the description of the implementation process of S103 in the first embodiment, and details of this implementation are not repeated in this embodiment of the present invention.
S207, when the latest historical industry information is detected, updating the index clustering model according to the data index information corresponding to the latest historical industry information.
It should be noted that the description of the implementation process of S207 is consistent with the description of the implementation process of S104 in the first embodiment, and details of this implementation are not repeated in this embodiment of the present invention.
Exemplarily, fig. 5 is a schematic diagram of an exemplary implementation of index clustering model training provided by an embodiment of the present invention. As shown in fig. 5, the training process includes four parts, namely data extraction, data normalization, model training, and model warehousing; existing index data is automatically clustered using the SOM, so as to organize the data index system.
During data extraction: since the data index information corresponding to each industry is stored in the database, the historical data index information corresponding to the historical industry information is extracted from that data index information and then normalized.
During data normalization: first, the historical data index information is classified by industry, such as retail, internet, communication, and banking; second, the classified historical data index information is processed hierarchically to obtain hierarchical data index information; then, the hierarchical data index information is numeralized, for example by noun attribute numeralization and continuous numerical attribute discretization, to obtain data index numerical information; finally, the data index numerical information is normalized to obtain normalized data index information. In addition, while the normalized data index information is input into the SOM, it is also stored in the database so that it can be read directly in the next training.
During model training, the data index analysis device selects the SOM as the preset training model: first, training parameter information is determined, that is, the network structure, the weight vectors, and the like are initialized; second, a neighborhood function and a learning function are selected; third, network training is performed to obtain a training result; finally, the training result is verified and the index clustering model is generated. In this process, the SOM reaches a stable state through competition, cooperation, self-organization, convergence, and so on; that is, all historical data index information is automatically aggregated onto the corresponding output neurons according to its data characteristics, forming a number of clusters with large differences between clusters and high cohesion within each cluster.
During model warehousing: the generated index clustering model is stored in a model base, which is a database or a NoSQL database.
Using the unsupervised clustering property of the SOM, and on the basis of abundant industry cases and a large amount of historically entered data index information, the information processing device provided by this example automatically aggregates similar data indexes and divides the data indexes in the system library into different corresponding regions, each with its own characteristics, thereby providing users with a feasible reference scheme for data index design.
It can be understood that the data index analysis device can normalize and store, in real time or periodically, the data index information corresponding to newly entered historical industry information, and update the index clustering model according to the normalized data index information. As the amount of data index information corresponding to historical industry information grows, the data index library obtained by training is continuously enriched and improved, which improves the generalization capability of the index clustering model.
EXAMPLE III
Based on the same inventive concept as the first embodiment and the second embodiment, an embodiment of the present invention provides a data index analyzing apparatus 1 corresponding to the data index analysis method. Fig. 6 is a first schematic structural diagram of the data index analyzing apparatus provided in an embodiment of the present invention. As shown in fig. 6, the data index analyzing apparatus 1 includes:
a first obtaining unit 10, configured to obtain industry information;
a second obtaining unit 11, configured to obtain corresponding data index information according to the industry information and an index clustering model, where the index clustering model is obtained by training according to historical data index information corresponding to historical industry information;
and a presentation unit 12, configured to present the data index information.
Further, the data index analysis device further includes a training unit 13, where the training unit is configured to obtain historical data index information corresponding to the historical industry information; carrying out normalization processing on the historical data index information to obtain normalized data index information; and training according to the normalized data index information and at least one preset training model to obtain the index clustering model.
Further, the training unit 13 is specifically configured to perform hierarchical processing on the historical data index information to obtain hierarchical data index information; and performing data processing on the hierarchical data index information to obtain the normalized data index information.
Further, the training unit 13 is specifically configured to perform a numerical processing on the hierarchical data index information to obtain data index numerical information; and carrying out normalization processing on the data index numerical value information to obtain normalized data index information.
Further, the training unit 13 is specifically configured to construct training parameter information based on the normalized data index information and the at least one preset training model; training the normalized data index information according to the training parameter information to obtain a training result; and determining the index clustering model according to the evaluation information corresponding to the training result.
Further, the second obtaining unit 11 is specifically configured to, when the industry information is an industry identifier, obtain, according to the industry identifier, the corresponding data index information from the index clustering model by matching.
Further, the second obtaining unit 11 is specifically configured to, when the industry information is a data index, calculate the similarity between the data index and each cluster in the index clustering model; and, according to the similarity, determine a cluster from the index clustering model as the data index information.
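The similarity-based lookup can be sketched as follows. The source does not say which similarity measure is used or how a cluster is represented, so cosine similarity against one center vector per cluster is an assumption, and the cluster names and vectors are made up for the example.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def most_similar_cluster(index_vector, cluster_centers):
    """Return the name of the best-matching cluster and all similarity scores."""
    scores = {
        name: cosine_similarity(index_vector, center)
        for name, center in cluster_centers.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical cluster centers taken from a trained index clustering model.
centers = {"cluster_a": [1.0, 0.0, 0.0], "cluster_b": [0.0, 1.0, 1.0]}
best, scores = most_similar_cluster([0.1, 0.9, 0.8], centers)
```

The data indexes belonging to the selected cluster would then be returned to the user as the data index information.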
In practical applications, the first obtaining unit 10, the second obtaining unit 11, and the training unit 13 may be implemented by a processor 14 located on the data index analyzing apparatus 1, specifically by a CPU (Central Processing Unit), an MPU (Microprocessor Unit), a DSP (Digital Signal Processor), an FPGA (Field Programmable Gate Array), or the like; the presentation unit 12 may be implemented by a display 15 located on the data index analyzing apparatus 1.
An embodiment of the present invention further provides a data index analyzing apparatus 1. As shown in fig. 7, the data index analyzing apparatus 1 includes a processor 14, a display 15, a memory 16, and a communication bus 17, wherein the display 15 and the memory 16 communicate with the processor 14 through the communication bus 17, and the memory 16 stores instructions executable by the processor 14; when the instructions are executed, the processor 14 performs the data index analysis method according to the first and second embodiments.
An embodiment of the present invention provides a computer-readable storage medium, on which a program is stored, for use in a data index analysis apparatus 1, the program, when executed by a processor 14, implementing the data index analysis method according to the first and second embodiments.
It can be understood that the data index analysis device obtains, from the index clustering model, the data index information for data index analysis based on the industry information that the user provides for analysis, and the index clustering model is trained by the data index analysis device according to historical data index information corresponding to historical industry information. A scheme is thus realized in which the data index analysis device performs data index analysis according to the user-provided industry information and the historical data index information, which adds a data index analysis function to the device. Meanwhile, because the index clustering model is trained on historical data index information corresponding to historical industry information, the data index analysis is carried out objectively, which ensures its accuracy.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of a hardware embodiment, a software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The above description is only a preferred embodiment of the present invention, and is not intended to limit the scope of the present invention.

Claims (9)

1. A data index analysis method, characterized in that the method comprises:
acquiring historical data index information corresponding to historical industry information;
carrying out normalization processing on the historical data index information to obtain normalized data index information;
training according to the normalized data index information and at least one preset training model to obtain an index clustering model;
acquiring industry information;
acquiring corresponding data index information according to the industry information and an index clustering model, wherein the index clustering model is acquired by training according to historical data index information corresponding to historical industry information;
and presenting the data index information.
2. The method according to claim 1, wherein the normalizing the historical data index information to obtain normalized data index information includes:
carrying out hierarchical processing on the historical data index information to obtain hierarchical data index information;
and carrying out data processing on the hierarchical data index information to obtain the normalized data index information.
3. The method according to claim 2, wherein the performing data processing on the hierarchical data index information to obtain the normalized data index information comprises:
carrying out numerical processing on the hierarchical data index information to obtain data index numerical information;
and carrying out normalization processing on the data index numerical value information to obtain normalized data index information.
4. The method according to claim 1, wherein the training according to the normalized data index information and at least one preset training model to obtain the index clustering model comprises:
constructing training parameter information based on the normalized data index information and the at least one preset training model;
training the normalized data index information according to the training parameter information to obtain a training result;
and determining the index clustering model according to the evaluation information corresponding to the training result.
5. The method according to claim 1, wherein when the industry information is an industry identifier, the obtaining corresponding data index information according to the industry information and the index clustering model comprises:
and matching to obtain the corresponding data index information from the index clustering model according to the industry identification.
6. The method according to claim 1, wherein when the industry information is a data index, the obtaining corresponding data index information according to the industry information and an index clustering model comprises:
calculating the similarity between the data index and each cluster in the index clustering model;
and according to the similarity, using the cluster determined from the index clustering model as the data index information.
7. The method according to any one of claims 1 to 6, further comprising:
and when the latest historical industry information is detected, updating the index clustering model according to the data index information corresponding to the latest historical industry information.
8. A data index analyzing apparatus, characterized in that the apparatus comprises: a processor, a display, a memory, and a communication bus through which the display and the memory communicate with the processor, the memory storing instructions executable by the processor, the instructions when executed causing the processor to perform the method of any of claims 1 to 7.
9. A computer-readable storage medium having instructions stored thereon, wherein the instructions, when executed by a processor, implement the method of any of claims 1 to 7.
CN201811348360.5A 2018-11-13 2018-11-13 Data index analysis method and device and storage medium Active CN109583712B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811348360.5A CN109583712B (en) 2018-11-13 2018-11-13 Data index analysis method and device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811348360.5A CN109583712B (en) 2018-11-13 2018-11-13 Data index analysis method and device and storage medium

Publications (2)

Publication Number Publication Date
CN109583712A CN109583712A (en) 2019-04-05
CN109583712B true CN109583712B (en) 2021-06-29

Family

ID=65922356

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811348360.5A Active CN109583712B (en) 2018-11-13 2018-11-13 Data index analysis method and device and storage medium

Country Status (1)

Country Link
CN (1) CN109583712B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110245688B (en) * 2019-05-21 2024-05-28 中国平安财产保险股份有限公司 Data processing method and related device
CN114611850A (en) * 2020-12-03 2022-06-10 中国移动通信集团广东有限公司 Service analysis method and device and electronic equipment
CN115794043B (en) * 2023-01-31 2023-06-09 帆软软件有限公司帆软南京分公司 System and method for calculating table data aggregation processing of BI tool

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104142662A (en) * 2013-05-09 2014-11-12 洛克威尔自动控制技术股份有限公司 Industrial data analytics in a cloud platform
CN107341608A (en) * 2017-07-04 2017-11-10 广西电网有限责任公司电力科学研究院 One kind production basic data index analysis method
CN107527070A (en) * 2017-08-25 2017-12-29 江苏赛睿信息科技股份有限公司 Recognition methods, storage medium and the server of dimension data and achievement data
CN107886238A (en) * 2017-11-09 2018-04-06 金航数码科技有限责任公司 A kind of business process management system and method based on mass data analysis
CN108108887A (en) * 2017-12-18 2018-06-01 广东广业开元科技有限公司 A kind of Internet of Things based on multidimensional data is traveled out the intelligent evaluation model of row index
CN108304549A (en) * 2018-02-01 2018-07-20 广东聚晨知识产权代理有限公司 A kind of big data Intelligent processing system

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150302337A1 (en) * 2014-04-17 2015-10-22 International Business Machines Corporation Benchmarking accounts in application management service (ams)
CN104834984A (en) * 2015-02-11 2015-08-12 国家电网公司 Electric power transaction supervision risk early warning system based on unified and interconnected electric power market
CN105809289A (en) * 2016-03-11 2016-07-27 郑州师范学院 Electronic commerce industry prosperity extent index system and method based on big data
US11150878B2 (en) * 2017-01-31 2021-10-19 Raytheon Bbn Technologies Corp. Method and system for extracting concepts from research publications to identify necessary source code for implementation

Also Published As

Publication number Publication date
CN109583712A (en) 2019-04-05

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant