CN117077067A

CN117077067A - Information system automatic deployment planning method based on intelligent matching

Info

Publication number: CN117077067A
Application number: CN202311346131.0A
Authority: CN
Inventors: 王建宏
Original assignee: Beijing Yakang Wanwei Information Technology Co ltd
Current assignee: Beijing Yakang Wanwei Information Technology Co ltd
Priority date: 2023-10-18
Filing date: 2023-10-18
Publication date: 2023-11-17
Anticipated expiration: 2043-10-18
Also published as: CN117077067B

Abstract

The invention relates to the technical field of data management, and in particular to an information system automatic deployment planning method based on intelligent matching. The method comprises the following steps: constructing a plurality of isolation trees over all high-dimensional data by using the Isolation Forest anomaly detection algorithm; calculating, according to the isolation trees, the isolated feature of each high-dimensional data item in each dimension; obtaining a dimension sequence of each data item according to its isolated features; calculating the suitability of the leading dimensions in the dimension sequence of each data item as that item's feature dimensions; obtaining all feature dimensions of each data item according to the suitability; obtaining all data of interest in each dimension according to the feature dimensions of all data items; storing the data of interest in each dimension in a storage group and obtaining the threshold of each storage group; and matching the high-dimensional data to be searched according to the threshold of each storage group. The invention speeds up accurate matching of multi-dimensional data to the optimal result and improves matching efficiency.

Description

Information system automatic deployment planning method based on intelligent matching

Technical Field

The invention relates to the technical field of data management, in particular to an information system automatic deployment planning method based on intelligent matching.

Background

An information system based on intelligent matching uses artificial intelligence technology to understand user requirements and intelligently match them with information resources, providing a more personalized, accurate and efficient information acquisition experience.

Intelligent matching often requires finding the optimal result in multi-dimensional data. In the prior art, every dimension of every data item is usually matched, which requires many matching operations and makes matching inefficient.

Therefore, how to match the optimal result in multi-dimensional data quickly and accurately is a technical problem to be solved.

Disclosure of Invention

To solve the above problem, the invention provides an information system automatic deployment planning method based on intelligent matching, comprising the following steps:

acquiring all high-dimensional data that needs to be stored in the information system based on intelligent matching;

constructing a plurality of isolation trees over all high-dimensional data by using the Isolation Forest anomaly detection algorithm; calculating the isolated feature of each high-dimensional data item in each dimension according to the isolation trees; obtaining a dimension sequence of each data item according to its isolated features; and calculating the suitability of the leading dimensions in the dimension sequence of each data item as that item's feature dimensions;

obtaining all feature dimensions of each high-dimensional data item according to the suitability; obtaining all data of interest in each dimension according to the feature dimensions of all data items; and storing the data of interest in each dimension in a storage group and obtaining the threshold of each storage group;

and matching the high-dimensional data to be searched according to the threshold value of each storage group.

Further, calculating the isolated feature of each high-dimensional data item in each dimension specifically comprises:

calculating the isolated feature of each high-dimensional data item in each dimension according to its strong feature isolation trees and weak feature isolation trees in that dimension, using the following formula:

In the formula, the quantities are: the isolated feature of the ith high-dimensional data item in the jth dimension; the number of strong feature isolation trees of the ith data item in the jth dimension; N, the total number of isolation trees, so that N minus the number of strong feature isolation trees is the number of weak feature isolation trees of the ith data item in the jth dimension; the number of high-dimensional data points in the child node, produced by a split on the jth dimension, that contains the ith data item in each of its weak feature isolation trees in the jth dimension; and M, the total number of high-dimensional data items.

Further, the strong feature isolation trees and weak feature isolation trees of each high-dimensional data item in each dimension are obtained as follows:

in any isolation tree, if the ith high-dimensional data item is isolated (that is, it is the only data point) in a child node produced by a split on the jth dimension, the tree is taken as a strong feature isolation tree of the ith data item in the jth dimension; otherwise, the tree is taken as a weak feature isolation tree of the ith data item in the jth dimension.

Further, obtaining the dimension sequence of each high-dimensional data item according to the isolated features specifically comprises:

for any high-dimensional data item, sorting all dimensions in descending order of the item's isolated feature in each dimension to obtain the dimension sequence of that item.

Further, calculating the suitability of the leading dimensions in the dimension sequence of each high-dimensional data item as that item's feature dimensions specifically comprises:

In the formula, the quantities are: the suitability of the first r dimensions in the dimension sequence of the ith high-dimensional data item as feature dimensions of the ith data item; the isolated features of the 1st and the rth dimensions in the dimension sequence of the ith data item, respectively; T, the total number of dimensions; and exp(·), the exponential function with the natural constant e as its base.

Further, obtaining all feature dimensions of each high-dimensional data item specifically comprises:

for the ith high-dimensional data item, calculating, for each r (1 ≤ r ≤ T), the suitability of the first r dimensions of its dimension sequence as its feature dimensions; taking the r corresponding to the maximum suitability as the number of feature dimensions of the ith data item; taking the first r dimensions of its dimension sequence as its feature dimensions; and repeating this to obtain all feature dimensions of every high-dimensional data item.

Further, obtaining all data of interest in each dimension specifically comprises:

if the jth dimension is a feature dimension of the ith high-dimensional data item, taking the ith data item as data of interest in the jth dimension; this yields all data of interest in each dimension.

Further, storing the data of interest in each dimension in a storage group and obtaining the threshold of each storage group specifically comprises:

storing all data of interest in each dimension together as one storage group, and taking the range from the minimum to the maximum of their values in that dimension as the threshold of the storage group.

Further, matching the high-dimensional data to be searched according to the threshold of each storage group specifically comprises:

matching the high-dimensional data to be searched against each high-dimensional data item in its priority matching group; if no matching item exists, matching it in turn against each item in each of its groups to be searched until a matching high-dimensional data item is obtained.

Further, the priority matching group and the groups to be searched of the high-dimensional data to be searched are obtained as follows:

obtaining the high-dimensional data to be searched and checking, for each dimension, whether its value in that dimension falls within the threshold of the storage group corresponding to that dimension; taking each storage group whose threshold contains that value as a group to be searched of the high-dimensional data to be searched; and taking the intersection of all its groups to be searched as its priority matching group.

The technical solution of the invention has the following beneficial effects: the isolated feature of each high-dimensional data item in each dimension is calculated from the isolation trees; the feature dimensions of each data item are determined from the suitability of the leading dimensions in its dimension sequence as its feature dimensions; the data of interest in each dimension are obtained from these feature dimensions and stored in a storage group per dimension, and the threshold of each storage group is obtained. When high-dimensional data to be searched is matched, it is first matched against each data item in its priority matching group, which speeds up accurate matching to the optimal result and improves matching efficiency.

Drawings

To illustrate the embodiments of the invention or the technical solutions in the prior art more clearly, the drawings required for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the invention, and a person skilled in the art may derive other drawings from them without inventive effort.

FIG. 1 is a flow chart of a method for automatically deploying and planning an information system based on intelligent matching.

Detailed Description

To further describe the technical means adopted by the invention to achieve its intended purpose and their effects, the specific implementation, structure, features and effects of the information system automatic deployment planning method based on intelligent matching are described in detail below with reference to the accompanying drawings and preferred embodiments. In the following description, different occurrences of "one embodiment" or "another embodiment" do not necessarily refer to the same embodiment. Furthermore, particular features, structures or characteristics of one or more embodiments may be combined in any suitable manner.

Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.

The specific scheme of the information system automatic deployment planning method based on intelligent matching is described below.

Referring to FIG. 1, which shows a flow chart of the information system automatic deployment planning method based on intelligent matching according to an embodiment of the invention, the method comprises:

S001: acquire all high-dimensional data that needs to be stored in the information system based on intelligent matching.

Specifically, all high-dimensional data that needs to be stored in the information system based on intelligent matching is acquired, where each high-dimensional data item has several dimensions.

S002: construct a plurality of isolation trees over all high-dimensional data using the Isolation Forest anomaly detection algorithm; calculate the isolated feature of each high-dimensional data item in each dimension according to the isolation trees; obtain the dimension sequence of each data item according to its isolated features; calculate the suitability of the leading dimensions in each dimension sequence as the item's feature dimensions; obtain all feature dimensions of each data item according to the suitability; obtain all data of interest in each dimension according to the feature dimensions of all data items; and store the data of interest in each dimension in a storage group and obtain the threshold of each storage group.

It should be noted that, to match the optimal result in multi-dimensional data quickly and accurately, and because each high-dimensional data item differs from the other items in its own particular dimensions, the high-dimensional data needs to be stored in groups by dimension, so that the data stored in each group lies within a certain range in the dimension corresponding to that group. In this embodiment, each high-dimensional data item is stored under the dimensions that characterize it, and during matching, the search is performed in the storage groups of the dimensions corresponding to the high-dimensional data to be searched.

1. Construct a plurality of isolation trees over all high-dimensional data using the Isolation Forest anomaly detection algorithm.

Specifically, a plurality of isolation trees are constructed over all high-dimensional data using the Isolation Forest anomaly detection algorithm. Each isolation tree recursively divides the data into two child nodes using split values on different dimensions, until the tree reaches the maximum depth or the number of samples in a node is less than or equal to the minimum number of samples for splitting; this is repeated until the specified number of isolation trees has been constructed. The maximum depth, the minimum number of samples for splitting and the specified number of trees are parameters of the Isolation Forest anomaly detection algorithm and are set to their default values.
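As a rough illustration of this step, the plain-Python sketch below grows isolation trees. It is not the patent's implementation: the helper names `build_isolation_tree` and `build_forest`, the uniform random choice of split dimension and split value, and the returned "trace" (for each row, the sizes of the child nodes it entered through splits on each dimension) are assumptions chosen so that the later per-dimension quantities can be read off directly. Unlike common Isolation Forest libraries, every tree here sees all rows rather than a random subsample, following the description that each tree divides all the high-dimensional data.

```python
import math
import random
from collections import defaultdict


def build_isolation_tree(data, max_depth, min_split=1, rng=None):
    """Grow one isolation tree over all rows of `data` and return a per-row trace.

    trace[i][j] lists the sizes of the child nodes that row i entered through
    splits on dimension j, from the root downwards (hypothetical layout)."""
    rng = rng or random.Random()
    n_dims = len(data[0])
    trace = [defaultdict(list) for _ in data]

    def grow(indices, depth):
        # Stop at the maximum depth or when the node has too few samples to split.
        if depth >= max_depth or len(indices) <= min_split:
            return
        # Only dimensions whose values still differ can separate this node.
        spans = {}
        for j in range(n_dims):
            values = [data[k][j] for k in indices]
            if min(values) < max(values):
                spans[j] = (min(values), max(values))
        if not spans:
            return
        # Assumption: split dimension and split value are drawn uniformly at random.
        j = rng.choice(sorted(spans))
        cut = rng.uniform(*spans[j])
        left = [k for k in indices if data[k][j] < cut]
        right = [k for k in indices if data[k][j] >= cut]
        for child in (left, right):
            for k in child:
                trace[k][j].append(len(child))
            grow(child, depth + 1)

    grow(list(range(len(data))), 0)
    return trace


def build_forest(data, n_trees=100, max_depth=None, seed=0):
    """Build `n_trees` traces; the default depth limit is ceil(log2(n_rows))."""
    rng = random.Random(seed)
    if max_depth is None:
        max_depth = max(1, math.ceil(math.log2(len(data))))
    return [build_isolation_tree(data, max_depth, rng=rng) for _ in range(n_trees)]
```

The default depth limit mirrors the ceil(log2(sample size)) height limit of the original Isolation Forest; the other defaults are illustrative only. Each trace holds exactly what is needed to decide, per row and dimension, whether a tree is a strong or a weak feature isolation tree.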

It should be noted that the Isolation Forest anomaly detection algorithm is an unsupervised anomaly detection algorithm that identifies outliers in a data set by isolating data points with randomly constructed binary trees. It is prior art and is not described further here.

2. Calculate the isolated feature of each high-dimensional data item in each dimension according to the isolation trees.

It should be noted that the Isolation Forest anomaly detection algorithm builds each isolation tree by successively dividing all high-dimensional data with split values on different dimensions, so similar data items end up divided together in each dimension. The constructed isolation trees therefore reflect how distinctive each data item is in each dimension: the fewer other data items a data item is divided together with in a certain dimension, the more distinctive it is in that dimension, and the better that dimension represents it.

Specifically, in any isolation tree, if the ith high-dimensional data item is isolated (that is, it is the only data point) in a child node produced by a split on the jth dimension, the tree is taken as a strong feature isolation tree of the ith data item in the jth dimension; otherwise, the tree is taken as a weak feature isolation tree of the ith data item in the jth dimension.
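The classification can be sketched as below, assuming each tree is summarized by a hypothetical per-row trace: `forest[t][i]` maps a dimension to the sizes of the child nodes row i entered through splits on that dimension in tree t. The function name `split_strong_weak` is illustrative. The text does not say which child node counts for a weak tree when the row passes through several splits on the same dimension, or none at all; the sketch assumes the smallest such node, and all M rows when the tree never splits on that dimension.

```python
def split_strong_weak(forest, i, j, m_total):
    """Classify each tree as a strong or weak feature isolation tree of row i in dimension j.

    forest[t][i] maps a dimension to the child-node sizes that row i entered
    through splits on that dimension in tree t (hypothetical trace layout).
    Returns (number of strong trees, child-node sizes of row i in the weak trees)."""
    strong = 0
    weak_sizes = []
    for trace in forest:
        sizes = trace[i].get(j, [])
        if 1 in sizes:
            # Row i was alone in a child node produced by a split on dimension j.
            strong += 1
        else:
            # Assumption: use the smallest child node reached via dimension j; if the
            # tree never split on j, treat row i as sharing a node with all M rows.
            weak_sizes.append(min(sizes) if sizes else m_total)
    return strong, weak_sizes
```

The number of weak trees is simply `len(weak_sizes)`, which equals N minus the number of strong trees.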

Further, the isolated feature of each high-dimensional data item in each dimension is calculated according to its strong feature isolation trees and weak feature isolation trees in that dimension, using the following formula:

It should be noted that if the ith high-dimensional data item is isolated in a child node produced by a split on the jth dimension, it lies far from the other data items in the jth dimension; the more such isolation trees there are, the better the jth dimension represents the ith data item, and the larger its isolated feature in the jth dimension. If the ith data item is not isolated in a child node produced by a split on the jth dimension, then the fewer data points that child node contains, the farther the ith data item lies from the other data items in the jth dimension, the better the jth dimension represents it, and the larger its isolated feature in the jth dimension.
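The formula itself is not reproduced in this text, so the function below is only a hypothetical stand-in, not the patent's formula. It keeps the two trends just described: each strong feature isolation tree contributes 1, and each weak feature isolation tree contributes 1 - m/M, which grows as the number m of data points sharing the row's child node shrinks. The result is averaged over all N trees. The name `isolated_feature` is illustrative.

```python
def isolated_feature(strong, weak_sizes, m_total):
    """HYPOTHETICAL isolated feature of one row in one dimension.

    strong:     number of strong feature isolation trees of the row in the dimension.
    weak_sizes: child-node sizes of the row in each weak feature isolation tree.
    m_total:    M, the total number of high-dimensional data items.
    Not the patent's formula; it only reproduces the trends described in the text."""
    n_trees = strong + len(weak_sizes)  # N = strong trees + weak trees
    # A weak tree counts for less the more data points share the row's child node.
    weak_part = sum(1 - size / m_total for size in weak_sizes)
    return (strong + weak_part) / n_trees
```

Under this assumption the score lies in [0, 1] and equals 1 only when every tree isolates the row in that dimension.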

3. Obtain the dimension sequence of each high-dimensional data item according to the isolated features; calculate the suitability of the leading dimensions in each dimension sequence as the item's feature dimensions; and obtain all feature dimensions of each data item according to the suitability.

It should be noted that each high-dimensional data item may be distinctive in several dimensions, so its feature dimensions need to be determined from the distribution of its isolated features across all dimensions.

Specifically, for any high-dimensional data item, all dimensions are sorted in descending order of the item's isolated feature in each dimension to obtain its dimension sequence.
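A minimal sketch of this sorting step, assuming one item's isolated features are held in a list indexed by dimension (the name `dimension_sequence` is illustrative). The text does not specify tie-breaking; Python's `sorted` stays stable with `reverse=True`, so tied dimensions keep their original order.

```python
def dimension_sequence(features):
    """Return dimension indices sorted by descending isolated feature.

    features[j] is the item's isolated feature in dimension j.
    Ties keep the original dimension order because sorted() is stable."""
    return sorted(range(len(features)), key=lambda j: features[j], reverse=True)
```

For example, isolated features `[0.2, 0.9, 0.5, 0.9]` give the dimension sequence `[1, 3, 2, 0]`.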

Further, the suitability of the first r dimensions in the dimension sequence of each high-dimensional data item as that item's feature dimensions is calculated using the following formula:

It should be noted that the term built from the isolated features of the 1st and rth dimensions reflects how the isolated features of the first r dimensions in the dimension sequence of the ith high-dimensional data item are distributed: the smaller it is, the more concentrated those isolated features are, the more consistently all of the first r dimensions represent the characteristics of the ith data item, the better suited they are as its feature dimensions, and the greater their suitability. In addition, the larger the remaining term of the formula, the greater the suitability of the first r dimensions of the dimension sequence as feature dimensions of the ith data item.
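Because the suitability formula is not reproduced in this text, the function below is a hypothetical stand-in built only from the ingredients listed for it (the 1st and rth isolated features of the sequence, T, and the natural exponential). The assumed form exp(-(G_1 - G_r)) · r / T rises as the gap between the 1st and rth isolated features shrinks and as r/T grows; treating r/T as the "remaining term" is an assumption, not the patent's formula.

```python
import math


def suitability(sorted_features, r):
    """HYPOTHETICAL suitability of the first r dimensions as feature dimensions.

    sorted_features: one item's isolated features in dimension-sequence order
    (descending), so T = len(sorted_features) and 1 <= r <= T.
    Assumed form: exp(-(G_1 - G_r)) * r / T, not the patent's own formula."""
    t = len(sorted_features)
    # Spread between the 1st and rth isolated features of the sequence.
    gap = sorted_features[0] - sorted_features[r - 1]
    return math.exp(-gap) * r / t
```

With isolated features `[0.9, 0.85, 0.2, 0.1]`, this stand-in scores r = 2 highest, since the first two values are close while the third drops sharply.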

Further, for the ith high-dimensional data item, the suitability of the first r dimensions of its dimension sequence as its feature dimensions is calculated for each r (1 ≤ r ≤ T); the r corresponding to the maximum suitability is taken as the number of feature dimensions of the ith data item, and the first r dimensions of its dimension sequence are taken as its feature dimensions. In this way, all feature dimensions of every high-dimensional data item are obtained.
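The selection rule can be sketched independently of how the suitability is computed: given an item's dimension sequence and its suitability for every r from 1 to T, keep the first r dimensions for the best r. The name `feature_dimensions` is illustrative, and resolving ties toward the smallest r is an assumption, since the text does not address ties.

```python
def feature_dimensions(sequence, scores):
    """Pick the feature dimensions of one data item.

    sequence: the item's dimension sequence (descending isolated feature).
    scores:   scores[r - 1] is the suitability of the first r dimensions, r = 1..T.
    Returns the first r dimensions for the r with the largest suitability;
    on ties, max() keeps the first (smallest) r."""
    best_r = max(range(1, len(sequence) + 1), key=lambda r: scores[r - 1])
    return sequence[:best_r]
```

For example, a sequence `[2, 0, 3, 1]` with suitabilities `[0.25, 0.48, 0.37, 0.45]` yields the feature dimensions `[2, 0]`.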

4. Obtain all data of interest in each dimension according to the feature dimensions of all high-dimensional data items, store the data of interest in each dimension in a storage group, and obtain the threshold of each storage group.

Specifically, if the jth dimension is a feature dimension of the ith high-dimensional data item, the ith data item is taken as data of interest in the jth dimension; in this way, all data of interest in each dimension are obtained.

Further, the data of interest in each dimension are stored together as one storage group, and the range from the minimum to the maximum of their values in that dimension is taken as the threshold of the storage group.
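Grouping the data of interest and computing each group's threshold could look like the sketch below, assuming rows are equal-length lists of numbers and each row's feature dimensions have already been chosen. Storing row indices rather than copies of the rows, and the name `build_storage_groups`, are illustrative choices.

```python
from collections import defaultdict


def build_storage_groups(data, feature_dims):
    """Group the data of interest per dimension and compute each group's threshold.

    data:         list of rows (equal-length lists of numbers).
    feature_dims: feature_dims[i] lists the feature dimensions of row i.
    Returns (groups, thresholds): groups[j] holds the row indices of the data of
    interest in dimension j; thresholds[j] is the (min, max) of their values in j."""
    groups = defaultdict(list)
    for i, dims in enumerate(feature_dims):
        for j in dims:
            # Row i is data of interest in each of its feature dimensions.
            groups[j].append(i)
    thresholds = {
        j: (min(data[i][j] for i in members), max(data[i][j] for i in members))
        for j, members in groups.items()
    }
    return dict(groups), thresholds
```

A row with several feature dimensions appears in several storage groups, which is what later allows groups to be intersected.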

S003: match the high-dimensional data to be searched according to the threshold of each storage group.

Specifically, for the high-dimensional data to be searched, it is checked, for each dimension, whether its value in that dimension falls within the threshold of the storage group corresponding to that dimension. Each storage group whose threshold contains that value is taken as a group to be searched of the high-dimensional data to be searched, and the intersection of all its groups to be searched is taken as its priority matching group.
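A minimal sketch of this step, assuming storage groups and thresholds keyed by dimension as plain dicts (`groups[j]` holds row indices, `thresholds[j]` is an inclusive `(min, max)` pair). The name `search_groups` is illustrative, and returning an empty priority group when no threshold contains the query is an assumption: the text does not cover that case.

```python
def search_groups(query, groups, thresholds):
    """Return (groups to be searched, priority matching group) for a query row.

    A dimension's storage group is a group to be searched when the query's value
    in that dimension lies inside the group's inclusive [min, max] threshold.
    The priority matching group is the intersection of all such groups."""
    to_search = [j for j, (lo, hi) in sorted(thresholds.items())
                 if lo <= query[j] <= hi]
    if not to_search:
        # Assumption: no candidate groups means an empty priority matching group.
        return [], set()
    priority = set(groups[to_search[0]])
    for j in to_search[1:]:
        priority &= set(groups[j])
    return to_search, priority
```

The groups to be searched are returned as the dimension indices that identify them, in ascending order.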

Further, the high-dimensional data to be searched is matched against each high-dimensional data item in its priority matching group; if no matching item exists, it is matched in turn against each item in each of its groups to be searched until a matching high-dimensional data item is obtained.
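The search order can be sketched as follows. The text does not define when two data items "match", so the predicate `is_match` is a caller-supplied assumption, as is the name `match`. Skipping rows of the priority matching group during the fallback pass is a small optimization: those rows have already failed.

```python
def match(query, data, groups, to_search, priority, is_match):
    """Search the priority matching group first, then each group to be searched.

    groups[j] holds row indices; to_search lists the dimensions whose groups are
    to be searched; priority is the set of row indices in the priority group.
    is_match(query, row) is a caller-supplied predicate (an assumption here).
    Returns the index of the first matching row, or None."""
    for i in sorted(priority):
        if is_match(query, data[i]):
            return i
    for j in to_search:
        for i in groups[j]:
            # Rows in the priority group were already checked above.
            if i not in priority and is_match(query, data[i]):
                return i
    return None
```

For instance, with an exact-equality predicate, a query present only in a non-priority group is still found during the fallback pass.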

It should be noted that if the values of the high-dimensional data to be searched in several dimensions fall within the thresholds of the storage groups of those dimensions, the data to be searched is distinctive in those dimensions, and the data items in the intersection of the corresponding storage groups are also distinctive in the same dimensions. The data to be searched is therefore more likely to match a data item in that intersection. Matching it first against each data item in its priority matching group thus speeds up accurate matching to the optimal result and improves matching efficiency.

The above description is only of preferred embodiments of the invention and is not intended to limit the invention; any modification, equivalent substitution, improvement, etc. made within the principles of the invention shall fall within the scope of protection of the invention.

Claims

1. An information system automatic deployment planning method based on intelligent matching, characterized by comprising the following steps:

2. The method for automatically deploying and planning an information system based on intelligent matching according to claim 1, wherein the calculating the isolated feature of each high-dimensional data in each dimension comprises the following specific steps:

3. The method for automatically deploying and planning an information system based on intelligent matching according to claim 2, wherein the strong feature isolation trees and weak feature isolation trees of each high-dimensional data item in each dimension are obtained as follows:

4. The method for automatically deploying and planning an information system based on intelligent matching according to claim 1, wherein the step of obtaining the dimension sequence of each high-dimensional data according to the isolated features comprises the following specific steps:

5. The method for automatically deploying and planning an information system based on intelligent matching according to claim 1, wherein the calculating the suitability of a part of dimensions in the sequence of dimensions of each high-dimensional data as the characteristic dimensions of each high-dimensional data comprises the following specific steps:

6. The method for automatically deploying and planning an information system based on intelligent matching according to claim 1, wherein the steps of obtaining all feature dimensions of each high-dimensional data comprise the following specific steps:

7. The method for automatically deploying and planning an information system based on intelligent matching according to claim 1, wherein the steps of obtaining all the data of interest in each dimension comprise the following steps:

8. The method for automatically deploying and planning an information system based on intelligent matching according to claim 1, wherein the steps of storing all the data of interest in each dimension in each storage group to obtain the threshold value of each storage group comprise the following specific steps:

9. The method for automatically deploying and planning an information system based on intelligent matching according to claim 1, wherein the matching of the high-dimensional data to be searched according to the threshold value of each storage group comprises the following specific steps:

10. The method for automatically deploying and planning an information system based on intelligent matching according to claim 9, wherein the priority matching group and the groups to be searched of the high-dimensional data to be searched are obtained as follows: