CN114756557A - Data processing method of improved computer algorithm model

Info

Publication number: CN114756557A
Application number: CN202210671614.7A
Authority: CN
Inventors: 郑辉健
Original assignee: Guangzhou Chenan Network Technology Co ltd
Current assignee: Guangzhou Chenan Network Technology Co ltd
Priority date: 2022-06-15
Filing date: 2022-06-15
Publication date: 2022-07-15
Anticipated expiration: 2042-06-15
Also published as: CN114756557B

Abstract

The invention discloses a data processing method of an improved computer algorithm model. It relates to the technical field of data processing and addresses the low data processing efficiency of the prior art. In the adopted scheme, data information is received through an interface compatible with an M + Kmeans algorithm model and a particle swarm scheduling model, and different types of data information are obtained from a virtual resource database, a heterogeneous database, a resource database or a metadatabase, each of which is provided with a data search engine. The acquired data information is converted: data in different formats is standardized into a form that a machine can receive. The data received by the machine is then classified and coded through the M + Kmeans algorithm model, and the classified data information is stored, which greatly improves data processing capability.

Description

Data processing method of improved computer algorithm model

Technical Field

The invention relates to the technical field of data processing, in particular to a data processing method of an improved computer algorithm model.

Background

Data is a representation of facts, concepts or instructions that can be processed by manual or automated means. Data becomes information once it has been interpreted and given meaning. Data processing is the collection, storage, retrieval, processing, transformation and transmission of data. It is a basic link in systems engineering and automatic control, and it runs through every field of social production and social life. The development of data processing technology, and the breadth and depth of its application, have greatly influenced the progress of human society. The basic purpose of data processing is to extract and derive data that is valuable and meaningful to particular people from large amounts of data that may be disordered and hard to understand.

In one conventional approach, data information is collected and recorded by computer but processed by manual calculation, which has a high error rate and easily introduces errors into the data. In another conventional approach, data information is processed statistically; this is an advance over the manual method, but its calculation efficiency is low.

Disclosure of Invention

To address the above shortcomings, the invention discloses a data processing method of an improved computer algorithm model, which greatly improves data information processing capability and thereby raises the degree of automation in data information processing.

To achieve this technical effect, the invention adopts the following technical scheme:

A data processing method of an improved computer algorithm model, comprising:

step one, collecting data information: the data information is received through an interface compatible with an M + Kmeans algorithm model and a particle swarm scheduling model, and different types of data information are obtained from a virtual resource database, a heterogeneous database, a resource database or a metadatabase, each of which is provided with a data search engine;

step two, performing data conversion on the acquired data information: data information in different formats is standardized and thereby converted into a data form that a machine can receive;

step three, performing classification calculation on the data information received by the machine: data classification and coding are realized through the M + Kmeans algorithm model, and the classified data information is stored;

step four, performing information calculation and mining on the stored data information: mining and processing of the data information are realized through a particle swarm algorithm model so as to analyze and calculate the data information;

step five, storing the calculated data information for analysis and application by a user.
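Step two can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that the heterogeneous sources deliver JSON or CSV payloads and that "standardized processing" means z-score normalization into fixed-order numeric vectors; the field names `size` and `delay` and both helper functions are hypothetical and do not come from the method itself.

```python
import csv
import io
import json
import statistics


def parse_records(payload, fmt):
    """Parse a JSON or CSV payload into a list of dicts with float values."""
    if fmt == "json":
        rows = json.loads(payload)
    elif fmt == "csv":
        rows = list(csv.DictReader(io.StringIO(payload)))
    else:
        raise ValueError(f"unsupported format: {fmt}")
    return [{key: float(value) for key, value in row.items()} for row in rows]


def standardize(rows, fields):
    """Return z-score-normalized feature vectors in a fixed field order."""
    columns = {f: [r[f] for r in rows] for f in fields}
    means = {f: statistics.fmean(columns[f]) for f in fields}
    # pstdev is 0 for a constant column; fall back to 1 to avoid division by zero.
    stds = {f: statistics.pstdev(columns[f]) or 1.0 for f in fields}
    return [[(r[f] - means[f]) / stds[f] for f in fields] for r in rows]


# Two sources with different formats, merged into one machine-readable matrix.
json_rows = parse_records('[{"size": 10, "delay": 2}, {"size": 30, "delay": 4}]', "json")
csv_rows = parse_records("size,delay\n20,3\n", "csv")
vectors = standardize(json_rows + csv_rows, ["size", "delay"])
```

After this step every record has the same shape regardless of where it came from, which is what a distance-based classifier such as the one in step three needs.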

As a further technical scheme of the invention, data information acquisition is realized through an acquisition module. The acquisition module comprises at least one high-speed serial transceiver, at least one LA logic module, a channel maintenance module, an RXUSER interface and a TXUSER interface. The high-speed serial transceiver is connected with the LA logic module; the LA logic module is connected with the channel maintenance module, the RXUSER interface and the TXUSER interface respectively; the channel maintenance module is connected with external equipment through a control interface; the RXUSER interface is connected with the RXFIFO interface, and the TXUSER interface is connected with the TXFIFO interface. The high-speed serial transceivers and LA logic modules receive data information from an optical module through 4 high-speed serial transceivers and perform character encoding, decoding, calculation and processing.

As a further technical scheme of the invention, the method for classifying data information with the M + Kmeans algorithm comprises the following steps:

Step (31): start, input the data set and parameters, and initialize one cluster center; the information data set is denoted $D$ and the cluster center is denoted […];

Step (32): traverse the input data information, calculate the distance between each data item and the data center point, and sort by distance, the result being denoted […];

Step (33): determine the $K$ cluster centers, taking the data point $B$ whose distance from the center point is […] as the next data information center point;

Step (34): assign each data item to the class at the closest distance, update the cluster centers, output the clustering result, and select the point $C$ whose distance from the center point is […]; calculate the distance […] between point $C$ and point $B$, and if […], set point […] as a cluster center point;

Step (35): through $K$ iterations, find $K$ seed points as the initial $K$ centroid vectors $\{\mu_1, \mu_2, \ldots, \mu_K\}$;

Step (36): initialize the cluster partition […] to […] and calculate the distance between each sample $x_i$ and each centroid vector $\mu_j$, where […] prepares a converted data format for the initialized information and $K$ is the number of centroid vectors. The distance formula is:

$$d_{ij} = \lVert x_i - \mu_j \rVert_2^2 \qquad (1)$$

In formula (1), $x_i$ is the $i$-th input data sample (the $i$-th data node) and $\mu_j$ is the $j$-th centroid vector. Sample $x_i$ is marked with the category $\lambda_i$ corresponding to its smallest distance $d_{ij}$, where $d_{ij}$ indicates how far apart different data items are, and the corresponding cluster is then updated to include $x_i$. For every cluster, the centroid is recalculated from all sample points in it. If any centroid has changed, the method returns to step (33) to continue classification; once the centroids no longer change, the calculation stops and the data information values are output.
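The overall flow of steps (31) to (36) can be sketched in Python as follows. Because the selection thresholds of steps (33) and (34) are not recoverable here, the sketch substitutes a farthest-point seeding rule as a stand-in; that rule, the function names and the sample points are assumptions for illustration, not the exact M + Kmeans procedure.

```python
def dist2(a, b):
    """Squared Euclidean distance between two points, as in formula (1)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def seed_centers(data, k):
    """Pick k seeds: start from the first point, then repeatedly take the
    point farthest from its nearest already-chosen seed (assumed rule)."""
    centers = [tuple(data[0])]
    while len(centers) < k:
        farthest = max(data, key=lambda p: min(dist2(p, c) for c in centers))
        centers.append(tuple(farthest))
    return centers


def kmeans(data, k, max_iter=100):
    """Assign each sample to its nearest centroid, then recompute centroids,
    repeating until no centroid changes."""
    centers = seed_centers(data, k)
    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]
        for p in data:
            j = min(range(k), key=lambda c: dist2(p, centers[c]))
            clusters[j].append(p)
        new_centers = [
            tuple(sum(col) / len(pts) for col in zip(*pts)) if pts else centers[j]
            for j, pts in enumerate(clusters)
        ]
        if new_centers == centers:  # converged: centroids unchanged
            break
        centers = new_centers
    return centers, clusters


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, clusters = kmeans(points, 2)
```

Seeding from well-separated points rather than at random is one common way to make the final clustering less sensitive to the initial centers.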

As a further technical scheme of the invention, the method by which the particle swarm algorithm model schedules data information across different storage nodes comprises:

First, different data information parameters are set in the particle swarm algorithm model, including CPU utilization, memory utilization, number of data particles, data type, data size and system delay. The node resources of the data information are converted into particles in the particle swarm algorithm model, and the data particles are denoted […], whose three components are the CPU utilization of the $i$-th data node during data-particle scheduling at the first, second and third data scheduling, respectively. The data information scheduling index of node $i$ is […], whose three components are the node resource utilization indicated by the data index of the $i$-th data node at the first, second and third data scheduling, respectively.

Then the data storage utilization of the current data server node is calculated. When a data service is deployed on a node with low utilization, the node resource utilization in the cluster can be expressed as:

[…] (2)

Formula (2) gives the resource utilization of the data nodes. The data service delay is expressed as:

[…] (3)

In formula (3), $m$ is the number of data information items of the computing node; […] is the matrix in which the data information is stored in the server, with $i$ indexing its row vectors and $j$ its column vectors; […] is the delay matrix of the node data information, in which […] denotes the data centroid and […] denotes the node data information. When […] is 0, service node $j$ and service node $i$ do not perform data scheduling. The data service delay is obtained from formula (3), and the data information is deployed on the node with the smaller delay, which shortens service response time. The delay between the data information of the service node and a data source is expressed as:

[…] (4)

In formula (4), $x$ is the number of data sources and $k$ is the centroid of the deployed node data information; […] is the dependency matrix with the data sources, […] is the amount of data transmitted from a data source to the data node, and […] is the network delay matrix of the data sources.

Formula (4) gives the delay between the data node and the data source and expresses the relation between data service delay and the amount of data required from the data source. During scheduling, the velocity and position of each dimension of a particle are updated by:

$$v_{id}^{t+1} = \omega v_{id}^{t} + c_1 r_1 \left(p_{id} - x_{id}^{t}\right) + c_2 r_2 \left(p_{gd} - x_{id}^{t}\right), \qquad x_{id}^{t+1} = x_{id}^{t} + v_{id}^{t+1} \qquad (5)$$

In formula (5), $\omega$ is the particle inertia factor, which balances global and local search for the optimal solution; $c_1$ and $c_2$ are the acceleration constants (learning factors); $r_1$ and $r_2$ are random numbers of the particle swarm algorithm; $v_{id}^{t+1}$ is the updated particle velocity, where the subscript $d$ identifies the particle's position dimension; $x_{id}^{t+1}$ is the updated particle position and $x_{id}^{t}$ is the position at the previous moment; $p_{id}$ and $p_{gd}$ are the particle swarm extrema. The particle swarm scheduling model is updated through formula (5); a fitness function is defined according to actual requirements, and the individual optimal solution of each particle and the global optimal solution of the swarm are then calculated.
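A compact Python sketch of the standard particle swarm velocity and position update is given below for orientation. The parameter values (`w`, `c1`, `c2`), the search bounds, the choice to minimize, and the sphere function used as a stand-in fitness are all illustrative assumptions rather than values taken from the method.

```python
import random


def pso(fitness, dim, n_particles=20, iters=100, w=0.7, c1=1.5, c2=1.5,
        lo=-5.0, hi=5.0, seed=0):
    """Minimize `fitness` with a basic particle swarm (assumed settings)."""
    rng = random.Random(seed)
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                 # best position seen by each particle
    pbest_val = [fitness(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]  # best position seen by the swarm
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia term + pull toward personal best + pull toward global best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            val = fitness(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


# Stand-in fitness: the sphere function, whose minimum is at the origin.
best, best_val = pso(lambda x: sum(v * v for v in x), dim=2)
```

In a scheduling setting the fitness would instead score a candidate deployment, for example by combining node utilization and service delay.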

As a further technical scheme of the invention, data storage is implemented by an FPGA high-speed storage module.

The invention has the beneficial and positive effects that:

Unlike the conventional technology, the invention discloses a data processing method of an improved computer algorithm model with the following capabilities. Different types of data information are received through an interface compatible with an M + Kmeans algorithm model and a particle swarm scheduling model, and are obtained from a virtual resource database, a heterogeneous database, a resource database or a metadatabase, each of which is provided with a data search engine. The acquired data information can be converted: data in different formats is standardized into a form that a machine can receive. The data received by the machine can be classified and coded through the M + Kmeans algorithm model, and the classified data information is stored. The stored data information can be calculated and mined through a particle swarm algorithm model so as to analyze and calculate the data information. The calculated data information can be stored for analysis and application by a user.

Drawings

To illustrate the embodiments of the present invention or the technical solutions of the prior art more clearly, the drawings used in their description are briefly introduced below. The drawings described below show only some embodiments of the present invention; those skilled in the art can obtain other drawings from them without inventive labor. In the drawings:

FIG. 1 is a schematic flow diagram of the overall process of the present invention;

FIG. 2 is a schematic diagram of the collection module of the present invention;

FIG. 3 is a schematic view of the structure of an acquisition module according to the present invention;

FIG. 4 is a schematic flow chart of the M + Kmeans algorithm of the present invention;

FIG. 5 is a schematic flow chart of a particle swarm scheduling algorithm in the present invention;

FIG. 6 is a schematic diagram of a data storage module according to the present invention.

Detailed Description

The preferred embodiments of the present invention will be described below with reference to the accompanying drawings, and it should be understood that the embodiments described herein are merely for the purpose of illustrating and explaining the present invention and are not intended to limit the present invention.

As shown in FIG. 1, a data processing method of an improved computer algorithm model includes:

step one, collecting data information: the data information is received through an interface compatible with an M + Kmeans algorithm model and a particle swarm scheduling model, and different types of data information are obtained from a virtual resource database, a heterogeneous database, a resource database or a metadatabase, each of which is provided with a data search engine;

step two, performing data conversion on the acquired data information: data information in different formats is standardized and thereby converted into a data form that a machine can receive;

step three, performing classification calculation on the data information received by the machine: data classification and coding are realized through the M + Kmeans algorithm model, and the classified data information is stored;

step four, performing information calculation and mining on the stored data information: mining and processing of the data information are realized through a particle swarm algorithm model so as to analyze and calculate the data information;

step five, storing the calculated data information for analysis and application by a user.

In step one, data information acquisition is realized through an acquisition module. The acquisition module comprises at least one high-speed serial transceiver, at least one LA logic module, a channel maintenance module, an RXUSER interface and a TXUSER interface. The high-speed serial transceiver is connected with the LA logic module; the LA logic module is connected with the channel maintenance module, the RXUSER interface and the TXUSER interface respectively; the channel maintenance module is connected with external equipment through a control interface; the RXUSER interface is connected with the RXFIFO interface, and the TXUSER interface is connected with the TXFIFO interface. The high-speed serial transceivers and LA logic modules receive data information from the optical module through 4 high-speed serial transceivers and perform character encoding, decoding, calculation and processing.

As shown in FIG. 2 and FIG. 3, the acquisition module integrates the data into storage data blocks of a fixed format according to the storage parameters set by the system. After integration, the data is sent to a high-speed data processing module, which carries out the next stage of analysis according to the configured data frame format and other storage parameters. The LA logic module drives the serial transceivers to encode and decode characters. The channel maintenance module is responsible for binding and initializing the LA channels and for detecting logic errors in them, and the RX user module moves high-speed data to the data processing module. The communication interface is configured with […] and […]; the interrupt from the PL side to the PS side is configured as IRQ0, and the system clock of the PS is 33.33 MHz. The […] interface of the configuration module is connected with the […] interface of the AXIGP sub-module to exchange module configuration parameters.

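In step three, as shown in FIG. 4, the method for classifying data information with the M + Kmeans algorithm includes:

Step (31): start, input the data set and parameters, and initialize one cluster center; the information data set is denoted $D$ and the cluster center is denoted […];

Step (32): traverse the input data information, calculate the distance between each data item and the data center point, and sort by distance, the result being denoted […];

Step (33): determine the $K$ cluster centers, taking the data point $B$ whose distance from the center point is […] as the next data information center point;

Step (34): assign each data item to the class at the closest distance, update the cluster centers, output the clustering result, and select the point $C$ whose distance from the center point is […]; calculate the distance […] between point $C$ and point $B$, and if […], set point […] as a cluster center point;

Step (35): through $K$ iterations, find $K$ seed points as the initial $K$ centroid vectors $\{\mu_1, \mu_2, \ldots, \mu_K\}$;

Step (36): initialize the cluster partition […] to […] and calculate the distance between each sample $x_i$ and each centroid vector $\mu_j$, where […] prepares a converted data format for the initialized information and $K$ is the number of centroid vectors. The distance formula is:

$$d_{ij} = \lVert x_i - \mu_j \rVert_2^2 \qquad (1)$$

In formula (1), $x_i$ is the $i$-th input data sample (the $i$-th data node) and $\mu_j$ is the $j$-th centroid vector. Sample $x_i$ is marked with the category $\lambda_i$ corresponding to its smallest distance $d_{ij}$, where $d_{ij}$ indicates how far apart different data items are, and the corresponding cluster is then updated to include $x_i$. For every cluster, the centroid is recalculated from all sample points in it. If any centroid has changed, the method returns to step (33) to continue classification; once the centroids no longer change, the calculation stops and the data information values are output.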

In a specific embodiment, MapReduce parallelization of the data information is realized on a Hadoop layer. The MapReduce parallelization process computes over different types of data information in successive layers: a first layer of Distancemappers calculates the distance from the remaining data to a random center point, and a second layer of Maxmappers operates on point […] and all […] until each map task has found all possible cluster centers. In the Reducer stage, the cluster centers from all independent map tasks are merged: the shortest path connecting all center points is found, the center points on the shortest path edges are merged, and the shortest path edges are updated to new center points. This process iterates until the number of center points equals $K$. The iteration stage yields $K$ initial cluster centers, after which the Kmeans procedure is invoked to process the data until the algorithm converges. The improved M + Kmeans algorithm reduces the influence of the choice of cluster center points on the clustering result, so the result is more reliable and execution is more efficient. In a specific embodiment, a Bayesian classification algorithm model can also be added to classify the data information.
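The map and reduce stages described above can be imitated on a single machine with plain Python functions. This is only a simplified sketch under stated assumptions: each "mapper" proposes candidate centers by a farthest-point rule within its own partition, and the "reducer" approximates the shortest-path merging by repeatedly replacing the two closest centers with their midpoint. The function names, the `per_map` parameter and the sample partitions are hypothetical, and no Hadoop API is used.

```python
from itertools import combinations


def dist2(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def map_candidates(partition, per_map):
    """Mapper stand-in: farthest-point candidate centers within one partition."""
    centers = [tuple(partition[0])]
    while len(centers) < min(per_map, len(partition)):
        farthest = max(partition, key=lambda p: min(dist2(p, c) for c in centers))
        centers.append(tuple(farthest))
    return centers


def reduce_centers(candidates, k):
    """Reducer stand-in: merge the two closest centers into their midpoint
    until only k centers remain (a simplification of shortest-path merging)."""
    centers = [tuple(c) for c in candidates]
    while len(centers) > k:
        a, b = min(combinations(range(len(centers)), 2),
                   key=lambda ij: dist2(centers[ij[0]], centers[ij[1]]))
        merged = tuple((x + y) / 2 for x, y in zip(centers[a], centers[b]))
        centers = [c for i, c in enumerate(centers) if i not in (a, b)] + [merged]
    return centers


partitions = [[(0, 0), (1, 0), (10, 10)], [(0, 1), (11, 10), (10, 11)]]
candidates = [c for part in partitions for c in map_candidates(part, per_map=2)]
initial_centers = reduce_centers(candidates, k=2)
```

The resulting `initial_centers` would then be handed to an ordinary K-means loop as its starting centroids.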

In step four, as shown in FIG. 5, the method by which the particle swarm algorithm model schedules data information across different storage nodes comprises:

First, different data information parameters are set in the particle swarm algorithm model, including CPU utilization, memory utilization, number of data particles, data type, data size and system delay. The node resources of the data information are converted into particles in the particle swarm algorithm model, and the data particles are denoted […], whose three components are the CPU utilization of the $i$-th data node during data-particle scheduling at the first, second and third data scheduling, respectively. The data information scheduling index of node $i$ is […], whose three components are the node resource utilization indicated by the data index of the $i$-th data node at the first, second and third data scheduling, respectively.

Then the data storage utilization of the current data server node is calculated. When a data service is deployed on a node with low utilization, the node resource utilization in the cluster can be expressed as:

[…] (2)

Formula (2) gives the resource utilization of the data nodes. The data service delay of the system depends on factors such as data access frequency and data volume. The system delay consists of the communication delay between data services and the data transmission delay of the data sources; the delay between a data service and a data source is the product of the delay between the data source node and the service node and the data volume. When the service node is deployed at node $k$, the data service delay is expressed as:

[…] (3)

In formula (3), $m$ is the number of data information items of the computing node; […] is the matrix in which the data information is stored in the server, with $i$ indexing its row vectors and $j$ its column vectors; […] is the delay matrix of the node data information, in which […] denotes the data centroid and […] denotes the node data information. When […] is 0, service node $j$ and service node $i$ do not perform data scheduling. The data service delay is obtained from formula (3), and the data information is deployed on the node with the smaller delay, which shortens service response time. The delay between the data information of the service node and a data source is expressed as:

[…] (4)

In formula (4), $x$ is the number of data sources and $k$ is the centroid of the deployed node data information; […] is the dependency matrix with the data sources, […] is the amount of data transmitted from a data source to the data node, and […] is the network delay matrix of the data sources.

Formula (4) gives the delay between the data node and the data source and expresses the relation between data service delay and the amount of data required from the data source. During scheduling, the velocity and position of each dimension of a particle are updated by:

$$v_{id}^{t+1} = \omega v_{id}^{t} + c_1 r_1 \left(p_{id} - x_{id}^{t}\right) + c_2 r_2 \left(p_{gd} - x_{id}^{t}\right), \qquad x_{id}^{t+1} = x_{id}^{t} + v_{id}^{t+1} \qquad (5)$$

In formula (5), $\omega$ is the particle inertia factor, which balances global and local search for the optimal solution; $c_1$ and $c_2$ are the acceleration constants (learning factors); $r_1$ and $r_2$ are random numbers of the particle swarm algorithm; $v_{id}^{t+1}$ is the updated particle velocity, where the subscript $d$ identifies the particle's position dimension; $x_{id}^{t+1}$ is the updated particle position and $x_{id}^{t}$ is the position at the previous moment; $p_{id}$ and $p_{gd}$ are the particle swarm extrema.

In the dynamic scheduling policy for data services, each deployment scheme of the data service nodes is treated as a particle, so the set of deployment schemes […] forms the particle swarm. The optimal scheme is obtained through particle optimization, with resource utilization, balance and service delay calculated as the fitness function.
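A fitness function of the kind described, combining resource utilization, balance and service delay, might be sketched in Python as follows. The equal-weight sum, the use of the population standard deviation of node utilization as the balance measure, the `load` parameter and all sample numbers are assumptions made for illustration; only the rule that delay is the product of source-to-node delay and data volume is taken from the description.

```python
import statistics


def service_delay(node, source_delay, source_volume):
    """Sum over data sources of (network delay to `node`) x (data volume)."""
    return sum(source_delay[s][node] * source_volume[s]
               for s in range(len(source_volume)))


def fitness(node, utilization, load, source_delay, source_volume,
            weights=(1.0, 1.0, 1.0)):
    """Score deploying a service with `load` on `node`; lower is better."""
    w_util, w_bal, w_delay = weights
    after = list(utilization)
    after[node] += load                       # utilization after deployment
    imbalance = statistics.pstdev(after)      # spread of load across the cluster
    return (w_util * after[node]
            + w_bal * imbalance
            + w_delay * service_delay(node, source_delay, source_volume))


utilization = [0.9, 0.2, 0.5]                     # current utilization per node
source_delay = [[1.0, 2.0, 4.0], [3.0, 1.0, 2.0]]  # delay[source][node]
source_volume = [10.0, 5.0]                        # data volume per source
best_node = min(
    range(len(utilization)),
    key=lambda n: fitness(n, utilization, 0.1, source_delay, source_volume),
)
```

Here nodes 0 and 1 have the same service delay, so the lightly loaded node 1 wins on utilization and balance; in practice the weights would need tuning because the three terms are on different scales.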

In step five, data storage is implemented by an FPGA high-speed storage module.

In a specific embodiment, as shown in FIG. 2, SSDs with a PCIE interface are used as the storage devices, and a storage board card that supports loading multiple SSDs simultaneously is added. The high-speed storage module uses an XCZU11EG chip as its core hardware, supports the Verilog HDL hardware description language, and implements the high-speed storage logic on an architecture of programmable logic (PL) plus processor system (PS). The optical module in the storage module receives resource data from other equipment and exchanges information with the FPGA chip through the high-speed serial transceivers.

Although specific embodiments of the present invention have been described above, it will be understood by those skilled in the art that these specific embodiments are merely illustrative and that various omissions, substitutions and changes in the form of the detail of the methods and systems described above may be made by those skilled in the art without departing from the spirit and scope of the invention. For example, it is within the scope of the present invention to combine the steps of the above-described methods to perform substantially the same function in substantially the same way to achieve substantially the same result. Accordingly, the scope of the invention is to be limited only by the following claims.

Claims

1. A data processing method of an improved computer algorithm model, characterized by comprising the following steps:

step one, collecting data information: the data information is received through an interface compatible with an M + Kmeans algorithm model and a particle swarm scheduling model, and different types of data information are obtained from a virtual resource database, a heterogeneous database, a resource database or a metadatabase, each of which is provided with a data search engine;

step two, performing data conversion on the acquired data information: data information in different formats is standardized and thereby converted into a data form that a machine can receive;

step three, performing classification calculation on the data information received by the machine: data classification and coding are realized through the M + Kmeans algorithm model, and the classified data information is stored;

step four, performing information calculation and mining on the stored data information: mining and processing of the data information are realized through a particle swarm algorithm model so as to analyze and calculate the data information;

step five, storing the calculated data information for analysis and application by a user.

2. The data processing method of an improved computer algorithm model according to claim 1, characterized in that: data information acquisition is realized through an acquisition module; the acquisition module comprises at least one high-speed serial transceiver, at least one LA logic module, a channel maintenance module, an RXUSER interface and a TXUSER interface, wherein the high-speed serial transceiver is connected with the LA logic module, the LA logic module is connected with the channel maintenance module, the RXUSER interface and the TXUSER interface respectively, the channel maintenance module is connected with external equipment through a control interface, the RXUSER interface is connected with the RXFIFO interface, and the TXUSER interface is connected with the TXFIFO interface; and the high-speed serial transceivers and LA logic modules receive data information from the optical module through 4 high-speed serial transceivers and perform character encoding, decoding, calculation and processing.

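3. The data processing method of an improved computer algorithm model according to claim 1, characterized in that: the method for classifying data information with the M + Kmeans algorithm comprises the following steps:

Step (31): start, input the data set and parameters, and initialize one cluster center; the information data set is denoted $D$ and the cluster center is denoted […];

Step (32): traverse the input data information, calculate the distance between each data item and the data center point, and sort by distance, the result being denoted […];

Step (33): determine the $K$ cluster centers, taking the data point $B$ whose distance from the center point is […] as the next data information center point;

Step (34): assign each data item to the class at the closest distance, update the cluster centers, output the clustering result, and select the point $C$ whose distance from the center point is […]; calculate the distance […] between point $C$ and point $B$, and if […], set point […] as a cluster center point;

Step (35): through $K$ iterations, find $K$ seed points as the initial $K$ centroid vectors $\{\mu_1, \mu_2, \ldots, \mu_K\}$;

Step (36): initialize the cluster partition […] to […] and calculate the distance between each sample $x_i$ and each centroid vector $\mu_j$, where […] prepares a converted data format for the initialized information and $K$ is the number of centroid vectors. The distance formula is:

$$d_{ij} = \lVert x_i - \mu_j \rVert_2^2 \qquad (1)$$

In formula (1), $x_i$ is the $i$-th input data sample (the $i$-th data node) and $\mu_j$ is the $j$-th centroid vector. Sample $x_i$ is marked with the category $\lambda_i$ corresponding to its smallest distance $d_{ij}$, where $d_{ij}$ indicates how far apart different data items are, and the corresponding cluster is then updated to include $x_i$. For every cluster, the centroid is recalculated from all sample points in it. If any centroid has changed, the method returns to step (33) to continue classification; once the centroids no longer change, the calculation stops and the data information values are output.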

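4. The data processing method of an improved computer algorithm model according to claim 1, characterized in that: the method by which the particle swarm algorithm model schedules data information across different storage nodes comprises:

First, different data information parameters are set in the particle swarm algorithm model, including CPU utilization, memory utilization, number of data particles, data type, data size and system delay. The node resources of the data information are converted into particles in the particle swarm algorithm model, and the data particles are denoted […], whose three components are the CPU utilization of the $i$-th data node during data-particle scheduling at the first, second and third data scheduling, respectively. The data information scheduling index of node $i$ is […], whose three components are the node resource utilization indicated by the data index of the $i$-th data node at the first, second and third data scheduling, respectively.

Then the data storage utilization of the current data server node is calculated. When a data service is deployed on a node with low utilization, the node resource utilization in the cluster can be expressed as:

[…] (2)

Formula (2) gives the resource utilization of the data nodes. The data service delay is expressed as:

[…] (3)

In formula (3), $m$ is the number of data information items of the computing node; […] is the matrix in which the data information is stored in the server, with $i$ indexing its row vectors and $j$ its column vectors; […] is the delay matrix of the node data information, in which […] denotes the data centroid and […] denotes the node data information. When […] is 0, service node $j$ and service node $i$ do not perform data scheduling. The data service delay is obtained from formula (3), and the data information is deployed on the node with the smaller delay, which shortens service response time. The delay between the data information of the service node and a data source is expressed as:

[…] (4)

In formula (4), $x$ is the number of data sources and $k$ is the centroid of the deployed node data information; […] is the dependency matrix with the data sources, […] is the amount of data transmitted from a data source to the data node, and […] is the network delay matrix of the data sources.

Formula (4) gives the delay between the data node and the data source and expresses the relation between data service delay and the amount of data required from the data source. During scheduling, the velocity and position of each dimension of a particle are updated by:

$$v_{id}^{t+1} = \omega v_{id}^{t} + c_1 r_1 \left(p_{id} - x_{id}^{t}\right) + c_2 r_2 \left(p_{gd} - x_{id}^{t}\right), \qquad x_{id}^{t+1} = x_{id}^{t} + v_{id}^{t+1} \qquad (5)$$

In formula (5), $\omega$ is the particle inertia factor, which balances global and local search for the optimal solution; $c_1$ and $c_2$ are the acceleration constants (learning factors); $r_1$ and $r_2$ are random numbers of the particle swarm algorithm; $v_{id}^{t+1}$ is the updated particle velocity, where the subscript $d$ identifies the particle's position dimension; $x_{id}^{t+1}$ is the updated particle position and $x_{id}^{t}$ is the position at the previous moment; $p_{id}$ and $p_{gd}$ are the particle swarm extrema. The particle swarm scheduling model is updated through formula (5); a fitness function is defined according to actual requirements, and the individual optimal solution of each particle and the global optimal solution of the swarm are then calculated.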

5. The data processing method of an improved computer algorithm model according to claim 1, characterized in that: data storage is implemented by an FPGA high-speed storage module.