CN114756557A - Data processing method of improved computer algorithm model - Google Patents


Info

Publication number
CN114756557A
Authority
CN
China
Prior art keywords
data
data information
node
information
scheduling
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210671614.7A
Other languages
Chinese (zh)
Other versions
CN114756557B (en)
Inventor
郑辉健
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Chenan Network Technology Co ltd
Original Assignee
Guangzhou Chenan Network Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Chenan Network Technology Co ltd
Priority to CN202210671614.7A
Publication of CN114756557A
Application granted
Publication of CN114756557B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2291 User-Defined Types; Storage management thereof
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2465 Query processing support for facilitating data mining operations in structured databases
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G06F16/258 Data format conversion from or to a database
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/004 Artificial life, i.e. computing arrangements simulating life
    • G06N3/006 Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]

Abstract

The invention discloses a data processing method of an improved computer algorithm model. It relates to the technical field of data processing and addresses the technical problem that data processing efficiency in the prior art is low. In the adopted scheme, different types of data information are received through an interface compatible with an M + Kmeans algorithm model and a particle swarm scheduling model, and are obtained from a virtual resource database, a heterogeneous database, a resource database or a metadata database, each of which is provided with a data search engine. The acquired data information is converted: data information in different formats is standardized and thereby converted into a data form that can be received by a machine. The data information received by the machine is then classified and calculated, with data classification and coding realized through the M + Kmeans algorithm model, and the classified data information is stored, thereby greatly improving data processing capability.

Description

Data processing method of improved computer algorithm model
Technical Field
The invention relates to the technical field of data processing, in particular to a data processing method of an improved computer algorithm model.
Background
Data is a representation of facts, concepts or instructions that can be processed by manual or automated means. Data becomes information once it has been interpreted and given meaning. Data processing is the collection, storage, retrieval, processing, transformation and transmission of data. It is a basic link in system engineering and automatic control, and it runs through every field of social production and social life. The development of data processing technology, and the breadth and depth of its application, have greatly influenced the progress of human society. The basic purpose of data processing is to extract and derive data that is valuable and meaningful to particular people from large amounts of data that may be disordered and hard to understand.
In one conventional technology, data information is collected and recorded by computer but processed by manual calculation, which has a high error rate and easily introduces errors into the data information. In another conventional technology, data information is processed statistically; this is an advance over the manual method, but its calculation efficiency is low.
Disclosure of Invention
To address the shortcomings of the above technology, the invention discloses a data processing method of an improved computer algorithm model. Through the improved computer algorithm model, the method can greatly improve data information processing capability and thereby raise the degree of automation of data information processing.
To achieve this technical effect, the invention adopts the following technical scheme:
A data processing method of an improved computer algorithm model, comprising:
Step one, collecting data information: the data information is received through an interface to the M + Kmeans algorithm model and the particle swarm scheduling model, and different types of data information are obtained from a virtual resource database, a heterogeneous database, a resource database or a metadata database, each of which is provided with a data search engine;
Step two, performing data conversion on the acquired data information, and converting data information in different formats, through standardized processing, into a data form that can be received by a machine;
Step three, performing classification calculation on the data information received by the machine, realizing data classification and coding through the M + Kmeans algorithm model, and storing the classified data information;
Step four, performing information calculation and mining on the stored data information, the mining and processing of the data information being realized through a particle swarm algorithm model so as to analyze and calculate the data information;
Step five, storing the calculated data information for analysis and application by a user.
As a further technical scheme of the invention, data information acquisition is realized through an acquisition module. The acquisition module comprises at least one high-speed serial transceiver, at least one LA logic module, a channel maintenance module, an RXUSER interface and a TXUSER interface. The high-speed serial transceiver is connected with the LA logic module; the LA logic module is connected with the channel maintenance module, the RXUSER interface and the TXUSER interface respectively; the channel maintenance module is connected with external equipment through a control interface; the RXUSER interface is connected with the RXFIFO interface, and the TXUSER interface is connected with the TXFIFO interface. The at least one high-speed serial transceiver and the at least one LA logic module receive the data information in an optical module through 4 high-speed serial transceivers, and encode, decode, calculate and process characters.
As a further technical scheme of the invention, the method for realizing data information classification by using the M + Kmeans algorithm comprises the following steps:
Step (31), start: input the data set and parameters and initialize one clustering center, where the information data set is denoted D and the clustering center is denoted D [formula image 001];
Step (32), traverse the input data information, calculate the distance between the data information and the data center point, and sort by distance; the result is recorded as [formula image 002];
Step (33), determine K clustering centers: select the data point B whose distance from the central point is [formula image 003] as the central point of the next data information;
Step (34), assign to the class at the closest distance, update the clustering center and output the clustering result; select the distance between the clustering center and the central point as [formula image 004], and calculate the distance [formula image 005] between point C and point B; if [formula image 006], set point [formula image 007] as the clustering center point;
Step (35), through [formula image 008] iterations, find [formula image 009] seed points to serve as the initial [formula image 010] centroid vectors [formula image 011];
Step (36), initialize the cluster partition [formula image 012] to [formula image 013], and calculate the distance between each sample and each centroid vector, where [formula image 014] prepares the converted data format for the initialized information and [formula image 010] denotes the number of centroid vectors; the distance formula is expressed as:
[formula image 015] (1)
In formula (1), [formula image 016] denotes an input data sample, and i in [formula image 017] denotes the i-th data node; [formula image 018] denotes each centroid vector, and [formula image 020] in [formula image 019] denotes the [formula image 020]-th centroid vector value. [formula image 021] is marked with the category [formula image 023] corresponding to [formula image 022], where [formula image 023] indicates the degree of distance between different data information; at that point [formula image 024] is updated, where [formula image 025] denotes the updated data information. For all [formula image 026], the centroid [formula image 027] is recalculated from all sample points. If the centroid is not changed, step (33) is carried out to continue the classification, and the data information value is output when no further calculation is performed.
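As a non-limiting illustration, the iteration of steps (35) and (36) can be sketched in Python as follows. The formula images are not reproduced in this text, so the sketch assumes that formula (1) is the squared Euclidean distance between a sample and a centroid vector, as in the ordinary Kmeans procedure; that reading, together with the names `squared_distance`, `kmeans` and `max_iter`, is an assumption and not taken from the specification.

```python
def squared_distance(x, mu):
    """Assumed form of formula (1): squared Euclidean distance."""
    return sum((a - b) ** 2 for a, b in zip(x, mu))


def kmeans(samples, centroids, max_iter=100):
    """Assign samples to the nearest centroid, then recompute centroids."""
    centroids = [list(c) for c in centroids]
    clusters = []
    for _ in range(max_iter):
        # Re-initialise the cluster partition to empty sets.
        clusters = [[] for _ in centroids]
        for x in samples:
            nearest = min(range(len(centroids)),
                          key=lambda j: squared_distance(x, centroids[j]))
            clusters[nearest].append(x)
        # Recompute each centroid as the mean of its members.
        new_centroids = []
        for members, old in zip(clusters, centroids):
            if members:
                new_centroids.append(
                    [sum(col) / len(members) for col in zip(*members)])
            else:
                new_centroids.append(old)  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # no centroid moved, so the partition is final
        centroids = new_centroids
    return centroids, clusters


samples = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
centroids, clusters = kmeans(samples, [samples[0], samples[2]])
```

In this sketch the loop stops as soon as no centroid moves; the seed points passed in as `centroids` stand in for the result of steps (31) to (35).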
As a further technical scheme of the invention, the method for realizing data information scheduling of different storage nodes by the particle swarm algorithm model comprises the following steps:
First, different data information parameters are set in the particle swarm optimization model; the parameters include CPU utilization, memory utilization, number of data particles, data type, data size and system time delay. The node resources of the data information are converted into different particles in the particle swarm optimization model, and the data particles are recorded as [formula image 028], where [formula image 029] denotes the CPU utilization of the i-th data node during data-particle scheduling in the first data scheduling, [formula image 030] denotes the CPU utilization of the i-th data node during data-particle scheduling in the second data scheduling, and [formula image 031] denotes the CPU utilization of the i-th data node during data-particle scheduling in the third data scheduling. The data information scheduling index of node i is [formula image 032], where [formula image 033] denotes the node resource utilization of the data index of the i-th data node in the first data scheduling, [formula image 034] denotes the node resource utilization of the data index of the i-th data node in the second data scheduling, and [formula image 035] denotes the node resource utilization of the data index of the i-th data node in the third data scheduling. The data storage utilization of the current data server node is then calculated, with the function expressed as follows:
When a data service is deployed on a node with low node utilization, the node resource utilization in the cluster can be expressed as:
[formula image 036] (2)
The resource utilization of the data nodes is calculated by formula (2). The data service delay is expressed as:
[formula image 037] (3)
In formula (3), m denotes the number of data information items of the computing node; [formula image 038] denotes the storage matrix of the data information in the server; i in [formula image 039] denotes a row vector of the storage matrix, and j in [formula image 040] denotes a column vector of the storage matrix; [formula image 041] denotes the time delay matrix of the node data information; [formula image 043] in [formula image 042] denotes the data centroid, and [formula image 045] in [formula image 044] denotes the node data information. When [formula image 046] is 0, service node j and service node i do not perform data scheduling. The data service delay is obtained through formula (3), and the data information is deployed on the node with the smaller delay, which improves service response time. The time delay between the data information of the service node and a data source is expressed as follows:
[formula image 047] (4)
In formula (4), x denotes the number of data sources and k denotes the centroid of the deployed node data information; [formula image 048] denotes the dependency matrix with the data sources; [formula image 049] denotes the amount of data transmitted by a data source to the data node; and [formula image 050] denotes the network delay matrix of the data sources.
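By way of example only, the quantities defined for formula (4) can be combined as in the following Python sketch. The formula image is not reproduced in this text, so the sketch assumes that the delay for a candidate node k is the sum, over the data sources, of dependency × data amount × network delay; that form, and the names `source_delay`, `depends`, `volume` and `net_delay`, are assumptions made for illustration, as are the sample values.

```python
def source_delay(k, depends, volume, net_delay):
    """Assumed reading of formula (4): total delay between a service
    deployed on node k and the data sources it depends on."""
    return sum(
        depends[x] * volume[x] * net_delay[x][k]
        for x in range(len(depends))
    )


# Hypothetical inputs: three data sources, two candidate nodes.
depends = [1, 0, 1]        # 1 if the service reads from source x
volume = [4.0, 9.0, 2.0]   # amount of data each source transmits
net_delay = [              # net_delay[x][k]: delay from source x to node k
    [0.5, 2.0],
    [1.0, 1.0],
    [3.0, 0.5],
]

delays = [source_delay(k, depends, volume, net_delay) for k in range(2)]
# Deploy on the node with the smaller delay.
best_node = min(range(len(delays)), key=delays.__getitem__)
```

Selecting the minimum of `delays` corresponds to deploying the data information on the node with the smaller time delay.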
The time delay between the data node and the data source is calculated through formula (4), which expresses the relation between the data service delay and the amount of data required from the data source. During scheduling, the velocity and position attribute information of each dimension of a particle is updated, with the update function:
[formula image 051] (5)
In formula (5), [formula image 052] denotes the inertia factor with which the particle performs the global search and the local search for the optimal solution; [formula image 053] and [formula image 054] are the learning-factor acceleration constants; [formula image 055] and [formula image 056] are random numbers of the particle swarm algorithm; [formula image 057] denotes the updated particle velocity; [formula image 059] in [formula image 058] denotes the particle position identifier; [formula image 060] denotes the updated particle position; [formula image 061] denotes the particle position at the previous moment; and [formula image 062] and [formula image 063] denote the particle swarm extremum. The particle swarm scheduling model is updated through formula (5); a fitness function is defined according to actual requirements, and the optimal solution of each particle and the global optimal solution of the particle swarm algorithm are then calculated.
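As a non-limiting illustration, the update described for formula (5) can be sketched in Python using the commonly used particle swarm velocity and position update. The formula image is not reproduced in this text, so the exact form below, the parameter values `w`, `c1` and `c2`, the search range and the example fitness function are all assumptions chosen for the sketch.

```python
import random


def pso_step(positions, velocities, personal_best, global_best,
             w=0.7, c1=1.5, c2=1.5, rng=random):
    """One velocity and position update for every particle (in place)."""
    for i, (x, v) in enumerate(zip(positions, velocities)):
        for d in range(len(x)):
            r1, r2 = rng.random(), rng.random()
            v[d] = (w * v[d]
                    + c1 * r1 * (personal_best[i][d] - x[d])
                    + c2 * r2 * (global_best[d] - x[d]))
            x[d] += v[d]


def optimize(fitness, dim, particles=20, iterations=100, seed=0):
    """Minimise `fitness` and return the best position found."""
    rng = random.Random(seed)
    positions = [[rng.uniform(-5, 5) for _ in range(dim)]
                 for _ in range(particles)]
    velocities = [[0.0] * dim for _ in range(particles)]
    personal_best = [p[:] for p in positions]
    global_best = min(personal_best, key=fitness)[:]
    for _ in range(iterations):
        pso_step(positions, velocities, personal_best, global_best, rng=rng)
        for i, p in enumerate(positions):
            if fitness(p) < fitness(personal_best[i]):
                personal_best[i] = p[:]
        # The global extremum never gets worse between iterations.
        global_best = min(personal_best + [global_best], key=fitness)[:]
    return global_best


def sphere(p):
    """Hypothetical fitness function standing in for a scheduling cost."""
    return sum(v * v for v in p)


best = optimize(sphere, dim=2)
```

In a scheduling setting the fitness function would be defined from the node utilization and delay measures according to actual requirements; `sphere` is used here only so that the sketch is self-contained.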
As a further technical scheme of the invention, the data storage is an FPGA high-speed storage module.
The invention has the beneficial and positive effects that:
Unlike the conventional technology, the invention discloses a data processing method of an improved computer algorithm model, in which different types of data information are received through an interface compatible with the M + Kmeans algorithm model and the particle swarm scheduling model, and are obtained from a virtual resource database, a heterogeneous database, a resource database or a metadata database, each of which is provided with a data search engine. The acquired data information can be converted: data information in different formats is standardized and thereby converted into a data form that can be received by a machine. The data information received by the machine can be classified and calculated, with data classification and coding realized through the M + Kmeans algorithm model, and the classified data information is stored. The stored data information can be subjected to information calculation and mining, realized through the particle swarm algorithm model, so as to analyze and calculate the data information. The calculated data information can be stored for analysis and application by a user.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the embodiments or the prior art descriptions will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and other drawings can be obtained by those skilled in the art without inventive labor, wherein:
FIG. 1 is a schematic flow diagram of the overall process of the present invention;
FIG. 2 is a schematic diagram of the collection module of the present invention;
FIG. 3 is a schematic view of the structure of an acquisition module according to the present invention;
FIG. 4 is a schematic flow chart of the M + Kmeans algorithm of the present invention;
FIG. 5 is a schematic flow chart of a particle swarm scheduling algorithm in the present invention;
FIG. 6 is a schematic diagram of a data storage module according to the present invention.
Detailed Description
The preferred embodiments of the present invention will be described below with reference to the accompanying drawings, and it should be understood that the embodiments described herein are merely for the purpose of illustrating and explaining the present invention and are not intended to limit the present invention.
As shown in fig. 1, a data processing method of an improved computer algorithm model includes:
Step one, acquiring data information: different types of data information are received through an interface compatible with the M + Kmeans algorithm model and the particle swarm scheduling model, and are obtained from a virtual resource database, a heterogeneous database, a resource database or a metadata database, each of which is provided with a data search engine;
Step two, performing data conversion on the acquired data information, and converting data information in different formats, through standardized processing, into a data form that can be received by a machine;
Step three, performing classification calculation on the data information received by the machine, realizing data classification and coding through the M + Kmeans algorithm model, and storing the classified data information;
Step four, performing information calculation and mining on the stored data information, the mining and processing of the data information being realized through a particle swarm algorithm model so as to analyze and calculate the data information;
Step five, storing the calculated data information for analysis and application by a user.
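By way of example only, the data conversion of step two can be sketched in Python as follows. The specification does not state which standardization is used, so the sketch assumes that records arriving as JSON text, comma-separated text or key-value mappings are converted into numeric vectors and then z-score normalized per column; the field names in `FIELDS` and the function names are hypothetical.

```python
import csv
import io
import json
from statistics import mean, pstdev

FIELDS = ("cpu", "mem", "size")  # hypothetical field names


def to_vector(record):
    """Convert a dict, a JSON string or a CSV line into a list of floats."""
    if isinstance(record, dict):
        return [float(record[f]) for f in FIELDS]
    text = record.strip()
    if text.startswith("{"):
        obj = json.loads(text)
        return [float(obj[f]) for f in FIELDS]
    row = next(csv.reader(io.StringIO(text)))
    return [float(v) for v in row]


def standardize(vectors):
    """Z-score every column; a constant column maps to 0.0."""
    columns = list(zip(*vectors))
    mus = [mean(c) for c in columns]
    sigmas = [pstdev(c) for c in columns]
    return [
        [(v - m) / s if s else 0.0 for v, m, s in zip(vec, mus, sigmas)]
        for vec in vectors
    ]


raw = [
    '{"cpu": 0.2, "mem": 0.5, "size": 10}',   # JSON text
    "0.4,0.5,30",                             # CSV text
    {"cpu": 0.6, "mem": 0.5, "size": 20},     # key-value mapping
]
vectors = [to_vector(r) for r in raw]
normalized = standardize(vectors)
```

The resulting `normalized` rows share one numeric layout regardless of the source format, which is the form the classification of step three would operate on in this sketch.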
In step one, data information collection is realized through an acquisition module. The acquisition module comprises at least one high-speed serial transceiver, at least one LA logic module, a channel maintenance module, an RXUSER interface and a TXUSER interface. The high-speed serial transceiver is connected with the LA logic module; the LA logic module is connected with the channel maintenance module, the RXUSER interface and the TXUSER interface respectively; the channel maintenance module is connected with external equipment through a control interface; the RXUSER interface is connected with the RXFIFO interface, and the TXUSER interface is connected with the TXFIFO interface. The at least one high-speed serial transceiver and the at least one LA logic module receive the data information in the optical module through 4 high-speed serial transceivers, and encode, decode, calculate and process characters.
As shown in fig. 2 and fig. 3, the acquisition module integrates the data into storage data blocks of a given format according to the storage parameters set by the system. After integration, the data is sent to a high-speed data processing module, and the next stage of analysis is carried out according to the set data frame format and other storage parameters. The LA logic module drives the serial transceiver to encode and decode characters. The channel maintenance module is responsible for binding and initializing the LA channels and for detecting logic errors in the LA channels, and the RX user module moves high-speed data to the data processing module. The communication interfaces are configured as [formula image 064] and [formula image 065]; the interrupt from the PL side to the PS side is configured as IRQ0, and the system clock of the PS is 33.33 MHz. The [formula image 066] interface of the configuration module is connected with the [formula image 067] interface of the AXIGP sub-module to complete the exchange of module configuration parameters.
In step three, as shown in fig. 4, the method for implementing data information classification by using the M + Kmeans algorithm includes:
Step (31), start: input the data set and parameters and initialize one clustering center, where the information data set is denoted D and the clustering center is denoted D [formula image 068];
Step (32), traverse the input data information, calculate the distance between the data information and the data center point, and sort by distance; the result is recorded as [formula image 069];
Step (33), determine K clustering centers: select the data point B whose distance from the central point is [formula image 070] as the central point of the next data information;
Step (34), assign to the class at the closest distance, update the clustering center and output the clustering result; select the distance between the clustering center and the central point as [formula image 071], and calculate the distance [formula image 072] between point C and point B; if [formula image 073], set point [formula image 074] as the clustering center point;
Step (35), through [formula image 075] iterations, find [formula image 076] seed points to serve as the initial [formula image 076] centroid vectors [formula image 077];
Step (36), initialize the cluster partition [formula image 012] to [formula image 013], and calculate the distance between each sample and each centroid vector, where [formula image 014] prepares the converted data format for the initialized information and [formula image 078] denotes the number of centroid vectors; the distance formula is expressed as:
[formula image 079] (1)
In formula (1), [formula image 016] denotes an input data sample, and i in [formula image 017] denotes the i-th data node; [formula image 080] denotes each centroid vector, and [formula image 082] in [formula image 081] denotes the [formula image 082]-th centroid vector value. [formula image 021] is marked with the category [formula image 083] corresponding to [formula image 022], where [formula image 083] indicates the degree of distance between different data information; at that point [formula image 084] is updated, where [formula image 025] denotes the updated data information. For all [formula image 026], the centroid [formula image 027] is recalculated from all sample points. If the centroid is not changed, step (33) is carried out to continue the classification, and the data information value is output when no further calculation is performed.
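As a non-limiting illustration, one possible reading of steps (31) to (34) is a farthest-point selection of the initial clustering centers: start from one center, sort the data by distance, and repeatedly take the point farthest from the centers chosen so far. The threshold conditions in the formula images are not reproduced in this text, so the Python sketch below omits them; the selection rule and the name `farthest_point_seeds` are assumptions, and k is assumed not to exceed the number of data points.

```python
import math


def farthest_point_seeds(data, k, first_index=0):
    """Pick k seed points, each one farthest from the seeds chosen so far."""
    seeds = [data[first_index]]
    # Distance from every point to its nearest chosen seed.
    nearest = [math.dist(p, seeds[0]) for p in data]
    while len(seeds) < k:
        idx = max(range(len(data)), key=nearest.__getitem__)
        seeds.append(data[idx])
        nearest = [min(d, math.dist(p, data[idx]))
                   for d, p in zip(nearest, data)]
    return seeds


data = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (10.0, 8.0)]
seeds = farthest_point_seeds(data, 3)
```

Seeds chosen this way are spread across the data set, which is consistent with the stated aim of reducing the influence of the clustering center points on the clustering result.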
In a specific embodiment, MapReduce parallelization of the data information is implemented on the Hadoop layer. In the MapReduce parallelization process, different types of data information are calculated by setting up different layers: a first layer of Distancemappers calculates the distance from the remaining data to a random central point, and a second layer of Maxmappers, working on the distance from the remaining data to the random central point, calculates the [formula image 085] point and all [formula image 086], until each MAP has found all possible cluster centers. In the Reducer stage, the clustering centers of all independent MAPs are merged: the shortest path connecting all the center points is found, the center points on the shortest path edge are merged, and that edge is updated to a new center point; the process iterates until the number of center points equals [formula image 087]. The iteration stage yields [formula image 088] initial clustering centers, after which the Kmeans algorithm process is called to process the data until the algorithm converges. The improved M + Kmeans algorithm reduces the influence of the clustering center points on the data clustering result, so that the clustering result is more reliable and execution is more efficient. In a specific embodiment, a Bayesian classification algorithm model can also be added to classify the data information.
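By way of example only, the map and reduce stages described above can be imitated in a single Python process as follows. This is not Hadoop code: `map_candidates` stands in for the two mapper layers and `reduce_centers` for the Reducer stage, and both names are hypothetical. The sketch assumes that each mapper proposes spread-out candidate centers for its own partition and that the reducer repeatedly merges the two closest centers into their midpoint until K centers remain; the midpoint rule is an assumption.

```python
import math
from itertools import combinations


def map_candidates(partition, per_map):
    """Mapper stand-in: pick spread-out candidate centers in one partition."""
    centers = [partition[0]]
    while len(centers) < min(per_map, len(partition)):
        centers.append(
            max(partition,
                key=lambda p: min(math.dist(p, c) for c in centers)))
    return centers


def reduce_centers(candidates, k):
    """Reducer stand-in: merge the closest pair of centers until k remain."""
    centers = [tuple(c) for c in candidates]
    while len(centers) > k:
        a, b = min(combinations(range(len(centers)), 2),
                   key=lambda ij: math.dist(centers[ij[0]], centers[ij[1]]))
        merged = tuple((u + v) / 2 for u, v in zip(centers[a], centers[b]))
        centers = [c for i, c in enumerate(centers) if i not in (a, b)]
        centers.append(merged)
    return centers


partitions = [
    [(0.0, 0.0), (0.0, 2.0), (9.0, 9.0)],
    [(1.0, 1.0), (10.0, 10.0), (10.0, 8.0)],
]
candidates = [c for part in partitions for c in map_candidates(part, 2)]
initial_centers = reduce_centers(candidates, 2)
```

The `initial_centers` produced here would then be handed to the Kmeans procedure as its starting centroids.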
In step four, as shown in fig. 5, the method for implementing data information scheduling of different storage nodes by using the particle swarm optimization model comprises:
First, different data information parameters are set in the particle swarm algorithm model; the parameters include CPU utilization, memory utilization, number of data particles, data type, data size and system time delay. The node resources of the data information are converted into different particles in the particle swarm algorithm model, and the data particles are recorded as [formula image 089], where [formula image 090] denotes the CPU utilization of the i-th data node during data-particle scheduling in the first data scheduling, [formula image 091] denotes the CPU utilization of the i-th data node during data-particle scheduling in the second data scheduling, and [formula image 092] denotes the CPU utilization of the i-th data node during data-particle scheduling in the third data scheduling. The data information scheduling index of node i is [formula image 093], where [formula image 094] denotes the node resource utilization of the data index of the i-th data node in the first data scheduling, [formula image 095] denotes the node resource utilization of the data index of the i-th data node in the second data scheduling, and [formula image 096] denotes the node resource utilization of the data index of the i-th data node in the third data scheduling. The data storage utilization of the current data server node is then calculated, with the function expressed as follows:
When a data service is deployed on a node with a low node utilization, the node resource utilization in the cluster can be expressed as:
Figure 857541DEST_PATH_IMAGE097
(2)
calculating the resource utilization rate of the data nodes through a formula (2), wherein the data service delay of the system is related to factors such as data access frequency and data volume, the system delay is composed of communication delay between data services and data transmission delay of a data source, the delay between the data service and the data source is the product of the delay between the data source node and the service node and the data volume, and when the service node is deployed at a k node, the data service delay is expressed as follows:
Figure 461567DEST_PATH_IMAGE098
(3)
in formula (3), m represents the number of the calculation node data information,
Figure 584244DEST_PATH_IMAGE038
denotes the matrix in which the data information is stored on the server,
Figure 75268DEST_PATH_IMAGE039
in which i denotes a row vector of the storage matrix,
Figure 539878DEST_PATH_IMAGE040
in which j denotes a column vector of the storage matrix,
Figure 850774DEST_PATH_IMAGE099
a time delay matrix representing the data information of the node,
Figure 777142DEST_PATH_IMAGE042
in
Figure 634590DEST_PATH_IMAGE043
denotes the data centroid,
Figure 988211DEST_PATH_IMAGE044
in which
Figure 51982DEST_PATH_IMAGE045
Representing node data information;
Figure 798352DEST_PATH_IMAGE046
when the value is 0, service node j and service node i do not schedule data between them. The data service delay is obtained through formula (3), and the data information is deployed on the node with the smaller delay, which improves the service response time. The delay between the data information of a service node and a data source is expressed as:
Figure 998389DEST_PATH_IMAGE100
(4)
In formula (4), x represents the number of data sources, k represents the deployed node data information centroid,
Figure 788490DEST_PATH_IMAGE101
denotes the dependency matrix with respect to the data sources,
Figure 73978DEST_PATH_IMAGE102
representing the amount of data transmitted by the data source to the data node,
Figure 856995DEST_PATH_IMAGE103
a network delay matrix representing a data source;
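The relationship described for formula (4) can be sketched in a few lines. Formula (4) itself is an image that is not reproduced in this text, so the summation form below, and the names `source_delay`, `depends`, `volume` and `net_delay`, are assumptions made only for illustration: the delay of a service deployed at node k is taken as the sum, over all data sources, of dependency × transmitted data volume × network delay.

```python
def source_delay(k, depends, volume, net_delay):
    """Assumed form of the source-to-node delay for a service deployed at node k.

    depends[s]      : 1 if the service depends on data source s, else 0
    volume[s]       : amount of data that source s transmits to the service
    net_delay[s][k] : network delay between data source s and node k
    """
    return sum(d * v * net_delay[s][k]
               for s, (d, v) in enumerate(zip(depends, volume)))


# Three data sources, two candidate nodes (illustrative numbers only).
depends = [1, 0, 1]
volume = [10.0, 5.0, 2.0]
net_delay = [[0.2, 0.5],
             [0.1, 0.1],
             [0.4, 0.1]]

# Deploy on the node with the smaller delay, as the text describes.
best_node = min(range(2), key=lambda k: source_delay(k, depends, volume, net_delay))
```

With these numbers the delay is dominated by the first data source, so the node closer to that source is selected.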
The delay between a data node and a data source is calculated by formula (4), which expresses the relationship between the data service delay and the amount of data required from the data source. During scheduling, the velocity and position of the particle in a given dimension are updated by the following update function:
Figure 911539DEST_PATH_IMAGE104
(5)
In formula (5),
Figure 872542DEST_PATH_IMAGE052
denotes the particle inertia factor, used in the global search and the local search for the optimal solution,
Figure 661637DEST_PATH_IMAGE053
Figure 467919DEST_PATH_IMAGE105
are the learning factors (acceleration constants),
Figure 908128DEST_PATH_IMAGE055
Figure 292229DEST_PATH_IMAGE106
are random numbers of the particle swarm algorithm,
Figure 552309DEST_PATH_IMAGE107
indicating the velocity of the particles after the update,
Figure 162282DEST_PATH_IMAGE058
in which
Figure 207730DEST_PATH_IMAGE108
denotes the position identifier of the particle,
Figure 510535DEST_PATH_IMAGE109
denotes the updated position of the particle,
Figure 257911DEST_PATH_IMAGE061
denotes the position of the particle at the previous moment,
Figure 671575DEST_PATH_IMAGE110
Figure 804485DEST_PATH_IMAGE063
denotes the particle swarm extremum. The particle swarm scheduling model is updated through formula (5); a fitness function is defined according to actual requirements, and the individual optimal solution of each particle and the global optimal solution of the swarm are then calculated.
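The update described for formula (5) resembles the canonical particle swarm update, which can be sketched as follows. Because formula (5) is an image that is not reproduced here, this sketch assumes the standard textbook form (inertia term plus two randomly weighted attraction terms); any modification the patent makes to that form is not captured. The function names and the default parameter values are illustrative assumptions.

```python
import random


def pso_step(pos, vel, pbest, gbest, w=0.7, c1=1.5, c2=1.5, rng=random):
    """One canonical PSO update of a single particle, dimension by dimension."""
    new_pos, new_vel = [], []
    for x, v, p, g in zip(pos, vel, pbest, gbest):
        r1, r2 = rng.random(), rng.random()
        # inertia + pull toward the particle's own best + pull toward the swarm best
        v_next = w * v + c1 * r1 * (p - x) + c2 * r2 * (g - x)
        new_vel.append(v_next)
        new_pos.append(x + v_next)
    return new_pos, new_vel


def minimize(fitness, dim, n_particles=20, iters=100, seed=0):
    """Minimize `fitness` with a small swarm; returns (best position, best value)."""
    rng = random.Random(seed)
    pos = [[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [fitness(p) for p in pos]
    gi = min(range(n_particles), key=pbest_val.__getitem__)
    gbest, gbest_val = pbest[gi][:], pbest_val[gi]
    for _ in range(iters):
        for i in range(n_particles):
            pos[i], vel[i] = pso_step(pos[i], vel[i], pbest[i], gbest, rng=rng)
            val = fitness(pos[i])
            if val < pbest_val[i]:          # update the individual extremum
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:         # update the swarm extremum
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val
```

For example, `minimize(lambda q: sum(x * x for x in q), dim=2)` searches for the minimum of a simple quadratic; in a scheduling setting the fitness would instead score a candidate deployment.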
In the dynamic scheduling policy for data services, each deployment scheme of the data service nodes is treated as a particle, so the set of deployment schemes
Figure 543771DEST_PATH_IMAGE111
is treated as the particle swarm. The optimal scheme is obtained through particle optimization, with the resource utilization, the load balance and the service delay calculated as the fitness function.
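One way to combine the three quantities named above into a single fitness value is a weighted sum. The text does not state how they are combined, so the weighted-sum form, the use of the standard deviation of per-node utilization as the balance measure, and the names `deployment_fitness`, `w_util`, `w_balance` and `w_delay` are all assumptions for illustration.

```python
from statistics import mean, pstdev


def deployment_fitness(node_util, service_delay,
                       w_util=1.0, w_balance=1.0, w_delay=1.0):
    """Score one deployment scheme (one particle); lower is better.

    node_util     : per-node resource utilization in [0, 1] under this scheme
    service_delay : total data service delay under this scheme
    """
    utilization = mean(node_util)    # higher average utilization is rewarded
    imbalance = pstdev(node_util)    # uneven load across nodes is penalized
    return -w_util * utilization + w_balance * imbalance + w_delay * service_delay


# Two schemes with the same average utilization and delay (illustrative numbers).
balanced = deployment_fitness([0.5, 0.5], service_delay=1.0)
skewed = deployment_fitness([0.9, 0.1], service_delay=1.0)
```

Under these assumed weights the evenly loaded scheme scores better than the skewed one, since only the balance term differs.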
In step five, the data storage is implemented by an FPGA high-speed storage module.
In a specific embodiment, as shown in fig. 2, SSDs with a PCIE interface are used as storage devices, and a storage board card that supports loading multiple SSDs simultaneously is added. The high-speed storage module uses an XCZU11EG chip as its core hardware, supports the Verilog HDL hardware description language, and implements the high-speed storage logic with a programmable logic (PL) + processor system (PS) structure. The optical module in the storage module receives resource data from other devices and exchanges information with the FPGA chip through high-speed serial interface transceivers.
Although specific embodiments of the present invention have been described above, it will be understood by those skilled in the art that these specific embodiments are merely illustrative and that various omissions, substitutions and changes in the form of the detail of the methods and systems described above may be made by those skilled in the art without departing from the spirit and scope of the invention. For example, it is within the scope of the present invention to combine the steps of the above-described methods to perform substantially the same function in substantially the same way to achieve substantially the same result. Accordingly, the scope of the invention is to be limited only by the following claims.

Claims (5)

1. A data processing method of an improved computer algorithm model, characterized in that the method comprises the following steps:
step one, acquiring data information, wherein different types of data information are received through an interface compatible with the M + Kmeans algorithm model and the particle swarm scheduling model, and are acquired from a virtual resource database, a heterogeneous database, a resource database or a metadatabase; the virtual resource database, the heterogeneous database, the resource database or the metadatabase is provided with a data search engine;
step two, carrying out data conversion on the acquired data information, and carrying out standardized processing on the data information with different formats so as to convert the data information into a data form which can be received by a machine;
step three, carrying out classification calculation on data information received by the machine, realizing data classification and coding through an M + Kmeans algorithm model, and storing the classified data information;
step four, performing information calculation and mining on the stored data information, wherein the mining and processing of the data information are realized through a particle swarm algorithm model so as to analyze and calculate the data information;
and step five, storing the calculated data information for analysis and application by a user.
2. A method of data processing for an improved computer algorithm model according to claim 1, characterized by: the acquisition module comprises at least one high-speed serial transceiver, at least one LA logic module, a channel maintenance module, an RXUSER interface and a TXUSER interface, wherein the high-speed serial transceiver is connected with the LA logic module, the LA logic module is respectively connected with the channel maintenance module, the RXUSER interface and the TXUSER interface, the channel maintenance module is connected with external equipment through a control interface, the RXUSER interface is connected with the RXFIFO interface, and the TXUSER interface is connected with the TXFIFO interface, wherein the at least one high-speed serial transceiver and the at least one LA logic module receive data information in the optical module through 4 high-speed serial transceivers and encode, decode, calculate and process the characters.
3. A method of data processing for an improved computer algorithm model according to claim 1, characterized by: the method for realizing data information classification by using the M + Kmeans algorithm comprises the following steps:
step (31), starting: inputting the data set and parameters and initializing a cluster center, wherein the information data set is denoted D and the cluster center is denoted
Figure 664262DEST_PATH_IMAGE001
Step (32), traversing the input data information, calculating the distance between the data information and the data center point, sorting according to the distance, and recording as
Figure 367164DEST_PATH_IMAGE002
Step (33), determining K clustering centers, and selecting the distance between the clustering centers and the central point as
Figure 641150DEST_PATH_IMAGE003
The data point B of (1) is taken as the central point of the next data information;
step (34), classifying the class with the closest distance, updating the clustering center, outputting the clustering result and selecting the distance between the clustering center and the central point as
Figure 225715DEST_PATH_IMAGE004
Calculating the distance between the point C and the point B
Figure 455708DEST_PATH_IMAGE005
If it is determined that
Figure 139631DEST_PATH_IMAGE006
Then set point
Figure 228809DEST_PATH_IMAGE007
Is a clustering central point;
step (35), through
Figure 210541DEST_PATH_IMAGE008
iterations, finding
Figure 435986DEST_PATH_IMAGE009
seed points as the initial
Figure 556388DEST_PATH_IMAGE010
centroid vectors
Figure 991918DEST_PATH_IMAGE011
step (36), the cluster partition
Figure 652706DEST_PATH_IMAGE007
Is initialized to
Figure 873603DEST_PATH_IMAGE012
Calculating the distance between the sample and each centroid vector, wherein
Figure 145666DEST_PATH_IMAGE013
Preparing a converted data format for the initialized information, wherein
Figure 943858DEST_PATH_IMAGE014
Representing the number of centroid vectors, the distance formula is expressed as:
Figure 283703DEST_PATH_IMAGE015
(1)
in the formula (1), wherein
Figure 483741DEST_PATH_IMAGE016
denotes the input data sample,
Figure 929634DEST_PATH_IMAGE017
in which i denotes the i-th data node,
Figure 90488DEST_PATH_IMAGE018
denotes each centroid vector,
Figure 93079DEST_PATH_IMAGE019
in (1)
Figure 68994DEST_PATH_IMAGE020
Is shown as
Figure 905363DEST_PATH_IMAGE021
Individual centroidal value of
Figure 412568DEST_PATH_IMAGE022
Marking as
Figure 343484DEST_PATH_IMAGE023
Corresponding category
Figure 190217DEST_PATH_IMAGE024
Figure 56542DEST_PATH_IMAGE024
Indicating the degree of distance of different data information, updated at that time
Figure 178606DEST_PATH_IMAGE025
Figure 788579DEST_PATH_IMAGE026
Indicating updated data information for all
Figure 489819DEST_PATH_IMAGE027
If all of the sample points recalculate the centroid
Figure 917258DEST_PATH_IMAGE028
If the centroid is not changed, the step (33) is carried out to continue classification, and the data information value is output when the calculation is not carried out any more.
4. The method of claim 1, wherein the method comprises: the method for realizing the data information scheduling of different storage nodes by the particle swarm algorithm model comprises the following steps:
firstly, setting different data information parameters in a particle swarm optimization model, wherein the data information parameters comprise CPU utilization rate, memory utilization rate, data particle number, data type, data size and system time delay, converting node resources of data information into different particles in the particle swarm optimization model, and recording the data particles as different particles in the particle swarm optimization model
Figure 133476DEST_PATH_IMAGE029
In which
Figure 422506DEST_PATH_IMAGE030
denotes the CPU utilization of the i-th data node when the data particles are scheduled for the first time,
Figure 961940DEST_PATH_IMAGE031
denotes the CPU utilization of the i-th data node when the data particles are scheduled for the second time,
Figure 435647DEST_PATH_IMAGE032
denotes the CPU utilization of the i-th data node when the data particles are scheduled for the third time; the data information scheduling index of node i is
Figure 76844DEST_PATH_IMAGE033
Figure 887674DEST_PATH_IMAGE034
denotes the node resource utilization indicated by the data of the i-th data node during the first scheduling,
Figure 891402DEST_PATH_IMAGE035
denotes the node resource utilization indicated by the data of the i-th data node during the second scheduling,
Figure 411376DEST_PATH_IMAGE036
denotes the node resource utilization indicated by the data of the i-th data node during the third scheduling; then, the data storage utilization of the current data server node is calculated, wherein the function expression is as follows:
when a data service is deployed on a node with a low node utilization, the node resource utilization in the cluster can be expressed as:
Figure 786207DEST_PATH_IMAGE037
(2)
calculating the resource utilization rate of the data nodes by the formula (2),
the data service latency is represented as:
Figure 276094DEST_PATH_IMAGE038
(3)
in the formula (3), m represents the number of the data information of the calculation node,
Figure 744116DEST_PATH_IMAGE039
denotes the matrix in which the data information is stored on the server,
Figure 480997DEST_PATH_IMAGE040
in which i denotes a row vector of the storage matrix,
Figure 424682DEST_PATH_IMAGE041
in which j denotes a column vector of the storage matrix,
Figure 593626DEST_PATH_IMAGE042
a time delay matrix representing the data information of the node,
Figure 306367DEST_PATH_IMAGE043
in (1)
Figure 948570DEST_PATH_IMAGE044
denotes the data centroid,
Figure 254918DEST_PATH_IMAGE045
in (1)
Figure 86607DEST_PATH_IMAGE046
Representing node data information;
Figure 44068DEST_PATH_IMAGE047
when the value is 0, service node j and service node i do not schedule data between them; the data service delay is obtained through formula (3), and the data information is deployed on the node with the smaller delay, thereby improving the service response time, wherein the delay between the data information of the service node and a data source is expressed as:
Figure 607905DEST_PATH_IMAGE048
(4)
In formula (4), x represents the number of data sources, k represents the deployed node data information centroid,
Figure 526182DEST_PATH_IMAGE049
denotes the dependency matrix with respect to the data sources,
Figure 289126DEST_PATH_IMAGE050
representing the amount of data transmitted by the data source to the data node,
Figure 382984DEST_PATH_IMAGE051
a network delay matrix representing a data source;
the delay between a data node and a data source is calculated by formula (4), which expresses the relationship between the data service delay and the amount of data required from the data source; during scheduling, the velocity and position of the particle in a given dimension are updated, the update function being:
Figure 711197DEST_PATH_IMAGE052
(5)
in formula (5),
Figure 975825DEST_PATH_IMAGE053
representing the inertia factors of the particles to perform a global search for an optimal solution and a local search for an optimal solution,
Figure 414897DEST_PATH_IMAGE054
Figure 628841DEST_PATH_IMAGE055
are the learning factors (acceleration constants),
Figure 252589DEST_PATH_IMAGE056
Figure 879879DEST_PATH_IMAGE057
is a random number of the particle swarm algorithm,
Figure 732429DEST_PATH_IMAGE058
indicating the velocity of the particles after the update,
Figure 50147DEST_PATH_IMAGE059
in (1)
Figure 720162DEST_PATH_IMAGE060
denotes the position identifier of the particle,
Figure 444536DEST_PATH_IMAGE061
denotes the updated position of the particle,
Figure 225410DEST_PATH_IMAGE062
denotes the position of the particle at the previous moment,
Figure 683721DEST_PATH_IMAGE063
Figure 134425DEST_PATH_IMAGE064
representing a particle swarm extremum; updating of the particle swarm scheduling model is achieved through a formula (5), a fitness function is defined according to actual requirements, and then the optimal solution and the global optimal solution of each particle in the particle swarm algorithm are calculated.
5. A method of data processing for an improved computer algorithm model according to claim 1, characterized by: and the data storage is an FPGA high-speed storage module.
CN202210671614.7A 2022-06-15 2022-06-15 Data processing method of improved computer algorithm model Active CN114756557B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210671614.7A CN114756557B (en) 2022-06-15 2022-06-15 Data processing method of improved computer algorithm model

Publications (2)

Publication Number Publication Date
CN114756557A true CN114756557A (en) 2022-07-15
CN114756557B CN114756557B (en) 2022-11-08

Family

ID=82336600

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210671614.7A Active CN114756557B (en) 2022-06-15 2022-06-15 Data processing method of improved computer algorithm model

Country Status (1)

Country Link
CN (1) CN114756557B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114996362A (en) * 2022-08-04 2022-09-02 深圳市共赢晶显技术有限公司 Data processing and storing method

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108182115A (en) * 2017-12-28 2018-06-19 福州大学 A kind of virtual machine load-balancing method under cloud environment
CN110795208A (en) * 2019-10-11 2020-02-14 南京航空航天大学 Mobile cloud computing self-adaptive virtual machine scheduling method based on improved particle swarm
CN112084025A (en) * 2020-09-01 2020-12-15 河海大学 Improved particle swarm algorithm-based fog calculation task unloading time delay optimization method
CN112612820A (en) * 2020-12-07 2021-04-06 国网北京市电力公司 Data processing method and device, computer readable storage medium and processor
CN113792754A (en) * 2021-08-12 2021-12-14 国网江西省电力有限公司电力科学研究院 Method for processing DGA (differential global alignment) online monitoring data of converter transformer by removing different elements and then repairing
CN114626426A (en) * 2020-12-11 2022-06-14 中国科学院沈阳自动化研究所 Industrial equipment behavior detection method based on K-means optimization algorithm

Also Published As

Publication number Publication date
CN114756557B (en) 2022-11-08

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant