CN111597399A - Computer data processing system and method based on data fusion - Google Patents

Publication number
CN111597399A
Authority
CN
China
Prior art keywords
data
processed
space
resource allocation
fusion
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202010426699.3A
Other languages
Chinese (zh)
Inventor
尹大伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Laiwu Vocational and Technical College
Original Assignee
Laiwu Vocational and Technical College
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Application filed by Laiwu Vocational and Technical College
Priority to CN202010426699.3A
Publication of CN111597399A
Legal status: Withdrawn

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/901 Indexing; Data structures therefor; Storage structures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention belongs to the technical field of computers, and particularly relates to a computer data processing system and method based on data fusion. The system comprises: a data evaluation unit, which evaluates the data to be processed and acquires its data information, the data information comprising at least data size, data type and data structure; a resource allocation unit, which allocates the computer's computing resources for data processing according to a preset resource allocation model, based on the data information acquired by the data evaluation unit; and a data fusion unit, which calls the computing resources allocated by the resource allocation unit, performs data fusion on the data to be processed based on the data information acquired by the data evaluation unit, and stores the fused data. Fusion processing of computer data is thus realized through data evaluation and resource allocation, with the advantages of high processing efficiency and high resource utilization.

Description

Computer data processing system and method based on data fusion
Technical Field
The invention belongs to the technical field of computers, and particularly relates to a computer data processing system and method based on data fusion.
Background
Data is a representation of facts, concepts or instructions that can be processed by manual or automated means. Once data is interpreted and given meaning, it becomes information. Data processing is the collection, storage, retrieval, processing, transformation and transmission of data.
The basic purpose of data processing is to extract and derive data that is valuable and meaningful to particular people from large volumes of data that may be disordered and hard to understand.
Data processing is a basic link in systems engineering and automatic control, and it runs through every field of social production and social life. The development of data processing technology, and the breadth and depth of its application, have greatly influenced the progress of human society.
Data fusion technology comprises collecting, transmitting, integrating, filtering, correlating and synthesizing the useful information provided by various information sources, so as to assist people in situation and environment assessment, planning, detection, verification and diagnosis. It is extremely important for acquiring useful battlefield information promptly and accurately, for evaluating battlefield conditions, threats and their importance in a timely and complete manner, for tactical and strategic decision support, and for the command and control of combat troops. The future battlefield changes from moment to moment and the factors that influence decisions grow ever more complex, so a commander must make the most accurate judgment of the battlefield situation in the shortest time and exercise the most effective command and control over combat troops. Achieving this series of "mosts" requires the most advanced data processing technology as a basic guarantee. Otherwise, even brilliant military leaders and commanders will be inundated by a vast sea of data, and will either misjudge the situation or delay their decisions and miss the opportunity to act, with disastrous results.
System resources are used to track running applications rather than to run them, much as a highway that carries many cars still has road left to drive on. It can therefore be said with some confidence that the performance of a computer system is affected by other factors, and not by the amount of available system resources. When the performance of a user's computer system degrades significantly, the cause should be sought elsewhere before system resources are suspected.
From the hardware aspect, insufficient memory, which forces the system to use virtual memory frequently, is one of the main factors affecting system performance.
From the software aspect, because Windows is a multitasking operating system, users commonly run several applications at the same time, whether or not they are all needed at that moment. Programmers who write and debug these applications generally consider only how they behave in a single-task environment and spend little effort on testing and debugging in a multitasking environment. As a result, many applications do not cooperate well, and running several of them at once degrades system performance because they conflict with one another. Imperfections in the multitask management mechanism of Windows 9X are, of course, another major cause of this problem.
Patent No. cn201410047608.xa discloses a multi-platform point cloud data fusion method in the fields of surveying, mapping and engineering measurement, comprising the following steps. Data acquisition: original data on ground objects and features in a target area are acquired by data acquisition equipment carried on a mobile platform and by fixed ground laser scanning equipment. Data preprocessing: the collected original data undergo preprocessing such as project organization management, filtering and denoising. Data fusion: precision analysis is performed on the filtered and denoised point cloud data, the remaining data are corrected against the point cloud data of highest precision, and, based on the point cloud data acquired by the mobile platform, coordinate conversion is carried out for the data acquired by the fixed ground laser scanning equipment and for non-field control points. Although this method can fuse data from various fields and structures and denoises the data, its processing is complex and occupies considerable system resources, so substantial resources are wasted when simple, low-order data fusion is performed.
Patent No. CN201610191767.6A discloses a processing method, and its application, for data fusion and intelligent search of multiple data sources. The sensors adopt a planar layout and are located on the same plane to form a sensor network; the data fusion comprises fusion of data from multiple sensors of the same type and fusion of data from different sensors; the data characteristics and data types of the sensors are acquired by polling; the acquisition nodes perform redundancy processing on the data as it is acquired; and a self-adaptive algorithm based on batch estimation is adopted. According to that method, the data of each sensor is dynamically acquired, so that the identification time of the system for the data of the sensors is prolonged, the data precision is improved, and the data accuracy is increased. However, this data fusion is specific to sensor systems, and because no analysis of the data itself is carried out before fusion, fusion efficiency is low and resource occupancy is high.
Disclosure of Invention
In view of the above, the main objective of the present invention is to provide a computer data processing system and method based on data fusion, which perform system resource allocation and data fusion based on data evaluation, and have the advantages of high processing efficiency and high resource utilization rate.
In order to achieve the purpose, the technical scheme of the invention is realized as follows:
A computer data processing system based on data fusion, the system comprising: a data evaluation unit, which evaluates the data to be processed and acquires its data information, the data information comprising at least data size, data type and data structure; a resource allocation unit, which allocates the computer's computing resources for data processing according to a preset resource allocation model, based on the data information acquired by the data evaluation unit; and a data fusion unit, which calls the computing resources allocated by the resource allocation unit, performs data fusion on the data to be processed based on the data information acquired by the data evaluation unit, and stores the fused data.
Further, the data evaluation unit includes: a plurality of data identification subunits; the data identification subunit is used for training respectively based on a plurality of dimensions and a plurality of feature spaces; the trained data identification subunit can analyze the data to be processed under the corresponding dimension and feature space to obtain an analysis result; the data evaluation unit further includes: and the analysis integration unit is used for integrating the analysis results of all the data identification subunits to obtain the data scale, the data type and the data structure of the data to be processed.
Further, the dimensions are defined as the features of the data, i.e. data size, data type and data structure. The training process specifically comprises: the data identification subunits extract data features from pre-collected training data samples under the data size dimension, the data type dimension or the data structure dimension respectively, and count the number of times the data features conform to each feature space by using the following formula:
Figure BDA0002498983210000041
wherein N is the number of times the feature space is matched, S is the number of data, λ_i is the weight of the i-th training sample, M is the number of features in each feature space, and count_j is the number of data features of the i-th training sample. The priority of the feature space corresponding to each training sample is then set from high to low, in descending order of the counted number of times the training sample conforms to each feature space, which completes the training of the data feature space. When the data to be processed is evaluated, the data identification subunits each perform feature space mapping on the data under the corresponding dimension, count the feature space mapping results, and take the mapping result with the highest frequency as the identification result.
Further, the resource allocation unit, according to a preset resource allocation model, based on the data information obtained by the data evaluation unit, performs the following steps: establishing a resource allocation model, wherein the resource allocation model is represented by the following notations:
Figure BDA0002498983210000042
wherein F(x) is the percentage of resources allocated; x is the weighted average of the data information, which comprises the data size, the data type and the data structure; α is a constant with α > 3; and the symbol shown in Figure BDA0002498983210000043 is a standard average value, set as a constant. According to the established resource allocation model, the weighted average of the data information is first calculated by the following formula: data size × A + weight corresponding to the data type × B + weight corresponding to the data structure × C; wherein the weight corresponding to the data type is defined by presetting different numerical values as the weights of different data types, and the weight corresponding to the data structure is defined by presetting different numerical values as the weights of different data structures. The percentage of computer resources to be allocated is then calculated with the resource allocation model, and the result is sent to the data fusion unit.
Further, when the data fusion unit calls the computing resources allocated by the resource allocation unit, performs data fusion on the data to be processed based on the data information acquired by the data evaluation unit, and stores the fused data, it executes the following steps: calling computer resources according to the calculated percentage of computer resources; extracting the data space of the data to be processed, and classifying the data to be processed into different target heterogeneous databases according to that data space; normalizing the target heterogeneous databases to obtain a classified target heterogeneous data matrix; and mapping and matching the classified target heterogeneous data matrix against each directional data space group by using the following formula:
Figure BDA0002498983210000051
wherein sim(d_j, d_k) is the mapping matching result; the symbol shown in Figure BDA0002498983210000052 is the product target heterogeneous data matrix, w_ji is its matrix row value, and ||d_j|| is the value of the corresponding matrix determinant; the symbol shown in Figure BDA0002498983210000053 is the directional data space group, w_ki is its matrix row value, and ||d_k|| is the value of the corresponding matrix determinant. According to the final mapping matching result, the directional data space group corresponding to the minimum value of sim(d_j, d_k) is taken as the data space corresponding to the product information, which completes the construction of the data space. Chaotic fuzzy matching is then performed on the constructed data space to complete the integration of the different heterogeneous data.
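The matching formula itself survives only as an image reference, so it cannot be reproduced here. The symbols that do survive (row values w_ji and w_ki, and the quantities ||d_j|| and ||d_k||) resemble the ingredients of a cosine similarity, so the following is a minimal sketch under that assumption only. It treats ||d|| as a Euclidean norm, although the text calls it a determinant value; the names `sim`, `select_space_group` and the example groups are hypothetical.

```python
import math

def sim(d_j, d_k):
    """Cosine similarity between two weight vectors (an ASSUMED reading of the formula)."""
    if len(d_j) != len(d_k):
        raise ValueError("vectors must have the same length")
    dot = sum(w_ji * w_ki for w_ji, w_ki in zip(d_j, d_k))
    norm_j = math.sqrt(sum(w * w for w in d_j))
    norm_k = math.sqrt(sum(w * w for w in d_k))
    if norm_j == 0 or norm_k == 0:
        raise ValueError("similarity is undefined for a zero vector")
    return dot / (norm_j * norm_k)

def select_space_group(d_j, groups):
    """Pick the group name with the minimum sim value, as the text states."""
    return min(groups, key=lambda name: sim(d_j, groups[name]))

# Hypothetical directional data space groups.
groups = {"g1": [1.0, 0.0], "g2": [0.0, 1.0]}
chosen = select_space_group([1.0, 0.0], groups)
```

The selection follows the text in taking the minimum value. Under a cosine reading the maximum would mark the closest match, so which extreme is intended depends on the formula in the original image.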
A computer data processing method based on data fusion, the method performing the following steps: the data evaluation unit evaluates the data to be processed and acquires its data information, the data information comprising at least data size, data type and data structure; the resource allocation unit allocates the computer's computing resources for data processing according to a preset resource allocation model, based on the data information acquired by the data evaluation unit; and the data fusion unit calls the computing resources allocated by the resource allocation unit, performs data fusion on the data to be processed based on the data information acquired by the data evaluation unit, and stores the fused data.
Further, the data evaluation unit includes: a plurality of data identification subunits; the data identification subunit is used for training respectively based on a plurality of dimensions and a plurality of feature spaces; the trained data identification subunit can analyze the data to be processed under the corresponding dimension and feature space to obtain an analysis result; the data evaluation unit further includes: and the analysis integration unit is used for integrating the analysis results of all the data identification subunits to obtain the data scale, the data type and the data structure of the data to be processed.
Further, the dimensions are defined as: characteristics of the data, i.e., data size, data type, and data structure; the training process specifically comprises: the data identification subunit extracts data features based on pre-collected training data samples respectively under the data scale dimension, the data type dimension or the data structure dimension, and counts the times of the data features according with each feature space by using the following formula:
Figure BDA0002498983210000061
wherein N is the number of times the feature space is matched, S is the number of data, λ_i is the weight of the i-th training sample, M is the number of features in each feature space, and count_j is the number of data features of the i-th training sample. The priority of the feature space corresponding to each training sample is then set from high to low, in descending order of the counted number of times the training sample conforms to each feature space, which completes the training of the data feature space. When the data to be processed is evaluated, the data identification subunits each perform feature space mapping on the data under the corresponding dimension, count the feature space mapping results, and take the mapping result with the highest frequency as the identification result.
Further, the resource allocation unit, according to a preset resource allocation model, based on the data information obtained by the data evaluation unit, performs the following steps: establishing a resource allocation model, wherein the resource allocation model is represented by the following notations:
Figure BDA0002498983210000062
wherein F(x) is the percentage of resources allocated; x is the weighted average of the data information, which comprises the data size, the data type and the data structure; α is a constant with α > 3; and the symbol shown in Figure BDA0002498983210000063 is a standard average value, set as a constant. According to the established resource allocation model, the weighted average of the data information is first calculated by the following formula: data size × A + weight corresponding to the data type × B + weight corresponding to the data structure × C; wherein the weight corresponding to the data type is defined by presetting different numerical values as the weights of different data types, and the weight corresponding to the data structure is defined by presetting different numerical values as the weights of different data structures. The percentage of computer resources to be allocated is then calculated with the resource allocation model, and the result is sent to the data fusion unit.
Furthermore, when the data fusion unit calls the computing resources allocated by the resource allocation unit, performs data fusion on the data to be processed based on the data information acquired by the data evaluation unit, and stores the fused data, the method comprises the following steps: calling computer resources according to the calculated percentage of computer resources; extracting the data space of the data to be processed, and classifying the data to be processed into different target heterogeneous databases according to that data space; normalizing the target heterogeneous databases to obtain a classified target heterogeneous data matrix; and mapping and matching the classified target heterogeneous data matrix against each directional data space group by using the following formula:
Figure BDA0002498983210000071
wherein sim(d_j, d_k) is the mapping matching result; the symbol shown in Figure BDA0002498983210000072 is the product target heterogeneous data matrix, w_ji is its matrix row value, and ||d_j|| is the value of the corresponding matrix determinant; the symbol shown in Figure BDA0002498983210000073 is the directional data space group, w_ki is its matrix row value, and ||d_k|| is the value of the corresponding matrix determinant. According to the final mapping matching result, the directional data space group corresponding to the minimum value of sim(d_j, d_k) is taken as the data space corresponding to the product information, which completes the construction of the data space. Chaotic fuzzy matching is then performed on the constructed data space to complete the integration of the different heterogeneous data.
The computer data processing system and method based on data fusion have the following beneficial effects: the invention performs system resource allocation and data fusion on the basis of data evaluation, and has the advantages of high processing efficiency and high resource utilization. These effects arise in two ways. 1. The data size, data type and data structure of the data to be processed are evaluated, so that the basic condition of the data is understood as a whole, which facilitates subsequent data fusion and resource allocation. In practice, the data type, data structure and data size are not always specified. Once this information is obtained, system resources can be allocated more reasonably according to the actual condition of the data: when the data type is more complex (for example floating-point data), the data structure is more variable and complex, and the data size is larger, more system resources are allocated so that results are obtained faster; when the data structure is simple and the data size is small, fewer resources are allocated, which avoids waste. 2. During data fusion, a heterogeneous data matrix is established and mapped and matched against each directional data space group, so that data from the same source but with different data structures can be fused in a single pass, with higher fusion accuracy and efficiency. For the same amount of occupied system resources, the computer data processing system of the invention processes data more conveniently and accurately.
Drawings
FIG. 1 is a system diagram of a computer data processing system based on data fusion according to an embodiment of the present invention;
FIG. 2 is a schematic flow chart of a method of a computer data processing method based on data fusion according to an embodiment of the present invention;
FIG. 3 is a schematic flow chart illustrating data fusion performed by the data fusion unit of the data fusion-based computer data processing system and method according to the embodiment of the present invention;
FIG. 4 is a schematic diagram of the experimental curve of data fusion efficiency for the computer data processing system and method based on data fusion according to the embodiment of the present invention, compared with the experimental results of the prior art;
FIG. 5 is a schematic diagram of the experimental curve of resource utilization for the computer data processing system and method based on data fusion according to the embodiment of the present invention, compared with the experimental results of the prior art.
Reference numerals: 1 - experimental curve of the invention; 2 - experimental curve of the prior art.
Detailed Description
The method of the present invention will be described in further detail below with reference to the accompanying drawings and embodiments of the invention.
Example 1
As shown in FIGS. 1, 2 and 3, a computer data processing system based on data fusion comprises: a data evaluation unit, which evaluates the data to be processed and acquires its data information, the data information comprising at least data size, data type and data structure; a resource allocation unit, which allocates the computer's computing resources for data processing according to a preset resource allocation model, based on the data information acquired by the data evaluation unit; and a data fusion unit, which calls the computing resources allocated by the resource allocation unit, performs data fusion on the data to be processed based on the data information acquired by the data evaluation unit, and stores the fused data.
By adopting the above technical scheme, system resource allocation and data fusion are carried out on the basis of data evaluation, with the advantages of high processing efficiency and high resource utilization. These effects arise in two ways. 1. The data size, data type and data structure of the data to be processed are evaluated, so that the basic condition of the data is understood as a whole, which facilitates subsequent data fusion and resource allocation. In practice, the data type, data structure and data size are not specified. Once this information is obtained, system resources can be allocated more reasonably according to the actual condition of the data: when the data type is more complex (for example floating-point data), the data structure is more variable and complex, and the data size is larger, more system resources are allocated so that results are obtained faster; when the data structure is simple and the data size is small, fewer resources are allocated, which avoids waste. 2. During data fusion, a heterogeneous data matrix is established and mapped and matched against each directional data space group, so that data from the same source but with different data structures can be fused in a single pass, with higher fusion accuracy and efficiency. For the same amount of occupied system resources, the computer data processing system of the invention processes data more conveniently and accurately.
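The three-unit flow of this embodiment (evaluate, then allocate, then fuse) can be sketched as follows. This is only an illustration of how the units hand data to one another: every function body is a placeholder assumption, the allocation rule is not the patent's F(x), and all names (`DataInfo`, `evaluate`, `allocate`, `fuse`, `process`, `capacity`) are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class DataInfo:
    size: int        # data size (here: number of records)
    data_type: str   # data type (here: type name of the first value)
    structure: str   # data structure (here: "flat" or "nested")

def evaluate(records):
    """Data evaluation unit: placeholder logic that inspects only the first record."""
    if not records:
        raise ValueError("no data to evaluate")
    first = records[0]
    first_value = next(iter(first.values()))
    nested = any(isinstance(v, (dict, list)) for v in first.values())
    return DataInfo(len(records), type(first_value).__name__, "nested" if nested else "flat")

def allocate(info, capacity=100):
    """Resource allocation unit: placeholder rule, NOT the patent's F(x)."""
    return min(1.0, info.size / capacity)

def fuse(records, share):
    """Data fusion unit: placeholder merge in which later keys overwrite earlier ones."""
    fused = {}
    for record in records:
        fused.update(record)
    return {"share": share, "data": fused}

def process(records):
    info = evaluate(records)      # step 1: evaluate the data
    share = allocate(info)        # step 2: allocate resources from the evaluation
    return fuse(records, share)   # step 3: fuse using the allocated share

result = process([{"id": 1, "temp": 20.5}, {"id": 1, "humidity": 0.4}])
```

The point of the structure is that allocation depends only on the evaluation result, so the evaluation and allocation rules can be replaced independently of the fusion step.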
Example 2
On the basis of the above embodiment, the data evaluation unit includes: a plurality of data identification subunits; the data identification subunit is used for training respectively based on a plurality of dimensions and a plurality of feature spaces; the trained data identification subunit can analyze the data to be processed under the corresponding dimension and feature space to obtain an analysis result; the data evaluation unit further includes: and the analysis integration unit is used for integrating the analysis results of all the data identification subunits to obtain the data scale, the data type and the data structure of the data to be processed.
Specifically, from the sensing layer to the application layer of the Internet of Things, the types and quantity of information multiply and the volume of data to be analyzed grows sharply. Data fusion among heterogeneous networks and multiple systems is also involved, so extracting hidden information and useful data from massive data in a timely manner poses a huge challenge to data processing. How to integrate, mine and intelligently process massive data reasonably and effectively is therefore a difficult problem for the Internet of Things. Combining it with distributed computing technologies such as P2P and cloud computing has become one way to address these problems. Cloud computing provides the Internet of Things with a new, efficient computing mode: dynamically scalable, inexpensive computing can be provided over the network on demand; the data center is relatively reliable and safe; it combines the convenience and low price of Internet services with the capacity of a mainframe; data and applications can easily be shared among different devices; and users do not need to worry about troublesome problems such as information leakage and hacker intrusion. Cloud computing is a milestone in the development of information technology. It emphasizes the aggregation, optimization and dynamic allocation of information resources, reduces information costs and greatly improves the efficiency of the data center.
Example 3
On the basis of the above embodiment, the dimensions are defined as the characteristics of the data, i.e., data size, data type and data structure. The training process specifically comprises: the data identification subunits extract data features from pre-collected training data samples under the data size dimension, the data type dimension or the data structure dimension, and count the number of times the data features conform to each feature space by using the following formula:
Figure BDA0002498983210000101
wherein N is the number of times the feature space is matched, S is the number of data, λ_i is the weight of the i-th training sample, M is the number of features in each feature space, and count_j is the number of data features of the i-th training sample. The priority of the feature space corresponding to each training sample is then set from high to low, in descending order of the counted number of times the training sample conforms to each feature space, which completes the training of the data feature space. When the data to be processed is evaluated, the data identification subunits each perform feature space mapping on the data under the corresponding dimension, count the feature space mapping results, and take the mapping result with the highest frequency as the identification result.
By adopting the above technical scheme, feature space mapping is performed on the data to be processed under each corresponding dimension, the feature space mapping results are counted, and the mapping result with the highest frequency is taken as the identification result. This improves the efficiency of data evaluation: the same data to be processed may contain several different data structures, one of which is dominant, and performing identification and evaluation for every data structure would make processing too slow. The invention adopts an identification method based on a segmented function formula and feature space mapping, which preserves the efficiency of evaluation and identification while ensuring accuracy.
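The evaluation step described here, counting the feature space mapping results and keeping the most frequent one, amounts to a majority vote. A minimal sketch follows, assuming each piece of the data has already been mapped to a feature-space label; the function name `identify` and the labels are hypothetical, and the training formula is not modelled because it survives only as an image reference.

```python
from collections import Counter

def identify(mapping_results):
    """Return the feature-space label that occurs most often in the mapping results."""
    if not mapping_results:
        raise ValueError("no mapping results to count")
    counts = Counter(mapping_results)
    label, _ = counts.most_common(1)[0]
    return label

# Hypothetical mapping results in the data structure dimension.
structure_votes = ["table", "tree", "table", "graph", "table"]
dominant_structure = identify(structure_votes)
```

Running one such vote per dimension (data size, data type, data structure) would give the three values that the analysis integration unit combines. When two labels tie, `Counter.most_common` returns the one encountered first.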
Example 4
As shown in fig. 4, on the basis of the previous embodiment, the resource allocation unit allocates the computing resources of the computer for data processing, according to a preset resource allocation model and based on the data information obtained by the data evaluation unit, by performing the following steps: establishing a resource allocation model, wherein the resource allocation model is represented by the following formula:
Figure BDA0002498983210000111
where F(x) is the percentage of resources allocated; x is the data information, namely the weighted average of the data size, the data type, and the data structure; α is a constant, with α > 3,
Figure BDA0002498983210000112
is a standard average value, set as a constant. According to the established resource allocation model, the weighted average of the data information is first calculated by the following formula: data size × A + weight corresponding to the data type × B + weight corresponding to the data structure × C. The weight corresponding to the data type is defined as follows: different numerical values are preset as the weights of different data types. The weight corresponding to the data structure is defined as follows: different numerical values are preset as the weights of different data structures. The percentage of computer resources to be allocated is then calculated using the resource allocation model, and the calculation result is sent to the data fusion unit.
In computer science, a system resource is any physical or virtual component of a computer system that limits its computing power. Any device connected to a computer system is a resource, such as a keyboard or a screen. Any component within a computer system is a resource, such as the CPU or RAM. Virtual software components in a computer system, including files, network connections, and memory blocks, are also resources. Allocating system resources means allocating the software and hardware resources of the computer so that the system resources are fully utilized and the system does not deadlock. System resource allocation includes the following categories: processor allocation, memory allocation, and I/O device allocation.
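By way of non-limiting illustration, the weighted-average step of the resource allocation model can be sketched as follows. The sketch assumes that the weighted average is read as data size × A + data type weight × B + data structure weight × C. The lookup tables, the coefficient values, and the placeholder model are hypothetical; in particular, the placeholder model is only a stand-in and is not the formula F(x) of the embodiment, which is not reproduced in the text. Clamping the result to the range 0 to 100 is likewise an assumption of this sketch.

```python
# Hypothetical preset weights for data types and data structures.
TYPE_WEIGHTS = {"text": 1.0, "image": 3.0}
STRUCT_WEIGHTS = {"table": 2.0, "graph": 4.0}


def weighted_data_score(size, data_type, structure, a, b, c):
    """Weighted average x of the data information (assumed reading)."""
    return size * a + TYPE_WEIGHTS[data_type] * b + STRUCT_WEIGHTS[structure] * c


def allocate_percentage(x, model):
    """Apply a resource allocation model to x and clamp to [0, 100]."""
    pct = model(x)
    return max(0.0, min(100.0, pct))


def placeholder_model(x, x_bar=2.0):
    # Stand-in only: NOT the patent's F(x). x_bar plays the role of a
    # preset constant reference value.
    return 100.0 * x / (x + x_bar)


x = weighted_data_score(2.0, "text", "table", a=0.5, b=0.3, c=0.2)
share = allocate_percentage(x, placeholder_model)
print(round(x, 3), round(share, 1))
```

Passing the model as a callable keeps the weighted-average step independent of whichever allocation formula is actually used.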
Example 5
As shown in fig. 5, on the basis of the previous embodiment, the data fusion unit calls the computing resources allocated by the resource allocation unit, performs data fusion on the data to be processed based on the data information acquired by the data evaluation unit, and stores the fused data, by performing the following steps: calling computer resources according to the calculated percentage of computer resources; extracting the data space of the data to be processed, and classifying the data to be processed into different target heterogeneous databases according to that data space; normalizing the target heterogeneous databases to obtain a classified target heterogeneous data matrix; and mapping and matching the classified target heterogeneous data matrix against each directional data space group by using the following formula:
Figure RE-GDA0002528659280000131
where sim(d_j, d_k) is the mapping-matching result,
Figure RE-GDA0002528659280000132
is the product target heterogeneous data matrix, w_ji is the matrix row value, and ||d_j|| is the value of the corresponding matrix determinant;
Figure RE-GDA0002528659280000133
is the directional data space group, w_ki is the matrix row value, and ||d_k|| is the value of the corresponding matrix determinant. According to the final mapping-matching result, the directional data space group corresponding to the minimum value of sim(d_j, d_k) is taken as the data space corresponding to the product information, which completes the construction of the data space; chaotic fuzzy matching is then performed according to the constructed data space to complete the integration of the different heterogeneous data.
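By way of non-limiting illustration, the mapping-matching step can be sketched as follows. Because the matching formula is given only as a figure, the sketch assumes a cosine-style form, namely the sum of the products w_ji · w_ki divided by ||d_j|| · ||d_k||, and it computes ||d|| as a Euclidean norm, whereas the text describes it as a determinant value; both choices are assumptions. The group names and vectors are hypothetical. The selection rule follows the text, which takes the group with the minimum sim value.

```python
import math


def similarity(d_j, d_k):
    """Cosine-style sim(d_j, d_k); an assumed form of the matching formula."""
    if len(d_j) != len(d_k):
        raise ValueError("vectors must have the same length")
    dot = sum(w_j * w_k for w_j, w_k in zip(d_j, d_k))
    norm_j = math.sqrt(sum(w * w for w in d_j))
    norm_k = math.sqrt(sum(w * w for w in d_k))
    if norm_j == 0.0 or norm_k == 0.0:
        raise ValueError("zero vector has no defined similarity")
    return dot / (norm_j * norm_k)


def select_space_group(target, groups):
    """Score every directional data space group and pick the minimum sim."""
    scores = {name: similarity(target, vec) for name, vec in groups.items()}
    best = min(scores, key=scores.get)
    return best, scores


# Hypothetical normalized target row and directional data space groups.
target = [1.0, 0.0]
groups = {"g1": [1.0, 0.0], "g2": [0.0, 1.0], "g3": [1.0, 1.0]}
best, scores = select_space_group(target, groups)
print(best, scores)
```

Under a cosine-style measure the largest value would ordinarily indicate the closest match, so whether the minimum or the maximum should be selected depends on the actual formula in the figure; the sketch keeps the minimum as stated in the text.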
Example 6
A computer data processing method based on data fusion, the method performing the following steps: the data evaluation unit evaluates the data to be processed and acquires data information of the data to be processed, the data information including at least data size, data type, and data structure; the resource allocation unit allocates computing resources of the computer for data processing, according to a preset resource allocation model and based on the data information acquired by the data evaluation unit; and the data fusion unit calls the computing resources allocated by the resource allocation unit, performs data fusion on the data to be processed based on the data information acquired by the data evaluation unit, and stores the fused data.
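By way of non-limiting illustration, the three steps of the method (evaluation, resource allocation, fusion) can be sketched as a simple pipeline. All names, the DataInfo fields, and the toy stand-in functions are hypothetical; the actual units are as described in the embodiments.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class DataInfo:
    """Data information produced by the evaluation step."""
    size: float
    data_type: str
    structure: str


def process(data: Any,
            evaluate: Callable[[Any], DataInfo],
            allocate: Callable[[DataInfo], float],
            fuse: Callable[[Any, DataInfo, float], Any]) -> Any:
    info = evaluate(data)            # data evaluation unit
    share = allocate(info)           # resource allocation unit (percentage)
    return fuse(data, info, share)   # data fusion unit


# Toy stand-ins, for illustration only.
def toy_evaluate(data):
    return DataInfo(size=float(len(data)), data_type="text", structure="sequence")


def toy_allocate(info):
    return min(100.0, info.size * 10.0)


def toy_fuse(data, info, share):
    # "Fusion" here is just de-duplication of records.
    return {"records": sorted(set(data)), "share": share, "info": info}


result = process(["b", "a", "b"], toy_evaluate, toy_allocate, toy_fuse)
print(result["records"], result["share"])
```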
Example 7
On the basis of the above embodiment, the data evaluation unit includes a plurality of data identification subunits. The data identification subunits are trained separately on a plurality of dimensions and a plurality of feature spaces; each trained data identification subunit analyzes the data to be processed under its corresponding dimension and feature space to obtain an analysis result. The data evaluation unit further includes an analysis integration unit for integrating the analysis results of all data identification subunits to obtain the data size, data type, and data structure of the data to be processed.
Example 8
On the basis of the above embodiment, the dimensions are defined as characteristics of the data, namely data size, data type, and data structure. The training process specifically comprises: the data identification subunit extracts data features from pre-collected training data samples under the data size dimension, the data type dimension, or the data structure dimension, and counts the number of times the data features conform to each feature space by using the following formula:
Figure BDA0002498983210000141
where N is the number of times the feature space is conformed to, S is the number of data items, λ_i is the weight of the i-th training sample, M is the number of features in each feature space, and count_j is the number of data features of the i-th training sample. The priority of the feature space corresponding to each training sample is set from high to low according to the counted number of times the training sample conforms to each feature space, from most to fewest, which completes the training of the data feature space. When the data to be processed is evaluated, the data identification subunit performs feature space mapping on the data to be processed under each corresponding dimension, counts the feature space mapping results, and takes the most frequent mapping result as the identification result.
Example 9
On the basis of the previous embodiment, the resource allocation unit allocates the computing resources of the computer for data processing, according to a preset resource allocation model and based on the data information acquired by the data evaluation unit, by performing the following steps: establishing a resource allocation model, wherein the resource allocation model is represented by the following formula:
Figure BDA0002498983210000142
where F(x) is the percentage of resources allocated; x is the data information, namely the weighted average of the data size, the data type, and the data structure; α is a constant, with α > 3,
Figure BDA0002498983210000143
is a standard average value, set as a constant. According to the established resource allocation model, the weighted average of the data information is first calculated by the following formula: data size × A + weight corresponding to the data type × B + weight corresponding to the data structure × C. The weight corresponding to the data type is defined as follows: different numerical values are preset as the weights of different data types. The weight corresponding to the data structure is defined as follows: different numerical values are preset as the weights of different data structures. The percentage of computer resources to be allocated is then calculated using the resource allocation model, and the calculation result is sent to the data fusion unit.
Example 10
On the basis of the previous embodiment, the data fusion unit calls the computing resources allocated by the resource allocation unit, performs data fusion on the data to be processed based on the data information acquired by the data evaluation unit, and stores the fused data, by performing the following steps: calling computer resources according to the calculated percentage of computer resources; extracting the data space of the data to be processed, and classifying the data to be processed into different target heterogeneous databases according to that data space; normalizing the target heterogeneous databases to obtain a classified target heterogeneous data matrix; and mapping and matching the classified target heterogeneous data matrix against each directional data space group by using the following formula:
Figure BDA0002498983210000151
where sim(d_j, d_k) is the mapping-matching result,
Figure BDA0002498983210000152
is the product target heterogeneous data matrix, w_ji is the matrix row value, and ||d_j|| is the value of the corresponding matrix determinant;
Figure BDA0002498983210000153
is the directional data space group, w_ki is the matrix row value, and ||d_k|| is the value of the corresponding matrix determinant. According to the final mapping-matching result, the directional data space group corresponding to the minimum value of sim(d_j, d_k) is taken as the data space corresponding to the product information, which completes the construction of the data space; chaotic fuzzy matching is then performed according to the constructed data space to complete the integration of the different heterogeneous data.
It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working process and related description of the system described above may refer to the corresponding process in the foregoing method embodiments, and will not be described herein again.
It should be noted that, the system provided in the foregoing embodiment is only illustrated by dividing the functional modules, and in practical applications, the functions may be allocated to different functional modules according to needs, that is, the modules or steps in the embodiment of the present invention are further decomposed or combined, for example, the modules in the foregoing embodiment may be combined into one module, or may be further decomposed into multiple sub-modules, so as to complete all or part of the functions described above. The names of the modules and steps involved in the embodiments of the present invention are only for distinguishing the modules or steps, and are not to be construed as unduly limiting the present invention.
It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes and related descriptions of the storage device and the processing device described above may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
Those of skill in the art would appreciate that the various illustrative modules, method steps, and modules described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that programs corresponding to the software modules, method steps may be located in Random Access Memory (RAM), memory, Read Only Memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, hard disk, removable disk, CD-ROM, or any other form of storage medium known in the art. To clearly illustrate this interchangeability of electronic hardware and software, various illustrative components and steps have been described above generally in terms of their functionality. Whether these functions are performed as electronic hardware or software depends upon the particular application and design constraints imposed on the technical solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
The terms "first," "second," and the like are used for distinguishing between similar elements and not necessarily for describing or implying a particular order or sequence.
The terms "comprises," "comprising," or any other similar term are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.
So far, the technical solutions of the present invention have been described in connection with the preferred embodiments shown in the drawings, but it is easily understood by those skilled in the art that the scope of the present invention is obviously not limited to these specific embodiments. Equivalent changes or substitutions of related technical features can be made by those skilled in the art without departing from the principle of the invention, and the technical scheme after the changes or substitutions can fall into the protection scope of the invention.
The above description is only a preferred embodiment of the present invention, and is not intended to limit the scope of the present invention.

Claims (10)

1. A computer data processing system based on data fusion, the system comprising: the data evaluation unit is used for evaluating the data to be processed and acquiring the data information of the data to be processed; the data information at least comprises: data size, data type and data structure; the resource allocation unit allocates computing resources of the computer for data processing based on the data information acquired by the data evaluation unit according to a preset resource allocation model; and the data fusion unit is used for calling the computing resources distributed by the resource distribution unit, carrying out data fusion on the data to be processed based on the data information acquired by the data evaluation unit, and storing the fused data.
2. The system of claim 1, wherein the data evaluation unit comprises: a plurality of data identification subunits; the data identification subunit is used for training respectively based on a plurality of dimensions and a plurality of feature spaces; the trained data identification subunit can analyze the data to be processed under the corresponding dimension and characteristic space to obtain an analysis result; the data evaluation unit further includes: and the analysis integration unit is used for integrating the analysis results of all the data identification subunits to obtain the data scale, the data type and the data structure of the data to be processed.
3. The system of claim 2, wherein the dimensions are defined as: characteristics of the data, i.e., data size, data type, and data structure; the training process specifically comprises: the data identification subunit extracts data features based on training data samples collected in advance under the data scale dimension, the data type dimension or the data structure dimension, and counts the times of data features conforming to each feature space by using the following formula:
Figure FDA0002498983200000011
Figure FDA0002498983200000012
wherein N is the number of times the feature space is conformed to, S is the number of data items, λ_i is the weight of the i-th training sample, M is the number of features in each feature space, and count_j is the number of data features of the i-th training sample; the priority of the feature space corresponding to each training sample is set from high to low according to the counted number of times the training sample conforms to each feature space, from most to fewest, to complete the training of the data feature space; when the data to be processed is evaluated, the data identification subunit performs feature space mapping on the data to be processed under each corresponding dimension, counts the feature space mapping results, and takes the most frequent mapping result as the identification result.
4. The system of claim 3, wherein the resource allocation unit, based on the data information obtained by the data evaluation unit according to a preset resource allocation model, performs the following steps: establishing a resource allocation model, wherein the resource allocation model is represented by the following notations:
Figure FDA0002498983200000021
wherein F(x) is the percentage of resources allocated; x is the data information, namely the weighted average of the data size, the data type, and the data structure; α is a constant, with α > 3,
Figure FDA0002498983200000022
is a standard average value, set as a constant; according to the established resource allocation model, the weighted average of the data information is first calculated by the following formula: data size × A + weight corresponding to the data type × B + weight corresponding to the data structure × C; wherein the weight corresponding to the data type is defined as follows: different numerical values are preset as the weights of different data types; the weight corresponding to the data structure is defined as follows: different numerical values are preset as the weights of different data structures; the percentage of computer resources to be allocated is then calculated using the resource allocation model, and the calculation result is sent to the data fusion unit.
5. The system of claim 4, wherein the data fusion unit calls the computing resources allocated by the resource allocation unit, performs data fusion on the data to be processed based on the data information acquired by the data evaluation unit, and stores the fused data by performing the following steps: calling computer resources according to the calculated percentage of computer resources; extracting the data space of the data to be processed, and classifying the data to be processed into different target heterogeneous databases according to the data space of the data to be processed; normalizing the target heterogeneous databases to obtain a classified target heterogeneous data matrix; and mapping and matching the classified target heterogeneous data matrix against each directional data space group by using the following formula:
Figure FDA0002498983200000031
wherein sim(d_j, d_k) is the mapping-matching result,
Figure FDA0002498983200000032
is the product target heterogeneous data matrix, w_ji is the matrix row value, and ||d_j|| is the value of the corresponding matrix determinant;
Figure FDA0002498983200000033
is the directional data space group, w_ki is the matrix row value, and ||d_k|| is the value of the corresponding matrix determinant; according to the final mapping-matching result, the directional data space group corresponding to the minimum value of sim(d_j, d_k) is taken as the data space corresponding to the product information to complete the construction of the data space; and chaotic fuzzy matching is performed according to the constructed data space to complete the integration of the different heterogeneous data.
6. Computer data processing method based on data fusion according to the system of one of claims 1 to 5, characterized in that the method performs the following steps: the data evaluation unit is used for evaluating the data to be processed and acquiring the data information of the data to be processed; the data information at least comprises: data size, data type and data structure; the resource allocation unit allocates computing resources of the computer for data processing based on the data information acquired by the data evaluation unit according to a preset resource allocation model; and the data fusion unit is used for calling the computing resources distributed by the resource distribution unit, carrying out data fusion on the data to be processed based on the data information acquired by the data evaluation unit, and storing the fused data.
7. The method of claim 6, wherein the data evaluation unit comprises: a plurality of data identification subunits; the data identification subunit is used for training respectively based on a plurality of dimensions and a plurality of feature spaces; the trained data identification subunit can analyze the data to be processed under the corresponding dimension and characteristic space to obtain an analysis result; the data evaluation unit further includes: and the analysis integration unit is used for integrating the analysis results of all the data identification subunits to obtain the data scale, the data type and the data structure of the data to be processed.
8. The method of claim 7, wherein the dimensions are defined as: characteristics of the data, i.e., data size, data type, and data structure; the training process specifically comprises: the data identification subunit extracts data features based on training data samples collected in advance under the data scale dimension, the data type dimension or the data structure dimension, and counts the times of data features conforming to each feature space by using the following formula:
Figure FDA0002498983200000041
Figure FDA0002498983200000042
wherein N is the number of times the feature space is conformed to, S is the number of data items, λ_i is the weight of the i-th training sample, M is the number of features in each feature space, and count_j is the number of data features of the i-th training sample; the priority of the feature space corresponding to each training sample is set from high to low according to the counted number of times the training sample conforms to each feature space, from most to fewest, to complete the training of the data feature space; when the data to be processed is evaluated, the data identification subunit performs feature space mapping on the data to be processed under each corresponding dimension, counts the feature space mapping results, and takes the most frequent mapping result as the identification result.
9. The method of claim 8, wherein the resource allocation unit, based on the data information obtained by the data evaluation unit according to a preset resource allocation model, performs the following steps: establishing a resource allocation model, wherein the resource allocation model is represented by the following notations:
Figure FDA0002498983200000043
wherein F(x) is the percentage of resources allocated; x is the data information, namely the weighted average of the data size, the data type, and the data structure; α is a constant, with α > 3,
Figure FDA0002498983200000044
is a standard average value, set as a constant; according to the established resource allocation model, the weighted average of the data information is first calculated by the following formula: data size × A + weight corresponding to the data type × B + weight corresponding to the data structure × C; wherein the weight corresponding to the data type is defined as follows: different numerical values are preset as the weights of different data types; the weight corresponding to the data structure is defined as follows: different numerical values are preset as the weights of different data structures; the percentage of computer resources to be allocated is then calculated using the resource allocation model, and the calculation result is sent to the data fusion unit.
10. The method according to claim 9, wherein the data fusion unit calls the computing resources allocated by the resource allocation unit, performs data fusion on the data to be processed based on the data information acquired by the data evaluation unit, and stores the fused data by performing the following steps: calling computer resources according to the calculated percentage of computer resources; extracting the data space of the data to be processed, and classifying the data to be processed into different target heterogeneous databases according to the data space of the data to be processed; normalizing the target heterogeneous databases to obtain a classified target heterogeneous data matrix; and mapping and matching the classified target heterogeneous data matrix against each directional data space group by using the following formula:
Figure FDA0002498983200000051
wherein sim(d_j, d_k) is the mapping-matching result,
Figure FDA0002498983200000052
is the product target heterogeneous data matrix, w_ji is the matrix row value, and ||d_j|| is the value of the corresponding matrix determinant;
Figure FDA0002498983200000053
is the directional data space group, w_ki is the matrix row value, and ||d_k|| is the value of the corresponding matrix determinant; according to the final mapping-matching result, the directional data space group corresponding to the minimum value of sim(d_j, d_k) is taken as the data space corresponding to the product information to complete the construction of the data space; and chaotic fuzzy matching is performed according to the constructed data space to complete the integration of the different heterogeneous data.
CN202010426699.3A 2020-05-19 2020-05-19 Computer data processing system and method based on data fusion Withdrawn CN111597399A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010426699.3A CN111597399A (en) 2020-05-19 2020-05-19 Computer data processing system and method based on data fusion

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010426699.3A CN111597399A (en) 2020-05-19 2020-05-19 Computer data processing system and method based on data fusion

Publications (1)

Publication Number Publication Date
CN111597399A true CN111597399A (en) 2020-08-28

Family

ID=72182646

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010426699.3A Withdrawn CN111597399A (en) 2020-05-19 2020-05-19 Computer data processing system and method based on data fusion

Country Status (1)

Country Link
CN (1) CN111597399A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112486961A (en) * 2020-11-18 2021-03-12 广西电网有限责任公司电力科学研究院 Method and device for processing big data in real time
CN114816771A (en) * 2022-06-27 2022-07-29 深圳市乐易网络股份有限公司 Multi-channel hybrid cloud computing system
CN115859159A (en) * 2023-02-16 2023-03-28 北京爱企邦科技服务有限公司 Data evaluation processing method based on data integration

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112486961A (en) * 2020-11-18 2021-03-12 广西电网有限责任公司电力科学研究院 Method and device for processing big data in real time
CN114816771A (en) * 2022-06-27 2022-07-29 深圳市乐易网络股份有限公司 Multi-channel hybrid cloud computing system
CN114816771B (en) * 2022-06-27 2022-09-13 深圳市乐易网络股份有限公司 Multi-channel hybrid cloud computing system
CN115859159A (en) * 2023-02-16 2023-03-28 北京爱企邦科技服务有限公司 Data evaluation processing method based on data integration
CN115859159B (en) * 2023-02-16 2023-05-05 北京爱企邦科技服务有限公司 Data evaluation processing method based on data integration

Similar Documents

Publication Publication Date Title
Zhong et al. A cyber security data triage operation retrieval system
CN111597399A (en) Computer data processing system and method based on data fusion
US20210026909A1 (en) System and method for identifying contacts of a target user in a social network
CN111428231A (en) Safety processing method, device and equipment based on user behaviors
CN111614690A (en) Abnormal behavior detection method and device
Sheshasayee et al. Comparative study of fuzzy C means and K means algorithm for requirements clustering
WO2007117423A2 (en) Method and apparatus for representing multidimensional data
Alguliyev et al. Anomaly detection in Big data based on clustering
Jiang et al. A family of joint sparse PCA algorithms for anomaly localization in network data streams
CN112463859B (en) User data processing method and server based on big data and business analysis
CN112668688B (en) Intrusion detection method, system, equipment and readable storage medium
CN108322428A (en) A kind of abnormal access detection method and equipment
CN111831706A (en) Mining method and device for association rules among applications and storage medium
CN111191601A (en) Method, device, server and storage medium for identifying peer users
CN116701979A (en) Social network data analysis method and system based on limited k-means
Plonus et al. Automatic plankton image classification—can capsules and filters help cope with data set shift?
CN114817243A (en) Method, device and equipment for establishing database joint index and storage medium
Zubi et al. Using data mining techniques to analyze crime patterns in the libyan national crime data
CN110598914B (en) Mine disaster gas concentration interval prediction method and system under influence of multiple factors
CN116707859A (en) Feature rule extraction method and device, and network intrusion detection method and device
CN108830302B (en) Image classification method, training method, classification prediction method and related device
CN111339986A (en) Frequency law mining method and system for equipment based on time domain/frequency domain analysis
CN112613562B (en) Data analysis system and method based on multi-center cloud computing
CN114124484B (en) Network attack identification method, system, device, terminal equipment and storage medium
CN115470504A (en) Data risk analysis method and server combined with artificial intelligence

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20200828