CN113312428A - Multi-source heterogeneous training data fusion method, device and equipment - Google Patents

Multi-source heterogeneous training data fusion method, device and equipment

Info

Publication number
CN113312428A
CN113312428A CN202110592669.4A CN202110592669A CN113312428A CN 113312428 A CN113312428 A CN 113312428A CN 202110592669 A CN202110592669 A CN 202110592669A CN 113312428 A CN113312428 A CN 113312428A
Authority
CN
China
Prior art keywords
data
sharing
training data
military training
heterogeneous
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110592669.4A
Other languages
Chinese (zh)
Inventor
徐庆尧
杨超
唐立文
侯翔
殷智勇
邢维艳
姜曙
席文雄
胥霖
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Peoples Liberation Army Strategic Support Force Aerospace Engineering University
Original Assignee
Peoples Liberation Army Strategic Support Force Aerospace Engineering University
Application filed by Peoples Liberation Army Strategic Support Force Aerospace Engineering University
Priority to CN202110592669.4A
Publication of CN113312428A
Legal status: Pending

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 - Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27 - Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 - File systems; File servers
    • G06F16/18 - File system types
    • G06F16/182 - Distributed file systems
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 - Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 - Design, administration or maintenance of databases
    • G06F16/215 - Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Quality & Reliability (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application relates to a multi-source heterogeneous training data fusion method, device and equipment. The method comprises the following steps: connecting a pre-built standardized big data aggregation and sharing platform to each heterogeneous data source and accessing the military training data of each heterogeneous data source, the platform being a big data platform built on a Hadoop distributed system and an HDFS distributed storage system; preprocessing each item of military training data by cleaning, deduplication and denoising through the platform; mapping each item of preprocessed military training data to a standard logical space of the HDFS distributed storage system by metadata mapping and storing it; and transmitting, through the platform, the military training data requested by each data sharing request in a set data sharing mode, according to the data sharing requests of the departments, applications and services that need to share data and the corresponding sharing permissions. Efficient fusion and sharing of military training data are thereby achieved.

Description

Multi-source heterogeneous training data fusion method, device and equipment
Technical Field
The application relates to the technical field of military big data processing and application, in particular to a multi-source heterogeneous training data fusion method, device and equipment.
Background
With the deepening of military informatization and the rapid development of new technologies represented by cloud computing, big data and artificial intelligence, the military training field faces changes in concepts and innovations in methods and means. As digitization and networking continue to improve, the types of data generated in military training activities keep increasing and their volume multiplies. Guiding military training practice with big data concepts has become an essential foundation for understanding training laws, evaluating training effects, assessing training benefits and supervising training quality at all levels.
Deeply mining the potential value of training data, analyzing the application of big data technology in military training, and actively exploring countermeasures that promote the construction of military training big data are of great significance for improving the benefit of military training and promoting its innovative development. During military informatization, the data management systems of the various business systems were built and implemented at different stages and were influenced by technical, economic and human factors. As a result, a large amount of business data using different storage modes has accumulated, and the data management systems in use differ widely; ranging from simple file databases to complex network databases, these business data form the heterogeneous data sources of an enterprise. However, in the course of implementing the present invention, the inventors found that military training big data application construction still faces the technical problem that multi-source heterogeneous training data cannot be fused and shared.
Disclosure of Invention
In view of the above, it is necessary to provide a multi-source heterogeneous training data fusion method, a multi-source heterogeneous training data fusion device, a computer device, and a standardized big data convergence sharing platform for efficient fusion and sharing of multi-source heterogeneous training data.
In order to achieve the above purpose, the embodiment of the invention adopts the following technical scheme:
In one aspect, an embodiment of the invention provides a multi-source heterogeneous training data fusion method, comprising the following steps:
connecting a pre-built standardized big data aggregation and sharing platform to each heterogeneous data source and accessing the military training data of each heterogeneous data source, wherein the platform is a big data platform built on a Hadoop distributed system and an HDFS distributed storage system, and the military training data comprise message streaming data, structured report data, attribute data, unstructured text and picture data, and video and voice streaming data;
preprocessing each item of military training data by cleaning, deduplication and denoising through the standardized big data aggregation and sharing platform;
mapping each item of preprocessed military training data to a standard logical space of the HDFS distributed storage system by metadata mapping and storing it;
and transmitting, through the standardized big data aggregation and sharing platform, the military training data requested by each data sharing request in a set data sharing mode, according to the data sharing requests of the departments, applications and services that need to share data and the corresponding sharing permissions.
In another aspect, a multi-source heterogeneous training data fusion device is further provided, comprising:
a data access module, configured to connect a pre-built standardized big data aggregation and sharing platform to each heterogeneous data source and access the military training data of each heterogeneous data source, wherein the platform is a big data platform built on a Hadoop distributed system and an HDFS distributed storage system, and the military training data comprise message streaming data, structured report data, attribute data, unstructured text and picture data, and video and voice streaming data;
a preprocessing module, configured to preprocess each item of military training data by cleaning, deduplication and denoising through the standardized big data aggregation and sharing platform;
a mapping fusion module, configured to map each item of preprocessed military training data to a standard logical space of the HDFS distributed storage system by metadata mapping and store it;
and a sharing service module, configured to transmit, through the standardized big data aggregation and sharing platform, the military training data requested by each data sharing request in a set data sharing mode, according to the data sharing requests of the departments, applications and services that need to share data and the corresponding sharing permissions.
In another aspect, a computer device is further provided, which includes a memory and a processor, where the memory stores a computer program, and the processor implements the steps of the above-mentioned multi-source heterogeneous training data fusion method when executing the computer program.
In yet another aspect, a standardized big data aggregation and sharing platform is provided, comprising a heterogeneous data aggregation layer, a data exchange integration layer, a big data storage layer, a data sharing layer and a data service layer, all built on a Hadoop distributed system and an HDFS distributed storage system;
the heterogeneous data aggregation layer is used for connecting to each heterogeneous data source and accessing the military training data of each heterogeneous data source; the military training data comprise message streaming data, structured report data, attribute data, unstructured text and picture data, and video and voice streaming data;
the data exchange integration layer is used for preprocessing each item of military training data; the preprocessing comprises collection, cleaning, deduplication, denoising, exchange, association and data comparison;
the big data storage layer is used for mapping each item of preprocessed military training data to a standard logical space by metadata mapping and storing it;
the data sharing layer is used for outputting, in a set data sharing mode, the military training data requested by each data sharing request;
the data service layer is used for providing data development services for the military training data; the data development services comprise a retrieval and query service, an upload service, a synchronization service, a download service, an analysis service and a template service.
One of the above technical solutions has the following advantages and beneficial effects:
The multi-source heterogeneous training data fusion method, device and equipment use a standardized big data aggregation and sharing platform built on a Hadoop distributed system and an HDFS distributed storage system. The platform connects to each heterogeneous data source and accesses its military training data. After the data are aggregated to the platform, they are preprocessed by cleaning, deduplication and denoising, and a metadata mapping mode (mechanism) then maps the various types of heterogeneous data to a standard logical space of the HDFS distributed storage system for storage. This builds a data fusion and sharing service system and couples structured data and text and picture data with service applications without changing the original data. Finally, to meet data sharing requirements among different departments, applications and services, the platform can open different permissions according to the data requirements, ensuring unified allocation of data resources and control of permissions. The platform transmits the military training data requested by each sharing request in a set data sharing mode, according to the request and its sharing permission. This achieves the aggregation, storage and sharing of multi-source heterogeneous big data in military training; in the actual construction of military training big data applications, data fusion and sharing are efficient, and the data fusion and sharing service is highly stable and reliable.
Drawings
FIG. 1 is a diagram of a multi-source heterogeneous data fusion architecture in one embodiment;
FIG. 2 is a schematic flow chart of a multi-source heterogeneous training data fusion method according to an embodiment;
FIG. 3 is a schematic block diagram of a multi-source heterogeneous training data fusion apparatus according to an embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used herein in the description of the present application is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used herein, the term "and/or" includes any and all combinations of one or more of the associated listed items.
In addition, the technical solutions in the embodiments of the present invention may be combined with each other, provided that the combination can be implemented by those skilled in the art. When a combination of technical solutions is contradictory or cannot be implemented, the combination should be considered not to exist and falls outside the protection scope of the present invention.
To address the aggregation and sharing of multi-source heterogeneous data in military training, such as real-time message data, various structured report and attribute data, unstructured text and pictures, and video and voice streaming data, the invention builds a unified big data aggregation and sharing platform (the standardized big data aggregation and sharing platform described below). In practical research, the inventors found that when data from different channels and in different formats are accessed by the platform, the original data are generally still non-standard structured or unstructured information. A standardized data model can therefore be established according to the differences in data service types and contents, and a platform for big data management and application can be built that aggregates, extracts, cleans, converts and merges multi-source heterogeneous data so that the data are integrated and unified. In practical application, a standardized big data aggregation and sharing platform was built and its big data aggregation, storage and sharing performance was evaluated; the platform was found to be highly stable and reliable.
Referring to fig. 1, in an embodiment, the standardized big data aggregation and sharing platform provided by the present application includes a heterogeneous data aggregation layer, a data exchange integration layer, a big data storage layer, a data sharing layer and a data service layer, all built on a Hadoop distributed system and an HDFS distributed storage system. The heterogeneous data aggregation layer is used for connecting to each heterogeneous data source and accessing the military training data of each heterogeneous data source; the military training data comprise message streaming data, structured report data, attribute data, unstructured text and picture data, and video and voice streaming data. The data exchange integration layer is used for preprocessing each item of military training data; the preprocessing includes collection, cleaning, deduplication, denoising, exchange, association and data comparison. The big data storage layer is used for mapping each item of preprocessed military training data to a standard logical space by metadata mapping and storing it. The data sharing layer is used for outputting, in a set data sharing mode, the military training data requested by each data sharing request. The data service layer is used for providing data development services for the military training data; the data development services comprise a retrieval and query service, an upload service, a synchronization service, a download service, an analysis service and a template service.
Specifically, the invention builds a training big data platform from the perspectives of data exchange, storage, sharing, service and security. The platform is a big data platform based on a Hadoop distributed system and uses an HDFS distributed storage system. It integrates the following components: the JDBC component (Java Database Connectivity, a Java application programming interface that standardizes how client programs access a database and provides methods such as querying and updating data in the database); the ODBC component (Open Database Connectivity, created to solve data sharing among heterogeneous databases; it provides a uniform interface for heterogeneous database access and lets an application use SQL as the data access standard to reach data managed by different database management systems (DBMS)); the Kafka component (an open-source stream processing platform and high-throughput distributed publish-subscribe messaging system, which can unify online and offline message processing through the parallel loading mechanism of Hadoop and can provide real-time messages through a cluster); and the Sqoop component (an open-source tool that can import data from a relational database into the HDFS of Hadoop, or export HDFS data to a relational database).
In addition, the Apache HBase database, which handles real-time online data processing, and the Hive data warehouse tool are used as calculation execution engines. Apache HBase is a distributed, column-oriented open-source database. The Hive data warehouse tool can be used for data extraction, transformation and loading, and provides a mechanism for storing, querying and analyzing large-scale data stored in Hadoop. Hive can map structured data files to database tables, provides SQL query functionality, and can convert SQL statements into MapReduce tasks for execution.
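The idea of mapping a structured data file to a table and querying it with SQL can be illustrated with a small local sketch. This is not Hive: Python's built-in `sqlite3` and `csv` modules stand in for the warehouse, no MapReduce job is involved, and the table name, column names and sample rows are hypothetical.

```python
import csv
import io
import sqlite3


def load_csv_as_table(conn, table, csv_text):
    """Create a table whose columns come from the CSV header, then load the rows."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    columns = ", ".join(f'"{name}"' for name in header)
    marks = ", ".join("?" for _ in header)
    conn.execute(f'CREATE TABLE "{table}" ({columns})')
    conn.executemany(f'INSERT INTO "{table}" VALUES ({marks})', reader)


def average_score_by_unit(conn, table):
    """Run an SQL aggregation over the mapped table."""
    sql = (
        f'SELECT unit, AVG(CAST(score AS REAL)) FROM "{table}" '
        "GROUP BY unit ORDER BY unit"
    )
    return conn.execute(sql).fetchall()


# Hypothetical structured report file, standing in for a file stored in HDFS.
SAMPLE_CSV = "unit,score\nA,90\nA,80\nB,70\n"

conn = sqlite3.connect(":memory:")
load_csv_as_table(conn, "results", SAMPLE_CSV)
averages = average_score_by_unit(conn, "results")
```

In a real deployment the SQL would be submitted to the warehouse engine rather than executed in-process; the sketch only shows the file-to-table mapping and the query step.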
It can be understood that the existing components and tools used in the platform may be adapted and combined according to the protocols they provide, as long as they can work together within the platform and the resulting platform can provide the required multi-source heterogeneous data fusion and sharing services; this specification does not limit the specific interface protocols adopted. The standardized big data aggregation and sharing platform can be hosted on one or more computer systems, as determined by the application needs of the military training scenario.
As shown in fig. 1, the standardized big data aggregation and sharing platform (hereinafter referred to as the platform) comprises five architecture layers: the heterogeneous data aggregation layer, the data exchange integration layer, the big data storage layer, the data sharing layer and the data service layer. It also provides two supporting systems: a data security system and a data standardization system. In the platform, the multi-mode exchange method for multi-source heterogeneous big data supports "data integration" at the logic layer; efficient data storage and indexing technology supports data-oriented storage and efficient query; unified resource permission control technology supports "multi-user" access; user-oriented dynamic data services support "data development" at the service layer; and the standard system and the data security guarantee system standardize the data aggregation and sharing workflow and guarantee service continuity and data security. Together these form the technical system of the big data aggregation and sharing platform.
Referring to fig. 2, in an embodiment, the present invention further provides a multi-source heterogeneous training data fusion method, including the following steps S12 to S18:
S12, connecting a pre-built standardized big data aggregation and sharing platform to each heterogeneous data source and accessing the military training data of each heterogeneous data source; the platform is a big data platform built on a Hadoop distributed system and an HDFS distributed storage system, and the military training data comprise message streaming data, structured report data, attribute data, unstructured text and picture data, and video and voice streaming data.
It is to be understood that the standardized big data aggregation and sharing platform in this embodiment can be understood with reference to the platform embodiment described above. In a military training scenario, information from different data sources can be aggregated through different data exchange protocols according to the source data type. The platform can therefore connect to the various heterogeneous data sources through JDBC, ODBC, Kafka, Sqoop, FTP (File Transfer Protocol), ETL (Extract-Transform-Load, a data warehouse technique), XML (Extensible Markup Language), JSON (JavaScript Object Notation) and the like, access the military training data of each source, and aggregate the heterogeneous military training data to the platform.
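Choosing an access path by source data type can be sketched as a small reader registry. The sketch below is a minimal illustration, not the platform's implementation: it covers only JSON and XML payloads held in memory, and `SourceConfig`, `READERS` and `ingest` are hypothetical names; a real access layer would register JDBC, Kafka, Sqoop or FTP readers in the same way.

```python
import json
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class SourceConfig:
    name: str      # hypothetical identifier of the data source
    protocol: str  # exchange format, here "json" or "xml"
    payload: str   # stands in for data fetched over a live connection


def read_json(cfg):
    """Parse a JSON array of objects into a list of dicts."""
    return json.loads(cfg.payload)


def read_xml(cfg):
    """Turn each child element's attributes into one record."""
    root = ET.fromstring(cfg.payload)
    return [dict(child.attrib) for child in root]


# Registry mapping a protocol name to the reader that understands it.
READERS = {"json": read_json, "xml": read_xml}


def ingest(sources):
    """Read every source with its matching reader and tag each record with its origin."""
    records = []
    for cfg in sources:
        reader = READERS.get(cfg.protocol)
        if reader is None:
            raise ValueError(f"no reader registered for protocol {cfg.protocol!r}")
        for rec in reader(cfg):
            records.append({"source": cfg.name, **rec})
    return records
```

Tagging each record with its source name keeps provenance available for the later preprocessing and mapping steps.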
S14, preprocessing each item of military training data by cleaning, deduplication and denoising through the standardized big data aggregation and sharing platform.
S16, mapping each item of preprocessed military training data to a standard logical space of the HDFS distributed storage system by metadata mapping and storing it.
It can be understood that after the data are aggregated to the platform and each item of military training data has been preprocessed by cleaning, deduplication and denoising, the metadata mapping mechanism maps the data resources of the various heterogeneous systems to a standard logical space, thereby building a data sharing service system. Structured data and text and picture data are coupled with service applications without changing the original data. The detailed implementation of the preprocessing in the foregoing steps can be understood with reference to the data processing functions provided by the components and engines integrated in the platform, as described above.
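The preprocessing and mapping steps can be pictured with a minimal in-memory sketch. Everything in it is an assumption made for illustration: the record fields (`id`, `value`, `unit`), the standard field names, the content-hash deduplication and the median-absolute-deviation outlier rule are not specified by the application, which leaves the concrete techniques to the integrated components.

```python
import hashlib
from statistics import median


def clean(records):
    """Drop records that lack an id or a value; trim whitespace in string fields."""
    cleaned = []
    for rec in records:
        if rec.get("id") is None or rec.get("value") is None:
            continue
        cleaned.append({k: v.strip() if isinstance(v, str) else v for k, v in rec.items()})
    return cleaned


def deduplicate(records):
    """Keep the first occurrence of each record, compared by a content hash."""
    seen, unique = set(), []
    for rec in records:
        digest = hashlib.sha256(repr(sorted(rec.items())).encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(rec)
    return unique


def denoise(records, k=5.0):
    """Drop records whose value is more than k median absolute deviations from the median."""
    if not records:
        return []
    values = [rec["value"] for rec in records]
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        return list(records)
    return [rec for rec in records if abs(rec["value"] - med) <= k * mad]


# Hypothetical metadata: standard field name -> field name used by the source.
STANDARD_FIELDS = {"record_id": "id", "measure": "value", "org_unit": "unit"}


def map_to_standard(record, fields=STANDARD_FIELDS):
    """Build a standard-space view of a record; the source record is not modified."""
    return {std: record.get(src) for std, src in fields.items()}


def preprocess_and_map(records):
    """Clean, deduplicate and denoise, then project each record into the standard space."""
    return [map_to_standard(rec) for rec in denoise(deduplicate(clean(records)))]
```

Because the mapping builds a new view from metadata instead of rewriting the source record, the original data stay unchanged, which matches the stated premise of the mapping step.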
S18, transmitting, through the standardized big data aggregation and sharing platform, the military training data requested by each data sharing request in a set data sharing mode, according to the data sharing requests of the departments, applications and services that need to share data and the corresponding sharing permissions.
It can be understood that, to meet data sharing requirements among different departments, applications and services, the platform may open different user or node permissions according to data requirements such as data type (offline or streaming), data volume (KB, MB, GB or TB), timeliness (weekly, monthly or real-time), data security level, and whether the data must be encrypted. This ensures unified allocation of resources and control of permissions, for example management permissions for querying, uploading, synchronizing, downloading, analyzing and templating data. The set data sharing mode is the data transmission service mode that the platform adapts to the data characteristics and service scenario; any data transmission service protocol or interface known in the field may be used.
Specifically, the departments, applications and services that need shared data can submit data sharing requests to the platform according to their data sharing requirements. Based on each received request and its corresponding sharing permission, the platform transmits the requested military training data to the requesting department, application or service in a set data sharing mode.
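Permission-gated sharing can be sketched as a lookup performed before any data leave the platform. The requester names, dataset names and the permission table below are hypothetical; the action names follow the management permissions listed above (query, upload, synchronize, download, analyze, template).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ShareRequest:
    requester: str  # department, application or service making the request
    dataset: str    # name of the requested data set
    action: str     # e.g. "query", "upload", "synchronize", "download", "analyze", "template"


# Hypothetical permission table: (requester, dataset) -> allowed actions.
PERMISSIONS = {
    ("training_dept", "reports"): {"query", "download"},
    ("analysis_app", "reports"): {"query", "analyze"},
}


def authorize(request, permissions=PERMISSIONS):
    """Return True only if the requester holds the requested action on the dataset."""
    allowed = permissions.get((request.requester, request.dataset), set())
    return request.action in allowed


def serve(request, store, permissions=PERMISSIONS):
    """Return the requested data, or raise PermissionError if the request is not permitted."""
    if not authorize(request, permissions):
        raise PermissionError(
            f"{request.requester} may not {request.action} {request.dataset}"
        )
    return store[request.dataset]
```

Unknown requester and dataset pairs fall through to an empty permission set, so the default is to deny.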
The multi-source heterogeneous training data fusion method described above uses a standardized big data aggregation and sharing platform built on a Hadoop distributed system and an HDFS distributed storage system. The platform connects to each heterogeneous data source and accesses its military training data. After the data are aggregated to the platform, they are preprocessed by cleaning, deduplication and denoising, and a metadata mapping mode (mechanism) then maps the various types of heterogeneous data to a standard logical space of the HDFS distributed storage system for storage. This builds a data fusion and sharing service system and couples structured data and text and picture data with service applications without changing the original data. Finally, to meet data sharing requirements among different departments, applications and services, the platform can open different permissions according to the data requirements, ensuring unified allocation of data resources and control of permissions. The platform transmits the military training data requested by each sharing request in a set data sharing mode, according to the request and its sharing permission. This achieves the aggregation, storage and sharing of multi-source heterogeneous big data in military training; in the actual construction of military training big data applications, data fusion and sharing are efficient, and the data fusion and sharing service is highly stable and reliable.
In an embodiment, the process of accessing and storing various types of message streaming data may specifically include the following processing steps:
collecting the various message streaming data from the Kafka component of the standardized big data aggregation and sharing platform at a set time interval by means of a distributed message queue, the set time interval being any value between 50 ms and 500 ms;
and mapping the received message streaming data into a two-dimensional relational table using a Stream + Holodesk streaming big data processing framework, converting the table into an in-memory columnar format, and storing it in the Holodesk (SSD) component.
Specifically, message streaming data with high real-time requirements can be collected through a distributed message queue, and the real-time data can be processed and analyzed interactively with the Stream + Holodesk (distributed columnar storage component) streaming big data processing framework. Every 50 to 500 ms the platform receives a batch of time-series data (including the various message streaming data) from the Kafka component, maps the received data into a two-dimensional relational table, and converts the table into an in-memory columnar format for storage. The converted data are written to Holodesk (SSD) in real time so that they persist on the SSD and the columnar data on the SSD can be analyzed by the data retrieval service. This processing mode makes the aggregation and fusion of message streaming data more efficient.
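The micro-batch flow (drain a batch, map it to a two-dimensional table, convert the table to columns) can be sketched without any streaming infrastructure. In the sketch below a `collections.deque` stands in for the Kafka topic and a plain dict of lists stands in for the columnar store; the function names and message fields are hypothetical, and only the 50 ms to 500 ms bound comes from the text above.

```python
from collections import deque

MIN_INTERVAL_MS, MAX_INTERVAL_MS = 50, 500  # bounds stated for the collection interval


def validate_interval(interval_ms):
    """Reject collection intervals outside the 50 ms to 500 ms range."""
    if not MIN_INTERVAL_MS <= interval_ms <= MAX_INTERVAL_MS:
        raise ValueError(f"interval must be within 50-500 ms, got {interval_ms}")
    return interval_ms


def drain_batch(queue, max_messages):
    """Take up to max_messages from the queue; one call models one collection tick."""
    batch = []
    while queue and len(batch) < max_messages:
        batch.append(queue.popleft())
    return batch


def to_table(messages, columns):
    """Map messages to rows of a two-dimensional table with a fixed column order."""
    return [tuple(msg.get(col) for col in columns) for msg in messages]


def to_columnar(rows, columns):
    """Pivot row-oriented data into one list per column."""
    return {col: [row[i] for row in rows] for i, col in enumerate(columns)}
```

A scheduler would call `drain_batch` once per validated interval; the sketch leaves out the timer and the write to persistent storage.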
In an embodiment, the process of accessing and storing video and voice streaming data may specifically include the following processing steps:
accessing real-time video voice streaming data through front-end convergence equipment or a direct-connected camera by using a standardized big data convergence sharing platform;
after streaming media forwarding, video analysis and video structuring processing are carried out on the video and voice streaming data through the standardized big data aggregation sharing platform, the video and voice streaming data are stored in the HDFS distributed storage system.
Specifically, for video data, the platform may access real-time video streams through front-end aggregation devices such as a Network Video Recorder (NVR) or a Digital Video Recorder (DVR), or directly through data acquisition cameras in the military training scene; each training video is connected to the platform through an isolation and conversion device (such as a gateway or a switch). The accessed video stream passes through services such as streaming media forwarding, video analysis and video structuring, and the data are stored in the distributed file system; video applications can also be shared through the relevant standard protocols. This processing mode makes the aggregation and fusion of video and voice streaming data more efficient.
In one embodiment, a distributed NoSQL real-time database Hyperbase is also arranged on the HDFS distributed storage system; the real-time database Hyperbase is used for providing retrieval service when training data are shared.
The underlying data storage layer of the platform stores data in the Hadoop Distributed File System (HDFS), which adopts a triple-replica strategy to ensure the safety and reliability of the data. A distributed NoSQL (non-relational) real-time database, Hyperbase, is provided on top of HDFS to give the platform support for high-concurrency retrieval and analysis and for transactions. Through its various indexes, Hyperbase can support millisecond-level multi-dimensional retrieval queries over massive data, including global indexes, full-text indexes, and combined indexes. The platform storage layer supports low-cost storage of massive structured, semi-structured, and unstructured data, and provides basic support for the storage and use of massive historical data. Hyperbase provides high-concurrency, low-latency retrieval capability and offers a high-performance data access service to external consumers.
In one embodiment, the set data sharing modes include an FTP mode, a database direct connection mode, a distributed message system Kafka mode, a Web Services data exchange mode, a copy mode, a mail transmission mode, and a network capture mode.
Specifically, the platform may adopt different data service modes depending on the sharing requirements, data characteristics, and service scenarios of different departments, applications, and services. For example, for a data sharing request with a large data volume, a low real-time requirement, and simple service logic, the FTP mode can be adopted; for data sharing among different databases of an internal system, the database direct connection mode can be adopted; for real-time, dynamic streaming data, the distributed message system Kafka can be adopted to share data among different applications and servers; for remote calls across programming languages and operating system platforms, Web Services technology can be adopted to exchange data for sharing; and for sharing scenarios involving network failure or security requirements, data can be shared by copying, mail, network capture, and similar modes.
By adopting these data sharing modes, different kinds of data sharing can be supported efficiently and reliably.
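The selection among sharing modes described above can be sketched as a simple rule table. The following Python example is a minimal sketch under stated assumptions: the `ShareRequest` fields, the returned labels, the order in which the rules are checked, and the FTP fallback are all hypothetical choices made for illustration, not something specified by the embodiment.

```python
from dataclasses import dataclass


@dataclass
class ShareRequest:
    """Hypothetical description of a data sharing request."""
    real_time: bool = False              # real-time, dynamic streaming data
    cross_platform_call: bool = False    # remote call across languages or operating systems
    same_internal_system: bool = False   # sharing among databases of one internal system
    offline_or_restricted: bool = False  # network failure or security requirement


def choose_sharing_mode(req: ShareRequest) -> str:
    """Pick a sharing mode; the rule priority is an assumption of this sketch."""
    if req.offline_or_restricted:
        return "copy, mail or network capture"
    if req.real_time:
        return "kafka"
    if req.cross_platform_call:
        return "web services"
    if req.same_internal_system:
        return "database direct connection"
    # Assumed fallback: bulk transfer with low real-time requirement and simple logic.
    return "ftp"


mode_stream = choose_sharing_mode(ShareRequest(real_time=True))
mode_bulk = choose_sharing_mode(ShareRequest())
```

A table of explicit rules like this keeps the mapping from scenario to sharing mode in one place, so a new scenario can be added without touching the transfer code itself.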
It should be understood that, although the steps in the flowchart of fig. 2 are shown in the order indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated herein, the execution of the steps is not strictly limited to the order shown, and the steps may be performed in other orders. Moreover, at least some of the steps of fig. 2 may include multiple sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times; these sub-steps or stages are not necessarily performed sequentially, and may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
Referring to fig. 3, a multi-source heterogeneous training data fusion device 100 is further provided, which includes a data access module 13, a preprocessing module 15, a mapping fusion module 17, and a sharing service module 19. The data access module 13 is used for connecting to each heterogeneous data source through the built standardized big data convergence sharing platform and accessing the military training data of each heterogeneous data source; the standardized big data convergence sharing platform is a big data platform constructed based on a Hadoop distributed system and an HDFS distributed storage system, and the military training data comprise message streaming data, structured report data, attribute data, unstructured text and picture data, and video and voice streaming data. The preprocessing module 15 is used for performing cleaning, deduplication, and denoising preprocessing on each military training data through the standardized big data convergence sharing platform. The mapping fusion module 17 is used for mapping each preprocessed military training data to a standard logic space of the HDFS distributed storage system by using a metadata mapping mode and storing it there. The sharing service module 19 is used for transmitting, through the standardized big data convergence sharing platform and in a set data sharing mode, each military training data requested by each data sharing request, according to the data sharing requests and corresponding sharing permissions of the departments, applications, and services that need to share data.
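The cleaning, deduplication, and denoising performed by the preprocessing module 15 are not specified in detail. The following Python sketch shows one plausible reading using only the standard library; the record fields, the rule that incomplete records are dropped, and the median-based noise filter with a fixed threshold are assumptions chosen for illustration.

```python
from statistics import median


def clean(records, required):
    """Drop records missing a required field and trim whitespace in string values."""
    cleaned = []
    for rec in records:
        if any(rec.get(field) is None for field in required):
            continue
        cleaned.append({k: v.strip() if isinstance(v, str) else v for k, v in rec.items()})
    return cleaned


def deduplicate(records, key_fields):
    """Keep the first record seen for each combination of key fields."""
    seen, unique = set(), []
    for rec in records:
        key = tuple(rec[field] for field in key_fields)
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique


def denoise(records, field, max_dev):
    """Drop records whose value deviates from the median by more than max_dev."""
    mid = median(rec[field] for rec in records)
    return [rec for rec in records if abs(rec[field] - mid) <= max_dev]


# Hypothetical training records; names and values are illustrative only.
records = [
    {"id": "a1", "unit": " alpha ", "score": 71.0},
    {"id": "a1", "unit": "alpha", "score": 71.0},   # duplicate of the first record
    {"id": "a2", "unit": "bravo", "score": None},   # incomplete record
    {"id": "a3", "unit": "alpha", "score": 74.0},
    {"id": "a4", "unit": "bravo", "score": 990.0},  # implausible outlier
    {"id": "a5", "unit": "bravo", "score": 69.0},
]
result = denoise(deduplicate(clean(records, ["id", "score"]), ["id"]), "score", max_dev=20.0)
```

Cleaning runs first so that deduplication and the median are computed only over complete, normalized records.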
Through the cooperation of its modules, the multi-source heterogeneous training data fusion device 100 uses the standardized big data convergence sharing platform, built on a Hadoop distributed system and an HDFS distributed storage system, to connect to each heterogeneous data source and access its heterogeneous military training data. After the data are converged on the platform, preprocessing such as cleaning, deduplication, and denoising is performed; the various types of heterogeneous data are then mapped, through a metadata mapping mode (mechanism), to a standard logic space of the HDFS distributed storage system for storage, so as to construct a data fusion sharing service system in which structured data and text and picture data are coupled with service applications without changing the original data. Finally, for data sharing requirements among different departments, applications, and services, the platform can open different permissions according to the data requirements, ensuring unified allocation of data resources and control of permissions; the platform then transmits the military training data requested by each sharing request in a set data sharing mode, according to the request and its sharing permission. This achieves the convergence, storage, and sharing of multi-source heterogeneous big data in military training, with high data fusion sharing efficiency in the actual construction of military training big data applications and high stability and reliability of the data fusion sharing service.
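One way to picture metadata mapping combined with permission control is a catalog that records where each original data item lives and exposes it under a uniform logical path, releasing the location only to requesters holding a matching permission. The Python sketch below is an assumption-laden illustration: the `Catalog` class, the `/data_type/source/name` path layout, the per-data-type grants, and the example URI are hypothetical and are not taken from the described platform.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetadataEntry:
    """Metadata describing one data item; the original data is left untouched."""
    logical_path: str
    source: str
    data_type: str
    physical_uri: str


class Catalog:
    """Hypothetical metadata catalog mapping logical paths to physical locations."""

    def __init__(self):
        self._entries = {}
        self._grants = {}  # requester -> set of data types it may read

    def register(self, source, data_type, name, physical_uri):
        # Assumed logical layout: /<data_type>/<source>/<name>
        logical_path = f"/{data_type}/{source}/{name}"
        entry = MetadataEntry(logical_path, source, data_type, physical_uri)
        self._entries[logical_path] = entry
        return entry

    def grant(self, requester, data_type):
        self._grants.setdefault(requester, set()).add(data_type)

    def resolve(self, requester, logical_path):
        """Return the physical location only if the requester holds a matching grant."""
        entry = self._entries[logical_path]
        if entry.data_type not in self._grants.get(requester, set()):
            raise PermissionError(f"{requester} may not read {entry.data_type} data")
        return entry.physical_uri


catalog = Catalog()
entry = catalog.register(
    "range-a", "report", "2021-05.csv",
    "hdfs://example-cluster/raw/range-a/2021-05.csv",  # placeholder URI
)
catalog.grant("dept-training", "report")
uri = catalog.resolve("dept-training", entry.logical_path)
```

Because consumers address data only through the logical path, the physical file can stay where it is and in its original form, which matches the stated goal of not changing the original data.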
In one embodiment, in the process of accessing and storing various types of message streaming data, the data access module 13 may be specifically configured to collect various types of message streaming data from the Kafka component of the standardized big data convergence sharing platform at set time intervals in a distributed message queue manner; the set time interval is any value between 50 ms and 500 ms. The mapping fusion module 17 may be specifically configured to map the received message streaming data into a two-dimensional relational table by using the Stream + Holodesk streaming big data processing framework, and to convert the table into an in-memory columnar format for storage in the Holodesk (SSD) component.
In an embodiment, in a process of implementing access and storing video and voice streaming data, the data access module 13 may be specifically configured to access the real-time video and voice streaming data through a front-end convergence device or a direct-connected camera by using a standardized big data convergence sharing platform.
The mapping and fusing module 17 may be specifically configured to perform streaming media forwarding, video analysis and video structuring on the video and voice streaming data through the standardized big data aggregation and sharing platform, and then store each of the video and voice streaming data in the HDFS distributed storage system.
In an embodiment, in the process of implementing the functions of the sharing service module 19, the set data sharing modes adopted by the sharing service module include an FTP mode, a database direct connection mode, a distributed message system Kafka mode, a Web Services data exchange mode, a copy mode, a mail transmission mode, and a network capture mode.
In one embodiment, in the standardized big data convergence sharing platform, a distributed NoSQL real-time database Hyperbase is further provided on the HDFS distributed storage system; the real-time database Hyperbase is used for providing retrieval service when training data are shared.
For specific limitations of the multi-source heterogeneous training data fusion device 100, reference may be made to the corresponding limitations of the multi-source heterogeneous training data fusion method above, and details are not repeated here. The modules in the multi-source heterogeneous training data fusion device 100 may be implemented in whole or in part by software, hardware, or a combination thereof. The modules may be embedded, in hardware form, in a device with data processing capability or be independent of it, or may be stored in software form in a memory of the device, so that a processor can invoke and execute the operations corresponding to the modules; the device may be any of the various computer devices or server systems in the art.
In still another aspect, a computer device is provided, which includes a memory and a processor; the memory stores a computer program, and the processor, when executing the computer program, implements the following steps: connecting to each heterogeneous data source through the built standardized big data convergence sharing platform and accessing the military training data of each heterogeneous data source, wherein the standardized big data convergence sharing platform is a big data platform constructed based on a Hadoop distributed system and an HDFS distributed storage system, and the military training data comprise message streaming data, structured report data, attribute data, unstructured text and picture data, and video and voice streaming data; performing cleaning, deduplication, and denoising preprocessing on each military training data through the standardized big data convergence sharing platform; mapping each preprocessed military training data to a standard logic space of the HDFS distributed storage system by using a metadata mapping mode and storing it there; and transmitting, through the standardized big data convergence sharing platform and in a set data sharing mode, each military training data requested by each data sharing request, according to the data sharing requests and corresponding sharing permissions of the departments, applications, and services that need to share data.
In one embodiment, the processor may further implement the additional steps or sub-steps in the above-described embodiments of the multi-source heterogeneous training data fusion method when executing the computer program.
In yet another aspect, a computer-readable storage medium is also provided, having a computer program stored thereon; the computer program, when executed by a processor, implements the following steps: connecting to each heterogeneous data source through the built standardized big data convergence sharing platform and accessing the military training data of each heterogeneous data source, wherein the standardized big data convergence sharing platform is a big data platform constructed based on a Hadoop distributed system and an HDFS distributed storage system, and the military training data comprise message streaming data, structured report data, attribute data, unstructured text and picture data, and video and voice streaming data; performing cleaning, deduplication, and denoising preprocessing on each military training data through the standardized big data convergence sharing platform; mapping each preprocessed military training data to a standard logic space of the HDFS distributed storage system by using a metadata mapping mode and storing it there; and transmitting, through the standardized big data convergence sharing platform and in a set data sharing mode, each military training data requested by each data sharing request, according to the data sharing requests and corresponding sharing permissions of the departments, applications, and services that need to share data.
In one embodiment, when being executed by a processor, the computer program may further implement the additional steps or sub-steps in the above-mentioned embodiments of the multi-source heterogeneous training data fusion method.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program instructing the relevant hardware; the program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the method embodiments described above. Any reference to memory, storage, a database, or another medium used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus DRAM (RDRAM), and Direct Rambus DRAM (DRDRAM).
The technical features of the above embodiments can be arbitrarily combined, and for the sake of brevity, all possible combinations of the technical features in the above embodiments are not described, but should be considered as the scope of the present specification as long as there is no contradiction between the combinations of the technical features.
The above examples express only several embodiments of the present application, and although their description is relatively specific and detailed, they should not be construed as limiting the scope of the invention. It should be noted that those skilled in the art can make various changes and modifications without departing from the spirit of the present application, and all of these fall within the scope of the present application. Therefore, the protection scope of the present patent should be subject to the appended claims.

Claims (10)

1. A multi-source heterogeneous training data fusion method is characterized by comprising the following steps:
respectively connecting each heterogeneous data source by adopting a built standardized big data convergence sharing platform, and respectively accessing military training data of each heterogeneous data source; the standardized big data aggregation and sharing platform is a big data platform constructed based on a Hadoop distributed system and an HDFS distributed storage system, and the military training data comprises message streaming data, structured report data, attribute data, unstructured text and picture data and video and voice streaming data;
performing cleaning, deduplication and denoising preprocessing on each military training data through the standardized big data convergence sharing platform;
mapping each preprocessed military training data to a standard logic space of the HDFS distributed storage system by using a metadata mapping mode and storing the military training data;
and transmitting, through the standardized big data convergence sharing platform and in a set data sharing mode, each military training data requested by each data sharing request, according to the data sharing requests and corresponding sharing permissions of the departments, applications and services that need to share data.
2. The multi-source heterogeneous training data fusion method according to claim 1, wherein the process of accessing and storing various types of message streaming data comprises:
collecting various message streaming data from the Kafka component of the standardized big data aggregation sharing platform at set time intervals in a distributed message queue mode; the set time interval is any value between 50ms and 500 ms;
and mapping the received various message streaming data into a two-dimensional relational table by adopting a Stream + Holodesk streaming big data processing framework, converting the table into an in-memory columnar format, and storing it in a Holodesk (SSD) component.
3. The multi-source heterogeneous training data fusion method of claim 1, wherein the process of accessing and storing the video and voice streaming data comprises:
accessing the real-time video voice streaming data through front-end convergence equipment or a direct-connected camera by using the standardized big data convergence sharing platform;
and after the video voice streaming data are subjected to streaming media forwarding, video analysis and video structuring processing through the standardized big data aggregation sharing platform, storing each video voice streaming data to the HDFS distributed storage system.
4. The multi-source heterogeneous training data fusion method according to any one of claims 1 to 3, wherein the set data sharing modes include an FTP mode, a database direct connection mode, a distributed message system Kafka mode, a Web Services data exchange mode, a copy mode, a mail transmission mode and a network capture mode.
5. The multi-source heterogeneous training data fusion method according to claim 1, wherein a distributed NoSQL real-time database Hyperbase is further provided on the HDFS distributed storage system; the real-time database Hyperbase is used for providing retrieval service when training data are shared.
6. A multi-source heterogeneous training data fusion device, comprising:
the data access module is used for respectively connecting each heterogeneous data source by adopting a built standardized big data convergence sharing platform and respectively accessing military training data of each heterogeneous data source; the standardized big data aggregation and sharing platform is a big data platform constructed based on a Hadoop distributed system and an HDFS distributed storage system, and the military training data comprises message streaming data, structured report data, attribute data, unstructured text and picture data and video and voice streaming data;
the preprocessing module is used for performing cleaning, deduplication and denoising preprocessing on each military training data through the standardized big data convergence sharing platform;
the mapping fusion module is used for mapping each preprocessed military training data to a standard logic space of the HDFS distributed storage system by using a metadata mapping mode and storing the military training data;
and the sharing service module is used for transmitting each military training data which is requested to be shared by each data sharing request in a corresponding way in a set data sharing way through the standardized big data convergence sharing platform according to the data sharing requests of departments, applications and services which need to share data and the corresponding sharing authority.
7. The multi-source heterogeneous training data fusion device according to claim 6, wherein during the process of accessing and storing various types of message streaming data:
the data access module is used for collecting various message streaming data from the Kafka component of the standardized big data aggregation sharing platform at set time intervals in a distributed message queue mode; the set time interval is any value between 50ms and 500 ms;
the mapping fusion module is used for mapping the received various message streaming data into a two-dimensional relational table by adopting a Stream + Holodesk streaming big data processing framework, converting the table into an in-memory columnar format, and storing it in a Holodesk (SSD) component.
8. The multi-source heterogeneous training data fusion device according to claim 6, wherein the set data sharing modes include an FTP mode, a database direct connection mode, a distributed message system Kafka mode, a Web Services data exchange mode, a copy mode, a mail transmission mode, and a network capture mode.
9. A computer device comprising a memory and a processor, the memory storing a computer program, wherein the processor when executing the computer program implements the steps of the multi-source heterogeneous training data fusion method of any of claims 1 to 5.
10. A standardized big data aggregation and sharing platform is characterized by comprising a heterogeneous data aggregation layer, a data exchange integration layer, a big data storage layer, a data sharing layer and a data service layer, wherein the heterogeneous data aggregation layer, the data exchange integration layer, the big data storage layer, the data sharing layer and the data service layer are constructed on the basis of a Hadoop distributed system and an HDFS distributed storage system;
the heterogeneous data convergence layer is used for being respectively connected with each heterogeneous data source and respectively accessing military training data of each heterogeneous data source; the military training data comprises message streaming data, structured report data, attribute class data, unstructured text picture data and video voice streaming data;
the data exchange integration layer is used for preprocessing each military training data; the preprocessing comprises collecting, cleaning, removing duplication, denoising, exchanging, correlating and data comparing;
the big data storage layer is used for mapping each preprocessed military training data to a standard logic space by using a metadata mapping mode and storing the military training data;
the data sharing layer is used for outputting the military training data which are required to be shared and correspond to the data sharing requests in a set data sharing mode;
the data service layer is used for providing data development service for each military training data; the data development service comprises a retrieval query service, an uploading service, a synchronization service, a downloading service, an analysis service and a template service.
CN202110592669.4A 2021-05-28 2021-05-28 Multi-source heterogeneous training data fusion method, device and equipment Pending CN113312428A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110592669.4A CN113312428A (en) 2021-05-28 2021-05-28 Multi-source heterogeneous training data fusion method, device and equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110592669.4A CN113312428A (en) 2021-05-28 2021-05-28 Multi-source heterogeneous training data fusion method, device and equipment

Publications (1)

Publication Number Publication Date
CN113312428A true CN113312428A (en) 2021-08-27

Family

ID=77375968

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110592669.4A Pending CN113312428A (en) 2021-05-28 2021-05-28 Multi-source heterogeneous training data fusion method, device and equipment

Country Status (1)

Country Link
CN (1) CN113312428A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112768040A (en) * 2020-12-31 2021-05-07 北京谊安医疗系统股份有限公司 Multi-type equipment monitoring data fusion device and method
CN113746855A (en) * 2021-09-09 2021-12-03 国网电子商务有限公司 Data access method of energy industry cloud network and related equipment
CN117076545A (en) * 2023-10-13 2023-11-17 中国电子科技集团公司第十五研究所 Data processing method and device suitable for military operation big data

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103607469A (en) * 2013-11-28 2014-02-26 东莞中国科学院云计算产业技术创新与育成中心 Cloud platform for achieving distributed isomerous data sharing and data sharing method thereof
CA3063117A1 (en) * 2018-11-21 2020-10-17 Beijing Yutian Technology Co. Ltd An emergency resource sharing and exchange system
CN109766378A (en) * 2018-12-26 2019-05-17 吕杨 A kind of multi-source heterogeneous water conservancy hydrographic data shared system

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
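孙钦;张宏军;刘耀勋;张睿: "Design and Implementation of a Military Training Exercise Data Collection and Fusion System" (军事训练演习数据汇集与融合系统的设计与实现), 指挥控制与仿真 (Command Control & Simulation), no. 03 *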

Similar Documents

Publication Publication Date Title
Khare et al. Big data in IoT
US9582528B2 (en) System and method for operating a big-data platform
CN113312428A (en) Multi-source heterogeneous training data fusion method, device and equipment
WO2022022477A1 (en) Management operation and maintenance platform and data processing method
CN111400326B (en) Smart city data management system and method thereof
Kraska Finding the needle in the big data systems haystack
CN106815338A (en) A kind of real-time storage of big data, treatment and inquiry system
CN108536778B (en) Data application sharing platform and method
CN104111996A (en) Health insurance outpatient clinic big data extraction system and method based on hadoop platform
Gürcan et al. Real-time processing of big data streams: Lifecycle, tools, tasks, and challenges
CN104123288A (en) Method and device for inquiring data
CN111258978B (en) Data storage method
CN111221791A (en) Method for importing multi-source heterogeneous data into data lake
Jeong et al. An IoT platform for civil infrastructure monitoring
US20180107722A1 (en) Managing queries in business intelligence platforms
CN113918793A (en) Multi-source scientific and creative resource data acquisition method
CN102012946A (en) High-efficiency safety monitoring video/image data storage method
Kuderu et al. Relational database to NoSQL conversion by schema migration and mapping
CN113721856A (en) Digital community management data storage system
US9256641B1 (en) Dynamic optimization of data aggregation
Haroun et al. A big data architecture for automotive applications: PSA group deployment experience
CN114817256A (en) Quick unified storage system of thing networking
CN103678521A (en) Distributed file monitoring system based on Hadoop frame
Shona et al. A survey on the data management in IoT
CN113742313A (en) Data warehouse construction method and device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination