CN113901117A

CN113901117A - Multi-source test data leading processing method

Info

Publication number: CN113901117A
Application number: CN202111130413.8A
Authority: CN
Inventors: 段懿洋; 何晓; 刘东航; 刘翔; 田思佳
Original assignee: CSSC Systems Engineering Research Institute
Current assignee: CSSC Systems Engineering Research Institute
Priority date: 2021-09-26
Filing date: 2021-09-26
Publication date: 2022-01-07

Abstract

The invention provides a multi-source test data leading processing method comprising the following steps: carrying out data leading, identifying the data type, calling the leading model for the corresponding data type, and starting the leading processing flow. The invention provides a uniform interface for source data: an adapter automatically identifies the data type and calls the corresponding leading model. Leading models are established for structured data, semi-structured data, unstructured data, API interface data, and real-time message data, meeting the requirements of test data acquisition, integration, and storage.

Description

Multi-source test data leading processing method

Technical Field

The invention belongs to the technical field of test data management, and particularly relates to a multi-source test data leading processing method.

Background

At present, test data is the 'first-hand data' that records important activities such as equipment testing, training, and research in a timely, accurate, and truthful manner. It is high-value data of concern to the parties responsible for equipment appraisal and finalization, demonstration and research, and use and improvement. It is an important basic strategic resource supporting the construction and development of equipment, and it holds great value that has yet to be mined.

Part of the test data comes from research and development tests, verification tests, performance tests, and simulation tests; part comes from project-approval demonstration reports, general research and development requirements, daily training, exercises, and data on similar equipment in the scheme design stage. With the improvement of the informatization level of equipment testing and the progress of various testing technologies, the volume of test data acquired during testing keeps increasing, and the data is characterized by large volume, multi-source heterogeneity, diverse types, and high real-time processing requirements.

Owing to constraints of concept, technology, and system, test data is dispersed among units such as industrial development departments, test bases, and naval troops, and has long been managed separately and stored in scattered locations. A test data acquisition and access method therefore needs to be established to open data channels among the test-related units for all types of equipment, improve the efficiency of test data construction, and support the efficient management, analysis, and application of test data resources.

Existing data leading technology is aimed mainly at structured data: data is first collected and then transmitted to a data storage unit, where it is standardized, warehoused, and filed to form a data resource pool. Test data, however, comprises not only structured data but also unstructured and semi-structured data, chiefly images, numbers, text, video, audio, and other data generated during testing, as well as special types such as interface data and message data. The scale and complexity of this unstructured and semi-structured data exceed what the conventional technology can process and analyze.

Disclosure of Invention

In order to solve the above problems, the present invention provides a multi-source test data leading processing method comprising the following steps:

carrying out data leading;

identifying a data type;

calling a leading model of the corresponding type data;

and starting a leading processing flow.

Further, the leading processing flow includes: a semi-structured data leading process, a structured data leading process, an unstructured data leading process, an API interface data leading process, and a real-time message data leading process.

Further, the semi-structured data leading process includes the following steps:

connecting to a first data source library by using a first search engine;

acquiring a collection list from the first data source library according to specific service requirements;

leading the collection list to a first local database;

wherein the first search engine is constructed by integrating a MongoDB database operation component on the basis of Python technology, the semi-structured data of the first data source library is stored mainly in a MongoDB database, and the first local database is a MongoDB database.

Further, the structured data leading process includes the following steps:

connecting to a second data source library by using a second search engine;

acquiring a data table list from the second data source library according to specific service requirements;

leading the data table list to a second local database by using a field conversion engine;

wherein the second search engine is constructed by integrating various conventional database operation components on the basis of Python technology, the structured data of the second data source library is stored in conventional databases such as Oracle, MySQL, SQL Server, PostgreSQL, and domestic databases (Dameng), and the second local database is a MySQL database.

Further, the unstructured data leading process includes the following steps:

entering data via FTP;

selecting a file classification directory;

uploading the file data to the selected directory;

storing the related information of the file data to a third local database;

uploading the file data to a server through an FTP service;

wherein the third local database is a mysql database.

Further, the entry of the unstructured data is divided into an offline mode and an online mode:

the offline mode is as follows: a user selects a file classification directory and directly uploads file data;

the online mode is as follows: the files to be led and their storage locations are set through the provided service connection information, the folder is dynamically monitored for new files, and any new files are uploaded automatically.

Further, the API interface data leading process includes the following steps:

acquiring data through an interface engine;

dividing the acquired data into structured data and semi-structured data;

for the structured data, selecting a matching analysis component from a first analysis component library, parsing the data, and then leading it to a fourth local database;

storing the semi-structured data directly into the fourth local database;

wherein the first analysis component library is developed and constructed on the basis of Python technology according to the different interface contents, and the fourth local database comprises a MySQL database and a MongoDB database.

Further, the real-time message data leading process includes the following steps:

acquiring data via UDP;

selecting a matching analysis component from a second analysis component library, parsing the data, and then leading it to a fifth local database;

wherein the second analysis component library is developed and constructed for different message formats on the basis of Python technology, and the fifth local database is a MySQL database.

The beneficial effects of the invention are as follows: the invention provides a multi-source test data leading processing method that gives source data a uniform interface, through which an adapter automatically identifies the data type and calls the corresponding leading model. Leading models are established for structured data, semi-structured data, unstructured data, API interface data, and real-time message data, meeting the requirements of test data acquisition, integration, and storage.

Drawings

FIG. 1 is a schematic flow chart of the present invention;

FIG. 2 is a schematic diagram of the semi-structured data leading process of the present invention;

FIG. 3 is a schematic diagram of the structured data leading process of the present invention;

FIG. 4 is a schematic diagram of the unstructured data leading process of the present invention;

FIG. 5 is a schematic diagram of the API interface data leading process of the present invention;

FIG. 6 is a schematic diagram of the real-time message data leading process of the present invention.

Detailed Description

The invention is further described with reference to the following figures and examples, which are provided for the purpose of illustrating the general inventive concept and are not intended to limit the scope of the invention.

As shown in FIG. 1, the invention provides a multi-source test data leading processing method that uses the adapter pattern to provide a uniform data interface for access to multi-source heterogeneous test data. When data leading is carried out, the data being led passes through the adapter; the adapter automatically identifies the data type, calls the leading model for the corresponding data type, and starts the leading processing flow, thereby completing the leading of multi-source heterogeneous data. Leading models are designed according to data type and mainly comprise leading processing models for structured data, semi-structured data, unstructured data, API interface data, and real-time message data.
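By way of illustration only, the adapter step described above might be sketched in Python as follows. The source does not state how the adapter recognizes a data type, so the `kind` field of the source descriptor, the type labels, and the names `identify_data_type` and `LeadingAdapter` are all assumptions made for this sketch.

```python
from typing import Any, Callable, Dict

# Hypothetical labels for the five data types named in the description.
STRUCTURED = "structured"
SEMI_STRUCTURED = "semi_structured"
UNSTRUCTURED = "unstructured"
API = "api"
REALTIME = "realtime_message"


def identify_data_type(source: Dict[str, Any]) -> str:
    """Guess the data type from a source descriptor (illustrative rules only)."""
    kind = str(source.get("kind", "")).lower()
    if kind in {"oracle", "mysql", "sqlserver", "postgresql", "dameng"}:
        return STRUCTURED
    if kind == "mongodb":
        return SEMI_STRUCTURED
    if kind in {"ftp", "file"}:
        return UNSTRUCTURED
    if kind in {"http", "https"}:
        return API
    if kind == "udp":
        return REALTIME
    raise ValueError(f"unrecognized data source kind: {kind!r}")


class LeadingAdapter:
    """Uniform entry point that routes each source to its leading model."""

    def __init__(self) -> None:
        self._models: Dict[str, Callable[[Dict[str, Any]], Any]] = {}

    def register(self, data_type: str, model: Callable[[Dict[str, Any]], Any]) -> None:
        """Associate a leading model (any callable) with a data type."""
        self._models[data_type] = model

    def lead(self, source: Dict[str, Any]) -> Any:
        """Identify the data type, then start the matching leading flow."""
        data_type = identify_data_type(source)
        model = self._models[data_type]  # KeyError if no model is registered
        return model(source)
```

In this sketch each leading model is a plain callable, so a new data type can be supported by registering one more model without changing the adapter itself.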

The semi-structured data leading process is shown in FIG. 2.

The semi-structured source data is stored mainly in a MongoDB database. On the basis of Python technology, a MongoDB database operation component is integrated to construct a database engine, which connects directly to the database using information such as the database address, user name, and password. After the connection test succeeds, the collection list in the database is obtained.

Creating a leading model: a MongoDB database is built in the local resource pool; the user selects the required collections, and the data is led directly without field conversion. The leading mode is set to either incremental leading or overwrite leading. The frequency of the leading task is set, with both scheduled leading and one-time leading supported.

Leading log: while a leading task executes, data that causes processing errors is logged so that the cause of failure can be found and the leading procedure adjusted, which safeguards the quality of the led data.
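The two leading modes and the leading log can be illustrated with a small Python sketch. It is not the patented implementation: an in-memory dictionary keyed by `_id` stands in for the local MongoDB collection, and the reading of "incremental" as "skip documents whose `_id` is already present" is an assumption, since the source does not define the modes further. The function name `lead_collection` is hypothetical.

```python
from typing import Any, Dict, Iterable, List

Document = Dict[str, Any]


def lead_collection(source_docs: Iterable[Document],
                    target: Dict[Any, Document],
                    mode: str = "incremental") -> List[str]:
    """Copy documents into `target` (keyed by `_id`) and return the error log."""
    if mode not in ("incremental", "overwrite"):
        raise ValueError(f"unknown leading mode: {mode!r}")
    if mode == "overwrite":
        target.clear()  # overwrite leading replaces the local collection
    log: List[str] = []
    for doc in source_docs:
        if "_id" not in doc:
            # Record the failure instead of aborting the whole task.
            log.append(f"skipped document without _id: {doc!r}")
            continue
        if mode == "incremental" and doc["_id"] in target:
            continue  # assumed meaning of incremental: keep what was already led
        target[doc["_id"]] = dict(doc)  # stored as-is, no field conversion
    return log
```

Returning the log rather than raising on the first bad document mirrors the description's intent that errors are recorded and reviewed after the task runs.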

The structured data leading process is shown in FIG. 3.

The structured data is stored in conventional databases such as Oracle, MySQL, SQL Server, PostgreSQL, and domestic databases (Dameng). On the basis of Python technology, various database operation components are integrated to construct a database engine, which connects directly to the database using information such as the database address, user name, and password. If the connection test succeeds, the data table list in the database can be obtained.

Creating a leading model: the user selects the required data tables (some or all) according to specific service requirements, which determines the external source data to be led. For the leading target library, a new database is created through local original data management and serves as the output target for the led source data. The leading mode is set to either incremental leading or overwrite leading. The frequency of the leading task is set, with both scheduled leading and one-time leading supported.

Field conversion engine: because the source data is stored in different databases whose supported field types are inconsistent, the system uses Python technology to construct processing rules that convert the fields of each database into the corresponding fields in the MySQL database, thereby achieving seamless migration and ensuring the accuracy of the data.

Leading log: while a leading task executes, data that causes processing errors is logged so that the cause of failure can be found and the leading procedure adjusted, which safeguards the quality of the led data.
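A field conversion engine of the kind described above could, as a minimal sketch, be a lookup table from source field types to MySQL field types. The rules in `TYPE_RULES` below are illustrative assumptions chosen for this example, not rules taken from the source; a real engine would need many more entries and special handling for cases such as Oracle's `VARCHAR2(64 CHAR)` or SQL Server's `NVARCHAR(MAX)`.

```python
import re
from typing import Dict, Tuple

# Illustrative rules only: (source database, source type) -> MySQL type.
TYPE_RULES: Dict[Tuple[str, str], str] = {
    ("oracle", "VARCHAR2"): "VARCHAR",
    ("oracle", "NUMBER"): "DECIMAL",
    ("oracle", "CLOB"): "LONGTEXT",
    ("oracle", "DATE"): "DATETIME",
    ("postgresql", "TEXT"): "LONGTEXT",
    ("postgresql", "BOOLEAN"): "TINYINT(1)",
    ("postgresql", "BYTEA"): "LONGBLOB",
    ("sqlserver", "NVARCHAR"): "VARCHAR",
    ("sqlserver", "BIT"): "TINYINT(1)",
    ("sqlserver", "DATETIME2"): "DATETIME",
}

# Splits "NUMBER(10,2)" into the base name and the optional "(10,2)" part.
_TYPE_PATTERN = re.compile(r"^\s*([A-Za-z0-9_]+)\s*(\(.*\))?\s*$")


def convert_field_type(source_db: str, source_type: str) -> str:
    """Map one source column type to a MySQL column type."""
    match = _TYPE_PATTERN.match(source_type)
    if match is None:
        raise ValueError(f"cannot parse field type: {source_type!r}")
    base = match.group(1).upper()
    args = match.group(2) or ""
    target = TYPE_RULES.get((source_db.lower(), base))
    if target is None:
        return base + args  # no rule: pass the type through unchanged
    # Carry over length/precision unless the target type already fixes it.
    return target if "(" in target else target + args
```

For example, under these assumed rules `convert_field_type("oracle", "VARCHAR2(64)")` yields `VARCHAR(64)`, while a type with no rule is passed through so that unsupported cases surface when the target table is created rather than being silently altered.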

The unstructured data leading process is shown in FIG. 4.

The unstructured data mainly comprises reports, audio, video, and other data generated in the test, and is stored and managed via FTP. Data entry is divided into an offline mode and an online mode (FTP/HTTP, etc.).

Offline mode: the user selects a file classification directory and directly uploads the file data.

Online mode: the files to be led and their storage locations are set through the provided service connection information, the folder is dynamically monitored for new files, and any new files are uploaded and warehoused automatically.

Managing the file classification directory: the user creates the corresponding file storage directories according to service requirements and selects the files to upload under a directory.

File information storage: an entity file information table is established in the local MySQL database to record information such as the source, name, type, size, and storage path of each file. The entity file itself is uploaded to a designated directory on the server through the FTP service.
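The online mode and the file information record might be sketched as a single polling pass over a folder, as below. This is a simplified illustration under stated assumptions: the upload step is passed in as a callable (in practice it could wrap `ftplib.FTP.storbinary`), the returned dictionaries stand in for rows of the entity file information table, and the names `file_record` and `lead_new_files` are hypothetical.

```python
import os
from typing import Callable, Dict, List, Set


def file_record(path: str, source: str) -> Dict[str, object]:
    """Build the metadata row for one entity file."""
    name = os.path.basename(path)
    return {
        "source": source,
        "name": name,
        "type": os.path.splitext(name)[1].lstrip(".").lower(),
        "size": os.path.getsize(path),
        "path": path,
    }


def lead_new_files(folder: str,
                   seen: Set[str],
                   upload: Callable[[str], None],
                   source: str = "online") -> List[Dict[str, object]]:
    """One polling pass: upload files not seen before and return their records."""
    records: List[Dict[str, object]] = []
    for name in sorted(os.listdir(folder)):
        path = os.path.join(folder, name)
        if name in seen or not os.path.isfile(path):
            continue
        upload(path)    # transfer the entity file to the server
        seen.add(name)  # remember it so the next pass skips it
        records.append(file_record(path, source))
    return records
```

Calling `lead_new_files` on a timer gives the dynamic monitoring behaviour; tracking names in `seen` is the simplest possible change detection and would miss files that are modified in place.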

The API interface data leading process is shown in FIG. 5.

The API interfaces are provided by other business systems in the test system and are used to acquire the relevant data. On the basis of Python technology, an API interface engine is built that supports access methods such as POST and GET. The returned result is in JSON format and is divided into structured and semi-structured data.

Semi-structured JSON data: the data is stored directly into a MongoDB database in the local resource pool, and a database collection is constructed for each interface through local data management.

Structured JSON data: on the basis of Python technology, analysis components are developed according to the different interface contents to form an analysis component library, which parses the data and stores it. For the leading target table, the database and data table are created through local data management. The leading mode is set to either incremental leading or overwrite leading. The frequency of the leading task is set, with both scheduled leading and one-time leading supported.

Leading log: while a leading task executes, data that causes processing errors is logged so that the cause of failure can be found and the leading procedure adjusted, which safeguards the quality of the led data.
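The split between structured and semi-structured JSON results, and the lookup of an analysis component per interface, could be sketched as follows. The source does not say how the two kinds of result are told apart, so the rule in `is_structured` (a non-empty list of flat objects sharing one key set) is an assumption, as are the interface name `sensor_readings` and its fields.

```python
import json
from typing import Any, Callable, Dict, List, Tuple

Row = Dict[str, Any]

# Hypothetical analysis component library: interface name -> parser.
PARSERS: Dict[str, Callable[[List[Row]], List[Row]]] = {}


def parser_for(interface: str):
    """Decorator that registers an analysis component for one interface."""
    def register(func):
        PARSERS[interface] = func
        return func
    return register


def is_structured(payload: Any) -> bool:
    """Assumed rule: a non-empty list of flat objects with identical keys."""
    if not isinstance(payload, list) or not payload:
        return False
    if not all(isinstance(item, dict) for item in payload):
        return False
    keys = set(payload[0])
    return all(
        set(item) == keys
        and not any(isinstance(v, (dict, list)) for v in item.values())
        for item in payload
    )


def route_response(interface: str, body: str) -> Tuple[str, Any]:
    """Return the target store and the data to write for one JSON response."""
    payload = json.loads(body)
    if is_structured(payload) and interface in PARSERS:
        return "mysql", PARSERS[interface](payload)
    return "mongodb", payload  # semi-structured data is stored directly


@parser_for("sensor_readings")  # hypothetical interface name
def parse_sensor_readings(payload: List[Row]) -> List[Row]:
    return [{"sensor_id": int(p["id"]), "value": float(p["value"])} for p in payload]
```

A structured response with no registered analysis component falls back to direct storage in this sketch, which is one possible design choice rather than behaviour stated in the source.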

The real-time message data leading process is shown in FIG. 6.

The real-time message data is provided by the relevant systems of the test system and is generally sent over UDP. On the basis of Python technology, analysis components are developed for the different message formats to form an analysis component library, which handles connection, reception, and parsing of the data and produces a structured data set.

Leading configuration: according to the structure of the parsed result object, the database and data table of the leading target library are created through local data management, and the correspondence between object attributes and table fields is configured at leading time.

Leading log: connection failures, parsing errors, and similar events are logged.
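As a rough sketch of this flow, the following Python code parses a UDP datagram with a component chosen by message type, logs parsing errors, and maps object attributes to table fields. The wire format (a one-byte message type followed by a fixed-layout body), the `POSITION` message, and the column names in `FIELD_MAP` are invented for illustration; the source does not specify any message format.

```python
import socket
import struct
from typing import Any, Callable, Dict, List, Optional

# Hypothetical wire format, network byte order: 1-byte type, then the body.
HEADER = struct.Struct("!B")
POSITION = struct.Struct("!Idd")  # hypothetical body: timestamp, x, y


def parse_position(body: bytes) -> Dict[str, Any]:
    timestamp, x, y = POSITION.unpack(body)
    return {"timestamp": timestamp, "x": x, "y": y}


# Hypothetical analysis component library: message type -> parser.
PARSERS: Dict[int, Callable[[bytes], Dict[str, Any]]] = {1: parse_position}

# Hypothetical configured correspondence between attributes and table fields.
FIELD_MAP = {"timestamp": "ts", "x": "pos_x", "y": "pos_y"}


def parse_message(datagram: bytes, log: List[str]) -> Optional[Dict[str, Any]]:
    """Parse one datagram; on failure, log the error and return None."""
    try:
        (msg_type,) = HEADER.unpack_from(datagram)
        return PARSERS[msg_type](datagram[HEADER.size:])
    except (struct.error, KeyError) as exc:
        log.append(f"parse error: {exc!r}")
        return None


def to_row(obj: Dict[str, Any], field_map: Dict[str, str]) -> Dict[str, Any]:
    """Rename object attributes to their configured table fields."""
    return {field_map[k]: v for k, v in obj.items() if k in field_map}


def receive_once(sock: socket.socket, log: List[str],
                 bufsize: int = 65535) -> Optional[Dict[str, Any]]:
    """Block for one datagram on a bound UDP socket and parse it."""
    datagram, _addr = sock.recvfrom(bufsize)
    return parse_message(datagram, log)
```

Keeping `parse_message` free of socket calls lets each analysis component be checked against recorded datagrams without a live sender.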

Therefore, the invention is not limited to the specific embodiments and examples, but rather, all equivalent variations and modifications are within the scope of the invention as defined in the claims and the specification.

Claims

1. A multi-source test data leading processing method is characterized by comprising the following steps:

carrying out data leading;

identifying a data type;

calling a leading model of the corresponding type data;

and starting a leading processing flow.

2. The multi-source test data leading processing method according to claim 1, wherein the leading processing flow comprises: a semi-structured data leading process, a structured data leading process, an unstructured data leading process, an API interface data leading process, and a real-time message data leading process.

3. The multi-source test data leading processing method according to claim 2, wherein the semi-structured data leading process comprises the following steps:

connecting to a first data source library by using a first search engine;

connecting the collection list to a first local database;

4. The multi-source test data leading processing method according to claim 2, wherein the structured data leading process comprises the following steps:

connecting a second database by using a second search engine;

5. The multi-source test data leading processing method according to claim 2, wherein the unstructured data leading process comprises the following steps:

entering data via FTP;

selecting a file classification directory;

uploading the file data to the selected directory;

storing the related information of the file data to a third local database;

uploading the file data to a server through an FTP service;

wherein the third local database is a mysql database.

6. The multi-source test data leading processing method according to claim 5, wherein the entry of the unstructured data is divided into an off-line mode and an on-line mode:

7. The multi-source test data leading processing method according to claim 2, wherein the API interface data leading process comprises the following steps:

acquiring data through an interface engine;

dividing the acquired data into structured data and semi-structured data;

the semi-structured data is directly stored into a fourth local database;

8. The multi-source test data leading processing method according to claim 2, wherein the real-time message data leading process comprises the following steps:

the data is acquired through the UDP,