CN113901117A - Multi-source test data leading processing method - Google Patents

Info

Publication number
CN113901117A
Authority
CN
China
Prior art keywords
data
leading
database
structured
local database
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111130413.8A
Other languages
Chinese (zh)
Inventor
段懿洋
何晓
刘东航
刘翔
田思佳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
CSSC Systems Engineering Research Institute
Original Assignee
CSSC Systems Engineering Research Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2021-09-26
Filing date
2021-09-26
Publication date
2022-01-07
Application filed by CSSC Systems Engineering Research Institute
Priority to CN202111130413.8A
Publication of CN113901117A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/31 Indexing; Data structures therefor; Storage structures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/80 Information retrieval; Database structures therefor; File system structures therefor of semi-structured data, e.g. markup language structured data such as SGML, XML or HTML
    • G06F16/81 Indexing, e.g. XML tags; Data structures therefor; Storage structures

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The invention provides a multi-source test data leading processing method comprising the following steps: carrying out data leading (that is, bringing source data into local storage), identifying the data type, calling the leading model for that data type, and starting the leading processing flow. The invention provides a uniform interface for source data; an adapter automatically identifies the data type and calls the corresponding leading model. Leading models are established for structured data, semi-structured data, unstructured data, API data, and real-time message data, meeting the requirements for acquiring, integrating, and storing test data.

Description

Multi-source test data leading processing method
Technical Field
The invention belongs to the technical field of test data management, and particularly relates to a multi-source test data leading processing method.
Background
Test data is first-hand data that records important activities such as equipment testing, training, and research in a timely, accurate, and truthful way. It is high-value data for the parties responsible for equipment appraisal and type finalization, for demonstration and research, and for use and improvement. It is an important basic strategic resource supporting the construction and development of equipment, and it holds great value that is still to be mined.
Part of the test data comes from research and development tests, verification tests, performance tests, and simulation tests; part comes from project-approval demonstration reports, general development requirements, daily training, exercises, and data on similar equipment gathered in the scheme design stage. As the informatization of equipment testing improves and testing technologies advance, the volume of data acquired during tests keeps growing, and the data is characterized by large volume, multiple heterogeneous sources, many types, and high real-time processing requirements.
Owing to constraints of concept, technology, and organizational system, test data is scattered across units such as development and industrial departments, test bases, and naval troops, and has long been managed separately and stored in a dispersed state. A test data acquisition and access method is therefore needed to open data channels among the test-related units of various equipment, improve the efficiency of test data construction, and support the efficient management, analysis, and application of test data resources.
Existing data leading technology is aimed mainly at structured data: data is first collected and then transmitted to a data storage unit, where it is standardized, warehoused, and archived to form a data resource pool. Test data, however, includes not only structured data but also unstructured and semi-structured data, mainly images, numbers, text, video, audio, and other data generated during testing, as well as special types such as interface data and message data. The scale and complexity of this data exceed what conventional techniques can process and analyze.
Disclosure of Invention
In order to solve the above problems, the present invention provides a multi-source test data leading processing method comprising the following steps:
carrying out data leading;
identifying the data type;
calling the leading model for the corresponding data type;
and starting the leading processing flow.
Further, the leading processing flow includes: a semi-structured data leading flow, a structured data leading flow, an unstructured data leading flow, an API data leading flow, and a real-time message data leading flow.
Further, the semi-structured data leading flow includes the following steps:
connecting to a first source database by using a first search engine;
acquiring a collection list from the first source database according to specific service requirements;
leading the collections in the collection list to a first local database;
wherein the first search engine is constructed by integrating MongoDB database operation components on the basis of Python technology, the semi-structured data of the first source database is stored mainly in a MongoDB database, and the first local database is a MongoDB database.
Further, the structured data leading flow includes the following steps:
connecting to a second source database by using a second search engine;
acquiring a data table list from the second source database according to specific service requirements;
leading the tables in the data table list to a second local database by using a field conversion engine;
wherein the second search engine is constructed by integrating various conventional database operation components on the basis of Python technology, the structured data of the second source database resides in conventional databases such as Oracle, MySQL, SQL Server, PostgreSQL, and domestic databases (Dameng), and the second local database is a MySQL database.
Further, the unstructured data leading flow includes the following steps:
entering data by means of FTP;
selecting a file classification directory;
uploading the file data to the selected directory;
storing the related information of the file data in a third local database;
uploading the file data to a server through an FTP service;
wherein the third local database is a MySQL database.
Further, the entry of the unstructured data is divided into an offline mode and an online mode:
the offline mode is as follows: a user selects a file classification directory and uploads the file data directly;
the online mode is as follows: the files to be led and their storage locations are set through the provided service connection information, the folder is monitored dynamically for new files, and new files are uploaded automatically.
Further, the API data leading flow includes the following steps:
acquiring data through an interface engine;
dividing the acquired data into structured data and semi-structured data;
for the structured data, selecting a matching parsing component from a first parsing component library, parsing the data, and then leading it to a fourth local database;
storing the semi-structured data directly in the fourth local database;
wherein the first parsing component library is developed and constructed on the basis of Python technology for the different interface contents, and the fourth local database comprises a MySQL database and a MongoDB database.
Further, the real-time message data leading flow includes the following steps:
acquiring data through UDP;
selecting a matching parsing component from a second parsing component library, parsing the data, and then leading it to a fifth local database;
wherein the second parsing component library is developed and constructed on the basis of Python technology for the different message formats, and the fifth local database is a MySQL database.
The beneficial effects of the invention are as follows: the method provides a uniform interface for source data; an adapter automatically identifies the data type and calls the corresponding leading model; and leading models are established for structured data, semi-structured data, unstructured data, API data, and real-time message data, meeting the requirements for test data acquisition, integration, and storage.
Drawings
FIG. 1 is a schematic flow chart of the present invention;
FIG. 2 is a schematic diagram of the semi-structured data leading flow of the present invention;
FIG. 3 is a schematic diagram of the structured data leading flow of the present invention;
FIG. 4 is a schematic diagram of the unstructured data leading flow of the present invention;
FIG. 5 is a schematic diagram of the API data leading flow of the present invention;
FIG. 6 is a schematic diagram of the real-time message data leading flow of the present invention.
Detailed Description
The invention is further described with reference to the following figures and examples, which are provided for the purpose of illustrating the general inventive concept and are not intended to limit the scope of the invention.
As shown in FIG. 1, the invention provides a multi-source test data leading processing method that uses the adapter pattern to provide a uniform data interface for accessing multi-source heterogeneous test data. When data leading is carried out, the incoming data passes through the adapter; the adapter automatically identifies the data type, calls the leading model for that type, and starts the leading processing flow, completing the leading of the multi-source heterogeneous data. Leading models are designed per data type and mainly comprise leading processing models for structured data, semi-structured data, unstructured data, API data, and real-time message data.
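The adapter step described above can be sketched in Python, the language the description names for its engines. This is a minimal illustration and not the patented implementation: the `SourceDescriptor` class, the `KIND_TO_TYPE` table, and the placeholder leading models are all hypothetical, and the text does not say how the adapter actually recognizes a data type.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class SourceDescriptor:
    """Hypothetical description of an incoming data source."""
    kind: str      # e.g. "mongodb", "mysql", "file", "http", "udp"
    location: str  # address, path, or URL of the source


# Assumed mapping from source kind to the five data types named in the text.
KIND_TO_TYPE = {
    "mongodb": "semi_structured",
    "oracle": "structured",
    "mysql": "structured",
    "postgresql": "structured",
    "sqlserver": "structured",
    "file": "unstructured",
    "http": "api",
    "udp": "realtime_message",
}


class LeadingAdapter:
    """Uniform entry point: identify the data type, then call its model."""

    def __init__(self) -> None:
        self._models: Dict[str, Callable[[SourceDescriptor], str]] = {}

    def register(self, data_type: str,
                 model: Callable[[SourceDescriptor], str]) -> None:
        self._models[data_type] = model

    def identify(self, source: SourceDescriptor) -> str:
        try:
            return KIND_TO_TYPE[source.kind.lower()]
        except KeyError:
            raise ValueError(f"unrecognized source kind: {source.kind!r}") from None

    def lead(self, source: SourceDescriptor) -> str:
        data_type = self.identify(source)   # step: identify the data type
        model = self._models[data_type]     # step: call the matching model
        return model(source)                # step: start the leading flow


adapter = LeadingAdapter()
for data_type in set(KIND_TO_TYPE.values()):
    # Placeholder models that only report which flow they would start.
    adapter.register(
        data_type,
        lambda src, t=data_type: f"{t} flow started for {src.location}",
    )

print(adapter.lead(SourceDescriptor(kind="udp", location="0.0.0.0:9000")))
```

The caller only ever uses `adapter.lead(...)`; adding a new source type means registering one more model rather than changing the callers.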
The semi-structured data leading flow is shown in FIG. 2, in which:
the semi-structured source data is stored mainly in MongoDB databases. Based on Python technology, MongoDB database operation components are integrated to construct a database engine that connects directly to a database using information such as the database address, user, and password. After the connection test succeeds, the list of collections in the database is obtained.
Creating a leading model: a MongoDB database is built in the local resource pool; the user selects the required collections, and the data is led directly without field conversion.
Setting the leading mode: incremental leading or overwrite leading.
Setting the leading task frequency: scheduled leading and one-time leading are supported.
Leading log: while a leading task executes, data that fails during processing is logged so that the cause of the failure can be found and the leading procedure adjusted, ensuring the quality of the led data.
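The difference between the two leading modes and the role of the leading log can be shown with a small sketch. It uses plain Python lists and dictionaries as stand-ins for the source and local MongoDB collections so that it runs without a database. The function name `lead_collection` is hypothetical, and the reading of "incremental" as "copy only documents whose `_id` is not yet present locally" is an assumption; the text does not define the modes further.

```python
from typing import Any, Dict, List, Tuple

Document = Dict[str, Any]


def lead_collection(
    source_docs: List[Document],
    target: Dict[Any, Document],
    mode: str = "incremental",
) -> Tuple[int, List[str]]:
    """Copy documents into `target` (keyed by _id) without field conversion.

    Returns the number of documents written and the log entries for
    documents that could not be processed.
    """
    if mode not in ("incremental", "overwrite"):
        raise ValueError(f"unknown leading mode: {mode!r}")
    if mode == "overwrite":
        target.clear()  # overwrite: replace the whole local collection

    written = 0
    log: List[str] = []
    for position, doc in enumerate(source_docs):
        if "_id" not in doc:
            # Leading log: record the failure and carry on with the rest.
            log.append(f"document {position}: missing _id, skipped")
            continue
        if mode == "incremental" and doc["_id"] in target:
            continue  # incremental: keep what is already stored locally
        target[doc["_id"]] = dict(doc)
        written += 1
    return written, log


local = {1: {"_id": 1, "value": "old"}}
source = [{"_id": 1, "value": "new"}, {"_id": 2, "value": "b"}, {"value": "no id"}]
print(lead_collection(source, local, mode="incremental"))
```

With a real MongoDB target, the same decision would sit in front of the driver's insert or replace calls; the scheduling of scheduled versus one-time tasks is left out of this sketch.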
The structured data leading flow is shown in FIG. 3, in which:
the structured data resides in conventional databases such as Oracle, MySQL, SQL Server, PostgreSQL, and domestic databases (Dameng). Based on Python technology, various database operation components are integrated to construct a database engine that connects directly to a database using information such as the database address, user, and password. If the connection test succeeds, the list of data tables in the database can be obtained.
Creating a leading model: the user selects the required data tables (some or all) according to specific service requirements, which determines the external source data to be led. The leading target database is newly created through local original data management and serves as the output target for the led source data.
Setting the leading mode: incremental leading or overwrite leading.
Setting the leading task frequency: scheduled leading and one-time leading are supported.
Field conversion engine: because the source data is stored in different databases whose supported field types are inconsistent, the system uses Python to construct processing rules that convert the fields of each source database into the corresponding fields of the MySQL database when the user executes the leading, so that the data migrates seamlessly and remains accurate.
Leading log: while a leading task executes, data that fails during processing is logged so that the cause of the failure can be found and the leading procedure adjusted, ensuring the quality of the led data.
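A field conversion engine of the kind described above can be pictured as a rule table keyed by source database and source field type. The sketch below is illustrative only: the `FIELD_RULES` entries are a small set of common choices chosen for this example, not the rules used by the invention, and the function name `convert_field` is hypothetical. Special cases such as `NVARCHAR(MAX)`, Oracle `VARCHAR2(50 CHAR)`, or an Oracle `NUMBER` without precision would need additional rules.

```python
import re
from typing import Dict, Tuple

# Illustrative (not exhaustive) rules: (source dialect, source type) -> MySQL type.
FIELD_RULES: Dict[Tuple[str, str], str] = {
    ("oracle", "VARCHAR2"): "VARCHAR",
    ("oracle", "NUMBER"): "DECIMAL",
    ("oracle", "CLOB"): "LONGTEXT",
    ("oracle", "DATE"): "DATETIME",  # Oracle DATE also carries a time part
    ("postgresql", "TEXT"): "LONGTEXT",
    ("postgresql", "BOOLEAN"): "TINYINT(1)",
    ("postgresql", "BYTEA"): "LONGBLOB",
    ("sqlserver", "NVARCHAR"): "VARCHAR",
    ("sqlserver", "BIT"): "TINYINT(1)",
}

# Splits "VARCHAR2(50)" into the base name and the optional "(50)" part.
_TYPE_PATTERN = re.compile(r"^\s*([A-Za-z0-9_ ]+?)\s*(\(.*\))?\s*$")


def convert_field(dialect: str, source_type: str) -> str:
    """Return the MySQL field type for one source field type."""
    match = _TYPE_PATTERN.match(source_type)
    if match is None:
        raise ValueError(f"cannot read field type: {source_type!r}")
    base = match.group(1).upper()
    args = match.group(2) or ""
    target = FIELD_RULES.get((dialect.lower(), base))
    if target is None:
        # A real engine would write this to the leading log.
        raise LookupError(f"no conversion rule for {dialect} {base}")
    if "(" in target or not args:
        return target  # the target fixes its own size, or there is none to keep
    return f"{target}{args}"  # carry length or precision across


print(convert_field("oracle", "VARCHAR2(50)"))
print(convert_field("postgresql", "boolean"))
```

Keeping the rules in data rather than in branching code makes it straightforward to add a dialect, such as Dameng, by adding entries.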
The unstructured data leading flow is shown in FIG. 4, in which:
the unstructured data mainly comprises reports, audio, video, and other data generated during testing, and is stored and managed by means of FTP. Data leading is provided in an offline mode and an online mode (FTP, HTTP, etc.).
Offline mode: the user selects a file classification directory and uploads the file data directly.
Online mode: the files to be led and their storage locations are set through the provided service connection information; the folder is monitored dynamically for new files, and new files are uploaded and warehoused automatically.
File classification directory management: the user creates file storage directories according to the user's own service requirements and selects the files to upload under a directory.
File information storage: an entity file information table is created in the local MySQL database to record information such as the source, name, type, size, and storage path of each file, and the entity file is uploaded to a designated directory on the server through the FTP service.
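The online mode and the file information record can be sketched with the Python standard library. The sketch assumes simple polling of a folder to detect new files (the text does not say how monitoring is done), and the function names and the dictionary layout of the file record are hypothetical. The upload helper uses `ftplib.FTP.storbinary` from the standard library; writing the record to MySQL is left out.

```python
import mimetypes
from ftplib import FTP
from pathlib import Path
from typing import Dict, List, Set


def file_info(path: Path, source: str) -> Dict[str, object]:
    """Build one row for the entity file information table."""
    guessed_type, _ = mimetypes.guess_type(path.name)
    return {
        "source": source,
        "name": path.name,
        "type": guessed_type or "application/octet-stream",
        "size": path.stat().st_size,
        "storage_path": str(path),
    }


def find_new_files(folder: Path, seen: Set[str]) -> List[Path]:
    """Return files in `folder` not reported before, and remember them."""
    new_files = []
    for entry in sorted(folder.iterdir()):
        if entry.is_file() and entry.name not in seen:
            seen.add(entry.name)
            new_files.append(entry)
    return new_files


def upload(ftp: FTP, path: Path, remote_dir: str) -> None:
    """Upload one file to a directory on an already connected FTP server."""
    ftp.cwd(remote_dir)
    with path.open("rb") as handle:
        ftp.storbinary(f"STOR {path.name}", handle)
```

In online mode, a scheduler would call `find_new_files` at a fixed interval and pass each result to `file_info` and `upload`; in offline mode the user's own selection replaces the polling step.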
The API data leading flow is shown in FIG. 5, in which:
the APIs are provided by other business systems within the test system and are used to acquire the relevant data. Based on Python technology, an API engine is built that supports access methods such as POST and GET. The returned result is in JSON format and is divided into structured data and semi-structured data.
Semi-structured JSON data: the data is stored directly in the MongoDB database of the local resource pool, and a database collection is constructed for each interface through local data management.
Structured JSON data: parsing components are developed on the basis of Python technology for the different interface contents, forming a parsing component library that parses and stores the data. For the leading target table, the database and data table are created through local data management.
Setting the leading mode: incremental leading or overwrite leading.
Setting the leading task frequency: scheduled leading and one-time leading are supported.
Leading log: while a leading task executes, data that fails during processing is logged so that the cause of the failure can be found and the leading procedure adjusted, ensuring the quality of the led data.
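The split between structured and semi-structured JSON results could be decided in many ways, and the text does not specify one. The sketch below assumes a simple heuristic chosen for illustration: a response counts as structured when it is a non-empty list of flat records that all share the same keys, and as semi-structured otherwise. The function names are hypothetical.

```python
import json
from typing import Any

SCALARS = (str, int, float, bool, type(None))


def is_structured(payload: Any) -> bool:
    """Heuristic (an assumption): a non-empty list of flat, same-keyed records."""
    if not isinstance(payload, list) or not payload:
        return False
    if not all(isinstance(row, dict) for row in payload):
        return False
    keys = set(payload[0])
    return all(
        set(row) == keys and all(isinstance(v, SCALARS) for v in row.values())
        for row in payload
    )


def route(response_text: str) -> str:
    """Name the local store that a JSON response would be led to."""
    payload = json.loads(response_text)
    # Structured results go through a parsing component into MySQL;
    # everything else is stored directly in MongoDB.
    return "mysql" if is_structured(payload) else "mongodb"


print(route('[{"id": 1, "value": 2.5}, {"id": 2, "value": 3.0}]'))
print(route('{"id": 1, "tags": ["a", "b"]}'))
```

In practice the classification might instead be configured per interface when the leading model is created, which would avoid inspecting every response.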
The real-time message data leading flow is shown in FIG. 6, in which:
the real-time message data is provided by the relevant systems of the test system and is generally sent over UDP. Based on Python technology, parsing components are developed for the different message formats, forming a parsing component library that connects to the source, receives the data, and parses it into a structured data set.
Leading configuration: according to the structure of the parsed result object, the database and data table of the leading target are created through local data management, and the correspondence between object attributes and table fields is configured at leading time.
Leading log: connection failures, parsing errors, and similar events are logged.
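A parsing component library for UDP messages can be sketched as a registry that maps a message type to a parsing function. The message layout below (one type byte, a four-byte sequence number, and an eight-byte float, in network byte order) is invented for this example, as are the names `parser`, `parse_telemetry`, and `parse`; the real message formats are not given in the text.

```python
import struct
from typing import Callable, Dict

# Hypothetical message layout (network byte order, no padding):
#   1-byte message type | 4-byte unsigned sequence number | 8-byte float value
_TELEMETRY = struct.Struct("!BId")

Parser = Callable[[bytes], Dict[str, object]]
PARSERS: Dict[int, Parser] = {}


def parser(message_type: int) -> Callable[[Parser], Parser]:
    """Register a parsing component for one message type."""
    def decorator(func: Parser) -> Parser:
        PARSERS[message_type] = func
        return func
    return decorator


@parser(0x01)
def parse_telemetry(datagram: bytes) -> Dict[str, object]:
    _, sequence, value = _TELEMETRY.unpack(datagram)
    return {"sequence": sequence, "value": value}


def parse(datagram: bytes) -> Dict[str, object]:
    """Select the matching component by the first byte, then parse."""
    if not datagram:
        raise ValueError("empty datagram")
    try:
        component = PARSERS[datagram[0]]
    except KeyError:
        raise ValueError(
            f"no parsing component for type {datagram[0]:#04x}"
        ) from None
    return component(datagram)


sample = struct.pack("!BId", 0x01, 7, 2.5)
print(parse(sample))
```

In a receiver, each datagram returned by `socket.recvfrom` on a UDP socket would be passed to `parse`, and the keys of the returned dictionary would be mapped to table fields according to the leading configuration. Exceptions raised here correspond to the parsing errors that the leading log records.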
Therefore, the invention is not limited to the specific embodiments and examples, but rather, all equivalent variations and modifications are within the scope of the invention as defined in the claims and the specification.

Claims (8)

1. A multi-source test data leading processing method, characterized by comprising the following steps:
carrying out data leading;
identifying the data type;
calling the leading model for the corresponding data type;
and starting the leading processing flow.
2. The multi-source test data leading processing method according to claim 1, wherein the leading processing flow comprises: a semi-structured data leading flow, a structured data leading flow, an unstructured data leading flow, an API data leading flow, and a real-time message data leading flow.
3. The multi-source test data leading processing method according to claim 2, wherein the semi-structured data leading flow comprises the following steps:
connecting to a first source database by using a first search engine;
acquiring a collection list from the first source database according to specific service requirements;
leading the collections in the collection list to a first local database;
wherein the first search engine is constructed by integrating MongoDB database operation components on the basis of Python technology, the semi-structured data of the first source database is stored mainly in a MongoDB database, and the first local database is a MongoDB database.
4. The multi-source test data leading processing method according to claim 2, wherein the structured data leading flow comprises the following steps:
connecting to a second source database by using a second search engine;
acquiring a data table list from the second source database according to specific service requirements;
leading the tables in the data table list to a second local database by using a field conversion engine;
wherein the second search engine is constructed by integrating various conventional database operation components on the basis of Python technology, the structured data of the second source database resides in conventional databases such as Oracle, MySQL, SQL Server, PostgreSQL, and domestic databases (Dameng), and the second local database is a MySQL database.
5. The multi-source test data leading processing method according to claim 2, wherein the unstructured data leading flow comprises the following steps:
entering data by means of FTP;
selecting a file classification directory;
uploading the file data to the selected directory;
storing the related information of the file data in a third local database;
uploading the file data to a server through an FTP service;
wherein the third local database is a MySQL database.
6. The multi-source test data leading processing method according to claim 5, wherein the entry of the unstructured data is divided into an offline mode and an online mode:
the offline mode is as follows: a user selects a file classification directory and uploads the file data directly;
the online mode is as follows: the files to be led and their storage locations are set through the provided service connection information, the folder is monitored dynamically for new files, and new files are uploaded automatically.
7. The multi-source test data leading processing method according to claim 2, wherein the API data leading flow comprises the following steps:
acquiring data through an interface engine;
dividing the acquired data into structured data and semi-structured data;
for the structured data, selecting a matching parsing component from a first parsing component library, parsing the data, and then leading it to a fourth local database;
storing the semi-structured data directly in the fourth local database;
wherein the first parsing component library is developed and constructed on the basis of Python technology for the different interface contents, and the fourth local database comprises a MySQL database and a MongoDB database.
8. The multi-source test data leading processing method according to claim 2, wherein the real-time message data leading flow comprises the following steps:
acquiring data through UDP;
selecting a matching parsing component from a second parsing component library, parsing the data, and then leading it to a fifth local database;
wherein the second parsing component library is developed and constructed on the basis of Python technology for the different message formats, and the fifth local database is a MySQL database.
CN202111130413.8A 2021-09-26 2021-09-26 Multi-source test data leading processing method Pending CN113901117A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111130413.8A CN113901117A (en) 2021-09-26 2021-09-26 Multi-source test data leading processing method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111130413.8A CN113901117A (en) 2021-09-26 2021-09-26 Multi-source test data leading processing method

Publications (1)

Publication Number Publication Date
CN113901117A true CN113901117A (en) 2022-01-07

Family

ID=79029527

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111130413.8A Pending CN113901117A (en) 2021-09-26 2021-09-26 Multi-source test data leading processing method

Country Status (1)

Country Link
CN (1) CN113901117A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114866541A (en) * 2022-07-11 2022-08-05 太极计算机股份有限公司 Data transmission method, device and system
CN114866541B (en) * 2022-07-11 2022-09-23 太极计算机股份有限公司 Data transmission method, device and system
CN116455678A (en) * 2023-06-16 2023-07-18 中国电子科技集团公司第十五研究所 Network security log tandem method and system
CN116455678B (en) * 2023-06-16 2023-09-05 中国电子科技集团公司第十五研究所 Network security log tandem method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination