CN116521379A

CN116521379A - GPU data analysis system, electronic equipment and storage medium

Info

Publication number: CN116521379A
Application number: CN202310813507.8A
Authority: CN
Inventors: Name withheld at the inventor's request
Original assignee: Moore Threads Technology Co Ltd
Current assignee: Moore Threads Technology Co Ltd
Priority date: 2023-07-04
Filing date: 2023-07-04
Publication date: 2023-08-01

Abstract

The present disclosure relates to the field of computer technologies, and in particular to a GPU data analysis system, an electronic device and a storage medium. The GPU data analysis system includes a GPU client, a public cloud and a private cloud. The GPU client collects target GPU data, which includes GPU hardware data, GPU software data, GPU usage environment data and GPU user data; the public cloud receives and stores the target GPU data uploaded by the GPU client; and the private cloud acquires the target GPU data from the public cloud and performs data analysis on it to obtain a data analysis result. Embodiments of the disclosure collect GPU data comprehensively while ensuring data privacy and security, provide a general, standardized flow for the storage, migration and analysis of GPU data, and allow GPU performance, operation and maintenance, and other aspects to be improved based on the analysis results.

Description

GPU data analysis system, electronic equipment and storage medium

Technical Field

The disclosure relates to the field of computer technology, and in particular, to a GPU data analysis system, an electronic device, and a storage medium.

Background

Because the development of full-function graphics processing units (GPUs) in China is still at an early stage, GPU manufacturers have not yet formed a general technical solution for the big-data problems involved in the key steps of defining, collecting, storing, migrating, analyzing and visualizing data about GPU usage environments and GPU user behavior. On the one hand, there are currently no prior examples of systems that analyze GPU usage environments and GPU user behavior; on the other hand, the data analysis systems widely used in the Internet industry, which are somewhat similar, have problems with data security, data acquisition limits, data quality, data complexity, data integrity, real-time performance and the like. Therefore, building a GPU data analysis system for full-function GPU usage scenarios that supports analysis of GPU usage environments and GPU user behavior is a problem that currently needs to be solved.

Disclosure of Invention

The disclosure provides technical solutions for a GPU data analysis system, an electronic device and a storage medium.

According to an aspect of the present disclosure, there is provided a GPU data analysis system, including: a GPU client configured to collect target GPU data, where the target GPU data includes GPU hardware data, GPU software data, GPU usage environment data and GPU user data; a public cloud configured to receive and store the target GPU data uploaded by the GPU client; and a private cloud configured to acquire the target GPU data from the public cloud and perform data analysis on it to obtain a data analysis result.

In one possible implementation, the GPU client includes: a hardware GPU, a metadata collection service, a log data collection service, a trace data collection service and a performance index data collection service.

In one possible implementation, the GPU hardware data includes at least one of: configuration data, log data, trace data and performance index data of the hardware GPU; the metadata collection service is configured to collect the configuration data of the hardware GPU; the log data collection service is configured to collect the log data of the hardware GPU; the trace data collection service is configured to collect the trace data of the hardware GPU; and the performance index data collection service is configured to collect the performance index data of the hardware GPU.

In one possible implementation, the GPU client includes GPU software corresponding to the hardware GPU; the GPU software data includes at least one of: configuration data, log data and performance index data of the GPU software; the metadata collection service is configured to collect the configuration data of the GPU software; the log data collection service is configured to collect the log data of the GPU software; and the performance index data collection service is configured to collect the performance index data of the GPU software.

In one possible implementation, the GPU client includes associated hardware on which the hardware GPU relies in order to work; the GPU usage environment data includes at least one of: configuration data of the associated hardware and performance index data of the associated hardware; the metadata collection service is configured to collect the configuration data of the associated hardware; and the performance index data collection service is configured to collect the performance index data of the associated hardware.

In one possible implementation, the GPU client includes a user of the hardware GPU; the GPU user data includes at least one of: user portrait data of the user and user behavior data of the user; the metadata collection service is configured to collect the user portrait data; and the performance index data collection service is configured to collect the user behavior data.
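The implementations above assign each category of target GPU data to particular collection services. The sketch below, assuming Python and using illustrative names that do not come from the disclosure (`Service`, `COLLECTION_PLAN`, `services_for`), records that assignment as a lookup table; it only restates the mapping and says nothing about how any service works internally.

```python
from enum import Enum


class Service(Enum):
    """The four collection services named in the disclosure."""
    META = "metadata collection service"
    LOG = "log data collection service"
    TRACE = "trace data collection service"
    METRICS = "performance index data collection service"


# (data category, data source) -> services that collect it, per the claims above.
COLLECTION_PLAN = {
    ("GPU hardware data", "hardware GPU"): {Service.META, Service.LOG, Service.TRACE, Service.METRICS},
    ("GPU software data", "GPU software"): {Service.META, Service.LOG, Service.METRICS},
    ("GPU usage environment data", "associated hardware"): {Service.META, Service.METRICS},
    ("GPU user data", "user"): {Service.META, Service.METRICS},
}


def services_for(category: str) -> set:
    """Return every service that contributes to the given data category."""
    result = set()
    for (cat, _source), services in COLLECTION_PLAN.items():
        if cat == category:
            result |= services
    return result


if __name__ == "__main__":
    for (cat, source), services in COLLECTION_PLAN.items():
        names = sorted(s.value for s in services)
        print(f"{cat} ({source}): {', '.join(names)}")
```

Note that only GPU hardware data involves the trace data collection service.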

In one possible implementation, the GPU client includes a data collector; the data collector is configured to obtain the target GPU data from at least one of the metadata collection service, the log data collection service, the trace data collection service and the performance index data collection service, compress the target GPU data to obtain compressed data, and upload the compressed data to the public cloud.

In one possible implementation, the public cloud includes: a GPU data collection service, a GPU data stream generator and a public cloud distributed data stream platform; the GPU data collection service is configured to receive the compressed data, decompress it to obtain the target GPU data, and send the target GPU data to the GPU data stream generator; and the GPU data stream generator is configured to generate a GPU data stream corresponding to the target GPU data and send the GPU data stream to the public cloud distributed data stream platform.
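The disclosure does not name a serialization format or compression codec. A minimal sketch, assuming JSON records and zlib (DEFLATE) compression, shows the round trip between the client-side data collector and the GPU data collection service, plus a hypothetical stream generator that turns each record into a (topic, key, value) message of the kind a distributed data stream platform typically accepts. The topic name and the `client_id` key field are assumptions.

```python
import json
import zlib


def compress_target_data(records: list[dict]) -> bytes:
    """Data collector side: serialize target GPU data to JSON and compress it."""
    payload = json.dumps(records, separators=(",", ":")).encode("utf-8")
    return zlib.compress(payload, 6)


def decompress_target_data(blob: bytes) -> list[dict]:
    """GPU data collection service side: restore the target GPU data."""
    return json.loads(zlib.decompress(blob).decode("utf-8"))


def to_stream_messages(records: list[dict], topic: str = "gpu-data"):
    """Hypothetical GPU data stream generator: one keyed message per record.

    Keying by client id lets a partitioned stream platform keep the
    records of one client in order; topic and key field are assumptions.
    """
    for record in records:
        key = str(record.get("client_id", "unknown"))
        yield topic, key, json.dumps(record).encode("utf-8")
```

Telemetry from one client tends to repeat the same field names, which is why compressing before upload usually pays off.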

In one possible implementation, the public cloud includes: a GPU data stream consumer and a first data checker; the GPU data stream consumer is configured to obtain the GPU data stream from the public cloud distributed data stream platform, restore the GPU data stream to the target GPU data, and send the target GPU data to the first data checker; and the first data checker is configured to perform a data check on the target GPU data.
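As a sketch of the stream consumer and the first data checker, the code below restores a stream message to a record and validates it. The required fields, their types and the category names are hypothetical; the disclosure does not define a record schema. Returning a list of problems rather than a boolean makes it easy to log why a record was rejected.

```python
import json

# Hypothetical record schema; the disclosure does not specify one.
REQUIRED_FIELDS = {"client_id": str, "category": str, "timestamp": float}
VALID_CATEGORIES = {"hardware", "software", "environment", "user"}


def recover_record(message: bytes) -> dict:
    """GPU data stream consumer: turn a stream message back into a record."""
    return json.loads(message.decode("utf-8"))


def check_record(record: dict) -> list[str]:
    """First data checker: return the problems found (empty list means passed)."""
    problems = []
    for name, expected_type in REQUIRED_FIELDS.items():
        if name not in record:
            problems.append(f"missing field: {name}")
        elif not isinstance(record[name], expected_type):
            problems.append(f"wrong type for {name}")
    if record.get("category") not in VALID_CATEGORIES:
        problems.append("unknown category")
    return problems
```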

In one possible implementation, the public cloud includes: a public cloud distributed object storage and a public cloud transactional database; the GPU data stream consumer is configured to generate a target data block from the target GPU data that has passed the check and send the target data block to the public cloud distributed object storage for storage; and the GPU data stream consumer is further configured to generate target metadata corresponding to the target data block and send the target metadata to the public cloud transactional database for storage.
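A minimal sketch of how the stream consumer might build a target data block and its target metadata, assuming newline-delimited JSON blocks and a SHA-256 digest in the metadata. The field names (`block_id`, `record_count`, `size_bytes`, `sha256`, `created_at`) are illustrative assumptions. Recording the size and digest when the block is created gives downstream checkers something concrete to verify against.

```python
import hashlib
import json
import time
import uuid


def make_data_block(records: list[dict]) -> tuple[bytes, dict]:
    """Pack verified records into one data block and describe it.

    The block layout (newline-delimited JSON) and metadata fields are
    illustrative assumptions, not prescribed by the disclosure.
    """
    block = "\n".join(json.dumps(r, sort_keys=True) for r in records).encode("utf-8")
    metadata = {
        "block_id": str(uuid.uuid4()),   # key used in the distributed object storage
        "record_count": len(records),
        "size_bytes": len(block),
        "sha256": hashlib.sha256(block).hexdigest(),
        "created_at": time.time(),
    }
    return block, metadata
```

The block goes to the object storage under `block_id`, and the metadata row goes to the transactional database.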

In one possible implementation, the public cloud includes a data block handling task queue; the GPU data stream consumer is configured to generate a data block handling task according to the target metadata and send the data block handling task to the data block handling task queue.

In one possible implementation, the private cloud includes: a private cloud green zone, a private cloud yellow zone and a private cloud red zone; the security level of the private cloud red zone is higher than that of the private cloud yellow zone, and the security level of the private cloud yellow zone is higher than that of the private cloud green zone.

In one possible implementation, the private cloud green zone includes: a data block handling planner, a data block handling resource pool and a data handler; the data block handling planner is configured to obtain the data block handling task from the data block handling task queue, apply to the data block handling resource pool for a target resource corresponding to the task, and send the data block handling task and the target resource to the data handler; the data handler is configured to use the target resource to obtain the target metadata corresponding to the data block handling task from the public cloud transactional database; and the data handler is configured to use the target resource to obtain the target data block corresponding to the target metadata from the public cloud distributed object storage.
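The planner, resource pool and handler interaction can be sketched as follows, assuming the resource pool is a fixed number of transfer slots (modelled with `threading.BoundedSemaphore`) and the public cloud stores are plain dictionaries. For brevity the planner and handler run in one loop; in a deployment several handlers would typically run concurrently, which is what the slot limit is for. All names here are hypothetical.

```python
import queue
import threading


class ResourcePool:
    """Data block handling resource pool, modelled as a fixed number of slots.

    The real resource type (bandwidth, workers, credentials) is not
    specified in the disclosure; a bounded semaphore is an assumption.
    """

    def __init__(self, slots: int):
        self._sem = threading.BoundedSemaphore(slots)

    def acquire(self) -> None:
        self._sem.acquire()

    def release(self) -> None:
        self._sem.release()


def run_planner(task_queue: "queue.Queue[dict]", pool: ResourcePool,
                public_db: dict, public_store: dict) -> list[tuple[dict, bytes]]:
    """Drain the handling task queue; per task fetch metadata, then the block."""
    fetched = []
    while True:
        try:
            task = task_queue.get_nowait()
        except queue.Empty:
            break
        pool.acquire()  # apply for a target resource before transferring
        try:
            metadata = public_db[task["block_id"]]        # public cloud transactional DB
            block = public_store[metadata["block_id"]]    # public cloud object storage
            fetched.append((metadata, block))
        finally:
            pool.release()
            task_queue.task_done()
    return fetched
```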

In one possible implementation, the private cloud green zone includes: a second data checker, a private cloud green zone distributed object storage and a private cloud green zone transactional database; the data handler is configured to send the target data block and the target metadata to the second data checker; the second data checker is configured to check the target data block against the target metadata; and after the check passes, the data handler is configured to send the target data block to the private cloud green zone distributed object storage for storage and send the target metadata to the private cloud green zone transactional database for storage.
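A sketch of the second data checker, assuming the target metadata carries the block size and SHA-256 digest (an illustrative choice; the disclosure only states that the check is performed according to the target metadata). Only blocks that pass are written to the green zone stores, which are modelled here as dictionaries.

```python
import hashlib


def verify_block(block: bytes, metadata: dict) -> bool:
    """Second data checker: the block must match its recorded size and digest.

    Assumes hypothetical `size_bytes` and `sha256` metadata fields.
    """
    if len(block) != metadata.get("size_bytes"):
        return False
    return hashlib.sha256(block).hexdigest() == metadata.get("sha256")


def store_if_valid(block: bytes, metadata: dict,
                   green_store: dict, green_db: dict) -> bool:
    """Data handler: persist into the green zone only after the check passes."""
    if not verify_block(block, metadata):
        return False
    green_store[metadata["block_id"]] = block
    green_db[metadata["block_id"]] = metadata
    return True
```

Checking the cheap size comparison first avoids hashing blocks that are obviously truncated.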

In one possible implementation, the private cloud green zone includes an ETL (extract, transform, load) task queue; the data handler is configured to generate an ETL task according to the target data block and the target metadata and send the ETL task to the ETL task queue.

In one possible implementation, the private cloud yellow zone includes: an ETL device and an ETL pipeline; the ETL device is configured to obtain the ETL task from the ETL task queue, obtain the target data block corresponding to the ETL task from the private cloud green zone distributed object storage, obtain the target metadata corresponding to the ETL task from the private cloud green zone transactional database, and check the target data block against the target metadata; and after the check passes, the ETL device is configured to create an ETL pipeline instance according to the category of the target data block and send the ETL pipeline instance to the ETL pipeline, where the ETL pipeline instance contains the target data block.
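The ETL device creates a pipeline instance according to the category of the target data block. A minimal sketch of that dispatch is shown below; the category names and step lists are hypothetical, as is the `category` metadata field. Rejecting unknown categories up front keeps malformed blocks out of the ETL pipeline.

```python
from dataclasses import dataclass, field


@dataclass
class EtlPipelineInstance:
    """One unit of ETL work; it carries the target data block."""
    category: str
    block: bytes
    steps: list = field(default_factory=list)


# Hypothetical per-category step lists; the disclosure only says the instance
# is created according to the category of the target data block.
STEPS_BY_CATEGORY = {
    "log": ["parse_lines", "tag_severity"],
    "metrics": ["parse_samples", "resample"],
    "meta": ["parse_config"],
    "trace": ["parse_trace"],
}


def create_pipeline_instance(block: bytes, metadata: dict) -> EtlPipelineInstance:
    """Build an ETL pipeline instance whose steps depend on the block category."""
    category = metadata["category"]
    if category not in STEPS_BY_CATEGORY:
        raise ValueError(f"no ETL pipeline for category {category!r}")
    return EtlPipelineInstance(category, block, list(STEPS_BY_CATEGORY[category]))
```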

In one possible implementation, the private cloud yellow zone includes: a data extraction service, a data conversion service, a data loading service and an ETL data stream generator; the ETL pipeline is configured to send the ETL pipeline instance to the data extraction service; the data extraction service is configured to extract data from the target data block according to a first preset rule to obtain extracted data and send the extracted data to the data conversion service; the data conversion service is configured to convert the extracted data according to a second preset rule to obtain converted data and send the converted data to the data loading service; and the data loading service is configured to send the converted data to the ETL data stream generator.
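The first and second preset rules are not specified in the disclosure. The sketch below assumes, purely for illustration, that extraction keeps a configured set of fields from a newline-delimited JSON block, that conversion drops rows without a client id and converts a hypothetical `temp_mc` field (millidegrees Celsius) to degrees, and that loading serializes rows as messages for the ETL data stream generator.

```python
import json


def extract(block: bytes, wanted_fields: tuple[str, ...]) -> list[dict]:
    """Data extraction service; assumed first preset rule: keep selected fields."""
    rows = [json.loads(line) for line in block.decode("utf-8").splitlines() if line]
    return [{k: row[k] for k in wanted_fields if k in row} for row in rows]


def transform(rows: list[dict]) -> list[dict]:
    """Data conversion service; assumed second preset rule: drop rows without
    a client id and convert millidegrees Celsius to degrees Celsius."""
    out = []
    for row in rows:
        if "client_id" not in row:
            continue
        row = dict(row)  # do not mutate the extracted data in place
        if "temp_mc" in row:
            row["temp_c"] = row.pop("temp_mc") / 1000
        out.append(row)
    return out


def load(rows: list[dict]) -> list[bytes]:
    """Data loading service: hand rows on as messages for the ETL stream generator."""
    return [json.dumps(r, sort_keys=True).encode("utf-8") for r in rows]
```

Example: `load(transform(extract(block, ("client_id", "temp_mc"))))` chains the three services in pipeline order.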

In one possible implementation, the private cloud red zone includes a private cloud red zone distributed data stream platform; the ETL data stream generator is configured to generate an ETL data stream corresponding to the converted data and send the ETL data stream to the private cloud red zone distributed data stream platform.

In one possible implementation, the private cloud red zone includes: an ETL data consumer and a third data checker; the ETL data consumer is configured to obtain the ETL data stream from the private cloud red zone distributed data stream platform, convert the ETL data stream into ETL data, and send the ETL data to the third data checker, where the ETL data represents the GPU data obtained by performing ETL processing on the target data block and the target metadata; and the third data checker is configured to perform a data check on the ETL data.

In one possible implementation, the private cloud red zone includes: a private cloud data lake, a private cloud data warehouse and a private cloud data mart; after the check passes, the ETL data consumer is configured to send the ETL data to the private cloud data lake for storage, send the structured data contained in the ETL data to the private cloud data warehouse for storage, and send the data related to a single business line contained in the ETL data to the private cloud data mart for storage.
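One way the ETL data consumer could route verified ETL data is sketched below. The disclosure does not define "structured data" or how a business line is identified, so the sketch assumes that records whose values are all scalars count as structured, and that a hypothetical `business_line` field names the line. The lake, warehouse and marts are modelled as in-memory containers.

```python
def _is_scalar(value) -> bool:
    """Treat None and plain strings, numbers and booleans as scalar values."""
    return value is None or isinstance(value, (str, int, float, bool))


def route_etl_record(record: dict, lake: list, warehouse: list, marts: dict) -> None:
    """Route one verified ETL record to red zone storage.

    Everything lands in the data lake; "structured" (all-scalar) records also
    go to the warehouse; records tagged with a business line also go to that
    line's data mart. Both criteria are illustrative assumptions.
    """
    lake.append(record)
    if all(_is_scalar(v) for v in record.values()):
        warehouse.append(record)
    line = record.get("business_line")
    if line is not None:
        marts.setdefault(line, []).append(record)
```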

In one possible implementation, the private cloud red zone includes a data quality controller; the data quality controller is configured to perform quality detection on data in a target storage to obtain a quality detection result, where the target storage is at least one of the private cloud data lake, the private cloud data warehouse and the private cloud data mart.
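The quality criteria are left open in the disclosure. As a minimal sketch, assuming two common checks (missing-value rate per required field and duplicate-row count) and rows whose values are hashable scalars, a data quality controller could produce a report like this:

```python
def quality_report(rows: list[dict], required: tuple[str, ...]) -> dict:
    """Data quality controller: simple completeness and duplication checks.

    The chosen metrics are illustrative; rows are assumed to hold hashable
    scalar values so they can be compared for exact duplicates.
    """
    total = len(rows)
    missing = {f: sum(1 for r in rows if r.get(f) is None) for f in required}
    seen, duplicates = set(), 0
    for r in rows:
        key = tuple(sorted(r.items()))  # order-independent row fingerprint
        if key in seen:
            duplicates += 1
        else:
            seen.add(key)
    return {
        "rows": total,
        "missing_rate": {f: (n / total if total else 0.0) for f, n in missing.items()},
        "duplicates": duplicates,
    }
```

These two checks map directly onto the data loss and data duplication problems listed among the technical problems in the detailed description.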

In one possible implementation, the private cloud red zone includes a data analysis service; the data analysis service is configured to perform data analysis on data in a target storage to obtain a data analysis result, where the target storage is at least one of the private cloud data lake, the private cloud data warehouse and the private cloud data mart, and the data analysis includes at least one of: data mining, machine learning, data modeling and data segmentation.

In one possible implementation, the private cloud red zone includes a data visualization service; the data visualization service is configured to visualize data in a target storage to obtain a visualization result, where the target storage is at least one of the private cloud data lake, the private cloud data warehouse and the private cloud data mart, and the visualization result includes at least one of: visual key numbers, visual charts, visual color labels and visual elastic data.

In one possible implementation, the private cloud red zone includes a data insight service; the data insight service is configured to perform insight analysis on data in a target storage and determine a business insight indicator, where the target storage is at least one of the private cloud data lake, the private cloud data warehouse and the private cloud data mart, and the business insight indicator includes at least one of: user experience optimization, personalized recommendation, marketing strategy refinement and product features.

According to an aspect of the present disclosure, there is provided an electronic device, including: a processor; and a memory for storing processor-executable instructions; wherein the processor is configured to invoke the instructions stored in the memory to run the system described above.

According to an aspect of the present disclosure, there is provided a computer readable storage medium having stored thereon computer program instructions which, when executed by a processor, implement the above-described system.

In embodiments of the present disclosure, the GPU data analysis system includes a GPU client, a public cloud and a private cloud; the GPU client collects target GPU data comprising GPU hardware data, GPU software data, GPU usage environment data and GPU user data; the public cloud receives and stores the target GPU data uploaded by the GPU client; and the private cloud acquires the target GPU data from the public cloud and performs data analysis on it to obtain a data analysis result. With this system, the GPU client can comprehensively collect GPU hardware data, GPU software data, GPU usage environment data, GPU user data and other GPU data while ensuring data privacy and security, and the public cloud and private cloud together provide a general, standardized flow for the storage, migration and analysis of GPU data, so that GPU performance, operation and maintenance, and other aspects can subsequently be improved based on the analysis results.

It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure. Other features and aspects of the present disclosure will become apparent from the following detailed description of exemplary embodiments, which proceeds with reference to the accompanying drawings.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the disclosure and together with the description, serve to explain the technical aspects of the disclosure.

Fig. 1 illustrates a block diagram of a GPU data analysis system, according to an embodiment of the present disclosure.

Fig. 2 illustrates a flowchart of a GPU client gathering target GPU data according to an embodiment of the present disclosure.

Fig. 3 illustrates a flow chart of public cloud receiving and storing data, according to an embodiment of the present disclosure.

Fig. 4 illustrates a flowchart of the private cloud green zone receiving and storing data, according to an embodiment of the present disclosure.

Fig. 5 shows a flowchart of the private cloud yellow zone receiving data and performing ETL processing, according to an embodiment of the present disclosure.

Fig. 6 illustrates a flow chart of receiving and storing data in a private cloud red zone according to an embodiment of the present disclosure.

Fig. 7 shows a flowchart of the private cloud red zone performing data processing, according to an embodiment of the present disclosure.

Fig. 8 shows a block diagram of an electronic device, according to an embodiment of the disclosure.

Detailed Description

Various exemplary embodiments, features and aspects of the disclosure will be described in detail below with reference to the drawings. In the drawings, like reference numbers indicate identical or functionally similar elements. Although various aspects of the embodiments are illustrated in the accompanying drawings, the drawings are not necessarily drawn to scale unless specifically indicated.

The word "exemplary" is used herein to mean "serving as an example, embodiment, or illustration." Any embodiment described herein as "exemplary" is not necessarily to be construed as preferred or advantageous over other embodiments.

The term "and/or" herein merely describes an association relationship between associated objects and means that three relationships may exist; for example, "A and/or B" may represent: A alone, both A and B, or B alone. In addition, the term "at least one" herein means any one of a plurality of items or any combination of at least two of them; for example, "including at least one of A, B and C" may mean including any one or more elements selected from the group consisting of A, B and C.
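Read this way, "at least one of" n listed items means any non-empty subset of them, so there are 2^n - 1 possibilities: three for A and B (matching the "and/or" example) and seven for A, B and C. A short Python illustration using `itertools.combinations`:

```python
from itertools import combinations


def at_least_one_of(*items):
    """Every selection meant by "at least one of" the items:
    all non-empty combinations, 2**n - 1 in total."""
    return [c for r in range(1, len(items) + 1) for c in combinations(items, r)]
```

For example, `at_least_one_of("A", "B")` returns `[("A",), ("B",), ("A", "B")]`.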

Furthermore, numerous specific details are set forth in the following detailed description in order to provide a better understanding of the present disclosure. It will be understood by those skilled in the art that the present disclosure may be practiced without some of these specific details. In some instances, methods, means, elements, and circuits well known to those skilled in the art have not been described in detail in order not to obscure the present disclosure.

In constructing a GPU data analysis system for full-function GPU usage scenarios, the following technical problems need to be considered.

Data security: user behavior analysis requires collecting and analyzing users' personal data and behavior information, which raises privacy and data security concerns. Users increasingly expect transparency about, and control over, how their data is collected and used, so the legality, security and privacy protection of the data must be ensured.

Data acquisition limits: some user behavior analysis is limited to a particular platform or environment; for example, a mobile application may need user authorization before it can track user behavior.

Data quality: the quality of user behavior data may be affected by many factors, such as acquisition errors, data loss and data duplication. Incorrect or low-quality data can bias the analysis and make it inaccurate, undermining the resulting insights and decisions.

Sample bias: the collection and analysis of user behavior data may suffer from sample bias; for example, not all users' behavior may be captured at analysis time, which skews the results and can distort the understanding and prediction of the behavior of the user population as a whole.

Data complexity: user behavior data is often large-scale, diverse and complex, and processing and analyzing it can require substantial technical and resource investment, creating challenges in cost, technical capability and expertise for data storage and processing.

Data interpretation: user behavior data usually has to be turned into meaningful insights through visualization and interpretation; presenting the data in an understandable and actionable form, while keeping it accurate and valid, may require specialized analysis and interpretation skills.

Real-time requirements: some scenarios demand real-time collection and analysis of user behavior data so that decisions and optimizations can be made promptly, but real-time processing and analysis may require a more complex technical architecture and greater resource investment.

Data integrity: user behavior analysis typically integrates and analyzes multiple data sources, and ensuring the integrity and consistency of the data may require solving problems of data standardization, consolidation and integration.

System design complexity: the data flow path of such an analysis system is long and complex and involves many technical fields and problems, so the design is very difficult; for this reason, such systems have long been a means and platform through which vendors of user behavior analysis make commercial profit.

Lack of industry standards for data, applications and related technical processes: driven by commercial interests, vendors of commercial solutions do not share their technology externally, so the end-to-end technical flow of user behavior analysis is opaque and lacks universal standards.

In order to solve the above technical problems, an embodiment of the present disclosure provides a GPU data analysis system. The GPU data analysis system provided by the embodiments of the present disclosure is described in detail below.

Fig. 1 illustrates a block diagram of a GPU data analysis system, according to an embodiment of the present disclosure. As shown in fig. 1, the GPU data analysis system includes:

the GPU client is used for collecting target GPU data, wherein the target GPU data comprise GPU hardware data, GPU software data, GPU use environment data and GPU user data;

the public cloud is used for receiving and storing target GPU data uploaded by the GPU client;

and the private cloud is used for acquiring the target GPU data from the public cloud, and carrying out data analysis on the target GPU data to obtain a data analysis result.

With this GPU data analysis system, the GPU client can comprehensively collect GPU hardware data, GPU software data, GPU usage environment data, GPU user data and other GPU data while ensuring data privacy and security, and the public cloud and private cloud together provide a general, standardized flow for the storage, migration and analysis of GPU data, so that GPU performance, operation and maintenance, and other aspects can subsequently be improved based on the analysis results.

The GPU client is equipped with a hardware GPU and serves the user of that hardware GPU. While complying with data privacy and data security requirements, the GPU client can comprehensively collect four major types of target GPU data for subsequent analysis, namely GPU hardware data, GPU software data, GPU usage environment data and GPU user data, through its metadata collection service, log data collection service, trace data collection service and performance index data collection service.

The metadata collection service collects configuration data of the hardware GPU, which may include hardware information such as the model, architecture, video memory capacity and speed of the hardware GPU. Once the hardware GPU has been configured for the first time, the metadata collection service only needs to collect its configuration information once as long as the configuration is not updated; if the configuration of the hardware GPU is later updated, the metadata collection service can collect the configuration information again.
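The disclosure states only that unchanged configuration is collected once and re-collected after an update. A minimal sketch of one way to detect that, assuming the configuration is a JSON-serializable dictionary and using a SHA-256 fingerprint of its canonical JSON form (the class name and field names are hypothetical):

```python
import hashlib
import json
from typing import Optional


class ConfigCollector:
    """Report configuration only on first collection or after it changes."""

    def __init__(self) -> None:
        self._last_digest: Optional[str] = None

    def collect(self, config: dict) -> Optional[dict]:
        # sort_keys makes the fingerprint independent of key order.
        canonical = json.dumps(config, sort_keys=True).encode("utf-8")
        digest = hashlib.sha256(canonical).hexdigest()
        if digest == self._last_digest:
            return None          # unchanged since the last report: nothing to send
        self._last_digest = digest
        return config            # first collection or updated configuration
```

Storing only a digest keeps the collector's state small; persisting it across restarts would additionally avoid a resend after every reboot.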

The log data collection service collects log data of the hardware GPU, giving priority to abnormal and error log records, such as records of GPU driver crashes and GPU resets. Analyzing these records can help GPU manufacturers understand the GPU configuration and performance potential of the system. To obtain more complete log records, the log data collection service may collect the log data of the hardware GPU in real time.

The trace data collection service collects trace data of the hardware GPU. During data analysis, the trace data can be used to reproduce the environment in which the GPU software corresponding to the hardware GPU encountered an exception, helping GPU manufacturers solve the problem.

The performance index data collection service collects performance index data of the hardware GPU, which may be real-time information such as the working state and load of the hardware GPU.

Fig. 2 illustrates a flowchart of a GPU client gathering target GPU data according to an embodiment of the present disclosure. As shown in fig. 2, the GPU client (GPU Client) includes: a hardware GPU, a metadata collection service (Meta), a log data collection service (Log), a trace data collection service (Trace) and a performance index data collection service (Metrics). The metadata collection service collects configuration data of the hardware GPU, the log data collection service collects its log data, the trace data collection service collects its trace data, and the performance index data collection service collects its performance index data.

The GPU client includes GPU software corresponding to the hardware GPU, where the GPU software may be an application program that runs on the hardware GPU.

The metadata collection service collects configuration data of the GPU software. Once the GPU software has been configured for the first time, the metadata collection service only needs to collect its configuration information once as long as the configuration is not updated; if the configuration of the GPU software is later updated, the metadata collection service can collect the configuration information again.

The log data collection service collects log data of the GPU software, that is, the log records produced while the GPU software runs, giving priority to warning-level and error-level records. Analyzing these records supports fixing vulnerabilities in the GPU software and optimizing its performance. To obtain more complete log records, the log data collection service may collect the log data of the GPU software in real time.
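Prioritizing warning-level and error-level records amounts to a severity threshold. The sketch below assumes log records arrive as `(level, message)` pairs using the numeric levels of Python's standard `logging` module; GPU software may well use its own severity scheme, so this mapping is an assumption.

```python
import logging

PRIORITY_LEVEL = logging.WARNING  # WARNING, ERROR and CRITICAL are prioritized


def split_by_priority(records: list[tuple[int, str]]):
    """Separate priority log records from routine ones, preserving order."""
    priority = [r for r in records if r[0] >= PRIORITY_LEVEL]
    routine = [r for r in records if r[0] < PRIORITY_LEVEL]
    return priority, routine
```

The priority list can be uploaded first or kept longer, while routine records remain available for fuller real-time collection.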

The performance index data collection service collects performance index data of the GPU software, which may include frame rate, rendering time, computation time and the like. Analyzing these data helps GPU manufacturers understand how efficiently the GPU software runs on the hardware GPU and optimize its performance. The performance index data collection service can collect performance index data in real time while the GPU software is running.
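Frame rate and rendering time are related: average frame rate is total frames divided by total time, not the mean of per-frame rates. For frame times of 10, 20, 30 and 40 ms the average is 4 frames / 0.1 s = 40 fps, whereas averaging the instantaneous rates (100, 50, 33.3 and 25 fps) gives about 52 fps and overweights the fast frames. A small sketch, assuming per-frame rendering times in milliseconds:

```python
def frame_stats(frame_times_ms: list[float]) -> dict:
    """Derive frame-rate metrics from per-frame rendering times (ms)."""
    if not frame_times_ms:
        raise ValueError("no frames")
    total_ms = sum(frame_times_ms)
    return {
        "avg_frame_time_ms": total_ms / len(frame_times_ms),
        "avg_fps": 1000.0 * len(frame_times_ms) / total_ms,  # frames per second
        "worst_frame_time_ms": max(frame_times_ms),           # longest stall
    }
```

The worst frame time is included because a single long frame is felt as stutter even when the average frame rate looks healthy.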

As shown in fig. 2, the GPU client includes: GPU Software (GPU Software) corresponding to the hardware GPU. The metadata collection service collects configuration data of the GPU software, the log data collection service collects log data of the GPU software, and the performance index data collection service collects performance index data of the GPU software.

In one possible implementation, the GPU client includes associated hardware on which the hardware GPU relies in order to work; the GPU usage environment data includes at least one of: configuration data of the associated hardware and performance index data of the associated hardware; the metadata collection service is configured to collect the configuration data of the associated hardware; and the performance index data collection service is configured to collect the performance index data of the associated hardware.

The GPU client includes associated hardware on which the hardware GPU relies in order to work, such as a central processing unit (CPU) and a hard disk; this hardware reflects the usage environment of the hardware GPU.

The metadata collection service collects configuration data of the associated hardware, which may include CPU, memory, disk and network configurations. Once the associated hardware has been configured for the first time, the metadata collection service only needs to collect its configuration information once as long as the configuration is not updated; if the configuration of the associated hardware is later updated, the metadata collection service can collect the configuration information again. The performance index data collection service collects performance index data of the associated hardware.
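A minimal sketch of collecting some associated-hardware configuration fields with the Python standard library (`os.cpu_count`, `platform.machine`, `platform.system`, `platform.release`, `shutil.disk_usage`). Memory size and network configuration are omitted because the standard library has no portable call for them; the field names and the default disk path are assumptions.

```python
import os
import platform
import shutil


def associated_hardware_config(disk_path: str = "/") -> dict:
    """Collect a few portable associated-hardware configuration fields."""
    usage = shutil.disk_usage(disk_path)
    return {
        "cpu_arch": platform.machine(),           # e.g. "x86_64"; may be ""
        "cpu_logical_cores": os.cpu_count(),      # None if undeterminable
        "os": platform.system(),
        "os_release": platform.release(),
        "disk_total_bytes": usage.total,
    }
```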

Subsequent analysis of the GPU usage environment data, which includes at least the configuration data and performance index data of the associated hardware, enables detection and optimization of whole-machine performance, which greatly helps prevent whole-machine performance bottlenecks from degrading the experience of using the hardware GPU.

As shown in fig. 2, the GPU client includes the associated hardware (Hardware Environment) on which the hardware GPU relies in order to work. The metadata collection service collects configuration data of the associated hardware, and the performance index data collection service collects its performance index data.

In one possible implementation, the GPU client includes a user of the hardware GPU; the GPU user data includes at least one of user profile data of the user and user behavior data of the user; the metadata collection service is used to collect the user profile data; and the performance index data collection service is used to collect the user behavior data.

The GPU client includes a user of the hardware GPU. The metadata collection service collects user profile data, which may include gender, age, occupation, and the like. The performance index data collection service collects user behavior data, which may include operating habits, feedback, opinions, and the like. Subsequent data analysis based on GPU user data that includes at least the user profile data and the user behavior data can help improve the functionality and performance of the hardware GPU.

As shown in fig. 2, the GPU client includes the user of the hardware GPU (User). The metadata collection service collects the user profile data of the user, and the performance index data collection service collects the user behavior data of the user.

In one possible implementation, the GPU client includes a data collector; the data collector is used to obtain the target GPU data from at least one of the metadata collection service, the log data collection service, the trace data collection service, and the performance index data collection service; and the data collector is used to compress the target GPU data to obtain compressed data and upload the compressed data to the public cloud.

As shown in fig. 2, the GPU client includes a data collector (Data Collector). The data collector acquires log data from the log data collection service in real time, persists the log data from main memory to disk, compresses it, and periodically uploads the compressed data to the public cloud. The data collector collects performance index data from the performance index data collection service in real time and uploads it to the public cloud in real time. The data collector acquires trace data from the trace data collection service and uploads it to the public cloud only with the user's permission. The data collector acquires configuration data from the metadata collection service and uploads it to the public cloud; generally, the configuration data is collected and sent when the system starts for the first time, and is re-acquired and re-uploaded whenever the configuration is subsequently updated.

In the GPU data analysis system of the embodiments of the present disclosure, the GPU client collects data related to user privacy and uploads it to the public cloud only with the user's permission, which addresses the problem of limits on data acquisition while effectively ensuring data privacy and data security.
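The routing policy of the data collector described above (batched and compressed logs, real-time performance metrics, consent-gated trace data, configuration sent on start-up or after a change) might be sketched as follows. This is an in-memory illustration under stated assumptions: lists stand in for the on-disk log buffer and the upload channel, gzip over JSON is an assumed compression format, and `DataCollector` and its fields are hypothetical names.

```python
import gzip
import json
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DataCollector:
    """Hypothetical sketch of the client-side routing policy; not the disclosed implementation."""
    trace_consent: bool = False                                  # user permission for privacy-related data
    log_buffer: List[dict] = field(default_factory=list)         # stands in for logs spilled to disk
    immediate_uploads: List[dict] = field(default_factory=list)  # records sent without batching

    def accept(self, kind: str, record: dict) -> None:
        if kind == "log":
            self.log_buffer.append(record)          # batched, compressed, uploaded periodically
        elif kind == "metric":
            self.immediate_uploads.append(record)   # uploaded in real time
        elif kind == "trace":
            if self.trace_consent:                  # dropped unless the user has allowed it
                self.immediate_uploads.append(record)
        elif kind == "config":
            self.immediate_uploads.append(record)   # sent on first start or after a change
        else:
            raise ValueError(f"unknown data kind: {kind}")

    def flush_logs(self) -> Optional[bytes]:
        """Compress buffered logs into one payload for the periodic upload."""
        if not self.log_buffer:
            return None
        payload = gzip.compress(json.dumps(self.log_buffer).encode("utf-8"))
        self.log_buffer.clear()
        return payload
```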

The public cloud receives and stores the target GPU data uploaded by the GPU client, in preparation for transferring the target GPU data to the more secure private cloud.

In one possible implementation, the public cloud includes a GPU data collection service, a GPU data stream generator, and a public cloud distributed data stream platform; the GPU data collection service is used to receive the compressed data, decompress it to obtain the target GPU data, and send the target GPU data to the GPU data stream generator; and the GPU data stream generator is used to generate a GPU data stream corresponding to the target GPU data and send the GPU data stream to the public cloud distributed data stream platform.

Fig. 3 illustrates a flowchart of the public cloud receiving and storing data according to an embodiment of the present disclosure. As shown in fig. 3, the public cloud (Public clouds) includes a GPU data collection service (GPU Data Collection Service), a GPU data stream generator (GPU Data Stream Producer), and a public cloud distributed data stream platform (Public Cloud Distributed Streaming Platform).

As shown in fig. 3, the GPU data collection service receives the compressed data uploaded by the GPU client, decompresses it to obtain the target GPU data, that is, the original GPU data, and then sends the target GPU data to the GPU data stream generator.
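The decompression step of the GPU data collection service is the inverse of the client's compression. A minimal sketch, assuming a gzip-compressed JSON payload (the disclosure does not specify a format) and a callback `send_to_producer` that stands in for the GPU data stream generator; `handle_upload` is a hypothetical name.

```python
import gzip
import json
from typing import Any, Callable


def handle_upload(compressed: bytes, send_to_producer: Callable[[Any], None]) -> Any:
    """Hypothetical GPU data collection service handler: decompress, decode, forward."""
    raw = gzip.decompress(compressed)                   # restore the original bytes
    target_gpu_data = json.loads(raw.decode("utf-8"))   # assumed JSON payload format
    send_to_producer(target_gpu_data)                   # hand off to the GPU data stream generator
    return target_gpu_data
```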

The GPU data stream generator generates a GPU data stream corresponding to the target GPU data and sends it to the public cloud distributed data stream platform, which buffers the data flow; this processing mode plays an important role in scenarios involving single large transfers that occupy bandwidth for a long time, or highly concurrent data transmissions. As shown in fig. 3, the GPU data stream generator generates the GPU data stream and sends it to the public cloud distributed data stream platform.

In one possible implementation, the public cloud includes a GPU data stream consumer and a first data checker; the GPU data stream consumer is used to acquire the GPU data stream from the public cloud distributed data stream platform, restore it to the target GPU data, and send the target GPU data to the first data checker; and the first data checker is used to perform data verification on the target GPU data.

As shown in fig. 3, the public cloud includes a GPU data stream consumer (GPU Data Stream Consumer) and a first data checker (Raw Data Validator). As shown in fig. 3, the GPU data stream consumer obtains the GPU data stream from the public cloud distributed data stream platform. In an example, the GPU data stream consumer sequentially obtains GPU data streams from the public cloud distributed data stream platform according to a predetermined consumption rule, which may be set according to the actual situation and is not specifically limited in the present disclosure.

The GPU data stream consumer restores the obtained GPU data stream to the target GPU data, that is, the original GPU data, and then, as shown in fig. 3, sends the target GPU data to the first data checker so that the first data checker performs data verification on it. The specific form of the data verification can be chosen flexibly according to actual needs and is not specifically limited in the present disclosure.
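The buffering role of the distributed data stream platform, together with the consumer and the first data checker, can be modeled in-process. In this sketch a bounded `queue.Queue` stands in for the distributed platform (a production system would use a dedicated streaming platform), records are JSON-encoded, and the field-presence rule in `validate` (with its hypothetical `REQUIRED_FIELDS`) is only a placeholder, since the disclosure leaves the form of verification open.

```python
import json
import queue
from typing import List, Tuple

# An in-process bounded queue stands in for the distributed data stream platform.
stream_platform: queue.Queue = queue.Queue(maxsize=1000)

REQUIRED_FIELDS = ("device_id", "timestamp")  # hypothetical validation rule


def produce(record: dict) -> None:
    """GPU data stream generator: serialize a record into the stream (blocks when full)."""
    stream_platform.put(json.dumps(record).encode("utf-8"))


def validate(record: dict) -> bool:
    """First data checker: a minimal field-presence check standing in for the real rule."""
    return all(k in record for k in REQUIRED_FIELDS)


def consume_all() -> Tuple[List[dict], List[dict]]:
    """GPU data stream consumer: drain the stream in FIFO order and split records by validity."""
    passed, rejected = [], []
    while True:
        try:
            message = stream_platform.get_nowait()
        except queue.Empty:
            break
        record = json.loads(message.decode("utf-8"))  # restore the target GPU data
        (passed if validate(record) else rejected).append(record)
    return passed, rejected
```

The bounded queue is what absorbs bursts: a producer can enqueue faster than the consumer drains, up to `maxsize`, after which `put` blocks and applies back-pressure.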

In one possible implementation, the public cloud includes a public cloud distributed object storage and a public cloud transactional database; the GPU data stream consumer is used to generate a target data block from the target GPU data after the verification passes and send the target data block to the public cloud distributed object storage for storage; and the GPU data stream consumer is used to generate target metadata corresponding to the target data block and send the target metadata to the public cloud transactional database for storage.

As shown in fig. 3, the public cloud includes a public cloud distributed object storage (Public Cloud Distributed Object Storage) and a public cloud transactional database (Public Cloud Transactional Database). After the target GPU data passes verification, the GPU data stream consumer generates a target data block from it and sends the target data block to the public cloud distributed object storage for storage, so that the data can later be transferred to the private cloud. The GPU data stream consumer also generates target metadata corresponding to the target data block and sends the target metadata to the public cloud transactional database for storage, so that the data can be verified when it is later transferred to the private cloud.

In one possible implementation, the public cloud includes a data block handling task queue; and the GPU data stream consumer is used to generate a data block handling task from the target metadata and send the data block handling task to the data block handling task queue.

As shown in fig. 3, the public cloud includes a data block handling task queue (Bulk Moving Task Queue). The GPU data stream consumer generates a data block handling task from the target metadata and sends it to the data block handling task queue, so that the data can be transferred to the private cloud.
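A sketch of how the GPU data stream consumer might turn verified records into a target data block, its target metadata, and a data block handling task. The JSON-lines block layout, the metadata fields (`block_id`, `size`, `sha256`, `record_count`), and the task format are all assumptions made for illustration; recording a checksum in the metadata is one plausible way to support the later verification during transfer to the private cloud.

```python
import hashlib
import json
import uuid
from typing import List, Tuple


def make_block(records: List[dict]) -> Tuple[bytes, dict, dict]:
    """Hypothetical sketch: pack verified records into a data block, describe it with
    metadata, and derive a data block handling task from that metadata."""
    block = "\n".join(json.dumps(r, sort_keys=True) for r in records).encode("utf-8")
    block_id = str(uuid.uuid4())
    metadata = {
        "block_id": block_id,                         # key in the object storage (assumed)
        "size": len(block),
        "sha256": hashlib.sha256(block).hexdigest(),  # lets the receiver verify the transfer
        "record_count": len(records),
    }
    task = {"block_id": block_id, "action": "move_to_private_cloud"}  # hypothetical task shape
    return block, metadata, task
```

In this sketch the block would go to the distributed object storage under `block_id`, the metadata to the transactional database, and the task to the data block handling task queue.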

In an example, the public cloud further includes a data encryption service that encrypts the target data block and the target metadata to ensure the security of the subsequently stored data. The encryption method can be chosen flexibly according to actual needs and is not specifically limited in the present disclosure.

The private cloud (Private Cloud) includes a private cloud green zone (Private Cloud Green Zone), a private cloud yellow zone (Private Cloud Yellow Zone), and a private cloud red zone (Private Cloud Red Zone). To avoid the potential data security threats of the public cloud, the private cloud is divided into three zones with different security levels. The private cloud green zone, which has the lowest security level of the three, is used to transfer public cloud data to the private cloud in a planned manner; the private cloud yellow zone, which has an intermediate security level, performs the corresponding processing on the data; and the private cloud red zone, which has the highest security level, completes persistent data storage, the corresponding data analysis, and other processing.

In one possible implementation, the private cloud green zone includes a data block handling planner, a data block handling resource pool, and a data handler; the data block handling planner is used to acquire a data block handling task from the data block handling task queue, apply to the data block handling resource pool for the target resource corresponding to the data block handling task, and send the data block handling task and the target resource to the data handler; the data handler is used to acquire, using the target resource, the target metadata corresponding to the data block handling task from the public cloud transactional database; and the data handler is used to acquire, using the target resource, the target data block corresponding to the target metadata from the public cloud distributed object storage.

Fig. 4 illustrates a flowchart of the private cloud green zone receiving and storing data according to an embodiment of the present disclosure. As shown in fig. 4, the private cloud green zone includes a data block handling planner (Bulk Moving Scheduler), a data block handling resource pool (Bulk Moving Resource Pool), and data handlers (Bulk Moving Workers). As shown in fig. 4, the data block handling planner obtains data block handling tasks from the data block handling task queue. The planner queries whether the data block handling resource pool has enough resources to complete a data block handling task and, if so, applies to the pool for the target resource corresponding to that task, as shown in fig. 4. The target resource may include computing resources, storage resources, network resources, and the like.

As shown in fig. 4, the data block handling planner sends the data block handling task and the target resource to the data handler; the data handler uses the target resource to acquire the target metadata corresponding to the data block handling task from the public cloud transactional database, and to acquire the target data block corresponding to the target metadata from the public cloud distributed object storage.
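The planner's check-then-apply interaction with the data block handling resource pool could look like the following all-or-nothing allocation sketch. The resource names, the dictionary-based counters, the `needs` field on a task, and the rule of deferring a task when the pool is short are illustrative assumptions, not details from the disclosure.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class ResourcePool:
    """Hypothetical data block handling resource pool with simple counters."""
    free: Dict[str, int]  # e.g. {"cpu": 8, "mem_gb": 32, "net_mbps": 1000}

    def try_acquire(self, need: Dict[str, int]) -> Optional[Dict[str, int]]:
        # Grant only if every requested resource type is available (all-or-nothing).
        if any(self.free.get(k, 0) < v for k, v in need.items()):
            return None
        for k, v in need.items():
            self.free[k] -= v
        return dict(need)

    def release(self, grant: Dict[str, int]) -> None:
        # Return a finished task's resources to the pool.
        for k, v in grant.items():
            self.free[k] += v


def schedule(task: dict, pool: ResourcePool) -> Optional[Tuple[dict, Dict[str, int]]]:
    """Planner step: pair a task with its target resource, or defer it if the pool is short."""
    grant = pool.try_acquire(task["needs"])
    return (task, grant) if grant is not None else None
```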

In one possible implementation, the private cloud green zone includes a second data checker, a private cloud green zone distributed object storage, and a private cloud green zone transactional database; the data handler is used to send the target data block and the target metadata to the second data checker; the second data checker is used to perform data verification on the target data block according to the target metadata; and the data handler is used, after the verification passes, to send the target data block to the private cloud green zone distributed object storage for storage and the target metadata to the private cloud green zone transactional database for storage.

As shown in fig. 4, the private cloud green zone includes a second data checker (Bulk Moving Validator), a private cloud green zone distributed object storage (Private Cloud Green Zone Distributed Object Storage), and a private cloud green zone transactional database (Private Cloud Green Zone Transactional Database). As shown in fig. 4, the data handler sends the target data block and the target metadata to the second data checker, which verifies the target data block according to the target metadata. The specific form of the data verification can be chosen flexibly according to actual needs and is not specifically limited in the present disclosure.

After the target data block passes verification against the target metadata, as shown in fig. 4, the data handler sends the target data block to the private cloud green zone distributed object storage for storage, and sends the target metadata to the private cloud green zone transactional database for storage.
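Assuming the target metadata carries the block's size and SHA-256 digest (an assumption; the disclosure leaves the verification method open), the second data checker and the subsequent storage step might be sketched as follows, with dictionaries standing in for the green zone object storage and transactional database. The function names are hypothetical.

```python
import hashlib
from typing import Dict


def verify_block(block: bytes, metadata: dict) -> bool:
    """Hypothetical second data checker: compare size and SHA-256 with the stored metadata."""
    return (len(block) == metadata["size"]
            and hashlib.sha256(block).hexdigest() == metadata["sha256"])


def store_if_valid(block: bytes, metadata: dict,
                   object_store: Dict[str, bytes], txn_db: Dict[str, dict]) -> bool:
    """Data handler step: persist block and metadata in the green zone only after verification."""
    if not verify_block(block, metadata):
        return False
    object_store[metadata["block_id"]] = block  # green zone distributed object storage (stand-in)
    txn_db[metadata["block_id"]] = metadata     # green zone transactional database (stand-in)
    return True
```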

In one possible implementation, the private cloud green zone includes an ETL (extract, transform, load) task queue; and the data handler is used to generate an ETL task from the target data block and the target metadata and send the ETL task to the ETL task queue.

As shown in fig. 4, the private cloud green zone includes an ETL task queue (ETL Task Queue). The data handler generates an ETL task from the target data block and the target metadata and sends it to the ETL task queue, as shown in fig. 4.

In one possible implementation, the private cloud yellow zone includes an ETL worker and an ETL pipeline; the ETL worker is used to acquire an ETL task from the ETL task queue, acquire the target data block corresponding to the ETL task from the private cloud green zone distributed object storage, acquire the target metadata corresponding to the ETL task from the private cloud green zone transactional database, and perform data verification on the target data block according to the target metadata; and the ETL worker is used, after the verification passes, to create an ETL pipeline instance according to the category of the target data block and send the ETL pipeline instance to the ETL pipeline, where the ETL pipeline instance includes the target data block.

Fig. 5 shows a flowchart of the private cloud yellow zone receiving and executing ETL processing according to an embodiment of the present disclosure. As shown in fig. 5, the private cloud yellow zone includes ETL workers (ETL Workers) and an ETL pipeline (ETL Pipeline). As shown in fig. 5, the ETL worker acquires an ETL task from the ETL task queue, then acquires the target data block corresponding to the ETL task from the private cloud green zone distributed object storage and the target metadata corresponding to the ETL task from the private cloud green zone transactional database.

The ETL worker verifies the target data block according to the target metadata and, after the verification passes, creates an ETL pipeline instance containing the target data block according to the category of the target data block, and then sends the ETL pipeline instance to the ETL pipeline, as shown in fig. 5.
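Creating an ETL pipeline instance according to the category of the target data block amounts to a lookup from category to pipeline definition. In this sketch the category names, the `PIPELINES` table, the `category` metadata field, and `PipelineInstance` are hypothetical; the disclosure does not enumerate categories.

```python
from dataclasses import dataclass

# Hypothetical mapping from data-block category to a pipeline definition name.
PIPELINES = {
    "log": "log_pipeline",
    "metric": "metric_pipeline",
    "trace": "trace_pipeline",
    "config": "config_pipeline",
}


@dataclass
class PipelineInstance:
    pipeline: str
    block_id: str
    block: bytes  # the instance carries the target data block itself


def create_instance(block: bytes, metadata: dict) -> PipelineInstance:
    """ETL worker step, run only after the block has passed verification."""
    category = metadata["category"]
    if category not in PIPELINES:
        raise ValueError(f"no ETL pipeline registered for category {category!r}")
    return PipelineInstance(PIPELINES[category], metadata["block_id"], block)
```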

In one possible implementation, the private cloud yellow zone includes a data extraction service, a data conversion service, a data loading service, and an ETL data stream generator; the ETL pipeline is used to send the ETL pipeline instance to the data extraction service; the data extraction service is used to extract data from the target data block included in the ETL pipeline instance according to a first preset rule to obtain extracted data, and send the extracted data to the data conversion service; the data conversion service is used to convert the extracted data according to a second preset rule to obtain converted data, and send the converted data to the data loading service; and the data loading service is used to send the converted data to the ETL data stream generator.

As shown in fig. 5, the private cloud yellow zone includes a data extraction service (Extraction Service), a data conversion service (Transformation Service), a data loading service (Loading Service), and an ETL data stream generator (ETL Data Stream Producers). As shown in fig. 5, the ETL pipeline sends the ETL pipeline instance to the data extraction service for data extraction. The data extraction service extracts data from the target data block included in the ETL pipeline instance according to a first preset rule to obtain extracted data, where the first preset rule may include one or more of predefined data definitions, rules, schemas, policies, and compliance requirements; other rules may also be chosen flexibly according to the actual situation, which is not specifically limited in the present disclosure.

As shown in fig. 5, the data extraction service sends the extracted data to the data conversion service for data conversion. The data conversion service converts the extracted data according to a second preset rule to obtain converted data, where the second preset rule may include one or more of cleaning, normalization, filtering, sorting, merging, aggregation, splitting, enhancement, inference, and masking; other rules may also be chosen flexibly according to the actual situation, which is not specifically limited in the present disclosure.

As shown in fig. 5, the data conversion service transmits the converted data to the data loading service, and the data loading service transmits the converted data to the ETL data stream generator.
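The extract, transform, and load steps might be chained as in the following sketch. The concrete rules here (JSON-lines parsing with a field whitelist as the first preset rule; dropping incomplete rows, normalizing a temperature field, masking a user identifier, and sorting by timestamp as the second preset rule) are illustrative picks from the rule categories listed above, and the field names `temp_c`, `user_id`, and `ts` are hypothetical.

```python
import json
from typing import Callable, Iterable, List


def extract(block: bytes, schema: Iterable[str]) -> List[dict]:
    """First preset rule (assumed): parse JSON lines and keep only fields in the schema."""
    keep = set(schema)
    rows = [json.loads(line) for line in block.decode("utf-8").splitlines() if line.strip()]
    return [{k: v for k, v in row.items() if k in keep} for row in rows]


def transform(rows: List[dict]) -> List[dict]:
    """Second preset rule (assumed): drop incomplete rows, normalize, mask, and sort."""
    out = []
    for row in rows:
        if row.get("temp_c") is None:
            continue                                            # cleaning
        row = dict(row, temp_c=round(float(row["temp_c"]), 1))  # normalization
        if "user_id" in row:
            row["user_id"] = "***"                              # masking of a privacy field
        out.append(row)
    return sorted(out, key=lambda r: r.get("ts", 0))            # sorting


def load(rows: List[dict], send_to_stream: Callable[[dict], None]) -> int:
    """Loading step: hand each converted row to the ETL data stream generator."""
    for row in rows:
        send_to_stream(row)
    return len(rows)
```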

In one possible implementation, the private cloud red zone includes a private cloud red zone distributed data stream platform; and the ETL data stream generator is used to generate an ETL data stream corresponding to the converted data and send the ETL data stream to the private cloud red zone distributed data stream platform.

The ETL data stream generator generates an ETL data stream corresponding to the converted data and sends it to the private cloud red zone distributed data stream platform, which buffers the stream so that peaks in the subsequent data storage load are smoothed out. As shown in fig. 5, the private cloud red zone includes a private cloud red zone distributed data stream platform (Private Cloud Red Zone Distributed Streaming Platform), and the ETL data stream generator generates ETL data streams and sends them to this platform.

In one possible implementation, the private cloud red zone includes an ETL data consumer and a third data checker; the ETL data consumer is used to acquire the ETL data stream from the private cloud red zone distributed data stream platform, convert it into ETL data, and send the ETL data to the third data checker, where the ETL data is the GPU data obtained after ETL processing is performed on the target data block and the target metadata; and the third data checker is used to perform data verification on the ETL data.

Fig. 6 illustrates a flowchart of the private cloud red zone receiving and storing data according to an embodiment of the present disclosure. As shown in fig. 6, the private cloud red zone includes an ETL data consumer (ETL Data Consumer) and a third data checker (ETL Data Validator). As shown in fig. 6, the ETL data consumer acquires the ETL data stream from the private cloud red zone distributed data stream platform, converts it into ETL data, and sends the ETL data to the third data checker, which performs data verification on it. The specific form of the data verification can be chosen flexibly according to actual needs and is not specifically limited in the present disclosure.

In one possible implementation, the private cloud red zone includes a private cloud data lake, a private cloud data warehouse, and a private cloud data mart; and the ETL data consumer is used, after the verification passes, to send the ETL data to the private cloud data lake for storage, send the structured data contained in the ETL data to the private cloud data warehouse for storage, and send the data related to a single business line contained in the ETL data to the private cloud data mart for storage.

As shown in fig. 6, the private cloud red zone includes a private cloud data lake (Private Cloud Data Lake), a private cloud data warehouse (Private Cloud Data Warehouse), and a private cloud data mart (Private Cloud Data Mart). Data can be persistently stored in different storage locations of the private cloud red zone according to its intended use. As shown in fig. 6, the ETL data consumer sends the ETL data to the private cloud data lake, the structured data contained in the ETL data to the private cloud data warehouse, and the data related to a single business line contained in the ETL data to the private cloud data mart, for storage. That is, the private cloud data lake stores all ETL data, both structured and unstructured; the private cloud data warehouse stores only the structured data in the ETL data; and the private cloud data mart stores only the data related to a single business line in the ETL data.
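The storage routing just described can be sketched as follows. Lists and a dictionary stand in for the data lake, data warehouse, and data marts, and the `structured` and `business_line` fields are assumed tags; the disclosure does not say how structured or single-business-line data is identified, and `route_etl_record` is a hypothetical name.

```python
from typing import Dict, List


def route_etl_record(record: dict, lake: List[dict], warehouse: List[dict],
                     marts: Dict[str, List[dict]]) -> None:
    """Hypothetical routing: everything to the lake, structured rows to the warehouse,
    rows tagged with one business line to that line's data mart."""
    lake.append(record)                         # data lake keeps all ETL data
    if record.get("structured", False):         # assumed flag marking structured data
        warehouse.append(record)
    line = record.get("business_line")          # assumed single-business-line tag
    if line is not None:
        marts.setdefault(line, []).append(record)
```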

In one possible implementation, the private cloud red zone includes a data quality controller; the data quality controller is used to perform quality detection on the data in a target storage to obtain a quality detection result, where the target storage is at least one of the private cloud data lake, the private cloud data warehouse, and the private cloud data mart.

Fig. 7 shows a flowchart of the private cloud red zone performing data processing according to an embodiment of the present disclosure. As shown in fig. 7, the private cloud red zone includes a data quality controller (Data Quality Controller), which performs quality detection on the data in the target storage. The target storage here may be any one or more of the private cloud data lake, the private cloud data warehouse, and the private cloud data mart. The quality detection covers one or more indicators such as rules, policies, compliance, constraints, precision, integrity, integration, requirements, and limitations; other indicators may also be chosen flexibly according to the actual situation, which is not specifically limited in the present disclosure.
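Two of the listed quality indicators, integrity (completeness) and constraint conformance, can be computed as simple ratios over rows read from the target storage. The function `quality_report`, its parameters, and the choice to report an empty table as fully compliant are hypothetical illustrations, not the disclosed detection method.

```python
from typing import Dict, Iterable, List, Tuple


def quality_report(rows: List[dict], required: Iterable[str],
                   ranges: Dict[str, Tuple[float, float]]) -> Dict[str, float]:
    """Hypothetical quality check over a table from the target storage.
    completeness: share of rows whose required fields are all present and non-null;
    validity: share of rows whose constrained fields all fall inside [lo, hi]."""
    required = list(required)
    n = len(rows)
    if n == 0:
        return {"completeness": 1.0, "validity": 1.0}  # design choice: empty table passes
    complete = sum(all(r.get(f) is not None for f in required) for r in rows)
    valid = sum(
        all(r.get(f) is not None and lo <= r[f] <= hi for f, (lo, hi) in ranges.items())
        for r in rows
    )
    return {"completeness": complete / n, "validity": valid / n}
```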

In one possible implementation, the private cloud red zone includes a data analysis service; the data analysis service is used to perform data analysis on the data in the target storage to obtain a data analysis result, where the target storage is at least one of the private cloud data lake, the private cloud data warehouse, and the private cloud data mart, and the data analysis includes at least one of the following: data mining, machine learning, data modeling, and data segmentation.

As shown in fig. 7, the private cloud red zone includes a data analysis service (Data Analysis Service), which performs data analysis on the data in the target storage. The target storage here may be any one or more of the private cloud data lake, the private cloud data warehouse, and the private cloud data mart. The data analysis includes one or more of data mining, machine learning, data modeling, and data segmentation; other analysis operations may also be chosen flexibly according to the actual situation, which is not specifically limited in the present disclosure. The data analysis is intended for users such as business-oriented data analysts, data scientists, data model function designers, and data engineers, which is likewise not specifically limited in the present disclosure.

In one possible implementation, the private cloud red zone includes a data visualization service; the data visualization service is used to perform visualization processing on the data in the target storage to obtain a visualization result, where the target storage is at least one of the private cloud data lake, the private cloud data warehouse, and the private cloud data mart, and the visualization result includes at least one of the following: visualized key figures, visualized charts, visualized color labels, and visualized elastic data.

As shown in fig. 7, the private cloud red zone includes a data visualization service (Data Visualization Service), which performs visualization processing on the data in the target storage. The target storage here may be any one or more of the private cloud data lake, the private cloud data warehouse, and the private cloud data mart. Tailored to the intended users and usage scenarios, the visualization processing can generate visualization results such as visualized key figures, various visualized charts, visualized color labels, and visualized elastic data to meet the visualization goals; these results help present the data as a coherent narrative and support an in-depth understanding of it.

In one possible implementation, the private cloud red zone includes a data insight service; the data insight service is used to perform data insight analysis on the data in the target storage and determine business insight indicators, where the target storage is at least one of the private cloud data lake, the private cloud data warehouse, and the private cloud data mart, and the business insight indicators include at least one of the following: user experience optimization, personalized recommendation, marketing strategy refinement, and product features.

As shown in fig. 7, the private cloud red zone includes a data insight service (Data Insights Service), which performs data insight analysis on the data in the target storage. The target storage here may be any one or more of the private cloud data lake, the private cloud data warehouse, and the private cloud data mart. Based on the data insight analysis, business insight indicators, including one or more of user experience optimization, personalized recommendation, marketing strategy refinement, product features, and the like, can be determined to assist the users of the data insight analysis in making business decisions.

It will be appreciated that the above embodiments mentioned in the present disclosure may be combined with each other to form combined embodiments without departing from their principles and logic; owing to space limitations, such combinations are not described further in the present disclosure. Those skilled in the art will appreciate that, in the above description of the specific embodiments, the specific execution order of the steps should be determined by their functions and possible inherent logic.

In addition, the present disclosure further provides an electronic device, a computer readable storage medium, and a program, each of which can be used to implement any GPU data analysis system provided in the present disclosure; for the corresponding technical solutions and descriptions, refer to the corresponding descriptions of the system, which are not repeated here.

The method is specifically technically associated with the internal structure of the computer system and can solve the technical problem of improving hardware operating efficiency or execution effect (including reducing the amount of data stored, reducing the amount of data transmitted, increasing hardware processing speed, and the like), thereby achieving the technical effect of improving the internal performance of the computer system in accordance with natural laws.

In some embodiments, functions or modules included in an apparatus provided by the embodiments of the present disclosure may be used to perform a method described in the foregoing method embodiments, and specific implementations thereof may refer to descriptions of the foregoing method embodiments, which are not repeated herein for brevity.

The disclosed embodiments also provide a computer readable storage medium having stored thereon computer program instructions which, when executed by a processor, implement the above-described system. The computer readable storage medium may be a volatile or nonvolatile computer readable storage medium.

The embodiment of the disclosure also provides an electronic device, which comprises: a processor; a memory for storing processor-executable instructions; wherein the processor is configured to invoke the instructions stored in the memory to run the system described above.

Embodiments of the present disclosure also provide a computer program product comprising computer readable code, or a non-transitory computer readable storage medium carrying computer readable code, which when executed in a processor of an electronic device, causes the processor in the electronic device to operate the system described above.

The electronic device may be provided as a terminal, server or other form of device.

Fig. 8 shows a block diagram of an electronic device according to an embodiment of the disclosure. Referring to fig. 8, an electronic device 1900 may be provided as a server or a terminal device. The electronic device 1900 includes a processing component 1922, which further includes one or more processors, and memory resources represented by a memory 1932 for storing instructions executable by the processing component 1922, such as application programs. The application programs stored in the memory 1932 may include one or more modules, each corresponding to a set of instructions. Further, the processing component 1922 is configured to execute the instructions to perform the methods described above.

The electronic device 1900 may also include a power component 1926 configured to perform power management of the electronic device 1900, a wired or wireless network interface 1950 configured to connect the electronic device 1900 to a network, and an input/output interface 1958. The electronic device 1900 may run an operating system stored in the memory 1932, such as the Microsoft server operating system (Windows Server™), the graphical user interface based operating system developed by Apple Inc. (Mac OS X™), the multi-user, multi-process computer operating system (Unix™), the free and open-source Unix-like operating system (Linux™), the open-source Unix-like operating system (FreeBSD™), or the like.

In an exemplary embodiment, a non-transitory computer readable storage medium is also provided, such as memory 1932, including computer program instructions executable by processing component 1922 of electronic device 1900 to perform the methods described above.

The present disclosure may be a system, method, and/or computer program product. The computer program product may include a computer readable storage medium having computer readable program instructions embodied thereon for causing a processor to implement aspects of the present disclosure.

The computer readable storage medium may be a tangible device that can hold and store instructions for use by an instruction execution device. The computer readable storage medium may be, for example, but is not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium include the following: portable computer disks, hard disks, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), static random access memory (SRAM), portable compact disc read-only memory (CD-ROM), digital versatile discs (DVD), memory sticks, floppy disks, mechanically encoded devices such as punch cards or raised structures in a groove having instructions stored thereon, and any suitable combination of the foregoing. Computer readable storage media, as used herein, are not to be construed as transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through waveguides or other transmission media (e.g., light pulses passing through fiber optic cables), or electrical signals transmitted through wires.

The computer readable program instructions described herein may be downloaded from a computer readable storage medium to a respective computing/processing device or to an external computer or external storage device over a network, such as the internet, a local area network, a wide area network, and/or a wireless network. The network may include copper transmission cables, fiber optic transmissions, wireless transmissions, routers, firewalls, switches, gateway computers and/or edge servers. The network interface card or network interface in each computing/processing device receives computer readable program instructions from the network and forwards the computer readable program instructions for storage in a computer readable storage medium in the respective computing/processing device.

Computer program instructions for performing the operations of the present disclosure may be assembly instructions, instruction set architecture (ISA) instructions, machine instructions, machine-dependent instructions, microcode, firmware instructions, state-setting data, or source code or object code written in any combination of one or more programming languages, including object-oriented programming languages such as Smalltalk, C++ or the like, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The computer readable program instructions may be executed entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, aspects of the present disclosure are implemented by personalizing electronic circuitry, such as programmable logic circuitry, field programmable gate arrays (FPGAs) or programmable logic arrays (PLAs), with state information of the computer readable program instructions, so that the electronic circuitry can execute the computer readable program instructions.

Various aspects of the present disclosure are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the disclosure. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer-readable program instructions.

These computer readable program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer readable program instructions may also be stored in a computer readable storage medium that can direct a computer, programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer readable medium having the instructions stored therein includes an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.

The computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer, other programmable apparatus or other devices implement the functions/acts specified in the flowchart and/or block diagram block or blocks.

The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

The computer program product may be realized in particular by means of hardware, software or a combination thereof. In an alternative embodiment, the computer program product is embodied as a computer storage medium, and in another alternative embodiment, the computer program product is embodied as a software product, such as a software development kit (Software Development Kit, SDK), or the like.

The foregoing descriptions of the various embodiments emphasize the differences between them; for parts that are the same or similar, the embodiments may be referred to one another, and these parts are not repeated herein for the sake of brevity.

It will be appreciated by those skilled in the art that, in the methods of the specific embodiments described above, the order in which the steps are written does not imply a strict order of execution; the actual execution order of the steps should be determined by their functions and possible inherent logic.

If the technical solution of the present application involves personal information, a product applying the technical solution clearly informs individuals of the personal information processing rules and obtains their separate consent before processing the personal information. If the technical solution involves sensitive personal information, a product applying the technical solution obtains separate consent from the individual before processing the sensitive personal information and also satisfies the requirement of "explicit consent". For example, a clear and conspicuous sign is placed at a personal information collection device, such as a camera, to indicate that the individual is entering a personal information collection range and that personal information will be collected; if the individual voluntarily enters the collection range, this is regarded as consent to the collection of their personal information. Alternatively, on a device that processes personal information, where the personal information processing rules are announced by conspicuous signs or messages, personal authorization is obtained through a pop-up message, by asking the individual to upload their personal information themselves, or the like. The personal information processing rules may include information such as the personal information processor, the purpose of processing, the processing method, and the types of personal information to be processed.

The foregoing description of the embodiments of the present disclosure has been presented for purposes of illustration and description, and is not intended to be exhaustive or limited to the embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the various embodiments described. The terminology used herein was chosen in order to best explain the principles of the embodiments, the practical application, or the improvement of technology in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.

Claims

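1. A GPU data analysis system, the system comprising: a GPU client, a public cloud and a private cloud;

the GPU client is used for collecting target GPU data, wherein the target GPU data comprises GPU hardware data, GPU software data, GPU usage environment data and GPU user data;

the public cloud is used for receiving and storing the target GPU data uploaded by the GPU client;

the private cloud is used for acquiring the target GPU data from the public cloud and performing data analysis on the target GPU data to obtain a data analysis result.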

2. The system according to claim 1, wherein the GPU client comprises: hardware GPU, metadata collection service, log data collection service, trace data collection service and performance index data collection service.

3. The system of claim 2, wherein the GPU hardware data comprises at least one of: configuration data of the hardware GPU, log data of the hardware GPU, trace data of the hardware GPU and performance index data of the hardware GPU;

the metadata collection service is used for collecting configuration data of the hardware GPU;

the log data collection service is used for collecting log data of the hardware GPU;

the trace data collection service is used for collecting the trace data of the hardware GPU;

and the performance index data collection service is used for collecting the performance index data of the hardware GPU.

4. The system according to claim 2, wherein the GPU client comprises: GPU software corresponding to the hardware GPU; the GPU software data includes at least one of: configuration data of the GPU software, log data of the GPU software and performance index data of the GPU software;

the metadata collection service is used for collecting configuration data of the GPU software;

the log data collection service is used for collecting log data of the GPU software;

and the performance index data collection service is used for collecting the performance index data of the GPU software.

5. The system according to claim 2, wherein the GPU client comprises: associated hardware on which the hardware GPU depends during operation; the GPU usage environment data includes at least one of: configuration data of the associated hardware and performance index data of the associated hardware;

the metadata collection service is used for collecting configuration data of the associated hardware;

the performance index data collection service is used for collecting performance index data of the associated hardware.

6. The system according to claim 2, wherein the GPU client comprises: the user of the hardware GPU; the GPU user data includes at least one of: user portrait data of the user and user behavior data of the user;

the metadata collection service is used for collecting the user portrait data;

the performance index data collection service is used for collecting the user behavior data.
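By way of illustration only, the data categories of claims 3 to 6 and the collection services that feed them might be modeled as below. This is a minimal Python sketch; the class `TargetGpuData`, the mapping `SERVICE_CATEGORIES` and the helper `add_item` are hypothetical names, and storing each category as a key/value dictionary is an assumption, not a format defined by the disclosure.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Set


@dataclass
class TargetGpuData:
    """Hypothetical container for the four categories of target GPU data."""
    hardware: Dict[str, Any] = field(default_factory=dict)     # claim 3: hardware GPU data
    software: Dict[str, Any] = field(default_factory=dict)     # claim 4: GPU software data
    environment: Dict[str, Any] = field(default_factory=dict)  # claim 5: associated hardware data
    user: Dict[str, Any] = field(default_factory=dict)         # claim 6: user portrait/behavior data


# Which categories each collection service feeds, as listed in claims 3 to 6.
SERVICE_CATEGORIES: Dict[str, Set[str]] = {
    "metadata": {"hardware", "software", "environment", "user"},
    "log": {"hardware", "software"},
    "trace": {"hardware"},
    "performance": {"hardware", "software", "environment", "user"},
}


def add_item(data: TargetGpuData, service: str, category: str, key: str, value: Any) -> None:
    """Record one collected item, rejecting service/category pairs the claims do not list."""
    if category not in SERVICE_CATEGORIES.get(service, set()):
        raise ValueError(f"service {service!r} does not collect {category!r} data")
    getattr(data, category)[key] = value
```

For example, calling `add_item` with the `"log"` service and the `"user"` category raises `ValueError`, because claim 6 assigns user data only to the metadata and performance index data collection services.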

7. The system according to any one of claims 2 to 6, wherein the GPU client comprises: a data collector;

the data collector is configured to obtain the target GPU data from at least one of the metadata collection service, the log data collection service, the trace data collection service, and the performance index data collection service;

The data collector is used for carrying out data compression on the target GPU data to obtain compressed data, and uploading the compressed data to the public cloud.
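A minimal sketch of the data collector's compression step is given below, assuming that the target GPU data is serialized as JSON and compressed with gzip; the disclosure names neither format. The function name `compress_target_data` is hypothetical, and the upload itself is omitted because no transport protocol is specified.

```python
import gzip
import json
from typing import Any, Dict


def compress_target_data(target: Dict[str, Any]) -> bytes:
    """Serialize the collected data deterministically, then gzip-compress it for upload."""
    # sort_keys and fixed separators make equal inputs serialize to equal payloads
    payload = json.dumps(target, sort_keys=True, separators=(",", ":")).encode("utf-8")
    # mtime=0 keeps the gzip header free of a timestamp, so the output is reproducible
    return gzip.compress(payload, mtime=0)
```

Deterministic output is a design choice here: it lets the public cloud side, or a retry path, detect duplicate uploads by comparing bytes or digests.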

8. The system of claim 7, wherein the public cloud comprises: GPU data collection service, GPU data stream generator, public cloud distributed data stream platform;

the GPU data collection service is used for receiving the compressed data, decompressing the compressed data to obtain the target GPU data, and sending the target GPU data to the GPU data stream generator;

the GPU data stream generator is used for generating a GPU data stream corresponding to the target GPU data and sending the GPU data stream to the public cloud distributed data stream platform.
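On the public cloud side, the GPU data collection service and the GPU data stream generator might be sketched as follows, assuming the uploaded payload is gzip-compressed JSON. The record layout (`key`, `seq`, `category`, `value`) is hypothetical, and the list `platform` is only an in-memory stand-in for the distributed data stream platform, whose product is not named in the disclosure.

```python
import gzip
import json
from typing import Any, Dict, List


def decompress_upload(blob: bytes) -> Dict[str, Any]:
    """GPU data collection service: restore the target GPU data from the compressed upload."""
    return json.loads(gzip.decompress(blob).decode("utf-8"))


def to_stream_records(target: Dict[str, Any], client_id: str) -> List[Dict[str, Any]]:
    """GPU data stream generator: emit one record per data category.

    Keying every record by client id lets a partitioned stream platform keep each
    client's records together; `seq` lets a consumer reassemble them in order.
    """
    return [
        {"key": client_id, "seq": seq, "category": category, "value": target[category]}
        for seq, category in enumerate(sorted(target))
    ]


platform: List[Dict[str, Any]] = []  # in-memory stand-in for the distributed data stream platform
```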

9. The system of claim 8, wherein the public cloud comprises: a GPU data stream consumer, a first data checker;

the GPU data stream consumer is used for acquiring the GPU data stream from the public cloud distributed data stream platform, restoring the target GPU data from the GPU data stream, and sending the target GPU data to the first data checker;

and the first data checker is used for performing a data check on the target GPU data.
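The consumer and first data checker could be sketched as below, assuming the stream records carry hypothetical `seq`, `category` and `value` fields. The check rules (only the four known categories, each stored as a mapping) are illustrative assumptions; a production checker would likely also validate schemas, value ranges and timestamps.

```python
from typing import Any, Dict, List

KNOWN_CATEGORIES = {"hardware", "software", "environment", "user"}


def recover(records: List[Dict[str, Any]]) -> Dict[str, Any]:
    """GPU data stream consumer: rebuild target GPU data from records in sequence order."""
    ordered = sorted(records, key=lambda r: r["seq"])
    return {r["category"]: r["value"] for r in ordered}


def check(target: Dict[str, Any]) -> List[str]:
    """First data checker: return the problems found; an empty list means the check passed."""
    problems: List[str] = []
    for category, value in target.items():
        if category not in KNOWN_CATEGORIES:
            problems.append(f"unknown category: {category}")
        if not isinstance(value, dict):
            problems.append(f"{category} is not a mapping")
    return problems
```

Returning a list of problems rather than a boolean keeps the reason for a failed check available for logging.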

10. The system of claim 9, wherein the public cloud comprises: public cloud distributed object storage and public cloud transactional databases;

the GPU data stream consumer is used for generating a target data block according to the target GPU data after verification is passed, and sending the target data block to the public cloud distributed object storage for storage;

the GPU data stream consumer is used for generating target metadata corresponding to the target data block and sending the target metadata to the public cloud transactional database for storage.

11. The system of claim 10, wherein the public cloud comprises: a data block task handling queue;

the GPU data stream consumer is used for generating a data block handling task according to the target metadata and sending the data block handling task to the data block task handling queue.
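Claims 10 and 11 together might be sketched as follows. The assumptions are: a data block is the JSON serialization of verified target GPU data, and the target metadata records a hypothetical block id, the block size and a SHA-256 digest. The dictionaries and the `queue.Queue` only stand in for the distributed object storage, the transactional database and the task queue.

```python
import hashlib
import json
import queue
import uuid
from typing import Any, Dict, Tuple

object_store: Dict[str, bytes] = {}            # stand-in: public cloud distributed object storage
metadata_db: Dict[str, Dict[str, Any]] = {}    # stand-in: public cloud transactional database
handling_queue: "queue.Queue[Dict[str, str]]" = queue.Queue()  # stand-in: task handling queue


def make_block(target: Dict[str, Any]) -> Tuple[bytes, Dict[str, Any]]:
    """Pack verified target GPU data into a data block and describe it with metadata."""
    block = json.dumps(target, sort_keys=True).encode("utf-8")
    metadata = {
        "block_id": uuid.uuid4().hex,
        "size": len(block),
        "sha256": hashlib.sha256(block).hexdigest(),  # lets later checkers detect corruption
    }
    return block, metadata


def store_and_enqueue(target: Dict[str, Any]) -> str:
    """Store the block and its metadata, then queue a handling task that references the metadata."""
    block, metadata = make_block(target)
    block_id = metadata["block_id"]
    object_store[block_id] = block
    metadata_db[block_id] = metadata
    handling_queue.put({"block_id": block_id})
    return block_id
```

The handling task carries only the block id, so the private cloud side must look up the metadata before fetching the block, which matches the order described in claim 13.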

12. The system of claim 11, wherein the private cloud comprises: a private cloud green zone, a private cloud yellow zone and a private cloud red zone;

The security level of the private cloud red zone is higher than that of the private cloud yellow zone, and the security level of the private cloud yellow zone is higher than that of the private cloud green zone.
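The ordering of security levels in claim 12 can be expressed directly as an ordered enumeration. The transfer rule below (data may only move into a zone of equal or higher security level) is an assumption inferred from the flow from green to yellow to red in the later claims, not a rule stated in the disclosure; `Zone` and `may_transfer` are hypothetical names.

```python
from enum import IntEnum


class Zone(IntEnum):
    """Private cloud zones; a larger value means a higher security level."""
    GREEN = 1
    YELLOW = 2
    RED = 3


def may_transfer(source: Zone, target: Zone) -> bool:
    """Assumed policy: data may only move into a zone of equal or higher security level."""
    return target >= source
```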

13. The system of claim 12, wherein the private cloud green zone comprises: a data block handling planner, a data block handling resource pool and a data handler;

the data block handling planner is used for acquiring the data block handling task from the data block task handling queue, applying for a target resource corresponding to the data block handling task from the data block handling resource pool, and sending the data block handling task and the target resource to the data handler;

the data handler is used for acquiring, by using the target resource, the target metadata corresponding to the data block handling task from the public cloud transactional database;

and the data handler is used for acquiring, by using the target resource, the target data block corresponding to the target metadata from the public cloud distributed object storage.
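A minimal sketch of the data block handling planner is shown below. It assumes the resource pool can be modeled as a fixed number of slots guarded by a bounded semaphore. `HandlingResourcePool`, `plan_once` and the one-second timeout are hypothetical. The data handler is passed in as a callable that runs while the slot is held.

```python
import queue
import threading
from typing import Any, Callable, Dict, Optional


class HandlingResourcePool:
    """Hypothetical data block handling resource pool with a fixed number of slots."""

    def __init__(self, slots: int) -> None:
        self._slots = threading.BoundedSemaphore(slots)

    def acquire(self, timeout: Optional[float] = None) -> bool:
        return self._slots.acquire(timeout=timeout)

    def release(self) -> None:
        self._slots.release()


def plan_once(tasks: "queue.Queue[Dict[str, Any]]",
              pool: HandlingResourcePool,
              handler: Callable[[Dict[str, Any]], None]) -> bool:
    """Take one handling task, reserve a resource for it and run the data handler.

    Returns False when there is no task, or when no resource frees up in time
    (in which case the task is put back on the queue).
    """
    try:
        task = tasks.get_nowait()
    except queue.Empty:
        return False
    if not pool.acquire(timeout=1.0):
        tasks.put(task)
        return False
    try:
        handler(task)  # the handler fetches metadata and block while the resource is held
    finally:
        pool.release()
    return True
```

Releasing the slot in `finally` keeps the pool from leaking capacity when a transfer fails.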

14. The system of claim 13, wherein the private cloud green zone comprises: a second data checker, a private cloud green zone distributed object storage and a private cloud green zone transactional database;

the data handler is configured to send the target data block and the target metadata to the second data checker;

the second data checker is configured to perform data checking on the target data block according to the target metadata;

and the data handler is used for, after the verification passes, sending the target data block to the private cloud green zone distributed object storage for storage and sending the target metadata to the private cloud green zone transactional database for storage.
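The check of a data block against its metadata might look like the sketch below. It assumes the metadata records the block's size and SHA-256 digest under the hypothetical keys `size` and `sha256`, plus a `block_id`. The two dictionaries stand in for the green zone storage and database.

```python
import hashlib
from typing import Any, Dict

green_object_store: Dict[str, bytes] = {}           # stand-in: green zone distributed object storage
green_metadata_db: Dict[str, Dict[str, Any]] = {}   # stand-in: green zone transactional database


def verify_block(block: bytes, metadata: Dict[str, Any]) -> bool:
    """Second data checker: compare the block against the size and digest in its metadata."""
    return (len(block) == metadata.get("size")
            and hashlib.sha256(block).hexdigest() == metadata.get("sha256"))


def land_in_green_zone(block: bytes, metadata: Dict[str, Any]) -> bool:
    """Store the block and its metadata in the green zone only if verification passes."""
    if not verify_block(block, metadata):
        return False
    green_object_store[metadata["block_id"]] = block
    green_metadata_db[metadata["block_id"]] = metadata
    return True
```

Checking the size first is cheap and rejects truncated transfers before the digest is computed.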

15. The system of claim 14, wherein the private cloud green zone comprises: an ETL task queue;

the data handler is configured to generate an ETL task according to the target data block and the target metadata, and send the ETL task to the ETL task queue.

16. The system of claim 15, wherein the private cloud yellow zone comprises: an ETL device and an ETL pipeline;

the ETL device is used for acquiring the ETL task from the ETL task queue, acquiring the target data block corresponding to the ETL task from the private cloud green zone distributed object storage, acquiring the target metadata corresponding to the ETL task from the private cloud green zone transactional database, and performing data verification on the target data block according to the target metadata;

The ETL device is used for creating an ETL pipeline instance according to the category of the target data block after verification is passed and sending the ETL pipeline instance to the ETL pipeline, wherein the ETL pipeline instance comprises the target data block.
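The category-dependent creation of an ETL pipeline instance could be sketched as below. It assumes the block's category is recorded in its metadata under a hypothetical `category` key, and that each category selects a pair of rules for extraction and conversion. The rule names in `RULES_BY_CATEGORY` are placeholders, not rules defined by the disclosure.

```python
from dataclasses import dataclass
from typing import Any, Dict, Tuple


@dataclass
class EtlPipelineInstance:
    category: str
    block: bytes
    extract_rule: str    # hypothetical name of the "first preset rule"
    transform_rule: str  # hypothetical name of the "second preset rule"


# Hypothetical rule selection per data block category.
RULES_BY_CATEGORY: Dict[str, Tuple[str, str]] = {
    "log": ("parse_log_lines", "normalize_log_levels"),
    "performance": ("parse_json_lines", "convert_units"),
}


def create_instance(block: bytes, metadata: Dict[str, Any]) -> EtlPipelineInstance:
    """ETL device: build a pipeline instance whose rules depend on the block's category."""
    category = metadata.get("category")
    if category not in RULES_BY_CATEGORY:
        raise ValueError(f"no ETL rules for category {category!r}")
    extract_rule, transform_rule = RULES_BY_CATEGORY[category]
    return EtlPipelineInstance(category, block, extract_rule, transform_rule)
```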

17. The system of claim 16, wherein the private cloud yellow zone comprises: data extraction service, data conversion service, data loading service, ETL data stream generator;

the ETL pipeline is used for sending the ETL pipeline instance to the data extraction service;

the data extraction service is used for carrying out data extraction on the target data block according to a first preset rule to obtain extracted data, and sending the extracted data to the data conversion service;

the data conversion service is used for carrying out data conversion on the extracted data according to a second preset rule to obtain converted data, and sending the converted data to the data loading service;

the data loading service is used for sending the converted data to the ETL data stream generator.
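The extraction, conversion and loading stages of claim 17 can be illustrated with the sketch below. The "first preset rule" is assumed to be: parse JSON lines and keep a chosen set of fields. The "second preset rule" is assumed to be: drop incomplete rows and rescale numeric fields by per-field factors. The list `sink` stands in for the ETL data stream generator. All function names are hypothetical.

```python
import json
from typing import Any, Dict, Iterable, List

Row = Dict[str, Any]


def extract(block: bytes, fields: Iterable[str]) -> List[Row]:
    """Assumed first preset rule: parse JSON lines and keep only the listed fields."""
    keep = list(fields)
    rows: List[Row] = []
    for line in block.decode("utf-8").splitlines():
        if line.strip():
            record = json.loads(line)
            rows.append({k: record.get(k) for k in keep})
    return rows


def _scale(key: str, value: Any, unit_scale: Dict[str, float]) -> Any:
    # Rescale numbers only; bool is excluded because it is a subclass of int.
    if isinstance(value, (int, float)) and not isinstance(value, bool):
        return value * unit_scale.get(key, 1)
    return value


def transform(rows: List[Row], unit_scale: Dict[str, float]) -> List[Row]:
    """Assumed second preset rule: drop rows with missing fields, then rescale units."""
    return [
        {k: _scale(k, v, unit_scale) for k, v in row.items()}
        for row in rows
        if all(v is not None for v in row.values())
    ]


def load(rows: List[Row], sink: List[Row]) -> int:
    """Hand the converted rows to the next stage and report how many were loaded."""
    sink.extend(rows)
    return len(rows)
```

For example, with `unit_scale={"mem_mb": 1 / 1024}` a memory reading of 2048 MB is converted to 2.0 GB.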

18. The system of claim 17, wherein the private cloud red zone comprises: a private cloud red zone distributed data stream platform;

The ETL data stream generator is used for generating an ETL data stream corresponding to the converted data and sending the ETL data stream to the private cloud red zone distributed data stream platform.

19. The system of claim 18, wherein the private cloud red zone comprises: an ETL data consumer, a third data checker;

the ETL data consumer is configured to obtain the ETL data stream from the private cloud red zone distributed data stream platform, convert the ETL data stream into ETL data, and send the ETL data to the third data checker, where the ETL data indicates the GPU data obtained after ETL processing is performed on the target data block and the target metadata;

and the third data checker is used for performing data check on the ETL data.

20. The system of claim 19, wherein the private cloud red zone comprises: a private cloud data lake, a private cloud data warehouse and a private cloud data mart;

and the ETL data consumer is used for, after the verification passes, sending the ETL data to the private cloud data lake for storage, sending the structured data contained in the ETL data to the private cloud data warehouse for storage, and sending the data related to a single business line contained in the ETL data to the private cloud data mart for storage.
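The three-way routing of verified ETL data might be sketched as follows. It assumes each record carries a hypothetical `structured` flag and an optional `business_line` label. Lists and a dictionary of lists stand in for the data lake, the data warehouse and the per-business-line data marts.

```python
from typing import Any, Dict, List

Record = Dict[str, Any]


def route(etl_records: List[Record],
          lake: List[Record],
          warehouse: List[Record],
          marts: Dict[str, List[Record]]) -> None:
    """Send every record to the data lake, structured records to the warehouse,
    and records tied to one business line to that line's data mart."""
    for record in etl_records:
        lake.append(record)  # the data lake keeps all verified ETL data
        if record.get("structured"):
            warehouse.append(record)
        business_line = record.get("business_line")
        if business_line is not None:
            marts.setdefault(business_line, []).append(record)
```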

21. The system of claim 20, wherein the private cloud red zone comprises: a data quality controller;

the data quality controller is used for carrying out quality detection on data in a target storage to obtain a quality detection result, wherein the target storage is at least one of the private cloud data lake, the private cloud data warehouse and the private cloud data mart.
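Two common quality measures, completeness and key uniqueness, are used below to illustrate what a quality detection result could contain. The disclosure does not specify its metrics, so the choice of measures, the function name `quality_report` and its parameters are assumptions.

```python
from typing import Any, Dict, List


def quality_report(rows: List[Dict[str, Any]], required: List[str], key: str) -> Dict[str, float]:
    """Return the share of rows with all required fields present (completeness)
    and the share of distinct key values among all rows (uniqueness)."""
    total = len(rows)
    if total == 0:
        return {"completeness": 1.0, "uniqueness": 1.0}
    complete = sum(all(r.get(f) is not None for f in required) for r in rows)
    unique_keys = len({r.get(key) for r in rows})
    return {"completeness": complete / total, "uniqueness": unique_keys / total}
```

For four rows where one row lacks a required value and one key is duplicated, both measures come out as 0.75.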

22. The system of claim 20, wherein the private cloud red zone comprises: a data analysis service;

the data analysis service is configured to perform data analysis on data in a target storage to obtain a data analysis result, where the target storage is at least one of the private cloud data lake, the private cloud data warehouse and the private cloud data mart, and the data analysis includes at least one of the following: data mining, machine learning, data modeling, data segmentation.

23. The system of claim 20, wherein the private cloud red zone comprises: a data visualization service;

the data visualization service is configured to perform visualization processing on data in a target storage to obtain a visualization result, where the target storage is at least one of the private cloud data lake, the private cloud data warehouse and the private cloud data mart, and the visualization result includes at least one of the following: visual key numbers, visual charts, visual color labels, visual elastic data.

24. The system of claim 20, wherein the private cloud red zone comprises: a data insight service;

the data insight service is configured to perform data insight analysis on data in a target storage and determine a business insight index, where the target storage is at least one of the private cloud data lake, the private cloud data warehouse and the private cloud data mart, and the business insight index includes at least one of the following: user experience optimization, personalized recommendation, marketing strategy refinement, and product features.

25. An electronic device, comprising:

a processor;

a memory for storing processor-executable instructions;

wherein the processor is configured to invoke the instructions stored in the memory to run the system of any of claims 1 to 24.

26. A computer readable storage medium having stored thereon computer program instructions, which when executed by a processor, implement the system of any of claims 1 to 24.