WO2023103190A1 - Multi-level linkage transparent sample model sharing apparatus for artificial intelligence platform - Google Patents

Multi-level linkage transparent sample model sharing apparatus for artificial intelligence platform

Info

Publication number
WO2023103190A1
Authority
WO
WIPO (PCT)
Prior art keywords
sample model
artificial intelligence
data
file
subsystem
Prior art date
Application number
PCT/CN2022/079255
Other languages
French (fr)
Chinese (zh)
Inventor
宋立华
邱镇
苏江文
黄晓光
吴佩颖
Original Assignee
福建亿榕信息技术有限公司
国网信息通信产业集团有限公司
国网四川省电力公司
国家电网有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 福建亿榕信息技术有限公司, 国网信息通信产业集团有限公司, 国网四川省电力公司, 国家电网有限公司
Publication of WO2023103190A1 publication Critical patent/WO2023103190A1/en

Classifications

    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04L - TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 67/00 - Network arrangements or protocols for supporting network services or applications
    • H04L 67/01 - Protocols
    • H04L 67/10 - Protocols in which an application is distributed across nodes in the network
    • H04L 67/1097 - Protocols in which an application is distributed across nodes in the network for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS]
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/20 - Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F 16/23 - Updating
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/20 - Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F 16/24 - Querying
    • G06F 16/245 - Query processing
    • G06F 16/2455 - Query execution
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/20 - Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F 16/27 - Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04L - TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 67/00 - Network arrangements or protocols for supporting network services or applications
    • H04L 67/01 - Protocols
    • H04L 67/10 - Protocols in which an application is distributed across nodes in the network
    • H04L 67/1095 - Replication or mirroring of data, e.g. scheduling or transport for data synchronisation between network nodes

Definitions

  • The invention relates to the technical field of artificial intelligence, and in particular to an apparatus for transparently sharing sample models across a multi-level linked artificial intelligence platform.
  • Artificial intelligence technology has gradually become a key element in promoting the development of productivity, changing production and operation modes, and improving production efficiency.
  • To support the large-scale development and operation of AI applications, large enterprises have developed and launched their own "artificial intelligence platforms" to aggregate and integrate AI-related capabilities and to support enterprise AI application scenarios such as face authentication, process robots, knowledge retrieval, and risk prevention and control.
  • The so-called artificial intelligence platform usually consists of "two libraries and one platform", namely a sample library, a model library and an operating platform.
  • The sample library is the component that stores and manages sample resources of all disciplines and types; relying on functions such as sample storage, sample preprocessing, sample labeling, sample label management, and a sample service catalog, it provides sample resources for AI model training;
  • The model library is the component that stores and manages the general-purpose and special-purpose models of each discipline; it offers general-purpose and power-sector-specific algorithm models and, relying on functions such as model testing, image packaging, version management, model uploading, model downloading, and a model service catalog, provides intelligent model resources for AI applications;
  • The operating platform provides functions such as model import, model verification, model deployment, service release, and cloud-edge collaboration, supporting model inference and application integration.
  • The network environment for applying artificial intelligence technology includes physically isolated intranets, external networks, and the Internet.
  • The network of a large enterprise generally involves three types: a physically isolated proprietary information network (intranet), a logically isolated proprietary information network (external network), and the Internet.
  • Intranet: a physically isolated proprietary information network.
  • External network: a logically isolated proprietary information network.
  • The Internet: the Internet area cannot store or use confidential data in any form.
  • The external network can use and cache files of low confidentiality levels.
  • The internal network can use and store all confidential files long-term;
  • The technical problem to be solved for transparent and secure sharing of sample models between the multi-level artificial intelligence platforms of a large enterprise is mainly: how to achieve high-performance transmission and transparent sharing of very large volumes of AI model data and sample data across multi-level organizations and multiple network types, while supporting common security equipment and complying with enterprise security regulations.
  • The existing technical solutions mainly target large files and solve the high-performance transmission of large files through file data fragmentation and multi-threaded parallel transmission.
  • A typical reference is the invention titled "A large file transmission method, device and system" (application number 202011337777.9), which decomposes large-file transmission into three stages: file fragmentation, multi-threaded transmission, and merging based on file identifiers;
  • That solution improves the performance of large-file transfer and reduces the failure rate.
  • However, it does not address file integrity guarantees, nor the time-consuming digital digest calculation that integrity guarantees involve.
  • The technical problem to be solved by the present invention is to provide a transparent sample model sharing apparatus for a multi-level linked artificial intelligence platform, meeting the requirements for transparent sharing, secure sharing, and efficient transmission of massive sample model data across cross-regional multi-level artificial intelligence platforms.
  • The present invention provides a transparent sample model sharing apparatus for a multi-level linked artificial intelligence platform, comprising: a global directory service subsystem, at least one sample model transparent sharing subsystem, and at least one artificial intelligence platform; each artificial intelligence platform is deployed in one-to-one pairing with a sample model transparent sharing subsystem; and each sample model transparent sharing subsystem is connected to said global directory service subsystem;
  • The sample model transparent sharing subsystem stores and synchronously transmits the sample model data.
  • The sample model transparent sharing subsystem includes a local directory service, a global synchronization service, and a data storage service; its operation specifically includes sample model updating and cross-platform sharing of sample models;
  • The sample model update includes: the artificial intelligence platform calls the local directory service of the sample model transparent sharing subsystem deployed in the same network area and submits the file data; the local directory service calls the local data storage service to store the file data and at the same time submits the directory entry of the newly added file data to the global directory service subsystem as message text; and the global directory service subsystem updates the directory;
  • The cross-platform sharing of sample models includes: the local directory service initiates a query to the global directory service subsystem at set intervals, and the global directory service subsystem returns the directory data changes that occurred within the past interval to the global synchronization service; after obtaining the changed global directory data, the global synchronization service calls the local directory service to merge the changes into and update the local directory.
  • The data storage service is provided with a network isolation device adaptation plug-in, which extracts the network isolation device adaptation function as a separate component behind a unified interface, used to adapt to the firewalls and information security isolation devices of different network environments.
  • The data storage service is provided with a storage resource read-write module;
  • The storage resource read-write module is implemented in Java against mainstream cloud storage protocols; it unifies the block data read-write interface and supports switching the concrete implementation through configuration files.
  • According to the confidentiality level of a file and the enterprise's configured requirements on whether data of different confidentiality levels can be stored long-term in different network areas, whether it can be temporarily cached, and for how long, the storage resource read-write module writes files that need to be temporarily cached into the distributed cache and sets an expiration time at the same time; the distributed cache is IT middleware that supports automatic deletion on expiration; the artificial intelligence platform accesses the sample model file according to the returned file path; and for confidential data, the artificial intelligence platform does not provide a secondary file distribution function.
  • The synchronous transmission is further specified as follows:
  • During transmission, the file receiver receives the blocks, computes the digital digests of the fixed-size blocks in parallel, and saves them one by one;
  • The embodiment of the present application provides a transparent sample model sharing apparatus for a multi-level linked artificial intelligence platform, in which a "global directory service subsystem" and a "sample model transparent sharing subsystem" form the architecture of the basic service infrastructure that supports transparent sharing of AI platform model samples.
  • Through schemes such as a "transparent sharing mechanism for model files based on hierarchical directories", "high-performance sample model data synchronization and heterogeneous storage integration based on segmented transmission verification in a cross-network environment", and "security-compliant use of data across network regions based on a unified caching scheme", the apparatus meets the requirements for transparent sharing, secure sharing, and efficient transmission of massive sample model data across cross-regional multi-level artificial intelligence platforms.
  • Figure 1 is a schematic diagram of the prior-art requirement for transparent and secure sharing of sample models between the multi-level artificial intelligence platforms of a large enterprise;
  • Figure 2 is the overall architecture diagram of the apparatus of the present invention;
  • Figure 3 is a schematic diagram of the transparent sharing mechanism for model files based on hierarchical directories in the present invention;
  • Figure 4 is a schematic diagram of the high-performance sample model data synchronization scheme based on segmented transmission and verification in the present invention;
  • Figure 5 is a sequence diagram of the security-compliant use of data across network regions based on the unified caching scheme in the present invention.
  • The content of the invention mainly includes the following parts:
  • A transparent sharing mechanism based on hierarchical directories is proposed: a "global directory" maintains the unified network-wide sample model catalogue to ensure global information consistency, while a "local directory service" takes over all requests from the local artificial intelligence platform.
  • Through the coordination of the local directory service with the global directory service, the data distribution of the entire network can be queried quickly without the local artificial intelligence platform being aware of the global directory.
  • A high-performance sample model data synchronization and heterogeneous storage integration design based on segmented transmission verification in a cross-network environment is proposed, namely a high-performance sample model data synchronization design based on segmented transmission and verification.
  • The design splits the file into segments and transmits and verifies it segment by segment.
  • This method can significantly improve the synchronization performance of sample model data; it is further proposed to extract the network isolation device adaptation function as a separate plug-in behind a unified interface, so as to integrate with different devices and improve the system's adaptability to different network environments, and to set up a separate "storage resource read-write module" that adapts to different storage resources through plug-ins and supports the evolution of the storage technology roadmap.
  • A design for secure, compliant use of data across network regions based on a unified caching scheme is proposed, which converts the "cross-network secure use problem" of confidential files into the question of how long files of different confidentiality levels may be cached in different network areas, completely avoiding additional encryption overhead while satisfying the enterprise's data security regulations.
  • This solves the problem of secure, compliant cross-network data use at low cost.
  • The specific implementation of the present invention is described in four aspects: the overall system architecture design, the design of the transparent sharing mechanism for model files based on hierarchical directories, the design of high-performance sample model data synchronization and heterogeneous storage integration based on segmented transmission verification in a cross-network environment, and the design of secure, compliant use of data across network regions based on a unified caching scheme.
  • The overall architecture consists of the "global directory service subsystem" and the "sample model transparent sharing subsystem".
  • The "global directory service subsystem" needs only one service instance deployed in the whole network;
  • The "sample model transparent sharing subsystem" is deployed in one-to-one pairing with the artificial intelligence platform; it can run either as part of the artificial intelligence platform's service group or as a separate service, providing complete sample model data storage and synchronous transmission services for the artificial intelligence platform.
  • the hierarchically deployed artificial intelligence platform can upload samples and model data through any deployment point.
  • The present invention proposes a transparent sharing mechanism based on hierarchical directories: a "global directory" maintains the unified network-wide sample model catalogue to ensure global information consistency, while the "local directory service" takes over all requests from the local artificial intelligence platform; through its coordination with the global directory service, the data distribution of the entire network can be queried quickly without the local artificial intelligence platform being aware of the global directory.
  • the global directory and the local directory together constitute the AI sample model directory service that supports AI platforms at all levels.
  • the global synchronization is limited to directory data, and the files of the sample models are still maintained locally, and they are only transferred on demand when they need to be called from different places later.
  • the directory data is much smaller than the sample model file itself, thus effectively avoiding repeated storage and transmission of a large amount of data while supporting network-wide sharing.
  • the global transparent sharing mechanism scheme includes a two-stage process of "sample model update” and “sample model cross-platform sharing”:
  • Step 1 Upload data locally.
  • After the user uploads data through the platform interface, modifies samples with the labeling tool, or trains and generates a new model, the artificial intelligence platform calls the "local directory service" of the "sample model transparent sharing subsystem" deployed in the same network area and submits the file data.
  • Step 2 Submit to the global directory.
  • the "local directory service” calls the local “data storage service” to store file data, and at the same time submits the new data directory (including name, metadata, etc.) as message text to the distributed message middleware of the "global directory”.
  • Step 3 Update to the global catalog.
  • "Global directory service” monitors the messages of the local distributed message middleware, and updates the content of the messages to the global directory. Relying on the high-availability and high-consistency features of the distributed message middleware, it can ensure that the content in the global directory is complete and non-repetitive.
  • Step 1 Scheduled synchronization requests for the global directory.
  • the local "global synchronization service” initiates a query to the global directory service periodically (eg, every hour), and the "global directory service” returns the directory data changes that occurred in the past hour to the "global synchronization service”.
  • Step 2 Local directory merge updates. After obtaining the changed global catalog data, the "global synchronization service" calls the update interface of the local catalog service, and submits the changed catalog data to the local catalog for merge update.
  • the models and sample data in the artificial intelligence platform will be modified many times throughout the life cycle, such as adding sample annotations, or model superposition and fusion, etc.
  • the file itself may only undergo partial changes. If only the changed content is transmitted as much as possible, the transmission efficiency of the sample model data between multiple platforms can be greatly improved.
  • The mainstream solution usually adopts digital digest technology (such as MD5): the digital digest of the whole file is calculated before and after synchronous transmission, and if the two digests are equal, the synchronized data is proven to be complete.
  • Because the execution of the digest algorithm is time-consuming and proportional to the file size, computing the digest over a single whole file takes a long time; reducing the running time of the digest algorithm therefore helps improve the synchronous transmission efficiency of model data.
  • The present invention proposes a high-performance sample model data synchronization design based on segmented transmission and verification: tailored to the fact that the sample model data of an artificial intelligence platform undergoes many partial changes over its life cycle, the file is split into segments and transmitted and verified segment by segment, which can significantly improve the synchronization performance of sample model data.
  • the overall scheme is shown in the figure.
  • Segmented transmission and verification are executed by the "segmented transfer verification module" during the file transfer process.
  • The specific process is:
  • During transmission, the file receiver receives the blocks, computes the digital digests of the fixed-size blocks in parallel, and saves them one by one;
  • the segmented transmission and verification design provided by the present invention can effectively utilize the idle computing resources of the current multi-core computer system to carry out file transmission and digital summary calculation in parallel, thereby improving the performance of file transmission and integrity verification.
  • Network isolation device adapter plug-in: in an enterprise network interconnection environment, different network partitions may be connected through "firewalls" or "information security isolation devices". These devices, especially "information security isolation devices", usually do not support transparent transmission but instead provide their own interfaces to be called by the data transfer process.
  • the present invention extracts the adaptation function of the network isolation device separately, and designs it into a plug-in form of a unified interface, so as to realize integration with different devices and improve the adaptability of the system to different network environments.
  • A single artificial intelligence sample model file may reach GB scale, and a mature artificial intelligence platform may consume hundreds of terabytes or even petabytes of storage for its sample model files, which places high demands on storage resources. Since the informatization infrastructure differs from region to region and several different storage resources may coexist (such as enterprise private cloud storage, distributed storage, and centralized storage array equipment), the present invention sets up a separate "storage resource read-write module" that, similar to the "network isolation device adaptation", adapts to different storage resources through plug-ins and supports the evolution of the storage technology roadmap.
  • For mainstream cloud storage protocols such as the S3 protocol, the block data read-write interface has been implemented uniformly, and the concrete implementation in use can be switched through configuration files.
  • The "storage resource read-write module" is also the main carrier of the "security-compliant use of data across network regions based on the unified caching scheme", which is introduced in the next section.
  • The present invention proposes a data security compliance scheme based on unified caching, which converts the "cross-network secure use problem" of confidential files into the question of how long files of different confidentiality levels may be cached in different network areas, completely avoiding additional encryption overhead while satisfying the enterprise data security regulation that "the Internet area cannot store or use confidential data in any form, the external network can use and cache files of low confidentiality levels, and the internal network can use and store all confidential files long-term", thereby solving the problem of secure, compliant cross-network data use at low cost to a certain extent.
  • According to the confidentiality level of a file and the enterprise's configured requirements on whether data of different confidentiality levels can be stored long-term in different network areas, whether it can be temporarily cached, and for how long, the "storage resource read-write module" writes files that need to be temporarily cached (for example, files of an ordinary confidentiality level used in the Internet area) into the "distributed cache" and sets an expiration time at the same time.
  • The "distributed cache" is mainstream IT middleware that supports automatic deletion on expiration, which meets the requirements of this scheme; the artificial intelligence platform system accesses the sample model files according to the returned file path. For confidential data, the platform does not provide secondary distribution functions such as file downloads in the interface, thereby achieving compliance with enterprise data security regulations.
  • This embodiment provides a transparent sample model sharing apparatus for a multi-level linked artificial intelligence platform, including: a global directory service subsystem, at least one sample model transparent sharing subsystem, and at least one artificial intelligence platform; each artificial intelligence platform is deployed in one-to-one pairing with a sample model transparent sharing subsystem; and each sample model transparent sharing subsystem is connected to said global directory service subsystem;
  • The sample model transparent sharing subsystem stores and synchronously transmits sample model data.
  • The synchronous transmission is specifically as follows: before transmission, the file is split into blocks of a set threshold size in MB until all blocks are smaller than or equal to the threshold; if the file is already smaller than the threshold, it is not split; the digital digests of all blocks are calculated and merged into one digest, and the blocks are then transmitted in parallel by multiple threads;
  • During transmission, the file receiver receives the blocks, computes the digital digests of the fixed-size blocks in parallel, and saves them one by one; after the transmission is completed, all blocks are merged in order into the original file, the block digests are merged into one digest, and the result is compared with the digest computed before transmission: if they are the same, the transfer is complete; if they differ, the transfer is rolled back and repeated.
  • The sample model transparent sharing subsystem includes a local directory service, a global synchronization service, and a data storage service; its operation specifically includes sample model updating and cross-platform sharing of sample models;
  • The sample model update includes: the artificial intelligence platform calls the local directory service of the sample model transparent sharing subsystem deployed in the same network area and submits the file data; the local directory service calls the local data storage service to store the file data and at the same time submits the directory entry of the newly added file data to the global directory service subsystem as message text; and the global directory service subsystem updates the directory;
  • The cross-platform sharing of sample models includes: the local directory service initiates a query to the global directory service subsystem at set intervals, and the global directory service subsystem returns the directory data changes that occurred within the past interval to the global synchronization service; after obtaining the changed global directory data, the global synchronization service calls the local directory service to merge the changes into and update the local directory.
  • The data storage service is provided with a network isolation device adapter plug-in, which extracts the network isolation device adaptation function as a separate component behind a unified interface, used to adapt to the firewalls and information security isolation devices of different network environments.
  • The data storage service is provided with a storage resource read-write module;
  • The storage resource read-write module is implemented in Java against mainstream cloud storage protocols; it unifies the block data read-write interface, supports switching the concrete implementation through configuration files, and realizes plug-in management.
  • According to the confidentiality level of a file and the enterprise's configured requirements on whether data of different confidentiality levels can be stored long-term in different network areas, whether it can be temporarily cached, and for how long, the storage resource read-write module writes files that need to be temporarily cached into the distributed cache and sets an expiration time at the same time; the distributed cache is IT middleware that supports automatic deletion after expiration; the artificial intelligence platform accesses the sample model file according to the returned file path; and for confidential data, the artificial intelligence platform does not provide a secondary file distribution function.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Computational Linguistics (AREA)
  • Computing Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A multi-level linkage transparent sample model sharing apparatus for an artificial intelligence platform. The apparatus comprises a global directory service subsystem, at least one transparent sample model sharing subsystem, and at least one artificial intelligence platform, wherein each artificial intelligence platform and a transparent sample model sharing subsystem are deployed as a one-to-one pair, and each transparent sample model sharing subsystem is connected to the global directory service subsystem. All sample model directories are maintained by the global directory service subsystem, ensuring the consistency of the sample model directories. Requests from a local artificial intelligence platform are taken over by the transparent sample model sharing subsystem, which queries the data distribution over the whole network in cooperation with the global directory service subsystem and then stores and synchronously transmits the sample model data, thereby satisfying the requirements for transparent sharing, secure sharing, and efficient transmission of massive sample model data on a cross-region, multi-level artificial intelligence platform.

Description

A multi-level linkage transparent sample model sharing apparatus for an artificial intelligence platform
Technical field
The invention relates to the technical field of artificial intelligence, and in particular to an apparatus for transparently sharing sample models across a multi-level linked artificial intelligence platform.
Background
Artificial intelligence technology has gradually become a key element in promoting the development of productivity, changing production and operation modes, and improving production efficiency. To support the large-scale development and operation of artificial intelligence applications, large enterprises have developed and launched their own "artificial intelligence platforms" to aggregate and integrate AI-related capabilities and to support enterprise AI application scenarios such as face authentication, process robots, knowledge retrieval, and risk prevention and control.
The so-called artificial intelligence platform usually consists of "two libraries and one platform", namely a sample library, a model library, and an operating platform. The sample library is the component that stores and manages sample resources of all disciplines and types; relying on functions such as sample storage, sample preprocessing, sample labeling, sample label management, and a sample service catalog, it provides sample resources for AI model training. The model library is the component that stores and manages the general-purpose and special-purpose models of each discipline; it offers general-purpose and power-sector-specific algorithm models and, relying on functions such as model testing, image packaging, version management, model uploading, model downloading, and a model service catalog, provides intelligent model resources for AI applications. The operating platform provides functions such as model import, model verification, model deployment, service release, and cloud-edge collaboration, supporting model inference and application integration.
The data samples and models contained in an artificial intelligence platform require substantial intellectual resources and human labor to produce. Whether purchased or developed in-house, they should be used intensively throughout the enterprise to avoid duplicate procurement or development. On the other hand, for large enterprises such as centrally administered enterprises, whose branches are distributed across the country and even the world, the network environment in which AI technology is applied includes physically isolated intranets, external networks, and the Internet, and the application environment includes both enterprise offices and job sites. Considering real-time access performance and the difficulty of application rollout, a single AI platform cannot serve all users; AI platforms must instead be deployed in different branches and networks. Large enterprises therefore have a strong incentive to interconnect the AI platforms at all deployment points and realize transparent sharing of data samples and model files between multi-level AI platforms.
As shown in Figure 1, there is a need for transparent and secure sharing of sample models between the multi-level artificial intelligence platforms of a large enterprise. For the deployment of multi-level AI platforms in large enterprises and the transparent, secure sharing of sample models between those platforms, the main technical difficulties are:
(1) Transparent sharing is difficult: all AI models and sample data must be shared between multi-level AI platforms (headquarters, regional centers, edge-side job sites, etc.) and between different networks (intranet, external network, Internet), providing a unified catalogue and unified access for all users. How to let users in different regions access the models and samples of the entire network without storing large amounts of duplicate data is a problem that must be considered;
(2) Unified compliance with security regulations is difficult: platforms and networks at different levels have different data security regulations and special devices (firewalls, information security isolation devices), which challenges model and sample sharing across network levels. The confidentiality requirements of the networks differ, and a multi-level linked AI platform must satisfy the confidentiality regulations of each network area and provide consistent, complete support. A large enterprise's networks generally involve three types: a physically isolated proprietary information network (intranet), a logically isolated proprietary information network (external network), and the Internet. Among the three, the Internet area cannot store or use confidential data in any form, the external network can use and cache files of low confidentiality levels, and the intranet can use and store all confidential files long-term;
(3) Transmission performance and integrity verification: the data that a multi-level AI platform needs to transmit falls into two categories, namely single large model files at the GB level, and numerous but individually small data sample files (such as images and audio). Between platforms at different levels and different network environments, how to make full use of network bandwidth and the bandwidth of information security isolation devices to achieve efficient transmission and sharing of GB-level large models and KB-level small sample files also needs to be considered as a whole.
Therefore, the technical problem to be solved for transparent and secure sharing of sample models between the multi-level AI platforms of a large enterprise is mainly: how to achieve high-performance transmission and transparent sharing of very large volumes of AI model data and sample data across multi-level organizations and multiple network types, while supporting common security equipment and complying with enterprise security regulations.
At present, no published literature provides an overall solution to the problem of transparent and secure sharing of sample models between the multi-level AI platforms of large enterprises. However, technical solutions do exist for the individual technical problems involved, including high-performance transmission of large files and data transmission between networks. The analysis is as follows:
The existing technical solutions mainly target large files and solve the high-performance transmission of large files through file data fragmentation and multi-threaded parallel transmission. A typical reference is the invention titled "A large file transmission method, device and system" (application number 202011337777.9), which decomposes large-file transmission into three stages: file fragmentation, multi-threaded transmission, and merging based on file identifiers. This solution improves the performance of large-file transfer and reduces the failure rate, but it does not address file integrity guarantees or the time-consuming digital digest calculation that integrity guarantees involve.
In summary, no published literature provides an overall solution to the problem of transparent and secure sharing of sample models between the multi-level AI platforms of large enterprises. The high-performance transmission techniques for massive files and the two-way inter-network data exchange techniques involved cannot fully resolve the issues of efficient data transmission, security compliance, and transparent sharing identified in the background above, and none of them are fully applicable to the multi-level deployment of large-enterprise AI platforms.
Summary of the invention
The technical problem to be solved by the present invention is to provide a transparent sample model sharing apparatus for a multi-level linked artificial intelligence platform, meeting the requirements for transparent sharing, secure sharing, and efficient transmission of massive sample model data across cross-regional multi-level AI platforms.
The present invention provides a transparent sample model sharing apparatus for a multi-level linked artificial intelligence platform, comprising: a global directory service subsystem, at least one sample model transparent sharing subsystem, and at least one artificial intelligence platform. Each artificial intelligence platform is deployed in one-to-one pairing with a sample model transparent sharing subsystem, and each sample model transparent sharing subsystem is connected to the global directory service subsystem.
All sample model directories are maintained by the global directory service subsystem to ensure consistency. The sample model transparent sharing subsystem takes over requests from the local artificial intelligence platform, queries the network-wide data distribution in coordination with the global directory service subsystem, and then stores and synchronously transmits the sample model data.
Further, the sample model transparent sharing subsystem includes a local directory service, a global synchronization service, and a data storage service; its operation specifically includes sample model updating and cross-platform sharing of sample models.
The sample model update includes: the artificial intelligence platform calls the local directory service of the sample model transparent sharing subsystem deployed in the same network area and submits the file data; the local directory service calls the local data storage service to store the file data and at the same time submits the directory entry of the newly added file data to the global directory service subsystem as message text; and the global directory service subsystem updates the directory.
The cross-platform sharing of sample models includes: the local directory service initiates a query to the global directory service subsystem at set intervals, and the global directory service subsystem returns the directory data changes that occurred within the past interval to the global synchronization service; after obtaining the changed global directory data, the global synchronization service calls the local directory service to merge the changes into and update the local directory.
Further, the data storage service is provided with a network isolation device adaptation plug-in, which extracts the network isolation device adaptation function as a separate component behind a unified interface, used to adapt to the firewalls and information security isolation devices of different network environments.
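As an illustration of the plug-in form described above, a unified adapter interface might look like the following Java sketch; the interface name, method signatures, and zone identifiers are assumptions made for illustration and are not specified by the patent.

```java
import java.io.InputStream;

/**
 * Illustrative unified interface for network isolation device adapters.
 * Concrete adapters (for example, a firewall pass-through adapter or an adapter
 * for a vendor-specific information security isolation device) implement the same
 * contract, so the data storage service never depends on the device behind a boundary.
 */
public interface IsolationDeviceAdapter {

    /** Human-readable adapter identifier, e.g. "firewall" or "security-gap-device". */
    String name();

    /** Returns true if this adapter can reach the given target network zone. */
    boolean supports(String targetZone);

    /**
     * Pushes one data block through the isolation device to the peer zone and
     * returns an acknowledgement token the caller can use for retries.
     */
    String sendBlock(String targetZone, String fileId, int blockIndex, InputStream block) throws Exception;
}
```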
Further, the data storage service is provided with a storage resource read-write module. The storage resource read-write module is implemented in Java against mainstream cloud storage protocols; it unifies the block data read-write interface, supports switching the concrete implementation through configuration files, and thereby realizes plug-in management.
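A minimal sketch of the unified block read-write interface and configuration-driven plug-in selection is shown below, assuming an S3-style back end as one possible implementation; the class names and the `storage.impl` configuration key are illustrative assumptions.

```java
import java.io.InputStream;
import java.util.Properties;

/** Illustrative unified block read-write contract over heterogeneous storage back ends. */
interface BlockStore {
    void writeBlock(String fileId, int blockIndex, InputStream data) throws Exception;
    InputStream readBlock(String fileId, int blockIndex) throws Exception;
}

/**
 * Sketch of plug-in selection through a configuration file: the concrete
 * implementation class (for example an S3-protocol store, a distributed file
 * system store, or a storage-array store) is named in a properties file and
 * loaded by reflection, so switching storage back ends needs no code change.
 */
final class BlockStoreFactory {
    static BlockStore fromConfig(Properties config) throws Exception {
        // "storage.impl" and the default class name are assumed, not defined by the patent.
        String implClass = config.getProperty("storage.impl", "com.example.S3BlockStore");
        return (BlockStore) Class.forName(implClass).getDeclaredConstructor().newInstance();
    }
}
```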
Further, according to the confidentiality level of a file and the enterprise's configured requirements on whether data of different confidentiality levels can be stored long-term in different network areas, whether it can be temporarily cached, and for how long, the storage resource read-write module writes files that need to be temporarily cached into the distributed cache and sets an expiration time at the same time; the distributed cache is IT middleware that supports automatic deletion on expiration; the artificial intelligence platform accesses the sample model file according to the returned file path; and for confidential data, the artificial intelligence platform does not provide a secondary file distribution function.
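The caching rule can be pictured as in the following sketch, in which the cache abstraction, the policy record, and the key prefix are assumptions; the essential point from the text is that a file is written to the distributed cache only when the policy for its confidentiality level and network area allows temporary caching, always together with an expiration time.

```java
import java.time.Duration;

/** Illustrative cache abstraction; a deployment would back this with distributed cache middleware. */
interface DistributedCache {
    void put(String key, byte[] value, Duration ttl); // entry is deleted automatically after ttl
}

/** Per-zone caching policy derived from enterprise configuration (field names are assumptions). */
record CachePolicy(boolean longTermAllowed, boolean cacheAllowed, Duration cacheTtl) {}

final class SecureCacheWriter {
    private final DistributedCache cache;

    SecureCacheWriter(DistributedCache cache) { this.cache = cache; }

    /**
     * Writes a file into the distributed cache only if the enterprise policy for this
     * confidentiality level and network area allows temporary caching, and always
     * attaches the configured expiration time so the middleware deletes it automatically.
     */
    boolean cacheIfPermitted(String fileId, byte[] content, CachePolicy policy) {
        if (!policy.cacheAllowed()) {
            return false; // e.g. confidential data in the Internet area is never cached
        }
        cache.put("samplemodel:" + fileId, content, policy.cacheTtl());
        return true;
    }
}
```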
Further, the synchronous transmission is specifically as follows:
Before transmission, the file is split into blocks of a set threshold size in MB until all blocks are smaller than or equal to the threshold; if the file is already smaller than the threshold, it is not split. The digital digests of all blocks are calculated and merged into one digital digest, and the blocks are then transmitted in parallel by multiple threads;
During transmission, the file receiver receives the blocks, computes the digital digests of the fixed-size blocks in parallel, and saves them one by one;
After the transmission is completed, all blocks are merged in order into the original large file, and the digital digests of all blocks are likewise merged into one digital digest, yielding the finally transmitted sample model file and its corresponding digest. This digest is compared with the merged digest computed before transmission: if they are the same, the file transfer is complete; if they differ, the transfer is rolled back and retransmitted.
One or more of the technical solutions provided in the embodiments of the present invention have at least the following technical effects or advantages:
The embodiment of the present application provides a transparent sample model sharing apparatus for a multi-level linked artificial intelligence platform, in which a "global directory service subsystem" and a "sample model transparent sharing subsystem" form the architecture of the basic service infrastructure supporting transparent sharing of AI platform model samples. Through schemes such as a "transparent sharing mechanism for model files based on hierarchical directories", "high-performance sample model data synchronization and heterogeneous storage integration based on segmented transmission verification in a cross-network environment", and "security-compliant use of data across network regions based on a unified caching scheme", the apparatus meets the requirements for transparent sharing, secure sharing, and efficient transmission of massive sample model data across cross-regional multi-level AI platforms.
The above description is only an overview of the technical solution of the present invention. In order to understand the technical means of the invention more clearly, it can be implemented according to the contents of the description; and in order to make the above and other objects, features, and advantages of the invention more apparent and understandable, specific embodiments of the invention are set out below.
Description of the drawings
The present invention is further described below with reference to the accompanying drawings and embodiments.
Figure 1 is a schematic diagram of the prior-art requirement for transparent and secure sharing of sample models between the multi-level artificial intelligence platforms of a large enterprise;
Figure 2 is the overall architecture diagram of the apparatus of the present invention;
Figure 3 is a schematic diagram of the transparent sharing mechanism for model files based on hierarchical directories in the present invention;
Figure 4 is a schematic diagram of the high-performance sample model data synchronization scheme based on segmented transmission and verification in the present invention;
Figure 5 is a sequence diagram of the security-compliant use of data across network regions based on the unified caching scheme in the present invention.
Detailed description of the embodiments
The general idea of the technical solution in the embodiments of the present application is as follows:
For the high-speed transmission of sample model data of widely varying sizes on a multi-level linked AI platform, the transparent sharing and retrieval of sample model data distributed across different regions and network levels, and the secure, compliant use of sample model data in different network areas, a systematic, overall method is provided, giving a technical basis for the multi-level, cross-network deployment of large-enterprise AI platforms. The content of the invention mainly includes the following parts:
(1) A system architecture supporting transparent sharing on a multi-level linked AI platform. It is proposed that a "global directory service subsystem" and a "sample model transparent sharing subsystem" together form the architecture of the basic service infrastructure that supports transparent sharing of AI platform model samples.
(2) A design of a transparent sharing mechanism for model files based on hierarchical directories. A hierarchical-directory transparent sharing mechanism is proposed: a "global directory" maintains the unified network-wide sample model catalogue to ensure global information consistency, while a "local directory service" takes over all requests from the local AI platform; through its coordination with the global directory service, the data distribution of the entire network can be queried quickly without the local AI platform being aware of the global directory.
(3) A high-performance sample model data synchronization and heterogeneous storage integration design based on segmented transmission verification in a cross-network environment. A high-performance sample model data synchronization design based on segmented transmission and verification is proposed: tailored to the fact that AI platform sample model data undergoes many partial changes over its life cycle, the file is split into segments and transmitted and verified segment by segment, which can significantly improve synchronization performance. It is further proposed to extract the network isolation device adaptation function as a separate plug-in behind a unified interface, so as to integrate with different devices and improve the system's adaptability to different network environments, and to set up a separate "storage resource read-write module" that adapts to different storage resources through plug-ins and supports the evolution of the storage technology roadmap.
(4) A design for secure, compliant use of data across network regions based on a unified caching scheme. A unified-cache data security compliance scheme is proposed that converts the "cross-network secure use problem" of confidential files into the question of how long files of different confidentiality levels may be cached in different network areas. This completely avoids additional encryption overhead, satisfies the enterprise's data security regulations, and to a certain extent solves the problem of secure, compliant cross-network data use at low cost.
The specific implementation of the present invention is described in four aspects: the overall system architecture design, the design of the transparent sharing mechanism for model files based on hierarchical directories, the design of high-performance sample model data synchronization and heterogeneous storage integration based on segmented transmission verification in a cross-network environment, and the design of secure, compliant use of data across network regions based on a unified caching scheme.
(1) Overall architecture design
As shown in Figure 2, the overall architecture consists of the "global directory service subsystem" and the "sample model transparent sharing subsystem". The "global directory service subsystem" needs only one service instance deployed in the whole network; the "sample model transparent sharing subsystem" is deployed in one-to-one pairing with an artificial intelligence platform and can run either as part of the AI platform's service group or as a separate service, providing the AI platform with complete sample model data storage and synchronous transmission services.
The main modules and operating mechanisms of the "global directory service subsystem" and the "sample model transparent sharing subsystem" are described in the specific schemes below.
(2) Design of the transparent sharing mechanism for model files based on hierarchical directories
A hierarchically deployed artificial intelligence platform can upload samples and model data through any deployment point. To allow these sample model data to be shared transparently by the other deployment points, the present invention proposes a transparent sharing mechanism based on hierarchical directories: a "global directory" maintains the unified network-wide sample model catalogue to ensure global information consistency, while the "local directory service" takes over all requests from the local AI platform; through its coordination with the global directory service, the data distribution of the entire network can be queried quickly without the local AI platform being aware of the global directory. Together, the global directory and the local directories constitute the AI sample model directory service that supports AI platforms at all levels.
需要指出的是,全局同步的仅限于目录数据,样本模型的文件本身还在各自本地维护,只有在后续需要异地调用时才按需传输。目录数据比样本模型文件本身要小得多,从而在支持全网共享的同时有效避免大量数据重复存储、传输。It should be pointed out that the global synchronization is limited to directory data, and the files of the sample models are still maintained locally, and they are only transferred on demand when they need to be called from different places later. The directory data is much smaller than the sample model file itself, thus effectively avoiding repeated storage and transmission of a large amount of data while supporting network-wide sharing.
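By way of illustration only, a globally synchronized catalog entry needs to carry only lightweight metadata while the file body stays at its origin. The following minimal sketch uses Java 16+ record syntax; all field names are assumptions for illustration and are not part of the invention as filed:

// A minimal sketch of a global-catalog entry: only this metadata is synchronized
// network-wide, while the sample or model file itself stays at its originating site.
// All field names here are illustrative assumptions.
public record CatalogEntry(
        String name,           // logical name of the sample set or model
        String version,        // version tag produced by the uploading platform
        long sizeBytes,        // size of the underlying file
        String digest,         // merged digital digest of the file's segments
        String homeRegion,     // network region where the file is physically stored
        String securityLevel   // classification level consulted by the caching policy
) {}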
As shown in Figure 3, the global transparent sharing scheme comprises a two-stage process of "sample model update" and "cross-platform sample model sharing":
1) Sample model update stage
Step 1: Upload data locally. After a user of the artificial intelligence platform uploads data through the platform interface, modifies samples with the annotation tool, or trains a new model, the platform calls the "local directory service" of the "sample model transparent sharing subsystem" deployed in the same network region and submits the file data.
Step 2: Submit to the global directory. The "local directory service" calls the local "data storage service" to store the file data and, at the same time, submits the catalog record of the newly added data (including its name, metadata, and so on) as a message to the distributed message middleware of the "global directory".
Step 3: Update the global directory. The "global directory service" listens for messages from the distributed message middleware and applies their content to the global directory. Relying on the high-availability and strong-consistency properties of the distributed message middleware ensures that the content of the global directory is complete and free of duplicates. A sketch of this message-based update path is given below.
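The following sketch illustrates Steps 2 and 3 under the assumption that Apache Kafka is used as the distributed message middleware; the broker address, topic name, record key, and JSON payload are illustrative assumptions rather than details of the filed system:

// Sketch only: the local directory service publishes the catalog record of newly
// added data, and the global directory service consumes the topic and merges the
// record into the global directory.
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import java.util.Properties;

public class CatalogUpdatePublisher {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "global-directory-mq:9092");  // assumed broker address
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Assumed JSON payload describing the new catalog record.
            String payload = "{\"name\":\"insulator-defect-samples\",\"version\":\"v3\",\"homeRegion\":\"province-A\"}";
            producer.send(new ProducerRecord<>("catalog-updates", "insulator-defect-samples:v3", payload));
        }
    }
}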
2)样本模型跨平台共享2) Sample model cross-platform sharing
为了确保本地人工智能平台能够查询、浏览全网样本模型目录数据,需要定期从“全局目录服务”同步目录数据。具体步骤为:In order to ensure that the local artificial intelligence platform can query and browse the sample model catalog data of the entire network, it is necessary to periodically synchronize the catalog data from the "global catalog service". The specific steps are:
步骤1:定时同步请求全局目录。本地的“全局同步服务”定期(如每小时)发起对全局目录服务的查询,“全局目录服务”将过去一个小时发生的目录数据变动返回给“全局同步服务”。Step 1: Scheduled synchronization requests for the global directory. The local "global synchronization service" initiates a query to the global directory service periodically (eg, every hour), and the "global directory service" returns the directory data changes that occurred in the past hour to the "global synchronization service".
步骤2:本地目录合并更新。获取变更的全局目录数据后,“全局同步服务”调用本地目录服务的更新接口,将变更的目录数据提交给本地目录合并更新。Step 2: Local directory merge updates. After obtaining the changed global catalog data, the "global synchronization service" calls the update interface of the local catalog service, and submits the changed catalog data to the local catalog for merge update.
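A minimal sketch of the periodic pull-and-merge cycle follows, reusing the illustrative CatalogEntry above; the GlobalDirectoryClient and LocalDirectoryService interfaces and the hourly interval are assumptions for illustration, not the API of the filed system:

import java.time.Instant;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

interface GlobalDirectoryClient {
    List<CatalogEntry> changesSince(Instant since);   // catalog changes recorded after 'since'
}

interface LocalDirectoryService {
    void mergeUpdate(List<CatalogEntry> changes);      // merges changes into the local catalog
}

class GlobalSyncService {
    private final GlobalDirectoryClient global;
    private final LocalDirectoryService local;
    private volatile Instant lastSync = Instant.EPOCH;

    GlobalSyncService(GlobalDirectoryClient global, LocalDirectoryService local) {
        this.global = global;
        this.local = local;
    }

    void start() {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        // Pull the changes of the past interval and merge them locally, e.g. hourly.
        scheduler.scheduleAtFixedRate(() -> {
            Instant now = Instant.now();
            local.mergeUpdate(global.changesSince(lastSync));
            lastSync = now;
        }, 0, 1, TimeUnit.HOURS);
    }
}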
(3) Design of high-performance sample model data synchronization based on segmented transmission and verification, with heterogeneous storage integration, in a cross-network environment
The models and sample data in an artificial intelligence platform are modified many times over their life cycle, for example when sample annotations are added or when models are superimposed and fused. In these kinds of modification the file itself often changes only locally, so transmitting only the changed content wherever possible can greatly improve the efficiency of transferring sample model data between platforms.
On the other hand, the integrity of the data before and after synchronization must be guaranteed once transmission completes. Mainstream solutions usually rely on digital digest technology (such as MD5): the digest of the whole file is computed before and after the synchronous transfer, and if the two digests are identical the synchronized data is proven to be complete. However, digest computation is time-consuming and its cost grows in proportion to the file size, so computing a digest over an entire single file takes considerable time. Reducing the running time of the digest computation therefore helps improve the efficiency of synchronous model data transfer.
The present invention proposes a high-performance sample model data synchronization design based on segmented transmission and verification. Targeting the fact that sample model data on an artificial intelligence platform undergoes many local changes over its full life cycle, the design splits a file into segments and transmits and verifies it segment by segment, which significantly improves the synchronization performance of sample model data. The overall scheme is shown in the figure.
As shown in Figure 4, the specific mechanism design is introduced in the following three aspects:
1) Segmented transmission and verification, executed by the "segmented transmission verification module" during file transfer. The specific process is as follows:
Before transmission, the large artificial intelligence sample model file is split into 1 MB blocks (a sample file smaller than 1 MB is not split), the digital digest of every block is computed, and the block digests are merged into a single digest. The blocks are then transmitted in parallel by multiple threads.
During transmission, the file receiver receives the blocks, computes the digests of the fixed-size blocks in parallel, and saves them one by one.
After transmission, all blocks are merged in order back into the original large file and all block digests are merged into a single digest, yielding the synchronized sample model file and its corresponding digest. The resulting digest is compared with the digest merged before transmission: if they are identical, the file was transferred intact; if they differ, the transfer is rolled back and repeated. A sketch of the sender-side segmentation and digest merging is given below.
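The following minimal sketch shows the sender-side segmentation and digest merging, assuming MD5 as the digest algorithm, 1 MB segments, and that the merged digest is taken as the digest of the concatenated block digests; this is one possible reading of the scheme rather than the exact construction used by the invention, and it requires Java 17+ for HexFormat:

import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.HexFormat;
import java.util.List;

public class SegmentDigest {
    static final int SEGMENT_SIZE = 1024 * 1024; // 1 MB blocks

    public static String mergedDigest(Path file) throws IOException, NoSuchAlgorithmException {
        List<byte[]> blockDigests = new ArrayList<>();
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buffer = new byte[SEGMENT_SIZE];
            int read;
            while ((read = in.readNBytes(buffer, 0, SEGMENT_SIZE)) > 0) {
                MessageDigest md = MessageDigest.getInstance("MD5");
                md.update(buffer, 0, read);          // digest of one 1 MB block (last block may be shorter)
                blockDigests.add(md.digest());
            }
        }
        MessageDigest merged = MessageDigest.getInstance("MD5");
        for (byte[] d : blockDigests) {
            merged.update(d);                        // merge the block digests into a single digest
        }
        return HexFormat.of().formatHex(merged.digest());
    }
}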
For streaming data transfer, the present invention specifically adopts streaming file transmission technology based on the Java Mina framework. This is a mature technology widely used in the industry and is not described further here.
The segmented transmission and verification design provided by the present invention can effectively exploit the otherwise idle computing resources of today's multi-core computer systems, carrying out file transmission and digest computation in parallel and thereby improving the performance of file transfer and integrity verification.
2) Network isolation device adapter plug-in. In an enterprise regional network interconnection environment, different network partitions may be connected through "firewalls" or "information security isolation devices". These devices, particularly the "information security isolation devices", usually do not support transparent transmission and instead expose their own interfaces that the data transfer process must call.
The present invention extracts the network isolation device adaptation function as a separate component and designs it as plug-ins behind a unified interface, so that different devices can be integrated and the system's adaptability to different network environments is improved. A sketch of such a unified adapter interface is given below.
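The following sketch shows what a unified adapter interface for network isolation devices might look like; the interface name and methods are illustrative assumptions, and each concrete plug-in would wrap the proprietary interface of one device type:

import java.io.InputStream;

// Illustrative unified interface; FirewallPassthroughAdapter, SecurityGatewayAdapter, etc.
// would implement it and be selected at runtime from configuration.
interface IsolationDeviceAdapter {
    /** Pushes one file segment across the network boundary via the device's own interface. */
    void sendSegment(String transferId, int segmentIndex, InputStream segmentData);

    /** Reports whether a previously pushed segment has been released into the target region. */
    boolean isDelivered(String transferId, int segmentIndex);
}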
3) Reading and writing heterogeneous storage resources. A single artificial intelligence sample model file may reach gigabytes in size, and a mature, widely deployed artificial intelligence platform may consume hundreds of terabytes or even petabytes of storage for its sample model files, placing heavy demands on storage resources. Because the information infrastructure differs from region to region, and several kinds of storage resource may coexist (such as enterprise private cloud storage, distributed storage, and centralized storage array equipment), the present invention provides a separate "storage resource read-write module" which, in the same way as the "network isolation device adapter", adapts to different storage resources through plug-ins and supports the evolution of storage technology.
Specifically, within the "storage resource read-write module" the present invention provides a unified Java implementation of the mainstream cloud storage protocols (such as the S3 protocol) and of the block data read-write interfaces, and supports switching the adopted concrete implementation through a configuration file, thereby achieving plug-in management. Reading and writing the different storage resources in this way is a common, reusable capability. A sketch of this configuration-driven selection is given below.
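A minimal sketch of configuration-driven selection of a storage back end follows; the interface, property name, and implementation classes are illustrative assumptions rather than the concrete API of the filed module:

import java.io.InputStream;
import java.util.Properties;

// Unified read-write abstraction over heterogeneous storage resources (sketch only).
interface StorageReaderWriter {
    void write(String key, InputStream data, long length);
    InputStream read(String key);
}

class StorageReaderWriterFactory {
    // e.g. storage.impl=com.example.S3StorageReaderWriter in the configuration file (assumed property)
    static StorageReaderWriter fromConfig(Properties config) throws Exception {
        String implClass = config.getProperty("storage.impl");
        return (StorageReaderWriter) Class.forName(implClass)
                .getDeclaredConstructor()
                .newInstance();
    }
}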
At the same time, the "storage resource read-write module" is also the main carrier of the scheme for "secure and compliant cross-network-region data utilization based on a unified caching scheme", which is described next.
(4) Design of secure and compliant cross-network-region data utilization based on a unified caching scheme
Different network regions apply different file security (classification) levels. The mainstream approach to safeguarding data across network regions is to encrypt the files. However, because artificial intelligence sample model files are numerous and a single file can reach gigabytes in size, encrypting and decrypting them requires large amounts of computing resources and time, which is almost unacceptable in practical applications.
The present invention proposes a unified-cache-based scheme for secure and compliant data utilization, which converts the "cross-network secure utilization problem" of classified files into a problem of cache lifetimes for files of different classification levels in different network regions. This completely avoids additional encryption overhead and satisfies the enterprise data security rule that "the Internet region must not store or use classified data in any form, the external network may use and cache files of low classification level, and the internal network may use and retain files of all classification levels long term", thereby solving the problem of secure, compliant cross-network-region data utilization at low cost.
The specific scheme is shown in Figure 5. Based on a file's classification level and on the enterprise's configured requirements as to whether data of each level may be stored long term in a given network region, whether it may be temporarily cached, and for how long, the "storage resource read-write module" writes files that require temporary caching (for example, files of an ordinary classification level used in the Internet region) into a "distributed cache" and sets an expiration time at the same time. The "distributed cache" is mainstream IT middleware that supports configurable automatic deletion on expiry and therefore meets the requirements of this scheme. The artificial intelligence platform then accesses the sample model file through the returned file path. For classified data, the platform does not offer secondary distribution functions such as file download in its interface, thereby complying with the enterprise data security regulations. A sketch of this classification-aware caching path is given below.
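The following sketch illustrates the classification-aware temporary caching path. The DistributedCache interface stands in for whatever cache middleware is deployed (for example a Redis-style store), and the region names, level names, and lifetimes are illustrative assumptions, not values prescribed by the invention:

interface DistributedCache {
    void putWithTtl(String key, byte[] value, long ttlSeconds);  // value is deleted automatically on expiry
}

class CachePolicy {
    /**
     * Returns the permitted temporary-cache lifetime in seconds for a file of the given
     * classification level in the given network region, or 0 if temporary caching is forbidden.
     */
    static long temporaryCacheTtlSeconds(String securityLevel, String networkRegion) {
        boolean lowLevel = "ordinary".equals(securityLevel);
        if ("internet".equals(networkRegion)) {
            return lowLevel ? 3600 : 0;          // e.g. an ordinary-level file used in the Internet region
        }
        if ("external".equals(networkRegion)) {
            return lowLevel ? 24 * 3600 : 0;     // low classification levels may be cached longer
        }
        return 0;                                 // internal network: kept in long-term storage instead
    }
}

class TemporaryCacheWriter {
    static String cacheForUse(DistributedCache cache, String fileKey, byte[] fileBytes,
                              String securityLevel, String networkRegion) {
        long ttl = CachePolicy.temporaryCacheTtlSeconds(securityLevel, networkRegion);
        if (ttl <= 0) {
            throw new IllegalStateException("temporary caching is not permitted for this file here");
        }
        cache.putWithTtl(fileKey, fileBytes, ttl);   // the expiry enforces the compliance window
        return fileKey;                              // path/key returned to the AI platform for access
    }
}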
Embodiment 1
This embodiment provides a multi-level linkage transparent sample model sharing apparatus for an artificial intelligence platform, comprising: a global directory service subsystem, at least one sample model transparent sharing subsystem, and at least one artificial intelligence platform. Each artificial intelligence platform is deployed in one-to-one pairing with a sample model transparent sharing subsystem, and every sample model transparent sharing subsystem is connected to the global directory service subsystem.
The global directory service subsystem maintains the complete sample model catalog and ensures its consistency. The sample model transparent sharing subsystem takes over the requests from the local artificial intelligence platform and, in cooperation with the global directory service subsystem, queries the network-wide data distribution; the sample model transparent sharing subsystem then stores and synchronously transmits the sample model data.
The synchronous transmission is further specified as follows: before transmission, the file is split into blocks of a set threshold size in MB until every block is no larger than the threshold (a file smaller than the threshold is not split); the digital digests of all blocks are computed and merged into a single digest, and the blocks are then transmitted in parallel by multiple threads;
during transmission, the file receiver receives the blocks, computes the digests of the fixed-size blocks in parallel, and saves them one by one;
after transmission, all blocks are merged in order back into the original large file and all block digests are merged into a single digest, yielding the synchronized sample model file and its corresponding digest; the resulting digest is compared with the digest merged before transmission, and if they are identical the file was transferred intact; if they differ, the transfer is rolled back and repeated.
The sample model transparent sharing subsystem comprises a local directory service, a global synchronization service, and a data storage service, and specifically performs sample model updating and cross-platform sample model sharing.
The sample model updating comprises: the artificial intelligence platform calls the local directory service of the sample model transparent sharing subsystem deployed in the same network region and submits the file data; the local directory service calls the local data storage service to store the file data and at the same time submits the catalog of the newly added file data to the global directory service subsystem as a message; and the global directory service subsystem updates the catalog.
The cross-platform sample model sharing comprises: the local directory service initiates a query to the global directory service subsystem at set intervals, and the global directory service subsystem returns the catalog data changes that occurred during the preceding interval to the global synchronization service; after obtaining the changed global catalog data, the global synchronization service calls the local directory service to perform the local catalog merge update.
The data storage service is provided with a network isolation device adapter plug-in, which extracts the network isolation device adaptation function as a separate component behind a unified interface and is used to adapt to the firewalls and information security isolation devices of different network environments.
The data storage service is provided with a storage resource read-write module, which is a unified Java-language implementation of the block data read-write interfaces for the mainstream cloud storage protocols and supports switching the adopted concrete implementation through a configuration file, thereby achieving plug-in management.
Based on the file's classification level and the enterprise's configured requirements as to whether data of each classification level may be stored long term in a given network region, whether it may be temporarily cached, and for how long, the storage resource read-write module writes the files that require temporary caching into a distributed cache and sets an expiration time at the same time. The distributed cache is IT middleware that supports configurable automatic deletion on expiry. The artificial intelligence platform accesses the sample model file according to the returned file path. For classified data, the artificial intelligence platform does not provide a secondary file distribution function.
Although specific embodiments of the present invention have been described above, those skilled in the art should understand that the specific embodiments described are merely illustrative and are not intended to limit the scope of the present invention. Equivalent modifications and variations made by those skilled in the art in accordance with the spirit of the present invention shall all fall within the protection scope of the claims of the present invention.

Claims (6)

  1. A multi-level linkage transparent sample model sharing apparatus for an artificial intelligence platform, characterized by comprising: a global directory service subsystem, at least one sample model transparent sharing subsystem, and at least one artificial intelligence platform, wherein the artificial intelligence platform is deployed in one-to-one pairing with the sample model transparent sharing subsystem, and each sample model transparent sharing subsystem is connected to the global directory service subsystem;
    the global directory service subsystem maintains the complete sample model catalog and ensures consistency; the sample model transparent sharing subsystem takes over requests from the local artificial intelligence platform and, in cooperation with the global directory service subsystem, queries the network-wide data distribution; and the sample model transparent sharing subsystem then stores and synchronously transmits the sample model data.
  2. The multi-level linkage transparent sample model sharing apparatus for an artificial intelligence platform according to claim 1, characterized in that:
    the sample model transparent sharing subsystem comprises a local directory service, a global synchronization service, and a data storage service, and specifically performs sample model updating and cross-platform sample model sharing;
    the sample model updating comprises: the artificial intelligence platform calls the local directory service of the sample model transparent sharing subsystem deployed in the same network region and submits file data; the local directory service calls the local data storage service to store the file data and at the same time submits the catalog of the newly added file data to the global directory service subsystem as a message; and the global directory service subsystem updates the catalog;
    the cross-platform sample model sharing comprises: the local directory service initiates a query to the global directory service subsystem at set intervals, and the global directory service subsystem returns the catalog data changes that occurred during the preceding interval to the global synchronization service; after obtaining the changed global catalog data, the global synchronization service calls the local directory service to perform the local catalog merge update.
  3. The multi-level linkage transparent sample model sharing apparatus for an artificial intelligence platform according to claim 2, characterized in that: the data storage service is provided with a network isolation device adapter plug-in, which extracts the network isolation device adaptation function as a separate component behind a unified interface and is used to adapt to the firewalls and information security isolation devices of different network environments.
  4. The multi-level linkage transparent sample model sharing apparatus for an artificial intelligence platform according to claim 2, characterized in that: the data storage service is provided with a storage resource read-write module, which is a unified Java-language implementation of the block data read-write interfaces for the mainstream cloud storage protocols and supports switching the adopted concrete implementation through a configuration file, thereby achieving plug-in management.
  5. The multi-level linkage transparent sample model sharing apparatus for an artificial intelligence platform according to claim 4, characterized in that: based on the file's classification level and the enterprise's configured requirements as to whether data of each classification level may be stored long term in a given network region, whether it may be temporarily cached, and for how long, the storage resource read-write module writes the files that require temporary caching into a distributed cache and sets an expiration time at the same time; the distributed cache is IT middleware that supports configurable automatic deletion on expiry; the artificial intelligence platform accesses the sample model file according to the returned file path; and for classified data, the artificial intelligence platform does not provide a secondary file distribution function.
  6. The multi-level linkage transparent sample model sharing apparatus for an artificial intelligence platform according to claim 1, characterized in that the synchronous transmission is further specified as follows:
    before transmission, the file is split into blocks of a set threshold size in MB until every block is no larger than the threshold, wherein a file smaller than the threshold is not split; the digital digests of all blocks are computed and merged into a single digest; and the blocks are then transmitted in parallel by multiple threads;
    during transmission, the file receiver receives the blocks, computes the digests of the fixed-size blocks in parallel, and saves them one by one;
    after transmission, all blocks are merged in order back into the original large file and all block digests are merged into a single digest, yielding the synchronized sample model file and its corresponding digest; the resulting digest is compared with the digest merged before transmission, and if they are identical the file was transferred intact; if they differ, the transfer is rolled back and repeated.
PCT/CN2022/079255 2021-12-06 2022-03-04 Multi-level linkage transparent sample model sharing apparatus for artificial intelligence platform WO2023103190A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111474479.9A CN114374701B (en) 2021-12-06 2021-12-06 Transparent sharing device for sample model of multistage linkage artificial intelligent platform
CN202111474479.9 2021-12-06

Publications (1)

Publication Number Publication Date
WO2023103190A1 true WO2023103190A1 (en) 2023-06-15

Family ID=81140352

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/079255 WO2023103190A1 (en) 2021-12-06 2022-03-04 Multi-level linkage transparent sample model sharing apparatus for artificial intelligence platform

Country Status (2)

Country Link
CN (1) CN114374701B (en)
WO (1) WO2023103190A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116668968A (en) * 2023-07-25 2023-08-29 西安优光谱信息科技有限公司 Cross-platform communication information processing method and system
CN116861673A (en) * 2023-07-10 2023-10-10 贵州宏信达高新科技有限责任公司 Multi-user remote online collaborative design system and method based on data sharing

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104506632A (en) * 2014-12-25 2015-04-08 中国科学院电子学研究所 Resource sharing system and method based on distributed multi-center
CN106484533A (en) * 2016-09-21 2017-03-08 南方电网科学研究院有限责任公司 Service modeling system and method based on electric power PaaS cloud platform
CN107016069A (en) * 2017-03-22 2017-08-04 南京理工大学 Towards the metadata interchange system of intelligent transportation
US20170371895A1 (en) * 2016-06-22 2017-12-28 Nasuni Corporation Shard-level synchronization of cloud-based data store and local file systems
CN112615899A (en) * 2020-11-25 2021-04-06 北京中电普华信息技术有限公司 Large file transmission method, device and system

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103577936B (en) * 2013-11-15 2016-08-31 国家电网公司 A kind of distributed maintenance of electric network model and globally shared system and its implementation
CN105447175A (en) * 2015-12-09 2016-03-30 中国电力科学研究院 Power grid model sharing method applicable to distributed computation of power system
CN107016478B (en) * 2016-01-28 2021-01-15 中国电力科学研究院 Full-network model rapid generation and sharing method based on two-stage deployment
CN107071001A (en) * 2017-03-22 2017-08-18 南京理工大学 Intelligent transportation Web information sharing service platform framework method
US11102214B2 (en) * 2018-08-27 2021-08-24 Amazon Technologies, Inc. Directory access sharing across web services accounts
CN110266775A (en) * 2019-06-04 2019-09-20 南京南瑞继保电气有限公司 Document transmission method, device, computer equipment and storage medium
CN112398655B (en) * 2019-08-19 2022-06-03 中移(苏州)软件技术有限公司 File transmission method, server and computer storage medium

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104506632A (en) * 2014-12-25 2015-04-08 中国科学院电子学研究所 Resource sharing system and method based on distributed multi-center
US20170371895A1 (en) * 2016-06-22 2017-12-28 Nasuni Corporation Shard-level synchronization of cloud-based data store and local file systems
CN106484533A (en) * 2016-09-21 2017-03-08 南方电网科学研究院有限责任公司 Service modeling system and method based on electric power PaaS cloud platform
CN107016069A (en) * 2017-03-22 2017-08-04 南京理工大学 Towards the metadata interchange system of intelligent transportation
CN112615899A (en) * 2020-11-25 2021-04-06 北京中电普华信息技术有限公司 Large file transmission method, device and system

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116861673A (en) * 2023-07-10 2023-10-10 贵州宏信达高新科技有限责任公司 Multi-user remote online collaborative design system and method based on data sharing
CN116861673B (en) * 2023-07-10 2024-02-02 贵州宏信达高新科技有限责任公司 Multi-user remote online collaborative design system and method based on data sharing
CN116668968A (en) * 2023-07-25 2023-08-29 西安优光谱信息科技有限公司 Cross-platform communication information processing method and system
CN116668968B (en) * 2023-07-25 2023-10-13 西安优光谱信息科技有限公司 Cross-platform communication information processing method and system

Also Published As

Publication number Publication date
CN114374701A (en) 2022-04-19
CN114374701B (en) 2024-05-14

Similar Documents

Publication Publication Date Title
WO2023103190A1 (en) Multi-level linkage transparent sample model sharing apparatus for artificial intelligence platform
US11481139B1 (en) Methods and systems to interface between a multi-site distributed storage system and an external mediator to efficiently process events related to continuity
US10387673B2 (en) Fully managed account level blob data encryption in a distributed storage environment
US9400828B2 (en) Hierarchical chunking of objects in a distributed storage system
US8271455B2 (en) Storing replication requests for objects in a distributed storage system
US8341118B2 (en) Method and system for dynamically replicating data within a distributed storage system
CN106209947B (en) Data processing method and system for decentralized autonomous organization
ES2881606T3 (en) Geographically distributed file system using coordinated namespace replication
CN103379159B (en) A kind of method that distributed Web station data synchronizes
US9396228B2 (en) Method of optimizing the interaction between a software application and a database server or other kind of remote data source
EP2534571B1 (en) Method and system for dynamically replicating data within a distributed storage system
JP2015035020A (en) Storage system, storage control device, and control program
CN110807039A (en) Data consistency maintenance system and method in cloud computing environment
US7765197B2 (en) System and method for producing data replica
WO2016095329A1 (en) Log recording system and log recording operating method
CN117874143A (en) Cloud edge database middleware synchronization method in distributed environment
JP2002007191A (en) Information duplicating method between information expressed in language with tag
TW201810090A (en) Data synchronization method and device without redundant replication
Kasu et al. DLFT: Data and layout aware fault tolerance framework for big data transfer systems
US20220391409A1 (en) Hybrid cloud asynchronous data synchronization

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22902634

Country of ref document: EP

Kind code of ref document: A1