CN117112528B

CN117112528B - Method and system for optimizing data storage in Filecoin

Info

Publication number: CN117112528B
Application number: CN202311361609.7A
Authority: CN
Inventors: 解绘绘
Original assignee: Beijing Lexun Technology Co ltd
Current assignee: Beijing Lexun Technology Co ltd
Priority date: 2023-10-20
Filing date: 2023-10-20
Publication date: 2024-01-05
Anticipated expiration: 2043-10-20
Also published as: CN117112528A

Abstract

A method and a system for optimizing data storage in Filecoin are disclosed. Carrier integration is performed over different constructed carrier relation graphs, so that each integrated characterization carrier fuses information from its adjacent point characterization carriers, which improves the precision of the integrated characterization carriers. The first integrated characterization carrier and the second integrated characterization carrier corresponding to the same data cluster characterization carrier are combined to obtain the target data cluster characterization carrier corresponding to that data cluster characterization carrier. Because the target data cluster characterization carrier fuses file content information with file distribution information, its precision is increased again. File priority determination is then performed on the target data cluster characterization carriers to obtain a file priority determination result corresponding to the file to be stored, which increases the precision of that result.

Description

Method and system for optimizing data storage in Filecoin

Technical Field

The present application relates to the field of data storage, and more particularly, to a method and system for optimizing data storage in Filecoin.

Background

IPFS (InterPlanetary File System) is a network transport protocol that aims to provide persistent, distributed storage and sharing of files. It is a content-addressable, peer-to-peer hypermedia distribution protocol, and the nodes in an IPFS network together constitute a distributed file system. Filecoin is a decentralized distributed storage project based on IPFS. The underlying blockchain layer of the system relies on distributed storage to hold data, and users who maintain blockchain nodes and provide storage on the public chain are rewarded with tokens. The faster a file can be written and read, the greater the probability that the user obtains the reward; conversely, if a file cannot be read in time or data is lost, the user obtains no reward and may even have a certain amount of tokens deducted. How to improve the access efficiency of data in a blockchain is therefore a technical problem to be solved by those skilled in the art. Techniques have appeared that determine the file storage order from a priority classification of the files to be stored: feature information is mined from the files and used to classify their priority, so that priority is determined quickly, storage efficiency is maintained, important data is backed up first, and security is improved. During feature mining, however, errors in the mined features often cause the result to deviate from the actual situation, which leads to errors in the subsequent priority identification and affects storage reliability.

Disclosure of Invention

In view of this, embodiments of the present application provide at least a method and a system for optimizing data storage in Filecoin.

According to an aspect of the embodiments of the present application, there is provided a method for optimizing data storage in Filecoin, which is applied to a data storage system, the method including: obtaining a file to be stored, splitting the file to be stored to obtain data clusters, and performing characterization carrier mining on each data cluster to obtain a data cluster characterization carrier for each data cluster; grouping each data cluster characterization carrier to obtain a first range characterization carrier corresponding to each data cluster characterization carrier, forming a first range characterization carrier set, and to obtain a second range characterization carrier corresponding to each data cluster characterization carrier, forming a second range characterization carrier set; constructing a first carrier relation graph corresponding to the first range characterization carrier set from the commonality scores among the first range characterization carriers in the first range characterization carrier set, and constructing a second carrier relation graph corresponding to the second range characterization carrier set from the data distribution of the data clusters; performing carrier integration on each first range characterization carrier and its adjacent point characterization carriers in the first carrier relation graph to obtain a first integration characterization carrier corresponding to each first range characterization carrier in the first range characterization carrier set, and performing carrier integration on each second range characterization carrier and its adjacent point characterization carriers in the second carrier relation graph to obtain a second integration characterization carrier corresponding to each second range characterization carrier in the second range characterization carrier set; combining the first integration characterization carrier and the second integration characterization carrier corresponding to the same data cluster characterization carrier to obtain a target data cluster characterization carrier corresponding to each data cluster characterization carrier, and determining file priority from the target data cluster characterization carriers to obtain a file priority determination result corresponding to the file to be stored; and storing the file to be stored based on the file priority determination result.

According to another aspect of an embodiment of the present application, there is provided a data storage system including: one or more processors; and one or more memories, wherein the memories have stored therein computer readable code, which when executed by the one or more processors, causes the one or more processors to perform the method described above.

The application provides at least the following beneficial effects. In the method and the system for optimizing data storage in Filecoin, the file to be stored is split to obtain data clusters, and the data cluster characterization carriers are split to obtain the first range characterization carrier set and the second range characterization carrier set. The first carrier relation graph corresponding to the first range characterization carrier set is built from the commonality scores among the first range characterization carriers in that set, and the second carrier relation graph corresponding to the second range characterization carrier set is built from the data distribution of the data clusters. Carrier integration is then performed on each first range characterization carrier and its adjacent point characterization carriers in the first carrier relation graph to obtain the first integration characterization carrier corresponding to each first range characterization carrier, and on each second range characterization carrier and its adjacent point characterization carriers in the second carrier relation graph to obtain the second integration characterization carrier corresponding to each second range characterization carrier.
Because carrier integration is performed over the different constructed carrier relation graphs, each integration characterization carrier fuses information from its adjacent point characterization carriers, which improves the accuracy of the integration characterization carriers. The first integration characterization carrier and the second integration characterization carrier corresponding to the same data cluster characterization carrier are combined to obtain the target data cluster characterization carrier corresponding to that data cluster characterization carrier; the target data cluster characterization carrier thus fuses file content information with file distribution information, which increases its accuracy again. File priority determination is performed on the target data cluster characterization carriers to obtain the file priority determination result corresponding to the file to be stored, so the accuracy of that result is improved.

It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the aspects of the present application.

Drawings

The foregoing and other objects, features, and advantages of the embodiments of the application will become more apparent from the following more particular description of the embodiments of the application, as illustrated in the accompanying drawings. The accompanying drawings are included to provide a further understanding of embodiments of the application and are incorporated in and constitute a part of this specification, illustrate the application and not constitute a limitation to the application. In the drawings, like reference numerals generally refer to like parts or steps.

Fig. 1 is a schematic diagram of an application scenario provided in an embodiment of the present application.

FIG. 2 is a functional architecture diagram of a data storage system.

FIG. 3 is a schematic diagram of an SSD cache assembly.

FIG. 4 is a flow chart of an IO channel of an integrated storage architecture.

Fig. 5 is a schematic implementation flow chart of a method for optimizing data storage in Filecoin according to an embodiment of the present application.

Fig. 6 is a schematic hardware entity diagram of a data storage system according to an embodiment of the present application.

Detailed Description

The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application. It will be apparent that the described embodiments are only some, but not all, of the embodiments of the present application. All other embodiments, which can be made by one of ordinary skill in the art without the need for inventive faculty, are intended to be within the scope of the present application, based on the embodiments herein. For the purpose of making the objects, technical solutions and advantages of the present application more apparent, the technical solutions of the present application are further elaborated below in conjunction with the accompanying drawings and examples, which should not be construed as limiting the present application, and all other embodiments obtained by those skilled in the art without making inventive efforts are within the scope of protection of the present application. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used herein is for the purpose of describing the present application only and is not intended to be limiting of the present application.

The method for optimizing data storage in the Filecoin provided by the embodiment of the application can be applied to an application environment shown in FIG. 1. Wherein the terminal 102 communicates with the data storage system 104 via a network. The terminal 102 may be, but not limited to, various personal computers, notebook computers, smart phones, tablet computers, internet of things devices, and portable wearable devices, and the internet of things devices may be smart speakers, smart televisions, smart air conditioners, smart vehicle devices, and the like. The portable wearable device may be a smart watch, smart bracelet, headset, or the like. The data storage system 104 may be implemented as a stand-alone server or as a cluster of servers.

Fig. 2 is a schematic diagram of an architecture of a data storage system, and specifically includes a first storage unit, a second storage unit, an SSD cache module, and a data read-write allocation unit.

The first storage unit is used for storing sector files which are already sealed, and the second storage unit is used for storing sector files which are partially sealed and are being sealed.

The SSD cache assembly comprises a cache pool, and the cache pool comprises a cache master control node, a cache service node, and a flow limiting module. The cache master control node manages the cache, the cache service node provides the SSD cache of the cluster, and the flow limiting module throttles the data flushed back from the SSD layer. The data read-write allocation unit comprises read-write threads, read-write connections, and read-write queues, in each of which reads and writes are split. When a file to be stored is acquired, the data read-write allocation unit splits the file to be stored, allocates it to the corresponding read-write threads, read-write queues, and read-write connections, and stores it in the first storage unit and/or the second storage unit.

When the storage system deploys the storage units, the hard disks of the multiple nodes are partitioned according to rules, so that the layout satisfies the storage redundancy requirement and achieves the best attainable upper-limit performance. In use, the storage units are consumed sequentially: a Filecoin sector is preferentially stored in the storage unit that comes first in the order, and when that storage unit is full, storage switches to the next storage unit. Because sealed sector files are no longer modified, the space used in a single storage unit is essentially fixed. Reads from a storage unit are triggered mainly by space-time proofs, which access the LBAs of sector file data randomly; writes to a storage unit are triggered mainly by identification files dumped from the cache in the background, and write performance can be improved by merging writes.
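The sequential use mode described above can be illustrated with a minimal sketch. The class names `StorageUnit` and `SequentialAllocator`, the byte capacities, and the unit names are hypothetical and are not taken from the application; the sketch only shows the "fill the current unit, then switch to the next" behavior.

```python
from dataclasses import dataclass


@dataclass
class StorageUnit:
    """Hypothetical logical storage unit with a fixed capacity in bytes."""
    name: str
    capacity: int
    used: int = 0

    def has_room(self, size: int) -> bool:
        return self.used + size <= self.capacity


class SequentialAllocator:
    """Fill storage units in order; switch only when the current one is full."""

    def __init__(self, units):
        self.units = list(units)
        self.current = 0  # index of the unit currently accepting sectors

    def place_sector(self, size: int) -> str:
        # Advance past units that cannot hold this sector.
        while (self.current < len(self.units)
               and not self.units[self.current].has_room(size)):
            self.current += 1
        if self.current == len(self.units):
            raise RuntimeError("all storage units are full")
        unit = self.units[self.current]
        unit.used += size
        return unit.name


allocator = SequentialAllocator([StorageUnit("unit-0", 64), StorageUnit("unit-1", 64)])
placements = [allocator.place_sector(32) for _ in range(3)]
print(placements)
```

Because sealed sectors are never modified, a unit that has been passed over is not revisited in this sketch, which matches the observation that the space of a single storage unit stays essentially fixed.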

Fig. 3 is a schematic structural diagram of the SSD cache module. The cache pool corresponding to the SSD cache module accepts modification operations on identification files in the Filecoin service and periodically flushes them in the background to the underlying storage units; the QoS module limits this flushing so that excessive background requests are not released in scenarios such as space-time proof. The cache master control node (namely the CacheMaster) provides functions such as topology management, service state monitoring, fault detection, and integration with control and operations tooling. It runs in active-standby mode and uses ZooKeeper to implement policies such as metadata persistence and leader election. The cache service node (namely the CacheServer) provides the SSD cache function of the cluster and is responsible for core logic such as internal metadata management, read-write control, cache eviction, read-ahead, garbage collection, and deduplication and compression. A service uses Cache Pool resources by calling the relevant API interfaces. In addition, the SDK pulls the cache topology information from the CacheMaster, stores it in memory, and then sends data to the corresponding CacheServer for execution according to a routing algorithm.
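As a rough illustration of the SDK behavior described above (pull the topology, keep it in memory, route each request to one cache server), the following sketch may help. The application does not specify the routing algorithm; hash-modulo routing is assumed here purely for illustration, and `CacheClient`, `fetch_topology`, and the server names are hypothetical.

```python
import hashlib


class CacheClient:
    """Hypothetical SDK-side client that caches topology and routes by key."""

    def __init__(self, fetch_topology):
        # fetch_topology stands in for the call that asks the cache master
        # for the current list of cache servers (an assumed interface).
        self._fetch_topology = fetch_topology
        self._servers = []

    def refresh(self):
        # Keep the topology in memory, sorted so routing is deterministic.
        self._servers = sorted(self._fetch_topology())

    def route(self, key: str) -> str:
        if not self._servers:
            raise RuntimeError("topology not loaded; call refresh() first")
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % len(self._servers)
        return self._servers[index]


servers = ["cache-server-a", "cache-server-b", "cache-server-c"]
client = CacheClient(lambda: servers)
client.refresh()
target = client.route("identification-file-42")
print(target)
```

A production router would more likely use consistent hashing so that a topology change moves only a small share of keys; the modulo form is kept here only because it is short.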

Referring to Fig. 4, a flow chart of the integrated IO path is shown, in which reads and writes are separated. In the space-time proof scenario of Filecoin, read traffic and write traffic are weighted differently: if data can be read out more quickly to generate the space-time proof message, storage computing power improves, so the response efficiency of read requests must be guaranteed. If reads and writes were only distinguished by priority, they would still interfere with each other under excessive load, so a read-write separation scheme is adopted to relieve the contention. This scheme requires extra resources, but it is simple and can be controlled flexibly. Read-write threads, read-write connections, and read-write storage units are all split along the entire IO path, and, combined with the storage units, reads and writes can also be staggered in time, which makes the read-write separation more effective.
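The idea of giving reads and writes their own threads can be sketched as follows. This is only a toy model under stated assumptions: the class `SplitIOPath`, the worker counts, and the in-memory dictionary standing in for the storage units are all hypothetical, and connections and queues are not modeled separately.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class SplitIOPath:
    """Toy IO path with separate worker pools for reads and writes."""

    def __init__(self, read_workers=4, write_workers=2):
        # Reads get their own pool so queued writes cannot delay them.
        self._read_pool = ThreadPoolExecutor(
            max_workers=read_workers, thread_name_prefix="read")
        self._write_pool = ThreadPoolExecutor(
            max_workers=write_workers, thread_name_prefix="write")
        self._store = {}  # stands in for the storage units
        self._lock = threading.Lock()

    def write(self, key, data):
        return self._write_pool.submit(self._do_write, key, data)

    def read(self, key):
        return self._read_pool.submit(self._do_read, key)

    def _do_write(self, key, data):
        with self._lock:
            self._store[key] = data
        return threading.current_thread().name

    def _do_read(self, key):
        with self._lock:
            data = self._store.get(key)
        return data, threading.current_thread().name

    def close(self):
        self._read_pool.shutdown(wait=True)
        self._write_pool.shutdown(wait=True)


io_path = SplitIOPath()
write_thread = io_path.write("sector-1", b"sealed bytes").result()
data, read_thread = io_path.read("sector-1").result()
io_path.close()
print(write_thread, read_thread, data)
```

The extra pool is the "extra resource expenditure" mentioned above; in exchange, the read pool size can be tuned independently of write load.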

The reasons for building the application on this architecture are described below, starting with the background. In the Filecoin ecosystem, provided performance is guaranteed, the lower the cost, the higher the revenue, so large-capacity, low-cost HDDs remain the main storage medium. As mechanical hard disks, however, HDDs handle random-LBA reads and writes poorly and can hardly exploit their bandwidth advantage; their latency is especially pronounced in mixed read-write scenarios. In the Filecoin service scenario, pre-sealing sector files and writing identification files are mainly sequential writes, which are relatively friendly to HDDs. However, the space-time proof instructions that are subsequently received at regular intervals generate a large number of random read-write requests; superimposed on the continuous writing of sealed sector files, they put heavy pressure on disk response, so generating the space-time proof message becomes very challenging. Patent CN 113885797A mentions introducing a fast storage pool and a capacity storage pool to store the identification files and the sector files of Filecoin respectively, so as to alleviate the impact of random reads and writes on the HDDs.
However, this design has several problems: a) the capacity storage pool is built from HDDs and receives space-time proof requests while sector files are being sealed, so the HDDs face a large number of concurrent random-read and sequential-write requests, which is unfriendly to mechanical disks; b) traditional storage designs mostly do not distinguish read paths from write paths, so under heavy service load the storage performance depends on how well the disk medium tolerates IO; c) because the fast storage pool stores the identification files separately, the fast storage pool and the capacity storage pool have a fixed proportional relationship, and once one storage pool is full, the other suffers wasted capacity or unbalanced performance. To ensure storage efficiency in the Filecoin scenario, the storage system is therefore designed around the characteristics of HDDs as far as possible: a dedicated data storage layout is customized to the service characteristics, the impact of mixed reads and writes on latency is reduced, the read-write path of the software stack is optimized, the time taken to generate space-time proof messages is shortened, and the computing power in Filecoin is improved.

The application is based on the above architecture, and has the following advantages:

(1) In the Filecoin scenario, different logical storage units are planned using the concept of disk grouping or logical storage pools, and the sealing and storage of sector files is taken on by these units in sequence, which achieves read-write separation to a certain extent.

(2) In the Filecoin scenario, a cache space is built on SSDs, write requests for identification files are accepted in write-back mode, and the flush-back policy is adjusted according to an interval mechanism, which reduces the impact of mixed reads and writes on the HDDs during space-time proof.

(3) The SSD cache component caches part of the random write requests, which reduces the SSD space occupied while maintaining performance and thus lowers cost.

(4) By means of the elastic scaling of the logical storage units and the SSD cache, a file system with elastic capability can be constructed.

(5) By separating the read and write flows along the IO path in the software stack, the impact of write traffic on read requests is mitigated, and the response efficiency of read requests is improved to a certain extent.
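The write-back caching with throttled flushing mentioned in advantage (2) can be sketched as follows. This is an illustrative model only: `WriteBackCache`, `flush_budget`, and the dictionary standing in for the HDD storage unit are hypothetical names, and the interval mechanism is reduced to an explicit `flush_once` call per interval rather than a real timer.

```python
class WriteBackCache:
    """Toy write-back cache: writes land in the cache, flushes are throttled."""

    def __init__(self, backend, flush_budget):
        self.backend = backend            # dict standing in for the HDD layer
        self.flush_budget = flush_budget  # max entries flushed per interval
        self.cache = {}                   # dict standing in for the SSD layer
        self.dirty = []                   # keys not yet written to the backend

    def write(self, key, value):
        # Write-back: acknowledge after updating the cache only.
        self.cache[key] = value
        if key not in self.dirty:
            self.dirty.append(key)

    def read(self, key):
        if key in self.cache:
            return self.cache[key]
        return self.backend[key]

    def flush_once(self):
        """Flush at most flush_budget dirty entries; call once per interval."""
        batch = self.dirty[:self.flush_budget]
        self.dirty = self.dirty[self.flush_budget:]
        for key in batch:
            self.backend[key] = self.cache[key]
        return len(batch)


hdd = {}
cache = WriteBackCache(hdd, flush_budget=2)
for i in range(5):
    cache.write(f"id-file-{i}", i)
flushed_per_interval = [cache.flush_once() for _ in range(3)]
print(flushed_per_interval, len(hdd))
```

Lowering `flush_budget` while a space-time proof is in progress would be one way to keep background writes from competing with random reads on the HDDs; that policy is an assumption, not something the application spells out.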

Based on the system architecture, the application also provides a method for optimizing data storage in Filecoin, which stores files in order by determining the storage priority of the file to be stored (namely, the data to be stored) and, in combination with the storage architecture described above, determines a better storage strategy. Referring to Fig. 5, the method for optimizing data storage in Filecoin comprises the following steps:

Step S110, obtaining a file to be stored, splitting the file to be stored to obtain data clusters, and performing characterization carrier mining on each data cluster to obtain the data cluster characterization carrier of each data cluster.

The file to be stored is a data file awaiting storage; its type and content are not limited. It may be, for example, a log file, and its data may be of a single data type, such as numeric data, categorical data, or time data, or a combination of several data types. The file to be stored is composed of multiple data items. A data cluster is a set formed by several data items in the file to be stored, and combining all the data clusters yields the file to be stored. A data cluster characterization carrier is a carrier that characterizes the feature information of a data cluster; it may be a feature vector, a matrix, or a tensor, and it may comprise a data cluster content characterization carrier and a data cluster distribution characterization carrier. The data cluster content characterization carrier characterizes the content of the data cluster, for example feature information obtained from the data item values of the data cluster. The data cluster distribution characterization carrier characterizes the distribution of the data cluster within the file to be stored, for example position feature information obtained from the position of the data cluster.

The file to be stored may be split into data clusters according to a preset number of data clusters or, alternatively, according to a preset data cluster size. After splitting, characterization carrier mining is performed on each data cluster to obtain the content characterization carrier and the data distribution characterization carrier of the data cluster, and the data cluster characterization carrier is obtained from the content characterization carrier and the data distribution characterization carrier.
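Step S110 can be illustrated with a minimal sketch that splits a list of numeric data items by a preset cluster size and builds one carrier per cluster. The application does not say how features are mined, so the content features (mean, minimum, maximum) and the distribution feature (normalized cluster index) below are placeholder assumptions, and all function names are hypothetical.

```python
def split_into_clusters(items, cluster_size):
    """Split the data items into consecutive clusters of a preset size."""
    return [items[i:i + cluster_size] for i in range(0, len(items), cluster_size)]


def content_features(cluster):
    """Placeholder content carrier: mean, min, and max of the item values."""
    mean = sum(cluster) / len(cluster)
    return [mean, float(min(cluster)), float(max(cluster))]


def distribution_features(index, total):
    """Placeholder distribution carrier: normalized position of the cluster."""
    return [index / (total - 1) if total > 1 else 0.0]


def cluster_carriers(items, cluster_size):
    """Build one carrier per cluster: content features then position features."""
    clusters = split_into_clusters(items, cluster_size)
    return [
        content_features(cluster) + distribution_features(i, len(clusters))
        for i, cluster in enumerate(clusters)
    ]


carriers = cluster_carriers([1, 2, 3, 4, 5, 6, 7], cluster_size=3)
print(carriers)
```

In a real system the hand-written features would presumably be replaced by a learned encoder; the point here is only the shape of the output, one vector per data cluster carrying both content and position information.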

Step S120, grouping the data cluster characterization carriers respectively to obtain a first range characterization carrier corresponding to each data cluster characterization carrier to form a first range characterization carrier set, and obtaining a second range characterization carrier corresponding to each data cluster characterization carrier to form a second range characterization carrier set.

A characterization carrier contains multiple elements; for example, a characterization vector with 10 elements is a 10-dimensional vector, or equivalently has 10 channels. The first range characterization carrier is the characterization carrier composed of the elements in the first dimensions of the data cluster characterization carrier; the second range characterization carrier is the characterization carrier composed of the elements in the second dimensions of the data cluster characterization carrier. In other words, the embodiments herein split one data cluster characterization carrier into a first range characterization carrier and a second range characterization carrier. For example, if the data cluster characterization carrier is the vector (1, 1, 2, 2), it is split into a first range characterization carrier (1, 1) and a second range characterization carrier (2, 2).

The first range characterization carrier set is the set formed by the first range characterization carriers corresponding to the data clusters; the second range characterization carrier set is the set formed by the second range characterization carriers corresponding to the data clusters. For example, the data cluster characterization carriers are split one by one according to a preset number of elements for the first range characterization carrier and a preset number of elements for the second range characterization carrier, yielding the first range characterization carrier and the second range characterization carrier corresponding to each data cluster characterization carrier. The number of elements of the first range characterization carrier plus the number of elements of the second range characterization carrier equals the number of elements of the data cluster characterization carrier. The number of elements of the first range characterization carrier and the number of elements of the second range characterization carrier may be equal or unequal.
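The grouping in step S120 amounts to slicing each vector at a preset element count. A minimal sketch follows; the function names and the sample values are hypothetical.

```python
def split_carrier(carrier, first_len):
    """Split one carrier into its first-range and second-range parts."""
    if not 0 < first_len < len(carrier):
        raise ValueError("first_len must leave at least one element on each side")
    return carrier[:first_len], carrier[first_len:]


def split_all(carriers, first_len):
    """Build the first and second range carrier sets for all data clusters."""
    pairs = [split_carrier(c, first_len) for c in carriers]
    first_set = [first for first, _ in pairs]
    second_set = [second for _, second in pairs]
    return first_set, second_set


cluster_vectors = [[0.2, 0.7, 0.1, 0.9, 0.4], [0.5, 0.5, 0.3, 0.8, 0.6]]
first_set, second_set = split_all(cluster_vectors, first_len=3)
print(first_set, second_set)
```

Here the two parts have 3 and 2 elements, which shows the unequal case; their lengths always sum to the length of the original carrier.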

Step S130, a first carrier relation diagram corresponding to the first range representation carrier set is constructed through the commonality scores among the first range representation carriers in the first range representation carrier set, and a second carrier relation diagram corresponding to the second range representation carrier set is constructed through the data distribution of each data cluster.

The first carrier relation graph is a connection graph constructed from the first range characterization carriers and their commonality scores. The data distribution of a data cluster represents the position of the data cluster in the file to be stored; the position may be represented, for example, by a label. The second carrier relation graph is a connection graph constructed from the second range characterization carriers and the data distribution of the data clusters. The commonality score between first range characterization carriers in the first range characterization carrier set, in other words a score representing how similar two carriers are, can be computed with a similarity measure such as cosine similarity or Euclidean distance. The similarity relation among the first range characterization carriers is then determined from the commonality scores, each first range characterization carrier is taken as a composition point (namely a node of the relation graph), and the composition points are connected according to the similarity relation to obtain the first carrier relation graph corresponding to the first range characterization carrier set. Next, the data distribution of each data cluster is acquired, and adjacent data clusters are determined from it to obtain the distribution relation of the data clusters. Finally, each second range characterization carrier is taken as a composition point, and the composition points are connected according to the distribution relation among the second range characterization carriers to obtain the second carrier relation graph corresponding to the second range characterization carrier set.
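The two graph constructions in step S130 can be sketched as follows. Two assumptions are made that the application does not state: nodes in the first graph are connected when their cosine similarity reaches a fixed `threshold`, and in the second graph a data cluster is adjacent only to the clusters immediately before and after it in the file. Function names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def build_similarity_graph(carriers, threshold):
    """First graph: connect nodes whose commonality score reaches threshold."""
    neighbors = {i: set() for i in range(len(carriers))}
    for i in range(len(carriers)):
        for j in range(i + 1, len(carriers)):
            if cosine_similarity(carriers[i], carriers[j]) >= threshold:
                neighbors[i].add(j)
                neighbors[j].add(i)
    return neighbors


def build_position_graph(cluster_count):
    """Second graph: connect clusters that are adjacent in the file."""
    neighbors = {i: set() for i in range(cluster_count)}
    for i in range(cluster_count - 1):
        neighbors[i].add(i + 1)
        neighbors[i + 1].add(i)
    return neighbors


first_range = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
first_graph = build_similarity_graph(first_range, threshold=0.9)
second_graph = build_position_graph(len(first_range))
print(first_graph, second_graph)
```

The first graph captures which clusters look alike in content, while the second captures which clusters sit next to each other, so the two graphs generally have different edges over the same set of data clusters.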

Step S140, performing carrier integration through the first range representation carrier and the adjacent point representation carrier corresponding to the first range representation carrier in the first carrier relationship diagram to obtain first integration representation carriers corresponding to each first range representation carrier in the first range representation carrier set, and performing carrier integration through the second range representation carrier and the adjacent point representation carrier corresponding to the second range representation carrier in the second carrier relationship diagram to obtain second integration representation carriers corresponding to each second range representation carrier in the second range representation carrier set.

The adjacent point characterization carrier corresponding to a first range characterization carrier is the first range characterization carrier at a composition point that is connected, in the first carrier relation graph, to the composition point of that first range characterization carrier. The first integration characterization carrier is the characterization carrier obtained by fusing the adjacent point characterization carriers of the first range characterization carrier and iteratively updating the first range characterization carrier. Likewise, the adjacent point characterization carrier corresponding to a second range characterization carrier is the second range characterization carrier at a composition point that is connected, in the second carrier relation graph, to the composition point of that second range characterization carrier. The second integration characterization carrier is the characterization carrier obtained by fusing the adjacent point characterization carriers of the second range characterization carrier and iteratively updating the second range characterization carrier.

For example, each first range characterization carrier in the first carrier relation graph is integrated with its corresponding adjacent point characterization carriers (for example, by one of addition, splicing, or concatenation) to obtain the first integration characterization carrier corresponding to each first range characterization carrier in the first range characterization carrier set; that is, all adjacent point characterization carriers corresponding to the first range characterization carrier are fused and the first range characterization carrier is iteratively updated to obtain its first integration characterization carrier. Similarly, each second range characterization carrier in the second carrier relation graph is integrated with its corresponding adjacent point characterization carriers to obtain the second integration characterization carrier corresponding to each second range characterization carrier in the second range characterization carrier set; that is, all adjacent point characterization carriers corresponding to the second range characterization carrier are fused and the second range characterization carrier is iteratively updated to obtain its second integration characterization carrier.
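One possible form of the carrier integration in step S140 is sketched below. The application allows addition, splicing, or concatenation and does not fix a formula, so the update used here (average the node's own vector with the mean of its neighbors, repeated for a number of rounds) is an assumption chosen for brevity; `integrate` and the sample graph are hypothetical.

```python
def integrate(carriers, neighbors, rounds=1):
    """Fuse each node's vector with the mean of its neighbors' vectors."""
    current = [list(c) for c in carriers]
    for _ in range(rounds):
        updated = []
        for i, own in enumerate(current):
            adjacent = sorted(neighbors.get(i, ()))
            if not adjacent:
                # Isolated node: nothing to fuse, keep the vector unchanged.
                updated.append(list(own))
                continue
            neighbor_mean = [
                sum(current[j][d] for j in adjacent) / len(adjacent)
                for d in range(len(own))
            ]
            updated.append([(o + m) / 2 for o, m in zip(own, neighbor_mean)])
        current = updated
    return current


range_carriers = [[1.0, 0.0], [3.0, 2.0], [5.0, 4.0]]
chain_graph = {0: {1}, 1: {0, 2}, 2: {1}}
integrated = integrate(range_carriers, chain_graph, rounds=1)
print(integrated)
```

The same function can be applied once with the first carrier relation graph and once with the second, producing the first and second integration characterization carriers respectively. More rounds let information travel further along the graph.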

Step S150, combining the first integration characterization carrier and the second integration characterization carrier which are respectively corresponding to the same data cluster characterization carrier to obtain target data cluster characterization carriers which are respectively corresponding to the data cluster characterization carriers, and determining file priority through the target data cluster characterization carriers which are respectively corresponding to the data cluster characterization carriers to obtain a file priority determination result which is corresponding to the file to be stored.

The target data cluster characterization carrier is a characterization carrier obtained from the first integration characterization carrier and the second integration characterization carrier. The file priority determination result is the priority class assigned to the file to be stored. The priority classes, for example their granularity and types, may be preset according to actual needs and are not particularly limited. The first integration characterization carrier and the second integration characterization carrier corresponding to the same data cluster characterization carrier are combined (for example, by vector splicing, where the head-to-tail order is not limited), and every data cluster characterization carrier is traversed in this way to obtain the target data cluster characterization carrier corresponding to each data cluster characterization carrier. File priority is then determined based on these target data cluster characterization carriers, for example by applying an affine calculation (a fully connected mapping) followed by a classifier (such as softmax) to the target data cluster characterization carriers of all data clusters together, so as to obtain the file priority determination result corresponding to the file to be stored.
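The combination and classification in step S150 can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names, the mean pooling over data clusters (the text only says the carriers are classified "together") and the example weights are all assumptions.

```python
import math

def affine(x, weights, bias):
    """Fully connected mapping: one output value per weight row."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def classify_file(first_integrated, second_integrated, weights, bias):
    """Return (priority class index, class probabilities) for one file."""
    # Splice the two integrated carriers of each data cluster head to tail.
    targets = [list(a) + list(b) for a, b in zip(first_integrated, second_integrated)]
    # Assumption: pool the per-cluster carriers by averaging before classifying.
    dim = len(targets[0])
    pooled = [sum(t[i] for t in targets) / len(targets) for i in range(dim)]
    probs = softmax(affine(pooled, weights, bias))
    return probs.index(max(probs)), probs

# Hypothetical example: two data clusters, two priority classes.
label, probs = classify_file(
    [[1.0, 0.0], [0.0, 1.0]],          # first integration carriers
    [[1.0], [1.0]],                    # second integration carriers
    [[1.0, 0.0, 0.0], [0.0, 0.0, 2.0]],
    [0.0, 0.0],
)
```

In practice the affine weights would be learned during training; here they are fixed by hand only to make the example self-contained.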

According to the method for optimizing data storage in the Filecoin, a to-be-stored file is split to obtain data clusters, then the characterization carriers of the data clusters are split to obtain a first range characterization carrier set and a second range characterization carrier set, then a first carrier relation diagram corresponding to the first range characterization carrier set is constructed through the commonality scores of the first range characterization carriers in the first range characterization carrier set, a second carrier relation diagram corresponding to the second range characterization carrier set is constructed through the data distribution of the data clusters, carrier integration is carried out through the first range characterization carriers in the first carrier relation diagram and adjacent point characterization carriers corresponding to the first range characterization carriers in the first range characterization carrier set to obtain first integration characterization carriers corresponding to the first range characterization carriers in the first range characterization carrier set, and carrier integration is carried out through adjacent point characterization carriers corresponding to the second range characterization carriers in the second range characterization carrier set to obtain second integration carriers corresponding to the second range characterization carriers in the second range characterization carrier set. 
In other words, carrier integration is performed through different constructed carrier relation graphs, the obtained integrated characterization carriers are fused with information of adjacent point characterization carriers so as to improve the accuracy of the obtained integrated characterization carriers, then the first integrated characterization carriers and the second integrated characterization carriers corresponding to the same data cluster characterization carriers are combined to obtain target data cluster characterization carriers corresponding to the data cluster characterization carriers, the obtained target data cluster characterization carriers are fused with file content information and file distribution information, the accuracy of the obtained target data cluster characterization carriers is increased again, file priority determination is performed based on the target data cluster characterization carriers corresponding to the data cluster characterization carriers, file priority determination results corresponding to the to-be-stored files are obtained, and the accuracy of the obtained file priority determination results is increased.

Step S160, the to-be-stored file is stored based on the file priority determination result.

For example, according to the priority, the files to be stored with high priority are stored first, and then the files to be stored with low priority are stored, so that disaster recovery performance is improved.
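As a minimal sketch of this ordering, assuming each file carries a numeric priority where a larger value means higher priority (the names and the numeric encoding are hypothetical):

```python
def storage_order(files):
    """Return file names ordered so that higher-priority files are stored first.

    `files` is a list of (name, priority) pairs; larger priority = more urgent
    (an assumption made for this sketch).
    """
    ranked = sorted(files, key=lambda item: item[1], reverse=True)
    return [name for name, _ in ranked]

order = storage_order([("a.dat", 1), ("b.dat", 3), ("c.dat", 2)])
```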

Optionally, in step S110, the characterizing carrier of each data cluster is mined, so as to obtain the characterizing carrier of each data cluster, which specifically includes: excavating data item representation carriers of all the data clusters to obtain data item representation carriers of all the data clusters; acquiring the data distribution of each data cluster, and carrying out quantization expression on the data distribution of each data cluster to obtain a distribution characterization carrier of each data cluster; and fusing the data item representation carriers of the data clusters with the corresponding data cluster distribution representation carriers to obtain the data cluster representation carriers.

The data cluster data item characterization carrier is used for characterizing the content of the data clusters, and the data cluster distribution characterization carrier is used for characterizing the positions of the data clusters in the files to be stored. For example, the data item value of the data item in each data cluster is obtained, the data item value of the data item in each data cluster is quantitatively expressed (converted into a vector form), and the data item representation carrier of the data cluster corresponding to each data cluster is obtained. Next, a data distribution of each data cluster in the file to be stored, for example, a sequence value of the data cluster, is obtained. And carrying out quantization expression (such as embedded coding) on the data distribution of each data cluster to obtain a data cluster distribution characterization carrier corresponding to each data cluster. And then fusing (such as adding or multiplying) the data item representation carrier of the data cluster corresponding to each data cluster with the data cluster distribution representation carrier to obtain the data cluster representation carrier corresponding to each data cluster.
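The fusion of content and position information can be illustrated with the sketch below. The text only requires some embedded coding of the sequence value; the sinusoidal encoding used here, and the function names, are assumptions chosen for illustration, and fusion is done by element-wise addition.

```python
import math

def embed_position(index, dim):
    """Sinusoidal encoding of a cluster's sequence value (assumed scheme)."""
    encoding = []
    for i in range(dim):
        if i % 2 == 0:
            encoding.append(math.sin(index / (10000 ** (i / dim))))
        else:
            encoding.append(math.cos(index / (10000 ** ((i - 1) / dim))))
    return encoding

def cluster_carrier(content_carrier, index):
    """Fuse the data item carrier with the distribution carrier by addition."""
    position = embed_position(index, len(content_carrier))
    return [c + p for c, p in zip(content_carrier, position)]

# Hypothetical content carrier for the cluster at sequence value 0.
fused = cluster_carrier([1.0, 2.0, 3.0, 4.0], 0)
```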

According to the method, the data cluster data item representation carrier and the data cluster distribution representation carrier are mined, the data cluster data item representation carrier and the corresponding data cluster distribution representation carrier are fused, so that each data cluster representation carrier is obtained, the data cluster representation carrier has file distribution information and file content information, and the accuracy of the data cluster representation carrier is improved.

Optionally, in step S130, constructing a first carrier relation diagram corresponding to the first range characterization carrier set according to the commonality scores between the first range characterization carriers in the first range characterization carrier set may include: obtaining the characterization carrier commonality scores among the first range characterization carriers, and determining the commonality connection results among the first range characterization carriers through the characterization carrier commonality scores; and determining each first range characterization carrier as a composition point, and connecting the first range characterization carriers according to the commonality connection results to obtain the first carrier relation diagram.

The larger the characterization carrier commonality score, the higher the degree of commonality between two first range characterization carriers and the more closely they are associated. The commonality connection result is the association result, or connection relation, between first range characterization carriers determined from the characterization carrier commonality scores. The commonality score between each first range characterization carrier and each of the remaining first range characterization carriers can be obtained one by one, giving the characterization carrier commonality scores corresponding to each first range characterization carrier. The commonality connection result between each first range characterization carrier and the remaining first range characterization carriers is then determined from these scores. For example, two first range characterization carriers whose commonality score is larger than a set score are determined to be connected; alternatively, the commonality scores corresponding to each first range characterization carrier are sorted in descending order, and a preset number of top-ranked first range characterization carriers are determined to be the associated first range characterization carriers.
For example, the commonality scores between the current first range characterization carrier and all the remaining first range characterization carriers are obtained, and the three remaining first range characterization carriers with the highest scores are determined to have a similarity relation with the current first range characterization carrier; when the relation diagram is constructed, these three are connected to the current first range characterization carrier. Each first range characterization carrier is taken as the characterization carrier of a composition point of the relation diagram, and the first range characterization carriers are connected according to the commonality connection results to obtain the first carrier relation diagram.
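The top-k construction described above can be sketched as follows. Cosine similarity is used as the commonality score and the edges are treated as undirected; both choices, and the function names, are assumptions made for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity, used here as the commonality score (an assumption)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def build_commonality_graph(carriers, k=3):
    """Connect each carrier to its k highest-scoring peers.

    Returns an adjacency map {point index: set of neighbour indices}.
    """
    count = len(carriers)
    neighbours = {i: set() for i in range(count)}
    for i in range(count):
        scored = sorted(
            ((cosine(carriers[i], carriers[j]), j) for j in range(count) if j != i),
            reverse=True,
        )
        for _, j in scored[:k]:
            neighbours[i].add(j)
            neighbours[j].add(i)  # assumption: connections are undirected
    return neighbours

# Hypothetical carriers: points 0 and 1 are alike, as are points 2 and 3.
graph = build_commonality_graph([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]], k=1)
```

The threshold variant mentioned in the text would replace the `scored[:k]` slice with a filter on the score.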

The commonality connection results among the first range characterization carriers are determined by acquiring the characterization carrier commonality scores among the first range characterization carriers, each first range characterization carrier is determined as a composition point, and the first range characterization carriers are connected according to the commonality connection results, so that the accuracy of the first carrier relation diagram can be improved.

Optionally, in step S130, constructing a second carrier relation diagram corresponding to the second range characterization carrier set through the data distribution of each data cluster may include: determining the carrier distribution of the second range characterization carrier corresponding to each data cluster characterization carrier through the data distribution of each data cluster, and determining the distribution relation among the second range characterization carriers in the second range characterization carrier set through the carrier distribution; and determining each second range characterization carrier as a composition point, and connecting the second range characterization carriers according to the distribution relation to obtain the second carrier relation diagram.

The carrier distribution is the carrier distribution of the second range representation carrier, and the distribution relation is the connection result between the second range representation carriers determined according to the carrier distribution. Determining adjacent data clusters according to the data distribution of each data cluster, determining adjacent second range representation carriers based on the adjacent data clusters, determining the distribution relation among the second range representation carriers based on the adjacent second range representation carriers, and constructing a connection result among the adjacent second range representation carriers. Each second range representation carrier is determined as a component point representation carrier of the relationship graph, and adjacent second range representation carriers are connected to obtain a second carrier relationship graph.
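A minimal sketch of this construction, assuming the data distribution of each cluster is its sequence value in the file and that "adjacent" means consecutive in that sequence (function and variable names are hypothetical):

```python
def build_distribution_graph(sequence_values):
    """Connect carriers whose data clusters are consecutive in the file.

    `sequence_values[i]` is the sequence value of the data cluster behind
    carrier i. Returns {point index: set of neighbour indices}.
    """
    order = sorted(range(len(sequence_values)), key=lambda i: sequence_values[i])
    neighbours = {i: set() for i in order}
    for left, right in zip(order, order[1:]):
        neighbours[left].add(right)
        neighbours[right].add(left)
    return neighbours

# Carrier 1 comes first in the file, then carrier 2, then carrier 0.
graph = build_distribution_graph([2, 0, 1])
```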

And determining the distribution relation among the second range representation carriers in the second range representation carrier set through the data distribution of each data cluster, then determining each second range representation carrier as a composition point, and connecting each second range representation carrier according to the distribution relation, so as to obtain a second carrier relation diagram containing the distribution information of the composition point carriers, thereby improving the precision of the second carrier relation diagram.

Optionally, in step S140, performing carrier integration through the first range characterization carriers and the adjacent point characterization carriers corresponding to the first range characterization carriers in the first carrier relation diagram, so as to obtain the first integration characterization carriers corresponding to each first range characterization carrier in the first range characterization carrier set, specifically includes: acquiring the carrier mean of the adjacent point characterization carriers corresponding to a first range characterization carrier to obtain a first carrier mean, and acquiring the difference carrier between the first range characterization carrier and the adjacent point characterization carriers corresponding to the first range characterization carrier to obtain a first difference carrier; combining the first range characterization carrier, the first difference carrier and the first carrier mean to obtain a first combined characterization carrier, and performing affine calculation on the first combined characterization carrier to obtain the first integration characterization carrier corresponding to the first range characterization carrier; and traversing each first range characterization carrier in the first carrier relation diagram to obtain the first integration characterization carrier corresponding to each first range characterization carrier in the first range characterization carrier set.

For example, the adjacent point characterization carriers corresponding to the first range characterization carrier in the first carrier relation diagram are acquired, and their carrier mean is computed: the sum of the adjacent point characterization carriers and their number are obtained, and the ratio of the sum to the number gives the first carrier mean. Next, the difference between the first range characterization carrier and each adjacent point characterization carrier is computed to obtain difference carriers, and the largest difference carrier is taken as the first difference carrier. The first range characterization carrier, the first difference carrier and the first carrier mean are then combined; for example, the first range characterization carrier is placed at the head, the first difference carrier in the middle and the first carrier mean at the tail, and the three are joined head to tail to obtain the first combined characterization carrier. An affine variable prepared in advance (namely a fully connected parameter) is then obtained, and affine calculation is performed on the first combined characterization carrier using the affine variable to obtain the first integration characterization carrier corresponding to the first range characterization carrier. Finally, each first range characterization carrier in the first carrier relation diagram is traversed to obtain the first integration characterization carrier corresponding to each first range characterization carrier.
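The integration of one relation diagram can be sketched as below. The text does not say how the "largest" difference carrier is measured, so this sketch picks the one with the largest Euclidean norm; that choice, the zero fallback for isolated points and all names are assumptions.

```python
def affine(x, weights, bias):
    """Fully connected mapping: one output value per weight row."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]

def integrate(carriers, neighbours, weights, bias):
    """Fuse each range carrier with its adjacent point carriers."""
    integrated = []
    for i, carrier in enumerate(carriers):
        adjacent = [carriers[j] for j in sorted(neighbours[i])]
        if adjacent:
            # Carrier mean: element-wise sum divided by the number of neighbours.
            mean = [sum(column) / len(adjacent) for column in zip(*adjacent)]
            diffs = [[a - b for a, b in zip(carrier, other)] for other in adjacent]
            # Assumption: "largest" means largest squared Euclidean norm.
            max_diff = max(diffs, key=lambda d: sum(c * c for c in d))
        else:
            # Assumption: isolated points contribute zero mean and difference.
            mean = [0.0] * len(carrier)
            max_diff = [0.0] * len(carrier)
        merged = list(carrier) + max_diff + mean  # head, middle, tail
        integrated.append(affine(merged, weights, bias))
    return integrated

# Hypothetical example with an identity affine map so the merge is visible.
identity = [[1.0 if r == c else 0.0 for c in range(6)] for r in range(6)]
result = integrate(
    [[1.0, 0.0], [3.0, 0.0], [1.0, 4.0]],
    {0: {1, 2}, 1: {0}, 2: {0}},
    identity,
    [0.0] * 6,
)
```

The same routine applies unchanged to the second carrier relation diagram, with its own adjacency map and affine parameters.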

The first combined characterization carrier is obtained by obtaining the first carrier mean value and the first difference carrier, combining the first range characterization carrier, the first difference carrier and the first carrier mean value, affine calculation is carried out through the first combined characterization carrier, and the first integrated characterization carrier corresponding to the first range characterization carrier is obtained, so that the obtained integrated characterization carrier can be fused with the feature expression information of adjacent points, and the precision of the first integrated characterization carrier is improved.

Optionally, in step S140, performing carrier integration through the second range characterization carriers and the adjacent point characterization carriers corresponding to the second range characterization carriers in the second carrier relation diagram, so as to obtain the second integration characterization carriers corresponding to each second range characterization carrier in the second range characterization carrier set, may include: acquiring the carrier mean of the adjacent point characterization carriers corresponding to a second range characterization carrier to obtain a second carrier mean, and acquiring the difference carrier between the second range characterization carrier and the adjacent point characterization carriers corresponding to the second range characterization carrier to obtain a second difference carrier; combining the second range characterization carrier, the second difference carrier and the second carrier mean to obtain a second combined characterization carrier, and performing affine calculation on the second combined characterization carrier to obtain the second integration characterization carrier corresponding to the second range characterization carrier; and traversing each second range characterization carrier in the second carrier relation diagram to obtain the second integration characterization carrier corresponding to each second range characterization carrier in the second range characterization carrier set.

For example, the adjacent point characterization carriers corresponding to the second range characterization carrier in the second carrier relation diagram are determined, and their carrier mean is computed: the sum of the adjacent point characterization carriers and their number are obtained, and the ratio of the sum to the number gives the second carrier mean. Next, the carrier difference between the second range characterization carrier and each adjacent point characterization carrier is obtained, and the largest carrier difference is determined as the second difference carrier. The second range characterization carrier, the second difference carrier and the second carrier mean are then merged to obtain the second combined characterization carrier. An affine variable prepared in advance is obtained, and affine calculation is performed on the second combined characterization carrier based on the affine variable to obtain the second integration characterization carrier corresponding to the second range characterization carrier. Each second range characterization carrier in the second carrier relation diagram is traversed to obtain the second integration characterization carrier corresponding to each second range characterization carrier.

The second range representation carrier, the second difference carrier and the second carrier mean are combined to obtain a second combined representation carrier, affine calculation is performed through the second combined representation carrier to obtain a second integration representation carrier corresponding to the second range representation carrier, and the obtained integration representation carrier can fuse the characteristic information of adjacent points to improve the precision of the second integration representation carrier.

Optionally, in step S150, the first integration characterization carrier and the second integration characterization carrier corresponding to the same data cluster characterization carrier are combined to obtain the target data cluster characterization carrier corresponding to each data cluster characterization carrier, and the file priority is determined by the target data cluster characterization carrier corresponding to each data cluster characterization carrier, so as to obtain a file priority determination result corresponding to the file to be stored, which includes:

step S151, obtaining a first strengthening factor, and performing nonlinear transformation on the first integration characterization carriers corresponding to the first range characterization carriers in the first range characterization carrier set through the first strengthening factor to obtain the first strengthening characterization carriers corresponding to the first range characterization carriers in the first range characterization carrier set.

The first enhancement factor is a variable that allows more fine-grained information to be retained when the first integration characterization carrier undergoes nonlinear transformation (that is, when an activation such as ReLU is applied). The first enhancement factor is obtained, for example, by tuning in advance, and the first enhanced characterization carrier is obtained by performing nonlinear transformation on the first integration characterization carrier through the first enhancement factor. Nonlinear transformation is performed, based on the first enhancement factor, on the first integration characterization carrier corresponding to each first range characterization carrier in the first range characterization carrier set, and the first integration characterization carriers corresponding to all first range characterization carriers are traversed to obtain the first enhanced characterization carriers corresponding to each first range characterization carrier in the first range characterization carrier set.

Step S152, performing nonlinear transformation on the second integration characterization carriers corresponding to the second range characterization carriers in the second range characterization carrier set through the first enhancement factors to obtain second enhancement characterization carriers corresponding to the second range characterization carriers in the second range characterization carrier set.

Step S153, combining the first enhanced representation carrier and the second enhanced representation carrier which are respectively corresponding to the same data cluster representation carrier to obtain enhanced data cluster representation carriers which are respectively corresponding to the data cluster representation carriers, and determining file priority through the enhanced data cluster representation carriers which are respectively corresponding to the data cluster representation carriers to obtain a target file priority determination result corresponding to the file to be stored.

The second enhanced characterization carrier is obtained by nonlinear transformation of the second integration characterization carrier through the first enhancement factor, and the enhanced data cluster characterization carrier is obtained by combining the first enhanced characterization carrier and the second enhanced characterization carrier. For example, based on the first enhancement factor, nonlinear transformation (such as ReLU) is performed on the second integration characterization carrier corresponding to each second range characterization carrier to obtain the second enhanced characterization carrier of each second range characterization carrier. The first enhanced characterization carrier and the second enhanced characterization carrier corresponding to the same data cluster are then merged (spliced) to obtain the enhanced data cluster characterization carrier, classification parameter values (used for priority classification) are obtained, and file priority determination is performed through the enhanced data cluster characterization carriers corresponding to each data cluster characterization carrier to obtain the target file priority determination result corresponding to the file to be stored.

The first enhancement factor is adopted to strengthen the information of the first integrated characterization carrier and the second integrated characterization carrier, so that the first enhancement characterization carrier and the second enhancement characterization carrier can be obtained, more fine granularity information can be reserved, the obtained characterization carriers are prevented from being too smooth, then the enhancement data cluster characterization carriers corresponding to the data cluster characterization carriers are obtained based on the first enhancement characterization carrier and the second enhancement characterization carrier, the accuracy of the obtained enhancement data cluster characterization carriers is increased again, the file priority determination is carried out based on the data cluster characterization carriers, the target file priority determination result corresponding to the to-be-stored file is obtained, and the accuracy of the file priority determination is increased.

Optionally, in step S151, performing nonlinear transformation on the first integration characterization carriers corresponding to each first range characterization carrier in the first range characterization carrier set through the first enhancement factor, to obtain the first enhanced characterization carriers corresponding to each first range characterization carrier in the first range characterization carrier set, may specifically include: performing fitting optimization transformation, through the first enhancement factor, on the first integration characterization carriers corresponding to each first range characterization carrier in the first range characterization carrier set to obtain the first optimization characterization carriers corresponding to each first range characterization carrier, and obtaining the normal distribution difference values corresponding to the first optimization characterization carriers to obtain first normal distribution difference values; weighting the first integration characterization carriers corresponding to each first range characterization carrier in the first range characterization carrier set to obtain the first weighted characterization carriers corresponding to each first range characterization carrier; and obtaining the multiplication result of the first weighted characterization carrier and the first normal distribution difference value to obtain the first enhanced characterization carriers corresponding to each first range characterization carrier in the first range characterization carrier set.

The first optimization characterization carrier is the result of applying the fitting optimization transformation to the first integration characterization carrier, the first normal distribution difference value is obtained by applying the Gaussian error function to the first optimization characterization carrier, and the first weighted characterization carrier is obtained by weighting the first integration characterization carrier with a preset weight. Specifically, the fitting optimization transformation is applied, through the first enhancement factor, to the first integration characterization carrier corresponding to each first range characterization carrier as part of the nonlinear transformation. For example, the ratio of the first integration characterization carrier to the first enhancement factor is obtained, or the first enhancement factor is first transformed and the ratio of the first integration characterization carrier to the transformed parameter is obtained, giving the first optimization characterization carrier corresponding to the first integration characterization carrier. The larger the first enhancement factor, the less fine-grained information is retained in the first enhanced characterization carrier. The first normal distribution difference value corresponding to the first optimization characterization carrier is then obtained based on the Gaussian error function, and each first integration characterization carrier is weighted based on the preset weight to obtain the first weighted characterization carrier. Finally, the multiplication result of the first weighted characterization carrier and the first normal distribution difference value is obtained, giving the first enhanced characterization carriers corresponding to each first range characterization carrier in the first range characterization carrier set.
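The three sub-steps (ratio, Gaussian error function, weighted product) resemble the GELU activation, and can be sketched in that form. This is one possible reading only: the text does not give the exact formula, so the `1 + erf(...)` form of the normal distribution difference value and the default factor and weight (which reproduce standard GELU) are assumptions borrowed from GELU, and the names are hypothetical.

```python
import math

def enhance(carrier, factor=math.sqrt(2.0), weight=0.5):
    """Element-wise erf-gated activation in the style of GELU.

    factor: the enhancement factor dividing the input (assumed default sqrt(2)).
    weight: the preset weight applied to the input (assumed default 0.5).
    """
    enhanced = []
    for x in carrier:
        optimized = x / factor                     # fitting optimization: ratio to the factor
        normal_value = 1.0 + math.erf(optimized)   # assumed form of the normal distribution value
        weighted = weight * x                      # weighted characterization carrier
        enhanced.append(weighted * normal_value)   # multiplication result
    return enhanced

# Large positive inputs pass through, large negative inputs are suppressed.
example = enhance([10.0, -10.0, 1.0])
```

Unlike ReLU, this gate is smooth around zero, so small negative components are attenuated rather than discarded, which is consistent with the stated goal of keeping fine-grained information.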

Optionally, in step S152, performing nonlinear transformation on the second integration characterization carriers corresponding to each second range characterization carrier in the second range characterization carrier set through the first enhancement factor, to obtain the second enhanced characterization carriers corresponding to each second range characterization carrier in the second range characterization carrier set, may include: performing fitting optimization transformation, through the first enhancement factor, on the second integration characterization carriers corresponding to each second range characterization carrier in the second range characterization carrier set to obtain the second optimization characterization carriers corresponding to each second range characterization carrier, and obtaining the normal distribution difference values corresponding to the second optimization characterization carriers to obtain second normal distribution difference values; weighting the second integration characterization carriers corresponding to each second range characterization carrier in the second range characterization carrier set to obtain the second weighted characterization carriers corresponding to each second range characterization carrier; and obtaining the multiplication result of the second weighted characterization carrier and the second normal distribution difference value to obtain the second enhanced characterization carriers corresponding to each second range characterization carrier in the second range characterization carrier set.
The second optimization characterization carrier is the result of applying the fitting optimization transformation to the second integration characterization carrier, the second normal distribution difference value is obtained by applying the Gaussian error function to the second optimization characterization carrier, and the second weighted characterization carrier is obtained by weighting the second integration characterization carrier with a preset weight.

Optionally, in step S153, determining the file priority by using the enhanced data cluster characterization carriers corresponding to the data cluster characterization carriers to obtain a target file priority determination result corresponding to the file to be stored, which may include:

step S1531, splitting the reinforced data cluster characterization carriers corresponding to the data cluster characterization carriers to obtain a first reinforced range characterization carrier set, a second reinforced range characterization carrier set and a third reinforced range characterization carrier set, wherein the sum of the number of elements of the second reinforced range characterization carrier in the second reinforced range characterization carrier set and the number of elements of the third reinforced range characterization carrier in the third reinforced range characterization carrier set is the same as the number of elements of the second range characterization carrier.

The first enhancement range characterization carrier consists of the elements of the first characterization carrier range (i.e., the range formed by the first characterization carriers) in the enhanced data cluster characterization carrier, the second enhancement range characterization carrier consists of the elements of the second characterization carrier range in the enhanced data cluster characterization carrier, and the third enhancement range characterization carrier consists of the elements of the third characterization carrier range in the enhanced data cluster characterization carrier. The number of elements equals the number of channels, that is, the dimension, of the corresponding characterization carrier range.

For example, the enhanced data cluster characterization carrier is split into three sections. The number of characterization carrier elements of the first section equals that of the first range characterization carrier, the characterization carrier range of the first section equals that of the first range characterization carrier, and the first section is determined as the first enhancement range characterization carrier. The sum of the numbers of characterization carrier elements of the second section and the third section equals the number of characterization carrier elements of the second range characterization carrier, and the characterization carrier ranges of the second section and the third section together equal the characterization carrier range of the second range characterization carrier. During splitting, the characterization carrier range of the first range characterization carrier is fixed and serves as the characterization carrier range of the first enhancement range characterization carrier, while the characterization carrier range corresponding to the second range characterization carrier is split into two sections: one is the characterization carrier range of the second enhancement range characterization carrier, and the other is the characterization carrier range of the third enhancement range characterization carrier. The elements of each characterization carrier range are then taken to obtain the first enhancement range characterization carrier, the second enhancement range characterization carrier and the third enhancement range characterization carrier.

Each enhanced data cluster characterization carrier is split to obtain its first enhancement range characterization carrier, second enhancement range characterization carrier and third enhancement range characterization carrier; the first enhancement range characterization carriers of all enhanced data cluster characterization carriers form the first enhancement range characterization carrier set, the second enhancement range characterization carriers form the second enhancement range characterization carrier set, and the third enhancement range characterization carriers form the third enhancement range characterization carrier set. Optionally, before the enhanced data cluster characterization carrier is split, the distribution characterization carrier of the data cluster can be obtained, and the sum of the distribution characterization carrier of the data cluster and the corresponding enhanced data cluster characterization carrier is computed to obtain the characterization carrier to be split of the data cluster. The characterization carrier to be split is then split to obtain the first enhancement range characterization carrier, the second enhancement range characterization carrier and the third enhancement range characterization carrier. In this way the carriers contain file distribution information, loss of file distribution information is prevented, and the accuracy of the data cluster characterization carrier is improved.
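The splitting in step S1531, including the optional addition of the distribution characterization carrier, can be sketched as below. This is only an illustrative sketch: carriers are assumed to be lists of numbers, and the names `split_carrier`, `n_first`, `n_second` and `distribution` are hypothetical and do not appear in the application.

```python
def split_carrier(carrier, n_first, n_second, distribution=None):
    """Split one enhanced data cluster carrier into three range carriers.

    n_first  : number of elements of the (fixed) first range
    n_second : number of elements assigned to the second range;
               the remaining elements form the third range
    distribution : optional distribution carrier added element-wise
                   before splitting (assumed to have the same length)
    """
    if distribution is not None:
        if len(distribution) != len(carrier):
            raise ValueError("distribution carrier must match carrier length")
        carrier = [c + d for c, d in zip(carrier, distribution)]
    if n_first < 0 or n_second < 0 or n_first + n_second > len(carrier):
        raise ValueError("invalid split sizes")
    first = carrier[:n_first]
    second = carrier[n_first:n_first + n_second]
    third = carrier[n_first + n_second:]
    return first, second, third


first, second, third = split_carrier([1, 2, 3, 4, 5, 6], n_first=2, n_second=1)
```

The second and third sections always add up to the part of the carrier left after the fixed first section, which matches the element-count constraint stated in step S1531.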

Step S1532, constructing a first reinforced carrier relation diagram corresponding to the first reinforced range representation carrier set through the commonality scores among the first reinforced range representation carriers in the first reinforced range representation carrier set, and constructing a second reinforced carrier relation diagram corresponding to the second reinforced range representation carrier set through the data distribution of each data cluster.

The first enhanced carrier relation diagram is a relation diagram constructed using the first enhancement range characterization carriers and the commonality score relations between them. The second enhanced carrier relation diagram is a relation diagram constructed using the second enhancement range characterization carriers and the adjacent distribution relations between them.

For example, the commonality scores between the first enhancement range characterization carriers are obtained, that is, the commonality score between each first enhancement range characterization carrier and each of the remaining first enhancement range characterization carriers. Based on the commonality scores, the first enhancement range characterization carriers that have a similar relation with a given first enhancement range characterization carrier are determined among the remaining ones; for instance, a set number of the remaining first enhancement range characterization carriers with the highest commonality scores are determined as having a similar relation with it. The first enhancement range characterization carriers having similar relations are connected to obtain the first enhanced carrier relation diagram. Then, the distribution relations between the second enhancement range characterization carriers are determined based on the data distribution of the data clusters: if two data clusters are adjacent, the corresponding second enhancement range characterization carriers have an adjacent distribution relation. The second enhancement range characterization carriers having adjacent distribution relations are connected to obtain the second enhanced carrier relation diagram.
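The two graph constructions of step S1532 can be sketched as follows. The sketch is hedged in several ways: the application does not fix the commonality score, so cosine similarity is used here as an assumption; the graph is represented as a set of undirected index pairs; and the names `cosine`, `top_k_graph` and `adjacency_graph` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity, used here as an assumed commonality score."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def top_k_graph(carriers, k):
    """Connect each carrier to the k other carriers with the highest score."""
    edges = set()
    for i, u in enumerate(carriers):
        scored = sorted(
            ((cosine(u, v), j) for j, v in enumerate(carriers) if j != i),
            reverse=True,
        )
        for _, j in scored[:k]:
            edges.add((min(i, j), max(i, j)))
    return edges


def adjacency_graph(cluster_neighbors):
    """Connect carriers whose data clusters are adjacent in the data distribution."""
    return {(min(i, j), max(i, j))
            for i, neighbors in cluster_neighbors.items()
            for j in neighbors}


first_graph = top_k_graph([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]], k=1)
second_graph = adjacency_graph({0: [1], 1: [0, 2], 2: [1]})
```

The first graph depends only on carrier content, while the second depends only on which data clusters are adjacent, so the two graphs carry complementary information.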

Step S1533, determining adjacent strengthening range characterization carriers corresponding to the third strengthening range characterization carriers in the third strengthening range characterization carrier set through data distribution of the data clusters, and constructing a third strengthening carrier relation diagram corresponding to the third strengthening range characterization carrier set through commonality scores among the adjacent strengthening range characterization carriers corresponding to the third strengthening range characterization carriers in the third strengthening range characterization carrier set.

The adjacent enhancement range characterization carrier is an enhancement range characterization carrier that has an adjacent distribution relation with the third enhancement range characterization carrier, and the third enhanced carrier relation diagram is a relation diagram constructed based on the commonality scores between the adjacent enhancement range characterization carriers of the third enhancement range characterization carriers.

For example, based on the correspondence between the data clusters and the third enhancement range characterization carriers, the adjacent enhancement range characterization carriers corresponding to each third enhancement range characterization carrier can be obtained: if two data clusters are adjacent, the corresponding third enhancement range characterization carriers are adjacent. Each third enhancement range characterization carrier may correspond to several adjacent enhancement range characterization carriers; in that case, the mean of these adjacent enhancement range characterization carriers is taken as the adjacent enhancement range characterization carrier of the third enhancement range characterization carrier. The adjacent enhancement range characterization carrier corresponding to each third enhancement range characterization carrier is obtained by traversal, the commonality scores between the adjacent enhancement range characterization carriers are obtained, and these scores are determined as the commonality scores between the corresponding third enhancement range characterization carriers. For each third enhancement range characterization carrier, the remaining third enhancement range characterization carriers whose commonality score with it is greater than a set score are determined as having a similar relation with it. Each third enhancement range characterization carrier is then taken as a composition point in the relation diagram, and the third enhancement range characterization carriers having similar relations are connected to obtain the third enhanced carrier relation diagram. In determining the contact result, not only the feature information of the composition point but also the common features of the adjacent points of the composition point are considered, so that the obtained similar relations are more reliable and the obtained third enhanced carrier relation diagram is more accurate.

Step S1534, carrying out carrier integration through the first reinforcement range characterization carriers and the adjacent point characterization carriers corresponding to the first reinforcement range characterization carriers in the first reinforcement carrier relation diagram to obtain first integration reinforcement characterization carriers corresponding to the first reinforcement range characterization carriers in the first reinforcement range characterization carrier collection.

Step S1535, carrying out carrier integration through the second enhancement range characterization carriers and the adjacent point characterization carriers corresponding to the second enhancement range characterization carriers in the second enhanced carrier relation diagram to obtain second integrated enhanced characterization carriers corresponding to the second enhancement range characterization carriers in the second enhancement range characterization carrier set.

The first integration strengthening characterization carrier is obtained by fusing and iterating the first strengthening range characterization carrier by using the adjacent point characterization carrier. The second integrated enhanced characterization carrier is a characterization carrier obtained by fusing and iterating the second enhanced range characterization carrier based on the adjacent point characterization carrier.

For example, the carrier mean of the adjacent point characterization carriers corresponding to the first enhancement range characterization carrier is obtained, then the difference carrier between the first enhancement range characterization carrier and its adjacent point characterization carriers is obtained; the first enhancement range characterization carrier, the carrier mean and the difference carrier are combined, and affine calculation is performed on the combined result to obtain the first integrated enhanced characterization carrier corresponding to the first enhancement range characterization carrier. Similarly, the carrier mean of the adjacent point characterization carriers corresponding to the second enhancement range characterization carrier is obtained, the difference carrier between the second enhancement range characterization carrier and its adjacent point characterization carriers is obtained, the second enhancement range characterization carrier, the carrier mean and the difference carrier are combined, and affine calculation is performed on the combined result to obtain the second integrated enhanced characterization carrier corresponding to the second enhancement range characterization carrier.

Step S1536, carrying out carrier integration through the third enhancement range characterization carriers and the adjacent point characterization carriers corresponding to the third enhancement range characterization carriers in the third enhanced carrier relation diagram to obtain third integrated enhanced characterization carriers corresponding to the third enhancement range characterization carriers in the third enhancement range characterization carrier set.

The third integrated enhanced characterization carrier is a characterization carrier obtained by fusing and iterating the third enhancement range characterization carrier based on the adjacent point characterization carriers. For example, the adjacent point characterization carriers corresponding to the third enhancement range characterization carrier are obtained from the third enhanced carrier relation diagram, the carrier mean of all these adjacent point characterization carriers is obtained, and the difference carrier between the third enhancement range characterization carrier and its adjacent point characterization carriers is obtained, the difference carrier being the maximum difference carrier between the third enhancement range characterization carrier and its adjacent point characterization carriers. The third enhancement range characterization carrier, the carrier mean and the difference carrier are combined, and affine calculation is performed on the combined result to obtain the third integrated enhanced characterization carrier corresponding to the third enhancement range characterization carrier.
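The carrier integration used in steps S1534 to S1536 (neighbor mean, difference carrier, combination, affine calculation) can be sketched as below. The sketch makes assumptions that the application leaves open: "combining" is taken to be concatenation, the maximum difference carrier is taken to be the difference to the neighbor that is farthest in Euclidean distance, and the affine parameters `weights` and `bias` are hypothetical learned values.

```python
def integrate(carrier, neighbors, weights, bias):
    """Fuse a carrier with its adjacent point carriers.

    carrier   : list of floats (one composition point)
    neighbors : non-empty list of adjacent point carriers (same length)
    weights   : rows of length 3 * len(carrier) (assumed affine matrix)
    bias      : one value per output element (assumed affine offset)
    """
    if not neighbors:
        raise ValueError("at least one adjacent point carrier is required")
    dim = len(carrier)
    # Carrier mean of the adjacent point carriers.
    mean = [sum(n[d] for n in neighbors) / len(neighbors) for d in range(dim)]

    # Difference carrier: difference to the farthest neighbor (assumption).
    def squared_distance(n):
        return sum((c - x) ** 2 for c, x in zip(carrier, n))

    farthest = max(neighbors, key=squared_distance)
    diff = [c - x for c, x in zip(carrier, farthest)]

    # Combination by concatenation, followed by the affine calculation.
    combined = list(carrier) + mean + diff
    return [sum(w * x for w, x in zip(row, combined)) + b
            for row, b in zip(weights, bias)]


integrated = integrate(
    [1.0, 2.0],
    [[1.0, 2.0], [3.0, 0.0]],
    weights=[[1, 0, 0, 0, 0, 0], [0, 0, 1, 1, 1, 1]],
    bias=[0.5, 0.0],
)
```

The same routine would apply to the first, second and third enhancement range characterization carriers; only the relation diagram that supplies `neighbors` differs.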

Step S1537, combining the first integrated enhanced characterization carrier, the second integrated enhanced characterization carrier and the third integrated enhanced characterization carrier corresponding to the same data cluster characterization carrier to obtain the target enhanced data cluster characterization carrier corresponding to each data cluster characterization carrier.

Step S1538, determining file priority by the target enhanced data cluster characterization carriers corresponding to the data cluster characterization carriers, and obtaining enhanced file priority determination results corresponding to the files to be stored.

The target reinforced data cluster characterization carrier is a data cluster characterization carrier after reinforcing the reinforced data cluster characterization carrier. For example, the first integrated enhanced representation carrier, the second integrated enhanced representation carrier and the third integrated enhanced representation carrier corresponding to the same data cluster representation carrier are combined to obtain the target enhanced data cluster representation carrier corresponding to each data cluster representation carrier, and file priority determination is performed based on the target enhanced data cluster representation carrier to obtain an enhanced file priority determination result corresponding to the file to be stored.

The enhanced data cluster characterization carrier is split to obtain the first enhancement range characterization carrier set, the second enhancement range characterization carrier set and the third enhancement range characterization carrier set, and a corresponding enhanced carrier relation diagram is constructed for each set. Based on these relation diagrams, the adjacent point characterization carriers are fused and the enhancement range characterization carriers of the composition points are iterated to obtain the integrated enhanced characterization carriers, which improves the accuracy of the obtained data cluster characterization carriers. The integrated enhanced characterization carriers of the same data cluster characterization carrier are then combined to obtain the target enhanced data cluster characterization carrier. That is, the characterization carrier range of the second range characterization carrier is split in sequence to obtain the second enhancement range characterization carrier and the third enhancement range characterization carrier; the characterization carrier range whose relation diagram is constructed from commonality scores is enhanced, and the characterization carrier range whose relation diagram is constructed from data distribution is reduced, so that the characterization carriers with variable characterization carrier ranges contain the commonality score information of adjacent points. The obtained target enhanced data cluster characterization carrier is therefore more accurate, and determining file priority through it increases the precision of file priority determination.

Optionally, in step S1533, constructing a third reinforced carrier relationship diagram corresponding to the third reinforced range representation carrier set by using the commonality scores between adjacent reinforced range representation carriers corresponding to each third reinforced range representation carrier in the third reinforced range representation carrier set, including:

Step S15331, determining the current characterization carrier and the target characterization carrier among the third enhancement range characterization carriers.

Step S15332, determining each current adjacent characterization carrier corresponding to the current characterization carrier from the third enhancement range characterization carriers through the data distribution of each data cluster, and fusing the current adjacent characterization carriers to obtain the current fused adjacent characterization carrier.

The current characterization carrier is the third enhancement range characterization carrier currently being processed, and the target characterization carrier may be any third enhancement range characterization carrier other than the current characterization carrier, that is, a third enhancement range characterization carrier whose commonality with the current characterization carrier is to be scored.

For example, the current characterization carrier and the target characterization carrier are determined one by one among the third enhancement range characterization carriers. The position of each third enhancement range characterization carrier is then determined according to the data distribution of the data clusters, and the current adjacent characterization carriers corresponding to the current characterization carrier are determined based on the adjacent distribution relations of the data clusters: the third enhancement range characterization carriers corresponding to the data clusters adjacent to the data cluster of the current characterization carrier are determined as the current adjacent characterization carriers. The current adjacent characterization carriers are then fused, for example by taking their carrier mean or their carrier sum, to obtain the current fused adjacent characterization carrier.

Step S15333, determining each target adjacent characterization carrier corresponding to the target characterization carrier from the third enhancement range characterization carriers through the data distribution of each data cluster, and fusing each target adjacent characterization carrier to obtain the target fusion adjacent characterization carrier.

For example, each data cluster adjacent to the data cluster corresponding to the target representation carrier is determined based on the data distribution, the third strengthening range representation carrier corresponding to each adjacent data cluster is used as each target adjacent representation carrier corresponding to the target representation carrier, then the carrier mean value of each target adjacent representation carrier can be obtained to obtain the target fusion adjacent representation carrier, or the carrier sum value of each target adjacent representation carrier is obtained to obtain the target fusion adjacent representation carrier.

Step S15334, obtaining the commonality score between the current fused adjacent characterization carrier and the target fused adjacent characterization carrier as the commonality score between the current characterization carrier and the target characterization carrier.

Step S15335, traversing the third enhancement range characterization carriers to obtain the commonality scores between the adjacent enhancement range characterization carriers corresponding to the third enhancement range characterization carriers, and determining these commonality scores as the target commonality scores between the third enhancement range characterization carriers.

Step S15336, determining target contact results among the third strengthening range representation carriers through the target commonality scores, determining the third strengthening range representation carriers as composition points, and connecting the third strengthening range representation carriers according to the target contact results to obtain a third strengthening carrier relationship diagram.

For example, the commonality score between the current fused adjacent characterization carrier and the target fused adjacent characterization carrier is obtained and taken as the commonality score between the current characterization carrier and the target characterization carrier. In this way the commonality score between each third enhancement range characterization carrier and each of the remaining third enhancement range characterization carriers is obtained, and the remaining third enhancement range characterization carriers that have a target contact result with each third enhancement range characterization carrier are determined based on the commonality scores. For instance, the commonality scores between a third enhancement range characterization carrier and the remaining third enhancement range characterization carriers are sorted in descending order, and the remaining third enhancement range characterization carriers corresponding to the first five commonality scores are determined as having a target contact result with it. Each third enhancement range characterization carrier is then determined as a composition point, and the third enhancement range characterization carriers are connected according to the target contact results to obtain the third enhanced carrier relation diagram.
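Steps S15331 to S15336 can be sketched as follows. As before this is an illustrative sketch under stated assumptions: cosine similarity stands in for the unspecified commonality score, fusion is the carrier mean or carrier sum as mentioned in the text, the top five scores are kept by default as in the example, and all function names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity, used here as an assumed commonality score."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def fuse(carriers, indices, mode="mean"):
    """Fuse the carriers at the given indices by carrier mean or carrier sum."""
    dim = len(carriers[0])
    if not indices:
        return [0.0] * dim
    total = [sum(carriers[i][d] for i in indices) for d in range(dim)]
    return total if mode == "sum" else [t / len(indices) for t in total]


def third_graph(carriers, cluster_neighbors, k=5, mode="mean"):
    """Score each pair by the similarity of their fused adjacent carriers."""
    fused = [fuse(carriers, cluster_neighbors.get(i, []), mode)
             for i in range(len(carriers))]
    edges = set()
    for i in range(len(carriers)):
        scored = sorted(
            ((cosine(fused[i], fused[j]), j)
             for j in range(len(carriers)) if j != i),
            reverse=True,
        )
        for _, j in scored[:k]:
            edges.add((min(i, j), max(i, j)))
    return edges


carriers = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
graph = third_graph(carriers, neighbors, k=1)
```

Note that two carriers are connected because their neighborhoods look alike, not because the carriers themselves are alike, which is what distinguishes the third relation diagram from the first.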

Optionally, in step S1537, combining the first integrated enhanced representation carrier, the second integrated enhanced representation carrier and the third integrated enhanced representation carrier corresponding to each of the same data cluster representation carriers to obtain a target enhanced data cluster representation carrier corresponding to each of the data cluster representation carriers, including:

Step S15371, obtaining a second enhancement factor, and performing nonlinear transformation on the first integrated enhanced characterization carriers corresponding to the first enhancement range characterization carriers in the first enhancement range characterization carrier set through the second enhancement factor to obtain first nonlinear characterization carriers corresponding to the first enhancement range characterization carriers in the first enhancement range characterization carrier set.

Step S15372, performing nonlinear transformation on the second integrated enhanced characterization carriers corresponding to the second enhancement range characterization carriers in the second enhancement range characterization carrier set through the second enhancement factor to obtain second nonlinear characterization carriers corresponding to the second enhancement range characterization carriers in the second enhancement range characterization carrier set.

Step S15373, performing nonlinear transformation on the third integrated enhanced characterization carriers corresponding to the third enhancement range characterization carriers in the third enhancement range characterization carrier set through the second enhancement factor to obtain third nonlinear characterization carriers corresponding to the third enhancement range characterization carriers in the third enhancement range characterization carrier set.

Step S15374, combining the first nonlinear characterization carrier, the second nonlinear characterization carrier and the third nonlinear characterization carrier corresponding to the same data cluster characterization carrier to obtain the target enhanced data cluster characterization carrier corresponding to each data cluster characterization carrier.

The second enhancement factor is the enhancement factor used when performing nonlinear transformation on the integrated enhanced characterization carriers, and it plays a role similar to that of the first enhancement factor. The second enhancement factor is obtained, nonlinear transformation is performed through it on the first integrated enhanced characterization carrier, the second integrated enhanced characterization carrier and the third integrated enhanced characterization carrier respectively, and the resulting nonlinear characterization carriers are combined to obtain the target enhanced data cluster characterization carrier corresponding to each data cluster characterization carrier.

The enhancement range representation carriers corresponding to the same data cluster representation carrier are subjected to nonlinear transformation based on the second enhancement factors to obtain nonlinear representation carriers, and the nonlinear representation carriers are combined to obtain target enhancement data cluster representation carriers corresponding to the data cluster representation carriers, so that the obtained target enhancement data cluster representation carriers can retain more fine-grained information, and the accuracy of the target enhancement data cluster representation carriers is improved.

Optionally, in step S15373, performing nonlinear transformation on the third integrated enhanced characterization carriers corresponding to the third enhancement range characterization carriers in the third enhancement range characterization carrier set through the second enhancement factor to obtain the third nonlinear characterization carriers corresponding to the third enhancement range characterization carriers in the third enhancement range characterization carrier set includes: performing fitting optimization transformation, through the second enhancement factor, on the third integrated enhanced characterization carrier corresponding to each third enhancement range characterization carrier in the third enhancement range characterization carrier set to obtain the corresponding third optimized characterization carrier, and obtaining the normal distribution difference value corresponding to the third optimized characterization carrier as the third normal distribution difference value; weighting the third integrated enhanced characterization carrier corresponding to each third enhancement range characterization carrier in the third enhancement range characterization carrier set to obtain the corresponding third weighted characterization carrier; and multiplying the third weighted characterization carrier by the third normal distribution difference value to obtain the third nonlinear characterization carrier corresponding to each third enhancement range characterization carrier in the third enhancement range characterization carrier set.

The third optimized characterization carrier is the result of applying the fitting optimization transformation to the third integrated enhanced characterization carrier, the third normal distribution difference value is the result of computing on the third optimized characterization carrier with a Gaussian error algorithm, the third weighted characterization carrier is the result of weighting the third integrated enhanced characterization carrier with a preset weight, and the third nonlinear characterization carrier is the result of the nonlinear transformation of the third integrated enhanced characterization carrier through the second enhancement factor. The fitting optimization transformation is performed on each third integrated enhanced characterization carrier through the second enhancement factor, for example by obtaining the ratio between the carrier and the second enhancement factor, or by first applying a fitting optimization transformation to the second enhancement factor and then obtaining the ratio between the carrier and the transformed parameter, which gives the third optimized characterization carrier. The third normal distribution difference value is computed from the third optimized characterization carrier, each third integrated enhanced characterization carrier is weighted with the preset weight to obtain the third weighted characterization carrier, and the third weighted characterization carrier is multiplied by the third normal distribution difference value to obtain the third nonlinear characterization carrier corresponding to each third enhancement range characterization carrier in the third enhancement range characterization carrier set.

Fitting optimization transformation is performed on the third integrated enhanced characterization carriers through the second enhancement factor to obtain the third optimized characterization carriers, and the normal distribution difference values corresponding to the third optimized characterization carriers are obtained as the third normal distribution difference values; the third integrated enhanced characterization carriers are weighted to obtain the third weighted characterization carriers, and the third weighted characterization carriers are multiplied by the third normal distribution difference values to obtain the third nonlinear characterization carriers corresponding to the third enhancement range characterization carriers. The obtained third nonlinear characterization carriers thus retain more fine-grained information, which improves the accuracy of the obtained third nonlinear characterization carriers.

Optionally, the method for optimizing data storage in Filecoin provided in the embodiment of the present application further includes: taking the target enhanced data cluster characterization carrier as the enhanced data cluster characterization carrier, and returning to the operation of splitting the enhanced data cluster characterization carrier corresponding to each data cluster characterization carrier to obtain the first enhancement range characterization carrier set, the second enhancement range characterization carrier set and the third enhancement range characterization carrier set, wherein the number of elements of the second enhancement range characterization carrier in the second enhancement range characterization carrier set is increased by a preset number, and the number of elements of the third enhancement range characterization carrier in the third enhancement range characterization carrier set is decreased by the preset number; and, when a predetermined iteration stop requirement is met, obtaining the final data cluster characterization carrier corresponding to each data cluster characterization carrier, and determining file priority through the final data cluster characterization carriers to obtain the final file priority determination result corresponding to the file to be stored.

The preset number is the preset number of characterization carrier elements to be moved in each iteration, that is, the part of the range of the third strengthening range characterization carrier that is moved into the range of the second strengthening range characterization carrier. The predetermined iteration stop requirement is the condition under which mining of the data cluster characterization carriers is regarded as complete. For example, in each iterative update the range of the third strengthening range characterization carrier is reduced by the preset number of elements, and the removed part is added to the range of the second strengthening range characterization carrier, while the number of elements of the first strengthening range characterization carrier stays fixed. As a result, the characterization carriers of the composition points corresponding to the second strengthening range characterization carriers gradually become similar to those of their adjacent points. The iteration stop requirement is met when the number of elements of the third strengthening range characterization carrier reaches its minimum value or when the preset number of iteration rounds is reached.
The target strengthened data cluster characterization carrier from the last iteration is determined as the final data cluster characterization carrier, giving the final data cluster characterization carriers corresponding to the respective data cluster characterization carriers; the file priority is then determined through these final data cluster characterization carriers to obtain the final file priority determination result corresponding to the file to be stored. Because each iteration takes the target strengthened data cluster characterization carrier as the strengthened data cluster characterization carrier, returns to the splitting step, increases the number of elements of the second strengthening range characterization carriers by the preset number and reduces the number of elements of the third strengthening range characterization carriers by the preset number, the final data cluster characterization carriers incorporate the similarity information of adjacent points, which improves the accuracy of the final file priority determination result for the file to be stored.
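The iterative adjustment of the range sizes described above can be sketched as a simple schedule. This is a minimal sketch under stated assumptions: the function name `range_schedule` and the parameters `min_third` and `max_rounds` are hypothetical, and the stop requirement is modeled as "the third range would fall below a minimum size, or a maximum number of rounds has been reached", which is only one reading of the text.

```python
def range_schedule(first_size, second_size, third_size,
                   preset_number, min_third=0, max_rounds=100):
    """Return the (first, second, third) element counts for every round.

    Each round moves preset_number elements from the third range to the
    second range; the first range stays fixed. Iteration stops when the
    third range would drop below min_third or max_rounds is reached.
    """
    schedule = [(first_size, second_size, third_size)]
    rounds = 0
    while rounds < max_rounds and third_size - preset_number >= min_third:
        third_size -= preset_number    # shrink the third range
        second_size += preset_number   # grow the second range by the same amount
        rounds += 1
        schedule.append((first_size, second_size, third_size))
    return schedule
```

For instance, `range_schedule(4, 2, 6, 2)` produces four entries, and the sum of the second and third sizes is the same in every entry, matching the requirement that elements are only moved between the two ranges.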

Optionally, in step S150, determining the file priority through the target data cluster characterization carriers corresponding to the respective data cluster characterization carriers, to obtain the file priority determination result corresponding to the file to be stored, includes: splitting the target data cluster characterization carriers corresponding to the respective data cluster characterization carriers to obtain a first target range characterization carrier set, a second target range characterization carrier set and a third target range characterization carrier set, wherein the sum of the number of elements of a second target range characterization carrier in the second target range characterization carrier set and the number of elements of a third target range characterization carrier in the third target range characterization carrier set is the same as the number of elements of the second range characterization carrier; constructing a first target carrier relation diagram corresponding to the first target range characterization carrier set through the commonality scores among the first target range characterization carriers in the first target range characterization carrier set, and constructing a second target carrier relation diagram corresponding to the second target range characterization carrier set through the data distribution of the data clusters; determining, through the data distribution of the data clusters, the adjacent target range characterization carriers corresponding to each third target range characterization carrier in the third target range characterization carrier set, and constructing a third target carrier relation diagram corresponding to the third target range characterization carrier set through the commonality scores among the adjacent target range characterization carriers corresponding to each third target range characterization carrier; performing carrier integration through each first target range characterization carrier in the first target carrier relation diagram and its corresponding adjacent point characterization carriers to obtain the first integration target characterization carriers corresponding to the first target range characterization carriers in the first target range characterization carrier set; performing carrier integration through each second target range characterization carrier in the second target carrier relation diagram and its corresponding adjacent point characterization carriers to obtain the second integration target characterization carriers corresponding to the second target range characterization carriers in the second target range characterization carrier set; performing carrier integration through each third target range characterization carrier in the third target carrier relation diagram and its corresponding adjacent point characterization carriers to obtain the third integration target characterization carriers corresponding to the third target range characterization carriers in the third target range characterization carrier set; combining the first integration target characterization carrier, the second integration target characterization carrier and the third integration target characterization carrier that correspond to the same data cluster characterization carrier to obtain the current data cluster characterization carriers corresponding to the respective data cluster characterization carriers; and determining the file priority through the current data cluster characterization carriers to obtain the current file priority determination result corresponding to the file to be stored.
Alternatively, the target data cluster characterization carriers are split directly to obtain the first target range characterization carrier set, the second target range characterization carrier set and the third target range characterization carrier set; the corresponding first, second and third target carrier relation diagrams are constructed; carrier integration is performed based on these three diagrams to obtain the first, second and third integration target characterization carriers; the first, second and third integration target characterization carriers corresponding to the same data cluster characterization carrier are merged to obtain the current data cluster characterization carrier; and priority determination is performed based on the current data cluster characterization carrier to obtain the current file priority determination result.

By splitting the target data cluster characterization carriers into the first, second and third target range characterization carrier sets, constructing the corresponding first, second and third target carrier relation diagrams, performing carrier integration based on these diagrams to obtain the first, second and third integration target characterization carriers, and then combining the integration target characterization carriers that correspond to the same data cluster characterization carrier before determining the priority, the strengthening stage is omitted and the speed of priority determination is improved.

Optionally, the method for optimizing data storage in Filecoin provided in the embodiment of the present application further includes: inputting the file to be stored into a file priority determining neural network, splitting the file to be stored through the file priority determining neural network to obtain the data clusters, and mining the characterization carrier of each data cluster to obtain the data cluster characterization carriers; grouping the data cluster characterization carriers through the file priority determining neural network, obtaining the first range characterization carrier corresponding to each data cluster characterization carrier to form a first range characterization carrier set, and obtaining the second range characterization carrier corresponding to each data cluster characterization carrier to form a second range characterization carrier set; constructing, through the file priority determining neural network, a first carrier relation diagram corresponding to the first range characterization carrier set from the commonality scores among the first range characterization carriers in the first range characterization carrier set, and constructing a second carrier relation diagram corresponding to the second range characterization carrier set from the data distribution of the data clusters; performing, through the file priority determining neural network, carrier integration using each first range characterization carrier in the first carrier relation diagram and its corresponding adjacent point characterization carriers to obtain the first integration characterization carriers corresponding to the first range characterization carriers in the first range characterization carrier set, and performing carrier integration using each second range characterization carrier in the second carrier relation diagram and its corresponding adjacent point characterization carriers to obtain the second integration characterization carriers corresponding to the second range characterization carriers in the second range characterization carrier set; and combining, through the file priority determining neural network, the first integration characterization carrier and the second integration characterization carrier that correspond to the same data cluster characterization carrier to obtain the target data cluster characterization carriers corresponding to the respective data cluster characterization carriers, determining the file priority through the target data cluster characterization carriers, and outputting the file priority determination result corresponding to the file to be stored.

The file priority determining neural network is a deep neural network prepared in advance, for example a GRU, an RNN or an LSTM. An initialized file priority determining neural network can be trained to obtain the file priority determining neural network, which is then deployed for use.

In this way, the target data cluster characterization carriers corresponding to the file to be stored are mined by the file priority determining neural network, which improves their accuracy; priority determination is then performed on these target data cluster characterization carriers, which in turn improves the accuracy of the file priority determination.

Then, as an embodiment, when the above steps are performed using the file priority determining neural network, the steps include:

Step S210, obtaining a file to be stored, inputting the file to be stored into the file priority determining neural network, splitting the file to be stored through the file priority determining neural network to obtain the data clusters, and mining the data item characterization carrier of each data cluster to obtain the data item characterization carriers of the data clusters; acquiring the data distribution of each data cluster, and quantizing the data distribution of each data cluster to obtain the distribution characterization carrier of each data cluster; and fusing the data item characterization carrier of each data cluster with the corresponding data cluster distribution characterization carrier to obtain the data cluster characterization carriers.
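Step S210 can be pictured with the following minimal sketch. Everything in it is an assumption made for illustration: the file is split into fixed-size byte clusters, the data item characterization carrier is a folded byte histogram, the distribution characterization carrier encodes the relative position of the cluster in the file, and fusion is element-wise addition. In the described method these carriers would be produced by the neural network, and all function names here are hypothetical.

```python
def split_into_clusters(data: bytes, cluster_size: int):
    """Split a file's bytes into consecutive fixed-size data clusters."""
    return [data[i:i + cluster_size] for i in range(0, len(data), cluster_size)]


def item_carrier(cluster: bytes, dim: int = 4):
    """Hypothetical content features: byte histogram folded into dim buckets."""
    counts = [0.0] * dim
    for b in cluster:
        counts[b % dim] += 1.0
    total = len(cluster) or 1
    return [c / total for c in counts]


def distribution_carrier(index: int, total: int, dim: int = 4):
    """Hypothetical quantized position of a cluster within the file."""
    pos = index / max(total - 1, 1)   # 0.0 for the first cluster, 1.0 for the last
    return [pos ** (k + 1) for k in range(dim)]


def cluster_carriers(data: bytes, cluster_size: int, dim: int = 4):
    """Fuse content and position carriers by element-wise addition (assumed)."""
    clusters = split_into_clusters(data, cluster_size)
    fused = []
    for i, cluster in enumerate(clusters):
        item = item_carrier(cluster, dim)
        dist = distribution_carrier(i, len(clusters), dim)
        fused.append([a + b for a, b in zip(item, dist)])
    return fused
```

Addition was chosen for the fusion only because it keeps the carrier length unchanged; concatenation would be an equally plausible reading of "fusing".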

Step S220, the data cluster characterization carriers are respectively grouped through a file priority determining neural network, a first range characterization carrier corresponding to each data cluster characterization carrier is obtained to form a first range characterization carrier set, a second range characterization carrier corresponding to each data cluster characterization carrier is obtained to form a second range characterization carrier set.
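One possible reading of the grouping in step S220 is that each data cluster characterization carrier is split along its elements into a first range and a second range. The sketch below assumes exactly that; the split point `first_len` and the helper names are hypothetical, and the text does not state how the ranges are chosen.

```python
def group_carrier(carrier, first_len):
    """Split one carrier into a first-range part and a second-range part."""
    if not 0 < first_len < len(carrier):
        raise ValueError("first_len must leave both ranges non-empty")
    return carrier[:first_len], carrier[first_len:]


def group_all(carriers, first_len):
    """Build the first and second range carrier sets for all data clusters."""
    first_set, second_set = [], []
    for carrier in carriers:
        first, second = group_carrier(carrier, first_len)
        first_set.append(first)
        second_set.append(second)
    return first_set, second_set


def merge(first, second):
    """Recombine the two ranges that belong to the same data cluster."""
    return first + second
```

Because the split is a plain slice, merging the two ranges of the same data cluster restores the original element order, which is what the later combining step relies on in this sketch.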

Step S230, constructing, through the file priority determining neural network, a first carrier relation diagram corresponding to the first range characterization carrier set from the commonality scores among the first range characterization carriers in the first range characterization carrier set, and constructing a second carrier relation diagram corresponding to the second range characterization carrier set from the data distribution of the data clusters.
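The two relation diagrams of step S230 can be sketched as adjacency lists. The sketch assumes that the commonality score is cosine similarity with a hypothetical `threshold`, and that the "data distribution" of the data clusters is simply their order in the file, so that clusters within a hypothetical `radius` of each other are connected. Neither choice is confirmed by the text.

```python
import math


def cosine(a, b):
    """Cosine similarity, used here as an assumed commonality score."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def similarity_graph(carriers, threshold=0.9):
    """First diagram: connect carriers whose commonality score is high enough."""
    n = len(carriers)
    adjacency = {i: [] for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if cosine(carriers[i], carriers[j]) >= threshold:
                adjacency[i].append(j)
                adjacency[j].append(i)
    return adjacency


def position_graph(n, radius=1):
    """Second diagram: connect clusters that lie close together in the file."""
    return {
        i: [j for j in range(max(0, i - radius), min(n, i + radius + 1)) if j != i]
        for i in range(n)
    }
```

The first diagram therefore links clusters with similar content regardless of where they sit in the file, while the second links clusters that are adjacent in the file regardless of content.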

Step S240, performing, through the file priority determining neural network, carrier integration using each first range characterization carrier in the first carrier relation diagram and its corresponding adjacent point characterization carriers to obtain the first integration characterization carriers corresponding to the first range characterization carriers in the first range characterization carrier set, and performing carrier integration using each second range characterization carrier in the second carrier relation diagram and its corresponding adjacent point characterization carriers to obtain the second integration characterization carriers corresponding to the second range characterization carriers in the second range characterization carrier set.
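The claims of this application describe carrier integration as taking the mean of the adjacent point carriers, forming a difference carrier, combining these with the original carrier, and applying an affine calculation. A minimal sketch of that sequence follows. It assumes that the difference is taken against the neighbor mean, that "combining" means concatenation, and that a node without neighbors uses itself as the mean; the `weights` and `bias` arguments are hypothetical placeholders for parameters that would be learned.

```python
def integrate(carriers, graph, weights, bias):
    """Integrate each carrier with its neighbors in a relation diagram.

    carriers: list of equal-length float lists, one per composition point
    graph:    adjacency list {node index: [neighbor indices]}
    weights:  affine matrix with rows of length 3 * len(carrier) (assumed)
    bias:     one bias value per output element (assumed)
    """
    integrated = []
    for i, x in enumerate(carriers):
        neighbors = [carriers[j] for j in graph[i]]
        if neighbors:
            mean = [sum(col) / len(neighbors) for col in zip(*neighbors)]
        else:
            mean = list(x)  # assumption: an isolated point uses itself
        diff = [a - m for a, m in zip(x, mean)]   # difference carrier
        combined = list(x) + diff + mean          # combined carrier
        integrated.append([
            sum(w * c for w, c in zip(row, combined)) + b  # affine calculation
            for row, b in zip(weights, bias)
        ])
    return integrated
```

The same function can be applied to the second range carriers with the second relation diagram, since only the graph and the parameters differ.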

Step S250, obtaining first strengthening factors through the file priority determining neural network, and performing nonlinear transformation, through the first strengthening factors, on the first integration characterization carriers corresponding to the first range characterization carriers in the first range characterization carrier set to obtain the first strengthening characterization carriers corresponding to the first range characterization carriers in the first range characterization carrier set; performing nonlinear transformation, through the first strengthening factors, on the second integration characterization carriers corresponding to the second range characterization carriers in the second range characterization carrier set to obtain the second strengthening characterization carriers corresponding to the second range characterization carriers in the second range characterization carrier set; and combining the first strengthening characterization carrier and the second strengthening characterization carrier that correspond to the same data cluster characterization carrier to obtain the strengthened data cluster characterization carriers corresponding to the respective data cluster characterization carriers.

Step S260, obtaining, through the file priority determining neural network, the sum of the data cluster distribution characterization carrier and the strengthened data cluster characterization carrier corresponding to each data cluster as the to-be-split data cluster characterization carrier corresponding to that data cluster, and splitting the to-be-split data cluster characterization carriers corresponding to the data clusters to obtain a first strengthening range characterization carrier set, a second strengthening range characterization carrier set and a third strengthening range characterization carrier set; constructing a first strengthening carrier relation diagram corresponding to the first strengthening range characterization carrier set from the commonality scores among the first strengthening range characterization carriers in the first strengthening range characterization carrier set, and constructing a second strengthening carrier relation diagram corresponding to the second strengthening range characterization carrier set from the data distribution of the data clusters.

Step S270, determining adjacent strengthening range characterization carriers corresponding to the third strengthening range characterization carriers in the third strengthening range characterization carrier set through data distribution of the data clusters by the file priority determining neural network, and constructing a third strengthening carrier relation diagram corresponding to the third strengthening range characterization carrier set through commonality scores among the adjacent strengthening range characterization carriers corresponding to the third strengthening range characterization carriers in the third strengthening range characterization carrier set.
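Step S270 combines both kinds of information: candidate neighbors come from the data distribution, and edges are then decided by commonality scores. The sketch below is one hedged reading of that step. It assumes that the data distribution is the order of the clusters in the file (candidates lie within a hypothetical `radius`), that the commonality score is cosine similarity against a hypothetical `threshold`, and that the score is computed between each carrier and its candidate neighbors.

```python
import math


def cosine(a, b):
    """Cosine similarity, used here as an assumed commonality score."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def local_similarity_graph(carriers, radius=1, threshold=0.5):
    """Third diagram: keep only nearby carriers that are also similar."""
    n = len(carriers)
    adjacency = {i: [] for i in range(n)}
    for i in range(n):
        # Candidate neighbors are limited to positions close to i in the file.
        for j in range(max(0, i - radius), min(n, i + radius + 1)):
            if j != i and cosine(carriers[i], carriers[j]) >= threshold:
                adjacency[i].append(j)
    return adjacency
```

In this reading, two clusters with identical content are still left unconnected when they are far apart in the file, which distinguishes the third diagram from the purely similarity-based first diagram.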

Step S280, performing, through the file priority determining neural network, carrier integration using each first strengthening range characterization carrier in the first strengthening carrier relation diagram and its corresponding adjacent point characterization carriers to obtain the first integration strengthening characterization carriers corresponding to the first strengthening range characterization carriers in the first strengthening range characterization carrier set; and performing carrier integration using each second strengthening range characterization carrier in the second strengthening carrier relation diagram and its corresponding adjacent point characterization carriers to obtain the second integration strengthening characterization carriers corresponding to the second strengthening range characterization carriers in the second strengthening range characterization carrier set.

Step S290, performing, through the file priority determining neural network, carrier integration using each third strengthening range characterization carrier in the third strengthening carrier relation diagram and its corresponding adjacent point characterization carriers to obtain the third integration strengthening characterization carriers corresponding to the third strengthening range characterization carriers in the third strengthening range characterization carrier set; and combining the first integration strengthening characterization carrier, the second integration strengthening characterization carrier and the third integration strengthening characterization carrier that correspond to the same data cluster characterization carrier to obtain the target strengthened data cluster characterization carriers corresponding to the respective data cluster characterization carriers.

Step S300, through the file priority determining neural network, taking the target strengthened data cluster characterization carrier as the strengthened data cluster characterization carrier, and returning to the step of obtaining the sum of the data cluster distribution characterization carrier and the strengthened data cluster characterization carrier corresponding to each data cluster as the to-be-split data cluster characterization carrier and splitting the to-be-split data cluster characterization carriers to obtain the first strengthening range characterization carrier set, the second strengthening range characterization carrier set and the third strengthening range characterization carrier set, wherein the number of elements of each third strengthening range characterization carrier in the third strengthening range characterization carrier set is reduced by the preset number, and the number of elements of each second strengthening range characterization carrier in the second strengthening range characterization carrier set is increased by the preset number; and, when the predetermined iteration stop requirement is met, obtaining the final data cluster characterization carriers corresponding to the respective data cluster characterization carriers, and determining the file priority through the final data cluster characterization carriers to obtain the final file priority determination result of the file to be stored. Performing priority determination through the file priority determining neural network described above can improve the accuracy of file priority determination while reducing the number of network configuration variables of the file priority determining neural network, thereby improving the execution efficiency of the network.

In one embodiment, a hardware architecture of a data storage system, which may be a server, is provided, and an internal structure diagram thereof may be as shown in fig. 6. The data storage system includes a processor, a memory, an Input/Output interface (I/O) and a communication interface. The processor, the memory and the input/output interface are connected through a system bus, and the communication interface is connected to the system bus through the input/output interface. Wherein the processor of the data storage system is configured to provide computing and control capabilities. The memory of the data storage system includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, computer programs, and a database. The internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage media. The database of the data storage system is for storing data. The input/output interface of the data storage system is used to exchange information between the processor and the external device. The communication interface of the data storage system is used for communicating with an external terminal through a network connection. The computer program, when executed by a processor, implements a method of optimizing data storage in a Filecoin. It will be appreciated by those skilled in the art that the structure shown in FIG. 6 is merely a block diagram of some of the structures associated with the present application and is not limiting of the data storage system to which the present application may be applied, and that a particular data storage system may include more or fewer components than shown, or may combine certain components, or have a different arrangement of components.

In one embodiment, there is also provided a data storage system including a memory and a processor, the memory having stored therein a computer program which when executed by the processor performs the steps of the method embodiments described above.

Those skilled in the art will appreciate that implementing all or part of the above described methods may be accomplished by way of a computer program stored on a non-transitory computer readable storage medium, which when executed, may comprise the steps of the embodiments of the methods described above. Any reference to memory, database, or other medium used in the various embodiments provided herein may include at least one of non-volatile and volatile memory. The nonvolatile Memory may include Read-Only Memory (ROM), magnetic tape, floppy disk, flash Memory, optical Memory, high density embedded nonvolatile Memory, resistive random access Memory (ReRAM), magnetic random access Memory (Magnetoresistive Random Access Memory, MRAM), ferroelectric Memory (Ferroelectric Random Access Memory, FRAM), phase change Memory (Phase Change Memory, PCM), graphene Memory, and the like. Volatile memory can include random access memory (Random Access Memory, RAM) or external cache memory, and the like. By way of illustration, and not limitation, RAM can be in the form of a variety of forms, such as static random access memory (Static Random Access Memory, SRAM) or dynamic random access memory (Dynamic Random Access Memory, DRAM), and the like. The databases referred to in the various embodiments provided herein may include at least one of relational databases and non-relational databases. The non-relational database may include, but is not limited to, a blockchain-based distributed database, and the like. The processors referred to in the embodiments provided herein may be general purpose processors, central processing units, graphics processors, digital signal processors, programmable logic units, quantum computing-based data processing logic units, etc., without being limited thereto. 
The technical features of the above embodiments may be arbitrarily combined, and all possible combinations of the technical features in the above embodiments are not described for brevity of description, however, as long as there is no contradiction between the combinations of the technical features, they should be considered as the scope of the description. The above examples only represent a few embodiments of the present application, which are described in more detail and are not to be construed as limiting the scope of the present application. It should be noted that it would be apparent to those skilled in the art that various modifications and improvements could be made without departing from the spirit of the present application, which would be within the scope of the present application. Accordingly, the scope of protection of the present application shall be subject to the appended claims.

Claims

1. A method of optimizing data storage in Filecoin, the method comprising:

obtaining a to-be-stored file, splitting the to-be-stored file to obtain each data cluster, and mining the characterization carrier of each data cluster to obtain the characterization carrier of each data cluster;

grouping the data cluster characterization carriers respectively to obtain a first range characterization carrier corresponding to each data cluster characterization carrier to form a first range characterization carrier set, and obtaining a second range characterization carrier corresponding to each data cluster characterization carrier to form a second range characterization carrier set;

constructing a first carrier relation diagram corresponding to the first range representation carrier set through a commonality score among all first range representation carriers in the first range representation carrier set, and constructing a second carrier relation diagram corresponding to the second range representation carrier set through data distribution of all data clusters;

carrying out carrier integration through a first range representation carrier and adjacent point representation carriers corresponding to the first range representation carrier in the first carrier relation diagram to obtain first integration representation carriers corresponding to each first range representation carrier in the first range representation carrier set, and carrying out carrier integration through a second range representation carrier and adjacent point representation carriers corresponding to the second range representation carrier in the second carrier relation diagram to obtain second integration representation carriers corresponding to each second range representation carrier in the second range representation carrier set;

combining a first integration characterization carrier and a second integration characterization carrier which are respectively corresponding to the same data cluster characterization carrier to obtain target data cluster characterization carriers which are respectively corresponding to the data cluster characterization carriers, and determining file priority through the target data cluster characterization carriers which are respectively corresponding to the data cluster characterization carriers to obtain a file priority determination result which is corresponding to the file to be stored;

and storing the file to be stored based on the file priority determining result.

2. The method of claim 1, wherein mining the characterization carrier of each data cluster to obtain each data cluster characterization carrier comprises:

mining the data item representation carrier of each data cluster to obtain the data item representation carrier of each data cluster;

acquiring the data distribution of each data cluster, and carrying out quantization expression on the data distribution of each data cluster to obtain a data cluster distribution characterization carrier;

fusing the data item representation carriers of the data clusters with the corresponding data cluster distribution representation carriers to obtain the data cluster representation carriers;

constructing a first carrier relationship diagram corresponding to the first range characterization carrier set through the commonality scores among the first range characterization carriers in the first range characterization carrier set, including:

obtaining the characterization carrier commonality scores among the first range characterization carriers, and determining the commonality connection result among the first range characterization carriers through the characterization carrier commonality scores;

determining each first range characterization carrier as a composition point, and connecting the first range characterization carriers according to the commonality connection result to obtain the first carrier relation diagram;

constructing a second carrier relationship diagram corresponding to the second range representation carrier set through the data distribution of each data cluster, including:

determining carrier distribution of second range representation carriers corresponding to the data cluster representation carriers respectively through the data distribution of the data clusters, and determining distribution relations among the second range representation carriers in the second range representation carrier set through the carrier distribution;

and determining each second range representation carrier as a composition point, and connecting each second range representation carrier according to the distribution relation to obtain the second carrier relation diagram.
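The two graph constructions in claim 2 can be illustrated with a minimal sketch. The claim does not fix how the commonality score or the distribution relation is computed, so the sketch assumes cosine similarity with a hypothetical threshold for the first graph and a hypothetical integer position per data cluster with a maximum gap for the second graph. All function and parameter names are illustrative only.

```python
import math


def cosine(u, v):
    """Cosine similarity, used here as an assumed commonality score."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def build_commonality_graph(carriers, threshold=0.8):
    """Connect two carriers when their commonality score reaches the threshold.

    The threshold value is an assumption, not taken from the claims.
    """
    adj = {i: set() for i in range(len(carriers))}
    for i in range(len(carriers)):
        for j in range(i + 1, len(carriers)):
            if cosine(carriers[i], carriers[j]) >= threshold:
                adj[i].add(j)
                adj[j].add(i)
    return adj


def build_distribution_graph(positions, max_gap=1):
    """Connect two carriers when their data clusters lie close together.

    `positions` is a hypothetical stand-in for the data distribution of the
    clusters (for example, the order of the clusters within the file).
    """
    adj = {i: set() for i in range(len(positions))}
    for i in range(len(positions)):
        for j in range(i + 1, len(positions)):
            if abs(positions[i] - positions[j]) <= max_gap:
                adj[i].add(j)
                adj[j].add(i)
    return adj
```

Each carrier becomes a composition point (a graph node), and the adjacency sets record which points are connected.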

3. The method according to claim 1, wherein the performing carrier integration by the first range characterizing carrier and the neighboring point characterizing carrier corresponding to the first range characterizing carrier in the first carrier relational graph to obtain a first integrated characterizing carrier corresponding to each first range characterizing carrier in the first range characterizing carrier set, includes:

acquiring a carrier mean value of the adjacent point characterization carriers corresponding to the first range characterization carrier to obtain a first carrier mean value, and acquiring a difference carrier between the first range characterization carrier and the adjacent point characterization carriers corresponding to the first range characterization carrier to obtain a first difference carrier;

combining the first range characterization carrier, the first difference carrier and the first carrier mean value to obtain a first combined characterization carrier, and carrying out affine calculation through the first combined characterization carrier to obtain a first integrated characterization carrier corresponding to the first range characterization carrier;

traversing each first range characterization carrier in the first carrier relation diagram to obtain the first integrated characterization carriers corresponding to each first range characterization carrier in the first range characterization carrier set;

the carrier integration is performed through a second range representation carrier and adjacent point representation carriers corresponding to the second range representation carrier in the second carrier relation diagram, so as to obtain second integration representation carriers corresponding to each second range representation carrier in the second range representation carrier set, including:

acquiring a carrier mean value of the adjacent point representation carriers corresponding to the second range representation carrier to obtain a second carrier mean value, and acquiring a difference carrier between the second range representation carrier and the adjacent point representation carriers corresponding to the second range representation carrier to obtain a second difference carrier;

combining the second range representation carrier, the second difference carrier and the second carrier mean value to obtain a second combined representation carrier, and carrying out affine calculation through the second combined representation carrier to obtain a second integrated representation carrier corresponding to the second range representation carrier;

and traversing each second range representation carrier in the second carrier relation diagram to obtain the second integrated representation carriers corresponding to each second range representation carrier in the second range representation carrier set.
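The carrier integration of claim 3 can be sketched as follows. The claim leaves several details open, so the sketch makes assumptions: the difference carrier is taken as the carrier minus the mean of its adjacent points, "combining" is taken as concatenation, an isolated point uses itself as its neighbor mean, and the affine calculation uses a hypothetical weight matrix and bias supplied by the caller.

```python
def integrate_carriers(carriers, adj, weight, bias):
    """Return one integrated carrier per input carrier.

    carriers: list of equal-length lists of floats
    adj:      dict mapping a point index to the set of adjacent point indices
    weight:   affine matrix with rows of length 3 * dim (assumed shape)
    bias:     one bias value per weight row
    """
    integrated = []
    for i, x in enumerate(carriers):
        dim = len(x)
        neighbors = [carriers[j] for j in sorted(adj[i])]
        if neighbors:
            mean = [sum(n[k] for n in neighbors) / len(neighbors) for k in range(dim)]
        else:
            mean = list(x)  # assumption: an isolated point uses its own carrier
        # Assumed reading of the "difference carrier": carrier minus neighbor mean.
        diff = [x[k] - mean[k] for k in range(dim)]
        # "Combining" is assumed to be concatenation: [carrier, difference, mean].
        combined = list(x) + diff + mean
        # Affine calculation: weight @ combined + bias.
        integrated.append(
            [sum(w * c for w, c in zip(row, combined)) + b for row, b in zip(weight, bias)]
        )
    return integrated
```

The same function would apply to both relation diagrams; only the adjacency structure passed as `adj` differs.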

4. The method according to claim 1, wherein the combining of the first integrated characterization carrier and the second integrated characterization carrier corresponding to the same data cluster characterization carrier to obtain the target data cluster characterization carrier corresponding to each data cluster characterization carrier, and the determining of the file priority through the target data cluster characterization carrier corresponding to each data cluster characterization carrier to obtain the file priority determination result corresponding to the file to be stored, comprise:

acquiring a first strengthening factor, and performing nonlinear transformation on first integration characterization carriers corresponding to each first range characterization carrier in the first range characterization carrier set through the first strengthening factor to obtain first strengthening characterization carriers corresponding to each first range characterization carrier in the first range characterization carrier set;

performing nonlinear transformation on the second integration characterization carriers corresponding to each second range characterization carrier in the second range characterization carrier set through the first strengthening factor to obtain second strengthening characterization carriers corresponding to each second range characterization carrier in the second range characterization carrier set;

combining the first strengthening characterization carrier and the second strengthening characterization carrier that correspond to the same data cluster characterization carrier to obtain enhanced data cluster characterization carriers respectively corresponding to the data cluster characterization carriers, and determining file priority through the enhanced data cluster characterization carriers respectively corresponding to the data cluster characterization carriers to obtain a target file priority determination result corresponding to the file to be stored.

5. The method of claim 4, wherein the performing of nonlinear transformation, through the first strengthening factor, on the first integration characterization carriers corresponding to each first range characterization carrier in the first range characterization carrier set to obtain the first strengthening characterization carriers corresponding to each first range characterization carrier in the first range characterization carrier set comprises:

performing fitting optimization transformation on the first integration characterization carriers corresponding to each first range characterization carrier in the first range characterization carrier set through the first strengthening factor to obtain first optimization characterization carriers corresponding to each first range characterization carrier in the first range characterization carrier set, and obtaining normal distribution difference values corresponding to the first optimization characterization carriers to obtain first normal distribution difference values;

weighting the first integration characterization carriers corresponding to the first range characterization carriers in the first range characterization carrier set to obtain first weighted characterization carriers corresponding to the first range characterization carriers in the first range characterization carrier set;

obtaining a multiplication result of the first weighted representation carrier and the first normal distribution difference value to obtain first enhanced representation carriers corresponding to each first range representation carrier in the first range representation carrier set;

and wherein the performing of nonlinear transformation on the second integration characterization carriers corresponding to the second range characterization carriers in the second range characterization carrier set through the first strengthening factor to obtain the second strengthening characterization carriers corresponding to the second range characterization carriers in the second range characterization carrier set comprises:

performing fitting optimization transformation on the second integration characterization carriers corresponding to each second range characterization carrier in the second range characterization carrier set through the first strengthening factor to obtain second optimization characterization carriers corresponding to each second range characterization carrier in the second range characterization carrier set, and obtaining normal distribution difference values corresponding to the second optimization characterization carriers to obtain second normal distribution difference values;

weighting the second integration characterization carriers corresponding to the second range characterization carriers in the second range characterization carrier set to obtain second weighted characterization carriers corresponding to the second range characterization carriers in the second range characterization carrier set;

and obtaining a multiplication result of the second weighted representation carrier and the second normal distribution difference value to obtain second enhanced representation carriers corresponding to each second range representation carrier in the second range representation carrier set.
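The three steps of claim 5 (a scaling by the strengthening factor, a normal distribution term, and a weighted copy of the input multiplied by that term) resemble the GELU activation. The sketch below follows that reading, which is an interpretation rather than something the claims state: the "fitting optimization transformation" is assumed to be division by the factor, the "normal distribution difference value" is assumed to be `1 + erf(...)`, and the weighting is assumed to be multiplication by 0.5.

```python
import math


def strengthen(carrier, factor=math.sqrt(2.0)):
    """Apply an element-wise GELU-style transformation to one carrier.

    `factor` plays the role of the strengthening factor; its default of
    sqrt(2) is an assumption that reproduces the standard GELU.
    """
    result = []
    for x in carrier:
        optimized = x / factor                 # assumed fitting optimization transformation
        normal_term = 1.0 + math.erf(optimized)  # assumed normal distribution difference value
        weighted = 0.5 * x                     # assumed weighting of the integrated carrier
        result.append(weighted * normal_term)  # multiplication result
    return result
```

Under these assumptions each output element equals the input multiplied by the standard normal cumulative probability at that input, so large positive values pass through nearly unchanged and large negative values are suppressed toward zero.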

6. The method of claim 4, wherein the determining the file priority by the enhanced data cluster characterization carrier corresponding to each data cluster characterization carrier to obtain the target file priority determination result corresponding to the file to be stored includes:

splitting the reinforced data cluster representation carriers corresponding to the data cluster representation carriers respectively to obtain a first reinforced range representation carrier set, a second reinforced range representation carrier set and a third reinforced range representation carrier set, wherein the sum of the number of elements of the second reinforced range representation carrier in the second reinforced range representation carrier set and the number of elements of the third reinforced range representation carrier in the third reinforced range representation carrier set is the same as the number of elements of the second range representation carrier;

constructing a first reinforced carrier relation diagram corresponding to the first reinforced range representation carrier set through a commonality score among all the first reinforced range representation carriers in the first reinforced range representation carrier set, and constructing a second reinforced carrier relation diagram corresponding to the second reinforced range representation carrier set through data distribution of all the data clusters;

determining adjacent strengthening range characterization carriers corresponding to the third strengthening range characterization carriers in the third strengthening range characterization carrier set through data distribution of the data clusters, and constructing a third strengthening carrier relation diagram corresponding to the third strengthening range characterization carrier set through commonness scores among the adjacent strengthening range characterization carriers corresponding to the third strengthening range characterization carriers in the third strengthening range characterization carrier set;

carrying out carrier integration through a first reinforcement range representation carrier and adjacent point representation carriers corresponding to the first reinforcement range representation carrier in the first reinforcement carrier relation diagram to obtain first integration reinforcement representation carriers corresponding to each first reinforcement range representation carrier in the first reinforcement range representation carrier set;

carrying out carrier integration through a second reinforcement range representation carrier and adjacent point representation carriers corresponding to the second reinforcement range representation carrier in the second reinforced carrier relation diagram to obtain second integration reinforcement representation carriers corresponding to each second reinforcement range representation carrier in the second reinforcement range representation carrier set;

carrying out carrier integration through a third enhancement range characterization carrier and adjacent point characterization carriers corresponding to the third enhancement range characterization carrier in the third enhancement carrier relation diagram to obtain third integration enhancement characterization carriers corresponding to each third enhancement range characterization carrier in the third enhancement range characterization carrier set;

combining a first integrated enhanced characterization carrier, a second integrated enhanced characterization carrier and a third integrated enhanced characterization carrier which are respectively corresponding to the same data cluster characterization carrier to obtain target enhanced data cluster characterization carriers respectively corresponding to the data cluster characterization carriers;

and determining file priority through the target enhanced data cluster characterization carriers corresponding to the data cluster characterization carriers respectively to obtain an enhanced file priority determination result corresponding to the file to be stored.

7. The method of claim 6, wherein constructing a third enhanced carrier relationship graph corresponding to the third enhanced range representation carrier set from the commonality scores between adjacent enhanced range representation carriers corresponding to each third enhanced range representation carrier in the third enhanced range representation carrier set comprises:

determining a current characterization carrier and a target characterization carrier from the third enhancement range characterization carriers;

determining each current adjacent characterization carrier corresponding to the current characterization carrier from the third enhancement range characterization carriers through the data distribution of each data cluster, and fusing each current adjacent characterization carrier to obtain a current fusion adjacent characterization carrier;

determining each target adjacent representation carrier corresponding to the target representation carrier from the third enhancement range representation carriers through the data distribution of each data cluster, and fusing each target adjacent representation carrier to obtain a target fusion adjacent representation carrier;

obtaining a commonality score between the current fusion adjacent characterization carrier and the target fusion adjacent characterization carrier to obtain a commonality score between the current characterization carrier and the target characterization carrier;

traversing the third strengthening range representation carriers to obtain the commonality scores between the adjacent strengthening range representation carriers corresponding to the third strengthening range representation carriers, and determining these commonality scores as target commonality scores between the third strengthening range representation carriers;

and determining target connection results among the third strengthening range representation carriers through the target commonality scores, determining the third strengthening range representation carriers as composition points, and connecting the third strengthening range representation carriers according to the target connection results to obtain the third strengthening carrier relationship diagram.
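The construction in claim 7 scores two carriers by comparing their fused neighborhoods rather than the carriers themselves. A minimal sketch follows, assuming that adjacency comes from a hypothetical integer position per data cluster, that fusion is an element-wise mean, that the commonality score is cosine similarity, and that an edge is created at a hypothetical threshold. None of these choices is fixed by the claims.

```python
import math


def cosine(u, v):
    """Cosine similarity, used here as an assumed commonality score."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def fused_neighbor(carriers, positions, i, max_gap=1):
    """Fuse (average) the carriers whose clusters lie within max_gap of cluster i."""
    neighbors = [
        carriers[j]
        for j in range(len(carriers))
        if j != i and abs(positions[j] - positions[i]) <= max_gap
    ]
    if not neighbors:
        return list(carriers[i])  # assumption: fall back to the carrier itself
    dim = len(carriers[i])
    return [sum(n[k] for n in neighbors) / len(neighbors) for k in range(dim)]


def build_third_graph(carriers, positions, threshold=0.8, max_gap=1):
    """Connect carriers whose fused neighborhoods have a high commonality score."""
    fused = [fused_neighbor(carriers, positions, i, max_gap) for i in range(len(carriers))]
    adj = {i: set() for i in range(len(carriers))}
    for i in range(len(carriers)):
        for j in range(i + 1, len(carriers)):
            if cosine(fused[i], fused[j]) >= threshold:
                adj[i].add(j)
                adj[j].add(i)
    return adj
```

In this reading, two points are linked when they sit in similar surroundings, even if their own carriers differ.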

8. The method of claim 6, wherein the merging the first, second, and third integrated enhancement characterizing carriers, respectively, corresponding to the same data cluster characterizing carrier to obtain the target enhancement data cluster characterizing carrier, respectively, corresponding to the data cluster characterizing carriers, comprises:

obtaining a second strengthening factor, and performing nonlinear transformation on first integration strengthening characterization carriers corresponding to each first strengthening range characterization carrier in the first strengthening range characterization carrier set through the second strengthening factor to obtain first nonlinear characterization carriers corresponding to each first strengthening range characterization carrier in the first strengthening range characterization carrier set;

nonlinear transformation is carried out on second integration strengthening characterization carriers corresponding to each second strengthening range characterization carrier in the second strengthening range characterization carrier set through the second strengthening factors, so that second nonlinear characterization carriers corresponding to each second strengthening range characterization carrier in the second strengthening range characterization carrier set are obtained;

performing nonlinear transformation on the third integration strengthening characterization carriers corresponding to the third strengthening range characterization carriers in the third strengthening range characterization carrier set through the second strengthening factor to obtain third nonlinear characterization carriers corresponding to the third strengthening range characterization carriers in the third strengthening range characterization carrier set;

combining a first nonlinear characterization carrier, a second nonlinear characterization carrier and a third nonlinear characterization carrier which are respectively corresponding to the same data cluster characterization carrier to obtain target enhanced data cluster characterization carriers respectively corresponding to the data cluster characterization carriers;

wherein the performing of nonlinear transformation on the third integration strengthening characterization carriers corresponding to the third strengthening range characterization carriers in the third strengthening range characterization carrier set through the second strengthening factor to obtain the third nonlinear characterization carriers corresponding to the third strengthening range characterization carriers in the third strengthening range characterization carrier set comprises:

fitting optimization transformation is carried out on the third integrated enhancement representation carriers corresponding to the third enhancement range representation carriers in the third enhancement range representation carrier set through the second enhancement factors to obtain third optimization representation carriers corresponding to the third enhancement range representation carriers in the third enhancement range representation carrier set, and normal distribution difference values corresponding to the third optimization representation carriers are obtained to obtain third normal distribution difference values;

weighting the third integration strengthening characterization carriers corresponding to the third strengthening range characterization carriers in the third strengthening range characterization carrier set to obtain third weighted characterization carriers corresponding to the third strengthening range characterization carriers in the third strengthening range characterization carrier set;

and obtaining a multiplication result of the third weighted representation carrier and the third normal distribution difference value to obtain third nonlinear representation carriers corresponding to the third enhancement range representation carriers in the third enhancement range representation carrier set.

9. The method according to any one of claims 1 to 8, wherein the data storage system includes a first storage unit, a second storage unit, an SSD cache component, and a data read-write allocation unit, wherein the first storage unit is configured to store sector files that have completed sealing, and the second storage unit is configured to store sector files that have completed sealing and sector files that are being sealed; the SSD cache component comprises a cache pool, wherein the cache pool comprises a cache master control node, a cache service node and a flow limiting module, the cache master control node is used for managing the cache, the cache service node is used for providing the SSD cache of a cluster, and the flow limiting module is used for limiting the data flushed back from the SSD layer; the data read-write allocation unit comprises read-write threads, read-write connections and read-write queues, all of which implement read-write splitting;

when a file to be stored is acquired, the data read-write allocation unit is used for splitting and allocating the file to be stored to the corresponding read-write thread, read-write queue and read-write connection, and storing the file to be stored in the first storage unit and/or the second storage unit.
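The read-write splitting of claim 9 can be illustrated with a deliberately simplified sketch. It models only separate read and write queues and two in-memory stores. The class name, method names and routing rule (sealed files go to the first store, files still being sealed go to the second) are assumptions made for illustration, and threads, connections, the SSD cache pool and the flow limiting module are omitted.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class ReadWriteAllocator:
    """Hypothetical allocator that keeps read and write requests in separate queues."""

    read_queue: deque = field(default_factory=deque)
    write_queue: deque = field(default_factory=deque)
    sealed_store: dict = field(default_factory=dict)   # stands in for the first storage unit
    sealing_store: dict = field(default_factory=dict)  # stands in for the second storage unit

    def submit_write(self, name, data, sealed):
        # Writes never share a queue with reads (read-write splitting).
        self.write_queue.append((name, data, sealed))

    def submit_read(self, name):
        self.read_queue.append(name)

    def drain_writes(self):
        # Simplified routing: sealed files to the first store, others to the second.
        while self.write_queue:
            name, data, sealed = self.write_queue.popleft()
            target = self.sealed_store if sealed else self.sealing_store
            target[name] = data

    def drain_reads(self):
        # Look in the first store, then the second; None when the file is absent.
        results = []
        while self.read_queue:
            name = self.read_queue.popleft()
            results.append(self.sealed_store.get(name, self.sealing_store.get(name)))
        return results
```

Keeping the two queues separate means a burst of writes during sealing does not delay pending reads in this model, which is the apparent motivation for the splitting described in the claim.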

10. A data storage system, comprising:

one or more processors;

and one or more memories, wherein the memories have stored therein computer readable code, which, when executed by the one or more processors, causes the one or more processors to perform the method of any of claims 1-9.