CN110750497B - Data scheduling system - Google Patents

Info

Publication number
CN110750497B
Authority
CN
China
Prior art keywords
name
user
container
data
storage
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911037777.4A
Other languages
Chinese (zh)
Other versions
CN110750497A (en
Inventor
卓维晨
杨伟龙
刘梦莹
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Yihai Luyuan Shandong Digital Technology Co ltd
Original Assignee
Yihai Luyuan Shandong Digital Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Yihai Luyuan Shandong Digital Technology Co ltd
Priority to CN201911037777.4A
Publication of CN110750497A
Application granted
Publication of CN110750497B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/11 File system administration, e.g. details of archiving or snapshots
    • G06F16/113 Details of archiving
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/11 File system administration, e.g. details of archiving or snapshots
    • G06F16/119 Details of migration of file systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/11 File system administration, e.g. details of archiving or snapshots
    • G06F16/122 File system administration, e.g. details of archiving or snapshots using management policies
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/54 Interprogram communication
    • G06F9/546 Message passing systems or structures, e.g. queues
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/14 Session management
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/14 Session management
    • H04L67/143 Termination or inactivation of sessions, e.g. event-controlled end of session
    • H04L67/145 Termination or inactivation of sessions, e.g. event-controlled end of session avoiding end of session, e.g. keep-alive, heartbeats, resumption message or wake-up for inactive or interrupted session

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Cardiology (AREA)
  • General Health & Medical Sciences (AREA)
  • Information Transfer Between Computers (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a data scheduling system that monitors network traffic and parses the captured network messages to discover the operations users perform on objects in a storage server and the objects affected. Operations that change an object are recorded in a database, together with the state of the affected object. Users can configure management policies for their data, and the task scheduling module schedules tasks according to those policies. For a task whose target address is Blu-ray storage, the task scheduling module scans the database for objects in the storage server that match the configured policy, and uses an available Blu-ray storage location selected by the Blu-ray storage node management module as the target storage location. The invention is broadly applicable to storage clusters that support a standard object storage protocol: it discovers object changes in real time, schedules data automatically, puts no extra load on the storage, and does not affect the normal operation or bandwidth of the service.

Description

Data scheduling system
Technical Field
The invention relates to systems for automatically archiving and retrieving object data, and in particular to a data scheduling system.
Background
Society generates a large amount of data every day, and much of it must be retained for the long term or permanently. Storing this data on magnetic storage clusters, as is done today, is costly and consumes a great deal of power. To solve this problem, infrequently used data needs to be stored on low-cost, low-power devices.
With current technology, Blu-ray disc storage is one option for low-cost long-term storage. By industry convention, data is initially generated and stored on magnetic storage. The industry already has solutions for moving data from magnetic storage to optical discs, but they have shortcomings: some lack functionality (supporting only manual writing of files to, or retrieval of files from, optical discs), some are too tightly coupled (working only with the manufacturer's own storage software or depending on a specific manufacturer's storage), and some place too much load on the storage.
Disclosure of Invention
In order to solve the above problems, the present invention proposes a data scheduling system.
In order to achieve the above purpose, the invention adopts the following technical solution:
the data scheduling system comprises a control center, an Agent module, a storage server and Blu-ray storage, wherein the control center comprises a data management policy module, a Blu-ray storage node management module, a task scheduling module, a monitoring and parsing module and a database;
the data management policy module is used for the user's policy configuration on containers, including enabling, configuring, deleting, modifying and disabling policies;
the Blu-ray storage node management module is used for providing the task scheduling module with the target storage location information of the Blu-ray storage, which comprises a URL and authentication information;
the monitoring and parsing module is used for monitoring the network traffic of the storage server, parsing the operations that users perform on data in the storage server, extracting data change information from those operations and recording it in the database;
the task scheduling module is used for periodically scanning the database and determining, from the data change information obtained by the monitoring and parsing module, whether a container in the storage server satisfies a data management policy configured by the user, and if so, generating a data scheduling task and sending it to the message queue;
the Agent module is used for executing data scheduling, that is, according to the tasks in the message queue, backing up data from the storage server to the Blu-ray storage or retrieving data from the Blu-ray storage to the storage server;
the database is used for storing the monitored operation information on data in the storage server and the data management policies configured by the user.
Preferably, the monitoring and parsing module monitors the IP address and network port of the storage server in bypass mode. When a user operates on data in the storage server or writes data to it through the client, the module captures and filters the packets with a configured filter to obtain the http/https messages for those operations, and then parses them. Both the request and the response of each http/https message must be parsed to obtain the request method, the corresponding user and container information, and the processing result; the information obtained from parsing is then saved to or updated in the database.
Preferably, the http/https message includes a request line and request headers. The request line contains the request method field, which is one of PUT/GET/HEAD/POST/DELETE/COPY, and the request headers include a Content-Length field and a Destination field.
When the PUT method is parsed, the user is uploading an object or a container to the storage server from the client, and the URL field that follows must be parsed. The URL field contains, in order, a user name, a container name and an object name, and is parsed in that order. If only the user name is found, with no container name after it, the operation is an account creation and parsing is abandoned. If the container name and object name are found, or only the container name, the Content-Length field in the request headers is parsed to obtain the object size. The response message for the request is then captured and the status code is parsed from the status line. If the status code is 200-299, the upload is judged successful, the response headers are parsed, and the time in the Last-Modified field is taken as the object's last modification time. Finally, the parsed user name, container name, object size and last modification time are saved to or updated in the database.
When the GET method is parsed, the user is reading data from the storage server from the client. The GET method makes no modification to any container in the storage server, so parsing of the http/https message ends.
When the HEAD method is parsed, the user is viewing, from the client, the metadata of a user, container or object in the storage server. The HEAD method makes no modification to any container in the storage server, so parsing of the http/https message ends.
When the POST method is parsed, the user is creating, updating or modifying the metadata of a user, container or object, and the URL field that follows must be parsed. The URL field contains, in order, a user name, a container name and an object name, and is parsed in that order. Whether only the user name is found, the user name and container name are found, or the user name, container name and object name are all found, the response message for the request is captured and the status code is parsed from the status line. If the status code is 200-299, the creation, update or modification is judged successful. The response headers are then parsed, the current modification time is taken from the Date field, and finally the parsed metadata of the user, container or object is saved to or updated in the database together with the current modification time.
When the DELETE method is parsed, the user is deleting an object or a container from the storage server, and the URL field that follows must be parsed. The URL field contains, in order, a user name, a container name and an object name, and is parsed in that order. If only the container name or the object name is found, or both the container name and the object name are found, the response message for the request is captured and the status code is parsed from the status line. If the status code is 200-299, the deletion is judged successful, and the records for that user's container name and object name are deleted from the database.
When the COPY method is parsed, the user is copying an object, and the URL field that follows must be parsed. The URL field contains, in order, a user name, a container name and an object name, and is parsed in that order through to the object name. The Destination field in the request headers is then parsed to record the target location and name. The response message for the request is captured and the status code is parsed from the status line. If the status code is 200-299, the copy is judged successful, the response headers are parsed, and the current modification time is taken from the Date field. Finally, the parsed target location information, including the URL, the source object name and the current copy time, is saved to or updated in the database.
Further, configuring a policy in the data management policy module means that the user sets a time threshold, in days, for archiving a container; a configured policy can then be enabled. Once the policy takes effect, the task scheduling module obtains each object's modification time from the database and the current system time, computes the difference between the two, and compares it with the configured time threshold. Objects whose time difference exceeds the threshold are to be archived to the Blu-ray storage. Modifying a policy means changing the time threshold of a configured policy; deleting a policy means removing a configured policy; disabling a policy means ceasing to use a configured policy so that it no longer takes effect.
Further, the data scheduling task generated by the task scheduling module contains the URL, project name, user name and password of the container to be archived in the storage server, as well as the URL, project name, user name and password of the target storage location in the Blu-ray storage. The former are obtained by scanning the database. The latter are set by the user when creating a new container in the Blu-ray storage through the client, or obtained when associating an existing container, and are stored in the Blu-ray storage node management module.
Further, the Agent module is deployed on the storage server and sends heartbeat messages to the task scheduling module of the control center at an interval measured in seconds. On receiving a heartbeat message, the task scheduling module considers the Agent module healthy; if heartbeat messages are missed more than 3 times, the Agent module is considered faulty and the task scheduling module stops sending tasks to the message queue.
Further, the Agent module retrieves data from the Blu-ray storage to the storage server as follows: the user initiates, from the client, a request to read from the Blu-ray storage, which contains the URL of the object to be read and the project name, user name and password for accessing the Blu-ray storage; the task scheduling module sends the read request message to the message queue; after obtaining the read task from the message queue, the Agent module issues the read request to the Blu-ray storage and writes the object data it reads to the storage server.
Further, the database is a MongoDB database.
The invention is convenient to use and loosely coupled. It is broadly applicable to storage clusters that support a standard object storage protocol, discovers object changes in real time, schedules data automatically, puts no extra load on the storage, and affects neither the normal operation of the service nor its bandwidth.
Drawings
The accompanying drawings are included to provide a further understanding of the invention.
In the drawings:
Fig. 1 is a system diagram of the data scheduling system according to the present invention.
Fig. 2 is a workflow block diagram of the monitoring and parsing module when it parses a PUT method.
Fig. 3 is a workflow block diagram of the monitoring and parsing module when it parses a GET or HEAD method.
Fig. 4 is a workflow block diagram of the monitoring and parsing module when it parses a POST method.
Fig. 5 is a workflow block diagram of the monitoring and parsing module when it parses a DELETE method.
Fig. 6 is a workflow block diagram of the monitoring and parsing module when it parses a COPY method.
Detailed Description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by those skilled in the art on the basis of these embodiments without inventive effort fall within the scope of the invention.
As shown in Fig. 1, the data scheduling system comprises a control center, an Agent module, a storage server and Blu-ray storage, wherein the control center comprises a data management policy module, a Blu-ray storage node management module, a task scheduling module, a monitoring and parsing module and a database;
the data management policy module is used for the user's policy configuration on containers, including enabling, configuring, deleting, modifying and disabling policies;
the Blu-ray storage node management module is used for providing the task scheduling module with the target storage location information of the Blu-ray storage, which comprises a URL and authentication information;
the monitoring and parsing module is used for monitoring the network traffic of the storage server, parsing the operations that users perform on data in the storage server, extracting data change information from those operations and recording it in the database;
the task scheduling module is used for periodically scanning the database and determining, from the data change information obtained by the monitoring and parsing module, whether a container in the storage server satisfies a data management policy configured by the user, and if so, generating a data scheduling task and sending it to the message queue;
the Agent module is used for executing data scheduling, that is, according to the tasks in the message queue, backing up data from the storage server to the Blu-ray storage or retrieving data from the Blu-ray storage to the storage server;
the database is a MongoDB database and is used for storing the monitored operation information on data in the storage server and the data management policies configured by the user.
The monitoring and parsing module monitors the IP address and network port of the storage server in bypass mode. When a user operates on data in the storage server or writes data to it through a client or other software, the module captures and filters the packets with a configured filter to obtain the http/https messages for those operations, and then parses them. Both the request and the response of each http/https message must be parsed to obtain the request method, the corresponding user and container information, and the processing result; the information obtained from parsing is then saved to or updated in the database. Message parsing is performed using a swift template.
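The configured filter can be pictured as a packet-capture filter expression scoped to the storage server's address and port. The following minimal sketch builds such an expression in pcap filter syntax. The helper name build_capture_filter and the example address and port are hypothetical; the text does not name a capture tool or a filter format.

```python
import ipaddress


def build_capture_filter(server_ip: str, port: int) -> str:
    """Return a pcap-style filter for traffic to and from one server endpoint.

    Hypothetical helper: the text only says that a configured filter is used.
    """
    # Validate the inputs so that a typo cannot silently widen the capture.
    ipaddress.ip_address(server_ip)  # raises ValueError on a malformed address
    if not 0 < port < 65536:
        raise ValueError(f"port out of range: {port}")
    # "host" and "port" match either direction, so both the request and the
    # response of each exchange are captured.
    return f"tcp and host {server_ip} and port {port}"


# Example values (documentation address range), not taken from the patent.
capture_filter = build_capture_filter("192.0.2.10", 8080)
```

Because the expression matches both directions, the same capture yields the request and the response that the parsing step needs.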
The http/https message comprises a request line and request headers. The request line contains the request method field, which is one of PUT/GET/HEAD/POST/DELETE/COPY, and the request headers include a Content-Length field and a Destination field.
When the PUT method is parsed, the user is uploading an object or a container to the storage server from the client, and the URL field that follows must be parsed. The URL field contains, in order, a user name (account), a container name (container) and an object name (object), and is parsed in that order. If only the user name is found, with no container name after it, the operation is an account creation and parsing is abandoned. If the container name and object name are found, or only the container name, the Content-Length field in the request headers is parsed to obtain the object size. The response message for the request is then captured and the status code is parsed from the status line. If the status code is 200-299, the upload is judged successful, the response headers are parsed, and the time in the Last-Modified field is taken as the object's last modification time. Finally, the parsed user name, container name, object size and last modification time are saved to or updated in the database.
When the GET method is parsed, the user is reading data from the storage server from the client. The GET method makes no modification to any container in the storage server, so parsing of the http/https message ends.
When the HEAD method is parsed, the user is viewing, from the client, the metadata of a user, container or object in the storage server. The HEAD method makes no modification to any container in the storage server, so parsing of the http/https message ends.
When the POST method is parsed, the user is creating, updating or modifying the metadata of a user, container or object, and the URL field that follows must be parsed. The URL field contains, in order, a user name, a container name and an object name, and is parsed in that order. Whether only the user name is found, the user name and container name are found, or the user name, container name and object name are all found, the response message for the request is captured and the status code is parsed from the status line. If the status code is 200-299, the creation, update or modification is judged successful. The response headers are then parsed, the current modification time is taken from the Date field, and finally the parsed metadata of the user, container or object is saved to or updated in the database together with the current modification time.
When the DELETE method is parsed, the user is deleting an object or a container from the storage server, and the URL field that follows must be parsed. The URL field contains, in order, a user name, a container name and an object name, and is parsed in that order. If only the container name or the object name is found, or both the container name and the object name are found, the response message for the request is captured and the status code is parsed from the status line. If the status code is 200-299, the deletion is judged successful, and the records for that user's container name and object name are deleted from the database.
When the COPY method is parsed, the user is copying an object, and the URL field that follows must be parsed. The URL field contains, in order, a user name, a container name and an object name, and is parsed in that order through to the object name. The Destination field in the request headers is then parsed to record the target location and name. The response message for the request is captured and the status code is parsed from the status line. If the status code is 200-299, the copy is judged successful, the response headers are parsed, and the current modification time is taken from the Date field. Finally, the parsed target location information, including the URL, the source object name and the current copy time, is saved to or updated in the database.
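The per-method rules above can be sketched as a small classifier. This is a minimal illustration, not the patented implementation: it assumes the Swift-style path layout /v1/account/container/object (the /v1/ prefix is an assumption based on the swift template the text mentions), and the names ParsedRequest, parse_request_line and should_record are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import unquote, urlsplit


@dataclass
class ParsedRequest:
    method: str
    account: Optional[str]    # the "user name" in the text
    container: Optional[str]
    obj: Optional[str]


def parse_request_line(line: str, prefix: str = "/v1/") -> ParsedRequest:
    """Split 'METHOD /v1/account/container/object HTTP/1.1' into its parts."""
    method, target, _version = line.split(" ", 2)
    path = urlsplit(target).path
    if path.startswith(prefix):        # assumed API version prefix
        path = path[len(prefix):]
    # maxsplit=2 keeps any further slashes inside the object name.
    parts = [unquote(p) for p in path.lstrip("/").split("/", 2) if p]
    parts += [None] * (3 - len(parts))
    return ParsedRequest(method.upper(), *parts)


def should_record(req: ParsedRequest, status: int) -> bool:
    """Decide whether a request/response pair changes the database."""
    if req.method in ("GET", "HEAD"):
        return False                      # read-only: parsing ends here
    if not 200 <= status <= 299:
        return False                      # the operation did not succeed
    if req.method == "PUT":
        return req.container is not None  # account-only PUT is skipped
    if req.method == "POST":
        return req.account is not None    # account, container or object metadata
    if req.method == "DELETE":
        return req.container is not None
    if req.method == "COPY":
        return req.obj is not None        # COPY is parsed through to the object
    return False
```

A caller would pass the request line from the captured request and the status code from the matching response, then read Content-Length, Last-Modified, Date or Destination as the method requires before writing to the database.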
Configuring a policy in the data management policy module means that the user sets a time threshold for archiving a container. The threshold is specified in days, and multiple policies can be configured at the same time; a configured policy can then be enabled. Once a policy takes effect, the task scheduling module obtains each object's modification time from the database and the current system time, computes the difference between the two, and compares it with the configured time threshold. Objects whose time difference exceeds the threshold are to be archived to the Blu-ray storage. Modifying a policy means changing the time threshold of a configured policy; deleting a policy means removing a configured policy; disabling a policy means ceasing to use a configured policy so that it no longer takes effect. The archiving policy is configured according to the user's usage scenario. In the medical industry, for example, electronic medical records, test reports, examination reports and the like need to be archived three months after the patient's visit. In public security departments, the collected fingerprint information, image identification information and video information are placed in different containers; fingerprint information needs to be archived one week after collection, image identification information one month after collection, and video information three months after collection.
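The threshold comparison described above can be sketched as follows. This is a minimal illustration under stated assumptions: the container names, the function names, and the conversion of one month and three months to 30 and 90 days are all hypothetical and chosen only to mirror the public security example.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterable, List, Optional, Tuple

# Hypothetical per-container thresholds in days (one week, roughly one month,
# roughly three months); the day counts are assumptions for illustration.
ARCHIVE_THRESHOLD_DAYS = {"fingerprints": 7, "images": 30, "videos": 90}


def is_due_for_archive(last_modified: datetime, threshold_days: int,
                       now: Optional[datetime] = None) -> bool:
    """True when the object's age strictly exceeds the configured threshold."""
    now = now or datetime.now(timezone.utc)
    return now - last_modified > timedelta(days=threshold_days)


def select_for_archive(records: Iterable[Tuple[str, str, datetime]],
                       now: datetime) -> List[Tuple[str, str]]:
    """Scan (container, object name, last modified) rows for archive candidates."""
    return [
        (container, name)
        for container, name, modified in records
        # Containers without an enabled policy are never selected.
        if container in ARCHIVE_THRESHOLD_DAYS
        and is_due_for_archive(modified, ARCHIVE_THRESHOLD_DAYS[container], now)
    ]
```

Passing the current time in explicitly keeps one scan consistent across all rows and makes the rule easy to check against fixed dates.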
The data scheduling task generated by the task scheduling module contains the URL, project name, user name and password of the container to be archived in the storage server, as well as the URL, project name, user name and password of the target storage location in the Blu-ray storage. The former are obtained by scanning the database. The latter are set by the user when creating a new container in the Blu-ray storage through the client, or obtained when associating an existing container, and are stored in the Blu-ray storage node management module.
The Agent module is deployed on the storage server or the control center and sends heartbeat messages to the task scheduling module of the control center at an interval measured in seconds. On receiving a heartbeat message, the task scheduling module considers the Agent module healthy; if heartbeat messages are missed more than 3 times, the Agent module is considered faulty and the task scheduling module stops sending tasks to the message queue. The Agent module does not actively query the running state of the task scheduling module in the control center. After issuing a task, the task scheduling module stores it in the message queue, and the Agent module takes the task from the message queue and executes it. Even if the control center fails at this point, execution of the task is unaffected, because the Agent module does not depend on the control center to execute a task and the data transfer does not pass through the control center. This avoids a single point of failure at the control center. The actual read-write speed of the system for data scheduling is the sum of the read-write speeds of the nodes on which the Agent modules run, rather than the read-write speed of a single server; it can be, for example, 2 times the read-write speed of a single server, which greatly improves the transfer efficiency of data scheduling.
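The heartbeat rule above can be sketched as a small monitor kept by the task scheduling module. This is an illustration only: the class name HeartbeatMonitor, the one-second default interval and the agent identifiers are hypothetical, and time is passed in explicitly so the logic can be followed without a clock.

```python
from typing import Dict


class HeartbeatMonitor:
    """Tracks the last heartbeat per Agent and flags Agents that went silent."""

    def __init__(self, interval_s: float = 1.0, max_missed: int = 3) -> None:
        self.interval_s = interval_s      # assumed heartbeat period in seconds
        self.max_missed = max_missed      # the text allows up to 3 misses
        self._last_seen: Dict[str, float] = {}

    def beat(self, agent_id: str, now: float) -> None:
        """Record a heartbeat received from an Agent at time `now`."""
        self._last_seen[agent_id] = now

    def is_healthy(self, agent_id: str, now: float) -> bool:
        """An Agent is faulty once more than `max_missed` beats are overdue."""
        last = self._last_seen.get(agent_id)
        if last is None:
            return False                  # never heard from this Agent
        missed = int((now - last) // self.interval_s)
        return missed <= self.max_missed
```

In use, the scheduler would call beat with a monotonic clock reading whenever a heartbeat arrives and check is_healthy before publishing a task for that Agent.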
The Agent module retrieves data from the Blu-ray storage to the storage server as follows: the user initiates, from the client, a request to read from the Blu-ray storage, which contains the URL of the object to be read and the project name, user name and password for accessing the Blu-ray storage; the task scheduling module sends the read request message to the message queue; after obtaining the read task from the message queue, the Agent module issues the read request to the Blu-ray storage and writes the object data it reads to the storage server.
The data scheduling system monitors the network and parses network messages to discover the operations users perform on objects in the storage server and the objects affected. Operations that change an object are recorded in the database, together with the state of the affected object. Users can configure management policies for their data through the client, and the task scheduling module schedules tasks according to those policies. For a task whose target address is Blu-ray storage, the task scheduling module scans the database for objects in the storage server that match the configured policy, and uses an available Blu-ray storage location selected by the Blu-ray storage node management module as the target storage location.
The task scheduling module of the control center scans the database of the control center according to the archiving policy configured by the user, and performs the archiving operation on objects that match the archiving policy:
1. The task scheduling module initiates an archiving request to the Agent module through a message queue (AMQP). The request contains the URL, project name, user name, and password or key pair (access key, secret key) of the object, acquired from the database, and also the URL, project name, user name, and password or key pair (access key, secret key) of the archiving target location, acquired from the Blu-ray storage node management module.
2. The Agent module acquires the archiving request from the message queue, locates the object to be archived in the storage server according to the request content sent by the task scheduling module, and reads the object.
3. The Agent module connects to the Blu-ray storage according to the target location information designated by the Blu-ray storage node management module of the control center, and writes the read object data into the Blu-ray storage through the Swift interface of the Blu-ray storage.
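The three archiving steps can be sketched in Python as follows. This is only an illustration under stated assumptions: an in-process `queue.Queue` stands in for the AMQP message queue, the JSON field names and the `Endpoint` type are hypothetical, and the actual reads and writes against the storage server and the Blu-ray storage are passed in as callables rather than implemented.

```python
import json
import queue
from dataclasses import dataclass, asdict


@dataclass
class Endpoint:
    """Location plus access information for a source object or archive target."""
    url: str
    project_name: str
    user_name: str
    password: str


def publish_archive_request(mq, source, target):
    """Step 1: the task scheduling module places an archiving request in the queue."""
    message = {"type": "archive", "source": asdict(source), "target": asdict(target)}
    mq.put(json.dumps(message))


def agent_handle_one(mq, read_object, write_object):
    """Steps 2 and 3: the Agent takes one request, reads the object, writes it out."""
    message = json.loads(mq.get_nowait())
    data = read_object(message["source"])    # read from the storage server
    write_object(message["target"], data)    # write to the Blu-ray storage
    return message["type"]
```

Because the Agent only talks to the queue and the two storage endpoints, nothing in `agent_handle_one` requires the control center to be reachable while the task runs, which mirrors the decoupling described in the text.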
The user can configure policies according to their own requirements and can also schedule objects manually, so that the magnetic storage space is used reasonably, object data is stored permanently, and the user's access requirements are met.
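The archiving policy defined in claim 1 compares the time elapsed since an object's last modification with a user-configured threshold in days. A minimal Python sketch of that comparison is shown here; the function name and the record layout (a dict with `name` and `last_modified` keys) are assumptions for illustration, not the schema used by the system.

```python
from datetime import datetime, timedelta


def objects_to_archive(records, threshold_days, now):
    """Return names of objects whose age exceeds the configured threshold.

    `records` is an assumed list of dicts with "name" and "last_modified" keys.
    """
    threshold = timedelta(days=threshold_days)
    # An object qualifies only if the time difference exceeds the threshold.
    return [r["name"] for r in records if now - r["last_modified"] > threshold]
```

With a 30-day threshold, an object last modified exactly 30 days ago is not selected in this sketch, since the text requires the time difference to exceed the threshold.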
The user reads data from the Blu-ray storage through the control center:
1. The client initiates a request to read from the Blu-ray storage through the task scheduling module of the control center, and the task scheduling module sends the read request message to the message queue.
2. The Agent module obtains the information of the read request from the message queue (including the URL, project name, user name, and password or key pair of the object to be read from the Blu-ray storage, obtained from the Blu-ray storage node management module).
3. The Agent module initiates a read request to the Blu-ray storage according to the information obtained from the message queue, and then writes the read object data into the storage server.
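The read request carries either a password or a key pair. The Python sketch below shows one way such a request could be represented and checked before the Agent acts on it. The `ReadRequest` type, its field names, and the rule that a key pair must be complete are assumptions for illustration; the fetch and write operations are injected callables rather than real storage calls.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ReadRequest:
    """Hypothetical shape of a retrieval task taken from the message queue."""
    object_url: str
    project_name: str
    user_name: str
    password: Optional[str] = None
    access_key: Optional[str] = None
    secret_key: Optional[str] = None

    def has_credentials(self) -> bool:
        """True if a password or a complete (access key, secret key) pair is set."""
        has_key_pair = bool(self.access_key) and bool(self.secret_key)
        return bool(self.password) or has_key_pair


def retrieve(request, fetch_from_bluray, write_to_storage):
    """Read the object from the Blu-ray storage, then write it to the storage server."""
    if not request.has_credentials():
        raise ValueError("read request needs a password or a complete key pair")
    data = fetch_from_bluray(request)
    write_to_storage(request.object_url, data)
    return len(data)  # number of bytes written back
```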
Although the present invention has been described with reference to the foregoing embodiments, it will be apparent to those skilled in the art that modifications may be made to the embodiments described, or equivalents may be substituted for elements thereof, and any modifications, equivalents, improvements and changes may be made without departing from the spirit and principles of the present invention.

Claims (5)

1. A data scheduling system, characterized by: the system comprises a control center, an Agent module, a storage server and a Blu-ray storage, wherein the control center comprises a data management policy module, a Blu-ray storage node management module, a task scheduling module, a monitoring analysis module and a database;
the data management policy module is used for completing policy configuration of a user on the container, and comprises a starting policy, a configuration policy, a deleting policy, a modifying policy and a disabling policy;
the Blu-ray storage node management module is used for providing Blu-ray storage target storage location information to the task scheduling module, the target storage location information comprising a URL and authentication information;
the monitoring analysis module is used for monitoring the network flow of the storage server, analyzing the operation information of a user on the data in the storage server, acquiring the data change condition information from the operation information and recording the data change condition information in the database;
the task scheduling module is used for periodically scanning the database, judging whether a container in the storage server meets a data management strategy configured by a user according to the data change condition information acquired by the monitoring analysis module, and if so, generating a data scheduling task and transmitting the data scheduling task to the message queue;
the Agent module is used for executing data scheduling, completing the backup of data from the storage server to the Blu-ray storage or the retrieval of data from the Blu-ray storage to the storage server according to the tasks in the message queue;
the database is used for storing the monitored operation information on data in the storage server and the data management policies configured by the user;
the monitoring analysis module monitors the IP address and network port of the storage server in a bypass monitoring mode; when a user operates on data in the storage server or writes data into the storage server through a client, the monitoring analysis module captures and filters data packets through a configured filter to obtain the http/https messages for operating on data in the storage server or writing data into the storage server, and then parses the messages; both the request and the response parts of the http/https message need to be parsed, the request method and the corresponding user and container information are obtained from the request and response parts, and the information obtained from parsing the http/https message is stored/updated in the database;
the http/https message comprises a request line and a request header, wherein the request line comprises a request method field (PUT, GET, HEAD, POST, DELETE or COPY), and the request header comprises a Content-Length field and a Destination field;
after resolving to the PUT method, which indicates that the user uploads an object or a container to the storage server at the client, the subsequent URL field needs to be parsed further; the URL field comprises, in order, a user name, a container name and an object name, and is parsed in that order; if only the user name is resolved and no container name follows, the operation is an account creation and further parsing is abandoned; if the container name and the object name are resolved, or only the container name is resolved, the Content-Length field in the request header is parsed to acquire the object size; the response message for the request is then captured and the status code is parsed from the response status line; if the status code is 200-299, the user's upload is judged successful, the response header is parsed, the time in the Last-Modified field of the response header is taken as the last modification time of the object, and finally the resolved user name, container name, object size and last object modification time are stored/updated in the database;
after resolving to the GET method, which indicates that the user reads data from the storage server at the client, parsing of the http/https message ends, because the GET method does not modify any container in the storage server;
after resolving to the HEAD method, which indicates that the user views, at the client, the metadata information corresponding to a user, container or object in the storage server, parsing of the http/https message ends, because the HEAD method does not modify any container in the storage server;
after resolving to the POST method, which indicates that the user creates, updates or modifies the metadata information corresponding to a user, container or object, the subsequent URL field needs to be parsed further; the URL field comprises, in order, a user name, a container name and an object name, and is parsed in that order; if only the user name is resolved, or the user name and the container name are resolved, or the user name, the container name and the object name are resolved, the response message for the request is captured and the status code is parsed from the response status line; if the status code is 200-299, the creation, update or modification is judged successful, the response header is parsed, the current modification time is acquired from the Date field in the response header, and finally the resolved metadata information corresponding to the user, container or object and the current modification time are stored/updated in the database;
after resolving to the DELETE method, which indicates that the user deletes an object or a container from the storage server, the subsequent URL field needs to be parsed further; the URL field comprises, in order, a user name, a container name and an object name, and is parsed in that order; if only the container name or the object name is resolved, or both the container name and the object name are resolved, the response message for the request is captured and the status code is parsed from the response status line; if the status code is 200-299, the deletion of the object or container is judged successful, and finally the information corresponding to the user's container name and object name is deleted from the database;
after resolving to the COPY method, which indicates that the user copies an object, the subsequent URL field needs to be parsed further; the URL field comprises, in order, a user name, a container name and an object name, and is parsed in that order down to the object name; the Destination field in the request header is parsed to record the target location and name; the response message for the request is captured and the status code is parsed from the response status line; if the status code is 200-299, the copy is judged successful, the response header is parsed, the current modification time is acquired from the Date field in the response header, and finally the resolved target location information, together with the URL, the source object name and the current copy time, is stored/updated in the database;
the configuration policy of the data management policy module means that the user sets a time threshold of the archiving policy for a container, in units of days; the starting policy means that a configured policy can be started; after the policy takes effect, the task scheduling module acquires the modification time of an object from the database and the current system time, computes the time difference between the two, and compares the time difference with the configured time threshold; an object whose time difference exceeds the configured time threshold needs to be archived to the Blu-ray storage; the modifying policy means modifying the time threshold of the configuration policy; the deleting policy means deleting the configuration policy; the disabling policy means ceasing to use the configured policy, rendering it inactive.
2. A data scheduling system according to claim 1, wherein: the data scheduling task generated by the task scheduling module comprises the URL, project name, user name and password of the container to be archived in the storage server, and also the URL, project name, user name and password of the target storage location in the Blu-ray storage; the URL, project name, user name and password of the container to be archived in the storage server are acquired by scanning the database; the URL, project name, user name and password of the target location in the Blu-ray storage are set by the user when creating a new container in the Blu-ray storage through the client, or are acquired when associating an existing container, and are stored in the Blu-ray storage node management module.
3. A data scheduling system according to claim 1, wherein: the Agent module is deployed on the storage server and sends a heartbeat message to the task scheduling module of the control center at intervals measured in seconds; the task scheduling module considers the Agent module normal after receiving its heartbeat message; if the heartbeat message is missed more than 3 times, the Agent module is considered faulty and the task scheduling module no longer sends tasks to the message queue.
4. A data scheduling system according to claim 2, wherein: the method by which the Agent module retrieves data from the Blu-ray storage to the storage server comprises the following steps: the user initiates, at the client, a request to read from the Blu-ray storage, the read request containing the URL of the object to be read and the project name, user name and password for accessing the Blu-ray storage; the task scheduling module sends the read request message to the message queue; and the Agent module, after acquiring the read request task from the message queue, initiates the read request to the Blu-ray storage and writes the read object data into the storage server.
5. A data scheduling system according to claim 1, wherein: the database is a MongoDB database.
CN201911037777.4A 2019-10-29 2019-10-29 Data scheduling system Active CN110750497B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911037777.4A CN110750497B (en) 2019-10-29 2019-10-29 Data scheduling system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911037777.4A CN110750497B (en) 2019-10-29 2019-10-29 Data scheduling system

Publications (2)

Publication Number Publication Date
CN110750497A CN110750497A (en) 2020-02-04
CN110750497B true CN110750497B (en) 2023-09-26

Family

ID=69280770

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911037777.4A Active CN110750497B (en) 2019-10-29 2019-10-29 Data scheduling system

Country Status (1)

Country Link
CN (1) CN110750497B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111736956B (en) * 2020-06-29 2023-01-10 苏州浪潮智能科技有限公司 Container service deployment method, device, equipment and readable storage medium
CN111934723A (en) * 2020-08-26 2020-11-13 上海仪电(集团)有限公司中央研究院 Bypass interception Bluetooth communication device, method and application thereof
CN112818059B (en) * 2021-01-27 2024-05-17 百果园技术(新加坡)有限公司 Information real-time synchronization method and device based on container release platform
CN113032598B (en) * 2021-04-12 2023-07-14 郑州航空工业管理学院 Image design visual conveying system based on big data
CN115242677B (en) * 2021-04-23 2023-09-01 中国移动通信集团四川有限公司 Home-wide user state monitoring system, method and device
CN117076094B (en) * 2023-10-16 2024-01-16 中国船舶集团有限公司第七〇七研究所 Method for concurrently processing multiple tasks of cryptographic operation

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106648961A (en) * 2016-09-27 2017-05-10 上海爱数信息技术股份有限公司 Integrated blue-ray disc jukebox backup and archiving method
CN106649467A (en) * 2016-09-27 2017-05-10 上海爱数信息技术股份有限公司 Blue-ray disc jukebox archiving management method and system
WO2018145739A1 (en) * 2017-02-08 2018-08-16 Telefonaktiebolaget Lm Ericsson (Publ) Methods, client and server relating to a distributed database

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9697247B2 (en) * 2014-07-16 2017-07-04 Facebook, Inc. Tiered data storage architecture

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
严文瑞 ; 曹强 ; 姚杰 ; 谢长生 ; .一种面向大容量光盘库的新型文件系统.计算机研究与发展.2015,(第S2期),全文. *
屠要峰 ; 刘辉 ; 张国良 ; 刘春 ; .一种分布式缓存系统的关键技术及应用.计算机科学.2018,(05),全文. *

Also Published As

Publication number Publication date
CN110750497A (en) 2020-02-04

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 266237 5th Floor, Building C, Haike Entrepreneurship Center, Aoshanwei Street Office, Jimo District, Qingdao City, Shandong Province

Applicant after: Yihai Luyuan (Shandong) Digital Technology Co.,Ltd.

Address before: 266237 5th Floor, Building C, Haike Entrepreneurship Center, Aoshanwei Street Office, Jimo District, Qingdao City, Shandong Province

Applicant before: SHANDONG E.HUALU INFORMATION TECHNOLOGY Co.,Ltd.

GR01 Patent grant