CN113971166A - Content data storage method, device, server and storage medium - Google Patents


Info

Publication number
CN113971166A
Authority
CN
China
Prior art keywords
content data
target content
distributed
stored
storage
Legal status
Pending
Application number
CN202010715874.0A
Other languages
Chinese (zh)
Inventor
洪亮
陈林
赵博
王金龙
Current Assignee
Beijing Dajia Internet Information Technology Co Ltd
Original Assignee
Beijing Dajia Internet Information Technology Co Ltd
Application filed by Beijing Dajia Internet Information Technology Co Ltd
Priority to CN202010715874.0A
Publication of CN113971166A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/18 File system types
    • G06F16/182 Distributed file systems
    • G06F16/17 Details of further file system functions
    • G06F16/172 Caching, prefetching or hoarding of files

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

The disclosure relates to a content data storage method, device, server and storage medium in the field of storage technology, and can improve the response speed of a read service. An embodiment of the present disclosure includes: detecting, at a preset time interval, the storage duration of content data stored in a distributed publish-subscribe message system; determining, from that content data, first target content data whose storage duration is greater than or equal to a preset duration; acquiring first storage location information of the first target content data in the distributed publish-subscribe message system; and storing the first target content data in a distributed file system based on the first storage location information, where the storage location of the first target content data in the distributed file system is determined by the first storage location information.

Description

Content data storage method, device, server and storage medium
Technical Field
The present disclosure relates to the field of storage technologies, and in particular, to a content data storage method, an apparatus, a server, and a storage medium.
Background
A distributed publish-subscribe messaging system, such as Kafka, can process all of the activity stream data generated by consumers on a website. Such a system also unifies online and offline message processing through the parallel loading mechanism of Hadoop, and can provide real-time messages through clustering.
In the related art, the content data of a given topic is mostly stored on one disk of the distributed publish-subscribe message system. When a Consumer client reads the historical data of a topic, a large amount of data is read from the disk holding that topic while the other disks stay idle, so the read service responds slowly.
Disclosure of Invention
The present disclosure provides a content data storage method, apparatus, server and storage medium, to at least solve the problem of slow response speed of read service in the related art. The technical scheme of the disclosure is as follows:
According to a first aspect of the embodiments of the present disclosure, there is provided a content data storage method, including:
determining first target content data with the storage duration being greater than or equal to a preset duration in a distributed publish-subscribe message system;
and storing the first target content data to a distributed file system.
According to a second aspect of the embodiments of the present disclosure, there is provided a content data storage device including:
a detection module configured to detect, at a preset time interval, the storage duration of content data stored in the distributed publish-subscribe message system;
a first determining module configured to determine, from the content data stored in the distributed publish-subscribe message system, first target content data whose storage duration is greater than or equal to a preset duration;
an obtaining module configured to perform obtaining first storage location information of the first target content data in the distributed publish-subscribe message system;
a first storage module configured to perform storing the first target content data into a distributed file system based on the first storage location information, wherein a storage location of the first target content data in the distributed file system is determined based on the first storage location information.
According to a third aspect of embodiments of the present disclosure, there is provided a server, including:
a processor;
a memory for storing the processor-executable instructions;
wherein the processor is configured to execute the instructions to implement the content data storage method according to the first aspect.
According to a fourth aspect of embodiments of the present disclosure, there is provided a storage medium having instructions that, when executed by a processor of a server, enable the server to perform the content data storage method according to the first aspect.
According to a fifth aspect of embodiments of the present disclosure, there is provided a computer program product whose instructions, when executed by a processor of a server, enable the server to perform the content data storage method according to the first aspect.
The technical scheme provided by the embodiments of the disclosure brings at least the following beneficial effects: the first target content data whose storage duration in the distributed publish-subscribe message system is greater than or equal to the preset duration is stored in the distributed file system, so that when the Consumer client reads the historical data of a topic, that data can be read from the distributed file system, which can make full use of the capacity of its disks, thereby improving the response speed of the read service.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the principles of the disclosure and are not to be construed as limiting the disclosure.
FIG. 1 is a block diagram illustrating a content data storage system according to an exemplary embodiment.
FIG. 2 is a block diagram illustrating a distributed publish-subscribe message system according to an exemplary embodiment.
FIG. 3 is a flowchart illustrating a content data storage method according to an exemplary embodiment.
FIG. 4 is a graph of traffic versus time according to an exemplary embodiment.
FIG. 5 is a graph of disk util value versus time according to an exemplary embodiment.
FIG. 6 is a graph of disk iostat value versus time according to an exemplary embodiment.
FIG. 7 is a graph of response delay versus time according to an exemplary embodiment.
FIG. 8 is a block diagram illustrating a content data storage device according to an exemplary embodiment.
FIG. 9 is a block diagram illustrating a server according to an exemplary embodiment.
Detailed Description
In order to make the technical solutions of the present disclosure better understood by those of ordinary skill in the art, the technical solutions in the embodiments of the present disclosure will be clearly and completely described below with reference to the accompanying drawings.
It should be noted that the terms "first," "second," and the like in the description and claims of the present disclosure and in the above-described drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the terms so used are interchangeable under appropriate circumstances, such that the embodiments of the disclosure described herein are capable of operation in sequences other than those illustrated or otherwise described herein. The implementations described in the exemplary embodiments below are not intended to represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present disclosure, as detailed in the appended claims.
Referring to FIG. 1, an embodiment of the present disclosure provides a content data storage system that may include a distributed publish-subscribe message system and a distributed file system. The distributed publish-subscribe message system includes a proxy server, a consumer client, and a producer client. The producer client, acting as a message producer, may send messages to the proxy server, which stores them. The consumer client, acting as a message consumer, may read messages from the proxy server. The distributed file system may receive and store content data sent by the distributed publish-subscribe message system.
In one possible implementation, as shown in FIG. 2, the distributed publish-subscribe message system may be a Kafka system that includes a Broker server, a Consumer client, and a Producer client. The Kafka system classifies incoming messages by Topic. The content data of multiple Topics can be stored on the same Broker server, and when a Topic has a large amount of content data, its content data can also be distributed across multiple Broker servers. The producer client produces messages and uploads them to the proxy server; the consumer client consumes messages by retrieving them from the proxy server. The distributed file system may be HDFS (Hadoop Distributed File System).
FIG. 3 is a flowchart illustrating a content data storage method according to an exemplary embodiment. The method may be applied to the proxy server of the distributed publish-subscribe message system shown in FIG. 1 and, as shown in FIG. 3, includes the following steps:
In step S101, the storage duration of the content data stored in the distributed publish-subscribe message system is detected at a preset time interval.
The preset time interval may be, for example, 1 s, 3 s, or 5 s, which is not limited in this disclosure. The distributed publish-subscribe message system may be a Kafka system or another system.
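Step S101 is essentially a periodic task. The following Python sketch is an illustration under stated assumptions, not the patented implementation: it runs a caller-supplied detection callback every `interval_s` seconds. The function name `run_periodic_detection` and the `max_ticks` parameter (which only bounds the loop for demonstration) are hypothetical.

```python
import threading
import time
from typing import Callable, Optional


def run_periodic_detection(
    detect: Callable[[float], None],
    interval_s: float,
    stop: threading.Event,
    max_ticks: Optional[int] = None,
) -> int:
    """Invoke detect(now) every interval_s seconds until `stop` is set.

    Hypothetical helper: `detect` would inspect the storage duration of the
    content data held by the broker. Returns the number of passes performed.
    """
    ticks = 0
    while not stop.is_set():
        detect(time.time())  # one detection pass over the stored content data
        ticks += 1
        if max_ticks is not None and ticks >= max_ticks:
            break
        # Event.wait() returns early once stop is set, so shutdown is prompt.
        stop.wait(interval_s)
    return ticks
```

Waiting on a `threading.Event` rather than calling `time.sleep` lets another thread stop the detector without waiting out a full interval.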
In step S102, first target content data whose storage duration is greater than or equal to a preset duration is determined from the content data stored in the distributed publish-subscribe message system.
The preset duration may be, for example, 1 hour, 3 hours, or 5 hours, which is not limited in this embodiment. The distributed publish-subscribe message system may store content data only in memory, only on disk, or in both memory and disk. Accordingly, the first target content data may be stored in a storage component of the proxy server: entirely in memory, entirely on disk, or partly in memory and partly on disk.
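The selection in step S102 can be sketched as a filter over stored records. This is a minimal illustration assuming each record carries the time at which the broker stored it; the `StoredRecord` type and its `stored_at` field are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class StoredRecord:
    topic: str
    partition: str
    offset: int
    stored_at: float  # time (seconds) at which the broker stored the record


def select_first_target(
    records: Iterable[StoredRecord], now: float, preset_duration_s: float
) -> List[StoredRecord]:
    """Return records whose storage duration is >= the preset duration.

    The comparison is inclusive, matching "greater than or equal to".
    """
    return [r for r in records if now - r.stored_at >= preset_duration_s]
```

With a 3-hour preset duration, a record stored exactly 3 hours ago is selected, while one stored 2 hours 58 minutes ago is not.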
In addition, the first target content data may be content data sent by the Producer client to the proxy server, and after receiving the first target content data, the proxy server may store the first target content data in an internal memory or a disk.
In step S103, first storage location information of the first target content data in the distributed publish-subscribe message system is obtained.
The first storage location information may be used to find the first target content data in the distributed publish-subscribe message system, and may include the topic, partition, and offset corresponding to the first target content data. Taking a Kafka system as the distributed publish-subscribe message system, the topic, partition, and offset of the first target content data may be Topic a, Partition B, and offset 8000, respectively.
In step S104, the first target content data is stored in a distributed file system based on the first storage location information, where the storage location of the first target content data in the distributed file system is determined based on the first storage location information.
The distributed file system can store content data by topic, partition, and offset. When content data stored in the distributed publish-subscribe message system is moved to the distributed file system, its storage location information may remain unchanged, so that the offsets of the content data in the distributed file system stay continuous with those in the distributed publish-subscribe message system. For example, if the content data most recently stored in the distributed file system has Topic a, Partition B, and offset 8001, and the first target content data has Topic a, Partition B, and offset 8000, then the storage address of the first target content data in the distributed file system may be consecutive with that of the most recently stored content data.
Specifically, the first target content data may be read and then written to the distributed file system. One or more processes may be started for this purpose, and each process may start multiple synchronization threads that store the first target content data in the distributed file system. Taking the Kafka system as an example, each Kafka process may start multiple synchronization threads, which read data from the disks of the Kafka system and write it to the distributed file system.
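One way to let the location information determine the storage location in the distributed file system is to derive the file path directly from topic, partition, and offset. The layout below (root `/kafka-tier`, one file per offset, 20-digit zero padding) is a hypothetical choice for illustration, not a layout stated in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True, order=True)
class StorageLocation:
    """Storage location information: topic, partition and offset."""
    topic: str
    partition: str
    offset: int


def dfs_path(loc: StorageLocation, root: str = "/kafka-tier") -> str:
    """Derive a DFS path from the location info (hypothetical layout).

    The offset is zero-padded to 20 digits so that lexicographic path order
    matches numeric offset order, keeping consecutive offsets adjacent.
    """
    return f"{root}/{loc.topic}/partition-{loc.partition}/{loc.offset:020d}"
```

Because the path is a pure function of the location information, the data keeps the same topic, partition, and offset after migration, and consecutive offsets sort next to each other.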
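The multi-threaded synchronization described above can be sketched with a thread pool. In this illustration, plain dictionaries keyed by (topic, partition, offset) stand in for the broker's disk and for the distributed file system; the function name and parameters are hypothetical.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple

Location = Tuple[str, str, int]  # (topic, partition, offset)


def sync_to_dfs(
    local: Dict[Location, bytes],
    targets: List[Location],
    dfs: Dict[Location, bytes],
    workers: int = 4,
) -> int:
    """Copy `targets` from local broker storage to the DFS with a thread pool.

    `local` and `dfs` are in-memory stand-ins for the broker's disk and the
    distributed file system; each entry keeps its original location key.
    """
    lock = threading.Lock()

    def copy_one(loc: Location) -> Location:
        data = local[loc]  # read from the broker's local storage
        with lock:  # serialize writes to the shared DFS stand-in
            dfs[loc] = data
        return loc

    with ThreadPoolExecutor(max_workers=workers) as pool:
        copied = list(pool.map(copy_one, targets))
    return len(copied)
```

`pool.map` re-raises any exception from a worker when its result is consumed, so a failed copy surfaces to the caller instead of being silently dropped.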
In addition, the first target content data with the storage duration being greater than or equal to the preset duration in the distributed publish-subscribe message system can be obtained in real time, and the first target content data is stored in the distributed file system. Taking the preset time duration as 3 hours as an example, if the current time is 16:00, the content data stored at 13:00 can be determined as first target content data, and the content data stored at 13:00 is stored in the distributed file system; if the current time is 16:01, the content data stored at 13:01 can be determined to be first target content data, and the content data stored at 13:01 can be stored in the distributed file system, so that the first target content data can be stored in the distributed file system in real time.
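The sliding cutoff in the 16:00 / 3-hour example is simple date arithmetic, sketched below. The helper names are illustrative, and the calendar date used in the check is arbitrary.

```python
from datetime import datetime, timedelta


def migration_cutoff(now: datetime, preset: timedelta) -> datetime:
    """Content stored at or before this instant has reached the preset duration."""
    return now - preset


def is_first_target(stored_at: datetime, now: datetime, preset: timedelta) -> bool:
    """True when the storage duration is greater than or equal to `preset`."""
    return now - stored_at >= preset
```

At 16:00 the cutoff is 13:00; one minute later it advances to 13:01, which is what lets migration proceed continuously.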
In this embodiment, storing in the distributed file system the first target content data whose storage duration in the distributed publish-subscribe message system is greater than or equal to the preset duration improves the response speed of the read service. Taking a Kafka system as an example, assume that one Broker server stores the content data of two Topics, test1 and test2, each with a single Partition, stored on disk1 and disk2 respectively. In the related art, when a consumer client reads recently written content data from the Broker server, it reads data cached by the operating system and rarely touches the disk, so little disk IO (input/output) is consumed. When a consumer client reads historical content data, however, it usually reads a large amount of data from disk, consuming substantial disk IO. If a client reads a large amount of historical content data from Topic test1, the reads concentrate on disk1 while the IO of disk2 sits idle, creating a disk IO hotspot. In this embodiment, the first target content data in the Kafka system whose storage duration is greater than or equal to the preset duration may be stored in HDFS, giving a tiered "Kafka on HDFS" storage architecture: hot data stays on local Kafka storage and cold data is stored in HDFS, which alleviates the disk IO hotspot problem.
In addition, storing this first target content data in the distributed file system decouples storage from computing, so storage resources can be expanded independently. Taking the Kafka system as an example, assume a Kafka cluster contains three Broker servers whose disk usage is at or above 90% while their CPUs, network cards, and other resources are idle. In the related art, the only way to increase the cluster's disk capacity without restarting it is to add Broker servers. Adding a Broker server, however, adds not only disk capacity but also CPU, network card, and other resources that are not needed, which wastes resources. In this embodiment, the Kafka system uses local storage while HDFS, deployed on other nodes, provides remote storage, so storage and computing are separated. The Kafka system only needs to manage the hot data stored locally, the cold data resides on HDFS, and computing services can be deployed on HDFS, so the Kafka system and the computing services are decoupled.
Further, storing this first target content data in the distributed file system prevents historical content data from being lost. Taking the Kafka system as an example, content data in the Kafka system is deleted periodically: once its storage duration reaches a certain threshold, it is deleted and can no longer be consumed. In this embodiment, with HDFS-based tiered Kafka storage, hot data from the most recent hours can be kept on Kafka's local disks and a large amount of historical data can be stored in the HDFS cluster, where it can be retained permanently, so historical content data is not lost.
In practical application, taking a Kafka system as the distributed publish-subscribe message system and HDFS as the distributed file system, the first target content data in the Kafka system whose storage duration is greater than or equal to the preset duration may be stored in HDFS. A start switch for reading HDFS may be provided: when the switch is off, only content data stored on the Kafka system can be read; when it is on, content data stored on HDFS can also be read. As shown in FIG. 4 to FIG. 7, while handling requests from Consumer clients to read Topic content data, the start switch is turned on at time a, after which the Kafka system reads data from HDFS and returns it to the Consumer client. FIG. 4 shows the content data transmission traffic before and after the switch is turned on. As shown in FIG. 5, after the switch is turned on, the util value of the Kafka system's disks, which indicates how busy a disk is, drops. As shown in FIG. 6, after the switch is turned on, the iostat value of the Kafka system's disks, which reflects disk input/output statistics, decreases. As shown in FIG. 7, after the switch is turned on, the response delay seen by the Consumer client decreases, so requests to read Topic content data are answered more quickly.
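The start switch can be modeled as a flag consulted on every read. This sketch assumes that migrated data may still exist on the broker's local disks until it is overwritten. Under that assumption, turning the switch on routes reads for migrated offsets to HDFS, which is consistent with the drop in local disk util described above. The class, attribute, and tier labels are hypothetical.

```python
from typing import Dict, Tuple


class TieredReader:
    """Serve reads from local broker storage or HDFS, gated by a switch.

    Assumption: migrated data may still exist locally until overwritten, so
    with the switch on, offsets present in the remote tier are served from
    there to offload the broker's local disks.
    """

    def __init__(self, local: Dict[int, bytes], remote: Dict[int, bytes]) -> None:
        self.local = local    # offset -> data on the broker's disks
        self.remote = remote  # offset -> data on the distributed file system
        self.hdfs_read_enabled = False  # the "start switch"

    def read(self, offset: int) -> Tuple[str, bytes]:
        if self.hdfs_read_enabled and offset in self.remote:
            return "hdfs", self.remote[offset]
        if offset in self.local:
            return "local", self.local[offset]
        raise KeyError(f"offset {offset} not available")
```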
The technical scheme provided by the embodiments of the disclosure brings at least the following beneficial effects: the first target content data whose storage duration in the distributed publish-subscribe message system is greater than or equal to the preset duration is stored in the distributed file system, so that when the Consumer client reads the historical data of a Topic, that data can be read from the distributed file system, which can make full use of the capacity of its disks, thereby improving the response speed of the read service.
Optionally, the content data storage method further includes:
receiving a request message which is sent by a consumer client and used for requesting to acquire second target content data;
if the second target content data is determined to be stored in the distributed publish-subscribe message system, reading the second target content data from the distributed publish-subscribe message system;
if the second target content data is determined to be stored in the distributed file system, reading the second target content data from the distributed file system;
and sending the read second target content data to the consumer client.
The request message may carry second storage location information corresponding to the second target content data, and it may be determined that the second target content data is stored in the distributed publish-subscribe message system or the distributed file system according to the second storage location information. Through the second storage location information, the second target content data can be found quickly.
In addition, if it is determined that the second target content data is stored in the distributed file system, an application interface may be called to read the second target content data, and taking the distributed publish-subscribe message system as the Kafka system as an example, a Handler thread on the Kafka system may call an API for reading the HDFS, and may return the read second target content data to the Consumer client.
In this embodiment, the consumer client may read the content data with the storage duration within the preset duration from the distributed publish-subscribe message system, may read the content data with the storage duration greater than or equal to the preset duration from the distributed file system, and may utilize the characteristic that the response speed of the distributed file system for reading the content data is fast, thereby improving the response speed of the read service.
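The request handling above reduces to locating the tier and then dispatching to that tier's reader. The sketch below keeps the location policy pluggable. The `Tier` enum, the `handle_fetch` function, and the reader callables are hypothetical stand-ins, for example for a Handler thread calling an HDFS read API.

```python
from enum import Enum
from typing import Callable, Dict, Tuple

Location = Tuple[str, str, int]  # (topic, partition, offset)


class Tier(Enum):
    PUBSUB = "pubsub"  # distributed publish-subscribe message system
    DFS = "dfs"        # distributed file system


def handle_fetch(
    loc: Location,
    locate: Callable[[Location], Tier],
    readers: Dict[Tier, Callable[[Location], bytes]],
) -> bytes:
    """Resolve where `loc` lives, read it from that tier, and return it.

    The returned bytes are what would be sent back to the consumer client.
    """
    tier = locate(loc)
    return readers[tier](loc)
```

Keeping `locate` separate from the readers allows either of the location strategies described below (probing or an index) to be plugged in without changing the dispatch.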
Optionally, the request message carries second storage location information corresponding to the second target content data, where the second storage location information includes a topic, a partition, and an offset;
after receiving a request message for requesting to acquire second target content data sent by a consumer client, the content data storage method further includes:
and determining that the second target content data is stored in the distributed publish-subscribe message system or the distributed file system based on the topic, the partition and the offset corresponding to the second target content data.
The distributed publish-subscribe message system can classify incoming messages by topic, and the content data of multiple topics can be stored on the same proxy server. The content data of a topic may be divided into multiple Partitions. Each Partition may be an ordered queue in which every message is assigned an offset that determines its position in the Partition; for example, the offset may be an ID number. Taking the Kafka system as an example, the storage files of the Kafka system may be named by offset: the first message in a Partition has the first offset, and the corresponding file may be named 00000000000.kafka. Storing content data by topic, partition, and offset keeps it ordered and allows stored content data to be found quickly from those three values.
In addition, the topic, partition, and offset corresponding to the second target content data may be looked up in stored index information to determine whether the second target content data is stored in the distributed publish-subscribe message system or in the distributed file system. Alternatively, the distributed publish-subscribe message system may first be searched for the second target content data based on its topic, partition, and offset; if it is not found there, the distributed file system may then be searched based on the same topic, partition, and offset.
In this embodiment, the specific storage location of the second target content data can be determined quickly from its topic, partition, and offset, so that the second target content data can be found quickly.
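Naming each storage file after the offset of its first message turns "which file holds offset N" into a binary search over sorted base offsets. The sketch below follows the 11-digit `.kafka` example in the text. For comparison, stock Apache Kafka names log segment files with a 20-digit zero-padded base offset and a `.log` suffix. The helper names are hypothetical.

```python
import bisect
from typing import List


def segment_file_name(base_offset: int, width: int = 11, suffix: str = ".kafka") -> str:
    """Name a storage file after the offset of its first message."""
    return f"{base_offset:0{width}d}{suffix}"


def find_segment(base_offsets: List[int], offset: int) -> int:
    """Return the base offset of the file containing `offset`.

    `base_offsets` must be sorted ascending; the answer is the largest base
    offset that is <= `offset`.
    """
    i = bisect.bisect_right(base_offsets, offset) - 1
    if i < 0:
        raise KeyError(f"offset {offset} precedes the first file")
    return base_offsets[i]
```

With files starting at offsets 0, 1000, and 5000, offset 4999 resolves to the file named `00000001000.kafka`.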
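The second alternative described above (search the message system first, then fall back to the distributed file system) can be sketched as follows. The membership predicates are hypothetical stand-ins for whatever existence checks each system provides.

```python
from typing import Callable, Optional, Tuple

Location = Tuple[str, str, int]  # (topic, partition, offset)


def locate_by_probing(
    loc: Location,
    in_pubsub: Callable[[Location], bool],
    in_dfs: Callable[[Location], bool],
) -> Optional[str]:
    """Search the message system first, then fall back to the DFS."""
    if in_pubsub(loc):
        return "pubsub"
    if in_dfs(loc):
        return "dfs"
    return None  # not stored in either tier
```

Checking the message system first favors recent (hot) reads. Historical reads pay for one extra miss, which the index-based approach described next avoids.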
Optionally, the determining that the second target content data is stored in the distributed publish-subscribe message system or the distributed file system based on the topic, the partition, and the offset corresponding to the second target content data includes:
searching stored index information for the topic, partition, and offset corresponding to the second target content data, where the index information includes the topic, partition, and offset of the content data stored in the distributed publish-subscribe message system and of the content data stored in the distributed file system;
determining that the second target content data is stored in the distributed publish-subscribe message system or the distributed file system based on the lookup result.
The index information may be stored in the distributed publish-subscribe message system, for example as a list or a linked list; its storage form is not limited in this embodiment. Taking the Kafka system as an example, suppose the index information stored by the Kafka system indicates that messages with offsets 1000 to 5000 in Partition B of Topic a are stored in the Kafka system, and messages with offsets 5000 to 8000 in Partition B of Topic a are stored in HDFS. If the topic, partition, and offset of the second target content data are Topic a, Partition B, and 3000, it can be determined that the second target content data is stored in the Kafka system.
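The range-based index in this example can be sketched with `bisect` over sorted range starts. Because the text's ranges "1000 to 5000" and "5000 to 8000" share the endpoint 5000, this sketch assumes half-open ranges [start, end). Under that assumption, offset 5000 belongs to the second range. The `TierIndex` class and the tier labels are hypothetical.

```python
import bisect
from typing import Dict, List, Optional, Tuple


class TierIndex:
    """Index of half-open offset ranges [start, end) per (topic, partition)."""

    def __init__(self) -> None:
        self._ranges: Dict[Tuple[str, str], List[Tuple[int, int, str]]] = {}

    def add(self, topic: str, partition: str, start: int, end: int, tier: str) -> None:
        # Keep each partition's ranges sorted by start offset.
        bisect.insort(self._ranges.setdefault((topic, partition), []), (start, end, tier))

    def lookup(self, topic: str, partition: str, offset: int) -> Optional[str]:
        ranges = self._ranges.get((topic, partition), [])
        starts = [r[0] for r in ranges]  # rebuilt per call; fine for a sketch
        # Rightmost range whose start is <= offset.
        i = bisect.bisect_right(starts, offset) - 1
        if i >= 0 and offset < ranges[i][1]:
            return ranges[i][2]
        return None
```

Usage with the example above: after `add("a", "B", 1000, 5000, "kafka")` and `add("a", "B", 5000, 8000, "hdfs")`, `lookup("a", "B", 3000)` returns `"kafka"` without scanning either storage system.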
In the embodiment, the second target content data can be quickly located through the stored index information, the need of scanning the content data stored in the distributed publish-subscribe message system and the distributed file system to search for the second target content data is avoided, the time for locating the content data can be reduced, and the response speed of reading the service can be further improved.
Optionally, the content data storage method further includes:
and if the first target content data is determined to be stored in the distributed file system under the condition that a preset condition is met, storing the content data to be stored in a storage position corresponding to the first target content data in the distributed publishing and subscribing message system.
The preset condition may take several forms. It may be that the storage duration of the first target content data in the distributed publish-subscribe message system is greater than or equal to a preset duration. It may be that all storage space in the distributed publish-subscribe message system is occupied by stored content data. Other conditions may also be used, and this embodiment is not limited in this respect. Determining that the first target content data has been stored in the distributed file system may proceed as follows: after the distributed publish-subscribe message system stores the first target content data in the distributed file system, it receives a response message from the distributed file system indicating that the first target content data has been stored there. This response message lets the distributed publish-subscribe message system reliably learn the result of storing content data in the distributed file system, which makes the content data easier to manage.
In this embodiment, once the first target content data has been stored in the distributed file system, the content data to be stored overwrites the first target content data in the distributed publish-subscribe message system. The storage resources of the distributed publish-subscribe message system are thus fully used. Because the overwritten data already has a copy in the distributed file system, no content data stored in the distributed publish-subscribe message system is lost.
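The overwrite rule can be sketched as follows. This is a hypothetical model, not Kafka's actual log layout. `SlotStore` stands in for a fixed amount of storage in the publish-subscribe system, and `on_archive_ack` plays the role of the response message from the distributed file system. The preset condition is taken to be either of the two examples above: the slot's storage duration has reached the preset duration, or every slot is occupied. A slot is reused only after its content has been acknowledged as archived.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Slot:
    data: Optional[bytes] = None
    stored_at: float = 0.0
    archived: bool = False  # True once the DFS has acknowledged a copy


class SlotStore:
    """Hypothetical fixed-capacity stand-in for pub-sub storage."""

    def __init__(self, capacity: int, preset_duration: float):
        self.slots = [Slot() for _ in range(capacity)]
        self.preset_duration = preset_duration

    def on_archive_ack(self, index: int) -> None:
        # Response message from the DFS: this slot's content is now stored there.
        self.slots[index].archived = True

    def write(self, data: bytes, now: float) -> Optional[int]:
        """Write into an empty slot, or overwrite an archived slot whose preset
        condition holds; return the slot index, or None if nothing may be reused."""
        full = all(s.data is not None for s in self.slots)
        for i, slot in enumerate(self.slots):
            expired = now - slot.stored_at >= self.preset_duration
            if slot.data is None or (slot.archived and (expired or full)):
                self.slots[i] = Slot(data=data, stored_at=now)
                return i
        return None
```

The scheme stays lossless because an unacknowledged slot is never overwritten. Data leaves the publish-subscribe system only after a durable copy exists in the distributed file system.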
FIG. 8 is a block diagram of a content data storage device according to an example embodiment. Referring to FIG. 8, the apparatus includes:
a detection module 201, configured to detect, at a preset time interval, the storage duration of content data stored in the distributed publish-subscribe message system;
a first determining module 202, configured to determine, from the content data stored in the distributed publish-subscribe message system, first target content data whose storage duration is greater than or equal to a preset duration;
an obtaining module 203, configured to obtain first storage location information of the first target content data in the distributed publish-subscribe message system;
a first storage module 204, configured to store the first target content data into a distributed file system based on the first storage location information, wherein the storage location of the first target content data in the distributed file system is determined based on the first storage location information.
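The steps performed by modules 201 to 204 can be sketched as a single detection pass. Everything here is hypothetical. `Message`, `dfs_path`, and `archive_once` are illustrative names, and a dict stands in for the distributed file system. The `/archive/<topic>/<partition>/<offset>` path layout is only one way of deriving the DFS location from the first storage location information.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    topic: str
    partition: str
    offset: int
    stored_at: float  # time the message was written to the pub-sub system
    payload: bytes


def dfs_path(topic: str, partition: str, offset: int) -> str:
    # Hypothetical layout: the DFS location is derived from the message's
    # (topic, partition, offset) in the pub-sub system.
    return f"/archive/{topic}/{partition}/{offset:020d}"


def archive_once(messages, dfs, now, preset_duration):
    """One detection pass: copy every message stored for >= preset_duration
    into `dfs` (a dict standing in for the DFS). Returns the written paths."""
    archived = []
    for msg in messages:
        if now - msg.stored_at >= preset_duration:  # first target content data
            path = dfs_path(msg.topic, msg.partition, msg.offset)
            dfs[path] = msg.payload
            archived.append(path)
    return archived
```

In a deployment, `archive_once` would be invoked once per preset time interval, for example from a scheduler loop. Zero-padding the offset keeps lexicographic path order equal to offset order, so sorting the paths of one partition also sorts them by offset.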
Optionally, the content data storage device further includes:
a receiving module, configured to receive a request message sent by a consumer client for requesting to acquire second target content data;
a first reading module, configured to read the second target content data from the distributed publish-subscribe message system if it is determined that the second target content data is stored in the distributed publish-subscribe message system;
a second reading module, configured to read the second target content data from the distributed file system if it is determined that the second target content data is stored in the distributed file system;
a sending module, configured to send the read second target content data to the consumer client.
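The read path handled by the receiving, reading, and sending modules can be sketched as a dispatch on where the requested offset lives. The handler name `serve_read`, the `locate` callback, and the dict-backed stores keyed by (topic, partition, offset) are hypothetical. `locate` could be backed by index information such as that in the Kafka example above.

```python
from typing import Any, Callable, Dict, Optional, Tuple

Key = Tuple[str, str, int]  # (topic, partition, offset)


def serve_read(
    request: Dict[str, Any],
    locate: Callable[[str, str, int], Optional[str]],
    pubsub: Dict[Key, bytes],
    dfs: Dict[Key, bytes],
) -> bytes:
    """Return the payload for the requested (topic, partition, offset).

    The caller is expected to send the returned bytes to the consumer client.
    """
    key = (request["topic"], request["partition"], request["offset"])
    where = locate(*key)
    if where == "pubsub":  # still held by the publish-subscribe system
        return pubsub[key]
    if where == "dfs":  # already moved to the distributed file system
        return dfs[key]
    raise KeyError(f"no stored content for {key}")
```

Raising `KeyError` for an unindexed offset is a choice made for this sketch. A real service might instead return an error response to the consumer client.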
Optionally, the request message carries second storage location information corresponding to the second target content data, the second storage location information including a topic, a partition, and an offset;
the content data storage device further includes:
a second determining module, configured to determine, based on the topic, partition, and offset corresponding to the second target content data, whether the second target content data is stored in the distributed publish-subscribe message system or in the distributed file system.
Optionally, the second determining module includes:
a searching unit, configured to search stored index information for the topic, partition, and offset corresponding to the second target content data, wherein the index information includes the topics, partitions, and offsets of the content data stored in the distributed publish-subscribe message system and of the content data stored in the distributed file system;
a determining unit, configured to determine, based on the lookup result, whether the second target content data is stored in the distributed publish-subscribe message system or in the distributed file system.
Optionally, the content data storage device further includes:
a second storage module, configured to store the content data to be stored at the storage location in the distributed publish-subscribe message system that corresponds to the first target content data if, when a preset condition is met, it is determined that the first target content data has been stored in the distributed file system.
For the apparatus in the above embodiment, the specific manner in which each module performs its operations has been described in detail in the method embodiments and is not repeated here.
FIG. 9 is a block diagram illustrating a server in accordance with an example embodiment. Referring to fig. 9, the server includes:
a processor 301;
a memory 302 for storing instructions executable by the processor 301;
wherein the processor 301 is configured to execute the instructions to implement the content data storage method described in the above embodiments.
In an exemplary embodiment, a storage medium including instructions, such as a memory including instructions, is also provided; the instructions are executable by a processor of a server to perform the above method. Optionally, the storage medium may be a non-transitory computer-readable storage medium, for example a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, or an optical data storage device.
In an exemplary embodiment, a computer program product is also provided that includes one or more instructions executable by a processor of a server to perform the above-described method of content data storage.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims (10)

1. A content data storage method, characterized by comprising:
detecting the storage duration of content data stored in a distributed publish-subscribe message system according to a preset time interval;
determining first target content data with the storage duration being greater than or equal to a preset duration from the content data stored in the distributed publish-subscribe message system;
acquiring first storage location information of the first target content data in the distributed publish-subscribe message system;
storing the first target content data into a distributed file system based on the first storage location information, wherein the storage location of the first target content data in the distributed file system is determined based on the first storage location information.
2. The content data storage method according to claim 1, characterized by further comprising:
receiving a request message which is sent by a consumer client and used for requesting to acquire second target content data;
if it is determined that the second target content data is stored in the distributed publish-subscribe message system, reading the second target content data from the distributed publish-subscribe message system;
if the second target content data is determined to be stored in the distributed file system, reading the second target content data from the distributed file system;
and sending the read second target content data to the consumer client.
3. The content data storage method according to claim 2, wherein the request message carries second storage location information corresponding to the second target content data, and the second storage location information includes a topic, a partition, and an offset;
after receiving the request message sent by the consumer client for requesting to acquire the second target content data, the content data storage method further includes:
determining, based on the topic, the partition, and the offset corresponding to the second target content data, whether the second target content data is stored in the distributed publish-subscribe message system or in the distributed file system.
4. The content data storage method according to claim 3, wherein the determining, based on the topic, partition, and offset corresponding to the second target content data, whether the second target content data is stored in the distributed publish-subscribe message system or in the distributed file system comprises:
searching stored index information for the topic, partition, and offset corresponding to the second target content data, wherein the index information comprises the topics, partitions, and offsets of content data stored in the distributed publish-subscribe message system and the topics, partitions, and offsets of content data stored in the distributed file system;
determining, based on the lookup result, whether the second target content data is stored in the distributed publish-subscribe message system or in the distributed file system.
5. The content data storage method according to claim 1, characterized by further comprising:
if, when a preset condition is met, it is determined that the first target content data has been stored in the distributed file system, storing the content data to be stored at the storage location in the distributed publish-subscribe message system that corresponds to the first target content data.
6. A content data storage device, characterized in that the content data storage device comprises:
a detection module configured to detect, at a preset time interval, the storage duration of content data stored in a distributed publish-subscribe message system;
a first determining module configured to determine, from the content data stored in the distributed publish-subscribe message system, first target content data whose storage duration is greater than or equal to a preset duration;
an obtaining module configured to obtain first storage location information of the first target content data in the distributed publish-subscribe message system;
a first storage module configured to store the first target content data into a distributed file system based on the first storage location information, wherein the storage location of the first target content data in the distributed file system is determined based on the first storage location information.
7. The content data storage device according to claim 6, further comprising:
a receiving module configured to receive a request message sent by a consumer client for requesting to acquire second target content data;
a first reading module configured to read the second target content data from the distributed publish-subscribe message system if it is determined that the second target content data is stored in the distributed publish-subscribe message system;
a second reading module configured to read the second target content data from the distributed file system if it is determined that the second target content data is stored in the distributed file system;
a sending module configured to send the read second target content data to the consumer client.
8. The content data storage device according to claim 7, wherein the request message carries second storage location information corresponding to the second target content data, and the second storage location information includes a topic, a partition, and an offset;
the content data storage device further includes:
a second determining module configured to determine, based on the topic, partition, and offset corresponding to the second target content data, whether the second target content data is stored in the distributed publish-subscribe message system or in the distributed file system.
9. A server, characterized in that the server comprises:
a processor;
a memory for storing instructions executable by the processor;
wherein the processor is configured to execute the instructions to implement the content data storage method of any one of claims 1 to 5.
10. A storage medium, wherein instructions in the storage medium, when executed by a processor of a server, enable the server to perform the content data storage method according to any one of claims 1 to 5.
CN202010715874.0A 2020-07-23 2020-07-23 Content data storage method, device, server and storage medium Pending CN113971166A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010715874.0A CN113971166A (en) 2020-07-23 2020-07-23 Content data storage method, device, server and storage medium

Publications (1)

Publication Number Publication Date
CN113971166A true CN113971166A (en) 2022-01-25

Family

ID=79585200

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010715874.0A Pending CN113971166A (en) 2020-07-23 2020-07-23 Content data storage method, device, server and storage medium

Country Status (1)

Country Link
CN (1) CN113971166A (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105681397A (en) * 2015-12-30 2016-06-15 曙光信息产业(北京)有限公司 Network traffic data storage method and system, query method and device
CN105933326A (en) * 2016-06-08 2016-09-07 乐视控股(北京)有限公司 Remote terminal data reporting method and device
CN107038162A (en) * 2016-02-03 2017-08-11 滴滴(中国)科技有限公司 Real time data querying method and system based on database journal
CN108255875A (en) * 2016-12-29 2018-07-06 北京奇虎科技有限公司 Message is stored to the method and apparatus of distributed file system
CN108984564A (en) * 2017-06-02 2018-12-11 北京京东尚科信息技术有限公司 Data-storage system, method and apparatus
CN109617869A (en) * 2018-12-06 2019-04-12 中铁程科技有限责任公司 Inter-network log real-time collecting method and terminal
CN110908788A (en) * 2019-12-02 2020-03-24 北京锐安科技有限公司 Spark Streaming based data processing method and device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination