WO2015067115A1 - Deduplication method and apparatus of real-time system data - Google Patents

Deduplication method and apparatus of real-time system data

Info

Publication number
WO2015067115A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
real
time system
time
source
Prior art date
Application number
PCT/CN2014/088312
Other languages
French (fr)
Chinese (zh)
Inventor
杨基彬
Original Assignee
北京国双科技有限公司
Priority date
Filing date
Publication date
Application filed by 北京国双科技有限公司
Publication of WO2015067115A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/17 Details of further file system functions
    • G06F16/174 Redundancy elimination performed by the file system
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution
    • G06F16/24553 Query execution of query operations
    • G06F16/24554 Unary operations; Data partitioning operations
    • G06F16/24556 Aggregation; Duplicate elimination

Definitions

  • the present invention relates to the field of computers, and in particular to a method and apparatus for deduplicating real-time system data.
  • StreamInsight is a real-time data stream processing framework provided by Microsoft Corporation of the United States for efficient real-time computing.
  • StreamInsight has no built-in support for deduplicated statistics, although practical applications sometimes need them.
  • Real-time processing systems usually fetch the latest data at regular intervals, which makes real-time management of the system convenient but also produces a large volume of data.
  • Real-time system data generally has a data cycle, so the same data is often counted more than once when statistics are computed.
  • For example, a real-time video processing system must receive playback data in real time: each play Id sends a snapshot at regular intervals that carries the latest playback information.
  • The prior-art solution counts plays using stream clipping.
  • Each item of playback data is first given a lifetime; if the play count is computed once per minute, the lifetime of the playback data is set to 1 minute.
  • When a new snapshot with the same play Id arrives, the lifetime of the old snapshot is truncated at the start time of the new snapshot.
  • The main object of the present invention is to provide a method and apparatus for deduplicating real-time system data, so as to solve the problem that real-time system data is deduplicated inefficiently.
  • A method for deduplicating real-time system data includes: receiving real-time system data; determining whether the data source of first data is the same as the data source of the real-time system data, wherein the first data is data stored in a data buffer; when the data source of the first data is determined to be the same as the data source of the real-time system data, deleting from the first data the data whose data source is the same as that of the real-time system data; and temporarily storing the real-time system data in the data buffer.
  • There are multiple items of real-time system data, and determining whether the data source of the first data is the same as the data source of the real-time system data includes: each time an item of real-time system data is received, determining once whether the data source of the first data is the same as the data source of that item.
  • The deduplication method further includes: storing the data in the data buffer to the target storage area; and clearing the data in the data buffer.
  • the data in the data buffer is stored to the target storage area every predetermined time interval.
  • the real-time system data is data from a video real-time processing system or a webpage real-time processing system.
  • A deduplication device for real-time system data is provided, the device being mainly used to perform any of the deduplication methods for real-time system data provided above.
  • A deduplication device for real-time system data comprises: a receiving unit, configured to receive real-time system data; a determining unit, configured to determine whether the data source of the first data is the same as the data source of the real-time system data, wherein the first data is data stored in the data buffer; a deleting unit, configured to delete from the first data, when the two data sources are determined to be the same, the data whose data source is the same as that of the real-time system data; and a temporary storage unit, configured to temporarily store the real-time system data in the data buffer.
  • The determining unit includes: a determining subunit, configured to determine, each time an item of real-time system data is received, whether the data source of the first data is the same as the data source of that item.
  • The determining subunit includes: a determining module, configured to check whether the identifier (ID) of each item of first data is the same as the identifier of the real-time system data, and thereby determine whether the data source of the first data is the same as the data source of the real-time system data.
  • the deduplication device further includes: a storage unit configured to store data in the data buffer to the target storage area; and a clearing unit to clear data in the data buffer.
  • the storage unit is configured to store data in the data buffer to the target storage area every predetermined time interval.
  • the real-time system data is data from a video real-time processing system or a webpage real-time processing system.
  • The invention receives real-time system data; determines whether the data source of the first data is the same as the data source of the real-time system data, wherein the first data is data stored in the data buffer; when the two data sources are determined to be the same, deletes from the first data the data whose data source is the same as that of the real-time system data; and temporarily stores the real-time system data in the data buffer.
  • A data buffer is created before real-time system data is received, and the real-time system data is staged in the buffer. Before an item is staged, it is checked against the buffer for data from the same data source.
  • If the buffer holds data from the same data source, that data is deleted first, so duplicates from a data source are removed before the next processing step. This prevents large amounts of data from the same data source from entering the real-time processing system directly for deduplication.
  • This solves the problem of inefficient deduplication in real-time systems and thereby improves the efficiency of the real-time system.
  • FIG. 1 is a flow chart of a real-time system data deduplication method according to an embodiment of the present invention
  • FIG. 2 is a schematic diagram of data lifetimes before stream clipping according to the prior art
  • FIG. 3 is a schematic diagram of data lifetimes after stream clipping according to the prior art
  • FIG. 4 is a schematic structural diagram of a real-time system data deduplication apparatus according to an embodiment of the present invention.
  • the invention provides a deduplication method for real-time system data.
  • the following describes the de-duplication method of the real-time system data of the present invention:
  • FIG. 1 is a flow chart of a real-time system data deduplication method according to an embodiment of the present invention. As shown in FIG. 1, the method includes the following steps S102 to S108:
  • Step S102: receive real-time system data. Specifically, a data buffer is allocated in memory before any data is received.
  • Step S104: determine whether the data source of the first data is the same as the data source of the real-time system data, wherein the first data is data stored in the data buffer.
  • Step S106: when the data source of the first data is determined to be the same as the data source of the real-time system data, delete from the first data the data whose data source is the same as that of the real-time system data.
  • Step S108: temporarily store the real-time system data in the data buffer.
  • In the deduplication method of this embodiment, a data buffer is created before real-time system data is received, and the real-time system data is staged in the buffer. Before an item is staged, it is checked: if the buffer holds data from the same data source, that data is deleted from the buffer first. Duplicate data from a data source is thus removed before the next processing step, which prevents large amounts of data from the same data source from entering the real-time processing system directly for deduplication. This solves the problem of inefficient deduplication in real-time systems and improves the efficiency with which the real-time system processes data.
  • Each data source of the real-time system sends its latest data at regular intervals so that the running state of the system can be monitored. Each time a new item is received, one check is made: the data sources of the items in the data buffer are compared with the data source of the received item. If an item in the data buffer has the same data source as the received item, the old item is deleted from the buffer and the latest item from that data source is staged.
  • Whether the data source of the first data is the same as the data source of the real-time system data is determined by checking whether the identifier (ID) of each item of first data is the same as the identifier of the real-time system data. When the identifier of an item of first data is the same as the identifier of the real-time system data, the two data sources are determined to be the same; otherwise they are determined to be different.
  • The deduplication method of this embodiment further includes: storing the data in the data buffer to the target storage area at predetermined time intervals, and clearing the data in the data buffer.
  • After the data is stored, the buffer is cleared so that it can stage data received afterwards.
  • The length of the predetermined time can be chosen according to the downstream processing system's requirements for real-time data.
  • In the deduplication method of the present invention, a data buffer is created before data is input into StreamInsight. "Play 1 snapshot 1" is received at 00:00:00; since the data buffer is empty at that time, it is placed directly in the data buffer. "Play 1 snapshot 2" is then received at 00:00:05.
  • The present invention also provides a deduplication device for real-time system data, which is mainly used to carry out the deduplication method for real-time system data provided above. The device is described in detail below.
  • the device mainly includes a receiving unit 10, a determining unit 20, a deleting unit 30, and a temporary storage unit 40, wherein:
  • the receiving unit 10 is configured to receive real-time system data. Specifically, a data buffer is opened in the memory before receiving the data.
  • the determining unit 20 is configured to determine whether the data source of the first data is the same as the data source of the real-time system data, wherein the first data is data stored in the data buffer.
  • the deleting unit 30 is configured to delete from the first data, when the data source of the first data is determined to be the same as the data source of the real-time system data, the data whose data source is the same as that of the real-time system data.
  • the temporary storage unit 40 is configured to temporarily store real-time system data to the data buffer.
  • With the deduplication device of this embodiment, a data buffer is created before real-time system data is received, and the real-time system data is staged in the data buffer. Before an item is staged, it is checked: if the buffer holds data from the same data source, that data is deleted from the buffer first. Duplicate data from a data source is thus removed before the next processing step, which prevents large amounts of data from the same data source from entering the real-time processing system directly for deduplication. This solves the problem of inefficient deduplication in real-time systems and improves the efficiency with which the real-time system processes data.
  • the real-time system data is multiple, and the determining unit 20 includes a determining sub-unit, configured to determine whether the data source of the first data and the data source of the real-time system data are the same each time a real-time system data is received.
  • Each data source of the real-time system sends its latest data at regular intervals so that the running state of the system can be monitored. Each time a new item is received, one check is made: the data sources of the items in the data buffer are compared with the data source of the received item. If an item in the data buffer has the same data source as the received item, the old item is deleted from the buffer and the latest item from that data source is staged.
  • The determining subunit includes a determining module, configured to check whether the identifier (ID) of each item of first data is the same as the identifier of the real-time system data, and thereby determine whether the data source of the first data is the same as the data source of the real-time system data. When the identifier of an item of first data is the same as the identifier of the real-time system data, the two data sources are determined to be the same; otherwise they are determined to be different.
  • Determining sameness by identifier in this way is also called the self-join determination method. Taking real-time system data from a real-time video processing system as an example, assume that a collection of many "play records" is being processed, each of which has a unique play Id.
  • To determine whether the data source of newly received video playback data is the same as that of already recorded video playback data, the set of newly received video playback data is joined with the set of already recorded video playback data.
  • The play Id serves as the join condition: a play record with any given play Id is taken from the newly received set, and the already recorded set is searched for a play record with the same play Id.
  • the deduplication device further comprises a storage unit for storing data in the data buffer to the target storage area; and a clearing unit for clearing data in the data buffer.
  • The storage unit mainly stores the data in the data buffer to the target storage area at predetermined time intervals. To output the latest data of the real-time system promptly, so that the running state of the system can be obtained in real time, the deduplicated data must be stored at regular intervals for the next processing step. After the data is stored, the buffer is cleared so that it can stage data received afterwards. The length of the predetermined time can be chosen according to the downstream processing system's requirements for real-time data.
  • The deduplication device of this embodiment may be used to count the number of video plays over a period of time, or to count the number of web page views over a period of time.
  • The real-time system data may be data from a real-time video processing system, or may be data from a real-time web page processing system.
  • The specific way in which the deduplication device counts video plays over a period of time is the same as in the deduplication method for real-time system data described above, and is not repeated here.
  • The present invention solves the problem of inefficient deduplication in real-time systems and thereby improves the data processing efficiency of the real-time system.
  • The modules or steps of the present invention described above can be implemented by a general-purpose computing device, and can be concentrated on a single computing device or distributed across a network of multiple computing devices. Alternatively, they may be implemented as program code executable by a computing device, so that they can be stored in a storage device and executed by the computing device; or they may each be fabricated as an individual integrated circuit module, or several of the modules or steps may be implemented as a single integrated circuit module. Thus, the invention is not limited to any specific combination of hardware and software.

Abstract

Disclosed are a deduplication method and apparatus for real-time system data. The deduplication method comprises: receiving real-time system data; determining whether the data source of first data is the same as the data source of the real-time system data, the first data being data stored in a data buffer; if the data source of the first data is determined to be the same as the data source of the real-time system data, deleting from the first data the data whose data source is the same as that of the real-time system data; and temporarily storing the real-time system data in the data buffer. The present invention solves the problem of low deduplication efficiency in a real-time system, thereby improving the efficiency of the real-time system.

Description

Deduplication method and apparatus for real-time system data
Technical field
The present invention relates to the field of computers, and in particular to a method and apparatus for deduplicating real-time system data.
Background
StreamInsight is a real-time data stream processing framework provided by Microsoft Corporation of the United States that can be used for efficient real-time computation. However, StreamInsight has no built-in support for deduplicated statistics, although practical applications sometimes need them.
A real-time processing system usually fetches the latest data at regular intervals. This makes real-time management of the system convenient, but it also produces a large volume of data. In addition, real-time system data generally has a data cycle, so the same data is often counted more than once when statistics are computed. Take a real-time video processing system that must receive playback data in real time: each play Id sends a snapshot at regular intervals that carries the latest playback information.
The prior-art solution counts plays using stream clipping. With stream clipping, each item of playback data is first given a lifetime. If the play count is computed once per minute, the lifetime of the playback data is set to 1 minute. When a new snapshot with the same play Id arrives, the lifetime of the old snapshot is truncated at the start time of the new snapshot. To compute the play count for the interval from 00:00:00 to 00:00:59, it is then enough to count the snapshots that exist at the 59-second instant. Although stream clipping appears to make counting plays easy, it is not easy to implement, because it requires a self-join of the stream. If there are 100,000 playback snapshots within 1 minute, this amounts to computing the Cartesian product of 100,000 snapshot entries with 100,000 snapshot entries and then filtering out the records that satisfy the filter condition. This computation consumes a great deal of CPU and memory.
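To make the cost described above concrete, the following Python sketch imitates stream clipping with a naive self-join. It is not StreamInsight code: the `Snapshot` class, the function names, and the sample stream are all hypothetical, and lifetimes are modelled as plain second offsets. The only point is that every snapshot is compared with every snapshot in the stream, so 100,000 snapshots would require 100,000 × 100,000 = 10^10 pair checks.

```python
from dataclasses import dataclass
from itertools import product


@dataclass
class Snapshot:
    play_id: str
    start: int    # seconds since the start of the window
    end: int = 0  # end of lifetime (exclusive); filled in by clipping


def clip_by_self_join(snapshots, lifetime=60):
    """Clip each snapshot's lifetime at the next snapshot with the same play Id.

    Deliberately naive: it walks the full Cartesian product of the stream
    with itself, which is the cost the background section describes.
    """
    clipped = [Snapshot(s.play_id, s.start, s.start + lifetime) for s in snapshots]
    pairs_examined = 0
    for old, new in product(clipped, snapshots):
        pairs_examined += 1
        if old.play_id == new.play_id and old.start < new.start < old.end:
            old.end = new.start  # truncate the old snapshot at the new one's start
    return clipped, pairs_examined


def count_alive(clipped, instant):
    """Count snapshots whose (clipped) lifetime covers the given instant."""
    return sum(1 for s in clipped if s.start <= instant < s.end)


# Hypothetical stream: three snapshots of play1 and one of play2.
stream = [Snapshot("play1", 0), Snapshot("play1", 5),
          Snapshot("play2", 10), Snapshot("play1", 30)]
clipped, pairs = clip_by_self_join(stream)
plays_at_59 = count_alive(clipped, 59)
print(plays_at_59, pairs)  # prints "2 16"
```

Even this four-snapshot stream needs 16 pair checks to establish that two plays are alive at the 59-second instant; the number of checks grows with the square of the stream size.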
No effective solution has yet been proposed for the low efficiency of deduplicating real-time system data in the related art.
Summary of the invention
The main object of the present invention is to provide a method and apparatus for deduplicating real-time system data, so as to solve the problem that real-time system data is deduplicated inefficiently.
According to one aspect of the present invention, a method for deduplicating real-time system data is provided, comprising: receiving real-time system data; determining whether the data source of first data is the same as the data source of the real-time system data, wherein the first data is data stored in a data buffer; when the data source of the first data is determined to be the same as the data source of the real-time system data, deleting from the first data the data whose data source is the same as that of the real-time system data; and temporarily storing the real-time system data in the data buffer.
Optionally, there are multiple items of real-time system data, and determining whether the data source of the first data is the same as the data source of the real-time system data comprises: each time an item of real-time system data is received, determining once whether the data source of the first data is the same as the data source of that item.
Optionally, whether the data source of the first data is the same as the data source of the real-time system data is determined by checking whether the identifier (ID) of each item of first data is the same as the identifier of the real-time system data.
Optionally, after temporarily storing the real-time system data in the data buffer, the deduplication method further comprises: storing the data in the data buffer to a target storage area; and clearing the data in the data buffer.
Optionally, the data in the data buffer is stored to the target storage area at predetermined time intervals.
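The optional store-and-clear behaviour can be sketched as follows. This is a minimal illustration under stated assumptions, not the applicant's implementation: the class name `FlushingBuffer`, the `sink` callback standing in for the target storage area, the 60-second default interval, and the injectable `clock` are all hypothetical. The flush is checked whenever an item arrives rather than by a background timer, which is a simplification.

```python
import time


class FlushingBuffer:
    """Stage the latest item per source and flush it at a fixed interval."""

    def __init__(self, sink, interval_seconds=60.0, clock=time.monotonic):
        self._sink = sink                  # stands in for the target storage area
        self._interval = interval_seconds  # the "predetermined time"
        self._clock = clock                # injectable so the sketch is testable
        self._items = {}                   # source id -> latest item
        self._last_flush = clock()

    def receive(self, source_id, item):
        # A newer item from the same source replaces the older one.
        self._items[source_id] = item
        self.flush_if_due()

    def flush_if_due(self):
        now = self._clock()
        if now - self._last_flush < self._interval:
            return False
        self._sink(list(self._items.values()))  # store to the target area
        self._items.clear()                     # then clear the buffer
        self._last_flush = now
        return True


# Usage with a fake clock so no real waiting is needed.
fake_now = [0.0]
batches = []
buf = FlushingBuffer(batches.append, clock=lambda: fake_now[0])
buf.receive("play1", "snapshot 1")
fake_now[0] = 5.0
buf.receive("play1", "snapshot 2")
fake_now[0] = 60.0
buf.receive("play2", "snapshot 1")
print(batches)  # prints [['snapshot 2', 'snapshot 1']]
```

Passing the clock in as a parameter keeps the interval logic deterministic in tests; a production system would more likely drive the flush from a timer so that data is still written out when no new items arrive.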
Optionally, the real-time system data is data from a real-time video processing system or a real-time web page processing system.
According to another aspect of the present invention, a deduplication device for real-time system data is provided. The device is mainly used to perform any of the deduplication methods for real-time system data provided above.
To achieve the above object, according to another aspect of the present invention, a deduplication device for real-time system data is provided, comprising: a receiving unit, configured to receive real-time system data; a determining unit, configured to determine whether the data source of first data is the same as the data source of the real-time system data, wherein the first data is data stored in a data buffer; a deleting unit, configured to delete from the first data, when the data source of the first data is determined to be the same as the data source of the real-time system data, the data whose data source is the same as that of the real-time system data; and a temporary storage unit, configured to temporarily store the real-time system data in the data buffer.
Optionally, there are multiple items of real-time system data, and the determining unit comprises: a determining subunit, configured to determine, each time an item of real-time system data is received, whether the data source of the first data is the same as the data source of that item.
Optionally, the determining subunit comprises: a determining module, configured to check whether the identifier (ID) of each item of first data is the same as the identifier of the real-time system data, and thereby determine whether the data source of the first data is the same as the data source of the real-time system data.
Optionally, the deduplication device further comprises: a storage unit, configured to store the data in the data buffer to a target storage area; and a clearing unit, configured to clear the data in the data buffer.
Optionally, the storage unit is configured to store the data in the data buffer to the target storage area at predetermined time intervals.
Optionally, the real-time system data is data from a real-time video processing system or a real-time web page processing system.
The present invention receives real-time system data; determines whether the data source of first data is the same as the data source of the real-time system data, wherein the first data is data stored in a data buffer; when the two data sources are determined to be the same, deletes from the first data the data whose data source is the same as that of the real-time system data; and temporarily stores the real-time system data in the data buffer. A data buffer is created before real-time system data is received, and the real-time system data is staged in the buffer. Before an item is staged, it is checked: if the buffer holds data from the same data source, that data is deleted from the data buffer first. Duplicate data from a data source is thus removed before the next processing step, which prevents large amounts of data from the same data source from entering the real-time processing system directly for deduplication. This solves the problem of inefficient deduplication in real-time systems and thereby improves the efficiency of the real-time system.
Brief description of the drawings
The accompanying drawings, which form a part of this application, are provided for a further understanding of the present invention. The illustrative embodiments of the present invention and their description are used to explain the present invention and do not unduly limit it. In the drawings:
FIG. 1 is a flowchart of a real-time system data deduplication method according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of data lifetimes before stream clipping according to the prior art;
FIG. 3 is a schematic diagram of data lifetimes after stream clipping according to the prior art; and
FIG. 4 is a schematic structural diagram of a real-time system data deduplication device according to an embodiment of the present invention.
Detailed description
It should be noted that, where no conflict arises, the embodiments in this application and the features in the embodiments may be combined with one another. The present invention is described in detail below with reference to the drawings and in conjunction with the embodiments.
The present invention provides a deduplication method for real-time system data, which is described in detail below.
FIG. 1 is a flowchart of a real-time system data deduplication method according to an embodiment of the present invention. As shown in FIG. 1, the method includes the following steps S102 to S108:
Step S102: receive real-time system data. Specifically, a data buffer is allocated in memory before any data is received.
Step S104: determine whether the data source of the first data is the same as the data source of the real-time system data, wherein the first data is data stored in the data buffer.
Step S106: when the data source of the first data is determined to be the same as the data source of the real-time system data, delete from the first data the data whose data source is the same as that of the real-time system data.
Step S108: temporarily store the real-time system data in the data buffer.
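Steps S102 to S108 can be sketched in Python as follows. This is a minimal illustration rather than the applicant's implementation: the class and method names are hypothetical, and keying the buffer by a source identifier in a dictionary is an assumption, since the text only requires that buffered data from the same data source be deleted before the new data is stored.

```python
class DedupBuffer:
    """In-memory staging buffer that keeps one item per data source."""

    def __init__(self):
        # Precondition of S102: the buffer exists before any data is received.
        self._items = {}

    def receive(self, source_id, item):
        """S102: accept one item of real-time system data."""
        # S104: is there buffered ("first") data from the same data source?
        if source_id in self._items:
            # S106: delete the stale item from that data source.
            del self._items[source_id]
        # S108: stage the newly received item.
        self._items[source_id] = item

    def contents(self):
        """Return a copy of the staged data, keyed by source id."""
        return dict(self._items)


# Hypothetical playback stream: two snapshots of play 1, one of play 2.
buffer = DedupBuffer()
buffer.receive("play 1", "snapshot 1")
buffer.receive("play 1", "snapshot 2")
buffer.receive("play 2", "snapshot 1")
staged = buffer.contents()
print(staged)  # prints {'play 1': 'snapshot 2', 'play 2': 'snapshot 1'}
```

With a dictionary, the check in S104 is a single keyed lookup per received item, so the downstream system sees at most one item per data source instead of every snapshot.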
本发明实施例的实时系统数据的去重方法,在接收实时系统数据之前先建立一个数据缓冲区,通过将实时系统数据暂存至缓冲区,且在数据暂存至缓冲区之前,先对数据进行判断,如果缓冲区中有来自同一个数据源的数据则先删除数据缓冲区中的这个数据源的数据,在进行下一步的处理之前就将数据源的重复数据去重,避免了同一个数据源的大量数据直接进入实时处理系统进行去重,解决了实时系统去重效率低的问题,进而达到了提高实时系统处理数据效率的效果。The deduplication method of the real-time system data in the embodiment of the present invention first establishes a data buffer before receiving the real-time system data, and temporarily stores the real-time system data into the buffer, and before the data is temporarily stored in the buffer, the data is first Judging, if there is data from the same data source in the buffer, the data of the data source in the data buffer is deleted first, and the duplicate data of the data source is deduplicated before proceeding to the next step, avoiding the same A large amount of data of the data source directly enters the real-time processing system for deduplication, and solves the problem that the real-time system has low de-emphasis efficiency, thereby achieving the effect of improving the efficiency of the real-time system processing data.
Optionally, there are multiple pieces of real-time system data, and determining whether the data source of the first data is the same as the data source of the real-time system data includes: each time a piece of real-time system data is received, determining once whether the data source of the first data is the same as the data source of the received real-time system data. Each data source of the real-time system sends its latest data at regular intervals so that the running state of the system can be monitored. Each time new data is received, one check is performed: the data sources of the data in the buffer are compared with the data source of the newly received data. If some data in the buffer has the same data source as the newly received data, the old data is deleted from the buffer and the latest data from that source is stored in its place.
Optionally, in this embodiment, whether the data source of the first data is the same as that of the real-time system data may be determined by checking whether the identifier (ID) of each piece of first data is the same as the ID of the real-time system data. If the ID of a piece of first data matches the ID of the real-time system data, that piece of first data is determined to have the same data source as the real-time system data; otherwise, the data sources are different. This ID-based check is also referred to as a self-join check. Taking data from a real-time video processing system as an example: suppose a set containing many "play records" is being processed, and each play record has a unique play Id. To determine whether newly received video play data has the same data source as already recorded video play data, the set of newly received data is joined with the set of recorded data, using the play Id as the join condition. That is, for any play record selected from the newly received set, the recorded set is searched for a play record with the same play Id.
Using a one-to-one self-join to determine whether the data in the buffer and the real-time system data come from the same source is far more efficient than a many-to-many self-join over the data.
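The difference in cost can be illustrated with a small sketch. It assumes that one "comparison" is one equality test on the play Id, and that the buffer is a plain list scanned once per received record; `PlayRecord`, `pairwise_comparisons`, and `incremental_comparisons` are hypothetical names introduced only for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlayRecord:
    play_id: str
    snapshot: int


def same_source(buffered, incoming):
    # Two records share a data source when their play Ids are equal.
    return buffered.play_id == incoming.play_id


def pairwise_comparisons(n_records):
    # A many-to-many self-join compares every record with every record.
    return n_records * n_records


def incremental_comparisons(records):
    # Each received record is compared only with what is currently buffered,
    # and the buffer never holds more than one record per play Id.
    buffer = []
    comparisons = 0
    for rec in records:
        comparisons += len(buffer)
        buffer = [r for r in buffer if not same_source(r, rec)]
        buffer.append(rec)
    return comparisons
```

For a stream of three snapshots of one play followed by two snapshots of another, the incremental check performs 5 Id comparisons, against 25 for the pairwise join over the same five records. The gap widens as the number of snapshots per source grows.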
Optionally, after the real-time system data is temporarily stored in the data buffer, the deduplication method of this embodiment further includes: at predetermined intervals, storing the data in the data buffer to a target storage area and clearing the data buffer.
So that the latest data of the real-time system is output promptly and the running state of the real-time system can be obtained in real time, the deduplicated data needs to be stored at intervals for the next processing stage. After the data is stored, the buffer is cleared so that it can buffer subsequently received data. The length of the predetermined interval can be chosen according to how fresh the downstream processing system requires the data to be.
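A periodic flush of this kind might look like the sketch below. The names `FlushingBuffer`, `sink`, and `clock` are assumptions for illustration: `sink` stands in for the target storage area and can be any callable that accepts a list, and `clock` is injectable so that the interval logic can be exercised without waiting.

```python
import time


class FlushingBuffer:
    """Deduplicating buffer that is stored and cleared at a fixed interval."""

    def __init__(self, interval_seconds, sink, clock=time.monotonic):
        self._interval = interval_seconds
        self._sink = sink            # hypothetical target storage (a callable)
        self._clock = clock
        self._latest = {}
        self._last_flush = clock()

    def receive(self, source_id, payload):
        # Overwriting the key deletes the old data and stores the new data.
        self._latest[source_id] = payload
        self.flush_if_due()

    def flush_if_due(self):
        # Once the predetermined interval has elapsed, store the buffered
        # data to the target storage and then clear the buffer.
        if self._clock() - self._last_flush >= self._interval:
            self._sink(list(self._latest.items()))
            self._latest.clear()
            self._last_flush = self._clock()
```

A shorter interval gives the downstream system fresher data at the cost of more frequent writes; a longer interval removes more duplicates per flush.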
Optionally, the deduplication method of this embodiment can be used to count the number of video plays within a period of time, or the number of web page views within a period of time. In other words, in the deduplication method provided by this embodiment, the real-time system data may come from a real-time video processing system or from a real-time web page processing system. For a better understanding of the embodiment, the method is described below using the example of counting video plays within a period of time.
The snapshots received between 00:00:00 and 00:00:59 are shown in the following table:
Time    | 00:00:00          | 00:00:05          | 00:00:10          | 00:00:15          | 00:00:20
Play Id | Play 1 snapshot 1 | Play 1 snapshot 2 | Play 1 snapshot 3 | Play 2 snapshot 1 | Play 2 snapshot 2
Although the table contains five snapshots, they belong to only two plays, play 1 and play 2. Assuming each piece of data has a lifetime of 1 minute, counting plays over the period 00:00:00 to 00:00:59 yields 5 plays (as shown in FIG. 2), when in fact there were only 2. To obtain an accurate count, stream clipping is applied before counting: when a new snapshot with the same ID arrives, the lifetime of the old snapshot is truncated to the start time of the new snapshot. The snapshot lifetimes after clipping are shown in FIG. 3, and the play count for 00:00:00 to 00:00:59 after clipping is 2.
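The truncation rule can be sketched in a few lines. This is an illustration only, not StreamInsight code: times are expressed as seconds after 00:00:00, the function names are hypothetical, and counting the lifetimes that are active at one instant is just one possible way to read the 5-versus-2 result.

```python
def with_lifetimes(snapshots, lifetime=60):
    # snapshots: list of (start_seconds, play_id, label), sorted by start.
    return [(s, s + lifetime, pid, lab) for s, pid, lab in snapshots]


def clip_lifetimes(snapshots, lifetime=60):
    """Truncate each snapshot's lifetime when a newer one with the same ID starts."""
    last_index = {}   # play_id -> index of its most recent snapshot in result
    result = []
    for start, play_id, label in snapshots:
        if play_id in last_index:
            i = last_index[play_id]
            s, e, pid, lab = result[i]
            # The old snapshot now ends where the new snapshot begins.
            result[i] = (s, min(e, start), pid, lab)
        last_index[play_id] = len(result)
        result.append((start, start + lifetime, play_id, label))
    return result


def active_at(intervals, t):
    # Number of lifetimes [start, end) that contain instant t.
    return sum(1 for s, e, _pid, _lab in intervals if s <= t < e)


snapshots = [
    (0, "play 1", "snapshot 1"),
    (5, "play 1", "snapshot 2"),
    (10, "play 1", "snapshot 3"),
    (15, "play 2", "snapshot 1"),
    (20, "play 2", "snapshot 2"),
]
```

For the five snapshots in the table, `active_at(with_lifetimes(snapshots), 30)` is 5, while `active_at(clip_lifetimes(snapshots), 30)` is 2, because only the newest snapshot of each play is still alive at that instant.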
If the number of snapshots is large, for example 100,000 play snapshots between 00:00:00 and 00:00:59, then stream clipping in StreamInsight requires a self-join on the order of 100,000 × 100,000 comparisons to deduplicate all the snapshots, which is inefficient. The deduplication method of the present invention instead establishes a data buffer before the data is fed into StreamInsight. At 00:00:00, "play 1 snapshot 1" is received; because the buffer is empty at this point, it is placed directly in the buffer. At 00:00:05, "play 1 snapshot 2" is received and a check is performed first: since "play 1 snapshot 1" and "play 1 snapshot 2" both come from the data source "play 1", "play 1 snapshot 1" is deleted and "play 1 snapshot 2" is saved to the buffer. Data continues to be received in this way, and at 00:00:59 the buffer holds only two pieces of data, "play 1 snapshot 3" and "play 2 snapshot 2", meaning that there were two plays between 00:00:00 and 00:00:59.
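The walkthrough above can be replayed with a short sketch. The function name `dedup_stream` and the tuple layout `(received_at, play_id, snapshot)` are assumptions for the example; the buffer is modeled as a dictionary keyed by play Id.

```python
def dedup_stream(snapshots):
    """Replay snapshots through a buffer that keeps one entry per play Id."""
    buffer = {}
    for received_at, play_id, snapshot in snapshots:
        # Delete any older snapshot from the same data source, if present.
        buffer.pop(play_id, None)
        # Store the newly received snapshot.
        buffer[play_id] = (received_at, snapshot)
    return buffer


received = [
    ("00:00:00", "play 1", "snapshot 1"),
    ("00:00:05", "play 1", "snapshot 2"),
    ("00:00:10", "play 1", "snapshot 3"),
    ("00:00:15", "play 2", "snapshot 1"),
    ("00:00:20", "play 2", "snapshot 2"),
]
remaining = dedup_stream(received)
```

After the five snapshots have been received, `remaining` holds two entries, "play 1" with snapshot 3 and "play 2" with snapshot 2, so `len(remaining)` gives the play count of 2 for the period.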
The present invention also provides a real-time system data deduplication apparatus, which is mainly used to implement the deduplication method provided by the embodiments described above. The apparatus is described in detail below:
FIG. 4 is a schematic structural diagram of a real-time system data deduplication apparatus according to an embodiment of the present invention. As shown in FIG. 4, the apparatus mainly includes a receiving unit 10, a determining unit 20, a deleting unit 30, and a temporary storage unit 40, where:
The receiving unit 10 is configured to receive real-time system data. Specifically, a data buffer is allocated in memory before any data is received.
The determining unit 20 is configured to determine whether the data source of the first data is the same as the data source of the real-time system data, where the first data is the data stored in the data buffer.
The deleting unit 30 is configured to delete from the first data, when the data source of the first data is determined to be the same as the data source of the real-time system data, the data whose data source is the same as that of the real-time system data.
The temporary storage unit 40 is configured to temporarily store the real-time system data in the data buffer.
In the deduplication apparatus of this embodiment, a data buffer is established before any real-time system data is received, and incoming real-time system data is temporarily stored in that buffer. Before a piece of data is stored, it is checked: if the buffer already holds data from the same data source, that older data is deleted from the buffer first. Duplicate data from a data source is thus removed before the next processing stage, so large volumes of data from the same source no longer enter the real-time processing system to be deduplicated there. This solves the problem of inefficient deduplication in real-time systems and improves the efficiency with which the real-time system processes data.
Optionally, there are multiple pieces of real-time system data, and the determining unit 20 includes a determining subunit configured to determine, each time a piece of real-time system data is received, whether the data source of the first data is the same as the data source of that real-time system data. Each data source of the real-time system sends its latest data at regular intervals so that the running state of the system can be monitored. Each time new data is received, one check is performed: the data sources of the data in the buffer are compared with the data source of the newly received data. If some data in the buffer has the same data source as the newly received data, the old data is deleted from the buffer and the latest data from that source is stored in its place.
Optionally, the determining subunit includes a determining module configured to check whether the identifier (ID) of each piece of first data is the same as the ID of the real-time system data, and thereby determine whether the data source of the first data is the same as that of the real-time system data. If the ID of a piece of first data matches the ID of the real-time system data, that piece of first data is determined to have the same data source as the real-time system data; otherwise, the data sources are different. This ID-based check is also referred to as a self-join check. Taking data from a real-time video processing system as an example: suppose a set containing many "play records" is being processed, and each play record has a unique play Id. To determine whether newly received video play data has the same data source as already recorded video play data, the set of newly received data is joined with the set of recorded data, using the play Id as the join condition. That is, for any play record selected from the newly received set, the recorded set is searched for a play record with the same play Id.
Using a one-to-one self-join to determine whether the data in the buffer and the real-time system data come from the same source is far more efficient than a many-to-many self-join over the data.
Optionally, the deduplication apparatus further includes a storage unit, configured to store the data in the data buffer to a target storage area, and a clearing unit, configured to clear the data in the data buffer. The storage unit stores the data in the data buffer to the target storage area at predetermined intervals. So that the latest data of the real-time system is output promptly and the running state of the real-time system can be obtained in real time, the deduplicated data needs to be stored at intervals for the next processing stage. After the data is stored, the buffer is cleared so that it can buffer subsequently received data. The length of the predetermined interval can be chosen according to how fresh the downstream processing system requires the data to be.
Optionally, the deduplication apparatus of this embodiment can be used to count the number of video plays within a period of time, or the number of web page views within a period of time. In other words, in the deduplication apparatus provided by this embodiment, the real-time system data may come from a real-time video processing system or from a real-time web page processing system. The specific way in which the apparatus counts video plays within a period of time is the same as described above for the deduplication method and is not repeated here.
As the above description shows, the present invention solves the problem of inefficient deduplication in real-time systems and thereby improves the data processing efficiency of the real-time system.
Obviously, those skilled in the art should understand that the modules or steps of the present invention described above can be implemented with general-purpose computing devices. They may be concentrated on a single computing device or distributed over a network of multiple computing devices. Optionally, they may be implemented as program code executable by a computing device, so that they can be stored in a storage device and executed by the computing device; alternatively, they may each be fabricated as separate integrated circuit modules, or several of the modules or steps may be fabricated as a single integrated circuit module. The present invention is therefore not limited to any particular combination of hardware and software.
The above describes only preferred embodiments of the present invention and is not intended to limit it; those skilled in the art may make various modifications and changes to the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims (12)

  1. A method for deduplicating real-time system data, comprising:
    receiving the real-time system data;
    determining whether a data source of first data is the same as a data source of the real-time system data, wherein the first data is data stored in a data buffer;
    if the data source of the first data is determined to be the same as the data source of the real-time system data, deleting, from the first data, the data whose data source is the same as that of the real-time system data; and
    temporarily storing the real-time system data in the data buffer.
  2. The deduplication method according to claim 1, wherein there are multiple pieces of the real-time system data, and determining whether the data source of the first data is the same as the data source of the real-time system data comprises: each time a piece of the real-time system data is received, determining once whether the data source of the first data is the same as the data source of the real-time system data.
  3. The deduplication method according to claim 2, wherein whether the data source of the first data is the same as the data source of the real-time system data is determined by determining whether an identifier (ID) of each piece of the first data is the same as an ID of the real-time system data.
  4. The deduplication method according to claim 1, wherein after temporarily storing the real-time system data in the data buffer, the deduplication method further comprises:
    storing the data in the data buffer to a target storage area; and
    clearing the data in the data buffer.
  5. The deduplication method according to claim 4, wherein the data in the data buffer is stored to the target storage area at predetermined intervals.
  6. The deduplication method according to claim 1, wherein the real-time system data is data from a real-time video processing system or a real-time web page processing system.
  7. An apparatus for deduplicating real-time system data, comprising:
    a receiving unit, configured to receive the real-time system data;
    a determining unit, configured to determine whether a data source of first data is the same as a data source of the real-time system data, wherein the first data is data stored in a data buffer;
    a deleting unit, configured to delete from the first data, when the data source of the first data is determined to be the same as the data source of the real-time system data, the data whose data source is the same as that of the real-time system data; and
    a temporary storage unit, configured to temporarily store the real-time system data in the data buffer.
  8. The deduplication apparatus according to claim 7, wherein there are multiple pieces of the real-time system data, and the determining unit comprises:
    a determining subunit, configured to determine, each time a piece of the real-time system data is received, whether the data source of the first data is the same as the data source of the real-time system data.
  9. The deduplication apparatus according to claim 8, wherein the determining subunit comprises:
    a determining module, configured to determine whether an identifier (ID) of each piece of the first data is the same as an ID of the real-time system data, and thereby determine whether the data source of the first data is the same as the data source of the real-time system data.
  10. The deduplication apparatus according to claim 7, wherein the deduplication apparatus further comprises:
    a storage unit, configured to store the data in the data buffer to a target storage area; and
    a clearing unit, configured to clear the data in the data buffer.
  11. The deduplication apparatus according to claim 10, wherein the storage unit is configured to store the data in the data buffer to the target storage area at predetermined intervals.
  12. The deduplication apparatus according to claim 7, wherein the real-time system data is data from a real-time video processing system or a real-time web page processing system.
PCT/CN2014/088312 2013-11-07 2014-10-10 Deduplication method and apparatus of real-time system data WO2015067115A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201310551776.8 2013-11-07
CN201310551776.8A CN103559282B (en) 2013-11-07 2013-11-07 The De-weight method and device of real-time system data

Publications (1)

Publication Number Publication Date
WO2015067115A1 true WO2015067115A1 (en) 2015-05-14

Family

ID=50013528

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2014/088312 WO2015067115A1 (en) 2013-11-07 2014-10-10 Deduplication method and apparatus of real-time system data

Country Status (2)

Country Link
CN (1) CN103559282B (en)
WO (1) WO2015067115A1 (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103559282B (en) * 2013-11-07 2018-02-23 北京国双科技有限公司 The De-weight method and device of real-time system data
CN104298750B (en) * 2014-10-14 2018-02-23 北京国双科技有限公司 Renewal processing method and processing device for real-time system communication
CN108959397A (en) * 2018-06-04 2018-12-07 成都盯盯科技有限公司 Data de-duplication method and terminal
CN108923972B (en) * 2018-06-30 2021-06-04 平安科技(深圳)有限公司 Weight-reducing flow prompting method, device, server and storage medium
CN111400370A (en) * 2020-03-06 2020-07-10 上海数据交易中心有限公司 Data monitoring method and device in data circulation, storage medium and server

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1988669A (en) * 2006-11-21 2007-06-27 北京大学 Digital marking structure, verifying method and monitoring broadcasting system in stream medium monitoring and broadcasting
CN101510835A (en) * 2009-03-23 2009-08-19 北京学之途网络科技有限公司 Method and system for monitoring multicast business of network television system
CN102591946A (en) * 2010-12-28 2012-07-18 微软公司 Using index partitioning and reconciliation for data deduplication
CN103559282A (en) * 2013-11-07 2014-02-05 北京国双科技有限公司 Real-time system data reduplication removing method and device

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101102432A (en) * 2007-08-07 2008-01-09 四川长虹电器股份有限公司 Method for recording digital TV operation
EP2186015A4 (en) * 2007-09-05 2015-04-29 Emc Corp De-duplication in virtualized server and virtualized storage environments
US20100250502A1 (en) * 2009-03-27 2010-09-30 Kiyokazu Saigo Method and apparatus for contents de-duplication
CN101834801B (en) * 2010-05-20 2012-11-21 哈尔滨工业大学 Data caching and sequencing on-line processing method based on cache pool
CN103067696A (en) * 2013-01-31 2013-04-24 东方网力科技股份有限公司 Stream media caching method, device, controller and system facing video monitoring

Also Published As

Publication number Publication date
CN103559282A (en) 2014-02-05
CN103559282B (en) 2018-02-23

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 14859521

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 12.09.2016)

122 Ep: pct application non-entry in european phase

Ref document number: 14859521

Country of ref document: EP

Kind code of ref document: A1