US20170060742A1 - Control of cache data - Google Patents

Control of cache data

Info

Publication number
US20170060742A1
US20170060742A1 (Application US15/243,825)
Authority
US
United States
Prior art keywords
data
data items
cache
computer program
characteristic
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
US15/243,825
Inventor
Jim Wilkinson
Jonathan Lawn
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Metaswitch Networks Ltd
Original Assignee
Metaswitch Networks Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Metaswitch Networks Ltd
Assigned to METASWITCH NETWORKS LTD. Assignors: LAWN, JONATHAN; WILKINSON, JIM
Publication of US20170060742A1
Priority to US17/341,270 (US11438432B2)
Current legal status: Abandoned

Classifications

    • G06F 12/0806: Multiuser, multiprocessor or multiprocessing cache systems
    • G06F 12/0804: Caches with main memory updating
    • G06F 16/9574: Browsing optimisation of access to content, e.g. by caching
    • H04L 67/568: Provisioning of proxy services; storing data temporarily at an intermediate stage, e.g. caching
    • G06F 12/0868: Data transfer between cache memory and other subsystems, e.g. storage devices or host systems
    • G06F 2212/466: Caching storage objects of specific type in disk cache; metadata, control data

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Memory System Of A Hierarchy Structure (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A machine-implemented method for controlling transfer of at least one data item from a data cache component, in communication with storage using at least one relatively higher-latency path and at least one relatively lower-latency path, comprises: receiving metadata defining at least a first characteristic of data selected for inspection; responsive to the metadata, seeking a match between said at least first characteristic and a second characteristic of at least one of a plurality of data items in the data cache component; selecting said at least one of the plurality of data items where the at least one of the plurality of data items has the second characteristic matching the first characteristic; and passing the selected one of the plurality of data items from the data cache component using the relatively lower-latency path.

Description

    CROSS-REFERENCE TO RELATED APPLICATIONS
  • This application claims the benefit of United Kingdom Patent Application No. GB1515237.4, filed on Aug. 27, 2015 and entitled “Control of Cache Data,” which is hereby incorporated by reference herein.
  • FIELD OF THE INVENTION
  • The present invention relates to the control of cached data, and more particularly to modifying the process of transfer of selected data from a cache to storage according to data inspection criteria.
  • BACKGROUND
  • The scalability of many applications is limited by how fast they can write to permanent storage. They therefore implement a “write-cache” using a faster storage medium which is placed before the permanent storage in the write process. This allows writes to permanent storage to be performed as efficiently as possible, at the cost of some latency before each write completes; those of skill in the art therefore face the problem of tuning the cache to manage the trade-off between latency and efficiency.
  • Often, the purpose of the cache is to smooth out uneven write request rates and data item sizes, and possibly to manipulate the requests so that they can be tailored for maximum write efficiency given the characteristics of the permanent storage. It may also avoid writes that are quickly overwritten. The cache mechanism may also be used to allow the data to be structured to improve read or search performance. For example, it may be used to group records appropriately, to aggregate data in time order or to provide extended information in the records that are eventually written from the cache.
  • However, the latency introduced by the cache can cause problems. For instance, if any part of the system is likely to fail, it increases the chance that data will not reach the permanent storage. The latency will also mean that on an active system the data on the permanent storage is not up-to-date, or is only partially present. This can be limiting if one of the purposes of the system is to provide real-time or near-real-time inspection of the data (as well as archiving for later inspection or batch processing).
  • SUMMARY OF THE INVENTION
  • According to a first aspect, there is provided a machine-implemented method for controlling transfer of at least one data item from a data cache component in communication with storage using at least one relatively higher-latency path and at least one relatively lower-latency path, the method comprising: receiving metadata defining at least a first characteristic of data selected for inspection; responsive to the metadata, seeking a match between said at least first characteristic and a second characteristic of at least one of a plurality of data items in the data cache component; selecting said at least one of the plurality of data items where the at least one of the plurality of data items has the second characteristic matching the first characteristic; and passing the selected one of the plurality of data items from the data cache component using the relatively lower-latency path.
  • In a second aspect, there is provided an apparatus for controlling transfer of at least one data item from a data cache component in communication with storage using at least one relatively higher-latency path and at least one relatively lower-latency path, the apparatus comprising: a receiver component operable to receive metadata defining at least a first characteristic of data selected for inspection; a seeker component operable to respond to the metadata by seeking a match between the at least first characteristic and a second characteristic of at least one of a plurality of data items in the data cache component; a selector component operable to select the at least one of the plurality of data items where the at least one of the plurality of data items has the second characteristic matching the first characteristic; and a communications component operable to pass the selected one of the plurality of data items from the data cache component using the relatively lower-latency path.
  • There may further be provided a computer program product stored on a non-transient storage medium and comprising computer-readable code for controlling transfer of at least one data item from a data cache component in communication with storage using at least one relatively higher-latency path and at least one relatively lower-latency path, the computer-readable code comprising computer program code elements for receiving metadata defining at least a first characteristic of data selected for inspection; responsive to the metadata, seeking a match between the at least first characteristic and a second characteristic of at least one of a plurality of data items in the data cache component; selecting the at least one of the plurality of data items where the at least one of the plurality of data items has the second characteristic matching the first characteristic; and passing the selected one of the plurality of data items using the relatively lower-latency path.
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • An embodiment of the invention will now be described, by way of example only, with reference to the appended drawings, in which:
  • FIG. 1 shows one part of an exemplary method for the control of cached data;
  • FIG. 2 shows a further part of an exemplary method for the control of cached data;
  • FIG. 3 shows an example of data flows from emitters to storage; and
  • FIG. 4 shows an exemplary apparatus operable to control cached data.
  • DETAILED DESCRIPTION
  • In FIGS. 1 and 2 are shown parts of a machine-implemented method 100, 200 for controlling transfer of at least one data item from a data cache component in communication with storage using at least one relatively higher-latency path and at least one relatively lower-latency path. It will be clear to one of ordinary skill in the art that “data cache component” is merely a convenient exemplary descriptive term and that the term may also encompass such variants as a spool, a buffer or a queue, each of which represents a more or less temporary or transient data structure for managing the flow of data from an emitter to a consumer, such as a permanent storage arrangement. The consumer storage arrangement may also vary, in that it may itself be a temporary structure, such as an in-memory database or the like. In one further embodiment, the consumer storage may be an inspectable portion of the cache itself. In a different embodiment, the consumer storage may be a further cache having a higher flush speed than the original cache. It will also be clear to one of ordinary skill in the art that the word “path” as used here is not intended restrictively. Thus, the higher-latency and lower-latency path mechanism may be achieved simply by prioritising items selected for lower-latency handling, while the overall bandwidth of the paths is fixed.
  • The method 100 commences at Start step 102, and at step 104 data is received. At step 106, a match is sought in stored metadata defining at least a first characteristic of data selected for inspection. The metadata may comprise, in an example, stored search criteria from at least one of a current search and a prior search.
  • Responsive to the receipt of the data, at step 108 a match is sought between the characteristic in the stored metadata and a characteristic of at least one of the data items in transit from the data cache component. If, at test step 108, no match is found, the data item may be passed from the cache via a “normal”, higher-latency path at step 110 and the process ends at End step 116. However, if, at test step 108, a match is found, the data item is selected for passing via a lower-latency path at step 114 and the process ends at End step 116. In one embodiment, when a match is found at test step 108, the stream containing the matching data item may be marked for passing to the storage via the lower-latency path. This addresses the common situation where the data consists of a multiplicity of distinct streams, each of which has been aggregated for archiving on permanent storage, but where any small subset of the streams may be required for human inspection with as little latency as possible.
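  • The following is a minimal sketch of the FIG. 1 routing decision, written in Python purely for illustration. The types DataItem and SearchMetadataStore, and the callable paths, are hypothetical stand-ins assumed here; the patent describes the flow but not an implementation.

```python
# Hypothetical sketch of the FIG. 1 flow; all names and structures here are
# assumptions for illustration, not the patent's implementation.
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class DataItem:
    stream_id: str
    payload: bytes
    attributes: dict = field(default_factory=dict)

class SearchMetadataStore:
    """Stored metadata defining characteristics of data selected for
    inspection, e.g. search criteria from a current or prior search."""
    def __init__(self) -> None:
        self.criteria: List[Callable[[DataItem], bool]] = []

    def matches(self, item: DataItem) -> bool:
        # Steps 106/108: seek a match between a stored characteristic and a
        # characteristic of the data item in transit.
        return any(criterion(item) for criterion in self.criteria)

def route_on_write(item: DataItem,
                   metadata: SearchMetadataStore,
                   lower_latency_path: Callable[[DataItem], None],
                   higher_latency_path: Callable[[DataItem], None]) -> None:
    """Steps 104-116 of FIG. 1: route a received data item by latency path."""
    if metadata.matches(item):
        lower_latency_path(item)   # step 114: expedite the matching item
    else:
        higher_latency_path(item)  # step 110: the "normal" cached write path

# Example use: expedite items tagged with a watched session identifier.
store = SearchMetadataStore()
store.criteria.append(lambda i: i.attributes.get("session") == "abc-123")
route_on_write(DataItem("s1", b"...", {"session": "abc-123"}), store,
               lower_latency_path=lambda i: print("fast path:", i.stream_id),
               higher_latency_path=lambda i: print("slow path:", i.stream_id))
```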
  • Optionally, the data item or the data items in the stream may be further processed at process step 112 before being passed at step 114. For example, the data may be tagged for ease of retrieval from the storage in a subsequent inspection process. In a further optional refinement, the data item may be duplicated by a copier component, so that the duplicate may be passed via the lower-latency path while an original remains in the cache.
  • In FIG. 2, there is shown a further part of the method 100, 200. The process begins at Start step 202 and at step 204 metadata is received. At step 206, a match is sought in the cache for data matching the metadata defining at least a first characteristic of data to be selected for inspection.
  • Responsive to the receipt of the metadata, at step 208 a match is sought between the characteristic in the received metadata and a characteristic of at least one of the data items in transit from the data cache component. If, at test step 208, no match is found, the process ends at End step 210. However, if, at test step 208, a match is found, the data item is selected for passing via a lower-latency path at step 214 and the process returns to check for further matches in the cache at step 206.
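  • Continuing the hypothetical sketch above, the FIG. 2 flow might look as follows: on receipt of new inspection metadata, the cache is scanned and every matching item is passed via the lower-latency path, while unmatched items are left for the normal flush. The list-backed cache is an assumption made for brevity.

```python
# Hypothetical sketch of the FIG. 2 flow, reusing DataItem and
# SearchMetadataStore from the previous sketch.
def fast_flush_on_metadata(cache, metadata, lower_latency_path):
    """Steps 204-214 of FIG. 2: on receipt of new inspection metadata, scan
    the cache and pass every matching item via the lower-latency path."""
    matched = [item for item in cache if metadata.matches(item)]
    for item in matched:
        cache.remove(item)        # or duplicate the item and leave the
        lower_latency_path(item)  # original in the cache, as described below
```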
  • Optionally, the data item or the data items in the stream may be further processed at process step 212 before being passed at step 214. For example, the data may be tagged for ease of retrieval from the storage in a subsequent inspection process. In a further optional refinement, the data item may be duplicated by a copier component, so that the duplicate may be passed via the lower-latency path while an original remains in the cache.
  • The embodied method thus provides a mechanism to allow the inspection processing to selectively influence the caching behaviour, so that the data of interest reaches the inspectable storage with significantly reduced delay. In one embodiment, extensions to a data retrieval language, such as SQL, may be used to allow a client to indicate which data should be selected for flushing from the cache, or the cache component itself may be operable proactively to monitor recent searches performed on data in the inspectable storage, and to fast-flush matching data items from the cache, possibly by predicting future “items of interest” based on past performance. In a further variant, the cache component may select streams of data items that contain unexpected error log information for the fast-flush mechanism.
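  • By way of illustration only, such a data retrieval language extension might look like the snippet below. The EXPEDITE CACHE FLUSH clause is entirely hypothetical: no standard SQL dialect defines it, and the patent does not specify a syntax; it merely stands in for the kind of client-supplied flush hint the text contemplates.

```python
# Hypothetical client call; the EXPEDITE CACHE FLUSH clause is an invented
# stand-in for a data retrieval language extension, not real SQL.
def search_with_flush_hint(connection, session_id: str):
    query = (
        "SELECT * FROM diagnostics WHERE session_id = ? "
        "EXPEDITE CACHE FLUSH"  # imagined extension: fast-flush matching rows
    )
    # A cache-aware driver could strip the hint, fast-flush matching items to
    # inspectable storage, and then run the query there as normal.
    return connection.execute(query, (session_id,))
```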
  • The data flow 300 shown in FIG. 3 of the appended drawings illustrates an embodiment in which one or more emitters 302 are generating a flow 303 of pre-cache data, possibly comprising one or more streams, as described above. Flow 303 enters cache 304 and is transferred onward to, for example, inspectable storage 306 as a flow of post-cache data 305 as a result of a cache flush, a fetch operation or the like. Before the flow from emitter 302 reaches inspectable storage 306, metadata from search client 308 or search service 310 is used to select data in the cache to be routed via the appropriate path: lower-latency if a data item in the flow matches the search criteria for a current or prior search, and higher-latency if no such match is found.
  • This allows the actual search to proceed against the inspectable storage as normal, in the same way it would without the new mechanism, but with significantly decreased latency. For example, when the data consists of streams, this technique can indicate to the cache that a particular subset of streams should be written to inspectable storage faster than normal. Because only a small proportion of the data is being selected for this form of caching, overall write efficiency is not significantly affected.
  • In a refinement of the disclosed technique, the embodied method can be readily extended to include multiple data sources, caches, post-cache processors, permanent or temporary storage devices and inspection apparatus and methods. In the case of post-cache processor involvement, the embodiment may take into account any delay caused by such processing, as well as the write latency of the inspectable storage. The output of this post-cache processing might be inspected directly, without any specific permanent storage element being present. For instance, processor intensive tasks like encryption or low-bandwidth communications can also create a queue of work where a cache might advantageously be deployed, and where benefits might be derived from allowing a reader to expedite certain data that it is waiting for using the disclosed technique or apparatus operable to perform the technique.
  • Turning now to FIG. 4 of the appended drawings, there is shown a cache component 400 for controlling transfer by a data sender component 406 of at least one data item from a data cache 403 using at least one relatively higher-latency path 405 and at least one relatively lower-latency path 407. Cache component 400 comprises a data receiver 401 that is operable to pass data on to filter 402. Filter 402 is operable in connection with search metadata store component 409 and cache search component 410 to locate selected data items in cache 403. Cache component 400 further comprises a search receiver component 408 that is operable to receive current or past search criteria, which are in turn used to provide metadata relating to data items in the cache 403. Cache search component 410 is operable in conjunction with cache flusher 404 to selectively flush items from the cache 403 via either high-latency processing 405 or low-latency processing 407.
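  • A structural sketch of how the FIG. 4 components might be wired together is given below, reusing the hypothetical SearchMetadataStore from the earlier sketch. The class and attribute names mirror the reference numerals for readability; the wiring shown is an assumption, since the patent describes components rather than code.

```python
# Hypothetical wiring of FIG. 4's cache component 400; reference numerals in
# the comments map the code back to the figure.
class CacheComponent400:
    def __init__(self, low_latency_processing, high_latency_processing):
        self.cache = []                              # data cache 403
        self.metadata_store = SearchMetadataStore()  # search metadata store 409
        self.low_latency = low_latency_processing    # low-latency path 407
        self.high_latency = high_latency_processing  # high-latency path 405

    def receive_data(self, item) -> None:
        # Data receiver 401 passes items to filter 402, which consults the
        # metadata store 409; matched items bypass the normal flush.
        if self.metadata_store.matches(item):
            self.low_latency(item)
        else:
            self.cache.append(item)

    def receive_search(self, criterion) -> None:
        # Search receiver 408: current or past search criteria become metadata.
        self.metadata_store.criteria.append(criterion)
        # Cache search component 410 locates matching items already cached...
        matched = [i for i in self.cache if criterion(i)]
        for item in matched:
            self.cache.remove(item)
            self.low_latency(item)   # ...and flusher 404 expedites them

    def flush(self) -> None:
        # Cache flusher 404, normal path: drain remaining items via 405.
        while self.cache:
            self.high_latency(self.cache.pop(0))
```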
  • Optionally, the data item or data items in the cached data may be tagged for ease of retrieval from the storage in a subsequent inspection process. In a further optional refinement the data item may be duplicated by a copier component, so that the duplicate may be passed via the lower-latency path while an original remains in the cache.
  • As will be clear to one of ordinary skill in the art, the presently disclosed technique is of wide applicability. One example is that of the gathering and use of diagnostic information from sessions between telecommunications clients, such sessions representing primarily telephone calls. Diagnostics from millions of calls per day from multiple servers must be stored for a number of weeks, necessitating large disk arrays. However, operators may want to inspect diagnostics from a few calls in real-time.
  • The diagnostics servers split the incoming diagnostics into a separate cache for each session, and normally flush this cache on a timer that detects a gap since the last data for the session was received.
  • Application of an embodiment of the present technique adds a mechanism whereby the search presentation layer stores the search terms that clients are currently inspecting. These terms are used to determine the appropriate sessions currently in the cache, and also future sessions as they start using the cache. The sessions identified are flushed to disk very quickly, so that the disk provides up-to-date information. The presentation layer is also notified of any new sessions by the cache, to prevent the need for polling for them.
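  • A worked sketch of this diagnostics arrangement appears below. It makes several simplifying assumptions beyond the text: sessions are keyed by a string identifier, a fixed idle gap of FLUSH_IDLE_SECS triggers the normal timer flush, and a watched search term "matches" a session when it is a substring of the session identifier.

```python
# Hedged sketch of the per-session diagnostics cache; the idle timeout,
# matching rule and callback shapes are assumptions, not the patent's design.
import time

FLUSH_IDLE_SECS = 30.0   # assumed gap since last data before a normal flush

class SessionDiagnosticsCache:
    def __init__(self, write_to_disk, notify_presentation_layer):
        self.sessions = {}           # session id -> list of records
        self.last_seen = {}          # session id -> last receipt time
        self.watched_terms = set()   # terms clients are currently inspecting
        self.write_to_disk = write_to_disk
        self.notify = notify_presentation_layer

    def add_diagnostic(self, session_id: str, record: bytes) -> None:
        is_new = session_id not in self.sessions
        self.sessions.setdefault(session_id, []).append(record)
        self.last_seen[session_id] = time.monotonic()
        if is_new:
            self.notify(session_id)  # push new sessions; no polling needed
        if any(term in session_id for term in self.watched_terms):
            self._flush(session_id)  # fast path: keep the disk up to date

    def watch(self, term: str) -> None:
        # Presentation layer stores the terms clients are inspecting; matching
        # sessions already in the cache are flushed to disk very quickly.
        self.watched_terms.add(term)
        for sid in [s for s in self.sessions if term in s]:
            self._flush(sid)

    def flush_idle(self) -> None:
        # Normal path: timer detects a gap since the last data for a session.
        now = time.monotonic()
        for sid in [s for s, t in self.last_seen.items()
                    if now - t > FLUSH_IDLE_SECS]:
            self._flush(sid)

    def _flush(self, session_id: str) -> None:
        records = self.sessions.pop(session_id, [])
        self.last_seen.pop(session_id, None)
        if records:
            self.write_to_disk(session_id, records)
```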
  • Other applications of the disclosed technique include (but are not limited to) the provision of analytics with a live display of a subset of the data, such as system or network “health” or commercial performance metrics, such as database transaction costing. Further applications include human-determined analytics queries, such as system or network troubleshooting, customer analytics during customer care calls and interception and analytics instigated by law-enforcement or governance control agencies. Where the present technique is applied in the area of knowledge-based systems, there is scope for application of machine-determined diagnostic operations, such as automatic issue spotting, diagnostics collection and troubleshooting, firewalling of computer systems with automatic blacklists, and analysis relating to fraud detection.
  • As will be appreciated by one skilled in the art, aspects of the present technology may be embodied as a system, method or computer program product. Accordingly, aspects of the present technology may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects.
  • Furthermore, aspects of the present technology may take the form of a computer program product embodied in a transient or non-transient computer readable medium having computer readable program code embodied thereon. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing.
  • Computer program code for carrying out operations for aspects of the present technology may be written in any combination of one or more programming languages, including object oriented programming languages and conventional procedural programming languages. The program code may execute entirely on the user's computer, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network. Code components may be embodied as procedures, methods or the like, and may comprise sub-components which may take the form of instructions or sequences of instructions at any of the levels of abstraction, from the direct machine instructions of a native instruction set to high-level compiled or interpreted language constructs.
  • It will also be clear to one of skill in the art that all or part of a logical method according to the preferred embodiments of the present technology may suitably be embodied in a logic apparatus comprising logic elements to perform the steps of the method, and that such logic elements may comprise components such as logic gates in, for example a programmable logic array or application-specific integrated circuit. Such a logic arrangement may further be embodied in enabling elements for temporarily or permanently establishing logic structures in such an array or circuit using, for example, a virtual hardware descriptor language, which may be stored and transmitted using fixed or transmittable carrier media.
  • In one alternative, an embodiment of the present technology may be realized in the form of a computer implemented method of deploying a service comprising steps of deploying computer program code operable to, when deployed into a computer infrastructure or network and executed thereon, cause said computer system or network to perform all the steps of the method.
  • In a further alternative, the preferred embodiment of the present technology may be realized in the form of a data carrier having functional data thereon, said functional data comprising functional computer data structures to, when loaded into a computer system or network and operated upon thereby, enable said computer system to perform all the steps of the method.
  • It will be clear to one skilled in the art that many improvements and modifications can be made to the foregoing exemplary embodiments without departing from the scope of the present invention.

Claims (27)

What is claimed is:
1. A machine-implemented method for controlling transfer of at least one data item from a data cache component in communication with storage using at least one relatively higher-latency path and at least one relatively lower-latency path, the method comprising:
receiving metadata defining at least a first characteristic of data selected for inspection;
responsive to said metadata, seeking a match between said at least first characteristic and a second characteristic of at least one of a plurality of data items in said data cache component;
selecting said at least one of said plurality of data items where said at least one of said plurality of data items has said second characteristic matching said first characteristic; and
passing the selected one of said plurality of data items from said data cache component using said relatively lower-latency path.
2. The machine-implemented method as claimed in claim 1, wherein said receiving metadata comprises receiving stored search criteria from at least one of a current search and a prior search.
3. The machine-implemented method as claimed in claim 1, wherein said passing comprises making a duplicate of said data item and passing said duplicate while leaving an original of said data item in said cache.
4. The machine-implemented method as claimed in claim 1, wherein said passing comprises passing the selected one of said plurality of data items to permanent storage.
5. The machine-implemented method as claimed in claim 1, wherein said passing comprises passing the selected one of said plurality of data items to at least one of inspectable permanent storage, an inspectable portion of said cache, a low-bandwidth communications channel, and a post-cache processor.
6. The machine-implemented method as claimed in claim 1, further comprising passing at least one unselected data item of said plurality of data items to inspectable storage using said relatively higher-latency path.
7. The machine-implemented method as claimed in claim 1, further comprising passing said selected one of said plurality of data items to a further cache having a higher flush speed than a cache used for at least one unselected data item of said plurality of data items.
8. The machine-implemented method as claimed in claim 1, wherein said plurality of data items is divided into a plurality of streams, wherein:
said selecting further comprises selecting a stream by matching said first characteristic and said second characteristic of a said data item in said stream; and
wherein said passing comprises passing a plurality of data items from said stream.
9. The machine-implemented method as claimed in claim 1, further comprising tagging said data item to expedite finding said data item for inspection.
10. An apparatus for controlling transfer of at least one data item from a data cache component in communication with storage using at least one relatively higher-latency path and at least one relatively lower-latency path, the apparatus comprising:
a receiver component operable to receive metadata defining at least a first characteristic of data selected for inspection;
a seeker component operable to respond to said metadata by seeking a match between said at least first characteristic and a second characteristic of at least one of a plurality of data items in said data cache component;
a selector component operable to select said at least one of said plurality of data items where said at least one of said plurality of data items has said second characteristic matching said first characteristic; and
a communications component operable to pass the selected one of said plurality of data items from said data cache component using said relatively lower-latency path.
11. The apparatus as claimed in claim 10, wherein said receiver component is operable to receive metadata comprising stored search criteria from at least one of a current search and a prior search.
12. The apparatus as claimed in claim 10, further comprising a copying component operable to make a duplicate of said data item while leaving an original of said data item in said cache.
13. The apparatus as claimed in claim 10, wherein said communications component is operable to pass the selected one of said plurality of data items to permanent storage.
14. The apparatus as claimed in claim 10, wherein said communications component is operable to pass the selected one of said plurality of data items to at least one of inspectable permanent storage, an inspectable portion of said cache, a low-bandwidth communications channel, and a post-cache processor.
15. The apparatus as claimed in claim 10, wherein said communications component is further operable to pass at least one unselected data item of said plurality of data items to inspectable storage using said relatively higher-latency path.
16. The apparatus as claimed in claim 10, wherein said communications component is further operable to pass said selected one of said plurality of data items to a further cache having a higher flush speed than a cache used for at least one unselected data item of said plurality of data items.
17. The apparatus as claimed in claim 10, wherein said plurality of data items is divided into a plurality of streams, wherein:
said selector is further operable to select a stream by matching said first characteristic and said second characteristic of a said data item in said stream; and
wherein said communications component is further operable to pass a plurality of data items from said stream.
18. The apparatus as claimed in claim 10, wherein said communications component further comprises a tagger component operable to tag said data item to expedite finding said data item for inspection.
19. A computer program product stored on a non-transitory storage medium and comprising computer-readable code for controlling transfer of at least one data item from a data cache component in communication with storage using at least one relatively higher-latency path and at least one relatively lower-latency path, the computer-readable code comprising computer program code elements for:
receiving metadata defining at least a first characteristic of data selected for inspection;
responsive to said metadata, seeking a match between said at least first characteristic and a second characteristic of at least one of a plurality of data items in said data cache component;
selecting said at least one of said plurality of data items where said at least one of said plurality of data items has said second characteristic matching said first characteristic; and
passing the selected one of said plurality of data items using said relatively lower-latency path.
20. The computer program product as claimed in claim 19, wherein the computer program code element for receiving metadata comprises a computer program code element for receiving stored search criteria from at least one of a current search and a prior search.
21. The computer program product as claimed in claim 19, wherein the computer program code element for passing comprises a computer program code element for making a duplicate of said data item and passing said duplicate while leaving an original of said data item in said cache.
22. The computer program product as claimed in claim 19, wherein the computer program code element for passing comprises a computer program code element for passing the selected one of said plurality of data items to permanent storage.
23. The computer program product as claimed in claim 19, wherein the computer program code element for passing comprises a computer program code element for passing the selected one of said plurality of data items to at least one of inspectable permanent storage, an inspectable portion of said cache, a low-bandwidth communications channel, and a post-cache processor.
24. The computer program product as claimed in claim 19, wherein the computer program code element for passing further comprises a computer program code element for passing at least one unselected data item of said plurality of data items to inspectable storage using said relatively higher-latency path.
25. The computer program product as claimed in claim 19, wherein the computer program code element for passing further comprises a computer program code element for passing said selected one of said plurality of data items to a further cache having a higher flush speed than a cache used for at least one unselected data item of said plurality of data items.
26. The computer program product as claimed in claim 19, wherein said plurality of data items is divided into a plurality of streams, wherein:
said computer program code element for selecting further comprises a computer program code element for selecting a stream by matching said first characteristic and said second characteristic of a said data item in said stream; and
wherein said computer program code element for passing comprises a computer program code element for passing a plurality of data items from said stream.
27. The computer program product as claimed in claim 19, further comprising a computer program code element for tagging said data item to expedite finding said data item for inspection.
US15/243,825, priority date 2015-08-27, filed 2016-08-22: Control of cache data (US20170060742A1). Status: Abandoned.

Priority Applications (1)

US17/341,270 (US11438432B2), priority date 2015-08-27, filing date 2021-06-07: Control of cache data

Applications Claiming Priority (2)

GB1515237.4A (GB2543252B), priority date 2015-08-27: Control of cache data
GB1515237.4, priority date 2015-08-27

Related Child Applications (1)

US17/341,270 (continuation; US11438432B2), priority date 2015-08-27, filing date 2021-06-07: Control of cache data

Publications (1)

US20170060742A1, published 2017-03-02

Family

ID=54326424

Family Applications (2)

US15/243,825 (US20170060742A1), priority date 2015-08-27, filing date 2016-08-22: Control of cache data. Status: Abandoned.
US17/341,270 (US11438432B2), priority date 2015-08-27, filing date 2021-06-07: Control of cache data. Status: Active.

Family Applications After (1)

US17/341,270 (US11438432B2), priority date 2015-08-27, filing date 2021-06-07: Control of cache data. Status: Active.

Country Status (2)

US (2): US20170060742A1
GB (1): GB2543252B

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6128717A (en) * 1998-01-20 2000-10-03 Quantum Corporation Method and apparatus for storage application programming interface for digital mass storage and retrieval based upon data object type or size and characteristics of the data storage device
US7120836B1 (en) * 2000-11-07 2006-10-10 Unisys Corporation System and method for increasing cache hit detection performance
US20100235522A1 (en) * 2009-03-11 2010-09-16 Juniper Networks Inc. Session-cache-based http acceleration
WO2013030595A1 (en) * 2011-08-31 2013-03-07 Metaswitch Networks Ltd Identifying data items
US20160147664A1 (en) * 2014-11-21 2016-05-26 International Business Machines Corporation Dynamic partial blocking of a cache ecc bypass

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6553463B1 (en) * 1999-11-09 2003-04-22 International Business Machines Corporation Method and system for high speed access to a banked cache memory
US7568068B2 (en) * 2006-11-13 2009-07-28 Hitachi Global Storage Technologies Netherlands B. V. Disk drive with cache having volatile and nonvolatile memory
US9887914B2 (en) * 2014-02-04 2018-02-06 Fastly, Inc. Communication path selection for content delivery
US9274713B2 (en) * 2014-04-03 2016-03-01 Avago Technologies General Ip (Singapore) Pte. Ltd. Device driver, method and computer-readable medium for dynamically configuring a storage controller based on RAID type, data alignment with a characteristic of storage elements and queue depth in a cache

Also Published As

Publication number Publication date
US20210365373A1 (en) 2021-11-25
GB2543252A (en) 2017-04-19
US11438432B2 (en) 2022-09-06
GB201515237D0 (en) 2015-10-14
GB2543252B (en) 2021-03-17

Similar Documents

Publication Publication Date Title
US12013852B1 (en) Unified data processing across streaming and indexed data sets
US11194552B1 (en) Assisted visual programming for iterative message processing system
US11113353B1 (en) Visual programming for iterative message processing system
US10775976B1 (en) Visual previews for programming an iterative publish-subscribe message processing system
US11886440B1 (en) Guided creation interface for streaming data processing pipelines
US11614923B2 (en) Dual textual/graphical programming interfaces for streaming data processing pipelines
US11650995B2 (en) User defined data stream for routing data to a data destination based on a data route
US11669528B2 (en) Joining multiple events in data streaming analytics systems
US9917913B2 (en) Large message support for a publish-subscribe messaging system
US20180196753A1 (en) Pre-fetching data from buckets in remote storage for a cache
US9965518B2 (en) Handling missing data tuples in a streaming environment
US20210096981A1 (en) Identifying differences in resource usage across different versions of a software application
US20220121708A1 (en) Dynamic data enrichment
US11620336B1 (en) Managing and storing buckets to a remote shared storage system based on a collective bucket size
US20100153363A1 (en) Stream data processing method and system
US11720824B1 (en) Visualizing outliers from timestamped event data using machine learning-based models
US11687487B1 (en) Text files updates to an active processing pipeline
Pal et al. Big data real time ingestion and machine learning
US20190253532A1 (en) Increasing data resiliency operations based on identifying bottleneck operators
US20230385288A1 (en) User interface for customizing data streams and processing pipelines
US11438432B2 (en) Control of cache data
US11922222B1 (en) Generating a modified component for a data intake and query system using an isolated execution environment image
US10608903B1 (en) System and method for serverless monitoring
US11232106B1 (en) Windowed query with event-based open time for analytics of streaming data
US9229965B2 (en) Managing attributes in stream processing using a cache

Legal Events

AS (Assignment): Owner: METASWITCH NETWORKS LTD, United Kingdom. ASSIGNMENT OF ASSIGNORS INTEREST; Assignors: WILKINSON, JIM; LAWN, JONATHAN; signing dates 2016-09-01 to 2016-09-07; Reel/Frame: 039953/0817
STPP (status: patent application and granting procedure in general): Docketed new case - ready for examination
STPP: Non-final action mailed
STPP: Non-final action mailed
STPP: Response to non-final office action entered and forwarded to examiner
STPP: Final rejection mailed
STPP: Non-final action mailed
STPP: Response to non-final office action entered and forwarded to examiner
STPP: Response after final action forwarded to examiner
STPP: Advisory action mailed
STCB (application discontinuation): Abandoned - failure to respond to an office action