WO2015006795A1 - System and method of implementing near real time updates to a search index - Google Patents


Info

Publication number
WO2015006795A1
Authority
WO
WIPO (PCT)
Prior art keywords
data structure
update
index
search
requests
Application number
PCT/AU2013/001387
Other languages
French (fr)
Inventor
Aaron HERNAT SIRAKY
Michael Ridgway
Khan THOMPSON
Original Assignee
Carsales.Com Ltd
Priority claimed from AU2013902663A
Application filed by Carsales.Com Ltd
Publication of WO2015006795A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures

Definitions

  • Any search queries commenced upon the first index data structure should be completed on that data structure; otherwise, transferring existing search queries to the second data structure would likely generate erroneous results. However, new search queries received subsequent to receiving a request to update stored data are directed to the second data structure. Once all existing search queries have been completed in respect of the first index data structure, the memory resources associated with the first index data structure may be relinquished and made available for other requirements.
  • The present invention also provides computer instruction code executable upon one or more processors, causing the one or more processors to execute the method steps of the invention.
  • The system and method of the present invention provide a significant improvement to the performance of search indexes and, in particular, a significant reduction in the time delay between an update request to the stored data of a database and the availability of the new data to users conducting searches of the database.
  • A user will conduct a search query based upon the first index data structure whilst an update is occurring in respect of the second data structure.
  • Figure 1 provides a diagrammatic representation of a data structure for which an update is required.
  • Figure 2 is a diagrammatic representation of the process of generating a copy of the data structure for updating purposes according to an embodiment of the invention.
  • Figure 3 is a diagrammatic representation of the process of effecting an update to the copy of the data structure according to an embodiment of the invention.
  • Figure 4 is a diagrammatic representation of the completion of the update process according to an embodiment of the invention.
  • Figure 5 is a diagrammatic representation of the swapping of the original data structure with an updated data structure according to an embodiment of the invention.
  • In Figure 1, a data structure in the form of a five-element array (a[0], a[1], a[2], a[3], a[4]) is depicted; from the lowermost to the uppermost element, it contains the values 1, 2, 4, 8 and 16.
  • The data in the array referenced by the pointer "a" has been sorted, so the elements from the lowermost to the uppermost are in ascending order.
  • The data structure (a) in Figure 1 represents a data structure that forms the search index (or part thereof) of a database with stored data. Any user conducting a search of the database accesses the data structure (a) and, in accordance with usual database searching processes, various "Readers" submit their search queries, which are answered by accessing the data structure (a) and providing a response to each search query.
  • Various search queries are depicted by the individual "Readers", namely Reader 1, Reader 2 to Reader N.
  • The five-element array (a[0], a[1], a[2], a[3], a[4]) is located in memory at a "root" address, and the Readers are directed to the data structure by reference to the pointer (a).
  • The individual Readers receive a response to their search query in the form of data providing links to the underlying database; upon receiving that response, the user continues to interrogate the database until they locate the data they are seeking.
  • Referring to FIG. 3, the process of updating the copy of the data structure (a Copy) is depicted. The data structure (a Copy) comprises an array of six elements (a[0], a[1], a[2], a[3], a[4], a[5]) in which new data (12) and (26) have been inserted in the same sort order, occupying the fourth element (a[3]) and the sixth element (a[5]) of the array. Further, old data (4), which was originally the third element of the array (a[2]), has been removed.
  • The swapping of the data structure is detailed in Figure 4, which depicts the data structure (a) being swapped with the updated copy of the data structure (a Copy). In the embodiment depicted in Figure 4, the swap is implemented by altering the "root address" of the search index data structure to the address referenced by (a Copy).
  • The new data structure (a) is the updated six-element array, which contains both the retained old data and the new data in the same sort order; from the lowermost to the uppermost element, it holds 1, 2, 8, 12, 16 and 26.
  • Figure 5 is a diagrammatic representation of the process of completing search queries based upon the original data structure (a) while directing new search queries to the updated data structure.
  • New search queries are directed to the updated data structure, which has effectively become the "first" data structure, whilst search queries initiated before the updated data structure became available are completed against the original data structure.
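The array walkthrough above can be traced with a short Python sketch. It is illustrative only: it assumes the index is a plain sorted Python list, models the "root address" as a dictionary entry named `a`, and uses a hypothetical helper `apply_update` that is not part of the specification.

```python
import bisect


def apply_update(index, inserts, removals):
    """Return an updated copy of a sorted list; the original list is never modified."""
    copy = list(index)                      # Figure 2: copy the data structure (a -> a Copy)
    for value in removals:                  # Figure 3: remove old data, e.g. 4
        pos = bisect.bisect_left(copy, value)
        if pos < len(copy) and copy[pos] == value:
            del copy[pos]
    for value in inserts:                   # Figure 3: insert new data in the same sort order
        bisect.insort(copy, value)
    return copy


# Figure 1: the search index "root" refers to a sorted five-element array.
root = {"a": [1, 2, 4, 8, 16]}

# Reader 1 started its search before the update and holds a reference to the old array.
reader_1_view = root["a"]

# Build and update the copy: insert 12 and 26, remove 4.
a_copy = apply_update(root["a"], inserts=[12, 26], removals=[4])

# Swap: redirect the root to the updated copy; searches starting now see the new array.
root["a"] = a_copy
```

Because the swap only rebinds the root entry, Reader 1 keeps its reference to the original array and completes its search on the pre-update data, while any search that starts after the swap reads the six-element array.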

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A method of updating a search index for stored data wherein the search index includes at least one data structure, the method including receiving a request to update a first index data structure, producing a copy of the data structure thereby generating a second data structure, preventing any further requests to update the first index data structure, effecting the update to the second data structure such that the search index accurately reflects the stored data, retaining the first index data structure until all search queries previously directed to the first index data structure are completed, swapping the first index data structure with the updated second data structure upon which all new search queries are conducted, the second data structure thereby becoming the first index data structure and acting as the search index, or part thereof, and allowing further requests to update the first index data structure.

Description

SYSTEM AND METHOD OF IMPLEMENTING NEAR REAL TIME UPDATES TO A SEARCH INDEX
FIELD OF THE INVENTION
[0001] The invention relates to a system, method and computer software instruction code for updating a search index of stored data that significantly reduces the time delay between a change in the stored data and the availability of an updated search index that accurately reflects the change to the underlying stored data.
BACKGROUND OF THE INVENTION
[0002] Computer databases have become ubiquitous and are generally used on a daily basis by a large proportion of the population either in their personal lives or in commerce. Increasingly, databases are storing larger amounts of data as data is collected and the deployment and use of databases is increasing as the population across the world increasingly relies upon computer communications networks to locate and identify relevant data and conduct searches of databases in order to locate the data necessary to fulfill their personal needs or achieve a commercial outcome.
[0003] As computer databases store increasingly large amounts of data, searching a database takes increasingly longer. Although search time does not grow linearly with the size of the database, it does increase as the underlying database grows. Database programmers are mindful of the increasing delays associated with searching databases and, in particular, of the time delay between the point at which an update to the data stored in a database is requested or submitted and the point at which the database search index is updated so that users can identify the new, or changed, data in the database during a search query.
[0004] Generally, database programmers strive to ensure that there is no disparity between the data stored in a database and the search index used by users when conducting queries to determine whether data is present in the database. In this regard, database programmers are very familiar with the "ACID" principles, an acronym for the fundamental properties of database design and integrity: Atomicity, Consistency, Isolation and Durability.
[0005] Accordingly, database programmers generally consider a disparity between the underlying data stored in a database and the search index reflecting that stored data to be unacceptable. As a result, over many years it has become standard practice to update databases and search indexes by taking the database and/or search index off-line and performing the updates in a batch process. Batch processes to update databases and their accompanying search indexes are generally performed overnight or at some other time when the database is unlikely to be required by users.
[0006] As a result of this standard practice, updates to data made during a working day are generally not incorporated into the database until the overnight update process runs, and any such updates to the data and the associated search index are generally not available until the start of the next working day. The delay between the point at which a user submits a request to update data in a database and the point at which other users become aware of the updated data is an increasing problem in various database applications. For example, where a database contains relatively dynamic data, users prefer any change to that data to be made available to other users as soon as possible. In the example of a database that stores data relating to items for sale, if a seller has provided details of an item they are offering for sale, with a brief description and a sale price, and then seeks to reduce the price to make the item more attractive, they clearly want the reduced price available to other users as soon as possible to improve the prospects of a prompt sale.
[0007] Whilst database programmers and designers are aware of the technical difficulties associated with updating a database and the associated search index with new data, users are generally unaware of those difficulties and simply want their new data available to others as soon as possible. When a user experiences a significant delay in an update to the database search index, they generally raise a query with the help desk or technical support department that manages the database, because they are concerned that the requested update has not been recorded. This happens because the user conducts a search shortly after requesting an update to the database and fails to locate the new data relating to their change request. Accordingly, the help desk or technical support departments of organisations that manage large databases regularly receive queries from users seeking to determine whether their change request has been effected, and the help desk or technical support department usually needs to explain that there is a delay between the point at which the change request is submitted and the point at which the corresponding search index is updated to reflect that change.
[0008] In addition to creating difficulties for the users whose changes are not promptly reflected in a search index, a delay between a change request and the corresponding search index update also proves troublesome for other users who conduct a search query and receive "old" data. In the example of a search in relation to items for sale, a user who concludes a transaction and sells their item may submit a request to remove the item from the database. During the delay between such an amendment being requested and the search index being updated, other users may conduct a search and locate "old" data indicating that the item remains available for sale. This can lead to the original seller of the item receiving many further queries regarding the item, requiring the original seller to respond to each one confirming that the item has already been sold. This problem is further exacerbated in sales environments where commissions are levied upon a retailer based upon the introduction of a prospective customer who has conducted a search and identified an item for sale. Understandably, the commission payable then needs to be recalculated, as no commission is payable for introducing a prospective customer on the basis of "old" data pertaining to an item that has already been sold.
[0009] Delayed updates to a search index are also a significant problem for on-line retailers seeking to sell goods with a limited shelf-life (e.g. perishable items) or goods whose value falls significantly if they are sold after a particular date (e.g. Easter, Mother's Day, etc.). Promptly reflecting any price reduction for such goods is important to improve the retailer's prospects of selling the goods for the highest price possible.
[0010] Accordingly, the problems associated with undue time delays are significant, particularly for organisations operating databases containing very large amounts of data with a large number of independent users. In an attempt to address undue time delays whilst preserving the integrity of the database and its associated search index, database programmers generally lock a row or table of a database and the corresponding search index, update the data and the corresponding search index, and then release the locks. Whilst this enables updates to be performed dynamically, using locks to ensure sole access to a row or table of a database and the corresponding data structure of the search index for the purpose of updating them degrades on-line performance to an unacceptable level. The unacceptable delay is evidenced by search queries taking too long to generate a search result whilst they wait for locks to be released. Database providers clearly understand that if a search query exceeds a user's time expectation, the user is likely to cancel the search and, as a result, a potential sale or possible commercial transaction may be lost.
[0011] Accordingly, there is a need for a system and method of updating stored data in a database and the associated search index to allow dynamic updating of the stored data and associated search index whilst at the same time avoiding unacceptable delays and performance degradation in respect of the operation of the database and searches conducted on the database.
[0012] The reference to any prior art in this specification is not, and should not be taken as, an acknowledgement of any suggestion that the prior art forms part of the common general knowledge in Australia.
SUMMARY OF THE INVENTION
[0013] In one aspect, the present invention provides a method of updating a search index for stored data wherein the search index includes at least one data structure, the method including receiving a request to update a first index data structure, producing a copy of the data structure thereby generating a second data structure, preventing any further requests to update the first index data structure, effecting the update to the second data structure such that the search index accurately reflects the stored data, retaining the first index data structure until all search queries previously directed to the first index data structure are completed, swapping the first index data structure with the updated second data structure upon which all new search queries are conducted, the second data structure thereby becoming the first index data structure and acting as the search index, or part thereof, and allowing further requests to update the first index data structure.
[0014] Because a copy of the first index data structure is generated to produce a second data structure, and updates are applied to that second data structure, search queries directed to the first index data structure are not blocked while the second data structure is being updated. Hence, the performance of the search index, and the ability of users to continue to query the search index during an update, are not significantly affected.
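The copy, update and swap steps of [0013] and [0014] can be sketched in Python as a single class. This is a minimal sketch under stated assumptions, not the specified implementation: the names `CopyOnWriteIndex`, `snapshot` and `update` are hypothetical, one `threading.Lock` stands in for "preventing any further requests to update", and it relies on CPython rebinding an attribute atomically so that a search sees either the old or the new structure, never a partial one.

```python
import copy
import threading
from typing import Callable, Generic, TypeVar

T = TypeVar("T")


class CopyOnWriteIndex(Generic[T]):
    """One index data structure updated by copy, modify and swap (illustrative sketch)."""

    def __init__(self, initial: T) -> None:
        self._current = initial                 # the "first index data structure"
        self._update_lock = threading.Lock()    # admits one update request at a time

    def snapshot(self) -> T:
        # Searches take no lock: they read whichever structure is current when they
        # start and keep using it even if a swap happens before they finish.
        return self._current

    def update(self, change: Callable[[T], None]) -> None:
        with self._update_lock:                     # later update requests wait here
            second = copy.deepcopy(self._current)   # the "second data structure"
            change(second)                          # one request may carry several changes
            self._current = second                  # swap: new searches see the update


def example_change(a: list) -> None:
    # Hypothetical update request: remove 4, add 12 and 26, keep the sort order.
    a.remove(4)
    a.extend([12, 26])
    a.sort()


index = CopyOnWriteIndex([1, 2, 4, 8, 16])
in_flight = index.snapshot()        # a search that started before the update
index.update(example_change)
```

Searches never take the update lock, so they are not held up while the copy is being changed. Retaining the old structure is left to Python's garbage collector here: it stays alive for as long as an in-flight search holds a reference to it.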
[0015] The method defined above prevents more than one update request being applied to a data structure at any point in time. Of course, any single update request may include one or more changes to the data structure. However, allowing search queries to continue whilst updating the second data structure significantly improves the performance of the search index, despite subsequent update requests being prevented until the second data structure is completely updated according to the current update request and the first index data structure is swapped with the second data structure. Whilst it is not possible to apply more than a single update request to a single data structure at any point in time, it is the ability to allow search queries to continue whilst the second data structure is being updated that gives rise to the significant improvement in performance; hence, the overall performance of the search index is significantly better than that of other updating processes and procedures.
[0016] In an embodiment, in addition to effecting an update request to the second data structure such that the search index accurately reflects the stored data, the second data structure is compacted prior to the replacement of the first index data structure with the second data structure. In this embodiment, by effecting a compacting process as part of the update request to the second data structure, fragmentation of the data structure as a result of updates is prevented. Of course, the extent to which fragmentation of the data structure is avoided may be selected by the database programmer and, in some instances, a number of update requests may occur before a compacting process is included as part of generating a second data structure and swapping the first index data structure with the second (updated) data structure. To keep fragmentation of the data structure to an absolute minimum, a compacting process may be effected every time a second data structure is generated for the purposes of effecting an update request.
[0017] In another embodiment, the step of preventing subsequent update requests once an update process has commenced includes receiving subsequent update requests and placing them in a queue, such that they may be commenced as soon as possible after the data structure swap is complete and further update requests may be effected.
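[0016] and [0017] can be illustrated together: update requests are queued rather than rejected, and the copy is compacted before the swap every `compact_every` updates. In this Python sketch, fragmentation is modelled, as an assumption, by removals that leave a `TOMBSTONE` placeholder instead of shifting elements; `QueuedIndex`, `TOMBSTONE` and `compact_every` are hypothetical names.

```python
import queue
import threading

TOMBSTONE = None  # hypothetical marker a removal leaves behind ("fragmentation")


class QueuedIndex:
    """Sketch: update requests wait in a queue and one worker applies them in turn."""

    def __init__(self, items, compact_every=1):
        self._current = list(items)             # the "first index data structure"
        self._compact_every = compact_every     # hypothetical tuning parameter
        self._swaps = 0
        self._requests = queue.Queue()
        threading.Thread(target=self._worker, daemon=True).start()

    def search(self, value):
        # Searches read the current structure and never wait for an update.
        return value in self._current

    def snapshot(self):
        return list(self._current)

    def submit(self, inserts=(), removals=()):
        # A request arriving while another update is running is queued, not rejected.
        self._requests.put((tuple(inserts), tuple(removals)))

    def wait_idle(self):
        self._requests.join()                   # blocks until every queued request is applied

    def _worker(self):
        while True:
            inserts, removals = self._requests.get()
            second = list(self._current)                # copy
            for value in removals:
                if value in second:                     # cheap removal leaves a gap
                    second[second.index(value)] = TOMBSTONE
            second.extend(inserts)
            self._swaps += 1
            if self._swaps % self._compact_every == 0:  # compact before the swap
                second = [v for v in second if v is not TOMBSTONE]
            self._current = second                      # swap
            self._requests.task_done()


index = QueuedIndex([1, 2, 4, 8, 16], compact_every=2)
index.submit(removals=[4])
index.submit(inserts=[12, 26])
index.wait_idle()
```

Setting `compact_every=1` corresponds to compacting on every update, the option [0016] describes for keeping fragmentation to an absolute minimum; larger values accept some fragmentation between compaction passes.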
[0018] In another embodiment, where the search index includes more than one data structure, the method includes managing the update process for each individual data structure that collectively form the search index for the stored data. Of course, in this embodiment, multiple update requests may be accommodated simultaneously in respect of the search index whilst multiple simultaneous update requests to the same data structure are prevented.
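For [0018], where the search index is made up of several data structures, update control can be kept per structure. The Python sketch below assumes one lock per structure; the class name `MultiStructureIndex` and the `make` and `price` structures in the usage are hypothetical, and the concurrent dictionary updates rely on CPython's global interpreter lock.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class MultiStructureIndex:
    """Sketch: a search index built from several data structures, each updated independently."""

    def __init__(self, structures):
        self._current = {name: list(items) for name, items in structures.items()}
        self._locks = {name: threading.Lock() for name in structures}

    def search(self, name, value):
        return value in self._current[name]

    def snapshot(self, name):
        return list(self._current[name])

    def update(self, name, inserts=(), removals=()):
        # One lock per structure: updates to the same structure run one at a time,
        # while updates to different structures can proceed concurrently.
        with self._locks[name]:
            drop = set(removals)
            second = [v for v in self._current[name] if v not in drop]  # copy + remove
            second.extend(inserts)
            self._current[name] = second        # swap this structure only


index = MultiStructureIndex({"make": ["Ford", "Holden"], "price": [9000, 12000]})
with ThreadPoolExecutor(max_workers=3) as pool:
    futures = [
        pool.submit(index.update, "make", inserts=["Toyota"]),
        pool.submit(index.update, "make", removals=["Holden"]),
        pool.submit(index.update, "price", inserts=[8500], removals=[9000]),
    ]
for f in futures:
    f.result()  # re-raise any exception from the worker threads
```

The two updates to `make` are serialised by that structure's lock, while the `price` update does not wait on either of them.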
[0019] In another aspect, the present invention provides a system for updating a search index associated with stored data wherein the search index includes at least one index data structure, the system including at least one computer processor operable to execute computer instruction code to perform search queries and updates upon a first index data structure forming the search index, or part thereof, a first user operated computer processor operable to execute computer instruction code to request an update to the stored data, a second user operated computer processor operable to execute computer instruction code to request a search in respect of the stored data, and a data communications network operably connected to the at least one computer processor for managing the execution of search queries and the first and second user operated computer processors, the at least one computer processor executing computer instruction code to perform search queries and update requests being further operable to receive a request from the first user operated processor to update the first index data structure and upon receipt of same, producing a copy of the first index data structure thereby generating a second data structure, and preventing any subsequent requests to update the first index data structure and completing all search queries previously directed to the first index data structure, the at least one computer processor effecting an update to the second data structure in accordance with the first user update request whilst performing search requests from the second user upon the first index data structure to satisfy the second user search request, the at least one computer processor executing computer instruction code causing the at least one computer processor to swap the first index data structure with the second data structure upon completion of the update to the second data structure, the second data structure thereby becoming the first index data structure and the search index, or part thereof, and upon receipt of subsequent search requests from the first or second user operated processors, the at least one computer processor conducting the search request upon the new search index including the updated first index data structure and allowing further requests to update the first index data structure.
[0020] In another aspect, the present invention provides a system for updating a search index associated with stored data wherein the search index includes at least one data structure, the system including at least one computer processor operable to execute computer instruction code to perform search queries and updates upon a first index data structure forming the search index, or part thereof, the at least one computer processor operable to receive a request to update the stored data from a first user operated processor, the computer processor operable to receive a request to conduct a search in respect of the stored data from a second user, and a data communications network operably connecting the at least one computer processor for performing search queries and update requests from the first and second users, the at least one computer processor executing computer instruction code causing the processor to receive search queries and update requests and upon receipt of a request from the first user to update the first index data structure, producing a copy of the first index data structure thereby generating a second data structure, preventing any subsequent requests to update the first index data structure and completing all search queries previously directed to the first index data structure, the at least one computer processor further operable to effect an update to the second data structure in accordance with the first user update request and performing search requests received from the second user upon the first index data structure to satisfy the second user search request, the at least one computer processor further operable to execute computer instruction code causing the at least one computer processor to swap the first index data structure with the second data structure upon completion of the update to the second data structure, the second data structure thereby becoming the first index data structure and the search index, or part thereof, and upon receipt of subsequent search requests, the at least one computer processor executing computer instruction code to conduct the search request upon the new search index including the updated first index data structure and allowing further requests to update the first index data structure.
[0021] Any search queries commenced upon the first index data structure should be completed on that data structure; otherwise, transferring existing search queries to the second data structure will likely generate erroneous results. However, new search queries received once the update to the second data structure is complete and the swap has been effected are directed to that updated data structure. Once all existing search queries have been completed in respect of the first index data structure, the memory resources associated with the first index data structure may be relinquished and made available for other requirements.
[0022] In another aspect, the present invention provides computer instruction code executable upon one or more processors causing the one or more processors to execute the method steps of the invention.
[0023] It will be appreciated by skilled readers that the system and method of the present invention provide a significant improvement to the performance of search indexes and, in particular, a significant reduction in the time delay between an update request to the stored data of a database and the availability of the new data to users conducting searches of the database. Of course, because the update is performed on a copy of the first index data structure (the second data structure), a user may conduct a search query upon the first index data structure whilst the second data structure is being updated. In this instance, the results of that search may differ slightly from the stored data until the update is complete and the data structures have been swapped.
[0024] However, the cost of this possible slight disparity between the underlying data and the associated search index is vastly outweighed by the benefit of performing dynamic updates to a database and making the updated data available to users as soon as possible.
BRIEF DESCRIPTION OF THE DRAWINGS
[0025] In order that the present invention may be readily understood and put into practical effect, reference will now be made to the accompanying Figures and the following description, which provides details of one or more embodiments of the invention. Reference numerals that are repeated across the Figures refer to the same element in each Figure. With reference to the accompanying Figures:
[0026] Figure 1 provides a diagrammatic representation of a data structure for which an update is required;
[0027] Figure 2 is a diagrammatic representation of the process of generating a copy of the data structure for updating purposes according to an embodiment of the invention;
[0028] Figure 3 is a diagrammatic representation of the process of effecting an update to the copy of the data structure according to an embodiment of the invention;
[0029] Figure 4 is a diagrammatic representation of the completion of the update process according to an embodiment of the invention; and
[0030] Figure 5 is a diagrammatic representation of the swapping of the original data structure with an updated data structure according to an embodiment of the invention.
DETAILED DESCRIPTION OF EMBODIMENT(S) OF THE INVENTION
[0031] The following detailed description of embodiment(s) of the invention refers to the accompanying Figures. Although the description includes exemplary embodiments, other embodiments are possible and changes may be made to the embodiments described without departing from the spirit and scope of the invention.
[0032] With reference to Figure 1, a data structure in the form of an array of five elements (a[0], a[1], a[2], a[3], a[4]) is depicted, containing the data elements 1, 2, 4, 8 and 16 from the lowest to the highest element of the array. As will be recognized, the data in the array referenced by the pointer "a" has been sorted, so that the data elements from the lowest to the highest element of the array are in ascending order.
[0033] The data structure (a) in Figure 1 represents a data structure that forms the search index (or part thereof) of a database with stored data. Any user conducting a search of the database accesses the data structure (a): various "Readers" submit search queries, each of which is satisfied by accessing the data structure (a) and returning a response. In Figure 1, the individual search queries are depicted as Reader 1, Reader 2 to Reader N. The array of five elements (a[0], a[1], a[2], a[3], a[4]) is located in memory at a "root" address, and the Readers are directed to the data structure by reference to the pointer (a).
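The specification does not say how a Reader searches the sorted array. As an illustrative assumption only, the sketch below uses binary search from Python's standard `bisect` module, which is the usual reason for keeping an index in ascending order. `SearchIndex` and `contains` are hypothetical names; the `root` attribute stands in for the pointer (a).

```python
from bisect import bisect_left


class SearchIndex:
    """Hypothetical wrapper: `root` plays the role of the pointer "a"."""

    def __init__(self, values):
        self.root = tuple(sorted(values))   # immutable array in ascending order

    def contains(self, value):
        a = self.root                       # each Reader dereferences the root once
        i = bisect_left(a, value)           # binary search relies on the sort order
        return i < len(a) and a[i] == value


index = SearchIndex([1, 2, 4, 8, 16])       # a[0] .. a[4] as in Figure 1
print(index.contains(8))    # True
print(index.contains(12))   # False
```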
[0034] If the data in the database remained static, there would be no need to update the data structure. As search queries are submitted, the individual Readers would receive responses in the form of data providing links to the underlying database, and each user would continue to interrogate the database until they located the data they were seeking.
[0035] However, as previously described, a problem arises when the data changes and the change must be made available as soon as possible so that the individual Readers obtain the latest "up-to-date" data. This involves updating the associated search index. In the embodiment depicted in the diagrammatic representation of Figure 2, upon receiving a request from a "Writer" to update the data in the data structure referenced by (a), the system generates a copy of the data structure with the reference (a Copy). While the copy is being generated, the Readers continue to access the data structure with reference (a) for the purpose of performing search queries.
[0036] Having created the copy of the data structure (a Copy), the system then performs an update to the data structure (a Copy). As will be recognized by skilled readers, this allows a Writer to write new data to, and alter, the data structure (a Copy) whilst Readers simultaneously continue to submit search queries that are performed on the original data structure (a).
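A minimal Python sketch of [0035] and [0036], assuming an in-memory index held as an immutable tuple behind a `root` attribute; the class and method names are hypothetical. `begin_update` takes a writer lock and hands the Writer a private list copy ("a Copy") while Readers keep using the published tuple; `finish_update` repoints `root` (the swap of [0038]) and admits the next Writer. The sketch relies on rebinding a single attribute in CPython publishing the new reference in one step, so a Reader sees either the old or the new tuple; other runtimes would typically need an atomic reference for the same guarantee.

```python
import threading


class CopyOnWriteIndex:
    """Hypothetical holder for one index data structure; `root` plays "a"."""

    def __init__(self, values):
        self.root = tuple(values)          # immutable version used by Readers
        self._writer = threading.Lock()    # held from the copy until the swap

    def search(self, value):
        a = self.root                      # Readers only ever see a published version
        return value in a

    def begin_update(self):
        """On a Writer's request, return a private mutable copy ("a Copy")."""
        if not self._writer.acquire(blocking=False):
            raise RuntimeError("an update to this structure is already in progress")
        return list(self.root)

    def finish_update(self, a_copy):
        """Publish the updated copy by repointing the root, then admit new Writers."""
        self.root = tuple(a_copy)
        self._writer.release()


index = CopyOnWriteIndex([1, 2, 4, 8, 16])
a_copy = index.begin_update()
a_copy.append(32)                  # the Writer alters only the copy
print(index.search(32))            # False: Readers still use "a"
index.finish_update(a_copy)
print(index.search(32))            # True
```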
[0037] With reference to Figure 3, a diagrammatic representation of the process of updating the copy of the data structure (a Copy) is depicted. The data structure (a Copy) comprises an array of six elements (a[0], a[1], a[2], a[3], a[4], a[5]) into which new data (12) and (26) have been inserted in the same sort order, occupying the fourth element (a[3]) and the sixth element (a[5]) of the array respectively. Further, old data (4), originally the third element of the array (a[2]), has been removed. Whilst the copy of the data structure (a Copy) is being updated with the inclusion of new data (or removal of old data), the data structure (a) continues to be accessed by multiple Readers in order to satisfy their search queries. However, update requests from any other Writer are prevented until the update is complete.
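The update of Figure 3 can be reproduced with a short sketch that applies deletions and insertions to the private copy while preserving the ascending sort order. `apply_update` is a hypothetical helper built on the standard `bisect` module; the specification does not say how the copy is modified, only that the sort order is maintained.

```python
from bisect import bisect_left, insort


def apply_update(a_copy, inserts=(), deletes=()):
    """Apply one update request to the private copy, keeping ascending order."""
    for value in deletes:
        i = bisect_left(a_copy, value)
        if i < len(a_copy) and a_copy[i] == value:
            del a_copy[i]                 # remove old data if present
    for value in inserts:
        insort(a_copy, value)             # insert new data at its sorted position
    return a_copy


a = (1, 2, 4, 8, 16)                      # the original structure "a" (Figure 1)
a_copy = apply_update(list(a), inserts=[12, 26], deletes=[4])
print(a_copy)   # [1, 2, 8, 12, 16, 26]
print(a)        # (1, 2, 4, 8, 16): Readers' view is unchanged
```

As in Figure 3, 12 ends up in a[3] and 26 in a[5] of the copy, while the original tuple seen by Readers is untouched.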
[0038] Subsequent to the completion of the update process of the copy of the data structure (a Copy), the original data structure (a) and the updated data structure (a Copy) are swapped.
[0039] The swapping of the data structures is detailed in Figure 4. In the embodiment depicted in Figure 4, the swapping process is implemented by altering the "root address" of the search index data structure to the address referenced by (a Copy). Hence the new data structure (a) is the updated array of six elements, containing the retained and new data in the same sort order; from the lowest to the highest array element, the data are 1, 2, 8, 12, 16 and 26. From the point in time that the swap is effected, all new search queries are directed to the data structure at the "root address", whilst any existing search queries that were directed to the original data structure (a) continue and are completed upon that first index data structure. Existing search queries should be completed upon the first index data structure; if they were transferred to the second data structure, they would likely provide erroneous results. Once all prior search queries directed to the first index data structure have been completed, the first index data structure (a) is no longer required and the memory used to store it becomes available for other requirements. Further, once the swapping process has been completed, further update requests from Writers may be accommodated by repeating the updating process.
[0040] Figure 5 depicts the process of completing search queries based upon the original data structure (a) whilst directing new search queries to the updated data structure. In this regard, skilled readers will understand that new search queries are directed to the updated data structure, which has effectively become the "first" data structure, whilst search queries initiated before the updated data structure became available are completed upon the original data structure.
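Paragraphs [0021], [0039] and [0040] require the first index data structure to be retained until every query already running against it has finished. In garbage-collected Python the old tuple would simply be freed once nothing references it; the sketch below makes the rule explicit with a per-version query counter, as would be needed under manual memory management. The `Version` and `VersionedIndex` classes and the `reclaimable` list are hypothetical; a real system would free the memory rather than record the version in a list.

```python
import threading


class Version:
    """One published version of an index data structure, plus the number of
    search queries still running against it (hypothetical helper)."""

    def __init__(self, data):
        self.data = tuple(data)
        self.active_queries = 0
        self.retired = False           # True once a newer version has been swapped in


class VersionedIndex:
    def __init__(self, data):
        self._lock = threading.Lock()  # guards the root reference and the counters
        self.current = Version(data)
        self.reclaimable = []          # retired versions with no queries left

    def begin_query(self):
        with self._lock:
            version = self.current     # the query stays on this version to the end
            version.active_queries += 1
            return version

    def end_query(self, version):
        with self._lock:
            version.active_queries -= 1
            self._release_if_idle(version)

    def swap(self, new_data):
        with self._lock:
            old, self.current = self.current, Version(new_data)
            old.retired = True
            self._release_if_idle(old)

    def _release_if_idle(self, version):
        # A retired version is retained until its last query completes.
        if version.retired and version.active_queries == 0:
            self.reclaimable.append(version)


index = VersionedIndex([1, 2, 4, 8, 16])
in_flight = index.begin_query()            # started before the swap
index.swap([1, 2, 8, 12, 16, 26])
late = index.begin_query()                 # started after the swap
print(in_flight.data)           # (1, 2, 4, 8, 16)
print(late.data)                # (1, 2, 8, 12, 16, 26)
print(len(index.reclaimable))   # 0: the old version is still in use
index.end_query(in_flight)
print(len(index.reclaimable))   # 1: the old version can now be released
```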
[0041] Preferred embodiments of the invention have been described throughout the specification; they do not limit the invention to any one embodiment or specific collection of features. It will therefore be appreciated by those skilled in the relevant field of technology that, in view of this disclosure, various modifications and changes can be made to the particular embodiments exemplified without departing from the scope and spirit of the present invention.

Claims

1. A method of updating a search index for stored data wherein the search index includes at least one data structure, the method including:
receiving a request to update a first index data structure, producing a copy of the data structure thereby generating a second data structure;
preventing any further requests to update the first index data structure;
effecting the update to the second data structure such that the search index accurately reflects the stored data;
retaining the first index data structure until all search queries previously directed to the first index data structure are completed;
swapping the first index data structure with the updated second data structure upon which all new search queries are conducted;
the second data structure thereby becoming the first index data structure and acting as the search index, or part thereof; and
allowing further requests to update the first index data structure.
2. A method according to claim 1 wherein any single update request includes a plurality of changes for the at least one search index data structure.
3. A method according to either claim 1 or claim 2 wherein the second data structure is compacted prior to swapping the first index data structure with the second data structure.
4. A method according to claim 3 wherein a number of update requests occur before a compacting process is conducted.
5. A method according to claim 3 wherein a compacting process is conducted each and every time subsequent to generating a second data structure and effecting an update to the second data structure.
6. A method according to any one of the preceding claims wherein handling update requests received after an update process has commenced includes placing said subsequent update requests in a queue and performing said subsequent update requests after the first index data structure is swapped with the second data structure.
7. A method according to claim 6 wherein queued subsequent update requests are processed as a batch update when updating the search index.
8. A method according to any one of the preceding claims including managing the update process for each individual data structure that collectively form the search index for the stored data.
9. A method according to any one of the preceding claims wherein multiple search index update requests are accommodated simultaneously in respect of individual components of the search index data structure whilst multiple simultaneous update requests of the same index data structure are prevented.
10. A system for updating a search index associated with stored data wherein the search index includes at least one index data structure, the system including:
at least one computer processor operable to execute computer instruction code to perform search queries and updates upon a first index data structure forming the search index, or part thereof;
a first user operated computer processor operable to execute computer instruction code to request an update to the stored data;
a second user operated computer processor operable to execute computer instruction code to request a search in respect of the stored data; and
a data communications network operably connected to the at least one computer processor for managing the execution of search queries and the first and second user operated computer processors,
the at least one computer processor executing computer instruction code to perform search queries and update requests being further operable to receive a request from the first user operated processor to update the first index data structure and upon receipt of same, producing a copy of the first index data structure thereby generating a second data structure, and preventing any subsequent requests to update the first index data structure and completing all search queries previously directed to the first index data structure;
the at least one computer processor effecting an update to the second data structure in accordance with the first user update request whilst performing search requests from the second user upon the first index data structure to satisfy the second user search request;
the at least one computer processor executing computer instruction code causing the at least one computer processor to swap the first index data structure with the second data structure upon completion of the update to the second data structure, the second data structure thereby becoming the first index data structure and the search index, or part thereof; and
upon receipt of subsequent search requests from the first or second user operated processors, the at least one computer processor conducting the search request upon the new search index including the updated first index data structure and allowing further requests to update the first index data structure.
11. A system for updating a search index associated with stored data wherein the search index includes at least one data structure, the system including:
at least one computer processor operable to execute computer instruction code to perform search queries and updates upon a first index data structure forming the search index, or part thereof;
the at least one computer processor operable to receive a request to update the stored data from a first user operated processor;
the computer processor operable to receive a request to conduct a search in respect of the stored data from a second user; and
a data communications network operably connecting the at least one computer processor for performing search queries and update requests from the first and second users;
the at least one computer processor executing computer instruction code causing the processor to receive search queries and update requests and upon receipt of a request from the first user to update the first index data structure, producing a copy of the first index data structure thereby generating a second data structure, preventing any subsequent requests to update the first index data structure and completing all search queries previously directed to the first index data structure;
the at least one computer processor further operable to effect an update to the second data structure in accordance with the first user update request and performing search requests received from the second user upon the first index data structure to satisfy the second user search request;
the at least one computer processor further operable to execute computer instruction code causing the at least one computer processor to swap the first index data structure with the second data structure upon completion of the update to the second data structure, the second data structure thereby becoming the first index data structure and the search index, or part thereof; and
upon receipt of subsequent search requests, the at least one computer processor executing computer instruction code to conduct the search request upon the new search index including the updated first index data structure and allowing further requests to update the first index data structure.
12. A system according to either claim 10 or 11 wherein the at least one computer processor is further operable to receive a user request to update the stored data wherein the update request includes a plurality of changes to the stored data.
13. A system according to any one of claims 10 to 12 wherein the at least one computer processor is further operable to execute computer instruction code to compact the second data structure prior to swapping the first index data structure with the second data structure.
14. A system according to any one of claims 10 to 13 wherein the at least one computer processor is further operable to execute computer instruction code causing the second data structure to be compacted each and every time a second data structure is generated for the purpose of effecting an update request prior to swapping the first index data structure with the second data structure.
15. A system according to any one of claims 10 to 14 wherein the computer processor is further operable to execute computer instruction code that prevents processing of subsequent update requests at the time an update process has commenced.
16. A system according to claim 15 wherein subsequent update requests are placed in a queue and processed after the data structure swap is complete.
17. A system according to claim 16 wherein the queued subsequent update requests are processed as a batch update when updating the search index.
18. Computer instruction code executable upon one or more processors causing the one or more processors to execute the method steps of any one of claims 1 to 9.
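Claims 6, 7, 16 and 17 describe queuing update requests that arrive while an update is in progress and then processing them together as a batch. A minimal Python sketch under stated assumptions: update requests are modelled as callables that mutate a list, and `BatchingIndex`, `request_update` and `swaps` are hypothetical names. Whichever thread starts an update keeps copying, applying the queued batch and swapping until the queue is empty; requests arriving in the meantime are only queued.

```python
import threading


class BatchingIndex:
    """Hypothetical index that queues update requests made during an update."""

    def __init__(self, values):
        self.root = tuple(values)        # published version used by Readers
        self.swaps = 0                   # number of copy/update/swap cycles run
        self._pending = []               # queued update requests
        self._updating = False
        self._lock = threading.Lock()    # guards _pending and _updating

    def request_update(self, change):
        with self._lock:
            self._pending.append(change)
            if self._updating:
                return                   # queued; the running updater will apply it
            self._updating = True
        self._run_updates()

    def _run_updates(self):
        while True:
            with self._lock:
                batch, self._pending = self._pending, []
                if not batch:
                    self._updating = False   # queue drained: admit the next updater
                    return
            a_copy = list(self.root)         # copy the current structure
            for change in batch:             # apply the whole batch to the copy
                change(a_copy)
            self.root = tuple(a_copy)        # swap once per batch
            self.swaps += 1


index = BatchingIndex([1, 2, 4, 8, 16])

def remove_four(a):
    a.remove(4)
    # These two requests arrive while this update is still running, so they
    # are queued and later applied together on a single fresh copy.
    index.request_update(lambda b: b.extend([12, 26]))
    index.request_update(lambda b: b.sort())

index.request_update(remove_four)
print(index.root)    # (1, 2, 8, 12, 16, 26)
print(index.swaps)   # 2
```

The two queued requests are applied as one batch and published with a single swap, which is the batching behaviour claims 7 and 17 describe.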
PCT/AU2013/001387 2013-07-17 2013-11-29 System and method of implementing near real time updates to a search index WO2015006795A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
AU2013902663A AU2013902663A0 (en) 2013-07-17 System and method of implementing near real time updates to a search engine
AU2013902663 2013-07-17

Publications (1)

Publication Number Publication Date
WO2015006795A1 true WO2015006795A1 (en) 2015-01-22

Family

ID=52345600

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/AU2013/001387 WO2015006795A1 (en) 2013-07-17 2013-11-29 System and method of implementing near real time updates to a search index

Country Status (1)

Country Link
WO (1) WO2015006795A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019165661A1 (en) * 2018-02-27 2019-09-06 平安科技(深圳)有限公司 (Ping An Technology (Shenzhen) Co., Ltd.) Method and apparatus for intelligently searching for organization name, and device and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020004799A1 (en) * 2000-02-11 2002-01-10 Alexander Gorelik High availability database system using live/load database copies
US6868414B2 (en) * 2001-01-03 2005-03-15 International Business Machines Corporation Technique for serializing data structure updates and retrievals without requiring searchers to use locks
US20070282878A1 (en) * 2006-05-30 2007-12-06 Computer Associates Think Inc. System and method for online reorganization of a database using flash image copies
US8386494B2 (en) * 2008-08-07 2013-02-26 Hewlett-Packard Development Company, L.P. Providing data structures for determining whether keys of an index are present in a storage system

Similar Documents

Publication Publication Date Title
US11243920B2 (en) Distributed database system, transaction processing method, lock server and storage medium
JP5047806B2 (en) Apparatus and method for data warehousing
US7461065B2 (en) Method and system for utilizing shared numeric locks
EP3519986B1 (en) Direct table association in in-memory databases
US8364634B2 (en) System and method for processing fault tolerant transaction
US8620923B1 (en) System and method for storing meta-data indexes within a computer storage system
WO2020192063A1 (en) Caching-based method and system for sales locking
Schultz et al. Tunable consistency in mongodb
EP1808779B1 (en) Bundling database
CN109191233A (en) A kind of second kills lower single request processing method, device and storage medium
WO2022106878A1 (en) Systems and methods for database query efficiency improvement
US20140032703A1 (en) System and method for an expandable computer storage system
CN115422205A (en) Data processing method and device, electronic equipment and storage medium
US8549007B1 (en) System and method for indexing meta-data in a computer storage system
US10832309B2 (en) Inventory data model for large scale flash sales
US9766949B2 (en) System and method for locking exclusive access to a divided resource
WO2015006795A1 (en) System and method of implementing near real time updates to a search index
KR102411806B1 (en) Systems and methods for database query efficiency improvement
US11789922B1 (en) Admitting for performance ordered operations of atomic transactions across a distributed database
US7711730B2 (en) Method of returning data during insert statement processing
CN111737273A (en) Transaction submitting method, device, coordination node and storage medium
US11501354B2 (en) Information processing apparatus for searching database
CN112199213B (en) Data interaction method and server for hanging bill interaction data
Shen A performance comparison of NoSQL and SQL databases for different scales of ecommerce systems
EP2495657A1 (en) Efficient batch processing in a multi-tier application

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 13889755

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 13889755

Country of ref document: EP

Kind code of ref document: A1