CN111506790B - Method, system, device and storage medium for determining extraction object and refreshing data - Google Patents

Info

Publication number
CN111506790B
Authority
CN
China
Prior art keywords
data
weight value
objects
value
index
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010275801.4A
Other languages
Chinese (zh)
Other versions
CN111506790A (en)
Inventor
高坤晓
齐文超
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ctrip Computer Technology Shanghai Co Ltd
Original Assignee
Ctrip Computer Technology Shanghai Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ctrip Computer Technology Shanghai Co Ltd
Priority to CN202010275801.4A
Publication of CN111506790A
Application granted
Publication of CN111506790B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/23 Updating
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9537 Spatial or temporal dependent retrieval, e.g. spatiotemporal queries

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a method, system, device and storage medium for determining an extraction object and refreshing data. The method for determining the extraction object comprises: determining a plurality of objects to be selected, wherein each object corresponds to a weight value, and the weight value is positively correlated with the probability that the object is extracted; aggregating objects with the same weight value and storing them as a set; constructing an index over the weight values, wherein the length of the range that each weight value occupies in the index equals the weight value multiplied by the number of objects in the corresponding set; randomly selecting an index value from the index, finding the weight value corresponding to the selected index value, and querying the set corresponding to the found weight value; and randomly acquiring an object from the queried set as the extraction object. When determining the extraction object, the invention avoids occupying excessive temporary storage space at run time and reduces running time.

Description

Method, system, device and storage medium for determining extraction object and refreshing data
Technical Field
The present invention relates to the field of computers, and in particular, to a method, system, device, and storage medium for determining an extraction object and refreshing data.
Background
Fields such as the Internet and big data computing often involve massive amounts of data, and how to process such data quickly and efficiently is a matter of concern to those skilled in the art. For example, an OTA (online travel agency) website needs to frequently update the hotel data provided by each hotel supplier. Because the data volume is large, updating all hotel data at once may place a heavy burden on the server, and QPS (queries per second) limits may prevent all hotel data from being acquired quickly.
To avoid these problems, a common practice is to update only a selected portion of the hotel data at each step. All hotel data is regarded as a set containing a massive number of elements, each element corresponds to a weight, and a weighted random algorithm is used to extract part of the data for updating. However, current weighted random algorithms cannot achieve good time complexity and good space complexity at the same time: either they occupy a large amount of temporary storage while running, or their running time is long.
Disclosure of Invention
The invention aims to overcome the drawbacks of existing weighted random algorithms, which occupy a large amount of storage space or take a long time to run when extracting objects, and provides a method, system, device and storage medium for determining an extraction object and refreshing data.
The invention solves the above technical problem through the following technical solutions:
A method for determining an extraction object, comprising:
determining a plurality of objects to be selected, wherein each object corresponds to a weight value, and the weight value is positively correlated with the probability that the object is extracted;
aggregating objects with the same weight value and storing them as a set;
constructing an index over the weight values, wherein the length of the range that each weight value occupies in the index equals the weight value multiplied by the number of objects in the corresponding set;
randomly selecting an index value from the index, finding the weight value corresponding to the selected index value, and querying the set corresponding to the found weight value;
randomly acquiring an object from the queried set as the extraction object.
Preferably, a uniform random algorithm is used to randomly select the index value from the index.
Preferably, the weight value corresponding to the selected index value is found by binary search (dichotomy).
A data refresh method, comprising:
determining a data refresh range, wherein the data refresh range comprises the data of a plurality of objects, and the probability of an object being extracted is positively correlated with the data change frequency of the object;
taking the objects in the data refresh range as the objects to be selected, and determining an extraction object using the extraction object determination method described above;
acquiring the data of the extraction object by accessing a data interface;
caching the data of the extraction object in a data caching unit, replacing the original data of the extraction object in the data caching unit.
An extraction object determination system, comprising:
an object determining module, configured to determine a plurality of objects to be selected, wherein each object corresponds to a weight value, and the weight value is positively correlated with the probability that the object is extracted;
an object aggregation module, configured to aggregate objects with the same weight value and store them as a set;
an index construction module, configured to construct an index over the weight values, wherein the length of the range that each weight value occupies in the index equals the weight value multiplied by the number of objects in the corresponding set;
a set searching module, configured to randomly select an index value from the index, find the weight value corresponding to the selected index value, and query the set corresponding to the found weight value;
an object extraction module, configured to randomly acquire an object from the queried set as the extraction object.
Preferably, the set searching module uses a uniform random algorithm to randomly select the index value from the index.
Preferably, the set searching module finds the weight value corresponding to the selected index value by binary search (dichotomy).
A data refresh system, comprising:
a data determining module, configured to determine a data refresh range, wherein the data refresh range comprises the data of a plurality of objects, and the probability of an object being extracted is positively correlated with the data change frequency of the object;
an object extraction module, configured to take the objects in the data refresh range as the objects to be selected and determine an extraction object using the extraction object determination system described above;
an interface access module, configured to acquire the data of the extraction object by accessing a data interface;
a data refreshing module, configured to cache the data of the extraction object in a data caching unit, replacing the original data of the extraction object in the data caching unit.
An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the method as described above when executing the program.
A computer readable storage medium, on which a computer program is stored, characterized in that the program, when being executed by a processor, implements the steps of the method as described above.
On the basis of common knowledge in the art, the above preferred conditions may be combined arbitrarily to obtain preferred embodiments of the invention.
The positive effects of the invention are as follows. When determining the extraction object, the method achieves O(1) complexity in both time and space, so it neither occupies excessive temporary storage space at run time nor runs for long. It is particularly suitable for scenarios with many objects to be selected in which both performance and space usage must be considered. In addition, the index is simple and takes little time to rebuild, so the method also suits scenarios in which the weighted random candidate set changes rapidly. During data refreshing, the invention updates data that changes frequently with higher probability, which avoids overloading the server or hitting QPS limits by refreshing all data at once, and so achieves fast data updates.
Drawings
FIG. 1 is a flow chart of a method for determining an extraction object according to embodiment 1 of the present invention;
FIG. 2 is a diagram illustrating a data structure of an index in embodiment 1 of the present invention;
FIG. 3 is a schematic diagram of a data model of a specific index constructed in embodiment 1 of the present invention;
FIG. 4 is a diagram of a memory model for a first common implementation of current weighted randomization;
FIG. 5 is a flow chart of a data refreshing method according to embodiment 2 of the present invention;
FIG. 6 is a schematic block diagram of an extraction object determination system according to embodiment 3 of the present invention;
FIG. 7 is a schematic block diagram of a data refresh system according to embodiment 4 of the present invention;
FIG. 8 is a schematic structural diagram of an electronic device according to embodiment 5 of the present invention.
Detailed Description
The invention is further illustrated by means of the following examples, which are not intended to limit the scope of the invention.
Example 1
Fig. 1 shows the method for determining an extraction object of this embodiment, which includes the following steps:
Step 101: Determine a plurality of objects to be selected, wherein each object corresponds to a weight value, and the weight value is positively correlated with the probability of the object being extracted.
Step 102: Aggregate objects with the same weight value and store them as a set. The total weight of each set is the sum of the weight values of the objects in the set.
Step 103: Construct an index over the weight values, wherein the length of the range that each weight value occupies in the index equals the weight value multiplied by the number of objects in the corresponding set. Fig. 2 shows the data structure of the index. The index is keyed by weight value, and each weight value corresponds to an index range containing N index values, where N equals the weight value multiplied by the number of objects in that weight value's set. Each index range points to one candidate data set, namely the set of objects that share that weight value.
Step 104: Randomly select an index value from the index, find the weight value corresponding to the selected index value, and query the set corresponding to the found weight value.
Step 105: Randomly acquire an object from the queried set as the extraction object.
In this embodiment, step 104 preferably uses a uniform random algorithm to randomly select the index value from the index.
In this embodiment, step 104 further preferably finds the weight value corresponding to the selected index value by binary search (dichotomy).
For example, suppose there are 5 objects in total, A1, A2, A3, B1 and B2, where A1, A2 and A3 each have weight 2 and B1 and B2 each have weight 3. The index data model shown in Fig. 3 is constructed: weight 2 occupies 2 × 3 = 6 index values and weight 3 occupies 3 × 2 = 6 index values, so the index covers [0, 12). Each time an object is extracted, an element is randomly selected from [0, 12). If, for example, the selected element falls in the range occupied by weight 2, the set for weight 2 is found, and an object is then randomly selected from that set, {A1, A2, A3}. This ensures that the probability of each object being selected is directly proportional to its weight value.
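Steps 101 to 105 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the function names `build_index` and `pick` are hypothetical, weights are assumed to be positive integers, and the index is stored as a list of cumulative range upper bounds so that the standard `bisect` module can perform the binary search.

```python
import bisect
import random
from collections import defaultdict
from itertools import accumulate


def build_index(weights):
    """Steps 101-103: group objects by weight and build the index.

    `weights` maps each object to a positive integer weight (an assumption).
    Returns (sorted weight values, cumulative upper bounds, weight -> objects).
    """
    groups = defaultdict(list)
    for obj, w in weights.items():
        groups[w].append(obj)
    weight_values = sorted(groups)
    # Each weight value occupies weight * (number of objects) index values.
    lengths = [w * len(groups[w]) for w in weight_values]
    bounds = list(accumulate(lengths))  # exclusive upper bound of each range
    return weight_values, bounds, groups


def pick(weight_values, bounds, groups, rng=random):
    """Steps 104-105: draw one object with probability proportional to weight."""
    r = rng.randrange(bounds[-1])        # uniform index value in [0, total)
    i = bisect.bisect_right(bounds, r)   # binary search for the owning range
    return rng.choice(groups[weight_values[i]])


if __name__ == "__main__":
    weights = {"A1": 2, "A2": 2, "A3": 2, "B1": 3, "B2": 3}
    values, bounds, groups = build_index(weights)
    print(values, bounds)
    print(pick(values, bounds, groups))
```

For the five-object example, the index holds only two entries (one per distinct weight), regardless of how many objects share each weight.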
To further illustrate the technical effects achieved by the method of this embodiment, it is compared below with the weighted random implementations in common use today:
Scenario description: A set S is known that contains a massive number of data objects, and its elements change rapidly. Each object has a corresponding weight w whose value range is [1, m], where m is a finite value; for example, weights typically range from 1 to 100. Each time an object is extracted from the set S, the probability p of its being extracted is positively correlated with its weight value w (p ∝ w); that is, a weighted random selection is performed on the set S.
(1) Common implementations in the industry today:
First: the set is expanded so that the number of occurrences of each item is positively correlated with its weight. In the above example, with a denoting the weight of A1, A2 and A3 and b denoting the weight of B1 and B2, the set is expanded to {a × A1, a × A2, a × A3, b × B1, b × B2}, 3a + 2b elements in total, and an element can then be selected from the expanded set using a uniform random algorithm.
For example, if A1, A2 and A3 have weight 2 and B1 and B2 have weight 3, the memory model is as shown in Fig. 4.
This implementation has time complexity O(1) and space complexity O(n); when the set changes, rebuilding the index is costly.
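The first approach can be sketched as follows, assuming positive integer weights; the names `expand` and `pick_expanded` are hypothetical. The sketch makes the trade-off visible: selection is a single uniform draw, but the expanded list holds one entry per unit of weight and has to be rebuilt whenever the set changes.

```python
import random


def expand(weights):
    """Repeat each object as many times as its (integer) weight."""
    expanded = []
    for obj, w in weights.items():
        expanded.extend([obj] * w)
    return expanded


def pick_expanded(expanded, rng=random):
    """Uniformly pick one slot of the expanded list."""
    return rng.choice(expanded)


if __name__ == "__main__":
    weights = {"A1": 2, "A2": 2, "A3": 2, "B1": 3, "B2": 3}
    expanded = expand(weights)
    print(len(expanded), pick_expanded(expanded))
```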
Second: compute the sum of all weights, sum, and randomly select a number R between 1 and sum. Then traverse the whole set while accumulating the weights of the traversed items; stop traversing as soon as the accumulated weight is greater than or equal to R, and select the item just reached.
This implementation has time complexity O(n) and space complexity O(1).
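The second approach can be sketched as a single linear scan. The function name `pick_linear` is hypothetical, and positive integer weights are assumed. No extra storage is needed, but every draw walks the set until the running total reaches R.

```python
import random


def pick_linear(weights, rng=random):
    """Draw R in [1, sum] and scan until the running weight total reaches R."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must be non-empty positive integers")
    r = rng.randint(1, total)  # both endpoints are inclusive
    running = 0
    for obj, w in weights.items():
        running += w
        if running >= r:
            return obj
    raise AssertionError("unreachable when all weights are positive")


if __name__ == "__main__":
    weights = {"A1": 2, "A2": 2, "A3": 2, "B1": 3, "B2": 3}
    print(pick_linear(weights))
```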
Neither of these two implementations achieves good time complexity and good space complexity at the same time. The first occupies a large amount of space, and when massive data is updated rapidly it takes a great deal of time to recompute the index and also triggers a great deal of garbage collection (GC), which hurts performance. The second has poor time complexity.
(2) The method of this embodiment is used:
Suppose that in the set S there are x objects A, each with weight $w$, and that the total weight of the set S is $P$. The expected probability of each such object being extracted is $w/P$.
When weighted random selection is implemented according to the method of this embodiment, the probability that the set {A} is selected is $(x \cdot w)/P$.
When an object is then picked at random from the set {A}, each object is selected with probability $1/x$.
The probability that an individual object is selected is therefore $\frac{x \cdot w}{P} \cdot \frac{1}{x} = \frac{w}{P}$, as expected.
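As a worked check, the same calculation can be applied to the five-object example of this embodiment (A1, A2 and A3 with weight 2; B1 and B2 with weight 3). The subscripted names $A_i$ and $B_j$ are shorthand introduced here for any one of the A objects or B objects.

```latex
\begin{align*}
\text{total index length} &= 2 \cdot 3 + 3 \cdot 2 = 12 \\
\Pr(A_i) &= \frac{2 \cdot 3}{12} \cdot \frac{1}{3} = \frac{2}{12} = \frac{1}{6} \\
\Pr(B_j) &= \frac{3 \cdot 2}{12} \cdot \frac{1}{2} = \frac{3}{12} = \frac{1}{4} \\
3 \cdot \frac{1}{6} + 2 \cdot \frac{1}{4} &= 1
\end{align*}
```

Each object's probability equals its weight divided by the total weight of 12, and the five probabilities sum to 1.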
In this method, the weight w takes values in [1, m], so the index has at most m entries and occupies at most m units of space; since m is a bounded constant, the space complexity is O(1). As for time, because the index has at most m entries, binary search finds the corresponding range in at most log(m) steps, so the time complexity is also O(1).
The comparison shows that the method for determining the extraction object in this embodiment starts from the weights themselves: it constructs an index and selects an extraction object through a new approach to weighted random selection. Compared with existing weighted random algorithms, it compresses the storage space used during extraction, has excellent time complexity, and mathematically meets the weighted expectation.
The method of this embodiment achieves O(1) time and space complexity when determining the extraction object, so it neither occupies excessive temporary storage space at run time nor runs for long. It is particularly suitable for scenarios with many objects to be selected in which both performance and space usage must be considered. In addition, the index is relatively simple and takes relatively little time to rebuild, so the method also suits scenarios in which the weighted random candidate set changes rapidly.
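One way to see why rebuilding the index is cheap is the sketch below. It is an illustration under assumptions, not the patent's design: the class `WeightedSampler` and its methods are hypothetical, weights are positive integers, and each set is kept as a list with swap-removal so that moving an object between sets takes constant time. Rebuilding the index then touches only the distinct weight values (at most m), not every object.

```python
import bisect
import random
from itertools import accumulate


class WeightedSampler:
    """Hypothetical weight-grouped sampler with a cheap index rebuild."""

    def __init__(self):
        self.groups = {}   # weight -> list of objects with that weight
        self.slot = {}     # object -> (weight, position in its group list)
        self.weights = []  # sorted distinct weights currently in use
        self.bounds = []   # cumulative range upper bounds, parallel to weights

    def _rebuild(self):
        # Only distinct weights are visited, so the cost is bounded by m.
        self.weights = sorted(w for w, objs in self.groups.items() if objs)
        self.bounds = list(
            accumulate(w * len(self.groups[w]) for w in self.weights)
        )

    def _detach(self, obj):
        # Swap-remove: move the last element into the vacated position.
        w, i = self.slot.pop(obj)
        objs = self.groups[w]
        last = objs.pop()
        if i < len(objs):
            objs[i] = last
            self.slot[last] = (w, i)

    def set_weight(self, obj, weight):
        """Insert obj, or move it to the set for its new weight."""
        if obj in self.slot:
            self._detach(obj)
        objs = self.groups.setdefault(weight, [])
        self.slot[obj] = (weight, len(objs))
        objs.append(obj)
        self._rebuild()

    def pick(self, rng=random):
        r = rng.randrange(self.bounds[-1])
        i = bisect.bisect_right(self.bounds, r)
        return rng.choice(self.groups[self.weights[i]])
```

Calling `set_weight` for an existing object moves it between sets and refreshes the range boundaries without copying or re-expanding the object data.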
Example 2
The present embodiment applies the extraction object determination method in embodiment 1 to a data refresh scene, thereby forming a data refresh method. Fig. 5 shows a data refreshing method of the present embodiment, which includes the steps of:
step 201: a data refresh range is determined. The data refresh range includes data of several objects, and the probability of an object being extracted is positively correlated with the frequency of data changes of the object. That is, the weight value of the object is positively correlated with the data change frequency of the object.
Step 202: the extraction object is determined by the extraction object determination method of embodiment 1 with the object in the data refresh range as the object to be selected.
Step 203: and acquiring the data of the extraction object by accessing the data interface.
Step 204: and caching the data of the extraction object in a data caching unit and replacing original data of the extraction object in the data caching unit.
The data refreshing method of the embodiment can be applied to a hotel data refreshing scene of an OTA website, for example, in the OTA (online travel agency), when the website updates hotel data (hotel data comprises but not limited to hotel room type, room preset price and the like) provided by each hotel provider, the website takes a hotel as an object, hotel weights are set according to the output of each hotel, the hotel weights with high output and large hotel reservation amount are high, otherwise, the hotel weights with small output and small hotel reservation amount are low, and the data of part of hotels are updated from all hotels each time, so that the situation that the server bears excessive load or is limited by QPS is avoided, and the data is updated rapidly.
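The refresh steps 201 to 204 can be sketched as a single refresh pass. Everything named here is an assumption for illustration: `pick_weighted` and `refresh_once` are hypothetical helpers, `fetch` stands in for whatever data interface the supplier exposes, the cache is a plain dictionary, and weights are positive integers. For brevity the index is rebuilt on every call; a real system would presumably keep it between calls.

```python
import bisect
import random
from collections import defaultdict
from itertools import accumulate


def pick_weighted(weights, rng=random):
    """Pick one object with probability proportional to its integer weight."""
    groups = defaultdict(list)
    for obj, w in weights.items():
        groups[w].append(obj)
    values = sorted(groups)
    bounds = list(accumulate(w * len(groups[w]) for w in values))
    r = rng.randrange(bounds[-1])
    return rng.choice(groups[values[bisect.bisect_right(bounds, r)]])


def refresh_once(weights, fetch, cache, rng=random):
    """Refresh one object; `weights` describes the refresh range (step 201)."""
    obj = pick_weighted(weights, rng)  # step 202: choose the extraction object
    cache[obj] = fetch(obj)            # steps 203-204: fetch and overwrite
    return obj


if __name__ == "__main__":
    hotel_weights = {"hotel_a": 1, "hotel_b": 5}
    cache = {"hotel_a": "old", "hotel_b": "old"}
    refreshed = refresh_once(hotel_weights, lambda h: "fresh:" + h, cache)
    print(refreshed, cache)
```

Calling `refresh_once` at a rate below the QPS limit spreads the update load over time while favouring the objects with higher weights.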
Example 3
Fig. 6 shows an extraction object determination system of the present embodiment, which includes the following modules:
the object determining module 301 is configured to determine a plurality of objects to be selected, where each object corresponds to a weight value, and the value of the weight value is positively related to the probability that the object is extracted;
an object aggregation module 302, configured to aggregate and store objects with the same weight value as a set;
an index construction module 303, configured to construct an index on a weight value, where a length value occupied by the same weight value in the index is equal to the weight value multiplied by the number of objects in a corresponding set;
the set searching module 304 is configured to randomly select an index value from the indexes, find a weight value corresponding to the selected index value, and query a set corresponding to the found weight value;
the object extraction module 305 is configured to randomly obtain an object from the queried set as an extraction object.
In this embodiment, the set lookup module 304 preferably randomly selects an index value from the indexes by using a uniform random algorithm.
In this embodiment, the set searching module 304 also preferably finds the weight value corresponding to the selected index value by a dichotomy.
The system of this embodiment achieves O(1) time and space complexity when determining the extraction object, so it neither occupies excessive temporary storage space at run time nor runs for long. It is particularly suitable for scenarios with many objects to be selected in which both performance and space usage must be considered. In addition, the index is relatively simple and takes relatively little time to rebuild, so the system also suits scenarios in which the weighted random candidate set changes rapidly.
Example 4
Fig. 7 shows a data refresh system of the present embodiment, which includes the following modules:
a data determining module 401, configured to determine a data refresh range, where the data refresh range includes data of a plurality of objects, and a probability that the objects are extracted is positively related to a data change frequency of the objects;
an object extraction module 402, configured to determine an extraction object by using the extraction object determination system of embodiment 3, with an object in the data refresh range as an object to be selected;
an interface access module 403, configured to obtain the data of the extraction object by accessing a data interface;
and the data refreshing module 404 is configured to cache the data of the extraction object in a data caching unit and replace original data of the extraction object in the data caching unit.
During data refreshing, the system of this embodiment updates data that changes frequently with higher probability, which avoids overloading the server or hitting QPS (queries per second) limits by refreshing all data at once, and so achieves fast data updates.
Example 5
Fig. 8 is a schematic structural diagram of an electronic device according to embodiment 5 of the present invention. The electronic device comprises a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing any of the methods of embodiments 1-2 when executing the program. The electronic device 50 shown in fig. 8 is merely an example and should not be construed as limiting the functionality and scope of use of embodiments of the present invention.
As shown in Fig. 8, the electronic device 50 may be embodied in the form of a general-purpose computing device, which may be a server device, for example. Components of the electronic device 50 may include, but are not limited to: at least one processor 51, at least one memory 52, and a bus 53 connecting the different system components (including the memory 52 and the processor 51).
The bus 53 includes a data bus, an address bus, and a control bus.
Memory 52 may include volatile memory such as Random Access Memory (RAM) 521 and/or cache memory 522, and may further include Read Only Memory (ROM) 523.
Memory 52 may also include a program/utility 525 having a set (at least one) of program modules 524, such program modules 524 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each or some combination of which may include an implementation of a network environment.
The processor 51 executes various functional applications and data processing, such as the method provided in embodiment 1 or 2 of the present invention, by running a computer program stored in the memory 52.
The electronic device 50 may also communicate with one or more external devices 54 (e.g., a keyboard, a pointing device, etc.). Such communication may occur through an input/output (I/O) interface 55. The electronic device 50 may also communicate with one or more networks, such as a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network such as the Internet, via a network adapter 56. As shown in Fig. 8, the network adapter 56 communicates with the other modules of the electronic device 50 via the bus 53. It should be appreciated that, although not shown in the figures, other hardware and/or software modules may be used in connection with the electronic device 50, including, but not limited to: microcode, device drivers, redundant processors, external disk drive arrays, RAID (disk array) systems, tape drives, data backup storage systems, and the like.
It should be noted that although several units/modules or sub-units/modules of the electronic device are mentioned in the above detailed description, this division is merely exemplary and not mandatory. Indeed, according to embodiments of the present invention, the features and functions of two or more of the units/modules described above may be embodied in a single unit/module. Conversely, the features and functions of one unit/module described above may be further divided among a plurality of units/modules.
Example 6
The present embodiment provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of any one of the methods provided in embodiments 1-2.
More specifically, the readable storage medium may include, but is not limited to: a portable disk, a hard disk, random access memory, read-only memory, erasable programmable read-only memory, an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
In a possible embodiment, the invention may also be implemented in the form of a program product comprising program code for causing a terminal device to carry out the steps of any one of the methods described in embodiments 1-2, when said program product is run on the terminal device.
Wherein the program code for carrying out the invention may be written in any combination of one or more programming languages, which program code may execute entirely on the user device, partly on the user device, as a stand-alone software package, partly on the user device and partly on the remote device or entirely on the remote device.
While specific embodiments of the invention have been described above, those skilled in the art will appreciate that they are given by way of example only, and that the scope of the invention is defined by the appended claims. Those skilled in the art may make various changes and modifications to these embodiments without departing from the principles and spirit of the invention, and all such changes and modifications fall within the scope of the invention.

Claims (10)

1. A method for determining an extraction object, characterized in that it is applied to a hotel data refreshing scene, takes hotels as objects, and sets hotel weights according to the output of each hotel; the method comprising:
determining a plurality of objects to be selected, wherein each object corresponds to a weight value, and the weight value is positively correlated with the probability that the object is extracted;
aggregating objects with the same weight value and storing them as a set;
constructing an index over the weight values, wherein the length of the range that each weight value occupies in the index equals the weight value multiplied by the number of objects in the corresponding set;
randomly selecting an index value from the index, finding the weight value corresponding to the selected index value, and querying the set corresponding to the found weight value;
randomly acquiring an object from the queried set as the extraction object.
2. The extraction object determination method as claimed in claim 1, wherein an index value is randomly selected from the indexes using a uniform random algorithm.
3. The extraction object determining method as claimed in claim 1, wherein the weight value corresponding to the selected index value is found by a dichotomy.
4. A data refreshing method, comprising:
determining a data refreshing range, wherein the data refreshing range comprises data of a plurality of objects, and the probability of the objects being extracted is positively correlated with the data change frequency of the objects;
determining an extraction object by using the extraction object determining method according to any one of claims 1 to 3 by taking the object in the data refreshing range as an object to be selected;
acquiring the data of the extraction object by accessing a data interface;
and caching the data of the extraction object in a data caching unit and replacing the original data of the extraction object in the data caching unit.
5. An extraction object determination system, characterized in that it is applied to a hotel data refreshing scene, takes hotels as objects, and sets hotel weights according to the output of each hotel; the system comprising:
the object determining module is used for determining a plurality of objects to be selected, each object corresponds to a weight value, and the value of the weight value is positively correlated with the probability that the object is extracted;
the object aggregation module is used for aggregating and storing the objects with the same weight value into a set;
the index construction module is used for constructing an index on the weight value, and the length value occupied by the same weight value in the index is equal to the weight value multiplied by the number of objects in the corresponding set;
the set searching module is used for randomly selecting an index value from the indexes, finding a weight value corresponding to the selected index value, and inquiring a set corresponding to the found weight value;
and the object extraction module is used for randomly acquiring an object from the queried set as an extraction object.
6. The extraction object determination system as claimed in claim 5, wherein said set lookup module employs a uniform random algorithm to randomly choose an index value among said indices.
7. The extraction object determination system as claimed in claim 5, wherein the set lookup module finds the weight value corresponding to the selected index value by a dichotomy.
8. A data refresh system, comprising:
the data determining module is used for determining a data refreshing range, wherein the data refreshing range comprises data of a plurality of objects, and the probability of the objects being extracted is positively correlated with the data change frequency of the objects;
an object extraction module, configured to determine an extraction object by using the extraction object determination system according to any one of claims 5 to 7, with an object in the data refresh range being an object to be selected;
the interface access module is used for acquiring the data of the extraction object through accessing a data interface;
and the data refreshing module is used for caching the data of the extraction object in the data caching unit and replacing the original data of the extraction object in the data caching unit.
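The refresh flow of claim 8 can be sketched in Python as follows: pick an object with probability tied to its data change frequency, fetch its data through a data interface, and overwrite the cached copy. The function names `make_picker` and `refresh_once`, the dictionary used as the cache, and the `fetch` callback are hypothetical. To keep the sketch short, the picker uses the standard library's `random.choices` as a stand-in for the claimed index-based extraction rather than reproducing it.

```python
import random


def make_picker(change_frequency, rng):
    """Return a function that picks an object with probability
    proportional to its (assumed numeric) data change frequency."""
    objects = list(change_frequency)
    weights = [change_frequency[obj] for obj in objects]

    def pick():
        # Stand-in for the extraction step; not the patent's index structure.
        return rng.choices(objects, weights=weights, k=1)[0]

    return pick


def refresh_once(cache, pick, fetch):
    """Refresh one cached entry and return the refreshed object."""
    obj = pick()             # determine the extraction object
    cache[obj] = fetch(obj)  # access the data interface, replace the old data
    return obj


if __name__ == "__main__":
    rng = random.Random(42)
    cache = {"hotel_a": "old", "hotel_b": "old"}
    pick = make_picker({"hotel_a": 1, "hotel_b": 9}, rng)
    for _ in range(5):
        refresh_once(cache, pick, lambda hotel: f"fresh:{hotel}")
    print(cache)
```

Objects whose data changes more often are refreshed more often in this sketch, which is the positive correlation the claim describes.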
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the method of any one of claims 1 to 4 when executing the program.
10. A computer readable storage medium, on which a computer program is stored, characterized in that the program, when being executed by a processor, implements the steps of the method of any of claims 1 to 4.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010275801.4A CN111506790B (en) 2020-04-09 2020-04-09 Method, system, device and storage medium for determining extraction object and refreshing data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010275801.4A CN111506790B (en) 2020-04-09 2020-04-09 Method, system, device and storage medium for determining extraction object and refreshing data

Publications (2)

Publication Number Publication Date
CN111506790A CN111506790A (en) 2020-08-07
CN111506790B true CN111506790B (en) 2024-03-22

Family

ID=71864142

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010275801.4A Active CN111506790B (en) 2020-04-09 2020-04-09 Method, system, device and storage medium for determining extraction object and refreshing data

Country Status (1)

Country Link
CN (1) CN111506790B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106570120A (en) * 2016-11-02 2017-04-19 四川用联信息技术有限公司 Process for realizing searching engine optimization through improved keyword optimization
CN107220283A (en) * 2017-04-21 2017-09-29 东软集团股份有限公司 Data processing method, device, storage medium and electronic equipment
CN108399266A (en) * 2018-03-23 2018-08-14 广州爱九游信息技术有限公司 Data pick-up method, apparatus, electronic equipment and computer readable storage medium
CN108563715A (en) * 2018-03-29 2018-09-21 中国科学院计算技术研究所 A kind of distributed convergence method for digging and system
CN110162528A (en) * 2019-05-24 2019-08-23 安徽芃睿科技有限公司 Magnanimity big data search method and system

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2002373109A (en) * 2001-06-13 2002-12-26 Nec Corp Data look-ahead system and its method
US7756845B2 (en) * 2006-12-28 2010-07-13 Yahoo! Inc. System and method for learning a weighted index to categorize objects
US7610283B2 (en) * 2007-06-12 2009-10-27 Microsoft Corporation Disk-based probabilistic set-similarity indexes

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
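An index structure for fast query retrieval in object oriented data bases using signature weight declustering; Shanthi, I.E. et al.; Information Technology Journal; 20091231; pp. 275-283 *
Personalized push of hotel reviews based on word2vec and the TF-IDF algorithm; Zhang Lei; Computer and Information Technology; 20171215 (06); pp. 12-15 *
Zhu Rui; Wang Bin; Yang Xiaochun; Wang Guoren. Research on indexes supporting probabilistic data range queries in a big data environment. Chinese Journal of Computers. (10), pp. 1929-1946. *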

Also Published As

Publication number Publication date
CN111506790A (en) 2020-08-07

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant