CN113076339A - Data caching method, device, equipment and storage medium - Google Patents

Data caching method, device, equipment and storage medium

Info

Publication number
CN113076339A
Authority
CN
China
Prior art keywords
matching degree
sample
data
index
access data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110292403.8A
Other languages
Chinese (zh)
Inventor
吕梦圆
闰秋胜
张国佩
赵阳
李帅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Jingdong Century Trading Co Ltd
Beijing Wodong Tianjun Information Technology Co Ltd
Original Assignee
Beijing Jingdong Century Trading Co Ltd
Beijing Wodong Tianjun Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Jingdong Century Trading Co Ltd, Beijing Wodong Tianjun Information Technology Co Ltd filed Critical Beijing Jingdong Century Trading Co Ltd
Priority to CN202110292403.8A priority Critical patent/CN113076339A/en
Publication of CN113076339A publication Critical patent/CN113076339A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/95Retrieval from the web
    • G06F16/953Querying, e.g. by the use of web search engines
    • G06F16/9538Presentation of query results
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2455Query execution
    • G06F16/24552Database cache management
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiment of the invention discloses a data caching method, a data caching device, data caching equipment and a data caching storage medium, wherein the method comprises the following steps: acquiring original access data and a matching degree model corresponding to a current service scene; inputting the original access data into a pre-trained matching degree model to obtain a matching degree result output by the matching degree model; and determining associated access data associated with the service scene in the original access data based on the matching degree result, and caching the associated access data. According to the method provided by the embodiment of the invention, the matching degree of the access data and the service scene is calculated through the matching degree model, and the associated access data associated with the service scene is selected from the original access data for caching based on the matching degree, so that the data in the cache is matched with the hot data of the service scene, and the cache hit rate is improved.

Description

Data caching method, device, equipment and storage medium
Technical Field
The embodiment of the invention relates to the technical field of computers, in particular to a data caching method, a data caching device, data caching equipment and a data caching storage medium.
Background
The commodity information query service of an e-commerce system handles a large volume of calls, so a local cache is usually added to improve service performance and reduce system pressure. The commodity data held in the local cache generally comes from commodities recently queried on the local machine, and is replaced according to an expiration time and the Least Recently Used (LRU) algorithm.
In the process of implementing the invention, the inventor found at least the following technical problem in the prior art: the queried commodity data is random and differs greatly between machines and between time periods, with essentially no regularity, so caching of commodity data in a specific scene works poorly and the cache hit rate is low.
Disclosure of Invention
The embodiment of the invention provides a data caching method, a data caching device, data caching equipment and a data caching storage medium, and aims to improve the cache hit rate.
In a first aspect, an embodiment of the present invention provides a data caching method, including:
acquiring original access data and a matching degree model corresponding to a current service scene;
inputting the original access data into a pre-trained matching degree model to obtain a matching degree result output by the matching degree model;
and determining associated access data associated with the service scene in the original access data based on the matching degree result, and caching the associated access data.
In a second aspect, an embodiment of the present invention further provides a data caching apparatus, including:
the data acquisition module is used for acquiring original access data and a matching degree model corresponding to the current service scene;
the matching degree determining module is used for inputting the original access data into a pre-trained matching degree model to obtain a matching degree result output by the matching degree model;
and the data caching module is used for determining the associated access data associated with the service scene in the original access data based on the matching degree result and caching the associated access data.
In a third aspect, an embodiment of the present invention further provides a computer device, where the computer device includes:
one or more processors;
storage means for storing one or more programs;
when the one or more programs are executed by the one or more processors, the one or more processors are caused to implement the data caching method as provided by any embodiment of the invention.
In a fourth aspect, an embodiment of the present invention further provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the data caching method provided in any embodiment of the present invention.
The embodiment of the invention obtains the original access data and the matching degree model corresponding to the current service scene; inputting the original access data into a pre-trained matching degree model to obtain a matching degree result output by the matching degree model; determining associated access data associated with the service scene in the original access data based on the matching degree result, caching the associated access data, calculating the matching degree of the access data and the service scene through a matching degree model, and selecting the associated access data associated with the service scene from the original access data based on the matching degree for caching, so that the data in the cache is in accordance with the hot spot data of the service scene, and the cache hit rate is improved.
Drawings
Fig. 1 is a flowchart of a data caching method according to an embodiment of the present invention;
fig. 2 is a flowchart of a data caching method according to a second embodiment of the present invention;
fig. 3 is a flowchart of a data caching method according to a third embodiment of the present invention;
fig. 4 is a schematic structural diagram of a data caching apparatus according to a fourth embodiment of the present invention;
fig. 5 is a schematic structural diagram of a computer device according to a fifth embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting of the invention. It should be further noted that, for the convenience of description, only some of the structures related to the present invention are shown in the drawings, not all of the structures.
Example one
Fig. 1 is a flowchart of a data caching method according to an embodiment of the present invention. The embodiment can be applied to the situation when the data is cached in a specific service scene. The method may be performed by a data caching apparatus, which may be implemented in software and/or hardware, for example, the data caching apparatus may be configured in a computer device. As shown in fig. 1, the method includes:
S110, acquiring original access data and a matching degree model corresponding to the current service scene.
In this embodiment, the service scene may be a service activity scene, a service marketing scene, or the like. Taking an e-commerce platform as an example, the service scene may be a marketing campaign of the e-commerce system, such as a home appliance promotion, a beauty and skin care promotion, a food promotion, a clothing and home textile promotion, or a home building materials promotion. A scene may also be narrower: within home appliance promotion, for example, it may be a refrigerator promotion, a television promotion, or an air conditioner promotion. The specific service scene may be determined based on the current marketing campaign.
It should be noted that, in this embodiment, each service scene corresponds to one matching degree model. The matching degree model is used to calculate the matching degree between an article and a service scene. Because the inherent attributes of an article are fixed, the matching degree of the same article may differ from one service scene to another, so the matching degree models corresponding to different service scenes are generally different. The matching degree model of each service scene can be trained in advance according to the matching degree of sample articles under that service scene.
The original access data can be commodity data recently accessed by the user, and the obtaining mode of the original access data can be determined according to actual requirements. For example, the original access data may be obtained in real time, or may be obtained in a set time period, which is not limited herein.
S120, inputting the original access data into the pre-trained matching degree model to obtain a matching degree result output by the matching degree model.
In this embodiment, the matching degree of the original access data and the service scene is calculated by the pre-trained matching degree model. The original access data may include the business measurement indexes of the accessed article; these can be set according to actual requirements, but must be consistent with the sample business measurement indexes used to train the matching degree model.
Illustratively, for each piece of original access data, the business measurement indexes are extracted and input into the pre-trained matching degree model, and the matching degree result output by the model is the matching degree index of the accessed article in that original access data and the service scene. Optionally, the form of the matching degree index may be determined by the setting of the matching degree model; it may be numerical, for example a matching degree index of 8 between the accessed article and the service scene.
S130, determining associated access data associated with the service scene in the original access data based on the matching degree result, and caching the associated access data.
In this embodiment, after the matching degree index of the access item and the service scene in each original access data is obtained, the associated access item associated with the service scene is selected according to the matching degree index of the access item and the service scene in each original access data, and the original access data corresponding to the associated access item is taken as the associated access data to be cached.
The manner of selecting the associated access item/associated access data associated with the business scenario is not limited herein. Optionally, a matching degree threshold may be set, and the original access data corresponding to the matching degree index higher than the set matching degree threshold is used as the associated access data, where the matching degree threshold may be specifically set according to an actual application scenario; optionally, the matching degree indexes may be sorted, and associated access data may be selected from the original access data according to a sorting result.
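The threshold variant can be sketched as follows. This is a minimal illustration only: the function name `select_by_threshold`, the item identifiers, and the threshold of 8.0 are assumptions made for the example and do not come from the source.

```python
from typing import Dict, List


def select_by_threshold(scores: Dict[str, float], threshold: float) -> List[str]:
    """Return the keys whose matching degree index is above the threshold."""
    return [key for key, score in scores.items() if score > threshold]


# Hypothetical matching degree indexes keyed by an item identifier.
scores = {"item-a": 9.0, "item-b": 7.5, "item-c": 8.2}
associated = select_by_threshold(scores, threshold=8.0)
```

A threshold keeps the cache content tied to an absolute quality bar, at the cost of a cache size that varies from batch to batch.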
In an embodiment of the present invention, determining associated access data associated with the service scene in the original access data based on the matching degree result includes: sorting the original access data in descending order of the matching degree result; and selecting the set caching quantity of original access data from the head of the sorted result as the associated access data. Optionally, the caching quantity N may be preset. When caching is performed, the original access data are sorted in descending order of their matching degree index with the service scene, and the first N original access data in the sorted result are cached as the associated access data. The set caching quantity can be given directly as a number or as a ratio. Because the number of original access data acquired at one time is not constant, a ratio may be set, and the caching quantity is then calculated from the number of original access data acquired and the set ratio. For example, if the ratio is set to 60% and the number of original access data is M, the caching quantity is 0.6M.
For example, assume that the original access data includes original access data 1 to 7, whose matching degree indexes with the service scene are as follows: original access data 1 is 9, original access data 2 is 9, original access data 3 is 8, original access data 4 is 7.5, original access data 5 is 10, original access data 6 is 9.5, and original access data 7 is 9.2. Sorting the original access data in descending order of matching degree index gives: original access data 5, 6, 7, 1, 2, 3, and 4. If the caching quantity is set to 4, original access data 5, 6, 7, and 1 are selected as the associated access data and cached.
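The sort-and-truncate selection from the example above can be sketched as follows. The function name `select_top_n` and the choice to round a proportional quota down are assumptions made for illustration; the source does not say how a fractional quota such as 0.6M is turned into a whole number.

```python
import math
from typing import Dict, List, Optional


def select_top_n(scores: Dict[int, float],
                 count: Optional[int] = None,
                 ratio: Optional[float] = None) -> List[int]:
    """Sort by matching degree index, highest first, and keep the leading entries.

    Exactly one of `count` (absolute number) or `ratio` (share of the batch)
    should be given. Ties keep their original order because sorted() is stable.
    """
    if (count is None) == (ratio is None):
        raise ValueError("give exactly one of count or ratio")
    if count is None:
        # Assumption: round the proportional quota down to a whole number.
        count = math.floor(len(scores) * ratio)
    ranked = sorted(scores, key=lambda key: scores[key], reverse=True)
    return ranked[:count]


# Matching degree indexes of original access data 1 to 7 from the example above.
scores = {1: 9, 2: 9, 3: 8, 4: 7.5, 5: 10, 6: 9.5, 7: 9.2}
top_four = select_top_n(scores, count=4)
by_ratio = select_top_n(scores, ratio=0.6)
```

With seven entries, a ratio of 60% and rounding down also yields a quota of four, so both calls select original access data 5, 6, 7, and 1.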
The embodiment of the invention obtains the original access data and the matching degree model corresponding to the current service scene; inputting the original access data into a pre-trained matching degree model to obtain a matching degree result output by the matching degree model; determining associated access data associated with the service scene in the original access data based on the matching degree result, caching the associated access data, calculating the matching degree of the access data and the service scene through a matching degree model, and selecting the associated access data associated with the service scene from the original access data based on the matching degree for caching, so that the data in the cache is in accordance with the hot spot data of the service scene, and the cache hit rate is improved.
Example two
Fig. 2 is a flowchart of a data caching method according to a second embodiment of the present invention. On the basis of the above embodiments, the present embodiment adds an operation of training the matching degree model. As shown in fig. 2, the method includes:
S210, obtaining matching degree sample data, wherein the matching degree sample data comprises a sample business measurement index of a sample article and a matching degree index of the sample article and a service scene.
In this embodiment, before the matching degree model is trained, the matching degree sample data required for training needs to be acquired. The matching degree sample data comprises a sample business measurement index of a sample article and a matching degree index of the sample article and the service scene. A sample business measurement index can be understood as an index that affects the matching degree of the sample article and the service scene, such as the category, brand, and merchant of the sample article. The matching degree index of the sample article and the service scene can be set manually according to experience, or calculated from the sample business measurement indexes.
In an embodiment of the present invention, acquiring matching degree sample data includes: obtaining a sample business measurement index of a sample article; and determining the matching degree index of the sample article and the service scene according to the sample service measurement index and the preset service measurement index parameter. Optionally, in order to ensure that the matching degree index is not influenced by human subjectivity and ensure the accuracy of the matching degree index, the matching degree index can be calculated through a sample service measurement index. In this embodiment, when calculating the matching degree index of the sample article and the service scene, a service metric index parameter needs to be obtained in addition to the sample service metric index, and the service metric index parameter can be understood as the related influence degree of the sample service metric index.
Optionally, the number of the sample service metric indexes is multiple, the service metric index parameters include index values and index weight values of the sample service metric indexes, and the matching degree index of the sample object and the service scene is determined according to the sample service metric indexes and preset service metric index parameters, including: for each sample service measurement index, obtaining the measurement matching degree of the sample service measurement index according to the index value and the index weight value of the sample service measurement index; and determining the matching degree index of the sample article and the service scene based on the measurement matching degree of each sample service measurement index. The sample business measurement indexes can be specifically determined according to actual requirements, and on the assumption that the sample business measurement indexes comprise brands, categories and merchants, the measurement matching degree of each sample business measurement index is calculated, and then the matching degree indexes of the sample articles and the business scenes are calculated based on the measurement matching degree of each sample business measurement index. Specifically, an index weight value and an index value of each sample service metric index are preset, and a product of the index weight value and the index value is used as a measurement matching degree of the sample service metric index. And then, calculating a characteristic value of the measurement matching degree of each sample service measurement index as a matching degree index of the sample article and the service scene, wherein the characteristic value of the measurement matching degree of each sample service measurement index can be any one of summation, mean value, variance and the like of the measurement matching degree of each sample service measurement index.
On the basis of the above scheme, determining the matching degree index of the sample article and the service scene based on the measurement matching degree of each sample business measurement index includes: summing the measurement matching degrees of the sample business measurement indexes to obtain the matching degree index of the sample article and the service scene. That is, the sum of the measurement matching degrees may optionally be used as the matching degree index. As an example, suppose the sample business measurement indexes comprise three indexes: the brand index of a sample article has weight 0.3 and index value 9, the category index has weight 0.4 and index value 8, and the merchant index has weight 0.3 and index value 8. Then the measurement matching degree of the brand index is 0.3 × 9 = 2.7, that of the category index is 0.4 × 8 = 3.2, that of the merchant index is 0.3 × 8 = 2.4, and the matching degree index of the sample article and the service scene is 2.7 + 3.2 + 2.4 = 8.3.
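The weighted sum can be checked with a few lines of code. This is a sketch only; the function name `matching_degree` and the dictionary layout are assumptions. With the weights and index values of the example, the three products are 2.7, 3.2, and 2.4.

```python
from typing import Dict, Tuple


def matching_degree(indicators: Dict[str, Tuple[float, float]]) -> float:
    """Sum weight * value over all sample business measurement indexes."""
    return sum(weight * value for weight, value in indicators.values())


# (index weight, index value) per indicator, taken from the example above.
sample = {
    "brand": (0.3, 9),
    "category": (0.4, 8),
    "merchant": (0.3, 8),
}
score = matching_degree(sample)
print(round(score, 2))  # 2.7 + 3.2 + 2.4 = 8.3
```

Because the weights sum to 1, the result stays on the same scale as the individual index values.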
S220, training a pre-constructed matching degree model by using the matching degree sample data to obtain a trained matching degree model.
After the matching degree index of each sample article and the service scene is obtained, the sample business measurement indexes of the sample article and its matching degree index are taken as the matching degree sample data, and the pre-constructed matching degree model is trained with this data to obtain the trained matching degree model.
The matching degree model may adopt an existing neural network model, and the training of the matching degree model may adopt a model training mode in the prior art, which is not described herein again. In one embodiment, the matching degree model is a support vector machine model, and the kernel function in the support vector machine model is a gaussian radial basis function.
Optionally, after the pre-constructed matching degree model is trained with the matching degree sample data to obtain the trained matching degree model, the method further includes: testing the trained matching degree model with matching degree test data to obtain a test result; and optimizing the kernel parameter of the Gaussian radial basis function with a particle swarm algorithm based on the test result, and training based on the optimized kernel parameter to obtain the trained matching degree model.
The kernel parameter of the support vector machine model can be set according to experience, but to improve the accuracy of the matching degree model, the kernel parameter g can be optimized. The Radial Basis Function (RBF) kernel requires only the value of the kernel parameter g to be determined, which favors parameter optimization; to keep the SVM optimization from becoming more complex, the RBF kernel can therefore be used as the kernel function of the support vector machine.
Specifically, the pre-constructed matching degree model is trained with the matching degree sample data to obtain a trained matching degree model; the matching degree test data is input into the model to obtain a test result; the kernel parameter g is optimized with the particle swarm algorithm based on the test result; the optimized value is set as the kernel parameter of the matching degree model; and the model is trained, tested and optimized again. These steps are iterated until an iteration stop condition is met, yielding the optimized kernel parameter and the trained matching degree model.
In this process, the SVM is optimized by combining the particle swarm optimization algorithm with a cross validation method.
The k-fold cross validation error can be calculated from the classification accuracy
$$CA_v = \frac{\gamma_l}{\gamma_l + \gamma_f}$$
where $CA_v$ is the classification accuracy, $\gamma_l$ is the number of correct classifications, and $\gamma_f$ is the number of classification errors; these quantities are obtained from the test result. The k-fold cross validation method is as follows: the data is randomly divided into k equal-sized, disjoint subsets; (k-1) subsets are used as the training set and the remaining subset as the test set; the number of classification errors and the number of correct classifications are finally obtained, and the cross validation error is calculated according to the formula above. $1 - CA_v$ is used as the fitness function: when its value is smaller than a set threshold, the kernel parameter is taken as the optimized kernel parameter, and iteration can stop, giving the trained matching degree model.
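The k-fold procedure and the fitness value 1 - CA_v can be sketched in plain Python. The sketch assumes that each subset takes one turn as the test set and that correct and wrong counts are pooled over all k rounds; the source does not spell this out. The names `cross_validation_error` and `majority_trainer` are hypothetical, and the majority-label classifier is only a stand-in for the SVM.

```python
import random
from typing import Callable, List, Sequence, Tuple

Sample = Tuple[Sequence[float], int]
# A trainer takes training samples and returns a predict(features) -> label function.
Trainer = Callable[[List[Sample]], Callable[[Sequence[float]], int]]


def cross_validation_error(data: List[Sample], k: int, train: Trainer,
                           seed: int = 0) -> float:
    """Return 1 - CA_v, where CA_v = correct / (correct + wrong) over k folds."""
    shuffled = data[:]
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]  # k disjoint, near-equal subsets
    correct = wrong = 0
    for i, test_fold in enumerate(folds):
        training = [s for j, fold in enumerate(folds) if j != i for s in fold]
        predict = train(training)
        for features, label in test_fold:
            if predict(features) == label:
                correct += 1
            else:
                wrong += 1
    return 1.0 - correct / (correct + wrong)


def majority_trainer(training: List[Sample]) -> Callable[[Sequence[float]], int]:
    """Hypothetical stand-in classifier: always predicts the most common label."""
    labels = [label for _, label in training]
    winner = max(set(labels), key=labels.count)
    return lambda features: winner


toy = [([float(i)], 1) for i in range(8)] + [([float(i)], -1) for i in range(2)]
error = cross_validation_error(toy, k=5, train=majority_trainer)
```

In a real setup the `train` argument would fit an SVM with the candidate kernel parameter, so that the returned error can serve directly as the fitness of that candidate.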
S230, acquiring original access data and a matching degree model corresponding to the current service scene.
S240, inputting the original access data into the pre-trained matching degree model to obtain a matching degree result output by the matching degree model.
S250, determining associated access data associated with the service scene in the original access data based on the matching degree result, and caching the associated access data.
On the basis of the above embodiment, this embodiment adds the operation of training the matching degree model. Determining the matching degree index of the sample article and the service scene from the sample business measurement indexes and the preset business measurement index parameters improves the accuracy of the matching degree sample data and hence of the matching degree model; optimizing the kernel parameter of the matching degree model with the particle swarm algorithm improves the model's accuracy further.
Example three
Fig. 3 is a flowchart of a data caching method according to a third embodiment of the present invention. This embodiment provides a preferred implementation based on the above scheme, in which the matching degree model is embodied as a Support Vector Machine (SVM) model. As shown in fig. 3, the method includes: selecting commodity data business measurement indexes, setting data index judgment standards, setting SVM model training samples, training and optimizing the SVM model, and predicting hot commodity data with the SVM model and writing it into a refined cache.
(1) Selecting commodity data business measurement indexes
Marketing campaigns of an e-commerce system usually target particular brands, categories, merchants, and the like. The brand, category, and merchant information of a commodity are therefore selected as the business measurement indexes.
(2) Setting data index judgment standard
The index judgment standard is determined according to the theme and intensity of the marketing campaign. The specific grade ranges and reference standard data can be configured flexibly according to the promotion scene and campaign. In addition, weights may be assigned to the indexes to set their priority. Table 1 schematically shows the index values and weight values (i.e., index weight values) corresponding to each brand, category, and merchant.
TABLE 1
Figure BDA0002982807060000111
(3) Setting SVM model training sample
Model training samples corresponding to the different standards are given according to the index standards. Table 2 schematically shows the service matching degree of each sample article.
TABLE 2
| Commodity number | Brand | Category | Merchant | Service matching degree |
|---|---|---|---|---|
| 1 | Xiaomi | Home appliances | Xiaomi home appliance exclusive store | 10 |
| 2 | Xiaomi | Home appliances | Xiaomi flagship store | 9.7 |
| 3 | Huawei | Home appliances | Huawei official flagship store | 9.1 |
| 4 | Xiaomi | Mobile phones and communication | Xiaomi flagship store | 8.9 |
| 5 | Xiaomi | Computers and office | Xiaomi flagship store | 8.9 |
| 6 | Huawei | Mobile phones and communication | Huawei official flagship store | 8.3 |
| 7 | Xiaomi | Household daily use | Xiaomi flagship store | 7.7 |
| 100 | | | | |
It should be noted that the matching degree sample data needs to cover the different range standards of each index as far as possible, so that the various combination scenes can all be hit.
The service matching degree values in the table are calculated as follows: service matching degree = Σ (index value hit by the sample data × index weight).
(4) Training and optimizing SVM model
Using the training samples from step (3), the model is trained and tested and its parameters are optimized.
Assume that the known sample set is: $T = \{(x_1, y_1), \ldots, (x_l, y_l)\} \in (X \times Y)^l$
where $x_i \in X = \mathbb{R}^n$ is the input feature vector, $y_i \in Y = \{1, -1\}$ ($i = 1, 2, \ldots, l$) is the corresponding output, 1 and -1 denote positive and negative samples respectively, and $l$ is the number of samples. The samples are projected from the input space into a high-dimensional feature space through a nonlinear mapping function, and an optimal classification surface is established in that space. To give the classification surface higher classification accuracy and a larger classification margin, the SVM classification problem is converted into an optimization problem:
Figure BDA0002982807060000121
where $\xi_i$ is a slack variable that measures the distance between the true value $y_i$ and the output of the support vector machine, $C$ is a penalty factor that limits the degree of penalty for sample classification errors, and $b$ is a threshold. To solve equation (1), a Lagrange function is introduced, and the above problem is converted into its dual problem:
Figure BDA0002982807060000131
where $\alpha_i$ is the Lagrange multiplier corresponding to $x_i$, and $K(x_i, x_j)$ is the kernel function used in SVM training to map the inner product to the feature space.
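Formulas (1) and (2) survive here only as figure placeholders. For orientation, the textbook soft-margin SVM primal and dual problems, written with the symbols defined in the text, are given below. This is the standard formulation and is assumed, not confirmed, to correspond to the formulas in the figures; $w$ (the normal vector of the classification surface) and $\varphi$ (the nonlinear mapping function) are names introduced here for illustration.

```latex
% Primal problem (assumed form of formula (1))
\min_{w,\, b,\, \xi} \; \frac{1}{2}\lVert w \rVert^{2} + C \sum_{i=1}^{l} \xi_i
\quad \text{s.t.} \quad
y_i \bigl( w \cdot \varphi(x_i) + b \bigr) \ge 1 - \xi_i, \quad
\xi_i \ge 0, \quad i = 1, \ldots, l

% Dual problem (assumed form of formula (2))
\max_{\alpha} \; \sum_{i=1}^{l} \alpha_i
 - \frac{1}{2} \sum_{i=1}^{l} \sum_{j=1}^{l} \alpha_i \alpha_j y_i y_j K(x_i, x_j)
\quad \text{s.t.} \quad
\sum_{i=1}^{l} \alpha_i y_i = 0, \quad
0 \le \alpha_i \le C, \quad i = 1, \ldots, l
```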
In this embodiment, to improve the accuracy of the support vector machine model, the kernel parameter of the kernel function in the support vector machine is optimized. The RBF kernel function requires only the value of g to be determined, which simplifies parameter optimization, so the RBF kernel function is adopted in this embodiment to avoid complicating SVM optimization. The RBF kernel function is: $K(x_i, x_j) = \exp\left(-\lVert x_i - x_j \rVert^{2} / (2g^{2})\right)$, where $g$ is the kernel parameter.
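A small sketch of the kernel computation may help make the role of g concrete. It assumes the usual Gaussian form with the squared Euclidean distance in the exponent; the function name `rbf_kernel` is illustrative and not taken from the source.

```python
import math
from typing import Sequence


def rbf_kernel(x_i: Sequence[float], x_j: Sequence[float], g: float) -> float:
    """Gaussian RBF kernel exp(-||x_i - x_j||^2 / (2 * g^2))."""
    if g <= 0:
        raise ValueError("kernel parameter g must be positive")
    if len(x_i) != len(x_j):
        raise ValueError("feature vectors must have the same length")
    # Squared Euclidean distance between the two feature vectors.
    sq_dist = sum((a - b) ** 2 for a, b in zip(x_i, x_j))
    return math.exp(-sq_dist / (2.0 * g ** 2))


same = rbf_kernel([1.0, 2.0], [1.0, 2.0], g=0.5)   # identical vectors
near = rbf_kernel([0.0, 0.0], [1.0, 0.0], g=1.0)   # unit distance apart
```

A larger g makes the kernel decay more slowly with distance, which is why g is one of the two parameters tuned during optimization.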
In step (3), the commodities numbered 1 to 100, together with their brands, categories and merchants, form the sample set X, and the service matching degrees from 1 to 10 form the output vector Y. The 100 samples are divided into 50 training samples and 50 test samples; the SVM model is trained on the training set and its accuracy is calculated on the test set. The penalty parameter C and the kernel function parameter g are the two parameters with a crucial effect on classification accuracy, so C and g are optimized during SVM training to obtain the parameters and the model with higher accuracy. Particle swarm optimization (PSO) combined with cross validation (CV) is used to optimize the SVM; that is, the k-fold cross validation error $1 - CA_v$, computed according to formula (3), is used as the fitness function.
Figure BDA0002982807060000132
where $CA_v$ is the classification accuracy, $\gamma_l$ is the number of correctly classified samples, and $\gamma_f$ is the number of misclassified samples. The k-fold cross validation method is as follows: the data is randomly divided into k disjoint subsets of equal size; (k-1) subsets are used as the training set and the remaining subset is used as the test set; the number of classification errors and the number of correct classifications are finally obtained, and the cross validation error is calculated from them.
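The cross validation error used as the fitness function can be sketched as follows. The sketch assumes the accuracy is the share of correct classifications, i.e. correct / (correct + wrong); `k_fold_cv_error` and the stand-in learner `train_midpoint` are hypothetical names, and in the setting of the text the learner would be the SVM trained with a candidate (C, g).

```python
import random
from typing import Callable, List, Sequence, Tuple

Sample = Tuple[Sequence[float], int]               # (feature vector, label in {1, -1})
Predictor = Callable[[Sequence[float]], int]
TrainFn = Callable[[List[Sample]], Predictor]      # training set -> predictor


def k_fold_cv_error(samples: List[Sample], k: int, train_fn: TrainFn,
                    seed: int = 0) -> float:
    """Return 1 - accuracy, accumulated over k disjoint folds."""
    if not 2 <= k <= len(samples):
        raise ValueError("k must be between 2 and the number of samples")
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    # Disjoint folds whose sizes differ by at most one sample.
    folds = [shuffled[i::k] for i in range(k)]
    correct = wrong = 0
    for i, test_fold in enumerate(folds):
        train = [s for j, fold in enumerate(folds) if j != i for s in fold]
        predict = train_fn(train)
        for x, y in test_fold:
            if predict(x) == y:
                correct += 1
            else:
                wrong += 1
    return 1.0 - correct / (correct + wrong)


def train_midpoint(train: List[Sample]) -> Predictor:
    """Stand-in learner: threshold halfway between the two class means."""
    pos = [x[0] for x, y in train if y == 1]
    neg = [x[0] for x, y in train if y == -1]
    threshold = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2.0
    return lambda x: 1 if x[0] > threshold else -1


data = [([float(v)], -1) for v in range(5)] + [([float(v)], 1) for v in range(10, 15)]
error = k_fold_cv_error(data, k=5, train_fn=train_midpoint)
```

Counting correct and wrong predictions across all folds before dividing keeps the result well defined even when the folds are not exactly the same size.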
An optimization test was performed with the goal of obtaining the three commodities with the highest matching degree. The results obtained by the model before optimization are shown in Table 3 and the results obtained by the model after optimization are shown in Table 4; the accuracy is clearly improved.
TABLE 3
| Commodity No. | Brand | Category | Merchant | Service matching degree | Predicted result |
| --- | --- | --- | --- | --- | --- |
| 1 | Xiaomi | Household appliances | Xiaomi home appliance specialty store | 10 | 10 |
| 2 | Xiaomi | Household appliances | Xiaomi flagship store | 9.7 | 9 |
| 3 | Huawei | Household appliances | Huawei official flagship store | 9.1 | 9 |
| 4 | Xiaomi | Mobile phones and communication | Xiaomi flagship store | 8.9 | 9 |
| 5 | Xiaomi | Computers and office | Xiaomi flagship store | 8.9 | 9 |
| 6 | Huawei | Mobile phones and communication | Huawei official flagship store | 8.3 | 8 |
| 7 | Xiaomi | Household daily use | Xiaomi flagship store | 7.7 | 7 |
| 100 | | | | | |
TABLE 4
| Commodity No. | Brand | Category | Merchant | Service matching degree | Predicted result |
| --- | --- | --- | --- | --- | --- |
| 1 | Xiaomi | Household appliances | Xiaomi home appliance specialty store | 10 | 10 |
| 2 | Xiaomi | Household appliances | Xiaomi flagship store | 9.7 | 9.5 |
| 3 | Huawei | Household appliances | Huawei official flagship store | 9.1 | 9.2 |
| 4 | Xiaomi | Mobile phones and communication | Xiaomi flagship store | 8.9 | 9 |
| 5 | Xiaomi | Computers and office | Xiaomi flagship store | 8.9 | 9 |
| 6 | Huawei | Mobile phones and communication | Huawei official flagship store | 8.3 | 8 |
| 7 | Xiaomi | Household daily use | Xiaomi flagship store | 7.7 | 7.5 |
| 100 | | | | | |
(5) Predicting hot commodity data with the SVM model
Assuming that the commodity cache holds N entries, the commodity data recently accessed by users is fed into the SVM model as input to obtain the service matching degree of each commodity. The N commodities with the highest service matching degree are then selected; this is the predicted set of hot commodity data that users are likely to access next.
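The selection of the N best-scoring commodities can be sketched as below. `predict_hot_set` is an illustrative name, and the scoring callable stands in for the trained SVM model; the scores in the usage lines are placeholders.

```python
import heapq
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")


def predict_hot_set(recent: Iterable[T],
                    matching_degree: Callable[[T], float],
                    n: int) -> List[T]:
    """Return the n recently accessed items with the highest matching degree."""
    if n < 0:
        raise ValueError("cache size n must be non-negative")
    # nlargest returns the items ordered from highest to lowest score.
    return heapq.nlargest(n, recent, key=matching_degree)


# Placeholder scores standing in for the model output.
scores = {"A": 9.7, "B": 7.7, "C": 10.0, "D": 8.3}
hot = predict_hot_set(scores, lambda commodity: scores[commodity], 2)
```

Using a heap-based selection avoids sorting the full access list when N is much smaller than the number of recently accessed commodities.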
(6) Writing to the refined cache
The hot commodity data set obtained in step (5) is written into the refined local cache. When a user requests commodity information, the local cache is queried first and the cached commodity data is returned.
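A cache-first lookup of this kind might look like the following sketch. The class `RefinedLocalCache`, the backend loader, and the hit and miss counters are assumptions added for illustration; the source does not specify what happens on a miss, so the sketch simply falls back to the backend without changing the cache contents.

```python
from typing import Callable, Dict, Optional

Commodity = Dict[str, str]


class RefinedLocalCache:
    """Local cache that holds only the predicted hot commodity data."""

    def __init__(self, load_from_backend: Callable[[str], Optional[Commodity]]) -> None:
        self._entries: Dict[str, Commodity] = {}
        self._load_from_backend = load_from_backend
        self.hits = 0
        self.misses = 0

    def write_hot_set(self, hot_set: Dict[str, Commodity]) -> None:
        # Replace the cache contents with the newly predicted hot set.
        self._entries = dict(hot_set)

    def get(self, commodity_id: str) -> Optional[Commodity]:
        # Query the local cache first; fall back to the backend on a miss.
        if commodity_id in self._entries:
            self.hits += 1
            return self._entries[commodity_id]
        self.misses += 1
        return self._load_from_backend(commodity_id)


backend = {"1": {"brand": "Xiaomi"}, "2": {"brand": "Huawei"}}
cache = RefinedLocalCache(backend.get)
cache.write_hot_set({"1": backend["1"]})
first = cache.get("1")    # served from the local cache
second = cache.get("2")   # not in the hot set, loaded from the backend
```

Tracking hits and misses makes it possible to check whether the predicted hot set actually raises the cache hit rate.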
According to the embodiment of the invention, the commodity data measurement index is set according to the promotion activity of the e-commerce system, and hot commodity data is predicted to be written into the refined cache through the support vector machine model, so that the cache hit rate is improved, and the system performance is improved.
Example four
Fig. 4 is a schematic structural diagram of a data caching apparatus according to a fourth embodiment of the present invention. The data caching device can be implemented in software and/or hardware, for example, the data caching device can be configured in a computer device. As shown in fig. 4, the apparatus includes a data obtaining module 410, a matching degree determining module 420, and a data caching module 430, wherein:
a data obtaining module 410, configured to obtain original access data and a matching degree model corresponding to a current service scenario;
the matching degree determining module 420 is configured to input the original access data into a pre-trained matching degree model to obtain a matching degree result output by the matching degree model;
and the data caching module 430 is configured to determine, based on the matching degree result, associated access data associated with the service scenario in the original access data, and cache the associated access data.
In this embodiment of the invention, the data obtaining module obtains the original access data and the matching degree model corresponding to the current service scenario; the matching degree determining module inputs the original access data into the pre-trained matching degree model to obtain the matching degree result output by the model; and the data caching module determines, based on the matching degree result, the associated access data associated with the service scenario in the original access data and caches it. Because the matching degree between the access data and the service scenario is calculated by the matching degree model, and the associated access data is selected from the original access data according to that matching degree, the data in the cache matches the hot spot data of the service scenario and the cache hit rate is improved.
Optionally, on the basis of the foregoing scheme, the data caching module 430 is specifically configured to:
sorting the original access data in descending order based on the matching degree result;
and selecting the set caching quantity of original access data from the top of the sorting result as the associated access data.
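The division of work among the three modules can be sketched as three small classes. All class, method, and field names are illustrative, the scenario key "promotion" is hypothetical, and a plain callable stands in for the trained matching degree model.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Sequence, Tuple

Record = Dict[str, str]
MatchingModel = Callable[[Record], float]   # stand-in for the trained model


@dataclass
class DataObtainingModule:
    access_log: Sequence[Record]
    models: Dict[str, MatchingModel]

    def obtain(self, scenario: str) -> Tuple[List[Record], MatchingModel]:
        # Original access data plus the model for the current scenario.
        return list(self.access_log), self.models[scenario]


class MatchingDegreeDeterminingModule:
    def determine(self, records: Sequence[Record], model: MatchingModel) -> List[float]:
        return [model(record) for record in records]


@dataclass
class DataCachingModule:
    cache_quantity: int
    cached: List[Record] = field(default_factory=list)

    def cache_associated(self, records: Sequence[Record], degrees: Sequence[float]) -> None:
        # Descending order of matching degree, then keep the set quantity.
        order = sorted(range(len(records)), key=lambda i: degrees[i], reverse=True)
        self.cached = [records[i] for i in order[: self.cache_quantity]]


degree_by_id = {"1": 8.9, "2": 10.0, "3": 7.7}   # placeholder model output
obtaining = DataObtainingModule(
    access_log=[{"id": "1"}, {"id": "2"}, {"id": "3"}],
    models={"promotion": lambda record: degree_by_id[record["id"]]},
)
records, model = obtaining.obtain("promotion")
degrees = MatchingDegreeDeterminingModule().determine(records, model)
caching = DataCachingModule(cache_quantity=2)
caching.cache_associated(records, degrees)
cached_ids = [record["id"] for record in caching.cached]
```

Keeping the model lookup keyed by scenario mirrors the idea that each service scenario has its own matching degree model.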
Optionally, on the basis of the above scheme, the apparatus further includes a model training module, including:
a sample data obtaining unit, configured to obtain matching degree sample data, where the matching degree sample data includes a sample service measurement index of a sample article and a matching degree index of the sample article and a service scenario;
and the model training unit is used for training the pre-constructed matching degree model by adopting the matching degree sample data to obtain the trained matching degree model.
Optionally, on the basis of the foregoing scheme, the sample data obtaining unit is specifically configured to:
obtaining a sample business measurement index of a sample article;
and determining the matching degree index of the sample article and the service scene according to the sample service measurement index and the preset service measurement index parameter.
Optionally, on the basis of the foregoing scheme, the number of the sample service metric indexes is multiple, the service metric index parameter includes an index value and an index weight value of the sample service metric index, and the sample data obtaining unit is specifically configured to:
for each sample service measurement index, obtaining the measurement matching degree of the sample service measurement index according to the index value and the index weight value of the sample service measurement index;
and determining the matching degree index of the sample article and the service scene based on the measurement matching degree of each sample service measurement index.
Optionally, on the basis of the foregoing scheme, the sample data obtaining unit is specifically configured to:
and summing the measurement matching degrees of the sample service measurement indexes to obtain the matching degree indexes of the sample articles and the service scenes.
Optionally, on the basis of the above scheme, the matching degree model is a support vector machine model, a kernel function in the support vector machine model is a gaussian radial basis function, and the apparatus further includes a model optimization module, configured to:
after the matching degree model which is constructed in advance is trained by adopting the matching degree sample data to obtain the trained matching degree model, testing the trained matching degree model by adopting the matching degree measurement data to obtain a test result;
and optimizing the kernel parameters in the Gaussian radial basis function by adopting a particle swarm algorithm based on the test result, and training based on the optimized kernel parameters to obtain a trained matching degree model.
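A compact particle swarm sketch for tuning the pair (C, g) is shown below. The swarm size, iteration count, inertia, and acceleration coefficients are assumed defaults rather than values from the source, and the quadratic `stand_in_fitness` is a placeholder for the fitness that would in practice be the cross validation error of an SVM trained with the candidate parameters.

```python
import random
from typing import Callable, List, Sequence, Tuple

Bounds = Sequence[Tuple[float, float]]


def pso_minimize(fitness: Callable[[List[float]], float], bounds: Bounds,
                 n_particles: int = 20, n_iters: int = 50,
                 inertia: float = 0.7, c1: float = 1.5, c2: float = 1.5,
                 seed: int = 0) -> Tuple[List[float], float]:
    """Minimize fitness over a box with a basic global-best particle swarm."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]              # personal best positions
    best_val = [fitness(p) for p in pos]        # personal best fitness values
    g_idx = min(range(n_particles), key=best_val.__getitem__)
    g_pos, g_val = best_pos[g_idx][:], best_val[g_idx]   # global best
    for _ in range(n_iters):
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (inertia * vel[i][d]
                             + c1 * r1 * (best_pos[i][d] - pos[i][d])
                             + c2 * r2 * (g_pos[d] - pos[i][d]))
                # Move the particle and clamp it to the search box.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = fitness(pos[i])
            if val < best_val[i]:
                best_val[i], best_pos[i] = val, pos[i][:]
                if val < g_val:
                    g_val, g_pos = val, pos[i][:]
    return g_pos, g_val


def stand_in_fitness(params: List[float]) -> float:
    # Placeholder with its minimum at C = 10, g = 0.5.
    c, g = params
    return (c - 10.0) ** 2 + (g - 0.5) ** 2


search_box = [(0.1, 100.0), (0.01, 10.0)]   # hypothetical ranges for C and g
best_params, best_error = pso_minimize(stand_in_fitness, search_box)
```

After the search, the model would be retrained once with `best_params` to obtain the final matching degree model.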
The data caching device provided by the embodiment of the invention can execute the data caching method provided by any embodiment of the invention, and has corresponding functional modules and beneficial effects of the execution method.
Example five
Fig. 5 is a schematic structural diagram of a computer device according to a fifth embodiment of the present invention. FIG. 5 illustrates a block diagram of an exemplary computer device 512 suitable for use in implementing embodiments of the present invention. The computer device 512 shown in FIG. 5 is only an example and should not bring any limitations to the functionality or scope of use of embodiments of the present invention.
As shown in FIG. 5, computer device 512 is in the form of a general purpose computing device. Components of computer device 512 may include, but are not limited to: one or more processors 514, a system memory 528, and a bus 518 that couples the various system components including the system memory 528 and the processors 514.
Bus 518 represents one or more of any of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. By way of example, such architectures include, but are not limited to, Industry Standard Architecture (ISA) bus, Micro Channel Architecture (MCA) bus, enhanced ISA bus, Video Electronics Standards Association (VESA) local bus, and Peripheral Component Interconnect (PCI) bus.
Computer device 512 typically includes a variety of computer system readable media. Such media can be any available media that is accessible by computer device 512 and includes both volatile and nonvolatile media, removable and non-removable media.
The system memory 528 may include computer system readable media in the form of volatile memory, such as Random Access Memory (RAM)530 and/or cache memory 532. The computer device 512 may further include other removable/non-removable, volatile/nonvolatile computer system storage media. By way of example only, storage 534 may be used to read from and write to non-removable, nonvolatile magnetic media (not shown in FIG. 5, and commonly referred to as a "hard drive"). Although not shown in FIG. 5, a magnetic disk drive for reading from and writing to a removable, nonvolatile magnetic disk (e.g., a "floppy disk") and an optical disk drive for reading from or writing to a removable, nonvolatile optical disk (e.g., a CD-ROM, DVD-ROM, or other optical media) may be provided. In these cases, each drive may be connected to bus 518 through one or more data media interfaces. Memory 528 may include at least one program product having a set (e.g., at least one) of program modules that are configured to carry out the functions of embodiments of the invention.
A program/utility 540 having a set (at least one) of program modules 542, including but not limited to an operating system, one or more application programs, other program modules, and program data, may be stored in, for example, the memory 528, each of which examples or some combination may include an implementation of a network environment. The program modules 542 generally perform the functions and/or methods of the described embodiments of the invention.
The computer device 512 may also communicate with one or more external devices 514 (e.g., keyboard, pointing device, display 524, etc.), with one or more devices that enable a user to interact with the computer device 512, and/or with any devices (e.g., network card, modem, etc.) that enable the computer device 512 to communicate with one or more other computing devices. Such communication may occur via input/output (I/O) interfaces 522. Also, computer device 512 may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network such as the Internet) via network adapter 520. As shown, the network adapter 520 communicates with the other modules of the computer device 512 via the bus 518. It should be appreciated that although not shown, other hardware and/or software modules may be used in conjunction with the computer device 512, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, among others.
The processor 514 executes programs stored in the system memory 528 to execute various functional applications and data processing, for example, implement a data caching method provided by the embodiment of the present invention, the method includes:
acquiring original access data and a matching degree model corresponding to a current service scene;
inputting the original access data into a pre-trained matching degree model to obtain a matching degree result output by the matching degree model;
and determining associated access data associated with the service scene in the original access data based on the matching degree result, and caching the associated access data.
Of course, those skilled in the art can understand that the processor may also implement the technical solution of the data caching method provided by any embodiment of the present invention.
Example six
The sixth embodiment of the present invention further provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the data caching method provided in the embodiments of the present invention, and the method includes:
acquiring original access data and a matching degree model corresponding to a current service scene;
inputting the original access data into a pre-trained matching degree model to obtain a matching degree result output by the matching degree model;
and determining associated access data associated with the service scene in the original access data based on the matching degree result, and caching the associated access data.
Of course, the computer program stored on the computer-readable storage medium provided in the embodiments of the present invention is not limited to the above method operations, and may also perform the operations related to the data caching method provided in any embodiments of the present invention.
Computer storage media for embodiments of the invention may employ any combination of one or more computer-readable media. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
A computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++, or the like, as well as conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
It is to be noted that the foregoing is only illustrative of the preferred embodiments of the present invention and the technical principles employed. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments illustrated herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the invention. Therefore, although the present invention has been described in greater detail by the above embodiments, the present invention is not limited to the above embodiments, and may include other equivalent embodiments without departing from the spirit of the present invention, and the scope of the present invention is determined by the scope of the appended claims.

Claims (10)

1. A method for caching data, comprising:
acquiring original access data and a matching degree model corresponding to a current service scene;
inputting the original access data into a pre-trained matching degree model to obtain a matching degree result output by the matching degree model;
and determining associated access data associated with the service scene in the original access data based on the matching degree result, and caching the associated access data.
2. The method of claim 1, wherein the determining associated access data associated with the service scenario in the raw access data based on the matching result comprises:
sorting the original access data in a reverse order based on the matching degree result;
and selecting the original access data with the set caching quantity in the sequencing result as the associated access data.
3. The method of claim 1, further comprising:
obtaining matching degree sample data, wherein the matching degree sample data comprises a sample business measurement index of a sample article and a matching degree index of the sample article and a business scene;
and training a pre-constructed matching degree model by adopting the matching degree sample data to obtain a trained matching degree model.
4. The method of claim 3, wherein the obtaining matching degree sample data comprises:
obtaining a sample business measurement index of a sample article;
and determining the matching degree index of the sample article and the service scene according to the sample service measurement index and a preset service measurement index parameter.
5. The method according to claim 4, wherein the number of the sample business metric indexes is plural, the business metric index parameters include index values and index weight values of the sample business metric indexes, and the determining the matching degree index of the sample object and the business scenario according to the sample business metric indexes and preset business metric index parameters includes:
for each sample service measurement index, obtaining the measurement matching degree of the sample service measurement index according to the index value of the sample service measurement index and the index weight value;
and determining the matching degree index of the sample article and the business scene based on the measurement matching degree of each sample business metric index.
6. The method of claim 5, wherein the determining the matching degree index of the sample article and the business scene based on the measurement matching degree of each sample business metric index comprises:
and summing the measurement matching degrees of the sample service measurement indexes to obtain the matching degree index of the sample article and the service scene.
7. The method according to claim 3, wherein after training the pre-constructed matching degree model by using the matching degree sample data to obtain a trained matching degree model, the method further comprises:
testing the trained matching degree model by using the matching degree measurement data to obtain a test result;
and optimizing the kernel parameters in the matching degree model by adopting a particle swarm algorithm based on the test result, and training based on the optimized kernel parameters to obtain the trained matching degree model.
8. A data caching apparatus, comprising:
the data acquisition module is used for acquiring original access data and a matching degree model corresponding to the current service scene;
the matching degree determining module is used for inputting the original access data into a pre-trained matching degree model to obtain a matching degree result output by the matching degree model;
and the data caching module is used for determining associated access data associated with the service scene in the original access data based on the matching degree result and caching the associated access data.
9. A computer device, the device comprising:
one or more processors;
storage means for storing one or more programs;
the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the data caching method as claimed in any one of claims 1 to 7.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the data caching method as claimed in any one of claims 1 to 7.
CN202110292403.8A 2021-03-18 2021-03-18 Data caching method, device, equipment and storage medium Pending CN113076339A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110292403.8A CN113076339A (en) 2021-03-18 2021-03-18 Data caching method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110292403.8A CN113076339A (en) 2021-03-18 2021-03-18 Data caching method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN113076339A true CN113076339A (en) 2021-07-06

Family

ID=76613175

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110292403.8A Pending CN113076339A (en) 2021-03-18 2021-03-18 Data caching method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113076339A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023046059A1 (en) * 2021-09-24 2023-03-30 中国第一汽车股份有限公司 Cache warmup method and apparatus, and computer device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination