CN117112625A

CN117112625A - Data fallback method, device and equipment based on multi-level cache distribution

Info

Publication number: CN117112625A
Application number: CN202311074840.8A
Authority: CN
Inventors: 郭进
Original assignee: Alipay Hangzhou Information Technology Co Ltd
Current assignee: Alipay Hangzhou Information Technology Co Ltd
Priority date: 2023-08-24
Filing date: 2023-08-24
Publication date: 2023-11-24

Abstract

The embodiments of this specification disclose a data fallback method, device, and equipment based on multi-level cache distribution. The scheme includes: receiving a service request generated based on a service scene, and acquiring corresponding user characteristics according to the service request; constructing a user session identifier corresponding to the current session according to the user characteristics; accessing a distributed cache based on the user session identifier, and, when the user session identifier hits first cache data in the distributed cache, providing personalized fallback data for the service request according to the first cache data; if the user session identifier does not hit the first cache data and personalized recall according to the user characteristics is not successful, obtaining scene characteristics corresponding to the service scene, and constructing a corresponding scene identifier according to the scene characteristics; accessing a local cache based on the scene identifier, and, when the scene identifier hits second cache data in the local cache, providing generalized fallback data for the service request according to the second cache data.

Description

Data fallback method, device and equipment based on multi-level cache distribution

Technical Field

The present disclosure relates to the field of internet technologies, and in particular to a data fallback method, apparatus, and device based on multi-level cache distribution.

Background

With the development of computer and internet technologies, more and more internet applications use a micro-service architecture to execute services, so a single user-facing function usually requires several micro-service applications to cooperate on the server side. In actual service execution, however, this cooperation is not fully reliable: timeouts, failures, and other anomalies occur, and failed service execution is difficult to avoid entirely.

For example, an information flow recommendation scene is generally divided into four links: recall, coarse ranking, fine ranking, and re-ranking. Each link is handled by one or more micro-service applications, which together implement the recommendation. Recall and fine ranking are relatively complex. If the recall link fails, the subsequent links cannot run; if the fine ranking link fails, personalized recommendation cannot be achieved.

In view of this, a solution is needed that can provide a smooth data fallback when an anomaly occurs.

Disclosure of Invention

One or more embodiments of the present disclosure provide a data fallback method, apparatus, device, and storage medium based on multi-level cache distribution, to solve the following technical problem: a solution is needed that can provide a smooth data fallback when an anomaly occurs.

To solve the above technical problems, one or more embodiments of the present specification are implemented as follows:

One or more embodiments of the present disclosure provide a data fallback method based on multi-level cache distribution, including:

receiving a service request generated based on a service scene, and acquiring corresponding user characteristics according to the service request;

constructing a user session identifier corresponding to the current session according to the user characteristics;

accessing a distributed cache based on the user session identifier, and, when the user session identifier hits first cache data in the distributed cache, providing personalized fallback data for the service request according to the first cache data;

if the user session identifier does not hit the first cache data and personalized recall according to the user characteristics is not successful, obtaining scene characteristics corresponding to the service scene, and constructing a corresponding scene identifier according to the scene characteristics;

accessing a local cache based on the scene identifier, and, when the scene identifier hits second cache data in the local cache, providing generalized fallback data for the service request according to the second cache data.

One or more embodiments of the present disclosure provide a data fallback device based on multi-level cache distribution, including:

a request receiving module, which receives a service request generated based on a service scene and acquires corresponding user characteristics according to the service request;

a first identifier construction module, which constructs a user session identifier corresponding to the current session according to the user characteristics;

a personalized fallback module, which accesses a distributed cache based on the user session identifier and, when the user session identifier hits first cache data in the distributed cache, provides personalized fallback data for the service request according to the first cache data;

a second identifier construction module, which, if the user session identifier does not hit the first cache data and personalized recall according to the user characteristics is not successful, acquires scene characteristics corresponding to the service scene and constructs a corresponding scene identifier according to the scene characteristics;

and a generalized fallback module, which accesses a local cache based on the scene identifier and, when the scene identifier hits second cache data in the local cache, provides generalized fallback data for the service request according to the second cache data.

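One or more embodiments of the present disclosure provide data fallback equipment based on multi-level cache distribution, including:

at least one processor; and,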

a memory communicatively coupled to the at least one processor; wherein,

the memory stores instructions executable by the at least one processor to enable the at least one processor to:

One or more embodiments of the present specification provide a non-volatile computer storage medium storing computer-executable instructions configured to:

The above-mentioned at least one technical solution adopted by one or more embodiments of the present disclosure can achieve the following beneficial effects:

Fallback data is stored in a multi-level cache. When personalized fallback data cannot be provided, the scheme switches to generalized fallback data, so that a smooth fallback is achieved and a single cache failure does not cause the fallback itself to fail. The personalized fallback data, which combines the user-dimension cache with the recall conditions, is essentially data recalled in quasi-real time according to the user characteristics; its quality is high and comparable to that of real-time recall data.

Drawings

In order to more clearly illustrate the embodiments of the present description or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described below, it being obvious that the drawings in the following description are only some of the embodiments described in the present description, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.

FIG. 1 is a flow chart of a data fallback method based on multi-level cache distribution according to one or more embodiments of the present disclosure;

FIG. 2 is a flow chart of a data fallback method based on multi-level cache distribution in an application scenario according to one or more embodiments of the present disclosure;

FIG. 3 is a schematic illustration of multi-level caching in an application scenario according to one or more embodiments of the present disclosure;

FIG. 4 is a schematic diagram of writing and updating personalized fallback data in an application scenario according to one or more embodiments of the present disclosure;

FIG. 5 is a schematic structural diagram of a data fallback device based on multi-level cache distribution according to one or more embodiments of the present disclosure;

FIG. 6 is a schematic structural diagram of data fallback equipment based on multi-level cache distribution according to one or more embodiments of the present disclosure.

Detailed Description

The embodiments of this specification provide a data fallback method, device, equipment, and storage medium based on multi-level cache distribution.

In order to make the technical solutions in the present specification better understood by those skilled in the art, the technical solutions in the embodiments of the present specification will be clearly and completely described below with reference to the drawings in the embodiments of the present specification, and it is obvious that the described embodiments are only some embodiments of the present application, not all embodiments. All other embodiments, which can be made by one of ordinary skill in the art based on the embodiments herein without making any inventive effort, shall fall within the scope of the present application.

In order to solve the problem of recall failure caused by abnormal conditions, one known data fallback scheme reads general fallback data directly from a local cache or a database when recall fails. Although this scheme does provide a fallback, it has the following drawbacks: 1. The fallback data is of poor quality. Every user receives the same fallback data ("one face for a thousand people"), which is very unfriendly to personalized recommendation. 2. Reliability is inadequate. There is only a single cache level, so the fallback fails as soon as the local cache misses.

Based on this, fig. 1 is a flow chart of a data fallback method based on multi-level cache distribution according to one or more embodiments of the present disclosure. The method can be applied to different business fields, such as internet finance, e-commerce, instant messaging, gaming, and public services. The process may be performed by computing devices in the respective field (e.g., a commodity recommendation server in the e-commerce field), and certain input parameters or intermediate results in the process may be adjusted manually to help improve accuracy.

The flow in fig. 1 may include the steps of:

S102: Receive a service request generated based on a service scene, and acquire corresponding user characteristics according to the service request.

Fig. 2 is a flow chart of a data fallback method based on multi-level cache distribution in an application scenario according to one or more embodiments of the present disclosure. The explanation below refers to fig. 1 and fig. 2.

The service scenario mainly refers to an information flow recommendation scenario, and when a user executes a service in the information flow recommendation scenario, a service request is triggered. For example, in the e-commerce platform, the user inputs required commodity information by means of keyword input, photo shooting and the like, a service request is triggered, and the e-commerce platform displays corresponding commodity information for the user based on the service request. Or, the user inputs a keyword in the search engine, the service request is triggered, and the search engine displays a corresponding website for the user based on the keyword. Or, the user inputs the needed applet and service in the application program, and triggers the service request, and the application program displays the corresponding applet or service for the user.

Fig. 3 is a schematic illustration of multi-level caching in an application scenario according to one or more embodiments of the present disclosure. As shown in fig. 3, the user characteristics may include the user's activity level, age, city, grade, etc. The activity level reflects how active the user is in the service system of the service scene; for example, on an e-commerce platform, the activity level rises as the user browses, searches for, and purchases goods. The grade is the user's grade in the service system; for example, on an e-commerce platform, the user's grade rises as the number of purchases and the purchase amount increase. Of course, the user characteristics may also include dimensions such as gender and preferences.

S104: Construct a user session identifier corresponding to the current session according to the user characteristics.

As shown in fig. 2, the user session identifier (also referred to as the user feature key) uniquely determines the user's current session. To ensure uniqueness, a user ID that uniquely represents the user is generated in advance for each user; the user ID may itself be one of the user characteristics. When the user interacts with the service system through a client, a session is created, and a session ID is generated for each session. Splicing the user ID with the session ID yields the user session identifier, which is unique and represents the current session of one specific user.
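The splicing step can be illustrated with a minimal sketch. The function name, the `:` separator, and the sample IDs are assumptions made for illustration; the specification only states that the user ID and the session ID are spliced together.

```python
def build_user_session_key(user_id: str, session_id: str, sep: str = ":") -> str:
    """Splice the user ID and session ID into one cache key.

    The separator is an illustrative assumption; any scheme that keeps
    the two parts distinguishable would serve the same purpose.
    """
    if not user_id or not session_id:
        raise ValueError("both user_id and session_id are required")
    return f"{user_id}{sep}{session_id}"


# Hypothetical IDs, used only to show the shape of the key.
key = build_user_session_key("u1001", "s42")
```

Keeping the user ID as a fixed prefix of the key also makes it possible to look up a user's earlier sessions later on.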

S106: Access a distributed cache based on the user session identifier, and, when the user session identifier hits first cache data in the distributed cache, provide personalized fallback data for the service request according to the first cache data.

As shown in fig. 3, personalized recall conditions can be generated based on the user characteristics. Materials (also called items) are first selected from a material library (such as a commodity library or a rights library) into a recommendation pool according to certain rules, and the pool is then updated periodically through replacement rules. For example, an e-commerce platform can construct a recommendation pool based on sales volume over the last 30 days, the price of a commodity within its category, etc., and a short video platform can construct one based on release time, play count over the last 7 days, etc. Recommendation pools are typically built offline on a regular basis. A certain number of materials (usually thousands to tens of thousands) is selected from the recommendation pool and sent to the subsequent ranking module. This selection is the recall process, and a multi-way recall model is used to perform it. A personalized recall condition adds a constraint to the recall, and the constraint is derived from the user characteristics, for example, selecting best-selling commodities for the user's age range, or popular commodities in the user's city. Data recalled under a personalized recall condition therefore naturally fits the user better.
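As a rough illustration of a recall constrained by user characteristics, the sketch below filters a small in-memory recommendation pool by city and age range and orders the matches by 30-day sales. The `Item` fields, the filter rule, and the sample data are all hypothetical; a production system would use a multi-way recall model rather than a single filter.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    item_id: str
    city: str        # city in which the item is popular (assumed field)
    min_age: int     # target age range (assumed fields)
    max_age: int
    sales_30d: int   # sales volume over the last 30 days


def personalized_recall(pool, user, limit=2):
    """Select pool items that satisfy constraints derived from user features."""
    matches = [
        it for it in pool
        if it.city == user["city"] and it.min_age <= user["age"] <= it.max_age
    ]
    # Best sellers first, then truncate to the requested amount.
    matches.sort(key=lambda it: it.sales_30d, reverse=True)
    return matches[:limit]


pool = [
    Item("a", "Hangzhou", 18, 30, 500),
    Item("b", "Hangzhou", 18, 30, 900),
    Item("c", "Beijing", 18, 30, 999),
    Item("d", "Hangzhou", 40, 60, 700),
]
recalled_ids = [it.item_id for it in personalized_recall(pool, {"city": "Hangzhou", "age": 25})]
```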

As shown in fig. 2, after the user characteristics are obtained according to the service request, a corresponding personalized recall condition may be generated from them (for convenience of description, this condition is referred to as the first personalized recall condition). After the user session identifier is obtained, the distributed cache is accessed according to it.

The distributed cache stores not only personalized recall conditions and their corresponding recall results, but also the user session identifier corresponding to each personalized recall condition. If the user session identifier of the current session matches a user session identifier stored in the distributed cache, the identifier is considered to hit the distributed cache; equivalently, it hits cache data in the distributed cache (for convenience of description, this cache data is referred to as the first cache data, which comprises the user session identifier, the personalized recall condition, and the corresponding recall result).

A hit in the distributed cache indicates that a personalized recall has already succeeded for the user in this session and that the corresponding data was written into the distributed cache. In this case, the first personalized recall condition is refined based on the first cache data to obtain a second personalized recall condition.

Personalized recall in the recommendation pool is then attempted based on the second personalized recall condition. If the attempt fails, the service request cannot readily obtain the required personalized data through the conventional recall path. However, because a personalized recall has already succeeded in this session and its data was written into the distributed cache, that cached recall result can be used as the personalized fallback data for the service request, and the corresponding flow (for example, the ranking flow) continues.

If personalized recall in the recommendation pool succeeds under the second personalized recall condition, the obtained recall result is used as the personalized data for the service request. No fallback is needed, and the flow continues directly with the personalized data.

Fig. 4 is a schematic diagram of writing and updating personalized fallback data in an application scenario according to one or more embodiments of the present disclosure. The fallback data writing process can run alongside service execution. When a new recall result is obtained under the second personalized recall condition, the old recall result that the user obtained earlier in the current session is updated in the distributed cache, so that the latest recall result is stored there as the new personalized fallback data. Of course, if personalized recall under the second personalized recall condition does not succeed, nothing is written to the distributed cache, and the writing process for personalized fallback data ends.
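The hit path of S106 can be sketched as follows. This is a simplified illustration, not the implementation in the specification: a plain `dict` stands in for the distributed cache, `RecallError` and the function name are hypothetical, and the recall itself is passed in as a callable.

```python
class RecallError(Exception):
    """Raised when a recall attempt times out or otherwise fails (hypothetical)."""


def serve_with_personalized_fallback(cache, session_key, recall):
    """Return (items, source) for the current session.

    cache:  dict standing in for the distributed cache.
    recall: callable performing the personalized recall; may raise RecallError.
    """
    cached = cache.get(session_key)
    try:
        fresh = recall()
    except RecallError:
        if cached is None:
            raise  # no personalized fallback data exists at this level
        return cached, "personalized_fallback"
    # Recall succeeded: store the latest result as the new fallback data.
    cache[session_key] = fresh
    return fresh, "personalized_recall"


def failing_recall():
    raise RecallError("recall timed out")


cache = {"u1001:s42": ["item-1", "item-2"]}
result_fail = serve_with_personalized_fallback(cache, "u1001:s42", failing_recall)
result_ok = serve_with_personalized_fallback(cache, "u1001:s42", lambda: ["item-3"])
```

In this sketch the write happens inline for brevity; the text describes the write as a separate process that runs alongside service execution.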

S108: If the user session identifier does not hit the first cache data and personalized recall according to the user characteristics is not successful, obtain scene characteristics corresponding to the service scene, and construct a corresponding scene identifier according to the scene characteristics.

When personalized recall fails, the generalized recall stage is entered. The generalized data obtained in this stage does not match the user's choices as well as personalized data does, but it still offers the user general options for the scene, so that a basic user experience is ensured.

As shown in fig. 2, during service execution, if the user session identifier does not hit the first cache data in the distributed cache (either because no personalized recall has been performed in the current session, or because the attempted personalized recall failed), personalized recall in the recommendation pool is attempted based on the first personalized recall condition. If the recall succeeds, the obtained recall result is used as the personalized data for the service request; no fallback is needed, and the flow continues directly with the personalized data. If the recall fails, the generalized recall process is entered.

As shown in fig. 4, the fallback data writing process can run alongside service execution. When the recall succeeds, no personalized fallback data exists yet in the distributed cache (the user session identifier did not hit the first cache data), so the obtained recall result is written directly into the distributed cache as the personalized fallback data. When the recall fails, the writing process for personalized fallback data ends.

As shown in fig. 3, the scene characteristics may include the channel, inventory, category, status, etc. In the service scene, the channel is the source of the data; for example, on an e-commerce platform the channel can be the supplier of the goods, and in a search engine it can be the source website of the search results. Categories may be planned in advance. For example, an e-commerce platform may classify commodities, from coarse to fine, into major categories, middle categories, minor categories, and commodity details. Commodities are divided into major categories such as hardware, chemicals, food, and aquatic products; the middle categories under food include vegetables and fruits, meat and meat products, milk and milk products, eggs and egg products, and the like; the minor categories under meat and meat products include pork and pork products, beef and beef products, and the like; and a minor category such as white spirit is further subdivided by specification, color, and grade to obtain the commodity details. Similarly, states may be planned in advance; for example, an e-commerce platform may set the state of a commodity to normal, new product trial, new product evaluation, ordering temporarily prohibited, and the like.

As shown in fig. 2, the scene identifier (which may also be referred to as the scene feature key) uniquely determines the current scene. A scene ID (also referred to as Scenario ID) is generated in advance for each scene, and the scene identifier that uniquely represents the scene is obtained from the scene ID.

S110: Access a local cache based on the scene identifier, and, when the scene identifier hits second cache data in the local cache, provide generalized fallback data for the service request according to the second cache data.

As shown in fig. 2, if the scene identifier hits the second cache data in the local cache, the local cache already stores generalized fallback data for the scene, so that data can be used directly to continue the corresponding flow.

For personalized recall, because the content required by the user may be different in each session, there may be a great difference in the content required even in one session, for example, the user searches for a plurality of different commodities in turn during one session. At this point, the personalized recall data required by the user may be different in the same session (i.e., corresponding to the same user session identification).

To protect the user experience and achieve accurate personalized recall, personalized recall under the personalized recall condition is attempted regardless of whether the user session identifier hits the distributed cache, so that the most timely and accurate personalized data is obtained. Therefore, the personalized recall condition can be generated as soon as the user characteristics are obtained, which facilitates the subsequent personalized recall.

Compared with personalized recall, generalized recall for a service scene is tightly bound to the scene and has little to do with the user's behavior and operations. Therefore, after the scene identifier hits the local cache, even if a generalized recall condition were generated from the scene characteristics and generalized recall attempted under it, the resulting recall would differ little from the generalized fallback data stored in the local cache, and might be identical to it.

Therefore, when the scene identifier hits the local cache, the generalized fallback data is obtained directly from the second cache data, and no generalized recall is needed. Like the distributed cache, the local cache stores generalized recall conditions and their corresponding recall results, together with the scene identifier corresponding to each generalized recall condition.

If the scene identifier misses the local cache, the generalized fallback data for the service scene is not stored there. In that case, generalized recall is performed according to the scene characteristics. If the recall succeeds, the flow continues based on the obtained generalized data. If it fails, the fallback process ends and a preset fallback page is displayed to the user, for example a page stating that the web page has crashed and asking the user to refresh and retry.

As shown in fig. 2, if the scene identifier misses the second cache data in the local cache, a generalized recall condition is constructed according to the scene characteristics, and generalized recall in the recommendation pool is attempted based on it. If the attempt succeeds, the flow continues in the service execution process, and in the fallback data writing process the obtained recall result is written into the local cache as the generalized fallback data under the scene identifier.
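The generalized stage described above can be sketched as a second, local cache level. The function name, the `source` labels, and the sample scene keys are assumptions for illustration; a plain `dict` stands in for the local cache.

```python
def serve_generalized(local_cache, scene_key, generalized_recall):
    """Return (items, source); items is None when every level has failed.

    On a local-cache hit the cached data is used directly and no recall is
    attempted. On a miss, generalized recall is attempted and, if it succeeds,
    its result is written back as the fallback data for the scene.
    """
    if scene_key in local_cache:
        return local_cache[scene_key], "generalized_fallback"
    try:
        items = generalized_recall()
    except Exception:
        # Nothing left to fall back on: show the preset fallback page.
        return None, "fallback_page"
    local_cache[scene_key] = items
    return items, "generalized_recall"


def failing():
    raise RuntimeError("recall timed out")


local = {}
first = serve_generalized(local, "scene:home_feed", lambda: ["hot-1", "hot-2"])
second = serve_generalized(local, "scene:home_feed", failing)
third = serve_generalized({}, "scene:search", failing)
```

The second call shows the point of the design: once the scene has cached data, a failing recall service no longer affects the response.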

In addition, as shown in fig. 4, generalized fallback data is not written only during service execution. Because the scene characteristics are relatively fixed, a generalized recall condition can be formed from them in the data processing process even when personalized data has just been recalled successfully, or when no corresponding service request has been received at all. If the update time corresponding to the service scene has expired, generalized recall is performed under the generalized recall condition, and after it succeeds the obtained recall result is taken as the generalized fallback data. If no generalized fallback data exists under the scene identifier in the local cache, the data is written directly; if such data already exists, it is updated.

In one or more embodiments of the present disclosure, as mentioned above, the first personalized recall condition is refined to obtain the second personalized recall condition, so that the current personalized recall is more accurate.
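A time-based refresh of the generalized fallback data might look like the sketch below. The `expires_at` bookkeeping, the `ttl` default of 300 seconds, and the explicit `now` parameter are illustrative assumptions; the specification only says that the data is refreshed once the update time for the scene has expired.

```python
def refresh_generalized_fallback(local_cache, expires_at, scene_key, recall, now, ttl=300.0):
    """Refresh the entry for scene_key if its update time has passed.

    Returns True when new data was written, False otherwise. A failed recall
    leaves any existing fallback data untouched.
    """
    if now < expires_at.get(scene_key, 0.0):
        return False  # still fresh, nothing to do
    try:
        items = recall()
    except Exception:
        return False  # keep the old data rather than dropping it
    local_cache[scene_key] = items      # write if absent, update if present
    expires_at[scene_key] = now + ttl
    return True


cache, expiry = {}, {}
wrote_first = refresh_generalized_fallback(cache, expiry, "scene:home", lambda: ["a"], now=1000.0)
wrote_early = refresh_generalized_fallback(cache, expiry, "scene:home", lambda: ["b"], now=1100.0)
wrote_later = refresh_generalized_fallback(cache, expiry, "scene:home", lambda: ["c"], now=1300.0)
```

Passing `now` explicitly keeps the sketch deterministic; a scheduler would typically supply the current time on each run.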

Specifically, after the content to be finally displayed to the user (for example, commodities on an e-commerce platform) is determined through recall, coarse ranking, fine ranking, re-ranking, and other steps, the content is displayed at the user's client through paging requests. Following the final re-ranked order, each page request carries part of the content in turn. Each time the user scrolls to the bottom in the client, having browsed the content of the current page, the client sends a request to the server. The server stores the data of the browsed page request in the distributed cache and then returns the content for the next page request to the client for display.

Based on the above, the user session dimension corresponding to the current session, which is contained in the first cache data, is determined. The user session dimension refers to the requested pages stored in the distributed cache. Based on the user session dimension, the specified data that has already been exposed to the user can be estimated.

A masking condition is generated from the specified data, and the first personalized recall condition is refined with the masking condition to obtain the second personalized recall condition. For example, the masking condition may apply an operator such as "not including" or "not containing" to the specified data. Thus, when personalized recall is performed under the second personalized recall condition, the recall result does not contain the specified data.

Therefore, content the user has already browsed does not appear in the recall result of the second personalized recall condition, which prevents repeated browsing and protects the user experience.
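The refinement from the first to the second personalized recall condition can be sketched as adding an exclusion set. Representing a recall condition as a `dict` and the `exclude_ids` field name are assumptions made for illustration; `apply_condition` handles only the masking part of a condition.

```python
def refine_recall_condition(first_condition: dict, exposed_ids) -> dict:
    """Return a second condition: the first one plus a masking condition."""
    second = dict(first_condition)  # leave the first condition unchanged
    second["exclude_ids"] = frozenset(exposed_ids)
    return second


def apply_condition(pool, condition):
    """Apply only the masking part of a condition to a list of item IDs."""
    excluded = condition.get("exclude_ids", frozenset())
    return [item for item in pool if item not in excluded]


first_condition = {"city": "Hangzhou"}
second_condition = refine_recall_condition(first_condition, ["a", "b"])
remaining = apply_condition(["a", "b", "c", "d"], second_condition)
```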

Further, the specified data that has been exposed to the user can be estimated from two directions.

Based on the user session dimension, the request pages stored in the first cache data are determined.

In the first direction, the first data contained in the request pages other than the last request page (last in the chronological order of storage) is determined. The client obtains data from the server through paging requests and caches it locally. Taking an e-commerce platform as an example, assume that each request page contains 20 commodities. Depending on the size and height of the user's phone screen, one screen may display only 6 commodities, so the user has to scroll down to expose all 20 commodities of the page. Because the next page is requested only after the user has scrolled to the bottom of the current page, the first data has already been browsed by the user and can be used as part of the specified data.

In the second direction, the user is still browsing the last request page; no new request has been sent because the user has not finished browsing it.

In this case, the browsing duration between the sending time of the last request page and the current time is determined. The longer the browsing duration, the longer the user has stayed on the last request page and the more of its content the user has browsed. The average browsing speed of the user is also obtained (the average browsing speed of a single user under normal browsing conditions can be computed from collected data; for an e-commerce platform, for example, it can be expressed in items per second). Multiplying the average browsing speed by the browsing duration gives the corresponding browsing amount. Based on the browsing amount, a matching number of content items is selected from front to back, following the order of the content in the last request page. This selection is the second data, which can also be used as part of the specified data.

Combining the first data with the second data gives an estimate of the specified data that has been exposed to the user.
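The two-direction estimate can be written out as a short sketch. The function and parameter names are hypothetical, times are plain seconds, and rounding the browsing amount down to a whole number of items is an assumption; the specification does not state how fractional amounts are handled.

```python
import math


def estimate_exposed(pages, last_sent_at, now, items_per_second):
    """Estimate which items the user has already seen in this session.

    pages: list of pages (each a list of item IDs) in the order requested.
    """
    if not pages:
        return []
    # Direction 1: every page before the last one was scrolled to the bottom.
    first_data = [item for page in pages[:-1] for item in page]
    # Direction 2: browsing amount = average speed x time spent on the last page.
    browsing_duration = max(0.0, now - last_sent_at)
    viewed = math.floor(browsing_duration * items_per_second)
    second_data = pages[-1][:viewed]  # front to back, capped at the page size
    return first_data + second_data


pages = [["a", "b", "c"], ["d", "e", "f"]]
exposed = estimate_exposed(pages, last_sent_at=100.0, now=104.0, items_per_second=0.5)
```

With 4 seconds on the last page at 0.5 items per second, two items of the last page are counted as exposed in addition to the whole first page.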

For the scheme of directly carrying out the generalized spam based on the local cache, it is difficult to record the exposed data of the user session dimension, so once the generalized spam is directly adopted because of abnormality, in order to avoid subsequent repetition, only the link of the generalized spam can be always carried away, but in the embodiment, the occurrence of the situation can be avoided based on the user session dimension recorded in the distributed cache, so that the personalized spam data only affects the current page, has no influence on the subsequent page, and ensures the user experience. For the non-home page, the personalized spam data of the user session dimension is cached, so that the 'no sense of spam' can be realized based on the personalized spam data, and the personalized recommendation effect is hardly influenced.

In one or more embodiments of the present disclosure, as described above, when the user session identifier of a service request misses the distributed cache and personalized data is not successfully recalled, a generalized recall stage is entered. Even if that recall succeeds, what is obtained is generalized data or generalized spam data, which easily affects the user experience.

Based on this, after the user session identifier misses the distributed cache and personalized data is not successfully recalled for the service request, the flow does not enter the generalized recall stage directly; instead, a corresponding user identifier is constructed according to the user characteristics. As mentioned above, the user session identifier is obtained by splicing the user ID and the session ID. Here, the user identifier is generated only from the user feature (i.e., the user ID), and the user identifier uniquely determines the user.

The distributed cache is accessed based on the user identifier. Because the distributed cache stores all user session identifiers, each composed of a user ID and a session ID, if it stores a user session identifier generated by the user in a previous session, the user identifier can hit the user ID in that identifier, that is, the part of the character string that represents the user feature. This part of the character string is referred to as the specified character string.

If the user identifier hits a specified character string stored in the distributed cache, personalized spam data is provided for the service request according to the user session identifier to which the specified character string belongs and its corresponding first cache data.

At this time, although the personalized spam data provided to the user does not correspond to the current session, it is still based on personalized data generated for the user in a previous session. Compared with generalized data, it better matches the user's preferences, so the user experience is better.
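The lookup by user identifier can be illustrated with a small sketch. It assumes, purely for illustration, that the distributed cache is a dictionary keyed by `userID:sessionID`; the separator, the function names, and the choice to return the first matching session are assumptions, since the specification does not state them.

```python
from typing import Dict, List, Optional

SEP = ":"  # assumed separator between user ID and session ID


def make_user_session_id(user_id: str, session_id: str) -> str:
    """Splice the user ID and session ID into a user session identifier."""
    return f"{user_id}{SEP}{session_id}"


def lookup_by_user(cache: Dict[str, List[str]], user_id: str) -> Optional[List[str]]:
    """Return cached data of any earlier session whose key starts with the user ID."""
    prefix = user_id + SEP  # the specified character string, plus the separator
    for key, value in cache.items():
        if key.startswith(prefix):
            return value
    return None  # no earlier session: continue to the generalized stage
```

Including the separator in the prefix keeps user `u1` from matching keys of user `u10`. A production distributed cache would more likely use a key scan or a secondary index than a full iteration.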

In one or more embodiments of the present disclosure, the generalized spam data may be updated based on an update time, which may be preset, may be a fixed time, or may be set to different times based on different scenarios.

Specifically, when the service scene belongs to an e-commerce platform, the scene features corresponding to the service scene are determined. Generally, each business scene includes a plurality of scene features, and each scene feature covers a certain number of commodities. The commodity change speed under each scene feature is determined over a certain period (for example, one week). The commodity change speed increases by 1 each time a commodity is added (including putting on shelf, pre-selling, and the like) or removed (including taking off shelf, selling out, and the like), so that the total number of commodity changes under the scene feature within the week can be counted. The commodity change speed may be recalculated once a year or every half year.

At this time, the highest commodity change speed is selected from the commodity change speeds of all scene features corresponding to the service scene, and the corresponding update time is generated for the service scene. For a business scene, the higher the commodity change speed, the more commodities are added or removed, and the more frequently the generalized spam data needs to be updated to keep it of high quality; the update time is therefore shorter.

Because each service scene contains a plurality of scene features, the number of service scenes is large, and computing the update time directly for every service scene would generate a heavy workload. Therefore, the commodity change speed is collected only per scene feature, which reduces the required workload. (The scene features can be chosen according to actual requirements; for example, the major category, middle category, minor category, or commodity detail to which the business scene belongs can be selected as the corresponding scene feature.) Finally, the highest commodity change speed is selected, that is, the fastest-changing scene feature represents the service scene. This avoids the situation that arises with an average: when most scene features change slowly and a few change very quickly, the average is low, the update time is long, and the user experience suffers.
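A small sketch of deriving the update time from the highest commodity change speed follows. The specification only requires that a higher speed yields a shorter update time; the inverse-proportional formula, the `base_hours` and `min_hours` parameters, and the function name are assumptions made for illustration.

```python
from typing import Dict


def update_interval_hours(weekly_changes: Dict[str, int],
                          base_hours: float = 168.0,
                          min_hours: float = 1.0) -> float:
    """Map per-feature weekly change counts to an update time in hours."""
    # Use the fastest-changing scene feature, not the average.
    fastest = max(weekly_changes.values(), default=0)
    if fastest <= 0:
        return base_hours  # nothing changed: refresh once per week (assumed default)
    # Assumed rule: the update time shrinks in inverse proportion to the speed.
    return max(min_hours, base_hours / fastest)
```

With weekly change counts of 2 and 84 for two scene features, the maximum (84) gives a 2-hour update time, whereas the mean (43) would give a noticeably longer one under the same rule.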

In one or more embodiments of the present disclosure, after data recall, coarse ranking, fine ranking, and re-ranking are performed in sequence.

After the recall result is obtained, coarse ranking selects a portion of the material (typically thousands of items) to send to the fine-ranking module. Coarse ranking can be understood as a round of filtering before fine ranking, which relieves the pressure on the fine-ranking module. Coarse ranking sits between recall and fine ranking; its model generally cannot be too complex, so as to balance precision and low latency.

After fine ranking obtains the result of coarse ranking, it scores and ranks the candidate set. Fine ranking needs to ensure scoring accuracy within the maximum permitted latency, and building a fine-ranking stage generally involves three parts: samples, features, and the model.

After the fine-ranking result is obtained, re-ranking fine-tunes it based on operation strategy, diversity, context, and the like. For example, on an e-commerce platform, the weights of certain types of commodities are raised during certain shopping festivals, and measures such as category scattering, same-image scattering, and same-seller scattering are applied to safeguard the user experience.

If the recall fails, the obtained spam data (including personalized spam data and generalized spam data) needs to be sent to the coarse-ranking module located downstream for the coarse-ranking module to perform the first round of screening.

The data quality of the personalized spam data is relatively high, but a certain gap remains compared with personalized data successfully obtained through personalized recall. Therefore, when the spam data is personalized spam data, the residual quantity kept by the coarse-ranking module during screening is increased (for example, the quantity of material is increased by 20%-30%), so that the first residual quantity kept after the coarse-ranking module screens the personalized spam data is higher than the second residual quantity kept after it screens personalized data obtained through personalized recall. Because only the output quantity is increased and the calculation process is unchanged, the workload of the coarse-ranking module does not increase, while as much material as possible reaches fine ranking, which raises the probability of hitting the user's preferences and improves the user experience.
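The adjustment of the residual quantity can be sketched as below. The function name, the `(item, score)` input format, and the default 25% boost (picked from within the 20%-30% range given above) are illustrative assumptions.

```python
import math
from typing import List, Tuple


def coarse_rank(candidates: List[Tuple[str, float]], keep: int,
                is_personalized_spam: bool, boost: float = 0.25) -> List[str]:
    """Keep the top-scoring candidates; keep more when the input is personalized spam data."""
    if is_personalized_spam:
        # Only the output quantity grows; the scoring itself is unchanged.
        keep = math.ceil(keep * (1 + boost))
    ranked = sorted(candidates, key=lambda c: c[1], reverse=True)
    return [item for item, _ in ranked[:keep]]
```

The scoring pass runs over the same candidates in both cases, which matches the statement that the coarse-ranking workload does not increase.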

Based on the same thought, one or more embodiments of the present disclosure further provide apparatuses and devices corresponding to the above method, as shown in fig. 5 and fig. 6.

Fig. 5 is a schematic structural diagram of a data spam device based on multi-level cache distribution according to one or more embodiments of the present disclosure, where the device includes:

the request receiving module 502 receives a service request generated based on a service scene and acquires corresponding user characteristics according to the service request;

A first identifier construction module 504, configured to construct a user session identifier corresponding to the current session according to the user characteristics;

the personalized spam module 506 accesses the distributed cache based on the user session identifier, and provides personalized spam data for the service request according to the first cache data when the user session identifier hits the first cache data in the distributed cache;

the second identifier construction module 508 is configured to acquire a scene feature corresponding to the service scene if the user session identifier misses the first cached data and fails to perform personalized recall according to the user feature, and construct a corresponding scene identifier according to the scene feature;

the generalized spam module 510 accesses a local cache based on the scene identifier, and provides generalized spam data for the service request according to the second cache data when the scene identifier hits the second cache data in the local cache.

Optionally, the personalized spam module 506 determines a first personalized recall condition that is generated in advance according to the user characteristics;

judging whether the user session identifier hits the first cache data in the distributed cache;

if the first cache data is hit, perfecting the first personalized recall condition based on the first cache data to obtain a second personalized recall condition;

based on the second personalized recall condition, attempting to perform personalized recall in a recommendation pool;

if the attempt fails, personalized spam data is provided for the service request according to a recall result contained in the first cache data.

Optionally, the personalized spam module 506 tries to perform personalized recall in a recommendation pool based on the first personalized recall condition if the user session identifier does not hit the first cache data in the distributed cache, and after the attempt is successful, uses the obtained recall result as personalized data corresponding to the service request, and writes the personalized recall result into the distributed cache as personalized spam data;

if the personalized recall is successfully performed in the recommendation pool based on the second personalized recall condition, the obtained recall result is used as personalized data corresponding to the service request, and the recall result contained in the distributed cache is updated to update the personalized spam data.
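The hit and miss branches of the personalized spam module can be sketched as a single function. This is a simplified illustration: the `SessionEntry` structure, the `exclude` key used as the shielding condition, and the convention that the hypothetical `recall` callable returns `None` on failure are all assumptions, not details from the specification.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

# Hypothetical recall callable: returns None on failure (timeout, error, ...).
Recall = Callable[[dict], Optional[List[str]]]


@dataclass
class SessionEntry:
    recall_result: List[str]                          # last successful recall result
    exposed: List[str] = field(default_factory=list)  # data estimated as exposed


def personalized_stage(cache: Dict[str, SessionEntry], session_key: str,
                       first_condition: dict, recall: Recall) -> Optional[List[str]]:
    entry = cache.get(session_key)
    condition = dict(first_condition)
    if entry is not None:
        # Hit: perfect the first condition into the second one by shielding exposed data.
        condition["exclude"] = list(entry.exposed)
    result = recall(condition)
    if result is not None:
        # Successful recall: return it and write or refresh the spam data in the cache.
        if entry is None:
            cache[session_key] = SessionEntry(recall_result=result)
        else:
            entry.recall_result = result
        return result
    # Failed recall: fall back to the cached result if there is one.
    return entry.recall_result if entry is not None else None
```

A return value of `None` here corresponds to the case where the flow has to continue to the user-identifier lookup or the generalized stage.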

Optionally, the personalized spam module 506 determines a user session dimension corresponding to the current session, where the user session dimension is included in the first cached data;

estimating the specified data that has been exposed to the user based on the user session dimension;

generating a shielding condition according to the specified data, and perfecting the first personalized recall condition according to the shielding condition to obtain a second personalized recall condition, so as to remove the specified data from the recall result corresponding to the second personalized recall condition.

Optionally, the personalized spam module 506 determines, based on the user session dimension, a number of corresponding request pages that have been stored in the first cached data;

determining first data contained in other request pages except the last request page in the plurality of request pages;

for the last request page, determining the second data contained in it based on the browsing duration between its sending time and the current time and on the preset average browsing speed of the user;

and combining the first data and the second data to serve as estimated specified data which is exposed to the user.

Optionally, before acquiring the scene feature corresponding to the service scene and constructing the corresponding scene identifier according to the scene feature, the second identifier construction module 508 further includes:

Constructing a corresponding user identifier according to the user characteristics;

accessing the distributed cache based on the user identification;

determining that the user identifier hits a designated character string in the distributed cache, wherein the designated character string refers to a part of character strings used for representing user characteristics in the stored user session identifiers in the distributed cache;

and providing personalized spam data for the service request according to the user session identifier to which the specified character string belongs and the corresponding first cache data.

Optionally, the generalized spam module 510 accesses a local cache based on the scene identifier, and determines whether the scene identifier hits second cache data in the local cache;

if hit, providing generalized spam data for the service request based on the second cache data;

if not hit, attempting generalized recall according to the scene characteristics, and obtaining generalized data corresponding to the service request after the attempt succeeds.

Optionally, the generalized spam module 510 constructs a generalized recall condition according to the scene feature if the scene identifier misses the second cache data in the local cache, tries to perform generalized recall in a recommendation pool based on the generalized recall condition, and writes the obtained recall result into the local cache as generalized spam data after the attempt is successful;

If the update time corresponding to the service scene expires, constructing a generalized recall condition according to the scene characteristics, attempting to perform generalized recall in a recommendation pool based on the generalized recall condition, and writing an obtained recall result into the local cache after the successful attempt to serve as generalized spam data.
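The local-cache behaviour of the generalized spam module can be sketched as follows. The cache layout (a dictionary from scene identifier to a timestamped result), the `recall` callable returning `None` on failure, and serving the stale entry when a refresh fails are assumptions for illustration; the specification does not say what happens when a refresh after expiry fails.

```python
from typing import Callable, Dict, List, Optional, Tuple

Recall = Callable[[dict], Optional[List[str]]]        # hypothetical; None means failure
LocalCache = Dict[str, Tuple[float, List[str]]]       # scene id -> (written_at, data)


def generalized_stage(local_cache: LocalCache, scene_key: str, scene_features: dict,
                      recall: Recall, update_seconds: float,
                      now: float) -> Optional[List[str]]:
    entry = local_cache.get(scene_key)
    if entry is not None and now - entry[0] < update_seconds:
        return entry[1]  # hit and still fresh: serve generalized spam data
    # Miss, or update time expired: build the generalized recall condition and retry.
    result = recall({"scene": dict(scene_features)})
    if result is not None:
        local_cache[scene_key] = (now, result)
        return result
    # Recall failed: serve the stale entry if one exists (assumed behaviour).
    return entry[1] if entry is not None else None
```

Passing `now` explicitly rather than reading the clock inside the function keeps the sketch deterministic and easy to test.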

Optionally, the generalized spam module 510 determines a scene feature corresponding to the service scene if the service scene belongs to an e-commerce platform;

determining, for each scene feature, a commodity change speed under the scene feature;

and selecting the highest commodity change speed according to commodity change speeds of all scene features corresponding to the service scene, and generating corresponding update time for the service scene, wherein the higher the highest commodity change speed is, the shorter the update time is.

Optionally, a coarse-ranking sending module 512 is also included;

the coarse-ranking sending module 512 sends spam data to a coarse-ranking module located downstream, so that the coarse-ranking module screens according to the spam data, where the spam data includes: the personalized spam data or the generalized spam data;

when the spam data is the personalized spam data, the first residual quantity kept after the coarse-ranking module screens the personalized spam data is higher than the second residual quantity kept after it screens personalized data obtained through personalized recall.

Fig. 6 is a schematic structural diagram of a data spam device based on multi-level cache distribution according to one or more embodiments of the present disclosure, where the device includes:

at least one processor; and,

a memory communicatively coupled to the at least one processor; wherein,

Based on the same considerations, one or more embodiments of the present specification further provide a non-volatile computer storage medium corresponding to the above method, storing computer-executable instructions configured to:

In the 1990s, an improvement to a technology could be clearly distinguished as an improvement in hardware (e.g., an improvement to a circuit structure such as a diode, transistor, or switch) or an improvement in software (an improvement to a method flow). However, with the development of technology, many improvements to method flows today can be regarded as direct improvements to hardware circuit structures. Designers almost always obtain the corresponding hardware circuit structure by programming the improved method flow into a hardware circuit. Therefore, it cannot be said that an improvement to a method flow cannot be implemented by a hardware entity module. For example, a programmable logic device (Programmable Logic Device, PLD) (e.g., a field programmable gate array (Field Programmable Gate Array, FPGA)) is an integrated circuit whose logic function is determined by the user's programming of the device. A designer programs to "integrate" a digital system onto a PLD without requiring a chip manufacturer to design and fabricate an application-specific integrated circuit chip. Moreover, nowadays, instead of manually fabricating integrated circuit chips, such programming is mostly implemented with "logic compiler" software, which is similar to the software compiler used in program development; the source code to be compiled must also be written in a specific programming language, called a hardware description language (Hardware Description Language, HDL). There is not just one HDL but many, such as ABEL (Advanced Boolean Expression Language), AHDL (Altera Hardware Description Language), Confluence, CUPL (Cornell University Programming Language), HDCal, JHDL (Java Hardware Description Language), Lava, Lola, MyHDL, PALASM, and RHDL (Ruby Hardware Description Language); VHDL (Very-High-Speed Integrated Circuit Hardware Description Language) and Verilog are currently the most commonly used. It will also be apparent to those skilled in the art that a hardware circuit implementing a logic method flow can be readily obtained by slightly logic-programming the method flow in one of the hardware description languages described above and programming it into an integrated circuit.

The controller may be implemented in any suitable manner. For example, the controller may take the form of a microprocessor or processor together with a computer-readable medium storing computer-readable program code (e.g., software or firmware) executable by the (micro)processor, logic gates, switches, an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a programmable logic controller, or an embedded microcontroller. Examples of controllers include, but are not limited to, the following microcontrollers: ARC 625D, Atmel AT91SAM, Microchip PIC18F26K20, and Silicon Labs C8051F320. A memory controller may also be implemented as part of the control logic of the memory. Those skilled in the art will also appreciate that, in addition to implementing the controller purely as computer-readable program code, it is entirely possible to logic-program the method steps so that the controller achieves the same functionality in the form of logic gates, switches, application-specific integrated circuits, programmable logic controllers, embedded microcontrollers, and the like. Such a controller may therefore be regarded as a hardware component, and the means included in it for performing various functions may also be regarded as structures within the hardware component. Or even, the means for performing the various functions may be regarded both as software modules implementing the method and as structures within the hardware component.

The system, apparatus, module or unit set forth in the above embodiments may be implemented in particular by a computer chip or entity, or by a product having a certain function. One typical implementation is a computer. In particular, the computer may be, for example, a personal computer, a laptop computer, a cellular telephone, a camera phone, a smart phone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.

For convenience of description, the above devices are described as being functionally divided into various units, respectively. Of course, the functions of each element may be implemented in one or more software and/or hardware elements when implemented in the present specification.

It will be appreciated by those skilled in the art that the present description may be provided as a method, system, or computer program product. Accordingly, the present specification embodiments may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present description embodiments may take the form of a computer program product on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.

The present description is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the specification. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

In one typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.

The memory may include volatile memory in a computer-readable medium, random access memory (RAM), and/or nonvolatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.

Computer-readable media, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of storage media for a computer include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. Computer-readable media, as defined herein, do not include transitory computer-readable media, such as modulated data signals and carrier waves.

It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in the process, method, article, or apparatus that comprises the element.

The description may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The specification may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.

In this specification, the embodiments are described in a progressive manner; identical and similar parts of the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. In particular, the apparatus, device, and non-volatile computer storage medium embodiments are described relatively briefly because they are substantially similar to the method embodiments; for the relevant parts, refer to the description of the method embodiments.

The foregoing describes specific embodiments of the present disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims can be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.

The foregoing is merely one or more embodiments of the present description and is not intended to limit the present description. Various modifications and alterations to one or more embodiments of this description will be apparent to those skilled in the art. Any modification, equivalent replacement, improvement, or the like, which is within the spirit and principles of one or more embodiments of the present description, is intended to be included within the scope of the claims of the present description.

Claims

1. A data spam method based on multi-level cache distribution comprises the following steps:

2. The method of claim 1, wherein when the user session identifier hits the first cache data in the distributed cache, providing personalized spam data for the service request according to the first cache data, specifically comprising:

Determining a first personalized recall condition which is generated in advance according to the user characteristics;

3. The method of claim 2, wherein providing personalized spam for the service request according to the recall result included in the first cached data, specifically comprises:

if the user session identifier does not hit the first cache data in the distributed cache, based on the first personalized recall condition, attempting to carry out personalized recall in a recommendation pool, and after the attempt is successful, taking an obtained recall result as personalized data corresponding to the service request, and writing the personalized data into the distributed cache as personalized spam data;

4. The method of claim 2, wherein the perfecting the first personalized recall condition based on the first cache data to obtain a second personalized recall condition, specifically comprises:

determining the user session dimension corresponding to the current session, which is contained in the first cache data;

5. The method according to claim 4, wherein the predicting the specified data that has been exposed to the user based on the user session dimension specifically comprises:

determining a plurality of corresponding request pages stored in the first cache data based on the user session dimension;

6. The method of claim 1, wherein before the obtaining the scene features corresponding to the service scene and constructing the corresponding scene identifier according to the scene features, the method further comprises:

accessing the distributed cache based on the user identification;

7. The method of claim 1, wherein the accessing the local cache based on the scene identifier, and providing generalized spam data for the service request according to the second cache data when the scene identifier hits the second cache data in the local cache, specifically comprises:

accessing a local cache based on the scene identifier, and judging whether the scene identifier hits second cache data in the local cache or not;

8. The method of claim 7, wherein the providing generalized spam data for the service request based on the second cached data specifically comprises:

if the scene identifier does not hit the second cache data in the local cache, constructing a generalized recall condition according to the scene characteristics, attempting to perform generalized recall in a recommendation pool based on the generalized recall condition, and writing an obtained recall result into the local cache after the attempt is successful to serve as generalized spam data;

9. The method of claim 8, wherein the update time corresponding to the service scenario expires, specifically comprising:

if the service scene belongs to the e-commerce platform, determining scene characteristics corresponding to the service scene;

10. The method of claim 1, the method further comprising:

sending the spam data to a coarse-ranking module located downstream, so that the coarse-ranking module screens according to the spam data, wherein the spam data comprises: the personalized spam data or the generalized spam data;

11. A data spam apparatus based on multi-level cache distribution, comprising:

12. The apparatus of claim 11, the personalized spam module to determine a first personalized recall condition that was previously generated based on the user characteristics;

13. The apparatus of claim 12, wherein the personalized spam module is configured to attempt personalized recall in a recommendation pool based on the first personalized recall condition if the user session identifier misses a first cache data in the distributed cache, and after the attempt is successful, to write an obtained recall result as personalized data corresponding to the service request into the distributed cache as personalized spam data;

14. The apparatus of claim 12, wherein the personalized spam module determines a user session dimension corresponding to the current session, where the user session dimension is included in the first cached data;

15. The apparatus of claim 14, wherein the personalized spam module determines a plurality of corresponding request pages stored in the first cached data based on the user session dimension;

16. The apparatus of claim 11, the second identifier construction module, before obtaining a scene feature corresponding to the service scene and constructing a corresponding scene identifier according to the scene feature, further comprises:

accessing the distributed cache based on the user identification;

17. The apparatus of claim 11, the generalized spam module to access a local cache based on the scene identification and to determine whether the scene identification hits second cache data in the local cache;

18. The apparatus of claim 17, wherein the generalized spam module is configured to construct generalized recall conditions based on the scene characteristics if the scene identification misses the second cache data in the local cache, to attempt generalized recall in a recommendation pool based on the generalized recall conditions, and to write the resulting recall result into the local cache as generalized spam data after the attempt is successful;

19. The apparatus of claim 18, wherein the generalized spam module is configured to determine scene features corresponding to the business scene if the business scene belongs to an e-commerce platform;

20. The apparatus of claim 11, further comprising a coarse-ranking sending module;

the coarse-ranking sending module sends spam data to a coarse-ranking module located downstream, so that the coarse-ranking module screens according to the spam data, wherein the spam data comprises: the personalized spam data or the generalized spam data;

21. A data spam device based on multi-level cache distribution, comprising:

at least one processor; and,

a memory communicatively coupled to the at least one processor; wherein,