CN117271463A - Method, apparatus, device and computer readable medium for screening users - Google Patents

Info

Publication number
CN117271463A
Authority
CN
China
Prior art keywords
user
screening
data
data block
log
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210663959.8A
Other languages
Chinese (zh)
Inventor
罗勇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Wodong Tianjun Information Technology Co Ltd
Original Assignee
Beijing Wodong Tianjun Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Wodong Tianjun Information Technology Co Ltd filed Critical Beijing Wodong Tianjun Information Technology Co Ltd
Priority to CN202210663959.8A priority Critical patent/CN117271463A/en
Publication of CN117271463A publication Critical patent/CN117271463A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/18 File system types
    • G06F16/1805 Append-only file systems, e.g. using logs or journals to store data
    • G06F16/1815 Journaling file systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/13 File access structures, e.g. distributed indices
    • G06F16/134 Distributed indices
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0241 Advertisements
    • G06Q30/0251 Targeted advertisements
    • G06Q30/0269 Targeted advertisements based on user profile or attribute

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Business, Economics & Management (AREA)
  • Physics & Mathematics (AREA)
  • Accounting & Taxation (AREA)
  • General Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Development Economics (AREA)
  • Finance (AREA)
  • Strategic Management (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Game Theory and Decision Science (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • General Business, Economics & Management (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a method, an apparatus, a device and a computer readable medium for screening users, and relates to the technical field of computers. One embodiment of the method comprises the following steps: determining, through a stream processing engine, a data slice in the log distributed storage based on the user identification in a user screening request, so as to locate the search data block corresponding to a screening target value among the data blocks of the data slice; according to the search data block, matching a user log storage position in a rough set index in memory by the operation in the user screening request, so as to acquire a target file according to the user log storage position; and screening out the target user of the user screening request by combining the log behavior in the target file with a preset screening rule. This embodiment speeds up the screening of target users and thereby improves the response speed.

Description

Method, apparatus, device and computer readable medium for screening users
Technical Field
The present invention relates to the field of computer technology, and in particular, to a method, an apparatus, a device, and a computer readable medium for screening users.
Background
RTA is the abbreviation of Real Time API and serves advertisers' need for real-time, personalized ad delivery. RTA hands the right to select traffic for direct-delivery advertising over to the advertiser: typically, during the targeting stage, a request carrying the user identification is sent to the advertiser so that the advertiser can screen users and decide its delivery strategy before the advertisement is exposed. In essence, RTA addresses the inability of the advertising platform to perform personalized targeting in real time.
The current mainstream scheme is to cache user data in Redis and, after an RTA request is received, look up in Redis whether the target user is hit according to a preconfigured screening rule, i.e., to screen the target user.
In the process of implementing the present invention, the inventor found that the prior art has at least the following problem: screening the target user takes a long time, and the response speed is slow.
Disclosure of Invention
In view of this, embodiments of the present invention provide a method, apparatus, device, and computer readable medium for screening users, which can increase the speed of screening target users, and thus increase the response speed.
To achieve the above object, according to one aspect of the embodiments of the present invention, there is provided a method for screening a user, including:
determining, through a stream processing engine, a data slice in the log distributed storage based on the user identification in a user screening request, so as to locate the search data block corresponding to a screening target value among the data blocks of the data slice;
according to the search data block, matching a user log storage position in a rough set index in memory by the operation in the user screening request, so as to acquire a target file according to the user log storage position;
and screening out the target user of the user screening request by combining the log behavior in the target file with a preset screening rule.
Determining, through the stream processing engine, the data slice in the log distributed storage based on the user identification in the user screening request, so as to locate the search data block corresponding to the screening target value among the data blocks of the data slice, comprises:
determining, through the stream processing engine, the data slice corresponding to the user identification in the log distributed storage based on the user identification in the user screening request;
and locating the search data block among the plurality of data blocks of the data slice according to the screening target value set by the user screening request.
Matching, according to the search data block, the user log storage position in the rough set index in memory by the operation in the user screening request, so as to acquire the target file according to the user log storage position, comprises:
in response to the user screening request, loading the rough set index of the search data block into the memory;
matching the user log storage position in the rough set index in the memory according to the parameters determined by the operation;
and acquiring the target file according to the user log storage position.
Matching the user log storage position in the rough set index in the memory according to the parameters determined by the operation comprises:
matching the user log storage position in the rough set index in the memory by means of an inverted index according to the parameters determined by the operation, wherein the parameters determined by the operation comprise the compressed target file.
The compressed target file is obtained by replacing preset data in the target file through global mapping.
The method further comprises:
establishing a rough set index for each data block subordinate to the data slice, wherein the data slice is set by the user identification and the data block is set by the screening target value.
The method further comprises:
if the amount of data stored in a data block is larger than the data block storage threshold, dividing the data block into a plurality of data blocks according to the data block storage threshold;
and updating the rough set indexes of the divided data blocks.
According to a second aspect of an embodiment of the present invention, there is provided an apparatus for screening a user, including:
a positioning module, configured to determine, through a stream processing engine, a data slice in the log distributed storage based on the user identification in a user screening request, so as to locate the search data block corresponding to a screening target value among the data blocks of the data slice;
a matching module, configured to match, according to the search data block, a user log storage position in a rough set index in memory by the operation in the user screening request, so as to acquire a target file according to the user log storage position;
and a screening module, configured to screen out the target user of the user screening request by combining the log behavior in the target file with a preset screening rule.
According to a third aspect of an embodiment of the present invention, there is provided an electronic device for screening a user, including:
one or more processors;
storage means for storing one or more programs,
the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the methods as described above.
According to a fourth aspect of embodiments of the present invention, there is provided a computer readable medium having stored thereon a computer program which when executed by a processor implements a method as described above.
One embodiment of the above invention has the following advantages or benefits: a data slice in the log distributed storage is determined through a stream processing engine based on the user identification in a user screening request, so as to locate the search data block corresponding to a screening target value among the data blocks of the data slice; according to the search data block, a user log storage position is matched in a rough set index in memory by the operation in the user screening request, so as to acquire a target file according to the user log storage position; and the target user of the user screening request is screened out by combining the log behavior in the target file with a preset screening rule. Because the target file is acquired from the data block by means of the rough set index, without traversing the database, the speed of screening the target user is increased and the response speed is improved accordingly.
Further effects of the above-described non-conventional alternatives are described below in connection with the embodiments.
Drawings
The drawings are included to provide a better understanding of the invention and are not to be construed as unduly limiting the invention. Wherein:
FIG. 1 is a schematic flow diagram of a method of screening users according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a structure of a data slice according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a rough set index structure according to an embodiment of the present invention;
FIG. 4 is a schematic flow diagram of locating the search data block corresponding to the screening target value according to an embodiment of the present invention;
FIG. 5 is a flow chart of a process for obtaining a target file according to an embodiment of the invention;
FIG. 6 is a flow diagram of data block splitting according to an embodiment of the invention;
FIG. 7 is a schematic diagram of screening a user's usage scenario according to an embodiment of the present invention;
FIG. 8 is a schematic diagram of an application flow for screening users according to an embodiment of the present invention;
FIG. 9 is a schematic diagram of the main structure of an apparatus for screening users according to an embodiment of the present invention;
FIG. 10 is an exemplary system architecture diagram in which embodiments of the present invention may be applied;
FIG. 11 is a schematic diagram of a computer system suitable for use in implementing an embodiment of the invention.
Detailed Description
Exemplary embodiments of the present invention will now be described with reference to the accompanying drawings, in which various details of the embodiments of the present invention are included to facilitate understanding, and are to be considered merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the invention. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
In the existing scheme, after an RTA request is received, the target user is looked up in Redis. Data can be stored in Redis in either of the following two ways:
Storage by user
Each user group is computed in advance according to the screening rules and then inverted into a key-value structure. The key is the unique user identifier and the value is the list of user groups to which the user belongs; this structure is stored in Redis. After an RTA request is received, the user's group list is queried and matched against the target users of the advertisement strategy to determine whether the user is a hit.
Storage by user tag field
The tag attributes of each user are stored in Redis as key-value pairs, where the key is the unique user identifier combined with a tag, and the value is the tag value. For example: key: user 1, whether active; value: yes. After an RTA request is received, the user's tag attributes are fetched from Redis according to the screening rule of the advertisement strategy. For example, if the target users of a strategy are active users with no purchases, two tag values must be fetched from Redis in order to screen the target user.
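The two layouts can be contrasted with a minimal sketch. Plain Python dictionaries stand in for Redis here, and all names (`groups_by_user`, `tags`, the group and tag names) are hypothetical and chosen only for illustration.

```python
# Layout 1: storage by user.
# key = unique user identifier, value = precomputed list of user groups.
groups_by_user = {
    "user1": ["active_no_purchase", "new_visitor"],
    "user2": ["new_visitor"],
}

# Layout 2: storage by user tag field.
# key = (unique user identifier, tag), value = tag value.
tags = {
    ("user1", "is_active"): "yes",
    ("user1", "has_purchased"): "no",
    ("user2", "is_active"): "no",
    ("user2", "has_purchased"): "no",
}


def hit_by_group(user_id, group):
    """Layout 1: a single lookup, but groups must be recomputed for every new strategy."""
    return group in groups_by_user.get(user_id, [])


def hit_by_tags(user_id):
    """Layout 2: fetch two tag values and evaluate 'no purchase and active' per request."""
    return (tags.get((user_id, "has_purchased")) == "no"
            and tags.get((user_id, "is_active")) == "yes")
```

The first layout answers a request with one lookup but only for groups that were computed beforehand; the second can evaluate a new rule immediately, at the price of caching every tag of every user.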
In addition, other storage engines, such as HBase or MongoDB, may be used instead of Redis, with the same storage data structures as described above.
With storage by user, the cached data volume is relatively small because only the user lists are cached. However, the user groups take time to compute, so a new advertisement strategy cannot take effect quickly; it becomes effective only after the next computation has completed.
With storage by user tag, the target users are not known in advance, so the tag attributes of all users must be cached; the cached data volume is large and the storage cost is high.
Other key-value stores, such as HBase or MongoDB, can hold the same structures, but their concurrent carrying capacity and response time fall short of Redis.
In summary, screening the target user takes a long time, resulting in slow response speed.
In order to solve the technical problem that the time consumption for screening target users is long, the following technical scheme in the embodiment of the invention can be adopted.
Referring to FIG. 1, FIG. 1 is a schematic diagram of the main flow of a method for screening users according to an embodiment of the present invention, in which a rough set index is used to obtain a target file in a data block in order to screen users. As shown in FIG. 1, the method specifically comprises the following steps:
S101, determining, through a stream processing engine, a data slice in the log distributed storage based on the user identification in a user screening request, so as to locate the search data block corresponding to the screening target value among the data blocks of the data slice.
In the screening of users, the screening is triggered mainly in response to a user screening request. Wherein the user screening request includes a user identification and an operation. The user identification is an identification for distinguishing different users. An operation is a specific action of a user in the network space. As one example, operations include browsing items, adding items to a shopping cart, or searching for items.
The user screening request is triggered in response to a user operation. When multiple users operate at the same point in time, multiple user screening requests are triggered. User screening requests are therefore streaming data.
Streaming data is typically processed in batches, with the data accumulated over a period of time handled together. In the embodiment of the invention, in order to improve the real-time performance of screening users, the users are screened by a stream processing engine. As one example, the stream processing engine includes Flink. The core of Flink is a distributed streaming dataflow engine written in Java and Scala. Flink executes arbitrary dataflow programs in a data-parallel and pipelined manner, and its pipelined runtime system can execute both batch processing and stream processing programs.
In the embodiment of the invention, the rough set index of the data block is used in the process of screening users. That is, before users can be screened with the rough set index, the rough set index must first be established.
In one embodiment of the invention, a rough set index is established for each data block subordinate to the data slice, where the data slice is set by the user identification and the data block is set by the screening target value.
Specifically, the user data is stored in a storage component, and may also be stored across a plurality of physical storage machines. As one example, the storage component includes the Hadoop Distributed File System (HDFS).
The data set is divided by the hash value of the user identification, so that the user data is split into a plurality of data slices (segments). A data slice is self-describing: it contains version information, a definition of the data structure, and the range of hash values of the primary key, i.e. the user identification, covered by the data slice.
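As a rough illustration of hash-based slicing, the sketch below maps a user identification to a segment and builds a self-describing segment record. The segment count, the use of MD5, the modulus-based hash range, and all field names are assumptions made for the example; the source does not specify them.

```python
import hashlib

NUM_SEGMENTS = 8  # hypothetical number of data slices (segments)


def segment_for(user_id: str, num_segments: int = NUM_SEGMENTS) -> int:
    """Map a user identification to a segment number using a stable hash."""
    digest = hashlib.md5(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_segments


def make_segment(segment_no: int, num_segments: int = NUM_SEGMENTS) -> dict:
    """Build a self-describing segment: version, schema and hash range travel with the data."""
    return {
        "version": 1,
        "schema": {"primary_key": "user_id", "fields": ["dimension", "log"]},
        # Assumed encoding of the hash range: all keys with hash % modulus == remainder.
        "hash_range": {"modulus": num_segments, "remainder": segment_no},
        "blocks": [],
    }


segments = [make_segment(i) for i in range(NUM_SEGMENTS)]
target_segment = segments[segment_for("user-42")]
```

Because the hash is deterministic, the same user identification always resolves to the same segment, which is what lets a screening request go straight to one slice instead of scanning all of them.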
Each data slice contains the users' log behaviors and user identifications. One data slice includes a plurality of data blocks (blocks), and each data block has an independent rough set index. A data block stores the dimension data of the primary key. The index of each data block is a rough set index which, compared with a traditional exact index, reduces the cost of caching the index while still allowing the target file to be located accurately.
Referring to FIG. 2, FIG. 2 is a schematic diagram of the structure of a data slice according to an embodiment of the present invention. In FIG. 2, schema is the organization and structure of the data slice, version is the version identification of the data slice, and MetaData is the metadata of the data slice. The data slice in FIG. 2 comprises n data blocks. Within a data block, the data is stored in a bitmap. Each data block corresponds to a rough set index.
Referring to FIG. 3, FIG. 3 is a schematic diagram of a rough set index structure according to an embodiment of the present invention. The rough set index includes three parts: the bitmap position, the data value range, and whether data is stored. User data is stored in the bitmap, and each data value range corresponds to a bitmap position. In addition, 0 indicates that user data is stored at the bitmap position, and 1 indicates that no user data is stored there.
The rough set index types are also correspondingly classified into two types according to the data types of the primary key: number type and string type.
For the number type primary key, the data block stores a maximum value and a minimum value of the number type primary key.
For the string type primary key, most-significant-digit-first (MSD) sorting is used to obtain the maximum and minimum values. The range between the minimum value and the maximum value of the string type primary key is then divided into a plurality of intervals, each of which corresponds to one position in the bitMap and is represented by 1 bit. Finally, the value of each string type primary key is mapped into its interval.
With continued reference to FIG. 3, a value of 0 in the third column indicates that one or more rows of data exist in the interval, and a value of 1 in the third column indicates that no data exists in the interval. In the embodiment of the invention, indexing the primary key only requires storing the maximum value, the minimum value and the value sequence for the primary key. The value sequence is the third column of data in FIG. 3 and, in this example, occupies 6 bits.
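The structure described for FIG. 3 can be sketched for a number type primary key as follows. This is only an illustration under stated assumptions: the class name `RoughSetIndex`, the equal-width intervals, and the default of 6 intervals are hypothetical. The bit convention follows the text above (0 = at least one row in the interval, 1 = no data).

```python
class RoughSetIndex:
    """Rough index over a numeric primary key: min, max, and one bit per interval."""

    def __init__(self, keys, num_intervals=6):
        self.min_key = min(keys)
        self.max_key = max(keys)
        self.num_intervals = num_intervals
        # Convention taken from the text: 1 = no data in the interval, 0 = data present.
        self.bits = [1] * num_intervals
        for key in keys:
            self.bits[self._interval(key)] = 0

    def _interval(self, key):
        """Map a key to its equal-width interval between min_key and max_key."""
        span = self.max_key - self.min_key
        if span == 0:
            return 0
        idx = (key - self.min_key) * self.num_intervals // span
        return min(idx, self.num_intervals - 1)  # the maximum key falls in the last interval

    def may_contain(self, key):
        """False means the key is definitely absent; True means the block must be read."""
        if key < self.min_key or key > self.max_key:
            return False
        return self.bits[self._interval(key)] == 0


index = RoughSetIndex([10, 12, 55, 70])
```

The index is "rough" in that `may_contain` can return True for a key that is not actually stored (for example a key sharing an interval with a stored one), but it never returns False for a stored key, so blocks can be skipped safely.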
Because the rough set index is small, it can be loaded into memory at low cost and still works effectively in large-data-volume scenarios. User data is usually ordered and the number of distinct primary key values is usually small, so using the rough set index table of FIG. 3 improves the efficiency of searching the user data.
Referring to FIG. 4, FIG. 4 is a schematic flow diagram of locating the search data block corresponding to the screening target value according to an embodiment of the present invention. The method specifically comprises the following steps:
S401, determining, through the stream processing engine, the data slice corresponding to the user identification in the log distributed storage based on the user identification in the user screening request.
The data slices are stored in the log distributed storage, and the user screening request is processed by the stream processing engine. Specifically, the data slice corresponding to the user identification is determined in the log distributed storage on the basis of the user identification in the user screening request.
S402, locating the search data block among the plurality of data blocks of the data slice according to the screening target value set by the user screening request.
A data slice contains a plurality of data blocks, and each data block stores the dimension data corresponding to the primary key. Dimension data is the data of a user in a preset dimension, and the dimension is a preset parameter. As one example, the dimensions include shopping cart, browsing, and purchasing. The corresponding dimension data includes: item data in the shopping cart, browsed item data, and purchased item data.
In the embodiment of the invention, the search data block is located according to the screening target value set by the user screening request. The screening target value is a parameter set in the user screening request. As one example, the screening target value includes item data in a shopping cart.
That is, the screening target value corresponds to an item of dimension data. Among the plurality of data blocks of the data slice, the data block of the dimension corresponding to the screening target value is located; this is the search data block. In other words, the search data block is the data block that is successfully located among the data blocks according to the screening target value.
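Steps S401 and S402 amount to a lookup by dimension within the selected slice. A minimal sketch follows, assuming each block carries a hypothetical `dimension` field; the block layout and names are illustrative and not taken from the source.

```python
def locate_search_blocks(segment, screening_target):
    """Return the blocks of a segment whose dimension matches the screening target value."""
    return [block for block in segment["blocks"]
            if block["dimension"] == screening_target]


# Hypothetical data slice for one hash range, with one block per dimension.
segment = {
    "blocks": [
        {"dimension": "shopping_cart", "rows": ["sku-1", "sku-7"]},
        {"dimension": "browse", "rows": ["sku-3"]},
        {"dimension": "purchase", "rows": []},
    ]
}

search_blocks = locate_search_blocks(segment, "shopping_cart")
```

Only the matching block is examined further; the browse and purchase blocks are filtered out without being read.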
In the embodiment of FIG. 4, because user screening requests are streaming data, a stream processing engine is employed to increase the speed of screening users. Storing user data in the form of data blocks makes it possible to filter out effectively the data blocks that are unrelated to the user screening request.
S102, according to the search data block, in the rough set index of the memory, matching the user log storage position by the operation in the user screening request so as to acquire the target file according to the user log storage position.
Locating the search data block allows the target file to be obtained using the rough set index of the search data block without traversing all files in the search data block.
Referring to fig. 5, fig. 5 is a schematic flow chart of acquiring a target file according to an embodiment of the present invention. The method specifically comprises the following steps:
S501, in response to a user screening request, loading the rough set index of the search data block into the memory.
After the user screening request has been received and the search data block has been determined, the rough set index of the search data block can be loaded into the memory because its data volume is small, which increases the speed of searching the rough set index.
S502, according to the parameters determined by operation, matching the user log storage position in the rough set index of the memory.
Parameters determined by the operation are included in the user screening request, including, as one example, a click time point, a browsing duration, and a collection duration. Such as: the operations include browsing items, and the parameters include browsing duration. That is, the parameters can describe the operation accurately. Furthermore, according to the parameters determined by the operation, the user log storage position is matched in the rough set index of the memory.
S503, acquiring a target file according to the user log storage position.
The user log storage position in the data block is obtained from the rough set index in the memory, and the target file is then acquired according to that storage position.
In one embodiment of the invention, to increase the speed of searching in the rough set index, an inverted index may also be employed to match the user log storage position. That is, according to the parameters determined by the operation, the inverted index is used to match the user log storage position in the rough set index in the memory. The parameters determined by the operation include the compressed target file.
The purpose of the inverted index is to query the user log storage position from the compressed target file. Compression is required because the data size of the target file is large. As one example, the parameter determined by the operation, i.e., the compressed target file, includes: user A's shopping-cart items over 3 months. The user log storage position is then queried based on the compressed target file.
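The source does not detail how the inverted index is organized, so the following is only a generic sketch of the technique: tokens of (compressed) log content map to the storage positions that contain them, and a query intersects the posting sets. The location strings and tokens are hypothetical.

```python
from collections import defaultdict


def build_inverted_index(log_entries):
    """Map each token to the set of storage positions whose logs contain it."""
    index = defaultdict(set)
    for location, tokens in log_entries.items():
        for token in tokens:
            index[token].add(location)
    return index


def match_locations(index, query_tokens):
    """Return the storage positions whose logs contain every query token."""
    postings = [index.get(token, set()) for token in query_tokens]
    return set.intersection(*postings) if postings else set()


# Hypothetical log storage positions and their (already compressed) tokens.
log_entries = {
    "block3/file-a": ["userA", "cart", "3m"],
    "block3/file-b": ["userA", "browse"],
    "block3/file-c": ["userB", "cart", "3m"],
}
index = build_inverted_index(log_entries)
locations = match_locations(index, ["userA", "cart"])
```

A query goes from content to position in one dictionary lookup per token, rather than scanning each file for the content.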
In one embodiment of the present invention, the compressed target file is a file obtained by replacing preset data in the target file through global mapping. That is, global mapping is used as the encoding scheme, and the compressed target file is obtained by replacing preset characters in the target file.
Specifically, the target file is compressed, the compressed target file is used as the parameter determined by the operation, and the inverted index is used to match the user log storage position by means of the compressed target file.
Specifically, preset characters that occur frequently in the target file are replaced globally, which reduces the size of the bitMap. For example, the character string "the weather today is suitable for outdoor activities" is replaced by the characters "a1". The replacement mapping is stored in the metadata of the segment so that each segment remains self-describing. Preset characters that occur frequently are those whose frequency of occurrence is higher than a preset frequency threshold.
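Global mapping replacement is a form of dictionary encoding. The sketch below assumes the mapping has already been chosen (the frequency analysis is omitted) and that the short codes never occur in the raw text; the phrase, the code "a1", and the sample log line are hypothetical.

```python
def compress(text, mapping):
    """Replace every occurrence of each frequent phrase with its short code."""
    for phrase, code in mapping.items():
        text = text.replace(phrase, code)
    return text


def decompress(text, mapping):
    """Reverse the replacement using the same mapping (stored in segment metadata)."""
    for phrase, code in mapping.items():
        text = text.replace(code, phrase)
    return text


# Hypothetical mapping kept in the segment metadata.
# Assumption: the codes do not appear anywhere in the uncompressed text.
mapping = {"add_to_cart": "a1"}

raw_log = "userA add_to_cart sku-7; userA add_to_cart sku-9"
compressed_log = compress(raw_log, mapping)
```

Keeping the mapping alongside the segment is what allows any reader of that segment to decode it without consulting external state.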
In one embodiment of the invention, the storage capacity of a data block is fixed. When the amount of data stored in a data block exceeds the data block storage threshold, splitting of the data block is triggered.
Referring to fig. 6, fig. 6 is a schematic flow chart of data block splitting according to an embodiment of the present invention. The method specifically comprises the following steps:
S601, if the amount of data stored in a data block is larger than the data block storage threshold, dividing the data block into a plurality of data blocks according to the data block storage threshold.
That is, when the amount of data written to one data block exceeds the data block storage threshold, the data block splits itself, and the splitting mode is equal division. Equal division keeps the sizes of the data blocks, and of their corresponding rough set indexes, balanced.
S602, updating the rough set index of the divided data block.
The divided data block is a new data block, and then the rough set index of the divided data block needs to be updated.
As one example, if the data block storage threshold is 10 rows of data and a data block contains 19 rows of data, the data is split into a front part and a back part at row 11. A new rough set index is built for each of the two resulting data blocks, the version of the segment is updated so that the data points to the two new data blocks, and finally the original data block is deleted.
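The splitting steps S601 and S602 can be sketched as follows, reproducing the 19-row example. The dictionary layout, the min/max-only index, and the function names are assumptions for illustration; splitting here cuts the rows into chunks of at most the threshold size, which yields the 10-row and 9-row blocks of the example.

```python
def split_rows(rows, threshold):
    """Cut rows into consecutive chunks of at most `threshold` rows."""
    return [rows[i:i + threshold] for i in range(0, len(rows), threshold)]


def split_and_reindex(segment, block_no, threshold):
    """Split an oversized block, rebuild the per-block index, and bump the segment version."""
    old_block = segment["blocks"][block_no]
    if len(old_block["rows"]) <= threshold:
        return segment  # nothing to do
    new_blocks = [
        # Simplified rough index: only min and max of the primary keys in the block.
        {"rows": part, "index": {"min": min(part), "max": max(part)}}
        for part in split_rows(old_block["rows"], threshold)
    ]
    # Point the segment at the new blocks; the original block is dropped.
    segment["blocks"][block_no:block_no + 1] = new_blocks
    segment["version"] += 1
    return segment


segment = {
    "version": 1,
    "blocks": [{"rows": list(range(1, 20)), "index": {"min": 1, "max": 19}}],
}
split_and_reindex(segment, 0, threshold=10)
```

After the call, the segment holds one block with rows 1 to 10 and one with rows 11 to 19, each with its own index, and the segment version has been incremented.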
In the embodiment of FIG. 6, a data block can be divided into a plurality of data blocks, which increases the amount of data that can be stored.
S103, screening out target users of the user screening request by combining the log behaviors of the target files with preset screening rules.
The target file stores the log behavior of the user. As one example, the log behavior includes real-time shopping data, real-time traffic data, IP addresses, user hardware model numbers, and historical data. The screening rule is a preset rule for screening users. As one example, the screening rule includes: women over 60 years old.
The user's characteristics can be derived from the log behavior in the target file and then checked against the preset screening rule to screen out the target user of the user screening request.
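Step S103 can be illustrated with a small rule evaluator. The feature names (`gender`, `age`), the representation of a rule as a list of predicates, and the sample users are hypothetical; the rule itself mirrors the "women over 60 years old" example.

```python
def is_target_user(features, rules):
    """A user is a target only if every predicate of the screening rule holds."""
    return all(rule(features) for rule in rules)


def screen(users, rules):
    """Return the identifiers of users whose features satisfy the screening rule."""
    return [uid for uid, features in users.items() if is_target_user(features, rules)]


# Screening rule from the example: women over 60 years old.
rules = [
    lambda f: f.get("gender") == "female",
    lambda f: f.get("age", 0) > 60,
]

# Hypothetical user characteristics derived from log behavior in the target files.
users = {
    "u1": {"gender": "female", "age": 64},
    "u2": {"gender": "female", "age": 35},
    "u3": {"gender": "male", "age": 70},
}
targets = screen(users, rules)
```

Because the rule is evaluated at request time against features read from the target file, a new advertisement strategy only needs a new list of predicates, not a recomputation of user groups.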
In the above embodiment, the stream processing engine determines, based on the user identifier in the user screening request, a data slice in the log distributed storage, so as to locate, among the data blocks of the data slice, a search data block corresponding to the screening target value; according to the search data block, in a rough set index of a memory, matching a user log storage position by an operation in the user screening request to acquire a target file according to the user log storage position; and screening out the target user of the user screening request by combining the log behavior of the target file with a preset screening rule. Because the target file is acquired in the data block by utilizing the rough set index without traversing the database, the speed of screening the target user can be increased, and the response speed is further increased.
Referring to FIG. 7, FIG. 7 is a schematic diagram of a usage scenario for screening users according to an embodiment of the present invention. In FIG. 7, the user's data in the advertising media is returned, that is, the user click log is returned. The user click log comprises three parts: the user's behavior data within the advertiser's site, the click data structure construction layer, and the in-site user information data. The user click log is stored, through the stream processing layer (that is, Flink), into a data block subordinate to a data slice.
As one example, the behavior data within the advertiser's site includes shopping data within that site. The click data structure construction layer includes the IP address and hardware information of the user. The in-site user information data includes purchase records and browse records.
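The three parts of the user click log could be modelled as nested records. The sketch below is only illustrative: the class and field names are hypothetical, and the source does not specify a concrete schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SiteBehavior:
    """First part of the click log: in-site behavior, e.g. shopping data."""
    shopping_events: List[str] = field(default_factory=list)


@dataclass
class ClickStructure:
    """Second part: click data structure layer (IP address, hardware information)."""
    ip_address: str = ""
    hardware_info: str = ""


@dataclass
class UserInfo:
    """Third part: user information data (purchase and browse records)."""
    purchase_records: List[str] = field(default_factory=list)
    browse_records: List[str] = field(default_factory=list)


@dataclass
class UserClickLog:
    """One returned click log for a user, combining the three parts."""
    user_id: str
    site_behavior: SiteBehavior = field(default_factory=SiteBehavior)
    click: ClickStructure = field(default_factory=ClickStructure)
    user_info: UserInfo = field(default_factory=UserInfo)
```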
The advertising media screens users through the RTA. Specifically, based on the user screening request, the technical solution of the embodiment of the present invention obtains the log behavior from the data block and screens out the target user by applying the preset screening rule.
Referring to fig. 8, fig. 8 is a schematic diagram of an application flow for screening users according to an embodiment of the present invention. The method specifically comprises the following steps:
S801, the advertising media sends a user screening request to an advertiser.
To help the advertiser select users, the advertising media sends a user screening request to the advertiser. The user screening request includes a user identifier.
S802, positioning the search data block through a stream processing engine.
Once the advertiser has the user identifier, the user must be screened based on the user's log behavior. To that end, the stream processing engine locates the search data block in which the user's target file is stored.
S803, acquiring the target file in the search data block.
The user's target file is stored in the search data block and can be acquired through the rough set index of the search data block.
S804, acquiring log behaviors.
The log behavior is obtained from the target file.
S805, screening out target users.
The log behavior is matched against the screening rules set by the advertisement delivery strategy to determine whether it satisfies them. As one example, the screening rules include: the user has added a commodity to the shopping cart and is under 30 years old.
Through the embodiment of FIG. 8, advertisers may filter out target users that meet the filter criteria.
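The flow of steps S801 to S805 might be sketched as follows. This is a simplified in-memory stand-in, not the patented implementation: the nested dictionary replaces the log distributed storage, the CRC-based `slice_for` mapping, the `NUM_SLICES` value, the request keys, and the record fields `event` and `age` are all assumptions made for illustration.

```python
import zlib
from typing import Callable, Dict, List, Optional

NUM_SLICES = 4  # assumed number of data slices

# slice number -> screening target value (block key) -> {user_id: log behaviors}
Storage = Dict[int, Dict[str, Dict[str, List[dict]]]]


def slice_for(user_id: str) -> int:
    """Assumed mapping from user identifier to data slice (stable hash)."""
    return zlib.crc32(user_id.encode("utf-8")) % NUM_SLICES


def screen(request: dict, storage: Storage,
           rule: Callable[[List[dict]], bool]) -> Optional[str]:
    user_id = request["user_id"]                     # S801: the request carries the user identifier
    blocks = storage.get(slice_for(user_id), {})     # S802: locate the data slice ...
    index = blocks.get(request["target_value"], {})  # ... and the search data block within it
    behaviors = index.get(user_id)                   # S803/S804: target file and its log behaviors
    if behaviors is None:
        return None
    return user_id if rule(behaviors) else None      # S805: apply the screening rule


def cart_and_under_30(behaviors: List[dict]) -> bool:
    """Example rule from the text: add-to-cart behavior and age under 30."""
    added = any(b.get("event") == "add_to_cart" for b in behaviors)
    young = any(isinstance(b.get("age"), int) and b["age"] < 30 for b in behaviors)
    return added and young
```

A request such as `{"user_id": "u42", "target_value": "2022-06-13"}` returns the user identifier when the rule is satisfied and `None` otherwise; the lookup touches only one slice and one block rather than the whole store.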
Referring to FIG. 9, FIG. 9 is a schematic diagram of the main structure of a device for screening users according to an embodiment of the present invention. The device can implement the method for screening users and, as shown in FIG. 9, specifically includes:
the positioning module 901 is configured to determine, by using the stream processing engine, a data slice in the log distributed storage based on a user identifier in the user screening request, so as to position, among data blocks of the data slice, a search data block corresponding to the screening target value;
the matching module 902 is configured to match, according to the search data block, a user log storage location in a rough set index of a memory by an operation in the user screening request, so as to obtain a target file according to the user log storage location;
and the screening module 903 is configured to screen out the target user of the user screening request by combining the log behavior of the target file with a preset screening rule.
In one embodiment of the present invention, the positioning module 901 is specifically configured to determine, by the stream processing engine and based on the user identifier in the user screening request, the data slice corresponding to the user identifier in the log distributed storage;
and to locate the search data block among the plurality of data blocks of the data slice according to the screening target value set by the user screening request.
In one embodiment of the present invention, the matching module 902 is specifically configured to load the rough set index of the search data block into the memory in response to the user screening request;
according to the parameters determined by the operation, matching the user log storage position in the rough set index of the memory;
and acquiring a target file according to the user log storage position.
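The three actions above (load the index, match a storage position, fetch the target file) can be illustrated with a toy data block. The sketch assumes that a user log storage position is an `(offset, length)` pair inside the block and that the index is serialized as JSON; both are assumptions, and the class and function names are hypothetical.

```python
import json
from typing import Dict, Optional, Tuple

Position = Tuple[int, int]  # assumed (offset, length) of a target file within the block


class SearchBlock:
    """Hypothetical data block: raw bytes plus a serialized index."""

    def __init__(self) -> None:
        self.data = bytearray()
        self._positions: Dict[str, Position] = {}

    def append(self, key: str, payload: bytes) -> None:
        """Store a payload and record where it lives in the block."""
        self._positions[key] = (len(self.data), len(payload))
        self.data.extend(payload)

    def serialized_index(self) -> str:
        """The index as it would be persisted alongside the block."""
        return json.dumps(self._positions)


def load_index(block: SearchBlock) -> Dict[str, Position]:
    """Load the block's index into memory as key -> (offset, length)."""
    raw = json.loads(block.serialized_index())
    return {key: (pos[0], pos[1]) for key, pos in raw.items()}


def fetch_target_file(block: SearchBlock, index: Dict[str, Position],
                      key: str) -> Optional[bytes]:
    """Match the key in the in-memory index, then read only that byte range."""
    position = index.get(key)
    if position is None:
        return None
    offset, length = position
    return bytes(block.data[offset:offset + length])
```

Because the lookup goes through the in-memory index, only the matched byte range of the block is read.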
In one embodiment of the present invention, the matching module 902 is specifically configured to match, in the rough set index of the memory, the user log storage location with an inverted index according to a parameter determined by the operation, where the parameter determined by the operation includes a compressed target file.
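An inverted index maps each term to the set of places where it occurs, so a lookup by parameter becomes a set intersection. The sketch below assumes that storage positions are `(offset, length)` pairs and that the parameters are plain strings; neither detail is specified in the source.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Position = Tuple[int, int]  # assumed (offset, length) within a data block


def build_inverted_index(
        entries: Iterable[Tuple[Position, List[str]]]) -> Dict[str, Set[Position]]:
    """Map every term to the set of storage positions whose logs contain it."""
    inverted: Dict[str, Set[Position]] = defaultdict(set)
    for position, terms in entries:
        for term in terms:
            inverted[term].add(position)
    return dict(inverted)


def match_positions(inverted: Dict[str, Set[Position]],
                    terms: List[str]) -> Set[Position]:
    """Return the positions whose logs contain every requested term."""
    if not terms:
        return set()
    result = set(inverted.get(terms[0], set()))
    for term in terms[1:]:
        result &= inverted.get(term, set())
    return result
```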
In one embodiment of the present invention, the compressed target file is a file obtained by replacing preset data in the target file with a global map.
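One plausible reading of this replacement is dictionary encoding: frequently repeated values are swapped for short codes held in a shared map. The sketch below follows that reading as an assumption; the `GlobalMap` class, the integer codes, and the field names are hypothetical.

```python
from typing import Dict, List


class GlobalMap:
    """Assumed global mapping from repeated values to small integer codes."""

    def __init__(self) -> None:
        self._code_of: Dict[str, int] = {}
        self._value_of: List[str] = []

    def encode(self, value: str) -> int:
        """Return the code for a value, assigning the next free code if it is new."""
        code = self._code_of.get(value)
        if code is None:
            code = len(self._value_of)
            self._code_of[value] = code
            self._value_of.append(value)
        return code

    def decode(self, code: int) -> str:
        return self._value_of[code]


def compress(record: Dict[str, str], fields: List[str],
             gmap: GlobalMap) -> Dict[str, object]:
    """Replace the preset fields of a record with codes from the global map."""
    return {k: (gmap.encode(v) if k in fields else v) for k, v in record.items()}


def decompress(record: Dict[str, object], fields: List[str],
               gmap: GlobalMap) -> Dict[str, object]:
    """Restore the original values of the preset fields."""
    return {k: (gmap.decode(v) if k in fields else v) for k, v in record.items()}
```

Since the map is shared across files, a value such as a hardware model string is stored once and referenced by code everywhere else.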
In one embodiment of the present invention, the positioning module 901 is further configured to establish a rough set index of each data block subordinate to a data slice, where the data slice is set by a user identifier, and the data block is set by a screening target value.
In one embodiment of the present invention, the positioning module 901 is further configured to, if the amount of data stored in a data block is greater than a data block storage threshold, divide the data block into a plurality of data blocks according to the data block storage threshold;
and to update the rough set index of the divided data blocks.
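A threshold-based split followed by an index rebuild could look like the sketch below. It assumes the stored amount is measured in payload bytes and that the index maps a key to a `(block number, position in block)` pair; both choices and all names are hypothetical.

```python
from typing import Dict, List, Tuple

Record = Tuple[str, bytes]  # assumed (key, payload)


def split_block(records: List[Record], threshold: int) -> List[List[Record]]:
    """Split records into blocks whose total payload size stays within threshold.

    A single record larger than the threshold still gets a block of its own.
    """
    blocks: List[List[Record]] = []
    current: List[Record] = []
    size = 0
    for key, payload in records:
        if current and size + len(payload) > threshold:
            blocks.append(current)
            current, size = [], 0
        current.append((key, payload))
        size += len(payload)
    if current:
        blocks.append(current)
    return blocks


def rebuild_index(blocks: List[List[Record]]) -> Dict[str, Tuple[int, int]]:
    """Map each key to (block number, position within block) after the split."""
    return {key: (b, i)
            for b, block in enumerate(blocks)
            for i, (key, _) in enumerate(block)}
```

Rebuilding the index right after the split keeps every stored position valid for later lookups.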
Fig. 10 illustrates an exemplary system architecture 1000 to which a method of screening a user or a device of screening a user of an embodiment of the present invention may be applied.
As shown in FIG. 10, a system architecture 1000 may include terminal devices 1001, 1002, 1003, a network 1004, and a server 1005. The network 1004 serves as a medium for providing a communication link between the terminal devices 1001, 1002, 1003 and the server 1005. The network 1004 may include various connection types, such as wired or wireless communication links, or fiber optic cables.
A user can interact with the server 1005 via the network 1004 using the terminal devices 1001, 1002, 1003 to receive or transmit messages and the like. Various communication client applications, such as shopping applications, web browser applications, search applications, instant messaging tools, mailbox clients, and social platform software (by way of example only), may be installed on the terminal devices 1001, 1002, 1003.
The terminal devices 1001, 1002, 1003 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smartphones, tablets, laptop and desktop computers, and the like.
The server 1005 may be a server providing various services, such as a background management server (merely an example) that supports shopping websites browsed by the user on the terminal devices 1001, 1002, 1003. The background management server may analyze and process received data such as a product information query request, and feed the processing result (e.g., target push information or product information, merely an example) back to the terminal device.
It should be noted that, the method for screening users provided in the embodiment of the present invention is generally executed by the server 1005, and accordingly, the device for screening users is generally disposed in the server 1005.
It should be understood that the number of terminal devices, networks and servers in fig. 10 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
Referring now to FIG. 11, there is illustrated a schematic diagram of a computer system 1100 suitable for use in implementing the terminal device of an embodiment of the present invention. The terminal device shown in fig. 11 is only an example, and should not impose any limitation on the functions and the scope of use of the embodiment of the present invention.
As shown in fig. 11, the computer system 1100 includes a Central Processing Unit (CPU) 1101, which can execute various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 1102 or a program loaded from a storage section 1108 into a Random Access Memory (RAM) 1103. In the RAM 1103, various programs and data required for the operation of the system 1100 are also stored. The CPU 1101, ROM 1102, and RAM 1103 are connected to each other by a bus 1104. An input/output (I/O) interface 1105 is also connected to bus 1104.
The following components are connected to the I/O interface 1105: an input section 1106 including a keyboard, a mouse, and the like; an output portion 1107 including a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), and the like, a speaker, and the like; a storage section 1108 including a hard disk or the like; and a communication section 1109 including a network interface card such as a LAN card, a modem, and the like. The communication section 1109 performs communication processing via a network such as the internet. The drive 1110 is also connected to the I/O interface 1105 as needed. Removable media 1111, such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like, is installed as needed in drive 1110, so that a computer program read therefrom is installed as needed in storage section 1108.
In particular, according to embodiments of the present disclosure, the processes described above with reference to flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method shown in the flow chart. In such an embodiment, the computer program can be downloaded and installed from a network via the communication portion 1109, and/or installed from the removable media 1111. The above-described functions defined in the system of the present invention are performed when the computer program is executed by a Central Processing Unit (CPU) 1101.
The computer readable medium shown in the present invention may be a computer readable signal medium or a computer readable storage medium, or any combination of the two. The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present invention, however, the computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, with the computer-readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The modules involved in the embodiments of the present invention may be implemented in software or in hardware. The described modules may also be provided in a processor, for example, described as: a processor includes a positioning module, a matching module, and a screening module. In some cases the names of these modules do not limit the modules themselves; for example, the positioning module may also be described as "a module for determining, by the stream processing engine, a data slice in the log distributed storage based on the user identifier in the user screening request, so as to locate, among the data blocks of the data slice, a search data block corresponding to the screening target value".
As another aspect, the present invention also provides a computer-readable medium, which may be contained in the device described in the above embodiments, or may exist alone without being assembled into the device. The computer-readable medium carries one or more programs which, when executed by a device, cause the device to perform:
determining, by a stream processing engine, a data slice in the log distributed storage based on a user identifier in a user screening request, so as to locate, among the data blocks of the data slice, a search data block corresponding to a screening target value;
according to the search data block, in a rough set index of a memory, matching a user log storage position by an operation in the user screening request to acquire a target file according to the user log storage position;
and screening out the target user of the user screening request by combining the log behavior of the target file with a preset screening rule.
According to the technical solution of the embodiment of the present invention, the stream processing engine determines a data slice in the log distributed storage based on the user identifier in the user screening request, so as to locate, among the data blocks of the data slice, the search data block corresponding to the screening target value. According to the search data block, the user log storage location is matched in the rough set index held in memory by means of the operation in the user screening request, and the target file is acquired according to that location. The target user of the user screening request is then screened out by combining the log behavior of the target file with a preset screening rule. Because the target file is obtained from the data block through the rough set index, without traversing the database, target users can be screened faster and the response time is shortened.
The above embodiments do not limit the scope of the present invention. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives can occur depending upon design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present invention should be included in the scope of the present invention.

Claims (10)

1. A method of screening a user, comprising:
determining, by a stream processing engine, a data slice in the log distributed storage based on a user identifier in a user screening request, so as to locate, among the data blocks of the data slice, a search data block corresponding to a screening target value;
according to the search data block, in a rough set index of a memory, matching a user log storage position by an operation in the user screening request to acquire a target file according to the user log storage position;
and screening out the target user of the user screening request by combining the log behavior of the target file with a preset screening rule.
2. The method according to claim 1, wherein the determining, by the stream processing engine, the data slice in the log distributed storage based on the user identifier in the user screening request to locate the search data block corresponding to the screening target value among the data blocks of the data slice includes:
determining, by the stream processing engine, the data slice corresponding to the user identifier in the log distributed storage based on the user identifier in the user screening request;
and locating the search data block among the plurality of data blocks of the data slice according to the screening target value set by the user screening request.
3. The method according to claim 1, wherein the step of matching the user log storage location by the operation in the user screening request in the rough set index of the memory according to the search data block to obtain the target file according to the user log storage location comprises:
responding to the user screening request, and loading the rough set index of the search data block into a memory;
according to the parameters determined by the operation, matching the user log storage position in the rough set index of the memory;
and acquiring a target file according to the user log storage position.
4. A method of screening users according to claim 3, wherein said matching said user log storage locations in said rough set index of memory according to said parameters determined by said operation comprises:
and matching the user log storage position with an inverted index in the rough set index of the memory according to the parameters determined by the operation, wherein the parameters determined by the operation comprise the compressed target file.
5. The method of claim 4, wherein the compressed target file is a file obtained by replacing preset data in the target file with a global map.
6. The method of screening a user of claim 1, further comprising:
establishing a rough set index of each data block subordinate to the data slice, wherein the data slice is set by a user identifier, and the data block is set by a screening target value.
7. The method of screening a user of claim 6, further comprising:
if the amount of data stored in a data block is greater than a data block storage threshold, dividing the data block into a plurality of data blocks according to the data block storage threshold;
and updating the rough set index of the divided data block.
8. An apparatus for screening a user, comprising:
the positioning module is used for determining, through the stream processing engine, a data slice in the log distributed storage based on the user identifier in the user screening request, so as to locate, among the data blocks of the data slice, the search data block corresponding to the screening target value;
the matching module is used for matching the user log storage position by the operation in the user screening request in the rough set index of the memory according to the search data block so as to acquire a target file according to the user log storage position;
and the screening module is used for screening out the target user of the user screening request by combining the log behaviors of the target file with preset screening rules.
9. An electronic device for screening a user, comprising:
one or more processors;
storage means for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any of claims 1-7.
10. A computer readable medium, on which a computer program is stored, characterized in that the program, when being executed by a processor, implements the method according to any of claims 1-7.
CN202210663959.8A 2022-06-13 2022-06-13 Method, apparatus, device and computer readable medium for screening users Pending CN117271463A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210663959.8A CN117271463A (en) 2022-06-13 2022-06-13 Method, apparatus, device and computer readable medium for screening users

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210663959.8A CN117271463A (en) 2022-06-13 2022-06-13 Method, apparatus, device and computer readable medium for screening users

Publications (1)

Publication Number Publication Date
CN117271463A true CN117271463A (en) 2023-12-22

Family

ID=89206842

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210663959.8A Pending CN117271463A (en) 2022-06-13 2022-06-13 Method, apparatus, device and computer readable medium for screening users

Country Status (1)

Country Link
CN (1) CN117271463A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination