CN110929207B - Data processing method, device and computer readable storage medium - Google Patents

Publication number
CN110929207B
Authority
CN
China
Prior art keywords
identifier
operation object
entry
entry identifier
target event
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911153341.1A
Other languages
Chinese (zh)
Other versions
CN110929207A (en
Inventor
魏铮铮
崔波
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Wodong Tianjun Information Technology Co Ltd
Original Assignee
Beijing Wodong Tianjun Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Wodong Tianjun Information Technology Co Ltd
Priority to CN201911153341.1A
Publication of CN110929207A
Application granted
Publication of CN110929207B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/958 Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking

Abstract

The disclosure relates to a data processing method, a data processing device and a computer readable storage medium, and relates to the technical field of computers. The method of the present disclosure comprises: acquiring a click log and target event information; determining each entry identifier and each operation object corresponding to the same user according to the click log and the target event information; determining the relevance of each entry identifier to each operation object according to the attribute information corresponding to each entry identifier, where the attribute information corresponding to each entry identifier includes at least one of: the matching relationship between the index object corresponding to each entry identifier and each operation object, the source address of the click operation corresponding to each entry identifier, the priority of each entry, and the time sequence of the click operation corresponding to each entry identifier; and determining the number of target events introduced by each entry according to the relevance of each entry identifier to each operation object.

Description

Data processing method, device and computer readable storage medium
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to a data processing method and apparatus, and a computer-readable storage medium.
Background
On an Internet platform, pages and entries are associated with one another: starting from an entry, a user can reach the page of a specific object through successive clicks and similar operations, and then execute a target event that operates on that object (for example, purchasing a commodity or playing a video).
How an entry is set up affects how often users access and operate on the corresponding object; for example, the more prominent the entry, the higher the access frequency and operation frequency of the corresponding object. Determining which entries a user passed through before operating on an object and completing a target event, and determining the number of target events introduced by each entry, is therefore important for an Internet platform that wants to adjust its entry placement strategy and raise the access and operation frequency of objects.
Disclosure of Invention
One technical problem to be solved by the present disclosure is: how to determine the number of target events introduced by each portal.
According to some embodiments of the present disclosure, there is provided a data processing method including: acquiring a click log and target event information; the click log includes: the entry identifier corresponding to each click operation and the corresponding user information, and the target event information comprises: the operation object of the target event and the user information corresponding to the target event; determining each entrance identifier and each operation object corresponding to the same user according to the click log and the target event information; determining the relevance of each entrance identifier and each operation object according to the attribute information corresponding to each entrance identifier; the attribute information corresponding to each entry identifier includes: matching relations between the index objects corresponding to the entry identifiers and the operation objects, source addresses of clicking operations corresponding to the entry identifiers, priorities of the entries and time sequences of the clicking operations corresponding to the entry identifiers; and determining the number of target events introduced by each entrance according to the relevance of each entrance identifier and each operation object.
In some embodiments, obtaining the click log and the target event information comprises: determining the batches to be read of the click logs and the target event information according to the offset value of the queue; the stored click logs and the target event information are divided into a plurality of batches according to the time sequence; each batch comprises a click log in first preset time and target event information in second preset time; and reading the click log and the target event information of the batch to be read from the queue.
In some embodiments, determining the number of target events introduced by each entry further comprises: the offset value of the queue is updated.
In some embodiments, according to the click log and the target event information, determining each entry identifier and each operation object corresponding to the same user; determining the relevance of each entrance identifier and each operation object according to the attribute information corresponding to each entrance identifier; determining the entry identifier of the trigger source of each operation object according to the relevance between each entry identifier and each operation object comprises the following steps: dividing a plurality of tasks and distributing the plurality of tasks to a plurality of nodes, each task comprising: the identification of the click log corresponding to the task and the identification of the target event information; each node searches the corresponding click log and the corresponding target event information according to the identification of the click log corresponding to the distributed task and the identification of the target event information; each node determines each entry identifier and each operation object corresponding to the same user according to the acquired corresponding click log and the corresponding target event information; determining the relevance of each entrance identifier and each operation object according to the attribute information corresponding to each entrance identifier; and determining the number of target events introduced by each entrance according to the relevance of each entrance identifier and each operation object.
In some embodiments, the target event information is broadcast to a plurality of nodes; and each node searches the target event information corresponding to the distributed tasks from the received target event information.
In some embodiments, determining the relevance of each entry identifier to each operation object according to the attribute information corresponding to each entry identifier includes: for one operation object and a plurality of entry identifiers, where each entry identifier corresponds to a plurality of items of attribute information, comparing the same item of attribute information across the entry identifiers in descending order of the priority of the items of attribute information; and if the entry identifiers have the same value for that item, comparing the item of attribute information one priority level lower, thereby determining the relevance of the plurality of entry identifiers to the operation object.
In some embodiments, for each entry identifier and each operation object, in the case that the index object corresponding to the entry identifier is the same as the operation object, determining that the entry identifier is directly associated with the operation object, and in the case that the index object corresponding to the entry identifier and the operation object belong to the same category, determining that the entry identifier is indirectly associated with the operation object; the relevance corresponding to direct correlation is higher than that corresponding to indirect correlation; or, for each entry identifier and each operation object, if the source platform corresponding to the entry identifier is the same as the source platform corresponding to the operation object, the entry identifier is used as an in-station entry identifier, if the source platform corresponding to the entry identifier is not the same as the source platform corresponding to the operation object, the entry identifier is used as an out-station entry identifier, and the relevance between the in-station entry identifier and the operation object is higher than the relevance between the out-station entry identifier and the operation object; or, the higher the priority of the corresponding entry is, the higher the relevance between the entry identifier and the operation object is; or the relevance between the entry identifier and the operation object is higher as the time of the corresponding click operation is closer to the current time.
In some embodiments, determining the number of target events introduced by each entry according to the relevance of each entry identifier to each operation object includes: for each operation object, taking the entry identifier with the highest relevance to the operation object as the entry identifier that introduced the target event corresponding to that operation object; and determining the number of target events introduced by each entry.
According to further embodiments of the present disclosure, there is provided a data processing apparatus including: the information acquisition module is used for acquiring the click log and the target event information; the click log includes: the entry identifier corresponding to each click operation and the corresponding user information, and the target event information includes: the operation object of the target event and the user information corresponding to the target event; the data analysis module is used for determining each entrance identifier and each operation object corresponding to the same user according to the click log and the target event information; the relevance determining module is used for determining the relevance of each entrance identifier and each operation object according to the attribute information corresponding to each entrance identifier; the attribute information corresponding to each entry identifier includes: at least one item of information of the matching relationship between the index object corresponding to each entry identifier and each operation object, the source address of the click operation corresponding to each entry identifier, the priority of each entry, and the time sequence of the click operation corresponding to each entry identifier; and the output module is used for determining the number of target events introduced by each entrance according to the relevance between each entrance identifier and each operation object.
In some embodiments, the information obtaining module is configured to determine, according to the offset value of the queue, a batch to be read of the click log and the target event information; the stored click logs and the target event information are divided into a plurality of batches according to the time sequence; each batch comprises a click log in first preset time and target event information in second preset time; and reading the click log and the target event information of the batch to be read from the queue.
In some embodiments, further comprising: and the offset value setting module is used for updating the offset value of the queue.
In some embodiments, the information obtaining module is further configured to divide a plurality of tasks and distribute the plurality of tasks to the plurality of nodes, each task including: the identification of the click log corresponding to the task and the identification of the target event information; the device also includes: the information acquisition sub-modules are respectively arranged in each node and used for searching corresponding click logs and corresponding target event information according to the marks of the click logs and the marks of the target event information corresponding to the distributed tasks; the data analysis module comprises a plurality of data analysis sub-modules which are respectively arranged in each node and used for determining each entrance identifier and each operation object corresponding to the same user according to the obtained corresponding click log and the corresponding target event information; the relevance determining module comprises a plurality of relevance determining submodules which are respectively arranged in each node and used for determining the relevance of each entry identifier and each operation object according to the attribute information corresponding to each entry identifier; the output module comprises a plurality of output sub-modules which are respectively arranged in each node and used for determining the number of target events introduced into each entrance according to the relevance between each entrance identifier and each operation object.
In some embodiments, the target event information is broadcast to a plurality of nodes; and the information acquisition submodule of each node is used for searching target event information corresponding to the distributed tasks from the received target event information.
In some embodiments, the association determining module is configured to, for one operation object and a plurality of entry identifiers, compare the same item of attribute information across the entry identifiers in descending order of the priority of the items of attribute information, and, if the entry identifiers have the same value for that item, compare the item of attribute information one priority level lower, thereby determining the relevance of the plurality of entry identifiers to the operation object.
In some embodiments, the association determining module is configured to determine, for each entry identifier and each operation object, that the entry identifier is directly associated with the operation object if the index object corresponding to the entry identifier is the same as the operation object, and that the entry identifier is indirectly associated with the operation object if the index object corresponding to the entry identifier and the operation object belong to the same category; the relevance corresponding to direct correlation is higher than that corresponding to indirect correlation; or, for each entry identifier and each operation object, if the source platform corresponding to the entry identifier is the same as the source platform corresponding to the operation object, the entry identifier is used as an in-station entry identifier, and if the source platform corresponding to the entry identifier is different from the source platform corresponding to the operation object, the entry identifier is used as an out-station entry identifier, and the relevance between the in-station entry identifier and the operation object is higher than the relevance between the out-station entry identifier and the operation object; or, the higher the priority of the corresponding entry is, the higher the relevance of the entry identifier with the operation object is; or the relevance of the entrance identifier and the operation object is higher when the time of the corresponding click operation is closer to the current time.
In some embodiments, the output module is configured to, for each operation object, take the entry identifier with the highest relevance to the operation object as the entry identifier that introduced the target event corresponding to that operation object, and to determine the number of target events introduced by each entry.
According to still other embodiments of the present disclosure, there is provided a data processing apparatus including: a processor; and a memory coupled to the processor for storing instructions that, when executed by the processor, cause the processor to perform the data processing method of any of the preceding embodiments.
According to still further embodiments of the present disclosure, there is provided a non-transitory computer readable storage medium having stored thereon a computer program, wherein the program, when executed by a processor, implements the data processing method of any of the foregoing embodiments.
In the present disclosure, the click logs and the target event information of users are obtained, the entry identifiers and operation objects corresponding to the same user are determined, and the number of target events introduced by each entry is then determined according to at least one of: the matching relationship between the index object corresponding to each entry identifier and each operation object, the source address of the click operation corresponding to each entry identifier, the priority of each entry, and the time sequence of the click operation corresponding to each entry identifier. Because the number of target events introduced by each entry can be determined by combining multiple items of attribute information corresponding to the entry identifier, accuracy is improved.
Other features of the present disclosure and advantages thereof will become apparent from the following detailed description of exemplary embodiments thereof, which proceeds with reference to the accompanying drawings.
Drawings
In order to more clearly illustrate the embodiments of the present disclosure or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present disclosure, and other drawings can be obtained by those skilled in the art without creative efforts.
Fig. 1 shows a flow diagram of a data processing method of some embodiments of the present disclosure.
Fig. 2 shows a schematic diagram of the overall architecture of some embodiments of the present disclosure.
Fig. 3 shows a flow diagram of a data processing method of further embodiments of the present disclosure.
Fig. 4 shows a schematic structural diagram of a data processing apparatus of some embodiments of the present disclosure.
Fig. 5 shows a schematic structural diagram of a data processing apparatus of some embodiments of the present disclosure.
Fig. 6 shows a schematic structural diagram of a data processing apparatus of some embodiments of the present disclosure.
Detailed Description
The technical solutions in the embodiments of the present disclosure will be clearly and completely described below with reference to the drawings in the embodiments of the present disclosure, and it is obvious that the described embodiments are only a part of the embodiments of the present disclosure, and not all of the embodiments. The following description of at least one exemplary embodiment is merely illustrative in nature and is in no way intended to limit the disclosure, its application, or uses. All other embodiments, which can be derived by a person skilled in the art from the embodiments disclosed herein without making any creative effort, shall fall within the protection scope of the present disclosure.
The present disclosure proposes a way to determine the entry that introduced each target event and the number of target events introduced by each entry, which is described below with reference to FIG. 1.
FIG. 1 is a flow chart of some embodiments of the disclosed data processing method. As shown in fig. 1, the method of this embodiment includes: steps S102 to S108.
In step S102, a click log and target event information are acquired.
The click log includes the entry identifier and the user information corresponding to each click operation. An entry may be a hyperlink in a web page that can be clicked to enter another page. The entry may take the form of text or a picture, for example an advertising picture of a certain item. The target event information includes the operation object of the target event and the user information corresponding to the target event. A target event is formed when a target operation is performed on one or more operation objects. For example, if the operation object is a video and the target operation is clicking a play button, the target event is playing the video. As another example, if the operation object is a product item (SKU) and the target operation is clicking a purchase button and paying, the target event is placing an order for the product.
The same entry on different devices corresponds to different entry identifiers: according to the device identifier in the click log, the same entry on different devices is assigned different entry identifiers.
In some embodiments, since the data volume of the click log and the target event information is very large, the stored click log and the target event information may be divided into a plurality of batches according to the time sequence, and the data of each batch may be periodically read and then processed. For example, the click log and the target event information are put into a queue, the queue sets an Offset value (Offset), and before reading, batches to be read of the click log and the target event information are determined according to the Offset value of the queue, wherein each batch comprises the click log in a first preset time and the target event information in a second preset time. For example, each batch includes a click log within 1 day and target event information for 10 minutes. After determining the number of target events introduced by each entry, updating the offset value of the queue. In the next cycle, the click log and the target event information of the next batch are read. The offset value is set to avoid repeated reading and processing of data.
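The read, process, then update-offset cycle described above can be sketched as follows. This is a minimal in-memory illustration, not the disclosed implementation: `BatchQueue`, `run_cycle` and the `process` callback are hypothetical names, and a real deployment would rely on the offset mechanism of the persistent queue itself.

```python
from dataclasses import dataclass, field


@dataclass
class BatchQueue:
    """Hypothetical in-memory stand-in for a persistent queue with an offset."""
    batches: list = field(default_factory=list)  # each item: (click_logs, target_events)
    offset: int = 0  # index of the next batch to read

    def read_next(self):
        """Return the batch at the current offset, or None if nothing is pending."""
        if self.offset >= len(self.batches):
            return None
        return self.batches[self.offset]

    def commit(self):
        """Advance the offset once the current batch has been processed."""
        self.offset += 1


def run_cycle(queue, process):
    """Read one batch, process it, then update the offset."""
    batch = queue.read_next()
    if batch is None:
        return None
    click_logs, target_events = batch
    result = process(click_logs, target_events)
    # The offset is updated only after processing, so the same batch
    # is not read and processed again in the next cycle.
    queue.commit()
    return result
```

Updating the offset after the counts are determined, rather than at read time, mirrors the order given in the text.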
In step S104, each entry identifier and each operation object corresponding to the same user are determined according to the click log and the target event information.
The click log and the target event information of the same user can be corresponded based on the user information, so that each entry identifier and each operation object corresponding to the same user can be determined.
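As a rough sketch of this matching step, the records can be grouped by user information. The record layout (dictionaries with `user`, `entry_id` and `object` keys) and the function name `group_by_user` are assumptions made for illustration only.

```python
from collections import defaultdict


def group_by_user(click_logs, target_events):
    """Map each user to (entry identifiers clicked, operation objects acted on)."""
    entries = defaultdict(list)
    objects = defaultdict(list)
    for click in click_logs:
        entries[click["user"]].append(click["entry_id"])
    for event in target_events:
        objects[event["user"]].append(event["object"])
    # Keep only users that appear in both the click log and the target events.
    return {u: (entries[u], objects[u]) for u in entries if u in objects}
```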
In step S106, the association between each entry identifier and each operation object is determined according to the attribute information corresponding to each entry identifier.
The attribute information corresponding to each entry identifier includes, for example: the matching relation between the index object corresponding to each entry identifier and each operation object, the source address of the click operation corresponding to each entry identifier, the priority of each entry, and at least one item of information in the time sequence of the click operation corresponding to each entry identifier.
In some embodiments, for each entry identifier and each operation object, if the index object corresponding to the entry identifier is the same as the operation object, the entry identifier is determined to be directly associated with the operation object; if the index object corresponding to the entry identifier belongs to the same category as the operation object, the entry identifier is determined to be indirectly associated with the operation object. The relevance corresponding to direct association is higher than that corresponding to indirect association. A first association value may be set, with the first association value for direct association higher than the first association value for indirect association.
Each entry identifier corresponds to one or more index objects; for example, an item in the picture displayed by the entry is an index object. The correspondence between entry identifiers and index objects may be stored in advance as configuration information.
In some embodiments, for each entry identifier and each operation object, if the source platform corresponding to the entry identifier is the same as the source platform corresponding to the operation object, the entry identifier is treated as an in-station entry identifier; if the two source platforms differ, the entry identifier is treated as an out-station entry identifier. The relevance of an in-station entry identifier to the operation object is higher than that of an out-station entry identifier. A second association value may be set, with the second association value of an in-station entry identifier higher than that of an out-station entry identifier.
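One possible way to classify an entry as in-station or out-station is to compare the host of the click's source address with the platform host of the operation object. The text does not specify how the source platform is derived from the source address, so the host comparison below, and the name `is_in_station`, are assumptions.

```python
from urllib.parse import urlsplit


def is_in_station(click_source_url, object_platform_host):
    """Return True if the click's source address belongs to the object's platform.

    Assumption: a platform is identified by its host name, and subdomains of
    the platform host count as the same platform.
    """
    host = urlsplit(click_source_url).hostname or ""
    return host == object_platform_host or host.endswith("." + object_platform_host)
```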
In some embodiments, the higher the priority of the corresponding entry, the higher the relevance of the entry identifier to the operation object. The priority of an entry may be determined according to the priority of the user who placed the entry: the higher the priority of the placing user, the higher the priority of the entry. A third association value may be set; the higher the priority of the corresponding entry, the larger the third association value.
In some embodiments, the closer the time of the corresponding click operation is to the current time, the higher the relevance of the entry identifier to the operation object. A fourth association value may be set; the closer the time of the corresponding click operation is to the current time, the larger the fourth association value.
For each entry identifier and each operation object, the corresponding first, second, third, and fourth association values may be weighted and combined to obtain the association value between the entry identifier and the operation object.
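The weighting can be illustrated with a short sketch. The concrete weights, the per-attribute values (1.0, 0.5, 0.0), the priority scale and the one-day recency window are all hypothetical; the text only states that the four association values are weighted.

```python
from dataclasses import dataclass


@dataclass
class EntryClick:
    entry_id: str
    index_object: str    # object the entry points to
    index_category: str  # category of that object
    in_station: bool     # True if the click came from the same platform
    priority: int        # larger means higher entry priority
    click_time: float    # seconds since epoch


# Hypothetical weights for the first to fourth association values.
WEIGHTS = (0.4, 0.3, 0.2, 0.1)


def association_value(click, obj, obj_category, now, max_priority=10, window=86400.0):
    """Weighted sum of the four association values (all scales are assumptions)."""
    if click.index_object == obj:
        first = 1.0   # direct association
    elif click.index_category == obj_category:
        first = 0.5   # indirect association
    else:
        first = 0.0
    second = 1.0 if click.in_station else 0.5
    third = min(click.priority, max_priority) / max_priority
    age = max(0.0, now - click.click_time)
    fourth = max(0.0, 1.0 - age / window)  # more recent clicks score higher
    values = (first, second, third, fourth)
    return sum(w * v for w, v in zip(WEIGHTS, values))
```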
In some embodiments, for one operation object and a plurality of entry identifiers, where each entry identifier corresponds to a plurality of items of attribute information, the same item of attribute information is compared across the entry identifiers in descending order of the priority of the items; if the entry identifiers have the same value for that item, the item of attribute information one priority level lower is compared, thereby determining the relevance of the plurality of entry identifiers to the operation object. For example, in descending order of priority, the items of attribute information are: the matching relationship between the index object corresponding to the entry identifier and the operation object, the source address of the click operation corresponding to the entry identifier, the priority of the entry, and the time sequence of the click operation corresponding to the entry identifier.
For example, for one operation object and a plurality of entry identifiers, it is first determined whether each entry identifier is directly or indirectly associated with the operation object. Because only the entry identifier with the highest relevance to the operation object is needed later, indirectly associated entry identifiers may be excluded from further relevance determination. Next, each directly associated entry identifier is classified as an in-station or out-station entry identifier, and the out-station entry identifiers are excluded. Next, the priority of the entry corresponding to each remaining entry identifier is determined, the entry identifiers with the highest priority are selected, and the rest are excluded. Finally, according to the time sequence of the click operations corresponding to the remaining entry identifiers, the entry identifier whose click operation is closest to the current time is determined to be the entry identifier with the highest relevance to the operation object.
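The successive narrowing described above can be sketched as a loop over the attributes in priority order. The dictionary keys and the names `match_rank` and `most_relevant_entry` are hypothetical. Unlike the strict exclusion in the example, this sketch keeps the best available tier at each step, so if no directly associated entry exists it falls back to the indirectly associated ones.

```python
def match_rank(candidate, obj, obj_category):
    """2 for direct association, 1 for indirect (same category), 0 otherwise."""
    if candidate["index_object"] == obj:
        return 2
    if candidate["index_category"] == obj_category:
        return 1
    return 0


def most_relevant_entry(candidates, obj, obj_category):
    """Narrow the candidates attribute by attribute, highest priority first."""
    criteria = [
        lambda c: match_rank(c, obj, obj_category),  # matching relationship
        lambda c: c["in_station"],                   # in-station beats out-station
        lambda c: c["priority"],                     # entry priority
        lambda c: c["click_time"],                   # most recent click
    ]
    remaining = list(candidates)
    for key in criteria:
        if len(remaining) <= 1:
            break
        best = max(key(c) for c in remaining)
        # Only entries tied on this attribute move on to the next attribute.
        remaining = [c for c in remaining if key(c) == best]
    return remaining[0]["entry_id"] if remaining else None
```

The same result could be obtained with a single `max` over a tuple key; the explicit loop is used here only because it follows the step-by-step wording of the text.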
In some embodiments, relevance may be determined for only a subset of the entry identifiers and a subset of the operation objects. It is determined whether the identifier of the user who placed the entry corresponding to an entry identifier belongs to the preset identifiers; if it does not, the entry identifier is deleted. It is likewise determined whether an operation object belongs to a preset type; if it does not, the operation object is deleted. The preset identifiers and the preset types can be customized and stored in advance as configuration information.
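A minimal sketch of this pre-filtering step follows. The field names `placing_user` and `object_type` and the function name `filter_candidates` are assumptions; the preset sets stand in for the configuration information mentioned in the text.

```python
def filter_candidates(clicks, events, preset_user_ids, preset_types):
    """Drop entries and operation objects that fall outside the configured scope."""
    # Keep a click only if the user who placed its entry is a preset identifier.
    kept_clicks = [c for c in clicks if c["placing_user"] in preset_user_ids]
    # Keep a target event only if its operation object is of a preset type.
    kept_events = [e for e in events if e["object_type"] in preset_types]
    return kept_clicks, kept_events
```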
In step S108, the number of target events introduced by each entry is determined according to the association between each entry identifier and each operation object.
In some embodiments, for each operation object, the entry identifier with the highest relevance to the operation object is taken as the entry identifier that introduced the target event corresponding to that operation object; the number of target events introduced by each entry is then determined.
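The attribution and counting can be sketched as below, assuming (for illustration only) that each operation object corresponds to one target event and that the association values have already been computed into a nested dictionary. The name `count_events_per_entry` is hypothetical.

```python
from collections import Counter


def count_events_per_entry(associations):
    """Count target events per entry.

    associations: {operation_object: {entry_id: association_value}}
    """
    counts = Counter()
    for scores in associations.values():
        if not scores:
            continue  # no candidate entry for this operation object
        # Credit the event to the entry with the highest association value.
        best_entry = max(scores, key=scores.get)
        counts[best_entry] += 1
    return dict(counts)
```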
As shown in FIG. 2, the overall architecture diagram of the present disclosure includes a data entry layer 210, a persistent queue layer 220, a business computation layer 230, and a data store layer 240. The data input layer 210 is an entry for click logs, target event information input, stored by other systems. The information enters the persistent queue layer 220 through the data input layer, which can be implemented by Kafka, and Offset marks are set in the queue to mark the position that has been read last time. The service computing layer 230 may pull data from the persistent queue layer for computation, and may also pull some configuration information from an external database for computation, for example, the corresponding relationship between the entry identifier and the index object, the preset identifier and the preset type in the foregoing embodiment, and the like. The external database may also set a flag, and the service computing layer 230 determines whether new configuration information needs to be loaded or not through the flag, and if the flag indicates that loading is needed, loading is performed, otherwise, loading is not needed. For example, the flag may be periodically changed to indicate that loading is required. The service computing layer 230 executes the above steps S104 to S108, finally determines the number of the target events introduced by each entry, and outputs the result to the data output layer 240. The data output layer 240 sets up a database, such as MySQL or the like, for storing the results.
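The flag-driven configuration loading can be sketched as follows. The text does not say how the flag encodes "loading is needed"; this sketch assumes that a change in the flag value signals new configuration. `ConfigCache`, `read_flag` and `load_config` are hypothetical names, with the two callables standing in for queries against the external database.

```python
class ConfigCache:
    """Reload configuration only when the external flag has changed (assumption)."""

    def __init__(self, read_flag, load_config):
        self._read_flag = read_flag      # callable returning the current flag value
        self._load_config = load_config  # callable returning the configuration
        self._seen_flag = None
        self.config = None

    def refresh(self):
        """Load new configuration if the flag indicates it; return True if loaded."""
        flag = self._read_flag()
        if flag != self._seen_flag:
            self.config = self._load_config()
            self._seen_flag = flag
            return True
        return False
```

Calling `refresh()` at the start of each batch cycle keeps the correspondence between entry identifiers and index objects current without reloading it on every cycle.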
In the method of the above embodiment, the click log and the target event information of the user are acquired, the entry identifiers and operation objects corresponding to the same user are determined, and the number of target events introduced by each entry is determined according to at least one of: the matching relationship between the index object corresponding to each entry identifier and each operation object, the source address of the click operation corresponding to each entry identifier, the priority of each entry, and the time sequence of the click operations corresponding to the entry identifiers. By combining the attribute information corresponding to the entry identifiers, the method can determine the number of target events introduced by each entry, thereby improving accuracy.
Further embodiments of the data processing method of the present disclosure are described below with reference to FIG. 3.
FIG. 3 is a flow chart of further embodiments of the data processing method of the present disclosure. As shown in FIG. 3, the method of this embodiment includes steps S302 to S312.
In step S302, a click log and target event information are acquired.
The scheme of the present disclosure may be implemented on the Spark architecture. As in the foregoing embodiments, the click logs and target event information to be read may be obtained in batches. After Spark is initialized, a stream object can be created to obtain the click logs and target event information to be read. If historical click logs are needed, they can be loaded from HDFS and cached in memory.
The obtained click logs and target event information can each be repartitioned so that the data is distributed more evenly, which facilitates subsequent processing. After repartitioning, the target event information may be compared with a broadcast threshold; if it is below the threshold, the target event information is broadcast to the nodes. Broadcasting may be implemented by assigning the target event information to a corresponding broadcast variable.
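The two decisions in this paragraph can be illustrated without a Spark cluster. The sketch below is plain Python and makes several assumptions: records are partitioned by a stable hash of a `user` field (a stand-in for whatever repartitioning the framework performs), and the quantity compared with the broadcast threshold is the serialized size in bytes. The helper names `repartition` and `should_broadcast` are hypothetical.

```python
import json
import zlib


def repartition(records, num_partitions, key="user"):
    """Spread records over partitions by a stable hash of the key field."""
    partitions = [[] for _ in range(num_partitions)]
    for record in records:
        index = zlib.crc32(str(record[key]).encode("utf-8")) % num_partitions
        partitions[index].append(record)
    return partitions


def should_broadcast(records, threshold_bytes):
    """Broadcast only if the serialized data is smaller than the threshold."""
    size = sum(len(json.dumps(r).encode("utf-8")) for r in records)
    return size < threshold_bytes


# Hypothetical target event records.
events = [{"user": f"u{i}", "object": f"sku-{i % 3}"} for i in range(6)]
parts = repartition(events, 3)
small_enough = should_broadcast(events, threshold_bytes=10_000)
print([len(p) for p in parts], small_enough)
```

In an actual Spark job the broadcast itself would be done with a broadcast variable, as the text states; the threshold check only decides whether doing so is worthwhile.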
In step S304, a plurality of tasks are divided and distributed to a plurality of nodes.
Each task includes the identifier of the click log corresponding to the task and the identifier of the target event information. Dividing the tasks and distributing them to multiple nodes (executors) can be implemented with existing Spark mechanisms and is not described further here.
In step S306, each node searches for a corresponding click log and corresponding target event information according to the identification of the click log and the identification of the target event information corresponding to the assigned task.
For example, each node searches for target event information corresponding to the assigned task from among the target event information received from the broadcast.
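Steps S304 and S306 can be pictured with the sketch below. It is not how Spark schedules work internally: the `Task` tuple, the round-robin assignment, and the dictionaries standing in for the click-log store and the broadcast target event information are all assumptions made for illustration.

```python
from typing import NamedTuple


class Task(NamedTuple):
    click_log_id: str   # identifier of the click log for this task
    event_info_id: str  # identifier of the target event information for this task


def assign(tasks, num_nodes):
    """Distribute tasks to nodes round-robin (an assumed policy)."""
    per_node = [[] for _ in range(num_nodes)]
    for i, task in enumerate(tasks):
        per_node[i % num_nodes].append(task)
    return per_node


def resolve(task, click_logs, broadcast_events):
    """On a node: look up the data a task refers to by its two identifiers."""
    return click_logs[task.click_log_id], broadcast_events[task.event_info_id]


# Hypothetical data: (user, entry identifier) clicks and (user, operation object) events.
click_logs = {"clicks-0": [("u1", "banner")], "clicks-1": [("u2", "search")]}
broadcast_events = {"events-0": [("u1", "sku-1")], "events-1": [("u2", "sku-2")]}

tasks = [Task("clicks-0", "events-0"), Task("clicks-1", "events-1")]
per_node = assign(tasks, num_nodes=2)
node0_data = resolve(per_node[0][0], click_logs, broadcast_events)
```

Because a task carries only identifiers, it stays small; each node fetches the actual records locally, for example from the broadcast copy of the target event information.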
In step S308, each node determines each entry identifier and each operation object corresponding to the same user according to the obtained corresponding click log and the corresponding target event information.
In step S310, each node determines the association between each entry identifier and each operation object according to the attribute information corresponding to each entry identifier.
In step S312, each node determines the entry identifier of the trigger source of each operation object according to the association between each entry identifier and each operation object.
In the method of the above embodiment, the number of target events introduced by each entry is determined by a plurality of nodes processing in parallel, which improves processing efficiency.
The present disclosure also provides a data processing apparatus, which is described below with reference to FIG. 4.
FIG. 4 is a block diagram of some embodiments of a data processing apparatus of the present disclosure. As shown in FIG. 4, the apparatus 40 of this embodiment includes an information acquisition module 410, a data analysis module 420, a relevance determining module 430, and an output module 440.
The information acquisition module 410 is configured to obtain a click log and target event information. The click log includes the entry identifier corresponding to each click operation and the corresponding user information, and the target event information includes the operation object of each target event and the user information corresponding to the target event.
In some embodiments, the information acquisition module 410 is configured to determine, according to the offset value of the queue, the batch of click logs and target event information to be read, where the stored click logs and target event information are divided into a plurality of batches in chronological order and each batch includes the click logs within a first preset time and the target event information within a second preset time; and to read the click logs and target event information of the batch to be read from the queue.
In some embodiments, the apparatus 40 further includes an offset value setting module configured to update the offset value of the queue.
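The interplay between reading a batch and updating the offset can be sketched as below. The `BatchQueue` class is an in-memory stand-in for the persistent queue, and it assumes the offset is the index of the next batch to read; neither the class nor that convention comes from the disclosure.

```python
class BatchQueue:
    """In-memory stand-in for a persistent queue with a stored offset."""

    def __init__(self, batches):
        self._batches = list(batches)
        self.offset = 0  # assumed meaning: index of the next batch to read

    def read_next(self):
        """Return the batch at the current offset, or None if all were read."""
        if self.offset >= len(self._batches):
            return None
        return self._batches[self.offset]

    def commit(self):
        """Advance the offset once the current batch has been processed."""
        self.offset += 1


# Each batch holds the click logs and target event information of one time window.
queue = BatchQueue([
    {"clicks": ["c1", "c2"], "events": ["e1"]},
    {"clicks": ["c3"], "events": ["e2", "e3"]},
])

processed = []
batch = queue.read_next()
while batch is not None:
    processed.append(len(batch["clicks"]) + len(batch["events"]))
    queue.commit()  # update the offset only after the batch is handled
    batch = queue.read_next()
```

Updating the offset after processing means that a batch interrupted by a failure is read again on restart rather than skipped.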
The data analysis module 420 is configured to determine, according to the click log and the target event information, each entry identifier and each operation object corresponding to the same user.
The relevance determining module 430 is configured to determine the relevance between each entry identifier and each operation object according to the attribute information corresponding to each entry identifier. The attribute information corresponding to each entry identifier includes at least one of: the matching relationship between the index object corresponding to the entry identifier and each operation object, the source address of the click operation corresponding to the entry identifier, the priority of the entry, and the time sequence of the click operation corresponding to the entry identifier.
In some embodiments, the relevance determining module 430 is configured to, for one operation object and a plurality of entry identifiers, compare the entry identifiers on one item of attribute information at a time, in descending order of the priority of the attribute information; if the entry identifiers have the same value for an item, the item one level lower in priority is compared, and so on, thereby determining the relevance between the plurality of entry identifiers and the operation object.
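This cascading comparison maps naturally onto tuple ordering, where later fields are consulted only when earlier ones tie. The sketch below assumes a particular priority order of the attributes (matching relationship, then source, then entry priority, then click time) and a numeric encoding for each; both the order and the field names are hypothetical.

```python
def rank_key(entry):
    """Build a tuple ordered from the highest-priority attribute to the lowest.

    Python compares tuples element by element, so a lower-priority
    attribute matters only when all higher-priority ones are equal.
    """
    return (
        entry["match_level"],     # e.g. 2 = direct, 1 = indirect, 0 = none (assumed)
        entry["in_station"],      # 1 = same source platform, 0 = different
        entry["entry_priority"],  # larger means higher entry priority
        entry["click_time"],      # larger means closer to the current time
    )


def most_relevant(entries):
    """Return the identifier of the entry most relevant to one operation object."""
    return max(entries, key=rank_key)["entry_id"]


# Hypothetical candidate entries for a single operation object.
entries = [
    {"entry_id": "banner", "match_level": 2, "in_station": 1, "entry_priority": 1, "click_time": 100},
    {"entry_id": "search", "match_level": 2, "in_station": 1, "entry_priority": 3, "click_time": 90},
    {"entry_id": "ad", "match_level": 1, "in_station": 0, "entry_priority": 9, "click_time": 120},
]
winner = most_relevant(entries)
```

Here `banner` and `search` tie on the first two attributes, so the entry priority decides; `ad` loses on the first attribute despite its higher priority and more recent click.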
In some embodiments, the relevance determining module 430 is configured to, for each entry identifier and each operation object, determine that the entry identifier is directly associated with the operation object if the index object corresponding to the entry identifier is the same as the operation object, and that the entry identifier is indirectly associated with the operation object if the index object corresponding to the entry identifier and the operation object belong to the same category, the relevance of a direct association being higher than that of an indirect association; or, for each entry identifier and each operation object, to treat the entry identifier as an in-station entry identifier if the source platform corresponding to the entry identifier is the same as the source platform corresponding to the operation object, and as an out-of-station entry identifier if the two source platforms differ, the relevance between an in-station entry identifier and the operation object being higher than that between an out-of-station entry identifier and the operation object; or, to assign a higher relevance between the entry identifier and the operation object the higher the priority of the corresponding entry; or, to assign a higher relevance between the entry identifier and the operation object the closer the time of the corresponding click operation is to the current time.
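The first two rules can be written as small classification functions. The sketch below assumes that categories are available as a lookup table and adds an `UNRELATED` level for the case the text leaves open (neither the same object nor the same category); the names and numeric levels are illustrative only.

```python
DIRECT, INDIRECT, UNRELATED = 2, 1, 0  # assumed numeric levels; higher is more relevant


def match_level(index_object, operation_object, category_of):
    """Classify how an entry's index object relates to an operation object."""
    if index_object == operation_object:
        return DIRECT
    index_category = category_of.get(index_object)
    if index_category is not None and index_category == category_of.get(operation_object):
        return INDIRECT
    return UNRELATED  # case not covered by the text; treated as lowest here


def is_in_station(entry_platform, object_platform):
    """An entry is in-station when it comes from the operation object's platform."""
    return entry_platform == object_platform


# Hypothetical category table.
category_of = {"sku-1": "phones", "sku-2": "phones", "sku-9": "books"}

same_object = match_level("sku-1", "sku-1", category_of)
same_category = match_level("sku-2", "sku-1", category_of)
no_relation = match_level("sku-9", "sku-1", category_of)
```

The resulting levels can be used directly as one of the compared attribute values when several entry identifiers compete for the same operation object.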
The output module 440 is configured to determine, according to the relevance between each entry identifier and each operation object, the number of target events introduced by each entry.
In some embodiments, the output module 440 is configured to, for each operation object, take the entry identifier with the highest relevance to the operation object as the identifier of the entry that introduced the target event corresponding to that operation object, and to determine the number of target events introduced by each entry.
In some embodiments, the information acquisition module 410 is further configured to divide the work into a plurality of tasks and distribute the tasks to a plurality of nodes, each task including the identifier of the click log corresponding to the task and the identifier of the target event information. The apparatus 40 further includes a plurality of information acquisition sub-modules, one arranged in each node, configured to look up the corresponding click log and the corresponding target event information according to the identifier of the click log and the identifier of the target event information of the assigned task. The data analysis module includes a plurality of data analysis sub-modules, one arranged in each node, configured to determine each entry identifier and each operation object corresponding to the same user according to the obtained click log and target event information. The relevance determining module includes a plurality of relevance determining sub-modules, one arranged in each node, configured to determine the relevance between each entry identifier and each operation object according to the attribute information corresponding to each entry identifier. The output module includes a plurality of output sub-modules, one arranged in each node, configured to determine the number of target events introduced by each entry according to the relevance between each entry identifier and each operation object.
The data processing apparatus in the embodiments of the present disclosure may be implemented by various computing devices or computer systems, which are described below with reference to FIGS. 5 and 6.
FIG. 5 is a block diagram of some embodiments of a data processing apparatus of the present disclosure. As shown in FIG. 5, the apparatus 50 of this embodiment includes: a memory 510 and a processor 520 coupled to the memory 510, the processor 520 being configured to perform the data processing method in any of the embodiments of the present disclosure based on instructions stored in the memory 510.
Memory 510 may include, for example, system memory, fixed non-volatile storage media, and the like. The system memory stores, for example, an operating system, application programs, a boot loader, a database, and other programs.
FIG. 6 is a block diagram of further embodiments of a data processing apparatus according to the present disclosure. As shown in FIG. 6, the apparatus 60 of this embodiment includes a memory 610 and a processor 620, which are similar to the memory 510 and the processor 520, respectively. It may also include an input/output interface 630, a network interface 640, a storage interface 650, and the like. The interfaces 630, 640, 650, the memory 610, and the processor 620 may be connected, for example, via a bus 660. The input/output interface 630 provides a connection interface for input/output devices such as a display, a mouse, a keyboard, and a touch screen. The network interface 640 provides a connection interface for various networking devices, such as a database server or a cloud storage server. The storage interface 650 provides a connection interface for external storage devices such as an SD card and a USB flash drive.
As will be appreciated by one skilled in the art, embodiments of the present disclosure may be provided as a method, system, or computer program product. Accordingly, the present disclosure may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present disclosure may take the form of a computer program product embodied on one or more computer-usable non-transitory storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and so forth) having computer-usable program code embodied therein.
The present disclosure is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the disclosure. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The above description is only exemplary of the present disclosure and is not intended to limit the present disclosure, so that any modification, equivalent replacement, or improvement made within the spirit and principle of the present disclosure should be included in the scope of the present disclosure.

Claims (11)

1. A method of data processing, comprising:
acquiring a click log and target event information; the click log includes: the entry identifier corresponding to each click operation and the corresponding user information, and the target event information includes: the operation object of the target event and the user information corresponding to the target event;
determining each entry identifier and each operation object corresponding to the same user according to the click log and the target event information;
determining the relevance between each entry identifier and each operation object according to the attribute information corresponding to each entry identifier; the attribute information corresponding to each entry identifier includes at least one of: a matching relation between the index object corresponding to each entry identifier and each operation object, a source address of the click operation corresponding to each entry identifier, a priority of each entry, and a time sequence of the click operation corresponding to each entry identifier;
and determining the number of target events introduced by each entry according to the relevance between each entry identifier and each operation object.
2. The data processing method of claim 1, wherein,
the acquiring of the click log and the target event information includes:
determining the batch of click logs and target event information to be read according to the offset value of a queue, wherein the click logs and the target event information are stored in the queue, the offset value is set for the queue, and the stored click logs and target event information are divided into a plurality of batches in chronological order, each batch comprising the click logs within a first preset time and the target event information within a second preset time;
and reading the click log and the target event information of the batch to be read from the queue.
3. The data processing method of claim 2, wherein the determining the number of target events introduced by each entry further comprises:
updating an offset value for the queue.
4. The data processing method of claim 1, wherein,
determining each entry identifier and each operation object corresponding to the same user according to the click log and the target event information, determining the relevance between each entry identifier and each operation object according to the attribute information corresponding to each entry identifier, and determining the entry identifier of the trigger source of each operation object according to the relevance between each entry identifier and each operation object, comprise:
dividing a plurality of tasks and distributing the plurality of tasks to a plurality of nodes, each task comprising: the identification of the click log corresponding to the task and the identification of the target event information;
each node searches a corresponding click log and corresponding target event information according to the identification of the click log corresponding to the distributed task and the identification of the target event information;
each node determines each entry identifier and each operation object corresponding to the same user according to the acquired corresponding click log and the corresponding target event information; determines the relevance between each entry identifier and each operation object according to the attribute information corresponding to each entry identifier; and determines the number of target events introduced by each entry according to the relevance between each entry identifier and each operation object.
5. The data processing method according to claim 4,
the target event information is broadcast to the plurality of nodes; and each node searches the target event information corresponding to the distributed tasks from the received target event information.
6. The data processing method according to claim 4,
the determining the association between each entry identifier and each operation object according to the attribute information corresponding to each entry identifier includes:
for one operation object and a plurality of entry identifiers, in the case where each entry identifier corresponds to a plurality of items of attribute information, comparing the entry identifiers on one item of attribute information at a time in descending order of the priority of the attribute information, and, if the entry identifiers have the same value for an item, comparing the item of attribute information one level lower in priority, thereby determining the relevance between the plurality of entry identifiers and the operation object.
7. The data processing method according to claim 1,
for each entry identifier and each operation object, determining that the entry identifier is directly associated with the operation object under the condition that the index object corresponding to the entry identifier is the same as the operation object, and determining that the entry identifier is indirectly associated with the operation object under the condition that the index object corresponding to the entry identifier and the operation object belong to the same category; the relevance corresponding to direct association is higher than the relevance corresponding to indirect association;
or, for each entry identifier and each operation object, if the source platform corresponding to the entry identifier is the same as the source platform corresponding to the operation object, the entry identifier is used as an in-station entry identifier, and if the source platform corresponding to the entry identifier is different from the source platform corresponding to the operation object, the entry identifier is used as an out-station entry identifier, and the relevance between the in-station entry identifier and the operation object is higher than the relevance between the out-station entry identifier and the operation object;
or, the higher the priority of the corresponding entry is, the higher the relevance of the entry identifier with the operation object is;
or the relevance between the entry identifier and the operation object is higher as the time of the corresponding click operation is closer to the current time.
8. The data processing method of claim 1, wherein,
the determining the number of target events introduced by each entry according to the relevance between each entry identifier and each operation object includes:
for each operation object, taking the entry identifier with the highest relevance to the operation object as the identifier of the entry that introduced the target event corresponding to the operation object;
the number of target events introduced by each entry is determined.
9. A data processing apparatus comprising:
the information acquisition module is used for acquiring the click log and the target event information; the click log includes: the entry identifier corresponding to each click operation and the corresponding user information, wherein the target event information comprises: the operation object of the target event and the user information corresponding to the target event;
the data analysis module is used for determining each entry identifier and each operation object corresponding to the same user according to the click log and the target event information;
the relevance determining module is used for determining the relevance between each entry identifier and each operation object according to the attribute information corresponding to each entry identifier; the attribute information corresponding to each entry identifier includes at least one of: a matching relation between the index object corresponding to each entry identifier and each operation object, a source address of the click operation corresponding to each entry identifier, a priority of each entry, and a time sequence of the click operation corresponding to each entry identifier;
and the output module is used for determining the number of target events introduced by each entry according to the relevance between each entry identifier and each operation object.
10. A data processing apparatus comprising:
a processor; and
a memory coupled to the processor for storing instructions that, when executed by the processor, cause the processor to perform the data processing method of any of claims 1-8.
11. A non-transitory computer readable storage medium having stored thereon a computer program, wherein the program when executed by a processor implements the steps of the method of any one of claims 1-8.
CN201911153341.1A 2019-11-22 2019-11-22 Data processing method, device and computer readable storage medium Active CN110929207B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911153341.1A CN110929207B (en) 2019-11-22 2019-11-22 Data processing method, device and computer readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911153341.1A CN110929207B (en) 2019-11-22 2019-11-22 Data processing method, device and computer readable storage medium

Publications (2)

Publication Number Publication Date
CN110929207A (en) 2020-03-27
CN110929207B (en) 2023-01-31

Family

ID=69851611

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911153341.1A Active CN110929207B (en) 2019-11-22 2019-11-22 Data processing method, device and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN110929207B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112232856A (en) * 2020-09-25 2021-01-15 上海淇毓信息科技有限公司 Traffic processing method and device based on diversion and electronic equipment

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107526748A (en) * 2016-06-22 2017-12-29 华为技术有限公司 A kind of method and apparatus for identifying user and clicking on behavior
CN110032698A (en) * 2019-02-03 2019-07-19 阿里巴巴集团控股有限公司 Information display method and device, information processing method and device
CN110069463A (en) * 2019-03-12 2019-07-30 北京奇艺世纪科技有限公司 User behavior processing method, device electronic equipment and storage medium

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7779121B2 (en) * 2007-10-19 2010-08-17 Nokia Corporation Method and apparatus for detecting click fraud
US9442621B2 (en) * 2009-05-05 2016-09-13 Suboti, Llc System, method and computer readable medium for determining user attention area from user interface events

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
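Mining Online users' access records for web business intelligence; Fong, S. et al.; Proceedings 2002 IEEE International Conference on Data Mining, ICDM 2002; 20030310; full text *
Research on patent clustering based on website access logs using a word embedding model (利用词嵌入模型实现基于网站访问日志的专利聚类研究); Wen Yi (文奕) et al.; 《现代情报》; 20180415 (No. 04); full text *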

Also Published As

Publication number Publication date
CN110929207A (en) 2020-03-27

Similar Documents

Publication Publication Date Title
CN106776809B (en) Data query method and system
CN103748579A (en) Processing data in a mapreduce framework
US20170046447A1 (en) Information Category Obtaining Method and Apparatus
CN107291745B (en) Data index management method and device
US20230205755A1 (en) Methods and systems for improved search for data loss prevention
WO2017114276A1 (en) User analysis method and system based on image
CN109597810B (en) Task segmentation method, device, medium and electronic equipment
CN110704677A (en) Program recommendation method and device, readable storage medium and terminal equipment
TW201344475A (en) Information providing method and system
CN111159563A (en) Method, device and equipment for determining user interest point information and storage medium
CN110929207B (en) Data processing method, device and computer readable storage medium
CN112860850B (en) Man-machine interaction method, device, equipment and storage medium
CN107798450B (en) Service distribution method and device
CN112258244A (en) Method, device, equipment and storage medium for determining task of target object
KR20210023635A (en) Method and system for providing document timeline using cluster of long-term related issue unit
CN107391728B (en) Data mining method and data mining device
CN111143582B (en) Multimedia resource recommendation method and device for updating association words in double indexes in real time
US10109019B2 (en) Accelerated disaggregation in accounting calculation via pinpoint queries
CN113076322A (en) Commodity search processing method and device
CN111400510A (en) Data archiving processing method, device, equipment and readable storage medium
CN108182201B (en) Application expansion method and device based on key keywords
CN111552561B (en) Task processing method and device
CN111143456B (en) Spark-based Cassandra data import method, device, equipment and medium
CN110245208B (en) Retrieval analysis method, device and medium based on big data storage
CN110765100B (en) Label generation method and device, computer readable storage medium and server

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant