CN114327857A - Operation data processing method and device, computer equipment and storage medium - Google Patents

Operation data processing method and device, computer equipment and storage medium

Info

Publication number
CN114327857A
Authority
CN
China
Prior art keywords
resource
target
data
index table
processing unit
Legal status
Pending
Application number
CN202111296917.7A
Other languages
Chinese (zh)
Inventor
石志林
Current Assignee
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN202111296917.7A
Publication of CN114327857A

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application relates to an operation data processing method, which is applied to the field of cloud technology and comprises the following steps: acquiring operation data obtained by operating each target resource, wherein the operation data comprises an object identifier and an associated resource identifier; converting each object identifier into hash data, and searching a corresponding target index in a preset index table through each piece of hash data; determining the addresses stored at the positions corresponding to the target indexes, and distributing the object identifiers and the associated resource identifiers to the processing units at the corresponding addresses; acquiring, through each processing unit, corresponding object characteristics based on the object identifier and corresponding resource characteristics based on the resource identifier; and splicing the object characteristics of each object identifier with the resource characteristics of the resource identifier associated with that object identifier through each processing unit to obtain target characteristic data. The target feature data is used for training a prediction model for predicting resource click rates. By adopting the method, the energy consumption for processing the operation data can be reduced and the processing efficiency can be improved.

Description

Operation data processing method and device, computer equipment and storage medium
Technical Field
The present application relates to the field of computer technologies, and in particular, to an operation data processing method and apparatus, a computer device, and a storage medium.
Background
With the rapid development of computer technology, computer devices need to process huge volumes of data at all times, such as users' operation data for requesting services, browsing data, verification data, and the like. By analyzing and processing users' operation data, computer devices provide data support for various service requirements.
Traditional processing of operation data generally adopts batch processing, which places high demands on processing resources; when processing resources are limited, congestion easily occurs, resulting in low processing efficiency.
Disclosure of Invention
In view of the foregoing, it is desirable to provide an operation data processing method, an operation data processing apparatus, a computer device, and a storage medium, which can achieve load balancing and improve data processing efficiency.
A method of operational data processing, the method comprising:
acquiring operation data obtained by operating each target resource; the operation data comprises an object identifier and an associated resource identifier;
respectively converting each object identifier into corresponding hash data, and respectively searching corresponding target indexes in a preset index table through each hash data;
determining addresses stored in positions corresponding to the target indexes, and distributing the object identifiers and the associated resource identifiers to processing units at corresponding addresses;
acquiring, by each processing unit, a corresponding object feature based on the received object identifier and a corresponding resource feature based on the received resource identifier;
splicing the object characteristics of the object identifiers with the resource characteristics of the resource identifiers associated with the object identifiers through the processing units to obtain target characteristic data; the target characteristic data is used for training a prediction model, and the prediction model is used for predicting the resource click rate of the resource to be processed.
An operational data processing apparatus, the apparatus comprising:
the identification acquisition module is used for acquiring operation data obtained by operating each target resource; the operation data comprises an object identifier and an associated resource identifier;
the conversion module is used for respectively converting each object identifier into corresponding hash data and respectively searching corresponding target indexes in a preset index table through each hash data;
the distribution module is used for determining the address stored in the position corresponding to each target index and distributing each object identifier and the associated resource identifier to the processing unit at the corresponding address;
the characteristic acquisition module is used for acquiring corresponding object characteristics based on the received object identification through each processing unit and acquiring corresponding resource characteristics based on the received resource identification;
the splicing module is used for splicing the object characteristics of the object identifiers with the resource characteristics of the resource identifiers associated with the object identifiers through the processing units to obtain target characteristic data; the target characteristic data is used for training a prediction model, and the prediction model is used for predicting the resource click rate of the resource to be processed.
In one embodiment, the identifier obtaining module is further configured to obtain operation data obtained by operating each target resource every preset time period;
the device also comprises a cache module; the cache module is used for acquiring an object identifier and object characteristics from the operation data and storing the object identifier and the corresponding object characteristics into a cache space in an associated manner; and acquiring resource identification and resource characteristics respectively corresponding to each target resource from the operation data, and storing each resource identification and corresponding resource characteristics into a cache space in an associated manner.
In an embodiment, the conversion module is configured to convert each object identifier into corresponding hash data through a hash function, and determine a conversion value corresponding to each object identifier according to each hash data and a length of a preset index table; and respectively searching corresponding target indexes in the preset index table through the conversion values.
In one embodiment, the apparatus further comprises a building module; the construction module is used for constructing an empty index table and converting each candidate object identifier into corresponding candidate hash data respectively; determining indexes of all positions in the empty index table according to all the candidate hash data; and filling positions corresponding to the indexes in the empty index table through the addresses of the processing units to obtain a preset index table.
In one embodiment, the building module is further configured to generate a random value in the list of each processing unit through two random hash functions and a unit identifier of each processing unit; the number of random values in the list of each processing unit is the same as the number of positions in the empty index table; and filling the address of each processing unit to each position in the empty index table based on the random value in the list of each processing unit to obtain a preset index table.
In one embodiment, the building module is further configured to determine, based on the first hash function, the unit identifier of each processing unit, and the length of the empty index table, an offset corresponding to each processing unit; determining jump amount respectively corresponding to each processing unit based on a second hash function, unit identification of each processing unit and length of the empty index table; the second hash function is different from the first hash function; and determining a random value in the list of each processing unit according to the offset and the jumping amount corresponding to each processing unit.
In an embodiment, the building module is further configured to select a target random value from the list of each processing unit, and search, in the empty index table, a target position corresponding to an index that is the same as the target random value; when the target position is not filled with the address, filling the address of the processing unit to which the list where the target random value is located belongs to the target position; and selecting a next target random value from a next list of the list where the target random value is located, returning to the step of searching the target position corresponding to the index which is the same as the target random value in the empty index table, and continuing to execute the step until a filling stop condition is met, so as to obtain a preset index table.
In an embodiment, the building module is further configured to, when the target location has been filled with an address, select a next target random value from the list where the target random value is located, return to the step of searching for the target location corresponding to the index that is the same as the target random value in the empty index table, and continue to execute the step until the filling stop condition is satisfied, so as to obtain a preset index table.
In an embodiment, the feature obtaining module is further configured to obtain, by each of the processing units, an object feature corresponding to the received object identifier from a cache space, and obtain, from the cache space, a resource feature corresponding to the received resource identifier.
In one embodiment, the apparatus further comprises a training module; the training module is used for inputting the target characteristic data into a prediction model and outputting the predicted click rate of each object identifier for the corresponding target resource; and adjusting parameters of the prediction model according to the difference between each predicted click rate and the corresponding expected click rate until a training stop condition is reached, so as to obtain an updated target prediction model.
In one embodiment, the apparatus further comprises a prediction module; the prediction module is used for acquiring target object characteristics corresponding to a target object and to-be-processed resource characteristics corresponding to-be-processed resources from a prediction request when the prediction request is received; inputting the target object characteristics and the to-be-processed resource characteristics into the target prediction model to obtain the resource click rate output by the target prediction model; and the resource click rate represents the probability of the target object clicking the resource to be processed.
A computer device comprising a memory and a processor, the memory storing a computer program, the processor implementing the following steps when executing the computer program:
acquiring operation data obtained by operating each target resource; the operation data comprises an object identifier and an associated resource identifier;
respectively converting each object identifier into corresponding hash data, and respectively searching corresponding target indexes in a preset index table through each hash data;
determining addresses stored in positions corresponding to the target indexes, and distributing the object identifiers and the associated resource identifiers to processing units at corresponding addresses;
acquiring, by each processing unit, a corresponding object feature based on the received object identifier and a corresponding resource feature based on the received resource identifier;
splicing the object characteristics of the object identifiers with the resource characteristics of the resource identifiers associated with the object identifiers through the processing units to obtain target characteristic data; the target characteristic data is used for training a prediction model, and the prediction model is used for predicting the resource click rate of the resource to be processed.
A computer-readable storage medium, on which a computer program is stored which, when executed by a processor, carries out the steps of:
acquiring operation data obtained by operating each target resource; the operation data comprises an object identifier and an associated resource identifier;
respectively converting each object identifier into corresponding hash data, and respectively searching corresponding target indexes in a preset index table through each hash data;
determining addresses stored in positions corresponding to the target indexes, and distributing the object identifiers and the associated resource identifiers to processing units at corresponding addresses;
acquiring, by each processing unit, a corresponding object feature based on the received object identifier and a corresponding resource feature based on the received resource identifier;
splicing the object characteristics of the object identifiers with the resource characteristics of the resource identifiers associated with the object identifiers through the processing units to obtain target characteristic data; the target characteristic data is used for training a prediction model, and the prediction model is used for predicting the resource click rate of the resource to be processed.
A computer program product comprising a computer program which when executed by a processor performs the steps of:
acquiring operation data obtained by operating each target resource; the operation data comprises an object identifier and an associated resource identifier;
respectively converting each object identifier into corresponding hash data, and respectively searching corresponding target indexes in a preset index table through each hash data;
determining addresses stored in positions corresponding to the target indexes, and distributing the object identifiers and the associated resource identifiers to processing units at corresponding addresses;
acquiring, by each processing unit, a corresponding object feature based on the received object identifier and a corresponding resource feature based on the received resource identifier;
splicing the object characteristics of the object identifiers with the resource characteristics of the resource identifiers associated with the object identifiers through the processing units to obtain target characteristic data; the target characteristic data is used for training a prediction model, and the prediction model is used for predicting the resource click rate of the resource to be processed.
According to the operation data processing method, the operation data processing device, the computer equipment, the computer readable storage medium and the computer program product, the operation data obtained by operating each target resource is obtained, the operation data comprises the object identification and the associated resource identification, each object identification is converted into corresponding hash data, and the corresponding target index can be accurately found in the preset index table through each hash data. And determining addresses stored in positions corresponding to the target indexes, and distributing the object identifications and the associated resource identifications to the processing units at the corresponding addresses, so that uniform distribution of data is realized, load balance of the processing units is realized, and balanced distribution of processing resources is realized. And the distributed data is the object identification and the associated resource identification, so that the distributed data is small in quantity and high in transmission speed, and the data distribution efficiency can be effectively improved. The processing units acquire corresponding object characteristics based on the received object identifiers and acquire corresponding resource characteristics based on the received resource identifiers, so that the object characteristics of the object identifiers and the resource characteristics of the resource identifiers associated with the object identifiers can be spliced respectively through the processing units to obtain target characteristic data, the processing efficiency of the operation data can be effectively improved through the processing units, data response can be timely carried out, and timeliness is better. The prediction model can be trained in real time through the target characteristic data, so that the resource click rate of the resource to be processed can be accurately predicted by the prediction model.
Drawings
FIG. 1 is a diagram of an application environment in which a method of data processing is operated in one embodiment;
FIG. 2 is a flow diagram illustrating a method of operating data processing according to one embodiment;
FIG. 3 is a flowchart illustrating obtaining operation data obtained by operating on target resources according to an embodiment;
FIG. 4 is a flowchart illustrating steps of constructing a preset index table according to an embodiment;
FIG. 5 is a flowchart illustrating the generation of random values in the list of each processing unit by two random hash functions and the unit identifier of each processing unit in one embodiment;
FIG. 6 is a flowchart illustrating filling addresses of each processing unit into each position in an empty index table based on a random value in a list of each processing unit to obtain a preset index table in one embodiment;
FIG. 7 is a flow diagram illustrating a method of operating data processing in one embodiment;
FIG. 8 is a flowchart illustrating a method of operating data processing in accordance with another embodiment;
FIG. 9 is a block diagram of an operating data processing apparatus in one embodiment;
FIG. 10 is a diagram showing an internal structure of a computer device according to an embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
The operation data processing method provided by the application can be applied to the application environment shown in FIG. 1. The present application relates to the field of Artificial Intelligence (AI) technology. Artificial intelligence is a theory, method, technique and application system that uses a digital computer or a machine controlled by a digital computer to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge and use the knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive discipline of computer science that attempts to understand the essence of intelligence and produce a new kind of intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that the machines have the capabilities of perception, reasoning and decision-making. The solution provided by the embodiments of the present application relates to an artificial intelligence operation data processing method, which is specifically explained by the following embodiments.
The operation data processing method provided by the application can be applied to the operation data processing system shown in FIG. 1. As shown in fig. 1, the operation data processing system includes a terminal 110 and a server 120, and a data storage system integrated on the server 120. The data storage system may store data that the server 120 needs to process. The data storage system may also be placed on the cloud or other network server. In one embodiment, the terminal 110 and the server 120 may each separately perform the operation data processing method provided in the embodiment of the present application. The terminal 110 and the server 120 may also be cooperatively used to execute the operation data processing method provided in the embodiment of the present application. When the terminal 110 and the server 120 are cooperatively used to execute the operation data processing method provided in the embodiment of the present application, the terminal 110 obtains operation data obtained by operating each target resource; the operational data includes an object identification and an associated resource identification. The terminal 110 transmits the operation data to the data storage system of the server 120. The server 120 obtains operation data including object identifiers and resource identifiers from the data storage system, converts each object identifier into corresponding hash data, and searches a corresponding target index in a preset index table through each hash data. The server 120 determines the addresses stored at the positions corresponding to the target indexes, and distributes the object identifiers and the associated resource identifiers to the processing units at the corresponding addresses. The server 120 acquires, through each processing unit, a corresponding object feature based on the received object identifier, and acquires a corresponding resource feature based on the received resource identifier. The server 120 splices the object characteristics of each object identifier with the resource characteristics of the resource identifier associated with the object identifier through each processing unit to obtain target characteristic data; the target characteristic data is used for training a prediction model, and the prediction model is used for predicting the resource click rate of the resource to be processed.
The server 120 may be an independent physical server, a server cluster or distributed system formed by a plurality of physical servers, or a cloud server providing cloud computing services. Cloud computing is a computing model that distributes computing tasks over a pool of resources formed by a large number of computers, enabling various application systems to obtain computing power, storage space, and information services as needed. The network that provides the resources is referred to as the "cloud". To the user, resources in the "cloud" appear infinitely expandable, available on demand at any time, and paid for according to use.
The terminal 110 may be, but is not limited to, various personal computers, notebook computers, smart phones, tablet computers, smart voice interaction devices, smart appliances, vehicle terminals, and portable wearable devices. The portable wearable device can be a smart watch, a smart bracelet, a head-mounted device, and the like. The terminal 110 and the server 120 may be directly or indirectly connected through wired or wireless communication, and the application is not limited thereto.
In one embodiment, multiple servers may make up a blockchain, with the servers acting as nodes on the blockchain.
In one embodiment, data related to the operation data processing method may be stored in the blockchain, such as, but not limited to, operation data, hash data, a preset index table, a target index, an object feature, a resource feature, target feature data, a prediction model, a resource to be processed, a resource click rate, and the like.
In one embodiment, the target resources and operation data may be obtained from a cloud social application. Cloud social (Cloud Social) is a virtual social application mode that combines the internet of things, cloud computing and mobile internet interactive applications, and aims to establish a well-known "resource sharing" relationship map so as to develop network social interaction. The main characteristic of cloud social interaction is that a large number of social resources are uniformly integrated and evaluated to form an effective resource pool that provides services to users on demand.
It should be noted that each target resource and the operation data in the related embodiments may be acquired by a vehicle-mounted terminal in an internet-of-vehicles scenario, and then processed by the operation data processing method mentioned in each embodiment of the present application to obtain target feature data. The obtained target feature data is used for training a prediction model, and the trained prediction model can predict, in the internet-of-vehicles scenario, the resource click rate of an object for to-be-pushed or already-pushed resources to be processed (such as advertisements, audio, video, music and the like).
In an embodiment, as shown in fig. 2, an operation data processing method is provided, which is described by taking an example that the method is applied to a computer device (the computer device may specifically be a terminal or a server in fig. 1), and includes the following steps:
step S202, obtaining operation data obtained by operating each target resource; the operational data includes an object identification and an associated resource identification.
Wherein, the target resource can be at least one of various kinds of information and various kinds of items. The information includes at least one of an application, text, an expression, a picture, audio, video, a file or a link, etc., but is not limited thereto. The various kinds of items may include physical items and virtual items. The physical items include various physical products, specifically, electronic products such as a mobile phone, a computer, a notebook and a watch, as well as clothing products such as clothes and shoes, which are not limited herein.
Virtual items include, but are not limited to, insurance products, financial products, virtual gift resources, virtual scenes, virtual characters, virtual props, and the like. The virtual scene may be a game scene in a game device, a virtual reality simulation scene, and the like, and the virtual character may be any of various characters in a game.
The operation data refers to operation data obtained by performing preset operation on the target resource through the object identifier. The object identification may be a user identification. The preset operation includes, but is not limited to, at least one of a touch operation, a voice operation, an operation performed through an input device such as a mouse, or a gesture operation, and may be any one of a click operation, a double-click operation, a long-press operation, a left-slide operation, or a right-slide operation, for example, without being limited thereto.
Specifically, the computer device obtains operation data obtained by operating each target resource by the object identifier. The operation data comprises an object identification and a resource identification associated with the object identification. The resource identifier associated with the object identifier refers to a resource identifier corresponding to a target resource operated by the object identifier.
In one embodiment, the operation data may include a plurality of object identifiers and a plurality of resource identifiers of the target resources, and the computer device may determine the resource identifier associated with each object identifier. "A plurality" refers to at least two.
In one embodiment, the computer device may obtain a resource identifier and resource data obtained by operating the object identifier on each target resource, obtain object data corresponding to the object identifier, extract resource features from the resource data, and extract object features from the object data. The computer device can determine a resource identifier associated with the object identifier, and take the object identifier, the object feature, the resource identifier associated with the object identifier, and the resource feature of the resource identifier as the operation data corresponding to the object identifier.
Object data is data related to object identification and may include object attribute data and object behavior data. The object attribute data is attribute data related to the user object, and specifically may be at least one of information such as name, gender, age, and city where the user is located. The object behavior data is data related to the network behavior of the user object, and may specifically include historical click behavior of the user object, and the like.
The resource identification may be, but is not limited to, a title, a link, a thumbnail, a content summary, etc. of the target resource. The resource data is data related to the resource identifier, and may be a resource type, a source, a content summary, a total resource content, a resource key content, and the like corresponding to the resource identifier, but is not limited thereto.
In one embodiment, when multiple identical object identifiers exist in the operation data, the object features corresponding to the identical object identifiers are the same, so the computer device may retain one object identifier and the corresponding object features.
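For illustration, one piece of operation data described above can be pictured as a record of the following form. This is a minimal Python sketch only; the field names and the use of dictionaries for features are assumptions made for readability and are not part of the application.

from dataclasses import dataclass, field
from typing import Dict

@dataclass
class OperationRecord:
    # identifier of the user object that performed the preset operation
    object_id: str
    # identifier of the target resource associated with that object identifier
    resource_id: str
    # object features extracted from the object data (e.g. age, city, historical clicks)
    object_features: Dict[str, str] = field(default_factory=dict)
    # resource features extracted from the resource data (e.g. type, source, content summary)
    resource_features: Dict[str, str] = field(default_factory=dict)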
Step S204, each object identifier is converted into corresponding hash data, and corresponding target indexes are searched in a preset index table through each hash data.
The hash data is an output obtained by converting an input of an arbitrary length into a fixed length by a hash algorithm, and the output is a hash value.
In a relational database, an index is a separate physical storage structure that sorts the values of one or more columns of a database table; it is a collection of the values of one or more columns of the table together with a corresponding list of logical pointers to the data pages that physically contain those values. Each entry in the index table includes an attribute value and the address of the record having that attribute value. The target index is the index value corresponding to the hash data in the preset index table.
The preset index table is constructed in advance according to the candidate object identification, and the address of each processing unit is stored in the preset index table.
Specifically, the computer device may obtain a hash function, take each object identifier as an input of the hash function, and convert each object identifier into corresponding hash data through the hash function. The computer device may obtain the preset index table, and compare each hash data with each index in the preset index table, so as to find a target index corresponding to each hash data.
Step S206, the address stored in the corresponding position of each target index is determined, and each object identifier and the associated resource identifier are distributed to the processing unit at the corresponding address.
The processing unit, i.e. Slot, is configured to receive the object identifier and the resource identifier, and perform a series of processing on the object identifier and the resource identifier, for example, obtain an object feature corresponding to the object identifier and a resource feature corresponding to the resource identifier from the cache space, and splice the object identifier, the corresponding object feature, and the resource feature corresponding to the resource identifier associated with the object identifier into target feature data.
Specifically, after determining the target index corresponding to each hash data in the preset index table, the computer device determines an address stored at a position corresponding to each target index, where the stored address is an address of the processing unit. The computer device distributes each object identifier and the associated resource identifier to the processing units at the corresponding addresses. For example, if the address stored in the position of the target index corresponding to the hash data 1 obtained by converting the object identifier 1 is a, and the address stored in the position of the target index corresponding to the hash data 2 obtained by converting the object identifier 2 is B, the object identifier 1 and the associated resource identifier are distributed to the processing unit at the address a, and the object identifier 2 and the associated resource identifier are distributed to the processing unit at the address B.
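Steps S204 and S206 can be sketched as follows. This is an illustrative Python sketch rather than the claimed implementation: MD5 merely stands in for the hash function, the conversion value is taken as the remainder against the table length (in line with the formula index = hash(id) % M given later), and all names are assumptions.

import hashlib

def dispatch(records, index_table):
    # records: operation records (e.g. the OperationRecord sketch above)
    # index_table[i] holds the address of a processing unit; its length is M
    M = len(index_table)
    per_unit = {}  # address of a processing unit -> list of (object_id, resource_id) pairs
    for rec in records:
        # convert the object identifier into hash data
        hash_data = int(hashlib.md5(rec.object_id.encode()).hexdigest(), 16)
        # conversion value obtained by taking the remainder against the table length
        target_index = hash_data % M
        # address stored at the position corresponding to the target index
        address = index_table[target_index]
        per_unit.setdefault(address, []).append((rec.object_id, rec.resource_id))
    return per_unit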
In step S208, the corresponding object feature is acquired based on the received object identifier by each processing unit, and the corresponding resource feature is acquired based on the received resource identifier.
Specifically, for each processing unit, after receiving the object identifier and the associated resource identifier, the object feature corresponding to the object identifier may be obtained through the object identifier, and the resource feature corresponding to the resource identifier may be obtained through the resource identifier.
In one embodiment, after receiving the object identifier and the associated resource identifier, each processing unit may obtain, through the object identifier, an object feature corresponding to the object identifier from a preset storage space, and obtain, through the resource identifier, a resource feature corresponding to the resource identifier from the preset storage space.
In one embodiment, the predetermined storage space may be a cache space.
Step S210, splicing the object characteristics of each object identifier with the resource characteristics of the resource identifier associated with the object identifier through each processing unit to obtain target characteristic data; the target characteristic data is used for training a prediction model, and the prediction model is used for predicting the resource click rate of the resource to be processed.
The target feature data comprises an object identifier, a corresponding object feature and a resource feature corresponding to a resource identifier associated with the object identifier.
Specifically, each processing unit may perform a splicing process on the object identifier, the object feature of the object identifier, and the resource feature corresponding to the resource identifier associated with the object identifier, to obtain target feature data. According to the same processing, the computer equipment can splice each object identification, the corresponding object characteristic and the corresponding resource characteristic into target characteristic data through each processing unit.
In one embodiment, the processing unit may obtain corresponding object features according to the object identifier received in the target time period, obtain corresponding resource features according to the resource identifier received in the target time period, and splice the object features of each object identifier with the resource features of the resource identifier associated with the object identifier to obtain the target feature data. The target time period may be 20 minutes, 30 minutes, etc., but is not limited thereto.
In one embodiment, the computer device may add each piece of target feature data to a data set and batch-write the data set to a distributed file system in the specific format required by the preset model. The distributed file system may be HDFS, a highly fault-tolerant and scalable distributed file system that is an important component of the Hadoop system.
The target characteristic data can be used for performing real-time online training on the prediction model, updating the preset model, and predicting the resource click rate of the resource to be processed through the trained prediction model or the updated prediction model. The resource click rate refers to the probability of the target object clicking the resource to be processed.
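Within one processing unit, steps S208 and S210 amount to two look-ups in the cache followed by a concatenation. The sketch below is illustrative only: plain dictionaries stand in for the cache space, and the flattened-sample layout is an assumption rather than the specific format required by the preset model.

def build_target_feature_data(pairs, object_cache, resource_cache):
    # pairs: (object_id, resource_id) tuples received by this processing unit
    samples = []
    for object_id, resource_id in pairs:
        object_features = object_cache.get(object_id, {})        # object features fetched by object identifier
        resource_features = resource_cache.get(resource_id, {})  # resource features fetched by resource identifier
        # splice the object identifier, its object features and the associated resource features
        sample = {"object_id": object_id, "resource_id": resource_id}
        sample.update({"obj_" + k: v for k, v in object_features.items()})
        sample.update({"res_" + k: v for k, v in resource_features.items()})
        samples.append(sample)
    return samples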
In the operation data processing method, operation data obtained by operating each target resource is obtained, the operation data comprises object identifiers and associated resource identifiers, each object identifier is converted into corresponding hash data, and the corresponding target index can be accurately found in the preset index table through each hash data. And determining addresses stored in positions corresponding to the target indexes, and distributing the object identifications and the associated resource identifications to the processing units at the corresponding addresses, so that uniform distribution of data is realized, load balance of the processing units is realized, and balanced distribution of processing resources is realized. And the distributed data is the object identification and the associated resource identification, so that the distributed data is small in quantity and high in transmission speed, and the data distribution efficiency can be effectively improved. The processing units acquire corresponding object characteristics based on the received object identifiers and acquire corresponding resource characteristics based on the received resource identifiers, so that the object characteristics of the object identifiers and the resource characteristics of the resource identifiers associated with the object identifiers can be spliced respectively through the processing units to obtain target characteristic data, the processing efficiency of the operation data can be effectively improved through the processing units, data response can be timely carried out, and timeliness is better. The prediction model can be trained in real time through the target characteristic data, so that the resource click rate of the resource to be processed can be accurately predicted by the prediction model.
In one embodiment, as shown in fig. 3, the obtaining operation data obtained by operating on each target resource includes:
step S302, acquiring operation data obtained by operating each target resource every preset time.
Specifically, the computer device may obtain operation data obtained by operating each target resource at a preset time interval. The operation data comprises an object identification and a corresponding object characteristic, and a resource identification associated with the object identification and a resource characteristic corresponding to the resource identification. For example, operational data is acquired every 30 minutes.
In one embodiment, the operational data includes object identification, object data, resource identification, and resource data from which the computer device may extract object features and from which resource features may be extracted.
In one embodiment, the computer device may set a timer, and count time by the timer, and when the preset time length is reached, the operation data within the preset time length is acquired. It is understood that when the preset time is reached, the timer may enter the next round of timing and accumulate the operation data in the next round of timing.
The method further comprises the following steps:
step S304, acquiring the object identification and the object characteristics from the operation data, and storing the object identification and the corresponding object characteristics into a cache space in an associated manner.
Specifically, the cache space is used for temporarily storing data, and after the computer device obtains the object identifiers and the object features from the operation data, each object identifier and the corresponding object feature are stored in the cache space in an associated manner according to a preset format. For example, the cache space may be MapState, and the preset format may be < object id, object feature > in the map.
In an embodiment, when a plurality of identical object identifiers exist in the operation data, and object features corresponding to the identical object identifiers are identical, it is sufficient that one object identifier and the corresponding object feature are stored in the cache space in an associated manner according to a preset format.
Step S306, acquiring the resource identification and the resource characteristic respectively corresponding to each target resource from the operation data, and storing each resource identification and the corresponding resource characteristic into a cache space in an associated manner.
Specifically, the computer device may determine a resource identifier corresponding to each target resource, and after obtaining the resource feature corresponding to each resource identifier from the operation data, store the resource identifier and the corresponding resource feature in the cache space in association according to a preset format. For example, the cache space may be MapState, and the preset format may be < resource identifier, resource feature > in the map.
In this embodiment, operation data obtained by operating each target resource is acquired every preset time length, so as to perform batch processing on the operation data within the preset time length. The object identification and the object characteristic are obtained from the operation data and stored in the cache space in an associated mode, the resource identification and the resource characteristic corresponding to each target resource are obtained from the operation data, each resource identification and the corresponding resource characteristic are stored in the cache space in an associated mode, the object identification, the object characteristic, the resource identification and the resource characteristic can be temporarily stored in the cache space, when the data are distributed to different processing units subsequently, the object identification and the resource identification are only needed to be distributed, the distributed data volume is reduced, transmission resources are saved, and the data distribution efficiency can be effectively improved.
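A minimal sketch of steps S302 to S306, assuming plain dictionaries in place of a keyed cache such as MapState and reusing the hypothetical OperationRecord fields from the earlier sketch:

object_cache = {}    # <object identifier, object feature> associations
resource_cache = {}  # <resource identifier, resource feature> associations

def cache_operation_data(records):
    # records: operation data accumulated over one preset time period
    for rec in records:
        # identical object identifiers share the same object features, so one entry suffices
        object_cache.setdefault(rec.object_id, rec.object_features)
        resource_cache.setdefault(rec.resource_id, rec.resource_features)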
In one embodiment, converting each object identifier into corresponding hash data, and searching for a corresponding target index in a preset index table through each hash data respectively includes:
converting each object identifier into corresponding hash data through a hash function, and determining a conversion value corresponding to each object identifier according to the hash data and the length of a preset index table; and respectively searching corresponding target indexes in a preset index table through the conversion values.
Specifically, the computer device obtains a hash function, and takes each object identifier as an input of the hash function, so as to convert each object identifier into corresponding hash data through the hash function. The computer device determines the length of a preset index table, where the length of the preset index table is the number of indexes existing in the preset index table or the number of positions included in the preset index table.
And the computer equipment calculates a conversion value corresponding to the object identifier according to the hash data corresponding to the object identifier and the length of a preset index table. According to the same processing mode, the conversion value corresponding to each object identification can be obtained. Further, the hash data is a hash value.
The computer device searches index values respectively identical to the conversion values in a preset index table to serve as target indexes corresponding to the conversion values. According to the same processing mode, the target index corresponding to each conversion value can be obtained.
In one embodiment, the computer device performs remainder processing on the hash data corresponding to the object identifier and the length of the preset index table to obtain a conversion value corresponding to the object identifier. Further, a remainder obtained by dividing the hash data corresponding to the object identifier by the length of the preset index table is used as a conversion value corresponding to the object identifier.
In this embodiment, each object identifier is converted into corresponding hash data through a hash function, so that a conversion value corresponding to each object identifier is determined according to each piece of hash data and the length of the preset index table. The corresponding target index can then be searched for accurately and quickly in the preset index table through each conversion value, so that the processing unit to which each object identifier needs to be distributed for processing can be determined, and uniform distribution of the object identifiers is realized.
In one embodiment, as shown in fig. 4, the preset index table is obtained through a construction step, and the construction step includes:
step S402, constructing a null index table, and converting each candidate object identifier into corresponding candidate hash data respectively.
Specifically, the computer device may construct a null index table with a preset length, where the null index table includes a plurality of locations for storing addresses of the processing units, and the number of the locations is the preset length of the null index table. For example, if the empty index table includes 10 positions, the length of the empty index table is 10.
The computer device can obtain each candidate object identifier and the hash function, and take each candidate object identifier as the input of the hash function respectively, so as to convert each candidate object identifier into corresponding candidate hash data respectively through the hash function.
In one embodiment, the computer device may determine the number of candidate object identifiers and construct a null index table of a preset length according to the number of candidate object identifiers. The number of candidate identifications is the same as the number of locations in the empty index table.
Step S404, determining indexes of all positions in the empty index table according to all candidate hash data.
Specifically, the computer device sets indexes corresponding to the positions in the empty index table according to the candidate hash data. Further, the computer device may use each candidate hash data as an index corresponding to each position in the empty index table.
In one embodiment, the computer device calculates a candidate conversion value corresponding to the candidate object identifier according to the candidate hash data corresponding to the candidate object identifier and the length of the empty index table. According to the same processing mode, the candidate conversion value corresponding to each candidate object identifier can be obtained. Further, the candidate hash data is a candidate hash value. The computer device indexes each candidate conversion value as a position in the empty index table, it being understood that a single candidate conversion value is an index of a single position.
In one embodiment, the computer device performs remainder processing on the candidate hash data corresponding to the candidate object identifier and the length of the empty index table to obtain a candidate conversion value corresponding to the candidate object identifier. Further, a remainder obtained by dividing the candidate hash data corresponding to the candidate object identifier by the length of the empty index table is used as a candidate conversion value corresponding to the candidate object identifier. And allocating each candidate conversion value to the position of the empty index table as the index of the corresponding position.
For example, the computer device may determine the index of each location in the empty index table by the following formula:
index=hash(id)%M
wherein index is the index, id is the object identifier, hash(id) is the hash data obtained by converting the object identifier, and M is the length of the empty index table.
Step S406, filling the positions corresponding to the indexes in the empty index table with the addresses of the processing units to obtain a preset index table.
Specifically, the computer device obtains an address of each processing unit, and fills the address of each processing unit to each position in the empty index table based on each index in the empty index table to obtain a preset index table.
In one embodiment, the address of the processing unit is the unit index of the processing unit. And the computer equipment fills the unit indexes of the processing units to all positions in the empty index table based on all indexes in the empty index table to obtain a preset index table.
In this embodiment, an empty index table is constructed, each candidate object identifier is converted into corresponding candidate hash data, and an index of each position in the empty index table is determined according to each candidate hash data, so that the candidate object identifier can be mapped to each position in the empty index table in a hash data form. The addresses of the processing units are filled in the positions corresponding to the indexes in the empty index table through the addresses of the processing units, so that the addresses of the processing units are stored in the empty index table, the corresponding processing units can be quickly searched from the preset index table based on the object identifiers in subsequent operation data processing, the data of the object identifiers are distributed to the corresponding processing units for processing, and uniform distribution of the data is realized.
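Steps S402 and S404 can be sketched as below, following the formula index = hash(id) % M above. MD5 is used only as a stand-in hash function, and setting the table length equal to the number of candidate object identifiers follows one of the embodiments above; both choices are assumptions for illustration.

import hashlib

def build_empty_index_table(candidate_ids):
    # table length M equals the number of candidate object identifiers in this sketch
    M = len(candidate_ids)
    table = [None] * M       # positions to be filled with processing-unit addresses
    indexes = []
    for candidate_id in candidate_ids:
        candidate_hash = int(hashlib.md5(candidate_id.encode()).hexdigest(), 16)
        indexes.append(candidate_hash % M)   # candidate conversion value used as a position index
    return table, indexes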
In one embodiment, the obtaining the preset index table by filling the address of each processing unit with the position corresponding to each index in the empty index table includes:
generating a random value in a list of each processing unit through two random hash functions and the unit identification of each processing unit; the number of random values in the list of each processing unit is the same as the number of positions in the empty index table; and filling the address of each processing unit to each position in the empty index table based on the random value in the list of each processing unit to obtain a preset index table.
In particular, the computer device may build a corresponding list for each processing unit, one list for each processing unit.
The computer device may obtain two unrelated random hash functions and determine a unit identification for each processing unit. The unit identification may be a unit name. The computer device generates a random value in the list of processing units by means of two random hash functions and the unit identification of the processing unit. In the same processing manner, random values in the list of each processing unit may be generated separately.
The number of random values in the list of processing units is the same as the number of locations in the empty index table. For example, if the empty index table contains 10 locations, the list of each processing unit contains 10 random values.
In one embodiment, the two random hash functions may be different hash functions based on MD5.
And the computer equipment fills the address of the corresponding processing unit to the corresponding position of the empty index table according to the random value in each list and the index corresponding to each position in the empty index table, and obtains the preset index table after filling.
In this embodiment, the random value in the list of each processing unit is generated through two random hash functions and the unit identifier of each processing unit, and the two independent and unrelated hash functions can reduce the collision frequency of the generated random values, improve the randomness of data in the list, and ensure that the random values in the list of each processing unit are uniformly distributed. In addition, the addresses of the processing units can be uniformly filled to the positions in the empty index table based on random values uniformly distributed in the list of the processing units, so that the preset index table is obtained.
In one embodiment, as shown in fig. 5, generating the random value in the list of each processing unit by two random hash functions and the unit identification of each processing unit includes:
step S502, based on the first hash function, the unit identifier of each processing unit and the length of the empty index table, determining the offset corresponding to each processing unit.
Specifically, the computer device may obtain the first hash function and the unit identifier of each processing unit, and determine the length of the empty index table. The computer device can input the unit identifier of a processing unit into the first hash function to perform hash conversion processing, and determine the offset corresponding to the processing unit according to the numerical value obtained by the hash conversion processing and the length of the empty index table. In the same processing manner, the offset corresponding to each processing unit can be obtained.
Further, the computer device may input the unit identifier of the processing unit into a first hash function to perform hash conversion processing, divide a numeric value obtained by the hash conversion processing by the length of the empty index table to obtain a remainder, and use the remainder as an offset corresponding to the processing unit.
For example, the computer device may calculate the offset corresponding to the processing unit by the following formula:
offset=hash_1(slot_i)%M
wherein offset is the offset, slot_i is the unit identifier of the i-th processing unit slot, hash_1 is the first hash function, and M is the length of the empty index table.
Step S504, based on the second hash function, the unit identification of each processing unit and the length of the empty index table, determining the jump amount corresponding to each processing unit; the second hash function is different from the first hash function.
In particular, the second hash function is different from the first hash function. The computer device may obtain the second hash function and the unit identifier of each processing unit, and determine the length of the empty index table. The computer device can input the unit identifier of a processing unit into the second hash function to perform hash conversion processing, and determine the jump amount corresponding to the processing unit according to the numerical value obtained by the hash conversion processing and the length of the empty index table. In the same processing manner, the jump amount corresponding to each processing unit can be obtained.
Further, the computer device may input the unit identifier of the processing unit into the second hash function to perform hash conversion processing, determine the difference between the length of the empty index table and a preset numerical value, divide the numerical value obtained by the hash conversion processing by the difference to obtain a remainder, and take the sum of the remainder and the preset numerical value as the jump amount corresponding to the processing unit. For example, the preset numerical value may be 1.
For example, the computer device may calculate the amount of jump corresponding to the processing unit by the following formula:
skip=hash_2(slot_i)%(M-1)+1
wherein skip is the jump amount, slot_i is the unit identifier of the i-th processing unit slot, hash_2 is the second hash function, and M is the length of the empty index table.
In one embodiment, the first hash function and the second hash function may be two different hash functions of the same type, for example, two different hash functions derived from MD5. The first hash function and the second hash function are independent of each other.
In step S506, a random value in the list of each processing unit is determined according to the offset and the jump amount corresponding to each processing unit.
Specifically, the computer device generates each random value in a list of the processing unit according to the offset and the jump amount corresponding to the processing unit, wherein the number of the random values in the list is the same as the number of the positions in the empty index table. In the same way, the respective random values in the list for each processing unit are obtained.
For example, the computer device may generate each random value in the list of processing units by the following formula:
SlotList_i[j]=(offset+j*skip)%M
wherein SlotList_i[j] is the j-th random value in the list of the i-th processing unit slot, offset is the offset of the i-th processing unit slot, skip is the jump amount of the i-th processing unit slot, and M is the length of the empty index table.
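As an illustrative, non-authoritative sketch of the above formulas (Python is used here, and the helper names make_slot_list and stable_hash as well as the choice of salted MD5 as the two hash functions are assumptions made for the example only), the list of random values for one processing unit may be generated as follows:

import hashlib

def stable_hash(name, salt):
    # Two independent hash functions are obtained here by salting MD5 differently (an assumption).
    return int(hashlib.md5((salt + name).encode("utf-8")).hexdigest(), 16)

def make_slot_list(slot_name, M):
    offset = stable_hash(slot_name, "hash_1") % M           # offset = hash_1(slot_i) % M
    skip = stable_hash(slot_name, "hash_2") % (M - 1) + 1   # skip = hash_2(slot_i) % (M - 1) + 1
    return [(offset + j * skip) % M for j in range(M)]      # SlotList_i[j] = (offset + j * skip) % M

# Example: an index table of length M = 7 and one slot named "slot_1".
print(make_slot_list("slot_1", 7))

When M is a prime number, every skip value in 1..M-1 is coprime with M, so the generated list visits each index of the table exactly once; this is why double hashing keeps the random values in each list uniformly distributed.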
In this embodiment, the offset corresponding to each processing unit is determined based on the first hash function, the unit identifier of each processing unit, and the length of the empty index table, and the jump amount corresponding to each processing unit is determined based on the second hash function, the unit identifier of each processing unit, and the length of the empty index table. The second hash function is different from the first hash function, the randomness of the generated offset and the generated jump amount can be ensured, and the random value in the list of each processing unit is determined according to the offset and the jump amount corresponding to each processing unit, so that the collision frequency of the generated random values is reduced, the randomness of data in each list is improved, and the random values in the list of each processing unit are uniformly distributed.
In one embodiment, as shown in fig. 6, based on the random value in the list of each processing unit, filling the address of each processing unit to each position in the empty index table, to obtain the preset index table, including:
step S602 selects a target random value from the list of each processing unit, and searches the empty index table for a target position corresponding to the same index as the target random value.
Specifically, the computer device selects a target list from the lists after generating random values in the list for each processing unit, and selects a target random value from the respective random values in the target list.
In one embodiment, the computer device may select a first list from the lists as a target list and a first random value from the random values in the target list as a target random value.
The computer device looks up the index which is the same as the target random value in the empty index table, and determines the target position corresponding to the index which is the same as the target random value.
Step S604, when the target position is not filled with an address, filling the address of the processing unit to which the list containing the target random value belongs to the target position.
In particular, the computer device may determine whether the address of the processing unit has been populated at the target location. And when the target position is not filled with the address, acquiring the address of the processing unit to which the list where the target random value is located belongs, and filling the acquired address to the target position.
Step S606, selecting a next target random value from the next list after the list in which the target random value is located, and returning to the step of searching for a target position corresponding to an index that is the same as the target random value in the empty index table and continuing to execute until the filling stop condition is satisfied, thereby obtaining the preset index table.
Satisfying the filling stop condition means that the filling of the empty index table is completed, that is, every position in the empty index table has been filled with a corresponding processing unit address.
Specifically, after filling the address of the processing unit to which the list in which the target random value is located belongs to the target position, the computer device may use the next list adjacent to the list in which the target random value is located as the target list. Then, the computer device searches the target random value from the random values in the target list, and returns to the step of searching the target position corresponding to the index which is the same as the target random value in the empty index table and continues to execute, so as to fill the address of the corresponding processing unit in the target position of each unfilled address. After the addresses of the corresponding processing units are filled in the target positions each time, whether the filling stop conditions are met or not can be determined, and the method stops when the filling stop conditions are met to obtain the preset index table.
In one embodiment, after the address of the corresponding processing unit is filled into the target position each time, it is determined whether any position with an unfilled address remains in the current index table. If such a position remains, the next target random value is selected, and the step of searching for the target position corresponding to the index that is the same as the target random value in the empty index table is returned to and continued. If no position with an unfilled address remains in the current index table, the filling stop condition is met, and the filling of the current index table is completed to obtain the preset index table.
In this embodiment, a target random value is selected from the list of each processing unit, and a target position corresponding to an index that is the same as the target random value is searched in an empty index table, so that the random value of the processing unit is used as a condition for data filling of the position in the index table. And when the target position is not filled with the address, filling the address of the processing unit to which the list where the target random value is located belongs to the target position, selecting a next target random value from the next list of the list where the target random value is located, returning to the step of searching the target position corresponding to the index which is the same as the target random value in the empty index table, and continuing to execute the steps until the filling stop condition is met, so that the addresses of the processing units can be uniformly filled into the index table based on the random values of the processing units, and the condition that only the address of a single processing unit is filled in a single position in the index table is ensured. The preset index table is subsequently used for processing the operation data, so that the data can be effectively and uniformly distributed to different processing units for processing, the balanced distribution of processing resources is realized, the resource utilization rate is improved, and the real-time processing efficiency of mass operation data can be improved.
In one embodiment, the method further comprises:
and when the target position is filled with the address, selecting a next target random value from the list where the target random value is located, returning to the step of searching the target position corresponding to the index which is the same as the target random value in the empty index table, and continuing to execute the step until the filling stop condition is met, so as to obtain a preset index table.
In particular, the computer device may determine whether the address of the processing unit has been populated at the target location. When the target location has been filled with an address, the computer device selects the next target random value from the list in which the target random value is located. And after the next target random value is selected, returning to the step of searching the target position corresponding to the index which is the same as the target random value in the empty index table and continuing to execute so as to fill the address of the corresponding processing unit in the target position of each unfilled address. After the addresses of the corresponding processing units are filled in the target positions each time, whether the filling stop conditions are met or not can be determined, and the method stops when the filling stop conditions are met to obtain the preset index table.
In one embodiment, after the address of the corresponding processing unit is filled into the target position each time, it is determined whether any position with an unfilled address remains in the current index table. If such a position remains, the next target random value is selected, and the step of searching for the target position corresponding to the index that is the same as the target random value in the empty index table is returned to and continued. If no position with an unfilled address remains in the current index table, the filling stop condition is met, and the filling of the current index table is completed to obtain the preset index table.
In one embodiment, when the target location has been populated with an address, the computer device selects the next random value adjacent to the target random value from the list in which the target random value is located as the next target random value.
In this embodiment, when the target location is filled with the address, the next target random value is selected from the list where the target random value is located, and the step of looking up the target location corresponding to the index that is the same as the target random value in the empty index table is returned and is continuously executed, so that the addresses of the processing units can be uniformly filled into the index table based on the random values of the processing units, and it is ensured that only the address of a single processing unit is filled in a single location in the index table. The preset index table is subsequently used for processing the operation data, so that the data can be effectively and uniformly distributed to different processing units for processing, the balanced distribution of processing resources is realized, the resource utilization rate is improved, and the real-time processing efficiency of mass operation data can be improved.
For example, suppose there are 3 processing units slot1, slot2, and slot3; the empty index table is filled as follows:
the random values in list 1 corresponding to processing unit slot1 are [3, 0, 4, 1, 5];
the random values in list 2 corresponding to processing unit slot2 are [0, 2, 4, 6, 1];
the random values in list 3 corresponding to processing unit slot3 are [3, 4, 5, 6, 0];
(1) First, the first random value 3 of slot1 is traversed, and the address of slot1 (namely the index of slot1) is filled into the position with index=3 in the index table;
(2) then, the first random value 0 of slot2 is traversed, and the address of slot2 is filled into the position with index=0 in the index table;
(3) next, the first random value 3 of slot3 is traversed; because it collides with index=3 filled in (1), that is, the position with index 3 in the index table is already filled with an address, the traversal continues to the second random value 4 of slot3, and the address of slot3 is filled into the position with index=4 in the index table;
(4) then, the traversal returns to the second random value 0 of slot1; because it collides with index=0 filled in (2), that is, the position with index 0 in the index table is already filled with an address, the traversal continues to the third random value 4 of slot1; this value collides with index=4 filled in (3), that is, the position with index 4 in the index table is already filled with an address, so the traversal continues to the fourth random value 1 of slot1, and the address of slot1 is filled into the position with index=1 in the index table;
the above steps are repeated until every position in the index table is filled, and the preset index table is obtained.
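A minimal runnable sketch of the above filling loop follows (the three slot lists and the table length M = 5 here are assumptions chosen so that each list is a full permutation of the table indexes; marking unfilled positions with -1 is also an assumption of the sketch):

def fill_index_table(slot_lists, M):
    table = [-1] * M                     # -1 marks a position that has not been filled yet
    cursors = {name: 0 for name in slot_lists}
    slot_names = list(slot_lists)        # the slot's index i serves as its stored "address"
    n = 0                                # number of positions already filled
    while n < M:
        for i, name in enumerate(slot_names):
            if n == M:
                break
            j = cursors[name]
            # Skip random values whose target position has already been filled.
            while j < M and table[slot_lists[name][j]] != -1:
                j += 1
            if j < M:
                table[slot_lists[name][j]] = i    # fill the address of slot i at that position
                n += 1
                j += 1
            cursors[name] = j
    return table

# Assumed example: 3 slots, table length M = 5, each list a permutation of 0..4.
slot_lists = {
    "slot1": [3, 0, 4, 1, 2],
    "slot2": [0, 2, 4, 1, 3],
    "slot3": [3, 4, 0, 1, 2],
}
print(fill_index_table(slot_lists, 5))   # [1, 0, 1, 0, 2]

Each slot keeps its own cursor j, so a random value that already collided in an earlier round is never revisited, and each position of the table ends up holding the address of exactly one processing unit.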
In one embodiment, acquiring, by each processing unit, a corresponding object feature based on the received object identifier and a corresponding resource feature based on the received resource identifier includes:
and acquiring, by each processing unit, the object characteristics corresponding to the received object identifier from the cache space, and acquiring the resource characteristics corresponding to the received resource identifier from the cache space.
Specifically, after each processing unit receives the object identifier and the associated resource identifier, each processing unit obtains the object feature corresponding to the received object identifier from the cache space, and obtains the resource feature corresponding to the received resource identifier from the cache space. For example, each processing unit receives < object id, resource id >, and each processing unit can look up < object id, object feature > and < resource id, resource feature > in the cache space.
In this embodiment, the object identifier and the corresponding object feature, the resource identifier and the corresponding resource feature are stored in the cache space, and are only distributed to different processing units by using the object identifier and the resource identifier, and the processing units acquire the corresponding feature from the cache space based on the received object identifier and the received resource identifier, so that the data amount of data distribution can be reduced, and the problems of data distribution errors, data loss and the like can be avoided. And each processing unit can accurately acquire corresponding characteristic data from the cache space through the object identification and the resource identification for processing, so that the accuracy of the data can be ensured. Moreover, the object characteristics and the resource characteristics required by the plurality of processing units are acquired, so that parallel data processing can be realized, and the data processing efficiency is improved.
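A minimal sketch of this lookup-and-splice step (the cache contents, the dictionary layout of the features, and the function name splice are assumptions made only for illustration):

# Cache space: identifiers mapped to their features (contents assumed for the example).
object_cache = {"u1": {"age_bucket": 3, "city": "SZ"}}
resource_cache = {"r9": {"category": "game", "price": 68.0}}

def splice(pairs):
    # For each <object id, resource id> pair received by a processing unit,
    # look up both features in the cache space and splice them into one record.
    samples = []
    for obj_id, res_id in pairs:
        obj_feat = object_cache.get(obj_id)
        res_feat = resource_cache.get(res_id)
        if obj_feat is None or res_feat is None:
            continue                     # feature not yet cached; skip this pair
        samples.append({"object_id": obj_id, **obj_feat, **res_feat})
    return samples

print(splice([("u1", "r9")]))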
Fig. 7 is a flowchart illustrating an operation of the data processing method according to an embodiment.
The computer device collects operation data under different systems through Software Development Kit (SDK) tracking points (embedded points), where the operation data includes object identifiers with their corresponding object data, and resource identifiers with their corresponding resource data.
The computer device compresses repeated object identifiers and their corresponding object data in the operation data into a single piece of data to reduce the data volume, then packages and reports the compressed data together with the associated resource identifiers and resource data to a back-end server, and the back-end server sends the received data to Kafka. Kafka (Apache Kafka) is an open-source stream processing platform that can process all of a user's action stream data on a website.
The computer device may extract object features from the object data and resource features from the resource data. And storing the object identification and the corresponding object characteristic to MapState according to the format of < object identification, object characteristic >, and storing the resource identification and the corresponding resource characteristic to MapState according to < resource identification, resource characteristic >.
The computer device uniformly distributes the object identifiers and the associated resource identifiers to the corresponding processing units according to the object identifiers, and the join operator of each processing unit associates the corresponding object features and resource features according to the received object identifier and resource identifier to obtain the target feature data. The join operator is an association operator and constitutes the processing logic of a processing unit.
The processing unit sends the target feature data to a data collector, which writes the target feature data in batches into the distributed file system HDFS in the specific format required by the prediction model.
In one embodiment, the computer device may generate an object representation corresponding to each object identifier based on each target feature data.
In one embodiment, the method further comprises:
inputting each piece of target feature data into the prediction model, and outputting the predicted click rate of each object identifier for the corresponding target resource; and adjusting parameters of the prediction model according to the difference between each predicted click rate and the corresponding expected click rate until the training stop condition is reached, so as to obtain an updated target prediction model.
Specifically, the target feature data includes an object identifier, an object feature, and a resource feature, where the resource feature is the feature of the target resource whose resource identifier is associated with the object identifier. The computer device may input each piece of target feature data into the prediction model, so that the prediction model outputs, for each piece of target feature data, the predicted click rate of the corresponding object identifier for the corresponding target resource. The predicted click rate refers to the predicted probability that the object corresponding to the object identifier clicks the target resource. The computer device obtains the expected click rate, determines the difference between each predicted click rate and the expected click rate, adjusts the parameters of the prediction model according to these differences, and continues training until the training stop condition is reached, thereby obtaining the trained target prediction model.
It can be understood that, according to the above training mode, the trained prediction model can be updated in real time to obtain an updated target prediction model.
In this embodiment, the training stop condition may be at least one of that the loss error of the prediction model is less than or equal to an error threshold, that the training iteration reaches a preset iteration number, that the training iteration time reaches a preset iteration time, and the like.
For example, the loss error generated by the prediction model in each training is calculated, the parameters of the prediction model are adjusted based on the difference between the loss error and the error threshold value, and the training is continued until the training is stopped, so that a trained target prediction model or an updated target prediction model is obtained.
For example, the terminal calculates the iteration times of the prediction model in the training process, and stops training when the iteration times of the training process reach the preset iteration times to obtain the trained target prediction model or the updated target prediction model.
In this embodiment, the target feature data includes object identifiers, object features, and resource features; each piece of target feature data is input into the prediction model, and the predicted click rate of each object identifier for the corresponding target resource is output. The parameters of the prediction model are adjusted according to the difference between each predicted click rate and the corresponding expected click rate until the training stop condition is reached, so as to obtain an updated target prediction model. The influence of multiple factors on the prediction model can thus be fully considered, so that the prediction accuracy of the prediction model can be improved through training. Through the trained target prediction model or the updated target prediction model, the resource click rate of each target object for the resources to be processed can be accurately estimated.
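By way of a hedged illustration of this training procedure (the logistic-regression model, the learning rate, the log-loss objective, and the toy data are assumptions of the sketch; the embodiment does not prescribe a particular model structure):

import math

def train_ctr_model(samples, epochs=100, lr=0.1, loss_threshold=0.05):
    # samples: list of (feature_vector, expected_click) pairs; expected_click is 1.0 or 0.0.
    dim = len(samples[0][0])
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):                           # stop condition: preset iteration number
        total_loss = 0.0
        for x, y in samples:
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            p = 1.0 / (1.0 + math.exp(-z))            # predicted click rate
            total_loss -= y * math.log(p + 1e-12) + (1 - y) * math.log(1 - p + 1e-12)
            grad = p - y                              # difference between predicted and expected click rate
            w = [wi - lr * grad * xi for wi, xi in zip(w, x)]
            b -= lr * grad
        if total_loss / len(samples) <= loss_threshold:   # stop condition: loss below threshold
            break
    return w, b

# Assumed toy target feature data: [object feature, resource feature] and whether a click occurred.
data = [([1.0, 0.2], 1.0), ([0.1, 0.9], 0.0), ([0.8, 0.3], 1.0), ([0.2, 0.7], 0.0)]
print(train_ctr_model(data))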
In one embodiment, the method further comprises:
when a prediction request is received, acquiring target object characteristics corresponding to a target object and to-be-processed resource characteristics corresponding to-be-processed resources from the prediction request; inputting the target object characteristics and the characteristics of the resources to be processed into a target prediction model to obtain the resource click rate output by the target prediction model; the resource click rate represents the probability of the target object clicking the resource to be processed.
The resource click rate represents the probability that the target object clicks the resource to be processed, and can be used to estimate the conversion rate of the resource to be processed. For example, for a certain user object, it estimates whether the user object will click the resource to be processed, or the probability that the user object clicks the resource to be processed and then further downloads it, places an order, and so on.
The prediction request is a request for predicting the resource click rate of the resource to be processed. The prediction request comprises a target object, target object characteristics corresponding to the target object and to-be-processed resource characteristics corresponding to-be-processed resources.
Specifically, when the computer device receives a prediction request of the resource click rate, the target object feature corresponding to the target object and the to-be-processed resource feature corresponding to the to-be-processed resource are obtained from the prediction request. And the computer equipment inputs the target object characteristics and the characteristics of the resources to be processed into the target prediction model to obtain the probability of clicking the resources to be processed by the target object output by the target prediction model, namely the resource click rate.
In this embodiment, when a prediction request is received, a target object feature corresponding to a target object and a to-be-processed resource feature corresponding to a to-be-processed resource are obtained from the prediction request. And predicting based on the target object characteristics and the to-be-processed resource characteristics by using the target prediction model, so that the probability of clicking the to-be-processed resource by the target object can be quickly and accurately obtained.
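A corresponding inference sketch (the request layout, the weights, and the helper name predict_click_rate are assumptions; it reuses the linear model form assumed in the training sketch above):

import math

def predict_click_rate(target_object_features, pending_resource_features, w, b):
    # Concatenate the target object features with the to-be-processed resource features,
    # then apply the (assumed) trained linear model to obtain the resource click rate.
    x = target_object_features + pending_resource_features
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))

# Assumed prediction request and assumed previously trained parameters.
request = {"object_features": [1.0, 0.2], "resource_features": [0.5, 0.1]}
w, b = [0.9, -0.4, 0.3, 0.1], -0.2
print(predict_click_rate(request["object_features"], request["resource_features"], w, b))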
In one embodiment, the resource to be processed is a resource to be promoted; the method further comprises the following steps: selecting target popularization resources with resource click rates meeting push conditions from the resources to be popularized; and pushing the target popularization resource to the terminal corresponding to the corresponding target object identifier.
Specifically, the resource to be processed is a resource to be promoted, the computer device can respectively predict the resource click rate corresponding to each resource to be promoted through the target prediction model, and the resource to be promoted, of which the resource click rate is greater than the click rate threshold value, is selected as the target promotion resource. And the computer equipment pushes the target popularization resource to a terminal corresponding to the corresponding target object.
In one embodiment, the computer device may respectively determine the resource click rate of each target object for the same resource to be promoted, compare the resource click rate with a click rate threshold, and determine the target number of the resource click rate greater than the click rate threshold. And when the target quantity is greater than the quantity threshold, taking the resources to be promoted corresponding to the target quantity as target promotion resources, and pushing the target promotion resources to a terminal corresponding to a target object with the resource click rate greater than the click rate threshold.
In this embodiment, the target popularization resources whose resource click rates satisfy the push condition are selected from the resources to be promoted, and the target popularization resources are pushed to the terminals corresponding to the corresponding target objects, so that the target popularization resources can be screened out based on the estimated resource click rates and the screened resources have the highest conversion rates, thereby effectively improving the conversion rate of the promoted resources.
In one embodiment, the method further comprises: selecting a target object with a resource click rate meeting a pushing condition based on the resources to be promoted; and pushing the corresponding resource to be promoted to the terminal corresponding to the target object.
Specifically, the computer device can determine the resource click rate of each target object for the resource to be promoted through the target prediction model, and screen out the target objects with the resource click rate larger than the click rate threshold value. And the computer equipment pushes the corresponding resource to be promoted to the terminal corresponding to the screened target object.
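A small sketch of this screening step (the click-rate threshold of 0.3 and the record layout are assumptions for illustration only):

def select_push_targets(click_rates, threshold=0.3):
    # click_rates maps (target object id, resource id) -> predicted resource click rate.
    # Returns the push list: which resource is pushed to which object's terminal.
    return [(obj_id, res_id) for (obj_id, res_id), p in click_rates.items() if p > threshold]

rates = {("u1", "ad_1"): 0.42, ("u1", "ad_2"): 0.11, ("u2", "ad_1"): 0.35}
print(select_push_targets(rates))   # [('u1', 'ad_1'), ('u2', 'ad_1')]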
In one embodiment, as shown in fig. 8, there is provided an operation data processing method including:
step S802, generating a preset index table:
(1) building an index table
An empty index table LookupMap with length M is established. When a user ID is input, it is mapped to a position of the index table by taking the remainder of a hash function, so as to obtain the index of that position. The specific calculation formula is as follows:
index=hash(ID)%M
(2) building a list of processing unit slots
A list SlotList with length M is constructed for each processing unit slot, where the number of processing unit slots is N. The length of each list is the same as the length of the index table, namely M (that is, the number of random values in a list is the same as the number of indexes in the index table, namely M). The number of lists is the same as the number of processing unit slots, namely N.
Generation of random values in the ith list SlotList _ i:
Two unrelated hash functions, hash_1 and hash_2, are taken; the name of the i-th processing unit slot is slot_i. The offset and the skip amount skip corresponding to the processing unit slot are calculated with the two hash functions as follows:
offset=hash_1(slot_i)%M;
skip=hash_2(slot_i)%(M-1)+1;
SlotList_i[j]=(offset+j*skip)%M
Two independent and unrelated hash functions are used to reduce the number of collisions in the mapping results, so that the random values generated in the list corresponding to each slot are uniform and random.
(3) Filling index table
Initialize i, j, n: i = 0, j = 0, n = 0
The number n of positions already filled with an address in the index table LookupMap is initialized to 0.
Circularly fill the index table: while true do
Traverse each processing unit slot; for the i-th slot: for each i < N do
j denotes the j-th random value in the list of the i-th slot: c = SlotList_i[j]
While the position corresponding to index c in the index table has already been filled with an address: while LookupMap[c] is filled do
increment j by 1 and continue searching: j = j + 1
c = SlotList_i[j]
The inner loop ends once a position with an unfilled address is found: end while
Assign the address of slot i to position c of the index table: LookupMap[c] = i
j = j + 1
n = n + 1
When all positions of the index table have been filled with addresses, end the loop: if n = M then break
end for
end while
Step S804, data collection and reporting to a server:
Object identifiers and the object data corresponding to each object identifier under different systems are acquired through Software Development Kit (SDK) tracking points (embedded points), where each object identifier is associated with resource identifiers and resource data.
The computer device compresses repeated object ids and their corresponding object data in the operation data into a single piece of data, so that the data volume can be compressed to about 1/10 of the original. The compressed data and the associated resource identifiers and resource data are packaged and reported as operation data to a back-end server, and the back-end server sends the received data to Kafka.
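An illustrative sketch of this compression-and-packaging step (the record layout and the function name compress_operation_data are assumptions; how close the reduction comes to 1/10 depends on how often an object id repeats in the raw data):

def compress_operation_data(raw_records):
    # raw_records: list of dicts like
    # {"object_id": ..., "object_data": ..., "resource_id": ..., "resource_data": ...}.
    # Repeated object data is kept only once per object id; associated resources are grouped.
    grouped = {}
    for rec in raw_records:
        entry = grouped.setdefault(rec["object_id"],
                                   {"object_data": rec["object_data"], "resources": []})
        entry["resources"].append({"resource_id": rec["resource_id"],
                                   "resource_data": rec["resource_data"]})
    return grouped

raw = [
    {"object_id": "u1", "object_data": {"city": "SZ"}, "resource_id": "r1", "resource_data": {"cat": "game"}},
    {"object_id": "u1", "object_data": {"city": "SZ"}, "resource_id": "r2", "resource_data": {"cat": "news"}},
]
print(compress_operation_data(raw))   # u1's object data is stored once, with two associated resources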
It can be understood that the object identifier in this embodiment is a user id, the object data is user data, and the object feature is a user feature; the resource identification is an article id, the resource data is article data, and the resource characteristic is an article characteristic.
Step S806, parsing the operation data:
Flink consumes the operation data from Kafka and parses the raw data in each Kafka topic. The operation data may include user IDs, user data corresponding to the user IDs, and article IDs and article data associated with the user IDs. Flink extracts user features from the user data and article features from the article data. Flink is a real-time processing framework for performing stateful computations on streaming data.
Step S808, distributing data according to the user ID:
Because the user data is compressed, the user features and the article features in each piece of reported data are in a one-to-many relationship, so Flink cannot take user ID + article ID as the key when distributing the data; nor can the data be distributed to downstream processing unit slots directly and randomly according to the hash value of the user ID. The magnitude of the user data stream is very large, and the processing units are elastic and may be scaled continuously, so the number of slots changes; if the data were distributed to downstream processing unit slots directly according to the hash value of the user ID, the overall data distribution logic would fluctuate greatly whenever the number of slots increases or decreases, causing uneven data distribution.
Therefore, in this embodiment, a user-defined data stream processing operator is implemented, and the main logic of a data stream join in the user-defined operator is:
and (3) retrieving the address of the corresponding processing unit slot from the preset index table through hash user ID redundancy, and then sending the user ID and the associated article ID to the slot for processing in a format of < user ID, article ID >. The user ID is distributed to the corresponding processing unit, so that a plurality of user characteristics and article characteristics corresponding to the same user ID can be distributed to the same processing unit slot, aggregation statistics of the user characteristics and the article characteristics can be performed by candidates, and screening of training samples of a subsequent prediction model is facilitated.
Step S810, storing the feature data:
Flink stores the user ID and the corresponding user feature into MapState in the format <user ID, user feature>, and stores the article ID and the corresponding article feature into MapState in the format <article ID, article feature>.
In MapState, data is stored in the form of Array[Byte]. Specifically, data is stored in the format MapState<Long, Array[Byte]>, keyed by the whole timestamp, and is deserialized only when the feature association is finally performed, so that the serialization and deserialization overhead of reading the features multiple times is reduced, the garbage collection (GC) load is reduced, and the throughput is increased.
Serialization (Serialization) is the process of converting the state information of an object into a form that can be stored or transmitted, and the object can be recreated by reading or deserializing the state of the object from storage.
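An illustrative sketch of this idea (Flink's MapState is a Java/Scala API; here a plain Python dict keyed by a timestamp stands in for MapState<Long, Array[Byte]>, and pickle stands in for the actual serializer, both assumptions):

import pickle
import time

state = {}                               # stands in for MapState<Long, Array[Byte]>

def put_features(features):
    ts = int(time.time())                # whole timestamp used as the key
    state[ts] = pickle.dumps(features)   # store serialized bytes; no per-read deserialization

def join_features():
    # Deserialize only once, at the moment the feature association is actually performed.
    return [pickle.loads(blob) for blob in state.values()]

put_features({"user_id": "u1", "user_feature": {"city": "SZ"}})
print(join_features())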
Step S812, setting a timer to associate the user characteristic and the item characteristic:
A timer is registered in the user-defined operator, and the timer is stored in the built-in RocksDB by setting a Flink parameter, which prevents excessive timers from occupying memory. The timer associates the user ID, the user feature and the article feature every 30 minutes to obtain the target feature data.
Specifically, each processing unit slot queries <user ID, user feature data> in MapState according to the user ID to obtain the user feature, and looks up <article ID, article feature data> in MapState according to the article ID to obtain the article feature. Finally, the user ID, the user-side feature and the article-side feature are spliced into one piece of target feature data <user ID, user feature, article feature>.
Step S814, writing the target feature data into a training data set:
The target feature data is sent to a data collector and written in batches into a directory of the distributed file system HDFS in the specific format required by the prediction model.
Step S816, training and updating the prediction model:
The downstream click-rate prediction model loads the target feature data on HDFS as real-time training data, performs online training, and updates the prediction model in real time. When a prediction request arrives, the prediction model predicts the probability that the user clicks the corresponding article and passes this probability to the sorting module as a click-rate factor; the sorting module sorts all articles corresponding to the prediction request based on its sorting strategy and returns the top-ranked article as the result.
The operation data processing method provided in the embodiment can ensure the real-time performance of data processing, and simultaneously ensure the performance, throughput and resource utilization rate of real-time processing under billions of real-time data streams and TB-level data.
In one embodiment, an operation data processing method is provided, which is applied to a computer device and comprises the following steps:
constructing an empty index table, and respectively converting each candidate object identifier into corresponding candidate hash values; and determining the index of each position in the empty index table according to each candidate hash value and the length of the empty index table.
Then, determining the offset corresponding to each processing unit based on the first hash function, the unit identification of each processing unit and the length of the empty index table; determining the jump amount corresponding to each processing unit based on the second hash function, the unit identification of each processing unit and the length of the empty index table; the second hash function is different from the first hash function.
Further, according to the offset and the jump amount corresponding to each processing unit, determining a random value in the list of each processing unit; the number of random values in the list of each processing unit is the same as the number of locations in the empty index table.
Next, a target random value is selected from the list of each processing unit, and a target position corresponding to the same index as the target random value is searched for in the empty index table.
Optionally, when the target position is not filled with the address, filling the address of the processing unit to which the list where the target random value is located belongs to the target position; and selecting the next target random value from the next list of the list where the target random value is located, returning to the step of searching the target position corresponding to the index which is the same as the target random value in the empty index table, and continuing to execute the step.
Optionally, when the target position is filled with the address, selecting a next target random value from the list where the target random value is located, returning to the step of searching the target position corresponding to the index which is the same as the target random value in the empty index table and continuing to execute the step until the filling of the empty index table is completed, so as to obtain the preset index table.
Acquiring operation data obtained by operating each target resource every preset time length; the operational data includes an object identification and an associated resource identification.
Then, acquiring an object identifier and object characteristics from the operation data, and storing the object identifier and the corresponding object characteristics into a cache space in an associated manner; and acquiring resource identifications and resource characteristics respectively corresponding to the target resources from the operation data, and storing each resource identification and the corresponding resource characteristics in a cache space in an associated manner.
Then, respectively converting each object identification into corresponding hash data through a hash function, and determining a conversion value respectively corresponding to each object identification according to each hash data and the length of a preset index table; and respectively searching corresponding target indexes in a preset index table through the conversion values.
Further, the address stored in the corresponding position of each target index is determined, and each object identifier and the associated resource identifier are distributed to the processing unit at the corresponding address.
Then, the object features corresponding to the received object identifiers are obtained from the buffer space through each processing unit, and the resource features corresponding to the received resource identifiers are obtained from the buffer space.
Further, the object characteristics of each object identifier are spliced with the resource characteristics of the resource identifier associated with the object identifier through each processing unit, so that target characteristic data is obtained.
Optionally, inputting each piece of target feature data into the prediction model, and outputting the predicted click rate of each object identifier for the corresponding target resource; and adjusting parameters of the prediction model according to the difference between each predicted click rate and the corresponding expected click rate until the training stop condition is reached, so as to obtain an updated target prediction model.
Optionally, when a prediction request is received, the target object feature corresponding to the target object and the to-be-processed resource feature corresponding to the to-be-processed resource are obtained from the prediction request.
Secondly, inputting the target object characteristics and the to-be-processed resource characteristics into a target prediction model to obtain the resource click rate output by the target prediction model; the resource click rate represents the probability of the target object clicking the resource to be processed.
In this embodiment, an empty index table is constructed, each candidate object identifier is converted into corresponding candidate hash data, and an index of each position in the empty index table is determined according to each candidate hash data, so that the candidate object identifier can be mapped to each position in the empty index table in a hash data form.
The random value in the list of each processing unit is generated through two random hash functions and the unit identification of each processing unit, the collision times of the generated random values can be reduced by utilizing the two independent and unrelated hash functions, the randomness of data in the list is improved, and the uniform distribution of the random values in the list of each processing unit can be ensured. And the addresses of the processing units can be uniformly filled to the positions in the empty index table based on the random values uniformly distributed in the list of the processing units, so that a preset index table is obtained for subsequent operation data processing.
And acquiring operation data obtained by operating each target resource every preset time length so as to perform batch processing on the operation data in the preset time length. The object identification and the object characteristic are obtained from the operation data and stored in the cache space in an associated mode, the resource identification and the resource characteristic corresponding to each target resource are obtained from the operation data, each resource identification and the corresponding resource characteristic are stored in the cache space in an associated mode, the object identification, the object characteristic, the resource identification and the resource characteristic can be temporarily stored in the cache space, when the data are distributed to different processing units subsequently, the object identification and the resource identification are only needed to be distributed, the distributed data quantity is reduced, and transmission resources are saved.
The object identifications are respectively converted into corresponding hash data through a hash function, and conversion values corresponding to the object identifications are determined according to the hash data and the length of a preset index table, so that corresponding target indexes can be accurately and quickly searched in the preset index table through the conversion values, and the processing unit to which the object identifications need to be distributed can be determined for processing, and the object identifications and the resource identifications are uniformly distributed. And load balancing of the processing units and balanced allocation of processing resources can be achieved.
And the distributed data is the object identification and the associated resource identification, so that the distributed data is small in quantity and high in transmission speed, and the data distribution efficiency can be effectively improved. The processing units acquire corresponding object characteristics based on the received object identifiers and acquire corresponding resource characteristics based on the received resource identifiers, so that the object characteristics of the object identifiers and the resource characteristics of the resource identifiers associated with the object identifiers can be spliced respectively through the processing units to obtain target characteristic data, the processing efficiency of the operation data can be effectively improved through the processing units, data response can be timely carried out, and timeliness is better.
The prediction model can be trained and updated in real time through the target characteristic data, so that the resource click rate of the resource to be processed can be accurately predicted by the prediction model.
It is understood that the object information and related data involved in the embodiments are all information and data collected after being authorized or fully authorized by each party. Object information includes, but is not limited to, object device information and object personal information, such as object identifiers, object data, and object features; related data includes, but is not limited to, data used for presentation, analyzed data, and the like, such as target resources, resource identifiers, and resource features. In addition, an object may choose not to authorize the use of the object information and related data, and may also refuse or conveniently refuse pushed information and the like.
It should be understood that although the steps in the flowcharts of fig. 2-8 are shown in an order indicated by the arrows, these steps are not necessarily performed in that order. Unless explicitly stated otherwise herein, the steps are not performed in a strictly limited order and may be performed in other orders. Moreover, at least some of the steps in fig. 2-8 may include multiple sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times, and which are not necessarily performed in sequence but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
In one embodiment, as shown in fig. 9, an operating data processing apparatus 900 is provided, which may be a part of a computer device using software modules or hardware modules, or a combination of the two, and specifically includes: an identity acquisition module 902, a conversion module 904, a distribution module 906, a feature acquisition module 908, and a concatenation module 910, wherein:
an identifier obtaining module 902, configured to obtain operation data obtained by operating each target resource; the operational data includes an object identification and an associated resource identification.
The converting module 904 is configured to convert each object identifier into corresponding hash data, and search a preset index table for a corresponding target index through each hash data.
A distributing module 906, configured to determine addresses stored in positions corresponding to the target indexes, and distribute the object identifiers and the associated resource identifiers to the processing units at the corresponding addresses.
A feature obtaining module 908, configured to obtain, by each processing unit, a corresponding object feature based on the received object identifier, and obtain a corresponding resource feature based on the received resource identifier.
The splicing module 910 is configured to splice, by using each processing unit, the object characteristics of each object identifier with the resource characteristics of the resource identifier associated with the object identifier to obtain target characteristic data; the target characteristic data is used for training a prediction model, and the prediction model is used for predicting the resource click rate of the resource to be processed.
In this embodiment, operation data obtained by operating each target resource is obtained, where the operation data includes an object identifier and an associated resource identifier, each object identifier is converted into corresponding hash data, and a corresponding target index can be accurately found in a preset index table through each hash data. And determining addresses stored in positions corresponding to the target indexes, and distributing the object identifications and the associated resource identifications to the processing units at the corresponding addresses, so that uniform distribution of data is realized, load balance of the processing units is realized, and balanced distribution of processing resources is realized. And the distributed data is the object identification and the associated resource identification, so that the distributed data is small in quantity and high in transmission speed, and the data distribution efficiency can be effectively improved. The processing units acquire corresponding object characteristics based on the received object identifiers and acquire corresponding resource characteristics based on the received resource identifiers, so that the object characteristics of the object identifiers and the resource characteristics of the resource identifiers associated with the object identifiers can be spliced respectively through the processing units to obtain target characteristic data, the processing efficiency of the operation data can be effectively improved through the processing units, data response can be timely carried out, and timeliness is better. The prediction model can be trained in real time through the target characteristic data, so that the resource click rate of the resource to be processed can be accurately predicted by the prediction model.
In an embodiment, the identifier obtaining module 902 is further configured to obtain operation data obtained by operating each target resource every preset time period;
the device also comprises a cache module; the cache module is used for acquiring the object identification and the object characteristics from the operation data and storing the object identification and the corresponding object characteristics into a cache space in an associated manner; and acquiring resource identifications and resource characteristics respectively corresponding to the target resources from the operation data, and storing each resource identification and the corresponding resource characteristics in a cache space in an associated manner.
In this embodiment, operation data obtained by operating each target resource is acquired every preset time length, so as to perform batch processing on the operation data within the preset time length. The object identification and the object characteristic are obtained from the operation data and stored in the cache space in an associated mode, the resource identification and the resource characteristic corresponding to each target resource are obtained from the operation data, each resource identification and the corresponding resource characteristic are stored in the cache space in an associated mode, the object identification, the object characteristic, the resource identification and the resource characteristic can be temporarily stored in the cache space, when the data are distributed to different processing units subsequently, the object identification and the resource identification are only needed to be distributed, the distributed data volume is reduced, transmission resources are saved, and the data distribution efficiency can be effectively improved.
In an embodiment, the converting module 904 is further configured to convert each object identifier into corresponding hash data through a hash function, and determine a conversion value corresponding to each object identifier according to each hash data and a length of the preset index table; and respectively searching corresponding target indexes in a preset index table through the conversion values.
In this embodiment, each object identifier is converted into corresponding hash data through a hash function, so as to determine a conversion value corresponding to each object identifier according to each piece of hash data and the length of the preset index table, and thus the corresponding target index is accurately and quickly searched in the preset index table through each conversion value, so that it can be determined to which processing unit each object identifier needs to be distributed for processing, realizing uniform distribution of the object identifiers.
In one embodiment, the apparatus further comprises a building module; the building module is used for building an empty index table and converting each candidate object identifier into corresponding candidate hash data respectively; determining indexes of all positions in the empty index table according to all candidate hash data; and filling the positions corresponding to the indexes in the empty index table by the addresses of the processing units to obtain a preset index table.
In this embodiment, an empty index table is constructed, each candidate object identifier is converted into corresponding candidate hash data, and an index of each position in the empty index table is determined according to each candidate hash data, so that the candidate object identifier can be mapped to each position in the empty index table in a hash data form. The addresses of the processing units are filled in the positions corresponding to the indexes in the empty index table through the addresses of the processing units, so that the addresses of the processing units are stored in the empty index table, the corresponding processing units can be quickly searched from the preset index table based on the object identifiers in subsequent operation data processing, the data of the object identifiers are distributed to the corresponding processing units for processing, and uniform distribution of the data is realized.
In one embodiment, the building module is further configured to generate a random value in the list of each processing unit through two random hash functions and the unit identifier of each processing unit; the number of random values in the list of each processing unit is the same as the number of positions in the empty index table; and filling the address of each processing unit to each position in the empty index table based on the random value in the list of each processing unit to obtain a preset index table.
In this embodiment, the random value in the list of each processing unit is generated through two random hash functions and the unit identifier of each processing unit, and the two independent and unrelated hash functions can reduce the collision frequency of the generated random values, improve the randomness of data in the list, and ensure that the random values in the list of each processing unit are uniformly distributed. In addition, the addresses of the processing units can be uniformly filled to the positions in the empty index table based on random values uniformly distributed in the list of the processing units, so that the preset index table is obtained.
In one embodiment, the building module is further configured to determine, based on the first hash function, the unit identifier of each processing unit, and the length of the empty index table, an offset corresponding to each processing unit; determining the jump amount corresponding to each processing unit based on the second hash function, the unit identification of each processing unit and the length of the empty index table; the second hash function is different from the first hash function; and determining a random value in the list of each processing unit according to the offset and the jump amount corresponding to each processing unit.
In this embodiment, the offset corresponding to each processing unit is determined based on the first hash function, the unit identifier of each processing unit, and the length of the empty index table, and the jump amount corresponding to each processing unit is determined based on the second hash function, the unit identifier of each processing unit, and the length of the empty index table. The second hash function is different from the first hash function, the randomness of the generated offset and the generated jump amount can be ensured, and the random value in the list of each processing unit is determined according to the offset and the jump amount corresponding to each processing unit, so that the collision frequency of the generated random values is reduced, the randomness of data in each list is improved, and the random values in the list of each processing unit are uniformly distributed.
In one embodiment, the building module is further configured to select a target random value from the list of each processing unit, and search a null index table for a target position corresponding to an index that is the same as the target random value; when the target position is not filled with the address, filling the address of the processing unit to which the target random value belongs in the list to the target position; and selecting the next target random value from the next list of the list where the target random value is located, returning to the step of searching the target position corresponding to the index which is the same as the target random value in the empty index table, and continuing to execute the step until the filling stop condition is met, so as to obtain a preset index table.
In this embodiment, a target random value is selected from the list of each processing unit, and the target position whose index is the same as the target random value is searched for in the empty index table, so that the random values of the processing units drive the filling of positions in the index table. When the target position has not been filled with an address, the target position is filled with the address of the processing unit to whose list the target random value belongs; the next target random value is then selected from the list following the list in which the current target random value is located, and the process returns to the step of searching the empty index table for the target position whose index is the same as the target random value and continues until the filling stop condition is met. In this way, the addresses of the processing units are uniformly filled into the index table based on the random values of the processing units, and each position in the index table holds the address of exactly one processing unit. When the preset index table is subsequently used to process operation data, the data can be evenly distributed to different processing units, which balances the allocation of processing resources, improves resource utilization, and improves the efficiency of processing massive operation data in real time.
In an embodiment, the building module is further configured to, when the target position has already been filled with an address, select the next target random value from the same list in which the current target random value is located, return to the step of searching the empty index table for the target position whose index is the same as the target random value, and continue execution until the filling stop condition is met, so as to obtain the preset index table.
In this embodiment, when the target position has already been filled with an address, the next target random value is selected from the same list in which the current target random value is located, and the process returns to the step of searching the empty index table for the target position whose index is the same as the target random value and continues. In this way, the addresses of the processing units are uniformly filled into the index table based on the random values of the processing units, and each position in the index table holds the address of exactly one processing unit. When the preset index table is subsequently used to process operation data, the data can be evenly distributed to different processing units, which balances the allocation of processing resources, improves resource utilization, and improves the efficiency of processing massive operation data in real time.
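A minimal sketch of the filling procedure described in the two embodiments above is given below. It assumes the random_value_list helper from the previous sketch, uses unit identifiers in place of real processing-unit addresses, and treats "all positions filled" as the filling stop condition; these choices are for illustration only.

```python
def build_preset_index_table(unit_ids: list[str], table_len: int) -> list[str]:
    """Fill every position of the empty index table with exactly one unit address."""
    lists = {uid: random_value_list(uid, table_len) for uid in unit_ids}
    cursor = {uid: 0 for uid in unit_ids}     # next target random value in each list
    table = [None] * table_len                # the empty index table
    filled = 0
    while filled < table_len:                 # filling stop condition
        for uid in unit_ids:
            # Advance within this unit's list until an unfilled target position is found.
            while True:
                target = lists[uid][cursor[uid]]
                cursor[uid] += 1
                if table[target] is None:     # target position not yet filled
                    table[target] = uid       # fill it with this unit's address
                    filled += 1
                    break
            if filled == table_len:
                break
    return table
```

With, say, a prime table length of 13 and three unit identifiers, every position ends up holding one of the three addresses, in roughly equal proportions.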
In an embodiment, the feature obtaining module 908 is further configured to obtain, by each processing unit, an object feature corresponding to the received object identifier from the cache space, and obtain, from the cache space, a resource feature corresponding to the received resource identifier.
In this embodiment, the object identifiers with their corresponding object features and the resource identifiers with their corresponding resource features are stored in the cache space, and only the object identifiers and resource identifiers are distributed to the different processing units; each processing unit then obtains the corresponding features from the cache space based on the identifiers it receives. This reduces the amount of data that has to be distributed and avoids problems such as distribution errors and data loss. Because each processing unit retrieves its feature data from the cache space by object identifier and resource identifier, the accuracy of the data is ensured. Moreover, the object features and resource features required by the processing units are obtained in parallel, which improves data processing efficiency.
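The per-unit lookup and the subsequent splicing into target feature data can be pictured as follows; the dictionaries standing in for the cache space and the list-concatenation form of splicing are assumptions made only for illustration.

```python
# Hypothetical cache space, keyed by identifier; the real cache layout is not specified here.
object_feature_cache: dict[str, list[float]] = {}
resource_feature_cache: dict[str, list[float]] = {}


def assemble_target_features(pairs: list[tuple[str, str]]) -> list[list[float]]:
    """Runs inside one processing unit on the (object id, resource id) pairs it received."""
    rows = []
    for object_id, resource_id in pairs:
        object_feature = object_feature_cache[object_id]        # lookup by object identifier
        resource_feature = resource_feature_cache[resource_id]  # lookup by resource identifier
        rows.append(object_feature + resource_feature)          # splice into one target feature row
    return rows
```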
In one embodiment, the apparatus further comprises a training module; the training module is configured to input each target feature data into the prediction model and output the predicted click rate of each object identifier for the corresponding target resource, and to adjust the parameters of the prediction model according to the difference between each predicted click rate and the corresponding expected click rate until a training stop condition is reached, so as to obtain an updated target prediction model.
In this embodiment, the target feature data includes object identifiers, object features, and resource features. Each target feature data is input into the prediction model, the predicted click rate of each object identifier for the corresponding target resource is output, and the parameters of the prediction model are adjusted according to the difference between each predicted click rate and the corresponding expected click rate until the training stop condition is reached, yielding an updated target prediction model. Because the influence of multiple factors on the prediction model is fully considered, training improves the prediction precision of the model. With the trained target prediction model or the updated target prediction model, the resource click rate of each target object for a resource to be processed can be accurately estimated.
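The training loop might look like the sketch below, which uses a simple logistic-regression stand-in for the prediction model; the model form, the learning rate, and the loss-improvement criterion used as the training stop condition are assumptions, since the text does not fix them.

```python
import numpy as np


def train_prediction_model(features: np.ndarray, expected_ctr: np.ndarray,
                           lr: float = 0.1, max_epochs: int = 200,
                           tol: float = 1e-5) -> np.ndarray:
    """features: one spliced target-feature row per sample; expected_ctr: expected click rates."""
    w = np.zeros(features.shape[1])
    prev_loss = float("inf")
    for _ in range(max_epochs):
        pred = 1.0 / (1.0 + np.exp(-features @ w))            # predicted click rate per sample
        diff = pred - expected_ctr                            # difference to the expected click rate
        loss = -float(np.mean(expected_ctr * np.log(pred + 1e-12)
                              + (1 - expected_ctr) * np.log(1 - pred + 1e-12)))
        w -= lr * features.T @ diff / len(features)           # adjust the model parameters
        if prev_loss - loss < tol:                            # training stop condition
            break
        prev_loss = loss
    return w
```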
In one embodiment, the apparatus further comprises a prediction module; the prediction module is configured to, when a prediction request is received, obtain from the prediction request the target object features corresponding to a target object and the to-be-processed resource features corresponding to a to-be-processed resource, and to input the target object features and the to-be-processed resource features into the target prediction model to obtain the resource click rate output by the target prediction model; the resource click rate represents the probability of the target object clicking the to-be-processed resource.
In this embodiment, when a prediction request is received, the target object features corresponding to the target object and the to-be-processed resource features corresponding to the to-be-processed resource are obtained from the prediction request, and the target prediction model performs prediction based on these features, so that the probability of the target object clicking the to-be-processed resource can be obtained quickly and accurately.
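Serving a prediction request can then be sketched as below, reusing the weights from the previous training sketch; concatenating the target object features with the to-be-processed resource features mirrors the splicing used for training, and the function name is illustrative.

```python
def predict_resource_click_rate(w: np.ndarray,
                                target_object_features: list[float],
                                pending_resource_features: list[float]) -> float:
    """Probability that the target object clicks the to-be-processed resource."""
    x = np.asarray(target_object_features + pending_resource_features)
    return float(1.0 / (1.0 + np.exp(-x @ w)))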
In one embodiment, the resource to be processed is a resource to be promoted; the apparatus further comprises a pushing module; the pushing module is configured to select, from the resources to be promoted, target promotion resources whose resource click rates meet a pushing condition, and to push each target promotion resource to the terminal corresponding to the associated target object identifier.
In one embodiment, the pushing module is further configured to select, for each resource to be promoted, the target objects whose resource click rates for that resource satisfy the pushing condition, and to push the corresponding resource to be promoted to the terminals corresponding to those target objects.
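As a small illustration of the pushing condition, the sketch below keeps only the resources whose predicted click rate reaches a threshold; the threshold value and the dictionary layout are assumptions, since the text leaves the pushing condition open.

```python
def select_target_promotion_resources(click_rates: dict[str, float],
                                      threshold: float = 0.5) -> list[str]:
    """Return resource identifiers whose resource click rate meets the pushing condition."""
    return [rid for rid, rate in click_rates.items() if rate >= threshold]
```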
For specific limitations of the operation data processing apparatus, reference may be made to the above limitations of the operation data processing method, which are not repeated here. Each module in the above operation data processing apparatus may be implemented wholly or partially by software, hardware, or a combination thereof. The modules may be embedded, in hardware form, in the processor of the computer device or be independent of it, or may be stored, in software form, in the memory of the computer device, so that the processor can invoke and execute the operations corresponding to each module.
In one embodiment, a computer device is provided, which may be a server or a terminal. Taking a server as an example, its internal structure may be as shown in fig. 10. The computer device includes a processor, a memory, and a network interface connected by a system bus. The processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device comprises a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for running the operating system and the computer program in the non-volatile storage medium. The database of the computer device is used for storing data involved in operation data processing. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program, when executed by the processor, implements an operation data processing method.
Those skilled in the art will appreciate that the architecture shown in fig. 10 is merely a block diagram of some of the structures associated with the disclosed aspects and is not intended to limit the computer devices to which the disclosed aspects apply; a particular computer device may include more or fewer components than those shown, combine certain components, or have a different arrangement of components.
In one embodiment, a computer device is further provided, which includes a memory and a processor, the memory stores a computer program, and the processor implements the steps of the above method embodiments when executing the computer program.
In an embodiment, a computer-readable storage medium is provided, in which a computer program is stored which, when being executed by a processor, carries out the steps of the above-mentioned method embodiments.
In one embodiment, a computer program product or computer program is provided that includes computer instructions stored in a computer-readable storage medium. The computer instructions are read by a processor of a computer device from a computer-readable storage medium, and the computer instructions are executed by the processor to cause the computer device to perform the steps in the above-mentioned method embodiments.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program instructing relevant hardware; the computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the embodiments of the methods described above. Any reference to memory, storage, database or other medium used in the embodiments provided herein can include at least one of non-volatile and volatile memory. Non-volatile memory may include Read-Only Memory (ROM), magnetic tape, floppy disk, flash memory, optical storage, or the like. Volatile memory can include Random Access Memory (RAM) or external cache memory. By way of illustration and not limitation, RAM can take many forms, such as Static Random Access Memory (SRAM) or Dynamic Random Access Memory (DRAM), among others.
The technical features of the above embodiments can be combined arbitrarily. For brevity, not all possible combinations of the technical features in the above embodiments are described; however, as long as a combination of technical features involves no contradiction, it should be considered to fall within the scope of this specification.
The above embodiments express only several implementations of the present application, and their description is relatively specific and detailed, but they should not be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and improvements without departing from the concept of the present application, and these fall within the protection scope of the present application. Therefore, the protection scope of this patent shall be subject to the appended claims.

Claims (15)

1. A method of operational data processing, the method comprising:
acquiring operation data obtained by operating each target resource; the operation data comprises an object identifier and an associated resource identifier;
respectively converting each object identifier into corresponding hash data, and respectively searching corresponding target indexes in a preset index table through each hash data;
determining addresses stored in positions corresponding to the target indexes, and distributing the object identifiers and the associated resource identifiers to processing units at corresponding addresses;
acquiring, by each processing unit, a corresponding object feature based on the received object identifier and a corresponding resource feature based on the received resource identifier;
splicing the object characteristics of the object identifiers with the resource characteristics of the resource identifiers associated with the object identifiers through the processing units to obtain target characteristic data; the target characteristic data is used for training a prediction model, and the prediction model is used for predicting the resource click rate of the resource to be processed.
2. The method according to claim 1, wherein the obtaining operation data obtained by operating on each target resource comprises:
acquiring operation data obtained by operating each target resource every preset time length;
the method further comprises the following steps:
acquiring an object identifier and object characteristics from the operation data, and storing the object identifier and the corresponding object characteristics into a cache space in an associated manner;
and acquiring resource identification and resource characteristics respectively corresponding to each target resource from the operation data, and storing each resource identification and corresponding resource characteristics into a cache space in an associated manner.
3. The method according to claim 1, wherein the converting each object identifier into corresponding hash data, and searching a corresponding target index in a preset index table through each hash data respectively, comprises:
converting each object identifier into corresponding hash data through a hash function, and determining a conversion value corresponding to each object identifier according to the hash data and the length of a preset index table;
and respectively searching corresponding target indexes in the preset index table through the conversion values.
4. The method according to claim 1, wherein the predetermined index table is obtained by a constructing step, and the constructing step comprises:
constructing an empty index table, and respectively converting each candidate object identifier into corresponding candidate hash data;
determining indexes of all positions in the empty index table according to all the candidate hash data;
and filling positions corresponding to the indexes in the empty index table through the addresses of the processing units to obtain a preset index table.
5. The method according to claim 4, wherein the filling, by the address of each processing unit, the position corresponding to each index in the empty index table to obtain a preset index table includes:
generating a random value in a list of each processing unit through two random hash functions and unit identifiers of each processing unit; the number of random values in the list of each processing unit is the same as the number of positions in the empty index table;
and filling the address of each processing unit to each position in the empty index table based on the random value in the list of each processing unit to obtain a preset index table.
6. The method of claim 5, wherein generating the random value in the list of each processing unit by two random hash functions and the unit identifier of each processing unit comprises:
determining the offset corresponding to each processing unit respectively based on the first hash function, the unit identification of each processing unit and the length of the empty index table;
determining jump amount respectively corresponding to each processing unit based on a second hash function, unit identification of each processing unit and length of the empty index table; the second hash function is different from the first hash function;
and determining a random value in the list of each processing unit according to the offset and the jumping amount corresponding to each processing unit.
7. The method of claim 5, wherein the populating an address of each processing unit to each location in the empty index table based on the random value in the list of each processing unit to obtain a preset index table, comprises:
selecting a target random value from the list of each processing unit, and searching a target position corresponding to an index which is the same as the target random value in the empty index table;
when the target position has not been filled with an address, filling the target position with the address of the processing unit to whose list the target random value belongs;
and selecting a next target random value from the list following the list in which the target random value is located, returning to the step of searching the target position corresponding to the index that is the same as the target random value in the empty index table, and continuing execution until a filling stop condition is met, so as to obtain the preset index table.
8. The method of claim 7, further comprising:
and when the target position has been filled with an address, selecting a next target random value from the list in which the target random value is located, returning to the step of searching the target position corresponding to the index that is the same as the target random value in the empty index table, and continuing execution until the filling stop condition is met, so as to obtain the preset index table.
9. The method of claim 1, wherein the obtaining, by each of the processing units, a corresponding object feature based on the received object identifier and a corresponding resource feature based on the received resource identifier comprises:
and acquiring the object characteristics corresponding to the received object identification from the cache space through each processing unit, and acquiring the resource characteristics corresponding to the received resource identification from the cache space.
10. The method according to any one of claims 1 to 9, further comprising:
inputting the target characteristic data into a prediction model, and outputting the predicted click rate of each object identifier for the corresponding target resource;
and adjusting parameters of the prediction model according to the difference between each predicted click rate and the corresponding expected click rate until a training stop condition is reached, so as to obtain an updated target prediction model.
11. The method of claim 10, further comprising:
when a prediction request is received, acquiring, from the prediction request, target object characteristics corresponding to a target object and to-be-processed resource characteristics corresponding to a to-be-processed resource;
inputting the target object characteristics and the to-be-processed resource characteristics into the target prediction model to obtain the resource click rate output by the target prediction model; and the resource click rate represents the probability of the target object clicking the resource to be processed.
12. An operational data processing apparatus, characterized in that the apparatus comprises:
the identification acquisition module is used for acquiring operation data obtained by operating each target resource; the operation data comprises an object identifier and an associated resource identifier;
the conversion module is used for respectively converting each object identifier into corresponding hash data and respectively searching corresponding target indexes in a preset index table through each hash data;
the distribution module is used for determining the address stored in the position corresponding to each target index and distributing each object identifier and the associated resource identifier to the processing unit at the corresponding address;
the characteristic acquisition module is used for acquiring corresponding object characteristics based on the received object identification through each processing unit and acquiring corresponding resource characteristics based on the received resource identification;
the splicing module is used for splicing the object characteristics of the object identifiers with the resource characteristics of the resource identifiers associated with the object identifiers through the processing units to obtain target characteristic data; the target characteristic data is used for training a prediction model, and the prediction model is used for predicting the resource click rate of the resource to be processed.
13. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor realizes the steps of the method of any one of claims 1 to 11 when executing the computer program.
14. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 11.
15. A computer program product comprising a computer program, characterized in that the computer program realizes the steps of the method of any one of claims 1 to 11 when executed by a processor.
CN202111296917.7A 2021-11-02 2021-11-02 Operation data processing method and device, computer equipment and storage medium Pending CN114327857A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111296917.7A CN114327857A (en) 2021-11-02 2021-11-02 Operation data processing method and device, computer equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111296917.7A CN114327857A (en) 2021-11-02 2021-11-02 Operation data processing method and device, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
CN114327857A true CN114327857A (en) 2022-04-12

Family

ID=81044517

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111296917.7A Pending CN114327857A (en) 2021-11-02 2021-11-02 Operation data processing method and device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114327857A (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114580794A (en) * 2022-05-05 2022-06-03 腾讯科技(深圳)有限公司 Data processing method, apparatus, program product, computer device and medium
CN114580794B (en) * 2022-05-05 2022-07-22 腾讯科技(深圳)有限公司 Data processing method, apparatus, program product, computer device and medium
CN115150448A (en) * 2022-06-14 2022-10-04 北京车网科技发展有限公司 Session data processing method, system, storage medium and electronic device
CN115150448B (en) * 2022-06-14 2023-08-25 北京车网科技发展有限公司 Session data processing method, system, storage medium and electronic equipment
CN117235078A (en) * 2023-11-15 2023-12-15 湖南速子文化科技有限公司 Method, system, device and storage medium for processing mass data at high speed
CN117235078B (en) * 2023-11-15 2024-01-30 湖南速子文化科技有限公司 Method, system, device and storage medium for processing mass data at high speed

Similar Documents

Publication Publication Date Title
CN111709533B (en) Distributed training method and device of machine learning model and computer equipment
Liu et al. A task scheduling algorithm based on classification mining in fog computing environment
CN114327857A (en) Operation data processing method and device, computer equipment and storage medium
Zhao et al. User-based collaborative-filtering recommendation algorithms on hadoop
CN107391502B (en) Time interval data query method and device and index construction method and device
CN110909182A (en) Multimedia resource searching method and device, computer equipment and storage medium
CN111355816B (en) Server selection method, device, equipment and distributed service system
CN112800095A (en) Data processing method, device, equipment and storage medium
CN110347651A (en) Method of data synchronization, device, equipment and storage medium based on cloud storage
CN113687964B (en) Data processing method, device, electronic equipment, storage medium and program product
CN111078723A (en) Data processing method and device for block chain browser
CN116414559A (en) Method for modeling and distributing unified computing power identification, storage medium and electronic equipment
CN112445776B (en) Presto-based dynamic barrel dividing method, system, equipment and readable storage medium
CN107844536B (en) Method, device and system for selecting application program
CN115687810A (en) Webpage searching method and device and related equipment
CN112231481A (en) Website classification method and device, computer equipment and storage medium
CN113971455A (en) Distributed model training method and device, storage medium and computer equipment
CN104636474A (en) Method and equipment for establishment of audio fingerprint database and method and equipment for retrieval of audio fingerprints
CN111369007B (en) Method and device for online artificial intelligent model
CN113010775B (en) Information recommendation method and device and computer equipment
CN115129981A (en) Information recommendation method, device, equipment and storage medium
CN114817344A (en) Data acquisition method and device
Cao Design and Implementation of Human‐Computer Interaction System in Parallel Digital Library System Based on Neural Network
CN110334067A (en) A kind of sparse matrix compression method, device, equipment and storage medium
CN113055476B (en) Cluster type service system, method, medium and computing equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
REG Reference to a national code (Ref country code: HK; Ref legal event code: DE; Ref document number: 40071029; Country of ref document: HK)