CN116776239A - Data processing method, device, electronic equipment, product and medium - Google Patents

Data processing method, device, electronic equipment, product and medium

Publication number
CN116776239A
Authority
CN
China
Prior art keywords
resource
resources
feature
classification
content
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210228214.9A
Other languages
Chinese (zh)
Inventor
刘刚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN202210228214.9A
Publication of CN116776239A
Legal status: Pending

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiments of the application disclose a data processing method, a data processing device, electronic equipment, a product and a medium, which can be applied to the technical field of data processing. The method comprises the following steps: acquiring a resource sample pair constructed based on the access behavior of an object set for a resource set, where the first resource and the second resource contained in the resource sample pair have similarity of resource content, and the object set contains M objects that have access behavior for the first resource and do not have access behavior for the second resource; acquiring a reference resource in the resource set for which a target object has access behavior; and training a feature generation network based on the reference resource and the resource sample pair to obtain a trained feature generation network, where the trained feature generation network is used for generating resource classification features of resources. With the embodiments of the application, resources can be classified more accurately. The embodiments of the application can be applied to various scenarios such as cloud technology, artificial intelligence, intelligent transportation and assisted driving, and can also be used for accurately recommending resources.

Description

Data processing method, device, electronic equipment, product and medium
Technical Field
The present application relates to the field of data processing technologies, and in particular, to a data processing method, a data processing device, an electronic device, a product, and a medium.
Background
Many applications recommend resources to their users, and the classification of resources is particularly important in resource recommendation.
At present, resources are usually classified based on their content. However, the inventor has found in practice that, after resources placed in the same class based on their content are recommended, users' access behavior toward those resources may differ greatly, which means that this classification approach classifies resources with low accuracy.
Disclosure of Invention
The embodiment of the application provides a data processing method, a data processing device, electronic equipment, products and media, which are beneficial to more accurately classifying resources.
In one aspect, an embodiment of the present application discloses a data processing method, which includes:
acquiring a resource sample pair constructed based on the access behavior of an object set for a resource set; the resource set comprises K resources, K being a positive integer; the resource sample pair comprises a first resource and a second resource in the resource set, and the first resource and the second resource have similarity of resource content; the object set comprises N objects, N being a positive integer; M objects among the N objects have access behavior for the first resource and do not have access behavior for the second resource, where M is a positive integer that is greater than or equal to a threshold number and less than or equal to N;
determining a target object from the M objects, and acquiring a reference resource in the resource set for which the target object has access behavior; the first moment at which the target object accesses the reference resource is adjacent to the second moment at which the target object accesses the first resource;
training a feature generation network based on the reference resource and the resource sample pair to obtain a trained feature generation network; the trained feature generation network is used for generating resource classification features of resources.
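The second step above, selecting a reference resource, can be illustrated with a minimal Python sketch. It assumes that "adjacent" means neighboring positions in the target object's time-ordered access sequence; that reading, the function name `reference_resources`, and the `window` parameter are all assumptions made for illustration and are not defined in the application.

```python
from typing import List, Tuple

def reference_resources(access_log: List[Tuple[float, str]],
                        first_resource: str,
                        window: int = 1) -> List[str]:
    """Return resources accessed adjacent in time to first_resource.

    access_log holds (timestamp, resource_id) entries for one target object.
    "Adjacent" is interpreted here (as an assumption) as being within
    `window` positions of first_resource in the time-ordered sequence.
    """
    ordered = sorted(access_log)            # order accesses by timestamp
    ids = [rid for _, rid in ordered]
    refs: List[str] = []
    for i, rid in enumerate(ids):
        if rid != first_resource:
            continue
        lo, hi = max(0, i - window), min(len(ids), i + window + 1)
        for j in range(lo, hi):
            # skip the first resource itself and avoid duplicates
            if ids[j] != first_resource and ids[j] not in refs:
                refs.append(ids[j])
    return refs

log = [(30.0, "v3"), (10.0, "v1"), (20.0, "v2"), (40.0, "v4")]
print(reference_resources(log, "v2"))
```

A time-difference bound could be used instead of a positional window; the text does not say which interpretation is intended.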
In one aspect, an embodiment of the present application discloses a data processing apparatus, including:
an acquisition unit, used for acquiring a resource sample pair constructed based on the access behavior of an object set for a resource set; the resource set comprises K resources, K being a positive integer; the resource sample pair comprises a first resource and a second resource in the resource set, and the first resource and the second resource have similarity of resource content; the object set comprises N objects, N being a positive integer; M objects among the N objects have access behavior for the first resource and do not have access behavior for the second resource, where M is a positive integer that is greater than or equal to a threshold number and less than or equal to N;
a processing unit, used for determining a target object from the M objects and acquiring a reference resource in the resource set for which the target object has access behavior; the first moment at which the target object accesses the reference resource is adjacent to the second moment at which the target object accesses the first resource;
the processing unit is also used for training a feature generation network based on the reference resource and the resource sample pair to obtain a trained feature generation network; the trained feature generation network is used for generating resource classification features of resources.
In one aspect, an embodiment of the present application provides an electronic device, including a processor, and a memory, where the memory is configured to store a computer program, the computer program including program instructions, and the processor is configured to perform the steps of:
acquiring a resource sample pair constructed based on the access behavior of an object set for a resource set; the resource set comprises K resources, K being a positive integer; the resource sample pair comprises a first resource and a second resource in the resource set, and the first resource and the second resource have similarity of resource content; the object set comprises N objects, N being a positive integer; M objects among the N objects have access behavior for the first resource and do not have access behavior for the second resource, where M is a positive integer that is greater than or equal to a threshold number and less than or equal to N;
determining a target object from the M objects, and acquiring a reference resource in the resource set for which the target object has access behavior; the first moment at which the target object accesses the reference resource is adjacent to the second moment at which the target object accesses the first resource;
training a feature generation network based on the reference resource and the resource sample pair to obtain a trained feature generation network; the trained feature generation network is used for generating resource classification features of resources.
In one aspect, embodiments of the present application provide a computer readable storage medium having stored therein computer program instructions which, when executed by a processor, are adapted to perform the steps of:
acquiring a resource sample pair constructed based on the access behavior of an object set for a resource set; the resource set comprises K resources, K being a positive integer; the resource sample pair comprises a first resource and a second resource in the resource set, and the first resource and the second resource have similarity of resource content; the object set comprises N objects, N being a positive integer; M objects among the N objects have access behavior for the first resource and do not have access behavior for the second resource, where M is a positive integer that is greater than or equal to a threshold number and less than or equal to N;
determining a target object from the M objects, and acquiring a reference resource in the resource set for which the target object has access behavior; the first moment at which the target object accesses the reference resource is adjacent to the second moment at which the target object accesses the first resource;
training a feature generation network based on the reference resource and the resource sample pair to obtain a trained feature generation network; the trained feature generation network is used for generating resource classification features of resources.
In one aspect, embodiments of the present application provide a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. A processor of a computer device reads the computer instructions from the computer readable storage medium and executes them, causing the computer device to perform the method provided in the above aspects and their various optional implementations.
According to the embodiments of the application, a resource sample pair constructed based on the access behavior of an object set for a resource set can be acquired, where the resources in the resource sample pair have similarity of resource content and at least a threshold number of objects have access behavior for the first resource and do not have access behavior for the second resource. A reference resource can also be acquired, and a feature generation network is trained based on the reference resource and the resource sample pair, so that resource classification features of resources are generated by the trained feature generation network and the resources can be classified. In this way, the feature generation network is trained in combination with the access behavior of the object set for the resource set, and the training process avoids the effect that similarity of resource content would otherwise have on the classification of resources. This improves the accuracy of the resource classification features generated by the trained feature generation network, and these features in turn help classify resources more accurately.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings required for the description of the embodiments will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present application, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic diagram of a data processing system according to an embodiment of the present application;
FIG. 2 is a schematic flow chart of a data processing method according to an embodiment of the present application;
FIG. 3 is a schematic diagram of the effect of analysis of access behavior of resources according to an embodiment of the present application;
FIG. 4 is a schematic flow chart of a data processing method according to an embodiment of the present application;
FIG. 5 is a schematic diagram of a feature generation network according to an embodiment of the present application;
FIG. 6 is a schematic diagram illustrating the effect of a training process of a feature generation network according to an embodiment of the present application;
FIG. 7 is a schematic flow chart of a data processing method according to an embodiment of the present application;
FIG. 8 is a schematic view of an effect of a resource classification set according to an embodiment of the present application;
FIG. 9 is a schematic diagram of an effect of a data processing system according to an embodiment of the present application;
FIG. 10 is a schematic diagram of a data processing apparatus according to an embodiment of the present application;
FIG. 11 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described below with reference to the accompanying drawings in the embodiments of the present application.
The embodiments of the application provide a resource classification scheme. The scheme acquires a resource sample pair constructed based on the access behavior of an object set for a resource set, where the resources in the resource sample pair have similarity of resource content and at least a threshold number of objects have access behavior for the first resource and do not have access behavior for the second resource. It also acquires a reference resource, trains a feature generation network based on the reference resource and the resource sample pair, and generates resource classification features of resources with the trained feature generation network, thereby classifying the resources. Training the feature generation network in combination with the access behavior of the object set for the resource set improves the accuracy of the resource classification features it generates, which helps classify resources more accurately.
In one possible implementation, the embodiments of the present application may be applied to a data processing system. Referring to FIG. 1, FIG. 1 is a schematic structural diagram of a data processing system provided by an embodiment of the present application. The data processing system may include clients and a server. The server may be a server associated with an application. It may be configured with the above resource classification scheme to train a feature generation network, and may generate resource classification features of resources with the trained feature generation network so as to classify the resources. The application may be any of various video playing applications, information applications and the like, without limitation. A client may be a client corresponding to the application, through which a user can access the resources in the application; for example, a user plays a video through the client of a video playing application, or views image-text information through the client of an information application. It can be understood that, after classifying the resources, the server can determine the resources to recommend to the user of each client according to the resources that user has historically accessed and the classes to which those resources belong, and push the recommended resources to the user's client for display.
It should be noted that, before and while collecting relevant data of a user (such as the user's access behavior), the present application may display a prompt interface or popup window to inform the user that relevant data is being collected. The steps for obtaining the user's relevant data are executed only after a confirmation operation by the user on the prompt interface or popup window is obtained; otherwise (that is, when no such confirmation operation is obtained), those steps are ended and no relevant data of the user is obtained. In other words, all user data collected by the present application is collected with the consent and authorization of the user, and the collection, use and processing of relevant user data must comply with the relevant laws, regulations and standards of the relevant countries and regions.
The technical scheme of the application can be applied to the electronic equipment, such as a server of the data processing system. The electronic device may be a terminal, a server, or other devices for classifying resources, which is not limited by the present application. Optionally, the server may be an independent physical server, or may be a server cluster or a distributed system formed by a plurality of physical servers, or may be a cloud server that provides cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDNs, and basic cloud computing services such as big data and artificial intelligence platforms. The terminal may be, but is not limited to, a smart phone, a tablet computer, a notebook computer, a desktop computer, a smart speaker, a smart watch, etc. Terminals include, but are not limited to, user terminals including, but not limited to, cell phones, computers, intelligent voice interaction devices, intelligent home appliances, vehicle terminals, aircraft, and the like. The embodiment of the application can be applied to various scenes, including but not limited to cloud technology, artificial intelligence, intelligent transportation, auxiliary driving and the like.
In one possible implementation, the embodiments of the application can be applied to the technical field of big data. Big data refers to a data set that cannot be captured, managed and processed by conventional software tools within a certain time range. It is a massive, fast-growing and diversified information asset that requires new processing modes in order to provide stronger decision-making power, insight discovery and process optimization capability. With the advent of the cloud era, big data has attracted more and more attention, and special techniques are required to process large amounts of data effectively within a tolerable elapsed time. Technologies applicable to big data include massively parallel processing databases, data mining, distributed file systems, distributed databases, cloud computing platforms, the internet, and scalable storage systems.
In one possible implementation, the embodiments of the application can be applied to the technical field of blockchain; for example, the resource classification features of each resource can be stored in a blockchain. Blockchain is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms and encryption algorithms. A blockchain is essentially a decentralized database: a chain of data blocks generated in association with one another by cryptographic means, where each data block contains information about a batch of network transactions and is used to verify the validity of that information (anti-counterfeiting) and to generate the next block. The blockchain may include a blockchain underlying platform, a platform product service layer, and an application service layer. The blockchain underlying platform may include processing modules for user management, basic services, smart contracts, and operation management.
The user management module is responsible for managing the identity information of all blockchain participants, including maintaining public and private key generation (account management), key management, and the correspondence between a user's real identity and blockchain address (authority management), and, where authorized, supervising and auditing the transactions of certain real identities and providing rule configuration for risk control (risk-control audit). The basic service module is deployed on all blockchain node devices and is used to verify the validity of service requests and to record valid requests to storage after consensus on them is reached; for a new service request, the basic service first performs interface adaptation, parsing and authentication (interface adaptation), then encrypts the service information through a consensus algorithm (consensus management), transmits the encrypted information completely and consistently to the shared ledger (network communication), and records and stores it. The smart contract module is responsible for contract registration and issuance, contract triggering and contract execution; a developer can define contract logic in a programming language and publish it to the blockchain (contract registration), execution is triggered by invoking a key or by another event according to the logic of the contract terms so as to complete the contract logic, and the module also provides functions for contract upgrade and cancellation. The operation management module is mainly responsible for deployment during product release, configuration modification, contract settings, cloud adaptation, and visual output of real-time status during product operation, for example alarms, monitoring of network conditions, and monitoring of node device health.
The platform product service layer provides basic capabilities and implementation frameworks of typical applications, and developers can complete the blockchain implementation of business logic based on the basic capabilities and the characteristics of the superposition business. The application service layer provides the application service based on the block chain scheme to the business participants for use.
In one possible implementation, the present application also relates to artificial intelligence technologies. Artificial intelligence (AI) is the theory, methods, techniques and application systems that use a digital computer, or a machine controlled by a digital computer, to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge, and use knowledge to obtain optimal results. In other words, artificial intelligence is a comprehensive technology of computer science that attempts to understand the essence of intelligence and to produce new intelligent machines that can react in a way similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that the machines have the capabilities of perception, reasoning and decision-making.
Artificial intelligence is a comprehensive discipline that covers a wide range of fields, including both hardware-level and software-level technologies. Basic artificial intelligence technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operation/interaction systems, mechatronics and the like. Artificial intelligence software technologies mainly include computer vision, speech processing, natural language processing, and machine learning/deep learning.
The application mainly relates to machine learning in artificial intelligence. Machine learning (ML) is a multi-domain interdisciplinary field involving probability theory, statistics, approximation theory, convex analysis, algorithmic complexity theory and other disciplines. It studies how a computer can simulate or implement human learning behavior in order to acquire new knowledge or skills and reorganize existing knowledge structures so as to continuously improve its own performance. Machine learning is the core of artificial intelligence and the fundamental way to give computers intelligence, and it is applied throughout the various areas of artificial intelligence. Machine learning and deep learning typically include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and learning from demonstration.
The machine learning involved in the application mainly concerns how to train a feature generation network, so that resources can subsequently be classified based on the trained feature generation network.
It can be understood that the above scenario is merely an example, and does not constitute a limitation on the application scenario of the technical solution provided by the embodiment of the present application, and the technical solution of the present application may also be applied to other scenarios. For example, as one of ordinary skill in the art can know, with the evolution of the system architecture and the appearance of new service scenarios, the technical solution provided by the embodiment of the present application is also applicable to similar technical problems.
Based on the above description, the embodiment of the application provides a data processing method, which is applied to the above electronic equipment. Referring to fig. 2, fig. 2 is a flow chart of a data processing method according to an embodiment of the application. The data processing method may include the following steps.
S201, acquiring a resource sample pair constructed based on the access behavior of the object set to the resource set.
The resource set comprises K resources, wherein K is a positive integer. That is, the set of resources includes at least one resource. The set of resources may be a set of all or part of the resources associated with the same application. For example, the resource may be a video resource released by each user in a video playing application (also may be simply referred to as a video, such as a short video released by a user, a television show, etc.); as another example, the resource may be a graphics resource or the like released by each user in an information class application, which is not limited herein.
In one possible implementation, the resource set may be a set of a portion of the resources, selected from all resources associated with the application according to resource release time. The resource release time may be the point in time at which a resource was released in the application. For example, the resources in the resource set may be those released in the application within the 3 months before the current time, where the current time may be the point in time at which the resource set is acquired. This improves the timeliness of the resources in the resource set and reduces the effect that changes in popular trends have on each object's access behavior toward resources. It therefore improves the accuracy of the feature generation network subsequently trained on the resource sample pairs, and thus the accuracy of the resource classification features generated by the trained feature generation network.
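The release-time filter described above can be sketched in a few lines of Python. The function name `recent_resources` and the approximation of "3 months" as 90 days are assumptions made for illustration; the application does not specify how the window is measured.

```python
from datetime import datetime, timedelta
from typing import Dict, Set

def recent_resources(release_times: Dict[str, datetime],
                     now: datetime,
                     max_age: timedelta = timedelta(days=90)) -> Set[str]:
    """Keep resources released within max_age before `now`.

    max_age defaults to 90 days as a stand-in for "3 months" (an assumption).
    """
    return {
        rid for rid, released in release_times.items()
        if timedelta(0) <= now - released <= max_age
    }

releases = {
    "a": datetime(2022, 2, 1),   # released one month before `now`
    "b": datetime(2021, 6, 1),   # released nine months before `now`
}
print(recent_resources(releases, now=datetime(2022, 3, 1)))
```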
The object set contains N objects, N being a positive integer. That is, the object set includes at least one object. The object set may be a set of all or part of the objects (also referred to as users) in the application associated with the above resource set. In one possible implementation, the object set may be a set of a portion of the objects, screened according to the access time corresponding to each object's access behavior. The access time corresponding to an object's access behavior may be the point in time at which the object accesses a resource. For example, the object set may be the set of objects that have accessed any resource in the resource set within the 3 months before the current time.
The access behavior of the object set for the resource set may be the access behavior of each object in the object set for any resource in the resource set. It should be noted that, before the access behavior of the object set for the resource set is obtained, a prompt interface or popup window may be displayed to inform the user that the user's access behavior is about to be obtained; the access behavior is obtained only after the user's consent and authorization are obtained, and the relevant national and regional laws, regulations and standards are complied with when obtaining, using and processing the user's access behavior. The access behavior may be a click by an object on a resource. For example, in a video playing application, a user clicks on a video so that the playing page of the video is displayed, and the user's click on the video may be regarded as access behavior for that video. The access behavior may also be browsing a resource's display page for a certain time. For example, in a video playing application, after a user clicks a video, the playing page of the video is displayed on the electronic device; if the user browses the playing page for more than 3 s, that browsing can be determined to be access behavior for the video. This reduces cases in which a user enters a resource display page by clicking a resource through an accidental touch, improves the accuracy with which user access behavior is determined, and in turn improves the accuracy of the generated resource sample pairs.
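As a rough illustration of the dwell-time criterion above, the following Python sketch turns raw page-view events into a per-object set of accessed resources, counting a view as access behavior only when the browsing time exceeds 3 s. The `ViewEvent` record and the `access_sets` function are hypothetical names introduced here, not structures from the application.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Set

@dataclass
class ViewEvent:
    """Hypothetical log record: one object opening one resource page."""
    object_id: str
    resource_id: str
    dwell_seconds: float

def access_sets(events: Iterable[ViewEvent],
                min_dwell: float = 3.0) -> Dict[str, Set[str]]:
    """Map each object to the resources it browsed for more than min_dwell."""
    accessed: Dict[str, Set[str]] = {}
    for e in events:
        # Short views are treated as accidental touches and ignored.
        if e.dwell_seconds > min_dwell:
            accessed.setdefault(e.object_id, set()).add(e.resource_id)
    return accessed

events = [
    ViewEvent("u1", "v1", 5.0),
    ViewEvent("u1", "v2", 1.0),   # likely an accidental touch
    ViewEvent("u2", "v1", 3.5),
]
print(access_sets(events))
```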
The resource sample pair comprises a first resource and a second resource in the resource set, and the first resource and the second resource have similarity of resource content. The first resource may be any resource in the resource set, and the second resource is a resource in the resource set other than the first resource. Resource content refers to the specific content described by the resource itself, such as a theme, object, scene, event or behavior in video content; for example, the resource content of a certain video resource may be a tiger's hunting process. The resource content may be characterized by the associated text information and/or associated image information of the resource. The associated text information may be the title text, profile text, content tags and the like of the resource. The associated image information may be the cover image of the resource, images in the resource (for example, any video frame of a video resource, or images in an image-text resource) and the like. The first resource and the second resource having similarity of resource content means that the associated text information and/or associated image information of the first resource is similar to that of the second resource. Here, "associated text information and/or associated image information" may be the associated text information alone, the associated image information alone, or a combination of both, without limitation.
In one possible implementation, the first resource and the second resource having similarity of resource content means that the resource content similarity between them (i.e., the similarity between their associated text information and/or associated image information) is greater than or equal to a similarity threshold; conversely, if the resource content similarity is less than the similarity threshold, the two resources do not have similarity of resource content. In another possible implementation, the first resource and the second resource having similarity of resource content means that they belong to the same resource content classification set. That set may be one of a plurality of resource content classification sets into which the resource set is divided according to the resource content of each resource in the resource set, and each resource content classification set includes at least one resource.
Among the N objects contained in the object set, M objects have access behavior for the first resource and do not have access behavior for the second resource, where M is a positive integer greater than or equal to a threshold number and less than or equal to N. The threshold number may be understood as the minimum value of M for which two resources can be determined as a resource sample pair. Because at least the threshold number of objects in the object set accessed the first resource without accessing the second resource, the two resources differ greatly in how most users access them, which may indicate a difference between the resource styles of the first resource and the second resource. That is, the difference between the access behaviors of the M objects for the first resource and for the second resource (having access behavior for the first resource and not for the second resource) is used to characterize that the two resources in the resource sample pair differ in resource style in the actual application scenario. In other words, the application characterizes the difference in resource style between resources by the difference in users' access behaviors: if the access behaviors of most users for the first resource and for the second resource differ greatly (for example, only the first resource is accessed and the second resource is not), the actual styles of the two resources can be considered to differ greatly.
In an abstract sense, the application comprehensively abstracts the factors other than the content of a resource itself, such as its expression form, clarity, aesthetic quality and content tone, into the resource style.
In general, whether a user accesses a resource (i.e., the user's access behavior for the resource) may be affected not only by the resource content itself but also by factors other than the content, such as the expression form, clarity, aesthetic quality and content tone of the resource (i.e., the resource style). Referring to fig. 3, fig. 3 is a schematic diagram of an analysis of resource access behavior according to an embodiment of the present application. As shown in (1) of fig. 3, a resource has two major factors, namely resource content and resource style, and the access behavior of an object is determined by the object's interest in the resource content and its preference for the resource style. As shown in (2) of fig. 3, the title of resource A is "tiger in animal world" and the title of resource B is "life of large cat". Both resources mainly describe the life of tigers. If many users access only resource A and not resource B, the reason the access behaviors differ is not the resource content but the resource style: the style of resource A is serious, while the style of resource B is more light-hearted.
For example, two videos whose existing content features (i.e., title, cover image and label) are very similar (i.e., two videos having similarity of resource content) may still differ greatly in user access performance (i.e., the access behavior of users) even when both receive large exposure, as shown in Table 1 below:
Video content title                      Recommendation result performance
How two cats communicate                 Low
Lovely cat enthusiasm communication      High
TABLE 1
Here, the recommendation result performance indicates the number of users who clicked on the video: low indicates that fewer users accessed the video, and high indicates that more users accessed it. Both videos show two cats communicating. The video titled "How two cats communicate" leans toward popular science, and its style is serious; the video titled "Lovely cat enthusiasm communication" has a relaxed and warm style. The two videos express similar content but differ in resource style, and users' access behaviors for them differ greatly, which characterizes a difference between the resource styles of the two resources.
It can be understood that, because the two resources in the resource sample pair have similarity of resource content but still differ greatly in user performance, the difference in users' behavior toward the different resources in the pair can be attributed mainly to factors other than the resource content (i.e., the resource style). The resource classification features generated by a feature generation network trained on such resource sample pairs can therefore carry feature information beyond the resource content (i.e., feature information of the resource style), which helps improve the accuracy of resource classification.
It will be appreciated that the first resource does not refer to a fixed resource in the resource sample pair. Whichever resource in the pair the M objects have access behavior for is the first resource, and the other resource, for which they do not have access behavior, is the second resource.
S202, determining a target object from M objects, and acquiring a reference resource with access behaviors of the target object in a resource set.
The target object may be any object of M objects. The reference resource may be one or more of the resources for which the target object has access behavior.
The first time, at which the target object accesses the reference resource, is adjacent to the second time, at which the target object accesses the first resource. In one possible implementation, the first time and the second time being adjacent means that the access time difference between them is less than a time difference threshold. The time difference threshold may be a preset maximum time difference between the first time and the second time for which a resource can be determined as a reference resource. For example, if the target object accesses the first resource at time t1 (i.e., the second time is t1) and the time difference threshold is a, the resources accessed by the target object between t1-a and t1+a may be determined as reference resources. In another possible implementation, the first time and the second time being adjacent means that the reference resources are the preset number of resources whose access times are closest to the second time. The preset number of resources is the preset number of reference resources. For example, if the preset number of resources is 4, the target object accesses resources in the order {resource A, resource B, resource C, resource D, resource E, resource F, resource G}, and the first resource is resource D, then the 4 resources whose access times are closest to the time of accessing the first resource (i.e., the second time), namely {resource B, resource C, resource E, resource F}, may be determined as the reference resources.
In one possible implementation, the reference resource may be determined by combining the two conditions, that is, the access time difference between the first time and the second time of accessing the reference resource is less than the time difference threshold, and the reference resource is a resource within a preset number of resources closest to the second time of accessing the first resource.
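The two adjacency conditions described above can be sketched together as follows. This is only a minimal illustration under stated assumptions: the function name `select_reference_resources`, the `(resource_id, timestamp)` access-log format and the parameter names are hypothetical and do not come from the application.

```python
from typing import List, Tuple


def select_reference_resources(
    access_log: List[Tuple[str, float]],
    first_resource: str,
    time_diff_threshold: float,
    preset_count: int,
) -> List[str]:
    """Pick resources the target object accessed close in time to the first resource."""
    # Second time: the moment the target object accessed the first resource.
    t1 = next(t for rid, t in access_log if rid == first_resource)
    # Condition 1: the access time difference is less than the threshold.
    candidates = [
        (abs(t - t1), rid)
        for rid, t in access_log
        if rid != first_resource and abs(t - t1) < time_diff_threshold
    ]
    # Condition 2: keep only the preset number of nearest accesses.
    candidates.sort()
    return [rid for _, rid in candidates[:preset_count]]


# Hypothetical access order A..G with one access every 10 time units.
log = [("A", 0), ("B", 10), ("C", 20), ("D", 30), ("E", 40), ("F", 50), ("G", 60)]
references = select_reference_resources(log, "D", time_diff_threshold=25, preset_count=4)
```

With these assumed timestamps, the selected reference resources are B, C, E and F, matching the example in which resource D is the first resource.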
It will be appreciated that the reference resource serves as the anchor in the subsequent training of the feature generation network, that is, the feature generation network is trained with the reference resource as the training target. Anchoring on the reference resource implies a weak assumption: because the access time of the reference resource is adjacent to that of the first resource, the reference resource accessed by the target object can be regarded as similar to the first resource in aspects other than the resource content, such as expression form, clarity, aesthetic quality and content tone (i.e., the resource style). Moreover, because the first resource and the second resource in the resource sample pair differ in resource style, the second resource and the reference resource also differ in resource style. When the feature generation network is subsequently trained, the distance between the generated feature of the first resource and the feature of the reference resource is made smaller, and the distance between the feature of the second resource and the feature of the reference resource is made larger. This removes the influence of the resource content on the access behavior of the object, so that the features generated by the trained feature generation network represent the resource style more accurately, which improves the accuracy of resource classification.
S203, training the feature generation network based on the reference resource and the resource sample pair to obtain a trained feature generation network.
The trained feature generation network is used for generating resource classification features of the resources. The resource classification feature may be a feature required for classifying a resource. As described above, the resource classification feature may embody a resource style of the resource, which may also be referred to as a resource style feature. The resource classification feature may be represented as a feature vector or a feature matrix, without limitation.
As described above, when the feature generation network is trained based on the reference resource and the resource sample pair, the distance between the feature of the first resource and the feature of the reference resource can be made smaller and the distance between the feature of the second resource and the feature of the reference resource can be made larger. The resource classification features generated by the trained feature generation network therefore capture the resource style better, which improves the accuracy of the resource classification features and, in turn, the accuracy of resource classification.
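The text here does not name a specific loss function. One common way to express "pull the first resource toward the reference resource and push the second resource away" is a triplet-style margin loss, sketched below as an assumption. The names `triplet_margin_loss` and `margin`, and the choice of Euclidean distance, are illustrative only.

```python
import math
from typing import Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_margin_loss(
    reference: Sequence[float],  # feature of the reference resource (anchor)
    first: Sequence[float],      # feature of the first resource (should be close)
    second: Sequence[float],     # feature of the second resource (should be far)
    margin: float = 1.0,         # assumed hyperparameter, not from the source
) -> float:
    d_first = euclidean(reference, first)
    d_second = euclidean(reference, second)
    # The loss is zero once the second resource is at least `margin`
    # farther from the reference resource than the first resource is.
    return max(0.0, d_first - d_second + margin)
```

Minimizing such a loss over many (reference, first, second) triples would shrink the first distance and grow the second, which is the behavior the paragraph describes.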
According to the embodiment of the application, a resource sample pair constructed based on the access behavior of the object set for the resource set can be obtained, where the resources in the resource sample pair have similarity of resource content and at least the threshold number of objects have access behavior for the first resource and do not have access behavior for the second resource. A reference resource can then be obtained, and the feature generation network is trained based on the reference resource and the resource sample pair, so that the trained feature generation network generates the resource classification features of resources and the resources can be classified. In this way, the feature generation network is trained in combination with the access behavior of the object set for the resource set, and the influence of similar resource content on resource classification is avoided during training. This improves the accuracy of the resource classification features generated by the trained feature generation network and allows resources to be classified more accurately through those features.
Referring to fig. 4, fig. 4 is a flowchart of a data processing method according to an embodiment of the application. The method is applied to the electronic equipment. The data processing method may include the following steps.
S401, acquiring a resource sample pair constructed based on the access behavior of the object set to the resource set.
In one possible implementation, the criterion for determining that the first resource and the second resource have similarity of resource content is that the resource content similarity between them is greater than or equal to the similarity threshold. In this case, step S401 may include the following steps: (1) Acquiring a first pending resource and a second pending resource in the resource set, and acquiring the resource content similarity between the first pending resource and the second pending resource. The first pending resource may be any resource in the resource set, and the second pending resource may be any resource in the resource set other than the first pending resource. The resource content similarity may be determined according to the associated text information and/or associated image information of the first pending resource and the associated text information and/or associated image information of the second pending resource. The higher the similarity between the associated text information of the two pending resources, the higher the resource content similarity between them; likewise, the higher the similarity between the associated image information of the two pending resources, the higher the resource content similarity between them.
(2) If the resource content similarity is greater than or equal to the similarity threshold, determining that the first pending resource and the second pending resource have similarity of resource content. The similarity threshold indicates the minimum resource content similarity required for the first pending resource and the second pending resource to be regarded as having similarity of resource content.
For example, if the similarity threshold is r, then when the resource content similarity between the first pending resource and the second pending resource is greater than or equal to r, it is determined that the two pending resources have similarity of resource content. In one possible implementation, if the resource content similarity is less than the similarity threshold, it is determined that the first pending resource and the second pending resource do not have similarity of resource content.
(3) Constructing a resource sample pair according to the first pending resource and the second pending resource that have similarity of resource content. It can be appreciated that if the first pending resource and the second pending resource do not have similarity of resource content, no resource sample pair is constructed from them. This ensures that the first resource and the second resource in a resource sample pair have similarity of resource content, so that the influence of the resource content on users' access behavior can be removed when the feature generation network is trained, and the resource classification features generated by the trained feature generation network reflect the resource style more accurately.
In a possible implementation manner, acquiring the resource content similarity between the first pending resource and the second pending resource may specifically include the following steps: (1) Inputting the associated text information and/or associated image information of the first pending resource into a pre-trained content vector generation model to obtain a first resource content vector of the first pending resource. The pre-trained content vector generation model (also referred to as an Encoder) may be used to generate the corresponding resource content vector according to the associated text information and/or associated image information of a resource.
For example, the pre-trained content vector generation model may be a pre-trained natural language processing model, such as a BERT model (a word embedding model), which may be invoked to generate the first resource content vector from the associated text information of the first pending resource (such a model may also be referred to as a network). For another example, the pre-trained content vector generation model may be a pre-trained image processing model, such as an EfficientNet-B4 model (a convolutional neural network), which may be invoked to generate the first resource content vector from the associated image information of the first pending resource.
For another example, the pre-trained content vector generation model may be a multi-modal model, which takes the associated text information and the associated image information of the first pending resource as inputs and outputs the first resource content vector of the first pending resource. In one possible implementation, the content vector generation model may be trained on sample resources that are associated with corresponding sample tags, where the sample tags serve as the supervisory signals for training. The sample tags may be manually annotated tags of the resources that characterize their actual content, for example, the name of the film or TV series the resource belongs to, the names of the main characters appearing in the resource, organization names, etc. A sample resource is input into the content vector generation model to obtain the corresponding sample resource content vector, and the model parameters of the content vector generation model are then adjusted based on the sample resource content vector and the feature vector corresponding to the sample tag, so that the gap between the generated sample resource content vector and the feature vector of the sample tag becomes smaller. The content vector generation model with the adjusted model parameters is determined as the pre-trained content vector generation model. The feature vector corresponding to the sample tag can be obtained through a natural language processing model, such as a BERT model. In this way, the content vector generation model can accurately generate feature vectors that represent the content of a resource.
(2) Inputting the associated text information and/or associated image information of the second pending resource into the pre-trained content vector generation model to obtain a second resource content vector of the second pending resource. The second resource content vector is generated in the same way as the first resource content vector, which is not repeated here. (3) Determining the resource content similarity between the first pending resource and the second pending resource according to the first resource content vector and the second resource content vector. The resource content similarity may be determined according to the distance between the two vectors; for example, it may be derived from the Euclidean distance between the first resource content vector and the second resource content vector, where a smaller distance corresponds to a higher resource content similarity.
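Steps (1) to (3), followed by the comparison with the similarity threshold, can be sketched as below. The content vectors are assumed to have been produced already by some encoder. The mapping `1 / (1 + distance)` from Euclidean distance to a similarity score is an assumption chosen for illustration (the text only says the similarity is determined from the distance), and all function names are hypothetical.

```python
import math
from typing import Sequence


def euclidean_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """Euclidean distance between two resource content vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def content_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    # Assumed mapping: distance 0 gives similarity 1.0, and the
    # similarity decreases toward 0 as the distance grows.
    return 1.0 / (1.0 + euclidean_distance(u, v))


def has_content_similarity(
    u: Sequence[float], v: Sequence[float], similarity_threshold: float
) -> bool:
    """True if the two pending resources count as having similar content."""
    return content_similarity(u, v) >= similarity_threshold
```

Any monotonically decreasing function of the distance would serve the same purpose; cosine similarity is another common choice for embedding vectors.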
In a possible implementation manner, the criterion for determining that the first resource and the second resource have similarity of resource content may also be that they belong to the same resource content classification set, where the resource content classification sets are obtained by dividing the resource set according to the resource content of each resource. In this case, step S401 may include the following steps: (1) Invoking the pre-trained content vector generation model to generate a resource content vector for each of the K resources contained in the resource set. Each resource content vector may be the feature vector obtained by inputting the associated text information and/or associated image information of the corresponding resource into the pre-trained content vector generation model; for details, refer to the related description above, which is not repeated here.
(2) Performing feature clustering on the resource content vectors of the K resources to obtain a plurality of resource content vector classification sets of the K resource content vectors. Feature clustering determines how to partition the resource content vectors according to the distances between them. For example, the feature clustering may be performed with the K-means algorithm (a clustering algorithm). Specifically, the number t of resource content vector classification sets may be preset, and t resource content vectors are randomly selected from the K resource content vectors as centroids. For each of the K resource content vectors, the distance (such as the Euclidean distance) to each centroid is calculated, and the vector is assigned to the set of its nearest centroid, which yields t resource content vector sets corresponding to the t centroids. The centroids are then updated based on the t resource content vector sets, for example by taking the mean vector of each set as the new centroid.
If the distance (such as the Euclidean distance) between each updated centroid and the corresponding centroid before the update is less than or equal to a distance threshold, the t resource content vector sets corresponding to the t updated centroids can be determined as the plurality of resource content vector classification sets of the K resource content vectors. The distance threshold may be the maximum distance between an updated centroid and the centroid before the update at which the resource content vector classification sets are considered final. It can be appreciated that if the centroid positions change little after an update, the partition of the resource content vectors has stabilized or converged, and the resource content vector classification sets are obtained. If the distance between an updated centroid and the centroid before the update is greater than the distance threshold, the vectors are reassigned and the centroids are updated again based on the t resource content vector sets corresponding to the updated centroids, i.e., a second update is performed. The distance between the centroids after the second update and those after the first update is then compared with the distance threshold to decide whether to continue updating or to determine the resource content vector sets corresponding to the centroids after the second update as the final resource content vector classification sets.
(3) Determining a plurality of resource content classification sets of the resource set according to the resource content vector classification sets. Any resource content classification set (which may also be referred to as a resource content Topic) contains at least one resource of the resource set. The resources in a resource content classification set are the resources corresponding to the resource content vectors in the corresponding resource content vector classification set, and each resource content vector classification set corresponds to one resource content classification set.
(4) If the first pending resource and the second pending resource belong to the same resource content classification set, determining that they have similarity of resource content. If the first pending resource and the second pending resource belong to different resource content classification sets, they do not have similarity of resource content.
(5) Constructing a resource sample pair according to the first pending resource and the second pending resource that have similarity of resource content. In this way, whether resources have similarity of resource content can be determined through the division into resource content classification sets. In actual processing, the resources in the resource set can be divided into the resource content classification sets in advance, so that whether the first pending resource and the second pending resource have similarity of resource content can be determined more efficiently.
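Steps (1) to (5) can be sketched in plain Python as follows, assuming the resource content vectors already exist. The K-means loop follows the description above (random initial centroids, nearest-centroid assignment, mean update, stop when no centroid moves more than the distance threshold). The function names, the `seed` and `max_iters` parameters and the toy data are assumptions added for illustration.

```python
import math
import random
from itertools import combinations


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(vectors, t, distance_threshold=1e-6, max_iters=100, seed=0):
    """Cluster vectors into t sets; returns one set index per vector."""
    rng = random.Random(seed)
    # Randomly pick t of the vectors as the initial centroids.
    centroids = [list(v) for v in rng.sample(vectors, t)]
    labels = []
    for _ in range(max_iters):
        # Assign every vector to the set of its nearest centroid.
        labels = [
            min(range(t), key=lambda j: euclidean(v, centroids[j]))
            for v in vectors
        ]
        # Update each centroid to the mean vector of its set.
        updated = []
        for j in range(t):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                updated.append([sum(col) / len(members) for col in zip(*members)])
            else:
                updated.append(centroids[j])  # keep the centroid of an empty set
        shift = max(euclidean(c, u) for c, u in zip(centroids, updated))
        centroids = updated
        if shift <= distance_threshold:  # centroids have stabilized
            break
    return labels


def candidate_pairs(resource_ids, labels):
    """Pairs of resources in the same resource content classification set."""
    topics = {}
    for rid, lab in zip(resource_ids, labels):
        topics.setdefault(lab, []).append(rid)
    pairs = []
    for members in topics.values():
        pairs.extend(combinations(members, 2))
    return pairs


# Toy content vectors for four hypothetical resources.
ids = ["a1", "a2", "b1", "b2"]
vecs = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
pairs = candidate_pairs(ids, kmeans(vecs, t=2))
```

Only the pairs returned by `candidate_pairs` would go on to the access-behavior check; resources in different sets are never compared, which is where the efficiency gain of pre-partitioning comes from.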
In one possible implementation manner, constructing a resource sample pair according to the first pending resource and the second pending resource that have similarity of resource content may specifically include the following steps: (1) Acquiring the access behaviors of the N objects for the first pending resource and the second pending resource. Among the N objects, some may access only the first pending resource and not the second pending resource, some may access only the second pending resource and not the first pending resource, and some may access both.
(2) If at least the threshold number of the N objects have access behavior for the first pending resource and do not have access behavior for the second pending resource, determining the first pending resource as the first resource and the second pending resource as the second resource. Denote by M the number of objects that have access behavior for the first pending resource and not for the second pending resource; the determination is made when M is greater than or equal to the threshold number. The threshold number may be a preset fixed value; for example, it may be set to 50 regardless of the number of objects in the object set. The threshold number may also be a certain proportion of the number of objects in the object set (i.e., of N), for example 60% of N. The threshold number may also be determined based on the number of objects (denoted here as L) that have access behavior for the second pending resource and not for the first pending resource. For example, the threshold number may be r×L, where r is a constant; if r is set to 2, the resource sample pair is determined from the first pending resource and the second pending resource when M is greater than or equal to twice L. As another example, the threshold number may be L+a, where a is a constant; if a is set to 30, the resource sample pair is determined when the difference between M and L is greater than or equal to 30.
(3) Constructing the resource sample pair according to the first resource and the second resource, i.e., directly taking the first resource and the second resource as the two resources of the resource sample pair. In this way, a difference in resource style between two resources with similar resource content is inferred from the difference in users' access behaviors for them, so that a feature generation network whose resource classification features reflect the resource style more accurately can be obtained by subsequent training on the resource sample pair.
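The counting logic of steps (1) to (3) can be sketched as follows, using a fixed threshold number. The function name `build_sample_pair` and the representation of access behavior as a mapping from object identifier to the set of accessed resources are hypothetical choices for illustration.

```python
from typing import Dict, List, Optional, Set, Tuple


def build_sample_pair(
    accesses: Dict[str, Set[str]],  # object id -> resources it accessed
    res_a: str,
    res_b: str,
    threshold_number: int,
) -> Optional[Tuple[str, str, List[str]]]:
    """Return (first_resource, second_resource, M objects) or None."""
    # Objects that accessed res_a but not res_b, and the reverse.
    only_a = sorted(o for o, seen in accesses.items()
                    if res_a in seen and res_b not in seen)
    only_b = sorted(o for o, seen in accesses.items()
                    if res_b in seen and res_a not in seen)
    # Whichever resource was accessed alone by enough objects is the first resource.
    if len(only_a) >= threshold_number:
        return res_a, res_b, only_a
    if len(only_b) >= threshold_number:
        return res_b, res_a, only_b
    return None


# Hypothetical access records for four objects.
accesses = {"u1": {"A"}, "u2": {"A"}, "u3": {"A", "B"}, "u4": {"B"}}
pair = build_sample_pair(accesses, "A", "B", threshold_number=2)
```

For the relative variants described in step (2), a caller could compute the threshold from the opposite count before the comparison, for example `2 * len(only_b)` for r = 2 or `len(only_b) + 30` for a = 30.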
S402, determining a target object from M objects, and acquiring a reference resource with access behaviors of the target object in a resource set.
Step S402 may refer to the description related to step S202, which is not described herein.
S403, calling the feature generation network to generate the reference resource feature of the reference resource.
The feature generation network may generate the reference resource feature of the reference resource based on the input associated text information and/or associated image information of the reference resource. The reference resource feature may be represented as a feature vector or a feature matrix, without limitation.
In a possible implementation manner, the feature generation network may generate the reference resource feature of the reference resource according to both the associated text information and the associated image information of the reference resource, in which case step S403 may specifically include the following steps: (1) Acquiring the associated text information and the associated image information of the reference resource. For a description of the associated text information and the associated image information, refer to the description above, which is not repeated here. (2) Inputting the associated text information and the associated image information into the feature generation network, and generating text features of the associated text information and image features of the associated image information based on the feature generation network. The feature generation network may include a natural language processing model for generating text features from the associated text information and an image processing model for generating image features from the associated image information. For example, the natural language processing model may be a BERT model and the image processing model may be an EfficientNet-B4 model, without limitation. The text features and image features may be represented as feature vectors or feature matrices, without limitation. (3) Performing feature fusion on the text features and the image features to obtain the reference resource feature of the reference resource. The feature fusion may concatenate the text features and the image features, or compute the reference resource feature from them in some other way (such as adding the text features and the image features).
Therefore, generating the reference resource feature from both the associated text information and the associated image information of the reference resource allows the feature to reflect the characteristics of the resource more accurately.
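The two fusion options mentioned in step (3) can be illustrated with a minimal sketch. The function names `fuse_concat` and `fuse_add` and the sample vectors are assumptions made for illustration only; the embodiment does not prescribe a particular fusion implementation.

```python
from typing import List


def fuse_concat(text_feat: List[float], image_feat: List[float]) -> List[float]:
    """Fuse by splicing the text feature and the image feature end to end."""
    return list(text_feat) + list(image_feat)


def fuse_add(text_feat: List[float], image_feat: List[float]) -> List[float]:
    """Fuse by element-wise addition; both features must have the same length."""
    if len(text_feat) != len(image_feat):
        raise ValueError("element-wise addition needs features of equal length")
    return [t + i for t, i in zip(text_feat, image_feat)]


# Hypothetical 3-dimensional features standing in for the model outputs.
text_feature = [0.5, -1.0, 2.0]
image_feature = [1.5, 0.25, -2.0]

concatenated = fuse_concat(text_feature, image_feature)  # length 6
added = fuse_add(text_feature, image_feature)            # length 3
```

Splicing keeps both modalities intact at the cost of a longer vector, while addition keeps the dimension unchanged but requires the two features to share the same size.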
For example, referring to fig. 5, fig. 5 is a schematic structural diagram of a feature generation network according to an embodiment of the present application. As shown in fig. 5, the feature generation network may include a natural language processing model, such as a BERT model, and an image processing model, such as an EfficientNet-B4 model. The associated text information of a resource may be input into the natural language processing model of the feature generation network, and the associated image information into the image processing model. The text feature and the image feature are then obtained through processing by feed-forward neural networks (Feed Forward Neural Networks, FFNNs) and an activation function, and the feature of the input resource is obtained by fusing the text feature and the image feature; if the input resource is a reference resource, the reference resource feature corresponding to the reference resource is obtained. A feed-forward neural network (also abbreviated FFN) is a network without any loops: data is transferred from input to output in a single pass, without any "state memory" of earlier inputs. Technically, most networks in deep learning can be considered FFNNs, but in general "FFNN" refers to the simplest variant, the densely connected multilayer perceptron (MLP). Because information only propagates forward inside the network, it is called feed-forward: the information first passes through the input neuron layer, then through the hidden neuron layers, and finally through the output neuron layer. The activation function may be the tanh function, that is, the hyperbolic tangent function, which returns the hyperbolic tangent of an input parameter X expressed in radians. The tanh function acts on each element of an array; the domain and range of the function include complex values, and all angles are expressed in radians.
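As an illustration of a densely connected feed-forward network with a tanh activation, the following is a minimal sketch in plain Python. The layer sizes and weight values are arbitrary assumptions chosen for readability, and the helper names `dense_tanh` and `feed_forward` are hypothetical; they are not the network of fig. 5.

```python
import math
from typing import List, Sequence, Tuple

Layer = Tuple[List[List[float]], List[float]]  # (weight rows, biases)


def dense_tanh(x: Sequence[float], weights: List[List[float]],
               bias: List[float]) -> List[float]:
    """One fully connected layer followed by an element-wise tanh."""
    return [
        math.tanh(sum(w * v for w, v in zip(row, x)) + b)
        for row, b in zip(weights, bias)
    ]


def feed_forward(x: Sequence[float], layers: List[Layer]) -> List[float]:
    """Pass the input through each layer once, from input to output, with no loops."""
    out = list(x)
    for weights, bias in layers:
        out = dense_tanh(out, weights, bias)
    return out


# Assumed toy network: 2 inputs -> 3 hidden units -> 1 output.
layers: List[Layer] = [
    ([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], [0.0, 0.0, 0.0]),
    ([[1.0, -1.0, 0.5]], [0.0]),
]

output = feed_forward([0.0, 0.0], layers)
```

Because tanh maps every real input into the open interval (-1, 1), each layer output stays bounded regardless of the size of the weighted sum.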
In one possible implementation, if the number of reference resources is one, the feature generation network may be invoked to directly generate the reference resource features for the reference resources.
In a possible implementation, if there are multiple reference resources, the feature generation network may be called to generate an initial reference resource feature for each reference resource, and the initial reference resource features may then be fused to obtain the reference resource feature. The fusion may be an average pooling of the initial reference resource features of the reference resources, which yields a comprehensive feature expressing the plurality of reference resources. Using this average-pooled reference resource feature as the anchor for training the feature generation network can improve the accuracy of the trained feature generation network and therefore the accuracy of resource classification.
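The average pooling of several initial reference resource features can be sketched as follows. The function name `mean_pool` and the sample values are assumptions for illustration.

```python
from typing import List, Sequence


def mean_pool(features: Sequence[Sequence[float]]) -> List[float]:
    """Average several feature vectors dimension by dimension."""
    if not features:
        raise ValueError("at least one feature vector is required")
    dim = len(features[0])
    if any(len(f) != dim for f in features):
        raise ValueError("all feature vectors must have the same length")
    count = len(features)
    return [sum(f[i] for f in features) / count for i in range(dim)]


# Hypothetical initial reference resource features of three reference resources.
initial_reference_features = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
reference_feature = mean_pool(initial_reference_features)
```

With a single reference resource the pooled result equals that resource's own feature, which matches the single-resource case described above.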
S404, calling the feature generation network to respectively generate a first resource feature of the first resource and a second resource feature of the second resource.
The manner of calling the feature generation network to generate the first resource feature of the first resource and the second resource feature of the second resource is the same as the manner of generating the reference resource feature; that is, the first resource feature and the second resource feature can be determined with reference to the feature generation network shown in fig. 5. Specifically, obtaining the first resource feature of the first resource includes the following steps: acquiring the associated text information and the associated image information of the first resource; inputting them into the feature generation network, and generating the text feature of the associated text information and the image feature of the associated image information based on the feature generation network; and performing feature fusion processing on the text feature and the image feature to obtain the first resource feature of the first resource. Obtaining the second resource feature of the second resource includes the following steps: acquiring the associated text information and the associated image information of the second resource; inputting them into the feature generation network, and generating the text feature of the associated text information and the image feature of the associated image information based on the feature generation network; and performing feature fusion processing on the text feature and the image feature to obtain the second resource feature of the second resource. Details are not repeated here.
S405, correcting the network parameters of the feature generation network based on the reference resource feature, the first resource feature and the second resource feature, and determining the feature generation network with the corrected network parameters as the trained feature generation network.
As described above, correcting the network parameters of the feature generation network based on the reference resource feature, the first resource feature and the second resource feature means correcting them based on the distance between the reference resource feature and the first resource feature and the distance between the reference resource feature and the second resource feature. It can be understood that correcting the network parameters is an iterative training process: the correction makes the distance between the reference resource feature and the first resource feature generated by the corrected network smaller, and the distance between the reference resource feature and the second resource feature larger.
In one possible implementation, correcting the network parameters of the feature generation network based on the reference resource feature, the first resource feature and the second resource feature includes: (1) Obtain a first feature distance between the first resource feature and the reference resource feature. The first feature distance represents the gap between the first resource feature and the reference resource feature; for example, it may be obtained by calculating the Euclidean distance between the two. (2) Obtain a second feature distance between the second resource feature and the reference resource feature. The second feature distance represents the gap between the second resource feature and the reference resource feature; for example, it may be obtained by calculating the Euclidean distance between the two. (3) Correct the network parameters of the feature generation network based on the first feature distance and the second feature distance. In this step, a loss function may be called to calculate a loss value from the first feature distance and the second feature distance, and the network parameters are corrected so that the loss value gradually converges. It can be understood that, in the loss function, a smaller first feature distance gives a smaller loss value and a larger second feature distance gives a smaller loss value. As the loss value converges during training, the first feature distance therefore gradually decreases and the second feature distance gradually increases, so that the trained feature generation network can more accurately extract features that represent the resource style of a resource.
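The embodiment does not name a specific loss function. One loss with the stated behavior (not larger when the first feature distance shrinks, not larger when the second feature distance grows) is a triplet margin loss, sketched below purely as an assumption. The function name `triplet_margin_loss`, the `margin` parameter and the sample features are hypothetical.

```python
import math
from typing import Sequence


def triplet_margin_loss(reference: Sequence[float],
                        first: Sequence[float],
                        second: Sequence[float],
                        margin: float = 1.0) -> float:
    """Hinge-style loss on two Euclidean distances (an assumed choice of loss).

    The reference resource feature acts as the anchor, the first resource
    feature is pulled toward it and the second resource feature is pushed away.
    """
    first_distance = math.dist(reference, first)    # first feature distance
    second_distance = math.dist(reference, second)  # second feature distance
    return max(0.0, first_distance - second_distance + margin)


# Hypothetical 2-dimensional features.
reference_feature = [0.0, 0.0]
first_feature = [3.0, 4.0]    # first feature distance is 5
second_feature = [0.0, 3.0]   # second feature distance is 3

loss = triplet_margin_loss(reference_feature, first_feature, second_feature)
```

In an actual training loop the loss would be minimized by gradient-based updates of the network parameters; that part depends on the chosen deep learning framework and is omitted here.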
For example, referring to fig. 6, fig. 6 is a schematic diagram illustrating the training procedure of a feature generation network according to an embodiment of the present application. As shown in fig. 6, a resource sample pair (601 in fig. 6) containing a first resource and a second resource is first acquired, and the first resource and the second resource are respectively input into the feature generation network (602 in fig. 6) to obtain the first resource feature of the first resource (603 in fig. 6) and the second resource feature of the second resource (604 in fig. 6). Reference resources are acquired (605 in fig. 6); here the number of reference resources is 5. The initial reference resource feature of each reference resource is generated by the feature generation network, and the reference resource feature (607 in fig. 6) is obtained by average pooling the 5 initial reference resource features (606 in fig. 6). Then the first feature distance is obtained from the distance between the reference resource feature and the first resource feature (608 in fig. 6), the second feature distance is obtained from the distance between the reference resource feature and the second resource feature (609 in fig. 6), and a loss value is calculated from the first feature distance and the second feature distance by the loss function (610 in fig. 6), so that the feature generation network is trained according to the loss value.
According to the embodiment of the application, a resource sample pair constructed based on the access behaviors of the object set toward the resource set can be obtained, where the resources in the pair are similar in resource content and a number of objects greater than or equal to a threshold have access behaviors toward the first resource but not toward the second resource. A reference resource can be obtained, and the feature generation network can be trained based on the reference resource and the resource sample pair, so that the trained feature generation network generates resource classification features of resources and the resources can be classified. In this way, the feature generation network is trained in combination with the access behaviors of the object set toward the resource set, and the influence of resource content similarity on resource classification is avoided during training. This improves the accuracy of the resource classification features generated by the trained feature generation network and helps classify resources more accurately by those features.
Referring to fig. 7, fig. 7 is a flowchart of a data processing method according to an embodiment of the application. The method is applied to the electronic equipment. The data processing method may include the following steps.
S701, acquiring a resource sample pair constructed based on the access behavior of the object set to the resource set.
S702, determining a target object from M objects, and acquiring a reference resource with access behaviors of the target object in a resource set.
S703, training the feature generation network based on the reference resource and the resource sample to obtain a trained feature generation network.
The steps S701 to S703 may refer to the related descriptions of the steps S201 to S203, which are not described herein.
S704, calling the trained feature generation network to generate the resource classification feature of each resource in the K resources.
To generate the resource classification feature of each of the K resources, the associated text information and/or the associated image information of each resource may be input into the trained feature generation network. The resource classification feature of each resource is obtained in the same way as the reference resource feature, the first resource feature and the second resource feature during training of the feature generation network, which is not repeated here. For example, if those features were generated during training from both the associated text information and the associated image information of a resource, the trained feature generation network may be called as follows for each of the K resources: acquire the associated text information and the associated image information of the resource; input them into the feature generation network, and generate the text feature of the associated text information and the image feature of the associated image information based on the feature generation network; and perform feature fusion processing on the text feature and the image feature to obtain the resource classification feature of the resource. For details, refer to the related description above.
S705, performing feature clustering processing on the resource classification features of the K resources to obtain a plurality of resource feature classification sets of the resource classification features of the K resources.
Any one of the resource feature classification sets contains at least one of the resource classification features of the K resources. The feature clustering processing determines a suitable partition according to the distances between the resource classification features, thereby obtaining the plurality of resource feature classification sets.
In a possible implementation, the feature clustering processing of the resource classification features of the K resources may be based on the K-means clustering algorithm, in which case step S705 may include the following steps:
(1) Determine the number w of resource feature classification sets to be obtained, and randomly select w resource classification features from the resource classification features of the K resources as centroids. (2) For each of the resource classification features of the K resources, calculate its distance (e.g., Euclidean distance) to each centroid. (3) Assign each resource classification feature to the set corresponding to its nearest centroid, thereby obtaining w initial resource feature classification sets corresponding to the w centroids. (4) Update each centroid based on the resource classification features in the corresponding initial resource feature classification set, for example, by taking the mean of the resource classification features in that set as the new centroid. (5) If the distance (e.g., Euclidean distance) between the updated centroid and the centroid before the update is less than or equal to a distance threshold, the w initial resource feature classification sets corresponding to the w updated centroids may be determined as the plurality of resource feature classification sets of the resource classification features of the K resources. The distance threshold is the largest movement of a centroid between two successive updates at which the partition is still accepted as final. It can be understood that if the position of the updated centroid differs little from its position before the update, the partition of the resource classification features has stabilized or converged, and the division into a plurality of resource feature classification sets is complete.
(6) If the distance between the updated centroid and the centroid before the update is greater than the distance threshold, the centroid positions are updated again according to the w initial resource feature classification sets corresponding to the w updated centroids, that is, a second update of the centroids is performed. The distance between the centroid after the second update and the centroid after the first update is then compared with the distance threshold to decide whether to continue updating or to take the initial resource feature classification sets corresponding to the centroids after the second update as the final resource feature classification sets. In general, steps (3)-(5) are repeated until the distance between the updated centroid and the centroid before the update is less than or equal to the distance threshold, or until the number of updates reaches a preset limit. Updating then stops, and the w initial resource feature classification sets corresponding to the w updated centroids are determined as the plurality of resource feature classification sets of the resource classification features of the K resources, which completes the clustering of the resource classification features of the K resources.
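Steps (1)-(6) can be sketched in plain Python as follows. The function name `k_means`, its parameters (`distance_threshold`, `max_updates`, `seed`) and the sample features are assumptions for illustration; a production system would more likely rely on an existing clustering library.

```python
import math
import random
from typing import List, Sequence, Tuple


def k_means(features: Sequence[Sequence[float]], w: int,
            distance_threshold: float = 1e-6, max_updates: int = 100,
            seed: int = 0) -> Tuple[List[int], List[List[float]]]:
    """Cluster features into w sets; returns (set index per feature, centroids)."""
    rng = random.Random(seed)
    # Step (1): randomly pick w features as the initial centroids.
    centroids = [list(f) for f in rng.sample(list(features), w)]
    assignments: List[int] = []
    for _ in range(max_updates):
        # Steps (2)-(3): assign every feature to its nearest centroid.
        assignments = [
            min(range(w), key=lambda c: math.dist(f, centroids[c]))
            for f in features
        ]
        # Step (4): move each centroid to the mean of its assigned features.
        new_centroids = []
        for c in range(w):
            members = [f for f, a in zip(features, assignments) if a == c]
            if members:
                new_centroids.append(
                    [sum(col) / len(members) for col in zip(*members)])
            else:
                new_centroids.append(centroids[c])  # keep an empty set's centroid
        # Steps (5)-(6): stop once no centroid moved more than the threshold.
        shift = max(math.dist(o, n) for o, n in zip(centroids, new_centroids))
        centroids = new_centroids
        if shift <= distance_threshold:
            break
    return assignments, centroids


# Hypothetical resource classification features forming two visible groups.
features = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
            [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]]
assignments, centroids = k_means(features, w=2)
```

The `max_updates` limit corresponds to stopping after a certain number of updates, and `distance_threshold` to the centroid-movement criterion in step (5).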
S706, determining a plurality of resource classification sets of the resource set according to the plurality of resource characteristic classification sets.
Any one of the resource classification sets contains at least one resource in the resource set. Each resource in a resource classification set is the resource corresponding to a resource classification feature in the corresponding resource feature classification set.
It can be appreciated that by determining the resource set as a plurality of resource classification sets, classification of resources in the resource set can be achieved, with resources in the same resource classification set belonging to the same classification.
In one possible implementation, each resource in a resource classification set may be labeled with a corresponding resource classification label (also referred to as a style label or style classification label). The resource classification label generally summarizes the resource style of the resources in a resource classification set; for example, the label may be native style, sad mood, fun style, conclusive style, etc., which is not limited here. Specifically, for each resource classification set, the resources corresponding to G resource classification features are acquired, where the G resource classification features are those nearest the centroid of the resource feature classification set corresponding to the resource classification set, and G is a positive integer; the resource classification label of each resource in the resource classification set is then determined according to the resources corresponding to the G resource classification features. In this way, the label of a resource classification set is determined from its most central resources. For example, if the resources corresponding to the G resource classification features are all fun-style resources (such as fun videos), the resource classification label of every resource in the corresponding resource classification set can be determined as fun style. Determining a resource classification label for each resource helps to quickly identify the resource classification set to which the resource belongs and the category it falls into. For example, referring to fig. 8, fig. 8 is a schematic view of the effect of resource classification sets provided by the embodiment of the present application. 801 in fig. 8 shows the titles of a plurality of resources in resource classification set 1; the style of these resources leans toward the sad style, so the resource classification label of each resource in resource classification set 1 can be determined as sad style. 802 in fig. 8 shows the titles of a plurality of resources in resource classification set 2; the style of these resources leans toward the rural style, so the resource classification label of each resource in resource classification set 2 can be determined as rural style.
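Selecting the G most central resources of a classification set and deriving a label from them can be sketched as follows. The embodiment does not state how the label is derived from those G resources; the majority vote over manually assigned styles used here is an assumption, as are the names `nearest_to_centroid` and `label_for_set` and all sample data.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


def nearest_to_centroid(features: Dict[str, Sequence[float]],
                        centroid: Sequence[float], g: int) -> List[str]:
    """Return the ids of the g resources whose features are nearest the centroid."""
    ranked = sorted(features, key=lambda rid: math.dist(features[rid], centroid))
    return ranked[:g]


def label_for_set(central_ids: List[str], annotated_style: Dict[str, str]) -> str:
    """Assumed rule: take the most common style among the central resources."""
    votes = Counter(annotated_style[rid] for rid in central_ids)
    return votes.most_common(1)[0][0]


# Hypothetical resource classification features of one classification set.
set_features = {"v1": [0.0, 0.0], "v2": [0.1, 0.0],
                "v3": [5.0, 5.0], "v4": [0.0, 0.2]}
set_centroid = [0.0, 0.0]

central = nearest_to_centroid(set_features, set_centroid, g=3)
# Hypothetical styles assigned by a reviewer to the central resources only.
styles = {"v1": "sad", "v2": "sad", "v4": "fun"}
set_label = label_for_set(central, styles)
```

Only the G central resources need to be inspected, after which the resulting label is applied to every resource in the same classification set.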
In one possible implementation, the resource classification sets above are divided based on the resource classification features of resources that already exist in the application (that is, the stock resources in the application). In an actual application scenario, users may publish new resources in the application. After resource classification has been performed on the stock resources, the resource classification set to which a newly released resource belongs may be determined by the following steps: call the trained feature generation network to obtain the resource classification feature of the newly released resource; determine the resource feature classification set to which that feature belongs according to the distance between the centroid of each resource feature classification set and the feature, for example, by selecting the resource feature classification set whose centroid is closest to the feature; and determine the resource classification set to which the newly released resource belongs according to that resource feature classification set. This reduces how often the resource classification features must be clustered in order to classify newly released resources, and thus reduces the amount of computation.
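Assigning a newly released resource to the classification set with the nearest centroid can be sketched as follows. The function name `assign_to_nearest_set`, the set identifiers and the sample values are assumptions for illustration.

```python
import math
from typing import Dict, Sequence


def assign_to_nearest_set(feature: Sequence[float],
                          centroids: Dict[str, Sequence[float]]) -> str:
    """Return the id of the classification set whose centroid is closest."""
    return min(centroids, key=lambda set_id: math.dist(feature, centroids[set_id]))


# Hypothetical centroids kept from the last clustering of the stock resources.
centroids = {"set_1": [0.0, 0.0], "set_2": [10.0, 10.0]}

# Hypothetical resource classification feature of a newly released resource.
new_feature = [9.0, 8.0]
new_set = assign_to_nearest_set(new_feature, centroids)
```

This lookup costs one distance computation per centroid, so no re-clustering is needed when a single resource is published.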
In a possible implementation, the embodiment of the application can also re-divide the resource classification sets of the stock resources in the application by clustering the resource classification features at regular intervals (such as once a week), which keeps the resource classification up to date.
In one possible implementation, after the plurality of resource classification sets of the resource set are determined according to the plurality of resource feature classification sets, resources may be recommended to an object according to the resource classification sets. This specifically includes the following steps: (1) Obtain a target resource for which the recommended object has access behavior, and determine, from the plurality of resource classification sets, the target resource classification set to which the target resource belongs. The recommended object may be any object to which resources need to be recommended. The target resource for which the recommended object has access behavior may also be referred to as a historical access resource of the recommended object. In one possible implementation, the target resource may be a resource for which the recommended object has access behavior within a target time range, so that the resources referred to are filtered by time. The target time range may be a certain period before the time point at which the target resource is acquired, for example, the 3 months before that time point. The resources referred to when recommending to the recommended object are then those accessed in the most recent period, which avoids recommending resources that no longer match the current access preference of the recommended object after its access habits change. (2) Determine the resources in the target resource classification set for which the recommended object has no access behavior as the resources to be recommended. This avoids recommending already accessed resources to the recommended object and reduces repetition in resource recommendation. It can be understood that, because the category of each resource is determined in advance, the similarity between the resources accessed by the recommended object and every resource in the resource set does not need to be calculated at recommendation time, which improves recommendation efficiency. (3) Recommend resources to the recommended object according to the resources to be recommended. In one possible implementation, a certain number of resources may be randomly selected from the resources to be recommended and recommended to the recommended object. In another possible implementation, the distances between the resource classification features of the resources to be recommended and the resource classification feature of the target resource are sorted from largest to smallest, and the first R resources to be recommended in that order are determined as the resources recommended to the recommended object. Resource recommendation can thus be performed for the recommended object based on the resource classification sets; because the division of the resource classification sets is more accurate, recommendation is more accurate, and the click rate of the recommended object on the recommended resources can be improved.
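Steps (1)-(3) with random screening can be sketched as follows. The data structures (a membership map from resource id to set id, and a map from set id to resource ids), the function names and the sample data are assumptions for illustration.

```python
import random
from typing import Dict, List, Set


def resources_to_recommend(target_resource: str, accessed: Set[str],
                           membership: Dict[str, str],
                           sets: Dict[str, List[str]]) -> List[str]:
    """Steps (1)-(2): find the target set and keep resources not yet accessed."""
    target_set = membership[target_resource]
    return [rid for rid in sets[target_set] if rid not in accessed]


def pick_random(candidates: List[str], count: int, seed: int = 0) -> List[str]:
    """Step (3), random variant: screen up to `count` candidates at random."""
    rng = random.Random(seed)
    return rng.sample(candidates, min(count, len(candidates)))


# Hypothetical classification result and access history.
sets = {"set_1": ["a", "b", "c"], "set_2": ["d", "e"]}
membership = {rid: set_id for set_id, rids in sets.items() for rid in rids}
accessed = {"a", "d"}

candidates = resources_to_recommend("a", accessed, membership, sets)
recommended = pick_random(candidates, count=1)
```

Because set membership is precomputed, the lookup avoids computing similarities between the accessed resource and every resource in the resource set at recommendation time.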
In a possible implementation, the method can be applied to video resource recommendation scenarios. It helps to mine the styles of videos, that is, to classify videos by style and associate corresponding style labels (such as serious, humorous, rural, and the like) with the video resources in each resource style set. This assists the recommendation engine in making better recommendations in combination with the video content and improves the efficiency with which operators screen videos. It also provides preprocessing for classifying and sorting videos by different resource style features (namely resource classification features), enabling more diverse resource recommendation. By making full use of the access behaviors between users and videos, the method can effectively separate two dimensions of information, the content of a video and its style. Recommendation can then be based not only on resources similar in content to a user's historical access resources but also on resources similar in style to them, which improves the diversity of resources recommended in the information stream.
In an actual application scenario, the embodiment of the application can be applied to a data processing system. Referring to fig. 9, fig. 9 is a schematic diagram illustrating a data processing system according to an embodiment of the application. The system mainly comprises a resource classification service, a resource classification feature generation service, a feature generation network and a sample database. The resource classification service may be used to call the resource classification feature generation service to generate the resource classification features of resources (S901 in fig. 9), so as to obtain a plurality of resource classification sets and classify the stock resources or newly released resources in the application; the resources in each resource classification set and their style labels are stored in association in a content database for subsequent resource recommendation. The resource classification service may be triggered when it detects that the dispatch center service calls it. The resource classification feature generation service is used to call the feature generation network to generate the resource classification features of resources (S902 in fig. 9). The feature generation network is the feature generation network trained according to the description above, and can generate resource classification features according to the associated text information and/or associated image information of resources. The sample database is used to store the resource sample pairs generated according to the construction method described above, so that the feature generation network can be trained based on the resource sample pairs (S903 in fig. 9). The sample database can also acquire reference resources, for example, the 5 resources accessed closest in time to a certain resource A in the access sequence of the target object (that is, the sequence of resources for which the target object has access behavior, arranged in chronological order). At the same time, resource A and a resource B with similar resource content are obtained using resource content vectors, and the resource sample pair is constructed according to the method above by combining the differences in users' access behaviors after resources A and B are exposed. The access sequence of the target object can be acquired through the statistical reporting interface and analysis service.
In one possible implementation, the data processing system may further include a resource content production end, a resource content consumption end, a dispatch center service, a content database, an uplink and downlink content interface server, a manual auditing system, a resource content storage service, a download file service, a duplication removal service, a content distribution outlet service, a statistical reporting interface, an analysis service, and so on, which are not limited herein.
The resource content production end can be a client used for releasing resources in an application, for example a producer of PGC (Professional Generated Content, that is, professionally produced or expert-produced content) or an MCN (multi-channel network) content producer, and can provide local or shot video content through a mobile end or a back-end API (application programming interface) system. It can acquire the interface address of the upload server by communicating with the uplink and downlink content interface server and then upload the local file; during shooting, matching music, filter templates, video beautification functions and the like can be selected for the local video content. The resource content consumption end can be a client used for accessing each resource in the application. It can communicate with the uplink and downlink content interface server to acquire index information of a resource file (such as a video file) to be accessed and the access address of the resource file, then communicate with the resource content storage server, download the corresponding streaming media file, and play and view it through a local player. If the resource is video data, behavior data generated during uploading and downloading, such as playback stuttering, loading time, playing clicks and exposure, can be reported to a statistics server at the same time, and a visible content quality feedback entry is provided, which covers the quality problem type, the corresponding low-quality characteristics, the unique ID (identification) of the corresponding video file and the corresponding time stamp. Resource content consumers typically browse the video data via Feeds streams (message streams).
The dispatch center service is responsible for the whole scheduling process of resource content circulation. It receives, through the uplink and downlink content interface server, the resource content stored in the content database, and then acquires the meta information of the resource from the content database. It schedules the manual auditing system and the machine processing system and controls the scheduling order and priority. For video content, it communicates with the duplication removal service so that the duplication removal service can handle similar video content that differs in code rate, definition or size, has a partial black screen, has or lacks a filter or a logo (mark), has partial advertising content inserted, or has its opening or ending cut, thereby effectively reducing the content to be processed in the manual auditing process. It schedules the resources that need manual auditing into the manual auditing system and filters out repeated resources. Content enabled by the manual auditing system is provided to the resource content consumption end of a terminal through the content distribution outlet service, which is typically a recommendation engine, a search engine or a directly operated display page. It invokes the title, cover and content quality services of the resource and directly filters out low-quality content that reaches a threshold without sending it to manual auditing, or places a part of similar low-quality content at the end of the auditing queue so that high-quality content is audited first. It also communicates with the resource classification service, which mainly mines the style characteristics of resources, aggregates resources of similar style together, and stores the aggregation relation in the content database for use by the recommendation distribution system, so that the experience of resource distribution is improved.
The content database may store meta information of resources (such as video data, graphic data, etc.) issued by a resource content production end, such as file size, cover map link, code rate, file format, title, issue time, author, video file size, video format, and whether the resource is marked as original or first-published; it may also include the classification of resources made in the manual auditing process, which will not be described herein. A standard transcoding operation is performed on the resource content, and after transcoding is completed the meta information of the resource content, mainly file size, code rate and specification, is returned asynchronously; this information and the cropped cover map are stored in the content database. The result and state of manual auditing can also be saved, as can the result of resource duplication removal, so that identical content is not manually processed again.
The manual auditing system is the carrier of manual service capability. It needs to read the original information of video content in the content database and is usually a system with complex business logic developed on the basis of a web (network) database. It performs a first round of preliminary filtering, by manual operation, on whether the video content involves illegal information. On the basis of this primary audit, a secondary audit is carried out on the resource content, mainly to classify the content and to label it or confirm its labels; because machine learning such as deep learning is not yet fully mature for resource content, secondary manual auditing is needed on top of machine processing, and the accuracy and efficiency of video label annotation are improved through man-machine cooperation. The system can also process content that is reported or given negative feedback at the content consumption end, which is not limited herein. The uplink and downlink content interface server is used for controlling the uplink and downlink of multimedia data. For example, it communicates directly with the content production end and stores the content submitted from the content production end in the content database, usually information such as the title, publisher, summary, original cover image, release time and file size of the content; alternatively, a shot video enters the service end directly through the uplink and downlink content interface server and the file is stored in the content storage service. The content submitted by the publisher of the multimedia data (including content provided by external channels) can be synchronously sent to the dispatch center service for subsequent processing and circulation of the multimedia data.
The file downloading service is used for downloading data from the content storage service and controlling the speed and progress of the download. It is usually a group of parallel servers made up of related task scheduling and distribution clusters. For example, it downloads multimedia data from the resource content storage service; a video frame extraction service and a graphic content analysis service can then be invoked on the downloaded multimedia data to acquire cover images to be detected, which serve as candidate cover images when the target cover image of the multimedia data is subsequently acquired.
The duplicate removal service can compare the digital fingerprint characteristics of the resources, and only one copy of repeated similar video files is reserved and continued to the subsequent links, so that unnecessary repeated files and resource consumption in processing on the links are reduced.
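The retention rule of the duplicate removal service can be sketched as follows. This is only an illustration under the assumption that each file already carries a precomputed fingerprint string and that files with equal fingerprints count as repeats; how the digital fingerprint is computed, and how near-duplicates are matched, is not specified here. The name `deduplicate` and the sample identifiers are hypothetical.

```python
from typing import Dict, List, Tuple


def deduplicate(files: List[Tuple[str, str]]) -> List[str]:
    """Keep the first file seen for each fingerprint.

    files: (file_id, fingerprint) pairs in arrival order.
    Returns the ids of the files that continue to the subsequent links.
    """
    kept: Dict[str, str] = {}
    for file_id, fingerprint in files:
        # setdefault leaves the earlier file in place when the fingerprint repeats.
        kept.setdefault(fingerprint, file_id)
    return list(kept.values())


# Hypothetical uploads: v3 repeats the fingerprint of v1 and is dropped.
uploads = [("v1", "fp_a"), ("v2", "fp_b"), ("v3", "fp_a")]
unique_ids = deduplicate(uploads)
```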
The content distribution outlet service is the outlet through which the multimedia data output by the machine and manual processing links is delivered; the multimedia data is distributed to the resource content consumption end through the content distribution outlet, where the distribution mode can be recommendation algorithm distribution or manual operation, which is not limited herein. The resources determined to be recommended to the recommended object can be distributed to the recommended object through the content distribution outlet.
The resource content storage service is a storage server which is widely distributed and is accessed nearby a user, and a CDN (acceleration server) is arranged at the periphery of the storage server for carrying out distributed cache acceleration, so that resources uploaded by a resource content production end are stored through an uplink content interface server and a downlink content interface server; the resource content consumption end can directly access the resource content storage server to download the corresponding content; as a data source for the internal service, the file downloading service obtains the original resource data to perform related processing, and the paths of the internal and external data sources are usually deployed separately, so that mutual influence is avoided.
The statistical reporting interface and analysis service can report to the server the behavior data (such as playback stuttering, loading time, playing clicks, exposure and the like) generated when a user plays video content during downloading, and can extract the exposure data and the user access behavior data from the statistics.
According to the embodiment of the application, a resource sample pair constructed based on the access behaviors of the object set to the resource set can be obtained, where the resources in the resource sample pair have similarity of resource content, and a number of objects greater than or equal to the threshold number have access behaviors to the first resource and do not have access behaviors to the second resource. A reference resource can then be obtained, and the feature generation network is trained based on the reference resource and the resource sample pair, so that the trained feature generation network generates the resource classification features of resources and the resources can be classified. The feature generation network is thus trained in combination with the access behaviors of the object set to the resource set, and the training process avoids the influence that similarity of resource content would otherwise have on the classification of resources. This improves the accuracy of the resource classification features generated by the trained feature generation network and makes it possible to classify resources more accurately through those features.
Referring to fig. 10, fig. 10 is a schematic structural diagram of a data processing apparatus according to an embodiment of the present application. Optionally, the data processing apparatus may be provided in the above-described electronic device. As shown in fig. 10, the data processing apparatus described in the present embodiment may include:
an obtaining unit 1001, configured to obtain a resource sample pair constructed based on an access behavior of an object set to a resource set; the resource set comprises K resources, K is a positive integer, a pair of resource samples comprises first resources and second resources in the resource set, the first resources and the second resources have similarity of resource content, the object set comprises N objects, N is a positive integer, M objects in the N objects have access behaviors for the first resources and do not have access behaviors for the second resources, M is greater than or equal to a threshold number, and M is a positive integer and less than or equal to N;
a processing unit 1002, configured to determine a target object from M objects, and obtain a reference resource having an access behavior of the target object in a resource set; the first moment when the target object accesses the reference resource is adjacent to the second moment when the target object accesses the first resource;
the processing unit 1002 is further configured to train a feature generation network based on the reference resource and the resource sample pair, to obtain a trained feature generation network; the trained feature generation network is used for generating resource classification features of the resources.
In one implementation, the processing unit 1002 is specifically configured to:
calling a feature generation network to generate reference resource features of reference resources;
calling the feature generation network to respectively generate a first resource feature of the first resource and a second resource feature of the second resource;
and correcting network parameters of the feature generation network based on the reference resource feature, the first resource feature and the second resource feature, and determining the feature generation network subjected to network parameter correction as the trained feature generation network.
In one implementation, the processing unit 1002 is specifically configured to:
acquiring a first feature distance between a first resource feature and a reference resource feature;
acquiring a second feature distance between the second resource feature and the reference resource feature;
and correcting network parameters of the feature generation network based on the first feature distance and the second feature distance.
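The two feature distances above could drive parameter correction through a triplet-style objective. The following is a minimal sketch under explicit assumptions: the reference resource feature acts as the anchor, the first resource feature should end up closer to it than the second resource feature, Euclidean distance is used, and a margin of 1.0 is chosen arbitrarily. The text of this section does not fix the loss function, the distance metric or the margin, so `triplet_loss` and its parameters are illustrative only.

```python
import math
from typing import Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(reference: Sequence[float],
                 first: Sequence[float],
                 second: Sequence[float],
                 margin: float = 1.0) -> float:
    """Hinge loss on (first feature distance) - (second feature distance) + margin."""
    d_first = euclidean(first, reference)    # first feature distance
    d_second = euclidean(second, reference)  # second feature distance
    # The loss is zero once the first resource is closer than the second by the margin.
    return max(0.0, d_first - d_second + margin)


# The first resource is far from the reference here, so the loss is positive.
loss_violated = triplet_loss([0.0, 0.0], [3.0, 4.0], [0.0, 1.0])
# The first resource is already much closer than the second, so the loss is zero.
loss_satisfied = triplet_loss([0.0, 0.0], [0.0, 1.0], [3.0, 4.0])
```

In a training loop, the gradient of such a loss with respect to the network parameters would be what corrects the feature generation network; a positive loss indicates that the current features still rank the two resources the wrong way round.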
In one implementation, the processing unit 1002 is specifically configured to:
acquiring associated text information and associated image information of a reference resource;
inputting the associated text information and the associated image information into a feature generation network, and generating text features of the associated text information and image features of the associated image information based on the feature generation network;
and carrying out feature fusion processing on the text features and the image features to obtain reference resource features of the reference resource.
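One simple form the feature fusion processing might take is shown below: a sketch assuming that fusion means L2-normalizing each modality and concatenating the results. The section does not state which fusion operation is used, so this choice, and the names `l2_normalize` and `fuse_features`, are assumptions made for illustration.

```python
import math
from typing import List, Sequence


def l2_normalize(vector: Sequence[float]) -> List[float]:
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm > 0 else list(vector)


def fuse_features(text_feature: Sequence[float],
                  image_feature: Sequence[float]) -> List[float]:
    """Concatenate the normalized text feature and the normalized image feature."""
    return l2_normalize(text_feature) + l2_normalize(image_feature)


# Hypothetical 2-dimensional text and image features.
fused = fuse_features([3.0, 4.0], [0.0, 2.0])
```

Normalizing before concatenation keeps one modality from dominating the fused feature purely because of its scale.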
In one implementation, the processing unit 1002 is specifically configured to:
acquiring a first to-be-determined resource and a second to-be-determined resource in a resource set, and acquiring the similarity of resource content between the first to-be-determined resource and the second to-be-determined resource;
if the similarity of the resource contents is greater than or equal to a similarity threshold, determining that the first undetermined resource and the second undetermined resource have the similarity of the resource contents;
and constructing a resource sample pair according to the first to-be-determined resource and the second to-be-determined resource which have the similarity of the resource content.
In one implementation, the processing unit 1002 is specifically configured to:
acquiring the access behaviors of N objects to the first to-be-determined resource and the second to-be-determined resource;
if more than the threshold number of the N objects have access behaviors to the first pending resource and do not have access behaviors to the second pending resource, determining the first pending resource as the first resource and determining the second pending resource as the second resource;
and constructing resource sample pairs according to the first resource and the second resource.
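The two conditions above (content similarity, then a sufficient number of objects that accessed only one of the two resources) can be sketched together as follows. This is a minimal illustration, assuming cosine similarity over resource content vectors and a "greater than or equal to" comparison with the threshold number; the names `build_pair`, `content_vectors` and `accessed_by`, and all sample values, are hypothetical.

```python
import math
from typing import Dict, List, Optional, Set, Tuple


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_pair(res_a: str, res_b: str,
               content_vectors: Dict[str, List[float]],
               accessed_by: Dict[str, Set[str]],
               sim_threshold: float,
               count_threshold: int) -> Optional[Tuple[str, str]]:
    """Return (first_resource, second_resource), or None if no pair can be built."""
    # Condition 1: the two pending resources must have similar content.
    if cosine(content_vectors[res_a], content_vectors[res_b]) < sim_threshold:
        return None
    # Condition 2: enough objects accessed one resource but not the other.
    only_a = accessed_by[res_a] - accessed_by[res_b]
    only_b = accessed_by[res_b] - accessed_by[res_a]
    if len(only_a) >= count_threshold:
        return (res_a, res_b)
    if len(only_b) >= count_threshold:
        return (res_b, res_a)
    return None


vectors = {"A": [1.0, 0.0], "B": [0.9, 0.1], "C": [0.0, 1.0]}
visitors = {"A": {"u1", "u2", "u3"}, "B": {"u3"}, "C": {"u1"}}
pair_ab = build_pair("A", "B", vectors, visitors, sim_threshold=0.9, count_threshold=2)
pair_ac = build_pair("A", "C", vectors, visitors, sim_threshold=0.9, count_threshold=2)
```

In this example A and B are similar in content, and two objects accessed A without accessing B, so A becomes the first resource and B the second; A and C are not similar, so no pair is built.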
In one implementation, the processing unit 1002 is further configured to:
invoking the trained feature generation network to generate resource classification features of each of the K resources;
performing feature clustering processing on the resource classification features of the K resources to obtain a plurality of resource feature classification sets of the resource classification features of the K resources; any one of the resource feature classification sets comprises at least one resource classification feature of the resource classification features of the K resources;
determining a plurality of resource classification sets of the resource set according to the plurality of resource feature classification sets; any one of the resource classification sets contains at least one resource in the resource set.
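The clustering step can be illustrated with a small k-means routine. The section only refers to "feature clustering processing", so k-means, the deterministic initialization and the fixed iteration count are assumptions chosen for brevity; `cluster_resources` and the sample features are hypothetical.

```python
from typing import Dict, List


def sq_dist(a: List[float], b: List[float]) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cluster_resources(features: Dict[str, List[float]], k: int,
                      iterations: int = 10) -> List[List[str]]:
    """Group resource ids into at most k resource classification sets with k-means."""
    ids = sorted(features)
    # Deterministic initialization: the first k resources (by id) seed the centroids.
    centroids = [list(features[rid]) for rid in ids[:k]]
    clusters: List[List[str]] = []
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for rid in ids:
            nearest = min(range(len(centroids)),
                          key=lambda c: sq_dist(features[rid], centroids[c]))
            clusters[nearest].append(rid)
        # Move each centroid to the mean of its members; empty clusters keep their centroid.
        for c, members in enumerate(clusters):
            if members:
                dim = len(centroids[c])
                centroids[c] = [sum(features[m][d] for m in members) / len(members)
                                for d in range(dim)]
    return [members for members in clusters if members]


# Hypothetical resource classification features of K = 4 resources.
classification_features = {
    "r1": [0.0, 0.0], "r2": [0.1, 0.0],
    "r3": [5.0, 5.0], "r4": [5.1, 5.0],
}
classification_sets = cluster_resources(classification_features, k=2)
```

Each returned list corresponds to one resource classification set; because every resource is assigned to exactly one cluster and empty clusters are dropped, each set contains at least one resource.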
In one implementation, the processing unit 1002 is further configured to:
acquiring target resources of which recommended objects have access behaviors, and determining a target resource classification set to which the target resources belong from a plurality of resource classification sets;
determining resources in the target resource classification set for which the recommended object does not have access behaviors as resources to be recommended;
and recommending resources to the recommended object according to the resources to be recommended.
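The recommendation steps above can be sketched as a set operation. This is a minimal illustration assuming resource classification sets are held as Python sets of resource identifiers; `resources_to_recommend` and the sample data are hypothetical, and ranking of the candidates is left out.

```python
from typing import List, Set


def resources_to_recommend(accessed: Set[str],
                           classification_sets: List[Set[str]]) -> List[str]:
    """Collect unaccessed resources from every set that contains an accessed resource."""
    candidates: Set[str] = set()
    for target in accessed:
        for group in classification_sets:
            if target in group:
                # Same classification set as a target resource, minus what was already accessed.
                candidates |= group - accessed
    return sorted(candidates)


# Hypothetical classification sets and access history of one recommended object.
groups = [{"r1", "r2", "r3"}, {"r4", "r5"}]
to_recommend = resources_to_recommend({"r1"}, groups)
```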
Referring to fig. 11, fig. 11 is a schematic structural diagram of an electronic device according to an embodiment of the application. The electronic device described in the present embodiment includes: processor 1101, memory 1102. Optionally, the electronic device may further include a network interface or a power module. Data may be exchanged between the processor 1101 and the memory 1102.
The processor 1101 may be a central processing unit (Central Processing Unit, CPU), or may be another general purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
The network interface may include input devices, such as a control panel, microphone, receiver, etc., and/or output devices, such as a display screen, transmitter, etc., which are not shown. For example, in an embodiment of the application, the network interface may include a receiver and a transmitter.
The memory 1102 may include read-only memory and random access memory, and provides program instructions and data to the processor 1101. A portion of memory 1102 may also include non-volatile random access memory. Wherein the processor 1101, when calling the program instructions, is configured to perform:
acquiring a resource sample pair constructed based on the access behavior of the object set to the resource set; the resource set comprises K resources, K is a positive integer, a pair of resource samples comprises first resources and second resources in the resource set, the first resources and the second resources have similarity of resource content, the object set comprises N objects, N is a positive integer, M objects in the N objects have access behaviors for the first resources and do not have access behaviors for the second resources, M is greater than or equal to a threshold number, and M is a positive integer and less than or equal to N;
determining a target object from M objects, and acquiring a reference resource with access behaviors of the target object in a resource set; the first moment when the target object accesses the reference resource is adjacent to the second moment when the target object accesses the first resource;
training a feature generation network based on the reference resource and the resource sample pair to obtain a trained feature generation network; the trained feature generation network is used for generating resource classification features of the resources.
In one implementation, the processor 1101 is specifically configured to:
calling a feature generation network to generate reference resource features of reference resources;
calling the feature generation network to respectively generate a first resource feature of the first resource and a second resource feature of the second resource;
and correcting network parameters of the feature generation network based on the reference resource feature, the first resource feature and the second resource feature, and determining the feature generation network subjected to network parameter correction as the trained feature generation network.
In one implementation, the processor 1101 is specifically configured to:
acquiring a first feature distance between a first resource feature and a reference resource feature;
acquiring a second feature distance between the second resource feature and the reference resource feature;
and correcting network parameters of the feature generation network based on the first feature distance and the second feature distance.
In one implementation, the processor 1101 is specifically configured to:
acquiring associated text information and associated image information of a reference resource;
inputting the associated text information and the associated image information into a feature generation network, and generating text features of the associated text information and image features of the associated image information based on the feature generation network;
and carrying out feature fusion processing on the text features and the image features to obtain reference resource features of the reference resource.
In one implementation, the processor 1101 is specifically configured to:
acquiring a first to-be-determined resource and a second to-be-determined resource in a resource set, and acquiring the similarity of resource content between the first to-be-determined resource and the second to-be-determined resource;
if the similarity of the resource contents is greater than or equal to a similarity threshold, determining that the first undetermined resource and the second undetermined resource have the similarity of the resource contents;
and constructing a resource sample pair according to the first to-be-determined resource and the second to-be-determined resource which have the similarity of the resource content.
In one implementation, the processor 1101 is specifically configured to:
acquiring the access behaviors of N objects to the first to-be-determined resource and the second to-be-determined resource;
if more than the threshold number of the N objects have access behaviors to the first pending resource and do not have access behaviors to the second pending resource, determining the first pending resource as the first resource and determining the second pending resource as the second resource;
and constructing resource sample pairs according to the first resource and the second resource.
In one implementation, the processor 1101 is further configured to:
invoking the trained feature generation network to generate resource classification features of each of the K resources;
performing feature clustering processing on the resource classification features of the K resources to obtain a plurality of resource feature classification sets of the resource classification features of the K resources; any one of the resource feature classification sets comprises at least one resource classification feature of the resource classification features of the K resources;
determining a plurality of resource classification sets of the resource set according to the plurality of resource feature classification sets; any one of the resource classification sets contains at least one resource in the resource set.
In one implementation, the processor 1101 is further configured to:
acquiring target resources of which recommended objects have access behaviors, and determining a target resource classification set to which the target resources belong from a plurality of resource classification sets;
determining resources in the target resource classification set for which the recommended object does not have access behaviors as resources to be recommended;
and recommending resources to the recommended object according to the resources to be recommended.
Optionally, the program instructions may further implement other steps of the method in the above embodiment when executed by the processor, which is not described herein.
The present application also provides a computer readable storage medium storing a computer program comprising program instructions which, when executed by a processor, cause the processor to perform the above method, such as the method performed by the above electronic device, which is not described herein.
Alternatively, a storage medium such as a computer-readable storage medium to which the present application relates may be nonvolatile or may be volatile.
Alternatively, the computer-readable storage medium may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function, and the like; the storage data area may store data created from the use of blockchain nodes, and the like. The blockchain is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanism, encryption algorithm and the like. The Blockchain (Blockchain), which is essentially a decentralised database, is a string of data blocks that are generated by cryptographic means in association, each data block containing a batch of information of network transactions for verifying the validity of the information (anti-counterfeiting) and generating the next block. The blockchain may include a blockchain underlying platform, a platform product services layer, an application services layer, and the like.
It should be noted that, for simplicity of description, the foregoing method embodiments are all expressed as a series of action combinations, but it should be understood by those skilled in the art that the present application is not limited by the order of action described, as some steps may be performed in other order or simultaneously according to the present application. Further, those skilled in the art will also appreciate that the embodiments described in the specification are all preferred embodiments, and that the acts and modules referred to are not necessarily required for the present application.
Those of ordinary skill in the art will appreciate that all or part of the steps in the various methods of the above embodiments may be implemented by a program instructing related hardware; the program may be stored in a computer readable storage medium, and the storage medium may include: a flash disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, and the like.
Embodiments of the present application also provide a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions, so that the computer device performs the steps performed in the embodiments of the methods described above. For example, the computer device may be a terminal, or may be a server.
The foregoing has described in detail a data processing method, apparatus, electronic device, product and medium according to embodiments of the present application. Specific examples are used herein to illustrate the principles and embodiments of the present application, and the description of the foregoing embodiments is only intended to help understand the method and core idea of the present application. Meanwhile, those skilled in the art may make changes to the specific embodiments and application scope according to the idea of the present application. In summary, the contents of this specification should not be construed as limiting the present application.

Claims (12)

1. A method of data processing, the method comprising:
acquiring a resource sample pair constructed based on the access behavior of the object set to the resource set; the resource set comprises K resources, K is a positive integer, the pair of resource samples comprises first resources and second resources in the resource set, the similarity of resource content is provided between the first resources and the second resources, the object set comprises N objects, N is a positive integer, M objects in the N objects have access behaviors for the first resources and do not have access behaviors for the second resources, M is greater than or equal to a threshold quantity, and M is a positive integer and less than or equal to N;
determining a target object from the M objects, and acquiring a reference resource with access behaviors of the target object from the resource set; the first moment when the target object accesses the reference resource is adjacent to the second moment when the target object accesses the first resource;
training a feature generation network based on the reference resource and the resource sample pair to obtain a trained feature generation network; the trained feature generation network is used for generating resource classification features of the resources.
2. The method of claim 1, wherein the training a feature generation network based on the reference resource and the resource sample pair to obtain a trained feature generation network comprises:
invoking the feature generation network to generate reference resource features of the reference resources;
invoking the feature generation network to generate a first resource feature of the first resource and a second resource feature of the second resource respectively;
and correcting the network parameters of the characteristic generating network based on the reference resource characteristic, the first resource characteristic and the second resource characteristic, and determining the characteristic generating network subjected to network parameter correction as a trained characteristic generating network.
3. The method of claim 2, wherein modifying the network parameters of the feature generation network based on the reference resource feature, the first resource feature, and the second resource feature comprises:
acquiring a first feature distance between the first resource feature and the reference resource feature;
acquiring a second feature distance between the second resource feature and the reference resource feature;
and correcting network parameters of the feature generation network based on the first feature distance and the second feature distance.
4. The method of claim 2, wherein the invoking the feature generation network to generate the reference resource feature of the reference resource comprises:
acquiring associated text information and associated image information of the reference resource;
inputting the associated text information and the associated image information into the feature generation network, and generating text features of the associated text information and image features of the associated image information based on the feature generation network;
and carrying out feature fusion processing on the text features and the image features to obtain the reference resource features of the reference resource.
5. The method of claim 1, wherein the obtaining resource sample pairs constructed based on access behavior of the object set to the resource set comprises:
acquiring a first undetermined resource and a second undetermined resource in the resource set, and acquiring the similarity of resource content between the first undetermined resource and the second undetermined resource;
if the similarity of the resource contents is greater than or equal to a similarity threshold, determining that the first undetermined resource and the second undetermined resource have the similarity of the resource contents;
and constructing the resource sample pair according to the first to-be-determined resource and the second to-be-determined resource which have the similarity of resource contents.
6. The method of claim 5, wherein said constructing said resource sample pair from said first pending resource and said second pending resource having similarity of resource content comprises:
acquiring the access behaviors of the N objects to the first pending resource and the second pending resource;
if more than the threshold number of the N objects have access behaviors to the first pending resource and do not have access behaviors to the second pending resource, determining the first pending resource as the first resource and the second pending resource as the second resource;
and constructing the resource sample pair according to the first resource and the second resource.
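The pair construction of claims 5 and 6 can be sketched as follows. The function name, the `similarity` callable, and the `accesses` mapping (object id to the set of resources it accessed) are hypothetical stand-ins. The sketch counts the objects that accessed the first pending resource but not the second and compares that count with the threshold number using "greater than or equal to", as in the apparatus claim; that comparison is an assumption where claim 6 reads "more than".

```python
from itertools import permutations


def build_sample_pairs(resources, similarity, accesses,
                       sim_threshold, count_threshold):
    """Return ordered (first_resource, second_resource) sample pairs."""
    pairs = []
    for first, second in permutations(resources, 2):
        # Claim 5: keep only pending resources with similar content.
        if similarity(first, second) < sim_threshold:
            continue
        # Claim 6: count objects that accessed `first` but not `second`.
        m = sum(1 for seen in accesses.values()
                if first in seen and second not in seen)
        if m >= count_threshold:
            pairs.append((first, second))
    return pairs
```

Because the pair is ordered, the same two resources can qualify in one direction and fail in the other.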
7. The method according to claim 1, wherein the method further comprises:
invoking the trained feature generation network to generate a resource classification feature of each of the K resources;
performing feature clustering on the resource classification features of the K resources to obtain a plurality of resource feature classification sets; each resource feature classification set comprises at least one of the resource classification features of the K resources;
determining a plurality of resource classification sets of the resource set according to the plurality of resource feature classification sets; each resource classification set contains at least one resource of the resource set.
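The clustering step of claim 7 does not name an algorithm. As one possibility, the sketch below uses a small k-means with a deterministic initialization (the first k feature vectors); the choice of k-means, the function name, and the parameters `k` and `iterations` are assumptions. For brevity it groups resource ids directly rather than building feature sets first and mapping them back to resources.

```python
import math


def cluster_resources(features, k, iterations=20):
    """Group resources by k-means over their classification features.

    features: dict mapping resource id -> feature vector (list of floats).
    Returns a list of non-empty sets of resource ids.
    """
    ids = list(features)
    # Deterministic initialization: use the first k vectors as centroids.
    centroids = [list(features[rid]) for rid in ids[:k]]
    assignment = {}
    for _ in range(iterations):
        # Assign each resource to its nearest centroid.
        assignment = {
            rid: min(range(k),
                     key=lambda c: math.dist(features[rid], centroids[c]))
            for rid in ids
        }
        # Recompute each centroid as the mean of its members.
        for c in range(k):
            members = [features[rid] for rid in ids if assignment[rid] == c]
            if members:
                centroids[c] = [sum(col) / len(members)
                                for col in zip(*members)]
    clusters = [set() for _ in range(k)]
    for rid, c in assignment.items():
        clusters[c].add(rid)
    return [group for group in clusters if group]
```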
8. The method of claim 7, wherein the method further comprises:
acquiring a target resource for which a recommended object has access behavior, and determining, from the plurality of resource classification sets, a target resource classification set to which the target resource belongs;
determining, as resources to be recommended, the resources in the target resource classification set for which the recommended object has no access behavior;
and recommending the resources to be recommended to the recommended object.
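The selection in claim 8 amounts to a set difference within the matching classification set. A minimal sketch, in which `resources_to_recommend` and its arguments are hypothetical names and the recommended object's access history is assumed to be a set of resource ids:

```python
def resources_to_recommend(accessed, classification_sets):
    """Collect unaccessed resources from the sets containing accessed ones.

    accessed: set of resource ids the recommended object has accessed.
    classification_sets: iterable of sets of resource ids.
    """
    recommended = set()
    for target in accessed:
        for group in classification_sets:
            if target in group:
                # Same classification set, but not yet accessed.
                recommended |= group - accessed
    return recommended
```

Ranking or filtering of the returned resources before they are shown is outside this sketch.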
9. A data processing apparatus, the apparatus comprising:
an acquisition unit, configured to acquire a resource sample pair constructed based on the access behaviors of an object set to a resource set; the resource set comprises K resources, K is a positive integer, the resource sample pair comprises a first resource and a second resource in the resource set, the first resource and the second resource have similarity of resource content, the object set comprises N objects, N is a positive integer, M objects among the N objects have access behavior for the first resource and do not have access behavior for the second resource, M is greater than or equal to a threshold number, and M is a positive integer less than or equal to N;
a processing unit, configured to determine a target object from the M objects and acquire, from the resource set, a reference resource for which the target object has access behavior; a first moment at which the target object accesses the reference resource is adjacent to a second moment at which the target object accesses the first resource;
the processing unit is further configured to train a feature generation network based on the reference resource and the resource sample pair to obtain a trained feature generation network; the trained feature generation network is used for generating resource classification features of resources.
10. An electronic device comprising a processor and a memory, wherein the memory is configured to store a computer program comprising program instructions, and the processor is configured to invoke the program instructions to perform the method of any of claims 1-8.
11. A computer program product comprising computer programs/instructions which, when executed by a processor, implement the steps of the method of any of claims 1-8.
12. A computer readable storage medium, characterized in that the computer readable storage medium stores a computer program comprising program instructions which, when executed by a processor, cause the processor to perform the method of any of claims 1-8.
CN202210228214.9A 2022-03-07 2022-03-07 Data processing method, device, electronic equipment, product and medium Pending CN116776239A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210228214.9A CN116776239A (en) 2022-03-07 2022-03-07 Data processing method, device, electronic equipment, product and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210228214.9A CN116776239A (en) 2022-03-07 2022-03-07 Data processing method, device, electronic equipment, product and medium

Publications (1)

Publication Number Publication Date
CN116776239A true CN116776239A (en) 2023-09-19

Family

ID=88012166

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210228214.9A Pending CN116776239A (en) 2022-03-07 2022-03-07 Data processing method, device, electronic equipment, product and medium

Country Status (1)

Country Link
CN (1) CN116776239A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109213863A (en) * 2018-08-21 2019-01-15 北京航空航天大学 A kind of adaptive recommended method and system based on learning style
CN111861678A (en) * 2020-08-03 2020-10-30 上海风秩科技有限公司 Resource recommendation method and device, electronic equipment and storage medium
CN112116024A (en) * 2020-09-28 2020-12-22 北京嘀嘀无限科技发展有限公司 Method and device for classifying models by user, electronic equipment and storage medium
US11017019B1 (en) * 2015-08-14 2021-05-25 Shutterstock, Inc. Style classification for authentic content search
CN113590849A (en) * 2021-01-27 2021-11-02 腾讯科技(深圳)有限公司 Multimedia resource classification model training method and multimedia resource recommendation method

Similar Documents

Publication Publication Date Title
CN110225373B (en) Video auditing method and device and electronic equipment
CN112131411A (en) Multimedia resource recommendation method and device, electronic equipment and storage medium
CN112131472B (en) Information recommendation method, device, electronic equipment and storage medium
CN112749608A (en) Video auditing method and device, computer equipment and storage medium
CN111339404A (en) Content popularity prediction method and device based on artificial intelligence and computer equipment
US20240212706A1 (en) Audio data processing
CN111368141B (en) Video tag expansion method, device, computer equipment and storage medium
CN111507097A (en) Title text processing method and device, electronic equipment and storage medium
CN112231563B (en) Content recommendation method, device and storage medium
CN113609374A (en) Data processing method, device and equipment based on content push and storage medium
CN116702737A (en) Document generation method, device, equipment, storage medium and product
CN113395594A (en) Video processing method, device, equipment and medium
CN111723295A (en) Content distribution method, device and storage medium
CN112202849A (en) Content distribution method, content distribution device, electronic equipment and computer-readable storage medium
CN112749330A (en) Information pushing method and device, computer equipment and storage medium
CN117216362A (en) Content recommendation method, device, apparatus, medium and program product
CN114329028A (en) Data processing method, data processing equipment and computer readable storage medium
CN114996486A (en) Data recommendation method and device, server and storage medium
CN113656560B (en) Emotion category prediction method and device, storage medium and electronic equipment
CN113011126A (en) Text processing method and device, electronic equipment and computer readable storage medium
CN116665083A (en) Video classification method and device, electronic equipment and storage medium
CN113704620B (en) User tag updating method, device, equipment and medium based on artificial intelligence
CN116935170B (en) Processing method and device of video processing model, computer equipment and storage medium
US20230214676A1 (en) Prediction model training method, information prediction method and corresponding device
CN114491093B (en) Multimedia resource recommendation and object representation network generation method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
REG Reference to a national code (Ref country code: HK; Ref legal event code: DE; Ref document number: 40092314; Country of ref document: HK)