CN116541685A - Method and device for determining feature importance evaluation index and electronic equipment

Info

Publication number
CN116541685A
Authority
CN
China
Prior art keywords
target
feature
determining
data
feedback data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310428737.2A
Other languages
Chinese (zh)
Inventor
蒋江林
李亚辉
高家华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Weimeng Chuangke Network Technology China Co Ltd
Original Assignee
Weimeng Chuangke Network Technology China Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Weimeng Chuangke Network Technology China Co Ltd
Priority to CN202310428737.2A
Publication of CN116541685A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/211 Selection of the most significant subset of features
    • G06F18/2115 Selection of the most significant subset of features by evaluating different subsets according to an optimisation criterion, e.g. class separability, forward selection or backward elimination
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/217 Validation; Performance evaluation; Active pattern learning techniques
    • G06F18/2178 Validation; Performance evaluation; Active pattern learning techniques based on feedback of a supervisor
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00 Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/30 Computing systems specially adapted for manufacturing

Abstract

The application discloses a method, a device and electronic equipment for determining a feature importance evaluation index. The method comprises: acquiring at least one group of real-time data in a recommendation system; for the at least one group of real-time data, respectively determining an embedded vector corresponding to each user feature and an embedded vector corresponding to each material feature, and determining a first matrix based on the embedded vectors; inputting the first matrix into a target model, acquiring first prediction feedback data output by the target model, and determining a first error loss between the first prediction feedback data and the target feedback data; inputting a second matrix into the target model, acquiring second prediction feedback data output by the target model, and determining a second error loss between the second prediction feedback data and the target feedback data; and determining an importance evaluation index of a target feature for fitting the target feedback data based on the first error loss and the second error loss.

Description

Method and device for determining feature importance evaluation index and electronic equipment
Technical Field
The present invention relates to the field of information flow recommendation, and in particular, to a method and an apparatus for determining a feature importance evaluation index, and an electronic device.
Background
In an information flow recommendation system, evaluating the importance of a feature in a model essentially means measuring how much introducing the feature reduces the uncertainty or error with which the model fits its target. Evaluating feature importance therefore helps in understanding the business and the model, guides the feature team's feature production work, and allows the stability of the feature pipeline in the recommendation system to be monitored. It also allows important features to be selected and unimportant ones excluded, which saves storage and computing resources in the recommendation system and reduces business cost.
In the related art, feature importance is usually evaluated from the model itself, by the weight of the feature in the model or by the number of times the feature is selected for splitting. For example, the weight coefficient of a feature in a logistic regression model can measure its importance, and decision tree models such as random forests and gradient boosting trees evaluate importance by counting the number of times each feature is used as a split node in the decision trees, or the gain of those splits. However, these methods require a model to be trained in advance and a separate test set to be split off for evaluation, whereas the user features, the material features and the recommendation model in a recommendation system are updated in real time. The feature importance evaluation methods in the related art therefore suffer from low timeliness.
Disclosure of Invention
The application discloses a method and a device for determining feature importance evaluation indexes and electronic equipment, which are used for solving the problem of low timeliness in an evaluation mode of feature importance in related technologies.
In order to solve the problems, the application adopts the following technical scheme:
in a first aspect, an embodiment of the present application provides a method for determining a feature importance evaluation index, including: acquiring at least one group of real-time data in a recommendation system, wherein the real-time data comprise user characteristics of a target user, material characteristics of a target material and target feedback data of the target user on the target material, the target user is a user requesting the recommendation system to recommend the material, and the target material is the material recommended to the target user by the recommendation system in response to the request; for the at least one set of real-time data, respectively determining an embedded vector corresponding to the user feature and an embedded vector corresponding to the material feature, and determining a first matrix based on each embedded vector; acquiring first prediction feedback data output by a target model by inputting the first matrix into the target model, and determining first error loss between the first prediction feedback data and the target feedback data; acquiring second prediction feedback data output by the target model by inputting a second matrix into the target model, and determining second error loss between the second prediction feedback data and the target feedback data, wherein the second matrix is obtained by randomly transforming an embedded vector corresponding to a target feature in the first matrix, and the target feature is any feature of the user feature or the material feature; and determining an importance assessment index of the target feature on fitting target feedback data based on the first error loss and the second error loss.
In a second aspect, an embodiment of the present application provides a device for determining a feature importance evaluation index, including: an acquisition module, configured to acquire at least one group of real-time data in a recommendation system, wherein the real-time data comprise user features of a target user, material features of a target material and target feedback data of the target user on the target material, the target user is a user requesting the recommendation system to recommend material, and the target material is the material recommended to the target user by the recommendation system in response to the request; a first determining module, configured to respectively determine, for the at least one group of real-time data, an embedded vector corresponding to the user feature and an embedded vector corresponding to the material feature, and to determine a first matrix based on each embedded vector; a second determining module, configured to obtain first prediction feedback data output by a target model by inputting the first matrix into the target model, and to determine a first error loss between the first prediction feedback data and the target feedback data; a third determining module, configured to obtain second prediction feedback data output by the target model by inputting a second matrix into the target model, and to determine a second error loss between the second prediction feedback data and the target feedback data, wherein the second matrix is a matrix obtained by randomly transforming the embedded vector corresponding to a target feature in the first matrix, and the target feature is any one of the user features or the material features; and a fourth determining module, configured to determine an importance evaluation index of the target feature for fitting the target feedback data based on the first error loss and the second error loss.
In a third aspect, embodiments of the present application provide an electronic device comprising a processor and a memory storing a program or instructions executable on the processor, which when executed by the processor, implement the steps of the method as described in the first aspect.
In a fourth aspect, embodiments of the present application provide a readable storage medium having stored thereon a program or instructions which when executed by a processor implement the steps of the method according to the first aspect.
In a fifth aspect, embodiments of the present application provide a chip, where the chip includes a processor and a communication interface, where the communication interface is coupled to the processor, and where the processor is configured to execute a program or instructions to implement a method according to the first aspect.
In a sixth aspect, embodiments of the present application provide a computer program product stored in a storage medium, the program product being executable by at least one processor to implement the method according to the first aspect.
The embodiment of the application provides a method for determining a feature importance evaluation index. At least one group of real-time data is acquired in a recommendation system, wherein the real-time data comprise user features of a target user requesting the recommendation system to recommend material, material features of the target material recommended to the target user by the recommendation system in response to the request, and target feedback data of the target user on the target material. For the acquired at least one group of real-time data, an embedded vector corresponding to each user feature and an embedded vector corresponding to each material feature are respectively determined, and a first matrix is determined based on the embedded vectors. The first matrix is then input into a target model, first prediction feedback data output by the target model are acquired, and a first error loss between the first prediction feedback data and the target feedback data is determined. A second matrix is input into the target model, second prediction feedback data output by the target model are acquired, and a second error loss between the second prediction feedback data and the target feedback data is determined. Finally, an importance evaluation index of the target feature for fitting the target feedback data is determined based on the first error loss and the second error loss. The second matrix is obtained by randomly transforming the embedded vector corresponding to the target feature in the first matrix, and the target feature is any one of the user features or the material features.
With this method, the importance evaluation index of the target feature for fitting the target feedback data can be determined online and in real time from the real-time data acquired in the recommendation system, without first training the target model on that data. In the related art, by contrast, a model must be trained in advance on the acquired data, and feature importance is evaluated only after a separate test set has been split off. The timeliness of feature importance evaluation is therefore high.
Drawings
FIG. 1 is a schematic flow chart of a method for determining a feature importance evaluation index according to an embodiment of the present application;
FIG. 2 is a schematic structural diagram of a device for determining a feature importance evaluation index according to an embodiment of the present application;
FIG. 3 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
Technical solutions in the embodiments of the present application will be clearly described below with reference to the drawings in the embodiments of the present application, and it is apparent that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in the present application are within the scope of the protection of the present application.
The terms first, second and the like in the description and in the claims, are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged, as appropriate, such that embodiments of the present application may be implemented in sequences other than those illustrated or described herein, and that the objects identified by "first," "second," etc. are generally of a type and not limited to the number of objects, e.g., the first object may be one or more. Furthermore, in the description and claims, "and/or" means at least one of the connected objects, and the character "/", generally means that the associated object is an "or" relationship.
The method, the device and the electronic equipment for determining the feature importance evaluation index provided by the embodiment of the application are described in detail below through specific embodiments and application scenes thereof with reference to the accompanying drawings.
An embodiment of the present application provides a method for determining a feature importance assessment index, and fig. 1 is a schematic flow chart of a method for determining a feature importance assessment index disclosed in the embodiment of the present application. As shown in fig. 1, the method includes the following steps.
S110, acquiring at least one group of real-time data in the recommendation system.
The real-time data comprise user characteristics of a target user, material characteristics of a target material and target feedback data of the target user on the target material, wherein the target user is a user requesting a recommendation system to recommend the material, and the target material is the material recommended to the target user by the recommendation system in response to the request.
The user features may include at least one of: gender, age, device number and context features, wherein the device number is the number of the device from which the target user sends a request to the recommendation system, and the context features may include the time at which the target user sends the request, the longitude and latitude from which the request is sent, and the like. The material features of the target material may include at least one of: the number of words included in the target material, the publication time of the target material, the current read count of the target material, and the click rate of the target material. It should be noted that the user features and the material features in the present application may also be other relevant features.
S120, respectively determining an embedded vector corresponding to the user characteristic and an embedded vector corresponding to the material characteristic aiming at the at least one group of real-time data, and determining a first matrix based on each embedded vector.
It should be noted that each user feature has its own embedded (embedding) vector, and each material feature likewise has its own embedded vector. Illustratively, gender has an embedded vector corresponding to gender, age has an embedded vector corresponding to age, and the number of words included in the target material, the publication time of the target material, and so on each have their own embedded vectors.
The first matrix has dimensions <B, S, H>, where B is the number of groups of real-time data, S is the number of features, and H is the dimension of the embedded vectors. Illustratively, when 1024 groups of real-time data are acquired, the number of features is 9, and the embedded vectors have 16 dimensions, the first matrix has dimensions <1024, 9, 16>.
S130, inputting the first matrix into a target model, obtaining first prediction feedback data output by the target model, and determining first error loss between the first prediction feedback data and the target feedback data.
The target model used when the method is executed has not been trained on the real-time data: the real-time data acquired this time need not be used to train the target model beforehand, and feature importance is evaluated directly on that data, which ensures the timeliness of the evaluation. It should be noted that, once a preset time has elapsed after one execution of the method, the target model may be trained with the at least one group of real-time data acquired during that execution; the preset time may be set to 5 minutes.
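Step S130 can be illustrated with a minimal sketch. The application does not specify the architecture of the target model or the form of the error loss, so everything below is an assumption made for illustration: `toy_target_model` is a hypothetical logistic layer standing in for the served model, the loss is binary cross-entropy on a click label, and the data are random placeholders.

```python
import numpy as np

def toy_target_model(x, w):
    """Hypothetical stand-in for the target model: flatten each sample's
    S x H embeddings and apply a single logistic layer."""
    logits = x.reshape(x.shape[0], -1) @ w
    return 1.0 / (1.0 + np.exp(-logits))

def log_loss(y_true, y_pred, eps=1e-7):
    """Mean binary cross-entropy (assumed loss; the source does not name one)."""
    p = np.clip(y_pred, eps, 1.0 - eps)
    return float(-np.mean(y_true * np.log(p) + (1.0 - y_true) * np.log(1.0 - p)))

rng = np.random.default_rng(0)
B, S, H = 1024, 9, 16                       # groups, features, embedding dimension
first_matrix = rng.normal(size=(B, S, H))   # placeholder embeddings
weights = rng.normal(size=S * H) * 0.1      # frozen: not trained on this batch
clicks = rng.integers(0, 2, size=B).astype(float)  # placeholder target feedback

first_pred = toy_target_model(first_matrix, weights)   # first prediction feedback data
first_error_loss = log_loss(clicks, first_pred)        # first error loss
```

The point of the sketch is that the weights are only read, never updated, while the loss is computed on the freshly acquired batch.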
S140, obtaining second prediction feedback data output by the target model by inputting a second matrix into the target model, and determining second error loss between the second prediction feedback data and the target feedback data.
The second matrix is a matrix obtained by randomly transforming the embedded vector corresponding to the target feature in the first matrix, and the target feature is any one of the user feature or the material feature.
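One way to realize the random transformation is sketched below. The application only says the embedded vector of the target feature is "randomly transformed"; shuffling that feature's vectors across the groups in the batch, as in permutation importance, is an assumption here, and `shuffle_feature` is a hypothetical helper name.

```python
import numpy as np

def shuffle_feature(first_matrix, feature_index, rng):
    """Return a copy of a <B, S, H> matrix in which the embedded vectors of one
    feature are permuted across the B groups; all other features are untouched."""
    second_matrix = first_matrix.copy()
    perm = rng.permutation(first_matrix.shape[0])
    second_matrix[:, feature_index, :] = first_matrix[perm, feature_index, :]
    return second_matrix

rng = np.random.default_rng(0)
first_matrix = rng.normal(size=(1024, 9, 16))   # placeholder embeddings
second_matrix = shuffle_feature(first_matrix, feature_index=1, rng=rng)
```

Shuffling keeps the marginal distribution of the target feature intact while breaking its association with the feedback data, which is why the resulting change in loss can be attributed to that feature.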
Illustratively, the target feature may be any one of the gender, age, device number and context features of the target user, or any one of the number of words included in the target material, the publication time of the target material, the current read count of the target material and the click rate of the target material.
And S150, determining an importance evaluation index of the target feature on fitting target feedback data based on the first error loss and the second error loss.
It should be noted that the importance evaluation index in the present application is the importance degree of the influence of the target feature on the fitting target feedback data. Illustratively, where the target feature is age, an importance assessment index of age to fit the target feedback data is determined based on the first error loss and the second error loss.
After determining the importance assessment index of the target feature to the fit target feedback data, the importance assessment index of the target feature to the fit target feedback data may be written into a transport data stream (Kafka stream).
In an embodiment of the present application, the determining, for the at least one set of real-time data, an embedding vector corresponding to the user feature and an embedding vector corresponding to the material feature, and determining, based on each of the embedding vectors, a first matrix may include: respectively determining a feature ID corresponding to the first data and a feature ID corresponding to the second data by carrying out data processing on the first data corresponding to the user features and the second data corresponding to the material features, wherein one user feature corresponds to at least one first data and one material feature corresponds to at least one second data; determining first embedded vectors corresponding to the feature IDs based on the feature IDs and an embedded matrix of the target model, wherein the embedded matrix comprises a plurality of first embedded vectors, and the feature IDs are in one-to-one correspondence with the first embedded vectors; under the condition that the same feature corresponds to a plurality of feature IDs, determining a second embedded vector corresponding to the feature based on a first embedded vector corresponding to each feature ID in the plurality of feature IDs; a first matrix is determined based on the embedded vectors corresponding to the respective features, wherein each feature-corresponding embedded vector comprises the first embedded vector or the second embedded vector.
For example, when the user feature is gender, the first data corresponding to the user feature may be male or female; when the user feature is age, the first data may be values such as 18, 19 or 26; when the material feature is the number of words included in the target material, the second data corresponding to the material feature may be values such as 35, 56 or 64; and when the material feature is the current read count of the target material, the second data may be values such as 18 or 256.
In one implementation, each feature corresponds to a first preset bit binary code, and the determining, by performing data processing on first data corresponding to the user feature and second data corresponding to the material feature, a feature ID corresponding to the first data and a feature ID corresponding to the second data respectively may include: mapping the first data corresponding to the user characteristics into a second preset bit binary code, and mapping the second data corresponding to the material characteristics into a second preset bit binary code; determining a third preset bit binary code corresponding to the first data by splicing a first preset bit binary code corresponding to the user characteristic and a second preset bit binary code corresponding to the first data; splicing the first preset bit binary codes corresponding to the material characteristics and the second preset bit binary codes corresponding to the second data, and determining a third preset bit binary code corresponding to the second data; determining an index number obtained by binary coding conversion of a third preset bit corresponding to the first data as a feature ID corresponding to the first data; and determining an index number obtained by binary coding conversion of a third preset bit corresponding to the second data as a feature ID corresponding to the second data.
In this application, each feature may use a unique feature slot (slot), and the number of the feature slot may be represented by the first preset bit binary code. The feature ID corresponding to the first data is determined by performing data processing on the first data corresponding to the user feature. Taking age 18 as an example (the user feature is age, and the first data corresponding to the user feature is 18): 18 is mapped into a second preset bit binary code by hashing; the first preset bit binary code corresponding to the age feature and the second preset bit binary code corresponding to 18 are then spliced to obtain a third preset bit binary code; the third preset bit binary code is converted into an index number; and the index number is determined as the feature ID corresponding to age 18. The feature ID corresponding to the second data is determined in the same way from the second data corresponding to the material feature. Taking a target material containing 56 words as an example (the material feature is the number of words included in the target material, and the second data corresponding to the material feature is 56): 56 is mapped into a second preset bit binary code by hashing; the first preset bit binary code corresponding to the word-count feature and the second preset bit binary code corresponding to 56 are then spliced to obtain a third preset bit binary code; the third preset bit binary code is converted into an index number; and the index number is determined as the feature ID corresponding to a word count of 56. After the feature IDs are determined, each feature ID and its corresponding target feedback data may be written into a transport data stream (Kafka stream) for subsequent consumption.
It should be noted that, the number of bits of the first preset bit binary code may be determined according to the feature number, the number of bits of the second preset bit binary code may be determined according to the feature value, and exemplary, the first preset bit binary code may be 10-bit binary code, and the second preset bit binary code may be 54-bit binary code, in this case, the third preset bit binary code obtained by splicing the first preset bit binary code and the second preset bit binary code is 64-bit binary code. The index number may be decimal data.
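The splicing of a 10-bit slot code and a 54-bit value code into a 64-bit feature ID can be sketched as follows. The bit widths come from the example above; the choice of SHA-256 as the hash, the placement of the slot in the high bits, and the slot number used for age are assumptions for illustration only.

```python
import hashlib

SLOT_BITS = 10    # first preset bit binary code (feature slot number)
VALUE_BITS = 54   # second preset bit binary code (hashed feature value)

def feature_id(slot, value):
    """Splice slot and hashed value into a 64-bit code and return it as a
    decimal index number. SHA-256 is an assumed hash; the source names none."""
    if not 0 <= slot < (1 << SLOT_BITS):
        raise ValueError("slot does not fit in the slot bits")
    digest = hashlib.sha256(str(value).encode("utf-8")).digest()
    value_hash = int.from_bytes(digest[:8], "big") & ((1 << VALUE_BITS) - 1)
    return (slot << VALUE_BITS) | value_hash

AGE_SLOT = 2                              # hypothetical slot number for "age"
age_18_id = feature_id(AGE_SLOT, 18)      # feature ID for age 18
```

Because the slot occupies its own bits, the same raw value under two different features (for example, age 18 and a read count of 18) yields two different feature IDs.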
After each feature ID is obtained, a first embedding vector corresponding to each feature ID may be determined based on each feature ID and the embedding matrix of the target model, and for example, an embedding vector at a position corresponding to the feature ID in the embedding matrix of the target model may be determined, and the embedding vector may be determined as the first embedding vector corresponding to the feature ID.
Illustratively, when a group of data includes several pieces of age data (for example, age 18, age 25 and age 30), the age feature corresponds to a plurality of feature IDs, and the second embedded vector corresponding to the age feature is determined by summing the first embedded vectors corresponding to each of those feature IDs.
After determining the second embedded vectors corresponding to the features, a first matrix is determined based on the embedded vectors corresponding to the respective features, wherein each of the embedded vectors corresponding to the features includes the first embedded vector or the second embedded vector.
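The lookup and summation can be sketched as below. The dictionary `embedding_table`, its feature IDs and its vectors are hypothetical stand-ins for the embedding matrix of the target model, and the helper names are illustrative.

```python
import numpy as np

# Hypothetical embedding table: feature ID -> first embedded vector (H = 4).
embedding_table = {
    101: np.array([1.0, 0.0, 0.0, 0.0]),
    102: np.array([0.0, 2.0, 0.0, 0.0]),
    201: np.array([0.0, 0.0, 3.0, 0.0]),
}

def feature_vector(feature_ids):
    """One ID: its first embedded vector. Several IDs: their element-wise sum,
    i.e. the second embedded vector of that feature."""
    return np.sum([embedding_table[i] for i in feature_ids], axis=0)

def build_first_matrix(samples):
    """samples[b][s] is the list of feature IDs of feature s in group b;
    the result has shape <B, S, H>."""
    return np.stack([np.stack([feature_vector(ids) for ids in sample])
                     for sample in samples])

samples = [
    [[101, 102], [201]],   # group 0: feature 0 has two IDs, feature 1 has one
    [[101], [201]],        # group 1: each feature has a single ID
]
first_matrix = build_first_matrix(samples)   # shape (2, 2, 4)
```

Summation gives every feature a single fixed-length vector regardless of how many feature IDs it carries, which is what allows the <B, S, H> layout.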
In this embodiment of the present application, the target feedback data includes at least one of a reading duration, whether to click, and whether to interact, and the predicted feedback data includes at least one of a reading duration, whether to click, and whether to interact, where the target feedback data corresponds to the predicted feedback data.
Illustratively, when the target feedback data include a reading duration, the first prediction feedback data include a reading duration and the second prediction feedback data also include a reading duration; in that case the determined importance evaluation index may be, for example, the importance evaluation index of age for fitting the reading duration. When the target feedback data include both the reading duration and whether to click, the first prediction feedback data and the second prediction feedback data each include the reading duration and whether to click. A first error loss for the reading duration, a first error loss for whether to click, a second error loss for the reading duration and a second error loss for whether to click are then determined. The importance evaluation index of the target feature for fitting the reading duration is determined based on the first and second error losses for the reading duration, and the importance evaluation index of the target feature for fitting whether to click is determined based on the first and second error losses for whether to click.
Because the predictive feedback data disclosed by the application can be multiple, the efficiency of feature importance evaluation can be further improved.
In one implementation, the determining, based on the first error loss and the second error loss, of an importance evaluation index of the target feature for fitting the target feedback data may include: determining the importance evaluation index of the target feature for fitting the target feedback data based on the ratio of the first error loss to the second error loss. For example, where the first error loss is $\mathrm{Originloss}_i^j$ and the second error loss is $\mathrm{Shuffleloss}_i^j$, the importance evaluation index of feature i for fitting target j is determined from the ratio of $\mathrm{Originloss}_i^j$ to $\mathrm{Shuffleloss}_i^j$, where feature i is any one of the features described above, and fitting target j is any one of the reading duration, whether a click occurred, or whether an interaction occurred.
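As a non-limiting illustration, the ratio-based index may be sketched in Python as follows. The text states only that the index is based on the ratio of the first error loss to the second error loss, so the plain quotient used here is an assumption, and the function name, target names, and loss values are hypothetical.

```python
def importance_index(origin_loss: float, shuffle_loss: float) -> float:
    """Quotient of the first (unshuffled) loss and the second (shuffled) loss.

    Assumption: the index is the plain ratio; the source does not give
    the exact expression.
    """
    if shuffle_loss <= 0:
        raise ValueError("shuffle_loss must be positive")
    return origin_loss / shuffle_loss


# Hypothetical per-target losses for one feature (for example, "age").
origin = {"read_duration": 0.80, "click": 0.40}
shuffled = {"read_duration": 1.00, "click": 0.50}

# One index per fitting target j for the same feature i.
index = {target: importance_index(origin[target], shuffled[target])
         for target in origin}
```

Under this assumption, a value further below 1 means that shuffling the feature raised the loss more, which suggests the model relies on that feature more heavily for the given fitting target.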
In one possible implementation, after the determining of the importance evaluation index of the target feature for fitting the target feedback data, the method may further include: acquiring the importance evaluation indexes, for fitting the target feedback data, of a plurality of target features within a first preset time; determining, based on those importance evaluation indexes, a first average value of the importance evaluation indexes of the plurality of target features for fitting the target feedback data within the first preset time; and raising an alarm when the difference between the first average value and a second average value is greater than a preset threshold, where the second average value is the average value of the importance evaluation indexes of the plurality of target features for fitting the target feedback data within the preceding first preset time. That is, when the difference between the first average value for the current first preset time and the second average value for the preceding first preset time is greater than the preset threshold, the acquired data may be abnormal, so an alarm is raised, which enables online real-time monitoring of feature importance.
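The window comparison described above may be sketched as follows. This is a minimal illustration rather than the implementation of the application: the use of the absolute difference, the function name `should_alarm`, and the sample values are assumptions.

```python
from statistics import mean
from typing import Sequence, Tuple


def should_alarm(current_window: Sequence[float],
                 previous_window: Sequence[float],
                 threshold: float) -> Tuple[bool, float, float]:
    """Compare the average index of the current window with the preceding one."""
    first_avg = mean(current_window)     # first average value
    second_avg = mean(previous_window)   # second average value
    # Assumption: "difference" is taken as an absolute difference.
    return abs(first_avg - second_avg) > threshold, first_avg, second_avg


# Hypothetical index values collected in two consecutive windows.
alarm, first_avg, second_avg = should_alarm([0.5, 0.7], [0.9, 0.9], threshold=0.2)
quiet, _, _ = should_alarm([0.9, 0.88], [0.9, 0.9], threshold=0.2)
```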
In another possible implementation, after the determining of the importance evaluation index of the target feature for fitting the target feedback data, the method may further include: acquiring the importance evaluation indexes, for fitting the target feedback data, of a plurality of target features within a second preset time; determining, based on those importance evaluation indexes, a third average value of the importance evaluation indexes of the plurality of target features for fitting the target feedback data within the second preset time; and sending the third average value to a target object in a preset manner. For example, the importance evaluation indexes of the plurality of target features in the Kafka stream within the second preset time can be written to an offline Hadoop Distributed File System (HDFS), the third average value can be computed with the Hive SQL (Hive Structured Query Language) big data analysis tool, and the third average value can then be sent to each member of the team by email. For example, the second preset time may be one day, one week, one month, and so on, which is not specifically limited in the present application.
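The aggregation step may be sketched as follows. The text describes computing the third average value with Hive SQL over data stored in HDFS; the in-memory Python below is only a stand-in for that aggregation, and the record layout, function names, and sample values are hypothetical.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Tuple

Record = Tuple[str, str, float]  # (feature, fitting target, index value)


def average_by_feature(records: Iterable[Record]) -> Dict[Tuple[str, str], float]:
    """Average the index per (feature, fitting target) over one period."""
    grouped: Dict[Tuple[str, str], List[float]] = defaultdict(list)
    for feature, target, value in records:
        grouped[(feature, target)].append(value)
    return {key: mean(values) for key, values in grouped.items()}


def format_report(averages: Dict[Tuple[str, str], float]) -> str:
    """Render a plain-text body that could be sent, for example, by email."""
    return "\n".join(f"{feature}\t{target}\t{avg:.4f}"
                     for (feature, target), avg in sorted(averages.items()))


records = [("age", "click", 0.8), ("age", "click", 0.6), ("city", "click", 1.0)]
averages = average_by_feature(records)
report = format_report(averages)
```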
In an embodiment of the present application, before the acquiring of at least one set of real-time data, the method may further include: preconfiguring the number of sets of real-time data to be acquired. That is, the batch size is set in advance; for example, the batch size may be 1024. In the present application, Flink can be used to process the real-time data in batches, which improves the capacity to handle large volumes of non-repeating data such as user IDs and material IDs.
Furthermore, in the present application, processing data on a distributed cluster allows the sample size for feature importance evaluation to reach hundreds of millions, and the data intake capacity for feature importance evaluation can be further extended by adding computing resources (that is, computing devices) for the evaluation task.
The method for determining the feature importance evaluation index provided in the embodiments of the present application may be executed by a device for determining the feature importance evaluation index. The device provided in the embodiments of the present application is described below, taking as an example the case in which that device executes the method.
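The preconfigured batch size may be illustrated as follows. The text names Flink for batch processing; the generator below is a plain-Python stand-in that only shows how a stream is cut into groups of a preset size, and the function name `batched` is hypothetical.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched(stream: Iterable[T], batch_size: int = 1024) -> Iterator[List[T]]:
    """Yield consecutive groups of at most batch_size records from a stream."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    iterator = iter(stream)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


# 2500 hypothetical records split into groups of the preset size.
sizes = [len(batch) for batch in batched(range(2500), 1024)]
```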
Fig. 2 is a schematic structural diagram of a determining device for a feature importance evaluation index according to an embodiment of the present application. As shown in fig. 2, the feature importance evaluation index determination device 200 includes: the acquisition module 210, the first determination module 220, the second determination module 230, the third determination module 240, and the fourth determination module 250.
In the present application, the acquisition module 210 is configured to acquire at least one set of real-time data in a recommendation system, where the real-time data includes user features of a target user, material features of a target material, and target feedback data of the target user on the target material, the target user is a user who requests the recommendation system to recommend material, and the target material is the material recommended to the target user by the recommendation system in response to the request; the first determining module 220 is configured to determine, for the at least one set of real-time data, an embedded vector corresponding to the user feature and an embedded vector corresponding to the material feature, respectively, and to determine a first matrix based on each of the embedded vectors; the second determining module 230 is configured to obtain first predicted feedback data output by the target model by inputting the first matrix into the target model, and to determine a first error loss between the first predicted feedback data and the target feedback data; the third determining module 240 is configured to obtain second predicted feedback data output by the target model by inputting a second matrix into the target model, and to determine a second error loss between the second predicted feedback data and the target feedback data, where the second matrix is obtained by randomly transforming the embedded vector corresponding to a target feature in the first matrix, and the target feature is any one of the user features or the material features; and the fourth determining module 250 is configured to determine an importance evaluation index of the target feature for fitting the target feedback data based on the first error loss and the second error loss.
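The flow through the second and third determining modules may be sketched as follows. This is a toy illustration under stated assumptions: the random transformation is taken to be a shuffle of the target feature's embedded vectors across the samples of a batch, the error loss is taken to be mean squared error, and `toy_model`, the batch, and the targets are hypothetical.

```python
import random
from typing import Callable, List, Tuple

Vector = List[float]
Sample = List[Vector]  # one embedded vector per feature (a "first matrix")


def mse(predictions: List[float], targets: List[float]) -> float:
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)


def shuffle_feature(batch: List[Sample], feature_idx: int,
                    rng: random.Random) -> List[Sample]:
    """Build the second matrices by permuting one feature's vectors across samples."""
    column = [sample[feature_idx] for sample in batch]
    rng.shuffle(column)
    return [sample[:feature_idx] + [vector] + sample[feature_idx + 1:]
            for sample, vector in zip(batch, column)]


def losses_for_feature(model: Callable[[Sample], float], batch: List[Sample],
                       targets: List[float], feature_idx: int,
                       rng: random.Random) -> Tuple[float, float]:
    """Return (first error loss, second error loss) for one target feature."""
    origin_loss = mse([model(s) for s in batch], targets)
    second = shuffle_feature(batch, feature_idx, rng)
    shuffle_loss = mse([model(s) for s in second], targets)
    return origin_loss, shuffle_loss


def toy_model(sample: Sample) -> float:
    # Hypothetical model that looks only at feature 0.
    return sample[0][0]


rng = random.Random(0)
batch = [[[float(i)], [float(i % 3)]] for i in range(8)]
targets = [float(i) + 0.1 for i in range(8)]
origin_0, shuffle_0 = losses_for_feature(toy_model, batch, targets, 0, rng)
origin_1, shuffle_1 = losses_for_feature(toy_model, batch, targets, 1, rng)
```

In this toy setting, shuffling feature 1 leaves the loss unchanged because the model ignores it, while shuffling feature 0 cannot lower the loss, which is the contrast the fourth determining module turns into an importance evaluation index.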
In one implementation, the first determining module 220 determines, for the at least one set of real-time data, an embedded vector corresponding to the user feature and an embedded vector corresponding to the material feature, respectively, and determines a first matrix based on each of the embedded vectors, including: respectively determining a feature ID corresponding to the first data and a feature ID corresponding to the second data by carrying out data processing on the first data corresponding to the user features and the second data corresponding to the material features, wherein one user feature corresponds to at least one first data and one material feature corresponds to at least one second data; determining first embedded vectors corresponding to the feature IDs based on the feature IDs and an embedded matrix of the target model, wherein the embedded matrix comprises a plurality of first embedded vectors, and the feature IDs are in one-to-one correspondence with the first embedded vectors; under the condition that the same feature corresponds to a plurality of feature IDs, determining a second embedded vector corresponding to the feature based on a first embedded vector corresponding to each feature ID in the plurality of feature IDs; a first matrix is determined based on the embedded vectors corresponding to the respective features, wherein each feature-corresponding embedded vector comprises the first embedded vector or the second embedded vector.
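The lookup and combination steps may be sketched as follows. The passage above says only that the second embedded vector is determined based on the first embedded vectors of the feature's IDs; the element-wise mean used here is an assumption, and the embedded matrix, feature names, and IDs are hypothetical.

```python
from typing import Dict, List

Vector = List[float]


def lookup(embedding_matrix: List[Vector], feature_ids: List[int]) -> List[Vector]:
    """Fetch the first embedded vector for each feature ID (one row per ID)."""
    return [embedding_matrix[fid] for fid in feature_ids]


def mean_pool(vectors: List[Vector]) -> Vector:
    """Assumed combination rule: element-wise mean of the first embedded vectors."""
    count = len(vectors)
    return [sum(values) / count for values in zip(*vectors)]


def build_first_matrix(embedding_matrix: List[Vector],
                       ids_per_feature: Dict[str, List[int]]) -> List[Vector]:
    """One row per feature: the first vector, or a pooled second vector."""
    rows = []
    for ids in ids_per_feature.values():
        first_vectors = lookup(embedding_matrix, ids)
        rows.append(first_vectors[0] if len(first_vectors) == 1
                    else mean_pool(first_vectors))
    return rows


embedding_matrix = [[0.0, 0.0], [1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
first_matrix = build_first_matrix(embedding_matrix,
                                  {"age": [1], "interests": [2, 3]})
```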
In one implementation, each feature corresponds to a first preset bit binary code; the first determining module 220 performs data processing on first data corresponding to the user feature and second data corresponding to the material feature to determine a feature ID corresponding to the first data and a feature ID corresponding to the second data, respectively, including: mapping the first data corresponding to the user characteristics into a second preset bit binary code, and mapping the second data corresponding to the material characteristics into a second preset bit binary code; determining a third preset bit binary code corresponding to the first data by splicing a first preset bit binary code corresponding to the user characteristic and a second preset bit binary code corresponding to the first data; splicing the first preset bit binary codes corresponding to the material characteristics and the second preset bit binary codes corresponding to the second data, and determining a third preset bit binary code corresponding to the second data; determining an index number obtained by binary coding conversion of a third preset bit corresponding to the first data as a feature ID corresponding to the first data; and determining an index number obtained by binary coding conversion of a third preset bit corresponding to the second data as a feature ID corresponding to the second data.
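The splicing of binary codes into a feature ID may be sketched as follows. The bit widths, the use of a SHA-256 hash to map raw data to the second binary code, and all names are assumptions made for illustration; the source does not specify them.

```python
import hashlib

SLOT_BITS = 10    # hypothetical width of the per-feature (first) binary code
VALUE_BITS = 22   # hypothetical width of the per-value (second) binary code


def value_code(raw: str) -> int:
    """Map raw data to a VALUE_BITS-wide code (assumed here to be a hash)."""
    digest = hashlib.sha256(raw.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % (1 << VALUE_BITS)


def feature_id(slot: int, raw: str) -> int:
    """Splice the feature code and the value code, then read the result as an index."""
    if not 0 <= slot < (1 << SLOT_BITS):
        raise ValueError("slot does not fit in SLOT_BITS")
    # Concatenation of bits: the feature code occupies the high bits.
    return (slot << VALUE_BITS) | value_code(raw)


fid = feature_id(3, "beijing")
```

Placing the per-feature code in the high bits keeps identical raw values under different features from colliding, since their feature IDs differ in the high bits.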
In one implementation, the target feedback data includes at least one of a reading duration, whether a click occurred, and whether an interaction occurred, and the predicted feedback data includes at least one of a reading duration, whether a click occurred, and whether an interaction occurred, the target feedback data corresponding to the predicted feedback data.
In one implementation, the fourth determining module 250 is further configured to: after the importance evaluation index of the target feature for fitting the target feedback data is determined, acquire the importance evaluation indexes, for fitting the target feedback data, of a plurality of target features within a first preset time; determine, based on those importance evaluation indexes, a first average value of the importance evaluation indexes of the plurality of target features for fitting the target feedback data within the first preset time; and raise an alarm when the difference between the first average value and a second average value is greater than a preset threshold, where the second average value is the average value of the importance evaluation indexes of the plurality of target features for fitting the target feedback data within the preceding first preset time.
In one implementation, the fourth determining module 250 is further configured to: after the importance evaluation indexes of the target features to the fitting target feedback data are determined, the importance evaluation indexes of a plurality of target features to the fitting target feedback data in a second preset time are obtained; and determining a third average value of the importance evaluation indexes of the plurality of target features on the fit target feedback data within the second preset time based on the importance evaluation indexes of the plurality of target features on the fit target feedback data, and sending the third average value to a target object in a preset mode.
In one implementation, the acquisition module 210 is further configured to: preconfigure, before the acquiring of at least one set of real-time data, the number of sets of real-time data to be acquired.
In one implementation, the fourth determining module 250 determines an importance assessment indicator of the target feature to fit target feedback data based on the first error loss and the second error loss, including: and determining an importance evaluation index of the target feature to fit target feedback data based on the ratio of the first error loss to the second error loss.
The determining device of the feature importance evaluation index in the embodiment of the present application may be an electronic device, or may be a component in the electronic device, for example, an integrated circuit or a chip. The electronic device may be a terminal, or may be other devices than a terminal.
The device for determining the feature importance evaluation index provided in the embodiment of the present application can implement each process implemented by the embodiment of the method for determining the feature importance evaluation index, and in order to avoid repetition, a detailed description is omitted here.
Optionally, as shown in fig. 3, the embodiment of the present application further provides an electronic device 300, including a processor 301 and a memory 302, where a program or an instruction capable of running on the processor 301 is stored in the memory 302, and the program or the instruction implements each step of the above-mentioned embodiment of the method for determining the feature importance assessment index when executed by the processor 301, and the steps can achieve the same technical effect, so that repetition is avoided and redundant description is omitted here.
The electronic device in the embodiment of the application includes the mobile electronic device and the non-mobile electronic device described above.
The embodiment of the present application further provides a readable storage medium, where a program or an instruction is stored, where the program or the instruction implements each process of the above embodiment of the method for determining the feature importance assessment index when executed by a processor, and the same technical effects can be achieved, so that repetition is avoided, and no further description is given here.
The processor is the processor in the electronic device described in the above embodiment. The readable storage medium includes a computer-readable storage medium, such as a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
The embodiment of the application further provides a chip, the chip includes a processor and a communication interface, the communication interface is coupled with the processor, the processor is configured to run a program or an instruction, implement each process of the above embodiment of the method for determining the feature importance assessment index, and achieve the same technical effect, so that repetition is avoided, and no further description is provided herein.
It should be understood that the chip referred to in the embodiments of the present application may also be referred to as a system-level chip, a system chip, a chip system, or a system-on-chip.
The embodiments of the present application provide a computer program product, which is stored in a storage medium, and the program product is executed by at least one processor to implement the respective processes of the embodiments of the method for determining a feature importance assessment index, and achieve the same technical effects, so that repetition is avoided, and a detailed description is omitted herein.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element. Furthermore, it should be noted that the scope of the methods and apparatus in the embodiments of the present application is not limited to performing the functions in the order shown or discussed, but may also include performing the functions in a substantially simultaneous manner or in an opposite order depending on the functions involved, e.g., the described methods may be performed in an order different from that described, and various steps may also be added, omitted, or combined. Additionally, features described with reference to certain examples may be combined in other examples.
From the above description of the embodiments, it will be clear to those skilled in the art that the above-described embodiment method may be implemented by means of software plus a necessary general hardware platform, but of course may also be implemented by means of hardware, but in many cases the former is a preferred embodiment. Based on such understanding, the technical solutions of the present application may be embodied essentially or in a part contributing to the prior art in the form of a computer software product stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk), comprising several instructions for causing a terminal (which may be a mobile phone, a computer, a server, or a network device, etc.) to perform the methods described in the embodiments of the present application.
The embodiments of the present application have been described above with reference to the accompanying drawings, but the present application is not limited to the above-described embodiments, which are merely illustrative and not restrictive, and many forms may be made by those of ordinary skill in the art without departing from the spirit of the present application and the scope of the claims, which are also within the protection of the present application.

Claims (10)

1. A method for determining a feature importance evaluation index, comprising:
acquiring at least one group of real-time data in a recommendation system, wherein the real-time data comprise user characteristics of a target user, material characteristics of a target material and target feedback data of the target user on the target material, the target user is a user requesting the recommendation system to recommend the material, and the target material is the material recommended to the target user by the recommendation system in response to the request;
for the at least one set of real-time data, respectively determining an embedded vector corresponding to the user feature and an embedded vector corresponding to the material feature, and determining a first matrix based on each embedded vector;
acquiring first prediction feedback data output by a target model by inputting the first matrix into the target model, and determining first error loss between the first prediction feedback data and the target feedback data;
acquiring second prediction feedback data output by the target model by inputting a second matrix into the target model, and determining second error loss between the second prediction feedback data and the target feedback data, wherein the second matrix is obtained by randomly transforming an embedded vector corresponding to a target feature in the first matrix, and the target feature is any feature of the user feature or the material feature;
and determining an importance assessment index of the target feature on fitting target feedback data based on the first error loss and the second error loss.
2. The determining method according to claim 1, wherein the determining, for the at least one set of real-time data, the embedding vector corresponding to the user feature and the embedding vector corresponding to the material feature, respectively, and determining the first matrix based on each of the embedding vectors, comprises:
respectively determining a feature ID corresponding to the first data and a feature ID corresponding to the second data by carrying out data processing on the first data corresponding to the user features and the second data corresponding to the material features, wherein one user feature corresponds to at least one first data and one material feature corresponds to at least one second data;
determining first embedded vectors corresponding to the feature IDs based on the feature IDs and an embedded matrix of the target model, wherein the embedded matrix comprises a plurality of first embedded vectors, and the feature IDs are in one-to-one correspondence with the first embedded vectors;
under the condition that the same feature corresponds to a plurality of feature IDs, determining a second embedded vector corresponding to the feature based on a first embedded vector corresponding to each feature ID in the plurality of feature IDs;
determining a first matrix based on the embedded vectors corresponding to the respective features, wherein the embedded vector corresponding to each feature comprises the first embedded vector or the second embedded vector.
3. The method of determining according to claim 2, wherein each feature corresponds to a first predetermined bit binary code;
the step of respectively determining the feature ID corresponding to the first data and the feature ID corresponding to the second data by carrying out data processing on the first data corresponding to the user features and the second data corresponding to the material features comprises the following steps:
mapping the first data corresponding to the user characteristics into a second preset bit binary code, and mapping the second data corresponding to the material characteristics into a second preset bit binary code;
determining a third preset bit binary code corresponding to the first data by splicing a first preset bit binary code corresponding to the user characteristic and a second preset bit binary code corresponding to the first data;
splicing the first preset bit binary codes corresponding to the material characteristics and the second preset bit binary codes corresponding to the second data, and determining a third preset bit binary code corresponding to the second data;
determining an index number obtained by binary coding conversion of a third preset bit corresponding to the first data as a feature ID corresponding to the first data;
and determining an index number obtained by binary coding conversion of a third preset bit corresponding to the second data as a feature ID corresponding to the second data.
4. The method of determining according to claim 1, wherein the target feedback data includes at least one of a reading duration, whether a click occurred, and whether an interaction occurred, and the predicted feedback data includes at least one of a reading duration, whether a click occurred, and whether an interaction occurred, and the target feedback data corresponds to the predicted feedback data.
5. The method according to claim 1, further comprising, after said determining an importance evaluation index of the target feature to fit target feedback data:
acquiring importance evaluation indexes of a plurality of target features in a first preset time for fitting target feedback data;
determining a first average value of importance evaluation indexes of the plurality of target features on the fit target feedback data within the first preset time based on the importance evaluation indexes of the plurality of target features on the fit target feedback data;
and raising an alarm when the difference between the first average value and a second average value is greater than a preset threshold, wherein the second average value is the average value of the importance evaluation indexes of the plurality of target features for fitting the target feedback data within the preceding first preset time.
6. The method according to claim 1, further comprising, after said determining an importance evaluation index of the target feature to fit target feedback data:
acquiring importance evaluation indexes of a plurality of target features in a second preset time for fitting target feedback data;
and determining a third average value of the importance evaluation indexes of the plurality of target features on the fit target feedback data within the second preset time based on the importance evaluation indexes of the plurality of target features on the fit target feedback data, and sending the third average value to a target object in a preset mode.
7. The method according to claim 1, wherein the determining an importance evaluation index of the target feature to fit target feedback data based on the first error loss and the second error loss includes:
and determining an importance evaluation index of the target feature to fit target feedback data based on the ratio of the first error loss to the second error loss.
8. A feature importance evaluation index determining apparatus, comprising:
an acquisition module, configured to acquire at least one group of real-time data in a recommendation system, wherein the real-time data comprise user characteristics of a target user, material characteristics of a target material and target feedback data of the target user on the target material, the target user is a user requesting the recommendation system to recommend the material, and the target material is the material recommended to the target user by the recommendation system in response to the request;
a first determining module, configured to respectively determine, for the at least one group of real-time data, an embedded vector corresponding to the user characteristic and an embedded vector corresponding to the material characteristic, and to determine a first matrix based on each embedded vector;
a second determining module, configured to obtain first prediction feedback data output by a target model by inputting the first matrix into the target model, and to determine a first error loss between the first prediction feedback data and the target feedback data;
a third determining module, configured to obtain second prediction feedback data output by the target model by inputting a second matrix into the target model, and to determine a second error loss between the second prediction feedback data and the target feedback data, wherein the second matrix is a matrix obtained by randomly transforming an embedded vector corresponding to a target feature in the first matrix, and the target feature is any feature of the user feature or the material feature;
and a fourth determining module, configured to determine an importance evaluation index of the target feature on fitting target feedback data based on the first error loss and the second error loss.
9. An electronic device comprising a processor and a memory storing a program or instructions executable on the processor, which when executed by the processor, implement the steps of the method of determining a feature importance assessment indicator according to any one of claims 1-7.
10. A readable storage medium, wherein a program or instructions is stored on the readable storage medium, which when executed by a processor, implements the steps of the method for determining a feature importance assessment index according to any one of claims 1-7.
CN202310428737.2A 2023-04-20 2023-04-20 Method and device for determining feature importance evaluation index and electronic equipment Pending CN116541685A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310428737.2A CN116541685A (en) 2023-04-20 2023-04-20 Method and device for determining feature importance evaluation index and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310428737.2A CN116541685A (en) 2023-04-20 2023-04-20 Method and device for determining feature importance evaluation index and electronic equipment

Publications (1)

Publication Number Publication Date
CN116541685A true CN116541685A (en) 2023-08-04

Family

ID=87442722

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310428737.2A Pending CN116541685A (en) 2023-04-20 2023-04-20 Method and device for determining feature importance evaluation index and electronic equipment

Country Status (1)

Country Link
CN (1) CN116541685A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination