CN111400608B - Data processing method and device, storage medium and electronic equipment - Google Patents


Info

Publication number
CN111400608B
Authority
CN
China
Prior art keywords
module
data
target
target module
processing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010509708.5A
Other languages
Chinese (zh)
Other versions
CN111400608A (en)
Inventor
尹学正
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Sohu New Power Information Technology Co ltd
Original Assignee
Beijing Sohu New Power Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Sohu New Power Information Technology Co ltd filed Critical Beijing Sohu New Power Information Technology Co ltd
Priority to CN202010509708.5A priority Critical patent/CN111400608B/en
Publication of CN111400608A publication Critical patent/CN111400608A/en
Application granted granted Critical
Publication of CN111400608B publication Critical patent/CN111400608B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G06F16/254 Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/283 Multi-dimensional databases or data warehouses, e.g. MOLAP or ROLAP

Abstract

The invention provides a data processing method and device, a storage medium, and electronic equipment. The method comprises the following steps: in the information recommendation process, acquiring process data and result data of data processing performed by a target module, wherein the target module is any one module in a recommendation system; acquiring a module processing identifier of a previous module of the target module; generating a module processing identifier of the target module; forming buried point data corresponding to the target module from the process data, the result data, the module processing identifier of the previous module of the target module, and the module processing identifier of the target module; and analyzing the buried point data corresponding to the target module to obtain an analysis result, and storing the analysis result. According to this technical scheme, the buried point data corresponding to each module in the information recommendation process is obtained, analyzed, and stored, so that the data use condition in the recommendation system becomes traceable, and the transparency and interpretability of data use in the information recommendation process are improved.

Description

Data processing method and device, storage medium and electronic equipment
Technical Field
The present invention relates to the field of computer applications, and in particular, to a data processing method and apparatus, a storage medium, and an electronic device.
Background
With the increasing number of mobile internet users and the rise of User Generated Content (UGC), the large amount of content information causes information overload for users, and recommendation systems have emerged to alleviate this overload.
A conventional recommendation system mainly comprises a portrait module, a recall module, a sorting module and a recommendation engine module. A user portrait and an article portrait are generated in the portrait module; the user portrait, the article portrait and user historical behavior data are combined in the recall module to screen out a plurality of articles that may interest the user; the screened articles are sorted in the sorting module; and finally, a recommendation result is generated by the recommendation engine module to perform information recommendation.
Existing recommendation systems deploy the portrait module, the recall module, the sorting module and the recommendation engine module in an asynchronous processing mode; that is, each module in the recommendation system processes data asynchronously. As a result, the data use condition in the recommendation system cannot be traced, and the interpretability of data use is poor.
Disclosure of Invention
The application provides a data processing method and device, a storage medium and electronic equipment, and aims to solve the problem that the interpretability of data use is poor due to the fact that the data use condition in a recommendation system cannot be traced because the data processing mode of each module in the recommendation system is asynchronous processing.
In order to achieve the above object, the present application provides the following technical solutions:
A data processing method, applied to a recommendation system comprising a portrait module, a recall module, a sorting module and a recommendation engine module, the method comprising the following steps:
in the information recommendation process, acquiring process data and result data of data processing performed by a target module; the target module is any one module in the recommendation system;
acquiring a module processing identifier of a previous module of the target module;
generating a module processing identifier of the target module;
forming the process data, the result data, the module processing identifier of the previous module of the target module and the module processing identifier of the target module into buried point data corresponding to the target module;
and analyzing the buried point data corresponding to the target module according to a preset data analysis strategy corresponding to the target module to obtain an analysis result, and storing the analysis result.
Optionally, in the above method, generating the module processing identifier of the target module includes:
starting a preset identification generation algorithm to generate a data processing identification of the target module;
acquiring preset data processing logic corresponding to the target module;
and forming the module processing identifier of the target module by the data processing identifier and the data processing logic.
Optionally, in the method, the analyzing the buried point data corresponding to the target module according to a preset data analysis policy corresponding to the target module includes:
determining each key field contained in a preset data analysis strategy corresponding to the target module;
and analyzing the buried point data corresponding to the target module, and extracting the data item corresponding to each key field from the buried point data.
Optionally, in the method, after the step of forming the process data, the result data, the module processing identifier of the previous module of the target module, and the module processing identifier of the target module into the buried point data corresponding to the target module, the method further includes:
and storing the buried point data corresponding to the target module into a pre-constructed database.
Optionally, the above method further includes:
according to a preset period, compiling statistics on the buried point data stored in the database to obtain a statistical result;
and displaying the statistical result.
Optionally, in the above method, compiling statistics on the buried point data stored in the database includes:
determining buried point data in a preset period according to the buried point data stored in the database;
calculating the data utilization rate of each module contained in the recommendation system according to the buried point data in the preset period; the data utilization rate of each module is used for representing the ratio between the result data of data processing performed by the module and the result data of the previous module of the module;
calculating the data coverage rate of the first module and the feature missing rate of each module contained in the recommendation system according to the buried point data in the preset period; the first module comprises a portrait module or a recall module; the data coverage rate of the first module is used for representing the ratio between the effective result data obtained by data processing of the first module and the total number of information recommendation requests in the preset period, where the fields contained in the effective result data are non-empty fields; the feature missing rate of each module is used for representing the ratio between the number of effective module processing identifiers of the previous module of the module and the total number of module processing identifiers of the previous module of the module, where the fields contained in an effective module processing identifier are non-empty fields.
A data processing device is applied to a recommendation system, the recommendation system comprises a portrait module, a recall module, a sorting module and a recommendation engine module, and the device comprises:
the first acquisition unit is used for acquiring process data and result data of data processing performed by a target module in the information recommendation process; the target module is any one module in the recommendation system;
a second obtaining unit, configured to obtain a module processing identifier of a previous module of the target module;
the generating unit is used for generating a module processing identifier of the target module;
the combination unit is used for combining the process data, the result data, the module processing identifier of the previous module of the target module and the module processing identifier of the target module into buried point data corresponding to the target module;
and the analysis unit is used for analyzing the buried point data corresponding to the target module according to a preset data analysis strategy corresponding to the target module to obtain an analysis result, and storing the analysis result.
Optionally, in the above device, the generating unit includes:
the generating subunit is used for starting a preset identifier generating algorithm and generating a data processing identifier of the target module;
the acquisition subunit is used for acquiring preset data processing logic corresponding to the target module;
and the combination subunit is used for combining the data processing identifier and the data processing logic into a module processing identifier of the target module.
A storage medium comprising stored instructions, wherein, when the instructions are executed, the device on which the storage medium resides is controlled to execute the data processing method described above.
An electronic device comprising a memory and one or more instructions, wherein the one or more instructions are stored in the memory and configured to be executed by one or more processors to perform the data processing method described above.
Compared with the prior art, the invention has the following advantages:
The invention provides a data processing method and device, a storage medium, and electronic equipment. The method comprises the following steps: in the information recommendation process, acquiring process data and result data of data processing performed by a target module, wherein the target module is any one module in the recommendation system; acquiring a module processing identifier of a previous module of the target module; generating a module processing identifier of the target module; forming the process data, the result data, the module processing identifier of the previous module of the target module and the module processing identifier of the target module into buried point data corresponding to the target module; and, according to a preset data analysis strategy corresponding to the target module, analyzing the buried point data corresponding to the target module to obtain an analysis result and storing the analysis result. Therefore, according to the technical scheme provided by the invention, the buried point data corresponding to each module in the information recommendation process is obtained, analyzed, and stored, so that the data use condition in the recommendation system becomes traceable, and the transparency and interpretability of data use in the information recommendation process are improved.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art are briefly described below. Obviously, the drawings in the following description are only embodiments of the present invention, and those skilled in the art can obtain other drawings from the provided drawings without creative effort.
FIG. 1 is a flow chart of a data processing method according to the present invention;
FIG. 2 is another flow chart of a data processing method according to the present invention;
FIG. 3 is yet another flow chart of a data processing method according to the present invention;
FIG. 4 is a schematic structural diagram of a data processing system according to the present invention;
FIG. 5 is a schematic structural diagram of a data processing apparatus according to the present invention;
FIG. 6 is a schematic structural diagram of an electronic device according to the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The embodiment of the invention provides a data processing method that can be applied to a recommendation system comprising a portrait module, a recall module, a sorting module and a recommendation engine module. A user portrait and an article portrait are generated in the portrait module; the user portrait, the article portrait and user historical behavior data are combined in the recall module to screen out a plurality of articles that may interest the user; the screened articles are sorted in the sorting module; and finally, a recommendation result is generated by the recommendation engine module to perform information recommendation.
An execution main body of the data processing method provided by the embodiment of the present invention may be a server running on a computer, and a flowchart of the data processing method is shown in fig. 1, and specifically includes:
S101, in the information recommendation process, acquiring process data and result data of data processing performed by a target module.
In the method provided by the embodiment of the invention, the recommendation system comprises a portrait module, a recall module, a sorting module and a recommendation engine module, wherein the portrait module is respectively connected with the recall module and the sorting module, the recall module is respectively connected with the sorting module and the recommendation engine module, and the sorting module is connected with the recommendation engine module.
In the information recommendation process, the process data and result data of data processing performed by the target module are acquired in real time, where the target module is any one module of the recommendation system. It should be noted that: if the target module is the portrait module, the result data obtained by its data processing is a user portrait or an article portrait, and its next modules are the recall module and the sorting module; if the target module is the recall module, the result data is the recall result set recommended to the user, namely a plurality of articles that may interest the user, and its next modules are the sorting module and the recommendation engine module; if the target module is the sorting module, the result data is the result of sorting the articles recalled by the recall module, and its next module is the recommendation engine module; and if the target module is the recommendation engine module, the result data is the recommendation result to be recommended to the user.
It should be noted that, in the embodiment of the present invention, the process by which the target module performs data processing is prior art and is not described herein again.
S102, obtaining a module processing identifier of a previous module of the target module.
The module processing identifier of the previous module of the target module is acquired as follows: if the target module is the portrait module, the field corresponding to the module processing identifier of its previous module is an empty field; if the target module is the recall module, the module processing identifier of its previous module is the module processing identifier of the portrait module; if the target module is the sorting module, the module processing identifiers of its previous modules are the module processing identifier of the portrait module and the module processing identifier of the recall module; if the target module is the recommendation engine module, the module processing identifiers of its previous modules are the module processing identifier of the sorting module and the module processing identifier of the recall module.
It should be noted that, if the field corresponding to the module processing identifier of the previous module of the target module is an empty field, it indicates that the target module has no data dependency on its previous module, or that the data dependency is missing.
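The upstream relationships described above can be sketched as a simple lookup table. This is only an illustrative sketch: the module names used as dictionary keys, the `PREVIOUS_MODULES` table and the `previous_module_ids` helper are hypothetical and do not appear in the patent.

```python
# Previous (upstream) modules of each module, as described in the text.
# The string names are illustrative labels, not identifiers from the patent.
PREVIOUS_MODULES = {
    "portrait": [],
    "recall": ["portrait"],
    "sorting": ["portrait", "recall"],
    "recommendation_engine": ["sorting", "recall"],
}


def previous_module_ids(target, latest_ids):
    """Return the module processing identifiers of the target's previous modules.

    latest_ids maps a module name to the identifier that module produced for
    the current request. An upstream identifier that is missing is reported
    as None, which corresponds to the "empty field" case in the text.
    """
    return [latest_ids.get(name) for name in PREVIOUS_MODULES[target]]
```

For the portrait module the returned list is empty, and a `None` entry for any other module signals a missing data dependency.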
It should be noted that the module processing identifier is used to uniquely identify the current information recommendation request handled by the module; that is, for any information recommendation request, each module generates a module processing identifier corresponding to that request. Optionally, the module processing identifier may be represented by tMid, where tMid is a 2-tuple <tid, type>. Here, tid is a data processing identifier generated for the information recommendation request; it may be generated by starting a preset identifier generation algorithm, which is an existing algorithm for generating identifiers, including but not limited to the snowflake algorithm. The type is the preset data processing logic corresponding to the module.
It should be noted that, in the method provided in the embodiment of the present invention, the data sent by each module to the next module is a 2-tuple <tMid, P>, where tMid is the module processing identifier of the module and P is the result data obtained by data processing performed by the module. That is, each module transmits to the next module both the result data of its data processing and its own module processing identifier.
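The two tuples described above, <tid, type> and <tMid, P>, might be modeled as follows. This is a minimal sketch; the class names, field names and the `send_downstream` helper are assumptions made for illustration, and the field `logic_type` stands in for the `type` element of the tuple.

```python
from typing import Any, NamedTuple


class ModuleProcessingId(NamedTuple):
    """tMid = <tid, type>."""
    tid: int          # data processing identifier for this request
    logic_type: str   # preset data processing logic of the module


class ModuleMessage(NamedTuple):
    """<tMid, P>: what a module hands to its next module."""
    tmid: ModuleProcessingId
    payload: Any      # P, the result data of the module


def send_downstream(tid, logic_type, result):
    """Wrap a module's result data together with its module processing identifier."""
    return ModuleMessage(ModuleProcessingId(tid, logic_type), result)
```

Because the identifier travels with the result data, the receiving module can record it as the identifier of its previous module.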
S103, generating a module processing identifier of the target module.
Referring to fig. 2, a specific process for generating a module processing identifier of a target module includes the following steps:
S201, starting a preset identifier generation algorithm, and generating a data processing identifier of the target module.
S202, acquiring the preset data processing logic corresponding to the target module.
S203, forming the module processing identification of the target module by the data processing identification and the data processing logic.
In the method provided by the embodiment of the present invention, the module processing identifier of the target module is generated in the same way as the module processing identifier of the previous module of the target module described above: the data processing identifier tid of the target module is generated by the preset identifier generation algorithm, the preset data processing logic type corresponding to the target module is acquired, and the data processing identifier and the data processing logic are combined to obtain the module processing identifier of the target module, that is, the 2-tuple <tid, type>.
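Steps S201 to S203 can be sketched as follows. The text names the snowflake algorithm only as one possible identifier generator, so the class below is a simplified snowflake-style generator written for illustration (it omits the custom epoch and other details of production implementations); the class name, the bit layout and the `make_module_processing_id` helper are all assumptions.

```python
import threading
import time


class SimpleSnowflake:
    """Simplified snowflake-style ID generator (illustrative, not production-grade).

    Layout assumed here: timestamp in milliseconds, 10-bit worker id,
    12-bit per-millisecond sequence.
    """

    def __init__(self, worker_id=0, clock=None):
        if not 0 <= worker_id < 1024:
            raise ValueError("worker_id must fit in 10 bits")
        self.worker_id = worker_id
        self._clock = clock or (lambda: int(time.time() * 1000))
        self._last_ms = -1
        self._seq = 0
        self._lock = threading.Lock()

    def next_id(self):
        with self._lock:
            # Never go backwards, even if the clock does.
            now = max(self._clock(), self._last_ms)
            if now == self._last_ms:
                self._seq = (self._seq + 1) & 0xFFF
                if self._seq == 0:
                    # Sequence exhausted: wait for the next millisecond.
                    while now <= self._last_ms:
                        now = self._clock()
            else:
                self._seq = 0
            self._last_ms = now
            return (now << 22) | (self.worker_id << 12) | self._seq


def make_module_processing_id(generator, logic_type):
    """S201-S203: generate tid, take the preset logic type, combine into <tid, type>."""
    tid = generator.next_id()
    return (tid, logic_type)
```

Any other generator that yields unique identifiers per request could be substituted for `SimpleSnowflake` without changing `make_module_processing_id`.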
S104, forming the buried point data corresponding to the target module from the process data, the result data, the module processing identifier of the previous module of the target module and the module processing identifier of the target module.
The process data and result data of the data processing performed by the target module, the module processing identifier of the previous module of the target module, and the module processing identifier of the target module are combined to obtain the buried point data corresponding to the target module. Optionally, the process data and the result data are first combined into buried point content data, and then the module processing identifier of the previous module of the target module, the module processing identifier of the target module, and the buried point content data are combined into the buried point data of the target module. Optionally, the buried point data of the target module can be represented by a triple <sMid, tMid, R>, where sMid represents the module processing identifier of the previous module of the target module, tMid represents the module processing identifier of the target module, and R represents the buried point content data.
It should be noted that the buried point data corresponding to the target module includes the module processing identifier of the previous module of the target module. Through this identifier, a link is established between the buried point data of the modules in the recommendation system, so that the data use condition of any information recommendation request can be traced.
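The triple <sMid, tMid, R> and the way sMid links records together can be sketched as below. The class, the helper names and the dictionary used as an index are hypothetical; the patent does not prescribe how the link is followed.

```python
from typing import NamedTuple


class BuriedPoint(NamedTuple):
    """Buried point data <sMid, tMid, R>."""
    smid: tuple    # identifiers of the previous modules (empty for the portrait module)
    tmid: tuple    # <tid, type> of this module
    content: dict  # R: process data and result data


def build_buried_point(smids, tmid, process_data, result_data):
    """Combine process and result data into R, then form the triple."""
    content = {"process": process_data, "result": result_data}
    return BuriedPoint(tuple(smids), tmid, content)


def trace_upstream(tmid, index):
    """Follow sMid links and return every tMid reachable from the given record.

    index maps a tMid to its BuriedPoint record.
    """
    seen, order, stack = set(), [], [tmid]
    while stack:
        current = stack.pop()
        if current in seen or current not in index:
            continue
        seen.add(current)
        order.append(current)
        stack.extend(index[current].smid)
    return order
```

Starting from the record of the sorting module, for example, `trace_upstream` reaches the recall and portrait records that fed it for the same request.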
Optionally, the buried point data corresponding to the target module may be written into a pre-constructed message queue in real time. Optionally, the message queue may be a Kafka message queue, where Kafka is a distributed message queue system developed by LinkedIn.
S105, analyzing the buried point data corresponding to the target module according to a preset data analysis strategy corresponding to the target module to obtain an analysis result, and storing the analysis result.
According to a preset data analysis strategy corresponding to the target module, the process of analyzing the buried point data corresponding to the target module specifically comprises the following steps:
determining each key field contained in the preset data analysis strategy corresponding to the target module;
and analyzing the buried point data corresponding to the target module, and extracting the data item corresponding to each key field from the buried point data.
In the method provided by the embodiment of the invention, the data analysis strategy corresponding to each module is constructed in advance. For example, the data parsing policy corresponding to the recall module is shown in table 1:
Table 1 Data parsing policy table
[Table 1 is provided as an image in the original publication and is not reproduced here.]
The preset data analysis strategy corresponding to the target module is determined, each key field contained in the data analysis strategy is determined, the buried point data corresponding to the target module is analyzed, and the data item corresponding to each key field is extracted from the buried point data.
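Key-field extraction of this kind can be sketched as follows. Table 1 is not available in the text, so the key fields listed for the recall module below are invented placeholders, as are the `PARSING_POLICIES` table and the `parse_buried_point` helper; the sketch also assumes the buried point data has already been flattened into a dictionary.

```python
# Hypothetical per-module parsing policies: each maps a module to its key fields.
# The actual key fields are defined in Table 1 of the patent, which is not shown here.
PARSING_POLICIES = {
    "recall": ["smid", "tmid", "recall_count"],
}


def parse_buried_point(module, record):
    """Extract the data item for each key field of the module's parsing policy.

    Key fields absent from the record are returned as None.
    """
    key_fields = PARSING_POLICIES[module]
    return {field: record.get(field) for field in key_fields}
```

Fields that are not listed in the policy are dropped, so only the extracted items are stored as the analysis result.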
The analysis result obtained by the analysis is stored; optionally, it is stored in the search engine Elasticsearch.
Optionally, before analyzing the buried point data corresponding to the target module according to a preset data analysis policy corresponding to the target module, the method further includes:
monitoring whether buried point data exists in a message queue in real time;
and when the buried point data exists in the message queue, acquiring the buried point data.
In the method provided by the embodiment of the invention, the message queue is monitored in real time, and when the buried point data exists in the message queue, the buried point data is obtained.
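The monitoring step can be sketched with an in-process queue. This sketch uses Python's standard `queue.Queue` purely as a stand-in for the message queue (the text mentions Kafka as one option); the `drain` function and its parameters are hypothetical.

```python
import queue


def drain(message_queue, handle, timeout=0.1):
    """Fetch buried point data while any is available and pass it to handle.

    Returns the number of records handled. A real monitor would keep running;
    this sketch stops once the queue stays empty for `timeout` seconds.
    """
    handled = 0
    while True:
        try:
            record = message_queue.get(timeout=timeout)
        except queue.Empty:
            return handled
        handle(record)
        handled += 1
```

In a deployment, `handle` would be the parsing step for the module that produced the record.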
In the data processing method provided by the embodiment of the invention, in the information recommendation process, process data and result data of data processing performed by a target module are obtained, where the target module is any one module in a recommendation system; a module processing identifier of a previous module of the target module is obtained; a module processing identifier of the target module is generated; the process data, the result data, the module processing identifier of the previous module of the target module and the module processing identifier of the target module form buried point data corresponding to the target module; and the buried point data corresponding to the target module is analyzed according to a preset data analysis strategy corresponding to the target module to obtain an analysis result, which is then stored. By applying the data processing method provided by the embodiment of the invention, the buried point data corresponding to each module in the information recommendation process is obtained, analyzed, and stored, so that the data use condition in the recommendation system becomes traceable, and the transparency and interpretability of data use in the information recommendation process are improved.
After step S104 of FIG. 1, in which the process data, the result data, the module processing identifier of the previous module of the target module, and the module processing identifier of the target module are combined into the buried point data corresponding to the target module, the method may further include the following steps, as shown in the flowchart of FIG. 3:
S301, storing the buried point data corresponding to the target module into a pre-constructed database.
After the process data and result data of the data processing performed by the target module, the module processing identifier of the previous module of the target module and the module processing identifier of the target module form the buried point data corresponding to the target module, the buried point data is stored in a pre-constructed database. Optionally, the buried point data corresponding to the target module can be stored in the data warehouse tool Hive. Hive is a data warehouse tool based on the distributed system Hadoop; it is used for data extraction, transformation and loading, and provides a mechanism for storing, querying and analyzing large-scale data stored in Hadoop.
S302, compiling statistics on the buried point data stored in the database according to a preset period to obtain a statistical result.
The specific process of compiling statistics on the buried point data stored in the database according to the preset period comprises the following steps:
determining buried point data in a preset period according to the buried point data stored in the database;
calculating the data utilization rate of each module contained in the recommendation system according to the buried point data in the preset period; the data utilization rate of each module is used for representing the ratio between the result data of data processing performed by the module and the result data of the previous module of the module;
calculating the data coverage rate of the first module and the feature missing rate of each module contained in the recommendation system according to the buried point data in the preset period; the first module comprises a portrait module or a recall module; the data coverage rate of the first module is used for representing the ratio between the effective result data obtained by data processing of the first module and the total number of information recommendation requests in the preset period, where the fields contained in the effective result data are non-empty fields; the feature missing rate of each module is used for representing the ratio between the number of effective module processing identifiers of the previous module of the module and the total number of module processing identifiers of the previous module of the module, where the fields contained in an effective module processing identifier are non-empty fields.
In the method provided by the embodiment of the present invention, the buried point data in the preset period is determined according to the buried point data stored in the database. For example, if the statistics are set to be compiled on a daily basis, all the buried point data of one day is determined. Optionally, scheduled statistics may be set, so that the buried point data is processed when the scheduled time arrives.
In the method provided by the embodiment of the invention, the data utilization rate of each module, the data coverage rate of the first module and the feature missing rate of each module can be calculated. The first module comprises the portrait module and the recall module; that is, the data coverage rate of the portrait module and the data coverage rate of the recall module are calculated. The data utilization rate of each module is used for representing the ratio between the result data of data processing performed by the module and the result data of the previous module of the module. For example, if, in a preset period, the recall module receives 3,000 thousand pieces of result data sent by the portrait module and processes 2,000 thousand of them, the data utilization rate of the recall module is 2000/3000 = 66.67%. The data coverage rate of the portrait module is used for representing the ratio between the effective result data obtained by data processing of the portrait module and the total number of information recommendation requests in the preset period; the data coverage rate of the recall module is used for representing the ratio between the effective result data obtained by data processing of the recall module and the total number of information recommendation requests in the preset period; the fields contained in the effective result data are non-empty fields. The feature missing rate of each module is used for representing the ratio between the number of effective module processing identifiers of the previous module of the module and the total number of module processing identifiers of the previous module of the module.
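The three ratios can be sketched as plain functions. All function names are hypothetical, and the treatment of "non-empty" (no `None`, empty string, empty list or empty dict) is an assumption. The third function computes the ratio exactly as the text defines it, that is, effective previous-module identifiers over all previous-module identifiers.

```python
def ratio(numerator, denominator):
    """Safe division that returns 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def is_effective(record):
    """A record is effective when it has fields and none of them is empty."""
    return bool(record) and all(v not in (None, "", [], {}) for v in record.values())


def data_utilization_rate(processed_count, received_count):
    """Result data processed by a module over result data sent by its previous module."""
    return ratio(processed_count, received_count)


def data_coverage_rate(results, total_requests):
    """Effective result records over the total number of recommendation requests."""
    return ratio(sum(1 for r in results if is_effective(r)), total_requests)


def effective_upstream_id_ratio(smids):
    """Effective previous-module identifiers over all previous-module identifiers."""
    return ratio(sum(1 for s in smids if s), len(smids))
```

With the figures from the example above, `data_utilization_rate(2000, 3000)` gives about 0.6667, i.e. 66.67%.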
Optionally, related statistical charts may be generated from the buried point data within the preset period. The charts include, but are not limited to, bar charts and line charts; for example, a line chart of the total amount of buried point data in each interval between two adjacent hours of one day may be generated.
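The data behind such a line chart could be prepared as follows. This is a minimal sketch that assumes each buried point record carries a timestamp and that each chart point covers a one-hour interval; the function name `hourly_counts` is hypothetical, and plotting is left to whatever charting tool is in use.

```python
from collections import Counter
from datetime import datetime


def hourly_counts(timestamps: list[datetime]) -> list[int]:
    """Count buried point records per hour of day (index 0 is 00:00 to 01:00)."""
    counts = Counter(ts.hour for ts in timestamps)
    # Emit all 24 buckets so that hours without data appear as zero on the chart.
    return [counts.get(hour, 0) for hour in range(24)]


sample = [
    datetime(2020, 6, 8, 9, 15),
    datetime(2020, 6, 8, 9, 47),
    datetime(2020, 6, 8, 13, 5),
]
series = hourly_counts(sample)
```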
It should be noted that the buried point data contained in the database may be counted off-line or in real time according to a preset period.
And S303, displaying the statistical result.
The statistical result obtained is displayed, that is, visualized: the data utilization rate of each module, the data coverage rate of the portrait module, the data coverage rate of the recall module, the feature missing rate of each module and the related statistical charts are displayed.
According to the data processing method provided by the embodiment of the invention, the buried point data is counted according to the preset period so as to evaluate the working condition of each module in the recommendation system, and the counted result is visualized so as to improve the user experience.
An embodiment of the present invention further provides a data processing system, a schematic structural diagram of which is shown in fig. 4, and the data processing system specifically includes:
a portrait module 401, a recall module 402, a sorting module 403, a recommendation engine module 404, a buried point data collection module 405, a data analysis module 406, and a data visualization module 407.
The recall module 402 is connected with the portrait module 401, the sorting module 403 and the recommendation engine module 404, respectively; the sorting module 403 is connected with the portrait module 401 and the recommendation engine module 404, respectively.
The buried point data collection module 405 is connected to the portrait module 401, the recall module 402, the sorting module 403, and the recommendation engine module 404, and is configured to obtain buried point data of each module in an information recommendation process, where the buried point data of each module includes a module processing identifier of the module, a module processing identifier of a previous module of the module, and buried point content data, and the buried point content data includes process data and result data of data processing performed by the module.
The data analysis module 406 includes a real-time data analysis module 4061 and an offline data analysis module 4062. The data analysis module 406 is connected to the buried point data collection module 405 and the data visualization module 407, and is configured to perform real-time data analysis and offline data analysis on the buried point data, and to send the results of the real-time data analysis and of the offline data analysis to the data visualization module 407 for visualization.
Corresponding to the method described in fig. 1, an embodiment of the present invention further provides a data processing apparatus, which is used for implementing the method in fig. 1 specifically, and a schematic structural diagram of the data processing apparatus is shown in fig. 5, and specifically includes:
a first obtaining unit 501, configured to obtain process data and result data of data processing performed by a target module in an information recommendation process; the target module is any one module in the recommendation system;
a second obtaining unit 502, configured to obtain a module processing identifier of a previous module of the target module;
a generating unit 503, configured to generate a module processing identifier of the target module;
a combining unit 504, configured to combine the process data, the result data, a module processing identifier of a previous module of the target module, and a module processing identifier of the target module into buried point data corresponding to the target module;
and the analyzing unit 505 is configured to analyze the buried point data corresponding to the target module according to a preset data analysis policy corresponding to the target module to obtain an analysis result, and store the analysis result.
In the data processing apparatus provided in the embodiment of the present invention, in the information recommendation process, process data and result data of a target module for performing data processing are obtained, the target module is any one module in a recommendation system, a module processing identifier of a previous module of the target module is obtained, a module processing identifier of the target module is generated, the process data, the result data, the module processing identifier of the previous module of the target module, and the module processing identifier of the target module form buried point data corresponding to the target module, buried point data corresponding to the target module is analyzed according to a preset data analysis policy corresponding to the target module, an analysis result is obtained, and the analysis result is stored. By applying the data processing device provided by the embodiment of the invention, the data embedding point corresponding to each module in the information recommendation process is obtained, and the data embedding point is analyzed and stored, so that the traceability of the data use condition in the recommendation system is realized, and the transparency and the interpretability of the data use in the information recommendation process are improved.
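The way the four components form one buried point record, and how the previous module's identifier makes the processing chain traceable, can be sketched as below. The class name, field names, and identifier strings are hypothetical and chosen only for illustration; the source does not specify a record layout.

```python
from dataclasses import dataclass, asdict
from typing import Any, Optional


@dataclass
class BuriedPoint:
    process_data: dict[str, Any]
    result_data: dict[str, Any]
    prev_module_processing_id: Optional[str]  # None for the first module in the chain
    module_processing_id: str


def build_buried_point(process_data, result_data, prev_id, module_id) -> dict[str, Any]:
    """Combine the four components into one buried point record."""
    return asdict(BuriedPoint(process_data, result_data, prev_id, module_id))


def trace(points_by_id: dict[str, dict], module_id: str) -> list[str]:
    """Walk back through previous-module identifiers to reconstruct the chain."""
    chain = []
    current: Optional[str] = module_id
    while current is not None:
        chain.append(current)
        current = points_by_id[current]["prev_module_processing_id"]
    return chain


portrait_point = build_buried_point({"lookup_ms": 3}, {"tags": ["sports"]}, None, "portrait-0001")
recall_point = build_buried_point(
    {"candidates_scanned": 500},
    {"items": ["a", "b"]},
    portrait_point["module_processing_id"],
    "recall-0001",
)
points = {p["module_processing_id"]: p for p in (portrait_point, recall_point)}
```

Calling `trace(points, "recall-0001")` returns the identifiers from the recall step back to the portrait step, which is the kind of traceability the paragraph above describes.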
In an embodiment of the present invention, based on the foregoing scheme, the generating unit 503 includes:
the generating subunit is used for starting a preset identifier generating algorithm and generating a data processing identifier of the target module;
the acquisition subunit is used for acquiring preset data processing logic corresponding to the target module;
and the combination subunit is used for combining the data processing identifier and the data processing logic into a module processing identifier of the target module.
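The three subunits can be sketched as one function. The source does not name the preset identifier generation algorithm, so a random UUID from the Python standard library is used here purely as a stand-in; the `PRESET_LOGIC` table and its labels are likewise hypothetical.

```python
import uuid

# Hypothetical preset data processing logic for each module of the recommendation system.
PRESET_LOGIC = {
    "portrait": "user_profile_lookup",
    "recall": "candidate_recall",
    "sorting": "candidate_ranking",
    "recommendation_engine": "result_assembly",
}


def generate_module_processing_id(module_name: str) -> dict[str, str]:
    # Generating subunit: produce a data processing identifier (assumed: UUID4 hex string).
    data_processing_id = uuid.uuid4().hex
    # Acquisition subunit: look up the preset data processing logic of the target module.
    data_processing_logic = PRESET_LOGIC[module_name]
    # Combination subunit: both parts together form the module processing identifier.
    return {
        "data_processing_id": data_processing_id,
        "data_processing_logic": data_processing_logic,
    }


first_id = generate_module_processing_id("recall")
second_id = generate_module_processing_id("recall")
```

Keeping the logic label inside the identifier means a stored record says both which processing run produced it and which logic was applied, without a separate lookup.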
In an embodiment of the present invention, based on the foregoing solution, the analyzing unit 505, when analyzing the buried point data corresponding to the target module according to the preset data analysis strategy corresponding to the target module, is configured for:
determining each key field contained in a preset data analysis strategy corresponding to the target module;
and analyzing the buried point data corresponding to the target module, and extracting the data item corresponding to each key field from the buried point data.
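A minimal sketch of this key-field extraction follows, assuming the analysis strategy is simply a list of top-level field names per module and the buried point data is a dictionary. The strategy table and field names are hypothetical.

```python
from typing import Any

# Hypothetical per-module analysis strategies: the key fields to extract.
ANALYSIS_STRATEGIES: dict[str, list[str]] = {
    "recall": ["module_processing_id", "prev_module_processing_id", "result_data"],
}


def analyze_buried_point(module_name: str, buried_point: dict[str, Any]) -> dict[str, Any]:
    # Step 1: determine the key fields in the strategy for the target module.
    key_fields = ANALYSIS_STRATEGIES[module_name]
    # Step 2: extract the data item for each key field; absent fields map to None.
    return {name: buried_point.get(name) for name in key_fields}


analysis_result = analyze_buried_point(
    "recall",
    {
        "module_processing_id": "recall-0001",
        "prev_module_processing_id": "portrait-0001",
        "process_data": {"candidates_scanned": 500},
        "result_data": {"items": ["a", "b"]},
    },
)
```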
In an embodiment of the present invention, based on the foregoing solution, the apparatus may further include:
and the storage unit is used for storing the buried point data corresponding to the target module into a pre-constructed database.
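One possible shape for such a pre-constructed database is sketched below using SQLite from the Python standard library. The source does not say which database is used, so the engine, the table name `buried_points`, and the column layout are all assumptions; timestamps are stored as ISO 8601 strings so that a period can be selected with plain string comparison.

```python
import json
import sqlite3


def open_database(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS buried_points ("
        "module TEXT NOT NULL, created_at TEXT NOT NULL, payload TEXT NOT NULL)"
    )
    return conn


def store_buried_point(conn: sqlite3.Connection, module: str, created_at: str, payload: dict) -> None:
    # The buried point record is serialized as JSON in a single column.
    conn.execute(
        "INSERT INTO buried_points (module, created_at, payload) VALUES (?, ?, ?)",
        (module, created_at, json.dumps(payload)),
    )
    conn.commit()


def load_period(conn: sqlite3.Connection, start: str, end: str) -> list[dict]:
    # Select records with start <= created_at < end (a half-open period).
    rows = conn.execute(
        "SELECT payload FROM buried_points WHERE created_at >= ? AND created_at < ?",
        (start, end),
    ).fetchall()
    return [json.loads(row[0]) for row in rows]


db = open_database()
store_buried_point(db, "recall", "2020-06-08T09:15:00", {"module_processing_id": "recall-0001"})
store_buried_point(db, "recall", "2020-06-09T00:05:00", {"module_processing_id": "recall-0002"})
daily = load_period(db, "2020-06-08T00:00:00", "2020-06-09T00:00:00")
```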
In an embodiment of the present invention, based on the foregoing solution, the apparatus may further include:
the statistical unit is used for carrying out statistics on each buried point data stored in the database according to a preset period to obtain a statistical result;
and the display unit is used for displaying the statistical result.
In an embodiment of the present invention, based on the foregoing scheme, the statistical unit, when performing statistics on each buried point data stored in the database, is configured for:
determining buried point data in a preset period according to the buried point data stored in the database;
calculating the data utilization rate of each module contained in the recommendation system according to the buried point data in the preset period; the data utilization rate of each module is used for representing the proportion between the result data processed by the module and the result data of the previous module of the module;
calculating the data coverage rate of the first module and the feature missing rate of each module contained in the recommendation system according to the buried point data in the preset period; the first module comprises a portrait module or a recall module, the data coverage rate of the first module is used for representing the proportion between effective result data obtained by data processing of the first module and the total number of information recommendation requests in the preset period, fields contained in the effective result data are non-empty fields, the feature missing rate of each module is used for representing the proportion between the number of effective module processing identifiers of a previous module of the module and the total number of module processing identifiers of the previous module of the module, and the fields contained in the effective module processing identifiers are non-empty fields.
The embodiment of the invention also provides a storage medium, which comprises a stored instruction, wherein when the instruction runs, the device where the storage medium is located is controlled to execute the data processing method.
An embodiment of the present invention further provides an electronic device, a schematic structural diagram of which is shown in fig. 6. The electronic device specifically includes a memory 601 and one or more instructions 602, where the one or more instructions 602 are stored in the memory 601 and are configured to be executed by one or more processors 603 so as to perform the following operations:
in the information recommendation process, acquiring process data and result data of data processing performed by a target module; the target module is any one module in the recommendation system;
acquiring a module processing identifier of a previous module of the target module;
generating a module processing identifier of the target module;
forming the process data, the result data, the module processing identifier of the previous module of the target module and the module processing identifier of the target module into buried point data corresponding to the target module;
and analyzing the buried point data corresponding to the target module according to a preset data analysis strategy corresponding to the target module to obtain an analysis result, and storing the analysis result.
It should be noted that the embodiments in this specification are described in a progressive manner: each embodiment focuses on its differences from the other embodiments, and for the parts that are the same or similar, the embodiments may be referred to one another. Because the apparatus embodiments are basically similar to the method embodiments, they are described briefly; for the relevant points, refer to the corresponding description of the method embodiments.
Finally, it should also be noted that, herein, relational terms such as first and second may be used solely to distinguish one entity or action from another entity or action, without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
For convenience of description, the above devices are described as being divided into various units by function, and are described separately. Of course, the functions of the units may be implemented in the same software and/or hardware or in a plurality of software and/or hardware when implementing the invention.
From the above description of the embodiments, it is clear to those skilled in the art that the present invention can be implemented by software plus necessary general hardware platform. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which may be stored in a storage medium, such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method according to the embodiments or some parts of the embodiments.
The data processing method and apparatus, the storage medium, and the electronic device provided by the present invention are described in detail above, and a specific example is applied in the present disclosure to explain the principle and the implementation of the present invention, and the description of the above embodiment is only used to help understanding the method and the core idea of the present invention; meanwhile, for a person skilled in the art, according to the idea of the present invention, there may be variations in the specific embodiments and the application scope, and in summary, the content of the present specification should not be construed as a limitation to the present invention.

Claims (10)

1. A data processing method is applied to a recommendation system, wherein the recommendation system comprises a portrait module, a recall module, a sorting module and a recommendation engine module, and the method comprises the following steps:
in the information recommendation process, acquiring process data and result data of data processing performed by a target module; the target module is any one module in the recommendation system;
acquiring a module processing identifier of a previous module of the target module;
generating a module processing identifier of the target module;
forming the process data, the result data, the module processing identifier of the previous module of the target module and the module processing identifier of the target module into buried point data corresponding to the target module;
and analyzing the buried point data corresponding to the target module according to a preset data analysis strategy corresponding to the target module to obtain an analysis result, and storing the analysis result.
2. The method of claim 1, wherein generating the module process identification of the target module comprises:
starting a preset identification generation algorithm to generate a data processing identification of the target module;
acquiring preset data processing logic corresponding to the target module;
and forming the module processing identifier of the target module by the data processing identifier and the data processing logic.
3. The method of claim 1, wherein the analyzing the buried point data corresponding to the target module according to a preset data analysis strategy corresponding to the target module comprises:
determining each key field contained in a preset data analysis strategy corresponding to the target module;
and analyzing the buried point data corresponding to the target module, and extracting the data item corresponding to each key field from the buried point data.
4. The method of claim 1, wherein the step of composing the process data, the result data, the module processing identifier of the previous module of the target module, and the module processing identifier of the target module into the buried point data corresponding to the target module further comprises:
and storing the buried point data corresponding to the target module into a pre-constructed database.
5. The method of claim 4, further comprising:
according to a preset period, counting the data of each buried point stored in the database to obtain a counting result;
and displaying the statistical result.
6. The method of claim 5, wherein said counting each buried point data stored in said database comprises:
determining buried point data in a preset period according to the buried point data stored in the database;
calculating the data utilization rate of each module contained in the recommendation system according to the data of the buried points in the preset period; the data utilization rate of each module is used for representing the proportion condition between the result data of the module for data processing and the result data of the previous module of the module;
calculating the data coverage rate of the first module and the feature missing rate of each module contained in the recommendation system according to the buried point data in the preset period; the first module comprises a portrait module or a recall module, the data coverage rate of the first module is used for representing the proportion between effective result data obtained by data processing of the first module and the total number of information recommendation requests in the preset period, fields contained in the effective result data are non-empty fields, the feature missing rate of each module is used for representing the proportion between the number of effective module processing identifiers of a previous module of the module and the total number of module processing identifiers of the previous module of the module, and the fields contained in the effective module processing identifiers are non-empty fields.
7. A data processing device is applied to a recommendation system, the recommendation system comprises a portrait module, a recall module, a sorting module and a recommendation engine module, and the device comprises:
the first acquisition unit is used for acquiring process data and result data of data processing performed by a target module in the information recommendation process; the target module is any one module in the recommendation system;
a second obtaining unit, configured to obtain a module processing identifier of a previous module of the target module;
the generating unit is used for generating a module processing identifier of the target module;
the combination unit is used for combining the process data, the result data, the module processing identifier of the previous module of the target module and the module processing identifier of the target module into buried point data corresponding to the target module;
and the analysis unit is used for analyzing the buried point data corresponding to the target module according to a preset data analysis strategy corresponding to the target module to obtain an analysis result, and storing the analysis result.
8. The apparatus of claim 7, wherein the generating unit comprises:
the generating subunit is used for starting a preset identifier generating algorithm and generating a data processing identifier of the target module;
the acquisition subunit is used for acquiring preset data processing logic corresponding to the target module;
and the combination subunit is used for combining the data processing identifier and the data processing logic into a module processing identifier of the target module.
9. A storage medium comprising stored instructions, wherein the instructions, when executed, control a device on which the storage medium resides to perform a data processing method according to any one of claims 1 to 6.
10. An electronic device comprising a memory and one or more instructions, wherein the one or more instructions are stored in the memory and configured to be executed by the one or more processors to perform the data processing method of any one of claims 1 to 6.
CN202010509708.5A 2020-06-08 2020-06-08 Data processing method and device, storage medium and electronic equipment Active CN111400608B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010509708.5A CN111400608B (en) 2020-06-08 2020-06-08 Data processing method and device, storage medium and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010509708.5A CN111400608B (en) 2020-06-08 2020-06-08 Data processing method and device, storage medium and electronic equipment

Publications (2)

Publication Number Publication Date
CN111400608A CN111400608A (en) 2020-07-10
CN111400608B true CN111400608B (en) 2020-08-28

Family

ID=71437633

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010509708.5A Active CN111400608B (en) 2020-06-08 2020-06-08 Data processing method and device, storage medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN111400608B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112069384A (en) * 2020-09-04 2020-12-11 中国平安人寿保险股份有限公司 Buried point data processing method, server and readable storage medium

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9892447B2 (en) * 2013-05-08 2018-02-13 Ebay Inc. Performing image searches in a network-based publication system
CN109815381A (en) * 2018-12-21 2019-05-28 平安科技(深圳)有限公司 User's portrait construction method, system, computer equipment and storage medium
CN109948059A (en) * 2019-03-28 2019-06-28 北京字节跳动网络技术有限公司 Recommended method, device, equipment and the storage medium of content
CN110619094A (en) * 2019-09-09 2019-12-27 上海钧正网络科技有限公司 Riding vehicle recommendation method, device, system, computer equipment and storage medium
CN110851706B (en) * 2019-10-10 2022-11-01 百度在线网络技术(北京)有限公司 Training method and device for user click model, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN111400608A (en) 2020-07-10

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant