CN113051479B - File processing and recommendation information generation methods, devices, equipment and storage medium - Google Patents

Publication number
CN113051479B
Authority
CN
China
Prior art keywords
feature
configuration information
information file
dictionary
tree
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110419249.6A
Other languages
Chinese (zh)
Other versions
CN113051479A (en
Inventor
黄韵萍
李小聪
龚柳华
魏龙
王峰
王召玺
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The disclosure provides a method, a device, equipment and a storage medium for file processing and recommendation information generation, and relates to the field of computer technology, in particular to the fields of information flow and deep learning. The specific implementation is as follows: acquiring a feature configuration information file and a feature dictionary corresponding to a target service that receives recommendation information, the feature configuration information file being used for generating input data of a recommendation information estimation model of the target service; determining abnormal input features of the recommendation information estimation model according to the feature configuration information file and the feature dictionary; and modifying the feature configuration information file according to the abnormal input features. The disclosure can improve the validity of the input data of the recommendation information estimation model.

Description

File processing and recommendation information generation methods, devices, equipment and storage medium
Technical Field
The present disclosure relates to the field of computer technology, and in particular, to the field of information flow and deep learning.
Background
Recommendation systems are a product of the rapid development of the Internet. With the rapid growth in the number of users and the arrival of the media age, each user can be both a creator and a consumer of network content. The essence of a recommendation system is to select information of interest for users according to user attributes and user behavior data.
Over time, the features used by the model at the end of the recommendation system link change, and the input features of the model need to be updated. At present, however, an efficient way of updating them is lacking.
Disclosure of Invention
The disclosure provides a file processing and recommendation information generation method, device, equipment and storage medium.
According to an aspect of the present disclosure, there is provided a file processing method including:
acquiring a feature configuration information file and a feature dictionary corresponding to a target service that receives recommendation information; the feature configuration information file is used for generating input data of a recommendation information estimation model of the target service;
determining abnormal input features of the recommendation information estimation model according to the feature configuration information file and the feature dictionary;
and modifying the feature configuration information file according to the abnormal input features.
According to another aspect of the present disclosure, there is provided a recommendation information generation method including:
generating input data of a recommendation information estimation model of the target service according to a feature configuration information file corresponding to the target service that receives the recommendation information; the feature configuration information file is a modified feature configuration information file provided by any one embodiment of the disclosure;
and inputting the input data into the recommendation information estimation model to obtain recommendation information of the target service.
According to another aspect of the present disclosure, there is provided a file processing apparatus including:
an acquisition module, used for acquiring a feature configuration information file and a feature dictionary corresponding to the target service that receives the recommendation information; the feature configuration information file is used for generating input data of a recommendation information estimation model of the target service;
an abnormal input feature module, used for determining abnormal input features of the recommendation information estimation model according to the feature configuration information file and the feature dictionary;
and a modification module, used for modifying the feature configuration information file according to the abnormal input features.
According to another aspect of the present disclosure, there is provided a recommendation information generating apparatus including:
an input data module, used for generating input data of a recommendation information estimation model of the target service according to the feature configuration information file corresponding to the target service that receives the recommendation information; the feature configuration information file is a modified feature configuration information file provided by any one embodiment of the disclosure;
and a generation module, used for inputting the input data into the recommendation information estimation model to obtain the recommendation information of the target service.
According to another aspect of the present disclosure, there is provided an electronic device including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of the embodiments of the present disclosure.
According to another aspect of the present disclosure, there is provided a non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the method of any of the embodiments of the present disclosure.
According to another aspect of the present disclosure, there is provided a computer program product comprising a computer program which, when executed by a processor, implements the method in any of the embodiments of the present disclosure.
According to the technology of the present disclosure, the feature configuration information file can be modified according to the abnormal input features, so that when input data generated from the modified feature configuration information file is input into the recommendation information estimation model, more accurate and more effective recommendation information can be obtained, and a better recommendation service is provided for users.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the disclosure, nor is it intended to be used to limit the scope of the disclosure. Other features of the present disclosure will become apparent from the following specification.
Drawings
The drawings are for a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 is a schematic diagram of a file processing method according to an embodiment of the present disclosure;
FIG. 2 is a schematic diagram of a file processing method according to another embodiment of the present disclosure;
FIG. 3 is a schematic diagram of a file processing method according to yet another embodiment of the present disclosure;
FIG. 4 is a schematic drawing of information extraction according to an example of the present disclosure;
FIG. 5 is a schematic diagram of a graphical tree according to another example of the present disclosure;
FIG. 6 is a schematic diagram of information transfer according to yet another example of the present disclosure;
FIG. 7 is a schematic diagram of an interception process according to yet another example of the present disclosure;
FIG. 8 is a schematic diagram of an intercept stage according to yet another example of the present disclosure;
FIG. 9 is a schematic diagram of an intercept notification according to yet another example of the present disclosure;
FIG. 10 is a schematic diagram of a file processing apparatus according to an embodiment of the present disclosure;
FIG. 11 is a schematic diagram of a file processing apparatus according to another embodiment of the present disclosure;
FIG. 12 is a schematic diagram of a file processing apparatus according to yet another embodiment of the present disclosure;
FIG. 13 is a schematic diagram of a file processing apparatus according to yet another embodiment of the present disclosure;
FIG. 14 is a schematic diagram of a file processing apparatus according to yet another embodiment of the present disclosure;
FIG. 15 is a schematic diagram of a file processing apparatus according to yet another embodiment of the present disclosure;
FIG. 16 is a block diagram of an electronic device for implementing a file processing method of an embodiment of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below in conjunction with the accompanying drawings, which include various details of the embodiments of the present disclosure to facilitate understanding, and should be considered as merely exemplary. Accordingly, one of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
An embodiment of the disclosure provides a file processing method, as shown in FIG. 1, including:
Step S11: acquiring a feature configuration information file and a feature dictionary corresponding to a target service that receives recommendation information; the feature configuration information file is used for generating input data of a recommendation information estimation model of the target service;
Step S12: determining abnormal input features of the recommendation information estimation model according to the feature configuration information file and the feature dictionary;
Step S13: modifying the feature configuration information file according to the abnormal input features.
In this embodiment, the target service may be a service of an application dedicated to providing recommended information to the user, such as a service of a news distribution application, a service of an entertainment information distribution application, or the like.
The target service may also be a service with an information recommendation function, such as a service of a shopping application, a service of a travel application, and the like.
In one possible implementation, each time the application corresponding to the target service generates recommendation information, the input features of the recommendation information estimation model are determined according to the feature configuration information file corresponding to the target service. The input features are input into the recommendation information estimation model as part of its input data. The model generates ranking information for each piece of candidate recommendation information according to the input data, and the recommendation information is determined according to that ranking information.
In one possible implementation, each target service may correspond to a feature dictionary that may include a plurality of features. When the input features of the recommended information estimation model are determined according to the configuration information file, the features in the feature configuration information file can be used as the input features of the recommended information estimation model.
The configuration features in the feature configuration information file may be selected from the features stored in a configuration feature storage pool.
The feature configuration information file may also include a manner of extracting the configuration features, such as a function of extracting the configuration features from the configuration feature storage pool.
The feature dictionary may be a dictionary containing a plurality of features determined experimentally from specific experimental data. The feature dictionary may be generated according to the effect, observed in experiments, of the output results that the recommendation information estimation model produces for the corresponding input features.
In a specific implementation, the feature configuration information file and the feature dictionary may be files maintained online and offline, respectively. Because the two files are maintained separately, manual additions and deletions introduce differences (diff) between them, which give rise to abnormal input features. Based on the diff, abnormal input features fall into two types. The first type is missing features: features that appear in the estimation feature dictionary but for which no feature extraction is performed, so that they are invalid in the recommendation system link. The second type is outdated features: features for which feature extraction is performed online but which are not actually used, so that online resources are wasted. In this embodiment, the abnormal input features can be determined from the online and offline information, and the feature configuration information file is modified accordingly, so that difference features are avoided and the validity and timeliness of the input features are ensured.
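As an illustrative, non-limiting sketch, the two types of abnormal input feature described above can be expressed as two set differences between the feature names configured online and the feature names in the offline dictionary. The names `FeatureDiff` and `classify_diff` and the example feature names are hypothetical and do not appear in the disclosure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FeatureDiff:
    missing: frozenset   # in the dictionary, but not extracted online
    outdated: frozenset  # extracted online, but absent from the dictionary


def classify_diff(configured, dictionary):
    """Split the diff between the two feature sets into the two types."""
    configured, dictionary = frozenset(configured), frozenset(dictionary)
    return FeatureDiff(
        missing=dictionary - configured,
        outdated=configured - dictionary,
    )


# Hypothetical feature names, for illustration only.
diff = classify_diff(
    configured={"user_age", "item_category", "legacy_click_count"},
    dictionary={"user_age", "item_category", "item_freshness"},
)
print(sorted(diff.missing))   # ['item_freshness']
print(sorted(diff.outdated))  # ['legacy_click_count']
```

When both sets agree, both fields are empty and no modification of the feature configuration information file is needed.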
In one possible implementation, the recommendation information estimation model may be a model used by a content recommendation system. Specifically, it may be a Ctr (Click Through Rate) estimation model, which takes the input features as at least part of its input data and generates click-through-rate estimation information for recommended content according to the input data. A content recommendation system can extract features from user portraits (static data), user operation content behaviors (dynamic data) and content data, build a model and train it continuously, and the resulting model provides an online estimation service.
In one example, the user portraits, the user operation content behaviors and the content data are produced in different modules, and finally converge, through different processing flows, in a Feature-Service module, where they serve as the input source sample information for offline training and online estimation of the model. In general, because the amount of source sample information is huge, the data link is long and complex and lacks unified management specifications, and data quality changes as each module iterates. The recommendation information estimation model is deployed at the end of the recommendation system link, so it is affected the most while the problem remains hard to perceive. Invalid, low-quality data not only consume a large amount of resources, but also hinder the iteration of the input features of the recommendation information estimation model, increase development and testing costs, and affect the final recommendation effect of the information flow (Feed flow) of the recommendation system.
For example, the recommendation information estimation model may obtain a large amount of historical impression and click data from sample data formed by splicing the logs that Feature-Service writes to disk offline with the logs of user behavior. Useful feature data are extracted from these data to train the model offline, yielding the click-through rates corresponding to different prior features; these are used to estimate Ctr and other information for candidate recommended content in the online recommendation system, providing a reference for personalized recommendation. After the network structure of the model is determined, subsequent iteration work focuses on the selection and extraction of features. Because the feature set is huge, deleting historical features during iteration requires checking, one by one, whether the deletion would negatively affect the estimation model. The verification cost is very high, so invalid features are not deleted lightly, and cleanup lags behind. The file processing method of this embodiment establishes an interception mechanism for invalid and low-quality features: feature configurations and feature extraction results are verified quickly and in real time, invalid and low-quality features in the feature configuration information file are intercepted, correction and optimization of the file are promoted in a timely manner, and the quality of the input features provided online to the recommendation information estimation model is guaranteed.
In another embodiment, the output result of the recommendation information estimation model may be a ranking of the candidate recommendation information or the recommendation information itself, and the effect of the output result may be determined according to the user's feedback on the recommendation information, for example, the user's browsing duration, whether the user finishes browsing, and so on.
In one implementation, determining the abnormal input features of the recommendation information estimation model according to the feature configuration information file and the feature dictionary may include determining at least one of low-quality features and invalid features according to the feature configuration information file and the feature dictionary, and taking the at least one of the low-quality features and the invalid features as the abnormal input features.
In one possible implementation, modifying the feature configuration information file according to the abnormal input features may include deleting the features corresponding to the abnormal input features from the feature configuration information file.
In a possible implementation, the feature configuration information file may be left unmodified when it is determined from the feature configuration information file and the feature dictionary that no abnormal input feature is present.
In general, interception of invalid and low-quality features on the online estimation side may be implemented through monitoring: feature information is obtained by parsing sampled logs of online requests and is then evaluated. On the offline side, invalid and low-quality features are evaluated in the feature preprocessing stage: after online estimation pushes information to users, user interaction logs are collected, and model training is performed after data preprocessing and feature preprocessing. Processing the input features of the recommendation information estimation model in these ways may suffer from disadvantages such as long update times and lagging feedback. For a large recommendation application system, by the time invalid and low-quality features are found through online logs, online results are already unreliable and resources have already been wasted, and double the manpower and resources are then spent on feature preprocessing during offline training.
In this embodiment, the feature configuration information file is modified according to the abnormal input features, so that when input data generated from the modified feature configuration information file is input into the recommendation information estimation model, more accurate and more effective recommendation information can be obtained, and a better recommendation service is provided for the user.
In one embodiment, determining abnormal input features of the recommendation information estimation model according to the feature configuration information file and the feature dictionary includes:
determining a first feature that is present in the feature configuration information file and is not present in the feature dictionary; and/or
determining a second feature that is not present in the feature configuration information file and is present in the feature dictionary;
and taking the first feature and/or the second feature as the abnormal input feature.
In a particular implementation, at least one of the first feature and the second feature may be present in the abnormal input feature. In the case where only the first feature or the second feature exists in the abnormal input features, the first feature or the second feature may be regarded as the abnormal input feature. In the case where the first feature and the second feature exist in the abnormal input feature, the first feature and the second feature may be regarded as the abnormal input feature.
In a possible implementation, modifying the feature configuration information file according to the abnormal input features may include: deleting the first feature from the feature configuration information file when the abnormal input features include the first feature; and adding the second feature to the feature configuration information file when the abnormal input features include the second feature.
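The modification described above may be sketched as follows. This is only an illustration under stated assumptions: the feature configuration information file is modeled as a mapping from feature name to an extraction rule string, and `rule_lookup`, a hypothetical source of extraction rules for newly added features, is not part of the disclosure. A second feature for which no rule is known is reported rather than guessed.

```python
def apply_modifications(config, first_features, second_features, rule_lookup):
    """Return a corrected copy of config and the second features left unresolved.

    config:          feature name -> extraction rule (assumed representation)
    first_features:  names present in config but not in the dictionary
    second_features: names present in the dictionary but not in config
    rule_lookup:     hypothetical source of extraction rules for new features
    """
    # Delete the first features; the input mapping itself is left untouched.
    corrected = {
        name: rule for name, rule in config.items()
        if name not in first_features
    }
    # Add each second feature for which an extraction rule is available.
    unresolved = []
    for name in sorted(second_features):
        if name in rule_lookup:
            corrected[name] = rule_lookup[name]
        else:
            unresolved.append(name)
    return corrected, unresolved


# Hypothetical names and rule strings, for illustration only.
config = {
    "user_age": "identity(user.age)",
    "legacy_click_count": "count(log.click)",
}
corrected, unresolved = apply_modifications(
    config,
    first_features={"legacy_click_count"},
    second_features={"item_freshness", "item_quality"},
    rule_lookup={"item_freshness": "age_of(item.publish_time)"},
)
print(corrected)
print(unresolved)
```

Returning a copy rather than editing in place keeps the original file contents available for review before the corrected configuration goes online.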
In a possible implementation manner of the present disclosure, the second feature that does not exist in the feature configuration information file and exists in the feature dictionary may be a feature that cannot be extracted from the feature configuration information file and exists in the feature dictionary.
In this embodiment, the features in the feature dictionary are features that experiments have shown to perform well after updating, so features that exist in the feature dictionary but not in the feature configuration information file are retained. Because the feature dictionary is maintained offline, the features in the feature configuration information file may not have been updated along with it, so features that exist in the feature configuration information file but not in the feature dictionary may be outdated. Deleting these features from the feature configuration information file ensures the validity of the input data of the recommendation information estimation model, so that feature changes do not impair the output results of the model.
In one embodiment, as shown in fig. 2, the file processing method further includes:
Step S21: determining the first feature and/or the second feature according to a first tree diagram of the feature configuration information file and a second tree diagram of the feature dictionary.
In this embodiment, determining the first feature and/or the second feature according to the first tree diagram and the second tree diagram may include comparing the first tree diagram of the feature configuration information file with the second tree diagram of the feature dictionary, determining the information that differs between the feature configuration information file and the feature dictionary, and determining the first feature and/or the second feature according to the difference information.
In this embodiment, the feature configuration information file and the feature dictionary may each be converted into a tree diagram. The feature configuration information file is generated according to the feature extraction operation. The feature extraction process may correspond to an independent code library, and the online architecture of the ranking model specifies the features to be extracted from the feature storage pool by writing feature configurations in a fixed format (e.g., feature name-extraction function-dependency field). Meanwhile, the online architecture maintains a dictionary of the features required online, which serves as the reference input features of the online recommendation information estimation model.
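For illustration only, a fixed-format feature configuration of the kind mentioned above might be parsed as follows. The disclosure names only the three parts (feature name, extraction function, dependency field); the concrete syntax assumed here (whitespace-separated columns, comma-separated dependency fields, `#` comments) and all example names are hypothetical.

```python
from typing import NamedTuple


class FeatureRule(NamedTuple):
    name: str          # feature name
    extractor: str     # name of the extraction function
    depends_on: tuple  # source sample fields the feature depends on


def parse_feature_config(text):
    """Parse 'name extractor field[,field...]' lines into FeatureRule entries."""
    rules = {}
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and padding
        if not line:
            continue
        parts = line.split()
        if len(parts) != 3:
            raise ValueError(f"line {lineno}: expected 3 columns, got {len(parts)}")
        name, extractor, fields = parts
        rules[name] = FeatureRule(name, extractor, tuple(fields.split(",")))
    return rules


# Hypothetical configuration text, for illustration only.
SAMPLE = """
# name            extractor   dependency fields
user_age          identity    user.profile.age
item_freshness    age_of      item.publish_time,request.time
"""
rules = parse_feature_config(SAMPLE)
print(rules["item_freshness"].depends_on)
```

Keeping the dependency fields alongside each feature is what later allows each feature to be attached to its source sample information in a tree diagram.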
In this embodiment, the difference features in the feature configuration information file and the feature dictionary can be rapidly and accurately determined through the tree graph, so that the correction efficiency of the configuration information file is improved.
In one embodiment, determining the first feature and/or the second feature from the first tree view of the feature configuration information file and the second tree view of the feature dictionary includes:
generating a first tree diagram according to the feature configuration information file, wherein ancestor nodes of the first tree diagram include feature information in the feature configuration information file, and each leaf node of the first tree diagram is source sample information corresponding to the feature information of its ancestor node;
generating a second tree diagram according to the feature dictionary, wherein ancestor nodes of the second tree diagram include feature information in the feature dictionary, and each leaf node of the second tree diagram is source sample information corresponding to the feature information of its ancestor node;
comparing the first tree diagram with the second tree diagram to obtain difference nodes;
and determining the first feature and/or the second feature according to the difference nodes.
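The steps above may be sketched as follows, as one possible and non-limiting realization. Each tree is assumed to have a root, one ancestor node per feature, and one leaf per source sample field; the class `Node` and the functions `build_tree` and `diff_nodes` are hypothetical names, and the disclosure does not prescribe this particular comparison procedure.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str
    children: dict = field(default_factory=dict)

    def child(self, label):
        """Return the child with this label, creating it if necessary."""
        return self.children.setdefault(label, Node(label))


def build_tree(features):
    """features maps a feature name to the source sample fields it depends on."""
    root = Node("root")
    for name, source_fields in features.items():
        ancestor = root.child(name)          # ancestor node: the feature
        for source_field in source_fields:
            ancestor.child(source_field)     # leaf node: source sample info
    return root


def diff_nodes(first, second, path=()):
    """Yield (path, side) for every node that exists in only one tree."""
    for label in sorted(set(first.children) | set(second.children)):
        here = path + (label,)
        a, b = first.children.get(label), second.children.get(label)
        if a is None:
            yield here, "second_only"
        elif b is None:
            yield here, "first_only"
        else:
            yield from diff_nodes(a, b, here)


# Hypothetical features and fields, for illustration only.
first_tree = build_tree({
    "user_age": ["user.profile.age"],
    "legacy_click_count": ["log.click"],
})
second_tree = build_tree({
    "user_age": ["user.profile.age"],
    "item_freshness": ["item.publish_time"],
})
diffs = list(diff_nodes(first_tree, second_tree))
first_features = [p[0] for p, side in diffs if len(p) == 1 and side == "first_only"]
second_features = [p[0] for p, side in diffs if len(p) == 1 and side == "second_only"]
print(first_features, second_features)
```

In this sketch a difference node at depth one corresponds to a first or second feature. A difference at leaf depth (the same feature depending on different source sample information in the two trees) is also reported, but is not classified here.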
In particular implementations, the source sample information may be information extracted from multiple data sources. For example, as shown in FIG. 4, one piece of information provided by every 10000 data sources is extracted as source sample information 41, and information provided by a plurality of data sources may be spliced together to serve as source sample information 41. A plurality of input features 42 are selected from a plurality of pieces of source sample information 41 through a feature configuration information file 44, the plurality of input features 42 are input into the recommendation information estimation model 43 of the corresponding target service to obtain an output result, and recommendation information is determined according to the output result.
In this embodiment, the source sample information and the feature information may exist in the form of codes in the tree graph, and each feature may correspond to an ancestor node.
In one possible implementation, ancestor nodes of a leaf node of the tree graph may be features related to source sample information of the leaf node, such as features obtained from the source sample information. The source sample information may exist in the form of codes in a tree graph.
The source sample information may include user portraits (static data), user operation content behaviors (dynamic data), content data, etc., and may be part or all of the data extracted from the user's weblog.
In one possible implementation, each difference node may correspond to one of the first features or one of the second features.
In this embodiment, as shown in FIG. 5, a plurality of leaf nodes 51 may correspond to one ancestor node 52. Other nodes may exist between the root node and the leaf node.
Feature extraction plays a key role in the recommendation information estimation model. Source sample information spliced from different data sources is extracted, through feature configuration and specific feature extraction methods, into fields suitable for model training; after feature cleaning and conversion into a specific encoding, the features are input into the model for training and estimation to produce the final desired result. The extraction of invalid and low-quality features therefore directly affects the final feature results and, in turn, model training and the estimation effect. It also increases the time spent on feature extraction and prolongs the overall time the recommendation information estimation model takes to perform estimation.
In this embodiment, by comparing the nodes of the first tree diagram and the second tree diagram, difference nodes can be obtained and the first feature and/or the second feature determined, so that invalid or outdated features can be identified, and the feature configuration information file modified, more efficiently.
In one embodiment, acquiring a feature configuration information file and a feature dictionary corresponding to a target service that receives recommendation information includes:
obtaining the feature dictionary according to an offline training result of the recommendation information estimation model of the target service;
and generating the feature configuration information file according to information extracted from the feature configuration storage pool.
In this embodiment, the offline training result of the recommendation information estimation model of the target service may be obtained as follows: in an offline state, input data of the recommendation information estimation model is determined according to selected feature information, the actual effect of the input data is determined according to the output result corresponding to the input data, and whether to incorporate the selected feature information into the feature dictionary is decided according to that actual effect.
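A minimal sketch of such an offline selection is given below. It assumes a greedy procedure, which the disclosure does not specify: each candidate feature is kept only if adding it improves an offline metric by more than `min_gain`. The function `evaluate`, the threshold, and the toy metric are hypothetical stand-ins for an offline training and evaluation run.

```python
def build_feature_dictionary(candidates, evaluate, base=(), min_gain=0.0):
    """Greedily keep candidates whose addition improves the offline metric.

    evaluate(features) is assumed to train and evaluate the estimation model
    offline with the given input features and return a score (higher is better).
    """
    selected = list(base)
    best = evaluate(selected)
    for feature in candidates:
        score = evaluate(selected + [feature])
        if score - best > min_gain:
            selected.append(feature)
            best = score
    return selected


# Toy stand-in for an offline experiment: each feature adds a fixed gain.
GAINS = {"item_freshness": 0.02, "legacy_click_count": 0.0, "user_age": 0.01}


def toy_evaluate(features):
    return 0.70 + sum(GAINS[name] for name in features)


dictionary = build_feature_dictionary(
    candidates=["item_freshness", "legacy_click_count", "user_age"],
    evaluate=toy_evaluate,
    min_gain=0.005,
)
print(dictionary)  # ['item_freshness', 'user_age']
```

In practice the actual effect may be measured in other ways, for example from user feedback such as browsing duration; the greedy loop only illustrates the decision of whether to incorporate a feature.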
In this embodiment, a plurality of features for each different target service may be stored in the feature configuration storage pool, and when the feature configuration information file is generated, the plurality of features may be selected from the feature configuration storage pool according to the instruction information to form the feature configuration information file.
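Selecting features from the storage pool to form a configuration file might look like the sketch below. The pool layout, the JSON output format, the service name `news_feed`, and the function `build_config` are all assumptions made for illustration; the disclosure does not specify a file format.

```python
import json
from typing import Dict, List

# Hypothetical storage pool: service name -> feature number -> feature definition.
FEATURE_POOL: Dict[str, Dict[int, dict]] = {
    "news_feed": {
        101: {"name": "user_age", "fields": ["user.age"]},
        102: {"name": "item_ctr", "fields": ["item.clicks", "item.shows"]},
        103: {"name": "item_tag", "fields": ["item.tag"]},
    }
}


def build_config(service: str, feature_numbers: List[int]) -> str:
    """Pick the requested features from the pool and serialize them as a config file."""
    pool = FEATURE_POOL[service]
    missing = [n for n in feature_numbers if n not in pool]
    if missing:
        raise KeyError(f"features not in pool for {service}: {missing}")
    selected = [{"feature_number": n, **pool[n]} for n in feature_numbers]
    return json.dumps({"service": service, "features": selected}, indent=2)


# The instruction information is modeled here as a plain list of feature numbers.
config_text = build_config("news_feed", [101, 103])
print(config_text)
```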
In this embodiment, maintaining the feature dictionary and the feature configuration information file can effectively improve the degree to which the recommendation information produced by the recommendation information estimation model matches user requirements.
The embodiment of the disclosure further provides a recommendation information generation method, as shown in fig. 3, including:
step S31: generating input data of a recommendation information estimation model of the target service according to a feature configuration information file corresponding to the target service receiving the recommendation information; the feature configuration information file is a modified feature configuration information file provided by any one embodiment of the disclosure;
step S32: and inputting the input data into a recommendation information estimation model to obtain recommendation information of the target service.
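Steps S31 and S32 can be illustrated with a toy pipeline. This is a sketch only: the configuration layout, the candidate data, and the weighted-sum `toy_model` standing in for the recommendation information estimation model are assumptions, not the model of the disclosure.

```python
from typing import Callable, Dict, List

Sample = Dict[str, float]


def generate_input(sample: Sample, config: List[dict]) -> List[float]:
    """Step S31: extract one value per configured feature from the source sample."""
    return [float(sample.get(entry["field"], 0.0)) for entry in config]


def recommend(candidates: Dict[str, Sample], config: List[dict],
              model: Callable[[List[float]], float]) -> List[str]:
    """Step S32: score every candidate with the model and rank by score."""
    scores = {item: model(generate_input(sample, config))
              for item, sample in candidates.items()}
    return sorted(scores, key=lambda item: scores[item], reverse=True)


# Assumed: a configuration file that has already been modified (abnormal features fixed).
modified_config = [
    {"feature_number": 101, "field": "ctr"},
    {"feature_number": 103, "field": "freshness"},
]


def toy_model(x: List[float]) -> float:
    """Stand-in estimation model: a fixed weighted sum of the two inputs."""
    return 0.7 * x[0] + 0.3 * x[1]


candidates = {
    "article_a": {"ctr": 0.2, "freshness": 0.9},
    "article_b": {"ctr": 0.6, "freshness": 0.1},
}
ranking = recommend(candidates, modified_config, toy_model)
print(ranking)
```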
In this embodiment, the feature configuration information file can be modified according to the feature dictionary and the feature configuration information file, and recommendation information of the target service is generated by using the modified feature configuration information file, so that outdated features and/or invalid features in the feature configuration information file are modified, thereby improving the degree of coincidence between input data of the recommendation information estimation model and current habits of users, and being beneficial to providing recommendation information more in line with the will of the users.
In addition, because the recommendation system is very large, has a great many fields, and is subject to conditions such as field upgrades and low coverage, a sample field on which a feature depends may be considered a low-quality field if its coverage is extremely low or its values are extremely similar; a feature corresponding to a low-quality field may carry less information than an ordinary feature.
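The two signals named above, coverage and value similarity, can be approximated as in the following sketch. The thresholds (`min_coverage`, `max_dominant_share`) and the use of the most frequent value's share as a similarity measure are assumptions chosen for illustration; the disclosure does not give concrete values or formulas.

```python
from collections import Counter
from typing import Any, Dict, List


def is_low_quality(samples: List[Dict[str, Any]], field: str,
                   min_coverage: float = 0.05,
                   max_dominant_share: float = 0.99) -> bool:
    """Flag a sample field whose coverage is very low or whose values are nearly identical."""
    values = [s[field] for s in samples if s.get(field) is not None]
    coverage = len(values) / len(samples) if samples else 0.0
    if not values or coverage < min_coverage:
        return True  # the field is (almost) never filled in
    # Share of the single most common value among the filled-in values.
    dominant_share = Counter(values).most_common(1)[0][1] / len(values)
    return dominant_share > max_dominant_share


# Assumed sample data: "lang" is constant, "age" varies, "city" is rarely present.
samples: List[Dict[str, Any]] = [{"lang": "zh", "age": i % 50} for i in range(100)]
samples[0]["city"] = "bj"
samples[1]["city"] = "sh"

for name in ("city", "lang", "age"):
    print(name, is_low_quality(samples, name))
```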
As shown in fig. 6, in one example of the present disclosure, a feature dictionary 61 and a feature configuration information file 62 are maintained for a recommendation information estimation model 63 corresponding to a target service. The two configurations are compared using the feature number, which serves as a unique identifier, as the comparison key, proceeding from the root node down to the leaf nodes, so that invalid features and missing features can be effectively intercepted. The method does not require building a complete offline environment; it only needs to obtain the environment code and execute a diff operation locally. Direct features or indirectly dependent features that the model no longer uses can thus be found in a timely and effective manner, invalid/missing features can be found, and a run currently completes in only 30 seconds.
In one embodiment of the present disclosure, as shown in FIG. 7, an online log 72 of users is used as a data source, valid requests are extracted from the online log to obtain online real requests 73, and an offline simulation environment 74 is created from the online real requests 73. In the local offline simulation environment 74, an offline-configured feature dictionary 75 is obtained. Source sample information 71 is obtained from a data source, which may include the online log; feature extraction is performed according to the source sample information 71 and the feature configuration information file, and the extracted input features are input into the recommendation information estimation model for estimation. The results are analyzed according to the source sample information 71, the feature dictionary 75, and the features extracted based on the feature configuration information file to obtain abnormal input features, so that the feature configuration information file is changed and maintained in time and low-quality features (slots) are intercepted. After the complete test environment is built, the whole online request process is simulated by replaying online requests, and the original samples and the feature extraction results can be printed to a log for subsequent analysis. For invalid features that exist in the feature dictionary 75 but not in the input features, the cause of the invalidity can be determined and the feature configuration information file corrected accordingly, for example by adding features or changing feature extraction functions. Causes of invalidity may include the sample lacking a field required for feature extraction, very few requests in the sample satisfying the extraction conditions, and so on.
If an intercepted feature has little influence on the model, the feature can be deleted directly; if an intercepted feature is not explained by a low share of matching samples, its source can be traced further to determine whether the cause is a field missing from the upstream data.
In another example of the present disclosure, the feature dictionary may be maintained by an online feature management platform. In general, the model fields of an information recommendation system number in the tens of thousands, come from different sources, and are processed and used in different ways; with iterative updating, a large number of obsolete fields now exist. To improve the quality of online features, the embodiments of the present disclosure can be applied to a feature management platform to maintain the features in it. The feature management platform stores information such as the source of each online field, the feature null rate, and whether the feature has been taken offline. The quality of a field is judged by obtaining its offline information, and fields that are no longer maintained are uniformly replaced or taken offline.
Because there are many models, running effect regression becomes time-consuming and laborious and cannot meet the requirement of rapid iteration; low-quality fields are therefore intercepted mainly during iteration, which promotes updating and deleting the fields used by features. As shown in fig. 8, taking one field offline may need to go through at least the following stages: field arrangement (confirming the fields to take offline) 81, offline preparation (multi-model offline regression, multi-party evaluation, marking as pre-offline on the feature management platform) 82, offline evaluation (input feature iteration interception, model consistency evaluation) 83, online evaluation (formally taking the field offline) 84, and subsequent maintenance (marking as pre-offline on the feature management platform, input feature iteration interception) 85. A complete low-quality field interception tool can reduce the working cost of the offline evaluation and subsequent maintenance stages, forming a complete tool chain for the whole offline flow.
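The two causes of invalidity mentioned above (a required field missing from the samples, or very few requests satisfying the extraction condition) could be distinguished roughly as follows. The function `invalid_reason`, the `min_share` threshold, the reason labels, and the sample data are hypothetical and only illustrate the kind of analysis that can be run over logged samples.

```python
from typing import Any, Callable, Dict, List

Sample = Dict[str, Any]


def invalid_reason(samples: List[Sample], required_fields: List[str],
                   condition: Callable[[Sample], bool],
                   min_share: float = 0.01) -> str:
    """Classify why a dictionary feature never appears among the extracted input features."""
    with_fields = [s for s in samples if all(f in s for f in required_fields)]
    if not with_fields:
        return "missing_field"  # no sample carries the fields the extractor needs
    matching = sum(1 for s in with_fields if condition(s))
    if matching / len(samples) < min_share:
        return "rare_condition"  # too few requests satisfy the extraction condition
    return "unknown"  # neither cause applies; needs manual analysis


# Assumed replayed samples: almost every request comes from the "feed" scene.
samples: List[Sample] = [{"uid": i, "scene": "feed"} for i in range(200)]
samples[0]["scene"] = "video"

reason_device = invalid_reason(samples, ["device"], lambda s: True)
reason_video = invalid_reason(samples, ["scene"], lambda s: s["scene"] == "video")
print(reason_device, reason_video)
```

A "missing_field" result points towards the upstream data, whereas "rare_condition" suggests revisiting the feature or its extraction function.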
In this way, by analyzing the feature configuration of a given model, the leaf nodes on which a feature depends, namely the source sample information or fields, can be obtained. The feature management platform is then queried for the offline information of each field to judge whether the field is a low-quality field; if so, it can be intercepted directly, which completes the interception of low-quality fields for the iterated model.
As shown in fig. 9, for the sake of effectiveness, and to ensure that local interception is consistent with the notification effect of monitoring-based interception, the verification of invalid and low-quality features can be integrated into the model iteration test stage: the code corresponding to the model iteration test is downloaded (in preparation for local verification), a configuration check is performed on the downloaded code (intercepting invalid features and low-quality features), local simulation is performed according to the configuration check result (correcting missing features), and an interception notification is then issued. Accurate pushing is achieved through a visual interface, and the result is pushed to developers in the form of a report.
In a specific example, practical verification shows that a plurality of features that exist in the feature dictionary but not in the configuration information file can be effectively intercepted, missing features that do not exist in the feature configuration file but exist in the feature dictionary can be corrected, and low-quality features can be deleted.
The embodiments of the present disclosure can be performed locally without relying on online log statistics or real-time user requests. In a real recommendation system, the time to effective interception can be shortened from the hour level of log collection and alarm analysis to the minute level. With the file processing method provided by the embodiments of the present disclosure, feature verification can be completed during the testing of a feature iteration, and the verification result is sent to developers in a visual report, so that feature anomalies are detected and intercepted in real time and online problems caused by invalid and low-quality features become less frequent. The file processing method can also be integrated with the feature management platform, which makes the whole scheme more flexible: it can intercept not only features that are currently low quality but also features that are about to become low quality. For example, when a module of a data stream needs to take a field offline, the relevant field can be marked on the feature management platform and then intercepted in advance.
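The flow just described (resolve each feature to its leaf fields, then check those fields against the platform's offline information) can be sketched as below. The dependency map, the set of offline fields, and the function names are assumptions; a real feature management platform would be queried through its own interface, which the disclosure does not specify.

```python
from typing import Dict, List, Set


def leaf_fields(name: str, depends_on: Dict[str, List[str]]) -> Set[str]:
    """Follow direct and indirect dependencies of a feature down to its source fields."""
    leaves: Set[str] = set()
    seen: Set[str] = set()
    stack = [name]
    while stack:
        node = stack.pop()
        if node in seen:
            continue  # guards against cyclic dependency declarations
        seen.add(node)
        children = depends_on.get(node, [])
        if children:
            stack.extend(children)
        else:
            leaves.add(node)  # no further dependencies: a source sample field
    return leaves


def intercept(features: List[str], depends_on: Dict[str, List[str]],
              offline_fields: Set[str]) -> Dict[str, List[str]]:
    """Return each feature that depends on a field marked offline, with the offending fields."""
    hits: Dict[str, List[str]] = {}
    for feature in features:
        bad = sorted(leaf_fields(feature, depends_on) & offline_fields)
        if bad:
            hits[feature] = bad
    return hits


# Assumed feature configuration: f201 depends indirectly on user.age_v1 through f101.
depends_on = {
    "f201": ["f101", "item.tag"],
    "f101": ["user.age_v1"],
    "f102": ["item.clicks"],
}
# Assumed platform data: fields marked offline or pre-offline.
platform_offline = {"user.age_v1"}

report = intercept(["f101", "f102", "f201"], depends_on, platform_offline)
print(report)
```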
The embodiment of the disclosure further provides a file processing device, as shown in fig. 10, including:
an obtaining module 101, configured to obtain a feature configuration information file and a feature dictionary corresponding to a target service that receives recommendation information; the feature configuration information file is used for generating input data of a recommendation information pre-estimation model of the target service;
The abnormal input feature module 102 is configured to determine abnormal input features of the recommendation information estimation model according to the feature configuration information file and the feature dictionary;
and the modifying module 103 is used for modifying the feature configuration information file according to the abnormal input feature.
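The three modules can be pictured as three small functions chained together. The sketch below assumes both inputs are already loaded as dicts keyed by feature number, and it applies one possible modification policy (drop features absent from the dictionary, add features absent from the configuration); the disclosure leaves the concrete modification open, so this policy and all names here are illustrative only.

```python
from typing import Dict, Set, Tuple

Features = Dict[int, dict]  # feature number -> definition (hypothetical layout)


def find_abnormal(config: Features, dictionary: Features) -> Tuple[Set[int], Set[int]]:
    """Determine the first features (config only) and second features (dictionary only)."""
    first = set(config) - set(dictionary)
    second = set(dictionary) - set(config)
    return first, second


def modify_config(config: Features, dictionary: Features) -> Features:
    """One possible policy: remove first features, copy second features from the dictionary."""
    first, second = find_abnormal(config, dictionary)
    fixed = {number: definition for number, definition in config.items()
             if number not in first}
    for number in second:
        fixed[number] = dictionary[number]
    return fixed


# Assumed inputs, as the obtaining module might supply them.
config = {101: {"name": "user_age"}, 102: {"name": "old_ctr"}}
dictionary = {101: {"name": "user_age"}, 103: {"name": "item_tag"}}

first, second = find_abnormal(config, dictionary)
fixed = modify_config(config, dictionary)
print(sorted(first), sorted(second), sorted(fixed))
```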
In one embodiment, as shown in fig. 11, the abnormal input feature module includes,
A first feature unit 111 for determining a first feature that exists in the feature configuration information file and does not exist in the feature dictionary; and/or,
A second feature unit 112 for determining a second feature that does not exist in the feature configuration information file and that exists in the feature dictionary;
the feature processing unit 113 is configured to take the first feature and the second feature as abnormal input features.
In one embodiment, as shown in fig. 12, the file processing apparatus further includes:
The graphic tree module 121 is configured to determine the first feature and/or the second feature according to the first tree diagram of the feature configuration information file and the second tree diagram of the feature dictionary.
In one embodiment, as shown in FIG. 13, the graphic tree module includes:
A first graphic tree unit 131, configured to generate a first tree graph according to the feature configuration information file, where ancestor nodes of the first tree graph include feature information in the feature configuration information file, and each leaf node of the first tree graph is source sample information corresponding to the feature information of the ancestor node;
A second graphic tree unit 132, configured to generate a second tree graph according to the feature dictionary, where ancestor nodes of the second tree graph include feature information in the feature dictionary, and each leaf node of the second tree graph is source sample information corresponding to the feature information of the ancestor node;
A difference node unit 133, configured to compare the first tree diagram and the second tree diagram to obtain a difference node;
the difference node processing unit 134 is configured to determine the first feature and/or the second feature based on the difference node.
In one embodiment, as shown in fig. 14, the acquisition module includes:
a dictionary unit 141, configured to obtain the feature dictionary according to an offline training result of the recommendation information estimation model of the target service;
And a configuration information file unit 142, configured to generate a feature configuration information file according to the information extracted from the feature configuration storage pool.
The embodiment of the disclosure also provides a recommendation information generation device, as shown in fig. 15, including:
An input data module 151, configured to generate input data of a recommendation information estimation model of the target service according to a feature configuration information file corresponding to the target service that receives the recommendation information; the feature configuration information file is a modified feature configuration information file provided by any one embodiment of the disclosure;
The generating module 152 is configured to input the input data into a recommendation information prediction model, and obtain recommendation information of the target service.
The embodiment of the disclosure can be applied to the technical field of computers, in particular to the technical fields of information flow, deep learning and the like.
The functions of each unit, module or sub-module in each apparatus of the embodiments of the present disclosure may be referred to the corresponding descriptions in the above method embodiments, which are not repeated herein.
According to embodiments of the present disclosure, the present disclosure also provides an electronic device, a readable storage medium and a computer program product.
Fig. 16 shows a schematic block diagram of an example electronic device 160 that may be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 16, the electronic device 160 includes a computing unit 161 that can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 162 or a computer program loaded from a storage unit 168 into a Random Access Memory (RAM) 163. In the RAM163, various programs and data required for the operation of the electronic device 160 may also be stored. The computing unit 161, the ROM162, and the RAM163 are connected to each other by a bus 164. An input output (I/O) interface 165 is also connected to bus 164.
Various components in electronic device 160 are connected to I/O interface 165, including: an input unit 166 such as a keyboard, mouse, etc.; an output unit 167 such as various types of displays, speakers, and the like; a storage unit 168 such as a magnetic disk, optical disk, etc.; and a communication unit 169 such as a network card, a modem, a wireless communication transceiver, or the like. The communication unit 169 allows the electronic device 160 to exchange information/data with other devices via a computer network, such as the internet, and/or various telecommunications networks.
The computing unit 161 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of computing unit 161 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, etc. The computing unit 161 performs the respective methods and processes described above, such as a file processing method. For example, in some embodiments, the file processing method may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as storage unit 168. In some embodiments, some or all of the computer program may be loaded and/or installed onto electronic device 160 via ROM162 and/or communication unit 169. When the computer program is loaded into the RAM163 and executed by the computing unit 161, one or more steps of the file processing method described above may be performed. Alternatively, in other embodiments, the computing unit 161 may be configured to perform the file processing method in any other suitable way (e.g. by means of firmware).
Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuit systems, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems On Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be a special purpose or general-purpose programmable processor, that can receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for carrying out methods of the present disclosure may be written in any combination of one or more programming languages. Such program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowchart and/or block diagram to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine, or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include Local Area Networks (LANs), Wide Area Networks (WANs), and the internet.
The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
It should be appreciated that steps may be reordered, added, or deleted using the various forms of flows shown above. For example, the steps recited in the present disclosure may be performed in parallel, sequentially, or in a different order, provided that the desired results of the disclosed technical solutions are achieved; no limitation is imposed herein.
The above detailed description should not be taken as limiting the scope of the present disclosure. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present disclosure are intended to be included within the scope of the present disclosure.

Claims (12)

1. A file processing method, comprising:
acquiring a feature configuration information file and a feature dictionary corresponding to a target service for receiving recommendation information; the feature configuration information file is used for generating input data of a recommendation information pre-estimation model of the target service;
Determining abnormal input characteristics of the recommended information estimation model according to the characteristic configuration information file and the characteristic dictionary;
Modifying the feature configuration information file according to the abnormal input feature;
determining abnormal input characteristics of the recommended information estimation model according to the characteristic configuration information file and the characteristic dictionary, wherein the abnormal input characteristics comprise,
Determining a first feature that is present in the feature configuration information file and that is not present in a feature dictionary; and/or,
Determining a second feature that does not exist in the feature configuration information file and that exists in a feature dictionary;
and taking the first characteristic and the second characteristic as the abnormal input characteristic.
2. The method of claim 1, further comprising:
And determining the first feature and/or the second feature according to the first tree diagram of the feature configuration information file and the second tree diagram of the feature dictionary.
3. The method of claim 2, wherein the determining the first feature and/or the second feature from the first tree view of the feature configuration information file and the second tree view of the feature dictionary comprises:
Generating a first tree graph according to the characteristic configuration information file, wherein ancestor nodes of the first tree graph comprise characteristic information in the characteristic configuration information file, and each leaf node of the first tree graph is source sample information corresponding to the characteristic information of the ancestor node;
Generating a second tree diagram according to the feature dictionary, wherein ancestor nodes of the second tree diagram comprise feature information in the feature dictionary, and each leaf node of the second tree diagram is source sample information corresponding to the feature information of the ancestor node;
Comparing the first tree diagram with the second tree diagram to obtain a difference node;
The first feature and/or the second feature is determined based on the difference node.
4. The method of claim 1, wherein the obtaining the feature configuration information file and the feature dictionary corresponding to the target service that receives the recommendation information comprises:
Obtaining the feature dictionary according to the offline recommendation information pre-estimation model training result of the target service;
and generating the feature configuration information file according to the information extracted from the feature configuration storage pool.
5. A recommendation information generation method, comprising:
Generating input data of a recommendation information estimation model of the target service according to a feature configuration information file corresponding to the target service receiving recommendation information; the feature configuration information file is the modified feature configuration information file according to any one of claims 1-4;
And inputting the input data into a recommendation information pre-estimation model to obtain recommendation information of the target service.
6. A file processing apparatus comprising:
the acquisition module is used for acquiring a feature configuration information file and a feature dictionary corresponding to the target service for receiving the recommendation information; the feature configuration information file is used for generating input data of a recommendation information pre-estimation model of the target service;
The abnormal input feature module is used for determining abnormal input features of the recommended information estimation model according to the feature configuration information file and the feature dictionary;
the modification module is used for modifying the feature configuration information file according to the abnormal input features;
The abnormal input feature module includes,
A first feature unit configured to determine a first feature that exists in the feature configuration information file and does not exist in a feature dictionary; and/or,
A second feature unit configured to determine a second feature that does not exist in the feature configuration information file and that exists in a feature dictionary;
And the feature processing unit is used for taking the first feature and the second feature as the abnormal input feature.
7. The apparatus of claim 6, further comprising:
And the graphic tree module is used for determining the first feature and/or the second feature according to the first tree diagram of the feature configuration information file and the second tree diagram of the feature dictionary.
8. The apparatus of claim 7, wherein the graphics tree module comprises:
the first graphic tree unit is used for generating a first tree diagram according to the characteristic configuration information file, ancestor nodes of the first tree diagram comprise characteristic information in the characteristic configuration information file, and each leaf node of the first tree diagram is source sample information corresponding to the characteristic information of the ancestor node;
A second graph tree unit, configured to generate a second tree graph according to the feature dictionary, where ancestor nodes of the second tree graph include feature information in the feature dictionary, and each leaf node of the second tree graph is source sample information corresponding to the feature information of the ancestor node;
the difference node unit is used for comparing the first tree diagram with the second tree diagram to obtain a difference node;
and the difference node processing unit is used for determining the first characteristic and/or the second characteristic based on the difference node.
9. The apparatus of claim 6, wherein the acquisition module comprises:
The dictionary unit is used for obtaining the feature dictionary according to the offline training result of the recommendation information estimation model of the target service;
And the configuration information file unit is used for generating the characteristic configuration information file according to the information extracted from the characteristic configuration storage pool.
10. A recommendation information generating device comprising:
The input data module is used for generating input data of a recommendation information estimation model of the target service according to a feature configuration information file corresponding to the target service for receiving the recommendation information; the feature configuration information file is a modified feature configuration information file according to any one of claims 6-9;
And the generation module is used for inputting the input data into a recommendation information estimation model to obtain the recommendation information of the target service.
11. An electronic device, comprising:
At least one processor; and
A memory communicatively coupled to the at least one processor; wherein,
The memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-5.
12. A non-transitory computer readable storage medium storing computer instructions for causing a computer to perform the method of any one of claims 1-5.
CN202110419249.6A 2021-04-19 2021-04-19 File processing and recommendation information generation methods, devices, equipment and storage medium Active CN113051479B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110419249.6A CN113051479B (en) 2021-04-19 2021-04-19 File processing and recommendation information generation methods, devices, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110419249.6A CN113051479B (en) 2021-04-19 2021-04-19 File processing and recommendation information generation methods, devices, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN113051479A CN113051479A (en) 2021-06-29
CN113051479B true CN113051479B (en) 2024-04-26

Family

ID=76520670

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110419249.6A Active CN113051479B (en) 2021-04-19 2021-04-19 File processing and recommendation information generation methods, devices, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113051479B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113836291B (en) * 2021-09-29 2023-08-15 北京百度网讯科技有限公司 Data processing method, device, equipment and storage medium

Citations (3)

Publication number Priority date Publication date Assignee Title
CN109493940A (en) * 2018-11-06 2019-03-19 大国创新智能科技(东莞)有限公司 Prescription personalized recommendation method and system based on deep learning and knowledge base
CN110674406A (en) * 2019-09-29 2020-01-10 百度在线网络技术(北京)有限公司 Recommendation method and device, electronic equipment and storage medium
CN111966908A (en) * 2020-08-25 2020-11-20 贝壳技术有限公司 Recommendation system and method, electronic device, and computer-readable storage medium

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
US10936974B2 (en) * 2018-12-24 2021-03-02 Icertis, Inc. Automated training and selection of models for document analysis

Non-Patent Citations (1)

Title
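Improvement of a personalized information assistant based on recommendation technology; He Jinjin; Guo Zhenbo; Zhang Yu; Industrial Control Computer (No. 01); pp. 119-121 *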

Also Published As

Publication number Publication date
CN113051479A (en) 2021-06-29

Similar Documents

Publication Publication Date Title
CN107301170B (en) Method and device for segmenting sentences based on artificial intelligence
US11809505B2 (en) Method for pushing information, electronic device
CN110858172A (en) Automatic test code generation method and device
CN110555205A (en) negative semantic recognition method and device, electronic equipment and storage medium
CN114579104A (en) Data analysis scene generation method, device, equipment and storage medium
CN113051479B (en) File processing and recommendation information generation methods, devices, equipment and storage medium
CN114037059A (en) Pre-training model, model generation method, data processing method and data processing device
CN110807097A (en) Method and device for analyzing data
CN116720489B (en) Page filling method and device, electronic equipment and computer readable storage medium
CN113204695A (en) Website identification method and device
CN116304236A (en) User portrait generation method and device, electronic equipment and storage medium
CN115687717A (en) Method, device and equipment for acquiring hook expression and computer readable storage medium
CN113127357B (en) Unit test method, apparatus, device, storage medium, and program product
CN115186738A (en) Model training method, device and storage medium
CN115344786A (en) Cloud resource recommendation system, method, equipment and storage medium
CN114881521A (en) Service evaluation method, device, electronic equipment and storage medium
CN114117248A (en) Data processing method and device and electronic equipment
CN113076254A (en) Test case set generation method and device
CN113485763A (en) Data processing method and device, electronic equipment and computer readable medium
CN113052325A (en) Method, device, equipment, storage medium and program product for optimizing online model
CN111754062B (en) Method and device for establishing article quality detection model
CN115062154A (en) Method and device for training pre-training model, text classification and system operation and maintenance
CN114547451A (en) Model information analysis method and device, electronic equipment and computer storage medium
CN117453988A (en) Product recommendation method and device
CN117313670A (en) Document generation method, device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant