CN114493844A

CN114493844A - Information processing method and device and electronic equipment

Info

Publication number: CN114493844A
Application number: CN202210103750.6A
Authority: CN
Inventors: 徐世界; 刘昊骋; 许韩晨玺; 田建; 徐靖宇; 王天祺
Original assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Current assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Priority date: 2022-01-27
Filing date: 2022-01-27
Publication date: 2022-05-13

Abstract

The disclosure provides an information processing method and apparatus and an electronic device, and relates to the technical field of data processing, in particular to the technical fields of big data and deep learning. The specific implementation scheme is as follows: in response to a risk assessment request of a target account, original data associated with the target account is obtained from at least one data source, and configuration information is queried, wherein the configuration information indicates a target feature extraction dimension corresponding to the at least one data source; according to the configuration information, feature extraction corresponding to the target feature extraction dimension is performed on the original data of the at least one data source to obtain at least one dimension feature of the target account; and a risk assessment strategy indicated by the risk assessment request is then invoked to process the at least one dimension feature of the target account to obtain risk information of the target account. In this way, risk control information processing can be easily reused under different risk control requirements, repeated development is avoided, and the efficiency of risk assessment is improved.

Description

Information processing method and device and electronic equipment

Technical Field

The present disclosure relates to the field of data processing technologies, in particular to the fields of big data and deep learning, and more particularly to an information processing method and apparatus and an electronic device.

Background

Credit risk arises in the loan transactions of a commercial bank and mainly refers to the uncertainty of whether a borrower will fulfill the contract and repay the loan. Because credit business is the core business of a commercial bank, and credit assets account for the dominant share of its total assets, credit risk directly affects the normal operation of the bank. How to control credit risk effectively is therefore very important.

Disclosure of Invention

The disclosure provides an information processing method and apparatus, an information generating method and apparatus, an electronic device, a storage medium, and a computer program product.

According to an aspect of the present disclosure, there is provided an information processing method including:

in response to a risk assessment request of a target account, acquiring original data associated with the target account from at least one data source;

querying configuration information, wherein the configuration information is used for indicating a target feature extraction dimension corresponding to the at least one data source;

according to the configuration information, performing feature extraction corresponding to a target feature extraction dimension on the original data of the at least one data source to obtain at least one dimension feature of the target account;

and invoking a risk assessment strategy indicated by the risk assessment request to process the at least one dimension feature of the target account to obtain the risk information of the target account.

According to another aspect of the present disclosure, there is provided an information generating method including:

acquiring, from at least one data source, a time series of sample data and risk information with which the time series is labeled;

according to a plurality of candidate combinations of feature extraction dimensions, respectively performing feature extraction on the sample data in the time series to obtain feature time series corresponding to the candidate combinations;

dividing the feature time series corresponding to the candidate combinations by time to obtain a training set corresponding to the candidate combinations and a test set corresponding to the candidate combinations;

training a risk assessment model by adopting the training set of the candidate combinations, and testing the trained risk assessment model by adopting the test set corresponding to the candidate combinations to obtain performance data of the risk assessment model corresponding to the candidate combinations;

screening target combinations from the candidate combinations according to the performance data of the risk assessment models corresponding to the candidate combinations, and determining target feature extraction dimensions corresponding to the at least one data source according to the target combinations;

and generating configuration information according to the target feature extraction dimension corresponding to the at least one data source.
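The information generating steps above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: the extractor names and sample fields (`overdue_count`, `used`, `limit`, `query_count`), the nearest-centroid toy model, accuracy as the performance data, and the 75% time split are all hypothetical choices made only for the example.

```python
from itertools import combinations
from statistics import mean

# Hypothetical feature extraction dimensions: each maps a raw sample (a dict)
# to one number. Names and fields are illustrative assumptions.
EXTRACTORS = {
    "overdue_count": lambda s: float(s["overdue_count"]),
    "utilization": lambda s: s["used"] / s["limit"],
    "query_count": lambda s: float(s["query_count"]),
}

def extract(sample, combo):
    """Feature vector of one sample for one candidate combination."""
    return [EXTRACTORS[name](sample) for name in combo]

def train_centroids(rows, labels):
    """Toy stand-in for a risk assessment model: one mean vector per label."""
    model = {}
    for label in sorted(set(labels)):
        members = [row for row, lab in zip(rows, labels) if lab == label]
        model[label] = [mean(column) for column in zip(*members)]
    return model

def predict(model, row):
    """Return the label whose centroid is closest to the feature vector."""
    def squared_distance(center):
        return sum((a - b) ** 2 for a, b in zip(row, center))
    return min(model, key=lambda label: squared_distance(model[label]))

def select_target_combination(series, train_fraction=0.75):
    """series: list of (timestamp, sample, risk_label) tuples."""
    ordered = sorted(series, key=lambda item: item[0])   # order by time
    cut = int(len(ordered) * train_fraction)             # earlier part trains
    labels = [label for _, _, label in ordered]
    performance = {}
    names = sorted(EXTRACTORS)
    for size in range(1, len(names) + 1):
        for combo in combinations(names, size):          # candidate combination
            rows = [extract(sample, combo) for _, sample, _ in ordered]
            model = train_centroids(rows[:cut], labels[:cut])
            hits = [predict(model, row) == lab
                    for row, lab in zip(rows[cut:], labels[cut:])]
            performance[combo] = sum(hits) / len(hits)   # test-set accuracy
    best = max(performance, key=performance.get)         # screen the target
    config = {"target_dimensions": list(best)}           # configuration info
    return config, performance
```

Splitting by time rather than at random keeps later samples out of the training set, which is the reason the feature time series is divided according to time before training and testing.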

According to still another aspect of the present disclosure, there is provided an information processing apparatus including:

the first acquisition module is used for responding to a risk assessment request of a target account and acquiring original data associated with the target account from at least one data source;

the query module is used for querying configuration information, wherein the configuration information is used for indicating a target feature extraction dimension corresponding to the at least one data source;

the first feature extraction module is used for performing feature extraction corresponding to a target feature extraction dimension on the original data of the at least one data source according to the configuration information to obtain at least one dimension feature of the target account;

and the first processing module is used for calling a risk assessment strategy indicated by the risk assessment request and processing the at least one dimension characteristic of the target account to obtain the risk information of the target account.

According to still another aspect of the present disclosure, there is provided an information generating apparatus including:

the second acquisition module is used for acquiring, from at least one data source, a time series of sample data and risk information with which the time series is labeled;

the second feature extraction module is used for respectively performing feature extraction on the sample data in the time series according to a plurality of candidate combinations of feature extraction dimensions to obtain feature time series corresponding to the candidate combinations;

the dividing module is used for dividing the feature time series corresponding to the candidate combinations by time to obtain a training set corresponding to the candidate combinations and a test set corresponding to the candidate combinations;

the second processing module is used for adopting the training set of the candidate combinations to train the risk assessment model, adopting the test set corresponding to the candidate combinations to test the trained risk assessment model, and obtaining the performance data of the risk assessment model corresponding to the candidate combinations;

the screening module is used for screening target combinations from the candidate combinations according to the performance data of the risk assessment models corresponding to the candidate combinations, and determining target feature extraction dimensions corresponding to the at least one data source according to the target combinations;

and the generating module is used for generating configuration information according to the target feature extraction dimension corresponding to the at least one data source.

According to still another aspect of the present disclosure, there is provided an electronic device including:

at least one processor; and

a memory communicatively coupled to the at least one processor; wherein

the memory stores instructions executable by the at least one processor, and the instructions, when executed by the at least one processor, cause the at least one processor to perform the information processing method of the above aspect or the information generating method of the above aspect.

According to yet another aspect of the present disclosure, there is provided a non-transitory computer-readable storage medium having stored thereon computer instructions for causing a computer to perform the information processing method of the above aspect or the information generating method of the above aspect.

According to yet another aspect of the disclosure, a computer program product is provided, comprising a computer program which, when executed by a processor, implements the information processing method of the above aspect or the information generating method of the above aspect.

According to the information processing method and apparatus, the electronic device and the storage medium, in response to a risk assessment request of a target account, original data associated with the target account is obtained from at least one data source, and configuration information is queried, wherein the configuration information indicates a target feature extraction dimension corresponding to the at least one data source; according to the configuration information, feature extraction corresponding to the target feature extraction dimension is performed on the original data of the at least one data source to obtain at least one dimension feature of the target account; and a risk assessment strategy indicated by the risk assessment request is invoked to process the at least one dimension feature of the target account to obtain risk information of the target account. Because feature extraction corresponding to the target feature extraction dimension is driven by the configuration information, risk control information processing can be easily reused under different risk control requirements, repeated development is avoided, and the efficiency of risk assessment is improved.

It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.

Drawings

The drawings are included to provide a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:

Fig. 1 is a schematic flow chart of an information processing method provided according to a first embodiment of the present disclosure;

Fig. 2 is a schematic flow chart of an information processing method provided according to a second embodiment of the present disclosure;

Fig. 3 is a schematic flow chart of an information processing method provided according to a third embodiment of the present disclosure;

Fig. 4 is a schematic flow chart of an information generating method provided according to a fourth embodiment of the present disclosure;

Fig. 5 is a schematic flow chart of acquiring a time series of sample data and the risk information with which the time series is labeled, according to the fourth embodiment of the present disclosure;

Fig. 6 is a schematic structural diagram of an information processing apparatus provided according to a fifth embodiment of the present disclosure;

Fig. 7 is a schematic structural diagram of an information generating apparatus provided according to a sixth embodiment of the present disclosure;

Fig. 8 is a schematic block diagram of an example electronic device 800 that can be used to implement embodiments of the present disclosure.

Detailed Description

Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of the embodiments of the disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.

It should be noted that the terms "first," "second," and the like in the description and claims of the present disclosure and in the above-described drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the disclosure described herein are capable of operation in sequences other than those illustrated or otherwise described herein.

It should be noted that, in the technical solution of the present disclosure, the acquisition, storage, application, and the like of the personal information of the users involved all comply with the relevant laws and regulations and do not violate public order and good morals.

Currently, both personal credit card approval and personal loan approval rely on big-data-based risk control information processing. In the related art, each risk control requirement has to be developed separately, which leads to a large amount of repeated development. How to process risk control information quickly and efficiently is therefore an open problem.

To address the excessive cost of risk control information processing in the related art, the embodiments of the present disclosure provide an information processing method that allows risk control information processing to be reused under different risk control requirements, thereby avoiding repeated development.

An information processing method, an apparatus, an electronic device, a non-transitory computer-readable storage medium, and a computer program product of the embodiments of the present disclosure are described below with reference to the accompanying drawings.

First, with reference to fig. 1, the information processing method provided by the present disclosure is described in detail.

Fig. 1 is a flowchart illustrating an information processing method according to a first embodiment of the present disclosure.

The embodiments of the present disclosure are described by taking as an example the case where the information processing method is configured in an information processing apparatus, and the apparatus can be applied to any electronic device so that the electronic device can perform the information processing function.

The electronic device may be any device having computing capability, for example, a personal computer (PC) or a mobile terminal. The mobile terminal may be a hardware device having an operating system, a touch screen and/or a display screen, such as a mobile phone, a tablet computer, a personal digital assistant, or a wearable device.

As shown in fig. 1, the information processing method includes the steps of:

Step 101, in response to a risk assessment request of a target account, acquiring original data associated with the target account from at least one data source.

It should be noted that the original data associated with different accounts differs. For example, for an A card account, the associated original data includes a credit report, personal application information and third-party data; for B card and C card accounts, the associated original data includes a credit report, personal application information, third-party data, historical borrowing and repayment behavior data, and APP (Application) usage behavior features. The A card is an Application Score Card, which is commonly used in the loan or credit card approval process of a financial institution and mainly uses the basic information and credit information of a customer for risk assessment. The B card is a Behavior Score Card, which is commonly used in the loan approval process or the credit limit adjustment process of a financial institution and mainly uses the historical borrowing and repayment behavior information and credit information of a customer for risk assessment. The C card is a Collection Score Card, which is commonly used in the post-loan collection process of a financial institution and mainly uses the historical borrowing and repayment information and credit information of a customer to predict the collection probability for the user.

For purposes of analytical reporting and decision support, in exemplary embodiments, a data warehouse may be established to support the decision-making process with all types of data. The data in the ODS (Operational Data Store, the operational data layer) is close to the original data; the total number of customers may be in the thousands, and the dimensions number in the thousands or even tens of thousands. Feature extraction can be performed in the DW (Data Warehouse, the data warehouse layer) based on a deep learning model to form a wide feature table, which usually has about one thousand dimensions. The data features are presented in the APP (Application, the data application layer), with about 20 to 50 dimensions.

In embodiments of the present disclosure, in response to a risk assessment request for a target account, raw data associated with the target account may be obtained from at least one data source. The data source can comprise a first credit report data source and a second credit report data source. As a possible implementation manner, the electronic device in which the information processing apparatus according to the embodiment of the present disclosure is located may obtain a risk assessment request of a target account sent by a user, and perform analysis processing on the risk assessment request of the target account to obtain target account information included in the risk assessment request of the target account, so as to obtain original data associated with the target account from at least one data source.

In the technical solution of the present disclosure, the collection, storage, use, processing, transmission, provision, disclosure and other handling of the personal information of the users involved all comply with the relevant laws and regulations and do not violate public order and good morals.

Step 102, querying configuration information, wherein the configuration information is used for indicating a target feature extraction dimension corresponding to at least one data source.

Since the configuration information is used for indicating the target feature extraction dimension corresponding to the at least one data source, the target feature extraction dimension corresponding to the at least one data source can be determined by querying the configuration information.

In this embodiment of the present disclosure, the corresponding configuration information may be queried according to the original data associated with the target account acquired in step 101, and the target feature extraction dimension corresponding to at least one data source is determined.

Step 103, according to the configuration information, performing feature extraction corresponding to the target feature extraction dimension on the original data of the at least one data source to obtain at least one dimension feature of the target account.

In the embodiment of the present disclosure, feature extraction corresponding to a target feature extraction dimension may be performed on original data of at least one data source according to the queried configuration information, so as to obtain at least one dimension feature of the target account.

It should be noted that the data table structures of the original data differ between data sources, and the information units in a data table may be in one-to-one or one-to-many correspondence, so the original data of each data source needs to be analyzed specifically when feature extraction corresponding to the target feature extraction dimension is performed. For example, for one-to-one information units, that is, information units that occur only once in each data table structure, such as name, age and education level, the features can be extracted directly. For one-to-many information units, that is, information units that occur multiple times in each data table structure, such as billing amount information units, which may cover the billing amounts of the last 60 months and may therefore involve the data table contents of the original data of other data sources, the information units in the related data tables need to be associated first and then extracted as features.
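The distinction between one-to-one and one-to-many information units can be illustrated with a small sketch. The table layouts and field names (`customer_id`, `account_id`, `months_ago`, `amount`) are assumptions made for the example and do not come from the disclosure.

```python
from collections import defaultdict

def extract_one_to_one(profile_row, fields):
    """One-to-one units (e.g. name, age) are copied straight from the row."""
    return {field: profile_row[field] for field in fields}

def extract_one_to_many(account_rows, bill_rows, months=60):
    """One-to-many units: associate bills with accounts, then aggregate.

    account_rows: [{"account_id", "customer_id"}, ...]
    bill_rows:    [{"account_id", "months_ago", "amount"}, ...]
    Returns the total billed amount per customer within the last `months`.
    """
    owner = {row["account_id"]: row["customer_id"] for row in account_rows}
    totals = defaultdict(float)
    for bill in bill_rows:
        # Skip bills outside the window or without a matching account.
        if bill["months_ago"] < months and bill["account_id"] in owner:
            totals[owner[bill["account_id"]]] += bill["amount"]
    return dict(totals)
```

Here the association step is a simple lookup from account to customer; in practice the join key depends on how the related data tables are structured.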

Step 104, invoking a risk assessment strategy indicated by the risk assessment request, and processing the at least one dimension feature of the target account to obtain risk information of the target account.

The risk assessment request of the target account not only reflects the target account information, but also indicates the adopted risk assessment strategy, so that at least one dimension characteristic of the target account can be processed by calling the risk assessment strategy indicated by the risk assessment request, and the risk information of the target account can be obtained. The risk assessment policy may be a risk assessment model, for example, a classification model to obtain a risk type, or may be in other forms, for example, rules, and the like, which is not limited in this embodiment.

According to the information processing method provided by the embodiment of the present disclosure, in response to a risk assessment request of a target account, original data associated with the target account is acquired from at least one data source, and configuration information is queried, wherein the configuration information indicates a target feature extraction dimension corresponding to the at least one data source; according to the configuration information, feature extraction corresponding to the target feature extraction dimension is performed on the original data of the at least one data source to obtain at least one dimension feature of the target account; and a risk assessment strategy indicated by the risk assessment request is then invoked to process the at least one dimension feature of the target account to obtain risk information of the target account. Because feature extraction corresponding to the target feature extraction dimension is driven by the configuration information, risk control information processing can be easily reused under different risk control requirements, repeated development is avoided, and the efficiency of risk assessment is improved.
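Steps 101 to 104 can be pictured as a small configuration-driven pipeline. The sketch below is only an illustration under assumptions: the in-memory `DATA_SOURCES`, the `CONFIG` layout, the extractor names and the rule-based strategy are all hypothetical and are not taken from the disclosure.

```python
# Hypothetical in-memory data sources keyed by account id.
DATA_SOURCES = {
    "credit_report": {"acct-1": {"overdue_count": 2, "used": 300.0, "limit": 1000.0}},
    "application": {"acct-1": {"age": 35}},
}

# Hypothetical configuration information: target feature extraction
# dimensions for each data source.
CONFIG = {
    "credit_report": ["overdue_count", "utilization"],
    "application": ["age"],
}

# Hypothetical extraction logic for each dimension.
EXTRACTORS = {
    "overdue_count": lambda raw: raw["overdue_count"],
    "utilization": lambda raw: raw["used"] / raw["limit"],
    "age": lambda raw: raw["age"],
}

def rule_strategy(features):
    """A rule-based risk assessment strategy (illustrative thresholds)."""
    risky = features["overdue_count"] >= 3 or features["utilization"] > 0.9
    return "high" if risky else "low"

STRATEGIES = {"simple_rules": rule_strategy}

def assess_risk(request):
    account = request["account_id"]
    # Step 101: acquire the original data associated with the target account.
    raw = {name: table[account]
           for name, table in DATA_SOURCES.items() if account in table}
    # Steps 102-103: query the configuration and extract the listed dimensions.
    features = {}
    for source, dimensions in CONFIG.items():
        if source in raw:
            for dimension in dimensions:
                features[dimension] = EXTRACTORS[dimension](raw[source])
    # Step 104: invoke the strategy indicated by the request.
    return STRATEGIES[request["strategy"]](features)
```

Supporting a new risk control requirement in this sketch means editing `CONFIG` or registering another strategy, rather than rewriting the extraction code, which is the reuse the method aims at.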

As can be seen from the above analysis, in the embodiment of the present disclosure, feature extraction corresponding to the target feature extraction dimension may be performed on the original data of at least one data source according to the configuration information to obtain at least one dimension feature of the target account. The process of obtaining the at least one dimension feature of the target account according to the configuration information is further described below with reference to fig. 2. In the technical solution of the present disclosure, the collection, storage, use, processing, transmission, provision, disclosure and other handling of the personal information of the users involved all comply with the relevant laws and regulations and do not violate public order and good morals.

Fig. 2 is a flowchart illustrating an information processing method according to a second embodiment of the disclosure. As shown in fig. 2, the information processing method includes the steps of:

Step 201, in response to a risk assessment request of a target account, acquiring original data associated with the target account from at least one data source.

Step 202, querying configuration information, wherein the configuration information is used for indicating a target feature extraction dimension corresponding to at least one data source.

It should be noted that, for the specific implementation of steps 201-202, reference may be made to the description of steps 101-102; the principle is the same and is not repeated here.

Step 203, performing format conversion on the original data of the at least one data source according to the data format indicated by the configuration information.

Because the configuration information indicates a data format for performing feature extraction corresponding to the target feature extraction dimension, when the data format of the raw data of the at least one data source is different from the data format indicated by the configuration information, format conversion needs to be performed on the raw data of the at least one data source, the data format of the raw data of the at least one data source is converted into a data format consistent with the data format indicated by the configuration information, and then feature extraction corresponding to the target feature extraction dimension is performed.
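As a rough illustration of step 203, the sketch below normalizes date and amount fields to one target format before feature extraction. The source names, the `FORMAT_CONFIG` layout and the field names are hypothetical; the disclosure does not specify which formats are involved.

```python
from datetime import datetime

# Hypothetical format section of the configuration information.
FORMAT_CONFIG = {
    "date_format_in": {"source_a": "%Y/%m/%d", "source_b": "%d.%m.%Y"},
    "date_format_out": "%Y-%m-%d",
}

def convert_record(source, record,
                   date_fields=("open_date",), amount_fields=("amount",)):
    """Return a copy of `record` in the data format indicated by the config."""
    converted = dict(record)
    for field in date_fields:
        parsed = datetime.strptime(
            record[field], FORMAT_CONFIG["date_format_in"][source])
        converted[field] = parsed.strftime(FORMAT_CONFIG["date_format_out"])
    for field in amount_fields:
        # Amounts may arrive as strings with thousands separators.
        converted[field] = float(str(record[field]).replace(",", ""))
    return converted
```

After conversion, records from both hypothetical sources share one date format and numeric amounts, so the same extraction code can run on either.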

Step 204, performing information extraction according to the data table structure of the original data of the at least one data source to obtain a plurality of information units of the at least one data source.

Because the data table structures of the original data of at least one data source are different, the position needing to be extracted in the data table structure is determined for each data source, and information extraction is carried out to obtain a plurality of information units of at least one data source. Namely, a plurality of information units are extracted from the same data table, so that the characteristics of the information units extracted from the same data table can be extracted later, and the data table content of the original data of other data sources is not involved.

Step 205, according to the configuration information, determining a first feature extraction dimension corresponding to a plurality of information units of at least one data source from the target feature extraction dimensions corresponding to the at least one data source.

Since the configuration information is used to indicate the target feature extraction dimension corresponding to the at least one data source, the configuration information and the plurality of information units of the at least one data source obtained in step 204 may be queried to determine, from the target feature extraction dimension corresponding to the at least one data source, a first feature extraction dimension corresponding to the plurality of information units of the at least one data source.

Step 206, performing feature extraction on the plurality of information units by adopting an extraction strategy corresponding to the first feature extraction dimension to obtain at least one dimension feature of the target account.

Each of the information units of the at least one data source obtained in step 204 may correspond to an extraction strategy, so that feature extraction can be performed on the information units with the extraction strategy corresponding to the first feature extraction dimension to obtain at least one dimension feature of the target account. For example, one information unit may correspond to one extraction strategy; as another example, at least two information units may correspond to the same extraction strategy, in which case that strategy is applied to the at least two information units to extract the feature of the corresponding dimension.
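One way to picture steps 205 and 206 is a lookup table from dimensions to the information units they need and the strategy to apply. This is a sketch under assumptions: the `TARGET_DIMENSIONS` layout, the unit names and the two strategies are hypothetical.

```python
# Hypothetical configuration: each target dimension names the information
# units it needs and the extraction strategy to apply to them.
TARGET_DIMENSIONS = {
    "loan_amount_sum": {"units": ["loan_amount"], "strategy": "sum"},
    "overdue_amount_sum": {"units": ["overdue_amount"], "strategy": "sum"},
    "repayment_rate": {"units": ["repaid", "due"], "strategy": "ratio"},
}

# Each strategy receives a list of value lists, one per required unit.
STRATEGIES = {
    "sum": lambda values: sum(values[0]),
    "ratio": lambda values: sum(values[0]) / sum(values[1]),
}

def first_dimensions(available_units):
    """Step 205: keep the target dimensions whose units were all extracted."""
    return {name: spec for name, spec in TARGET_DIMENSIONS.items()
            if all(unit in available_units for unit in spec["units"])}

def extract_features(units):
    """Step 206: apply each selected dimension's strategy to its units."""
    features = {}
    for name, spec in first_dimensions(units).items():
        inputs = [units[unit] for unit in spec["units"]]
        features[name] = STRATEGIES[spec["strategy"]](inputs)
    return features
```

In this sketch `loan_amount` and `overdue_amount` share the `sum` strategy, while `repayment_rate` combines two units under one strategy.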

For example, for continuous information units, i.e., numerical information units such as the loan amount and the overdue amount, statistics such as sum, maximum, count and average can be computed in combination with time slicing. For amount-type information units, ratios may be calculated, for example, utilization rate = used amount / total amount, and repayment rate = repaid amount / amount due; alternatively, the category of financial institution with the largest number of loans may be extracted. This embodiment does not limit this.
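A minimal sketch of the statistics for continuous information units, assuming each record carries a hypothetical `months_ago` field for time slicing and an `amount` field:

```python
from statistics import mean

def window_stats(records, months):
    """Sum, maximum, count and average of amounts in the last `months`."""
    values = [r["amount"] for r in records if r["months_ago"] < months]
    if not values:
        return {"sum": 0.0, "max": 0.0, "count": 0, "avg": 0.0}
    return {"sum": sum(values), "max": max(values),
            "count": len(values), "avg": mean(values)}

def ratio(numerator, denominator):
    """Ratio feature such as used amount / total amount; 0.0 if undefined."""
    return numerator / denominator if denominator else 0.0
```

Calling `window_stats` with several window lengths (for example 6, 12 and 24 months) yields one group of features per time slice; returning 0.0 for an empty window or a zero denominator is a choice made for this example only.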

For discrete information units, i.e., categorical information units such as account status, repayment status, business category and query reason, statistics such as count, deduplication and mode may be computed in combination with time slicing; the units may also be converted into a multi-hot encoding, with the number of occurrences of each category counted; or the latest status of the information unit may be taken. This embodiment does not limit this.
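The categorical statistics can be sketched as follows, again assuming hypothetical `value` and `months_ago` fields on each event and a fixed category list supplied by the caller:

```python
from collections import Counter

def categorical_features(events, categories, months):
    """Count, distinct count, mode, per-category counts and latest value."""
    recent = [e for e in events if e["months_ago"] < months]
    counts = Counter(e["value"] for e in recent)
    # The event with the smallest `months_ago` is the most recent one.
    latest = min(events, key=lambda e: e["months_ago"])["value"] if events else None
    return {
        "count": len(recent),
        "distinct": len(counts),
        "mode": counts.most_common(1)[0][0] if counts else None,
        "per_category": [counts.get(c, 0) for c in categories],
        "multi_hot": [1 if counts.get(c, 0) else 0 for c in categories],
        "latest": latest,
    }
```

The `multi_hot` vector marks which categories occur in the window, and `per_category` keeps the occurrence counts that accompany it.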

For the time type information unit, for example, query time and application time, statistics such as maximum, minimum, and time difference may be performed on the time series, or whether the time belongs to a working day, which quarter, and the like may be determined, which is not limited in this embodiment.
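For time-type information units, a sketch of the statistics might look like this. Treating Monday to Friday as working days (ignoring public holidays) is an assumption of the example, as are the output field names.

```python
from datetime import date

def time_features(dates):
    """Earliest/latest date, span in days, weekday flag and quarter."""
    earliest, latest = min(dates), max(dates)
    return {
        "earliest": earliest,
        "latest": latest,
        "span_days": (latest - earliest).days,
        # weekday() is 0 for Monday ... 6 for Sunday; holidays are ignored.
        "latest_is_weekday": latest.weekday() < 5,
        "latest_quarter": (latest.month - 1) // 3 + 1,
    }
```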

For special derived information units, as a possible implementation, the home location and the carrier of a mobile phone number may be derived from the number and then processed in the continuous or discrete manner described above.

For business-rule-type information units, fine-grained category screening may be performed according to a business rule, followed by statistics. For example, the business rule may be: count the accounts whose account type in the basic information segment is "D1" (non-revolving credit account), whose account status is neither 3 (settled) nor 5 (transferred out), and whose repayment status in the latest performance information segment is 1-7.
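The example business rule above can be written as a filter followed by a count. The field names (`account_type`, `account_status`, `repayment_status`) are hypothetical stand-ins for the basic information segment and the latest performance information segment.

```python
def count_rule_matches(accounts):
    """Count accounts of type D1 that are neither settled (3) nor
    transferred out (5) and whose latest repayment status is 1-7."""
    return sum(
        1
        for account in accounts
        if account["account_type"] == "D1"
        and account["account_status"] not in (3, 5)
        and 1 <= account["repayment_status"] <= 7
    )
```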

Step 207, invoking a risk assessment policy indicated by the risk assessment request, and processing at least one dimensional feature of the target account to obtain risk information of the target account.

It should be noted that, for the specific implementation of step 207, reference may be made to the description of step 104; the principle is the same and is not repeated here.

According to the information processing method provided by the embodiment of the disclosure, after format conversion is performed on original data of at least one data source according to a data format indicated by configuration information, information extraction is performed according to a data table structure of the original data of the at least one data source to obtain a plurality of information units of the at least one data source, so that according to the configuration information, a first feature extraction dimension corresponding to the plurality of information units of the at least one data source is determined from a target feature extraction dimension corresponding to the at least one data source, and then feature extraction is performed on the plurality of information units by using an extraction strategy corresponding to the first feature extraction dimension to obtain at least one dimension feature of a target account. Therefore, different information units are subjected to feature extraction by adopting different extraction strategies of target feature extraction dimensions, at least one dimension feature of the target account is determined, corresponding feature extraction is carried out on different data according to different risk assessment requests, and the accuracy of risk assessment can be effectively improved. In addition, the data format of the original data of at least one data source can be converted into the data format consistent with the data format indicated by the configuration information, so that the efficiency of feature extraction is effectively improved.

As can be seen from the above analysis, in the embodiment of the present disclosure, the configuration information is configured in advance, so that feature extraction corresponding to a target feature extraction dimension may be performed on the raw data of at least one data source according to the configuration information, so as to obtain at least one dimension feature of the target account. Therefore, the method can be suitable for feature extraction of various different data sources and is suitable for the data sources required under different risk assessment requirements. The process of obtaining at least one dimension characteristic of the target account according to the configuration information is further described below with reference to fig. 3.

Fig. 3 is a flowchart illustrating an information processing method according to a third embodiment of the present disclosure. As shown in fig. 3, the information processing method includes the steps of:

Step 301, in response to a risk assessment request of a target account, obtaining raw data associated with the target account from at least one data source.

Step 302, querying configuration information, wherein the configuration information is used for indicating a target feature extraction dimension corresponding to at least one data source.

It should be noted that the specific implementation process of steps 301-302 may refer to the description of steps 101-102; the principle is the same and is not described herein again.

Step 303, performing format conversion on the original data of the at least one data source according to the data format indicated by the configuration information.

It should be noted that the specific implementation process of step 303 may refer to the description of step 203; the principle is the same and is not described herein again.

Step 304, according to a data table structure of original data of at least one data source, performing information extraction to obtain at least two target data tables, wherein the target data tables comprise at least one information unit.

Because the data table structures of the original data differ from one data source to another, the positions to be extracted are determined in the data table structure of each data source, and information extraction is performed to obtain at least two target data tables, each of which comprises at least one information unit. Where at least two target data tables are obtained, the information units in one data table may have a one-to-many relationship with, and relate to, the data table contents of the original data of other data sources.

Step 305, associating the information units in the at least two target data tables according to the configuration information.

Because at least two target data tables are obtained, the information units in the at least two target data tables need to be associated according to the configuration information to obtain information units that are associated with each other, which facilitates feature extraction corresponding to the target feature extraction dimension.
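The association in step 305 can be illustrated with a minimal sketch. It assumes, purely for illustration, that the configuration information names the two tables and a shared key; the names `CONFIG`, `associate_units`, `applications`, `queries`, and `account_id` are hypothetical and do not come from the disclosure.

```python
from collections import defaultdict

# Hypothetical configuration: which target data tables to associate, and on which key.
CONFIG = {
    "association": {"left": "applications", "right": "queries", "key": "account_id"}
}

def associate_units(tables, config):
    """Attach to each left-table row all right-table rows sharing the key (one-to-many)."""
    assoc = config["association"]
    key = assoc["key"]
    # Index the right-hand table by key so one left row can collect many matches.
    right_index = defaultdict(list)
    for row in tables[assoc["right"]]:
        right_index[row[key]].append(row)
    associated = []
    for row in tables[assoc["left"]]:
        matches = right_index.get(row[key], [])
        associated.append({**row, assoc["right"]: matches})
    return associated

tables = {
    "applications": [{"account_id": "A1", "apply_time": "2020-01-15"}],
    "queries": [
        {"account_id": "A1", "query_time": "2020-01-10"},
        {"account_id": "A1", "query_time": "2020-01-12"},
        {"account_id": "B2", "query_time": "2020-01-11"},
    ],
}
associated = associate_units(tables, CONFIG)
```

In this sketch each application row ends up carrying the list of query rows for the same account, which is one way the one-to-many relationship between information units could be represented.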

Step 306, according to the configuration information, determining a second feature extraction dimension corresponding to the information unit associated with each other from the target feature extraction dimensions corresponding to the at least one data source.

Since the configuration information is used to indicate the target feature extraction dimension corresponding to at least one data source, a second feature extraction dimension corresponding to the correlated information unit can be determined from the target feature extraction dimensions corresponding to the at least one data source according to the configuration information and the correlated information unit obtained in step 305.

Step 307, performing feature extraction on the information units which are associated with each other by adopting an extraction strategy corresponding to the second feature extraction dimension to obtain at least one dimension feature of the target account.

In the embodiment of the present disclosure, feature extraction may be performed on the information units associated with each other by using an extraction strategy corresponding to the second feature extraction dimension, so as to obtain at least one dimension feature of the target account. For example, one information unit may correspond to one extraction strategy. As another example, at least two information units may correspond to the same extraction strategy, in which case that extraction strategy is applied to the at least two information units to extract the feature of the corresponding dimension.

For a time-type information unit, for example a query time or an application time, statistics such as the maximum, the minimum, and the time difference may be computed over the time sequence, or it may be determined whether the time falls on a working day, which quarter it belongs to, and the like, which is not limited in this embodiment.
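As a rough sketch of such an extraction strategy for time-type information units, the following computes the earliest time, latest time, time span, a working-day flag, and the quarter. The function name `time_features` and the output keys are hypothetical, and the working-day check assumes Monday to Friday without considering public holidays.

```python
from datetime import datetime

def time_features(timestamps):
    """Compute simple statistics over a list of ISO-format timestamps."""
    times = sorted(datetime.fromisoformat(t) for t in timestamps)
    earliest, latest = times[0], times[-1]
    return {
        "min_time": earliest.isoformat(),
        "max_time": latest.isoformat(),
        # Whole days between the earliest and the latest timestamp.
        "span_days": (latest - earliest).days,
        # weekday() is 0..4 for Monday..Friday; public holidays are ignored here.
        "latest_is_weekday": latest.weekday() < 5,
        "latest_quarter": (latest.month - 1) // 3 + 1,
    }

features = time_features(
    ["2020-01-10T09:30:00", "2020-01-15T14:00:00", "2020-01-12T18:45:00"]
)
```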

Step 308, invoking a risk assessment policy indicated by the risk assessment request, and processing the at least one dimension feature of the target account to obtain risk information of the target account.

It should be noted that the specific implementation process of step 308 may refer to the description of step 104; the principle is the same and is not described herein again.

According to the information processing method provided by the embodiment of the disclosure, after format conversion is performed on original data of at least one data source according to a data format indicated by configuration information, information extraction is performed according to a data table structure of the original data of the at least one data source to obtain at least two target data tables, wherein the target data tables comprise at least one information unit, so that information units in the at least two target data tables are associated according to the configuration information, a second feature extraction dimension corresponding to the information unit associated with each other is determined from target feature extraction dimensions corresponding to the at least one data source, and then feature extraction is performed on the information unit associated with each other by using an extraction strategy corresponding to the second feature extraction dimension to obtain at least one dimension feature of a target account. Therefore, by associating the information units of the at least two target data tables and determining the second feature extraction dimension corresponding to the information units associated with each other, feature extraction is performed on the information units associated with each other by adopting an extraction strategy corresponding to the second feature extraction dimension, at least one dimension feature of the target account is determined, corresponding feature extraction is performed on different data according to different risk assessment requests, and accuracy of risk assessment can be effectively improved. In addition, the data format of the original data of at least one data source can be converted into the data format consistent with the data format indicated by the configuration information, so that the efficiency of feature extraction is effectively improved.

Because the risk assessment strategy may be a risk assessment model, the accuracy of the configuration information may be improved by training the risk assessment model, so that risk identification is optimized based on the configuration information. To this end, the embodiment of the present disclosure provides a possible implementation manner of an information generation method. Fig. 4 is a schematic flow diagram of the information generation method according to a fourth embodiment of the present disclosure.

As shown in fig. 4, the information generation method may include the following steps:

Step 401, obtaining a time series of sample data and risk information labeled by the time series from at least one data source.

The time sequence can be determined according to a plurality of attribute items in the data source, so that the time sequence of the sample data can be acquired from at least one data source, and the risk information labeled by the time sequence is determined according to the time sequence. Wherein the sample data is data used to train the model.

Step 402, according to the candidate combinations of the multiple feature extraction dimensions, feature extraction is respectively performed on sample data in the time sequence to obtain feature time sequences corresponding to the multiple candidate combinations.

In the embodiment of the disclosure, there are a plurality of feature extraction dimensions, which can be freely combined to obtain a plurality of candidate combinations of feature extraction dimensions, and feature extraction is respectively performed on sample data in a time sequence according to the candidate combinations of the plurality of feature extraction dimensions to obtain a feature time sequence corresponding to the plurality of candidate combinations.
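One straightforward way to enumerate such candidate combinations is sketched below using the standard library. The dimension names are hypothetical, and the disclosure does not state that every subset must be enumerated; exhaustive enumeration is an assumption made here for illustration.

```python
from itertools import combinations

def candidate_combinations(dimensions, min_size=1):
    """Enumerate every subset of the dimensions with at least min_size elements."""
    combos = []
    for size in range(min_size, len(dimensions) + 1):
        combos.extend(combinations(dimensions, size))
    return combos

# Hypothetical feature extraction dimensions, used only for illustration.
DIMENSIONS = ["time_stats", "category_counts", "amount_stats"]
combos = candidate_combinations(DIMENSIONS)
```

The number of subsets grows exponentially with the number of dimensions, so a practical system would likely restrict the search in some way.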

Step 403, for the characteristic time sequences corresponding to the multiple candidate combinations, obtaining a training set corresponding to the candidate combinations and a testing set corresponding to the candidate combinations by time division.

In this embodiment of the present disclosure, the feature time sequences corresponding to the multiple candidate combinations obtained in step 402 may be divided according to time, so as to obtain a training set corresponding to the candidate combinations and a test set corresponding to the candidate combinations. The training set and the test set are data sets; each data set is composed of a plurality of rows of samples, and each sample is composed of a primary key, an observation point, features, and a y value.

As one possible implementation, the training set used for training and validation and the OOT (out-of-time) test set used for test evaluation may be split according to the time of the sample observation point. For example, where the data set is composed of samples whose credit card application time ranges from January 2020 to May 2020, the time of the sample observation point is the credit card application time, so samples from January 2020 to April 2020 can be selected to constitute the training set, and samples from May 2020 can be selected to constitute the OOT test set.
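The split described in this example can be sketched as follows. The sample layout (`key`, `observation_date`, `features`, `y`) mirrors the primary key, observation point, features, and y value mentioned above, but the field names and the helper `split_by_time` are hypothetical.

```python
from datetime import date

def split_by_time(samples, oot_start):
    """Samples observed before oot_start form the training set; the rest form the OOT test set."""
    train = [s for s in samples if s["observation_date"] < oot_start]
    oot_test = [s for s in samples if s["observation_date"] >= oot_start]
    return train, oot_test

# One illustrative sample per month from January 2020 to May 2020.
samples = [
    {"key": "A1", "observation_date": date(2020, 1, 20), "features": [0.2], "y": 0},
    {"key": "A2", "observation_date": date(2020, 2, 11), "features": [0.7], "y": 1},
    {"key": "A3", "observation_date": date(2020, 3, 5), "features": [0.1], "y": 0},
    {"key": "A4", "observation_date": date(2020, 4, 28), "features": [0.9], "y": 1},
    {"key": "A5", "observation_date": date(2020, 5, 14), "features": [0.4], "y": 0},
]
train, oot_test = split_by_time(samples, oot_start=date(2020, 5, 1))
```

Splitting by observation time rather than at random keeps later samples out of training, so the test set approximates how the model would behave on data arriving after it was built.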

Step 404, training a risk assessment model by using a training set of a plurality of candidate combinations, and testing the trained risk assessment model by using a test set corresponding to the candidate combinations to obtain performance data of the risk assessment model corresponding to the candidate combinations.

In the embodiment of the present disclosure, the risk assessment model may be trained by using a training set of a plurality of candidate combinations, and the trained risk assessment model may be tested by using a test set corresponding to the candidate combinations, so as to obtain performance data of the risk assessment model corresponding to the plurality of candidate combinations. It should be noted that the training set and the test set used for a given model must correspond to the same candidate combination.

The training set of a plurality of candidate combinations is adopted to train the risk assessment model, and the testing set corresponding to the candidate combinations is adopted to test the trained risk assessment model, so that the accuracy of the risk assessment model can be effectively improved.

Step 405, according to the performance data of the risk assessment model corresponding to the multiple candidate combinations, a target combination is obtained by screening from the multiple candidate combinations, and according to the target combination, a target feature extraction dimension corresponding to at least one data source is determined.

In this embodiment of the disclosure, a target combination may be screened from the multiple candidate combinations according to the performance data of the risk assessment model corresponding to the multiple candidate combinations obtained in step 404, so that a target feature extraction dimension corresponding to at least one data source is determined according to the target combination.
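Steps 404 and 405 can be sketched together as a loop that trains and tests one model per candidate combination and keeps the best-performing combination. The disclosure does not specify the model or the performance metric; the nearest-centroid classifier and the accuracy metric below are stand-ins chosen only to keep the example self-contained, and all names are hypothetical.

```python
def column_subset(rows, columns):
    """Project each row (a dict) onto the columns of one candidate combination."""
    return [[row[c] for c in columns] for row in rows]

def train_centroids(X, y):
    """Toy stand-in for a risk assessment model: one centroid per label."""
    centroids = {}
    for label in set(y):
        members = [x for x, lab in zip(X, y) if lab == label]
        centroids[label] = [sum(col) / len(members) for col in zip(*members)]
    return centroids

def predict(centroids, x):
    """Return the label whose centroid is closest to x (squared Euclidean distance)."""
    def sq_dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))

def accuracy(centroids, X, y):
    hits = sum(predict(centroids, x) == lab for x, lab in zip(X, y))
    return hits / len(y)

def select_target_combination(train_rows, train_y, test_rows, test_y, combos):
    """Train and test per combination; ties resolve to the first combination listed."""
    performance = {}
    for combo in combos:
        # The training set and test set are built from the same candidate combination.
        model = train_centroids(column_subset(train_rows, combo), train_y)
        performance[combo] = accuracy(model, column_subset(test_rows, combo), test_y)
    best = max(performance, key=performance.get)
    return best, performance

train_rows = [
    {"overdue_count": 0, "noise": 5.0},
    {"overdue_count": 1, "noise": 1.0},
    {"overdue_count": 4, "noise": 3.0},
    {"overdue_count": 5, "noise": 2.0},
]
train_y = [0, 0, 1, 1]
test_rows = [{"overdue_count": 0, "noise": 2.0}, {"overdue_count": 5, "noise": 4.0}]
test_y = [0, 1]
combos = [("overdue_count",), ("noise",), ("overdue_count", "noise")]
best, performance = select_target_combination(
    train_rows, train_y, test_rows, test_y, combos
)
```

Here the uninformative `noise` column scores worst on the test set, so a combination containing `overdue_count` is selected as the target combination.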

Step 406, generating configuration information according to the target feature extraction dimension corresponding to the at least one data source.

In the embodiment of the present disclosure, the configuration information may be generated based on the target feature extraction dimension corresponding to the at least one data source determined in step 405. The configuration information is used for indicating the target feature extraction dimension corresponding to at least one data source.
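The disclosure does not fix a concrete layout for the configuration information. As one possible shape, the sketch below serializes the target feature extraction dimensions per data source, together with a data format, as JSON; the field names, source names, and the helper `build_configuration` are all hypothetical.

```python
import json

def build_configuration(target_dimensions_by_source, data_format="json"):
    """Build a configuration dict listing target feature extraction dimensions per source."""
    return {
        "data_format": data_format,
        "sources": [
            {"name": source, "target_feature_dimensions": sorted(dims)}
            for source, dims in sorted(target_dimensions_by_source.items())
        ],
    }

config = build_configuration(
    {"credit_report": {"time_stats", "overdue_count"}, "app_log": {"time_stats"}}
)
# Serialized form that could be stored and later queried at assessment time.
config_text = json.dumps(config, indent=2)
```

Keeping the dimensions in configuration rather than in code is what lets the same extraction pipeline serve different risk assessment requirements without redevelopment.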

The performance data of the risk evaluation models corresponding to the multiple candidate combinations can be obtained through testing the trained risk evaluation model, the target combinations are obtained through screening from the multiple candidate combinations, the target feature extraction dimension corresponding to at least one data source is determined according to the target combinations, and the configuration information is generated, so that the accuracy of the configuration information can be effectively improved, the risk identification based on the configuration information is optimized, and the accuracy of the risk evaluation strategy is improved.

According to the information generation method provided by the embodiment of the disclosure, a time sequence of sample data and risk information labeled for the time sequence are obtained from at least one data source. Feature extraction is performed on the sample data in the time sequence according to candidate combinations of a plurality of feature extraction dimensions, respectively, to obtain feature time sequences corresponding to the plurality of candidate combinations. The feature time sequences corresponding to the plurality of candidate combinations are divided by time to obtain a training set and a test set corresponding to each candidate combination. A risk assessment model is trained using the training sets of the plurality of candidate combinations, and the trained risk assessment model is tested using the test sets corresponding to the candidate combinations to obtain performance data of the risk assessment model corresponding to the plurality of candidate combinations. A target combination is then screened from the plurality of candidate combinations according to the performance data, a target feature extraction dimension corresponding to the at least one data source is determined according to the target combination, and configuration information is generated according to the target feature extraction dimension. Therefore, by training the risk assessment model, the target feature extraction dimension corresponding to at least one data source is determined according to the target combination, and the configuration information is generated accordingly, so that risk identification is optimized based on the configuration information and the accuracy of the risk assessment strategy is effectively improved.

In order to clearly illustrate the process of acquiring the time series of the sample data and the risk information labeled by the time series from the at least one data source in step 401 in the embodiment shown in fig. 4, a schematic flowchart of acquiring the time series of the sample data and the risk information labeled by the time series shown in fig. 5 is provided in this embodiment, and as shown in fig. 5, acquiring the time series of the sample data and the risk information labeled by the time series from the at least one data source may include the following steps:

Step 501, according to a set interval, determining a target attribute item from a plurality of attribute items of at least one data source, wherein the number of candidate values of the target attribute item is within the set interval.

Here, the set interval may be configured as needed, so that the target attribute item is determined from the plurality of attribute items of the at least one data source according to the set interval. The number of candidate values of the target attribute item is within the set interval.

Step 502, obtaining sample data of the target attribute item at a plurality of monitoring moments from at least one data source.

Here, sample data of the target attribute item at a plurality of monitoring moments may be acquired from at least one data source, wherein the monitoring moments differ from one another.

Step 503, generating a time sequence according to the sample data of the target attribute item at a plurality of monitoring moments.

Here, a time series may be generated according to the sample data of the target attribute item acquired in step 502 at a plurality of monitoring times. Wherein, the time series is marked with risk information.

In summary, the target attribute item is determined from the multiple attribute items of the at least one data source according to the set interval, wherein the number of candidate values of the target attribute item is within the set interval, so that sample data of the target attribute item at multiple monitoring moments is obtained from the at least one data source, and a time sequence is generated according to the sample data of the target attribute item at the multiple monitoring moments. Therefore, the time sequence can be generated as required, the model can be trained, and the effect of model training is effectively improved.
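Steps 501 to 503 can be sketched as follows. The sketch assumes that the number of candidate values of an attribute item is the number of distinct values observed for it, and that each sample carries its own risk label; both are assumptions, as are the record layout and the function names.

```python
def select_target_attributes(records, low, high):
    """Keep attribute items whose number of distinct values lies within [low, high]."""
    distinct = {}
    for record in records:
        for name, value in record["attributes"].items():
            distinct.setdefault(name, set()).add(value)
    return sorted(n for n, values in distinct.items() if low <= len(values) <= high)

def build_time_series(records, attribute):
    """Order the samples of one attribute item by monitoring moment."""
    ordered = sorted(records, key=lambda r: r["monitored_at"])
    return [
        (r["monitored_at"], r["attributes"][attribute], r["risk_label"])
        for r in ordered
    ]

# ISO-format dates sort correctly as strings.
records = [
    {"monitored_at": "2020-03-01", "risk_label": 1,
     "attributes": {"serial_no": "s3", "status": "overdue", "currency": "CNY"}},
    {"monitored_at": "2020-01-01", "risk_label": 0,
     "attributes": {"serial_no": "s1", "status": "normal", "currency": "CNY"}},
    {"monitored_at": "2020-02-01", "risk_label": 0,
     "attributes": {"serial_no": "s2", "status": "normal", "currency": "CNY"}},
    {"monitored_at": "2020-04-01", "risk_label": 1,
     "attributes": {"serial_no": "s4", "status": "overdue", "currency": "CNY"}},
]
targets = select_target_attributes(records, low=2, high=3)
series = build_time_series(records, targets[0])
```

With the interval [2, 3], the constant `currency` item (one value) and the unique `serial_no` item (four values) are filtered out, leaving `status` as the target attribute item.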

Corresponding to the information processing method provided in the embodiments of fig. 1 to 3, the present disclosure also provides an information processing apparatus, and since the information processing apparatus provided in the embodiments of the present disclosure corresponds to the information processing method provided in the embodiments of fig. 1 to 3, the implementation manner of the information processing method is also applicable to the information processing apparatus provided in the embodiments of the present disclosure, and is not described in detail in the embodiments of the present disclosure.

Fig. 6 is a schematic configuration diagram of an information processing apparatus provided according to a fifth embodiment of the present disclosure.

As shown in fig. 6, the information processing apparatus 60 includes: a first obtaining module 61, a query module 62, a first feature extraction module 63 and a first processing module 64.

A first obtaining module 61, configured to obtain, in response to a risk assessment request of a target account, raw data associated with the target account from at least one data source;

a query module 62, configured to query configuration information, where the configuration information is used to indicate a target feature extraction dimension corresponding to the at least one data source;

a first feature extraction module 63, configured to perform feature extraction corresponding to a target feature extraction dimension on the original data of the at least one data source according to the configuration information, so as to obtain at least one dimension feature of the target account;

a first processing module 64, configured to invoke a risk assessment policy indicated by the risk assessment request, and process the at least one dimension feature of the target account to obtain risk information of the target account.
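The cooperation of the four modules can be sketched as a small pipeline. Everything named below (the source fetcher, the extractor and policy registries, the threshold rule, and the request fields) is hypothetical and serves only to show how a risk assessment request could flow through data acquisition, configuration query, feature extraction, and policy invocation.

```python
def obtain_raw_data(account_id, sources):
    """First obtaining module: fetch raw data for the account from each data source."""
    return {name: fetch(account_id) for name, fetch in sources.items()}

def query_configuration():
    """Query module: return target feature extraction dimensions per data source."""
    return {"credit_report": ["overdue_total"]}

# Hypothetical extraction strategies, keyed by feature extraction dimension.
EXTRACTORS = {
    "overdue_total": lambda rows: sum(r["overdue_count"] for r in rows),
}

def extract_features(raw, config):
    """First feature extraction module: apply the configured strategy to each source."""
    features = {}
    for source, dimensions in config.items():
        for dimension in dimensions:
            features[f"{source}.{dimension}"] = EXTRACTORS[dimension](raw[source])
    return features

# Hypothetical risk assessment policies, selected by the request.
POLICIES = {
    "threshold_rule": lambda f: {
        "risk": "high" if f["credit_report.overdue_total"] >= 3 else "low"
    },
}

def assess(request, sources):
    """First processing module: invoke the policy indicated by the request."""
    raw = obtain_raw_data(request["account_id"], sources)
    config = query_configuration()
    features = extract_features(raw, config)
    return POLICIES[request["policy"]](features)

sources = {
    "credit_report": lambda account_id: [{"overdue_count": 2}, {"overdue_count": 1}],
}
result = assess({"account_id": "A1", "policy": "threshold_rule"}, sources)
```

Adding a data source or a dimension in this sketch only requires a new registry entry and a configuration change, which reflects the reuse benefit the disclosure describes.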

In a possible implementation manner of the embodiment of the present disclosure, the first feature extraction module 63 is specifically configured to:

extracting information according to a data table structure of the original data of the at least one data source to obtain a plurality of information units of the at least one data source;

according to the configuration information, determining a first feature extraction dimension corresponding to a plurality of information units of the at least one data source from target feature extraction dimensions corresponding to the at least one data source;

and performing feature extraction on the plurality of information units by adopting an extraction strategy corresponding to a first feature extraction dimension to obtain at least one dimension feature of the target account.

In another possible implementation manner of the embodiment of the present disclosure, the first feature extraction module 63 is specifically configured to:

extracting information according to a data table structure of the original data of the at least one data source to obtain at least two target data tables, wherein the target data tables comprise at least one information unit;

according to the configuration information, associating information units in at least two target data tables;

according to the configuration information, determining a second feature extraction dimension corresponding to the information unit which is associated with each other from the target feature extraction dimensions corresponding to the at least one data source;

and performing feature extraction on the information units which are mutually associated by adopting an extraction strategy corresponding to a second feature extraction dimension to obtain at least one dimension feature of the target account.

In one possible implementation manner of the embodiment of the present disclosure, the apparatus further includes a format conversion module 65.

A format conversion module 65, configured to perform format conversion on the raw data of the at least one data source according to the data format indicated by the configuration information.

The information processing apparatus provided by the embodiment of the disclosure, in response to a risk assessment request of a target account, acquires original data associated with the target account from at least one data source, and queries configuration information, where the configuration information is used to indicate a target feature extraction dimension corresponding to the at least one data source, so as to perform feature extraction corresponding to the target feature extraction dimension on the original data of the at least one data source according to the configuration information to obtain at least one dimension feature of the target account, and further invokes a risk assessment policy indicated by the risk assessment request to process the at least one dimension feature of the target account to obtain risk information of the target account. Therefore, because feature extraction corresponding to the target feature extraction dimension is performed on the original data based on the configuration information, risk control information processing can be easily reused under different risk control requirements, repeated development is avoided, and the efficiency of risk assessment is improved.

Corresponding to the information generating method provided in the embodiments of fig. 4 to 5, the present disclosure also provides an information generating apparatus, and since the information generating apparatus provided in the embodiments of the present disclosure corresponds to the information generating method provided in the embodiments of fig. 4 to 5, the implementation manner of the information generating method is also applicable to the information generating apparatus provided in the embodiments of the present disclosure, and is not described in detail in the embodiments of the present disclosure.

Fig. 7 is a schematic structural diagram of an information generating apparatus provided according to a sixth embodiment of the present disclosure.

As shown in fig. 7, the information generating apparatus 70 includes: a second obtaining module 71, a second feature extraction module 72, a dividing module 73, a second processing module 74, a screening module 75, and a generating module 76.

A second obtaining module 71, configured to obtain a time series of sample data and risk information labeled by the time series from at least one data source;

a second feature extraction module 72, configured to perform feature extraction on sample data in the time sequence according to candidate combinations of multiple feature extraction dimensions, respectively, so as to obtain feature time sequences corresponding to multiple candidate combinations;

a dividing module 73, configured to divide the feature time sequences corresponding to the multiple candidate combinations by time to obtain a training set corresponding to the candidate combinations and a test set corresponding to the candidate combinations;

a second processing module 74, configured to train a risk assessment model using the training set of the multiple candidate combinations, and test the trained risk assessment model using the test set corresponding to the candidate combinations, so as to obtain performance data of the risk assessment model corresponding to the multiple candidate combinations;

a screening module 75, configured to screen a target combination from the multiple candidate combinations according to performance data of the risk assessment models corresponding to the multiple candidate combinations, so as to determine, according to the target combination, a target feature extraction dimension corresponding to the at least one data source;

a generating module 76, configured to generate configuration information according to the target feature extraction dimension corresponding to the at least one data source.

In a possible implementation manner of the embodiment of the present disclosure, the second obtaining module 71 includes:

the determining unit is used for determining a target attribute item from a plurality of attribute items of the at least one data source according to a set interval, wherein the number of candidate values of the target attribute item is within the set interval;

the acquisition unit is used for acquiring sample data of the target attribute item at a plurality of monitoring moments from the at least one data source;

and the generating unit is used for generating the time sequence according to the sample data of the target attribute item at a plurality of monitoring moments.

The information generating apparatus provided by the embodiment of the disclosure obtains a time sequence of sample data and risk information labeled for the time sequence from at least one data source. It performs feature extraction on the sample data in the time sequence according to candidate combinations of a plurality of feature extraction dimensions, respectively, to obtain feature time sequences corresponding to the plurality of candidate combinations, and divides the feature time sequences by time to obtain a training set and a test set corresponding to each candidate combination. It trains a risk assessment model using the training sets of the plurality of candidate combinations, and tests the trained risk assessment model using the test sets corresponding to the candidate combinations to obtain performance data of the risk assessment model corresponding to the plurality of candidate combinations. It then screens a target combination from the plurality of candidate combinations according to the performance data, and determines a target feature extraction dimension corresponding to the at least one data source according to the target combination. Therefore, by training the risk assessment model, the target feature extraction dimension corresponding to at least one data source is determined according to the target combination, and the configuration information is generated according to the target feature extraction dimension, so that risk identification is optimized based on the configuration information and the accuracy of the risk assessment strategy is effectively improved.

The present disclosure also provides an electronic device, a readable storage medium, and a computer program product according to embodiments of the present disclosure.

FIG. 8 illustrates a schematic block diagram of an example electronic device 800 that can be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing devices, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the disclosure described and/or claimed herein.

As shown in fig. 8, the device 800 includes a computing unit 801 that can perform various appropriate actions and processes in accordance with a computer program stored in a ROM (Read-Only Memory) 802 or a computer program loaded from a storage unit 808 into a RAM (Random Access Memory) 803. In the RAM 803, various programs and data required for the operation of the device 800 can also be stored. The computing unit 801, the ROM 802, and the RAM 803 are connected to each other by a bus 804. An I/O (Input/Output) interface 805 is also connected to the bus 804.

A number of components in the device 800 are connected to the I/O interface 805, including: an input unit 806, such as a keyboard, a mouse, or the like; an output unit 807 such as various types of displays, speakers, and the like; a storage unit 808, such as a magnetic disk, optical disk, or the like; and a communication unit 809 such as a network card, modem, wireless communication transceiver, etc. The communication unit 809 allows the device 800 to exchange information/data with other devices via a computer network such as the internet and/or various telecommunication networks.

The computing unit 801 may be any of a variety of general-purpose and/or special-purpose processing components with processing and computing capabilities. Some examples of the computing unit 801 include, but are not limited to, a CPU (Central Processing Unit), a GPU (Graphics Processing Unit), various dedicated AI (Artificial Intelligence) computing chips, various computing units running machine learning model algorithms, a DSP (Digital Signal Processor), and any suitable processor, controller, microcontroller, and the like. The computing unit 801 executes the respective methods and processes described above, such as the information processing method and/or the information generation method described above. For example, in some embodiments, the information processing method and/or the information generation method described above may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as the storage unit 808. In some embodiments, part or all of the computer program can be loaded and/or installed onto the device 800 via the ROM 802 and/or the communication unit 809. When the computer program is loaded into the RAM 803 and executed by the computing unit 801, one or more steps of the information processing method described above may be performed. Alternatively, in other embodiments, the computing unit 801 may be configured to perform the above-described information processing method and/or information generation method by any other suitable means (e.g., by means of firmware).

Various implementations of the systems and techniques described above may be realized in digital electronic circuitry, integrated circuitry, FPGAs (Field Programmable Gate Arrays), ASICs (Application-Specific Integrated Circuits), ASSPs (Application-Specific Standard Products), SOCs (Systems on Chip), CPLDs (Complex Programmable Logic Devices), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special-purpose or general-purpose, and which receives data and instructions from, and transmits data and instructions to, a storage system, at least one input device, and at least one output device.

Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. The program code may be provided to a processor or controller of a general-purpose computer, special-purpose computer, or other programmable data processing apparatus, such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowcharts and/or block diagrams to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine, or entirely on the remote machine or server.

In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a RAM, a ROM, an EPROM (Erasable Programmable Read-Only Memory) or flash memory, an optical fiber, a CD-ROM (Compact Disc Read-Only Memory), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having a display device for presenting information to the user and an input device (e.g., a keyboard or a mouse) through which the user can provide input to the computer. Other types of devices may also be used to provide interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback), and input from the user may be received in any form, including acoustic, speech, or tactile input.

The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include a LAN (Local Area Network), a WAN (Wide Area Network), the Internet, and a blockchain network.

The computer system may include clients and servers. A client and a server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, also called a cloud computing server or a cloud host, which is a host product in a cloud computing service system that addresses the high management difficulty and weak service scalability of traditional physical hosts and VPS (Virtual Private Server) services. The server may also be a server of a distributed system, or a server incorporating a blockchain.

It should be noted that artificial intelligence is the discipline of studying how to make computers simulate certain human thinking processes and intelligent behaviors (such as learning, reasoning, thinking, and planning), and it includes both hardware-level and software-level technologies. Artificial intelligence hardware technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, and the like; artificial intelligence software technologies mainly include computer vision, speech recognition, natural language processing, machine learning/deep learning, big data processing, knowledge graph technologies, and the like.

It should be understood that the various forms of flows shown above may be used with steps reordered, added, or deleted. For example, the steps described in this disclosure may be performed in parallel, sequentially, or in a different order, as long as the desired results of the technical solutions provided by this disclosure can be achieved; no limitation is imposed herein.

The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present disclosure should be included in the scope of protection of the present disclosure.

Claims

1. An information processing method comprising:

in response to a risk assessment request of a target account, obtaining raw data associated with the target account from at least one data source;

2. The method of claim 1, wherein the performing, according to the configuration information, feature extraction corresponding to a target feature extraction dimension on the raw data of the at least one data source to obtain at least one dimension feature of the target account comprises:

3. The method of claim 1, wherein the performing, according to the configuration information, feature extraction corresponding to a target feature extraction dimension on the raw data of the at least one data source to obtain at least one dimension feature of the target account comprises:

determining a second feature extraction dimension corresponding to the information units which are related to each other according to the configuration information;

4. The method according to any one of claims 1 to 3, wherein before performing feature extraction corresponding to a target feature extraction dimension on the raw data of the at least one data source according to the configuration information to obtain at least one dimension feature of the target account, the method further comprises:

and converting the format of the raw data of the at least one data source according to the data format indicated by the configuration information.
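
By way of non-limiting illustration only, the format conversion recited in claim 4 might be sketched as follows. The sketch assumes that the configuration information maps each field name to a format name; the converter table `CONVERTERS`, the function `convert_format`, the format names, and the sample fields are all hypothetical and do not appear in the disclosure.

```python
from datetime import datetime
from typing import Any, Callable, Dict

# Hypothetical converters, keyed by the format names used in the configuration.
CONVERTERS: Dict[str, Callable[[str], Any]] = {
    "int": int,
    "float": float,
    "date": lambda text: datetime.strptime(text, "%Y-%m-%d").date(),
    "str": str,
}


def convert_format(raw: Dict[str, str], config: Dict[str, str]) -> Dict[str, Any]:
    """Convert each raw field to the data format indicated by the configuration.

    Fields without a configured format are passed through unchanged.
    """
    converted: Dict[str, Any] = {}
    for field, value in raw.items():
        fmt = config.get(field)
        converted[field] = CONVERTERS[fmt](value) if fmt else value
    return converted


# Assumed configuration information and raw data for one data source.
config = {"age": "int", "balance": "float", "opened": "date"}
raw = {"age": "42", "balance": "10.5", "opened": "2022-01-27", "name": "x"}
record = convert_format(raw, config)
```

In this sketch, supporting a new data source only requires a new configuration entry rather than new conversion code, which is in line with the reuse the disclosure aims for.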

5. An information generating method, comprising:

screening target combinations from the candidate combinations according to the performance data of the risk evaluation models corresponding to the candidate combinations, and determining target feature extraction dimensions corresponding to the at least one data source according to the target combinations;
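
By way of non-limiting illustration only, the screening step recited above might be sketched as follows. The sketch assumes that the performance data of each risk evaluation model is a single score for which higher is better, and that the best-scoring combinations are kept; the names `screen_target_combinations`, `target_dimensions`, and `top_k`, as well as the dimension names and scores, are hypothetical.

```python
from typing import Dict, FrozenSet, List, Set

# Hypothetical: a candidate combination is a set of feature extraction dimension names.
Combination = FrozenSet[str]


def screen_target_combinations(
    performance: Dict[Combination, float], top_k: int = 1
) -> List[Combination]:
    """Return the top_k candidate combinations ranked by score (higher assumed better)."""
    ranked = sorted(performance, key=lambda combo: performance[combo], reverse=True)
    return ranked[:top_k]


def target_dimensions(targets: List[Combination]) -> Set[str]:
    """Take the union of the dimensions contained in the selected combinations."""
    dims: Set[str] = set()
    for combo in targets:
        dims |= combo
    return dims


# Assumed performance data of the model trained for each candidate combination.
performance = {
    frozenset({"mean_7d"}): 0.71,
    frozenset({"mean_7d", "max_30d"}): 0.78,
    frozenset({"count_1d"}): 0.64,
}
targets = screen_target_combinations(performance, top_k=1)
dimensions = target_dimensions(targets)
```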

6. The method of claim 5, wherein said obtaining a time series of sample data from at least one data source comprises:

according to a set interval, determining a target attribute item from a plurality of attribute items of the at least one data source, wherein the number of candidate values of the target attribute item is within the set interval;

acquiring sample data of the target attribute item at a plurality of monitoring moments from the at least one data source;

and generating the time sequence according to the sample data of the target attribute item at a plurality of monitoring moments.
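
By way of non-limiting illustration only, the time series generation recited in claim 6 might be sketched as follows. The sketch assumes that the sample data at each monitoring moment is a record of attribute values and that the number of candidate values of an attribute item is counted as the number of distinct values observed; the names `select_target_attributes` and `build_time_series`, the record layout, and the example interval are hypothetical.

```python
from typing import Any, Dict, List, Tuple

Record = Dict[str, Any]  # hypothetical: attribute values at one monitoring moment


def select_target_attributes(
    records: List[Record], interval: Tuple[int, int]
) -> List[str]:
    """Keep attribute items whose count of distinct values lies within the set interval."""
    low, high = interval
    candidates: Dict[str, set] = {}
    for record in records:
        for name, value in record.items():
            # Values are assumed to be hashable.
            candidates.setdefault(name, set()).add(value)
    return sorted(n for n, vals in candidates.items() if low <= len(vals) <= high)


def build_time_series(
    snapshots: Dict[int, Record], targets: List[str]
) -> List[Tuple[int, Record]]:
    """Order the sample data of the target attribute items by monitoring moment."""
    return [
        (moment, {name: snapshots[moment].get(name) for name in targets})
        for moment in sorted(snapshots)
    ]


# Assumed sample data, keyed by monitoring moment.
snapshots = {
    3: {"account_id": "a3", "channel": "web", "status": "ok"},
    1: {"account_id": "a1", "channel": "app", "status": "ok"},
    2: {"account_id": "a2", "channel": "web", "status": "ok"},
}
targets = select_target_attributes(list(snapshots.values()), interval=(2, 2))
series = build_time_series(snapshots, targets)
```

Here the interval excludes both `account_id`, which takes a different value in every record, and `status`, which never varies, leaving only `channel` as a target attribute item.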

7. An information processing apparatus comprising:

the first feature extraction module is used for performing feature extraction corresponding to target feature extraction dimensions on the original data of the at least one data source according to the configuration information to obtain at least one dimension feature of the target account;

8. The apparatus of claim 7, wherein the first feature extraction module is further configured to:

9. The apparatus of claim 7, wherein the first feature extraction module is further configured to:

10. The apparatus of any of claims 7-9, wherein the apparatus further comprises:

and the format conversion module is used for performing format conversion on the original data of the at least one data source according to the data format indicated by the configuration information.

11. An information processing apparatus comprising:

the second feature extraction module is used for respectively extracting features of sample data in the time sequence according to candidate combinations of a plurality of feature extraction dimensions to obtain feature time sequences corresponding to the candidate combinations;

12. The apparatus of claim 11, wherein the second obtaining means comprises:

13. An electronic device, comprising:

at least one processor; and

the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-4 or to perform the method of any one of claims 5-6.

14. A non-transitory computer readable storage medium having stored thereon computer instructions for causing a computer to perform the method of any one of claims 1-4 or the method of any one of claims 5-6.

15. A computer program product comprising a computer program which, when executed by a processor, implements the method of any of claims 1-4 or performs the method of any of claims 5-6.