CN113806641A - Deep learning-based recommendation method and device, electronic equipment and storage medium - Google Patents


Info

Publication number
CN113806641A
CN113806641A (application CN202111110476.7A)
Authority
CN
China
Prior art keywords
data
data group
group
cluster
historical
Prior art date
Legal status
Granted
Application number
CN202111110476.7A
Other languages
Chinese (zh)
Other versions
CN113806641B (en)
Inventor
汪誉
田鸥
葛艺舟
高民
肖地长
熊嘉娜
Current Assignee
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd
Priority to CN202111110476.7A
Publication of CN113806641A
Application granted
Publication of CN113806641B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9536 Search customisation based on social or collaborative filtering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/906 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10 Services
    • G06Q50/18 Legal services

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Tourism & Hospitality (AREA)
  • Technology Law (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Economics (AREA)
  • Human Resources & Organizations (AREA)
  • Marketing (AREA)
  • Primary Health Care (AREA)
  • Strategic Management (AREA)
  • General Business, Economics & Management (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The application relates to the field of artificial intelligence and discloses a deep learning-based recommendation method and apparatus, an electronic device, and a storage medium. The recommendation method comprises the following steps: obtaining a modeling sample set; clustering at least one piece of historical delegation data to obtain at least one cluster; performing data retrieval according to each piece of historical delegation data to obtain at least one first data group; preprocessing each first data group in the at least one first data group to obtain at least one second data group; inputting the at least one second data group into a deep learning model for training to obtain a ranking model; obtaining a pending delegation (a delegation for which a lawyer is to be recommended) and determining a first cluster among the at least one cluster according to the pending delegation; acquiring a lawyer data group corresponding to the first cluster; and inputting the lawyer data group and the pending delegation into the ranking model to obtain target lawyer data, and recommending the target lawyer data to the requester of the pending delegation.

Description

Deep learning-based recommendation method and device, electronic equipment and storage medium
Technical Field
The invention relates to the technical field of artificial intelligence, in particular to a deep learning-based recommendation method and device, electronic equipment and a storage medium.
Background
At present, judicial litigation is the last line of defense by which companies associated with financial assets protect national financial claims; it is also a common means of pursuing debts in economic activity and one of the important ways to press effectively for the collection of non-performing assets. However, personal loan business is characterized by small credit amounts, large numbers of clients, and scattered client sources, so the difficulty of litigation is particularly prominent. The root cause is the low accuracy of lawyer selection: the degree of match between lawyer and delegation is low, and the demands of financial institutions are mismatched with legal resources. As a result, lawsuits cannot be followed up in time, the money-recovery rate is low, the disposal efficiency of non-performing assets is greatly reduced, the progress of litigation is affected, and collection costs rise.
Disclosure of Invention
To address the above problems in the prior art, the embodiments of the present application provide a deep learning-based recommendation method, apparatus, electronic device, and storage medium, which can improve the fine-grained management of non-performing-asset collection and disposal by recommending lawyers accurately according to the particular characteristics of each delegated matter.
In a first aspect, an embodiment of the present application provides a deep learning-based recommendation method, including:
obtaining a modeling sample set, wherein the modeling sample set comprises at least one piece of historical delegation data, and each piece of historical delegation data records one complete non-performing-asset delegation;
clustering the at least one piece of historical delegation data to obtain at least one cluster;
performing data retrieval according to each piece of historical delegation data to obtain at least one first data group, wherein the at least one first data group corresponds one-to-one with the at least one piece of historical delegation data;
preprocessing each first data group in the at least one first data group to obtain at least one second data group, wherein the at least one second data group corresponds one-to-one with the at least one first data group;
inputting the at least one second data group into a deep learning model for training to obtain a ranking model;
obtaining a pending delegation, and determining a first cluster among the at least one cluster according to the pending delegation, wherein the historical delegations corresponding to the first cluster have the highest similarity to the pending delegation;
acquiring a lawyer data group corresponding to the first cluster;
inputting the lawyer data group and the pending delegation into the ranking model to obtain target lawyer data, and recommending the target lawyer data to the requester of the pending delegation.
In a second aspect, an embodiment of the present application provides a deep learning-based recommendation apparatus, including:
an acquisition module, configured to acquire a modeling sample set, wherein the modeling sample set comprises at least one piece of historical delegation data, and each piece of historical delegation data records one complete non-performing-asset delegation;
a clustering module, configured to cluster the at least one piece of historical delegation data to obtain at least one cluster;
a preprocessing module, configured to perform data retrieval according to each piece of historical delegation data to obtain at least one first data group, wherein the at least one first data group corresponds one-to-one with the at least one piece of historical delegation data, and to preprocess each first data group in the at least one first data group to obtain at least one second data group, wherein the at least one second data group corresponds one-to-one with the at least one first data group;
a training module, configured to input the at least one second data group into a deep learning model for training to obtain a ranking model;
and a recommendation module, configured to obtain a pending delegation; determine a first cluster among the at least one cluster according to the pending delegation, wherein the historical delegations corresponding to the first cluster have the highest similarity to the pending delegation; acquire a lawyer data group corresponding to the first cluster; input the lawyer data group and the pending delegation into the ranking model to obtain target lawyer data; and recommend the target lawyer data to the requester of the pending delegation.
In a third aspect, an embodiment of the present application provides an electronic device, including: a processor coupled to the memory, the memory for storing a computer program, the processor for executing the computer program stored in the memory to cause the electronic device to perform the method of the first aspect.
In a fourth aspect, embodiments of the present application provide a computer-readable storage medium having a computer program stored thereon, the computer program causing a computer to perform the method according to the first aspect.
In a fifth aspect, embodiments of the present application provide a computer program product comprising a non-transitory computer-readable storage medium storing a computer program, the computer program being operable to cause a computer to perform the method according to the first aspect.
The implementation of the embodiment of the application has the following beneficial effects:
In the embodiments of the present application, a modeling sample set is formed by acquiring multiple pieces of historical delegation data, each recording one complete non-performing-asset delegation. Based on this modeling sample set, the at least one piece of historical delegation data it contains is clustered into N clusters, each of which contains the pairing relations between the delegated matters in that cluster and their lawyers. Then data retrieval is performed for each piece of historical delegation data: retrieval data related to each piece is acquired to supplement it, and each supplemented piece is preprocessed to obtain the training samples with which the deep learning model is trained into a ranking model. Next, a real-time pending delegation is compared against the N clusters; the cluster whose business characteristics are most similar to those of the pending delegation is identified, and the lawyer data group corresponding to that cluster is acquired. Finally, the lawyer data group and the pending delegation are input into the ranking model to obtain target lawyer data, which is recommended to the requester of the pending delegation. In this way, historical non-performing assets are divided into several asset groups by a clustering algorithm and associated with lawyer groups, achieving accurate partitioning of non-performing assets and an intelligent recall strategy for recommendation.
Then, for the lawyers within each asset group, the recovery rate within the delegation period is predicted and a fine-grained ranking is performed by recovery effect. This effectively reduces computational complexity, greatly improves recommendation efficiency and accuracy, and raises the fine-grained management of non-performing-asset collection and disposal.
Drawings
To illustrate the technical solutions in the embodiments of the present application more clearly, the drawings needed in the description of the embodiments are briefly introduced below. The drawings described below show some embodiments of the present application; other drawings can be obtained from them by those of ordinary skill in the art without creative effort.
Fig. 1 is a schematic hardware structure diagram of a deep learning-based recommendation device according to an embodiment of the present application;
fig. 2 is a schematic flowchart of a deep learning-based recommendation method according to an embodiment of the present disclosure;
fig. 3 is a schematic flowchart of a method for preprocessing each first data group in at least one first data group to obtain at least one second data group according to an embodiment of the present disclosure;
fig. 4 is a schematic flowchart of a method for performing data expansion on at least one first data group according to at least one set of feature tags to obtain at least one third data group according to an embodiment of the present application;
fig. 5 is a schematic flowchart of a method for performing data cleaning on each of at least one third data group to obtain at least one second data group according to an embodiment of the present application;
fig. 6 is a block diagram illustrating functional modules of a deep learning based recommendation device according to an embodiment of the present disclosure;
fig. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is apparent that the described embodiments are some embodiments of the present application, but not all embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art without any inventive work based on the embodiments in the present application are within the scope of protection of the present application.
The terms "first," "second," "third," and "fourth," etc. in the description and claims of this application and in the accompanying drawings are used for distinguishing between different objects and not for describing a particular order. Furthermore, the terms "comprising" and "having," as well as any variations thereof, are intended to cover non-exclusive inclusions. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those steps or elements listed, but may alternatively include other steps or elements not listed, or inherent to such process, method, article, or apparatus.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the present application. The appearances of this phrase in various places in the specification do not necessarily all refer to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Those skilled in the art understand, explicitly and implicitly, that the embodiments described herein can be combined with other embodiments.
First, referring to fig. 1, fig. 1 is a schematic diagram of a hardware structure of a deep learning-based recommendation device according to an embodiment of the present disclosure. The deep learning based recommendation device 100 comprises at least one processor 101, a communication line 102, a memory 103 and at least one communication interface 104.
In this embodiment, the processor 101 may be a central processing unit (CPU), a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits for controlling the execution of programs according to the present disclosure.
The communication line 102, which may include a path, carries information between the aforementioned components.
The communication interface 104 may be any transceiver or similar device (e.g., an antenna) for communicating with other devices or communication networks, such as an Ethernet, a radio access network (RAN), or a wireless local area network (WLAN).
The memory 103 may be, but is not limited to, a read-only memory (ROM) or another type of static storage device capable of storing static information and instructions, a random-access memory (RAM) or another type of dynamic storage device capable of storing information and instructions, an electrically erasable programmable read-only memory (EEPROM), a compact-disc read-only memory (CD-ROM) or other optical disc storage (including compact discs, laser discs, digital versatile discs, Blu-ray discs, etc.), magnetic disk storage media or other magnetic storage devices, or any other medium that can carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer.
In this embodiment, the memory 103 may be independent and connected to the processor 101 through the communication line 102. The memory 103 may also be integrated with the processor 101. The memory 103 provided in the embodiments of the present application may generally have a nonvolatile property. The memory 103 is used for storing computer-executable instructions for executing the scheme of the application, and is controlled by the processor 101 to execute. The processor 101 is configured to execute computer-executable instructions stored in the memory 103, thereby implementing the methods provided in the embodiments described below in the present application.
In alternative embodiments, computer-executable instructions may also be referred to as application code, which is not specifically limited in this application.
In alternative embodiments, processor 101 may include one or more CPUs, such as CPU0 and CPU1 of FIG. 1.
In an alternative embodiment, the deep learning based recommendation device 100 may include multiple processors, such as the processor 101 and the processor 107 of fig. 1. Each of these processors may be a single-core (single-CPU) processor or a multi-core (multi-CPU) processor. A processor herein may refer to one or more devices, circuits, and/or processing cores for processing data (e.g., computer program instructions).
In an alternative embodiment, if the deep learning based recommendation device 100 is a server, for example, the server may be an independent server, or may be a cloud server that provides basic cloud computing services such as cloud service, cloud database, cloud computing, cloud function, cloud storage, web service, cloud communication, middleware service, domain name service, security service, Content Delivery Network (CDN), big data and artificial intelligence platform, and the like. The deep learning based recommendation apparatus 100 may further include an output device 105 and an input device 106. The output device 105 is in communication with the processor 101 and may display information in a variety of ways. For example, the output device 105 may be a Liquid Crystal Display (LCD), a Light Emitting Diode (LED) display device, a Cathode Ray Tube (CRT) display device, a projector (projector), or the like. The input device 106 is in communication with the processor 101 and may receive user input in a variety of ways. For example, the input device 106 may be a mouse, a keyboard, a touch screen device, or a sensing device, among others.
The deep learning based recommendation apparatus 100 may be a general-purpose device or a special-purpose device. The present embodiment does not limit the type of the recommendation device 100 based on deep learning.
Next, it should be noted that the embodiments disclosed in the present application may acquire and process related data based on artificial intelligence technology. Among them, Artificial Intelligence (AI) is a theory, method, technique and application system that simulates, extends and expands human Intelligence using a digital computer or a machine controlled by a digital computer, senses the environment, acquires knowledge and uses the knowledge to obtain the best result.
The artificial intelligence infrastructure generally includes technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, mechatronics, and the like. The artificial intelligence software technology mainly comprises a computer vision technology, a robot technology, a biological recognition technology, a voice processing technology, a natural language processing technology, machine learning/deep learning and the like.
Furthermore, the embodiments disclosed in the present application can also be applied in medical scenarios. Specifically, in a medical application scenario, the input sample may be a medical image, and the type of object contained in the input sample is a lesion, that is, a part of the body where pathological change occurs. Medical images are images of internal tissues (e.g., the stomach, abdomen, heart, knee, or brain) obtained non-invasively for medical treatment or research, such as images generated by medical instruments: CT (computed tomography), MRI (magnetic resonance imaging), US (ultrasound), X-ray images, electroencephalograms, and optical photography.
In one possible embodiment, the input data may be medical data such as personal health records, prescriptions, and examination reports. In another possible embodiment, the input text may be medical text, such as an electronic healthcare record or a personal electronic health record: a series of electronic records worth preserving for future reference, including medical histories, electrocardiograms, and medical images.
In this embodiment, each of the input samples can be obtained quickly by querying information. For example, in the medical field, when case data is required as input data, medical record information required by a user can be queried from a vast number of electronic medical records based on an artificial intelligence model.
In the following, the deep learning based recommendation method disclosed in the present application will be explained:
referring to fig. 2, fig. 2 is a schematic flowchart of a deep learning-based recommendation method according to an embodiment of the present disclosure. The deep learning-based recommendation method comprises the following steps of:
201: a set of modeling samples is obtained.
In this embodiment, the modeling sample set includes at least one piece of historical delegation data, each piece of which records one complete non-performing-asset delegation. Illustratively, the historical delegation data may be obtained from an intra-bank loan information system, a banking transaction system, a non-performing-asset handling system, or the like.
202: and clustering at least one historical entrusted data to obtain at least one cluster.
In this embodiment, the behavior characteristics in the historical delegation data are not hierarchically classified based on the business scenarios recommended by lawyers. Therefore, clustering and merging can be performed according to the characteristics of the historical entrusted data, and lawyers corresponding to the aggregated historical entrusted data are integrated to form a related lawyer group. Therefore, at least one piece of historical entrusting data is divided into at least one cluster, and each cluster comprises the pairing relation between the entrusting business in the cluster and the lawyers.
In this embodiment, the at least one piece of historical delegation data may be clustered by five algorithms: BIRCH (hierarchy-based), KMeans (partition-based), DBSCAN (density-based), GMM (model-based), and spectral clustering. Specifically, a clustering model is first constructed for each clustering algorithm, with its parameters initialized during construction. Then the number of clusters is selected according to the Calinski-Harabasz (CH) index and the mean silhouette coefficient, and the optimal parameters of each clustering model are determined in combination with the sample-size distribution within each cluster. Finally, the optimal number of clusters for each algorithm is determined from the optimal parameter combination.
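The cluster-number selection described above can be sketched as follows. This is a minimal illustration on synthetic data (not the patent's actual asset features), assuming scikit-learn is available; only KMeans is shown of the five algorithms, scored with the CH index and mean silhouette coefficient:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import calinski_harabasz_score, silhouette_score

# Synthetic stand-in for the historical-delegation feature vectors
X, _ = make_blobs(n_samples=300,
                  centers=[[0, 0], [8, 8], [0, 8], [8, 0]],
                  cluster_std=0.8, random_state=0)

scores = {}
for k in range(2, 8):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    # Both metrics reward tight, well-separated clusters
    scores[k] = (calinski_harabasz_score(X, labels),
                 silhouette_score(X, labels))

# Pick the cluster count that maximizes the CH index
best_k = max(scores, key=lambda k: scores[k][0])
```

In practice the same scoring loop would be run for each of the five algorithms, and the sample-size distribution within each cluster checked before fixing the final parameters.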
In this embodiment, the spectral algorithm needs only the similarity matrix between data points, so it is effective for sparse data and well suited to large samples of high-dimensional sparse data. The lawyer-recommendation scenario in this application uses a large number of asset characteristics and behavior data with many sparse feature dimensions, which spectral clustering handles well. On this basis, the clustering of the at least one piece of historical delegation data is described below taking the spectral algorithm as an example.
Specifically, each piece of historical delegation data is mapped to a point in space according to its characteristics, and the points are connected by edges. The edge between two distant points has a low weight, while the edge between two nearby points has a high weight. The graph formed by all data points is then cut so that the sum of edge weights between different subgraphs is as low as possible while the sum of edge weights within each subgraph is as high as possible. The optimal cut thus obtained partitions the at least one piece of historical delegation data into several clusters, achieving the goal of clustering.
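The graph-cut idea above can be sketched with plain NumPy. This is not the patent's implementation: the RBF similarity, the bandwidth `sigma`, and the two-way cut via the sign of the Fiedler vector are illustrative choices for a minimal spectral bipartition:

```python
import numpy as np

def spectral_bipartition(points, sigma=1.0):
    """Split points into two clusters via a normalized graph cut.

    Builds an RBF similarity graph (closer points get higher edge
    weights), then uses the sign of the Fiedler vector of the
    normalized Laplacian as the partition.
    """
    X = np.asarray(points, dtype=float)
    # Pairwise squared distances -> RBF edge weights
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.exp(-d2 / (2 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    # Symmetric normalized Laplacian L = I - D^{-1/2} W D^{-1/2}
    d = W.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    L = np.eye(len(X)) - D_inv_sqrt @ W @ D_inv_sqrt
    # The second-smallest eigenvector (Fiedler vector) encodes the cut
    vals, vecs = np.linalg.eigh(L)
    fiedler = vecs[:, 1]
    return (fiedler > 0).astype(int)

# Two well-separated groups of "historical delegation" feature points
pts = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1),
       (3.0, 3.0), (3.1, 3.2), (3.2, 3.1)]
labels = spectral_bipartition(pts)
```

For more than two clusters one would instead run k-means on the first k eigenvectors, which is what library implementations such as scikit-learn's SpectralClustering do.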
203: and performing data retrieval according to each historical entrusted data to obtain at least one first data group.
In this embodiment, at least one first data set corresponds to at least one historical delegation data in a one-to-one correspondence. In this embodiment, in order to enable each historical entrustment data to reflect all features in one entrustment more comprehensively, keyword extraction may be performed on each historical entrustment, and then each historical entrustment data may be supplemented according to data corresponding to keyword search to obtain at least one first data group.
Specifically, the extracted keywords may be: the name of the owner of the business corresponding to each historical entrusted data, the name of the lawyer who handles the business corresponding to each historical entrusted data, and the name of the business corresponding to each historical entrusted data. Based on this, through searching, the personal data of the owner of the business corresponding to each historical delegation data, the personal data of the lawyer who handles the business corresponding to each historical delegation data, and the business data of the business corresponding to each historical delegation data can be obtained as supplements to each historical delegation data.
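The supplementing step can be sketched as keyword lookups against three retrieval backends. The lookup tables, names, and field values below are hypothetical placeholders; the patent does not specify the retrieval systems:

```python
# Hypothetical lookup tables standing in for the retrieval backends
OWNER_PROFILES = {"Zhang San": {"age": 45, "region": "Shenzhen"}}
LAWYER_PROFILES = {"Li Si": {"cases_won": 120, "specialty": "debt recovery"}}
BUSINESS_RECORDS = {"personal-loan-0001": {"principal": 50_000, "overdue_days": 210}}

def build_first_data_group(delegation):
    """Supplement one historical delegation record with retrieved data.

    The extracted keywords are the owner name, the handling lawyer's
    name, and the business name; each is used as a retrieval key.
    """
    group = dict(delegation)
    group["owner_profile"] = OWNER_PROFILES.get(delegation["owner"], {})
    group["lawyer_profile"] = LAWYER_PROFILES.get(delegation["lawyer"], {})
    group["business_record"] = BUSINESS_RECORDS.get(delegation["business"], {})
    return group

delegation = {"owner": "Zhang San", "lawyer": "Li Si",
              "business": "personal-loan-0001"}
first_group = build_first_data_group(delegation)
```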
204: and preprocessing each first data group in the at least one first data group to obtain at least one second data group.
In this embodiment, the at least one second data group and the at least one first data group correspond one to one. Specifically, the present application provides a method for preprocessing each of at least one first data set to obtain at least one second data set, as shown in fig. 3, the method includes:
301: and performing feature extraction on each first data group in the at least one first data group to obtain at least one group of feature labels.
In this embodiment, at least one set of feature tags corresponds to at least one first data set, and each of the at least one set of feature tags is used to declare a feature of an asset or a lawyer's ability in historical committee data corresponding to each first data set. In particular, each set of feature tags of the at least one set of feature tags may comprise: a client portrait label, an asset portrait label, and a lawyer portrait label.
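One set of feature tags per first data group can be represented as a simple record with the three portrait groups. The concrete tag fields below are hypothetical, since the patent does not enumerate them:

```python
from dataclasses import dataclass, field

@dataclass
class FeatureTags:
    """One set of feature tags for a first data group (step 301)."""
    client_portrait: dict = field(default_factory=dict)   # e.g. repayment habits
    asset_portrait: dict = field(default_factory=dict)    # e.g. loan size, overdue age
    lawyer_portrait: dict = field(default_factory=dict)   # e.g. win rate, specialty

tags = FeatureTags(
    client_portrait={"silent_client": False},
    asset_portrait={"overdue_days": 210},
    lawyer_portrait={"recovery_rate": 0.37},
)
```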
302: and performing data expansion on at least one first data group according to at least one group of feature tags to obtain at least one third data group.
In this embodiment, at least one third data group corresponds to at least one first data group. Specifically, the present application provides a method for performing data expansion on at least one first data set according to at least one set of feature tags to obtain at least one third data set, as shown in fig. 4, the method includes:
401: and inputting the client portrait label and the asset portrait label corresponding to each first data set into a client relationship management model for data derivation to obtain at least one derived client data.
In the present embodiment, each first data set is data-derived through a customer relationship management (RFM) model, that is, at least one derived business data and at least one first data set are in one-to-one correspondence. In the RFM model, R represents the last consumption Recency, F represents the consumption Frequency, and M represents the consumption amount Monetary. Specifically, the preference of the client for the asset service can be reflected by calculating F, the value contributed by the client can be reflected by calculating M, and whether the client is a silent client or not can be reflected by calculating R. Therefore, the RFM characteristics of the business owner corresponding to each first data group are constructed, and then a series of character data which are in accordance with the RFM characteristics are derived through the data derivation device to serve as derived customer data.
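The R, F, and M features described above can be computed as follows. This is a minimal sketch assuming a transaction history of (date, amount) pairs per business owner; the record layout is hypothetical:

```python
from datetime import date

def rfm_features(transactions, today):
    """Compute Recency / Frequency / Monetary features for one client.

    transactions: list of (date, amount) pairs for a business owner.
    """
    if not transactions:
        return {"R": None, "F": 0, "M": 0.0}
    last = max(d for d, _ in transactions)
    return {
        "R": (today - last).days,              # days since last activity
        "F": len(transactions),                # how often the client transacts
        "M": sum(a for _, a in transactions),  # total value contributed
    }

txns = [(date(2021, 3, 1), 1200.0), (date(2021, 6, 15), 800.0)]
features = rfm_features(txns, today=date(2021, 9, 1))
```

A large R relative to some cutoff would flag a silent client; the derived client data would then be feature rows consistent with these R/F/M values.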
402: inputting the lawyer portrait label and the asset portrait label corresponding to each first data set into a client relationship management model for data derivation to obtain at least one derived lawyer data.
In this embodiment, the at least one derivative attorney data corresponds one-to-one with the at least one first data set. Meanwhile, the method for generating derived attorney data is similar to the method for generating derived client data in step 401, and is not described herein again.
403: and combining each first data group, the derivative business data corresponding to each first data group and the derivative attorney data corresponding to each first data group to obtain at least one third data group.
In this embodiment, the derived client data and the derived attorney data generated based on the same first data set may be combined with the first data set to obtain a third data set.
303: and performing data cleaning on each third data group in the at least one third data group to obtain at least one second data group.
In this embodiment, at least one second data group corresponds to at least one third data group one to one. Specifically, the present application provides a method for performing data cleansing on each of at least one third data set to obtain at least one second data set, as shown in fig. 5, where the method includes:
501: and screening at least one fourth data group from the at least one third data group for completion according to a preset completion rule to obtain at least one fifth data group.
In the present embodiment, the at least one fifth data group corresponds one-to-one with the at least one fourth data group. Illustratively, the at least one third data group may be filtered by determining a missing rate for each third data group. Specifically, the missing rate is the proportion that the missing part of the data structure of each third data group accounts for, relative to the standard data structure of the data type corresponding to that third data group.
Illustratively, suppose the data structure of some third data group is [data type 1; data type 2; data type 5; data type 8], while the standard data structure of the corresponding data type is [data type 1; data type 2; data type 3; data type 5; data type 6; data type 8]. Relative to the standard data structure, the missing data types of this third data group are data type 3 and data type 6, so the number missing is 2; meanwhile, the number of data types in the standard data structure is 6, so the missing rate of this third data group is 2/6 ≈ 33%.
Thus, in the present embodiment, a third data group whose missing rate is smaller than the fourth threshold can be taken as a fourth data group. In short, if the missing rate of a certain third data group exceeds the fourth threshold, the data of that third data group is seriously incomplete; even if the missing values are technically filled in, the deficiency of the underlying data leaves the completed data insufficiently accurate, ultimately producing garbage data that harms the training of the model. Therefore, third data groups whose missing rate exceeds the fourth threshold can be discarded directly, improving processing efficiency. Specifically, the fourth threshold may be 30%.
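The missing-rate screening above can be sketched as follows, assuming each data group's structure is represented as a list of data-type names (a representation chosen here purely for illustration):

```python
def missing_rate(group_types, standard_types):
    """Proportion of the standard data structure absent from a third
    data group (both given as lists of data-type names)."""
    missing = [t for t in standard_types if t not in set(group_types)]
    return len(missing) / len(standard_types)

def screen_groups(groups, standard_types, fourth_threshold=0.30):
    """Keep only groups whose missing rate is below the fourth
    threshold (30% in the embodiment); the rest are discarded."""
    return [g for g in groups
            if missing_rate(g, standard_types) < fourth_threshold]
```

On the worked example, the group missing data types 3 and 6 out of a six-type standard structure has a missing rate of 2/6 ≈ 33% and is therefore discarded.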
Then, for each fourth data group passing the screening, a completion method corresponding to the data type can be obtained according to the data type of each fourth data group, and each fourth data group is completed to obtain at least one fifth data group.
In this embodiment, since the at least one fourth data group includes different types of data, and the data characteristics differ between types, different completion methods can be used for samples of different data types. Illustratively, the completion methods may include neighbor supplementation, median supplementation, and mean supplementation.
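A sketch of the three completion methods named above, applied to one numeric column with `None` marking missing entries (the per-type choice of method and the reading of "neighbor supplementation" as carrying the nearest preceding value forward are assumptions):

```python
import statistics

def impute(values, method):
    """Fill None entries in a numeric column using one of the
    completion methods named in the embodiment."""
    known = [v for v in values if v is not None]
    if method == "mean":
        fill = statistics.mean(known)
    elif method == "median":
        fill = statistics.median(known)
    elif method == "neighbor":
        # Neighbor supplementation: carry the nearest known preceding
        # value forward (one simple reading of the term).
        out, last = [], known[0]
        for v in values:
            last = v if v is not None else last
            out.append(last)
        return out
    else:
        raise ValueError(method)
    return [fill if v is None else v for v in values]
```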
502: for each of the at least one fifth data set, a scrambling code rate of each fifth data set is determined.
In this embodiment, the character set expected for each fifth data group may be obtained according to the data type of that data group. The number of characters present in the fifth data group that do not belong to this character set is then determined, and the ratio of this number to the total number of characters in the fifth data group is taken as the scrambling code rate (garbled-character rate) of the fifth data group.
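The computation above reduces to a one-pass count (the character set would in practice be looked up by data type; here it is simply passed in):

```python
def scrambling_rate(text, charset):
    """Garbled-character rate of a fifth data group: the share of its
    characters that fall outside the character set expected for its
    data type."""
    if not text:
        return 0.0
    bad = sum(1 for ch in text if ch not in charset)
    return bad / len(text)
```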
503: and respectively carrying out discrete processing on each fifth data group, and determining the number of code values obtained after each fifth data group is discrete.
504: and determining at least one sixth data group in the at least one fifth data group according to the scrambling code rate of each fifth data group and the number of code values obtained after the dispersion of each fifth data group.
In this embodiment, the scrambling code rate corresponding to each sixth data group in the at least one sixth data group is not greater than the first threshold, and the number of code values obtained by discretizing each sixth data group is not greater than the second threshold. Put simply, if the scrambling rate of a certain fifth data group exceeds the first threshold, the data of that fifth data group is seriously garbled; even if the garbled characters are recovered and completed, the deficiency of the underlying data leaves the completed data insufficiently accurate, ultimately producing poor-quality data that harms the training of the model. Similarly, if the number of code values obtained after discretizing a certain fifth data group is greater than the second threshold, that fifth data group contains too many label points and its data is too dispersed, so high-quality data features cannot be extracted, which also harms the training of the model. Therefore, fifth data groups whose scrambling rate exceeds the first threshold, or whose number of discrete code values is greater than the second threshold, can be discarded directly, improving processing efficiency. Specifically, the first threshold may be 30%, and the second threshold may be 500.
505: and screening at least one seventh data group from the at least one sixth data group according to a preset importance rule to calculate the importance, so as to obtain at least one characteristic importance.
In the present embodiment, at least one feature importance degree corresponds one-to-one to at least one seventh data group.
In this embodiment, several sixth data groups with strong mutual correlation can be screened out as seventh data groups by calculating the correlation coefficient between each pair of sixth data groups, which sharpens the overall feature direction of the data and reduces the complexity of subsequent model training.
For example, a Spearman correlation coefficient between any two different ones of the at least one sixth data sets may be calculated to determine a correlation between the any two different ones of the sixth data sets.
Specifically, the Spearman correlation coefficient ρ can be expressed by formula ①:

ρ = 1 − (6 × Σ g_j²) / (T × (T² − 1)) …………①

wherein g_j represents the rank difference of the j-th pair of corresponding elements between any two different sixth data groups, and T represents the number of element pairs compared. In this embodiment, the several sixth data groups whose Spearman correlation coefficients are greater than 0.7 may be clustered to obtain at least one seventh data group.
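A pure-Python sketch of the Spearman rank correlation used above, ρ = 1 − 6Σg_j²/(T(T²−1)), valid in this simple closed form when there are no tied values (the function name is illustrative):

```python
def spearman(x, y):
    """Spearman correlation of two equal-length sequences, assuming
    no ties: rank both, take pairwise rank differences g_j, and apply
    rho = 1 - 6*sum(g_j^2) / (T*(T^2 - 1))."""
    T = len(x)

    def ranks(v):
        order = sorted(range(T), key=lambda i: v[i])
        r = [0] * T
        for rank, i in enumerate(order, start=1):
            r[i] = rank
        return r

    rx, ry = ranks(x), ranks(y)
    g2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * g2 / (T * (T * T - 1))
```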
In an alternative embodiment, a method of determining the correlation coefficient between any two different sixth data groups is also provided. Specifically, assuming that the two sixth data groups are respectively called sixth data group A and sixth data group B, a modulus operation may be performed on sixth data group A to obtain a first modulus, and on sixth data group B to obtain a second modulus. Then, the product of the first modulus and the second modulus is determined, as well as the inner product between sixth data group A and sixth data group B. Finally, the quotient of the inner product and the product value is taken as the correlation coefficient between sixth data group A and sixth data group B.
For example, the cosine value of the angle between the sixth data group a and the sixth data group B may be calculated by dot product, and the cosine value of the angle may be used as the correlation coefficient between the sixth data group a and the sixth data group B.
Specifically, for the sixth data group A = [a1, a2, …, ai, …, an] and the sixth data group B = [b1, b2, …, bi, …, bn], where i = 1, 2, …, n, the cosine of the included angle can be expressed by formula ②:

cos θ = (A·B) / (|A| × |B|) …………②

where A·B represents the inner product of sixth data group A and sixth data group B, | · | is the modulus symbol, |A| represents the modulus of sixth data group A, and |B| represents the modulus of sixth data group B.
Further, the inner product of sixth data group A and sixth data group B can be expressed by formula ③:

A·B = a1×b1 + a2×b2 + … + an×bn = Σ ai×bi …………③
Further, the modulus of sixth data group A can be expressed by formula ④:

|A| = √(a1² + a2² + … + an²) …………④
Finally, the cosine of the included angle is taken as the correlation coefficient between sixth data group A and sixth data group B. For example, the correlation coefficient d between sixth data group A and sixth data group B may be expressed by formula ⑤:

d = cos θ …………⑤
Because the value range of the cosine is [−1, 1], the cosine retains, even in high dimensions, the properties of equaling 1 when the directions coincide, 0 when they are orthogonal, and −1 when they are opposite. That is, the closer the cosine is to 1, the closer the directions of the two features are and the greater the correlation; the closer it is to −1, the more opposite their directions and the smaller the correlation; a value close to 0 indicates that the two features are nearly orthogonal, representing a relative difference in their directions. Thus, using the cosine as the correlation coefficient between sixth data group A and sixth data group B accurately represents the degree of correlation between them.
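The cosine-of-included-angle correlation described above combines the inner product and the two moduli directly (the function name is illustrative):

```python
import math

def cosine_correlation(A, B):
    """Correlation coefficient of two sixth data groups as the cosine
    of their included angle: inner product over the product of the
    two moduli."""
    inner = sum(a * b for a, b in zip(A, B))   # inner product A.B
    mod_a = math.sqrt(sum(a * a for a in A))   # modulus |A|
    mod_b = math.sqrt(sum(b * b for b in B))   # modulus |B|
    return inner / (mod_a * mod_b)
```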
In addition, in an optional implementation, keyword extraction may be performed on each sixth data group, and semantic extraction then performed on the extracted keywords to obtain a corresponding semantic vector. A correlation calculation is performed between this semantic vector and the field vector of the business field corresponding to the training; when the computed correlation coefficient is greater than a preset threshold, the correlation between that sixth data group and the business field corresponding to the training is determined to be strong. On this basis, sixth data groups whose correlation coefficient is greater than the preset threshold are taken as seventh data groups; specifically, the threshold of the correlation coefficient may be 0.8.
Meanwhile, in the present embodiment, the Information Value (IV) of each seventh data group may be calculated as the feature importance of that seventh data group. Specifically, the IV value may be expressed by formula ⑥:

IV = Σ (Py_k − Pc_k) × WOE_k, summed over k = 1, …, F …………⑥

wherein Py_k represents, after each seventh data group is binned, the proportion of the data in the k-th bin that do not meet the preset condition to the total data of the k-th bin; Pc_k represents the proportion of the data in the k-th bin that meet the preset condition to the total data of the k-th bin; WOE_k is the weight of evidence corresponding to the k-th bin; and F represents the number of bins obtained after each seventh data group is binned.
In the present embodiment, WOE_k can be expressed by formula ⑦:

WOE_k = ln(Py_k / Pc_k) …………⑦
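The IV/WOE computation can be sketched as follows. One assumption is flagged in the docstring: Py_k and Pc_k are computed here as each bin's share of all non-conforming / all conforming data, the standard WOE convention, whereas the patent's wording reads them as within-bin proportions; the standard reading is used because it is what makes IV a meaningful feature-importance score.

```python
import math

def information_value(bins):
    """IV of one seventh data group from its bins. Each bin is a pair
    (n_bad, n_good): counts failing / meeting the preset condition.
    Py_k and Pc_k are taken as the bin's share of all bad / all good
    data (standard WOE convention; an interpretive assumption here)."""
    total_bad = sum(b for b, _ in bins)
    total_good = sum(g for _, g in bins)
    iv = 0.0
    for n_bad, n_good in bins:
        py = n_bad / total_bad            # Py_k
        pc = n_good / total_good          # Pc_k
        iv += (py - pc) * math.log(py / pc)   # (Py_k - Pc_k) * WOE_k
    return iv
```

With this convention, a feature whose bins separate the two classes well gets a large IV, and one whose bins are uninformative gets an IV near zero.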
506: at least one second data set is determined from the at least one seventh data set on the basis of the at least one characteristic degree of importance.
In this embodiment, the feature importance degree corresponding to each of the at least one second data group is greater than the third threshold. Specifically, the third threshold may be 0.02.
205: and inputting the at least one second data group into the deep learning model for training to obtain a ranking model.
In this embodiment, the deep learning model may be an integrated model of Logistic Regression (LR) and gradient-boosting decision-tree regression (GBR).
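The patent does not specify how LR and the gradient-boosted trees are integrated. One common scheme, sketched below with scikit-learn under that assumption (all names and hyperparameters are illustrative), trains a GBDT first and then one-hot encodes its leaf indices as the input features of the LR:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import OneHotEncoder

# Toy stand-in for the second data groups: 200 samples, 5 features,
# with a synthetic binary target.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# Stage 1: GBDT learns feature crossings; each sample maps to one
# leaf per tree.
gbdt = GradientBoostingClassifier(n_estimators=20, max_depth=3).fit(X, y)
leaves = gbdt.apply(X)[:, :, 0]            # (n_samples, n_trees)

# Stage 2: one-hot encode the leaf indices and fit LR on them.
enc = OneHotEncoder(handle_unknown="ignore").fit(leaves)
lr = LogisticRegression(max_iter=1000).fit(enc.transform(leaves), y)

def score(samples):
    """Ranking score: LR probability over the GBDT leaf encoding."""
    lv = gbdt.apply(samples)[:, :, 0]
    return lr.predict_proba(enc.transform(lv))[:, 1]
```

The resulting probabilities can serve as the fine-ranking scores for the lawyer candidates of a commission.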
206: and obtaining a delegation to be recommended, and determining a first cluster in the at least one cluster according to the delegation to be recommended.
In this embodiment, the similarity between the historical delegations corresponding to the first cluster and the delegation to be recommended is the highest. Specifically, the delegation to be recommended may include entrusted business data and client data, where the entrusted business data identifies the bad asset to be disposed of, and the client data identifies the owner of that bad asset.
Based on this, in the present embodiment, a first similarity between the business data of each cluster in the at least one cluster and the entrusted business data may first be determined, together with a second similarity between the character data of each cluster and the client data. Then, a weight group is obtained according to the business type of the business data of each cluster, and the first similarity and the second similarity are weighted and summed according to the weight group to obtain the similarity between the delegation to be recommended and each cluster. Finally, the cluster with the highest similarity among the at least one cluster is taken as the first cluster.
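The weighted-sum cluster matching described above can be sketched as follows, assuming the two per-cluster similarities have already been computed and the weight group is a (w1, w2) pair looked up by business type (the representation and function names are illustrative):

```python
def cluster_similarity(first_sim, second_sim, weights):
    """Weighted sum of the business-data similarity and the
    client-data similarity for one cluster."""
    w1, w2 = weights
    return w1 * first_sim + w2 * second_sim

def pick_first_cluster(sims_per_cluster, weight_groups):
    """Index of the cluster with the highest weighted similarity to
    the delegation to be recommended. `sims_per_cluster` holds
    (first_sim, second_sim) pairs; `weight_groups` the per-cluster
    weight pairs."""
    scores = [cluster_similarity(s1, s2, weight_groups[i])
              for i, (s1, s2) in enumerate(sims_per_cluster)]
    return max(range(len(scores)), key=scores.__getitem__)
```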
207: and acquiring a lawyer data group corresponding to the first cluster.
208: inputting the lawyer data group and the delegation to be recommended into the ranking model to obtain target lawyer data, and recommending the target lawyer data to the requester of the delegation to be recommended.
In summary, in the deep-learning-based recommendation method provided by the invention, a modeling sample set is formed by acquiring a plurality of historical commission data, each recording one complete bad-asset commission. Based on the modeling sample set, the at least one historical commission data it contains is clustered into N clusters, each cluster containing the pairing relationship between the entrusted business and the lawyers within it. Then, data retrieval is performed according to each historical commission data, the retrieved data related to each historical commission data is used to supplement it, and each supplemented historical commission data is preprocessed to obtain training samples for the ranking model, with which the deep learning model is trained. Next, the real-time delegation to be recommended is compared against the N clusters, the cluster whose business characteristics are most similar to those of the delegation to be recommended is identified among the N clusters, and the lawyer data group corresponding to that cluster is obtained. Finally, the lawyer data group and the delegation to be recommended are input into the ranking model to obtain target lawyer data, which is recommended to the requester of the delegation to be recommended. In this way, historical bad assets are divided into several asset groups by the clustering algorithm and associated with lawyer groups, achieving an accurate partition of the bad assets and realizing the recall strategy of the intelligent recommendation.
Then, for the lawyers in the different asset groups, the recovery rate within the commission period is predicted and a fine ranking is performed according to the recovery effect, which effectively reduces the computational complexity, greatly improves the recommendation efficiency and accuracy, and raises the level of fine-grained management of bad-asset collection and disposal.
Referring to fig. 6, fig. 6 is a block diagram illustrating functional modules of a deep learning based recommendation device according to an embodiment of the present disclosure. As shown in fig. 6, the deep learning based recommendation apparatus 600 includes:
an acquisition module 601, configured to acquire a modeling sample set, where the modeling sample set includes at least one historical delegation data, and each historical delegation data in the at least one historical delegation data is used to record one complete bad asset delegation;
a clustering module 602, configured to perform clustering processing on at least one historical delegation data to obtain at least one cluster;
a preprocessing module 603, configured to perform data retrieval according to each historical delegation data to obtain at least one first data group, where the at least one first data group corresponds to the at least one historical delegation data in a one-to-one manner, and perform preprocessing on each first data group in the at least one first data group to obtain at least one second data group, where the at least one second data group corresponds to the at least one first data group in a one-to-one manner;
a training module 604, configured to input at least one second data set into the deep learning model for training, so as to obtain a ranking model;
and the recommending module 605 is configured to obtain the delegation to be recommended, determine a first cluster in the at least one cluster according to the delegation to be recommended, wherein the similarity between the historical delegation corresponding to the first cluster and the delegation to be recommended is highest, obtain a lawyer data group corresponding to the first cluster, input the lawyer data group and the delegation to be recommended into the ranking model, obtain target lawyer data, and recommend the target lawyer data to the requester of the delegation to be recommended.
In an embodiment of the present invention, in terms of preprocessing each first data set in the at least one first data set to obtain at least one second data set, the preprocessing module 603 is specifically configured to:
performing feature extraction on each first data group in at least one first data group to obtain at least one group of feature tags, wherein the at least one group of feature tags are in one-to-one correspondence with the at least one first data group, and each group of feature tags in the at least one group of feature tags is used for declaring the features of the assets or the abilities of lawyers in the historical entrusted data corresponding to each first data group;
performing data expansion on at least one first data group according to at least one group of feature tags to obtain at least one third data group, wherein the at least one third data group corresponds to the at least one first data group in a one-to-one manner;
and performing data cleaning on each third data group in the at least one third data group to obtain at least one second data group, wherein the at least one second data group is in one-to-one correspondence with the at least one third data group.
In an embodiment of the invention, each set of the at least one set of feature tags comprises: a client portrait tag, an asset portrait tag, and a lawyer portrait tag. Based on this, in terms of performing data expansion on the at least one first data group according to the at least one set of feature tags to obtain at least one third data group, the preprocessing module 603 is specifically configured to:
inputting the client portrait label and the asset portrait label corresponding to each first data group into a client relationship management model for data derivation to obtain at least one derived client data, wherein the at least one derived client data corresponds one-to-one with the at least one first data group;
inputting the lawyer portrait label and the asset portrait label corresponding to each first data group into a client relationship management model for data derivation to obtain at least one derived lawyer data, wherein the at least one derived lawyer data corresponds to the at least one first data group one to one;
and combining each first data group, the derivative business data corresponding to each first data group and the derivative attorney data corresponding to each first data group to obtain at least one third data group.
In an embodiment of the present invention, in terms of performing data cleansing on each of the at least one third data group to obtain the at least one second data group, the preprocessing module 603 is specifically configured to:
screening at least one fourth data group from the at least one third data group according to a preset completion rule, and performing completion to obtain at least one fifth data group, wherein the at least one fifth data group is in one-to-one correspondence with the at least one fourth data group;
for each fifth data group in the at least one fifth data group, respectively determining the scrambling code rate of each fifth data group;
respectively carrying out discrete processing on each fifth data group, and determining the number of code values obtained after each fifth data group is discrete;
determining at least one sixth data group in the at least one fifth data group according to the scrambling code rate of each fifth data group and the number of code values obtained after each fifth data group is discretized, wherein the scrambling code rate corresponding to each sixth data group in the at least one sixth data group is not greater than a first threshold, and the number of code values obtained after each sixth data group is discretized is not greater than a second threshold;
screening at least one seventh data group from the at least one sixth data group according to a preset importance rule to perform importance calculation to obtain at least one feature importance, wherein the at least one feature importance is in one-to-one correspondence with the at least one seventh data group;
and determining at least one second data group in the at least one seventh data group according to the at least one characteristic importance, wherein the characteristic importance corresponding to each second data group in the at least one second data group is greater than a third threshold value.
In an embodiment of the present invention, in terms of screening out at least one fourth data set from at least one third data set according to a preset completion rule to perform completion to obtain at least one fifth data set, the preprocessing module 603 is specifically configured to:
for each third data group in the at least one third data group, determining the missing rate of each third data group according to the data type of each third data group;
determining at least one fourth data group in the at least one third data group according to the missing rate of each third data group, wherein the missing rate corresponding to each fourth data group in the at least one fourth data group is smaller than a fourth threshold value;
and respectively according to the data type of each fourth data group, obtaining a completion method corresponding to the data type to complete each fourth data group, and obtaining at least one fifth data group.
In an embodiment of the present invention, in terms of obtaining at least one feature importance degree by screening at least one seventh data group from at least one sixth data group according to a preset importance degree rule and performing importance degree calculation, the preprocessing module 603 is specifically configured to:
calculating a correlation coefficient between any two different sixth data sets of the at least one sixth data set;
screening at least one seventh data group from the at least one sixth data group according to a correlation coefficient between any two different sixth data groups of the at least one sixth data group, wherein the correlation coefficient between any two different seventh data groups of the at least one seventh data group is greater than a fifth threshold value;
and calculating the information value of each seventh data group in the at least one seventh data group to obtain at least one characteristic importance degree.
In an embodiment of the present invention, the delegation to be recommended includes: entrusted business data and client data, wherein the entrusted business data identifies the bad asset to be disposed of, and the client data identifies the owner of the bad asset to be disposed of. Based on this, in terms of obtaining the delegation to be recommended and determining a first cluster in the at least one cluster according to the delegation to be recommended, the recommending module 605 is specifically configured to:
determining a first similarity between the service data and the entrusted service data of each cluster in at least one cluster;
determining a second similarity between the character data and the client data of each cluster;
acquiring a weight group according to the business type of the business data of each cluster, and performing weighted summation on the first similarity and the second similarity according to the weight group to obtain the similarity between the delegation to be recommended and each cluster;
and in at least one cluster, the cluster with the highest similarity is taken as a first cluster.
Referring to fig. 7, fig. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present disclosure. As shown in fig. 7, the electronic device 700 includes a transceiver 701, a processor 702, and a memory 703, which are connected to each other by a bus 704. The memory 703 is used to store computer programs and data, and may transfer the data it stores to the processor 702.
The processor 702 is configured to read the computer program in the memory 703 to perform the following operations:
obtaining a modeling sample set, wherein the modeling sample set comprises at least one historical delegation data, and each historical delegation data in the at least one historical delegation data is used for recording one complete bad asset delegation;
clustering at least one historical entrusted data to obtain at least one cluster;
performing data retrieval according to each historical entrusted data to obtain at least one first data group, wherein the at least one first data group corresponds to the at least one historical entrusted data one to one;
preprocessing each first data group in at least one first data group to obtain at least one second data group, wherein the at least one second data group is in one-to-one correspondence with the at least one first data group;
inputting the at least one second data group into the deep learning model for training to obtain a ranking model;
obtaining a delegation to be recommended, and determining a first cluster in the at least one cluster according to the delegation to be recommended, wherein the similarity between the historical delegations corresponding to the first cluster and the delegation to be recommended is the highest;
acquiring a lawyer data group corresponding to the first clustering cluster;
inputting the lawyer data group and the delegation to be recommended into the ranking model to obtain target lawyer data, and recommending the target lawyer data to the requester of the delegation to be recommended.
In an embodiment of the present invention, in terms of preprocessing each first data set of the at least one first data set to obtain at least one second data set, the processor 702 is specifically configured to:
performing feature extraction on each first data group in at least one first data group to obtain at least one group of feature tags, wherein the at least one group of feature tags are in one-to-one correspondence with the at least one first data group, and each group of feature tags in the at least one group of feature tags is used for declaring the features of the assets or the abilities of lawyers in the historical entrusted data corresponding to each first data group;
performing data expansion on at least one first data group according to at least one group of feature tags to obtain at least one third data group, wherein the at least one third data group corresponds to the at least one first data group in a one-to-one manner;
and performing data cleaning on each third data group in the at least one third data group to obtain at least one second data group, wherein the at least one second data group is in one-to-one correspondence with the at least one third data group.
In an embodiment of the invention, each set of at least one set of feature tags comprises: a client portrait tag, an asset portrait tag, and a lawyer portrait tag, based on which, in data augmenting the at least one first data set according to the at least one set of feature tags to obtain at least one third data set, the processor 702 is specifically configured to:
inputting the client portrait label and the asset portrait label corresponding to each first data group into a client relationship management model for data derivation to obtain at least one derived client data, wherein the at least one derived client data corresponds one-to-one with the at least one first data group;
inputting the lawyer portrait label and the asset portrait label corresponding to each first data group into a client relationship management model for data derivation to obtain at least one derived lawyer data, wherein the at least one derived lawyer data corresponds to the at least one first data group one to one;
and combining each first data group, the derivative business data corresponding to each first data group and the derivative attorney data corresponding to each first data group to obtain at least one third data group.
In an embodiment of the present invention, in terms of performing data cleansing on each of the at least one third data group to obtain at least one second data group, the processor 702 is specifically configured to:
screening at least one fourth data group from the at least one third data group according to a preset completion rule, and performing completion to obtain at least one fifth data group, wherein the at least one fifth data group is in one-to-one correspondence with the at least one fourth data group;
for each fifth data group in the at least one fifth data group, respectively determining the scrambling code rate of each fifth data group;
respectively carrying out discrete processing on each fifth data group, and determining the number of code values obtained after each fifth data group is discrete;
determining at least one sixth data group in the at least one fifth data group according to the scrambling code rate of each fifth data group and the number of code values obtained after each fifth data group is discretized, wherein the scrambling code rate corresponding to each sixth data group in the at least one sixth data group is not greater than a first threshold, and the number of code values obtained after each sixth data group is discretized is not greater than a second threshold;
screening at least one seventh data group from the at least one sixth data group according to a preset importance rule to perform importance calculation to obtain at least one feature importance, wherein the at least one feature importance is in one-to-one correspondence with the at least one seventh data group;
and determining at least one second data group in the at least one seventh data group according to the at least one characteristic importance, wherein the characteristic importance corresponding to each second data group in the at least one second data group is greater than a third threshold value.
In an embodiment of the present invention, in terms of screening out at least one fourth data set from the at least one third data set for completion according to a preset completion rule to obtain at least one fifth data set, the processor 702 is specifically configured to perform the following operations:
for each third data group in the at least one third data group, determining the missing rate of each third data group according to the data type of each third data group;
determining at least one fourth data group in the at least one third data group according to the missing rate of each third data group, wherein the missing rate corresponding to each fourth data group in the at least one fourth data group is smaller than a fourth threshold value;
and respectively according to the data type of each fourth data group, obtaining a completion method corresponding to the data type to complete each fourth data group, and obtaining at least one fifth data group.
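The missing-rate screening and type-dependent completion above can be sketched as follows. This is a hedged illustration, not the patent's implementation: the patent leaves the completion method per data type open, so mean filling for numeric data and mode filling for categorical data are assumed here, and the threshold value is illustrative:

```python
from statistics import mean, mode

def missing_rate(values):
    # proportion of missing entries (modeled as None) in the group
    return sum(1 for v in values if v is None) / len(values)

def complete(values):
    # type-dependent completion: mean for numeric groups, mode otherwise
    present = [v for v in values if v is not None]
    if all(isinstance(v, (int, float)) for v in present):
        fill = mean(present)
    else:
        fill = mode(present)
    return [fill if v is None else v for v in values]

def completion_step(third_groups, fourth_threshold=0.5):
    # fourth groups: missing rate below the fourth threshold;
    # fifth groups: the fourth groups after completion
    fourth = [g for g in third_groups if missing_rate(g) < fourth_threshold]
    return [complete(g) for g in fourth]
```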
In an embodiment of the present invention, in terms of screening out at least one seventh data set from the at least one sixth data set according to a preset importance rule to perform importance calculation, so as to obtain at least one feature importance, the processor 702 is specifically configured to perform the following operations:
calculating a correlation coefficient between any two different sixth data sets of the at least one sixth data set;
screening at least one seventh data group from the at least one sixth data group according to a correlation coefficient between any two different sixth data groups of the at least one sixth data group, wherein the correlation coefficient between any two different seventh data groups of the at least one seventh data group is greater than a fifth threshold value;
and calculating the information value of each seventh data group in the at least one seventh data group to obtain at least one characteristic importance degree.
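The two quantities above can be sketched as below. The patent does not fix the statistics, so this assumes Pearson's coefficient for the "correlation coefficient" and the standard weight-of-evidence information value (IV) for the "information value"; the binned good/bad counts fed to the IV function are a modeling assumption:

```python
import math

def pearson(xs, ys):
    # Pearson correlation coefficient between two equal-length data groups
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def information_value(bins):
    # bins: list of (good_count, bad_count) per discretized bucket of a feature;
    # IV = sum over buckets of (pct_good - pct_bad) * WOE
    total_good = sum(g for g, _ in bins)
    total_bad = sum(b for _, b in bins)
    iv = 0.0
    for good, bad in bins:
        pg, pb = good / total_good, bad / total_bad
        iv += (pg - pb) * math.log(pg / pb)
    return iv
```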
In an embodiment of the present invention, the delegation to be recommended includes entrusted business data and customer data, wherein the entrusted business data is data identifying the bad asset to be disposed of, and the customer data is data identifying the owner of the bad asset to be disposed of. Based on this, in terms of obtaining the delegation to be recommended and determining a first cluster in the at least one cluster according to the delegation to be recommended, the processor 702 is specifically configured to perform the following operations:
determining a first similarity between the service data of each cluster in the at least one cluster and the entrusted business data;
determining a second similarity between the character data of each cluster and the customer data;
acquiring a weight group according to the service type of the service data of each cluster, and performing a weighted summation of the first similarity and the second similarity according to the weight group to obtain the similarity between the delegation to be recommended and each cluster;
and taking, in the at least one cluster, the cluster with the highest similarity as the first cluster.
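The cluster-selection steps above can be sketched as follows. This is an assumption-laden illustration: the patent does not specify the similarity measure, so cosine similarity stands in for it, data are modeled as feature vectors, and the per-service-type weight groups are invented for the example:

```python
import math

def cosine(a, b):
    # cosine similarity between two feature vectors
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def first_cluster(clusters, entrusted_vec, customer_vec, weights_by_type):
    # weighted sum of the first similarity (service data vs. entrusted
    # business data) and second similarity (character data vs. customer data);
    # the weight group is looked up by the cluster's service type
    best, best_sim = None, float("-inf")
    for c in clusters:
        w1, w2 = weights_by_type[c["service_type"]]
        sim = (w1 * cosine(c["service_vec"], entrusted_vec)
               + w2 * cosine(c["character_vec"], customer_vec))
        if sim > best_sim:
            best, best_sim = c, sim
    return best
```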
It should be understood that the deep learning based recommendation device in the present application may include a smart phone (e.g., an Android phone, an iOS phone, a Windows phone, etc.), a tablet computer, a palmtop computer, a notebook computer, a mobile Internet device (MID), a robot, a wearable device, and the like. The foregoing is merely an example, not an exhaustive list; in practical applications, the deep learning based recommendation device may further include an intelligent vehicle-mounted terminal, a computer device, and the like.
Through the above description of the embodiments, those skilled in the art will clearly understand that the present invention can be implemented by software in combination with a hardware platform. With this understanding, all of the technical solutions of the present invention, or the part thereof that contributes to the prior art, can be embodied in the form of a software product, which can be stored in a storage medium such as a ROM/RAM, a magnetic disk, or an optical disk, and which includes instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute the methods described in the embodiments, or in some parts of the embodiments, of the present invention.
Accordingly, the present application also provides a computer readable storage medium, which stores a computer program, wherein the computer program is executed by a processor to implement part or all of the steps of any one of the deep learning based recommendation methods as described in the above method embodiments. For example, the storage medium may include a hard disk, a floppy disk, an optical disk, a magnetic tape, a magnetic disk, a flash memory, and the like.
Embodiments of the present application also provide a computer program product comprising a non-transitory computer readable storage medium storing a computer program operable to cause a computer to perform some or all of the steps of any one of the deep learning based recommendation methods as set forth in the above method embodiments.
It should be noted that, for simplicity of description, the above method embodiments are described as a series or combination of acts, but those skilled in the art will recognize that the present application is not limited by the order of acts described, as some steps may be performed in other orders or concurrently. Further, those skilled in the art should also appreciate that the embodiments described in the specification are all optional embodiments, and the acts and modules involved are not necessarily required by the present application.
In the above embodiments, the description of each embodiment has its own emphasis, and for parts not described in detail in a certain embodiment, reference may be made to the description of other embodiments.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus may be implemented in other manners. For example, the above-described apparatus embodiments are merely illustrative; for instance, the division of the units is merely a logical division, and other divisions may be adopted in practice: a plurality of units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices, or units, and may be in electrical or other forms.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit. The integrated unit can be realized in a hardware form, and can also be realized in a software program module form.
The integrated units, if implemented in the form of software program modules and sold or used as stand-alone products, may be stored in a computer-readable memory. Based on such understanding, the technical solution of the present application, in essence, or the part thereof contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a memory, which includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the methods described in the embodiments of the present application. The aforementioned memory includes various media capable of storing program code, such as a USB flash drive, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic disk, or an optical disk.
Those skilled in the art will appreciate that all or part of the steps in the methods of the above embodiments may be implemented by associated hardware instructed by a program, which may be stored in a computer-readable memory, and the memory may include: flash Memory disks, Read-Only memories (ROMs), Random Access Memories (RAMs), magnetic or optical disks, and the like.
The foregoing detailed description of the embodiments of the present application illustrates the principles and implementations of the present application; the above description of the embodiments is provided only to help understand the methods of the present application and their core ideas. Meanwhile, for a person skilled in the art, there may be variations in the specific embodiments and the application scope according to the ideas of the present application. In summary, the content of this specification should not be construed as limiting the present application.

Claims (10)

1. A deep learning based recommendation method, the method comprising:
obtaining a modeling sample set, wherein the modeling sample set comprises at least one piece of historical delegation data, and each piece of historical delegation data is used for recording the data of one complete bad-asset delegation;
clustering the at least one piece of historical delegation data to obtain at least one cluster;
performing data retrieval according to each piece of historical delegation data to obtain at least one first data group, wherein the at least one first data group is in one-to-one correspondence with the at least one piece of historical delegation data;
preprocessing each first data group in the at least one first data group to obtain at least one second data group, wherein the at least one second data group is in one-to-one correspondence with the at least one first data group;
inputting the at least one second data group into a deep learning model for training to obtain a ranking model;
obtaining a delegation to be recommended, and determining a first cluster in the at least one cluster according to the delegation to be recommended, wherein the similarity between the historical delegation corresponding to the first cluster and the delegation to be recommended is the highest;
acquiring a lawyer data group corresponding to the first cluster;
and inputting the lawyer data group and the delegation to be recommended into the ranking model to obtain target lawyer data, and recommending the target lawyer data to the requester of the delegation to be recommended.
2. The method of claim 1, wherein the preprocessing each of the at least one first data group to obtain at least one second data group comprises:
performing feature extraction on each first data group in the at least one first data group to obtain at least one group of feature tags, wherein the at least one group of feature tags is in one-to-one correspondence with the at least one first data group, and each group of feature tags in the at least one group of feature tags is used for describing the features of the assets or the abilities of the lawyers in the historical delegation data corresponding to each first data group;
performing data expansion on the at least one first data group according to the at least one group of feature tags to obtain at least one third data group, wherein the at least one third data group is in one-to-one correspondence with the at least one first data group;
and performing data cleaning on each third data group in the at least one third data group to obtain the at least one second data group, wherein the at least one second data group is in one-to-one correspondence with the at least one third data group.
3. The method of claim 2,
each of the at least one set of feature tags comprises: a client portrait tag, an asset portrait tag, and a lawyer portrait tag;
performing data expansion on the at least one first data group according to the at least one group of feature tags to obtain at least one third data group, including:
inputting the client portrait label and the asset portrait label corresponding to each first data group into a client relationship management model for data derivation to obtain at least one piece of derived service data, wherein the at least one piece of derived service data is in one-to-one correspondence with the at least one first data group;
inputting the lawyer portrait label and the asset portrait label corresponding to each first data group into the client relationship management model for data derivation to obtain at least one piece of derived lawyer data, wherein the at least one piece of derived lawyer data is in one-to-one correspondence with the at least one first data group;
and combining each first data group, the derived service data corresponding to each first data group, and the derived lawyer data corresponding to each first data group to obtain the at least one third data group.
4. The method according to claim 2 or 3, wherein the performing data cleansing on each of the at least one third data group to obtain the at least one second data group comprises:
screening at least one fourth data group from the at least one third data group according to a preset completion rule to perform completion to obtain at least one fifth data group, wherein the at least one fifth data group is in one-to-one correspondence with the at least one fourth data group;
for each fifth data group in the at least one fifth data group, respectively determining a scrambling code rate of each fifth data group;
respectively performing discretization processing on each fifth data group, and determining the number of code values obtained after each fifth data group is discretized;
determining at least one sixth data group in the at least one fifth data group according to the scrambling code rate of each fifth data group and the number of code values obtained after each fifth data group is discretized, wherein the scrambling code rate corresponding to each sixth data group in the at least one sixth data group is greater than a first threshold value, or the number of code values obtained after each sixth data group is discretized is greater than a second threshold value;
screening at least one seventh data group from the at least one sixth data group according to a preset importance rule to perform importance calculation to obtain at least one feature importance, wherein the at least one feature importance is in one-to-one correspondence with the at least one seventh data group;
and determining the at least one second data group in the at least one seventh data group according to the at least one characteristic importance, wherein the characteristic importance corresponding to each second data group in the at least one second data group is greater than a third threshold value.
5. The method according to claim 4, wherein the screening out at least one fourth data group from the at least one third data group for completion according to a preset completion rule to obtain at least one fifth data group comprises:
for each third data group in the at least one third data group, determining the missing rate of each third data group according to the data type of each third data group;
determining at least one fourth data group in the at least one third data group according to the missing rate of each third data group, wherein the missing rate of each fourth data group in the at least one fourth data group is smaller than a fourth threshold value;
and respectively according to the data type of each fourth data group, obtaining a completion method corresponding to the data type to complete each fourth data group, and obtaining the at least one fifth data group.
6. The method according to claim 4, wherein the step of screening at least one seventh data set from the at least one sixth data set according to a preset importance rule to perform importance calculation to obtain at least one feature importance comprises:
calculating a correlation coefficient between any two different sixth data sets of the at least one sixth data set;
screening the at least one seventh data group from the at least one sixth data group according to a correlation coefficient between any two different sixth data groups of the at least one sixth data group, wherein the correlation coefficient between any two different seventh data groups of the at least one seventh data group is greater than a fifth threshold;
and calculating the information value of each seventh data group in the at least one seventh data group to obtain the at least one characteristic importance degree.
7. The method according to any one of claims 1 to 6,
the entrusting to be recommended comprises the following steps: entrusted business data and customer data, wherein the entrusted business data is data for identifying the bad assets to be disposed, and the customer data is data for identifying owners of the bad assets to be disposed;
the obtaining of the delegation to be recommended and the determining of a first cluster in the at least one cluster according to the delegation to be recommended include:
determining a first similarity between the service data of each cluster in the at least one cluster and the entrusted service data;
determining a second similarity between the character data and the customer data of each cluster;
acquiring a weight group according to the service type of the service data of each cluster, and performing a weighted summation of the first similarity and the second similarity according to the weight group to obtain the similarity between the delegation to be recommended and each cluster;
and taking, in the at least one cluster, the cluster with the highest similarity as the first cluster.
8. A deep learning based recommendation device, the device comprising:
the system comprises an acquisition module, a storage module and a processing module, wherein the acquisition module is used for acquiring a modeling sample set, the modeling sample set comprises at least one piece of historical delegation data, and each piece of historical delegation data in the at least one piece of historical delegation data is used for recording one-time complete bad asset delegation data;
the clustering module is used for clustering the at least one piece of historical delegation data to obtain at least one cluster;
the preprocessing module is used for performing data retrieval according to each piece of historical delegation data to obtain at least one first data group, wherein the at least one first data group is in one-to-one correspondence with the at least one piece of historical delegation data, and for preprocessing each first data group in the at least one first data group to obtain at least one second data group, wherein the at least one second data group is in one-to-one correspondence with the at least one first data group;
the training module is used for inputting the at least one second data group into a deep learning model for training to obtain a ranking model;
and the recommending module is used for obtaining a delegation to be recommended, determining a first cluster in the at least one cluster according to the delegation to be recommended, wherein the similarity between the historical delegation corresponding to the first cluster and the delegation to be recommended is the highest, acquiring a lawyer data group corresponding to the first cluster, inputting the lawyer data group and the delegation to be recommended into the ranking model to obtain target lawyer data, and recommending the target lawyer data to the requester of the delegation to be recommended.
9. An electronic device comprising a processor, a memory, a communication interface, and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the processor, the one or more programs including instructions for performing the steps in the method of any of claims 1-7.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program which is executed by a processor to implement the method according to any one of claims 1-7.
CN202111110476.7A 2021-09-22 2021-09-22 Recommendation method and device based on deep learning, electronic equipment and storage medium Active CN113806641B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111110476.7A CN113806641B (en) 2021-09-22 2021-09-22 Recommendation method and device based on deep learning, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN113806641A (en) 2021-12-17
CN113806641B (en) 2023-09-26

Family

ID=78896219


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117874633A * 2024-03-13 2024-04-12 金祺创(北京)技术有限公司 Network data asset portrayal generation method and device based on deep learning algorithm
CN117874633B * 2024-03-13 2024-05-28 金祺创(北京)技术有限公司 Network data asset portrayal generation method and device based on deep learning algorithm

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103164764A * 2011-12-14 2013-06-19 杭州律动信息技术有限公司 Device combining case detail consultation function and lawyer entrustment function
CN103164762A * 2011-12-14 2013-06-19 上海乾图信息技术有限公司 Lawyer commission device


Similar Documents

Publication Publication Date Title
Chen et al. Selecting critical features for data classification based on machine learning methods
Liu et al. Artificial intelligence in the 21st century
Qayyum et al. Medical image retrieval using deep convolutional neural network
Qi et al. An effective and efficient hierarchical K-means clustering algorithm
Onan et al. An improved ant algorithm with LDA-based representation for text document clustering
Jan et al. Ensemble approach for developing a smart heart disease prediction system using classification algorithms
Hung et al. Customer segmentation using hierarchical agglomerative clustering
Phoong et al. The bibliometric analysis on finite mixture model
Mohammadi et al. Exploring research trends in big data across disciplines: A text mining analysis
CN110188357B (en) Industry identification method and device for objects
Ahmad et al. SiNC: Saliency-injected neural codes for representation and efficient retrieval of medical radiographs
JP4906900B2 (en) Image search apparatus, image search method and program
Ji et al. Fuzzy DEA-based classifier and its applications in healthcare management
Angadi et al. Multimodal sentiment analysis using reliefF feature selection and random forest classifier
Ribeiro et al. Supporting content-based image retrieval and computer-aided diagnosis systems with association rule-based techniques
US20210397905A1 (en) Classification system
Spanier et al. A new method for the automatic retrieval of medical cases based on the RadLex ontology
CN113806641B (en) Recommendation method and device based on deep learning, electronic equipment and storage medium
CN111581296B (en) Data correlation analysis method and device, computer system and readable storage medium
CN108846142A (en) A kind of Text Clustering Method, device, equipment and readable storage medium storing program for executing
Nayak et al. Non-linear cellular automata based edge detector for optical character images
Yasmin et al. Improving crowdsourcing-based image classification through expanded input elicitation and machine learning
Rodrigues et al. Evaluation of deep learning techniques on a novel hierarchical surgical tool dataset
CN113724017A (en) Pricing method and device based on neural network, electronic equipment and storage medium
JP6026036B1 (en) DATA ANALYSIS SYSTEM, ITS CONTROL METHOD, PROGRAM, AND RECORDING MEDIUM

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant