CN117251445B

CN117251445B - Deep learning-based CRM data screening method, system and medium

Info

Publication number: CN117251445B
Application number: CN202311310723.7A
Authority: CN
Inventors: 郭伟; 王闽东
Original assignee: Hangzhou Jinyuan Biaoju Technology Co ltd
Current assignee: Hangzhou Jinyuan Biaoju Technology Co ltd
Priority date: 2023-10-11
Filing date: 2023-10-11
Publication date: 2024-06-04
Anticipated expiration: 2043-10-11
Also published as: CN117251445A

Abstract

The invention provides a deep learning-based CRM data duplicate-checking and screening method, system and medium, and relates to the technical field of data duplicate checking and screening. The method comprises: acquiring multiple groups of CRM data that have passed duplicate checking in the system, and acquiring multiple groups of CRM data from the Internet by using a web crawler; establishing a basic duplicate-checking model, and putting the standard historical data and the interference historical data into the basic duplicate-checking model for duplicate checking, wherein the basic duplicate-checking model is built on a text duplicate-checking model; and improving the basic duplicate-checking model based on its duplicate-checking results. The invention addresses the problem that the prior art lacks a comprehensive duplicate check across multiple data types when checking CRM data, so that when the same customer places multiple orders, the orders are mistaken for duplicates and cancelled.

Description

Deep learning-based CRM data screening method, system and medium

Technical Field

The invention relates to the technical field of data duplicate checking and screening, and in particular to a deep learning-based CRM data duplicate-checking and screening method, system and medium.

Background

CRM data refers to the customer-related data recorded and managed in a customer relationship management (CRM) system. Such data may include basic customer information (e.g., name, phone number, email, address), purchase history, contacts, communication records, customer feedback, sales opportunities and marketing campaigns. The CRM system collects and analyzes these data so that businesses can better understand customer needs and behavior, improve customer satisfaction and loyalty, and thereby increase sales and market share.

The prior art generally improves CRM data processing. For example, Chinese patent publication No. CN106203810A discloses a cloud platform-based CRM data processing method that effectively controls the number of services by meeting the core requirement of the ontologies related to the business the enterprise is concerned with, improves service discovery efficiency, and helps personalize enterprise services. Other existing improvements to CRM data generally concern duplicate checking of the basic information in CRM data and lack a comprehensive duplicate check across multiple data types. As a result, when the same customer places multiple orders, the orders may be mistaken for duplicates and cancelled. Existing CRM data duplicate checking therefore needs to be improved.

Disclosure of Invention

In view of the defects of the prior art, the invention aims to provide a deep learning-based CRM data duplicate-checking and screening method, system and medium, which solve the problem that the prior art lacks a comprehensive duplicate check across multiple data types when checking CRM data, so that when the same customer places multiple orders, the orders are mistaken for duplicates and cancelled.

To achieve the above object, the invention provides a deep learning-based CRM data duplicate-checking and screening method, comprising:

acquiring multiple groups of CRM data that have passed duplicate checking in the system and recording them as standard historical data, and acquiring multiple groups of CRM data from the Internet by using a web crawler and recording them as interference historical data;

establishing a basic duplicate-checking model, and putting the standard historical data and the interference historical data into the basic duplicate-checking model for duplicate checking, wherein the basic duplicate-checking model is built on a text duplicate-checking model;

improving the basic duplicate-checking model based on its duplicate-checking results, and recording the improved basic duplicate-checking model as the CRM duplicate-checking model.

Further, acquiring the standard historical data and the interference historical data comprises the following sub-steps:

acquiring multiple groups of CRM data that have passed duplicate checking in the system, and recording them as standard historical data 1 to standard historical data N1;

acquiring multiple groups of CRM data from the Internet by using a web crawler, and recording them as interference historical data 1 to interference historical data N2;

wherein the standard historical data and the interference historical data have the same data types, and all the data types are recorded as data type 1 to data type M.

Further, establishing the basic duplicate-checking model and putting the standard historical data and the interference historical data into it for duplicate checking comprises the following sub-steps:

establishing a basic duplicate-checking model, wherein the basic duplicate-checking model is built on a text duplicate-checking model;

for any one of standard historical data 1 to standard historical data N1, checking that standard historical data for duplicates by using a standard duplicate-checking method, wherein the standard historical data among standard historical data 1 to standard historical data N1 other than the one being checked are recorded as the standard duplicate-check database;

recording the standard historical data that has been checked by the standard duplicate-checking method as standard duplicate-check data;

recording interference historical data 1 to interference historical data N2 as the interference duplicate-check database, and checking the standard duplicate-check data by using an interference duplicate-checking method, wherein any one of interference historical data 1 to interference historical data N2 is replaced with the standard historical data;

recording all standard duplicate-check data that have been checked by the interference duplicate-checking method as calibration duplicate-check data 1 to calibration duplicate-check data N1.

Further, the standard duplicate-checking method is as follows:

acquiring data type 1 to data type M of the standard historical data, and recording them as standard type 1 to standard type M;

for any one of standard type 1 to standard type M, acquiring the position of the standard type in the standard historical data, and recording it as the position to be checked;

recording the data types at the position to be checked of all standard historical data in the standard duplicate-check database as the database to be checked;

performing duplicate checking on the standard type at the position to be checked and the content corresponding to the standard type by using the basic duplicate-checking model, wherein the database used for the duplicate checking is the database to be checked;

recording the number of data types in the database to be checked as K1, wherein K1 = N1 - 1;

when the basic duplicate-checking model performs the duplicate checking, adding 1 to the value of K2 each time a data type in the database to be checked and its corresponding content are completely equal to the standard type at the position to be checked and its corresponding content, wherein K2 is an integer counter that is initially 0, and completely equal means that the text of the data type is identical to the text of the standard type and the text of the content corresponding to the data type is identical to the text of the content corresponding to the standard type;

recording the value of K2 divided by K1 as the standard weight corresponding to the standard type;

acquiring the standard weights corresponding to standard type 1 to standard type M of the standard historical data, and recording them as standard weight 1 to standard weight M.
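The standard weight computation can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that each CRM record is a dictionary mapping a data type (field name) to its content, so that the "position" of a standard type is its key; the names `Record` and `standard_weights` are hypothetical and do not come from the patent.

```python
from typing import Dict, List

Record = Dict[str, str]  # hypothetical layout: data type -> content


def standard_weights(target: Record, library: List[Record]) -> Dict[str, float]:
    """Return K2 / K1 for each data type of `target`.

    `library` holds the other N1 - 1 standard records, so K1 = len(library).
    """
    k1 = len(library)
    if k1 == 0:
        raise ValueError("at least two standard records are needed (K1 = N1 - 1 > 0)")
    weights: Dict[str, float] = {}
    for data_type, content in target.items():
        # K2 counts library records whose field of the same type has exactly
        # the same content text (the "completely equal" test of the text model).
        k2 = sum(1 for other in library if other.get(data_type) == content)
        weights[data_type] = k2 / k1
    return weights


records = [
    {"name": "Li Wei", "status": "completed"},
    {"name": "Li Wei", "status": "pending"},
    {"name": "Zhang Min", "status": "completed"},
]
# Check record 0 against the remaining N1 - 1 = 2 records.
weights = standard_weights(records[0], records[1:])
print(weights)  # {'name': 0.5, 'status': 0.5}
```

Each field of the checked record matches one of the two other records, so both standard weights are 1/2.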

Further, the interference duplicate-checking method is as follows:

acquiring standard weight 1 to standard weight M of the standard duplicate-check data;

for the standard type corresponding to any one of the standard weights, recording the position of the standard type in the standard duplicate-check data as the position to be interfered;

acquiring the data types at the position to be interfered in interference historical data 1 to interference historical data N2, and recording them as positioning interference type 1 to positioning interference type N2;

recording positioning interference type 1 to positioning interference type N2 as the positioning interference database;

performing duplicate checking on the standard type and the content corresponding to the standard type by using the basic duplicate-checking model, wherein the database used for the duplicate checking is the positioning interference database, and adding 1 to the value of K3 each time a positioning interference type in the positioning interference database and its corresponding content are completely equal to the standard type and its corresponding content, wherein K3 is an integer counter that is initially 0;

recording the value of K3 divided by N2 as the interference weight;

acquiring the interference weights corresponding to all standard types of the standard duplicate-check data, and recording them as interference weight 1 to interference weight M.
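The interference weight can be sketched in the same way. The example below again assumes a hypothetical dictionary layout for records, and it picks the interference record to be replaced with a seeded random generator; the text only says that any one interference record is replaced, so that choice is an assumption.

```python
import random
from typing import Dict, List, Optional

Record = Dict[str, str]  # hypothetical layout: data type -> content


def interference_weights(target: Record, interference: List[Record],
                         rng: Optional[random.Random] = None) -> Dict[str, float]:
    """Return K3 / N2 for each data type of `target`."""
    if not interference:
        raise ValueError("the interference database must not be empty")
    rng = rng or random.Random(0)
    database = list(interference)
    # Replace one interference record with the standard record being checked,
    # so every field has at least one match and no weight can be 0.
    database[rng.randrange(len(database))] = target
    n2 = len(database)
    weights: Dict[str, float] = {}
    for data_type, content in target.items():
        k3 = sum(1 for other in database if other.get(data_type) == content)
        weights[data_type] = k3 / n2
    return weights


crawled = [
    {"name": "A. Chen", "status": "cancelled"},
    {"name": "B. Wu", "status": "pending"},
    {"name": "C. Sun", "status": "refunded"},
    {"name": "D. Hu", "status": "pending"},
]
target = {"name": "Li Wei", "status": "completed"}
result = interference_weights(target, crawled)
print(result)  # {'name': 0.25, 'status': 0.25}
```

None of the crawled records matches the target, so the only match for each field is the substituted record itself and every weight equals 1/N2.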

Further, improving the basic duplicate-checking model based on its duplicate-checking results and recording the improved basic duplicate-checking model as the CRM duplicate-checking model comprises:

obtaining a calibration pass rate from calibration duplicate-check data 1 to calibration duplicate-check data N1 by using a calibration judgment method, and improving the basic duplicate-checking model when the calibration pass rate is less than or equal to a standard pass rate;

when the calibration pass rate is greater than the standard pass rate, recording the basic duplicate-checking model as the CRM duplicate-checking model, and using the CRM duplicate-checking model to check and screen CRM data for duplicates.

Further, the calibration judgment method comprises:

for any one of calibration duplicate-check data 1 to calibration duplicate-check data N1, acquiring standard weight 1 to standard weight M and interference weight 1 to interference weight M of the calibration duplicate-check data;

comparing standard weight 1 to standard weight M with interference weight 1 to interference weight M one by one, and recording an interference weight as a failed weight when it is less than or equal to the corresponding standard weight, equal to a first interference number, or equal to a second interference number;

recording the interference weight as a passed weight when it is greater than the standard weight;

acquiring the number of failed weights and recording it as M1, and recording the calibration duplicate-check data as passed duplicate-check data when M1 is less than or equal to a third interference number;

recording the calibration duplicate-check data as failed duplicate-check data when M1 is greater than the third interference number;

recording the number of passed duplicate-check data as M2, and recording the value of M2 divided by N1 as the calibration pass rate.
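The calibration judgment can be sketched as follows. The default thresholds (first interference number 0, second interference number 1, third interference number 5% of M) are the values given in the embodiment; the function names and the weight dictionaries are hypothetical, and the sketch assumes that the failure conditions take precedence when an interference weight is both greater than the standard weight and equal to one of the interference numbers.

```python
from typing import Dict, List, Tuple

Weights = Dict[str, float]  # hypothetical layout: data type -> weight


def record_passes(standard: Weights, interference: Weights,
                  first: float = 0.0, second: float = 1.0,
                  third_ratio: float = 0.05) -> bool:
    """Decide whether one calibration record passes."""
    failed = 0  # M1, the number of failed weights
    for data_type, std_w in standard.items():
        int_w = interference[data_type]
        # Failed weight: not above the standard weight, or equal to the
        # first or second interference number.
        if int_w <= std_w or int_w == first or int_w == second:
            failed += 1
    # Third interference number taken as a share of M (5% in the embodiment).
    return failed <= third_ratio * len(standard)


def calibration_pass_rate(pairs: List[Tuple[Weights, Weights]]) -> float:
    """Return M2 / N1 over all calibration records."""
    passed = sum(1 for std, intf in pairs if record_passes(std, intf))  # M2
    return passed / len(pairs)


pairs = [
    ({"name": 0.2, "status": 0.1}, {"name": 0.3, "status": 0.4}),  # passes
    ({"name": 0.2, "status": 0.1}, {"name": 0.2, "status": 0.4}),  # one failed weight
]
rate = calibration_pass_rate(pairs)
print(rate)  # 0.5
```

With only two data types the 5% allowance is below one, so a single failed weight is enough to fail the second record.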

Further, improving the basic duplicate-checking model comprises:

when the basic duplicate-checking model judges whether two groups of data are completely equal, randomly recording one group as the replaceable data and the other group as the fixed data;

performing Chinese word segmentation on the replaceable data, and, for each segmented word, acquiring from a similar-word lexicon a first word-segmentation number of words with the highest similarity to that segmented word, and recording them as the similar words of the segmented word;

replacing a second word-segmentation number of segmented words in the replaceable data with their similar words and then comparing the result with the fixed data, and recording the two groups of data as completely equal when the characters of the replaceable data are identical to those of the fixed data;

wherein the second word-segmentation number is 1 when the basic duplicate-checking model is improved for the first time, and is increased by 1 each subsequent time the basic duplicate-checking model is improved.
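The relaxed equality test can be illustrated with a short sketch. Several simplifications are assumptions rather than part of the text: whitespace splitting stands in for Chinese word segmentation, the similar-word lexicon is a hypothetical dictionary whose lists are already ordered by similarity, the caller decides which group is the replaceable data, and the sketch accepts either an exact match or a match after replacing exactly `replacements` words.

```python
from itertools import combinations, product
from typing import Dict, List

Lexicon = Dict[str, List[str]]  # hypothetical: word -> similar words, best first


def loosely_equal(replaceable: str, fixed: str, lexicon: Lexicon,
                  top_n: int = 2, replacements: int = 1) -> bool:
    """Compare two texts after swapping `replacements` words for similar words.

    `top_n` plays the role of the first word-segmentation number and
    `replacements` the role of the second word-segmentation number.
    """
    words = replaceable.split()   # stand-in for Chinese word segmentation
    fixed_words = fixed.split()
    if words == fixed_words:      # the unmodified text model: exact equality
        return True
    for positions in combinations(range(len(words)), replacements):
        # Candidate similar words for each chosen position.
        options = [lexicon.get(words[p], [])[:top_n] for p in positions]
        for choice in product(*options):
            candidate = list(words)
            for pos, similar in zip(positions, choice):
                candidate[pos] = similar
            if candidate == fixed_words:
                return True
    return False


lexicon = {"finished": ["completed", "done"], "order": ["purchase"]}
print(loosely_equal("order finished", "order completed", lexicon))                      # True
print(loosely_equal("order finished", "purchase completed", lexicon, replacements=1))   # False
print(loosely_equal("order finished", "purchase completed", lexicon, replacements=2))   # True
```

Raising `replacements` by one per improvement round widens the set of texts treated as equal, at the cost of trying more combinations, which matches the trade-off between duplicate-checking capability and computing load described in the text.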

In a second aspect, the invention also provides a deep learning-based CRM data duplicate-checking and screening system, comprising a data acquisition module, a duplicate-check establishment module and an improvement application module;

the data acquisition module is used for acquiring multiple groups of CRM data that have passed duplicate checking in the system and recording them as standard historical data, and for acquiring multiple groups of CRM data from the Internet by using a web crawler and recording them as interference historical data;

the basic duplicate-checking model is built on a text duplicate-checking model;

the improvement application module is used for improving the basic duplicate-checking model based on its duplicate-checking results, and recording the improved basic duplicate-checking model as the CRM duplicate-checking model.

In a third aspect, a storage medium has stored thereon a computer program which, when executed by a processor, performs the steps of the method as described above.

The invention has the following beneficial effects. The invention acquires multiple groups of CRM data that have passed duplicate checking in the system and, by using a web crawler, multiple groups of CRM data from the Internet; it then establishes a basic duplicate-checking model and puts the standard historical data and the interference historical data into the model for duplicate checking. The advantage is that the basic duplicate-checking model can be preliminarily trained with the CRM data that have passed duplicate checking to obtain the standard weights, and further trained with the CRM data acquired by the web crawler to obtain the interference weights; the training state of the basic duplicate-checking model can then be assessed from the standard weights and the interference weights, which facilitates training the model;

finally, the invention improves the basic duplicate-checking model based on its duplicate-checking results and records the improved basic duplicate-checking model as the CRM duplicate-checking model.

Additional aspects of the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.

Drawings

Other features, objects and advantages of the present invention will become more apparent upon reading of the detailed description of non-limiting embodiments, given with reference to the accompanying drawings in which:

FIG. 1 is a schematic block diagram of a deep learning-based CRM data screening system;

FIG. 2 is a flow chart of the steps of a deep learning-based CRM data screening method of the present invention;

FIG. 3 is a flow chart of the processing of standard history data according to the present invention.

Detailed Description

It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the invention. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.

It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of exemplary embodiments according to the present invention.

Embodiments of the invention and features of the embodiments may be combined with each other without conflict.

Example 1

Referring to FIG. 1, the invention provides a deep learning-based CRM data duplicate-checking and screening system, comprising a data acquisition module, a duplicate-check establishment module and an improvement application module;

the data acquisition module is used for acquiring multiple groups of CRM data that have passed duplicate checking in the system and recording them as standard historical data, and for acquiring multiple groups of CRM data from the Internet by using a web crawler and recording them as interference historical data.

The data acquisition module is configured with a CRM data acquisition policy comprising:

In a specific implementation, acquiring the standard historical data allows the normal range of duplicate checking to be judged preliminarily; for example, if the duplicate rate in the standard historical data is 5% to 10%, CRM data whose duplicate rate is less than 5% or greater than 10% can be recorded as failed data;

Acquiring the interference historical data allows the basic duplicate-checking model to be preliminarily trained. By comparing the model's duplicate-checking results for the standard historical data and the interference historical data, it can be judged whether the duplicate-checking capability of the model needs further optimization. Each optimization of the basic duplicate-checking model raises its computing load as well as its duplicate-checking capability, so when the model has just become capable of effectively checking CRM data for duplicates, its computing load is the minimum needed to reach the target capability; the model at that point can therefore be recorded as the CRM duplicate-checking model and put into use;

The standard historical data and the interference historical data have the same data type, and all the data types are recorded as data types 1 to M;

In a specific implementation, the text duplicate-checking model judges whether two pieces of text are duplicates by judging whether they are identical. It is a duplicate-checking model with a small computing load and a low duplicate-checking capability; each subsequent improvement of the basic duplicate-checking model raises its duplicate-checking capability and, with it, its computing load.

The duplicate-check establishment module is configured with a duplicate-checking model establishment strategy, which comprises the following steps:

Referring to FIG. 3, for any one of standard historical data 1 to standard historical data N1, that standard historical data is checked for duplicates by using the standard duplicate-checking method, and the standard historical data among standard historical data 1 to standard historical data N1 other than the one being checked are recorded as the standard duplicate-check database;

In a specific implementation, all data in the standard duplicate-check database are historical data that have passed duplicate checking, so a duplicate check against the standard duplicate-check database represents the most standard case. Taking that case as the baseline, when the weight obtained against other data is greater than the weight obtained in the most standard case, the duplicate-checking capability of the basic duplicate-checking model is sufficiently high and the model can be put into use;

In a specific implementation, the standard historical data is added to the interference historical data so that the interference weight is never 0. Under ideal conditions, data of the same type yield equal duplicate-checking results against the standard duplicate-check database and the interference duplicate-check database; because the data in the interference duplicate-check database are disordered, the result obtained by the basic duplicate-checking model against the interference duplicate-check database is slightly higher than the result obtained against the standard duplicate-check database. Once the standard historical data has been added to the interference duplicate-check database, the interference weight obtained under ideal conditions is higher than the standard weight, so the case in which the standard weight equals the interference weight can be recorded as failed during the check;

all standard duplicate-check data that have been checked by the interference duplicate-checking method are recorded as calibration duplicate-check data 1 to calibration duplicate-check data N1;

The standard duplicate checking method comprises the following steps: acquiring data types 1 to M of standard historical data, and marking the data types as standard types 1 to M;

and for any one standard type from the standard type 1 to the standard type M, acquiring the position of the standard type in the standard historical data, and recording the position as a position to be checked.

The standard duplicate checking method also comprises the following steps:

In a specific implementation, an example of complete equality is as follows: the data type is "transaction status" and the content corresponding to the data type is "transaction completed but one case added"; the standard type is "transaction status" and the content corresponding to the standard type is "transaction completed but one case added". In this case the data type and its corresponding content can be considered completely equal to the standard type and its corresponding content;

The standard duplicate checking method also comprises the following steps:

The interference duplication checking method comprises the following steps:

acquiring standard weight 1 to standard weight M of the standard duplicate-check data;

recording the value of K3 divided by N2 as the interference weight;

acquiring the interference weights corresponding to all standard types of the standard duplicate-check data, and recording them as interference weight 1 to interference weight M;

The improvement application module is used for improving the basic duplicate-checking model based on its duplicate-checking results, and recording the improved basic duplicate-checking model as the CRM duplicate-checking model;

the improvement application module obtains a calibration pass rate from calibration duplicate-check data 1 to calibration duplicate-check data N1 by using a calibration judgment strategy, and improves the basic duplicate-checking model when the calibration pass rate is less than or equal to a standard pass rate;

when the calibration pass rate is greater than the standard pass rate, the basic duplicate-checking model is recorded as the CRM duplicate-checking model, and the CRM duplicate-checking model is used to check and screen CRM data for duplicates;

in a specific implementation, the standard pass rate is set to 0.8; when the calibration pass rate obtained by the calibration judgment strategy is 0.85, the calibration pass rate is greater than the standard pass rate, and the basic duplicate-checking model can be recorded as the CRM duplicate-checking model.
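The improve-until-pass decision can be sketched as a small loop. Here `evaluate` is a hypothetical callback that returns the calibration pass rate of the model when a given number of word replacements is allowed (0 meaning the unmodified text model), and `max_rounds` is a safety limit that the text does not mention.

```python
from typing import Callable


def select_model(evaluate: Callable[[int], float],
                 standard_pass_rate: float = 0.8,
                 max_rounds: int = 10) -> int:
    """Return the smallest replacement count whose pass rate exceeds the threshold."""
    replacements = 0  # 0 = text model without any similar-word replacement
    while evaluate(replacements) <= standard_pass_rate:
        if replacements >= max_rounds:
            raise RuntimeError("pass rate never exceeded the standard pass rate")
        replacements += 1  # one improvement round: allow one more replacement
    return replacements


# Illustrative pass rates for 0, 1 and 2 allowed replacements.
rates = [0.2, 0.6, 0.85]
chosen = select_model(lambda k: rates[k])
print(chosen)  # 2
```

Stopping at the first round that exceeds the threshold keeps the computing load at the minimum needed to reach the target capability.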

The improvement application module is configured with a calibration judgment strategy, which comprises:

In a specific implementation, the first interference number is set to 0 and the second interference number is set to 1. Because the standard historical data has been added to the interference duplicate-check database, the interference weight cannot be 0; and because the database contains many records, they cannot all be identical, so the interference weight cannot be 1 either. Therefore, when the interference weight equals the first interference number or the second interference number, it is recorded as a failed weight;

In a specific implementation, the third interference number is 5% of M. When M1 is less than or equal to 5% of the total number of interference weights, the duplicate check of the calibration duplicate-check data is essentially complete, and the calibration duplicate-check data can be recorded as passed duplicate-check data. For example, when M is 100 and M1 is 4, the third interference number is 5 and M1 is less than 5, so the calibration duplicate-check data is recorded as passed duplicate-check data;

acquiring the number of passed duplicate-check data, recording it as M2, and recording the value of M2 divided by N1 as the calibration pass rate;

in a specific implementation, when N1 is 1000 and the number of passed duplicate-check data acquired is 200, 0.2 is recorded as the calibration pass rate.
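As a worked check of the figures in this embodiment, the calibration pass rate is the ratio of passed records to all standard records. Comparing it with the standard pass rate of 0.8 given earlier in this embodiment suggests that, for these example numbers, the basic duplicate-checking model would be improved rather than put into use:

```latex
\[
\text{calibration pass rate} = \frac{M_2}{N_1} = \frac{200}{1000} = 0.2 \le 0.8
\]
```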

Example 2

Referring to FIG. 2, the invention further provides a deep learning-based CRM data duplicate-checking and screening method, comprising:

Step S1, acquiring multiple groups of CRM data that have passed duplicate checking in the system and recording them as standard historical data, and acquiring multiple groups of CRM data from the Internet by using a web crawler and recording them as interference historical data;

Step S2, establishing a basic duplicate-checking model, and putting the standard historical data and the interference historical data into the basic duplicate-checking model for duplicate checking, wherein the basic duplicate-checking model is built on a text duplicate-checking model;

Step S3, improving the basic duplicate-checking model based on its duplicate-checking results, and recording the improved basic duplicate-checking model as the CRM duplicate-checking model.

Step S1 comprises the following sub-steps:

Step S101, acquiring multiple groups of CRM data that have passed duplicate checking in the system, and recording them as standard historical data 1 to standard historical data N1;

Step S102, acquiring multiple groups of CRM data from the Internet by using a web crawler, and recording them as interference historical data 1 to interference historical data N2;

Step S2 comprises the following sub-steps:

Step S201, establishing a basic duplicate-checking model, wherein the basic duplicate-checking model is built on a text duplicate-checking model;

Step S202, for any one of standard historical data 1 to standard historical data N1, checking that standard historical data for duplicates by using the standard duplicate-checking method, and recording the standard historical data among standard historical data 1 to standard historical data N1 other than the one being checked as the standard duplicate-check database;

Step S203, recording interference historical data 1 to interference historical data N2 as the interference duplicate-check database, and checking the standard duplicate-check data by using the interference duplicate-checking method, wherein any one of interference historical data 1 to interference historical data N2 is replaced with the standard historical data;
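Steps S202 and S203 together amount to a leave-one-out pass over the standard records. The sketch below shows only that orchestration, under the same illustrative assumptions as before: records are dictionaries of data type to content, exact text equality is the comparison, and the first interference record is always the one replaced (the text allows any one). All function names are hypothetical.

```python
from typing import Dict, List, Tuple

Record = Dict[str, str]      # hypothetical layout: data type -> content
Weights = Dict[str, float]


def match_share(target: Record, database: List[Record]) -> Weights:
    """Share of database records equal to `target` for each data type."""
    return {
        data_type: sum(1 for r in database if r.get(data_type) == content) / len(database)
        for data_type, content in target.items()
    }


def build_calibration_data(standard: List[Record],
                           interference: List[Record]) -> List[Tuple[Weights, Weights]]:
    """Pair the standard and interference weights of every standard record."""
    calibration = []
    for i, target in enumerate(standard):
        # S202: the standard duplicate-check database is every other record.
        library = standard[:i] + standard[i + 1:]
        standard_w = match_share(target, library)
        # S203: replace one interference record with the record being checked.
        interference_db = [target] + interference[1:]
        interference_w = match_share(target, interference_db)
        calibration.append((standard_w, interference_w))
    return calibration


standard = [{"name": "Li Wei"}, {"name": "Li Wei"}, {"name": "Zhang Min"}]
crawled = [{"name": "A"}, {"name": "B"}, {"name": "C"}, {"name": "D"}]
data = build_calibration_data(standard, crawled)
print(data[0])  # ({'name': 0.5}, {'name': 0.25})
```

The resulting list has one entry per standard record, which corresponds to calibration duplicate-check data 1 to N1.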

The standard duplicate checking method comprises the following steps:

The standard duplicate checking method also comprises the following steps:

when the basic duplicate-checking model performs the duplicate checking, adding 1 to the value of K2 each time a data type in the database to be checked and its corresponding content are completely equal to the standard type at the position to be checked and its corresponding content, wherein K2 is an integer counter that is initially 0, and completely equal means that the text of the data type is identical to the text of the standard type and the text of the content corresponding to the data type is identical to the text of the content corresponding to the standard type.

The standard duplicate checking method also comprises the following steps:

The interference duplication checking method comprises the following steps:

acquiring standard weight 1 to standard weight M of the standard duplicate-check data;

The value of K3 divided by N2 is recorded as interference weight;

The step S3 comprises the following steps:

The calibration judgment method comprises the following steps:

The improvement of the basic check model comprises the following steps:

Example 3

The invention provides a storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the method described above. With this technical solution, the computer program, when executed by the processor, performs the method in any of the optional implementations of the above embodiments to implement the following functions: acquiring multiple groups of CRM data that have passed duplicate checking in the system and, by using a web crawler, multiple groups of CRM data from the Internet; then establishing a basic duplicate-checking model and putting the standard historical data and the interference historical data into the basic duplicate-checking model for duplicate checking; and finally improving the basic duplicate-checking model based on its duplicate-checking results and recording the improved basic duplicate-checking model as the CRM duplicate-checking model.

It will be appreciated by those skilled in the art that embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media having computer-usable program code embodied therein. The storage medium may be implemented by any type of volatile or non-volatile memory device or combination thereof, such as Static Random Access Memory (SRAM), Electrically Erasable Programmable Read-Only Memory (EEPROM), Erasable Programmable Read-Only Memory (EPROM), Programmable Read-Only Memory (PROM), Read-Only Memory (ROM), magnetic memory, flash memory, magnetic disk, or optical disk. These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

The above examples are only specific embodiments of the invention and are not intended to limit its scope of protection. Although the invention has been described in detail with reference to the foregoing examples, those skilled in the art should understand that anyone skilled in the art may, within the technical scope disclosed by the invention, still modify the technical solutions described in the foregoing embodiments, readily conceive of changes to them, or make equivalent substitutions for some of their technical features; such modifications, changes or substitutions do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the invention, and shall fall within the scope of protection of the invention. Therefore, the scope of protection of the invention shall be subject to the scope of protection of the claims.

Claims

1. A deep learning-based CRM data duplicate-checking and screening method, characterized by comprising the following steps:

improving the basic duplicate-checking model based on the duplicate-checking results of the basic duplicate-checking model, and recording the improved basic duplicate-checking model as the CRM duplicate-checking model;

wherein acquiring multiple groups of CRM data that have passed duplicate checking in the system and recording them as standard historical data, and acquiring multiple groups of CRM data from the Internet by using a web crawler and recording them as interference historical data, comprises the following sub-steps:

establishing the basic duplicate-checking model and putting the standard historical data and the interference historical data into the basic duplicate-checking model for duplicate checking comprises the following sub-steps:

the standard duplicate checking method comprises the following steps:

Acquiring all standard weights corresponding to the standard types 1 to M of the standard historical data, and marking the standard weights as the standard weights 1 to M;

the interference duplication checking method comprises the following steps:

acquiring standard weight 1 to standard weight M of the standard duplicate-check data;

The value of K3 divided by N2 is recorded as interference weight;

2. The deep learning-based CRM data duplicate-checking and screening method according to claim 1, wherein improving the basic duplicate-checking model based on the duplicate-checking results of the basic duplicate-checking model and recording the improved basic duplicate-checking model as the CRM duplicate-checking model comprises:

the calibration judgment method comprises the following steps:

3. The deep learning-based CRM data screening method according to claim 2, wherein the improvement of the basic screening model comprises:

4. A deep learning-based CRM data duplicate-checking and screening system, suitable for the deep learning-based CRM data duplicate-checking and screening method according to any one of claims 1-3, characterized by comprising a data acquisition module, a duplicate-check establishment module and an improvement application module;

5. A storage medium having stored thereon a computer program which, when executed by a processor, runs the steps in a deep learning based CRM data screening method according to any of claims 1-3.