CN117251445B - Deep learning-based CRM data screening method, system and medium - Google Patents

Info

Publication number
CN117251445B
Authority
CN
China
Prior art keywords
data
standard
interference
weight
checking
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202311310723.7A
Other languages
Chinese (zh)
Other versions
CN117251445A (en)
Inventor
郭伟
王闽东
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Jinyuan Biaoju Technology Co ltd
Original Assignee
Hangzhou Jinyuan Biaoju Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Jinyuan Biaoju Technology Co ltd
Priority to CN202311310723.7A
Publication of CN117251445A
Application granted
Publication of CN117251445B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Quality & Reliability (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention provides a deep learning-based CRM data duplication checking and screening method, system and medium, relating to the technical field of data duplication checking and screening. The method comprises the following steps: acquiring multiple groups of CRM data that have passed duplication checking in the system and recording them as standard historical data, and acquiring multiple groups of CRM data from the Internet by using a web crawler and recording them as interference historical data; establishing a basic duplication checking model, and putting the standard historical data and the interference historical data into the basic duplication checking model for duplication checking, wherein the basic duplication checking model is built on a text duplication checking model; and improving the basic duplication checking model based on its duplication checking results. The invention addresses the problem that the prior art lacks comprehensive duplication checking across multiple data types when checking CRM data for duplicates, so that when the same customer places multiple orders, the orders may be mistaken for duplicates and cancelled.

Description

Deep learning-based CRM data screening method, system and medium
Technical Field
The invention relates to the technical field of data duplication checking and screening, and in particular to a deep learning-based CRM data duplication checking and screening method, system and medium.
Background
CRM data refers to the customer-related data recorded and managed in a customer relationship management (CRM) system. Such data may include basic customer information (e.g., name, phone number, email, address), purchase history, contacts, communication records, customer feedback, sales opportunities, marketing campaigns, and so on. The CRM system collects and analyzes this data, which allows businesses to better understand customer needs and behavior, improve customer satisfaction and loyalty, and thereby increase sales and market share.
Existing improvements generally target CRM data processing. For example, Chinese patent publication No. CN106203810A discloses a cloud platform-based CRM data processing method that effectively controls the number of services by meeting the enterprise's core requirement of focusing on business-related ontologies, improves service discovery efficiency, and helps personalize enterprise services. Other improvements to CRM data generally concern duplication checking of the basic information of CRM data only. What is missing is comprehensive duplication checking across multiple data types; as a result, when the same customer places multiple orders, the orders may be mistaken for duplicates and cancelled. Existing CRM data duplication checking therefore needs to be improved.
Disclosure of Invention
In view of the shortcomings of the prior art, the invention aims to provide a deep learning-based CRM data duplication checking and screening method, system and medium, to solve the problem that the prior art lacks comprehensive duplication checking across multiple data types when checking CRM data for duplicates, so that when the same customer places multiple orders, the orders are mistaken for duplicates and cancelled.
To achieve the above object, the invention provides a deep learning-based CRM data duplication checking and screening method, comprising:
acquiring multiple groups of CRM data that have passed duplication checking in the system and recording them as standard historical data, and acquiring multiple groups of CRM data from the Internet by using a web crawler and recording them as interference historical data;
establishing a basic duplication checking model, and putting the standard historical data and the interference historical data into the basic duplication checking model for duplication checking, wherein the basic duplication checking model is built on a text duplication checking model;
improving the basic duplication checking model based on its duplication checking results, and recording the improved basic duplication checking model as the CRM duplication checking model.
Further, acquiring multiple groups of CRM data that have passed duplication checking in the system and recording them as standard historical data, and acquiring multiple groups of CRM data from the Internet by using a web crawler and recording them as interference historical data, comprises the following sub-steps:
acquiring multiple groups of CRM data that have passed duplication checking in the system, and recording them as standard historical data 1 to standard historical data N1;
acquiring multiple groups of CRM data from the Internet by using a web crawler, and recording them as interference historical data 1 to interference historical data N2;
wherein the standard historical data and the interference historical data have the same data types, and all the data types are recorded as data type 1 to data type M.
Further, establishing the basic duplication checking model and putting the standard historical data and the interference historical data into the basic duplication checking model for duplication checking comprises the following sub-steps:
establishing the basic duplication checking model, wherein the basic duplication checking model is built on a text duplication checking model;
for any one of standard historical data 1 to standard historical data N1, checking that standard historical data for duplicates by using the standard duplication checking method, and recording the remaining standard historical data among standard historical data 1 to standard historical data N1 (all except the one being checked) as the standard duplication checking library;
recording the standard historical data that has been checked by the standard duplication checking method as standard duplication checking data;
recording interference historical data 1 to interference historical data N2 as the interference duplication checking database, replacing any one of interference historical data 1 to interference historical data N2 with the standard historical data, and checking the standard duplication checking data by using the interference duplication checking method;
recording all standard duplication checking data that has been checked by the interference duplication checking method as calibration duplication checking data 1 to calibration duplication checking data N1.
Further, the standard duplication checking method is as follows:
acquiring data type 1 to data type M of the standard historical data, and recording them as standard type 1 to standard type M;
for any one of standard type 1 to standard type M, acquiring the position of the standard type in the standard historical data, and recording it as the position to be checked;
recording the data types at the position to be checked of all standard historical data in the standard duplication checking library as the database to be checked;
using the basic duplication checking model to check the standard type at the position to be checked, together with its corresponding content, for duplicates, wherein the database used is the database to be checked;
recording the number of data types in the database to be checked as K1, where K1 = N1 - 1;
when the basic duplication checking model performs duplication checking, adding 1 to K2 each time a data type in the database to be checked and its corresponding content are completely equal to the standard type at the position to be checked and its corresponding content, where K2 is a non-negative integer with an initial value of 0, and "completely equal" means that the text of the data type is identical to the text of the standard type and the text of the content corresponding to the data type is identical to the text of the content corresponding to the standard type;
recording K2 divided by K1 as the standard weight corresponding to the standard type;
acquiring the standard weights corresponding to standard type 1 to standard type M of the standard historical data, and recording them as standard weight 1 to standard weight M.
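The standard duplication checking method above can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: it assumes each group of CRM data is a dictionary mapping a data type (field name) to its content, and the function name `standard_weights` and the sample records are hypothetical.

```python
from typing import Dict, List

Record = Dict[str, str]  # assumption: data type (field name) -> content


def standard_weights(records: List[Record], index: int) -> Dict[str, float]:
    """Return K2 / K1 for every data type of records[index].

    The other N1 - 1 records play the role of the standard duplication
    checking library, so K1 = N1 - 1.
    """
    target = records[index]
    library = records[:index] + records[index + 1:]
    k1 = len(library)
    if k1 == 0:
        raise ValueError("at least two standard records are required")
    weights: Dict[str, float] = {}
    for data_type, content in target.items():
        # K2 counts library entries whose type and content match exactly.
        k2 = sum(1 for other in library if other.get(data_type) == content)
        weights[data_type] = k2 / k1
    return weights


# Hypothetical sample data (N1 = 3, M = 2).
records = [
    {"name": "Zhang San", "status": "completed"},
    {"name": "Li Si", "status": "completed"},
    {"name": "Zhang San", "status": "pending"},
]
print(standard_weights(records, 0))  # {'name': 0.5, 'status': 0.5}
```

Exact string comparison stands in here for the text duplication checking model on which the basic model is built.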
Further, the interference duplication checking method is as follows:
acquiring standard weight 1 to standard weight M of the standard duplication checking data;
for the standard type corresponding to any one of the standard weights, recording the position of that standard type in the standard duplication checking data as the position to be interfered;
acquiring the data types at the position to be interfered in interference historical data 1 to interference historical data N2, and recording them as positioning interference type 1 to positioning interference type N2;
recording positioning interference type 1 to positioning interference type N2 as the positioning interference database;
using the basic duplication checking model to check the standard type and its corresponding content for duplicates, wherein the database used is the positioning interference database, and adding 1 to K3 each time a positioning interference type in the positioning interference database and its corresponding content are completely equal to the standard type and its corresponding content, where K3 is a non-negative integer with an initial value of 0;
recording K3 divided by N2 as the interference weight;
acquiring the interference weights corresponding to all standard types of the standard duplication checking data, and recording them as interference weight 1 to interference weight M.
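A minimal Python sketch of the interference duplication checking method is given below, under the same assumption that each group of CRM data is a dictionary from data type to content. The text does not say which interference record is replaced by the standard record, so the `replace_at` parameter, like the function name and the sample data, is a hypothetical choice made for illustration.

```python
from typing import Dict, List

Record = Dict[str, str]  # assumption: data type (field name) -> content


def interference_weights(standard: Record, interference: List[Record],
                         replace_at: int = 0) -> Dict[str, float]:
    """Return K3 / N2 for every data type of the standard record."""
    n2 = len(interference)
    if n2 == 0:
        raise ValueError("at least one interference record is required")
    # One interference record is replaced by the standard record,
    # which guarantees K3 >= 1 for every data type.
    database = list(interference)
    database[replace_at] = standard
    weights: Dict[str, float] = {}
    for data_type, content in standard.items():
        # K3 counts exact matches in the positioning interference database.
        k3 = sum(1 for rec in database if rec.get(data_type) == content)
        weights[data_type] = k3 / n2
    return weights


# Hypothetical sample data (N2 = 4).
standard = {"name": "Zhang San", "status": "completed"}
crawled = [
    {"name": "Wang Wu", "status": "completed"},
    {"name": "Zhao Liu", "status": "refunded"},
    {"name": "Zhang San", "status": "pending"},
    {"name": "Sun Qi", "status": "pending"},
]
print(interference_weights(standard, crawled))  # {'name': 0.5, 'status': 0.25}
```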
Further, improving the basic duplication checking model based on its duplication checking results and recording the improved basic duplication checking model as the CRM duplication checking model comprises the following steps:
obtaining a calibration pass rate from calibration duplication checking data 1 to calibration duplication checking data N1 by using the calibration judgment method, and improving the basic duplication checking model when the calibration pass rate is less than or equal to the standard pass rate;
when the calibration pass rate is greater than the standard pass rate, recording the basic duplication checking model as the CRM duplication checking model, and using the CRM duplication checking model to check and screen the CRM data for duplicates.
Further, the calibration judgment method comprises:
for any one of calibration duplication checking data 1 to calibration duplication checking data N1, acquiring standard weight 1 to standard weight M and interference weight 1 to interference weight M of the calibration duplication checking data;
comparing standard weight 1 to standard weight M with interference weight 1 to interference weight M one by one, and recording an interference weight as a failed weight when it is less than or equal to the corresponding standard weight, or equal to the first interference number, or equal to the second interference number;
recording an interference weight as a passed weight when it is greater than the corresponding standard weight;
acquiring the number of failed weights and recording it as M1, and recording the calibration duplication checking data as passed duplication checking data when M1 is less than or equal to the third interference number;
recording the calibration duplication checking data as failed duplication checking data when M1 is greater than the third interference number;
recording the number of passed duplication checking data as M2, and recording M2 divided by N1 as the calibration pass rate.
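The calibration judgment method can be sketched as follows. This is a hedged illustration rather than the patented implementation: the function names and sample weights are hypothetical, and the default first and second interference numbers (0 and 1) are the values the embodiment in the detailed description uses. The failure conditions are evaluated first, so an interference weight equal to the first or second interference number is treated as failed even if it exceeds the standard weight.

```python
from typing import List, Sequence, Tuple


def count_failed(standard_w: Sequence[float], interference_w: Sequence[float],
                 first_number: float = 0.0, second_number: float = 1.0) -> int:
    """Return M1, the number of failed weights of one calibration record."""
    if len(standard_w) != len(interference_w):
        raise ValueError("both weight lists must have M entries")
    failed = 0
    for s, i in zip(standard_w, interference_w):
        if i <= s or i == first_number or i == second_number:
            failed += 1
    return failed


def calibration_pass_rate(
        records: List[Tuple[Sequence[float], Sequence[float]]],
        third_number: float) -> float:
    """Return M2 / N1 over (standard weights, interference weights) pairs."""
    n1 = len(records)
    if n1 == 0:
        raise ValueError("at least one calibration record is required")
    # M2 counts records whose number of failed weights is within the limit.
    m2 = sum(1 for s, i in records if count_failed(s, i) <= third_number)
    return m2 / n1


# Hypothetical sample data: N1 = 2 records, M = 3 data types each.
sample = [
    ([0.1, 0.2, 0.3], [0.2, 0.3, 0.4]),  # M1 = 0
    ([0.5, 0.2, 0.3], [0.4, 0.3, 1.0]),  # M1 = 2 (0.4 <= 0.5, and 1.0 == second number)
]
print(calibration_pass_rate(sample, third_number=1))  # 0.5
```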
Further, improving the basic duplication checking model comprises the following steps:
when the basic duplication checking model judges whether two groups of data are completely equal, randomly recording one group as the replaceable data and the other as the fixed data;
performing Chinese word segmentation on the replaceable data and, for each resulting segment, retrieving from a similar-word lexicon the words with the highest similarity, up to a first word count, and recording them as the similar words of that segment;
replacing a second word count of segments in the replaceable data with their similar words and then comparing the result with the fixed data, and recording the two groups as completely equal when the characters of the replaceable data are identical to those of the fixed data;
wherein the second word count is 1 the first time the basic duplication checking model is improved, and is increased by 1 each subsequent time the model is improved.
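One possible reading of the improved equality test is sketched below. Several points are assumptions made for illustration: the replaceable data is taken to be already segmented into a list of words (a real system would need a Chinese word segmentation step), the similar-word lexicon is a plain dictionary whose lists are ordered from most to least similar, and "a second word count of segments" is interpreted as "up to that many segments replaced at once". The function name and sample words are hypothetical.

```python
from itertools import combinations, product
from typing import Dict, List


def improved_equal(replaceable_tokens: List[str], fixed: str,
                   similar: Dict[str, List[str]],
                   first_count: int, second_count: int) -> bool:
    """Judge equality after replacing up to second_count segments
    with one of their first_count most similar words."""
    if "".join(replaceable_tokens) == fixed:
        return True
    n = len(replaceable_tokens)
    for k in range(1, min(second_count, n) + 1):
        for positions in combinations(range(n), k):
            # Candidate similar words for each chosen segment.
            options = [similar.get(replaceable_tokens[p], [])[:first_count]
                       for p in positions]
            for choice in product(*options):
                candidate = list(replaceable_tokens)
                for p, word in zip(positions, choice):
                    candidate[p] = word
                if "".join(candidate) == fixed:
                    return True
    return False


# Hypothetical lexicon and data.
lexicon = {"完成": ["结束", "完毕"], "订单": ["交易"]}
print(improved_equal(["交易", "完成"], "交易结束", lexicon, 2, 1))  # True
print(improved_equal(["订单", "完成"], "交易结束", lexicon, 1, 1))  # False
print(improved_equal(["订单", "完成"], "交易结束", lexicon, 1, 2))  # True
```

The last two calls show why raising the second word count increases both the duplication checking capability and the number of candidates to compare.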
In a second aspect, the invention further provides a deep learning-based CRM data duplication checking and screening system, comprising a data acquisition module, a duplication checking establishment module and an improvement application module;
the data acquisition module is used for acquiring multiple groups of CRM data that have passed duplication checking in the system and recording them as standard historical data, and for acquiring multiple groups of CRM data from the Internet by using a web crawler and recording them as interference historical data;
the duplication checking establishment module is used for establishing the basic duplication checking model, which is built on a text duplication checking model;
the improvement application module is used for improving the basic duplication checking model based on its duplication checking results and recording the improved basic duplication checking model as the CRM duplication checking model.
In a third aspect, the invention provides a storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, performs the steps of the method described above.
The beneficial effects of the invention are as follows. The invention acquires multiple groups of CRM data that have passed duplication checking in the system and uses a web crawler to acquire multiple groups of CRM data from the Internet; it then establishes a basic duplication checking model and puts the standard historical data and the interference historical data into it for duplication checking. The advantage is that the basic duplication checking model can be initially trained on the groups of CRM data that passed duplication checking to obtain the standard weights, and further trained on the groups of CRM data acquired by the web crawler to obtain the interference weights; the training state of the basic duplication checking model can then be assessed from the standard weights and the interference weights, which facilitates training the model;
Finally, the invention improves the basic duplication checking model based on its duplication checking results and records the improved basic duplication checking model as the CRM duplication checking model.
Additional aspects of the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.
Drawings
Other features, objects and advantages of the present invention will become more apparent upon reading of the detailed description of non-limiting embodiments, given with reference to the accompanying drawings in which:
FIG. 1 is a schematic block diagram of a deep learning-based CRM data screening system;
FIG. 2 is a flow chart of the steps of a deep learning-based CRM data screening method of the present invention;
FIG. 3 is a flow chart of the processing of standard history data according to the present invention.
Detailed Description
It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the invention. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of exemplary embodiments according to the present invention.
Embodiments of the invention and features of the embodiments may be combined with each other without conflict.
Example 1
Referring to FIG. 1, the invention provides a deep learning-based CRM data duplication checking and screening system, comprising a data acquisition module, a duplication checking establishment module and an improvement application module;
the data acquisition module is used for acquiring multiple groups of CRM data that have passed duplication checking in the system and recording them as standard historical data, and for acquiring multiple groups of CRM data from the Internet by using a web crawler and recording them as interference historical data.
The data acquisition module is configured with a CRM data acquisition policy comprising:
acquiring multiple groups of CRM data that have passed duplication checking in the system, and recording them as standard historical data 1 to standard historical data N1;
In a specific implementation, the usual range of duplication checking results can be estimated from the standard historical data; for example, if the duplication rate in the standard historical data is 5% to 10%, CRM data whose duplication rate is below 5% or above 10% can be recorded as failed data;
The basic duplication checking model can be initially trained on the interference historical data. By comparing the model's duplication checking results on the standard historical data and on the interference historical data, it can be judged whether the model's duplication checking efficiency needs further optimization. Each optimization of the basic duplication checking model improves its duplication checking capability but also increases its computational load. When the model has just become able to check CRM data for duplicates effectively, its computational load is the minimum needed to reach the target duplication checking efficiency, so the model at that point can be recorded as the CRM duplication checking model and put into use;
acquiring multiple groups of CRM data from the Internet by using a web crawler, and recording them as interference historical data 1 to interference historical data N2;
the standard historical data and the interference historical data have the same data types, and all the data types are recorded as data type 1 to data type M;
The duplication checking establishment module is used for establishing the basic duplication checking model, which is built on a text duplication checking model;
In a specific implementation, the text duplication checking model decides whether two passages are duplicates by checking whether they are identical. It has a low computational load and limited duplication checking capability; each subsequent improvement of the basic duplication checking model raises its duplication checking capability and also its computational load.
The duplication checking establishment module is configured with a duplication checking model establishment strategy, which comprises:
establishing the basic duplication checking model, wherein the basic duplication checking model is built on a text duplication checking model;
Referring to FIG. 3, for any one of standard historical data 1 to standard historical data N1, checking that standard historical data for duplicates by using the standard duplication checking method, and recording the remaining standard historical data among standard historical data 1 to standard historical data N1 (all except the one being checked) as the standard duplication checking library;
In a specific implementation, all data in the standard duplication checking library are historical data that passed duplication checking, so the weights obtained against the standard duplication checking library represent the most standard case. Taking those weights as the baseline, when the weights obtained against other data are greater than the weights in the most standard case, the duplication checking capability of the basic duplication checking model is high enough for the model to be put into use;
recording the standard historical data that has been checked by the standard duplication checking method as standard duplication checking data;
recording interference historical data 1 to interference historical data N2 as the interference duplication checking database, replacing any one of interference historical data 1 to interference historical data N2 with the standard historical data, and checking the standard duplication checking data by using the interference duplication checking method;
In a specific implementation, adding the standard historical data to the interference historical data ensures that the interference weight is never 0. In the invention, data of the same type ideally yields equal duplication checking results in the standard duplication checking library and in the interference duplication checking database; because the data in the interference duplication checking database is disordered, the result the basic duplication checking model obtains against the interference duplication checking database is slightly higher than the result obtained against the standard duplication checking library. After the standard historical data is added to the interference duplication checking database, the interference weight is ideally higher than the standard weight, so the case where the standard weight equals the interference weight can be recorded as failed when the basic duplication checking model is evaluated;
recording all standard duplication checking data that has been checked by the interference duplication checking method as calibration duplication checking data 1 to calibration duplication checking data N1;
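The reason the interference weight can never be 0 once the standard historical data has been placed in the interference database can be written out as a short inequality. The symbols $w_{\text{std}}$ and $w_{\text{int}}$ are notation introduced here for the standard weight and the interference weight of one data type; they do not appear in the original text.

```latex
% Standard weight: matches K_2 among the other N_1 - 1 standard records.
w_{\text{std}} = \frac{K_2}{K_1} = \frac{K_2}{N_1 - 1}, \qquad 0 \le K_2 \le N_1 - 1

% Interference weight: the substituted standard record always matches itself,
% so K_3 >= 1 and the weight is bounded away from zero.
w_{\text{int}} = \frac{K_3}{N_2}, \qquad 1 \le K_3 \le N_2
\;\Longrightarrow\; \frac{1}{N_2} \le w_{\text{int}} \le 1
```

An interference weight of exactly 0 would therefore indicate an error in the check, which is consistent with treating that value as a failure.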
The standard duplication checking method comprises: acquiring data type 1 to data type M of the standard historical data, and recording them as standard type 1 to standard type M;
for any one of standard type 1 to standard type M, acquiring the position of the standard type in the standard historical data, and recording it as the position to be checked.
The standard duplication checking method further comprises:
recording the data types at the position to be checked of all standard historical data in the standard duplication checking library as the database to be checked;
using the basic duplication checking model to check the standard type at the position to be checked, together with its corresponding content, for duplicates, wherein the database used is the database to be checked;
recording the number of data types in the database to be checked as K1, where K1 = N1 - 1;
when the basic duplication checking model performs duplication checking, adding 1 to K2 each time a data type in the database to be checked and its corresponding content are completely equal to the standard type at the position to be checked and its corresponding content, where K2 is a non-negative integer with an initial value of 0, and "completely equal" means that the text of the data type is identical to the text of the standard type and the text of the content corresponding to the data type is identical to the text of the content corresponding to the standard type;
In a specific implementation, an example of complete equality is: the data type is "transaction status" and its corresponding content is "transaction completed, with one more item added"; the standard type is "transaction status" and its corresponding content is "transaction completed, with one more item added"; in this case the data type and its corresponding content can be considered completely equal to the standard type and its corresponding content;
The standard duplication checking method further comprises:
recording K2 divided by K1 as the standard weight corresponding to the standard type;
acquiring the standard weights corresponding to standard type 1 to standard type M of the standard historical data, and recording them as standard weight 1 to standard weight M.
The interference duplication checking method comprises:
acquiring standard weight 1 to standard weight M of the standard duplication checking data;
for the standard type corresponding to any one of the standard weights, recording the position of that standard type in the standard duplication checking data as the position to be interfered;
acquiring the data types at the position to be interfered in interference historical data 1 to interference historical data N2, and recording them as positioning interference type 1 to positioning interference type N2;
recording positioning interference type 1 to positioning interference type N2 as the positioning interference database;
using the basic duplication checking model to check the standard type and its corresponding content for duplicates, wherein the database used is the positioning interference database, and adding 1 to K3 each time a positioning interference type in the positioning interference database and its corresponding content are completely equal to the standard type and its corresponding content, where K3 is a non-negative integer with an initial value of 0;
recording K3 divided by N2 as the interference weight;
acquiring the interference weights corresponding to all standard types of the standard duplication checking data, and recording them as interference weight 1 to interference weight M;
The improvement application module is used for improving the basic duplication checking model based on its duplication checking results and recording the improved basic duplication checking model as the CRM duplication checking model;
the improvement application module obtains a calibration pass rate from calibration duplication checking data 1 to calibration duplication checking data N1 by using the calibration judgment strategy, and improves the basic duplication checking model when the calibration pass rate is less than or equal to the standard pass rate;
when the calibration pass rate is greater than the standard pass rate, the basic duplication checking model is recorded as the CRM duplication checking model, and the CRM duplication checking model is used to check and screen the CRM data for duplicates;
In a specific implementation, the standard pass rate is set to 0.8; when the calibration pass rate obtained through the calibration judgment strategy is 0.85, it is greater than the standard pass rate, and the basic duplication checking model can be recorded as the CRM duplication checking model.
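The accept-or-improve decision can be sketched as a small loop. This is an illustration under stated assumptions: `pass_rate_for` is a hypothetical callback that re-runs the calibration judgment for a given second word count (0 meaning the unimproved basic model), the sample pass rates are invented, and `max_rounds` is a safety bound added here because the text gives no stopping limit.

```python
from typing import Callable


def tune_second_count(pass_rate_for: Callable[[int], float],
                      standard_pass_rate: float = 0.8,
                      max_rounds: int = 10) -> int:
    """Improve the model until the calibration pass rate exceeds the
    standard pass rate; return the second word count that was accepted."""
    second_count = 0  # 0 = basic model, not yet improved
    for _ in range(max_rounds + 1):
        if pass_rate_for(second_count) > standard_pass_rate:
            return second_count  # accepted as the CRM duplication checking model
        second_count += 1  # each improvement adds 1 to the second word count
    raise RuntimeError("pass rate never exceeded the standard pass rate")


# Hypothetical pass rates observed after 0, 1 and 2 improvements.
observed = {0: 0.6, 1: 0.75, 2: 0.85}
print(tune_second_count(lambda c: observed.get(c, 0.9)))  # 2
```

Because the comparison is strict, a calibration pass rate of exactly 0.8 still triggers another improvement round, matching the "less than or equal to" condition in the text.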
The improvement application module is configured with a calibration judgment strategy comprising:
For any one of the calibration check data 1 to the calibration check data N1, obtaining standard weight 1 to standard weight M and interference weight 1 to interference weight M of the calibration check data;
Comparing the standard weight 1 to the standard weight M with the interference weight 1 to the interference weight M one by one, and marking the interference weight as a failed weight when the interference weight is smaller than or equal to the standard weight, or the interference weight is equal to the first interference number, or the interference weight is equal to the second interference number;
In the specific implementation process, the first interference number is set to 0 and the second interference number is set to 1; because standard historical data has been added to the interference check database, the interference weight cannot be 0, and because the database contains a large amount of data, not all of the data can be identical, so the interference weight cannot be 1; therefore, when the interference weight is equal to the first interference number or to the second interference number, the interference weight is marked as a failed weight;
When the interference weight is greater than the standard weight, marking the interference weight as a passed weight;
Acquiring the number of failed weights, marking the number as M1, and marking the calibration check data as passed check data when M1 is smaller than or equal to a third interference number;
when M1 is larger than the third interference number, the calibration check data is recorded as failed check data;
In the specific implementation process, the third interference number is 5% of M; when M1 is less than or equal to 5% of the total number of interference weights, the duplication checking of the calibration check data is essentially complete, and the calibration check data can be recorded as passed check data; for example, when M is 100 and M1 is 4, the third interference number is 5 and M1 is less than 5, so the calibration check data is recorded as passed check data;
Acquiring the number of passed check data, marking the number as M2, and marking the value of M2 divided by N1 as the calibration passing rate;
in the specific implementation process, when the value of N1 is 1000 and the number of the acquired passing check weight data is 200, 0.2 is recorded as the calibration passing rate.
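The calibration judgment strategy above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the function names, the representation of each calibration check data item as a pair of weight lists, and the default parameter values (first interference number 0, second interference number 1, third interference number 5% of M, taken from the implementation examples) are choices made here for illustration and are not prescribed by the text.

```python
from typing import List, Sequence, Tuple


def is_failed_weight(standard_w: float, interference_w: float,
                     first: float = 0.0, second: float = 1.0) -> bool:
    """A weight fails if it does not exceed the standard weight,
    or if it equals the first or second interference number."""
    return (interference_w <= standard_w
            or interference_w == first
            or interference_w == second)


def record_passes(standard_ws: Sequence[float],
                  interference_ws: Sequence[float],
                  third_ratio: float = 0.05) -> bool:
    """One calibration check data item passes when M1 <= third interference number."""
    m = len(standard_ws)
    m1 = sum(is_failed_weight(s, i) for s, i in zip(standard_ws, interference_ws))
    return m1 <= third_ratio * m


def calibration_pass_rate(
        records: List[Tuple[Sequence[float], Sequence[float]]]) -> float:
    """M2 / N1, where M2 is the number of passed check data items."""
    m2 = sum(record_passes(s, i) for s, i in records)
    return m2 / len(records)


# Worked example from the text: M = 100 and M1 = 4, so the item passes.
standard = [0.1] * 100
four_failures = [0.2] * 96 + [0.05] * 4
print(record_passes(standard, four_failures))  # True, because 4 <= 5
```

With N1 = 1000 items of which 200 pass, `calibration_pass_rate` returns 0.2; against a standard passing rate of 0.8 this would mean the basic duplication checking model still has to be improved.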
Example 2
Referring to FIG. 2, the invention further provides a deep learning-based CRM data duplication checking and screening method, which includes:
Step S1, acquiring multiple groups of CRM data in the system that have passed duplication checking and marking them as standard historical data, and acquiring multiple groups of CRM data from the Internet by using a web crawler and marking them as interference historical data;
Step S2, establishing a basic duplication checking model, and putting the standard historical data and the interference historical data into the basic duplication checking model for duplication checking, wherein the basic duplication checking model is built on a text duplication checking model;
Step S3, improving the basic duplication checking model based on the duplication checking result of the basic duplication checking model, and marking the improved basic duplication checking model as a CRM duplication checking model.
Step S1 comprises the following sub-steps:
Step S101, a plurality of groups of CRM data which pass the check in the system are obtained and recorded as standard historical data 1 to standard historical data N1;
Step S102, using a web crawler to acquire multiple groups of CRM data in the Internet, and recording the CRM data as interference historical data 1 to interference historical data N2;
Wherein, the standard historical data and the interference historical data have the same data type, and all the data types are recorded as data type 1 to data type M.
Step S2 comprises the following sub-steps:
Step S201, a basic duplication checking model is established, and the basis of the basic duplication checking model is a text duplication checking model;
Step S202, for any one of the standard historical data 1 to the standard historical data N1, performing duplication checking on the standard historical data by using a standard duplicate checking method, wherein the standard historical data other than the one being checked among the standard historical data 1 to the standard historical data N1 is recorded as a standard duplication checking database;
Recording the standard historical data that has been checked by using the standard duplicate checking method as standard duplicate checking data;
Step S203, marking the interference historical data 1 to the interference historical data N2 as an interference check database, and performing duplication checking on the standard duplicate checking data by using an interference duplication checking method, wherein any one of the interference historical data 1 to the interference historical data N2 is replaced with the standard historical data;
Recording all standard duplicate checking data that has been checked by using the interference duplication checking method as calibration check data 1 to calibration check data N1.
The standard duplicate checking method comprises the following steps:
acquiring data types 1 to M of standard historical data, and marking the data types as standard types 1 to M;
and for any one standard type from the standard type 1 to the standard type M, acquiring the position of the standard type in the standard historical data, and recording the position as a position to be checked.
The standard duplicate checking method also comprises the following steps:
recording the data types at the position to be checked of all standard historical data in the standard duplication checking database as a database to be checked;
Performing duplication checking on the standard type at the position to be checked and the content corresponding to the standard type by using the basic duplication checking model, wherein the database used for the duplication checking is the database to be checked;
The number of data types in the database to be checked is marked as K1, wherein K1=N1-1;
when the basic duplication checking model starts the duplication checking, adding 1 to the value of K2 each time a data type and its corresponding content in the database to be checked are completely equal to the standard type at the position to be checked and its corresponding content, wherein K2 is a non-negative integer with an initial value of 0, and completely equal means that the text of the data type is identical to the text of the standard type and the text of the content corresponding to the data type is identical to the text of the content corresponding to the standard type.
The standard duplicate checking method also comprises the following steps:
the value of K2 divided by K1 is recorded as the standard weight corresponding to the standard type;
and acquiring all standard weights corresponding to the standard types 1 to M of the standard historical data, and recording the standard weights as the standard weights 1 to M.
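As an illustration of the standard duplicate checking method, the following Python sketch computes the standard weight K2/K1 for every data type of one item of standard historical data. The record layout (a dictionary mapping a data type name to its content text) and the function name `standard_weights` are assumptions made for this example; "completely equal" is modelled as exact string equality of both the type name and the content.

```python
from typing import Dict, List

# Hypothetical layout: one CRM record maps each data type name to its content text.
Record = Dict[str, str]


def standard_weights(records: List[Record], index: int) -> Dict[str, float]:
    """Return the standard weight K2 / K1 for each data type of records[index]."""
    target = records[index]
    # Standard duplication checking database: all records except the one being checked.
    others = records[:index] + records[index + 1:]
    k1 = len(others)  # K1 = N1 - 1
    if k1 == 0:
        raise ValueError("at least two records are needed")
    weights: Dict[str, float] = {}
    for data_type, content in target.items():
        # K2 counts entries whose type name and content text are both identical.
        k2 = sum(1 for other in others if other.get(data_type) == content)
        weights[data_type] = k2 / k1
    return weights


history = [
    {"name": "A", "city": "X"},
    {"name": "A", "city": "Y"},
    {"name": "B", "city": "X"},
]
print(standard_weights(history, 0))  # {'name': 0.5, 'city': 0.5}
```

In this toy data set N1 = 3, so K1 = 2; one of the two other records shares the name and one shares the city, giving a standard weight of 0.5 for each data type.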
The interference duplication checking method comprises the following steps:
obtaining standard weight 1 to standard weight M of the standard duplicate checking data;
for the standard type corresponding to any one standard weight, marking the position of the standard type in the standard duplicate checking data as the position to be interfered;
Acquiring the data types at the position to be interfered in the interference historical data 1 to the interference historical data N2, and marking them as positioning interference type 1 to positioning interference type N2;
marking the positioning interference type 1 to the positioning interference type N2 as a positioning interference database;
Performing duplication checking on the standard type and the content corresponding to the standard type by using the basic duplication checking model, wherein the database used for the duplication checking is the positioning interference database, and adding 1 to the value of K3 each time a positioning interference type and its corresponding content in the positioning interference database are completely equal to the standard type and its corresponding content, wherein K3 is a non-negative integer with an initial value of 0;
The value of K3 divided by N2 is recorded as the interference weight;
and obtaining the interference weights corresponding to all standard types of the standard duplicate checking data, and recording them as interference weight 1 to interference weight M.
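The interference duplication checking method can be sketched in the same way. The example below assumes the same hypothetical record layout (data type name mapped to content text) and interprets step S203 as replacing one interference record with the standard record being checked, so that K3 is at least 1; the function name and the `replace_index` parameter are illustrative only.

```python
from typing import Dict, List

Record = Dict[str, str]  # hypothetical layout: data type name -> content text


def interference_weights(target: Record,
                         interference: List[Record],
                         replace_index: int = 0) -> Dict[str, float]:
    """Return the interference weight K3 / N2 for each data type of `target`."""
    if not interference:
        raise ValueError("the interference check database is empty")
    # Copy the list so the caller's data is untouched, then swap in the
    # standard record for one interference record (assumed reading of S203).
    database = list(interference)
    database[replace_index] = target
    n2 = len(database)
    weights: Dict[str, float] = {}
    for data_type, content in target.items():
        # K3 counts positioning interference entries that match exactly.
        k3 = sum(1 for rec in database if rec.get(data_type) == content)
        weights[data_type] = k3 / n2
    return weights


crawled = [
    {"name": "Q", "city": "Q"},
    {"name": "A", "city": "Z"},
    {"name": "C", "city": "X"},
    {"name": "D", "city": "W"},
]
print(interference_weights({"name": "A", "city": "X"}, crawled))
# {'name': 0.5, 'city': 0.5}
```

Because the standard record itself sits in the database, a weight of 0 can only arise from a faulty comparison, which is consistent with treating 0 as a failed weight in the calibration judgment.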
Step S3 comprises the following sub-steps:
Obtaining a calibration passing rate by applying a calibration judgment method to the calibration check data 1 to the calibration check data N1, and improving the basic duplication checking model when the calibration passing rate is smaller than or equal to a standard passing rate;
When the calibration passing rate is greater than the standard passing rate, marking the basic duplication checking model as the CRM duplication checking model, and performing duplication checking and screening on CRM data by using the CRM duplication checking model.
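Step S3 amounts to a loop: evaluate, compare against the standard passing rate, and improve until the rate is exceeded. A minimal sketch follows, assuming a caller-supplied function `pass_rate_after(rounds)` that returns the calibration passing rate of the model after a given number of improvement rounds. That function, the default threshold of 0.8 (taken from the implementation example), and the `max_rounds` safety limit are all assumptions of this sketch and are not part of the described method.

```python
from typing import Callable


def rounds_until_accepted(pass_rate_after: Callable[[int], float],
                          standard_pass_rate: float = 0.8,
                          max_rounds: int = 10) -> int:
    """Return how many improvement rounds are needed before the model is accepted."""
    rounds = 0
    # Improve while the calibration passing rate is <= the standard passing rate.
    while pass_rate_after(rounds) <= standard_pass_rate:
        if rounds >= max_rounds:
            raise RuntimeError("pass rate never exceeded the standard passing rate")
        rounds += 1
    return rounds


# Toy evaluation: the rate rises from 0.2 to 0.6 to 0.85 as the model is improved.
rates = [0.2, 0.6, 0.85]
print(rounds_until_accepted(lambda r: rates[r]))  # 2
```

Note that a rate exactly equal to the standard passing rate still triggers another improvement round, matching the "smaller than or equal to" condition.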
The calibration judgment method comprises the following steps:
For any one of the calibration check data 1 to the calibration check data N1, obtaining standard weight 1 to standard weight M and interference weight 1 to interference weight M of the calibration check data;
Comparing the standard weight 1 to the standard weight M with the interference weight 1 to the interference weight M one by one, and marking the interference weight as a failed weight when the interference weight is smaller than or equal to the standard weight, or the interference weight is equal to the first interference number, or the interference weight is equal to the second interference number;
When the interference weight is greater than the standard weight, marking the interference weight as a passed weight;
Acquiring the number of failed weights, marking the number as M1, and marking the calibration check data as passed check data when M1 is smaller than or equal to a third interference number;
when M1 is larger than the third interference number, the calibration check data is recorded as failed check data;
the number of passed check data is recorded as M2, and the value of M2 divided by N1 is recorded as the calibration passing rate.
The improvement of the basic duplication checking model comprises the following steps:
When the basic duplication checking model judges whether two groups of data are completely equal, randomly marking one group of data as replaceable data and the other group as fixed data;
Performing Chinese word segmentation on the replaceable data, and, for each Chinese word segment, acquiring from a similar word lexicon a first word segmentation number of words having the highest similarity to that segment, and recording them as the similar words of the Chinese word segment;
replacing a second word segmentation number of Chinese word segments in the replaceable data with their similar words and then comparing the replaceable data with the fixed data, and recording the two groups of data as completely equal when the characters of the replaceable data are identical to those of the fixed data;
wherein the second word segmentation number is 1 when the basic duplication checking model is improved for the first time, and is increased by 1 each time the basic duplication checking model is improved thereafter.
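The improved equality test can be illustrated with a short Python sketch. Several things here are assumptions rather than part of the described method: the replaceable data is supplied already segmented (no particular segmentation tool is implied), the similar word lexicon is a plain dictionary whose lists are ordered from most to least similar, and "replacing a second word segmentation number of segments" is read as trying every way of replacing up to that many segments.

```python
from itertools import combinations, product
from typing import Dict, List


def equal_after_replacement(segments: List[str],
                            fixed_text: str,
                            lexicon: Dict[str, List[str]],
                            first_count: int,
                            second_count: int) -> bool:
    """Check whether the replaceable data matches the fixed data after
    replacing at most `second_count` segments with similar words."""
    if "".join(segments) == fixed_text:
        return True
    # Keep only the `first_count` most similar words for each segment.
    candidates = [lexicon.get(seg, [])[:first_count] for seg in segments]
    replaceable = [i for i, words in enumerate(candidates) if words]
    for k in range(1, second_count + 1):
        for positions in combinations(replaceable, k):
            for choice in product(*(candidates[p] for p in positions)):
                variant = list(segments)
                for pos, word in zip(positions, choice):
                    variant[pos] = word
                if "".join(variant) == fixed_text:
                    return True
    return False


# Toy lexicon and pre-segmented input, for illustration only.
toy_lexicon = {"客户": ["顾客", "用户"], "电话": ["手机"]}
segs = ["客户", "电话", "号码"]
print(equal_after_replacement(segs, "顾客电话号码", toy_lexicon, 1, 1))  # True
```

With a second word segmentation number of 1, the text "顾客手机号码" is not matched because it needs two replacements; it is matched once the number has been raised to 2 by a further improvement round.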
Example 3
The present invention provides a storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the method as described above. By the above technical solution, the computer program, when executed by the processor, performs the method in any of the alternative implementations of the above embodiments to implement the following functions: acquiring multiple groups of CRM data in the system that have passed duplication checking, and acquiring multiple groups of CRM data from the Internet by using a web crawler; then establishing a basic duplication checking model and putting the standard historical data and the interference historical data into the basic duplication checking model for duplication checking; and finally improving the basic duplication checking model based on its duplication checking result, and marking the improved basic duplication checking model as a CRM duplication checking model.
It will be appreciated by those skilled in the art that embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media having computer-usable program code embodied therein. The storage medium may be implemented by any type of volatile or non-volatile memory device or combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic disk, or optical disk. These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
The above examples are only specific embodiments of the present invention, intended to illustrate its technical solutions rather than to limit its scope of protection. Although the present invention has been described in detail with reference to the foregoing examples, those skilled in the art should understand that, within the technical scope disclosed herein, anyone skilled in the art may still modify the technical solutions described in the foregoing embodiments, readily conceive of changes to them, or make equivalent substitutions of some of the technical features; such modifications, changes or substitutions do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention, and are intended to be included in the scope of protection of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (5)

1. A deep learning-based CRM data screening method, characterized by comprising the following steps:
Acquiring multiple groups of CRM data in the system that have passed duplication checking and marking them as standard historical data, and acquiring multiple groups of CRM data from the Internet by using a web crawler and marking them as interference historical data;
establishing a basic duplication checking model, and putting the standard historical data and the interference historical data into the basic duplication checking model for duplication checking, wherein the basic duplication checking model is built on a text duplication checking model;
improving the basic duplication checking model based on the duplication checking result of the basic duplication checking model, and recording the improved basic duplication checking model as a CRM duplication checking model;
wherein the acquiring of multiple groups of CRM data in the system that have passed duplication checking and marking them as standard historical data, and the acquiring of multiple groups of CRM data from the Internet by using a web crawler and marking them as interference historical data, comprise:
acquiring multiple groups of CRM data which pass the check in the system, and recording the CRM data as standard historical data 1 to standard historical data N1;
Obtaining multiple groups of CRM data in the Internet by using a web crawler, and recording the CRM data as interference historical data 1 to interference historical data N2;
The standard historical data and the interference historical data have the same data type, and all the data types are recorded as data types 1 to M;
wherein the establishing of the basic duplication checking model and the putting of the standard historical data and the interference historical data into the basic duplication checking model for duplication checking comprise the following sub-steps:
establishing a basic duplication checking model, wherein the basis of the basic duplication checking model is a text duplication checking model;
for any one of the standard historical data 1 to the standard historical data N1, performing duplication checking on the standard historical data by using a standard duplicate checking method, wherein the standard historical data other than the one being checked among the standard historical data 1 to the standard historical data N1 is recorded as a standard duplication checking database;
Recording the standard historical data that has been checked by using the standard duplicate checking method as standard duplicate checking data;
Recording the interference historical data 1 to the interference historical data N2 as an interference check database, and performing duplication checking on the standard duplicate checking data by using an interference duplication checking method, wherein any one of the interference historical data 1 to the interference historical data N2 is replaced with the standard historical data;
all standard duplicate checking data that has been checked by using the interference duplication checking method is recorded as calibration check data 1 to calibration check data N1;
the standard duplicate checking method comprises the following steps:
acquiring data types 1 to M of standard historical data, and marking the data types as standard types 1 to M;
For any one standard type from the standard type 1 to the standard type M, acquiring the position of the standard type in standard historical data, and recording the position as a position to be checked;
recording the data types at the position to be checked of all standard historical data in the standard duplication checking database as a database to be checked;
Performing duplication checking on the standard type at the position to be checked and the content corresponding to the standard type by using the basic duplication checking model, wherein the database used for the duplication checking is the database to be checked;
the number of data types in the database to be checked is marked as K1, wherein K1=N1-1;
when the basic duplication checking model starts the duplication checking, adding 1 to the value of K2 each time a data type and its corresponding content in the database to be checked are completely equal to the standard type at the position to be checked and its corresponding content, wherein K2 is a non-negative integer with an initial value of 0, and completely equal means that the text of the data type is identical to the text of the standard type and the text of the content corresponding to the data type is identical to the text of the content corresponding to the standard type;
the value of K2 divided by K1 is recorded as the standard weight corresponding to the standard type;
Acquiring all standard weights corresponding to the standard types 1 to M of the standard historical data, and marking the standard weights as the standard weights 1 to M;
the interference duplication checking method comprises the following steps:
obtaining standard weight 1 to standard weight M of the standard duplicate checking data;
for the standard type corresponding to any one standard weight, marking the position of the standard type in the standard duplicate checking data as the position to be interfered;
Acquiring the data types at the position to be interfered in the interference historical data 1 to the interference historical data N2, and marking them as positioning interference type 1 to positioning interference type N2;
marking the positioning interference type 1 to the positioning interference type N2 as a positioning interference database;
Performing duplication checking on the standard type and the content corresponding to the standard type by using the basic duplication checking model, wherein the database used for the duplication checking is the positioning interference database, and adding 1 to the value of K3 each time a positioning interference type and its corresponding content in the positioning interference database are completely equal to the standard type and its corresponding content, wherein K3 is a non-negative integer with an initial value of 0;
The value of K3 divided by N2 is recorded as the interference weight;
and obtaining the interference weights corresponding to all standard types of the standard duplicate checking data, and recording them as interference weight 1 to interference weight M.
2. The deep learning-based CRM data screening method according to claim 1, wherein the improving of the basic duplication checking model based on the duplication checking result of the basic duplication checking model, and the recording of the improved basic duplication checking model as the CRM duplication checking model, comprise:
Obtaining a calibration passing rate by applying a calibration judgment method to the calibration check data 1 to the calibration check data N1, and improving the basic duplication checking model when the calibration passing rate is smaller than or equal to a standard passing rate;
when the calibration passing rate is larger than the standard passing rate, the basic duplication checking model is marked as the CRM duplication checking model, and the CRM duplication checking model is used for duplication checking and screening of CRM data;
the calibration judgment method comprises the following steps:
For any one of the calibration check data 1 to the calibration check data N1, obtaining standard weight 1 to standard weight M and interference weight 1 to interference weight M of the calibration check data;
Comparing the standard weight 1 to the standard weight M with the interference weight 1 to the interference weight M one by one, and marking the interference weight as a failed weight when the interference weight is smaller than or equal to the standard weight, or the interference weight is equal to the first interference number, or the interference weight is equal to the second interference number;
When the interference weight is greater than the standard weight, marking the interference weight as a passed weight;
Acquiring the number of failed weights, marking the number as M1, and marking the calibration check data as passed check data when M1 is smaller than or equal to a third interference number;
when M1 is larger than the third interference number, the calibration check data is recorded as failed check data;
the number of passed check data is recorded as M2, and the value of M2 divided by N1 is recorded as the calibration passing rate.
3. The deep learning-based CRM data screening method according to claim 2, wherein the improvement of the basic duplication checking model comprises:
When the basic duplication checking model judges whether two groups of data are completely equal, randomly marking one group of data as replaceable data and the other group as fixed data;
Performing Chinese word segmentation on the replaceable data, and, for each Chinese word segment, acquiring from a similar word lexicon a first word segmentation number of words having the highest similarity to that segment, and recording them as the similar words of the Chinese word segment;
replacing a second word segmentation number of Chinese word segments in the replaceable data with their similar words and then comparing the replaceable data with the fixed data, and recording the two groups of data as completely equal when the characters of the replaceable data are identical to those of the fixed data;
wherein the second word segmentation number is 1 when the basic duplication checking model is improved for the first time, and is increased by 1 each time the basic duplication checking model is improved thereafter.
4. A deep learning-based CRM data screening system, which is suitable for the deep learning-based CRM data screening method according to any one of claims 1-3, and is characterized by comprising a data acquisition module, an establishment screening module and an improvement application module;
The data acquisition module is used for acquiring multiple groups of CRM data in the system that have passed duplication checking and marking them as standard historical data, and for acquiring multiple groups of CRM data from the Internet by using a web crawler and marking them as interference historical data;
The establishment screening module is used for establishing a basic duplication checking model and putting the standard historical data and the interference historical data into the basic duplication checking model for duplication checking, wherein the basic duplication checking model is built on a text duplication checking model;
the improvement application module is used for improving the basic duplication checking model based on the duplication checking result of the basic duplication checking model, and the improved basic duplication checking model is marked as a CRM duplication checking model.
5. A storage medium having stored thereon a computer program which, when executed by a processor, runs the steps in a deep learning based CRM data screening method according to any of claims 1-3.
CN202311310723.7A 2023-10-11 2023-10-11 Deep learning-based CRM data screening method, system and medium Active CN117251445B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311310723.7A CN117251445B (en) 2023-10-11 2023-10-11 Deep learning-based CRM data screening method, system and medium


Publications (2)

Publication Number Publication Date
CN117251445A CN117251445A (en) 2023-12-19
CN117251445B true CN117251445B (en) 2024-06-04

Family

ID=89132870


Country Status (1)

Country Link
CN (1) CN117251445B (en)




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant