WO2019232952A1 - List cleaning method, system, computer device, and storage medium - Google Patents

List cleaning method, system, computer device, and storage medium

Info

Publication number
WO2019232952A1
Authority
WO
WIPO (PCT)
Prior art keywords
field
customer
information
basic
attribute
Prior art date
Application number
PCT/CN2018/104298
Other languages
English (en)
French (fr)
Inventor
王春辉
Original Assignee
平安科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司
Publication of WO2019232952A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising

Definitions

  • The present application relates to the technical field of data processing, and in particular to a list cleaning method, system, computer device, and storage medium.
  • Telemarketing refers to the use of telephone operators to attract new customers and contact existing ones, in order to gauge their satisfaction or whether they will accept orders. For routine order-taking, it is called telesales. Many customers order goods and services by phone. Direct marketers use all major media to offer direct services to potential customers. In telemarketing, the customer list is extremely important: the completeness and accuracy of its data directly affect the agents' work efficiency and the ratio of calls that are connected and end in a successful sale.
  • A list cleaning method, including:
  • obtaining a telemarketing list and storing it in a memory, the telemarketing list containing original customer information from different sources;
  • calling a first cleaning rule deployed in the memory to pre-clean the basic fields in the original customer information and obtain basic customer information, the pre-cleaning being used to unify the format of the basic fields in the basic customer information;
  • calling a second cleaning rule deployed in the memory to format-clean the basic fields in the basic customer information and obtain customer list information, the format cleaning being used to read, match, or convert the basic fields into unified content so that the content of the customer list information is unified;
  • calling a third cleaning rule deployed in the memory, reading the attribute fields of the original customer information, comparing the attribute fields with the attribute information in a preset attribute rule table, converting the attribute fields that match the content of the attribute rule table into activity types, and storing them in the customer list information.
  • A list cleaning system, including:
  • an obtaining unit, configured to obtain a telemarketing list and store it in a memory, the telemarketing list containing original customer information from different sources;
  • a pre-cleaning unit, configured to call a first cleaning rule deployed in the memory and pre-clean the basic fields in the original customer information to obtain basic customer information, the pre-cleaning being used to unify the format of the basic fields in the basic customer information;
  • a format cleaning unit, configured to call a second cleaning rule deployed in the memory and format-clean the basic fields in the basic customer information to obtain customer list information, the format cleaning being used to read, match, or convert the basic fields into unified content so that the content of the customer list information is unified;
  • a conversion unit, configured to call a third cleaning rule deployed in the memory, read the attribute fields of the original customer information, compare the attribute fields with the attribute information in a preset attribute rule table, convert the attribute fields that match the content of the attribute rule table into activity types, and store them in the customer list information.
  • A computer device, including a memory and a processor. The memory stores computer-readable instructions which, when executed by the processor, cause the processor to perform the following steps:
  • obtaining a telemarketing list and storing it in the memory, the telemarketing list containing original customer information from different sources;
  • calling a first cleaning rule deployed in the memory to pre-clean the basic fields in the original customer information and obtain basic customer information, the pre-cleaning being used to unify the format of the basic fields in the basic customer information;
  • calling a second cleaning rule deployed in the memory to format-clean the basic fields in the basic customer information and obtain customer list information, the format cleaning being used to read, match, or convert the basic fields into unified content so that the content of the customer list information is unified;
  • calling a third cleaning rule deployed in the memory, reading the attribute fields of the original customer information, comparing the attribute fields with the attribute information in a preset attribute rule table, converting the attribute fields that match the content of the attribute rule table into activity types, and storing them in the customer list information.
  • A storage medium storing computer-readable instructions which, when executed by one or more processors, cause the one or more processors to perform the following steps:
  • obtaining a telemarketing list and storing it in a memory, the telemarketing list containing original customer information from different sources;
  • calling a first cleaning rule deployed in the memory to pre-clean the basic fields in the original customer information and obtain basic customer information, the pre-cleaning being used to unify the format of the basic fields in the basic customer information;
  • calling a second cleaning rule deployed in the memory to format-clean the basic fields in the basic customer information and obtain customer list information, the format cleaning being used to read, match, or convert the basic fields into unified content so that the content of the customer list information is unified;
  • calling a third cleaning rule deployed in the memory, reading the attribute fields of the original customer information, comparing the attribute fields with the attribute information in a preset attribute rule table, converting the attribute fields that match the content of the attribute rule table into activity types, and storing them in the customer list information.
  • The above list cleaning method, apparatus, computer device, and storage medium include: obtaining a telemarketing list containing original customer information from different sources and storing it in a memory; calling a first cleaning rule deployed in the memory to pre-clean the basic fields in the original customer information and obtain basic customer information, the pre-cleaning being used to unify the format of the basic fields; calling a second cleaning rule deployed in the memory to format-clean the basic fields in the basic customer information and obtain customer list information, the format cleaning being used to read, match, or convert the basic fields into unified content so that the content of the customer list information is unified; and calling a third cleaning rule deployed in the memory, reading the attribute fields of the original customer information, comparing them with the attribute information in a preset attribute rule table, converting the attribute fields that match the content of the attribute rule table into activity types, and storing them in the customer list information.
  • By pre-cleaning, format-cleaning, and attribute-converting the original customer information, this application ensures that the final customer list information is complete and accurate, so that agents can make targeted calls, which improves agent work efficiency and the ratio of calls that are connected and end in a successful sale.
  • FIG. 1 is a flowchart of a list cleaning method according to an embodiment of the present application;
  • FIG. 2 is a flowchart of step S3 in FIG. 1;
  • FIG. 3 is a structural diagram of a list cleaning system according to an embodiment of the present application;
  • FIG. 4 is a schematic diagram of the modules of the format cleaning unit in FIG. 3.
  • FIG. 1 is a flowchart of a list cleaning method according to an embodiment of the present application. As shown in FIG. 1, the method includes:
  • Step S1, obtaining a telemarketing list: the telemarketing list is stored in a memory and contains original customer information from different sources.
  • In this embodiment a business system can be selected, preferably a Linux system. Linux is an enterprise-oriented, multi-user, multi-tasking operating system based on POSIX and UNIX that supports multithreading and multiple CPUs. It can run the main UNIX tools, applications, and network protocols, and users may freely modify its source code.
  • Linux supports multiple users; each user has their own permissions on their own files and devices, so users do not affect one another.
  • Linux has both a character interface and a graphical interface. In the character interface, users operate the system by typing commands on the keyboard.
  • With a Linux-based system, an information upload page can be set up in advance on the system's web front end. Staff can open this page in a browser on a terminal device and upload their original customer information through it, which allows uploading from any location at any time.
  • Alternatively, an information upload interface can be provided in the system itself, through which staff upload original customer information locally, either by typing into input fields or by uploading a file.
  • The original customer information is preferably in Excel spreadsheet format. The spreadsheet includes basic information such as the "contact information", "city", "ID document number", and "gender" fields, as well as attribute information about the customer's assets, finances, and occupation.
  • The Excel format holds a large amount of information, is simple to upload, and makes obtaining and storing the telemarketing list reliable. It is also convenient for the later reading and conversion of the information.
  • Step S2, pre-cleaning: a first cleaning rule deployed in the memory is called to pre-clean the basic fields in the original customer information and obtain basic customer information. The pre-cleaning is used to unify the format of the basic fields in the basic customer information.
  • The basic fields in the original customer information, such as the contact field and the city field, come from uploads, online-sales drop-offs, channel referrals, or manual entry, so most customers' basic fields have not been unified into one format. The same field may show format problems such as full-width versus half-width characters, differences in English capitalization, spaces, and invalid characters.
  • In this step, the first cleaning rule cleans these non-uniform formats so that the format of the basic customer information is unified.
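  • The format unification described above can be sketched as follows. This is a minimal illustration, not the patent's actual first cleaning rule: the function names `to_half_width` and `pre_clean_field` are hypothetical, and the choice to delete every space (rather than only leading and trailing ones) is an assumption.

```python
def to_half_width(text: str) -> str:
    """Convert full-width ASCII variants and the ideographic space to half-width."""
    out = []
    for ch in text:
        code = ord(ch)
        if code == 0x3000:                # ideographic (full-width) space
            out.append(" ")
        elif 0xFF01 <= code <= 0xFF5E:    # full-width '!' through '~'
            out.append(chr(code - 0xFEE0))
        else:
            out.append(ch)
    return "".join(out)


def pre_clean_field(value: str) -> str:
    """Unify the format of one basic field (hypothetical first cleaning rule)."""
    value = to_half_width(value)
    # Assumption: tabs, carriage returns, line feeds, and all spaces are dropped.
    for ch in ("\t", "\r", "\n", " "):
        value = value.replace(ch, "")
    return value
```

  • For example, `pre_clean_field("１３８ 0013\t8000\n")` yields `"13800138000"`.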
  • Step S3, format cleaning: a second cleaning rule deployed in the memory is called to format-clean the basic fields in the basic customer information and obtain customer list information. The format cleaning is used to read, match, or convert the basic fields into unified content so that the content of the customer list information is unified.
  • After pre-cleaning, the data may still be inconsistent in content: the "city" field, for example, may express Shanghai in several different ways.
  • In this step, the second cleaning rule format-cleans this non-uniform content so that the content of the final customer list information is unified.
  • Step S4, attribute conversion: a third cleaning rule deployed in the memory is called, the attribute fields of the original customer information are read and compared with the attribute information in a preset attribute rule table, and the attribute fields that match the content of the attribute rule table are converted into activity types and stored in the customer list information.
  • The original customer information contains not only basic information such as the "contact information", "city", "ID document number", and "gender" fields, but also attribute information about the customer's assets, finances, occupation, and so on.
  • Specific items include, for example, whether the customer has a house, a car, a loan, or a credit card, how many years the credit card has been held, the credit card limit, real estate holdings, how salary is paid, occupation type, education, and social security and housing provident fund contributions.
  • This information is also very important for later product sales.
  • This step presets an attribute rule table in the memory.
  • The third cleaning rule compares the attribute fields in the original customer information with the attribute information in the attribute rule table to obtain the corresponding activity type, and the activity type is stored for the corresponding customer in the customer list information.
  • In this embodiment, the basic information in the original customer information is pre-cleaned and format-cleaned, and the attribute information is converted into activity types, finally yielding complete, accurate, and high-value customer list information.
  • The original customer information in the telemarketing list includes at least one basic field among a contact field, a city field, an ID document number field, and a gender field.
  • The original customer information in the telemarketing list also includes at least one attribute field among an asset field, an economic field, and an occupation field.
  • The basic fields hold the customer's basic information. They need pre-cleaning to unify their format and format cleaning to unify their content.
  • The content of the attribute fields is more complex. By field, it can cover particular information such as the customer's assets, finances, and occupation. By specific attribute, it can cover whether the customer has a house, a car, a loan, or a credit card, how many years the credit card has been held, the credit card limit, real estate holdings, monthly salary, how salary is paid, occupation type, education, whether social security and housing provident fund contributions are paid, and for how long they have been paid continuously.
  • The content of the attribute fields directly determines the activity type used for later product sales.
  • The attribute fields therefore need to be converted directly into activity types for agents to consult later.
  • In step S2, the first cleaning rule deployed in the memory is called to pre-clean the basic fields in the original customer information using at least one of the following operations, so that the format of the basic fields is unified: converting full-width characters to half-width, removing tabs, removing carriage returns, removing line feeds, removing spaces, clearing invalid fields, or merging and deduplicating data.
  • Clearing invalid fields includes: when the basic fields in the original customer information contain a contact field, a contact field that is shorter than 11 digits and non-numeric is defined as invalid and cleared; when the basic fields contain a city field, a city field that is not in Chinese characters is defined as invalid and cleared; when the basic fields contain an ID document number field, a non-numeric ID document number field is defined as invalid and cleared.
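  • A minimal sketch of the invalid-field clearing described above. The record keys (`contact`, `city`, `id_number`) and the function name are hypothetical, and the contact condition is read literally as "shorter than 11 characters and non-numeric"; a stricter reading that clears a field meeting either condition is equally plausible. Clearing is modeled as setting the field to an empty string, which is also an assumption.

```python
import re

# Assumption: "Chinese characters" means the CJK Unified Ideographs block.
CJK_ONLY = re.compile(r"^[\u4e00-\u9fff]+$")


def is_digits(value: str) -> bool:
    """True only for non-empty strings made of ASCII digits."""
    return value.isascii() and value.isdigit()


def clear_invalid_fields(record: dict) -> dict:
    """Return a copy of the record with invalid basic fields blanked out."""
    cleaned = dict(record)
    contact = cleaned.get("contact")
    if contact is not None and len(contact) < 11 and not is_digits(contact):
        cleaned["contact"] = ""
    city = cleaned.get("city")
    if city is not None and not CJK_ONLY.match(city):
        cleaned["city"] = ""
    id_number = cleaned.get("id_number")
    if id_number is not None and not is_digits(id_number):
        cleaned["id_number"] = ""
    return cleaned
```

  • Fields that pass these checks are returned unchanged, so the function can be applied to every record before format cleaning.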
  • When configuring the first cleaning rule, data merging and deduplication can be performed first, using one of the basic fields as the key. For example, because the contact field or the ID document number field is unique, either can serve as the key for merging and deduplicating the original customer information.
  • Deduplication can be implemented in SQL or with the tools built into Excel. Next, the original information of each customer is traversed and cleaned in turn.
  • In this way the basic fields of the original customer information are pre-cleaned as far as possible, and the pre-cleaning addresses the overall completeness of the basic information.
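  • The text names SQL and Excel as ways to deduplicate; the same idea is sketched here in Python purely for illustration. The function name, the record layout, and the merge policy (keep the first record per key and fill its blank fields from later duplicates) are assumptions.

```python
def merge_and_deduplicate(records, key="contact"):
    """Merge records sharing the same key value; keep records without a key as-is."""
    merged = {}        # key value -> merged record, in first-seen order
    without_key = []   # records whose key field is missing or blank
    for record in records:
        key_value = record.get(key)
        if not key_value:
            without_key.append(dict(record))
            continue
        if key_value not in merged:
            merged[key_value] = dict(record)
            continue
        # Duplicate: only fill fields that are still blank in the kept record.
        kept = merged[key_value]
        for field, value in record.items():
            if value and not kept.get(field):
                kept[field] = value
    return list(merged.values()) + without_key
```

  • Passing `key="id_number"` would deduplicate on the ID document number instead of the contact field.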
  • Step S3 includes the following specific steps:
  • Step S301, cleaning the contact field: when the basic fields in the basic customer information contain a contact field, the contact field is read; when it is determined to be numeric and not less than 11 digits long, 11 digits are intercepted from the front and used as the content of the contact field in the customer list information.
  • On the premise that the format of the basic fields in the basic customer information is already uniform, this step further format-cleans the contact field.
  • The second cleaning rule applied in this step is listed in Table 1.
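  • As an illustration of step S301 (not a reproduction of Table 1), the contact check can be sketched as below. The function name is hypothetical. The direction of the cut is a parameter because the method description takes the 11 digits from the front, whereas the system description later takes them from the back.

```python
from typing import Optional


def clean_contact(contact: str, from_end: bool = False) -> Optional[str]:
    """Return an 11-digit contact number, or None if the field does not qualify."""
    is_numeric = contact.isascii() and contact.isdigit()
    if not is_numeric or len(contact) < 11:
        return None
    # Keep exactly 11 digits from the chosen end of the field.
    return contact[-11:] if from_end else contact[:11]
```

  • Cutting from the back is the variant that drops a leading country code such as `86`.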
  • Step S302, cleaning the city field: when the basic fields in the basic customer information contain a city field, the city field is read; when it is determined to consist of Chinese characters, it is matched against a preset city base table by exact or fuzzy matching, and the city code obtained from the city base table is used as the content of the city field in the customer list information.
  • The city base table is configured in advance and contains the city name, city short name, and city code. Part of the city base table is shown in Table 2.
  • The second cleaning rule applied in this step is listed in Table 3.
  • Exact matching in Table 3 means that a match occurs only when the entire field is identical to the search term.
  • The Chinese characters in the city field are used as a fixed phrase to search the city base table. The match succeeds, and the corresponding city code is obtained, only when the characters are completely identical to a city short name or city name in the table.
  • Fuzzy matching means that the characters may occur in any position.
  • The Chinese characters in the city field are split into single characters, which are combined with a logical AND to search the city base table; on a match, the corresponding city code is obtained.
  • Depending on the actual implementation, the logical operation may use "and", "or", and "not" relations.
  • Combining exact and fuzzy matching unifies the content of the city field in the basic customer information, which supports later aggregation of the customer list information and allows lists to be assigned automatically by city when agents make sales calls.
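  • A minimal sketch of exact-then-fuzzy city matching. The table rows and city codes are made-up samples, not the contents of Table 2, and the fuzzy rule is one possible reading of the text: every character of the input must occur somewhere in the city name (logical AND, position ignored), and an ambiguous result is rejected.

```python
from typing import Optional

# Hypothetical sample rows: (city name, city short name, city code).
CITY_BASE_TABLE = [
    ("上海市", "上海", "SH"),
    ("北京市", "北京", "BJ"),
    ("深圳市", "深圳", "SZ"),
]


def match_city(city: str) -> Optional[str]:
    """Return the city code for a Chinese-character city field, or None."""
    # Exact match: the whole field equals the city name or the short name.
    for name, short_name, code in CITY_BASE_TABLE:
        if city in (name, short_name):
            return code
    # Fuzzy match: AND over single characters, ignoring their position.
    chars = set(city)
    hits = [
        code
        for name, _short_name, code in CITY_BASE_TABLE
        if chars and all(ch in name for ch in chars)
    ]
    # Assumption: accept the fuzzy result only when exactly one row matches.
    return hits[0] if len(hits) == 1 else None
```

  • With these sample rows, a field containing the characters of a city name in a different order still resolves, while a field that matches several rows is left unresolved.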
  • Step S303, cleaning the ID document number field: when the basic fields in the basic customer information contain an ID document number field, the field is read; when it is determined to be 18 characters long with the first 17 being digits, it is regarded as a resident identity card number and converted into a gender field, a date-of-birth field, and an age field, which are stored in the customer list information together with the ID document number field.
  • The second cleaning rule applied in this step is listed in Table 4.
  • The resident ID number is unique and follows coding rules.
  • Its structure is a combined feature code consisting of a seventeen-digit body code and a one-digit check code. From left to right, the order is: a six-digit address code, an eight-digit date-of-birth code, a three-digit sequence code, and a one-digit check code.
  • In the address code, the first and second digits represent the province (autonomous region, municipality, or special administrative region); the third and fourth digits represent the city (prefecture-level city, autonomous prefecture, league, or the summary code for the districts and counties of a municipality); the fifth and sixth digits represent the county (district, county-level city, or banner).
  • In the sequence code, the seventeenth digit is odd for males and even for females.
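  • The conversion from an ID number into gender, date of birth, and age can be sketched as follows. The function name and output keys are hypothetical; the reference date for the age is passed in explicitly, and the check digit is not verified because the text only requires 18 characters with 17 leading digits.

```python
from datetime import date
from typing import Optional


def parse_id_number(id_number: str, today: date) -> Optional[dict]:
    """Derive gender, birth date, and age from an 18-character resident ID number."""
    body = id_number[:17]
    if len(id_number) != 18 or not (body.isascii() and body.isdigit()):
        return None
    try:
        # Characters 7 to 14 hold the date of birth as YYYYMMDD.
        birth = date(int(id_number[6:10]), int(id_number[10:12]), int(id_number[12:14]))
    except ValueError:
        return None  # not a real calendar date
    # The 17th digit is odd for males and even for females.
    gender = "M" if int(id_number[16]) % 2 == 1 else "F"
    had_birthday = (today.month, today.day) >= (birth.month, birth.day)
    age = today.year - birth.year - (0 if had_birthday else 1)
    return {
        "id_number": id_number,
        "gender": gender,
        "birth_date": birth.isoformat(),
        "age": age,
    }
```

  • The ID numbers used to exercise this sketch should be made-up values; their check digits are not expected to be valid.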
  • Step S304, cleaning the gender field: when the basic fields in the basic customer information contain a gender field, the gender field is read. If its content contains M, male, sir, or Male and is not FeMale, the gender field is taken to be M and stored in the customer list information. If its content is F, female, lady, or FeMale, the gender field is taken to be F and stored in the customer list information.
  • The second cleaning rule applied in this step is listed in Table 5.
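  • As an illustration of step S304 (not a reproduction of Table 5), gender normalization can be sketched as below. The token sets come from the English wording above and are assumptions about the actual rule. Two details are design choices of this sketch: female values are checked first because "female" contains "male", and the single letter "m" must match the whole field rather than appear anywhere in it.

```python
from typing import Optional

FEMALE_TOKENS = {"f", "female", "lady"}


def normalize_gender(value: str) -> Optional[str]:
    """Map a free-form gender field to "M" or "F", or None if unrecognized."""
    text = value.strip().lower()
    if text in FEMALE_TOKENS:
        return "F"
    if "female" in text:
        return None  # contains "female" but is not a known female value
    if text == "m" or "male" in text or "sir" in text:
        return "M"
    return None
```

  • Unrecognized values are returned as `None` so that a later step, such as the ID number conversion, can still supply the gender.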
  • City information can also be obtained through the contact field and converted into a city code, which is used as the content of the city field in the customer list information.
  • Because a contact number follows certain coding rules, for example the 4th to 7th digits of a mobile phone number indicate where the number is registered, the customer's city can also be identified from the contact field.
  • This embodiment reads the number's home location from the contact field and, after conversion, uses the resulting city code as the content of the city field in the customer list information.
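  • A minimal sketch of deriving the city code from a mobile number. The prefix table and city codes are invented sample data, and looking up the first seven digits (network prefix plus the 4th to 7th digits) instead of the region digits alone is an assumption of this sketch.

```python
from typing import Optional

# Hypothetical sample rows: first 7 digits of the number -> city code.
NUMBER_PREFIX_TO_CITY = {
    "1380013": "BJ",
    "1391234": "SH",
}


def city_from_contact(contact: str) -> Optional[str]:
    """Return the city code for an 11-digit mobile number, or None if unknown."""
    if len(contact) != 11 or not (contact.isascii() and contact.isdigit()):
        return None
    return NUMBER_PREFIX_TO_CITY.get(contact[:7])
```

  • A real deployment would load the prefix table from a maintained number-attribution data source instead of hard-coding it.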
  • When the basic fields in the basic customer information contain an ID document number field and that field is an identity card number, the gender field converted from the ID card content is stored in the customer list information.
  • Because a citizen's ID number is unique and authoritative, when the basic customer information contains an ID number, the gender derived from it (seventeenth digit odd for male, even for female) takes priority over gender information obtained from other fields. This embodiment improves the correctness of the gender field and provides agents with complete and accurate customer list information.
  • An attribute rule table is preset in the memory. It lists attribute information covering the asset field, the economic field, or the occupation field, together with the activity type corresponding to each item of attribute information.
  • The third cleaning rule deployed in the memory is called. After the attribute fields of the original customer information are read, they are compared with the attribute information in the attribute rule table. If they match, the activity type corresponding to that attribute information is read, and the attribute field in the original customer information is converted into this activity type and stored in the customer list information.
  • For example, suppose the attribute information preset in the attribute rule table includes "car", with the corresponding activity type "SMS MGM activity". After the third cleaning rule is called and the attribute fields of the original customer information are read and compared with the attribute information in the attribute rule table, the activity type "SMS MGM activity" is read and stored in the customer list information.
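  • The attribute conversion can be sketched as a table lookup. Only the "car" to "SMS MGM activity" pairing comes from the example above; the second row, the function name, and the representation of attribute fields as a name-to-value mapping are hypothetical.

```python
# Attribute rule table: attribute information -> activity type.
ATTRIBUTE_RULE_TABLE = {
    "car": "SMS MGM activity",            # pairing taken from the text's example
    "house": "hypothetical activity A",   # made-up row for illustration
}


def convert_attributes(attribute_fields: dict) -> list:
    """Return the activity types for the attribute fields that match the rule table."""
    activity_types = []
    for name, value in attribute_fields.items():
        # A field counts only if it is present with a truthy value and is listed.
        if value and name in ATTRIBUTE_RULE_TABLE:
            activity = ATTRIBUTE_RULE_TABLE[name]
            if activity not in activity_types:
                activity_types.append(activity)
    return activity_types
```

  • Attribute fields with no entry in the rule table are ignored, so the table can be extended without changing the function.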
  • A list cleaning system is proposed, as shown in FIG. 3, which includes the following units:
  • an obtaining unit, configured to obtain a telemarketing list and store it in a memory, the telemarketing list containing original customer information from different sources;
  • a pre-cleaning unit, configured to call a first cleaning rule deployed in the memory and pre-clean the basic fields in the original customer information to obtain basic customer information, the pre-cleaning being used to unify the format of the basic fields in the basic customer information;
  • a format cleaning unit, configured to call a second cleaning rule deployed in the memory and format-clean the basic fields in the basic customer information to obtain customer list information, the format cleaning being used to read, match, or convert the basic fields into unified content so that the content of the customer list information is unified;
  • a conversion unit, configured to call a third cleaning rule deployed in the memory, read the attribute fields of the original customer information, compare the attribute fields with the attribute information in a preset attribute rule table, convert the attribute fields that match the content of the attribute rule table into activity types, and store them in the customer list information.
  • The pre-cleaning unit is further configured to perform at least one of the following operations so that the format of the basic fields is unified: converting full-width characters to half-width, removing tabs, removing carriage returns, removing line feeds, removing spaces, clearing invalid fields, or merging and deduplicating data.
  • Clearing invalid fields includes: when the basic fields in the original customer information contain a contact field, a contact field that is shorter than 11 digits and non-numeric is defined as invalid and cleared; when the basic fields contain a city field, a city field that is not in Chinese characters is defined as invalid and cleared; when the basic fields contain an ID document number field, a non-numeric ID document number field is defined as invalid and cleared.
  • The format cleaning unit includes a contact cleaning module, configured to read the contact field when the basic fields in the basic customer information contain a contact field and, when the contact field is determined to be numeric and not less than 11 digits long, to intercept 11 digits from the back as the content of the contact field in the customer list information.
  • A city cleaning module, configured to read the city field when the basic fields in the basic customer information contain a city field and, when the city field is determined to consist of Chinese characters, to match it against a preset city base table by exact or fuzzy matching and use the city code obtained from the city base table as the content of the city field in the customer list information.
  • An ID document number cleaning module, configured to read the ID document number field when the basic fields in the basic customer information contain one and, when the field is determined to be 18 characters long with the first 17 being digits, to regard it as an identity card number, convert it into a gender field, a date-of-birth field, and an age field, and store these in the customer list information together with the ID document number field.
  • A gender cleaning module, configured to read the gender field when the basic fields in the basic customer information contain a gender field. If its content contains M, male, sir, or Male and is not FeMale, the gender field is taken to be M and stored in the customer list information. If its content is F, female, lady, or FeMale, the gender field is taken to be F and stored in the customer list information.
  • City information can also be obtained through the contact field and converted into a city code used as the content of the city field in the customer list information. When the basic fields in the basic customer information contain an ID document number field and that field is an identity card number, the gender field converted from the ID card content is stored in the customer list information.
  • The conversion unit is further configured to preset an attribute rule table in the memory. The attribute rule table lists attribute information covering the asset field, the economic field, or the occupation field, together with the activity type corresponding to each item of attribute information.
  • The conversion unit further includes a comparison module, configured to call the third cleaning rule deployed in the memory and, after reading the attribute fields of the original customer information, compare them with the attribute information in the attribute rule table. If they match, the activity type corresponding to that attribute information is read, and the attribute field in the original customer information is converted into this activity type and stored in the customer list information.
  • A computer device is proposed, which includes a memory and a processor. The memory stores computer-readable instructions which, when executed by the processor, cause the processor to perform the steps of the list cleaning method in the foregoing embodiments.
  • A storage medium storing computer-readable instructions is proposed. When the computer-readable instructions are executed by one or more processors, the one or more processors perform the steps of the list cleaning method in the foregoing embodiments.
  • The storage medium may be a non-volatile storage medium.
  • The program implementing these steps may be stored in a computer-readable storage medium, which may include read-only memory (ROM), random access memory (RAM), magnetic disks, optical disks, and the like.

Landscapes

  • Business, Economics & Management (AREA)
  • Engineering & Computer Science (AREA)
  • Accounting & Taxation (AREA)
  • Development Economics (AREA)
  • Strategic Management (AREA)
  • Finance (AREA)
  • Game Theory and Decision Science (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • Physics & Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

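A list cleaning method, system, computer device, and storage medium, relating to the technical field of data processing. The cleaning method includes: obtaining a telemarketing list containing original customer information from different sources; calling a first cleaning rule to pre-clean the basic fields in the original customer information and obtain basic customer information; calling a second cleaning rule to format-clean the basic fields in the basic customer information and obtain customer list information; and calling a third cleaning rule to read the attribute fields of the original customer information, convert the attribute fields that match the content of an attribute rule table into activity types, and store them in the customer list information. By pre-cleaning, format-cleaning, and attribute-converting the original customer information, the method ensures that the final customer list information is complete and accurate.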

Description

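List cleaning method, system, computer device, and storage medium
This application claims priority to Chinese patent application No. 201810561479.4, filed with the China Patent Office on June 4, 2018 and entitled "List cleaning method, system, computer device, and storage medium", the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the technical field of data processing, and in particular to a list cleaning method, system, computer device, and storage medium.
Background
Telemarketing refers to the use of telephone operators to attract new customers and contact existing ones, in order to gauge their satisfaction or whether they will accept orders. For routine order-taking, it is called telesales. Many customers order goods and services by phone. Direct marketers use all major media to offer direct services to potential customers. In telemarketing, the customer list is extremely important: the completeness and accuracy of its data directly affect the agents' work efficiency and the ratio of calls that are connected and end in a successful sale.
At present, the raw customer list information for telemarketing comes from multiple channels, such as uploads, online-sales drop-offs, channel referrals, and inbound calls initiated by customers. This indirectly makes the completeness and accuracy of the raw list data uneven. The existing practice is manual screening: agents must do extra screening and completion work on the raw list data, which is a heavy workload, leaves data accuracy uncertain, and noticeably reduces agents' dialing efficiency and the ratio of calls that are connected and end in a successful sale.
Summary
In view of this, to address the defect in the prior art that customer lists must be screened manually because the completeness and accuracy of the raw customer list data are uneven, it is necessary to provide a list cleaning method, system, computer device, and storage medium.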
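A list cleaning method, including:
obtaining a telemarketing list and storing it in a memory, the telemarketing list containing original customer information from different sources;
calling a first cleaning rule deployed in the memory to pre-clean the basic fields in the original customer information and obtain basic customer information, the pre-cleaning being used to unify the format of the basic fields in the basic customer information;
calling a second cleaning rule deployed in the memory to format-clean the basic fields in the basic customer information and obtain customer list information, the format cleaning being used to read, match, or convert the basic fields into unified content so that the content of the customer list information is unified;
calling a third cleaning rule deployed in the memory, reading the attribute fields of the original customer information, comparing the attribute fields with the attribute information in a preset attribute rule table, converting the attribute fields that match the content of the attribute rule table into activity types, and storing them in the customer list information.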
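A list cleaning system, including:
an obtaining unit, configured to obtain a telemarketing list and store it in a memory, the telemarketing list containing original customer information from different sources;
a pre-cleaning unit, configured to call a first cleaning rule deployed in the memory and pre-clean the basic fields in the original customer information to obtain basic customer information, the pre-cleaning being used to unify the format of the basic fields in the basic customer information;
a format cleaning unit, configured to call a second cleaning rule deployed in the memory and format-clean the basic fields in the basic customer information to obtain customer list information, the format cleaning being used to read, match, or convert the basic fields into unified content so that the content of the customer list information is unified;
a conversion unit, configured to call a third cleaning rule deployed in the memory, read the attribute fields of the original customer information, compare the attribute fields with the attribute information in a preset attribute rule table, convert the attribute fields that match the content of the attribute rule table into activity types, and store them in the customer list information.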
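A computer device, including a memory and a processor, the memory storing computer-readable instructions which, when executed by the processor, cause the processor to perform the following steps:
obtaining a telemarketing list and storing it in the memory, the telemarketing list containing original customer information from different sources;
calling a first cleaning rule deployed in the memory to pre-clean the basic fields in the original customer information and obtain basic customer information, the pre-cleaning being used to unify the format of the basic fields in the basic customer information;
calling a second cleaning rule deployed in the memory to format-clean the basic fields in the basic customer information and obtain customer list information, the format cleaning being used to read, match, or convert the basic fields into unified content so that the content of the customer list information is unified;
calling a third cleaning rule deployed in the memory, reading the attribute fields of the original customer information, comparing the attribute fields with the attribute information in a preset attribute rule table, converting the attribute fields that match the content of the attribute rule table into activity types, and storing them in the customer list information.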
一种存储有计算机可读指令的存储介质,所述计算机可读指令被一个或多个处理器执行时,使得一个或多个处理器执行以下步骤:
获取电销名单,将电销名单保存在存储器中,电销名单中含有不同来源的客户原始信息;
调用部署在存储器中的第一清洗规则,对客户原始信息中的基础字段进行预清洗得到客户基础信息,所述预清洗用于实现客户基础信息中的基础字段的格式统一;
调用部署在存储器中的第二清洗规则,对客户基础信息中的基础字段进行格式化清洗后得到客户名单信息,所述格式化清洗用于对基础字段的读取、匹配或转换成统一内容,实现客户名单信息的内容统一;
调用部署在存储器中的第三清洗规则,读取客户原始信息的属性字段,将属性字段与预设的属性规则表中的属性信息进行比较,将与属性规则表中内容匹配的属性字段转换成活动类型后,存储在客户名单信息中。
上述名单清洗方法、装置、计算机设备和存储介质,包括获取含有不同来源的客户原始信息的电销名单,将电销名单保存在存储器中;调用部署在存储器中的第一清洗规则,对客户原始信息中的基础字段进行预清洗得到客户基础信息,所述预清洗用于实现客户基础信息中的基础字段的格式统一;调用部署在存储器中的第二清洗规则,对客户基础信息中的基础字段进行格式化清洗后得到客户名单信息,所述格式化清洗用于对基础字段的读取、匹配或转换成统 一内容,实现客户名单信息的内容统一;调用部署在存储器中的第三清洗规则,读取客户原始信息的属性字段,将属性字段与预设的属性规则表中的属性信息进行比较,将与属性规则表中内容匹配的属性字段转换成活动类型后,存储在客户名单信息中。本申请通过上述将客户原始信息进行预清洗、格式化清洗及属性转换,确保了最终客户名单信息的完整、准确,可供坐席有针对性的拨打,提高坐席工作效率和接通并销售成功的比率。
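Brief Description of the Drawings
Various other advantages and benefits will become clear to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings serve only to illustrate the preferred embodiments and are not to be considered limiting of this application.
FIG. 1 is a flowchart of a list cleaning method in an embodiment of this application;
FIG. 2 is a flowchart of step S3 in FIG. 1;
FIG. 3 is a structural diagram of a list cleaning system in an embodiment of this application;
FIG. 4 is a schematic diagram of the modules of the format cleaning unit in FIG. 3.
Detailed Description
To make the objectives, technical solutions, and advantages of this application clearer, the application is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain this application and are not intended to limit it.
Those skilled in the art will understand that, unless expressly stated otherwise, the singular forms "a", "an", "said", and "the" used herein may also include the plural. It should be further understood that the word "comprising" as used in the specification of this application indicates the presence of the stated features, integers, steps, operations, elements, and/or components, but does not exclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.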
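FIG. 1 is a flowchart of a list cleaning method in an embodiment of this application. As shown in FIG. 1, the method includes:
Step S1, obtaining the telemarketing list: the telemarketing list is saved in a memory and contains raw customer information from different sources.
This embodiment may select a business system, preferably a Linux system. Linux is an enterprise-oriented system: a multi-user, multitasking operating system based on POSIX and UNIX that supports multithreading and multiple CPUs. It runs the major UNIX tools, applications, and network protocols, and users are free to modify its source code. Linux supports multiple users, each with their own permissions over their own files and devices, so that users do not interfere with one another. Linux offers both a character interface and a graphical interface; in the character interface, users operate the system by typing commands on the keyboard.
On a Linux-based system, an information upload page can be set up in advance on the system's web front end. Staff can reach the upload page from a browser on a terminal device and upload their raw customer information through it, which allows uploads from any location at any time. Alternatively, an information upload interface can be provided within the system itself, through which staff working locally upload raw customer information either by entering fields or by uploading a file.
The raw customer information preferably takes the form of an Excel spreadsheet. The spreadsheet contains basic information such as the "contact information", "city", "ID number", and "gender" fields, as well as attribute information about the customer's assets, finances, occupation, and so on. The spreadsheet format holds a large amount of information, is simple to upload, makes obtaining and storing the telemarketing list reliable, and is convenient when the information is later read and converted.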
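Step S2, pre-cleaning: a first cleaning rule deployed in the memory is invoked to pre-clean the basic fields in the raw customer information and obtain basic customer information; the pre-cleaning serves to unify the format of the basic fields in the basic customer information. When basic fields in the raw customer information, such as the contact field and the city field, arrive through uploads, online-sales drop-offs, channel referrals, or manual entry, they have mostly not been brought to a common format. The same field may differ in full-width versus half-width characters or in letter case, and may contain spaces, invalid characters, and similar format problems. In this step, the first cleaning rule cleans these inconsistent formats so that the basic customer information has a uniform format.
Step S3, format cleaning: a second cleaning rule deployed in the memory is invoked to format-clean the basic fields in the basic customer information and obtain customer list information; the format cleaning serves to read, match, or convert the basic fields into uniform content so as to unify the content of the customer list information.
Once basic customer information with a uniform format has been obtained, the data within it may still be inconsistent in content. A "city", for example, may appear as "上海" (Shanghai), "上海市" (Shanghai Municipality), or "沪" (the one-character abbreviation for Shanghai). In this step, the second cleaning rule format-cleans such inconsistent content so that the final customer list information has uniform content.
Step S4, attribute conversion: a third cleaning rule deployed in the memory is invoked to read the attribute fields of the raw customer information, compare the attribute fields with the attribute information in a preset attribute rule table, convert each attribute field that matches an entry in the attribute rule table into an activity type, and store the activity type in the customer list information.
Besides basic information such as the "contact information", "city", "ID number", and "gender" fields, the raw customer information also contains attribute information about the customer's assets, finances, occupation, and so on. Examples include whether the customer owns a home, owns a car, has a loan, or holds a credit card, as well as how long the credit card has been held, the credit limit, property details, how salary is paid, occupation category, education, and social insurance and housing fund status. This information is also critical to later product sales. To map attribute information directly to the products being sold, and thereby raise agent efficiency and the rate of connected calls that result in a sale, this step presets an attribute rule table in the memory. The third cleaning rule is invoked to compare the attribute fields in the raw customer information with the attribute information in the attribute rule table and obtain the corresponding activity type, and the customer is stored in the customer list information under that activity type.
In this embodiment, the basic information in the raw customer information is pre-cleaned and format-cleaned and the attribute information is converted into activity types, yielding customer list information that is complete, accurate, and of high value for sales strategy.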
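The flow of steps S1 to S4 can be pictured as a small pipeline. The sketch below is only an illustration under stated assumptions: the record layout (`"basic"` and `"attributes"` keys) and the function name `clean_list` are hypothetical and do not come from the application, and the three cleaning rules are passed in as plain callables.

```python
def clean_list(raw_records, pre_clean, format_clean, convert_attributes):
    """Run steps S2 to S4 over every raw record of the list obtained in S1.

    Assumed (hypothetical) record layout:
        {"basic": {field: value, ...}, "attributes": [value, ...]}
    """
    customer_list = []
    for raw in raw_records:
        basic_info = pre_clean(raw["basic"])       # S2: unify field format
        entry = format_clean(basic_info)           # S3: unify field content
        # S4: attribute fields are read from the raw record, not from S3 output
        entry["activity_types"] = convert_attributes(raw["attributes"])
        customer_list.append(entry)
    return customer_list
```

Passing the rules as parameters mirrors the idea that the three cleaning rules are deployed separately and invoked in turn.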
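In one embodiment, in step S1, the raw customer information in the telemarketing list includes at least one basic field among a contact field, a city field, an ID-number field, and a gender field. The raw customer information in the telemarketing list further includes at least one attribute field among an asset field, a financial field, and an occupation field.
Basic fields hold a customer's basic information; they need pre-cleaning to unify their format and format cleaning to unify their content. Attribute fields are more varied. By domain, they may cover special information such as the customer's assets, finances, and occupation. By specific attribute, they may cover whether the customer owns a home, owns a car, has a loan, or holds a credit card, as well as how long the credit card has been held, the credit limit, property details, monthly salary, how salary is paid, occupation category, education, social insurance and housing fund contributions, and how long contributions have been continuous. The content of the attribute fields directly determines the activity type for later product sales, so attribute fields are converted directly into activity types for agents to consult.
In one embodiment, in step S2, when the first cleaning rule deployed in the memory is invoked to pre-clean the basic fields in the raw customer information, at least one of the following operations is performed to unify the format of the basic fields: converting full-width characters to half-width, removing tabs, removing carriage returns, removing line feeds, removing spaces, clearing invalid fields, and merging and deduplicating data.
Clearing invalid fields works as follows: when the basic fields in the raw customer information include a contact field, a contact field that is shorter than 11 digits and not numeric is defined as invalid and cleared; when the basic fields include a city field, a city field that is not made up of Chinese characters is defined as invalid and cleared; when the basic fields include an ID-number field, a non-numeric ID-number field is defined as invalid and cleared. These cleaning operations effectively clean the basic fields in the raw customer information and yield reasonably accurate basic field data, which supplies precise input for the next step of unifying the content of the customer list information.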
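The pre-cleaning operations above can be sketched in a few lines of Python. This is a minimal illustration, not the application's implementation. The field keys `contact`, `city`, and `id_number` are hypothetical names. Unicode NFKC normalization is used here as one way to fold full-width characters to half-width. The contact rule follows the text literally (shorter than 11 characters and not numeric). For the ID number, the sketch assumes a trailing `X` check character is tolerated so that the later identity card rule stays reachable; the text itself only says "non-numeric".

```python
import re
import unicodedata


def normalize_field(value):
    """Fold full-width characters to half-width, then drop all whitespace."""
    # NFKC maps full-width letters, digits and U+3000 to their ASCII forms.
    half_width = unicodedata.normalize("NFKC", value)
    # \s covers spaces, tabs, carriage returns and line feeds.
    return re.sub(r"\s+", "", half_width)


def is_invalid(field, value):
    """Apply the invalid-field rules to an already normalized value."""
    if field == "contact":
        # Literal reading of the rule: shorter than 11 AND not numeric.
        return len(value) < 11 and re.fullmatch(r"[0-9]+", value) is None
    if field == "city":
        # Anything that is not purely Chinese characters is invalid.
        return re.fullmatch(r"[\u4e00-\u9fff]+", value) is None
    if field == "id_number":
        # Assumption: digits with an optional trailing X are kept.
        return re.fullmatch(r"[0-9]+[Xx]?", value) is None
    return False


def pre_clean(record):
    """Normalize every field and blank out the ones judged invalid."""
    cleaned = {}
    for field, raw_value in record.items():
        value = normalize_field(raw_value)
        cleaned[field] = "" if is_invalid(field, value) else value
    return cleaned
```

Under the literal contact rule a short but purely numeric value such as `12345` survives pre-cleaning; it is only rejected later, when step S301 requires at least 11 digits.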
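When configuring the first cleaning rule, data can first be merged and deduplicated on a key taken from the basic fields: relying on the uniqueness of, for example, the contact field or the ID-number field, that field serves as the key by which the raw customer information is merged and deduplicated. Deduplication can be implemented in SQL or with Excel's built-in tools. Each raw customer record is then traversed and cleaned in turn.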
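The text names SQL or Excel for deduplication; the same idea can be sketched in Python for illustration. The function name `merge_duplicates`, the default key `contact`, and the merge policy (the first non-empty value for each field wins) are assumptions made for this example, since the application does not spell out how conflicting duplicates are merged.

```python
def merge_duplicates(records, key="contact"):
    """Merge records that share the same key value, keeping first-seen order.

    Assumed policy: for each field, the first non-empty value wins.
    Records without a key value cannot be matched and are kept as they are.
    """
    merged_by_key = {}
    keyless = []
    for record in records:
        key_value = record.get(key)
        if not key_value:
            keyless.append(dict(record))
            continue
        target = merged_by_key.setdefault(key_value, {})
        for field, value in record.items():
            # Fill a field only if the merged record has no value for it yet.
            if value and not target.get(field):
                target[field] = value
    return list(merged_by_key.values()) + keyless
```

Merging rather than simply dropping later duplicates lets a record from one channel fill gaps left by another, which fits the stated goal of completeness.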
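With the first cleaning rule of this embodiment, the basic fields of the raw customer information are pre-cleaned as thoroughly as possible, and this pre-cleaning resolves the overall completeness problem of the basic information.
In one embodiment, as shown in FIG. 2, step S3 includes the following specific steps:
Step S301, cleaning the contact field: when the basic fields in the basic customer information include a contact field, the contact field is read; if it is numeric and no shorter than 11 digits, the last 11 digits are taken as the content of the contact field in the customer list information. With the format of the basic fields already unified, this step additionally format-cleans the contact field, applying the second cleaning rule shown in Table 1 below:
Figure PCTCN2018104298-appb-000001
Table 1
These contact-field cleaning rules yield reasonably accurate customer contact information, which helps secure the connection rate when agents later make telemarketing calls.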
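The rule stated for step S301 is short enough to show directly. The sketch below implements only what the text says (numeric, at least 11 digits, keep the last 11); the contents of Table 1 are not available in the text, so any further rules it may hold are not reflected. The function name `clean_contact` is hypothetical.

```python
import re


def clean_contact(value):
    """Return the last 11 digits of a numeric contact field, else None."""
    # The field must consist only of digits and be at least 11 long.
    if re.fullmatch(r"[0-9]{11,}", value):
        return value[-11:]
    return None
```

Taking the digits from the end means a leading country or trunk prefix written without punctuation, as in `8613800138000`, is dropped automatically.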
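Step S302, cleaning the city field: when the basic fields in the basic customer information include a city field, the city field is read; if it consists of Chinese characters, it is matched against a preset city base table by exact or fuzzy matching, and the city code from the city base table is taken as the content of the city field in the customer list information. With the format of the basic fields already unified, the city field is additionally format-cleaned. Before format cleaning, a city base table is configured in the memory in advance; it contains the city name, city abbreviation, and city code. Part of the city base table is shown in Table 2 below:
City name | City abbreviation | City code
北京市 (Beijing) | | 1101
天津市 (Tianjin) | | 1201
上海市 (Shanghai) | | 3101
浙江省杭州市 (Hangzhou, Zhejiang) | | 3301
浙江省宁波市 (Ningbo, Zhejiang) | | 3302
Table 2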
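When format-cleaning the city field, this step applies the second cleaning rule shown in Table 3 below:
Figure PCTCN2018104298-appb-000002
Table 3
In Table 3, exact matching means that a match occurs only when the entire field is identical to the search term. This step treats the Chinese characters in the city field as a fixed phrase and searches the city base table for it; the match succeeds, and the corresponding city code is obtained, only if the characters are exactly identical to a city abbreviation or city name in the table.
Fuzzy matching means that a match occurs whenever the term appears, regardless of its position. In this step, the Chinese characters in the city field are split into single-character units and combined with a logical operation; when the preset logical operation is satisfied, the match succeeds and the corresponding city code is obtained. Depending on the implementation, the logical operation may use "and", "or", or "not" relations. For example, if the city field contains "上海" (Shanghai), it is split into "上" and "海"; with a preset "and" relation, any city name that contains both "上" and "海" is considered a match, and the corresponding city code "3101" is taken as the content of the city field in the customer list information.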
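A minimal sketch of exact matching followed by character-wise "and" fuzzy matching is given below. The city names and codes are the five rows shown in Table 2. The `short` abbreviations are assumptions added for illustration, because the abbreviation column is empty in the available text. The function name `match_city` and the rule of returning the first matching row are likewise hypothetical.

```python
# Names and codes from Table 2; the "short" values are assumed for illustration.
CITY_TABLE = [
    {"name": "北京市", "short": "京", "code": "1101"},
    {"name": "天津市", "short": "津", "code": "1201"},
    {"name": "上海市", "short": "沪", "code": "3101"},
    {"name": "浙江省杭州市", "short": "杭州", "code": "3301"},
    {"name": "浙江省宁波市", "short": "宁波", "code": "3302"},
]


def match_city(value):
    """Return the city code for a city field, or None if nothing matches."""
    # Exact match: the whole field equals a city name or abbreviation.
    for row in CITY_TABLE:
        if value in (row["name"], row["short"]):
            return row["code"]
    # Fuzzy match with an "and" relation: every character of the field
    # must occur somewhere in the city name, in any position.
    characters = set(value)
    if not characters:
        return None
    for row in CITY_TABLE:
        if all(ch in row["name"] for ch in characters):
            return row["code"]
    return None
```

The "and" relation is permissive: a field holding only "市" would match the first row. A real rule set would probably need a tie-breaking or minimum-length condition, which the text leaves to the implementation.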
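This step combines exact and fuzzy matching to unify the content of the city field in the basic customer information. This supports later data collection on the customer list information and allows calls to be assigned automatically by city when agents later make sales calls.
Step S303, cleaning the ID-number field: when the basic fields in the basic customer information include an ID-number field, the ID-number field is read; if it is 18 characters long and its first 17 characters are digits, it is treated as a resident identity card number. The identity card content is converted into a gender field, a date-of-birth field, and an age field, which are stored in the customer list information together with the ID-number field.
With the format of the basic fields already unified, this step additionally format-cleans the ID-number field, applying the second cleaning rule shown in Table 4 below:
Figure PCTCN2018104298-appb-000003
Table 4
A resident identity card number is unique and follows a coding rule. Its structure is a combined feature code made up of a 17-digit body code and a one-digit check code. From left to right, it consists of a six-digit address code, an eight-digit date-of-birth code, a three-digit sequence code, and a one-digit check code. In the six-digit address code, the first and second digits denote the province (autonomous region, municipality directly under the central government, or special administrative region); the third and fourth digits denote the city (prefecture-level city, autonomous prefecture, league, or the summary code for the districts and counties of a municipality directly under the central government); and the fifth and sixth digits denote the county (municipal district, county-level city, or banner). In the three-digit sequence code, the seventeenth digit is odd for males and even for females.
Following these coding rules, once the ID-number field is confirmed to be an identity card number, this step can derive basic information for many customers and add it to the customer list information for agents to consult.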
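The coding rule described above translates into straightforward string slicing. The sketch below is an illustration under assumptions: the function name `parse_id_card` and the output keys are hypothetical, the check character is not verified (the text does not require it), and rejecting impossible birth dates is an extra safeguard added here. The sample numbers in the usage note are synthetic.

```python
import re
from datetime import date


def parse_id_card(id_number, today=None):
    """Derive gender, birth date and age from an 18-character ID number.

    Returns None when the field does not look like an identity card number.
    """
    # Rule from the text: length 18 and the first 17 characters are digits.
    if len(id_number) != 18 or re.fullmatch(r"[0-9]{17}", id_number[:17]) is None:
        return None
    try:
        # Characters 7 to 14 hold the birth date as YYYYMMDD.
        birth = date(int(id_number[6:10]), int(id_number[10:12]), int(id_number[12:14]))
    except ValueError:
        return None  # Added safeguard: impossible calendar date.
    today = today or date.today()
    # Subtract one year if this year's birthday has not happened yet.
    had_birthday = (today.month, today.day) >= (birth.month, birth.day)
    age = today.year - birth.year - (0 if had_birthday else 1)
    # The 17th digit is odd for males and even for females.
    gender = "M" if int(id_number[16]) % 2 == 1 else "F"
    return {"gender": gender, "birth_date": birth.isoformat(), "age": age}
```

For the synthetic number `11010119900307001X`, evaluated on 2018-06-04, this yields gender `M`, birth date `1990-03-07`, and age 28.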
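Step S304, cleaning the gender field: when the basic fields in the basic customer information include a gender field, the gender field is read. If its content contains M, 男 (male), 先生 (Mr.), or Male and is not FeMale, the gender field is stored in the customer list information as M; if its content contains F, 女 (female), 小姐 (Miss), 女士 (Ms.), or FeMale, the gender field is stored in the customer list information as F.
With the format of the basic fields already unified, this step additionally format-cleans the gender field, applying the second cleaning rule shown in Table 5 below:
Figure PCTCN2018104298-appb-000004
Table 5
Because gender is written in many different ways, this step lists several common spellings and checks them in turn to identify the customer as male (M) or female (F). Knowing the gender gives agents a direct indication of which product categories to recommend during later telemarketing calls.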
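The ordering of the two checks matters, because "FeMale" itself contains both "M" and "Male". The sketch below follows the stated rule with substring tests. Lower-casing the input first is an assumption made here so that spellings such as "female" and "FEMALE" behave alike. The function name `clean_gender` is hypothetical, and Table 5 may contain cases that the text does not mention.

```python
MALE_TOKENS = ("m", "男", "先生", "male")
FEMALE_TOKENS = ("f", "女", "小姐", "女士", "female")


def clean_gender(value):
    """Map a free-form gender field to "M" or "F", or None if unrecognized."""
    text = value.strip().lower()  # Assumption: matching is case-insensitive.
    # Male: contains a male token and is not a spelling of "female".
    if any(token in text for token in MALE_TOKENS) and "female" not in text:
        return "M"
    # Female: contains any female token.
    if any(token in text for token in FEMALE_TOKENS):
        return "F"
    return None
```

Because the rule is a plain "contains" test, any text with a stray letter "m" is classified as male; a production rule would likely need word-level matching.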
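In one embodiment, when the basic fields in the basic customer information include a contact field but no city field, city information is obtained from the contact field and converted into a city code, which is taken as the content of the city field in the customer list information. Contact information is a number that follows certain coding rules; for example, the fourth to seventh digits of a mobile phone number indicate the number's home location, so the customer's city can also be identified from the contact field. When the basic customer information has no city field, this embodiment reads the home location from the contact field and converts it into a city code, which is taken as the content of the city field in the customer list information, so that the customer's basic information is as complete as possible.
In one embodiment, when the basic fields in the basic customer information include an ID-number field and that field is an identity card number, the gender field derived from the identity card content is stored in the customer list information.
A citizen's identity card number is unique and authoritative, and its seventeenth digit is odd for males and even for females. Therefore, when the basic customer information contains an identity card number, the gender derived from it takes priority over gender obtained from other fields. This embodiment improves the correctness of the gender field and gives agents complete and accurate customer list information.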
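Both gap-filling rules can be shown together in one hedged sketch. The lookup table `PREFIX_TO_CITY` holds a single made-up placeholder entry, since real home-location data would have to come from a number-segment database that the application does not describe. Keying the table on the first seven digits (which include the fourth to seventh digits named in the text) is also an assumption, as are the field keys and function names.

```python
import re

# Made-up placeholder entry, for illustration only; not real allocation data.
PREFIX_TO_CITY = {"1380000": "3101"}


def id_card_gender(id_number):
    """Return "M" or "F" from the 17th digit of an ID number, else None."""
    if re.fullmatch(r"[0-9]{17}.", id_number) is None:
        return None
    return "M" if int(id_number[16]) % 2 == 1 else "F"


def complete_record(record):
    """Fill a missing city from the phone prefix; let ID gender take priority."""
    result = dict(record)
    contact = result.get("contact", "")
    # Only fill the city when it is absent and the prefix is known.
    if not result.get("city") and len(contact) == 11:
        city_code = PREFIX_TO_CITY.get(contact[:7])
        if city_code:
            result["city"] = city_code
    # Gender derived from the identity card overrides any other source.
    derived_gender = id_card_gender(result.get("id_number", ""))
    if derived_gender:
        result["gender"] = derived_gender
    return result
```

An existing city value is never overwritten by the phone lookup, whereas an existing gender value is overwritten whenever a valid identity card number is present, matching the priority the text describes.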
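In one embodiment, before the third cleaning rule deployed in the memory is invoked, an attribute rule table is preset in the memory. The attribute rule table lists attribute information, including asset fields, financial fields, and occupation fields, together with the activity type corresponding to each item of attribute information. After the third cleaning rule deployed in the memory is invoked and the attribute fields of the raw customer information are read, the attribute fields are compared with the attribute information in the attribute rule table; when they are identical, the activity type corresponding to that attribute information is read, the attribute field is converted into that activity type, and the activity type is stored in the customer list information. For example, suppose the raw customer information contains a customer whose attribute field is "有车" (owns a car), and the attribute rule table contains the attribute information "有车" with the corresponding activity type "短信MGM活动" (SMS MGM campaign). Invoking the third cleaning rule then reads the attribute field, compares it with the attribute rule table, reads the activity type "短信MGM活动", and stores it in the customer list information.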
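The attribute rule table amounts to a lookup from attribute value to activity type. The sketch below uses the single pairing given in the text ("有车" to "短信MGM活动"); a real table would hold many more rows. The names `ATTRIBUTE_RULES` and `convert_attributes`, and the choice to return a de-duplicated list, are assumptions for illustration.

```python
# The one pairing given in the text: "owns a car" -> "SMS MGM campaign".
ATTRIBUTE_RULES = {
    "有车": "短信MGM活动",
}


def convert_attributes(attribute_fields):
    """Return the activity types for attribute fields found in the rule table."""
    activity_types = []
    for field in attribute_fields:
        activity = ATTRIBUTE_RULES.get(field)  # Exact comparison, as described.
        if activity is not None and activity not in activity_types:
            activity_types.append(activity)
    return activity_types
```

Attribute values with no entry in the table are simply ignored, so the table alone decides which attributes lead to a campaign.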
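In one embodiment, a list cleaning system is provided. As shown in FIG. 3, it includes the following units:
an obtaining unit, configured to obtain a telemarketing list and save it in a memory, the telemarketing list containing raw customer information from different sources; a pre-cleaning unit, configured to invoke a first cleaning rule deployed in the memory to pre-clean the basic fields in the raw customer information and obtain basic customer information, the pre-cleaning serving to unify the format of the basic fields in the basic customer information; a format cleaning unit, configured to invoke a second cleaning rule deployed in the memory to format-clean the basic fields in the basic customer information and obtain customer list information, the format cleaning serving to read, match, or convert the basic fields into uniform content so as to unify the content of the customer list information; and a conversion unit, configured to invoke a third cleaning rule deployed in the memory to read the attribute fields of the raw customer information, compare the attribute fields with the attribute information in a preset attribute rule table, convert each attribute field that matches an entry in the attribute rule table into an activity type, and store the activity type in the customer list information.
In one embodiment, the pre-cleaning unit is further configured to perform at least one of the following operations to unify the format of the basic fields: converting full-width characters to half-width, removing tabs, removing carriage returns, removing line feeds, removing spaces, clearing invalid fields, and merging and deduplicating data.
In one embodiment, clearing invalid fields includes: when the basic fields in the raw customer information include a contact field, defining a contact field that is shorter than 11 digits and not numeric as an invalid field and clearing it; when the basic fields include a city field, defining a city field that is not made up of Chinese characters as an invalid field and clearing it; when the basic fields include an ID-number field, defining a non-numeric ID-number field as an invalid field and clearing it.
In one embodiment, as shown in FIG. 4, the format cleaning unit includes: a contact cleaning module, configured to, when the basic fields in the basic customer information include a contact field, read the contact field and, if it is numeric and no shorter than 11 digits, take its last 11 digits as the content of the contact field in the customer list information; a city cleaning module, configured to, when the basic fields include a city field, read the city field and, if it consists of Chinese characters, match it against a preset city base table by exact or fuzzy matching and take the city code from the city base table as the content of the city field in the customer list information; an ID-number cleaning module, configured to, when the basic fields include an ID-number field, read the ID-number field and, if it is 18 characters long and its first 17 characters are digits, treat it as an identity card number, convert the identity card content into a gender field, a date-of-birth field, and an age field, and store these in the customer list information together with the ID-number field; and a gender cleaning module, configured to, when the basic fields include a gender field, read the gender field and, if its content contains M, 男, 先生, or Male and is not FeMale, store the gender field in the customer list information as M, and, if its content contains F, 女, 小姐, 女士, or FeMale, store the gender field in the customer list information as F.
In one embodiment, when the basic fields in the basic customer information include a contact field but no city field, city information is obtained from the contact field and converted into a city code, which is taken as the content of the city field in the customer list information; when the basic fields in the basic customer information include an ID-number field and that field is an identity card number, the gender field derived from the identity card content is stored in the customer list information.
In one embodiment, the conversion unit is further configured to preset an attribute rule table in the memory, the attribute rule table listing attribute information, including asset fields, financial fields, and occupation fields, together with the activity type corresponding to each item of attribute information. The conversion unit further includes a comparison module, configured to invoke the third cleaning rule deployed in the memory, read the attribute fields of the raw customer information, and compare them with the attribute information in the attribute rule table; when they are identical, the module reads the activity type corresponding to that attribute information, converts the attribute field into that activity type, and stores it in the customer list information.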
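In one embodiment, a computer device is provided, including a memory and a processor. The memory stores computer-readable instructions which, when executed by the processor, cause the processor to perform the steps of the list cleaning method in the embodiments above.
In one embodiment, a storage medium storing computer-readable instructions is provided. When executed by one or more processors, the computer-readable instructions cause the one or more processors to perform the steps of the list cleaning method in the embodiments above. The storage medium may be a non-volatile storage medium.
Those of ordinary skill in the art will understand that all or some of the steps of the methods in the above embodiments can be carried out by a program instructing the relevant hardware. The program may be stored in a computer-readable storage medium, which may include read-only memory (ROM), random access memory (RAM), a magnetic disk, an optical disc, and the like.
The technical features of the embodiments described above may be combined in any way. For brevity, not every possible combination of these technical features has been described; however, as long as a combination of these features involves no contradiction, it should be regarded as falling within the scope of this specification.
The embodiments described above represent only some exemplary embodiments of this application. Although they are described specifically and in detail, they should not be construed as limiting the patent scope of this application. It should be noted that those of ordinary skill in the art can make a number of variations and improvements without departing from the concept of this application, all of which fall within its scope of protection. The scope of protection of this patent application is therefore defined by the appended claims.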

Claims (20)

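  1. A list cleaning method, comprising:
    obtaining a telemarketing list and saving it in a memory, the telemarketing list containing raw customer information from different sources;
    invoking a first cleaning rule deployed in the memory to pre-clean the basic fields in the raw customer information and obtain basic customer information, the pre-cleaning serving to unify the format of the basic fields in the basic customer information;
    invoking a second cleaning rule deployed in the memory to format-clean the basic fields in the basic customer information and obtain customer list information, the format cleaning serving to read, match, or convert the basic fields into uniform content so as to unify the content of the customer list information;
    invoking a third cleaning rule deployed in the memory to read the attribute fields of the raw customer information, comparing the attribute fields with the attribute information in a preset attribute rule table, converting each attribute field that matches an entry in the attribute rule table into an activity type, and storing the activity type in the customer list information.
  2. The list cleaning method according to claim 1, wherein invoking the first cleaning rule deployed in the memory to pre-clean the basic fields in the raw customer information comprises performing at least one of the following operations to unify the format of the basic fields: converting full-width characters to half-width, removing tabs, removing carriage returns, removing line feeds, removing spaces, clearing invalid fields, and merging and deduplicating data.
  3. The list cleaning method according to claim 2, wherein clearing invalid fields comprises:
    when the basic fields in the raw customer information include a contact field, defining a contact field that is shorter than 11 digits and not numeric as an invalid field and clearing it;
    when the basic fields in the raw customer information include a city field, defining a city field that is not made up of Chinese characters as an invalid field and clearing it;
    when the basic fields in the raw customer information include an ID-number field, defining a non-numeric ID-number field as an invalid field and clearing it.
  4. The list cleaning method according to claim 1, wherein invoking the second cleaning rule deployed in the memory to format-clean the basic fields in the basic customer information comprises:
    when the basic fields in the basic customer information include a contact field, reading the contact field and, if it is numeric and no shorter than 11 digits, taking its last 11 digits as the content of the contact field in the customer list information;
    when the basic fields in the basic customer information include a city field, reading the city field and, if it consists of Chinese characters, matching it against a preset city base table by exact or fuzzy matching and taking the city code from the city base table as the content of the city field in the customer list information;
    when the basic fields in the basic customer information include an ID-number field, reading the ID-number field and, if it is 18 characters long and its first 17 characters are digits, treating it as an identity card number, converting the identity card content into a gender field, a date-of-birth field, and an age field, and storing these in the customer list information together with the ID-number field;
    when the basic fields in the basic customer information include a gender field, reading the gender field and, if its content contains M, 男, 先生, or Male and is not FeMale, storing the gender field in the customer list information as M, and, if its content contains F, 女, 小姐, 女士, or FeMale, storing the gender field in the customer list information as F.
  5. The list cleaning method according to claim 4, wherein, when the basic fields in the basic customer information include a contact field but no city field, city information is obtained from the contact field and converted into a city code, which is taken as the content of the city field in the customer list information;
    when the basic fields in the basic customer information include an ID-number field and that field is an identity card number, the gender field derived from the identity card content is stored in the customer list information.
  6. The list cleaning method according to claim 1, wherein, before the third cleaning rule deployed in the memory is invoked, an attribute rule table is preset in the memory, the attribute rule table listing attribute information, including asset fields, financial fields, and occupation fields, together with the activity type corresponding to each item of attribute information;
    after the third cleaning rule deployed in the memory is invoked and the attribute fields of the raw customer information are read, the attribute fields in the raw customer information are compared with the attribute information in the attribute rule table; when they are identical, the activity type corresponding to that attribute information is read, the attribute field in the raw customer information is converted into that activity type, and the activity type is stored in the customer list information.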
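  7. A list cleaning system, comprising:
    an obtaining unit, configured to obtain a telemarketing list and save it in a memory, the telemarketing list containing raw customer information from different sources;
    a pre-cleaning unit, configured to invoke a first cleaning rule deployed in the memory to pre-clean the basic fields in the raw customer information and obtain basic customer information, the pre-cleaning serving to unify the format of the basic fields in the basic customer information;
    a format cleaning unit, configured to invoke a second cleaning rule deployed in the memory to format-clean the basic fields in the basic customer information and obtain customer list information, the format cleaning serving to read, match, or convert the basic fields into uniform content so as to unify the content of the customer list information;
    a conversion unit, configured to invoke a third cleaning rule deployed in the memory to read the attribute fields of the raw customer information, compare the attribute fields with the attribute information in a preset attribute rule table, convert each attribute field that matches an entry in the attribute rule table into an activity type, and store the activity type in the customer list information.
  8. The list cleaning system according to claim 7, wherein the pre-cleaning unit is further configured to perform at least one of the following operations to unify the format of the basic fields: converting full-width characters to half-width, removing tabs, removing carriage returns, removing line feeds, removing spaces, clearing invalid fields, and merging and deduplicating data.
  9. The list cleaning system according to claim 8, wherein clearing invalid fields comprises:
    when the basic fields in the raw customer information include a contact field, defining a contact field that is shorter than 11 digits and not numeric as an invalid field and clearing it;
    when the basic fields in the raw customer information include a city field, defining a city field that is not made up of Chinese characters as an invalid field and clearing it;
    when the basic fields in the raw customer information include an ID-number field, defining a non-numeric ID-number field as an invalid field and clearing it.
  10. The list cleaning system according to claim 7, wherein the format cleaning unit comprises:
    a contact cleaning module, configured to, when the basic fields in the basic customer information include a contact field, read the contact field and, if it is numeric and no shorter than 11 digits, take its last 11 digits as the content of the contact field in the customer list information;
    a city cleaning module, configured to, when the basic fields in the basic customer information include a city field, read the city field and, if it consists of Chinese characters, match it against a preset city base table by exact or fuzzy matching and take the city code from the city base table as the content of the city field in the customer list information;
    an ID-number cleaning module, configured to, when the basic fields in the basic customer information include an ID-number field, read the ID-number field and, if it is 18 characters long and its first 17 characters are digits, treat it as an identity card number, convert the identity card content into a gender field, a date-of-birth field, and an age field, and store these in the customer list information together with the ID-number field;
    a gender cleaning module, configured to, when the basic fields in the basic customer information include a gender field, read the gender field and, if its content contains M, 男, 先生, or Male and is not FeMale, store the gender field in the customer list information as M, and, if its content contains F, 女, 小姐, 女士, or FeMale, store the gender field in the customer list information as F.
  11. The list cleaning system according to claim 10, wherein, when the basic fields in the basic customer information include a contact field but no city field, city information is obtained from the contact field and converted into a city code, which is taken as the content of the city field in the customer list information;
    when the basic fields in the basic customer information include an ID-number field and that field is an identity card number, the gender field derived from the identity card content is stored in the customer list information.
  12. The list cleaning system according to claim 7, wherein the conversion unit is further configured to preset an attribute rule table in the memory, the attribute rule table listing attribute information, including asset fields, financial fields, and occupation fields, together with the activity type corresponding to each item of attribute information;
    the conversion unit further comprises:
    a comparison module, configured to invoke the third cleaning rule deployed in the memory, read the attribute fields of the raw customer information, and compare the attribute fields in the raw customer information with the attribute information in the attribute rule table; when they are identical, the module reads the activity type corresponding to that attribute information, converts the attribute field in the raw customer information into that activity type, and stores it in the customer list information.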
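  13. A computer device, comprising a memory and a processor, the memory storing computer-readable instructions which, when executed by the processor, cause the processor to perform the following steps:
    obtaining a telemarketing list and saving it in a memory, the telemarketing list containing raw customer information from different sources;
    invoking a first cleaning rule deployed in the memory to pre-clean the basic fields in the raw customer information and obtain basic customer information, the pre-cleaning serving to unify the format of the basic fields in the basic customer information;
    invoking a second cleaning rule deployed in the memory to format-clean the basic fields in the basic customer information and obtain customer list information, the format cleaning serving to read, match, or convert the basic fields into uniform content so as to unify the content of the customer list information;
    invoking a third cleaning rule deployed in the memory to read the attribute fields of the raw customer information, comparing the attribute fields with the attribute information in a preset attribute rule table, converting each attribute field that matches an entry in the attribute rule table into an activity type, and storing the activity type in the customer list information.
  14. The computer device according to claim 13, wherein, when the first cleaning rule deployed in the memory is invoked to pre-clean the basic fields in the raw customer information, the processor is caused to perform at least one of the following operations to unify the format of the basic fields: converting full-width characters to half-width, removing tabs, removing carriage returns, removing line feeds, removing spaces, clearing invalid fields, and merging and deduplicating data.
  15. The computer device according to claim 14, wherein, when clearing invalid fields, the processor is caused to perform the following steps:
    when the basic fields in the raw customer information include a contact field, defining a contact field that is shorter than 11 digits and not numeric as an invalid field and clearing it;
    when the basic fields in the raw customer information include a city field, defining a city field that is not made up of Chinese characters as an invalid field and clearing it;
    when the basic fields in the raw customer information include an ID-number field, defining a non-numeric ID-number field as an invalid field and clearing it.
  16. The computer device according to claim 13, wherein, when the second cleaning rule deployed in the memory is invoked to format-clean the basic fields in the basic customer information, the processor is caused to perform the following steps:
    when the basic fields in the basic customer information include a contact field, reading the contact field and, if it is numeric and no shorter than 11 digits, taking its last 11 digits as the content of the contact field in the customer list information;
    when the basic fields in the basic customer information include a city field, reading the city field and, if it consists of Chinese characters, matching it against a preset city base table by exact or fuzzy matching and taking the city code from the city base table as the content of the city field in the customer list information;
    when the basic fields in the basic customer information include an ID-number field, reading the ID-number field and, if it is 18 characters long and its first 17 characters are digits, treating it as an identity card number, converting the identity card content into a gender field, a date-of-birth field, and an age field, and storing these in the customer list information together with the ID-number field;
    when the basic fields in the basic customer information include a gender field, reading the gender field and, if its content contains M, 男, 先生, or Male and is not FeMale, storing the gender field in the customer list information as M, and, if its content contains F, 女, 小姐, 女士, or FeMale, storing the gender field in the customer list information as F.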
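  17. A storage medium storing computer-readable instructions which, when executed by one or more processors, cause the one or more processors to perform the following steps:
    obtaining a telemarketing list and saving it in a memory, the telemarketing list containing raw customer information from different sources;
    invoking a first cleaning rule deployed in the memory to pre-clean the basic fields in the raw customer information and obtain basic customer information, the pre-cleaning serving to unify the format of the basic fields in the basic customer information;
    invoking a second cleaning rule deployed in the memory to format-clean the basic fields in the basic customer information and obtain customer list information, the format cleaning serving to read, match, or convert the basic fields into uniform content so as to unify the content of the customer list information;
    invoking a third cleaning rule deployed in the memory to read the attribute fields of the raw customer information, comparing the attribute fields with the attribute information in a preset attribute rule table, converting each attribute field that matches an entry in the attribute rule table into an activity type, and storing the activity type in the customer list information.
  18. The storage medium according to claim 17, wherein, when the first cleaning rule deployed in the memory is invoked to pre-clean the basic fields in the raw customer information, the one or more processors are caused to perform at least one of the following operations to unify the format of the basic fields: converting full-width characters to half-width, removing tabs, removing carriage returns, removing line feeds, removing spaces, clearing invalid fields, and merging and deduplicating data.
  19. The storage medium according to claim 18, wherein, when clearing invalid fields, the one or more processors are caused to perform the following steps:
    when the basic fields in the raw customer information include a contact field, defining a contact field that is shorter than 11 digits and not numeric as an invalid field and clearing it;
    when the basic fields in the raw customer information include a city field, defining a city field that is not made up of Chinese characters as an invalid field and clearing it;
    when the basic fields in the raw customer information include an ID-number field, defining a non-numeric ID-number field as an invalid field and clearing it.
  20. The storage medium according to claim 17, wherein, when the second cleaning rule deployed in the memory is invoked to format-clean the basic fields in the basic customer information, the one or more processors are caused to perform the following steps:
    when the basic fields in the basic customer information include a contact field, reading the contact field and, if it is numeric and no shorter than 11 digits, taking its last 11 digits as the content of the contact field in the customer list information;
    when the basic fields in the basic customer information include a city field, reading the city field and, if it consists of Chinese characters, matching it against a preset city base table by exact or fuzzy matching and taking the city code from the city base table as the content of the city field in the customer list information;
    when the basic fields in the basic customer information include an ID-number field, reading the ID-number field and, if it is 18 characters long and its first 17 characters are digits, treating it as an identity card number, converting the identity card content into a gender field, a date-of-birth field, and an age field, and storing these in the customer list information together with the ID-number field;
    when the basic fields in the basic customer information include a gender field, reading the gender field and, if its content contains M, 男, 先生, or Male and is not FeMale, storing the gender field in the customer list information as M, and, if its content contains F, 女, 小姐, 女士, or FeMale, storing the gender field in the customer list information as F.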
PCT/CN2018/104298 2018-06-04 2018-09-06 List cleaning method, system, computer device, and storage medium WO2019232952A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201810561479.4 2018-06-04
CN201810561479.4A CN109241363A (zh) 2018-06-04 2018-06-04 List cleaning method, system, computer device, and storage medium

Publications (1)

Publication Number Publication Date
WO2019232952A1 true WO2019232952A1 (zh) 2019-12-12

Family

ID=65083699

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/104298 WO2019232952A1 (zh) 2018-06-04 2018-09-06 List cleaning method, system, computer device, and storage medium

Country Status (2)

Country Link
CN (1) CN109241363A (zh)
WO (1) WO2019232952A1 (zh)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110287406A (zh) * 2019-05-21 2019-09-27 深圳壹账通智能科技有限公司 渠道用户推荐方法、服务器及计算机可读存储介质
CN112380201A (zh) * 2020-11-10 2021-02-19 中国人寿保险股份有限公司 一种保险信息报送方法及装置

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103473375A (zh) * 2013-09-29 2013-12-25 方正国际软件有限公司 数据清洗系统和数据清洗方法
CN104765806A (zh) * 2015-04-01 2015-07-08 国家电网公司 营销客户基础信息不规范的自动处理技术
CN107239581A (zh) * 2017-07-07 2017-10-10 小草数语(北京)科技有限公司 数据清洗方法及装置
CN108073591A (zh) * 2016-11-10 2018-05-25 北京宸信征信有限公司 一种具有身份属性的多源数据的整合存储系统及方法

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103489051A (zh) * 2012-06-11 2014-01-01 上海佳锐信息科技有限公司 基金公司的多个信息系统中客户信息核对归一的一种方法
US9454588B2 (en) * 2012-08-14 2016-09-27 International Business Machines Corporation Custom object-in-memory format in data grid network appliance
CN107679718B (zh) * 2017-09-19 2020-12-22 平安科技(深圳)有限公司 名单分配方法、设备以及计算机可读存储介质
CN107909473A (zh) * 2017-12-27 2018-04-13 中国银行股份有限公司 一种基于用户行为分析的网上银行营销方法及装置

Also Published As

Publication number Publication date
CN109241363A (zh) 2019-01-18

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 18921766; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 18921766; Country of ref document: EP; Kind code of ref document: A1)