WO2019056750A1 - Information uniqueness identification method, application server, system, and storage medium - Google Patents


Info

Publication number
WO2019056750A1
Authority
WO
WIPO (PCT)
Prior art keywords
group
customer
identification
customers
information
Application number
PCT/CN2018/084325
Other languages
French (fr)
Chinese (zh)
Inventor
王恩贵
项同德
钱慧敏
Original Assignee
平安科技(深圳)有限公司
Application filed by 平安科技(深圳)有限公司
Publication of WO2019056750A1


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20: Information retrieval; Database structures therefor; File system structures therefor, of structured data, e.g. relational data
    • G06F16/28: Databases characterised by their database models, e.g. relational or object models
    • G06F16/284: Relational databases
    • G06F16/285: Clustering or classification

Definitions

  • The present application relates to the field of information recognition technology, and in particular to an information uniqueness identification method, an application server, a system, and a storage medium.
  • The purpose of the present application is to provide an information uniqueness identification method, an application server, a system, and a storage medium that perform precise identification or fuzzy identification according to the type of each group customer, thereby solving the problem that group customers with incomplete customer information cannot be uniquely identified or have their data integrated.
  • An information uniqueness identification method includes the following steps:
  • Precise identification and fuzzy identification are performed on the group customers in the precise identification class and the fuzzy identification class respectively, and the group customers that are the same customer are identified;
  • An application server for information uniqueness identification comprises a processor, a memory, and a communication bus;
  • A computer-readable storage medium stores one or more programs, the one or more programs being executable by one or more processors to implement the steps of the information uniqueness identification method described above.
  • An information uniqueness identification system includes a plurality of source databases and further includes the application server for information uniqueness identification described above. Each source database is configured to store basic information of group customers. The application server is configured to: acquire the basic information of the group customers stored in each source database, the basic information including a customer name and identification information; mark each group customer as belonging to the precise identification class or the fuzzy identification class according to the identification information; perform precise identification and fuzzy identification, according to a preset identification rule, on the group customers in the precise identification class and the fuzzy identification class respectively, to identify the group customers that are the same customer; and obtain the results of precise identification and fuzzy identification and, according to these results, integrate the basic information of all group customers that are the same customer to obtain uniquely identified group customer information.
  • The information uniqueness identification method first acquires the basic information of the group customers stored in each source database, the basic information including a customer name and identification information. It then marks each group customer as belonging to the precise identification class or the fuzzy identification class according to the identification information, and performs precise identification and fuzzy identification, according to the preset identification rule, on the group customers in the two classes respectively, to identify the group customers that are the same customer. Finally, it obtains the results of precise identification and fuzzy identification and, according to these results, integrates the basic information of all group customers that are the same customer to obtain uniquely identified group customer information.
  • Precise identification or fuzzy identification can thus be performed according to the type of each group customer, which solves the problem that some group customers cannot be uniquely identified or have their data integrated because their customer information is incomplete.
  • FIG. 1 is a schematic diagram of an application environment of an information uniqueness identification method provided by the present application.
  • FIG. 3 is a flowchart of step S20 in the method for uniquely identifying information provided by the present application.
  • FIG. 4 is a flowchart of step S30 in the method for uniquely identifying information provided by the present application.
  • FIG. 5 is a flowchart of step S32 in the method for uniquely identifying information provided by the present application.
  • FIG. 6 is a flowchart of step S33 in the method for uniquely identifying information provided by the present application.
  • FIG. 7 is a flowchart of step S332 in the method for uniquely identifying information provided by the present application.
  • FIG. 8 is a flowchart of step S333 in the method for uniquely identifying information provided by the present application.
  • FIG. 9 is a schematic diagram of an operating environment of a preferred embodiment of the information uniqueness identification program of the present application.
  • FIG. 10 is a functional block diagram of a preferred embodiment of an application server on which the information uniqueness identification program of the present application is installed.
  • FIG. 11 is a structural block diagram of an information uniqueness identification system provided by the present application.
  • The purpose of the present application is to provide an information uniqueness identification method, an application server, a system, and a storage medium that perform precise identification or fuzzy identification according to the type of each group customer, solving the problem that some group customers cannot be uniquely identified or have their data integrated because their customer information is incomplete.
  • FIG. 1 is a schematic diagram of an application environment of the information uniqueness identification method provided by the present application.
  • One or more applications can be installed on the application server to process related data.
  • The application server can receive the basic information of the group customers stored in the respective source databases, mark each group customer as precise identification class or fuzzy identification class according to the basic information, perform precise identification and fuzzy identification on the group customers in the two classes respectively, identify the group customers that are the same customer, and integrate their basic information according to the identification results. Precise or fuzzy identification is thus performed according to the type of each group customer, so that customers with incomplete information are not left unidentified.
  • The information uniqueness identification method includes the following steps:
  • A plurality of source databases may be set up to store the group customer data of different industry companies. Because a group customer may do business with several industry companies under the same group, different source databases may store data for the same group customer. It is therefore necessary to uniquely identify and integrate the group customers across all source databases to facilitate group customer management and data analysis. Specifically, the basic information of the stored group customers is acquired from the source databases of the different industry companies.
  • The basic information includes a customer name and identification information, where the identification information is information that can identify the group customer, such as an organization code, a business registration number, a tax registration number, or a business license number.
  • Each source database is an Oracle database (Oracle Database, also called Oracle RDBMS or simply Oracle, is a relational database management system from Oracle Corporation), a MySQL database (MySQL is an open-source, lightweight relational database management system), or a PostgreSQL database (PostgreSQL is a free object-relational database server), and the target database is a Hive database (Hive is a data warehouse tool based on Hadoop).
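As an illustration of the acquisition step, the following Python sketch gathers the basic information (customer name plus identification information) from several source databases into one list. It is a minimal sketch under stated assumptions: the source databases are stood in for by in-memory lists of row dicts rather than real Oracle, MySQL, or PostgreSQL queries, and the function name `acquire_basic_info`, the field names `org_code`, `business_reg_no`, `tax_reg_no`, and `business_license_no`, and the record layout are hypothetical, not taken from the application.

```python
from typing import Dict, Iterable, List

# Hypothetical field names for the identifiers the text lists (organization code,
# business registration number, tax registration number, business license number).
ID_FIELDS = ("org_code", "business_reg_no", "tax_reg_no", "business_license_no")


def acquire_basic_info(sources: Dict[str, Iterable[dict]]) -> List[dict]:
    """Gather customer name + identification info from every source database.

    `sources` maps a source-database label to its rows; each row is a dict with
    at least "id" and "name", plus any of ID_FIELDS (assumed layout).
    """
    records = []
    for source_name, rows in sources.items():
        for row in rows:
            # Treat empty strings and absent columns alike as "missing".
            ids = {f: (row.get(f) or None) for f in ID_FIELDS}
            records.append({
                # Prefix the row id with its source so ids stay unique across databases.
                "id": f"{source_name}:{row['id']}",
                "source": source_name,
                "name": row["name"],
                "ids": ids,
            })
    return records


if __name__ == "__main__":
    demo = {
        "insurance": [{"id": 1, "name": "Alpha Group Co., Ltd.", "org_code": "ORG001"}],
        "bank": [{"id": 7, "name": "Alpha Group Co., Ltd.", "tax_reg_no": ""}],
    }
    for r in acquire_basic_info(demo):
        print(r["id"], r["ids"])
```

Normalizing empty strings to `None` at load time keeps the later "is this identifier present?" checks uniform across databases that store missing values differently.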
  • Because the identification information differs between group customers, each group customer is marked as precise identification class or fuzzy identification class according to its identification information, and each class is uniquely identified in its own way. Group customers can therefore be identified and integrated whether their identification information is complete or partly missing, which broadens the applicability of information uniqueness identification. Please refer to FIG. 3, which is a flowchart of step S20 of the information uniqueness identification method provided by the present application.
  • Step S20 includes:
  • The identification information of each group customer is parsed to obtain its contents, and it is determined in turn whether the identification information of each group customer includes the preset precise identification information. If it does, the group customer is marked as precise identification class; otherwise, it is marked as fuzzy identification class. This classifies the group customers and provides the data basis for the subsequent targeted identification.
  • The preset precise identification information is preferably the organization code, a unique code identifier that never changes.
  • A group customer whose identification information includes the organization code is marked as precise identification class, enabling fast and accurate identification; a group customer whose identification information does not include the organization code is marked as fuzzy identification class and is identified fuzzily by combining its other customer information. This realizes hierarchical identification and broadens the application scenarios of information uniqueness identification.
  • S30 Perform precise identification and fuzzy identification, according to a preset identification rule, on the group customers in the precise identification class and the fuzzy identification class respectively, and identify the group customers that are the same customer.
  • FIG. 4 is a flowchart of step S30 in the method for uniquely identifying information provided by the present application.
  • Step S30 includes:
  • S31 Perform text detection on the customer names of all group customers to obtain the number of words and the text content of each customer name;
  • S32 Perform unique identification according to the preset precise identification information and the customer name of each group customer in the precise identification class, and identify the group customers in the precise identification class that are the same customer;
  • S33 Perform unique identification according to the identification information, the number of words in the customer name, and the text content of the customer name of each group customer in the fuzzy identification class, and identify the group customers in the fuzzy identification class that are the same customer.
  • Text detection and recognition can use existing OCR character recognition technology. Identification then proceeds by customer category. For customers in the precise identification class, the identification information includes the preset precise identification information (the organization code in this embodiment), so each group customer in this class is uniquely identified according to that information and its customer name.
  • For customers in the fuzzy identification class, the identification information does not include the preset precise identification information, so they are uniquely identified by combining the other contents of the identification information with the number of words and the text content of the customer name.
  • This ensures that customers in the precise identification class are identified accurately and quickly, while customers in the fuzzy identification class can still be identified from their other basic information, meeting identification needs at different levels of information completeness.
  • FIG. 5 is a flowchart of step S32 in the method for uniquely identifying information provided by the present application.
  • Step S32 includes:
  • S321 Arbitrarily select a group customer in the precise identification class, and compare its preset precise identification information and customer name with those of the other group customers in the precise identification class;
  • S322 Determine, according to the comparison result, whether there is a group customer whose preset precise identification information and customer name are both the same as those of the selected group customer; if so, identify each such group customer as the same customer as the selected group customer; if not, identify the selected group customer and the other group customers in the precise identification class as different customers;
  • For example, suppose that among the group customers obtained from different source databases, ten group customers are marked as precise identification class; they are recorded as customer 1, customer 2, ..., customer 10.
  • Precise identification starts by arbitrarily selecting one of them, for example customer 1, and comparing its organization code and customer name with those of the other nine group customers. Suppose the organization codes and customer names of customer 3 and customer 4 are exactly the same as those of customer 1.
  • Customers 1, 3, and 4 are then identified as the same customer, their customer data can be integrated, and customers 3 and 4 need not take part in subsequent precise identification.
  • Next, the organization code and customer name of customer 2 are compared with those of the other six remaining group customers. If no group customer has exactly the same organization code and customer name as customer 2, customer 2 is identified as a different customer from the other nine group customers. Following the same process, customers 5, 6, ..., 10 are uniquely identified in turn, so that all ten customers in the precise identification class are uniquely identified, which facilitates data analysis and management of group customers.
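The select-compare-remove loop of steps S321 and S322 can be sketched as follows. This is an illustration, not the application's implementation: records are assumed to be dicts with `id`, `name`, and an `ids` sub-dict holding `org_code`, the function name `precise_identify` is hypothetical, "arbitrarily select" is realized by taking the first remaining customer, and names are compared exactly as stored.

```python
from typing import List


def precise_identify(precise: List[dict]) -> List[List[str]]:
    """Group precise-class customers that share organization code AND customer name.

    Returns lists of ids; a single-element list is a customer found to be
    distinct from all others in the precise identification class.
    """
    def key(c: dict) -> tuple:
        return (c["ids"]["org_code"], c["name"])

    remaining = list(precise)
    groups = []
    while remaining:
        selected = remaining.pop(0)  # "arbitrarily select": here, the first remaining one
        same = [c for c in remaining if key(c) == key(selected)]
        # Matched customers need no further comparison, as in the worked example.
        remaining = [c for c in remaining if key(c) != key(selected)]
        groups.append([selected["id"]] + [c["id"] for c in same])
    return groups
```

Because equality of the pair (organization code, customer name) is transitive, this loop yields the same groups as bucketing customers in a dict keyed on that pair; the bucketing form avoids the quadratic number of comparisons when the precise class is large.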
  • Step S33 includes:
  • S331 Determine whether the number of words in the customer name of each group customer in the fuzzy identification class is greater than or equal to a preset threshold;
  • S332 If it is greater than or equal to the preset threshold, perform unique identification according to the number of words and the text content of the group customer's customer name;
  • S333 If it is less than the preset threshold, perform unique identification according to the group customer's identification information and the number of words and text content of its customer name.
  • Different identification processes are thus applied according to the length of the customer name: for each group customer in the fuzzy identification class, it is determined whether the number of words in its customer name is greater than or equal to the preset threshold.
  • The preset threshold is preferably 8; that is, it is determined whether the customer name of each group customer in the fuzzy identification class has at least 8 words. If it does, the name is long and the probability that different customers share the same name is very small, so unique identification is performed according to the number of words and the text content of the customer name.
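A sketch of the length check in step S331 follows. It assumes that the "number of words" in a customer name is its character count (the usual measure for Chinese names) and that surrounding whitespace is ignored; both are assumptions, as are the names `split_by_name_length` and `NAME_LENGTH_THRESHOLD`. The default threshold of 8 is the preferred value stated above.

```python
from typing import List, Tuple

NAME_LENGTH_THRESHOLD = 8  # preferred threshold from the description


def split_by_name_length(fuzzy: List[dict], threshold: int = NAME_LENGTH_THRESHOLD) -> Tuple[List[dict], List[dict]]:
    """Route fuzzy-class customers by customer-name length.

    Returns (long_names, short_names): names with at least `threshold`
    characters go to the S332 branch, shorter ones to the S333 branch.
    """
    long_names, short_names = [], []
    for customer in fuzzy:
        length = len(customer["name"].strip())  # assumption: "words" = characters
        (long_names if length >= threshold else short_names).append(customer)
    return long_names, short_names
```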
  • Step S332 includes:
  • S3321 Arbitrarily select a group customer in the fuzzy identification class whose customer name has at least the preset threshold number of words, and compare its customer name with the customer names of all group customers;
  • S3322 Determine, according to the comparison result, whether there is a group customer whose customer name has exactly the same text content; if so, identify the group customers with identical text content as the same customer as the selected group customer; if not, identify the selected group customer and all other group customers as different customers;
  • Specifically, a group customer in the fuzzy identification class whose customer name has at least 8 words is arbitrarily selected, and its customer name is compared with the customer names of all group customers (including those in both the precise identification class and the fuzzy identification class). Based on the comparison result, it is determined whether any group customer has a customer name with identical text content. If so, every group customer whose customer name has identical text content, however many there are, is identified as the same customer as the selected group customer.
  • For example, suppose that among the group customers obtained from different source databases, ten group customers are marked as fuzzy identification class and have customer names of at least 8 words; they are recorded as customer 11, customer 12, ..., customer 20.
  • Fuzzy identification starts by arbitrarily selecting one of them, for example customer 11, and comparing its customer name with those of all other group customers. Customer 11 may in fact be the same customer as a group customer in the precise identification class, having been marked as fuzzy identification class only because the organization code is missing from its identification information. Therefore, during fuzzy identification the name of the selected group customer must be compared with the names of all other group customers to ensure that identification is complete.
  • If customer 2 in the precise identification class and customers 13 and 14 in the fuzzy identification class have the same customer-name text as customer 11, then customers 2, 11, 13, and 14 are identified as the same customer, and the customer data of the four can be integrated. Next, the customer name of customer 12 is compared with those of the other six group customers; if no group customer has exactly the same customer name as customer 12, customer 12 is identified as a different customer from all other group customers. Following the same process, customers 15, 16, ..., 20 are uniquely identified in turn, completing the uniqueness identification of the group customers whose customer names have at least 8 words.
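The long-name branch (S3321 and S3322) can be sketched as below. Following the text, each selected long-name fuzzy customer is compared with all group customers, including those in the precise identification class, and an exact match on the name text is sufficient. The record layout and the function name `identify_long_names` are illustrative assumptions; customers already placed in a group are skipped when later selected, mirroring the worked example with customers 11 to 20.

```python
from typing import List


def identify_long_names(long_fuzzy: List[dict], all_customers: List[dict]) -> List[List[str]]:
    """Group customers whose customer-name text is identical to a long-name fuzzy customer.

    `long_fuzzy` holds fuzzy-class customers whose names meet the length threshold;
    `all_customers` holds every group customer from both classes.
    Returns lists of ids; the selected customer comes first in each list.
    """
    assigned = set()
    groups = []
    for selected in long_fuzzy:  # "arbitrarily select": here, input order
        if selected["id"] in assigned:
            continue  # already identified together with an earlier selection
        matches = [
            c["id"] for c in all_customers
            if c["id"] != selected["id"] and c["name"] == selected["name"]
        ]
        group = [selected["id"]] + matches
        assigned.update(group)
        groups.append(group)
    return groups
```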
  • FIG. 8 is a flowchart of step S333 in the method for uniquely identifying information provided by the present application.
  • Step S333 includes:
  • Specifically, a group customer in the fuzzy identification class whose customer name has fewer than 8 words is arbitrarily selected, and its customer name is compared by text content with the customer names of all group customers (including those in both the precise identification class and the fuzzy identification class). Based on the comparison result, it is determined whether any group customer has a customer name with identical text content. If so, because the selected group customer's name is short and duplicate names are likely, it is further determined whether the group customers with identical name text share any identification information with the selected group customer, that is, whether at least one of the business registration number, the tax registration number, and the business license number is the same. Group customers whose customer-name text is identical and that share any identical identification information are identified as the same customer as the selected group customer; otherwise, the selected group customer is identified as a different customer from all other group customers, and another group customer in the fuzzy identification class whose customer name has fewer than 8 words is then selected.
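For short names (S333), identical name text alone is not enough: a candidate must also share at least one identification value with the selected customer. The sketch below checks the business registration number, tax registration number, and business license number named above, under hypothetical field names, and treats two missing values as not matching (an assumption). As in the text, each candidate is compared only with the selected customer, so the grouping is not closed transitively. The names `shares_secondary_id` and `identify_short_names` are illustrative.

```python
from typing import List

# Hypothetical field names for the secondary identifiers listed in the text.
SECONDARY_ID_FIELDS = ("business_reg_no", "tax_reg_no", "business_license_no")


def shares_secondary_id(a: dict, b: dict) -> bool:
    """True if a and b have at least one identical, non-missing secondary identifier."""
    for f in SECONDARY_ID_FIELDS:
        va, vb = a["ids"].get(f), b["ids"].get(f)
        if va and vb and va == vb:  # two missing values never count as a match
            return True
    return False


def identify_short_names(short_fuzzy: List[dict], all_customers: List[dict]) -> List[List[str]]:
    """Group short-name fuzzy customers by identical name text plus a shared identifier."""
    assigned = set()
    groups = []
    for selected in short_fuzzy:
        if selected["id"] in assigned:
            continue
        group = [selected["id"]]
        for c in all_customers:
            if c["id"] == selected["id"] or c["id"] in assigned:
                continue
            if c["name"] == selected["name"] and shares_secondary_id(selected, c):
                group.append(c["id"])
        assigned.update(group)
        groups.append(group)
    return groups
```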
  • At this point, the results of precise identification and fuzzy identification are obtained, and the basic information of group customers that are the same customer is integrated according to these results: specifically, the names of all group customers that are the same customer are unified, and their respective identification information is complemented and merged to obtain uniquely identified group customer information.
  • With the uniquely identified group customer information, the customer data that one group customer has stored in different source databases can be obtained at once, enabling unified integration and analysis of the same customer's data across source databases. Data analysis can then be performed on the identified and integrated customer data; combined with the user data stored by the different industry companies, the data life cycle, consumption propensity, and risk-control information of each group customer can be analyzed comprehensively, which benefits the follow-up and management of group customers.
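The integration step can be sketched as merging each identified group into one record: one unified name, identification fields filled from whichever member has them, and a list of the source records that were merged. The application does not say which name becomes the unified name or how conflicting identification values are resolved, so this sketch simply takes the first member's name and the first non-empty value per field; those choices, the field names, and the function name `integrate_group` are assumptions.

```python
from typing import Dict, List

# Hypothetical field names for the identification information.
ID_FIELDS = ("org_code", "business_reg_no", "tax_reg_no", "business_license_no")


def integrate_group(group_ids: List[str], records_by_id: Dict[str, dict]) -> dict:
    """Merge the basic information of one identified customer.

    Assumptions: the first member's name is the unified name, and for each
    identification field the first non-empty value among members wins.
    """
    members = [records_by_id[i] for i in group_ids]
    merged_ids = {}
    for f in ID_FIELDS:
        for m in members:
            value = m["ids"].get(f)
            if value:
                merged_ids[f] = value  # complement missing fields from other members
                break
    return {
        "name": members[0]["name"],
        "ids": merged_ids,
        # Keep every source row so all of the customer's data can be fetched at once.
        "source_records": [(m["source"], m["id"]) for m in members],
    }
```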
  • The present application further provides an application server for information uniqueness identification, which includes a processor 10, a memory 20, and a display 30.
  • FIG. 9 shows only some of the components of the application server for information uniqueness identification; it should be understood that not all of the illustrated components are required, and more or fewer components may be implemented instead.
  • In some embodiments, the memory 20 may be an internal storage unit of the application server, such as a hard disk or main memory of the application server. In other embodiments, the memory 20 may be an external storage device of the application server, for example a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card equipped on the application server. Further, the memory 20 may include both an internal storage unit and an external storage device of the application server. The memory 20 is used to store the application software installed on the application server and various types of data, such as the program code of the information uniqueness identification program.
  • The memory 20 can also be used to temporarily store data that has been output or is about to be output.
  • An information uniqueness identification program 40 is stored in the memory 20 and can be executed by the processor 10 to implement the information uniqueness identification method of the embodiments of the present application.
  • In some embodiments, the processor 10 may be a central processing unit (CPU).
  • In some embodiments, the display 30 may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode) display, or the like.
  • The display 30 is used to display information of the application server and a visual user interface, such as a recognition result interface.
  • Components 10 to 30 of the application server communicate with one another via a system bus.
  • When the processor 10 executes the information uniqueness identification program 40 in the memory 20, the steps of the foregoing embodiments of the information uniqueness identification method are implemented; details are not repeated here.
  • FIG. 10 is a functional block diagram of a preferred embodiment of an application server on which the information uniqueness identification program of the present application is installed.
  • The application server on which the information uniqueness identification program is installed may be divided into one or more modules, which are stored in the memory 20 and executed by one or more processors (the processor 10 in this embodiment) to complete the present application.
  • The application server on which the information uniqueness identification program is installed may be divided into an acquisition module 21, a classification module 22, an identification module 23, and an integration module 24.
  • A module in this application refers to a series of computer program instruction segments capable of performing a specific function; modules are better suited than the program as a whole to describing how the information uniqueness identification program executes in the application server. The functions of modules 21 to 24 are described below.
  • The acquisition module 21 is configured to acquire the basic information of the group customers stored in each source database, the basic information including a customer name and identification information;
  • The classification module 22 is configured to mark each group customer as belonging to the precise identification class or the fuzzy identification class according to the identification information;
  • The identification module 23 is configured to perform precise identification and fuzzy identification, according to the preset identification rule, on the group customers in the precise identification class and the fuzzy identification class respectively, and to identify the group customers that are the same customer;
  • The integration module 24 is configured to obtain the results of precise identification and fuzzy identification and, according to these results, integrate the basic information of all group customers that are the same customer to obtain uniquely identified group customer information.
  • The classification module 22 includes:
  • a parsing unit, configured to parse the identification information of each group customer;
  • a classification unit, configured to determine whether the identification information of each group customer includes the preset precise identification information and, if so, mark the group customer as precise identification class; otherwise, mark it as fuzzy identification class.
  • The identification module 23 includes:
  • a detection unit, configured to perform text detection on the customer names of all group customers to obtain the number of words and the text content of each customer name;
  • a precise identification unit, configured to perform unique identification according to the preset precise identification information and the customer name of each group customer in the precise identification class, and to identify the group customers in the precise identification class that are the same customer;
  • a fuzzy identification unit, configured to perform unique identification according to the identification information, the number of words in the customer name, and the text content of the customer name of each group customer in the fuzzy identification class, and to identify the group customers in the fuzzy identification class that are the same customer.
  • The precise identification unit includes:
  • a first comparison subunit, configured to arbitrarily select a group customer in the precise identification class and compare its preset precise identification information and customer name with those of the other group customers in the precise identification class;
  • a first identification subunit, configured to determine, according to the comparison result, whether there is a group customer whose preset precise identification information and customer name are both the same as those of the selected group customer; if so, identify each such group customer as the same customer as the selected group customer; if not, identify the selected group customer and the other group customers in the precise identification class as different customers.
  • The fuzzy identification unit includes:
  • a first determining subunit, configured to determine whether the number of words in the customer name of each group customer in the fuzzy identification class is greater than or equal to a preset threshold;
  • a second identification subunit, configured to perform unique identification according to the number of words and the text content of the group customer's customer name if it is greater than or equal to the preset threshold;
  • a third identification subunit, configured to perform unique identification according to the group customer's identification information and the number of words and text content of its customer name if it is less than the preset threshold.
  • The second identification subunit includes:
  • a second comparison subunit, configured to arbitrarily select a group customer in the fuzzy identification class whose customer name has at least the preset threshold number of words, and compare its customer name with the customer names of all group customers;
  • a second determining subunit, configured to determine, according to the comparison result, whether there is a group customer whose customer name has identical text content; if so, identify the group customers with identical text content as the same customer as the selected group customer; if not, identify the selected group customer and all other group customers as different customers.
  • The third identification subunit includes:
  • a third comparison subunit, configured to arbitrarily select a group customer in the fuzzy identification class whose customer name has fewer than the preset threshold number of words, and compare its customer name with the customer names of all group customers;
  • a third determining subunit, configured to determine, according to the comparison result, whether there is a group customer whose customer name has identical text content and, if so, further determine whether that group customer shares any identical identification information with the selected group customer; group customers with identical customer-name text and at least one identical item of identification information are identified as the same customer as the selected group customer; otherwise, the selected group customer and all other group customers are identified as different customers.
  • The present application further provides an information uniqueness identification system.
  • Referring to FIG. 11, the information uniqueness identification system includes a plurality of source databases 110 and the application server 120 for information uniqueness identification described above.
  • each source database 110 is configured to store basic information of group customers;
  • the application server 120 is configured to acquire the basic information of the group customers stored in each source database, the basic information including a customer name and identification information; to mark each group customer as a precise identification class or a fuzzy identification class according to the identification information; to perform precise identification and fuzzy identification on the group customers in the precise identification class and the fuzzy identification class, respectively, according to preset identification rules, so as to identify group customers that are the same customer; and to obtain the results of precise identification and fuzzy identification and, according to those results, integrate the basic information of all group customers that are the same customer to obtain uniquely identified group customer information.
  • The workflow of the information uniqueness identification system in this embodiment is the same as the steps of the foregoing embodiments of the information uniqueness identification method and is not repeated here.
  • In summary, the information uniqueness identification method acquires the basic information of the group customers stored in each source database, the basic information including a customer name and identification information; marks each group customer as a precise identification class or a fuzzy identification class according to the identification information; performs precise identification and fuzzy identification on the group customers in the two classes, respectively, according to preset identification rules to identify group customers that are the same customer; and obtains the identification results and integrates the basic information of all group customers that are the same customer to obtain uniquely identified group customer information. Because precise or fuzzy identification is chosen according to the type of each group customer, the method solves the problem that some group customers cannot be uniquely identified and their data cannot be integrated when their customer information is incomplete.
  • All or part of the processes in the above method embodiments may be implemented by a computer program instructing related hardware (such as a processor or a controller), and the program may be stored in a computer-readable storage medium.
  • When executed, the program may include the processes of the method embodiments described above.
  • The storage medium may be a memory, a magnetic disk, an optical disk, or the like.
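The third identification subunit in the list above combines name matching with a check on identification information. The text does not define what "the same identification information" means when the organization code is absent. The Python sketch below assumes it means sharing at least one non-empty value among hypothetical fields `biz_reg_no`, `tax_reg_no`, and `license_no`. The record layout (dicts with a `name` key, indexed in one combined list) is likewise an illustrative assumption, not the applicant's implementation.

```python
from typing import Dict, List, Optional

Record = Dict[str, Optional[str]]

# Hypothetical identifier fields other than the organization code (assumption).
ID_KEYS = ("biz_reg_no", "tax_reg_no", "license_no")


def shares_identifier(a: Record, b: Record) -> bool:
    """Assumed reading of 'same identification information': at least one
    identifier is present in both records with the same value."""
    return any(a.get(k) and a.get(k) == b.get(k) for k in ID_KEYS)


def identify_short_names(records: List[Record], short_names: List[int]) -> List[List[int]]:
    """Third identification subunit sketch: for each short-name record in the
    fuzzy class, collect every group customer whose name text is identical
    AND whose identification information agrees."""
    identified = set()
    groups: List[List[int]] = []
    for i in short_names:
        if i in identified:
            continue  # already grouped with an earlier selected customer
        group = [i]
        identified.add(i)
        for j in range(len(records)):  # compare against all group customers
            if j in identified:
                continue
            same_name = records[j].get("name") == records[i].get("name")
            if same_name and shares_identifier(records[i], records[j]):
                group.append(j)
                identified.add(j)
        groups.append(group)  # a singleton group means "different from all others"
    return groups
```

With identical short names, records 0 and 1 (shared tax number) form one customer, while a record with a different or missing identifier stays separate.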

Abstract

Disclosed are an information uniqueness identification method, an application server, a system, and a storage medium. The method comprises: acquiring basic information about group clients stored in each source database, the basic information comprising a client name and identification information; marking each group client as a precise identification category or a fuzzy identification category according to the identification information; performing, according to a preset identification rule, precise identification and fuzzy identification on the group clients in the precise identification category and the fuzzy identification category respectively, so as to identify group clients that are the same client; and acquiring the results of precise identification and fuzzy identification and integrating the basic information about all group clients that are the same client according to those results, so as to obtain uniquely identified group client information. Because precise or fuzzy identification is performed according to the type of each group client, the problem that unique identification and data integration cannot be performed when client information is incomplete is solved.

Description

Information uniqueness identification method, application server, system and storage medium
This application claims priority to Chinese Patent Application No. 201710850369.5, filed with the China Patent Office on September 20, 2017, and entitled "Information uniqueness identification method, application server, system and storage medium", the entire contents of which are incorporated herein by reference.
Technical field
The present application relates to the field of information identification technologies, and in particular to an information uniqueness identification method, an application server, a system, and a storage medium.
Background
At present, many companies have a large number of group customers, and these customers need to be identified and their data integrated to facilitate group customer management. Conventional information uniqueness identification relies mainly on the customer's organization code. Although this method is rigorous and accurate, customer information is often partially missing, so the organization code field has a low fill rate. Relying on precise identification by organization code therefore leaves customers with incomplete information unidentified and unintegrated, which limits the range of group customers that can be uniquely identified.
Therefore, the prior art still needs to be improved and developed.
Summary
In view of the above deficiencies of the prior art, the purpose of the present application is to provide an information uniqueness identification method, an application server, a system, and a storage medium that perform precise identification or fuzzy identification according to the type of each group customer. This solves the problem that some group customers cannot be uniquely identified and their data cannot be integrated because of incomplete customer information.
To achieve the above purpose, the present application adopts the following technical solutions:
An information uniqueness identification method includes the following steps:
acquiring basic information of group customers stored in each source database, where the basic information includes a customer name and identification information;
marking each group customer as a precise identification class or a fuzzy identification class according to the identification information;
performing precise identification and fuzzy identification on the group customers in the precise identification class and the fuzzy identification class, respectively, according to preset identification rules, to identify group customers that are the same customer;
obtaining the results of precise identification and fuzzy identification, and integrating the basic information of the group customers according to those results to obtain uniquely identified group customer information.
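The final step is stated only as "integrating" the basic information, and how conflicting fields are reconciled is not specified. The Python sketch below assumes a simple policy: keep the first non-empty value per field in group order, and record the contributing source databases. The record keys and the `groups` input (lists of record indices found to be the same customer) are illustrative assumptions.

```python
from typing import Dict, List, Optional

Record = Dict[str, Optional[str]]


def integrate_groups(records: List[Record], groups: List[List[int]]) -> List[Record]:
    """Integration sketch: merge the basic information of each group of records
    identified as the same customer into one record.

    Assumed policy (not from the text): for every field, keep the first
    non-empty value found in group order; list source databases under "sources".
    """
    merged: List[Record] = []
    for group in groups:
        out: Record = {}
        sources: List[str] = []
        for idx in group:
            rec = records[idx]
            src = rec.get("source")
            if src and src not in sources:
                sources.append(src)
            for key, value in rec.items():
                if key == "source":
                    continue
                if value and not out.get(key):
                    out[key] = value  # first non-empty value wins
        out["sources"] = ",".join(sources)
        merged.append(out)
    return merged
```

A "most recently updated wins" policy would be an equally plausible choice. It would need a timestamp field that the text does not mention.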
An application server for information uniqueness identification, comprising a processor, a memory, and a communication bus;
the memory stores a computer-readable program executable by the processor; the communication bus implements connection and communication between the processor and the memory; and the processor, when executing the computer-readable program, implements the steps of the information uniqueness identification method described above.
A computer-readable storage medium storing one or more programs, the one or more programs being executable by one or more processors to implement the steps of the information uniqueness identification method described above.
An information uniqueness identification system, comprising a plurality of source databases and the application server for information uniqueness identification described above. Each source database is configured to store basic information of group customers. The application server is configured to: acquire the basic information of the group customers stored in each source database, the basic information including a customer name and identification information; mark each group customer as a precise identification class or a fuzzy identification class according to the identification information; perform precise identification and fuzzy identification on the group customers in the precise identification class and the fuzzy identification class, respectively, according to preset identification rules, so as to identify group customers that are the same customer; and obtain the results of precise identification and fuzzy identification and, according to those results, integrate the basic information of all group customers that are the same customer to obtain uniquely identified group customer information.
Compared with the prior art, the present application provides an information uniqueness identification method, application server, system, and storage medium. The method first acquires the basic information of the group customers stored in each source database, the basic information including a customer name and identification information. It then marks each group customer as a precise identification class or a fuzzy identification class according to the identification information. Next, it performs precise identification and fuzzy identification on the group customers in the two classes, respectively, according to preset identification rules, to identify group customers that are the same customer. Finally, it obtains the identification results and integrates the basic information of all group customers that are the same customer to obtain uniquely identified group customer information. Precise or fuzzy identification can thus be performed according to the type of each group customer. This solves the problem that some group customers cannot be uniquely identified and their data cannot be integrated because of incomplete customer information.
Description of drawings
FIG. 1 is a schematic diagram of an application environment of the information uniqueness identification method provided by the present application;
FIG. 2 is a flowchart of the information uniqueness identification method provided by the present application;
FIG. 3 is a flowchart of step S20 in the information uniqueness identification method provided by the present application;
FIG. 4 is a flowchart of step S30 in the information uniqueness identification method provided by the present application;
FIG. 5 is a flowchart of step S32 in the information uniqueness identification method provided by the present application;
FIG. 6 is a flowchart of step S33 in the information uniqueness identification method provided by the present application;
FIG. 7 is a flowchart of step S332 in the information uniqueness identification method provided by the present application;
FIG. 8 is a flowchart of step S333 in the information uniqueness identification method provided by the present application;
FIG. 9 is a schematic diagram of the operating environment of a preferred embodiment of the information uniqueness identification program of the present application;
FIG. 10 is a functional block diagram of a preferred embodiment of an application server on which the information uniqueness identification program of the present application is installed;
FIG. 11 is a structural block diagram of the information uniqueness identification system provided by the present application.
Detailed description
In the prior art, unique identification and integration cannot be performed when customer information is incomplete. In view of this shortcoming, the purpose of the present application is to provide an information uniqueness identification method, an application server, a system, and a storage medium that perform precise identification or fuzzy identification according to the type of each group customer. This solves the problem that some group customers cannot be uniquely identified and their data cannot be integrated because of incomplete customer information.
To make the objectives, technical solutions, and effects of the present application clearer, the present application is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely intended to explain the present application and are not intended to limit it.
Please refer to FIG. 1, which is a schematic diagram of an application environment of the information uniqueness identification method provided by the present application. As shown, one or more applications may be installed on the application server to process the relevant data. In this embodiment, the application server receives the basic information of group customers stored in each source database. It marks each group customer as a precise identification class or a fuzzy identification class according to that basic information. It then performs precise identification and fuzzy identification on the group customers in the two classes, respectively, identifies group customers that are the same customer, and integrates their basic information according to the identification results. Identification is thus tailored to the type of each group customer, avoiding cases where a customer cannot be identified because its information is incomplete.
Referring to FIG. 2, the information uniqueness identification method provided by the present application includes the following steps:
S10. Acquire the basic information of the group customers stored in each source database, where the basic information includes a customer name and identification information.
In this embodiment, several source databases may be set up to store the group customer data of different industry companies. One group customer may do business with several industry companies under the same parent company, so its data may be stored in several source databases. The group customers in all source databases therefore need to be uniquely identified and integrated to facilitate group customer management and data analysis. Specifically, the stored basic information of group customers is acquired from each source database of the different industry companies. The basic information includes a customer name and identification information. The identification information is information that can identify the group customer, such as an organization code, a business registration number, a tax registration number, or a business license number.
Preferably, the source database is one of the following:
- an Oracle database (Oracle Database, also called Oracle RDBMS or simply Oracle, is a relational database management system from Oracle Corporation);
- a MySQL database (MySQL is a small open-source relational database management system);
- a PostgreSQL database (PostgreSQL is a free object-relational database server).

The target database is a Hive database (Hive is a data warehouse tool based on Hadoop). These are common, easy-to-operate database management systems and tools, which make the data in this embodiment easy to analyze and process.
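To illustrate step S10 only, the sketch below pulls customer names and identification fields from several source databases through the DB-API 2.0 cursor interface (PEP 249). It uses in-memory `sqlite3` databases as stand-ins for the Oracle, MySQL, or PostgreSQL sources. The table name `group_customer`, its columns, and the `source` tag are hypothetical.

```python
import sqlite3
from typing import Any, Dict, List, Optional

# Hypothetical schema: real source systems will use their own table/column names.
COLUMNS = ("name", "org_code", "biz_reg_no", "tax_reg_no", "license_no")


def acquire(sources: Dict[str, Any]) -> List[Dict[str, Optional[str]]]:
    """S10 sketch: read the basic information of group customers from every
    source database and tag each record with the database it came from."""
    records: List[Dict[str, Optional[str]]] = []
    query = "SELECT " + ", ".join(COLUMNS) + " FROM group_customer"
    for source_name, conn in sources.items():
        cur = conn.cursor()  # DB-API 2.0 cursor: execute() then fetchall()
        cur.execute(query)
        for row in cur.fetchall():
            record: Dict[str, Optional[str]] = dict(zip(COLUMNS, row))
            record["source"] = source_name  # remember which database the row came from
            records.append(record)
        cur.close()
    return records


def make_demo_db(rows: List[tuple]) -> sqlite3.Connection:
    """Build an in-memory stand-in for one industry company's source database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE group_customer (" + ", ".join(c + " TEXT" for c in COLUMNS) + ")")
    conn.executemany("INSERT INTO group_customer VALUES (?, ?, ?, ?, ?)", rows)
    return conn
```

For example, `acquire({"insurance": make_demo_db([("ABC Co", "X1", None, None, None)])})` returns one record whose `org_code` is `"X1"` and whose `source` is `"insurance"`. SQL `NULL` values come back as `None`.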
S20. Mark each group customer as a precise identification class or a fuzzy identification class according to the identification information.
After the basic information of all group customers is acquired, the completeness of the identification information differs between group customers. Each group customer is therefore marked as a precise identification class or a fuzzy identification class according to its identification information, and the corresponding uniqueness identification is performed for each class. Group customers can thus be identified and integrated whether their identification information is complete or partially missing, which broadens the range of application of information uniqueness identification. Please refer to FIG. 3, which is a flowchart of step S20 in the information uniqueness identification method provided by the present application.
As shown in FIG. 3, step S20 includes:
S21. Parse the identification information of each group customer;
S22. Determine whether the identification information of each group customer contains the preset precise identification information; if so, mark the group customer as the precise identification class; otherwise, mark it as the fuzzy identification class.
That is, after the basic information of all group customers is acquired, the identification information of each group customer is parsed to determine what it contains. Each group customer's identification information is then checked in turn for the preset precise identification information. If it is present, the customer is marked as the precise identification class; otherwise, it is marked as the fuzzy identification class. This classifies the group customers and provides the data basis for the subsequent targeted identification process. In this embodiment, the preset precise identification information is preferably the organization code, a unique code identifier that never changes. Group customers whose identification information includes an organization code are marked as the precise identification class, allowing fast and accurate precise identification. Group customers without an organization code are marked as the fuzzy identification class and are identified by fuzzy identification based on their other customer information. This realizes tiered identification and extends the application scenarios of information uniqueness identification.
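Steps S21 and S22 amount to a single presence test on the organization code. The Python sketch below assumes records are dicts with the code stored under a hypothetical key `org_code`. Treating a blank or whitespace-only code as missing is also an assumption, not something the text states.

```python
from typing import Dict, List, Optional, Tuple

Record = Dict[str, Optional[str]]

# Assumption: the organization code is held under the hypothetical key "org_code".
PRECISE_KEY = "org_code"


def classify(records: List[Record]) -> Tuple[List[int], List[int]]:
    """S21/S22 sketch: return (precise_class, fuzzy_class) as lists of record indices."""
    precise: List[int] = []
    fuzzy: List[int] = []
    for i, rec in enumerate(records):
        # A missing, None, or blank organization code counts as absent (assumption).
        code = (rec.get(PRECISE_KEY) or "").strip()
        if code:
            precise.append(i)
        else:
            fuzzy.append(i)
    return precise, fuzzy
```

Returning indices rather than copies keeps every record in one combined list. This matters later, because fuzzy identification compares against all group customers, including those in the precise class.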
S30. Perform precise identification and fuzzy identification on the group customers in the precise identification class and the fuzzy identification class, respectively, according to the preset identification rules, and identify group customers that are the same customer.
In this embodiment, after all group customers are classified, precise identification and fuzzy identification are performed for the respective classes. This covers all group customers and identifies group customers in different source databases that are the same customer, facilitating the management and analysis of group customer data. Please refer to FIG. 4, which is a flowchart of step S30 in the information uniqueness identification method provided by the present application.
As shown in FIG. 4, step S30 includes:
S31. Perform text detection on the customer names of all group customers to obtain the character count and text content of each customer name;
S32. Perform uniqueness identification according to the preset precise identification information and customer name of each group customer in the precise identification class, and identify the group customers in the precise identification class that are the same customer;
S33. Perform uniqueness identification according to the identification information and the character count and text content of the customer name of each group customer in the fuzzy identification class, and identify the group customers in the fuzzy identification class that are the same customer.
In this embodiment, after the group customers are classified, text detection is first performed on the customer names of all group customers, and the character count and text content of each customer name are obtained and saved. Existing OCR text recognition technology may be used for the text detection. Identification then depends on the customer class. Customers in the precise identification class have the preset precise identification information (the organization code in this embodiment) in their identification information, so they are uniquely identified by that information and the customer name. Customers in the fuzzy identification class lack the preset precise identification information. They are therefore uniquely identified using the other content of their identification information combined with the character count and text content of the customer name. This ensures that customers in the precise identification class are identified accurately and quickly, and that customers in the fuzzy identification class are identified by combining their other basic information. Customers are thus covered whatever the completeness of their information.
For the process of precise identification, please refer to FIG. 5, which is a flowchart of step S32 in the information uniqueness identification method provided by the present application. As shown in FIG. 5, step S32 includes:
S321. Arbitrarily select one group customer in the precise identification class, and compare its preset precise identification information and customer name with those of the other group customers in the precise identification class;
S322. Determine, according to the comparison result, whether there are group customers with the same preset precise identification information and the same customer name; if so, identify those group customers as the same customer as the selected group customer; if not, identify the selected group customer and the other group customers in the precise identification class as different customers;
S323. Continue to select another group customer in the precise identification class and perform uniqueness identification against the other group customers, until all group customers in the precise identification class have been identified.
During precise identification, a group customer in the precise identification class is selected arbitrarily, and its organization code and customer name are compared with those of the other group customers in the precise identification class. Uniqueness identification then checks whether any group customer has the same organization code and the same customer name (same character count and same text content). If so, every such group customer, however many there are, is identified as the same customer as the selected group customer. Customers identified as the same customer need not be identified again later in precise identification, which saves time. If not, the selected group customer and the other group customers in the precise identification class are identified as different customers. Another group customer in the precise identification class is then selected and the above process repeated until all group customers in the precise identification class have been identified.
For example, suppose that among the group customers acquired from the different source databases, ten are marked as the precise identification class, denoted customer 1, customer 2, ..., customer 10. Precise identification starts from any one of them, for example customer 1, whose organization code and customer name are compared with those of the other nine group customers. Suppose customer 3 and customer 4 have exactly the same organization code and customer name as customer 1. Customers 1, 3, and 4 are then identified as the same customer, and their customer data can be integrated. Customers 3 and 4 need not be identified again later. Next, the organization code and customer name of customer 2 are compared with those of the remaining six group customers (customers 5 to 10). No group customer has exactly the same organization code and customer name as customer 2, so customer 2 is identified as a different customer from the other nine. Customers 5, 6, ..., 10 are then uniquely identified in turn by the same process. All ten customers in the precise identification class are thus uniquely identified, facilitating the analysis and management of group customer data.
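Steps S321 to S323 and the ten-customer example can be sketched as a select-and-compare loop that skips records already matched. The Python below is an illustration only. It assumes dict records with hypothetical keys `name` and `org_code`, and the customer names in the usage are placeholders.

```python
from typing import Dict, List, Optional

Record = Dict[str, Optional[str]]


def precise_identify(records: List[Record], precise: List[int]) -> List[List[int]]:
    """S321-S323 sketch: group precise-class records whose organization code
    and customer name are both identical (equal strings imply the same
    character count and the same text content)."""
    identified = set()
    groups: List[List[int]] = []
    for i in precise:
        if i in identified:
            continue  # matched earlier, so it is not selected again
        group = [i]
        identified.add(i)
        for j in precise:
            if j in identified:
                continue
            if (records[j]["org_code"] == records[i]["org_code"]
                    and records[j]["name"] == records[i]["name"]):
                group.append(j)
                identified.add(j)
        groups.append(group)
    return groups


# The ten-customer example: customers 1..10 sit at indices 0..9, and
# customers 3 and 4 (indices 2 and 3) duplicate customer 1.
customers = [{"name": f"Customer {k}", "org_code": f"C{k}"} for k in range(1, 11)]
customers[2] = dict(customers[0])
customers[3] = dict(customers[0])
print(precise_identify(customers, list(range(10))))
# [[0, 2, 3], [1], [4], [5], [6], [7], [8], [9]]
```

The pairwise loop mirrors the steps but is quadratic in the class size. Grouping with a dict keyed on `(org_code, name)` yields the same groups in a single pass.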
For the process of fuzzy identification, please refer to FIG. 6, which is a flowchart of step S33 in the information uniqueness identification method provided by the present application. As shown in FIG. 6, step S33 includes:
S331. Determine whether the character count of the customer name of each group customer in the fuzzy identification class is greater than or equal to a preset threshold;
S332. If it is greater than or equal to the preset threshold, perform uniqueness identification according to the character count and text content of the group customer's name;
S333. If it is less than the preset threshold, perform uniqueness identification according to the group customer's identification information and the character count and text content of its customer name.
That is, during fuzzy identification, the identification process depends on the character count of the group customer's name. First, it is determined whether the customer name of each group customer in the fuzzy identification class has a character count greater than or equal to the preset threshold. In this embodiment the threshold is preferably 8, so the test is whether the name has at least 8 characters. If it does, the name is long enough that two different customers are very unlikely to share it, so uniqueness identification uses the character count and text content of the customer name alone. If it has fewer than 8 characters, further confirmation is needed, so uniqueness identification uses both the character count and text content of the customer name and the customer's identification information. Splitting customers by name length in this way preserves both identification accuracy and identification efficiency.
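Step S331 is a routing decision on name length. The Python sketch below uses the preferred threshold of 8 from this embodiment. It assumes dict records with a hypothetical `name` key and treats a missing name as empty, which is an assumption. Python's `len()` counts Unicode code points, and for ordinary Chinese company names each character is one code point.

```python
from typing import Dict, List, Optional, Tuple

Record = Dict[str, Optional[str]]

NAME_LENGTH_THRESHOLD = 8  # preferred value stated in this embodiment


def route_fuzzy(records: List[Record], fuzzy: List[int],
                threshold: int = NAME_LENGTH_THRESHOLD) -> Tuple[List[int], List[int]]:
    """S331 sketch: split fuzzy-class indices into (long_names, short_names).

    len() counts Unicode code points; for ordinary Chinese company names each
    character is one code point, so this matches the character count used here.
    """
    long_names: List[int] = []
    short_names: List[int] = []
    for i in fuzzy:
        name = records[i].get("name") or ""  # a missing name is treated as empty (assumption)
        if len(name) >= threshold:
            long_names.append(i)   # S332: match on the name alone
        else:
            short_names.append(i)  # S333: match on name plus identification information
    return long_names, short_names
```

Because the comparison is `>=`, a name of exactly eight characters takes the name-only path, as the text specifies.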
Specifically, refer to FIG. 7, which is a flowchart of step S332 in the information uniqueness identification method provided by the present application. As shown in FIG. 7, step S332 includes:
S3321. Arbitrarily select one group customer in the fuzzy identification class whose customer name has at least the preset threshold number of characters, and compare the text content of its customer name with the customer names of all group customers;
S3322. Determine, based on the comparison result, whether any group customer has a customer name with identical text content. If so, identify each group customer with identical text content as the same customer as the selected group customer; if not, identify the selected group customer as a customer different from all other group customers;
S3323. Continue by selecting another group customer in the fuzzy identification class whose customer name has at least the preset threshold number of characters and perform uniqueness identification on it, until every such group customer in the fuzzy identification class has been identified.
When performing fuzzy identification for customer names with at least the preset threshold number of characters (8 characters in this embodiment), a group customer in the fuzzy identification class whose name has 8 or more characters is first selected arbitrarily, and its customer name is compared with the customer names of all group customers (including those in both the precise identification class and the fuzzy identification class). Uniqueness identification is then performed on the comparison result by determining whether any group customer has a customer name with identical text content. If so, every group customer with an identical name, however many there are, is identified as the same customer as the selected group customer, and customers already identified as the same customer need not be identified again in subsequent fuzzy identification, which saves identification time. If not, the selected group customer is identified as different from all other group customers. Another group customer in the fuzzy identification class whose name has 8 or more characters is then selected and the process is repeated, until every group customer in the fuzzy identification class with a name of 8 or more characters has been identified.
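Steps S3321 to S3323 can be sketched as below. This is a minimal illustration, not the applicant's implementation: it assumes customers are held in a dictionary of `customer_id -> name`, and it indexes all names once in a hash map instead of re-scanning every customer for each selection, which gives the same grouping as the pick-and-compare loop. The function and parameter names are hypothetical.

```python
from collections import defaultdict


def identify_long_names(all_customers, fuzzy_ids, threshold=8):
    """Group long-name fuzzy customers with every customer sharing the exact name.

    all_customers: dict customer_id -> name, covering both the precise and the
                   fuzzy identification classes (hypothetical layout).
    fuzzy_ids:     ids of customers marked as the fuzzy identification class.
    Returns a list of groups; each group is a sorted list of ids treated as one customer.
    """
    # Index every name once; a lookup then replaces a full comparison pass.
    by_name = defaultdict(list)
    for cid, name in all_customers.items():
        by_name[name].append(cid)

    identified = set()  # customers already grouped are not identified again
    groups = []
    for cid in fuzzy_ids:
        name = all_customers[cid]
        if len(name) < threshold or cid in identified:
            continue  # short names belong to S333; grouped ones are skipped
        group = sorted(by_name[name])  # always contains cid itself
        identified.update(group)
        groups.append(group)
    return groups
```

A selected customer with no identically named peer ends up in a single-element group, which corresponds to "identified as different from all other group customers" in S3322.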
For example, suppose that among the group customers obtained from different source databases, ten are marked as the fuzzy identification class and have customer names of 8 or more characters; denote them customer 11, customer 12, ..., customer 20. During fuzzy identification, any one of them is selected to begin uniqueness identification, for example customer 11, and its customer name is compared with those of all other group customers. The comparison must cover every group customer because customer 11 may in fact be the same customer as one in the precise identification class, having been marked as the fuzzy identification class only because the organization code is missing from its identification information; comparing the selected group customer's name with the names of all other group customers ensures that identification is comprehensive and accurate. Assume the comparison shows that customer 2 in the precise identification class and customers 13 and 14 in the fuzzy identification class have customer names whose text content is identical to that of customer 11. Customers 2, 11, 13 and 14 are then identified as the same customer, and the customer data of the four can be integrated. Next, the customer name of customer 12 is compared with those of the other six group customers, and no group customer is found whose name is identical to that of customer 12, so customer 12 is identified as different from all other group customers. Customers 15, 16, ..., 20 then undergo uniqueness identification in turn by the same process, completing uniqueness identification for the group customers whose names have 8 or more characters.
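The worked example can be replayed on hypothetical data. The customer names below are placeholders invented for illustration (the source gives none); only the customer numbers and the outcome, that customers 2, 11, 13 and 14 form one customer and customer 12 stands alone, come from the text. Grouping uses an exact-name index, which is one possible way to carry out the comparison.

```python
from collections import defaultdict

# Hypothetical data mirroring the example: customer 2 is in the precise class,
# customers 11-20 are fuzzy-class customers whose names have >= 8 characters.
SHARED = "示例甲科技股份有限公司"  # placeholder name, not from the source
names = {2: SHARED, 11: SHARED, 13: SHARED, 14: SHARED}
names.update({cid: f"示例客户{cid}号有限公司" for cid in (12, *range(15, 21))})

# Index every customer's name so each selected customer is matched in one lookup.
by_name = defaultdict(list)
for cid, name in names.items():
    by_name[name].append(cid)

identified, groups = set(), []
for cid in range(11, 21):  # customer 11 is selected first, as in the text
    if cid in identified:
        continue  # 13 and 14 are skipped once they have been matched with 11
    group = sorted(by_name[names[cid]])
    identified.update(group)
    groups.append(group)

print(groups[0])  # [2, 11, 13, 14] -> one customer whose data can be integrated
print(groups[1])  # [12] -> different from all other group customers
```

Customers 15 to 20 each end up in their own group here only because their placeholder names are all distinct.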
Further, refer to FIG. 8, which is a flowchart of step S333 in the information uniqueness identification method provided by the present application. As shown in FIG. 8, step S333 includes:
S3331. Arbitrarily select one group customer in the fuzzy identification class whose customer name has fewer characters than the preset threshold, and compare the text content of its customer name with the customer names of all group customers;
S3332. Determine, based on the comparison result, whether any group customer has a customer name with identical text content. If so, further determine whether each such group customer shares any identical identification information with the selected group customer, and identify every group customer whose name has identical text content and which shares any identical identification information as the same customer as the selected group customer; if not, identify the selected group customer as a customer different from all other group customers;
S3333. Continue by selecting another group customer in the fuzzy identification class whose customer name has fewer characters than the preset threshold and perform uniqueness identification on it, until every such group customer in the fuzzy identification class has been identified.
When performing fuzzy identification for customer names with fewer characters than the preset threshold (8 characters in this embodiment), a group customer in the fuzzy identification class whose name has fewer than 8 characters is first selected arbitrarily, and its customer name is compared with the customer names of all group customers (including those in both the precise identification class and the fuzzy identification class). Uniqueness identification is then performed on the comparison result by determining whether any group customer has a customer name with identical text content. If so, because the selected group customer's name is short and therefore more likely to be shared by different customers, it is further determined whether each group customer with an identical name shares any identical identification information with the selected group customer, for example whether at least one of the business registration number, tax registration number and business license number is the same. Group customers whose customer name has identical text content and which share any identical identification information are identified as the same customer as the selected group customer. If not, the selected group customer is identified as different from all other group customers. Another group customer in the fuzzy identification class whose name has fewer than 8 characters is then selected and the process is repeated, until every group customer in the fuzzy identification class with a name of fewer than 8 characters has been identified. Combining the identification information with the customer name in this way completes uniqueness identification for the customers in the fuzzy identification class whose names have fewer than 8 characters.
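Steps S3331 to S3333 add one condition to the long-name rule: an identical name counts only if the two records also share at least one identifier. The sketch below assumes each customer is a dictionary with a `name` key and optional identifier fields; the field names `business_reg_no`, `tax_reg_no` and `license_no` are hypothetical stand-ins for the business registration, tax registration and business license numbers named in the text.

```python
# Hypothetical field names for the identifiers mentioned in the text.
ID_FIELDS = ("business_reg_no", "tax_reg_no", "license_no")


def share_any_id(a: dict, b: dict) -> bool:
    """True if both records carry an equal, non-empty value in any ID field."""
    return any(a.get(f) and a.get(f) == b.get(f) for f in ID_FIELDS)


def identify_short_names(customers, fuzzy_ids, threshold=8):
    """customers: id -> {"name": str, plus optional ID_FIELDS entries}."""
    identified, groups = set(), []
    for cid in fuzzy_ids:
        rec = customers[cid]
        if len(rec["name"]) >= threshold or cid in identified:
            continue  # long names belong to S332; grouped ones are skipped
        # For short names, an identical name alone is not enough (S3332):
        # the candidate must also share at least one identifier with rec.
        matches = [other for other, o in customers.items()
                   if other != cid and other not in identified
                   and o["name"] == rec["name"] and share_any_id(rec, o)]
        group = sorted([cid, *matches])
        identified.update(group)
        groups.append(group)
    return groups
```

Two records with the same short name but no overlapping identifier stay in separate groups. This is the conservative behaviour the text motivates by the higher risk of duplicate short names.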
S40. Obtain the results of precise identification and fuzzy identification and, based on these results, integrate the basic information of all group customers that are the same customer to obtain uniquely identified group customer information.
In this embodiment, once precise identification and fuzzy identification have been performed on the respective classes of group customers, all group customers that are the same customer have been identified. The results of precise identification and fuzzy identification are then obtained, and the basic information of the group customers that are the same customer is integrated. Specifically, the customer names of all records belonging to the same customer are unified, and their respective identification information is merged so that each record fills gaps in the others, yielding uniquely identified group customer information. With this information, the customer data that a group customer has stored in different source databases can be retrieved in one step, so that data for the same customer across different source databases can be integrated and analyzed together. Data analysis can then be performed on the identified and integrated customer data; combined with the user data stored by the group's different industry companies, it allows comprehensive analysis of the group customer's data life cycle, consumption tendencies, risk control information and so on, which facilitates follow-up and management of group customers.
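The integration step can be sketched as a merge over the records judged to be one customer. Under an assumed policy, the first record's name becomes the unified name and each identifier field takes the first non-empty value found, so later sources fill gaps left by earlier ones. The text only says names are unified and identification information is made complementary; it does not specify a tie-break, and the `source` key is a hypothetical field.

```python
def integrate(group_records):
    """Merge records judged to be one customer into a single profile.

    Assumed policy (not specified in the source): the first record's name is
    the unified name, and for each other field the first non-empty value wins.
    """
    merged = {"name": group_records[0]["name"], "sources": []}
    for rec in group_records:
        merged["sources"].append(rec.get("source"))  # remember where data came from
        for field, value in rec.items():
            if field in ("name", "source"):
                continue
            if value and not merged.get(field):
                merged[field] = value  # fill a gap from another source database
    return merged


r1 = {"name": "X公司", "source": "db1", "org_code": "O1", "tax_reg_no": ""}
r2 = {"name": "X公司", "source": "db2", "tax_reg_no": "T9"}
print(integrate([r1, r2]))
```

Keeping the list of sources lets the merged profile point back to each source database's copy of the customer data.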
As shown in FIG. 9, based on the above information uniqueness identification method, the present application further provides an application server for information uniqueness identification, which includes a processor 10, a memory 20 and a display 30. FIG. 9 shows only some components of the application server; it should be understood that not all of the illustrated components are required, and more or fewer components may be implemented instead.
In some embodiments, the memory 20 may be an internal storage unit of the application server for information uniqueness identification, such as a hard disk or memory of the application server. In other embodiments, the memory 20 may be an external storage device of the application server, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card or a flash card provided on the application server. Further, the memory 20 may include both an internal storage unit and an external storage device of the application server. The memory 20 stores the application software installed on the application server and various types of data, such as the program code for information uniqueness identification. The memory 20 may also be used to temporarily store data that has been output or is about to be output. In some embodiments, an information uniqueness identification program 40 is stored in the memory 20, and the program 40 can be executed by the processor 10 to implement the information uniqueness identification method of the embodiments of the present application.
In some embodiments, the processor 10 may be a central processing unit (CPU), a microprocessor or another data processing chip, used to run the program code stored in the memory 20 or to process data, for example to execute the information uniqueness identification method.
In some embodiments, the display 30 may be an LED display, a liquid crystal display, a touch liquid crystal display, an OLED (organic light-emitting diode) touch device, or the like. The display 30 is used to display information on the application server for information uniqueness identification and to present a visual user interface, such as an identification result interface. The components 10 to 30 of the application server communicate with one another via a system bus.
In some embodiments, when the processor 10 executes the information uniqueness identification program 40 in the memory 20, it carries out the same steps as the embodiments of the information uniqueness identification method described above, which are not repeated here.
Refer to FIG. 10, which is a functional module diagram of a preferred embodiment of the application server on which the information uniqueness identification program of the present application is installed. In this embodiment, the application server on which the program is installed may be divided into one or more modules, which are stored in the memory 20 and executed by one or more processors (the processor 10 in this embodiment) to implement the present application. For example, in FIG. 10, the application server may be divided into an acquisition module 21, a classification module 22, an identification module 23 and an integration module 24. A module in this application refers to a series of computer program instruction segments capable of performing a specific function; modules are better suited than the program as a whole to describing how the group customer uniqueness identification program executes on the application server. The functions of modules 21 to 24 are described below.
The acquisition module 21 is configured to obtain the basic information of the group customers stored in each source database, where the basic information includes a customer name and identification information;
The classification module 22 is configured to mark each group customer as the precise identification class or the fuzzy identification class according to the identification information;
The identification module 23 is configured to perform precise identification on the group customers in the precise identification class and fuzzy identification on those in the fuzzy identification class according to preset identification rules, and to identify group customers that are the same customer;
The integration module 24 is configured to obtain the results of precise identification and fuzzy identification and, based on these results, integrate the basic information of all group customers that are the same customer to obtain uniquely identified group customer information.
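The four modules can be wired together in a compact end-to-end sketch. The class names mirror modules 21 to 24, but everything else is assumed for illustration. The sources are in-memory lists rather than databases, `org_code` stands in for the preset precise identification information (inferred from the example in which a missing organization code leads to the fuzzy class), and the identifier field names are hypothetical. The pairing rules follow the text: precise pairs need the same identifier and name, long names match on name alone, and short names also need a shared identifier. Bookkeeping, however, uses a union-find structure, a swapped-in technique that makes matches transitive, which the patent's pick-and-compare procedure does not strictly require.

```python
from collections import defaultdict

THRESHOLD = 8                # preset name-length threshold (8 in this embodiment)
PRECISE_FIELD = "org_code"   # assumption: the preset precise identification info
ID_FIELDS = ("org_code", "business_reg_no", "tax_reg_no", "license_no")  # hypothetical


class AcquisitionModule:     # module 21: collect basic information
    def fetch(self, sources):
        """sources: source name -> list of record dicts (stand-in for databases)."""
        return [dict(rec, source=src) for src, recs in sources.items() for rec in recs]


class ClassificationModule:  # module 22: precise vs. fuzzy identification class
    def classify(self, records):
        return ["precise" if r.get(PRECISE_FIELD) else "fuzzy" for r in records]


class IdentificationModule:  # module 23: find records belonging to one customer
    def identify(self, records, labels):
        parent = list(range(len(records)))  # union-find, a swapped-in bookkeeping aid

        def find(i):
            while parent[i] != i:
                parent[i] = parent[parent[i]]  # path halving
                i = parent[i]
            return i

        for i in range(len(records)):
            for j in range(i + 1, len(records)):
                a, b = records[i], records[j]
                if a["name"] != b["name"]:
                    continue  # every rule requires identical name text
                if labels[i] == labels[j] == "precise":
                    same = a[PRECISE_FIELD] == b[PRECISE_FIELD]
                else:  # fuzzy rule: long names suffice, short ones need a shared ID
                    same = len(a["name"]) >= THRESHOLD or any(
                        a.get(f) and a.get(f) == b.get(f) for f in ID_FIELDS)
                if same:
                    parent[find(i)] = find(j)

        groups = defaultdict(list)
        for i in range(len(records)):
            groups[find(i)].append(i)
        return list(groups.values())


class IntegrationModule:     # module 24: merge each group into one profile
    def integrate(self, records, groups):
        profiles = []
        for idxs in groups:
            profile = {"name": records[idxs[0]]["name"], "sources": []}
            for i in idxs:
                profile["sources"].append(records[i]["source"])
                for f in ID_FIELDS:
                    if records[i].get(f) and not profile.get(f):
                        profile[f] = records[i][f]
            profiles.append(profile)
        return profiles


sources = {
    "source_a": [{"name": "示例甲科技股份有限公司", "org_code": "ORG-1"}],
    "source_b": [{"name": "示例甲科技股份有限公司", "tax_reg_no": "TAX-1"},
                 {"name": "某某公司", "tax_reg_no": "TAX-2"}],
}
records = AcquisitionModule().fetch(sources)
labels = ClassificationModule().classify(records)
groups = IdentificationModule().identify(records, labels)
profiles = IntegrationModule().integrate(records, groups)
print(len(profiles))  # 2
```

In this toy run, the precise record from `source_a` and the fuzzy record from `source_b` share a name of 11 characters, so they merge into one profile carrying both `ORG-1` and `TAX-1`.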
The classification module 22 includes:
a parsing unit, configured to parse the identification information of each group customer;
a classification unit, configured to determine whether the identification information of each group customer contains preset precise identification information, and if so, to mark the group customer as the precise identification class, or otherwise as the fuzzy identification class.
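The parsing and classification units can be sketched as follows. Both the storage format of the identification information (a `key=value;key=value` string) and the choice of `org_code` as the preset precise identification information are assumptions made for illustration; the source does not specify either.

```python
PRECISE_KEYS = ("org_code",)  # assumed preset precise identification information


def parse_ids(raw: str) -> dict:
    """Parsing unit: split an identification string like 'org_code=X;tax_reg_no=Y'.

    The 'key=value;key=value' layout is a hypothetical storage format.
    Empty values and malformed parts are dropped.
    """
    ids = {}
    for part in raw.split(";"):
        key, sep, value = part.partition("=")
        if sep and value.strip():
            ids[key.strip()] = value.strip()
    return ids


def classify(raw: str) -> str:
    """Classification unit: precise only if every preset precise key is present."""
    ids = parse_ids(raw)
    return "precise" if all(k in ids for k in PRECISE_KEYS) else "fuzzy"


print(classify("org_code=ORG-001;tax_reg_no=T1"))  # precise
print(classify("tax_reg_no=T1;org_code="))         # fuzzy: the code is empty
```

Treating an empty value as missing matters here, because a blank organization code should send the customer to the fuzzy class rather than the precise class.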
The identification module 23 includes:
a detection unit, configured to perform text detection on the customer names of all group customers to obtain the character count and text content of each customer name;
a precise identification unit, configured to perform uniqueness identification based on the preset precise identification information and customer name of each group customer in the precise identification class, and to identify group customers in the precise identification class that are the same customer;
a fuzzy identification unit, configured to perform uniqueness identification based on the identification information, customer name character count and text content of each group customer in the fuzzy identification class, and to identify group customers in the fuzzy identification class that are the same customer.
The precise identification unit includes:
a first comparison subunit, configured to arbitrarily select one group customer in the precise identification class and compare its preset precise identification information and customer name with the preset precise identification information and customer names of the other group customers in the precise identification class;
a first identification subunit, configured to determine, based on the comparison result, whether any group customer has the same preset precise identification information and the same customer name, and if so, to identify such group customers as the same customer as the selected group customer, or otherwise to identify the selected group customer as a customer different from the other group customers in the precise identification class.
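Within the precise identification class, two customers are the same only if both the preset precise identifier and the name match. The comparison therefore reduces to grouping by the pair `(identifier, name)`, which yields the same partition as repeatedly picking one customer and comparing it with the rest. This is a minimal sketch; `org_code` as the key field and the record layout are assumptions.

```python
from collections import defaultdict


def identify_precise(customers, key_field="org_code"):
    """customers: id -> {"name": ..., key_field: ...}, precise-class customers only.

    Two customers are the same customer only if both the preset precise
    identifier (assumed here to be key_field) and the name are identical.
    """
    groups = defaultdict(list)
    for cid, rec in customers.items():
        groups[(rec[key_field], rec["name"])].append(cid)
    # Sort for a deterministic result; each inner list is one customer.
    return sorted(sorted(g) for g in groups.values())


customers = {
    "p1": {"name": "甲公司", "org_code": "O1"},
    "p2": {"name": "甲公司", "org_code": "O1"},
    "p3": {"name": "甲公司", "org_code": "O2"},  # same name, different code
    "p4": {"name": "乙公司", "org_code": "O1"},  # same code, different name
}
print(identify_precise(customers))  # [['p1', 'p2'], ['p3'], ['p4']]
```

Requiring both fields to match means that a shared identifier with a different name stays separate, as does a shared name with a different identifier. Either case could indicate a data-entry error worth reviewing.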
The fuzzy identification unit includes:
a first determination subunit, configured to determine whether the customer name of each group customer in the fuzzy identification class has a character count greater than or equal to the preset threshold;
a second identification subunit, configured to perform uniqueness identification based on the character count and text content of the group customer's name if the character count is greater than or equal to the preset threshold;
a third identification subunit, configured to perform uniqueness identification based on the group customer's identification information and the character count and text content of its name if the character count is less than the preset threshold.
The second identification subunit includes:
a second comparison subunit, configured to arbitrarily select one group customer in the fuzzy identification class whose customer name has at least the preset threshold number of characters, and compare the text content of its customer name with the customer names of all group customers;
a second determination subunit, configured to determine, based on the comparison result, whether any group customer has a customer name with identical text content, and if so, to identify such group customers as the same customer as the selected group customer, or otherwise to identify the selected group customer as a customer different from all other group customers.
The third identification subunit includes:
a third comparison subunit, configured to arbitrarily select one group customer in the fuzzy identification class whose customer name has fewer characters than the preset threshold, and compare the text content of its customer name with the customer names of all group customers;
a third determination subunit, configured to determine, based on the comparison result, whether any group customer has a customer name with identical text content, and if so, to further determine whether each such group customer shares any identical identification information with the selected group customer and to identify group customers whose names have identical text content and which share any identical identification information as the same customer as the selected group customer, or otherwise to identify the selected group customer as a customer different from all other group customers.
Based on the above information uniqueness identification method and application server, the present application further provides an information uniqueness identification system. Referring to FIG. 11, the system includes several source databases 110 and the application server 120 for information uniqueness identification described above.
Each source database 110 stores the basic information of group customers. The application server 120 is configured to: obtain the basic information of the group customers stored in each source database, where the basic information includes a customer name and identification information; mark each group customer as the precise identification class or the fuzzy identification class according to the identification information; perform precise identification and fuzzy identification on the group customers in the precise identification class and the fuzzy identification class respectively according to preset identification rules, identifying group customers that are the same customer; and obtain the results of precise identification and fuzzy identification and, based on these results, integrate the basic information of all group customers that are the same customer to obtain uniquely identified group customer information.
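The acquisition side of the system, in which the application server reads basic information from several source databases, can be sketched with Python's built-in `sqlite3` module. Each in-memory database stands in for one source database 110. The table name `group_customer` and its columns are a hypothetical schema, since the source does not describe how the databases are organized.

```python
import sqlite3


def make_source_db(rows):
    """Create an in-memory stand-in for one source database (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE group_customer (name TEXT, org_code TEXT, tax_reg_no TEXT)")
    conn.executemany("INSERT INTO group_customer VALUES (?, ?, ?)", rows)
    return conn


def fetch_basic_info(sources):
    """Application-server side: pull the name and identification info from every source."""
    records = []
    for source_name, conn in sources.items():
        cur = conn.execute("SELECT name, org_code, tax_reg_no FROM group_customer")
        for name, org_code, tax_reg_no in cur:
            records.append({"source": source_name, "name": name,
                            "org_code": org_code, "tax_reg_no": tax_reg_no})
    return records


sources = {
    "source_a": make_source_db([("示例甲科技股份有限公司", "ORG-1", None)]),
    "source_b": make_source_db([("示例甲科技股份有限公司", None, "TAX-1")]),
}
records = fetch_basic_info(sources)
print(len(records))  # 2
```

Tagging each record with its source name keeps the link back to the originating database. Integrated customer data needs that link in order to collect everything one group customer has stored across sources.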
The workflow of the information uniqueness identification system in this embodiment follows the same steps as the embodiments of the information uniqueness identification method described above, and is not repeated here.
In summary, in the information uniqueness identification method, application server, system and storage medium provided by the present application, the method obtains the basic information of the group customers stored in each source database, the basic information including a customer name and identification information. It then marks each group customer as the precise identification class or the fuzzy identification class according to the identification information. Next, it performs precise identification and fuzzy identification on the group customers in the two classes according to preset identification rules to identify group customers that are the same customer. Finally, it obtains the results of precise identification and fuzzy identification and integrates the basic information of all group customers that are the same customer to obtain uniquely identified group customer information. Because precise identification or fuzzy identification is applied according to the type of each group customer, the method solves the problem that some group customers could not be uniquely identified, and their data could not be integrated, because their customer information was incomplete.
Those of ordinary skill in the art will understand that all or part of the processes in the above method embodiments may be implemented by a computer program instructing related hardware (such as a processor or a controller). The program may be stored in a computer-readable storage medium and, when executed, may include the processes of the method embodiments described above. The storage medium may be a memory, a magnetic disk, an optical disc or the like.
It should be understood that the application of the present application is not limited to the above examples. Those of ordinary skill in the art may make improvements or modifications based on the above description, and all such improvements and modifications shall fall within the protection scope of the appended claims.

Claims (28)

  1. An information uniqueness identification method, comprising the following steps:
    obtaining basic information of group customers stored in each source database, wherein the basic information includes a customer name and identification information;
    marking each group customer as a precise identification class or a fuzzy identification class according to the identification information;
    performing precise identification and fuzzy identification on the group customers in the precise identification class and the fuzzy identification class respectively according to preset identification rules, to identify group customers that are the same customer;
    obtaining the identification results of the precise identification and the fuzzy identification, and integrating, according to the identification results, the basic information of all group customers that are the same customer to obtain uniquely identified group customer information.
  2. The information uniqueness identification method according to claim 1, wherein the step of marking each group customer as a precise identification class or a fuzzy identification class according to the identification information comprises:
    parsing the identification information of each group customer;
    determining whether the identification information of each group customer contains preset precise identification information, and if so, marking the group customer as the precise identification class; otherwise, marking it as the fuzzy identification class.
  3. The information uniqueness identification method according to claim 2, wherein the step of performing precise identification and fuzzy identification on the group customers in the precise identification class and the fuzzy identification class respectively according to preset identification rules, to identify group customers that are the same customer, comprises:
    performing text detection on the customer names of all group customers to obtain the character count and text content of each customer name;
    performing uniqueness identification according to the preset precise identification information and customer name of each group customer in the precise identification class, to identify group customers in the precise identification class that are the same customer;
    performing uniqueness identification according to the identification information, customer name character count and text content of each group customer in the fuzzy identification class, to identify group customers in the fuzzy identification class that are the same customer.
  4. The information uniqueness identification method according to claim 3, wherein the step of performing uniqueness identification according to the preset precise identification information and the customer name of each group customer in the precise identification class, to identify group customers in the precise identification class that are the same customer as one another, comprises:
    arbitrarily selecting one group customer in the precise identification class, and comparing its preset precise identification information and customer name with the preset precise identification information and customer names of the other group customers in the precise identification class;
    determining, according to the comparison result, whether there is a group customer having the same preset precise identification information and the same customer name; if so, identifying each such group customer and the selected group customer as the same customer; if not, identifying the selected group customer and the other group customers in the precise identification class as different customers;
    continuing to select another group customer in the precise identification class and performing uniqueness identification against the other group customers, until all group customers in the precise identification class have been identified.
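The select-and-compare loop of claim 4 can be sketched in Python as follows. `GroupCustomer`, `record_id`, and `precise_id` are hypothetical names; `precise_id` stands in for the preset precise identification information, whose concrete fields the claims leave open. Because the claim requires exact equality of both values, the sketch buckets records on that pair instead of comparing every pair.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class GroupCustomer:
    record_id: str   # hypothetical key of the record in its source database
    name: str        # customer name, assumed already normalized
    precise_id: str  # hypothetical stand-in for the preset precise identification info


def identify_precise(customers: list[GroupCustomer]) -> list[list[str]]:
    """Group precise-class customers whose precise ID and name both match.

    The claim compares one selected customer against all the others and
    repeats until every customer is identified. Exact equality is
    transitive, so bucketing on the (precise_id, name) pair yields the same
    groups in one O(n) pass instead of O(n^2) pairwise comparisons.
    """
    buckets: dict[tuple[str, str], list[str]] = defaultdict(list)
    for c in customers:
        buckets[(c.precise_id, c.name)].append(c.record_id)
    # A singleton group is a customer identified as different from all others.
    return list(buckets.values())
```

Two records that share both the precise ID and the name land in one group; a record that differs in either value forms its own group.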
  5. The information uniqueness identification method according to claim 3, wherein the step of performing uniqueness identification according to the identification information, the customer-name character count, and the text content of each group customer in the fuzzy identification class, to identify group customers in the fuzzy identification class that are the same customer as one another, comprises:
    determining whether the customer-name character count of each group customer in the fuzzy identification class is greater than or equal to a preset threshold;
    if it is greater than or equal to the preset threshold, performing uniqueness identification according to the character count and text content of the group customer's customer name;
    if it is less than the preset threshold, performing uniqueness identification according to the group customer's identification information and the character count and text content of its customer name.
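A sketch of the branch in claim 5, assuming each fuzzy-class customer is a dict with a `name` key (a hypothetical record shape). The claims leave the preset threshold unspecified, so `threshold` is a parameter. A name exactly `threshold` characters long goes to the name-only branch, as "greater than or equal to" requires.

```python
def split_fuzzy_by_length(
    customers: list[dict], threshold: int
) -> tuple[list[dict], list[dict]]:
    """Split fuzzy-class customers by customer-name character count.

    Names with at least `threshold` characters go to the name-only branch;
    shorter names go to the branch that also checks identification info.
    """
    long_names: list[dict] = []
    short_names: list[dict] = []
    for c in customers:
        if len(c["name"]) >= threshold:  # ">=" per the claim
            long_names.append(c)
        else:
            short_names.append(c)
    return long_names, short_names
```

Presumably the split exists because short names are more likely to be shared by unrelated organizations and therefore need corroborating identification information; the claims themselves do not state the rationale.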
  6. The information uniqueness identification method according to claim 5, wherein the step of, if the character count is greater than or equal to the preset threshold, performing uniqueness identification according to the character count and text content of the group customer's customer name comprises:
    arbitrarily selecting one group customer in the fuzzy identification class whose customer-name character count is greater than or equal to the preset threshold, and comparing the text content of its customer name with the customer names of all group customers;
    determining, according to the comparison result, whether there is a group customer whose customer name has exactly the same text content; if so, identifying each group customer with exactly the same text content and the selected group customer as the same customer; if not, identifying the selected group customer and all other group customers as different customers;
    continuing to select another group customer in the fuzzy identification class whose customer-name character count is greater than or equal to the preset threshold for uniqueness identification, until all group customers in the fuzzy identification class whose customer-name character count is greater than or equal to the preset threshold have been identified.
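Claim 6 compares the name of each long-name fuzzy-class customer against the names of all group customers, not only those in the fuzzy class. A minimal Python sketch under that reading, with hypothetical `id` and `name` dict keys and names assumed already normalized:

```python
from collections import defaultdict


def match_long_names(
    fuzzy_long: list[dict], all_customers: list[dict]
) -> dict[str, list[str]]:
    """For each long-name fuzzy customer, list the other customers whose
    name text is exactly identical.

    Candidates come from all group customers, as the claim states. An
    empty list means the selected customer is identified as different
    from every other customer.
    """
    by_name: dict[str, list[str]] = defaultdict(list)
    for c in all_customers:
        by_name[c["name"]].append(c["id"])
    return {
        c["id"]: [other for other in by_name[c["name"]] if other != c["id"]]
        for c in fuzzy_long
    }
```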
  7. The information uniqueness identification method according to claim 5, wherein the step of, if the character count is less than the preset threshold, performing uniqueness identification according to the group customer's identification information and the character count and text content of its customer name comprises:
    arbitrarily selecting one group customer in the fuzzy identification class whose customer-name character count is less than the preset threshold, and comparing the text content of its customer name with the customer names of all group customers;
    determining, according to the comparison result, whether there is a group customer whose customer name has exactly the same text content; if so, further determining whether that group customer and the selected group customer share any identical identification information, and identifying the selected group customer and each group customer that has exactly the same text content and shares any identical identification information as the same customer; if not, identifying the selected group customer and all other group customers as different customers;
    continuing to select another group customer in the fuzzy identification class whose customer-name character count is less than the preset threshold for uniqueness identification, until all group customers in the fuzzy identification class whose customer-name character count is less than the preset threshold have been identified.
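Claim 7 adds a second condition for short names: the name text must match exactly and the two records must share at least one identical piece of identification information. The sketch below assumes identification information is a dict of field name to value under a hypothetical `ids` key; field names such as `phone` are illustrative only, and empty values are assumed not to count as a match.

```python
def shares_any_id(a: dict[str, str], b: dict[str, str]) -> bool:
    """True if two identification-info dicts agree on at least one non-empty field."""
    return any(value and b.get(field) == value for field, value in a.items())


def match_short_names(
    fuzzy_short: list[dict], all_customers: list[dict]
) -> dict[str, list[str]]:
    """For each short-name fuzzy customer, list the other customers whose
    name text is identical AND who share any identification information."""
    result: dict[str, list[str]] = {}
    for c in fuzzy_short:
        result[c["id"]] = [
            o["id"]
            for o in all_customers
            if o["id"] != c["id"]
            and o["name"] == c["name"]
            and shares_any_id(c["ids"], o["ids"])
        ]
    return result
```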
  8. An application server for information uniqueness identification, comprising: a processor, a memory, and a communication bus;
    the memory stores a computer-readable program executable by the processor;
    the communication bus implements connection and communication between the processor and the memory;
    the processor, when executing the computer-readable program, implements the following steps:
    obtaining basic information of group customers stored in each source database, wherein the basic information includes a customer name and identification information;
    marking each group customer as belonging to a precise identification class or a fuzzy identification class according to the identification information;
    performing precise identification and fuzzy identification, according to a preset identification rule, on the group customers in the precise identification class and the fuzzy identification class respectively, to identify group customers that are the same customer as one another;
    obtaining the identification results of the precise identification and the fuzzy identification, and integrating, according to the identification results, the basic information of all group customers that are the same customer as one another, to obtain uniquely identified group customer information.
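Claim 8 does not say how the basic information of records identified as the same customer is integrated. One possible sketch, assuming the identification steps emit pairs of record IDs judged to be the same customer: a union-find structure groups the pairs into clusters (which makes "same customer" transitive, an assumption the claims do not state), and each cluster is merged with a first-non-empty-value-wins policy. All names here are hypothetical.

```python
def merge_same_customers(
    records: dict[str, dict], same_pairs: list[tuple[str, str]]
) -> list[dict]:
    """Consolidate records judged to be the same customer.

    records:    record_id -> basic information (name plus identification fields)
    same_pairs: (record_id, record_id) pairs from precise and fuzzy identification
    Returns one merged dict per uniquely identified customer.
    """
    parent = {rid: rid for rid in records}

    def find(x: str) -> str:
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in same_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    clusters: dict[str, list[str]] = {}
    for rid in records:
        clusters.setdefault(find(rid), []).append(rid)

    merged = []
    for members in clusters.values():
        info: dict = {"source_records": members}
        for rid in members:
            for field, value in records[rid].items():
                # Assumed policy: the first non-empty value seen for a field wins.
                if value and field not in info:
                    info[field] = value
        merged.append(info)
    return merged
```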
  9. The application server for information uniqueness identification according to claim 8, wherein the step of marking each group customer as belonging to a precise identification class or a fuzzy identification class according to the identification information comprises:
    parsing the identification information of each group customer;
    determining whether the identification information of each group customer contains preset precise identification information; if so, marking the group customer as belonging to the precise identification class; otherwise, marking it as belonging to the fuzzy identification class.
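The marking step of claim 9 reduces to a membership test once identification information has been parsed. In this sketch, parsing is assumed to have already produced a dict, and `PRECISE_FIELDS` with its `registration_no` entry is a hypothetical stand-in for the preset precise identification information, which the claims do not enumerate.

```python
# Hypothetical: which identification fields count as "preset precise
# identification information". The claims do not name them.
PRECISE_FIELDS = ("registration_no",)


def classify(identification: dict[str, str]) -> str:
    """Mark a group customer as 'precise' if its parsed identification info
    contains a non-empty preset precise field, otherwise as 'fuzzy'."""
    if any(identification.get(field) for field in PRECISE_FIELDS):
        return "precise"
    return "fuzzy"
```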
  10. The application server for information uniqueness identification according to claim 9, wherein the step of performing precise identification and fuzzy identification, according to the preset identification rule, on the group customers in the precise identification class and the fuzzy identification class respectively, to identify group customers that are the same customer as one another, comprises:
    performing text detection on the customer names of all group customers to obtain the character count and text content of each customer name;
    performing uniqueness identification according to the preset precise identification information and the customer name of each group customer in the precise identification class, to identify group customers in the precise identification class that are the same customer as one another;
    performing uniqueness identification according to the identification information, the customer-name character count, and the text content of each group customer in the fuzzy identification class, to identify group customers in the fuzzy identification class that are the same customer as one another.
  11. The application server for information uniqueness identification according to claim 10, wherein the step of performing uniqueness identification according to the preset precise identification information and the customer name of each group customer in the precise identification class, to identify group customers in the precise identification class that are the same customer as one another, comprises:
    arbitrarily selecting one group customer in the precise identification class, and comparing its preset precise identification information and customer name with the preset precise identification information and customer names of the other group customers in the precise identification class;
    determining, according to the comparison result, whether there is a group customer having the same preset precise identification information and the same customer name; if so, identifying each such group customer and the selected group customer as the same customer; if not, identifying the selected group customer and the other group customers in the precise identification class as different customers;
    continuing to select another group customer in the precise identification class and performing uniqueness identification against the other group customers, until all group customers in the precise identification class have been identified.
  12. The application server for information uniqueness identification according to claim 10, wherein the step of performing uniqueness identification according to the identification information, the customer-name character count, and the text content of each group customer in the fuzzy identification class, to identify group customers in the fuzzy identification class that are the same customer as one another, comprises:
    determining whether the customer-name character count of each group customer in the fuzzy identification class is greater than or equal to a preset threshold;
    if it is greater than or equal to the preset threshold, performing uniqueness identification according to the character count and text content of the group customer's customer name;
    if it is less than the preset threshold, performing uniqueness identification according to the group customer's identification information and the character count and text content of its customer name.
  13. The application server for information uniqueness identification according to claim 12, wherein the step of, if the character count is greater than or equal to the preset threshold, performing uniqueness identification according to the character count and text content of the group customer's customer name comprises:
    arbitrarily selecting one group customer in the fuzzy identification class whose customer-name character count is greater than or equal to the preset threshold, and comparing the text content of its customer name with the customer names of all group customers;
    determining, according to the comparison result, whether there is a group customer whose customer name has exactly the same text content; if so, identifying each group customer with exactly the same text content and the selected group customer as the same customer; if not, identifying the selected group customer and all other group customers as different customers;
    continuing to select another group customer in the fuzzy identification class whose customer-name character count is greater than or equal to the preset threshold for uniqueness identification, until all group customers in the fuzzy identification class whose customer-name character count is greater than or equal to the preset threshold have been identified.
  14. The application server for information uniqueness identification according to claim 13, wherein the step of, if the character count is less than the preset threshold, performing uniqueness identification according to the group customer's identification information and the character count and text content of its customer name comprises:
    arbitrarily selecting one group customer in the fuzzy identification class whose customer-name character count is less than the preset threshold, and comparing the text content of its customer name with the customer names of all group customers;
    determining, according to the comparison result, whether there is a group customer whose customer name has exactly the same text content; if so, further determining whether that group customer and the selected group customer share any identical identification information, and identifying the selected group customer and each group customer that has exactly the same text content and shares any identical identification information as the same customer; if not, identifying the selected group customer and all other group customers as different customers;
    continuing to select another group customer in the fuzzy identification class whose customer-name character count is less than the preset threshold for uniqueness identification, until all group customers in the fuzzy identification class whose customer-name character count is less than the preset threshold have been identified.
  15. A computer-readable storage medium, wherein the computer-readable storage medium stores one or more programs, and the one or more programs are executable by one or more processors to implement the following steps:
    obtaining basic information of group customers stored in each source database, wherein the basic information includes a customer name and identification information;
    marking each group customer as belonging to a precise identification class or a fuzzy identification class according to the identification information;
    performing precise identification and fuzzy identification, according to a preset identification rule, on the group customers in the precise identification class and the fuzzy identification class respectively, to identify group customers that are the same customer as one another;
    obtaining the identification results of the precise identification and the fuzzy identification, and integrating, according to the identification results, the basic information of all group customers that are the same customer as one another, to obtain uniquely identified group customer information.
  16. The computer-readable storage medium according to claim 15, wherein the one or more programs are executable by the one or more processors to further implement the following steps:
    parsing the identification information of each group customer;
    determining whether the identification information of each group customer contains preset precise identification information; if so, marking the group customer as belonging to the precise identification class; otherwise, marking it as belonging to the fuzzy identification class.
  17. The computer-readable storage medium according to claim 16, wherein the one or more programs are executable by the one or more processors to further implement the following steps:
    performing text detection on the customer names of all group customers to obtain the character count and text content of each customer name;
    performing uniqueness identification according to the preset precise identification information and the customer name of each group customer in the precise identification class, to identify group customers in the precise identification class that are the same customer as one another;
    performing uniqueness identification according to the identification information, the customer-name character count, and the text content of each group customer in the fuzzy identification class, to identify group customers in the fuzzy identification class that are the same customer as one another.
  18. The computer-readable storage medium according to claim 17, wherein the one or more programs are executable by the one or more processors to further implement the following steps:
    arbitrarily selecting one group customer in the precise identification class, and comparing its preset precise identification information and customer name with the preset precise identification information and customer names of the other group customers in the precise identification class;
    determining, according to the comparison result, whether there is a group customer having the same preset precise identification information and the same customer name; if so, identifying each such group customer and the selected group customer as the same customer; if not, identifying the selected group customer and the other group customers in the precise identification class as different customers;
    continuing to select another group customer in the precise identification class and performing uniqueness identification against the other group customers, until all group customers in the precise identification class have been identified.
  19. The computer-readable storage medium according to claim 17, wherein the one or more programs are executable by the one or more processors to further implement the following steps:
    determining whether the customer-name character count of each group customer in the fuzzy identification class is greater than or equal to a preset threshold;
    if it is greater than or equal to the preset threshold, performing uniqueness identification according to the character count and text content of the group customer's customer name;
    if it is less than the preset threshold, performing uniqueness identification according to the group customer's identification information and the character count and text content of its customer name.
  20. The computer-readable storage medium according to claim 19, wherein the one or more programs are executable by the one or more processors to further implement the following steps:
    arbitrarily selecting one group customer in the fuzzy identification class whose customer-name character count is greater than or equal to the preset threshold, and comparing the text content of its customer name with the customer names of all group customers;
    determining, according to the comparison result, whether there is a group customer whose customer name has exactly the same text content; if so, identifying each group customer with exactly the same text content and the selected group customer as the same customer; if not, identifying the selected group customer and all other group customers as different customers;
    continuing to select another group customer in the fuzzy identification class whose customer-name character count is greater than or equal to the preset threshold for uniqueness identification, until all group customers in the fuzzy identification class whose customer-name character count is greater than or equal to the preset threshold have been identified.
  21. The computer-readable storage medium according to claim 19, wherein the one or more programs are executable by the one or more processors to further implement the following steps:
    arbitrarily selecting one group customer in the fuzzy identification class whose customer-name character count is less than the preset threshold, and comparing the text content of its customer name with the customer names of all group customers;
    determining, according to the comparison result, whether there is a group customer whose customer name has exactly the same text content; if so, further determining whether that group customer and the selected group customer share any identical identification information, and identifying the selected group customer and each group customer that has exactly the same text content and shares any identical identification information as the same customer; if not, identifying the selected group customer and all other group customers as different customers;
    continuing to select another group customer in the fuzzy identification class whose customer-name character count is less than the preset threshold for uniqueness identification, until all group customers in the fuzzy identification class whose customer-name character count is less than the preset threshold have been identified.
  22. An information uniqueness identification system, comprising a plurality of source databases, and further comprising an application server for information uniqueness identification;
    each source database is configured to store basic information of group customers;
    the application server is configured to: obtain basic information of group customers stored in each source database, wherein the basic information includes a customer name and identification information; mark each group customer as belonging to a precise identification class or a fuzzy identification class according to the identification information; perform precise identification and fuzzy identification, according to a preset identification rule, on the group customers in the precise identification class and the fuzzy identification class respectively, to identify group customers that are the same customer as one another; and obtain the identification results of the precise identification and the fuzzy identification, and integrate, according to the identification results, the basic information of all group customers that are the same customer as one another, to obtain uniquely identified group customer information.
  23. The information uniqueness identification system according to claim 22, wherein the application server is further configured to:
    parse the identification information of each group customer;
    determine whether the identification information of each group customer contains preset precise identification information; if so, mark the group customer as belonging to the precise identification class; otherwise, mark it as belonging to the fuzzy identification class.
  24. The information uniqueness identification system according to claim 23, wherein the application server is further configured to:
    perform text detection on the customer names of all group customers to obtain the character count and text content of each customer name;
    perform uniqueness identification according to the preset precise identification information and the customer name of each group customer in the precise identification class, to identify group customers in the precise identification class that are the same customer as one another;
    perform uniqueness identification according to the identification information, the customer-name character count, and the text content of each group customer in the fuzzy identification class, to identify group customers in the fuzzy identification class that are the same customer as one another.
  25. The information uniqueness identification system according to claim 24, wherein the application server is further configured to:
    arbitrarily select one group customer in the precise identification class, and compare its preset precise identification information and customer name with the preset precise identification information and customer names of the other group customers in the precise identification class;
    determine, according to the comparison result, whether there is a group customer having the same preset precise identification information and the same customer name; if so, identify each such group customer and the selected group customer as the same customer; if not, identify the selected group customer and the other group customers in the precise identification class as different customers;
    continue to select another group customer in the precise identification class and perform uniqueness identification against the other group customers, until all group customers in the precise identification class have been identified.
  26. The information uniqueness identification system according to claim 24, wherein the application server is further configured to:
    determine whether the customer-name character count of each group customer in the fuzzy identification class is greater than or equal to a preset threshold;
    if it is greater than or equal to the preset threshold, perform uniqueness identification according to the character count and text content of the group customer's customer name;
    if it is less than the preset threshold, perform uniqueness identification according to the group customer's identification information and the character count and text content of its customer name.
  27. The information uniqueness identification system according to claim 26, wherein the application server is further configured to:
    arbitrarily select one group customer in the fuzzy identification class whose customer-name character count is greater than or equal to the preset threshold, and compare the text content of its customer name with the customer names of all group customers;
    determine, according to the comparison result, whether there is a group customer whose customer name has exactly the same text content; if so, identify each group customer with exactly the same text content and the selected group customer as the same customer; if not, identify the selected group customer and all other group customers as different customers;
    continue to select another group customer in the fuzzy identification class whose customer-name character count is greater than or equal to the preset threshold for uniqueness identification, until all group customers in the fuzzy identification class whose customer-name character count is greater than or equal to the preset threshold have been identified.
  28. The information uniqueness identification system according to claim 26, wherein the application server is further configured to:
    arbitrarily select one group customer in the fuzzy identification class whose customer-name character count is less than the preset threshold, and compare the text content of its customer name with the customer names of all group customers;
    determine, according to the comparison result, whether there is a group customer whose customer name has exactly the same text content; if so, further determine whether that group customer and the selected group customer share any identical identification information, and identify the selected group customer and each group customer that has exactly the same text content and shares any identical identification information as the same customer; if not, identify the selected group customer and all other group customers as different customers;
    continue to select another group customer in the fuzzy identification class whose customer-name character count is less than the preset threshold for uniqueness identification, until all group customers in the fuzzy identification class whose customer-name character count is less than the preset threshold have been identified.
PCT/CN2018/084325 2017-09-20 2018-04-25 Information uniqueness identification method, application server, system, and storage medium WO2019056750A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201710850369.5 2017-09-20
CN201710850369.5A CN107704529B (en) 2017-09-20 2017-09-20 Information uniqueness identification method, application server, system and storage medium

Publications (1)

Publication Number Publication Date
WO2019056750A1 (en) 2019-03-28

Family

ID=61172973

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/084325 WO2019056750A1 (en) 2017-09-20 2018-04-25 Information uniqueness identification method, application server, system, and storage medium

Country Status (2)

Country Link
CN (1) CN107704529B (en)
WO (1) WO2019056750A1 (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107704529B (en) * 2017-09-20 2020-04-10 平安科技(深圳)有限公司 Information uniqueness identification method, application server, system and storage medium
CN109064342A (en) * 2018-07-20 2018-12-21 阳光保险集团股份有限公司 Client identity recognition methods and device
CN109815268A (en) * 2018-12-21 2019-05-28 上海诺悦智能科技有限公司 A kind of transaction sanction list matching system
CN111126935B (en) * 2019-11-19 2023-07-21 泰康保险集团股份有限公司 Method and device for processing security data, electronic equipment and storage medium

Citations (5)

Publication number Priority date Publication date Assignee Title
CN101452556A (en) * 2008-12-31 2009-06-10 中国建设银行股份有限公司 Customer information processing system and method
CN106407245A (en) * 2016-06-23 2017-02-15 平安科技(深圳)有限公司 Information processing method and apparatus
CN106934509A (en) * 2015-12-30 2017-07-07 平安科技(深圳)有限公司 Customer information merging method and system
CN106970994A (en) * 2017-04-01 2017-07-21 长沙智擎信息技术有限公司 A kind of online practical demonstration extracting method of automation
CN107704529A (en) * 2017-09-20 2018-02-16 平安科技(深圳)有限公司 The recognition methods of information uniqueness, application server, system and storage medium

Family Cites Families (3)

Publication number Priority date Publication date Assignee Title
US20020194018A1 (en) * 2000-06-05 2002-12-19 Gene Scott Method for matching complimentary business interests
CN102663008B (en) * 2012-03-20 2017-04-26 浪潮软件股份有限公司 Government integrated business platform business library and construction method of base library
CN103646110B (en) * 2013-12-26 2017-01-11 中国人民银行征信中心 Natural person basic identity information matching method

Also Published As

Publication number Publication date
CN107704529B (en) 2020-04-10
CN107704529A (en) 2018-02-16

Legal Events

NENP: Non-entry into the national phase
    Ref country code: DE

32PN: EP: public notification in the EP bulletin as address of the addressee cannot be established
    Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 01/10/2020)

122: EP: PCT application non-entry in European phase
    Ref document number: 18857947
    Country of ref document: EP
    Kind code of ref document: A1