CN110750509A - Enterprise name duplicate checking method and device, equipment and medium - Google Patents


Info

Publication number
CN110750509A
Authority
CN
China
Prior art keywords
enterprise information
enterprise
repeated
information
organization
Legal status
Pending
Application number
CN201911018999.1A
Other languages
Chinese (zh)
Inventor
王章龙
张韬
Current Assignee
Senauer Beth (beijing) Marketing Technology Ltd By Share Ltd
Original Assignee
Senauer Beth (beijing) Marketing Technology Ltd By Share Ltd
Application filed by Senauer Beth (beijing) Marketing Technology Ltd By Share Ltd
Priority to CN201911018999.1A
Publication of CN110750509A
Legal status: Pending (current)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/17 Details of further file system functions
    • G06F16/174 Redundancy elimination performed by the file system
    • G06F16/1748 De-duplication implemented within the file system, e.g. based on file segments
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201 Market modelling; Market analysis; Collecting market data

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Accounting & Taxation (AREA)
  • Development Economics (AREA)
  • Finance (AREA)
  • Strategic Management (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Entrepreneurship & Innovation (AREA)
  • General Physics & Mathematics (AREA)
  • Game Theory and Decision Science (AREA)
  • General Business, Economics & Management (AREA)
  • Marketing (AREA)
  • Economics (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

Embodiments of the invention provide an enterprise name duplicate checking method, device, equipment and medium. The method comprises: acquiring enterprise information, wherein the enterprise information comprises an enterprise name; splitting the enterprise name into segmented phrases categorized as region, trade name, business scope, organizational form and organization type; and determining duplicate enterprise information by using one or more combinations of the region, trade name, business scope, organizational form and organization type of the segmented phrases as the duplicate checking basis.

Description

Enterprise name duplicate checking method and device, equipment and medium
Technical Field
The embodiment of the invention relates to the technical field of computers, in particular to an enterprise name duplicate checking method, device, equipment and medium.
Background
In practice, when customer enterprise data are compiled for targeted marketing to enterprises, the sample data contain a large amount of duplicate enterprise records. An intelligent data duplicate checking system performs duplicate checking on the Chinese names of enterprises. Lowering the repetition rate of marketing data reduces the annoyance that repeated marketing causes to the enterprises contacted, positions the delivery of marketing more precisely, and improves the accuracy of enterprise Chinese-name data.
Searching for duplicates manually among thousands of records is very difficult. In the prior art, duplicate checking generally uses one of the following schemes: 1. removing records with identical enterprise names using the duplicate removal function of Excel; 2. removing records with identical enterprise names using database tools such as SQL; 3. removing records with identical enterprise names using a dedicated duplicate checking tool.
The prior art has the following disadvantages: 1. operators must master Excel, SQL, duplicate checking tools, word segmentation and similar tools, which places high skill demands on front-line operators; 2. operators must understand specific duplicate checking methods and combine all of them; 3. the workflow has many intermediate steps and is complicated to operate; 4. the results still contain many duplicates, because exact duplicates can be removed but near-duplicates cannot be detected; 5. the process consumes considerable resources: with large data volumes, several people must cooperate and confirm the results repeatedly, which is error-prone.
Therefore, a technical problem to be solved by those skilled in the art is how to provide an enterprise name duplicate checking scheme that reduces the skill required of operators and checks enterprise names for duplicates conveniently and quickly.
Disclosure of Invention
Therefore, the embodiment of the invention provides an enterprise name duplication checking method, device, equipment and medium, which can reduce the skill requirement on operators and conveniently and quickly check the duplication of enterprise names.
In order to achieve the above object, the embodiments of the present invention provide the following technical solutions:
In a first aspect, an embodiment of the present invention provides an enterprise information duplication checking method, including:
acquiring enterprise information, wherein the enterprise information comprises an enterprise name;
splitting the enterprise name into segmented phrases categorized as region, trade name, business scope, organizational form and organization type;
determining duplicate enterprise information for the segmented phrases by using one or more combinations of region, trade name, business scope, organizational form and organization type as the duplicate checking basis.
Preferably, after acquiring the enterprise information and before splitting the enterprise name into segmented phrases categorized as region, trade name, business scope, organizational form and organization type, the method further comprises:
converting full-width or half-width characters in the enterprise information into a first preset format;
converting Arabic numerals or Chinese numerals in the enterprise information into a second preset format, so as to obtain enterprise information in a uniform format;
converting the Chinese characters in the enterprise information into pinyin, so as to enable duplicate checking of homophones;
converting traditional Chinese characters in the enterprise information into simplified Chinese characters, so as to enable duplicate checking across simplified and traditional characters;
wherein the first preset format is full-width or half-width, and the second preset format is Arabic numerals or Chinese numerals.
Preferably, the region comprises: country, province, city, county;
the organization type comprises: company, office, department, hall, office, organization, office.
Preferably, the enterprise information further includes one or more of the contact name, email address, telephone number and mobile phone number corresponding to the enterprise name;
correspondingly, the duplicate checking method further comprises:
determining duplicate enterprise information by using one or more combinations of contact name, email address, telephone number and mobile phone number as the duplicate checking basis.
Preferably, determining duplicate enterprise information for the segmented phrases by using one or more combinations of region, trade name, business scope, organizational form and organization type as the duplicate checking basis comprises:
checking the segmented phrases for duplicates using a first rule to obtain first duplicate enterprise information;
checking the segmented phrases for duplicates using a second rule to obtain second duplicate enterprise information;
if the first duplicate enterprise information and the second duplicate enterprise information contain common enterprise information, merging them into third duplicate enterprise information;
wherein the first rule and the second rule each comprise a combination of one or more of region, trade name, business scope, organizational form and organization type, and the first rule is different from the second rule.
Preferably, after merging the first duplicate enterprise information and the second duplicate enterprise information into the third duplicate enterprise information, the method further comprises:
storing the first rule and the second rule as a rule group template for subsequent reuse;
retaining only one copy of each set of exactly duplicated records in the enterprise information to obtain enterprise duplicate checking result data for users to export and use.
In a second aspect, an embodiment of the present invention provides an enterprise information duplication checking apparatus, including:
an enterprise information obtaining module, configured to obtain enterprise information, where the enterprise information includes an enterprise name;
an enterprise name word segmentation module, configured to split the enterprise name into segmented phrases categorized as region, trade name, business scope, organizational form and organization type;
a word segmentation combination duplication checking module, configured to determine duplicate enterprise information for the segmented phrases by using one or more combinations of region, trade name, business scope, organizational form and organization type as the duplicate checking basis.
Preferably, the word segmentation combination duplication checking module includes:
a first duplicate checking unit, configured to check the segmented phrases for duplicates using a first rule to obtain first duplicate enterprise information;
a second duplicate checking unit, configured to check the segmented phrases for duplicates using a second rule to obtain second duplicate enterprise information;
a data merging unit, configured to merge the first duplicate enterprise information and the second duplicate enterprise information into third duplicate enterprise information if the two contain common enterprise information;
wherein the first rule and the second rule each comprise a combination of one or more of region, trade name, business scope, organizational form and organization type, and the first rule is different from the second rule.
In a third aspect, an embodiment of the present invention provides an enterprise information duplication checking device, including:
a memory for storing a computer program;
a processor, configured to implement the steps of the enterprise information duplication checking method according to any embodiment of the first aspect when executing the computer program.
In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the enterprise information duplication checking method according to any embodiment of the first aspect.
The embodiment of the invention provides an enterprise information duplicate checking method, which comprises: acquiring enterprise information, wherein the enterprise information comprises an enterprise name; splitting the enterprise name into segmented phrases categorized as region, trade name, business scope, organizational form and organization type; and determining duplicate enterprise information by using one or more combinations of the region, trade name, business scope, organizational form and organization type of the segmented phrases as the duplicate checking basis.
The enterprise name duplicate checking device, equipment and medium provided by the embodiments of the invention have the same beneficial effects as the method, which are not repeated here.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in describing the embodiments or the prior art are briefly introduced below. The drawings in the following description are merely exemplary, and those of ordinary skill in the art can derive other embodiments from them without inventive effort.
The structures, proportions, sizes and the like shown in this specification serve only to accompany the disclosed content so that it can be understood and read by those skilled in the art; they do not limit the conditions under which the present invention can be implemented and therefore carry no substantive technical significance. Any structural modification, change in proportion or adjustment of size that does not affect the effects and objectives achievable by the present invention shall still fall within the scope covered by the technical content disclosed herein.
Fig. 1 is a flowchart of an enterprise information duplication checking method according to an embodiment of the present invention;
Fig. 2 is a flowchart of format unification in an enterprise information duplication checking method according to an embodiment of the present invention;
Fig. 3 is a flowchart of double duplicate checking in an enterprise information duplication checking method according to an embodiment of the present invention;
Fig. 4 is a flowchart of data storage and invocation in an enterprise information duplication checking method according to an embodiment of the present invention;
Fig. 5 is a practical flowchart of an enterprise information duplication checking method according to an embodiment of the present invention;
Fig. 6 is a schematic structural diagram of an enterprise information duplication checking apparatus according to an embodiment of the present invention;
Fig. 7 is a schematic structural diagram of the word segmentation combination duplication checking module of an enterprise information duplication checking apparatus according to an embodiment of the present invention;
Fig. 8 is a schematic structural diagram of an enterprise information duplication checking device according to an embodiment of the present invention;
Fig. 9 is a schematic structural diagram of a computer-readable storage medium according to an embodiment of the present invention.
Detailed Description
The embodiments of the present invention are described below by way of specific examples, and other advantages and features of the invention will be readily apparent to those skilled in the art from the disclosure of this specification. The described embodiments are merely exemplary and are not intended to limit the invention to the particular embodiments disclosed. All other embodiments that a person skilled in the art can derive from the embodiments given herein without creative effort shall fall within the protection scope of the present invention.
Referring to Fig. 1 to Fig. 5: Fig. 1 is a flowchart of an enterprise information duplication checking method according to an embodiment of the present invention; Fig. 2 is a flowchart of format unification in the method; Fig. 3 is a flowchart of double duplicate checking in the method; Fig. 4 is a flowchart of data storage and invocation in the method; Fig. 5 is a practical flowchart of the method.
The embodiment of the invention provides an enterprise information duplicate checking method, which comprises the following steps:
step S11: acquiring enterprise information, wherein the enterprise information comprises an enterprise name;
step S12: splitting the enterprise name into segmented phrases categorized as region, trade name, business scope, organizational form and organization type;
step S13: determining duplicate enterprise information for the segmented phrases by using one or more combinations of region, trade name, business scope, organizational form and organization type as the duplicate checking basis.
In this embodiment, the enterprise information is acquired first, for example from an Excel spreadsheet file in which it is stored. The enterprise information should include at least the enterprise name, and may also include other information about the enterprise, such as the business information, contact name, email address, telephone number and mobile phone number corresponding to the enterprise name.
After the enterprise information is obtained, the enterprise name it contains can be split into phrases categorized as region, trade name, business scope, organizational form and organization type. Enterprise names generally have to follow naming conventions set by the national authorities. For example, in the enterprise name "Beijing Zhiguagua Intellectual Property Agency Co., Ltd.", "Beijing" is the region, "Zhiguagua" is the trade name, "intellectual property agency" is the business scope, and "Co., Ltd." expresses the organizational form and organization type. The enterprise name can therefore be segmented according to these naming rules to obtain segmented phrases categorized as region, trade name, business scope, organizational form and organization type.
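A minimal sketch of dictionary-based segmentation along these lines, in Python. The patent does not disclose its segmentation algorithm, so everything here is an assumption for illustration: the tiny dictionaries `REGIONS`, `BUSINESS_SCOPES` and `ORG_SUFFIXES`, the split of 有限公司 into form 有限 and type 公司, and the Chinese spelling 北京知呱呱知识产权代理有限公司 used for the example name. The sketch strips the longest known region prefix, then the longest known organizational suffix, then the longest known business-scope suffix, and treats whatever remains as the trade name.

```python
from dataclasses import dataclass

# Hypothetical, deliberately tiny dictionaries; a real system would load
# complete region, industry and organization-type lists.
REGIONS = ["北京", "上海", "辽宁", "西安"]
BUSINESS_SCOPES = ["知识产权代理", "科技", "贸易"]
# Suffix -> (organizational form, organization type); this split is an assumption.
ORG_SUFFIXES = {
    "股份有限公司": ("股份有限", "公司"),
    "有限公司": ("有限", "公司"),
}


@dataclass(frozen=True)
class NameParts:
    region: str
    trade_name: str
    business_scope: str
    org_form: str
    org_type: str


def _longest_prefix(text, candidates):
    matches = [c for c in candidates if text.startswith(c)]
    return max(matches, key=len, default="")


def _longest_suffix(text, candidates):
    matches = [c for c in candidates if text.endswith(c)]
    return max(matches, key=len, default="")


def segment_name(name: str) -> NameParts:
    """Split an enterprise name into the five categories by dictionary lookup."""
    region = _longest_prefix(name, REGIONS)
    rest = name[len(region):]
    suffix = _longest_suffix(rest, ORG_SUFFIXES)
    org_form, org_type = ORG_SUFFIXES.get(suffix, ("", ""))
    rest = rest[:len(rest) - len(suffix)]
    scope = _longest_suffix(rest, BUSINESS_SCOPES)
    trade_name = rest[:len(rest) - len(scope)]  # whatever is left is the trade name
    return NameParts(region, trade_name, scope, org_form, org_type)
```

Longest-match lookups keep 股份有限公司 from being read as plain 有限公司; a production system would also need a fallback for names that match none of the dictionaries.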
Specifically, in China, for example, the region in an enterprise name includes the country, province, city and county levels, so Chinese place names can be graded by level and sorted to facilitate segmentation. Organization types in China include companies, offices, departments, halls, offices, institutions, offices and the like, and these types can be listed in advance; among them, halls, offices, institutions and offices are administrative organizations.
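Grading place names by level can be sketched as a gazetteer that maps each name to its administrative level, with successive region names peeled off the front of the enterprise name, longest match first. The `GAZETTEER` contents and the function name `split_region` are hypothetical; a real system would load a complete national list of place names.

```python
# Hypothetical graded gazetteer: place name -> administrative level.
GAZETTEER = {
    "中国": "country",
    "辽宁省": "province", "辽宁": "province",
    "大连市": "city", "大连": "city",
    "甘井子区": "county",  # a county-level district
}
# Try longer names first so that "辽宁省" wins over "辽宁".
_BY_LENGTH = sorted(GAZETTEER, key=len, reverse=True)


def split_region(name):
    """Peel successive region names off the front of an enterprise name.

    Returns (list of (place, level), remainder of the name)."""
    levels = []
    rest = name
    while True:
        match = next((p for p in _BY_LENGTH if rest.startswith(p)), None)
        if match is None:
            break
        levels.append((match, GAZETTEER[match]))
        rest = rest[len(match):]
    return levels, rest
```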
Once the segmented phrases are obtained, each category can serve as a duplicate checking basis. For example, setting the region as the basis yields all enterprise names in the same region; this alone is not sufficient, because one region contains many enterprises, so the trade name can be added for further checking. Conversely, for the three enterprise names "Beijing Zhiguagua Intellectual Property Agency", "Liaoning Zhiguagua Intellectual Property Agency" and "Xi'an Zhiguagua Intellectual Property Agency", checking by trade name alone marks them as duplicates, whereas checking by region and trade name together does not.
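Checking duplicates on a chosen combination of categories amounts to grouping records by the tuple of those category values. A minimal sketch, assuming each record is a dict whose keys (`region`, `trade_name`, `business_scope`) are hypothetical names for the categories; records with an empty value in any chosen field are skipped so that missing data is not reported as a match.

```python
from collections import defaultdict


def find_duplicates(records, fields):
    """Return groups of record indexes that agree on every field in `fields`."""
    groups = defaultdict(list)
    for idx, rec in enumerate(records):
        key = tuple(rec.get(f, "") for f in fields)
        if all(key):  # skip records missing any chosen field
            groups[key].append(idx)
    return [ids for ids in groups.values() if len(ids) > 1]


# The example from the text: same trade name, different regions.
records = [
    {"region": "北京", "trade_name": "知呱呱", "business_scope": "知识产权代理"},
    {"region": "辽宁", "trade_name": "知呱呱", "business_scope": "知识产权代理"},
    {"region": "西安", "trade_name": "知呱呱", "business_scope": "知识产权代理"},
]
print(find_duplicates(records, ["trade_name"]))            # [[0, 1, 2]]
print(find_duplicates(records, ["region", "trade_name"]))  # []
```

Because the rule is just a list of field names, any combination of the five categories can be tried without changing the function.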
When the enterprise information further includes one or more of the contact name, email address, telephone number and mobile phone number corresponding to the enterprise name, the duplicate checking method correspondingly further comprises determining duplicate enterprise information by using one or more combinations of these fields as the duplicate checking basis. That is, duplicate checking can use not only the segmented phrases obtained from the enterprise name but also the contact name, email address, telephone number, mobile phone number and so on.
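Contact fields usually need light normalization before they can be compared. A minimal sketch, assuming hypothetical field names `email`, `phone` and `mobile`: emails are trimmed and lowercased, phone numbers are reduced to their digits, and records are then grouped by the chosen fields. Lowercasing and digit-only phone matching are simplifying assumptions; for instance, a number written with a +86 country code will not match the same number written without it.

```python
import re
from collections import defaultdict


def normalize_contact(field, value):
    """Light normalization so trivially different spellings still match."""
    value = value.strip()
    if field == "email":
        return value.lower()
    if field in ("phone", "mobile"):
        return re.sub(r"\D", "", value)  # keep digits only
    return value


def find_contact_duplicates(records, fields):
    """Group record indexes that agree on every normalized contact field."""
    groups = defaultdict(list)
    for idx, rec in enumerate(records):
        key = tuple(normalize_contact(f, rec.get(f, "")) for f in fields)
        if all(key):  # ignore records with a missing field
            groups[key].append(idx)
    return [ids for ids in groups.values() if len(ids) > 1]
```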
Further, because enterprise information collected through different channels is entered with different input methods and habits, the same meaning may be expressed differently. For example, "Beijing 108 Middle School" may also be entered as "Beijing One-Zero-Eight Middle School", and any English characters may be entered in half-width or full-width form. If these differences are not normalized, enterprise names cannot be effectively checked for duplicates. Therefore, after acquiring the enterprise information and before splitting the enterprise name into segmented phrases categorized as region, trade name, business scope, organizational form and organization type, the following steps may be performed to unify the input format:
step S21: converting full-width or half-width characters in the enterprise information into a first preset format;
step S22: converting Arabic numerals or Chinese numerals in the enterprise information into a second preset format, so as to obtain enterprise information in a uniform format;
step S23: converting the Chinese characters in the enterprise information into pinyin, so as to enable duplicate checking of homophones;
step S24: converting traditional Chinese characters in the enterprise information into simplified Chinese characters, so as to enable duplicate checking across simplified and traditional characters;
wherein the first preset format is full-width or half-width, and the second preset format is Arabic numerals or Chinese numerals.
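Steps S21, S22 and S24 can be sketched with the Python standard library. This assumes half-width as the first preset format and Arabic numerals as the second; `TRAD_TO_SIMP` is a placeholder table standing in for a full traditional-to-simplified mapping, and all function names are hypothetical. Unicode NFKC normalization folds full-width letters, digits and the ideographic space to their half-width forms. Step S23 is left out because it needs a pronunciation dictionary; the third-party `pypinyin` package (`lazy_pinyin`) is one commonly used option.

```python
import re
import unicodedata

# Chinese numerals written digit by digit, e.g. 一零八 -> 108.
CN_DIGITS = {"零": "0", "〇": "0", "一": "1", "二": "2", "三": "3", "四": "4",
             "五": "5", "六": "6", "七": "7", "八": "8", "九": "9"}
# Runs of two or more numeral characters not touching a positional character.
_CN_DIGIT_RUN = re.compile("(?<![十百千万])[零〇一二三四五六七八九]{2,}(?![十百千万])")

# Placeholder traditional -> simplified table (assumption, not a full mapping).
TRAD_TO_SIMP = str.maketrans({"學": "学", "國": "国", "華": "华", "東": "东", "門": "门"})


def to_half_width(text):
    """Step S21: fold full-width characters to half-width via NFKC."""
    return unicodedata.normalize("NFKC", text)


def cn_digits_to_arabic(text):
    """Step S22: convert digit-by-digit numeral runs to Arabic numerals.

    A single 一 (as in a trade name) and positional forms such as 一百零八
    are left unchanged."""
    return _CN_DIGIT_RUN.sub(lambda m: "".join(CN_DIGITS[c] for c in m.group()), text)


def to_simplified(text):
    """Step S24, using the placeholder table above."""
    return text.translate(TRAD_TO_SIMP)


def normalize_name(text):
    """Apply S21, S22 and S24 in order."""
    return to_simplified(cn_digits_to_arabic(to_half_width(text)))
```

Only converting runs of two or more numerals is a deliberate choice: converting every 一 would corrupt trade names that merely contain that character.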
Furthermore, in practice, screening enterprise names with two different combinations of segmented-phrase categories as rules yields different duplicate checking results. The results of the two rules can then be combined to check for duplicates more thoroughly. To determine duplicate enterprise information using one or more combinations of region, trade name, business scope, organizational form and organization type as the duplicate checking basis, the following steps can be performed:
step S31: checking the segmented phrases for duplicates using a first rule to obtain first duplicate enterprise information;
step S32: checking the segmented phrases for duplicates using a second rule to obtain second duplicate enterprise information;
step S33: if the first duplicate enterprise information and the second duplicate enterprise information contain common enterprise information, merging them into third duplicate enterprise information;
wherein the first rule and the second rule each comprise a combination of one or more of region, trade name, business scope, organizational form and organization type, and the first rule is different from the second rule. For example, suppose the first duplicate enterprise information is "Beijing Zhiguagua Intellectual Property Agency" and "Xi'an Zhiguagua Intellectual Property Agency", and the second duplicate enterprise information is "Beijing Zhiguagua Intellectual Property Agency" and "Shanghai Zhiguagua Intellectual Property Agency". Both results contain the same enterprise name, "Beijing Zhiguagua Intellectual Property Agency", so they intersect, and they can be merged into the third duplicate enterprise information: "Beijing Zhiguagua Intellectual Property Agency", "Xi'an Zhiguagua Intellectual Property Agency" and "Shanghai Zhiguagua Intellectual Property Agency".
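Merging duplicate results that share records can be sketched with a union-find (disjoint-set) structure, which also handles chains of overlaps across more than two rules. This generalization, the function name `merge_groups`, and the use of record indexes as identifiers are assumptions; the text itself only describes merging two intersecting results.

```python
def merge_groups(*group_lists):
    """Combine duplicate groups from several rules; groups that share any
    record end up (transitively) in the same merged group."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for groups in group_lists:
        for group in groups:
            find(group[0])  # register the group even if it has one member
            for member in group[1:]:
                union(group[0], member)

    merged = {}
    for x in list(parent):
        merged.setdefault(find(x), []).append(x)
    return sorted(sorted(g) for g in merged.values())


# Record 0 (Beijing) appears in both results, so the groups are merged.
first = [[0, 1]]   # e.g. Beijing, Xi'an
second = [[0, 2]]  # e.g. Beijing, Shanghai
print(merge_groups(first, second))  # [[0, 1, 2]]
```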
It should be noted that after the enterprise information has been checked, that is, after the first and second duplicate enterprise information have been merged into the third duplicate enterprise information, the rules may also be stored as a rule group template for future use. Specifically, the following steps may be performed:
step S41: storing the first rule and the second rule as a rule group template for subsequent reuse;
step S42: retaining only one copy of each set of exactly duplicated records in the enterprise information to obtain enterprise duplicate checking result data for users to export and use.
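Steps S41 and S42 can be sketched as follows. The patent does not specify a storage format, so JSON, the template layout `{"name": ..., "rules": [...]}` and the function names are all assumptions; the functions return and accept strings so the caller decides where the template is kept. Exact-duplicate removal keeps the first occurrence and preserves the original order.

```python
import json


def dump_rule_template(name, rules):
    """Step S41: serialize a named rule group; each rule is a list of category names."""
    return json.dumps({"name": name, "rules": rules}, ensure_ascii=False)


def load_rule_template(text):
    """Restore a rule group produced by dump_rule_template."""
    data = json.loads(text)
    return data["name"], data["rules"]


def keep_one_copy(rows):
    """Step S42: drop exact duplicate rows, keeping the first occurrence in order."""
    return list(dict.fromkeys(tuple(row) for row in rows))
```

`dict.fromkeys` relies on dicts preserving insertion order (guaranteed since Python 3.7), which is what keeps the surviving copy in its original position.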
Referring to Fig. 6 and Fig. 7, Fig. 6 is a schematic structural diagram of an enterprise information duplication checking apparatus according to an embodiment of the present invention; Fig. 7 is a schematic structural diagram of the word segmentation combination duplication checking module of the enterprise information duplication checking apparatus according to an embodiment of the present invention.
In a second aspect, an embodiment of the present invention provides an enterprise information duplication checking apparatus 600, including:
an enterprise information obtaining module 610, configured to obtain enterprise information, where the enterprise information includes an enterprise name;
an enterprise name word segmentation module 620, configured to split the enterprise name into segmented phrases categorized as region, trade name, business scope, organizational form and organization type;
a word segmentation combination duplication checking module 630, configured to determine duplicate enterprise information for the segmented phrases by using one or more combinations of region, trade name, business scope, organizational form and organization type as the duplicate checking basis.
Preferably, the word segmentation combination duplication checking module 630 includes:
a first duplicate checking unit 631, configured to check the segmented phrases for duplicates using a first rule to obtain first duplicate enterprise information;
a second duplicate checking unit 632, configured to check the segmented phrases for duplicates using a second rule to obtain second duplicate enterprise information;
a data merging unit 633, configured to merge the first duplicate enterprise information and the second duplicate enterprise information into third duplicate enterprise information if the two contain common enterprise information;
wherein the first rule and the second rule each comprise a combination of one or more of region, trade name, business scope, organizational form and organization type, and the first rule is different from the second rule.
Referring to fig. 8 and 9, fig. 8 is a schematic structural diagram of an enterprise information duplication checking device according to an embodiment of the present invention; fig. 9 is a schematic structural diagram of a computer-readable storage medium according to an embodiment of the present invention.
The embodiment of the present invention provides an enterprise information duplication checking device 800, which includes:
a memory 810 for storing a computer program;
a processor 820, configured to implement the steps of any one of the enterprise information duplication checking methods of the first aspect when executing the computer program. The memory 810 provides space for storing program code which, when executed by the processor 820, implements any of the methods in the embodiments of the invention.
An embodiment of the present invention further provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the enterprise information duplication checking method according to any of the above embodiments.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described apparatuses, devices and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus, device and method may be implemented in other manners. For example, the above-described embodiments of the apparatus are merely illustrative, and for example, a division of a unit is merely a logical division, and an actual implementation may have another division, for example, a plurality of units or components may be combined or integrated into another apparatus, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
Units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application in essence, or the part of it that contributes over the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The software product is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a function calling device, or a network device) to execute all or part of the steps of the methods of the embodiments of the present application.
Although the invention has been described in detail above with reference to a general description and specific examples, it will be apparent to one skilled in the art that modifications or improvements may be made thereto based on the invention. Accordingly, such modifications and improvements are intended to be within the scope of the invention as claimed.

Claims (10)

1. An enterprise information duplication checking method, characterized by comprising the following steps:
acquiring enterprise information, wherein the enterprise information comprises an enterprise name;
segmenting the enterprise name into word segments categorized as region, trade name, business scope, organizational form and organization type; and
determining duplicate enterprise information by using one or a combination of the region, trade name, business scope, organizational form and organization type of the word segments as the duplication checking basis.
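The method of claim 1 can be sketched in Python under the assumption of a simple dictionary-based segmenter: the region is matched as a prefix, the organizational form and the business scope as suffixes, and whatever remains is treated as the trade name. The tiny vocabularies, the field names and the helpers `segment` and `find_duplicates` are hypothetical, and the organization type category is omitted for brevity. A rule is a tuple of category names; records whose segments agree on every category in the rule are reported as duplicates.

```python
from collections import defaultdict

# Hypothetical, deliberately tiny vocabularies; a real system would need
# complete region, business-scope and organizational-form dictionaries.
REGIONS = ["北京市", "北京", "上海市", "上海"]
ORG_FORMS = ["股份有限公司", "有限责任公司", "有限公司"]
SCOPES = ["科技", "贸易", "咨询"]


def segment(name):
    """Split an enterprise name into region / trade_name / scope / org_form."""
    parts = {"region": "", "trade_name": "", "scope": "", "org_form": ""}
    rest = name
    for region in sorted(REGIONS, key=len, reverse=True):  # longest match first
        if rest.startswith(region):
            parts["region"], rest = region, rest[len(region):]
            break
    for form in sorted(ORG_FORMS, key=len, reverse=True):
        if rest.endswith(form):
            parts["org_form"], rest = form, rest[: -len(form)]
            break
    for scope in sorted(SCOPES, key=len, reverse=True):
        if rest.endswith(scope):
            parts["scope"], rest = scope, rest[: -len(scope)]
            break
    parts["trade_name"] = rest  # whatever is left is taken as the trade name
    return parts


def find_duplicates(names, rule):
    """Group names whose segments agree on every category listed in `rule`."""
    groups = defaultdict(list)
    for name in names:
        seg = segment(name)
        groups[tuple(seg[c] for c in rule)].append(name)
    return [g for g in groups.values() if len(g) > 1]
```

Under the rule ("trade_name", "scope"), the names 北京星河科技有限公司 and 北京市星河科技股份有限公司 are grouped together, whereas adding "region" to the rule separates them, because this sketch matches 北京 and 北京市 as different strings.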
2. The enterprise information duplication checking method of claim 1,
wherein, after the enterprise information is acquired and before the enterprise name is segmented into word segments categorized as region, trade name, business scope, organizational form and organization type, the method further comprises the following steps:
converting full-width or half-width characters in the enterprise information into a first preset format;
converting Arabic numerals or Chinese numerals in the enterprise information into a second preset format, so that the enterprise information has a uniform format;
converting the Chinese characters in the enterprise information into pinyin, so that homophones can be checked for duplicates;
converting traditional Chinese characters in the enterprise information into simplified Chinese characters, so that simplified and traditional forms of a name can be checked against each other;
wherein the first preset format is full-width or half-width, and the second preset format is Arabic numerals or Chinese numerals.
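The format-unification steps of claim 2 can be sketched in Python with the standard library alone. The sketch assumes half-width characters and Arabic numerals as the first and second preset formats, maps Chinese numerals digit by digit (positional characters such as 十 and 百 are not handled), and uses a tiny hypothetical traditional-to-simplified table in place of a full conversion dictionary. Pinyin conversion is left out because it needs an external pronunciation dictionary, such as the one shipped with the third-party pypinyin package.

```python
def to_half_width(text):
    """Convert full-width ASCII variants and the ideographic space to half-width."""
    out = []
    for ch in text:
        code = ord(ch)
        if code == 0x3000:  # ideographic space -> ordinary space
            code = 0x20
        elif 0xFF01 <= code <= 0xFF5E:  # full-width forms sit 0xFEE0 above ASCII
            code -= 0xFEE0
        out.append(chr(code))
    return "".join(out)


# Digit-by-digit mapping only. Note that it also rewrites 一 inside ordinary
# words, which a production system would need to guard against.
CN_DIGITS = str.maketrans("〇零一二三四五六七八九", "00123456789")

# Hypothetical toy table; a real system would use a full conversion dictionary.
TRAD_TO_SIMP = str.maketrans("華東電貿際", "华东电贸际")


def normalize(name):
    """Apply the half-width, numeral and simplified-character conversions."""
    name = to_half_width(name)
    name = name.translate(CN_DIGITS)
    return name.translate(TRAD_TO_SIMP)
```

For example, `normalize("ＡＢＣ華東第一電子")` yields `"ABC华东第1电子"`, so a full-width, traditional-character spelling and a half-width, simplified one compare equal.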
3. The enterprise information duplication checking method of claim 1,
wherein the region comprises: country, province, city and county;
and the organization type comprises: company, office, department, hall, office, organization, office.
4. The enterprise information duplication checking method of claim 1,
wherein the enterprise information further comprises one or a combination of a contact name, an email address, a telephone number and a mobile phone number corresponding to the enterprise name;
correspondingly, the duplication checking method further comprises:
determining duplicate enterprise information by using one or a combination of the contact name, email address, telephone number and mobile phone number as the duplication checking basis.
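Duplication checking on the contact fields of claim 4 can be sketched in the same way. The sketch assumes each record is a dict with an `id` and hypothetical keys such as `email` and `mobile`. Email addresses are compared case-insensitively, telephone numbers after stripping non-digits, and a record with a blank value in any chosen field is left out of that rule rather than matched on the empty string. These normalization choices are assumptions, not part of the claim.

```python
import re
from collections import defaultdict

PHONE_FIELDS = ("phone", "mobile")  # hypothetical field names


def contact_key(record, fields):
    """Build a comparison key from the chosen contact fields.

    Returns None if any chosen field is blank, so empty values never match.
    """
    key = []
    for field in fields:
        value = (record.get(field) or "").strip()
        if field == "email":
            value = value.lower()
        elif field in PHONE_FIELDS:
            value = re.sub(r"\D", "", value)  # keep digits only
        if not value:
            return None
        key.append(value)
    return tuple(key)


def duplicates_by_contact(records, fields):
    """Return lists of record IDs that share the same contact key."""
    groups = defaultdict(list)
    for record in records:
        key = contact_key(record, fields)
        if key is not None:
            groups[key].append(record["id"])
    return [ids for ids in groups.values() if len(ids) > 1]
```

Because each call takes a list of field names, any combination named in the claim (for example `["email", "mobile"]`) can be used as the duplication checking basis.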
5. The enterprise information duplication checking method according to any one of claims 1 to 4,
wherein determining duplicate enterprise information by using one or a combination of the region, trade name, business scope, organizational form and organization type of the word segments as the duplication checking basis comprises:
checking the word segments for duplicates using a first rule to obtain first duplicate enterprise information;
checking the word segments for duplicates using a second rule to obtain second duplicate enterprise information;
if the first duplicate enterprise information and the second duplicate enterprise information have enterprise information in common (an information intersection), merging the first duplicate enterprise information and the second duplicate enterprise information into third duplicate enterprise information;
wherein the first rule comprises one or a combination of region, trade name, business scope, organizational form and organization type; the second rule comprises one or a combination of region, trade name, business scope, organizational form and organization type; and the first rule is different from the second rule.
6. The enterprise information duplication checking method of claim 5, wherein,
after the first duplicate enterprise information and the second duplicate enterprise information, having enterprise information in common, are merged into third duplicate enterprise information, the method further comprises:
storing the first rule and the second rule as a rule group template for subsequent use and invocation; and
retaining only one copy of completely identical records in the enterprise information to obtain enterprise duplication checking result data for users to export and use.
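The two follow-up steps of claim 6 can be sketched as follows. The sketch assumes a rule group is saved as JSON text (a name plus a list of rules, each a list of category names) and that "completely identical" means two records with exactly the same fields and values. The helper names are hypothetical.

```python
import json


def rule_group_to_json(name, rules):
    """Serialize a named rule group so it can be saved and reused later."""
    return json.dumps({"name": name, "rules": [list(r) for r in rules]},
                      ensure_ascii=False)


def rule_group_from_json(text):
    """Restore a rule group saved by rule_group_to_json."""
    data = json.loads(text)
    return data["name"], [tuple(r) for r in data["rules"]]


def keep_one_copy(records):
    """Drop records that are completely identical, keeping the first one seen."""
    seen = set()
    result = []
    for rec in records:
        key = tuple(sorted(rec.items()))  # field order does not matter
        if key not in seen:
            seen.add(key)
            result.append(rec)
    return result
```

Storing rules as plain category names keeps the template independent of any particular segmenter, so the same rule group can be reloaded and applied to a new batch of enterprise information.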
7. An enterprise information duplication checking device, characterized by comprising:
an enterprise information obtaining module, configured to obtain enterprise information, wherein the enterprise information comprises an enterprise name;
an enterprise name word segmentation module, configured to segment the enterprise name into word segments categorized as region, trade name, business scope, organizational form and organization type; and
a word segmentation combination duplication checking module, configured to determine duplicate enterprise information by using one or a combination of the region, trade name, business scope, organizational form and organization type of the word segments as the duplication checking basis.
8. The enterprise information duplication checking apparatus of claim 7,
the word segmentation combination duplication checking module comprises:
a first duplication checking unit, configured to check the word segments for duplicates using a first rule to obtain first duplicate enterprise information;
a second duplication checking unit, configured to check the word segments for duplicates using a second rule to obtain second duplicate enterprise information;
a data merging unit, configured to merge the first duplicate enterprise information and the second duplicate enterprise information into third duplicate enterprise information if the two have enterprise information in common (an information intersection);
wherein the first rule comprises one or a combination of region, trade name, business scope, organizational form and organization type; the second rule comprises one or a combination of region, trade name, business scope, organizational form and organization type; and the first rule is different from the second rule.
9. An enterprise information duplication checking device, characterized by comprising:
a memory for storing a computer program;
a processor for implementing the steps of the enterprise information duplication checking method as claimed in any one of claims 1 to 6 when executing the computer program.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the enterprise information duplication checking method according to any one of claims 1 to 6.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911018999.1A CN110750509A (en) 2019-10-24 2019-10-24 Enterprise name duplicate checking method and device, equipment and medium

Publications (1)

Publication Number Publication Date
CN110750509A true CN110750509A (en) 2020-02-04

Family

ID=69279765

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911018999.1A Pending CN110750509A (en) 2019-10-24 2019-10-24 Enterprise name duplicate checking method and device, equipment and medium

Country Status (1)

Country Link
CN (1) CN110750509A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104252507A (en) * 2013-06-28 2014-12-31 北京华傲达数据技术有限公司 Enterprise data matching method and device
CN104424202A (en) * 2013-08-21 2015-03-18 北大方正集团有限公司 Method and system for performing duplication checking on customer information in customer relationship management (CRM) system
CN108090185A (en) * 2017-12-16 2018-05-29 河北慧日信息技术有限公司 Customer information duplicate checking method
CN109165326A (en) * 2018-08-16 2019-01-08 蜜小蜂智慧(北京)科技有限公司 Character string matching method and device

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111832304A (en) * 2020-06-29 2020-10-27 上海巧房信息科技有限公司 Method and device for checking duplicate of building name, electronic equipment and storage medium
CN111832304B (en) * 2020-06-29 2024-02-27 上海巧房信息科技有限公司 Duplicate checking method and device for building names, electronic equipment and storage medium
CN112364635A (en) * 2020-11-30 2021-02-12 中国银行股份有限公司 Enterprise name duplication checking method and device
CN112364635B (en) * 2020-11-30 2023-11-21 中国银行股份有限公司 Enterprise name duplicate checking method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20200204)