CN110750509A - Enterprise name duplicate checking method and device, equipment and medium - Google Patents
- Publication number
- CN110750509A (application CN201911018999.1A)
- Authority
- CN
- China
- Prior art keywords
- enterprise information
- enterprise
- repeated
- information
- organization
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/17—Details of further file system functions
- G06F16/174—Redundancy elimination performed by the file system
- G06F16/1748—De-duplication implemented within the file system, e.g. based on file segments
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/02—Marketing; Price estimation or determination; Fundraising
- G06Q30/0201—Market modelling; Market analysis; Collecting market data
Landscapes
- Engineering & Computer Science (AREA)
- Business, Economics & Management (AREA)
- Accounting & Taxation (AREA)
- Development Economics (AREA)
- Finance (AREA)
- Strategic Management (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Entrepreneurship & Innovation (AREA)
- General Physics & Mathematics (AREA)
- Game Theory and Decision Science (AREA)
- General Business, Economics & Management (AREA)
- Marketing (AREA)
- Economics (AREA)
- Databases & Information Systems (AREA)
- General Engineering & Computer Science (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
Embodiments of the invention provide an enterprise name duplicate checking method, device, equipment and medium. The method comprises: acquiring enterprise information, wherein the enterprise information comprises an enterprise name; splitting the enterprise name into word-segmentation phrases categorized by region, trade name, business scope, organizational form and organization type; and, for the word-segmentation phrases, determining repeated enterprise information by using one or a combination of region, trade name, business scope, organizational form and organization type as the duplicate checking basis.
Description
Technical Field
Embodiments of the invention relate to the field of computer technology, and in particular to an enterprise name duplicate checking method, device, equipment and medium.
Background
In practice, when customers' enterprise data is organized and targeted marketing is directed at enterprises, the sample data contains a large amount of repeated enterprise data. An intelligent data duplicate checking system performs duplicate checking on the Chinese names of enterprises. Reducing the repetition rate of marketing data lessens the annoyance caused to the enterprises being marketed to, allows the marketing scope to be targeted more precisely, and improves the accuracy of the Chinese-language enterprise data.
With thousands of records, finding duplicate data manually is very difficult. The prior art generally uses the following schemes for duplicate checking: 1. removing records with identical enterprise names by using the duplicate removal function of Excel; 2. removing records with identical enterprise names by using database tools such as SQL; 3. removing records with identical enterprise names by using a dedicated duplicate checking tool.
The prior art has the following disadvantages: 1. Operators must master tools such as Excel, SQL, duplicate checking tools and word segmentation tools, which places high skill demands on front-line operators. 2. Operators must understand particular duplicate checking methods and combine them all to check duplicates. 3. There are many intermediate steps and the operation is complex. 4. The repetition rate of the result remains high: completely identical data can be removed, but approximately repeated data cannot be detected. 5. Resource consumption is high: if the data volume is large, several people must cooperate and confirm repeatedly, and errors occur easily.
Therefore, how to provide an enterprise name duplicate checking scheme that lowers the skill requirement on operators and checks enterprise names for duplicates conveniently and quickly is a technical problem to be solved by those skilled in the art.
Disclosure of Invention
Therefore, embodiments of the invention provide an enterprise name duplicate checking method, device, equipment and medium, which can lower the skill requirement on operators and check enterprise names for duplicates conveniently and quickly.
In order to achieve the above object, the embodiments of the present invention provide the following technical solutions:
In a first aspect, an embodiment of the present invention provides an enterprise information duplicate checking method, comprising:
acquiring enterprise information, wherein the enterprise information comprises an enterprise name;
splitting the enterprise name into word-segmentation phrases categorized by region, trade name, business scope, organizational form and organization type;
and, for the word-segmentation phrases, determining repeated enterprise information by using one or a combination of region, trade name, business scope, organizational form and organization type as the duplicate checking basis.
Preferably, after the acquiring of the enterprise information and before the splitting of the enterprise name into word-segmentation phrases categorized by region, trade name, business scope, organizational form and organization type, the method further comprises:
converting full-width or half-width data in the enterprise information into a first preset format;
converting Arabic numerals or Chinese numerals in the enterprise information into a second preset format, to obtain enterprise information in a uniform format;
converting the Chinese characters in the enterprise information into pinyin, so as to enable duplicate checking of homophones;
converting the traditional Chinese characters in the enterprise information into simplified Chinese characters, so as to enable duplicate checking across simplified and traditional characters;
wherein the first preset format is full-width or half-width, and the second preset format is Arabic numerals or Chinese numerals.
Preferably, the region comprises: country, province, city and county;
the organization type comprises: company, office, department, hall and institution.
Preferably, the enterprise information further comprises one or a combination of the contact name, email address, telephone number and mobile phone number corresponding to the enterprise name;
correspondingly, the duplicate checking method further comprises:
determining repeated enterprise information by using one or a combination of contact name, email address, telephone number and mobile phone number as the duplicate checking basis.
Preferably, the determining of repeated enterprise information for the word-segmentation phrases by using one or a combination of region, trade name, business scope, organizational form and organization type as the duplicate checking basis comprises:
checking the word-segmentation phrases for duplicates by using a first rule, to obtain first repeated enterprise information;
checking the word-segmentation phrases for duplicates by using a second rule, to obtain second repeated enterprise information;
if the first repeated enterprise information and the second repeated enterprise information share any enterprise information (an information intersection), merging the first repeated enterprise information and the second repeated enterprise information into third repeated enterprise information;
wherein the first rule comprises one or a combination of region, trade name, business scope, organizational form and organization type; the second rule comprises one or a combination of region, trade name, business scope, organizational form and organization type; and the first rule is different from the second rule.
Preferably, after the first repeated enterprise information and the second repeated enterprise information are merged into the third repeated enterprise information, the method further comprises:
storing the first rule and the second rule as a rule group template for subsequent use and invocation;
and retaining only one copy of completely repeated data in the enterprise information, to obtain enterprise duplicate checking result data that users can export and use.
In a second aspect, an embodiment of the present invention provides an enterprise information duplicate checking apparatus, comprising:
an enterprise information acquisition module, configured to acquire enterprise information, wherein the enterprise information comprises an enterprise name;
an enterprise name word segmentation module, configured to split the enterprise name into word-segmentation phrases categorized by region, trade name, business scope, organizational form and organization type;
and a word segmentation combination duplicate checking module, configured to determine repeated enterprise information for the word-segmentation phrases by using one or a combination of region, trade name, business scope, organizational form and organization type as the duplicate checking basis.
Preferably, the word segmentation combination duplicate checking module comprises:
a first duplicate checking unit, configured to check the word-segmentation phrases for duplicates by using a first rule, to obtain first repeated enterprise information;
a second duplicate checking unit, configured to check the word-segmentation phrases for duplicates by using a second rule, to obtain second repeated enterprise information;
a data merging unit, configured to merge the first repeated enterprise information and the second repeated enterprise information into third repeated enterprise information if the two share any enterprise information (an information intersection);
wherein the first rule comprises one or a combination of region, trade name, business scope, organizational form and organization type; the second rule comprises one or a combination of region, trade name, business scope, organizational form and organization type; and the first rule is different from the second rule.
In a third aspect, an embodiment of the present invention provides an enterprise information duplicate checking device, comprising:
a memory for storing a computer program;
a processor, configured to implement, when executing the computer program, the steps of the enterprise information duplicate checking method according to any implementation of the first aspect.
In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps of the enterprise information duplicate checking method according to any implementation of the first aspect.
An embodiment of the invention provides an enterprise information duplicate checking method, comprising: acquiring enterprise information, wherein the enterprise information comprises an enterprise name; splitting the enterprise name into word-segmentation phrases categorized by region, trade name, business scope, organizational form and organization type; and, for the word-segmentation phrases, determining repeated enterprise information by using one or a combination of region, trade name, business scope, organizational form and organization type as the duplicate checking basis.
The enterprise name duplicate checking method, device, equipment and medium provided by the embodiments of the invention share the beneficial effects described above, which are not repeated here.
Drawings
In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly described below. The drawings in the following description are merely exemplary, and those of ordinary skill in the art can derive other drawings from them without inventive effort.
The structures, proportions, sizes and the like shown in this specification are intended only to accompany the disclosed content so that those skilled in the art can understand and read it; they do not limit the conditions under which the invention can be implemented and therefore have no substantive technical significance. Any structural modification, change of proportion or adjustment of size that does not affect the effects and objectives achievable by the invention shall still fall within the scope covered by the technical content disclosed herein.
Fig. 1 is a flowchart of an enterprise information duplicate checking method according to an embodiment of the present invention;
Fig. 2 is a flowchart of format unification in an enterprise information duplicate checking method according to an embodiment of the present invention;
Fig. 3 is a flowchart of double duplicate checking in an enterprise information duplicate checking method according to an embodiment of the present invention;
Fig. 4 is a flowchart of data storage and invocation in an enterprise information duplicate checking method according to an embodiment of the present invention;
Fig. 5 is a practical flowchart of an enterprise information duplicate checking method according to an embodiment of the present invention;
Fig. 6 is a schematic structural diagram of an enterprise information duplicate checking apparatus according to an embodiment of the present invention;
Fig. 7 is a schematic structural diagram of the word segmentation combination duplicate checking module of an enterprise information duplicate checking apparatus according to an embodiment of the present invention;
Fig. 8 is a schematic structural diagram of an enterprise information duplicate checking device according to an embodiment of the present invention;
Fig. 9 is a schematic structural diagram of a computer-readable storage medium according to an embodiment of the present invention.
Detailed Description
The present invention is described below by way of particular embodiments; other advantages and effects of the invention will become apparent to those skilled in the art from the content disclosed in this specification. The described embodiments are merely some embodiments of the invention and are not intended to limit it to the particular forms disclosed. All other embodiments obtained by a person skilled in the art on the basis of these embodiments without creative effort shall fall within the protection scope of the present invention.
Refer to Fig. 1, Fig. 2, Fig. 3, Fig. 4 and Fig. 5, which respectively show the overall flow, the format unification flow, the double duplicate checking flow, the data storage and invocation flow, and the practical flow of an enterprise information duplicate checking method according to an embodiment of the present invention.
An embodiment of the invention provides an enterprise information duplicate checking method, comprising:
Step S11: acquiring enterprise information, wherein the enterprise information comprises an enterprise name;
Step S12: splitting the enterprise name into word-segmentation phrases categorized by region, trade name, business scope, organizational form and organization type;
Step S13: for the word-segmentation phrases, determining repeated enterprise information by using one or a combination of region, trade name, business scope, organizational form and organization type as the duplicate checking basis.
In this embodiment, the enterprise information is acquired first; for example, it may be read from an Excel spreadsheet file in which the enterprise information is stored. The enterprise information should include at least the enterprise name, and may also include other information about the enterprise, such as business information and the contact name, email address, telephone number and mobile phone number corresponding to the enterprise name.
After the enterprise information is acquired, the enterprise name in it can be split into word-segmentation phrases categorized by region, trade name, business scope, organizational form and organization type. Enterprise names generally have to follow certain specifications laid down by the national authorities. For example, in the enterprise name "Beijing Zhigua Intellectual Property Agency Company", "Beijing" is the region, "Zhigua" is the trade name, "Intellectual Property Agency" is the business scope, and "Company" gives the organizational form and organization type. The enterprise name can therefore be segmented according to these naming rules to obtain word-segmentation phrases categorized by region, trade name, business scope, organizational form and organization type.
Specifically, in China for example, the region in an enterprise name comprises country, province, city and county, so Chinese place names can be sorted by administrative level to facilitate word segmentation. Organization types in China include companies, offices, departments, halls, institutions and the like, and these types can be enumerated; among them, halls, offices and institutions are administrative bodies.
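The segmentation step can be sketched as follows. This is a minimal illustration, not the patented implementation: the lookup tables `REGIONS` and `ORG_TYPES`, the function name `split_enterprise_name` and the English, space-separated name layout are all assumptions made for readability, and the organizational form and organization type are folded into one `org_type` field. A real system would work on Chinese names with complete place-name and suffix dictionaries.

```python
from typing import Dict

# Hypothetical lookup tables; a real system would load full lists of
# place names and organization-type suffixes.
REGIONS = ["Beijing", "Liaoning", "Xi'an", "Shanghai"]
ORG_TYPES = ["Company", "Office", "Department"]


def split_enterprise_name(name: str) -> Dict[str, str]:
    """Split a name into region, trade name, business scope and organization type.

    Assumes the layout "<region> <trade name> <business scope...> <org type>",
    where the trade name is a single token.
    """
    parts = {"region": "", "trade_name": "", "business_scope": "", "org_type": ""}
    rest = name.strip()

    # Leading region, matched against the known place names.
    for region in REGIONS:
        if rest.startswith(region + " "):
            parts["region"] = region
            rest = rest[len(region):].strip()
            break

    # Trailing organization type, matched against the known suffixes.
    for org_type in ORG_TYPES:
        if rest.endswith(" " + org_type):
            parts["org_type"] = org_type
            rest = rest[: -len(org_type)].strip()
            break

    # First remaining token is the trade name; the rest is the business scope.
    tokens = rest.split()
    if tokens:
        parts["trade_name"] = tokens[0]
        parts["business_scope"] = " ".join(tokens[1:])
    return parts
```

Matching the region from the front and the organization type from the back leaves the distinctive middle part, which is why the dictionaries of place names and organization types carry most of the work.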
After the word-segmentation phrases are obtained, duplicate checking can be performed on the basis of any phrase category. For example, the region can be set as the duplicate checking basis, which yields all enterprise names belonging to the same region. This alone is not enough, because one region contains many enterprises, so the trade name can be used for further duplicate checking. For example, given the three enterprise names Beijing Zhigua Intellectual Property Agency, Liaoning Zhigua Intellectual Property Agency and Xi'an Zhigua Intellectual Property Agency, the three names are duplicates when only the trade name is used as the basis, but are not duplicates when both the region and the trade name are used.
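The effect of choosing different duplicate checking bases can be sketched by grouping records on a chosen set of fields. The function name `find_duplicates` and the field names `region` and `trade_name` (the distinctive part of the name) are hypothetical; the records are assumed to have been segmented already.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Record = Dict[str, str]


def find_duplicates(records: List[Record], keys: Sequence[str]) -> List[List[Record]]:
    """Group records that agree on every field in `keys`; return groups of 2+."""
    groups: Dict[Tuple[str, ...], List[Record]] = defaultdict(list)
    for record in records:
        groups[tuple(record.get(k, "") for k in keys)].append(record)
    return [group for group in groups.values() if len(group) > 1]


records = [
    {"region": "Beijing", "trade_name": "Zhigua"},
    {"region": "Liaoning", "trade_name": "Zhigua"},
    {"region": "Xi'an", "trade_name": "Zhigua"},
]

# One group of three: the names repeat when only the trade name is compared.
by_trade_name = find_duplicates(records, ["trade_name"])
# No groups: adding the region to the basis separates the three names.
by_region_and_trade_name = find_duplicates(records, ["region", "trade_name"])
```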
The enterprise information may further comprise one or a combination of the contact name, email address, telephone number and mobile phone number corresponding to the enterprise name; correspondingly, the duplicate checking method further comprises: determining repeated enterprise information by using one or a combination of contact name, email address, telephone number and mobile phone number as the duplicate checking basis. That is to say, duplicates can be checked not only with the word-segmentation phrases obtained from the enterprise name, but also with the contact name, email address, telephone number, mobile phone number and the like.
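Contact-based checking can be sketched in the same way. The light normalization below (digits only for phone numbers, lower case for other fields) is an assumption added for illustration and is not described in the source; the field names `name`, `phone`, `mobile` and `email` and both function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def contact_key(record: Dict[str, str], field: str) -> str:
    """Normalize a contact field so trivially different spellings compare equal."""
    value = record.get(field, "").strip()
    if field in ("phone", "mobile"):
        return "".join(ch for ch in value if ch.isdigit())  # keep digits only
    return value.lower()  # names and email addresses: ignore case


def duplicates_by_contact(records: List[Dict[str, str]], field: str) -> List[List[str]]:
    """Return lists of enterprise names whose `field` matches after normalization."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for record in records:
        key = contact_key(record, field)
        if key:  # skip records where the field is empty
            groups[key].append(record["name"])
    return [names for names in groups.values() if len(names) > 1]
```

Records with an empty contact field are skipped so that missing data is never treated as a match.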
Further, because enterprise information acquired from different channels reflects different input methods and input habits, the same meaning may be written in different ways. For example, "Beijing 108 Middle School" may also be entered as "Beijing One-Zero-Eight Middle School", and English characters may be entered in either half-width or full-width form. If these differences are not standardized, enterprise names cannot be checked for duplicates effectively. Therefore, after the enterprise information is acquired and before the enterprise name is split into word-segmentation phrases categorized by region, trade name, business scope, organizational form and organization type, the following steps may be carried out to unify the input format:
Step S21: converting full-width or half-width data in the enterprise information into a first preset format;
Step S22: converting Arabic numerals or Chinese numerals in the enterprise information into a second preset format, to obtain enterprise information in a uniform format;
Step S23: converting the Chinese characters in the enterprise information into pinyin, so as to enable duplicate checking of homophones;
Step S24: converting the traditional Chinese characters in the enterprise information into simplified Chinese characters, so as to enable duplicate checking across simplified and traditional characters;
wherein the first preset format is full-width or half-width, and the second preset format is Arabic numerals or Chinese numerals.
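Steps S21, S22 and S24 can be sketched with the Python standard library. This is a simplified illustration under stated assumptions: half-width is chosen as the first preset format and Arabic numerals as the second; the two translation tables are tiny hypothetical samples; Chinese numerals are converted digit by digit, so positional forms are not handled and a numeral character inside an ordinary word would also be converted. Step S23 (pinyin conversion) is omitted because it needs a pronunciation dictionary, typically supplied by a third-party library. The Chinese sample strings are assumed renderings of the example above.

```python
import unicodedata

# Hypothetical digit table for digit-by-digit conversion only.
CHINESE_DIGITS = str.maketrans("零〇一二三四五六七八九", "00123456789")

# Tiny illustrative traditional-to-simplified table; a real system would
# use a complete mapping.
TRADITIONAL_TO_SIMPLIFIED = str.maketrans("學國產權", "学国产权")


def normalize_name(name: str) -> str:
    """Bring an enterprise name into one uniform format before segmentation."""
    # S21: NFKC folds full-width letters, digits and punctuation to half-width.
    text = unicodedata.normalize("NFKC", name)
    # S22: Chinese numerals to Arabic numerals, one digit at a time.
    text = text.translate(CHINESE_DIGITS)
    # S24: traditional characters to simplified characters.
    return text.translate(TRADITIONAL_TO_SIMPLIFIED)
```

After normalization, names that differ only in width, numeral style or character set compare equal, so the later grouping steps can use plain string equality.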
Furthermore, in practice, screening the enterprise names with two different combinations of word-segmentation phrase categories as rules yields different duplicate checking results. The results obtained under the two rules can then be processed further so that duplicates are checked more thoroughly. To determine repeated enterprise information by using one or a combination of region, trade name, business scope, organizational form and organization type as the duplicate checking basis, the following steps can be implemented:
Step S31: checking the word-segmentation phrases for duplicates by using a first rule, to obtain first repeated enterprise information;
Step S32: checking the word-segmentation phrases for duplicates by using a second rule, to obtain second repeated enterprise information;
Step S33: if the first repeated enterprise information and the second repeated enterprise information share any enterprise information (an information intersection), merging the first repeated enterprise information and the second repeated enterprise information into third repeated enterprise information;
wherein the first rule comprises one or a combination of region, trade name, business scope, organizational form and organization type; the second rule comprises one or a combination of region, trade name, business scope, organizational form and organization type; and the first rule is different from the second rule. For example, suppose the first repeated enterprise information is: Beijing Zhigua Intellectual Property Agency and Xi'an Zhigua Intellectual Property Agency; and the second repeated enterprise information is: Beijing Zhigua Intellectual Property Agency and Shanghai Zhigua Intellectual Property Agency. The two sets share the enterprise name Beijing Zhigua Intellectual Property Agency, so an information intersection exists between them. The first and second repeated enterprise information can then be merged to obtain the third repeated enterprise information: Beijing Zhigua Intellectual Property Agency, Xi'an Zhigua Intellectual Property Agency and Shanghai Zhigua Intellectual Property Agency.
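The merging in step S33 can be sketched as a union of groups that share at least one record. The function name `merge_groups` is hypothetical, and the sketch generalizes slightly beyond the source by accepting any number of groups, so that chains of overlapping groups also end up merged.

```python
from typing import Iterable, List, Set


def merge_groups(groups: Iterable[Iterable[str]]) -> List[Set[str]]:
    """Merge duplicate groups that intersect; disjoint groups stay separate."""
    merged: List[Set[str]] = []
    for group in groups:
        current = set(group)
        # Absorb every already-merged group that shares a record with this one.
        overlapping = [m for m in merged if m & current]
        for m in overlapping:
            current |= m
            merged.remove(m)
        merged.append(current)
    return merged


first = {"Beijing Zhigua Intellectual Property Agency",
         "Xi'an Zhigua Intellectual Property Agency"}
second = {"Beijing Zhigua Intellectual Property Agency",
          "Shanghai Zhigua Intellectual Property Agency"}

# The shared Beijing record links the two groups into one group of three names.
third = merge_groups([first, second])
```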
It should be noted that after duplicate checking of the enterprise information, that is, after the first and second repeated enterprise information are merged into the third repeated enterprise information, the rules may also be stored as a rule group template for future use. Specifically, the following steps may be performed:
Step S41: storing the first rule and the second rule as a rule group template for subsequent use and invocation;
Step S42: retaining only one copy of completely repeated data in the enterprise information, to obtain enterprise duplicate checking result data that users can export and use.
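Steps S41 and S42 can be sketched as follows. Storing the rule group as a JSON file is an assumption (the source does not specify a storage format), and the function names and the `rules` key are hypothetical. Each rule is represented as a list of field names.

```python
import json
from typing import Dict, List


def save_rule_group(path: str, rules: List[List[str]]) -> None:
    """S41: store a group of rules as a reusable template."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump({"rules": rules}, handle, ensure_ascii=False, indent=2)


def load_rule_group(path: str) -> List[List[str]]:
    """Load a previously stored rule group template."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)["rules"]


def drop_exact_duplicates(records: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """S42: keep the first copy of each completely identical record."""
    seen = set()
    unique = []
    for record in records:
        key = tuple(sorted(record.items()))  # order-independent fingerprint
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique
```

A stored template such as `[["trade_name"], ["region", "business_scope"]]` can then be reloaded and applied to a new data set without redefining the rules.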
In summary, the enterprise information duplicate checking method provided by this embodiment comprises: acquiring enterprise information, wherein the enterprise information comprises an enterprise name; splitting the enterprise name into word-segmentation phrases categorized by region, trade name, business scope, organizational form and organization type; and, for the word-segmentation phrases, determining repeated enterprise information by using one or a combination of region, trade name, business scope, organizational form and organization type as the duplicate checking basis.
Refer to Fig. 6 and Fig. 7. Fig. 6 is a schematic structural diagram of an enterprise information duplicate checking apparatus according to an embodiment of the present invention; Fig. 7 is a schematic structural diagram of the word segmentation combination duplicate checking module of an enterprise information duplicate checking apparatus according to an embodiment of the present invention.
In a second aspect, an embodiment of the present invention provides an enterprise information duplicate checking apparatus 600, comprising:
an enterprise information acquisition module 610, configured to acquire enterprise information, wherein the enterprise information comprises an enterprise name;
an enterprise name word segmentation module 620, configured to split the enterprise name into word-segmentation phrases categorized by region, trade name, business scope, organizational form and organization type;
and a word segmentation combination duplicate checking module 630, configured to determine repeated enterprise information for the word-segmentation phrases by using one or a combination of region, trade name, business scope, organizational form and organization type as the duplicate checking basis.
Preferably, the word segmentation combination duplicate checking module 630 comprises:
a first duplicate checking unit 631, configured to check the word-segmentation phrases for duplicates by using a first rule, to obtain first repeated enterprise information;
a second duplicate checking unit 632, configured to check the word-segmentation phrases for duplicates by using a second rule, to obtain second repeated enterprise information;
a data merging unit 633, configured to merge the first repeated enterprise information and the second repeated enterprise information into third repeated enterprise information if the two share any enterprise information (an information intersection);
wherein the first rule comprises one or a combination of region, trade name, business scope, organizational form and organization type; the second rule comprises one or a combination of region, trade name, business scope, organizational form and organization type; and the first rule is different from the second rule.
Refer to Fig. 8 and Fig. 9. Fig. 8 is a schematic structural diagram of an enterprise information duplicate checking device according to an embodiment of the present invention; Fig. 9 is a schematic structural diagram of a computer-readable storage medium according to an embodiment of the present invention.
An embodiment of the present invention provides an enterprise information duplicate checking device 800, comprising:
a memory 810 for storing a computer program;
a processor 820, configured to implement, when executing the computer program, the steps of any enterprise information duplicate checking method according to the first aspect described above. The memory 810 has storage space for program code which, when executed by the processor 820, implements any of the methods of the embodiments of the invention.
An embodiment of the present invention further provides a computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps of the enterprise information duplicate checking method according to any of the above embodiments.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described apparatuses, devices and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus, device and method may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative: the division into units is only a logical division, and an actual implementation may divide them differently; a plurality of units or components may be combined or integrated into another apparatus, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
Units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units; that is, they may be located in one place or distributed over a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present application, in essence, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a function calling device, or a network device) to execute all or part of the steps of the method of the embodiments of the present application.
Although the invention has been described in detail above with reference to a general description and specific examples, it will be apparent to one skilled in the art that modifications or improvements may be made thereto based on the invention. Accordingly, such modifications and improvements are intended to be within the scope of the invention as claimed.
Claims (10)
1. An enterprise information duplication checking method is characterized by comprising the following steps:
acquiring enterprise information, wherein the enterprise information comprises: an enterprise name;
splitting the enterprise name into word segmentation phrases categorized by region, trade name, business scope, organizational form and organization type;
and determining repeated enterprise information from the word segmentation phrases by using one or a combination of more of region, trade name, business scope, organizational form and organization type as the duplication checking basis.
2. The enterprise information duplication checking method of claim 1,
after the acquiring of the enterprise information and before the splitting of the enterprise name into word segmentation phrases categorized by region, trade name, business scope, organizational form and organization type, the method further comprises the following steps:
converting full-width data or half-width data in the enterprise information into a first preset format;
converting Arabic numerals or Chinese numerals in the enterprise information into a second preset format to obtain enterprise information with a uniform format;
converting the Chinese characters in the enterprise information into pinyin so as to enable duplication checking of homophones;
converting the traditional Chinese characters in the enterprise information into simplified Chinese characters so as to enable duplication checking across simplified and traditional Chinese characters;
wherein the first preset format is full-width data or half-width data, and the second preset format is Arabic numerals or Chinese numerals.
3. The enterprise information duplication checking method of claim 1,
the region comprises: country, province, city, county;
the organization type comprises: company, office, department, hall, office, organization, office.
4. The enterprise information duplication checking method of claim 1,
the enterprise information further comprises: one or a combination of more of contact names, email addresses, telephone numbers and mobile phone numbers corresponding to the enterprise names;
correspondingly, the duplication checking method further comprises the following steps:
determining repeated enterprise information by using one or a combination of more of contact names, email addresses, telephone numbers and mobile phone numbers as the duplication checking basis.
5. The enterprise information duplication checking method according to any one of claims 1 to 4,
the determining of repeated enterprise information from the word segmentation phrases by using one or a combination of more of region, trade name, business scope, organizational form and organization type as the duplication checking basis comprises the following steps:
performing duplication checking on the word segmentation phrases by using a first rule to obtain first repeated enterprise information;
performing duplication checking on the word segmentation phrases by using a second rule to obtain second repeated enterprise information;
if enterprise information with information intersection exists in the first repeated enterprise information and the second repeated enterprise information, merging the first repeated enterprise information and the second repeated enterprise information into third repeated enterprise information;
wherein the first rule comprises one or a combination of more of: region, trade name, business scope, organizational form, and organization type; the second rule comprises one or a combination of more of: region, trade name, business scope, organizational form, and organization type; and the first rule is different from the second rule.
6. The enterprise information duplication checking method of claim 5 wherein,
after the merging of the first repeated enterprise information and the second repeated enterprise information into third repeated enterprise information if enterprise information with information intersection exists in the first repeated enterprise information and the second repeated enterprise information, the method further includes:
storing the first rule and the second rule as a rule group template for subsequent use and invocation;
and retaining only one copy of completely repeated data in the enterprise information to obtain enterprise duplication checking result data, so that users can export and use it.
7. An enterprise information duplication checking device is characterized by comprising:
an enterprise information obtaining module, configured to obtain enterprise information, where the enterprise information includes: an enterprise name;
an enterprise name word segmentation module, configured to split the enterprise name into word segmentation phrases categorized by region, trade name, business scope, organizational form and organization type;
and a word segmentation combination duplication checking module, configured to determine repeated enterprise information from the word segmentation phrases by using one or a combination of more of region, trade name, business scope, organizational form and organization type as the duplication checking basis.
8. The enterprise information duplication checking apparatus of claim 7,
the word segmentation combination duplication checking module comprises:
a first duplication checking unit, configured to perform duplication checking on the word segmentation phrases by using a first rule to obtain first repeated enterprise information;
a second duplication checking unit, configured to perform duplication checking on the word segmentation phrases by using a second rule to obtain second repeated enterprise information;
a data merging unit, configured to merge the first repeated enterprise information and the second repeated enterprise information into third repeated enterprise information if there is enterprise information with information intersection in the first repeated enterprise information and the second repeated enterprise information;
wherein the first rule comprises one or a combination of more of: region, trade name, business scope, organizational form, and organization type; the second rule comprises one or a combination of more of: region, trade name, business scope, organizational form, and organization type; and the first rule is different from the second rule.
9. An enterprise information duplication checking device is characterized by comprising:
a memory for storing a computer program;
a processor for implementing the steps of the enterprise information duplication checking method as claimed in any one of claims 1 to 6 when executing the computer program.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the enterprise information duplication checking method according to any one of claims 1 to 6.
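The format-unification steps of claim 2 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: it picks half-width characters as the first preset format and Arabic numerals as the second, the function `normalize_name` and the digit table are hypothetical, and the sample name is invented. The pinyin and traditional-to-simplified conversions are omitted because they need a dictionary or third-party library.

```python
import unicodedata

# Hypothetical digit table: single Chinese numeral characters mapped to Arabic digits.
# Positional numerals such as 十 or 百 are not handled, and a numeral character that
# is part of an ordinary word would be converted as well.
_CN_DIGITS = str.maketrans("〇零一二三四五六七八九", "00123456789")

def normalize_name(name: str) -> str:
    """Return the name with half-width characters and Arabic digits."""
    # NFKC normalization folds full-width letters, digits and punctuation
    # into their half-width equivalents.
    half_width = unicodedata.normalize("NFKC", name)
    # Replace each Chinese numeral character with the matching Arabic digit.
    return half_width.translate(_CN_DIGITS)

example = normalize_name("ＡＢＣ第三建筑（北京）１２")
```

Because every name passes through the same function before comparison, two spellings that differ only in character width or numeral style yield the same string, which is what the later duplication check relies on.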
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911018999.1A CN110750509A (en) | 2019-10-24 | 2019-10-24 | Enterprise name duplicate checking method and device, equipment and medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110750509A true CN110750509A (en) | 2020-02-04 |
Family
ID=69279765
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201911018999.1A Pending CN110750509A (en) | 2019-10-24 | 2019-10-24 | Enterprise name duplicate checking method and device, equipment and medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110750509A (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104252507A (en) * | 2013-06-28 | 2014-12-31 | 北京华傲达数据技术有限公司 | Enterprise data matching method and device |
CN104424202A (en) * | 2013-08-21 | 2015-03-18 | 北大方正集团有限公司 | Method and system for performing duplication checking on customer information in customer relationship management (CRM) system |
CN108090185A (en) * | 2017-12-16 | 2018-05-29 | 河北慧日信息技术有限公司 | A kind of customer information duplicate checking method |
CN109165326A (en) * | 2018-08-16 | 2019-01-08 | 蜜小蜂智慧(北京)科技有限公司 | A kind of character string matching method and device |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111832304A (en) * | 2020-06-29 | 2020-10-27 | 上海巧房信息科技有限公司 | Method and device for checking duplicate of building name, electronic equipment and storage medium |
CN111832304B (en) * | 2020-06-29 | 2024-02-27 | 上海巧房信息科技有限公司 | Weight checking method and device for building names, electronic equipment and storage medium |
CN112364635A (en) * | 2020-11-30 | 2021-02-12 | 中国银行股份有限公司 | Enterprise name duplication checking method and device |
CN112364635B (en) * | 2020-11-30 | 2023-11-21 | 中国银行股份有限公司 | Enterprise name duplicate checking method and device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20200204 |