CN114861787A - Method and device for acquiring company under name of person to be inquired under condition of duplicate name - Google Patents

Publication number
CN114861787A
Authority
CN
China
Prior art keywords
company
name
group
under
inquired
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210459194.6A
Other languages
Chinese (zh)
Inventor
马大蔚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Jindi Technology Co Ltd
Original Assignee
Beijing Jindi Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Jindi Technology Co Ltd
Priority to CN202210459194.6A
Publication of CN114861787A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Databases & Information Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Quality & Reliability (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

Disclosed are a method and a device for acquiring the companies under the name of a person to be queried when duplicate names exist, together with a storage medium and an electronic device. The method comprises: acquiring each company under the names of the duplicate-name persons and acquiring the feature data of each such company, wherein the duplicate-name persons comprise the person to be queried and the persons who share that person's name; forming company groups based on the association features among the companies under the names of the duplicate-name persons and gathering the company groups into a company group set, wherein feature data that is identical across different companies is referred to as an association feature; and comparing the similarity between the company group containing a company under the name of the person to be queried and the other company groups in the company group set, and finding each company group that meets the similarity requirement according to the comparison result, thereby obtaining each company under the name of the person to be queried. The method makes the acquired data on the companies held by a business owner who shares a name with others more comprehensive and accurate, meets business query requirements, and greatly improves the query experience and satisfaction of users.

Description

Method and device for acquiring company under name of person to be inquired under condition of duplicate name
Technical Field
The invention relates to the technical field of querying the companies under a person's name, and in particular to a method and a device for acquiring the companies under the name of a person to be queried when duplicate names exist, a storage medium, and an electronic device.
Background
In daily life and in business activities, people often need to look up a company's legal representative, senior executives, and similar persons, and to look up the other companies under the name of such a person. When other persons share the name of the person to be queried, it must be determined whether same-name persons at two or more companies are the same natural person; this is generally judged from the association features between the companies.
The calculation method used by peer vendors obtains the association features of a company under the name of a same-name business owner, including the company's mailbox, telephone, address, other personnel, and similar information, and then searches whether other companies under the same owner name have those association features. If two companies share an association feature, the two companies are considered associated and their same-name owners are considered the same person; the search continues until all companies with association features are found.
However, the results of this existing query method are neither accurate nor comprehensive enough: not all of the necessary associations are found during the query, and potential companies that have no association feature are easily missed, so some companies are omitted.
In view of the above, a query method is needed that solves the prior-art problem that the companies under the name of a person to be queried cannot be found accurately and comprehensively when duplicate names exist.
Disclosure of Invention
In order to solve the problems that in the prior art, a query result is not accurate and comprehensive enough, and potential companies without associated features are easily missed in a query process, the embodiment of the invention provides a method and a device for acquiring companies under the name of a person to be queried under the condition of a duplicate name, a storage medium and electronic equipment.
According to a first aspect of the embodiments of the present invention, there is provided a method for acquiring the companies under the name of a person to be queried when duplicate names exist, the method including:
acquiring each company under the names of the duplicate-name persons and acquiring the feature data of each such company, wherein the duplicate-name persons comprise the person to be queried and the persons having the same name as the person to be queried;
forming company groups based on the association features among the companies under the names of the duplicate-name persons, and gathering the plurality of company groups into a company group set, wherein feature data that is identical across different companies is referred to as an association feature;
and comparing the similarity between the company group containing a company under the name of the person to be queried and the other company groups in the company group set, and finding each company group that meets the similarity requirement according to the comparison result, so as to obtain each company under the name of the person to be queried.
Preferably, before the company groups are formed based on the association features among the companies under the names of the duplicate-name persons and gathered into the company group set,
each item of the companies' feature data is first cleaned and filtered to remove feature data that is not suitable for use as an association feature.
Preferably, the objects of the cleaning and filtering comprise pseudo-registered feature data and group-registered feature data.
Preferably, the feature data includes telephone, mailbox, associated persons, investment relationship, address, industry, and business scope.
Preferably, forming company groups based on the association features among the companies under the names of the duplicate-name persons, and gathering the plurality of company groups into a company group set, includes:
associating two companies with association characteristics to form a company pair;
combining the company pairs with the associated characteristics into a group through connectivity to form the company group, and gathering a plurality of company groups into a company group set.
Preferably, merging the pairs of companies with the associated characteristics into a group through connectivity comprises:
if one company pair and the other company pair comprise the same company, the two company pairs are judged to have connectivity, and a plurality of company pairs with connectivity are combined into a group through the connectivity.
Preferably, comparing the similarity between the company group containing a company under the name of the person to be queried and the other company groups in the company group set, and finding each company group that meets the similarity requirement according to the comparison result, so as to obtain each company under the name of the person to be queried, includes:
calculating a feature vector of each company group in the company group set;
for a company under the name of the person to be queried, determining the company group in which that company is located, and calculating the feature vector similarity between that company group and each of the other company groups in the company group set;
if the calculated feature vector similarity is greater than a similarity threshold, determining that the company group in which the company under the name of the person to be queried is located and the compared company group both belong to the person to be queried;
and collecting the company groups determined to belong to the person to be queried to form the company group set under the name of the person to be queried, so as to obtain each company under the name of the person to be queried.
Preferably, calculating the feature vector of each company group in the company group set comprises:
carrying out intra-group fusion on the characteristic data of the intra-group companies of each company group in the company group set to form the characteristics of the company group;
and performing feature vectorization on the formed features of the company group to obtain a feature vector of the company group.
Preferably, performing feature vectorization on the formed features of the company group comprises:
encoding the feature data with one-hot encoding,
and/or,
vectorizing the feature data with the doc2vec method.
Preferably, comparing the similarity between the company group containing a company under the name of the person to be queried and the other company groups in the company group set, and finding each company group that meets the similarity requirement according to the comparison result, so as to obtain each company under the name of the person to be queried, includes:
calculating the feature vector of each company within each company group in the company group set;
calculating the cluster center of the feature vectors of the companies within each company group;
for a company under the name of the person to be queried, determining the company group in which that company is located, and calculating the distance between the cluster center of each other company group in the company group set and the cluster center of that company group;
if the distance is smaller than a distance threshold, determining that the company group taking part in the distance comparison is similar to the company group in which the company under the name of the person to be queried is located, and that both belong to the person to be queried;
and collecting the companies determined to belong to the person to be queried to form the company group set under the name of the person to be queried, so as to obtain each company under the name of the person to be queried.
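As a non-limiting illustration, the cluster-center comparison described above might be sketched in Python as follows. The sketch assumes that the cluster center of a group is the arithmetic mean of its company feature vectors and that distance is Euclidean; the text specifies neither choice. The function names, the input layout, and the threshold value are hypothetical.

```python
import math


def centroid(vectors):
    """Arithmetic mean of equal-length vectors (assumed cluster center)."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def groups_within_distance(group_company_vectors, target_id, max_distance):
    """Return ids of groups whose center lies closer than max_distance
    to the center of the target group.

    group_company_vectors maps a group id to the list of feature vectors
    of the companies in that group.
    """
    centers = {gid: centroid(vs) for gid, vs in group_company_vectors.items()}
    target = centers[target_id]
    return [
        gid
        for gid, center in centers.items()
        # Euclidean distance is an assumption made for this sketch.
        if gid != target_id and math.dist(target, center) < max_distance
    ]


# Hypothetical two-dimensional company vectors for three groups.
example_groups = {
    "S1": [[0.0, 0.0], [2.0, 0.0]],
    "S2": [[1.0, 1.0]],
    "S3": [[10.0, 10.0], [12.0, 10.0]],
}
near_groups = groups_within_distance(example_groups, "S1", max_distance=1.5)
```

The groups returned here would then be merged with the target group to obtain the companies attributed to the person to be queried.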
According to a second aspect of the embodiments of the present invention, there is provided a device for acquiring the companies under the name of a person to be queried when duplicate names exist, the device including:
an acquisition unit, configured to acquire each company under the names of the duplicate-name persons and to acquire the feature data of each such company, wherein the duplicate-name persons comprise the person to be queried and the persons having the same name as the person to be queried;
a gathering unit, configured to form company groups based on the association features among the companies under the names of the duplicate-name persons and to gather the plurality of company groups into a company group set, wherein feature data that is identical across different companies is referred to as an association feature;
and a searching unit, configured to compare the similarity between the company group containing a company under the name of the person to be queried and the other company groups in the company group set, and to find each company group that meets the similarity requirement according to the comparison result, so as to obtain each company under the name of the person to be queried.
According to a third aspect of embodiments of the present invention, there is provided a computer-readable storage medium storing a computer program for executing the above-described method of acquiring a company under a person to be queried in which a duplicate name exists.
According to a fourth aspect of embodiments of the present invention, there is provided an electronic apparatus, including:
a processor;
a memory for storing executable instructions of the processor;
the processor is used for reading the executable instruction from the memory and executing the instruction to realize the method for acquiring the company under the name of the person to be inquired with the duplicate name.
Based on the method and device for acquiring the companies under the name of a person to be queried when duplicate names exist, the storage medium, and the electronic device provided by the above embodiments of the invention, the method comprises: acquiring each company under the names of the duplicate-name persons and acquiring the feature data of each such company, wherein the duplicate-name persons comprise the person to be queried and the persons having the same name as the person to be queried; forming company groups based on the association features among the companies under the names of the duplicate-name persons, and gathering the plurality of company groups into a company group set, wherein feature data that is identical across different companies is referred to as an association feature; and comparing the similarity between the company group containing a company under the name of the person to be queried and the other company groups in the company group set, and finding each company group that meets the similarity requirement according to the comparison result, so as to obtain each company under the name of the person to be queried.
In this technical solution, a company group set is obtained by computing the association feature data of the companies under the names of the duplicate-name persons; the company group containing a company under the name of the person to be queried is then compared for similarity with the other company groups in the set, and each company group meeting the similarity requirement is found from the comparison result. All companies under the name of the person to be queried are thereby obtained, the omission of potential companies that have no association feature is effectively avoided, the acquired data on the companies under the name of a same-name business owner is more comprehensive and accurate, business query requirements are met, and the query experience and satisfaction of users are greatly improved.
Drawings
The above and other objects, features and advantages of the present invention will become more apparent by describing in more detail embodiments of the present invention with reference to the attached drawings. The accompanying drawings are included to provide a further understanding of the embodiments of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention and not to limit the invention. In the drawings, like reference numbers generally represent like parts or steps.
FIG. 1 is a flowchart illustrating a method for obtaining a company under a name of a person to be queried in the presence of a duplicate name according to an exemplary embodiment of the invention;
FIG. 2 is a schematic structural diagram of an apparatus for obtaining a company under the name of a person to be queried in the presence of a duplicate name according to an exemplary embodiment of the present invention;
FIG. 3 is a schematic structural diagram of an electronic device according to an exemplary embodiment of the present invention.
Detailed Description
Hereinafter, example embodiments according to the present invention will be described in detail with reference to the accompanying drawings. It is to be understood that the described embodiments are merely a subset of embodiments of the invention and not all embodiments of the invention, with the understanding that the invention is not limited to the example embodiments described herein.
It should be noted that: the relative arrangement of the components and steps, the numerical expressions and numerical values set forth in these embodiments do not limit the scope of the present invention unless specifically stated otherwise.
It will be understood by those of skill in the art that the terms "first," "second," and the like in the embodiments of the present invention are used merely to distinguish one element, step, device, module, or the like from another element, and do not denote any particular technical or logical order therebetween.
It should also be understood that in embodiments of the present invention, "a plurality" may refer to two or more and "at least one" may refer to one, two or more.
It is also to be understood that any reference to any component, data, or structure in the embodiments of the invention may be generally understood as one or more, unless explicitly defined otherwise or stated to the contrary hereinafter.
In addition, the term "and/or" in the present invention merely describes an association between associated objects and indicates that three relationships are possible; for example, A and/or B may mean: A exists alone, A and B exist simultaneously, or B exists alone. In the present invention, the character "/" generally indicates that the objects before and after it are in an "or" relationship.
It should also be understood that the description of the embodiments of the present invention emphasizes the differences between the embodiments, and the same or similar parts may be referred to each other, and are not repeated herein for brevity.
Meanwhile, it should be understood that the sizes of the respective portions shown in the drawings are not drawn in an actual proportional relationship for the convenience of description.
The following description of at least one exemplary embodiment is merely illustrative in nature and is in no way intended to limit the invention, its application, or uses.
Techniques, methods, and apparatus known to those of ordinary skill in the relevant art may not be discussed in detail, but are intended to be part of the specification where appropriate.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, further discussion thereof is not required in subsequent figures.
Embodiments of the invention are operational with numerous other general purpose or special purpose computing system environments or configurations, and with numerous other electronic devices, such as terminal devices, computer systems, servers, etc. Examples of well known terminal devices, computing systems, environments, and/or configurations that may be suitable for use with electronic devices, such as terminal devices, computer systems, servers, and the like, include, but are not limited to: personal computer systems, server computer systems, thin clients, thick clients, hand-held or laptop devices, microprocessor-based systems, set-top boxes, programmable consumer electronics, networked personal computers, minicomputer systems, mainframe computer systems, distributed cloud computing environments that include any of the above, and the like.
Electronic devices such as terminal devices, computer systems, servers, etc. may be described in the general context of computer system-executable instructions, such as program modules, being executed by a computer system. Generally, program modules may include routines, programs, objects, components, logic, data structures, etc. that perform particular tasks or implement particular abstract data types. The computer system/server may be practiced in distributed cloud computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed cloud computing environment, program modules may be located in both local and remote computer system storage media including memory storage devices.
Exemplary method
Fig. 1 is a flowchart illustrating a method for obtaining a company under a name of a person to be queried in the presence of a duplicate name according to an exemplary embodiment of the present invention. The embodiment can be applied to electronic equipment, and as shown in fig. 1, the method for acquiring a company under the name of a person to be queried, where a duplicate name exists, includes the following steps:
Step S01, acquiring each company under the names of the duplicate-name persons and acquiring the feature data of each such company, wherein the duplicate-name persons comprise the person to be queried and the persons having the same name;
In this step, each company under the names of the person to be queried and of the persons sharing that name is obtained, and the feature data of each such company is then obtained; the feature data includes, but is not limited to, telephone, mailbox, associated persons, investment relationship, address, industry, and business scope.
Step S02, cleaning and filtering each item of the companies' feature data to filter out feature data that is not suitable for use as an association feature;
In this step, each item of the companies' feature data is cleaned and filtered, and feature data that is not suitable for use as an association feature is filtered out. The objects of cleaning and filtering include, but are not limited to, pseudo-registered feature data and group-registered feature data.
Pseudo-registered feature data, for example a common digit string used in place of real telephone information (such as "123456" or "111111" entered as the company's telephone at registration), needs to be cleaned; such pseudo-registration information is removed by establishing a blacklist. The same cleaning method also applies to other feature data such as mailboxes and personal names; for names, very common names and obviously fabricated names such as Zhang San and Li Si can be removed.
Group-registered feature data usually involves a very large number of registrations and can be cleaned by counting registrations. Specifically, a registration-count threshold is set for identical feature data; if the number of registrations of a given item of feature data exceeds the threshold, that item is judged to be group-registered feature data. For example, if more than 100 companies are registered with the same telephone, that telephone is treated as group-registered feature data and cannot be used as an association feature.
This cleaning and filtering step removes feature data that is not suitable for use as an association feature, which avoids the heavy processing load and inaccurate results that invalid feature data would otherwise cause downstream, reduces false associations, and greatly improves the efficiency of subsequent data processing and the accuracy of the results.
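By way of illustration only, the cleaning and filtering of step S02 could be sketched in Python as follows for the telephone feature. The blacklist entries and the threshold of 100 are the examples given in the description; the function name, the dictionary layout, and the choice to represent a removed value as None are assumptions made for this sketch.

```python
from collections import Counter

# Example placeholder values named in the description; a real blacklist
# would be maintained separately (hypothetical constant name).
PHONE_BLACKLIST = {"123456", "111111"}

# Example threshold from the description: more than 100 companies sharing
# one value marks that value as group-registered.
GROUP_REGISTRATION_THRESHOLD = 100


def clean_phones(company_phones, blacklist=PHONE_BLACKLIST,
                 threshold=GROUP_REGISTRATION_THRESHOLD):
    """Return a copy of company_phones with unsuitable values set to None.

    company_phones maps a company id to its registered phone (or None).
    """
    # Count how many companies registered each phone value.
    counts = Counter(p for p in company_phones.values() if p is not None)
    cleaned = {}
    for company, phone in company_phones.items():
        if phone in blacklist:
            cleaned[company] = None  # pseudo-registered value
        elif phone is not None and counts[phone] > threshold:
            cleaned[company] = None  # group-registered value
        else:
            cleaned[company] = phone
    return cleaned
```

The same pattern would apply to mailboxes or personal names by swapping in a different blacklist and threshold.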
Step S03, forming company groups based on the association features among the companies under the names of the duplicate-name persons, and gathering the plurality of company groups into a company group set, wherein feature data that is identical across different companies is referred to as an association feature;
In this step, forming the company groups and gathering them into a company group set further includes step S031 and step S032, as follows:
Step S031: associating two companies that share an association feature to form a company pair;
Two companies that share an association feature, for example the same telephone, the same mailbox, the same other personnel, or the same investment relationship, are associated so that they form a company pair. A single shared association feature is sufficient for two companies to be associated into a company pair.
Step S032: combining the company pairs with the associated characteristics into a group through connectivity to form the company group, and gathering a plurality of company groups into a company group set.
In this step, if one company pair and another company pair include the same company, the two company pairs are determined to have connectivity. A plurality of company pairs having connectivity are merged into one group through that connectivity, that is, a company group is formed, and the plurality of company groups are gathered into a company group set $\{S_1, S_2, S_3, \ldots, S_n\}$, where n is an integer greater than 1.
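Steps S031 and S032 amount to finding connected components in a graph whose edges are company pairs. One possible sketch, assuming a union-find structure for the merge (the description does not name a specific algorithm), is given below. The function names and the input layout are hypothetical, and treating a company with no pair as a group of its own is also an assumption.

```python
from collections import defaultdict
from itertools import combinations


def build_company_pairs(features):
    """Pair up companies that share at least one (feature type, value).

    features maps company id -> {feature type: value}; None values,
    for example values removed by cleaning, are skipped.
    """
    holders = defaultdict(set)
    for company, feats in features.items():
        for kind, value in feats.items():
            if value is not None:
                holders[(kind, value)].add(company)
    pairs = set()
    for companies in holders.values():
        pairs.update(combinations(sorted(companies), 2))
    return pairs


def build_company_groups(companies, pairs):
    """Merge company pairs that share a company into groups (union-find)."""
    parent = {c: c for c in companies}

    def find(c):
        while parent[c] != c:
            parent[c] = parent[parent[c]]  # path halving
            c = parent[c]
        return c

    for a, b in pairs:
        parent[find(a)] = find(b)

    groups = defaultdict(set)
    for c in companies:
        groups[find(c)].add(c)
    # Sorted lists give a deterministic result for inspection.
    return sorted(sorted(g) for g in groups.values())


# Hypothetical feature data for five companies under the same person name.
example_features = {
    "A": {"phone": "1", "email": "a@x"},
    "B": {"phone": "1", "email": "b@x"},
    "C": {"phone": "2", "email": "b@x"},
    "D": {"phone": "3", "email": None},
    "E": {"phone": "4", "email": None},
}
example_pairs = build_company_pairs(example_features)
example_groups = build_company_groups(example_features, example_pairs)
```

In this example A and B share a phone and B and C share a mailbox, so the pairs (A, B) and (B, C) are connected through B and merge into one group, while D and E remain separate.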
Step S04, comparing the similarity between the company group containing a company under the name of the person to be queried and the other company groups in the company group set, and finding each company group that meets the similarity requirement according to the comparison result, thereby obtaining each company under the name of the person to be queried.
In one embodiment, step S04 further includes a first step to a fourth step, as follows:
The first step: calculating a feature vector of each company group in the company group set;
Specifically, intra-group fusion is first performed on the feature data of the companies within each company group in the company group set; that is, the feature data of the several companies in the group, such as address, industry, and business scope, is concatenated to form the features of the company group;
The formed features of the company group are then vectorized to obtain the feature vector of the company group. In this vectorization, the feature data is encoded with one-hot encoding and/or vectorized with the doc2vec method. For example, the address is one-hot encoded by province, city, and county, and the industry and business scope are vectorized with the doc2vec method, yielding the feature vector of the company group.
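A simplified sketch of the fusion and vectorization in this first step is shown below. It makes several assumptions that the description does not state: only the province level of the address is encoded, the one-hot slots of all companies in a group are combined so that a slot is 1.0 when any company is registered in that province, and a plain word-count vector over a fixed vocabulary stands in for doc2vec, which would require a trained model. All names and vocabularies are hypothetical.

```python
def fuse_group(companies):
    """Intra-group fusion: concatenate the feature data of a group's companies."""
    return {
        "provinces": [c["province"] for c in companies],
        "text": " ".join(c["business_scope"] for c in companies),
    }


def vectorize_group(fused, province_vocab, word_vocab):
    """Turn fused group features into one flat numeric vector."""
    # One-hot style address encoding: one slot per known province, set to
    # 1.0 when any company in the group is registered there.
    present = set(fused["provinces"])
    address_part = [1.0 if p in present else 0.0 for p in province_vocab]
    # Stand-in for doc2vec: word counts over a fixed vocabulary.
    words = fused["text"].lower().split()
    text_part = [float(words.count(w)) for w in word_vocab]
    return address_part + text_part


# Hypothetical group of two companies.
example_group = [
    {"province": "Beijing", "business_scope": "software development"},
    {"province": "Hebei", "business_scope": "software consulting"},
]
example_vector = vectorize_group(
    fuse_group(example_group),
    province_vocab=["Beijing", "Hebei", "Shanghai"],
    word_vocab=["software", "consulting", "retail"],
)
```

Replacing the word-count part with embeddings from a trained document model would bring the sketch closer to the doc2vec variant named in the description.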
The second step: for a company under the name of the person to be queried, determining the company group in which that company is located, and calculating the feature vector similarity between that company group and each of the other company groups in the company group set;
Specifically, based on information that is already established, for example that a certain company in the company group set is known to be a company under the name of the person to be queried, the company group to which that company belongs is determined, and the feature vector similarity between that company group and each of the other company groups in the company group set is then calculated.
The third step: if the calculated feature vector similarity is greater than the similarity threshold, determining that the company group in which the company under the name of the person to be queried is located and the compared company group both belong to the person to be queried;
Specifically, if the calculated feature vector similarity is greater than the similarity threshold, the company group in which the company under the name of the person to be queried is located and the compared company group are determined to belong to the same person. Otherwise, the compared company group is determined not to belong to the person to be queried, that is, the two company groups are determined not to belong to the same person.
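The second and third steps could be sketched as follows, assuming cosine similarity as the similarity measure; the description does not specify which measure is used. The function names, the example vectors, and the threshold of 0.8 are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat an all-zero vector as dissimilar to everything
    return dot / (norm_u * norm_v)


def groups_matching_target(group_vectors, target_id, threshold):
    """Return ids of groups whose similarity to the target group exceeds
    the threshold. group_vectors maps a group id to its feature vector."""
    target = group_vectors[target_id]
    return [
        gid
        for gid, vec in group_vectors.items()
        if gid != target_id and cosine_similarity(target, vec) > threshold
    ]


# Hypothetical group feature vectors; S1 contains the known company.
example_vectors = {
    "S1": [1.0, 0.0, 1.0],
    "S2": [1.0, 0.0, 0.9],
    "S3": [0.0, 1.0, 0.0],
}
matched = groups_matching_target(example_vectors, "S1", threshold=0.8)
```

Here S2 points in nearly the same direction as S1 and is matched, while S3 is orthogonal to S1 and is rejected.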
The fourth step: collecting the company groups determined to belong to the person to be queried to form a set of company groups under the name of the person to be queried, thereby obtaining each company under the name of the person to be queried.
Specifically, the company groups determined in the third step to belong to the person to be queried are collected. The collected objects include the company group in which the known company under the name of the person to be queried is located and the company groups selected by the similarity comparison as belonging to the person to be queried. Together they form the set of company groups under the name of the person to be queried. The companies in this set of company groups form a company set, and each company in the company set is a company under the name of the person to be queried, so that each company under the name of the person to be queried is obtained.
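The second to fourth steps above can be sketched as follows. This is only a minimal illustration under stated assumptions, not the implementation of the embodiment: the text specifies only a feature vector similarity compared against a similarity threshold, so the use of cosine similarity, the threshold value 0.8, and the names `cosine_similarity` and `groups_of_queried_person` are all hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors (assumed similarity measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def groups_of_queried_person(group_vectors, group_companies, known_company, threshold=0.8):
    """Return (selected group ids, companies) for the person to be queried.

    group_vectors:   group id -> feature vector of the company group (first step)
    group_companies: group id -> list of companies in that group
    known_company:   a company already known to belong to the person to be queried
    threshold:       similarity threshold (0.8 is an arbitrary illustrative value)
    """
    # Second step: locate the group containing the known company.
    seed = next(g for g, members in group_companies.items() if known_company in members)
    selected = [seed]
    for group_id, vector in group_vectors.items():
        if group_id == seed:
            continue
        # Third step: keep groups whose similarity to the seed group exceeds the threshold.
        if cosine_similarity(group_vectors[seed], vector) > threshold:
            selected.append(group_id)
    # Fourth step: collect the companies of all selected groups.
    companies = sorted(c for g in selected for c in group_companies[g])
    return selected, companies


if __name__ == "__main__":
    vectors = {"g1": [1.0, 0.0, 1.0], "g2": [1.0, 0.0, 0.9], "g3": [0.0, 1.0, 0.0]}
    members = {"g1": ["A", "B"], "g2": ["C"], "g3": ["D"]}
    print(groups_of_queried_person(vectors, members, "A"))
```

In this toy input, group g2 is nearly parallel to g1 and is selected, while g3 is orthogonal to g1 and is rejected.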
In another alternative embodiment, the step S04 further includes steps one to five, which are as follows:
step one: calculating the feature vector of each company within each company group in the company group set;
step two: calculating the cluster center of the feature vectors of the companies in each company group;
step three: for a company under the name of the person to be queried, determining the company group in which that company is located, and calculating the distance between the cluster center of each of the other company groups in the company group set and the cluster center of the company group in which the company under the name of the person to be queried is located;
step four: if the distance is smaller than the distance threshold, determining that the company group participating in the distance comparison is similar to the company group in which the company under the name of the person to be queried is located, and that both company groups belong to the person to be queried;
specifically, if the distance calculated in step three is smaller than the distance threshold, it is determined that the company group participating in the distance comparison is similar to the company group in which the company under the name of the person to be queried is located and that both belong to the person to be queried, that is, the two company groups belong to the same person; otherwise, it is determined that the company group participating in the distance comparison and the company group in which the company under the name of the person to be queried is located do not belong to the same person.
Step five: collecting the companies determined to belong to the person to be queried to form a set of company groups under the name of the person to be queried, thereby obtaining each company under the name of the person to be queried.
Specifically, the companies determined to belong to the person to be queried are collected. The collected objects include the company group in which the known company under the name of the person to be queried is located and the company groups selected through the comparison as belonging to the person to be queried. Together they form the set of company groups under the name of the person to be queried. The companies in this set of company groups form a company set, and each company in the company set is a company under the name of the person to be queried, so that each company under the name of the person to be queried is obtained.
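Steps one to five of this alternative embodiment can be sketched as follows. The sketch rests on assumptions that the text does not state: the cluster center is taken to be the arithmetic mean of the company feature vectors in a group, the distance is Euclidean, and the distance threshold 0.5 and all function names are hypothetical.

```python
import math


def centroid(vectors):
    """Arithmetic mean of a non-empty list of equal-length vectors (assumed cluster center)."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors (assumed distance measure)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def companies_by_center_distance(company_vectors, group_companies, known_company,
                                 distance_threshold=0.5):
    """Return the companies attributed to the person to be queried.

    company_vectors:    company -> feature vector (step one)
    group_companies:    group id -> list of companies in that group
    known_company:      a company already known to belong to the person to be queried
    distance_threshold: illustrative value only
    """
    # Step two: one cluster center per company group.
    centers = {
        group_id: centroid([company_vectors[c] for c in members])
        for group_id, members in group_companies.items()
    }
    # Step three: locate the group of the known company and measure distances to it.
    seed = next(g for g, members in group_companies.items() if known_company in members)
    selected = [seed]
    for group_id, center in centers.items():
        # Step four: a group closer than the threshold is attributed to the same person.
        if group_id != seed and euclidean(center, centers[seed]) < distance_threshold:
            selected.append(group_id)
    # Step five: collect the companies of all selected groups.
    return sorted(c for g in selected for c in group_companies[g])


if __name__ == "__main__":
    vectors = {"A": [0.0, 0.0], "B": [0.2, 0.0], "C": [0.1, 0.3], "D": [5.0, 5.0]}
    members = {"g1": ["A", "B"], "g2": ["C"], "g3": ["D"]}
    print(companies_by_center_distance(vectors, members, "A"))
```

Compared with the first embodiment, this variant vectorizes individual companies rather than fused group features, so a group's position is summarized by its center.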
By comparing the similarity between the company group in which a company under the name of the person to be queried is located and the other company groups in the company group set, and searching for each company group that meets the similarity requirement according to the comparison result, all companies under the name of the person to be queried are finally obtained. This effectively avoids omitting companies that share no associated feature, makes the company data obtained for the same boss more comprehensive and accurate, meets the business query requirement, and improves the query experience and satisfaction of querying users.
Exemplary devices
Fig. 2 is a schematic structural diagram of an apparatus for acquiring the companies under the name of a person to be queried in the presence of duplicate names according to an exemplary embodiment of the present invention. As shown in Fig. 2, the apparatus of this embodiment includes:
an acquiring unit 201, configured to acquire each company under the names of the same-named persons and to acquire the feature data of each such company, where the same-named persons include the person to be queried and the persons sharing that person's name;
a collecting unit 202, configured to form company groups based on the associated features among the companies under the names of the same-named persons and to gather the plurality of company groups into a company group set, where feature data shared by different companies is referred to as an associated feature;
a searching unit 203, configured to compare the similarity between the company group in which a company under the name of the person to be queried is located and the other company groups in the company group set, and to search for each company group meeting the similarity requirement according to the comparison result, thereby obtaining each company under the name of the person to be queried.
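As an illustration of what the collecting unit 202 does, the sketch below groups companies that share identical feature data and merges groups that are linked through a common company. It is a hedged example rather than the apparatus itself: the union-find (disjoint-set) structure is a technique chosen here for the merging, and the function name `form_company_groups`, the `(feature name, value)` tuple layout, and the sample data are hypothetical.

```python
from collections import defaultdict


def form_company_groups(company_features):
    """Group companies that are linked, directly or transitively, by shared feature data.

    company_features: company -> set of (feature name, value) tuples,
                      e.g. {("phone", "111"), ("email", "x@example.com")}
    Returns a sorted list of sorted company lists, one list per company group.
    """
    parent = {company: company for company in company_features}

    def find(company):
        # Follow parent links to the group representative, halving the path on the way.
        while parent[company] != company:
            parent[company] = parent[parent[company]]
            company = parent[company]
        return company

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Index companies by each piece of feature data they carry.
    companies_by_feature = defaultdict(list)
    for company, features in company_features.items():
        for feature in features:
            companies_by_feature[feature].append(company)

    # Companies sharing the same feature data have an associated feature: merge them.
    for companies in companies_by_feature.values():
        for other in companies[1:]:
            union(companies[0], other)

    groups = defaultdict(set)
    for company in company_features:
        groups[find(company)].add(company)
    return sorted(sorted(group) for group in groups.values())


if __name__ == "__main__":
    data = {
        "A": {("phone", "111")},
        "B": {("phone", "111"), ("email", "x@example.com")},
        "C": {("email", "x@example.com")},
        "D": {("phone", "999")},
    }
    print(form_company_groups(data))
```

Here A and C share no feature data directly, but both are linked to B, so all three end up in one company group, while D forms a group of its own.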
Exemplary computer program product and computer-readable storage Medium
In addition to the above methods and apparatus, embodiments of the present disclosure may also be a computer program product comprising computer program instructions that, when executed by a processor, cause the processor to perform the steps of the method for acquiring the companies under the name of a person to be queried in the presence of duplicate names according to the various embodiments of the present disclosure described in the "exemplary methods" section of this specification.
The program code of the computer program product for carrying out operations of embodiments of the present disclosure may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java and C++, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server.
Furthermore, embodiments of the present disclosure may also be a computer-readable storage medium having stored thereon computer program instructions which, when executed by a processor, cause the processor to perform the steps of the method for acquiring the companies under the name of a person to be queried in the presence of duplicate names according to the various embodiments of the present disclosure described in the "exemplary methods" section of this specification.
The computer-readable storage medium may take any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may include, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
Exemplary electronic device
Fig. 3 shows a block diagram of an electronic device according to an exemplary embodiment of the present invention. The electronic device may be either or both of the first device and the second device, or a stand-alone device separate from them, which may communicate with the first device and the second device to receive the acquired input signals from them. As shown in Fig. 3, the electronic device includes one or more processors 301 and a memory 302.
The processor 301 may be a Central Processing Unit (CPU) or other form of processing unit having data processing capabilities and/or instruction execution capabilities, and may control other components in the electronic device to perform desired functions.
The memory 302 may include one or more computer program products, which may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. The volatile memory may include, for example, Random Access Memory (RAM) and/or cache memory. The non-volatile memory may include, for example, Read Only Memory (ROM), a hard disk, flash memory, and the like. One or more computer program instructions may be stored on the computer-readable storage medium and executed by the processor 301 to implement the method for acquiring the companies under the name of a person to be queried in the presence of duplicate names of the embodiments of the present disclosure described above and/or other desired functions. In one example, the electronic device may further include an input device 303 and an output device 304, which are interconnected by a bus system and/or another form of connection mechanism (not shown).
The input device 303 may include, for example, a keyboard, a mouse, and the like.
The output device 304 can output various information to the outside. The output devices 304 may include, for example, a display, speakers, a printer, and a communication network and remote output devices connected thereto, among others.
Of course, for simplicity, only some of the components of the electronic device relevant to the present disclosure are shown in fig. 3, omitting components such as buses, input/output interfaces, and the like. In addition, the electronic device may include any other suitable components, depending on the particular application.
The foregoing describes the general principles of the present disclosure in conjunction with specific embodiments, however, it is noted that the advantages, effects, etc. mentioned in the present disclosure are merely examples and are not limiting, and they should not be considered essential to the various embodiments of the present disclosure. Furthermore, the foregoing disclosure of specific details is for the purpose of illustration and description and is not intended to be limiting, since the disclosure is not intended to be limited to the specific details so described.
In the present specification, the embodiments are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same or similar parts in the embodiments are referred to each other. For the system embodiment, since it basically corresponds to the method embodiment, the description is relatively simple, and for the relevant points, reference may be made to the partial description of the method embodiment.
The block diagrams of devices, apparatuses, and systems referred to in this disclosure are given only as illustrative examples and are not intended to require or imply that the connections, arrangements, and configurations must be made in the manner shown in the block diagrams. These devices, apparatuses, and systems may be connected, arranged, and configured in any manner, as will be appreciated by those skilled in the art. Words such as "including," "comprising," "having," and the like are open-ended words that mean "including, but not limited to," and are used interchangeably therewith. The words "or" and "and" as used herein mean, and are used interchangeably with, the phrase "and/or," unless the context clearly dictates otherwise. The phrase "such as" is used herein to mean, and is used interchangeably with, the phrase "such as but not limited to".
The methods and apparatus of the present disclosure may be implemented in a number of ways. For example, the methods and apparatus of the present disclosure may be implemented by software, hardware, firmware, or any combination of software, hardware, and firmware. The above-described order for the steps of the method is for illustration only, and the steps of the method of the present disclosure are not limited to the order specifically described above unless specifically stated otherwise. Further, in some embodiments, the present disclosure may also be embodied as programs recorded in a recording medium, the programs including machine-readable instructions for implementing the methods according to the present disclosure. Thus, the present disclosure also covers a recording medium storing a program for executing the method according to the present disclosure.
It is also noted that in the devices, apparatuses, and methods of the present disclosure, each component or step can be decomposed and/or recombined. These decompositions and/or recombinations are to be considered equivalents of the present disclosure. The previous description of the disclosed aspects is provided to enable any person skilled in the art to make or use the present disclosure. Various modifications to these aspects will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other aspects without departing from the scope of the disclosure. Thus, the present disclosure is not intended to be limited to the aspects shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
The foregoing description has been presented for purposes of illustration and description. Furthermore, this description is not intended to limit embodiments of the disclosure to the form disclosed herein. While a number of example aspects and embodiments have been discussed above, those of skill in the art will recognize certain variations, modifications, alterations, additions and sub-combinations thereof.

Claims (13)

1. A method for acquiring the companies under the name of a person to be inquired in the presence of duplicate names, characterized by comprising the following steps:
acquiring each company under the names of the same-named persons and acquiring the feature data of each such company, wherein the same-named persons comprise the person to be inquired and the persons sharing the name of the person to be inquired;
forming company groups based on the associated features among the companies under the names of the same-named persons, and gathering the plurality of company groups into a company group set, wherein feature data shared by different companies is called an associated feature;
and comparing the similarity between the company group in which a company under the name of the person to be inquired is located and the other company groups in the company group set, and searching for each company group meeting the similarity requirement according to the comparison result, so as to obtain each company under the name of the person to be inquired.
2. The method of claim 1, wherein, in forming company groups based on the associated features among the companies under the names of the same-named persons, before the plurality of company groups are gathered into the company group set,
each item of feature data in the company groups is first cleaned and filtered to remove feature data that is not suitable for use as an associated feature.
3. The method of claim 2, wherein the cleaning filter objects comprise pseudo-registered feature data and feature data registered by a community.
4. The method of claim 1, wherein the feature data includes telephone numbers, email addresses, associated personnel, investment relationships, addresses, industry, and business scope.
5. The method of claim 1, wherein forming company groups based on the associated features among the companies under the names of the same-named persons and gathering the plurality of company groups into a company group set comprises:
associating two companies that have an associated feature to form a company pair;
merging the company pairs into groups through connectivity to form the company groups, and gathering the plurality of company groups into a company group set.
6. The method of claim 5, wherein merging the company pairs into groups through connectivity comprises:
if one company pair and another company pair include the same company, determining that the two company pairs have connectivity, and merging the plurality of company pairs that have connectivity into one group.
7. The method according to claim 1, wherein comparing the similarity between the company group in which a company under the name of the person to be inquired is located and the other company groups in the company group set, and searching for each company group meeting the similarity requirement according to the comparison result, so as to obtain each company under the name of the person to be inquired, comprises:
calculating a feature vector of each company group in the company group set;
for a company under the name of the person to be inquired, determining the company group in which that company is located, and calculating the feature vector similarity between that company group and each of the other company groups in the company group set;
if the calculated feature vector similarity is greater than the similarity threshold, determining that the company group in which the company under the name of the person to be inquired is located and the corresponding company group participating in the similarity comparison both belong to the person to be inquired;
and collecting the company groups determined to belong to the person to be inquired to form a set of company groups under the name of the person to be inquired, so as to obtain each company under the name of the person to be inquired.
8. The method of claim 7, wherein computing a feature vector for each corporate group in the set of corporate groups comprises:
carrying out intra-group fusion on the characteristic data of the intra-group companies of each company group in the company group set to form the characteristics of the company group;
and performing feature vectorization on the formed features of the company group to obtain a feature vector of the company group.
9. The method of claim 8, wherein feature vectorizing the features of the formed company group comprises:
encoding the feature data with one-hot encoding,
and/or,
vectorizing the feature data with the doc2vec method.
10. The method according to claim 1, wherein comparing the similarity between the company group in which a company under the name of the person to be inquired is located and the other company groups in the company group set, and searching for each company group meeting the similarity requirement according to the comparison result, so as to obtain each company under the name of the person to be inquired, comprises:
calculating the feature vector of each company within each company group in the company group set;
calculating the cluster center of the feature vectors of the companies in each company group;
for a company under the name of the person to be inquired, determining the company group in which that company is located, and calculating the distance between the cluster center of each of the other company groups in the company group set and the cluster center of the company group in which the company under the name of the person to be inquired is located;
if the distance is smaller than the distance threshold, determining that the company group participating in the distance comparison is similar to the company group in which the company under the name of the person to be inquired is located, and that both company groups belong to the person to be inquired;
and collecting the companies determined to belong to the person to be inquired to form a set of company groups under the name of the person to be inquired, so as to obtain each company under the name of the person to be inquired.
11. An apparatus for acquiring the companies under the name of a person to be inquired in the presence of duplicate names, the apparatus comprising:
an acquiring unit, configured to acquire each company under the names of the same-named persons and to acquire the feature data of each such company, wherein the same-named persons comprise the person to be inquired and the persons sharing the name of the person to be inquired;
a collecting unit, configured to form company groups based on the associated features among the companies under the names of the same-named persons and to gather the plurality of company groups into a company group set, wherein feature data shared by different companies is called an associated feature;
and a searching unit, configured to compare the similarity between the company group in which a company under the name of the person to be inquired is located and the other company groups in the company group set, and to search for each company group meeting the similarity requirement according to the comparison result, so as to obtain each company under the name of the person to be inquired.
12. A computer-readable storage medium, characterized in that the storage medium stores a computer program for performing the method according to any of claims 1-10.
13. An electronic device, characterized in that the electronic device comprises:
a processor;
a memory for storing executable instructions of the processor;
wherein the processor is configured to read the executable instructions from the memory and execute the instructions to implement the method of any one of claims 1-10.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210459194.6A CN114861787A (en) 2022-04-27 2022-04-27 Method and device for acquiring company under name of person to be inquired under condition of duplicate name

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210459194.6A CN114861787A (en) 2022-04-27 2022-04-27 Method and device for acquiring company under name of person to be inquired under condition of duplicate name

Publications (1)

Publication Number Publication Date
CN114861787A true CN114861787A (en) 2022-08-05

Family

ID=82633598

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210459194.6A Pending CN114861787A (en) 2022-04-27 2022-04-27 Method and device for acquiring company under name of person to be inquired under condition of duplicate name

Country Status (1)

Country Link
CN (1) CN114861787A (en)

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102880623A (en) * 2011-07-13 2013-01-16 富士通株式会社 Method and device for searching people with same name
CN107402984A (en) * 2017-07-11 2017-11-28 北京金堤科技有限公司 A kind of sorting technique and device based on theme
US20170344954A1 (en) * 2016-05-31 2017-11-30 Linkedln Corporation Query building for search by ideal candidates
CN107577791A (en) * 2017-09-18 2018-01-12 河北省科学院应用数学研究所 A kind of method of enterprise's reference name duplication of name disambiguation and the credit investigation system with this method
CN109376182A (en) * 2018-09-26 2019-02-22 上海睿翎法律咨询服务有限公司 The method for realizing affiliated company's identifying processing based on computer software
CN109992603A (en) * 2019-04-04 2019-07-09 北京金堤科技有限公司 A kind of data search method, device, electronic equipment and computer-readable medium
CN110175555A (en) * 2019-05-23 2019-08-27 厦门市美亚柏科信息股份有限公司 Facial image clustering method and device
CN111428503A (en) * 2020-03-11 2020-07-17 合肥工业大学 Method and device for identifying and processing same-name person
CN112417879A (en) * 2020-11-25 2021-02-26 上海水滴征信服务有限公司 Determining business attribute similarity, rename object determination
CN113269244A (en) * 2021-05-18 2021-08-17 上海睿翎法律咨询服务有限公司 Disambiguation processing method, system, device, processor and storage medium thereof aiming at cross-enterprise personnel rename in business and commerce registration information
CN113609346A (en) * 2021-10-08 2021-11-05 企查查科技有限公司 Natural person name disambiguation method, device and medium based on enterprise incidence relation
CN114240344A (en) * 2021-12-06 2022-03-25 企查查科技有限公司 Enterprise personnel data processing method and device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination