CN110609905A - Method and device for recognizing type of over-point and processing graph data - Google Patents

Info

Publication number
CN110609905A
CN110609905A CN201910866575.4A CN201910866575A CN110609905A CN 110609905 A CN110609905 A CN 110609905A CN 201910866575 A CN201910866575 A CN 201910866575A CN 110609905 A CN110609905 A CN 110609905A
Authority
CN
China
Prior art keywords
attributes
text
point
type
attribute information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910866575.4A
Other languages
Chinese (zh)
Inventor
汪振兴
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jianlian Technology (Guangdong) Co.,Ltd.
Original Assignee
Shenzhen Zhongyi Weirong Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Zhongyi Weirong Technology Co Ltd filed Critical Shenzhen Zhongyi Weirong Technology Co Ltd
Priority to CN201910866575.4A priority Critical patent/CN110609905A/en
Publication of CN110609905A publication Critical patent/CN110609905A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367Ontology
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/03Credit; Loans; Processing thereof

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Finance (AREA)
  • Accounting & Taxation (AREA)
  • Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Animal Behavior & Ethology (AREA)
  • Development Economics (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • Strategic Management (AREA)
  • Technology Law (AREA)
  • General Business, Economics & Management (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiments of the invention disclose a method, a device, a server and a computer-readable storage medium for identifying the type of a super point and processing graph data. The method for identifying the type of a super point comprises: acquiring a node set of a plurality of nodes directly associated with a current super point; acquiring at least part of the category attributes of the node set; acquiring a text coincidence degree and/or a text similarity of the attribute information of the at least part of the category attributes; and determining the type of the current super point according to the text coincidence degree and/or the text similarity.

Description

Method and device for recognizing type of over-point and processing graph data
Technical Field
The invention relates to the technical field of financial technology, and in particular to a method and a device for identifying the type of a super point and processing graph data, a server, and a computer-readable storage medium.
Background
In the field of financial risk control and anti-fraud, graph databases are increasingly used to store user information in order to speed up relationship queries. Practical commercial systems often encounter the super point problem: a super point is a node in the graph database whose number of associated edges or other nodes exceeds a certain threshold.
Super points are of various types. One type is the dirty data super point, which does not represent a normal business relationship and has no meaning for the real business. If it is kept in the graph database, it degrades the performance of the graph database and slows down the training of models such as anti-fraud models, so the dirty data super point and its associated edges generally need to be deleted. In the prior art, whether a super point is a dirty data super point is usually judged manually. However, when a graph contains a large number of super points, it is not feasible to judge all of them manually. In addition, manual judgment can only be performed offline, so super points generated during actual graph computation cannot be judged in real time.
Disclosure of Invention
In view of the above, embodiments of the present invention provide a method, an apparatus, a server, and a computer-readable storage medium for identifying the type of a super point and processing graph data, so as to improve the speed of identifying the type of a super point.
According to a first aspect, an embodiment of the present invention provides a method for identifying the type of a super point, including: acquiring a node set of a plurality of nodes directly associated with a current super point; acquiring at least part of the category attributes of the node set; acquiring a text coincidence degree and/or a text similarity of the attribute information of the at least part of the category attributes; and determining the type of the current super point according to the text coincidence degree and/or the text similarity.
Optionally, acquiring the text coincidence degree of the attribute information of the at least part of the category attributes includes: acquiring, for each category of attribute, the number of attributes having the same attribute information; and taking, as the text coincidence degree, the ratio of the sum of these numbers over all categories to the total number of attributes of the at least part of the category attributes; or taking, as the text coincidence degree, the ratio of the weighted sum of these numbers to the total number of attributes of the at least part of the category attributes.
Optionally, determining the type of the current super point according to the text coincidence degree includes: when the text coincidence degree is greater than a first predetermined threshold, determining that the current super point is a dirty data super point.
Optionally, acquiring the text similarity of the attribute information of the at least part of the category attributes includes: acquiring, for each category of attribute, the number of attributes having similar attribute information; and taking, as the text similarity, the ratio of the sum of these numbers over all categories to the total number of attributes of the at least part of the category attributes; or taking, as the text similarity, the ratio of the weighted sum of these numbers to the total number of attributes of the at least part of the category attributes.
Optionally, determining the type of the current super point according to the text similarity includes: when the text similarity is greater than a second predetermined threshold, determining that the current super point is a dirty data super point.
Optionally, determining the type of the current super point according to the text coincidence degree and the text similarity includes: when the text coincidence degree is greater than a third predetermined threshold and the text similarity is greater than a fourth predetermined threshold, determining that the current super point is a dirty data super point; or when the weighted sum of the text coincidence degree and the text similarity is greater than a fifth predetermined threshold, determining that the current super point is a dirty data super point.
According to a second aspect, an embodiment of the present invention provides a graph data processing method, including: acquiring a super point existing in a current graph; determining the type of the acquired super point according to the method for identifying the type of a super point in any one of the first aspect; and deleting the dirty data super point in the current graph and the edges associated with the dirty data super point.
According to a third aspect, an embodiment of the present invention provides a device for identifying the type of a super point, including: a node unit, configured to acquire a node set of a plurality of nodes directly associated with the current super point; an attribute unit, configured to acquire at least part of the category attributes of the node set; a calculation unit, configured to acquire the text coincidence degree and/or the text similarity of the attribute information of the at least part of the category attributes; and a determining unit, configured to determine the type of the current super point according to the text coincidence degree and/or the text similarity.
According to a fourth aspect, an embodiment of the present invention provides a server, including: a memory and a processor, the memory and the processor being communicatively coupled to each other, the memory having stored therein computer instructions, the processor performing the method of any of the first or second aspects by executing the computer instructions.
According to a fifth aspect, the present invention provides a computer-readable storage medium, wherein the computer-readable storage medium stores computer instructions for causing the computer to execute the method of any one of the first or second aspects.
According to the method, the device, the server and the computer-readable storage medium for identifying the type of a super point and processing graph data, test data characteristically contains a large amount of identical or similar fictitious data, so the type of a super point can be determined by analyzing the text coincidence degree or the text similarity of the attribute information of at least part of the category attributes of the node set. This realizes automatic identification of the type of the super point and improves the identification speed. Because the type of the super point is identified automatically by the server, super points generated during actual graph computation can be identified in real time.
Drawings
The features and advantages of the present invention will be more clearly understood by reference to the accompanying drawings, which are illustrative and not to be construed as limiting the invention in any way, and in which:
FIG. 1 illustrates a flow diagram of a method for identifying the type of a super point according to an embodiment of the invention;
FIG. 2 illustrates a node structure diagram of a graph according to an embodiment of the invention;
FIG. 3 illustrates a flow diagram of a method for identifying the type of a super point according to another embodiment of the invention;
FIG. 4 illustrates a flow diagram of a method for identifying the type of a super point according to another embodiment of the invention;
FIG. 5 illustrates a flow diagram of a method for identifying the type of a super point according to another embodiment of the invention;
FIG. 6 shows a flow diagram of a graph data processing method according to an embodiment of the invention;
FIG. 7 shows a schematic diagram of a device for identifying the type of a super point according to an embodiment of the invention;
FIG. 8 shows a schematic diagram of a graph data processing apparatus according to an embodiment of the invention;
FIG. 9 shows a schematic diagram of a server according to an embodiment of the invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
FIG. 1 illustrates a method for identifying the type of a super point according to an embodiment of the invention. The method is suitable for being executed in a server and may comprise the following steps:
S11, acquiring a node set of a plurality of nodes directly associated with the current super point.
A super point is a node whose number of associated edges or other nodes in the graph database exceeds a certain threshold. To obtain the super points existing in a graph, the number of edges or other nodes associated with each node can be obtained one by one and compared against the predetermined threshold to judge whether the node is a super point. A person skilled in the art can set the predetermined threshold according to actual conditions.
After the super points in the graph are acquired, the type of each super point can be identified. In this step, a node set of a plurality of nodes directly associated with the current super point is obtained. As can be seen from FIG. 2, the nodes directly associated with the current super point include node 1, node 2, node 3, ..., node n, so the server obtains the node set {node 1, node 2, node 3, ..., node n} directly associated with the current super point. The graph may contain a plurality of super points, and the server may acquire the node set directly associated with each super point one by one, or may acquire these node sets in parallel using a plurality of processes.
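Step S11 can be illustrated with a minimal sketch. It assumes the graph is available as a list of undirected edges held in memory; the function names, the sample edges and the threshold value of 3 are hypothetical and are not taken from the patent, which leaves the storage layer and the threshold to the implementer.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Build an undirected adjacency map from (node, node) pairs."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    return adjacency


def find_super_points(adjacency, degree_threshold):
    """Return the nodes whose number of directly associated nodes exceeds the threshold."""
    return sorted(node for node, nbrs in adjacency.items() if len(nbrs) > degree_threshold)


def node_set(adjacency, super_point):
    """Return the set of nodes directly associated with the given super point."""
    return set(adjacency[super_point])


# Illustrative data: "hub" is linked to five users, and two users are linked to each other.
edges = [("hub", f"user{i}") for i in range(1, 6)] + [("user1", "user2")]
adjacency = build_adjacency(edges)
super_points = find_super_points(adjacency, degree_threshold=3)
```

With this sample data only "hub" exceeds the threshold, and `node_set(adjacency, "hub")` yields the five user nodes that the later steps analyze.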
S12, acquiring at least part of the category attributes of the node set.
In the graph database, each node includes one or more attributes. As shown in FIG. 2, taking the field of financial risk control as an example, each node may represent a user or a work piece, and the node may include various attributes, such as the user's name, identification number, mobile phone number, work unit, home address, email and facial image. The server may obtain all or part of the category attributes of the node set directly associated with the current super point.
S13, acquiring the text coincidence degree or text similarity of the attribute information of the at least part of the category attributes.
After the server acquires all or part of the category attributes of the node set, it can analyze them to obtain the text coincidence degree or text similarity of their attribute information. As an alternative embodiment, the server may obtain only some of the category attributes. For example, attributes such as nationality, gender and age inevitably contain a large amount of identical or similar attribute information whether the data is real or fictitious test data, so they do not help distinguish the two and can be disregarded. This improves the speed of identifying the type of the super point and reduces interference.
As an alternative, when calculating the text coincidence degree or text similarity, different categories of attributes may be given different weights. For example, a large number of identical home addresses is likely to be dirty data, because it is uncommon for hundreds or thousands of people to live at the same home address, so such data probably comes from testing and the home address can be given a greater weight. By contrast, hundreds or even thousands of people do in practice work at the same work unit, so a large number of identical work units is likely to be real data and the work unit can be given a lower weight.
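The attribute selection and weighting described above can be sketched as plain configuration plus a small grouping helper. All category names, weight values and sample nodes below are hypothetical and chosen only for illustration; the patent leaves the choice of categories and weights to the person skilled in the art.

```python
# Hypothetical configuration: categories that are skipped, and per-category weights.
EXCLUDED_CATEGORIES = {"nationality", "gender", "age"}
CATEGORY_WEIGHTS = {"home_address": 2.0, "work_unit": 0.5}
DEFAULT_WEIGHT = 1.0


def select_attributes(nodes):
    """Group attribute values by category, skipping excluded categories and missing values."""
    by_category = {}
    for node in nodes:
        for category, value in node.items():
            if category in EXCLUDED_CATEGORIES or value is None:
                continue
            by_category.setdefault(category, []).append(value)
    return by_category


def weight_of(category):
    """Return the weight of a category, falling back to the default weight."""
    return CATEGORY_WEIGHTS.get(category, DEFAULT_WEIGHT)


# Illustrative node set: each node is a mapping from category to attribute information.
nodes = [
    {"name": "Zhang", "gender": "F", "home_address": "addr-1", "work_unit": "unit-1"},
    {"name": "Li", "gender": "M", "home_address": "addr-1", "work_unit": None},
]
selected = select_attributes(nodes)
```

Grouping values by category up front keeps the later counting steps independent of how individual nodes are stored.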
S14, determining the type of the current super point according to the text coincidence degree or the text similarity.
A dirty data super point is a super point that does not represent a normal business relationship in the graph. The inventor found that, in the field of financial risk control, dirty data super points are usually produced by test data. During testing, the test data must be associated with the graph of real data; a tester usually uses only one or two pieces of real data, such as an identity card number or a telephone number, and fills the other fields, such as the home address, with a large amount of identical or similar fictitious data. This produces a large number of direct associations in the graph and forms a super point, which does not reflect a normal business relationship and is therefore a dirty data super point. Because the tester uses a large amount of identical or similar fictitious data for the other fields, the super point can be determined to be a dirty data super point when the text coincidence degree or text similarity is high; when the text coincidence degree or text similarity is low, the super point may represent a normal business relationship and should be retained.
Through the above steps, because dirty data super points in the field of financial risk control are usually produced by test data, and test data characteristically contains a large amount of identical or similar fictitious data, the server can determine the type of a super point by analyzing the text coincidence degree or the text similarity of the attribute information of at least part of the category attributes of the node set. When the text coincidence degree or the text similarity is high, the super point can be determined to be a dirty data super point. This realizes automatic identification of the type of the super point and improves the identification speed. Because the type of the super point is identified automatically by the server, super points generated during actual graph computation can be identified in real time.
FIG. 3 illustrates a method for identifying the type of a super point according to another embodiment of the present invention. The method is suitable for being executed in a server and may comprise the following steps:
S21, acquiring a node set of a plurality of nodes directly associated with the current super point; for the specific content, refer to the description of step S11.
S22, acquiring at least part of the category attributes of the node set; for the specific content, refer to the description of step S12.
S23, acquiring, for each category of attribute, the number of attributes having the same attribute information.
In an actual graph, a tester may use a large amount of identical fictitious data, so each category of attribute may contain a large number of attributes with the same attribute information. Taking the home address attribute in the node set as an example, the server may extract the attribute information of the home address attribute in the node set. For example, if the home address of i nodes is "E Community, D Street, C District, B City, A Province" and the home address of j nodes is "F Community, D Street, C District, B City, A Province", the number of attributes having the same attribute information in the home address attribute can be determined to be i + j. Similarly, the server may obtain the number of attributes having the same attribute information among the other categories of attributes.
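Counting the attributes that share identical attribute information within one category can be sketched with a frequency table. This is a minimal illustration that assumes the attribute information has already been normalized to plain strings; the function name and the sample addresses are hypothetical.

```python
from collections import Counter


def count_identical(values):
    """Count the values whose exact text is shared with at least one other value."""
    counts = Counter(values)
    return sum(c for c in counts.values() if c > 1)


# Three nodes share one address, two nodes share another, and one address is unique.
home_addresses = (
    ["E Community, D Street"] * 3
    + ["F Community, D Street"] * 2
    + ["G Community, H Street"]
)
same_count = count_identical(home_addresses)  # 3 + 2 = 5
```

The unique address contributes nothing, which mirrors the i + j count in the example above.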
S24, taking, as the text coincidence degree, the ratio of the sum of the numbers of attributes with the same attribute information over all categories to the total number of attributes of the at least part of the category attributes.
For example, the node set includes attribute category 1, attribute category 2, ..., attribute category i, and the numbers of attributes having the same attribute information in these i categories are $s_1, s_2, \dots, s_i$ respectively. If the total number of attributes of the at least part of the category attributes is $n$, the text coincidence degree can be $(s_1 + s_2 + \dots + s_i)/n$.
In another embodiment, still taking the case where the node set includes attribute category 1, attribute category 2, ..., attribute category i as an example, different weights may be given to different categories of attributes. For example, predetermined weights $k_1, k_2, \dots, k_i$ may be assigned to the categories respectively, and the numbers of attributes having the same attribute information are $s_1, s_2, \dots, s_i$. If the total number of attributes of the at least part of the category attributes is $n$, the text coincidence degree may be $(k_1 s_1 + k_2 s_2 + \dots + k_i s_i)/n$. As mentioned above, the weights of the various categories can be set reasonably by those skilled in the art according to actual conditions.
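The two formulas above can be sketched as one function. This sketch assumes that n is the total number of attribute values across the selected categories, which is one reading of "the number of all attributes"; the function name, category names and weights are hypothetical.

```python
from collections import Counter


def text_coincidence_degree(by_category, weights=None):
    """Compute (k_1*s_1 + ... + k_i*s_i) / n; every weight defaults to 1."""
    weights = weights or {}
    total = sum(len(values) for values in by_category.values())  # n (assumed definition)
    if total == 0:
        return 0.0
    weighted_same = 0.0
    for category, values in by_category.items():
        counts = Counter(values)
        same = sum(c for c in counts.values() if c > 1)  # s for this category
        weighted_same += weights.get(category, 1.0) * same
    return weighted_same / total


by_category = {
    "home_address": ["a", "a", "a", "b"],  # s = 3
    "email": ["x", "y", "z", "w"],         # s = 0
}
plain = text_coincidence_degree(by_category)                             # 3 / 8
weighted = text_coincidence_degree(by_category, {"home_address": 2.0})  # 6 / 8
```

Note that with weights greater than 1 the weighted value can exceed 1, so a threshold chosen for the unweighted form does not carry over to the weighted form.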
S25, when the text coincidence degree is greater than a first predetermined threshold, determining that the current super point is a dirty data super point.
The skilled person can set the first predetermined threshold appropriately according to actual conditions. For example, several different first predetermined thresholds can be tried, and the dirty data super points automatically identified by the server can be checked manually, so as to select the most suitable first predetermined threshold.
Through the above steps, because test data characteristically contains a large amount of identical fictitious data, the server automatically identifies the type of the super point according to the text coincidence degree of the attribute information of at least part of the category attributes, which improves the identification speed. Because the type of the super point is identified automatically by the server, super points generated during actual graph computation can be identified in real time.
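The threshold decision of step S25, together with one possible way to pick the first predetermined threshold from manually checked samples, can be sketched as follows. Selecting the candidate that agrees most often with the manual labels is an assumption made for this illustration; the patent only states that candidates are tried and the results are checked manually. All names and sample values are hypothetical.

```python
def classify(score, threshold):
    """Label a super point as dirty when its score is strictly greater than the threshold."""
    return "dirty" if score > threshold else "normal"


def pick_threshold(checked_samples, candidates):
    """Return the candidate threshold that agrees most often with the manual labels.

    checked_samples is a list of (score, manually_checked_label) pairs.
    """
    def agreement(threshold):
        hits = sum(classify(score, threshold) == label for score, label in checked_samples)
        return hits / len(checked_samples)

    return max(candidates, key=agreement)


# Illustrative manually checked super points: (text coincidence degree, label).
samples = [(0.9, "dirty"), (0.7, "dirty"), (0.4, "normal"), (0.1, "normal")]
best = pick_threshold(samples, candidates=[0.2, 0.5, 0.8])
```

On this sample, 0.5 is the only candidate that separates all four checked super points correctly. The same pattern applies to the other predetermined thresholds of the later embodiments.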
FIG. 4 illustrates a method for identifying the type of a super point according to another embodiment of the present invention. The method is suitable for being executed in a server and may comprise the following steps:
S31, acquiring a node set of a plurality of nodes directly associated with the current super point; for the specific content, refer to the description of step S11.
S32, acquiring at least part of the category attributes of the node set; for the specific content, refer to the description of step S12.
S33, acquiring, for each category of attribute, the number of attributes having similar attribute information.
Still taking the home address attribute in the node set as an example, the server may extract the attribute information of the home address attribute in the node set. For example, the home addresses of i nodes are "Building $F_1$, E Community, D Street, C District, B City, A Province", "Building $F_2$, E Community, D Street, C District, B City, A Province", ..., "Building $F_i$, E Community, D Street, C District, B City, A Province", so the home addresses of these i nodes are very similar. The server may also find that the home addresses of j nodes are "Building $H_1$, G Community, D Street, C District, B City, A Province", "Building $H_2$, G Community, D Street, C District, B City, A Province", ..., "Building $H_j$, G Community, D Street, C District, B City, A Province", so the home addresses of these j nodes are also very similar. The number of attributes with similar attribute information in the home address attribute can then be determined to be i + j. In this step, a clustering algorithm may be used to obtain the number of attributes with similar attribute information in each category. Similarly, the server may obtain the number of attributes with similar attribute information among the other categories of attributes. In an actual graph, a tester may not only reuse the same data but may also make minor modifications to it, so a large amount of similar data is likely to be test data. One skilled in the art can set a similarity threshold to judge whether two pieces of attribute information are similar: when the similarity between them is greater than this threshold, the two pieces of attribute information can be considered similar.
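Counting similar attribute information can be sketched with a simple greedy grouping. The patent mentions that a clustering algorithm may be used but does not name one, so the single-pass grouping below and the use of `difflib.SequenceMatcher` as the similarity measure are stand-ins chosen for this illustration; the 0.8 threshold and the sample addresses are hypothetical.

```python
from difflib import SequenceMatcher


def is_similar(a, b, min_ratio=0.8):
    """Treat two strings as similar when their SequenceMatcher ratio exceeds min_ratio."""
    return SequenceMatcher(None, a, b).ratio() > min_ratio


def count_similar(values, min_ratio=0.8):
    """Greedily group values around the first member of each group and
    count the values that fall into groups with more than one member."""
    clusters = []  # each cluster is a list; its first element is the representative
    for value in values:
        for cluster in clusters:
            if is_similar(value, cluster[0], min_ratio):
                cluster.append(value)
                break
        else:
            clusters.append([value])
    return sum(len(cluster) for cluster in clusters if len(cluster) > 1)


home_addresses = [
    "Building 1, E Community, D Street",
    "Building 2, E Community, D Street",
    "Building 3, E Community, D Street",
    "42 Harbor Road",
]
similar_count = count_similar(home_addresses)
```

The three building addresses differ by a single character and are grouped together, while the unrelated address stays alone and is not counted. The result of greedy grouping depends on input order, so a production system would likely use a proper clustering method.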
S34, taking, as the text similarity, the ratio of the sum of the numbers of attributes with similar attribute information over all categories to the total number of attributes of the at least part of the category attributes.
Similarly, for example, the node set includes attribute category 1, attribute category 2, ..., attribute category i, and the numbers of attributes with similar attribute information in these i categories are $t_1, t_2, \dots, t_i$ respectively. If the total number of attributes of the at least part of the category attributes is $n$, the text similarity is $(t_1 + t_2 + \dots + t_i)/n$.
Similarly, as another embodiment, still taking the case where the node set includes attribute category 1, attribute category 2, ..., attribute category i as an example, different weights may be assigned to the different categories of attributes as described above. For example, predetermined weights $k_1, k_2, \dots, k_i$ are assigned respectively, and the numbers of attributes with similar attribute information are $t_1, t_2, \dots, t_i$. If the total number of attributes of the at least part of the category attributes is $n$, the text similarity may be $(k_1 t_1 + k_2 t_2 + \dots + k_i t_i)/n$. As mentioned above, the weights of the various categories can be set reasonably by those skilled in the art according to actual conditions.
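Given the per-category counts of similar attributes, the text similarity formulas above reduce to a weighted sum divided by n. The sketch below takes the counts and n as inputs; the function name, category names, counts and weights are hypothetical.

```python
def text_similarity(similar_counts, total_attributes, weights=None):
    """Compute (k_1*t_1 + ... + k_i*t_i) / n; every weight defaults to 1.

    similar_counts maps each category to its count t of similar attributes,
    and total_attributes is n.
    """
    if total_attributes == 0:
        return 0.0
    weights = weights or {}
    weighted_sum = sum(weights.get(category, 1.0) * t for category, t in similar_counts.items())
    return weighted_sum / total_attributes


t = {"home_address": 6, "email": 2}
n = 16
plain = text_similarity(t, n)                                           # (6 + 2) / 16
weighted = text_similarity(t, n, {"home_address": 1.5, "email": 0.5})  # (9 + 1) / 16
```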
S35, when the text similarity is greater than a second predetermined threshold, determining that the current super point is a dirty data super point.
The skilled person can set the second predetermined threshold appropriately according to actual conditions. For example, several different second predetermined thresholds may be tried, and the dirty data super points automatically identified by the server can be checked manually, so as to select the most suitable second predetermined threshold.
Through the above steps, because test data characteristically contains a large amount of similar fictitious data, the server automatically identifies the type of the super point according to the text similarity of the attribute information of at least part of the category attributes, which improves the identification speed. Because the type of the super point is identified automatically by the server, super points generated during actual graph computation can be identified in real time.
FIG. 5 illustrates a method for identifying the type of a super point according to another embodiment of the present invention. The method is suitable for being executed in a server and may comprise the following steps:
S41, acquiring a node set of a plurality of nodes directly associated with the current super point; for the specific content, refer to the description of step S11.
S42, acquiring at least part of the category attributes of the node set; for the specific content, refer to the description of step S12.
S43, acquiring the text coincidence degree and the text similarity of the attribute information of the at least part of the category attributes. For acquiring the text coincidence degree, refer to the description of steps S23 and S24; for acquiring the text similarity, refer to the description of steps S33 and S34. In this embodiment, identical and similar attribute information are considered at the same time, so the identification of the type of the super point is more accurate.
S44, determining the type of the current super point according to the text coincidence degree and the text similarity.
As an alternative embodiment, a third predetermined threshold and a fourth predetermined threshold may be set, and when the text coincidence degree is greater than the third predetermined threshold and the text similarity is greater than the fourth predetermined threshold, the current super point is determined to be a dirty data super point. Similarly, the skilled person may try different third and fourth predetermined thresholds and manually check the dirty data super points automatically identified by the server, so as to select the most suitable third and fourth predetermined thresholds.
As another alternative, the text coincidence degree and the text similarity may each be assigned a predetermined weight, and when the weighted sum of the text coincidence degree and the text similarity is greater than a fifth predetermined threshold, the current super point is determined to be a dirty data super point. Similarly, a person skilled in the art may try different fifth predetermined thresholds and manually check the dirty data super points automatically identified by the server, so as to select the most suitable fifth predetermined threshold.
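The two alternative decision rules of step S44 can be sketched as two small predicates. The function names, thresholds and weights below are hypothetical; in practice they would be chosen as described above.

```python
def is_dirty_by_both(coincidence, similarity, third_threshold, fourth_threshold):
    """First alternative: both scores must exceed their own thresholds."""
    return coincidence > third_threshold and similarity > fourth_threshold


def is_dirty_by_weighted_sum(coincidence, similarity, w_coincidence, w_similarity, fifth_threshold):
    """Second alternative: the weighted sum of the two scores must exceed one threshold."""
    return w_coincidence * coincidence + w_similarity * similarity > fifth_threshold


# Illustrative scores for one super point.
both = is_dirty_by_both(0.6, 0.4, third_threshold=0.5, fourth_threshold=0.5)
combined = is_dirty_by_weighted_sum(0.6, 0.4, 0.5, 0.5, fifth_threshold=0.45)
```

For these sample scores the first rule rejects the super point because the similarity is below its threshold, while the weighted sum of about 0.5 passes the second rule, which shows that the two alternatives can disagree on the same input.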
Through the above steps, because test data characteristically contains a large amount of identical and similar fictitious data, this embodiment automatically identifies the type of the super point using both the text coincidence degree and the text similarity of the attribute information of at least part of the category attributes, which improves the accuracy of identifying the type of the super point.
FIG. 6 shows a graph data processing method according to an embodiment of the present invention. The method is suitable for being executed in a server and may include the following steps:
S51, acquiring the super points existing in the current graph.
As described above, the server may obtain the number of edges or other nodes associated with each node one by one and determine whether the number exceeds a predetermined threshold, thereby determining whether the node is a super point. A person skilled in the art may set the predetermined threshold according to actual conditions.
S52, determining the type of the acquired super point.
Specifically, the type of the acquired super point may be determined using the method for identifying the type of a super point described in the embodiments shown in FIG. 1 to FIG. 5; for the specific content, refer to the related description of those embodiments.
S53, deleting the dirty data super point in the current graph and all edges associated with the dirty data super point.
After a dirty data super point is identified, the dirty data super point in the current graph and all of its associated edges can be deleted.
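Step S53 can be sketched on an in-memory adjacency map. A real deployment would issue the equivalent delete operation against the graph database; the dictionary representation, function name and sample graph here are assumptions made for illustration.

```python
def delete_dirty_super_points(adjacency, dirty_super_points):
    """Remove each dirty data super point and every edge that touches it, in place."""
    for super_point in dirty_super_points:
        # Removing the node's own entry drops all edges stored on its side.
        for neighbor in adjacency.pop(super_point, set()):
            # Drop the reverse direction of each edge if the neighbor still exists.
            if neighbor in adjacency:
                adjacency[neighbor].discard(super_point)
    return adjacency


# Illustrative graph: "hub" was identified as a dirty data super point.
graph = {
    "hub": {"u1", "u2", "u3"},
    "u1": {"hub", "u2"},
    "u2": {"hub", "u1"},
    "u3": {"hub"},
}
delete_dirty_super_points(graph, ["hub"])
```

After the call, the edge between "u1" and "u2" is preserved and "u3" remains as an isolated node, since only the super point and its own edges are removed.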
Correspondingly, as shown in Fig. 7, an embodiment of the present invention further provides a device for identifying the type of a super point. The device is applied to a server and includes:
a node unit 61, configured to obtain a node set of a plurality of nodes directly associated with the current super point; for details, refer to the description of step S11;
an attribute unit 62, configured to obtain at least part of the category attributes of the node set; for details, refer to the description of step S12;
a calculating unit 63, configured to obtain the text coincidence degree and/or the text similarity of the attribute information of the at least part of the category attributes; for details, refer to the description of step S13;
a determining unit 64, configured to determine the type of the current super point according to the text coincidence degree and/or the text similarity; for details, refer to the description of step S14.
Other specific details of the device for identifying the type of a super point according to this embodiment may be understood with reference to the corresponding descriptions and effects in the embodiments shown in Figs. 1 to 5, and are not repeated here.
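The work of the calculating unit 63 and the determining unit 64 can be illustrated with a short sketch. It is one possible reading, not the patented implementation: each neighbor node is assumed to be a dictionary of attribute information, an attribute counts as "the same" when its value occurs more than once within its category, and "similar" is approximated with `difflib.SequenceMatcher` from the Python standard library. The function names, the `min_ratio` cutoff and the default thresholds are all assumptions.

```python
from collections import Counter
from difflib import SequenceMatcher
from typing import Dict, Sequence

Record = Dict[str, str]  # attribute information of one neighbor node


def coincidence_degree(nodes: Sequence[Record], categories: Sequence[str]) -> float:
    """Share of attributes whose value is repeated within its category."""
    same, total = 0, 0
    for cat in categories:
        values = [n[cat] for n in nodes if cat in n]
        counts = Counter(values)
        same += sum(c for c in counts.values() if c > 1)
        total += len(values)
    return same / total if total else 0.0


def similarity_degree(
    nodes: Sequence[Record], categories: Sequence[str], min_ratio: float = 0.8
) -> float:
    """Share of attributes that have at least one similar peer in their category.

    Identical values have ratio 1.0, so they are counted as similar too.
    """
    similar, total = 0, 0
    for cat in categories:
        values = [n[cat] for n in nodes if cat in n]
        for i, v in enumerate(values):
            if any(
                i != j and SequenceMatcher(None, v, w).ratio() >= min_ratio
                for j, w in enumerate(values)
            ):
                similar += 1
        total += len(values)
    return similar / total if total else 0.0


def is_dirty_super_point(
    nodes: Sequence[Record],
    categories: Sequence[str],
    t_coincide: float = 0.5,
    t_similar: float = 0.5,
) -> bool:
    """Flag the super point as dirty data when both degrees exceed their thresholds."""
    return (
        coincidence_degree(nodes, categories) > t_coincide
        and similarity_degree(nodes, categories) > t_similar
    )
```

The pairwise comparison is quadratic in the number of neighbor nodes, so for very large node sets a sampled subset or a cheaper similarity measure would be a reasonable substitute.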
Correspondingly, as shown in Fig. 8, an embodiment of the present invention further provides a graph data processing apparatus, which is applied to a server and includes:
a super point unit 71, configured to obtain the super points existing in the current graph; for details, refer to the description of step S51;
an identifying unit 72, configured to determine the type of each obtained super point; for details, refer to the description of step S52;
a deleting unit 73, configured to delete the dirty data super points in the current graph and all edges associated with them; for details, refer to the description of step S53.
Other details of the graph data processing apparatus according to this embodiment may be understood with reference to the corresponding descriptions and effects in the embodiments shown in Figs. 1 to 6, and are not repeated here.
As shown in Fig. 9, an embodiment of the present invention further provides a server, which may include a processor 81 and a memory 82. The processor 81 and the memory 82 may be connected by a bus or in another manner; Fig. 9 takes a bus connection as an example.
The processor 81 may be a Central Processing Unit (CPU). The processor 81 may also be another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or a combination thereof.
The memory 82, as a non-transitory computer-readable storage medium, may be used to store non-transitory software programs, non-transitory computer-executable programs, and modules, such as the program instructions corresponding to the super point type identification method and the graph data processing method in the embodiments of the present invention. By running the non-transitory software instructions stored in the memory 82, the processor 81 executes its various functional applications and data processing, that is, implements the super point type identification method and the graph data processing method of the above method embodiments.
The memory 82 may include high speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, the memory 82 may optionally include memory located remotely from the processor 81, which may be connected to the processor 81 via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The details of the server may be understood by referring to the corresponding descriptions and effects in the embodiments shown in fig. 1 to fig. 6, and are not described herein again.
Those skilled in the art will understand that all or part of the processes of the methods in the above embodiments can be implemented by a computer program, which can be stored in a computer-readable storage medium and which, when executed, can include the processes of the above method embodiments. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a flash memory, a Hard Disk Drive (HDD), a Solid State Drive (SSD), or the like; the storage medium may also comprise a combination of memories of the kinds described above.
Although the embodiments of the present invention have been described in conjunction with the accompanying drawings, those skilled in the art may make various modifications and variations without departing from the spirit and scope of the invention, and such modifications and variations fall within the scope defined by the appended claims.

Claims (10)

1. A method for identifying the type of a super point, comprising:
acquiring a node set of a plurality of nodes directly associated with a current super point;
acquiring at least part of the category attributes of the node set;
acquiring a text coincidence degree and/or a text similarity of the attribute information of the at least part of the category attributes;
and determining the type of the current super point according to the text coincidence degree and/or the text similarity.
2. The method according to claim 1, wherein acquiring the text coincidence degree of the attribute information of the at least part of the category attributes comprises:
acquiring, for each category of attributes, the number of attributes having the same attribute information;
taking the ratio of the sum of the numbers of attributes having the same attribute information in each category to the number of all attributes of the at least part of the category attributes as the text coincidence degree; or
taking the ratio of the weighted sum of the numbers of attributes having the same attribute information in each category to the number of all attributes of the at least part of the category attributes as the text coincidence degree.
3. The method of claim 2, wherein determining the type of the current super point according to the text coincidence degree comprises:
when the text coincidence degree is greater than a first preset threshold, determining that the current super point is a dirty data super point.
4. The method according to claim 1, wherein acquiring the text similarity of the attribute information of the at least part of the category attributes comprises:
acquiring, for each category of attributes, the number of attributes having similar attribute information;
taking the ratio of the sum of the numbers of attributes having similar attribute information in each category to the number of all attributes of the at least part of the category attributes as the text similarity; or
taking the ratio of the weighted sum of the numbers of attributes having similar attribute information in each category to the number of all attributes of the at least part of the category attributes as the text similarity.
5. The method of claim 4, wherein determining the type of the current super point according to the text similarity comprises:
when the text similarity is greater than a second preset threshold, determining that the current super point is a dirty data super point.
6. The method of any of claims 1-5, wherein determining the type of the current super point according to the text coincidence degree and the text similarity comprises:
when the text coincidence degree is greater than a third preset threshold and the text similarity is greater than a fourth preset threshold, determining that the current super point is a dirty data super point; or
when the weighted sum of the text coincidence degree and the text similarity is greater than a fifth preset threshold, determining that the current super point is a dirty data super point.
7. A graph data processing method, comprising:
acquiring the super points existing in a current graph;
determining the type of each acquired super point according to the method for identifying the type of a super point of any one of claims 1 to 6;
and deleting the dirty data super points in the current graph and the edges associated with the dirty data super points.
8. A device for identifying the type of a super point, comprising:
a node unit, configured to acquire a node set of a plurality of nodes directly associated with a current super point;
an attribute unit, configured to acquire at least part of the category attributes of the node set;
a calculating unit, configured to acquire a text coincidence degree and/or a text similarity of the attribute information of the at least part of the category attributes;
and a determining unit, configured to determine the type of the current super point according to the text coincidence degree and/or the text similarity.
9. A server, comprising: a memory and a processor communicatively coupled to each other, the memory having stored therein computer instructions, the processor executing the computer instructions to perform the method of any of claims 1-7.
10. A computer-readable storage medium having stored thereon computer instructions for causing a computer to perform the method of any one of claims 1-7.
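
For reference, the ratios recited in claims 2, 4 and 6 can be written compactly. The symbols are introduced here only for illustration and are not part of the claims: $K$ is the number of category attributes considered, $n_k$ and $m_k$ are the numbers of attributes in category $k$ with the same and with similar attribute information respectively, $N$ is the number of all attributes of those categories, $w_k$, $\alpha$ and $\beta$ are weights, and $T_3$, $T_4$, $T_5$ are the third to fifth preset thresholds.

$$
C = \frac{\sum_{k=1}^{K} n_k}{N}
\quad\text{or}\quad
C = \frac{\sum_{k=1}^{K} w_k\, n_k}{N},
\qquad
S = \frac{\sum_{k=1}^{K} m_k}{N}
\quad\text{or}\quad
S = \frac{\sum_{k=1}^{K} w_k\, m_k}{N}
$$

$$
\text{dirty data super point if}\quad
(C > T_3 \ \text{and}\ S > T_4)
\quad\text{or}\quad
\alpha C + \beta S > T_5
$$

Under this notation, the single-measure tests of claims 3 and 5 correspond to comparing $C$ with the first preset threshold and $S$ with the second preset threshold.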
Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910866575.4A CN110609905A (en) 2019-09-12 2019-09-12 Method and device for recognizing type of over-point and processing graph data

Publications (1)

Publication Number Publication Date
CN110609905A (en) 2019-12-24

Family

ID=68891285

Country Status (1)

Country Link
CN (1) CN110609905A (en)

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104915396A (en) * 2015-05-28 2015-09-16 杭州电子科技大学 Knowledge retrieving method
CN105488176A (en) * 2015-11-30 2016-04-13 华为软件技术有限公司 Data processing method and device
CN105612516A (en) * 2013-10-07 2016-05-25 甲骨文国际公司 Attribute redundancy removal
CN106372659A (en) * 2016-08-30 2017-02-01 五八同城信息技术有限公司 Similar object determination method and device
CN109491995A (en) * 2018-12-25 2019-03-19 苏宁易购集团股份有限公司 Knowledge based map inquires the method and system of financial abnormal data
CN109657947A (en) * 2018-12-06 2019-04-19 西安交通大学 A kind of method for detecting abnormality towards enterprises ' industry classification
US10303797B1 (en) * 2015-12-18 2019-05-28 EMC IP Holding Company LLC Clustering files in deduplication systems
CN109840286A (en) * 2019-01-31 2019-06-04 中国农业银行股份有限公司 It is a kind of identification mass data building relational graph in super node method and device
CN109933671A (en) * 2019-01-31 2019-06-25 平安科技(深圳)有限公司 Construct method, apparatus, computer equipment and the storage medium of personal knowledge map
CN109992960A (en) * 2018-12-06 2019-07-09 北京奇艺世纪科技有限公司 A kind of forgery parameter detection method, device, electronic equipment and storage medium
CN110083756A (en) * 2018-01-26 2019-08-02 国际商业机器公司 Identify the redundant node in knowledge graph data structure

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20220608

Address after: 518000 floor 7, building S6, poly Yuzhu port, No. 848, Huangpu Avenue East, Huangpu District, Guangzhou City, Guangdong Province

Applicant after: Jianlian Technology (Guangdong) Co.,Ltd.

Address before: 510623 Room 201, building a, No. 1, Qianwan 1st Road, Qianhai Shenzhen Hong Kong cooperation zone, Shenzhen, Guangdong

Applicant before: SHENZHEN ZHONGYING WEIRONG TECHNOLOGY Co.,Ltd.

RJ01 Rejection of invention patent application after publication

Application publication date: 20191224
