CN111400448A - Method and device for analyzing incidence relation of objects - Google Patents

Method and device for analyzing incidence relation of objects

Info

Publication number
CN111400448A
Authority
CN
China
Prior art keywords
analyzed
text data
keywords
target object
relationship
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010169167.6A
Other languages
Chinese (zh)
Inventor
李阳
魏聪惠
陈建文
杨志滔
王怡冰
黄星
刘洋
陈阳
曾佳妍
王俐
邱晓辉
苏鹏皓
朱佳
邱炜亨
薛飞
王酝秋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Construction Bank Corp
Original Assignee
China Construction Bank Corp
CCB Finetech Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Construction Bank Corp, CCB Finetech Co Ltd filed Critical China Construction Bank Corp
Priority to CN202010169167.6A priority Critical patent/CN111400448A/en
Publication of CN111400448A publication Critical patent/CN111400448A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9536 Search customisation based on social or collaborative filtering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/01 Social networking

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Economics (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Resources & Organizations (AREA)
  • Marketing (AREA)
  • Primary Health Care (AREA)
  • Strategic Management (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a method and device for analyzing the association relationship of objects. The method comprises: acquiring text data of a target object and an object to be analyzed, wherein the text data is text data exchanged between the target object and the object to be analyzed; extracting a plurality of keywords from the text data; carrying out community classification analysis on the target object and the object to be analyzed; and, if the community classification analysis result is that the target object and the object to be analyzed are in the same community classification, matching the extracted keywords with predefined object relationship types to determine the association relationship between the target object and the object to be analyzed. The method and device can determine the association relationship of the target object with high accuracy.

Description

Method and device for analyzing incidence relation of objects
Technical Field
The invention relates to the technical field of object data analysis, and in particular to a method and device for analyzing the association relationship of objects.
Background
Analyzing the association relationships of objects is increasingly important in enterprises. For example, in enterprise human resource management, employee relationships are an important kind of object association relationship: good employee relationships leave employees psychologically satisfied, help improve their working efficiency and initiative, and to some extent help ensure that the strategy and goals of the enterprise are carried out effectively.
Employee relationships are key factors affecting the behavior and attitudes, working efficiency, and execution capability of employees. However, the relationship types, relationship names, and rules for determining the various relationships between employees are currently not standardized. Each organization defines its own rules, which hinders overall integration and comparison, so the relationship information about employees is not sufficiently valid and the relationships finally determined between employees are not sufficiently accurate.
Disclosure of Invention
The embodiment of the invention provides an incidence relation analysis method of an object, which is used for judging the incidence relation of a target object and has high accuracy, and the method comprises the following steps:
acquiring text data of a target object and an object to be analyzed, wherein the text data is text data exchanged between the target object and the object to be analyzed;
extracting a plurality of keywords in the text data;
carrying out community classification analysis on the target object and the object to be analyzed;
and if the community classification analysis result is that the target object and the object to be analyzed are classified in the same community, matching the extracted keywords with a predefined object relation type, and determining the incidence relation between the target object and the object to be analyzed.
The embodiment of the invention provides an incidence relation analysis device of an object, which is used for judging the incidence relation of a target object and has high accuracy, and the device comprises:
the data acquisition module is used for acquiring text data of a target object and an object to be analyzed, wherein the text data is text data exchanged between the target object and the object to be analyzed;
the keyword extraction module is used for extracting a plurality of keywords in the text data;
the community classification analysis module is used for carrying out community classification analysis on the target object and the object to be analyzed;
and the relation analysis module is used for matching the extracted keywords with a predefined object relation type and determining the incidence relation between the target object and the object to be analyzed if the community classification analysis result indicates that the target object and the object to be analyzed are classified in the same community.
The embodiment of the invention also provides computer equipment, which comprises a memory, a processor and a computer program which is stored on the memory and can run on the processor, wherein the processor realizes the incidence relation analysis method of the objects when executing the computer program.
An embodiment of the present invention further provides a computer-readable storage medium, where a computer program for executing the method for analyzing association between objects is stored in the computer-readable storage medium.
In the embodiment of the invention, text data of a target object and an object to be analyzed is acquired, wherein the text data is text data exchanged between the target object and the object to be analyzed; a plurality of keywords in the text data are extracted; community classification analysis is carried out on the target object and the object to be analyzed; and if the community classification analysis result is that the target object and the object to be analyzed are in the same community classification, the extracted keywords are matched with predefined object relationship types to determine the association relationship between the target object and the object to be analyzed. In this process, after the keywords are extracted, it is first determined whether the target object and the object to be analyzed are in the same community classification, which is the first judgment. Only when they are in the same community classification are the extracted keywords matched with the predefined object relationship types to determine the relationship between the target object and the object to be analyzed, which is the second judgment. Through these two judgments, the association relationship between the target object and the object to be analyzed can be determined more accurately.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts. In the drawings:
FIG. 1 is a flowchart of a method for analyzing an association relationship between objects according to an embodiment of the present invention;
FIG. 2 is a detailed flowchart of an association analysis method of objects according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of an apparatus for analyzing association between objects according to an embodiment of the present invention;
FIG. 4 is a diagram of a computer device in an embodiment of the invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the embodiments of the present invention are further described in detail below with reference to the accompanying drawings. The exemplary embodiments and descriptions of the present invention are provided to explain the present invention, but not to limit the present invention.
In the description of the present specification, the terms "comprising," "including," "having," "containing," and the like are used in an open-ended fashion, i.e., to mean including, but not limited to. Reference to the description of the terms "one embodiment," "a particular embodiment," "some embodiments," "for example," etc., means that a particular feature, structure, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the application. In this specification, the schematic representations of the terms used above do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. The sequence of steps involved in the embodiments is for illustrative purposes to illustrate the implementation of the present application, and the sequence of steps is not limited and can be adjusted as needed.
Fig. 1 is a flowchart of an association analysis method of an object in an embodiment of the present invention, as shown in fig. 1, the method includes:
step 101, acquiring text data of a target object and an object to be analyzed, wherein the text data is text data exchanged between the target object and the object to be analyzed;
step 102, extracting a plurality of keywords in text data;
step 103, carrying out community classification analysis on the target object and the object to be analyzed;
step 104, if the community classification analysis result shows that the target object and the object to be analyzed are in the same community, matching the extracted keywords with a predefined object relation type, and determining the incidence relation between the target object and the object to be analyzed.
In the embodiment of the invention, after a plurality of keywords in the text data are extracted, it is first determined whether the target object and the object to be analyzed are in the same community classification, which is the first judgment. Only when they are in the same community classification are the extracted keywords matched with the predefined object relationship types to determine the relationship between the target object and the object to be analyzed, which is the second judgment. Through these two judgments, the association relationship between the target object and the object to be analyzed can be determined more accurately.
In step 101, text data of a target object and an object to be analyzed is obtained. For example, the target object may be an employee, and the object to be analyzed may be a person who communicates with the employee, either another employee inside the company or a client outside the company. The text data may be email records, chat records, or the like exchanged between the employee and the object to be analyzed.
There are various methods for extracting a plurality of keywords from the text data in step 102, and one example is given below.
In an embodiment, before extracting the plurality of keywords in the text data, the method further includes:
preprocessing the text data to obtain text data that meets a preset specification;
segmenting the text data that meets the preset specification to obtain a plurality of decomposed words;
extracting a plurality of keywords in the text data, including:
a plurality of keywords are extracted from the plurality of decomposed words.
In the above embodiment, the preset specification includes removing unnecessary words, and preprocessing the text data includes:
S1: removing person names and times from the text data;
for example, Python may be used to remove a name and time such as "Lix _ Total 2019-01-02 16:01:52" from the text data;
S2: filtering out English words in the text data and keeping the Chinese content of the text data;
S3: removing meaningless words and punctuation marks from the text data by using a word bank;
for example, removing "go", "got", "?" and the like.
Through the above steps, text data meeting the preset specification is obtained.
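Steps S1 to S3 can be sketched in Python roughly as follows. This is only an illustrative sketch: the header layout (a name followed by a `YYYY-MM-DD HH:MM:SS` timestamp at the start of a line), the function name `preprocess`, the sample message, and the stop-word set are all assumptions, since the text does not specify the record format or the contents of the word bank.

```python
import re

# Assumed header layout "<name> <YYYY-MM-DD HH:MM:SS>" at the start of a line.
HEADER_RE = re.compile(r"^\S+\s+\d{4}-\d{2}-\d{2}\s+\d{2}:\d{2}:\d{2}\s*", re.MULTILINE)
ENGLISH_RE = re.compile(r"[A-Za-z]+")
# Anything that is neither a word character nor whitespace is treated as punctuation.
PUNCT_RE = re.compile(r"[^\w\s]")


def preprocess(text, stop_words):
    """Apply S1 (strip name/time headers), S2 (drop English words), S3 (drop stop words and punctuation)."""
    text = HEADER_RE.sub("", text)      # S1
    text = ENGLISH_RE.sub("", text)     # S2
    for word in stop_words:             # S3: hypothetical word bank of meaningless words
        text = text.replace(word, "")
    text = PUNCT_RE.sub("", text)       # S3: punctuation
    return re.sub(r"\s+", " ", text).strip()


# Hypothetical chat record and stop words, for illustration only.
sample = "Lix_Total 2019-01-02 16:01:52\n明天的会议OK吗？"
print(preprocess(sample, {"的", "吗"}))
```

Removing stop words by plain substring replacement is the simplest choice for a sketch; a production pipeline would more likely filter stop words after word segmentation so that characters inside longer words are not removed by accident.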
Then, the text data meeting the preset specification is segmented to obtain a plurality of decomposed words; specifically, the Chinese word segmentation model cws may be used. During segmentation, individual words may be split inaccurately, most commonly person names. For example, "Liu Yuan aromatic mountain" may be split into "Liu Yuan", "aromatic" and "mountain", whereas "Liu Yuan" is actually a person's name and "aromatic mountain" is a single word. Such cases can be resolved with a custom dictionary word library, Customword.
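The effect of a custom dictionary can be illustrated with a small stand-in segmenter. The sketch below does not use the cws model named above; it swaps in plain forward maximum matching, a much simpler technique, purely to show how adding a person's name to the vocabulary changes the split. The vocabulary, the name, and the sentence are hypothetical.

```python
def segment(text, vocabulary, max_len=4):
    """Forward maximum matching: at each position take the longest vocabulary entry,
    falling back to a single character when nothing matches."""
    words = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocabulary:
                words.append(piece)
                i += size
                break
    return words


base_vocabulary = {"北京", "大学"}   # hypothetical general dictionary
custom_words = {"张三"}              # hypothetical custom entry for a person's name

print(segment("张三去北京大学", base_vocabulary))
print(segment("张三去北京大学", base_vocabulary | custom_words))
```

Without the custom entry the name falls apart into single characters; with it, the name is kept as one decomposed word.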
Word segmentation yields a plurality of decomposed words, from which keywords can then be extracted. Various extraction methods are available; one embodiment is given below.
In one embodiment, extracting a plurality of keywords from a plurality of decomposed words comprises:
calculating the TF-IDF value of each decomposed word by adopting a TF-IDF algorithm;
sorting the decomposed words by TF-IDF value from large to small;
and determining a preset number of the sorted decomposed words as the keywords.
In the above embodiment, the calculation formula of the TF-IDF value is as follows:
$$\mathrm{tfidf}_{i,j} = \frac{n_{i,j}}{\sum_{k} n_{k,j}} \times \log\frac{|D|}{|\{j : t_i \in d_j\}|}$$
wherein $n_{i,j}$ is the number of occurrences of the decomposed word $t_i$ in the text data $d_j$, and the denominator $\sum_{k} n_{k,j}$ is the sum of the occurrences of all decomposed words in $d_j$;
$|D|$ is the total number of text data; for example, a plurality of text data may be obtained, each of which is segmented into a plurality of decomposed words;
$|\{j : t_i \in d_j\}|$ is the number of text data containing the decomposed word $t_i$. If the decomposed word does not appear in any text data, the divisor would be zero, so $1 + |\{j : t_i \in d_j\}|$ is typically used.
For example, if one text data contains 100 decomposed words in total and the word "eat" occurs 5 times, the term frequency (TF) of "eat" in that text data is 5/100 = 0.05. One way to calculate the inverse document frequency (IDF) is to count the text data in which the word "eat" appears, divide the total number of text data in the set by that count, and take the logarithm. If "eat" appears in 65 text data and the total number of text data is 1,000, the inverse document frequency is log(1,000/65) ≈ 1.1871 (base-10 logarithm), and the final TF-IDF value is 0.05 × 1.1871 ≈ 0.0594.
After the TF-IDF value of each decomposed word is obtained, the decomposed words are sorted by TF-IDF value from large to small, and a preset number of the sorted decomposed words are determined as the keywords. For example, if the preset number is 30, the top 30 sorted decomposed words are determined as keywords. The preset number is related to the predefined object relationship types, of which there are several; for example, if one predefined object relationship type is a work relationship type containing 30 keywords, the preset number is 30.
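The calculation and the top-N selection described above can be sketched as follows. The function names `tf_idf` and `top_keywords` and the sample documents are hypothetical; the sketch assumes a base-10 logarithm and the smoothed divisor 1 + (number of text data containing the word), neither of which is fixed by the text beyond the remarks above.

```python
import math
from collections import Counter


def tf_idf(documents):
    """documents: list of token lists (one per text data).
    Returns one {word: tf-idf} dict per document."""
    n_docs = len(documents)
    doc_freq = Counter()
    for tokens in documents:
        doc_freq.update(set(tokens))        # count each word once per document
    scores = []
    for tokens in documents:
        counts = Counter(tokens)
        total = len(tokens)
        scores.append({
            word: (count / total) * math.log10(n_docs / (1 + doc_freq[word]))
            for word, count in counts.items()
        })
    return scores


def top_keywords(word_scores, preset_number):
    """Sort words by TF-IDF from large to small (ties broken alphabetically) and keep the first N."""
    ranked = sorted(word_scores.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, _ in ranked[:preset_number]]


# Hypothetical segmented text data.
docs = [["budget", "budget", "meeting"], ["meeting", "lunch"], ["lunch", "report"], ["report", "travel"]]
scores = tf_idf(docs)
print(top_keywords(scores[0], 1))
```

With the smoothed divisor, a word that appears in almost every text data can receive a zero or negative score; that is expected and simply pushes such words to the bottom of the ranking.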
In one embodiment, the community classification analysis of the target object and the object to be analyzed includes:
and carrying out community classification analysis on the target object and the object to be analyzed by adopting a community classification algorithm.
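The text does not name a specific community classification algorithm. As one possible choice, and only as an assumption, the sketch below uses a deterministic variant of label propagation over a communication graph whose edges connect objects that have exchanged text data. The function names, the edge list, and the person names are hypothetical.

```python
from collections import Counter, defaultdict


def label_propagation(edges, max_rounds=20):
    """Assign each node the label most common among its neighbors until labels stop changing.
    Ties keep the current label if possible, otherwise take the smallest candidate."""
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    labels = {node: node for node in neighbors}   # every node starts in its own community
    for _ in range(max_rounds):
        changed = False
        for node in sorted(neighbors):
            counts = Counter(labels[n] for n in neighbors[node])
            best = max(counts.values())
            candidates = sorted(label for label, c in counts.items() if c == best)
            if labels[node] not in candidates:
                labels[node] = candidates[0]
                changed = True
        if not changed:
            break
    return labels


def same_community(labels, a, b):
    """First judgment: are the two objects classified in the same community?"""
    return a in labels and b in labels and labels[a] == labels[b]


# Hypothetical communication graph.
edges = [("Li", "Zhang"), ("Li", "Zhao"), ("Zhang", "Zhao"), ("Chen", "Wang")]
labels = label_propagation(edges)
print(same_community(labels, "Li", "Zhao"), same_community(labels, "Li", "Wang"))
```

The fixed iteration order and tie-breaking rule make the result reproducible, which the usual randomized form of label propagation is not; any other community detection method could be substituted without changing the rest of the flow.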
In an embodiment, matching the extracted keywords with a predefined object relationship type, and determining an association relationship between the target object and the object to be analyzed includes:
for each predefined object relationship type, analyzing the similarity of a plurality of keywords and the object relationship type;
and determining the object relationship type with the maximum similarity as the incidence relationship between the target object and the object to be analyzed.
In the above embodiments, a plurality of object relationship types are predefined in the embodiments of the present invention, and table 1 is an example of the plurality of object relationship types.
TABLE 1 examples of multiple object relationship types
Figure BDA0002408563640000061
Figure BDA0002408563640000071
Figure BDA0002408563640000081
Figure BDA0002408563640000091
Figure BDA0002408563640000101
Figure BDA0002408563640000111
Each object relationship type in Table 1 includes a plurality of keywords, and the number of keywords extracted from the text data corresponds to the number of keywords in each object relationship type, so that the similarity can be calculated. If the community classification analysis result is that the target object and the object to be analyzed are not in the same community classification, the target object and the object to be analyzed have no association relationship.
In one embodiment, the similarity is a cosine similarity;
analyzing the similarity of a plurality of keywords and the object relation type, comprising:
calculating the similarity between a plurality of keywords and the object relation type by adopting the following formula:
$$\cos(\theta) = \frac{\sum_{i=1}^{n} x_i y_i}{\sqrt{\sum_{i=1}^{n} x_i^{2}}\,\sqrt{\sum_{i=1}^{n} y_i^{2}}}$$
wherein $\cos(\theta)$ is the similarity;
$n$ is the total number of the keywords in the text data and the keywords in the object relationship type;
$x_i$ is the number of occurrences of the i-th keyword in the text data;
$y_i$ is the number of occurrences of the i-th keyword in the object relationship type.
In the above embodiment, for example, the numbers of occurrences of the keywords extracted from the text data are [1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0], and the numbers of occurrences of the keywords in an object relationship type are listbcodeeonehot = [0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]; then
Figure BDA0002408563640000121
Finally, the object relationship type with the maximum similarity is determined as the association relationship between the target object and the object to be analyzed; once this association relationship has been obtained, it is used as the association relationship between the two objects without carrying out community classification analysis on them again.
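The cosine similarity and the selection of the most similar object relationship type can be sketched as follows. The relationship type names and keyword lists are hypothetical, since Table 1 is only available as images; the vectors are built over the union of both keyword sets, which is one reading of the definition of n above.

```python
import math
from collections import Counter


def cosine_similarity(x, y):
    """cos(theta) = sum(x_i * y_i) / (sqrt(sum(x_i^2)) * sqrt(sum(y_i^2)))."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0 or norm_y == 0:
        return 0.0                      # an all-zero vector is treated as dissimilar to everything
    return dot / (norm_x * norm_y)


def count_vector(words, vocabulary):
    """Number of occurrences of each vocabulary word in `words`."""
    counts = Counter(words)
    return [counts[w] for w in vocabulary]


def best_relationship(keywords, relationship_types):
    """Return (type name, similarity) for the object relationship type with the maximum similarity."""
    best_name, best_score = None, -1.0
    for name, type_keywords in relationship_types.items():
        vocabulary = sorted(set(keywords) | set(type_keywords))
        score = cosine_similarity(count_vector(keywords, vocabulary),
                                  count_vector(type_keywords, vocabulary))
        if score > best_score:
            best_name, best_score = name, score
    return best_name, best_score


# Hypothetical relationship types and extracted keywords.
types = {"work": ["project", "report", "meeting"], "schoolmate": ["class", "teacher", "campus"]}
print(best_relationship(["project", "deadline", "report"], types))
```

Both vectors must be built over the same vocabulary in the same order; vectors of different lengths cannot be compared, which is why the sketch raises an error in that case.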
In the above embodiment, the association relationships between the target object and a plurality of objects to be analyzed can be determined. A relationship determined in this way is direct and definite and is also referred to as a first-degree relationship. However, suspected relationships may also exist; in that case, a community classification algorithm and a second-degree relationship model may be applied to perform second-degree mining on the existing relationships.
Second-degree relationship model: if A has a certain association relationship with B, A has a certain association relationship with C, B has no first-degree relationship with C, and the two relationships are of the same type, then B and C can be considered to have a suspected relationship.
For example, the target objects "Zhang *xiang", "Li *quan" and "Zhao *bi" are in the same community classification; "Zhang *xiang" and "Li *quan" have a schoolmate relationship, "Li *quan" and "Zhao *bi" have a schoolmate relationship, and there is no first-degree relationship between "Zhang *xiang" and "Zhao *bi". In this case, "Zhang *xiang" and "Zhao *bi" can be considered to have a "suspected" schoolmate relationship. The probability score of the "suspected" schoolmate relationship can be calculated by combining the number of relationship edges between "Zhang *xiang" and "Li *quan", the number of relationship edges between "Li *quan" and "Zhao *bi", and the like.
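The second-degree relationship model can be sketched as a search for pairs that share a common neighbor through the same relationship type but have no first-degree relationship themselves. The function name, the data layout, and the sample names are hypothetical, and the probability score is left out because the text does not give its formula.

```python
from collections import defaultdict
from itertools import combinations


def suspected_relations(first_degree):
    """first_degree: {(person_a, person_b): relationship type} for known first-degree relationships.
    Returns a set of (person_b, person_c, relationship type) suspected relationships."""
    by_person = defaultdict(dict)               # person -> {neighbor: relationship type}
    for (a, b), relation in first_degree.items():
        by_person[a][b] = relation
        by_person[b][a] = relation
    suspected = set()
    for hub, neighbors in by_person.items():
        for b, c in combinations(sorted(neighbors), 2):
            same_type = neighbors[b] == neighbors[c]
            no_first_degree = c not in by_person.get(b, {})
            if same_type and no_first_degree:
                suspected.add((b, c, neighbors[b]))
    return suspected


# Hypothetical first-degree relationships within one community classification.
known = {("Zhang", "Li"): "schoolmate", ("Li", "Zhao"): "schoolmate", ("Li", "Wang"): "colleague"}
print(suspected_relations(known))
```

Here "Li" is the common neighbor, so "Zhang" and "Zhao" are reported as suspected schoolmates, while "Wang" is not paired with either because the relationship types differ.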
After the incidence relations between the target object and the objects to be analyzed are obtained, the incidence relations can be displayed, the displaying mode comprises single-node displaying, multi-node displaying and knowledge graph displaying, wherein the single-node displaying shows all the incidence relations of one object, the multi-node displaying shows the incidence relations among a selected number of objects, and the knowledge graph displaying shows the incidence relations among all the objects which are logged in the human resource system at present.
Based on the above embodiment, the present invention provides the following embodiment to explain a detailed flow of an association relationship analysis method of an object, and fig. 2 is a detailed flow chart of an association relationship analysis method of an object according to an embodiment of the present invention, as shown in fig. 2, including:
step 201, acquiring text data of a target object and an object to be analyzed, wherein the text data is text data exchanged between the target object and the object to be analyzed;
step 202, preprocessing the text data to obtain text data meeting preset specifications;
step 203, performing word segmentation on the text data meeting the preset specification to obtain a plurality of decomposed words;
step 204, calculating the TF-IDF value of each decomposed word by adopting a TF-IDF algorithm;
step 205, sorting the plurality of decomposed words by TF-IDF value from large to small;
step 206, determining a preset number of the sorted decomposed words as keywords;
step 207, carrying out community classification analysis on the target object and the object to be analyzed by adopting a community classification algorithm;
step 208, determining whether the community classification analysis result is that the target object and the object to be analyzed are classified in the same community, if yes, going to step 209, otherwise going to step 211;
step 209, analyzing the similarity between the keywords and each predefined object relationship type;
step 210, determining the object relationship type with the maximum similarity as the incidence relationship between the target object and the object to be analyzed;
and step 211, determining that the target object has no association relation with the object to be analyzed.
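The decision in steps 208 to 211 can be sketched as a small gating function. The function names are hypothetical, the similarity measure is passed in as a parameter, and the simple keyword-overlap count used in the usage example is only a stand-in for the cosine similarity of the embodiment.

```python
def determine_association(keywords, same_community, relationship_types, similarity):
    """Steps 208-211: only match keywords against relationship types when both objects
    are in the same community classification; otherwise report no association (None)."""
    if not same_community or not relationship_types:
        return None                                            # step 211
    return max(relationship_types,                             # steps 209-210
               key=lambda name: similarity(keywords, relationship_types[name]))


def overlap(a, b):
    """Stand-in similarity: number of shared keywords (not the cosine similarity)."""
    return len(set(a) & set(b))


# Hypothetical relationship types and keywords.
types = {"work": ["project", "report"], "schoolmate": ["class", "campus"]}
print(determine_association(["project", "deadline"], True, types, overlap))
print(determine_association(["project", "deadline"], False, types, overlap))
```

Keeping the community check outside the similarity computation mirrors the two judgments of the method: the second judgment is never evaluated when the first one fails.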
Of course, it is understood that other variations of the above detailed flow can be made, and all such variations are intended to fall within the scope of the present invention.
In summary, in the method provided in the embodiment of the present invention, text data of a target object and an object to be analyzed is acquired, wherein the text data is text data exchanged between the target object and the object to be analyzed; a plurality of keywords in the text data are extracted; community classification analysis is carried out on the target object and the object to be analyzed; and if the community classification analysis result is that the target object and the object to be analyzed are in the same community classification, the extracted keywords are matched with predefined object relationship types to determine the association relationship between the target object and the object to be analyzed. In this process, after the keywords are extracted, it is first determined whether the target object and the object to be analyzed are in the same community classification, which is the first judgment. Only when they are in the same community classification are the extracted keywords matched with the predefined object relationship types to determine the relationship between the target object and the object to be analyzed, which is the second judgment. Through these two judgments, the association relationship between the target object and the object to be analyzed can be determined more accurately. In addition, the method of the embodiment of the invention can adapt to continuously changing business requirements, supports the analysis, application and display of employee relationships on a terminal, and makes human resource management more intelligent, scientific and accurate.
The method of the invention is extensible and is suitable for analyzing association relationships among a continuously growing set of objects; it can, to a certain extent, meet the requirements of employee relationship management in various services; and it can provide a friendly application and display interface that is simple and convenient for users to operate.
The embodiment of the present invention further provides an apparatus for analyzing an association relationship between objects, which has a similar principle to the method for analyzing an association relationship between objects and is not described herein again.
Fig. 3 is a schematic diagram of an association analysis apparatus for objects according to an embodiment of the present invention, as shown in fig. 3, including:
the data obtaining module 301 is configured to obtain text data of a target object and an object to be analyzed, where the text data is text data exchanged between the target object and the object to be analyzed;
a keyword extraction module 302, configured to extract a plurality of keywords in the text data;
the community classification analysis module 303 is configured to perform community classification analysis on the target object and the object to be analyzed;
and the relationship analysis module 304 is configured to, if the community classification analysis result indicates that the target object and the object to be analyzed are classified in the same community, match the extracted multiple keywords with a predefined object relationship type, and determine an association relationship between the target object and the object to be analyzed.
In one embodiment, the apparatus further comprises a preprocessing module 305 for:
preprocessing the text data to obtain text data that meets a preset specification;
segmenting the text data that meets the preset specification to obtain a plurality of decomposed words;
the keyword extraction module 302 is specifically configured to:
a plurality of keywords are extracted from the plurality of decomposed words.
In an embodiment, the keyword extraction module 302 is specifically configured to:
calculating the TF-IDF value of each decomposed word by adopting a TF-IDF algorithm;
sorting the decomposed words by TF-IDF value from large to small;
and determining a preset number of the sorted decomposed words as the keywords.
In an embodiment, the community classification analysis module 303 is specifically configured to:
and carrying out community classification analysis on the target object and the object to be analyzed by adopting a community classification algorithm.
In an embodiment, the relationship analysis module 304 is specifically configured to:
for each predefined object relationship type, analyzing the similarity of a plurality of keywords and the object relationship type;
and determining the object relationship type with the maximum similarity as the incidence relationship between the target object and the object to be analyzed.
In one embodiment, the similarity is a cosine similarity;
the relationship analysis module 304 is specifically configured to:
calculating the similarity between a plurality of keywords and the object relation type by adopting the following formula:
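$$\cos(\theta) = \frac{\sum_{i=1}^{n} x_i y_i}{\sqrt{\sum_{i=1}^{n} x_i^{2}}\,\sqrt{\sum_{i=1}^{n} y_i^{2}}}$$
wherein $\cos(\theta)$ is the similarity;
$n$ is the total number of the keywords in the text data and the keywords in the object relationship type;
$x_i$ is the number of occurrences of the i-th keyword in the text data;
$y_i$ is the number of occurrences of the i-th keyword in the object relationship type.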
In summary, in the apparatus provided in the embodiment of the present invention, text data of a target object and an object to be analyzed is obtained, where the text data is text data exchanged between the target object and the object to be analyzed; extracting a plurality of keywords in the text data; carrying out community classification analysis on the target object and the object to be analyzed; and if the community classification analysis result indicates that the target object and the object to be analyzed are in the same community, matching the extracted keywords with a predefined object relation type, and determining the incidence relation between the target object and the object to be analyzed. In the process, a plurality of keywords in the text data are extracted, the target object and the object to be analyzed are determined to be in the same community classification, the first judgment is carried out, the extracted keywords are matched with the predefined object relation type only when the target object and the object to be analyzed are classified in the same community, the relation between the target object and the communication object is determined, the second judgment is carried out, and the association relation between the target object and the object to be analyzed can be determined more accurately through the two judgments. In addition, the method of the embodiment of the invention can adapt to continuously changing business requirements, analyze the application and display of the staff relation terminal, and improve the intellectualization, the scientization and the accuracy of the human resource management. 
The device is extensible and suited to analyzing association relationships among a continuously growing number of objects; it can, to a certain extent, meet the requirements of employee relationship management in various services; and it can provide a friendly application and display interface that is easy and convenient for users to operate.
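The two judgments summarized above can be sketched as follows. This is an illustration under stated assumptions rather than the embodiment's implementation: community labels are assumed to have been computed beforehand by some community classification algorithm, and the names `analyze_relationship`, `community_of`, and `relation_types`, together with all sample data, are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two keyword-count dictionaries."""
    words = set(a) | set(b)
    dot = sum(a.get(w, 0) * b.get(w, 0) for w in words)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def analyze_relationship(target, other, community_of, keyword_counts, relation_types):
    """Return the best-matching relationship type, or None if no match is attempted."""
    # First judgment: continue only if both objects are in the same community.
    target_community = community_of.get(target)
    if target_community is None or target_community != community_of.get(other):
        return None

    # Second judgment: score every predefined relationship type and keep the maximum.
    scores = {
        name: cosine(keyword_counts, profile)
        for name, profile in relation_types.items()
    }
    if not scores:
        return None
    return max(scores, key=scores.get)


# Hypothetical inputs: precomputed community labels and keyword counts.
community_of = {"alice": 0, "bob": 0, "carol": 1}
keyword_counts = {"budget": 3, "report": 2, "approve": 1}
relation_types = {
    "supervisor-subordinate": {"approve": 2, "report": 2, "assign": 1},
    "project collaboration": {"design": 2, "review": 2, "merge": 1},
}

same_community = analyze_relationship(
    "alice", "bob", community_of, keyword_counts, relation_types
)
different_community = analyze_relationship(
    "alice", "carol", community_of, keyword_counts, relation_types
)
```

Returning `None` when the objects are in different communities mirrors the text: keyword matching is attempted only after the community check succeeds.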
An embodiment of the present application further provides a computer device. Fig. 4 is a schematic diagram of the computer device in the embodiment of the present invention. The computer device is capable of implementing all steps of the method for analyzing the association relationship of objects in the above embodiment, and specifically includes:
a processor (processor)401, a memory (memory)402, a communication interface (communications interface)403, and a bus 404;
the processor 401, the memory 402, and the communication interface 403 communicate with one another through the bus 404; the communication interface 403 is used for transmitting information between related devices such as server-side devices, detection devices, and user-side devices;
the processor 401 is configured to call the computer program in the memory 402, and when the processor executes the computer program, the processor implements all the steps in the association analysis method of the object in the above embodiments.
An embodiment of the present application further provides a computer-readable storage medium capable of implementing all the steps of the method for analyzing the association relationship of objects in the above embodiment; the computer-readable storage medium stores a computer program which, when executed by a processor, implements all the steps of that method.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The above-mentioned embodiments are intended to illustrate the objects, technical solutions and advantages of the present invention in further detail, and it should be understood that the above-mentioned embodiments are only exemplary embodiments of the present invention, and are not intended to limit the scope of the present invention, and any modifications, equivalent substitutions, improvements and the like made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (14)

1. A method for analyzing the association relationship of objects, characterized by comprising:
acquiring text data of a target object and an object to be analyzed, wherein the text data is text data exchanged between the target object and the object to be analyzed;
extracting a plurality of keywords from the text data;
performing community classification analysis on the target object and the object to be analyzed;
and, if the community classification analysis result is that the target object and the object to be analyzed are classified in the same community, matching the extracted keywords with predefined object relationship types to determine the association relationship between the target object and the object to be analyzed.
2. The method for analyzing the association relationship of objects according to claim 1, further comprising, before extracting the plurality of keywords from the text data:
preprocessing the text data to obtain text data that conforms to a preset specification;
performing word segmentation on the text data that conforms to the preset specification to obtain a plurality of decomposed words;
wherein extracting the plurality of keywords from the text data comprises:
extracting the plurality of keywords from the plurality of decomposed words.
3. The method for analyzing the association relationship of objects according to claim 2, wherein extracting the plurality of keywords from the plurality of decomposed words comprises:
calculating the TF-IDF value of each decomposed word by using a TF-IDF algorithm;
sorting the decomposed words in descending order of TF-IDF value;
and determining a preset number of the sorted decomposed words as the keywords.
4. The method for analyzing association relationship between objects according to claim 1, wherein performing community classification analysis on the target object and the object to be analyzed comprises:
and carrying out community classification analysis on the target object and the object to be analyzed by adopting a community classification algorithm.
5. The method for analyzing the association relationship of objects according to claim 1, wherein matching the extracted keywords with predefined object relationship types to determine the association relationship between the target object and the object to be analyzed comprises:
for each predefined object relationship type, analyzing the similarity between the plurality of keywords and the object relationship type;
and determining the object relationship type with the maximum similarity as the association relationship between the target object and the object to be analyzed.
6. The method for analyzing the association relationship of objects according to claim 5, wherein the similarity is a cosine similarity;
analyzing the similarity between the plurality of keywords and the object relationship type comprises:
calculating the similarity between the plurality of keywords and the object relationship type by using the following formula:
$$\cos(\theta) = \frac{\sum_{i=1}^{n} x_i y_i}{\sqrt{\sum_{i=1}^{n} x_i^{2}}\,\sqrt{\sum_{i=1}^{n} y_i^{2}}}$$
wherein cos(θ) is the similarity;
n is the total number of the keywords in the text data and the keywords in the object relationship type;
x_i is the number of occurrences of the i-th keyword;
y_i is the number of occurrences of the i-th keyword in the object relationship type.
7. An apparatus for analyzing the association relationship of objects, comprising:
a data acquisition module, used for acquiring text data of a target object and an object to be analyzed, wherein the text data is text data exchanged between the target object and the object to be analyzed;
a keyword extraction module, used for extracting a plurality of keywords from the text data;
a community classification analysis module, used for performing community classification analysis on the target object and the object to be analyzed;
and a relationship analysis module, used for matching the extracted keywords with predefined object relationship types and determining the association relationship between the target object and the object to be analyzed if the community classification analysis result indicates that the target object and the object to be analyzed are classified in the same community.
8. The apparatus for analyzing the association relationship of objects according to claim 7, further comprising a preprocessing module for:
preprocessing the text data to obtain text data that conforms to a preset specification;
performing word segmentation on the text data that conforms to the preset specification to obtain a plurality of decomposed words;
wherein the keyword extraction module is specifically configured to:
extract the plurality of keywords from the plurality of decomposed words.
9. The apparatus for analyzing the association relationship of objects according to claim 7, wherein the keyword extraction module is specifically configured to:
calculate the TF-IDF value of each decomposed word by using a TF-IDF algorithm;
sort the decomposed words in descending order of TF-IDF value;
and determine a preset number of the sorted decomposed words as the keywords.
10. The apparatus for analyzing association relationship between objects according to claim 7, wherein the community classification analysis module is specifically configured to:
and carrying out community classification analysis on the target object and the object to be analyzed by adopting a community classification algorithm.
11. The apparatus for analyzing the association relationship of objects according to claim 7, wherein the relationship analysis module is specifically configured to:
for each predefined object relationship type, analyze the similarity between the plurality of keywords and the object relationship type;
and determine the object relationship type with the maximum similarity as the association relationship between the target object and the object to be analyzed.
12. The apparatus for analyzing the association relationship of objects according to claim 11, wherein the similarity is a cosine similarity;
the relationship analysis module is specifically configured to:
calculate the similarity between the plurality of keywords and the object relationship type by using the following formula:
$$\cos(\theta) = \frac{\sum_{i=1}^{n} x_i y_i}{\sqrt{\sum_{i=1}^{n} x_i^{2}}\,\sqrt{\sum_{i=1}^{n} y_i^{2}}}$$
wherein cos(θ) is the similarity;
x_i is the number of occurrences of the i-th keyword;
y_i is the number of occurrences of the i-th keyword in the object relationship type.
13. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the method of any of claims 1 to 6 when executing the computer program.
14. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program for executing the method of any one of claims 1 to 6.
CN202010169167.6A 2020-03-12 2020-03-12 Method and device for analyzing incidence relation of objects Pending CN111400448A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010169167.6A CN111400448A (en) 2020-03-12 2020-03-12 Method and device for analyzing incidence relation of objects

Publications (1)

Publication Number Publication Date
CN111400448A true CN111400448A (en) 2020-07-10

Family

ID=71434201

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010169167.6A Pending CN111400448A (en) 2020-03-12 2020-03-12 Method and device for analyzing incidence relation of objects

Country Status (1)

Country Link
CN (1) CN111400448A (en)

Citations (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140188830A1 (en) * 2012-12-27 2014-07-03 Sas Institute Inc. Social Community Identification for Automatic Document Classification
CN106649422A (en) * 2016-06-12 2017-05-10 中国移动通信集团湖北有限公司 Keyword extraction method and apparatus
CN107436875A (en) * 2016-05-25 2017-12-05 华为技术有限公司 File classification method and device
CN107729466A (en) * 2017-10-12 2018-02-23 杭州中奥科技有限公司 Construction method, device and the electronic equipment of relational network
CN108280458A (en) * 2017-01-05 2018-07-13 腾讯科技(深圳)有限公司 Group relation kind identification method and device
CN109543078A (en) * 2018-10-18 2019-03-29 深圳云天励飞技术有限公司 Social relationships determine method, apparatus, equipment and computer readable storage medium
CN110188191A (en) * 2019-04-08 2019-08-30 北京邮电大学 A kind of entity relationship map construction method and system for Web Community's text
CN110390039A (en) * 2019-07-25 2019-10-29 广州汇智通信技术有限公司 Social networks analysis method, device and the equipment of knowledge based map
CN110457603A (en) * 2019-08-16 2019-11-15 中国电子信息产业集团有限公司第六研究所 Customer relationship abstracting method, device, electronic equipment and readable storage medium storing program for executing
CN110532451A (en) * 2019-06-26 2019-12-03 平安科技(深圳)有限公司 Search method and device for policy text, storage medium, electronic device
CN110555172A (en) * 2019-08-30 2019-12-10 京东数字科技控股有限公司 user relationship mining method and device, electronic equipment and storage medium
CN110610434A (en) * 2019-09-04 2019-12-24 成都威嘉软件有限公司 Community discovery method based on artificial intelligence
CN110647590A (en) * 2019-09-23 2020-01-03 税友软件集团股份有限公司 Target community data identification method and related device
CN110705301A (en) * 2019-09-30 2020-01-17 京东城市(北京)数字科技有限公司 Entity relationship extraction method and device, storage medium and electronic equipment

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
刘锦文等: "基于信息关联拓扑的互联网社交关系挖掘", 《计算机应用》 *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112883733A (en) * 2020-12-09 2021-06-01 成都中科大旗软件股份有限公司 Analysis method for quickly constructing event relation based on text entity extraction
CN112560480A (en) * 2020-12-24 2021-03-26 北京百度网讯科技有限公司 Task community discovery method, device, equipment and storage medium
CN112560480B (en) * 2020-12-24 2023-08-15 北京百度网讯科技有限公司 Task community discovery method, device, equipment and storage medium
CN114416990A (en) * 2022-01-17 2022-04-29 北京百度网讯科技有限公司 Object relationship network construction method and device and electronic equipment
CN114416990B (en) * 2022-01-17 2024-05-21 北京百度网讯科技有限公司 Method and device for constructing object relation network and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right
Effective date of registration: 20220920
Address after: 25 Financial Street, Xicheng District, Beijing 100033
Applicant after: CHINA CONSTRUCTION BANK Corp.
Address before: 25 Financial Street, Xicheng District, Beijing 100033
Applicant before: CHINA CONSTRUCTION BANK Corp.
Applicant before: Jianxin Financial Science and Technology Co.,Ltd.