CN111190965A - Text data-based ad hoc relationship analysis system and method - Google Patents

Info

Publication number
CN111190965A
Authority
CN
China
Prior art keywords
data
text
information
entity
analysis
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201811360803.2A
Other languages
Chinese (zh)
Other versions
CN111190965B (en)
Inventor
尚林林
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chen Rui Corp
Original Assignee
Chen Rui Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chen Rui Corp
Priority to CN201811360803.2A
Publication of CN111190965A
Application granted
Publication of CN111190965B
Legal status: Active
Anticipated expiration

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a system and method for ad hoc relationship analysis based on text data. A text data access module defines parsing rules for many different types of business text data, so that a wide range of text data can be used flexibly. A relationship model configuration module sets a data relationship model for the accessed text data, so that users can configure the model according to their own business requirements. An ad hoc relationship mapping module establishes the mapping between the accessed text data and the data relationship model, and an ad hoc task management module controls the data relationship model as it performs relationship analysis on the accessed text data. The analysis results are presented visually as a relationship graph, and secondary relationship analysis can be performed on the presented relationships to uncover deeper hidden relationships among the data and to provide more valuable information for business analysis.

Description

Text data-based ad hoc relationship analysis system and method
Technical Field
The invention relates to a relationship analysis system covering rule-based processing of text data, definition of data relationship models, and ad hoc visual display of the processed data, and in particular to an ad hoc relationship analysis system and method that performs ad hoc visual display and secondary relationship analysis on text data.
Background
With the continuing progress of science and technology and the rapid development of internet and computer technology, industries and business departments have accumulated large amounts of text data of many types. This text data is stored in scattered locations and is poorly correlated. How to integrate the text data effectively, process it according to rules, and display the resulting data relationships visually, so as to discover the implicit relationships among the data and the deeper value within it, is a problem that urgently needs to be solved.
To address this problem, various text data analysis systems have appeared on the market, such as text data management systems (mainly providing classified upload management of text data), text data storage systems (mainly providing classified storage and retrieval of text data) and full-text search systems (mainly providing full-text indexing and keyword search). These systems all rest on the same principles: classified storage, conditional query and full-text search of text data. Given the data they return, or one or more text files, it is difficult to find the associations among the file contents, and especially the associations across multiple files. Such systems therefore do not meet actual business needs well: they cannot capture the associations among text data, and they are particularly poor at finding deep hidden associations across multiple sets of text data.
In view of the above problems, the inventor studied related techniques for existing text data, such as simple configuration of processing rules and flexible definition of data relationship models, and developed a text data-based ad hoc relationship analysis system and method. The system can easily access a variety of text data sources, allows the data relationship model to be set flexibly, and supports visual relationship display and secondary relationship analysis of the processed data, so that deep associations in the data can be found.
Disclosure of Invention
To overcome the above problems, the inventor carried out intensive research and designed a text data-based ad hoc relationship analysis system and method. A text data access module defines parsing rules for many different types of business text data, so that a variety of text data sources can be used and called flexibly. A relationship model configuration module establishes the data relationship model, so that users can set the model according to their own business requirements. The accessed text data is mapped ad hoc onto the configured data relationship model, and the data relationships are presented visually. Secondary relationship analysis can then be performed on the presented relationships to uncover deep hidden relationships among the data and to provide more valuable information for business analysis. The invention was completed on this basis.
The invention provides the following technical solutions:
(1) An ad hoc relationship analysis system based on text data, comprising an application system 1, a presentation system 2 and a data system 3;
wherein the application system 1 includes:
the text data access module 11, which serves as the data source module, links to the text data of the service text library 31, and transmits the text data to the ad hoc relationship mapping module 12 and the relationship model configuration module 13;
the relationship model configuration module 13, which receives the text data transmitted by the text data access module 11, sets a data relationship model based on the text data and the business requirements, and transmits the configured data relationship model to the ad hoc relationship mapping module 12;
the ad hoc relationship mapping module 12, which receives the text data transmitted by the text data access module 11 and the data relationship model information transmitted by the relationship model configuration module 13, maps the text data onto the data relationship model, and transmits the mapping relationship to the ad hoc task management module 14;
the ad hoc task management module 14, which controls the data relationship model as it performs relationship analysis on the accessed text data, and transmits the analysis result data generated while the data relationship model runs to the data relationship visualization display module 21;
and the presentation system 2 includes:
the data relationship visualization display module 21, which receives the analysis result data transmitted by the ad hoc task management module 14 and displays it visually as a relationship graph.
Preferably, the presentation system 2 further comprises a secondary data relationship analysis module 22, which performs secondary analysis on the data displayed by the data relationship visualization display module 21 to uncover deep hidden relationships among the data; preferred secondary analysis modes include object clustering analysis, relationship path analysis and fuzzy retrieval of graph content.
In a preferred embodiment, the text data access module 11 includes:
the data source configuration submodule, which establishes a database link for accessing the service text library 31 and configures the information of the text data to be accessed;
the data access submodule, which accesses the text data, where the access modes include importing; the data access submodule can access either the full text data or part of the text data;
the data parsing submodule, which formats the accessed text data according to a set parsing rule;
the data preview submodule, which presents the parsed text data so that business personnel can judge whether the parsed data format meets the requirements;
and the data output submodule, which transmits the parsed data to the corresponding data receiving module.
In a preferred embodiment, the relationship model configuration module 13 includes:
the data relationship model name submodule, which stores the names of data relationship models;
the entity configuration submodule, which configures the two entities to be associated and stores the entity information;
the link configuration submodule, which stores the data relationship information of the data relationship model;
and the data relationship model category submodule, which sets the category of the data relationship model.
In a preferred embodiment, the data system 3 comprises:
a service text library 31, which comprises a call ticket record text library, a fund transaction text library and other conventional text data;
a system configuration library 32, which stores data generated inside the system, including text access rule information, ad hoc relationship mapping information and ad hoc task management information;
and a relational model library 33, which stores the basic information of the relationship model, the entity attribute information and the link attribute information saved by the relationship model configuration module 13 when a relationship model is defined.
(2) A method for ad hoc relationship analysis based on text data, the method comprising the following steps:
step 1), accessing text data: configuring the data source information of the service text library 31 to be accessed, and accessing the text data of the service text library 31;
step 2), defining a data relationship model: defining a data relationship model according to the text data accessed in step 1), wherein the data relationship model comprises two entities to be associated;
step 3), ad hoc relationship mapping: mapping the entity attributes in the data relationship model to the corresponding field names of the accessed text data to obtain a mapping relationship;
step 4), running the data relationship model: executing and monitoring the association tasks of the data relationship model mapped in step 3), and generating analysis result data while the data relationship model runs;
step 5), displaying the result information: visually displaying the analysis result data generated in step 4); preferred display modes include a network layout, a circular analysis, a sector layout and an arc layout.
Preferably, the method further comprises performing secondary operations on the visualized graph data after the data relationship visualization is complete, wherein the secondary operations include object clustering analysis, relationship path analysis and fuzzy retrieval of graph content.
The text data-based ad hoc relationship analysis system and method of the invention have the following beneficial effects:
(1) by setting text access rules, the type of text to be accessed and the rules for processing its content can be defined flexibly, so that a wider range of text data can be accessed and processed, operation is more convenient, and the practicability of the system is improved;
(2) entities are configured based on the accessed text data and serve as the medium through which the data relationship analysis model is configured flexibly, so that multiple data mapping modes are available during text data processing, which improves the flexibility of the system and leaves more room for the user's own judgment;
(3) by providing the ad hoc task management module, the execution mode of a text processing task can be defined in multiple ways, and the execution state and execution progress of the task can be monitored in real time;
(4) by providing the data relationship visualization display module, the relationships among the data can be displayed visually;
(5) the system and method provide a secondary data relationship analysis function, which can further uncover deep hidden relationships among the data and makes it easier for users to carry out comprehensive data analysis and data mining.
Drawings
FIG. 1 illustrates a schematic structural diagram of a text data-based ad hoc relationship analysis system according to a preferred embodiment of the present invention;
FIG. 2 illustrates a schematic diagram of the data flow between the modules of the application system and the presentation system according to a preferred embodiment of the present invention;
FIG. 3 illustrates a schematic diagram of the data tables of the system configuration library according to a preferred embodiment of the present invention;
FIG. 4 illustrates a schematic diagram of the data tables of the relational model library according to a preferred embodiment of the present invention;
FIG. 5 illustrates a flow diagram of the functional structure of the ad hoc task management module according to a preferred embodiment of the present invention;
FIG. 6 illustrates a flow diagram of the visualization relationship analysis according to a preferred embodiment of the present invention;
FIG. 7 illustrates a flow diagram of the business operations of a text data-based ad hoc relationship analysis system according to a preferred embodiment of the present invention;
FIG. 8 illustrates a visualization display result in network form according to embodiment 1 of the present invention.
The reference numbers illustrate:
1-application system;
11-a text data access module;
12-an ad hoc relationship mapping module;
13-a relational model configuration module;
14-an ad hoc task management module;
2-a presentation system;
21-a data relationship visualization display module;
22-a secondary data relationship analysis module;
3-a data system;
31-service text library;
32-a system configuration library;
33-library of relational models.
Detailed Description
The invention is explained in more detail below with reference to the figures and examples. The features and advantages of the present invention will become more apparent from the description.
The word "exemplary" is used herein exclusively to mean "serving as an example, embodiment, or illustration." Any embodiment described herein as "exemplary" is not necessarily to be construed as preferred or advantageous over other embodiments.
To exploit large-scale data effectively from the perspectives of data storage, management and analysis, to integrate the data, to discover the implicit relationships among the data and the deeper value within it, and to present the data clearly and quantitatively, the invention provides an ad hoc relationship analysis system based on text data. As shown in FIG. 1 and FIG. 2, the system comprises an application system 1, a presentation system 2 and a data system 3;
wherein the application system 1 includes:
the text data access module 11, which serves as the data source module, links to the text data of the service text library 31, and transmits the text data to the ad hoc relationship mapping module 12 and the relationship model configuration module 13;
the relationship model configuration module 13, which receives the text data transmitted by the text data access module 11, sets a data relationship model based on the text data and the business requirements, and transmits the configured data relationship model to the ad hoc relationship mapping module 12;
the ad hoc relationship mapping module 12, which receives the text data transmitted by the text data access module 11 and the data relationship model information transmitted by the relationship model configuration module 13, maps the text data onto the data relationship model, and transmits the mapping relationship to the ad hoc task management module 14;
the ad hoc task management module 14, which controls the data relationship model as it performs relationship analysis on the accessed text data, and transmits the analysis result data generated while the data relationship model runs to the data relationship visualization display module 21;
and the presentation system 2 includes:
the data relationship visualization display module 21, which receives the analysis result data transmitted by the ad hoc task management module 14 and displays it visually as a relationship graph.
In the present invention, the text data access module 11 receives text data from the service text library 31. The text data includes structured data and semi-structured data, and the storage file formats include txt documents, Excel documents, XML documents, CSV documents and so on. Structured data is data that can be represented with a uniform structure and stored in a two-dimensional table. Semi-structured data lies between structured data (e.g., relational databases) and unstructured data (e.g., sound and image files); its structure and content are mixed together without an obvious boundary.
The text data access module 11 performs two text data access processes. The text data accessed in the first process is used by the ad hoc relationship mapping module 12 to map the text data onto the data relationship model, where mapping means linking the field names of the text data to the entity attributes in the data relationship model. For this purpose the first access process preferably reads only part of the text, including the field names, for example the first ten rows of an Excel document together with its field names.
The second text data access process supplies the data relationship model with data for relationship analysis. In this process all of the data in the text is accessed.
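As a non-limiting illustration, the two access processes can be sketched as one reader that is called with and without a row limit. The sketch assumes comma-separated input; the function name `read_rows`, the sample data and the field names are hypothetical and do not appear in the original disclosure.

```python
import csv
import io
from itertools import islice

def read_rows(stream, limit=None):
    """Return parsed rows; stop after `limit` rows when a limit is given."""
    return list(islice(csv.reader(stream), limit))

# Hypothetical call record text: the first row holds the field names.
SAMPLE = "caller,callee,duration\n1001,1002,35\n1001,1003,12\n1002,1003,60\n"

# First access process: field names plus a few rows, enough for mapping.
preview = read_rows(io.StringIO(SAMPLE), limit=2)
field_names = preview[0]

# Second access process: every row, used for the relationship analysis.
all_rows = read_rows(io.StringIO(SAMPLE))
```

Reading only a bounded prefix in the first pass keeps the mapping step quick even when the underlying document is large.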
In a preferred embodiment, the text data access module 11 includes:
the data source configuration submodule, which establishes a database link for accessing the service text library 31 and configures the information of the text data to be accessed;
the service text library can include several databases, each containing several text documents. The data source configuration submodule first establishes and maintains the access information of the database from which the text data is to be extracted (such as the database description and the database storage path), and then establishes and maintains the access information of the text documents in that database (text document name, text document annotation, text document storage path and so on). At this point the data source configuration submodule is connected directly to the text documents whose data is to be extracted.
As shown in FIG. 3, the data source configuration submodule manages access to text data sources by adding, modifying and deleting text data access rules. The primary key of the "text access rule table" is the "rule identifier": a unique data source identifier is generated each time a text data source is established, and it is generated automatically according to a set rule, namely the new data source batch plus a system timestamp. The "rule name" is a user-defined description of the accessed data source. The "text storage path" is the path from which the text data is accessed. The "text type identifier" corresponds to the primary key "type identifier" of the "text type table" and is represented by a character string. The "division rule identifier" corresponds to the primary key "rule identifier" of the "division rule table" and is represented by a character string.
The "text type table" is compiled in advance from the formats of common text files; the "type name" takes values such as txt document, Excel document, XML document and CSV document, and the "type description" describes each of these storage file formats.
The "division rule table" is compiled in advance from common data storage formats; the "rule name" includes ",", ";", "|" and so on. The "rule definition" applies only to txt files (whose fields are commonly separated by ",", ";" or "|"); no selection is needed for Excel, XML and CSV documents, which have standard formats.
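For illustration only, one row of the "text access rule table" and the rule identifier described above might be modelled as follows. The class and function names, the key names in `DIVISION_RULES`, the millisecond resolution and the hyphen separator are assumptions made for this sketch; the disclosure states only that the identifier combines the data source batch with a system timestamp.

```python
import time
from dataclasses import dataclass
from typing import Optional

# Hypothetical contents of the "division rule table": identifier -> delimiter.
DIVISION_RULES = {"comma": ",", "semicolon": ";", "pipe": "|"}

@dataclass
class TextAccessRule:
    rule_id: str                     # primary key
    rule_name: str                   # user-defined description of the source
    storage_path: str                # where the text data is read from
    text_type_id: str                # key into the "text type table"
    division_rule_id: Optional[str]  # only needed for txt documents

def new_rule_id(batch: str, now: Optional[float] = None) -> str:
    """Combine the batch label with a timestamp (milliseconds assumed here)."""
    millis = int((time.time() if now is None else now) * 1000)
    return f"{batch}-{millis}"

rule = TextAccessRule(
    rule_id=new_rule_id("calls", now=1.5),
    rule_name="Call records, first batch",
    storage_path="/data/calls.txt",  # hypothetical path
    text_type_id="txt",
    division_rule_id="pipe",
)
delimiter = DIVISION_RULES[rule.division_rule_id]
```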
The data access submodule accesses the text data, where the access modes include importing; the data access submodule can access either the full text data or part of the text data;
as shown in FIG. 3, the data access submodule manages the information in the "text access rule table" by adding, deleting and modifying the access text information.
The data parsing submodule formats the accessed text data according to a set parsing rule. The parsing rule determines how the original text data is read for each storage format:
(a) for data stored in a txt document, the text content is processed according to the division rule: the text is read line by line, and each line is then split, checked and extracted according to the rule;
(b) for data stored in an Excel document, the data is obtained cell by cell using the POI component;
(c) for data stored in an XML document, the data is obtained according to the document structure markup, using parsing libraries such as Jdom or dom4j;
(d) for data stored in a CSV document, the text is processed according to the default division rule: the text is read line by line, and each line is then split, checked and extracted according to the rule.
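The parsing rules (a), (c) and (d) can be sketched with the Python standard library as shown below. This is an illustrative stand-in rather than the disclosed implementation, which names Java components (POI, Jdom, dom4j); the Excel case is omitted because it would require a third-party library. All function names, the `PARSERS` table and the sample inputs are hypothetical.

```python
import csv
import io
import xml.etree.ElementTree as ET

def parse_txt(text, delimiter):
    """Read line by line and split each non-empty line on the configured delimiter."""
    return [line.split(delimiter) for line in text.splitlines() if line.strip()]

def parse_csv(text):
    """Apply the default comma rule, honouring quoted fields."""
    return [row for row in csv.reader(io.StringIO(text)) if row]

def parse_xml(text, record_tag):
    """Turn the children of every <record_tag> element into one record."""
    root = ET.fromstring(text)
    return [{child.tag: child.text for child in rec} for rec in root.iter(record_tag)]

# Dispatch table keyed by the text type identifier.
PARSERS = {"txt": parse_txt, "csv": parse_csv, "xml": parse_xml}

txt_rows = PARSERS["txt"]("1001|1002|35\n1001|1003|12\n", "|")
csv_rows = PARSERS["csv"]('caller,callee\n"1001","1002"\n')
xml_rows = PARSERS["xml"](
    "<calls><call><caller>1001</caller><callee>1002</callee></call></calls>",
    "call",
)
```

Keeping one parser per text type behind a lookup table mirrors the way the text type identifier selects a parsing rule in the access rule table.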
The data preview submodule presents the parsed text data so that business personnel can judge whether the parsed data format meets the requirements; if it does not, the parsing rule is changed and the data is formatted again, and if it does, the data output submodule is allowed to output the data;
and the data output submodule transmits the parsed text data to the corresponding data receiving module (in the model establishing stage, the parsed text data is transmitted to the ad hoc relationship mapping module 12 and the relationship model configuration module 13; in the text data association stage, it is transmitted to the ad hoc relationship mapping module 12).
In the invention, the relationship model configuration module 13 comprises four submodules, namely a data relationship model name submodule, an entity configuration submodule, a link configuration submodule and a data relationship model category submodule;
the data relationship model name submodule stores the names of data relationship models;
the entity configuration submodule configures the two entities to be associated and stores the entity information. At least two entities can be generated from one text. The attribute information of the two entities that form the data relationship model originates from data fields in the same type of text; for example, the data of both entities may originate from a call record file.
The original data is packaged separately into the two entities (such as entity A and entity B), and entities A and B are connected through a link, so that the visual presentation of the analysis result shows the relationships among the data more intuitively.
The link configuration submodule stores the data relationship information of the data relationship model;
and the data relationship model category submodule sets the category of the data relationship model.
In the invention, the entity information comprises the name, category and attributes of entity A and the name, category and attributes of entity B. Several attributes can be added to an entity (entity A or entity B) according to business needs; the entity attributes are subsequently mapped to field names in the corresponding text data for relationship analysis among different entities.
In the present invention, the data relationship information includes the name of the link between the two entities and the attributes of that link.
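As a hedged sketch of the information described above, a data relationship model can be viewed as a record holding two entity configurations and one link configuration. The class names, field names and the example call record model are illustrative assumptions, not structures taken from the disclosure.

```python
from dataclasses import dataclass, field

@dataclass
class EntityConfig:
    name: str
    category: str
    attributes: list = field(default_factory=list)

@dataclass
class LinkConfig:
    name: str
    attributes: list = field(default_factory=list)

@dataclass
class RelationModel:
    model_name: str
    model_category: str
    entity_a: EntityConfig
    entity_b: EntityConfig
    link: LinkConfig

# Hypothetical model built from a call record file: both entities
# draw their attributes from fields of the same text type.
call_model = RelationModel(
    model_name="call record relationship model",
    model_category="communication",
    entity_a=EntityConfig("caller", "person", ["phone_number"]),
    entity_b=EntityConfig("callee", "person", ["phone_number"]),
    link=LinkConfig("call relationship", ["duration"]),
)
```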
As shown in FIG. 4, the relationship model configuration module 13 configures the relationship model by adding, modifying and querying the data of the "relational model table"; the relational model table is stored in the relational model library 33.
The "model name" in the "relational model table" is the name of the established model, such as a call record relationship model. The "model type identification" field corresponds to the "type identification" field in the "model type table"; it is a character string that identifies the type of the model, where model types include the person type, organization type, communication type, address type and so on. The "entity A name" and "entity B name" fields correspond to the "entity name" field in the "entity attribute table"; they are character strings holding the names of the configured entities. The "entity A category identification" and "entity B category identification" fields correspond to the primary key "category identification" of the "entity category table"; they are character strings identifying the categories of the configured entities. The "relationship name" corresponds to the "relationship name" field of the "relationship attribute table" and indicates, for example, a call relationship or a transaction relationship.
As shown in FIG. 4, in the "entity attribute table" the "attribute identification" is a unique English code defined for an attribute, and the "attribute name" is the Chinese description defined for the attribute. In the "relationship attribute table" the "attribute identification" is likewise a unique English code defined for an attribute, and the "attribute name" is its Chinese description. The entity attribute table and the relationship attribute table are generated by configuration according to the data relationship model to be established.
The primary key of the "entity category table" is "category identification"; the "category name" is the Chinese description defined for a category, and the "category identification" is the unique English code defined for the category. The primary key of the "model category table" is "category identification", and the "category description" defines the data relationship model. The entity category table and the model category table are compiled in advance from conventional business requirements.
In the invention, the system configures the relationship model through the relationship model configuration module 13, so that data relationship analysis can be performed flexibly and the analysis model can be defined according to user requirements, which increases the practicability and flexibility of the system. It also widens the range of application of the system, making it suitable for more fields of ad hoc text data analysis.
In the present invention, the ad hoc relationship mapping module 12 establishes the mapping between the text data and the data relationship model; as shown in FIG. 3, the mapping relationship is stored in the "model mapping rule" of the "text access rule table".
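One possible reading of the mapping relationship is a lookup from model attributes to field names of the text data, applied to each parsed row. The following sketch assumes that representation; the dotted attribute names such as `entity_a.phone_number` and the function `apply_mapping` are hypothetical.

```python
def apply_mapping(header, row, mapping):
    """Return {model attribute: value} for one parsed row.

    `mapping` links a model attribute to a field name in the text data header.
    """
    index = {name: pos for pos, name in enumerate(header)}
    missing = [f for f in mapping.values() if f not in index]
    if missing:
        raise KeyError(f"fields not present in text data: {missing}")
    return {attr: row[index[field_name]] for attr, field_name in mapping.items()}

header = ["caller", "callee", "duration"]
mapping = {
    "entity_a.phone_number": "caller",
    "entity_b.phone_number": "callee",
    "link.duration": "duration",
}
record = apply_mapping(header, ["1001", "1002", "35"], mapping)
```

Checking for missing fields before any row is processed surfaces a mismatch between the model and the text data at mapping time rather than during analysis.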
In the present invention, as shown in FIG. 3 and FIG. 5, the ad hoc task management module 14 monitors text processing tasks by adding, modifying and deleting entries in the ad hoc task management table. The primary key of the ad hoc task management table is the "task name". The "execution mode" is either timed execution or immediate execution; the "task state" indicates whether the task is started or stopped; the "execution state" indicates whether the task is currently executing; and the "execution progress" indicates how much of the current data processing has been completed. The module allows the execution mode of a processing task to be defined flexibly and allows its execution state and execution progress to be monitored in real time.
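The fields of the ad hoc task management table can be illustrated with a small record whose state and progress are updated while rows are processed. This is a simplified sketch: the string values for each field, the class name `AdHocTask` and the function `run_task` are assumptions, and scheduling of timed execution is not shown.

```python
from dataclasses import dataclass

@dataclass
class AdHocTask:
    task_name: str                 # primary key of the task management table
    execution_mode: str            # "timed" or "immediate"
    task_state: str = "stopped"    # "started" or "stopped"
    execution_state: str = "idle"  # "running" or "idle"
    progress: float = 0.0          # fraction of rows processed, 0.0 to 1.0

def run_task(task, rows, process):
    """Process every row while keeping the task's state and progress current."""
    task.task_state = "started"
    task.execution_state = "running"
    results = []
    for done, row in enumerate(rows, start=1):
        results.append(process(row))
        task.progress = done / len(rows)  # observable by a monitoring view
    task.execution_state = "idle"
    task.progress = 1.0
    return results

task = AdHocTask("call-analysis", "immediate")
links = run_task(
    task,
    [("1001", "1002"), ("1001", "1003")],
    lambda r: {"a": r[0], "b": r[1]},
)
```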
In the present invention, as shown in FIG. 6, the data relationship visualization display module 21 visually displays the relationships in the ad hoc task analysis result data in different display modes, such as a network layout, a circular analysis, a sector layout and an arc layout.
Each record parsed from a data source yields an entity A, an entity B and a link, and the link carries the identification information of entities A and B (derived from the entity attribute information in the entity information). When the analysis results of a new data source are merged into the display and an entity A or B identified in a new link already exists in the loaded data, the new link is automatically attached to the existing entity. In this way the data parsed from different data sources merges identical entities by their identifiers, and the links connect the entities into a relationship network.
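The merging behaviour described above can be sketched as a graph whose entities are keyed by identifier, so that a link from a newly loaded data source attaches to an entity that is already present. The class `RelationGraph`, its method names and the sample identifiers are hypothetical.

```python
class RelationGraph:
    """Entities keyed by identifier; links stored as (a_id, b_id, attributes)."""

    def __init__(self):
        self.entities = {}
        self.links = []

    def add_link(self, a_id, b_id, attrs=None, source=None):
        # setdefault reuses an entity that an earlier data source already loaded.
        for entity_id in (a_id, b_id):
            entity = self.entities.setdefault(entity_id, {"sources": set()})
            entity["sources"].add(source)
        self.links.append((a_id, b_id, attrs or {}))

graph = RelationGraph()
graph.add_link("1001", "1002", {"duration": 35}, source="calls")
# A second data source mentions entity 1002 again; no duplicate entity is created.
graph.add_link("1002", "1003", {"amount": 500}, source="transfers")
```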
In a preferred embodiment of the present invention, in order to further discover deep, hidden relationships in the data, the analysis result is subjected to a secondary analysis performed by the secondary data relationship analysis module 22. The secondary analysis modes include object clustering analysis, relationship path analysis, fuzzy retrieval of chart content and the like. These operations allow the displayed data to be analyzed from multiple angles, so that the user can reason inductively, uncover potential data relationships, find valuable supporting data and make sound decisions.
Object clustering analysis removes every entity that has only one link in the relationship network graph and keeps only entities with two or more links for re-layout, so that the association relationships among multiply linked entities can be analyzed;
relationship path analysis finds all link paths between two selected entities in the relationship network graph and highlights the entities on those paths, so that every entity lying on a path between the two entities of interest can be examined;
fuzzy retrieval of chart content performs a fuzzy search over the attribute information of the entities in the analysis interface: the user enters a search keyword, and the entity information matched by the keyword is analyzed and highlighted.
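The three secondary analysis modes can be illustrated with the following Python sketch. Several points are assumptions rather than statements of the specification: the pruning is done in a single pass, a "link path" is taken to be a simple path (no entity visited twice), and the fuzzy search is a case-insensitive substring match. All function names are hypothetical.

```python
from collections import defaultdict


def object_clustering(links):
    """Drop entities that have only one link; keep links between the remaining entities."""
    degree = defaultdict(int)
    for a, b in links:
        degree[a] += 1
        degree[b] += 1
    kept = {entity for entity, count in degree.items() if count >= 2}
    kept_links = [(a, b) for a, b in links if a in kept and b in kept]
    return kept, kept_links


def relationship_paths(links, start, goal):
    """All simple link paths between two selected entities (depth-first search)."""
    adjacency = defaultdict(set)
    for a, b in links:
        adjacency[a].add(b)
        adjacency[b].add(a)
    paths = []

    def walk(node, path):
        if node == goal:
            paths.append(path)
            return
        for neighbor in sorted(adjacency[node]):
            if neighbor not in path:
                walk(neighbor, path + [neighbor])

    walk(start, [start])
    return paths


def fuzzy_search(entity_attributes, keyword):
    """Identities of entities whose attribute text contains the keyword."""
    needle = keyword.lower()
    return sorted(identity for identity, text in entity_attributes.items()
                  if needle in text.lower())


links = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")]
kept, kept_links = object_clustering(links)   # D has a single link and is removed
paths = relationship_paths(links, "A", "C")
hits = fuzzy_search({"A": "Zhang San", "B": "Wang Wu"}, "zhang")
```

In a graphical front end the returned entities would be highlighted or re-laid out; here they are simply returned as Python collections.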
In the invention, the data system 3 comprises three libraries, namely a service text library 31, a system configuration library 32 and a relational model library 33;
a service text library 31, which comprises a call ticket record text library, a fund transaction text library and other conventional text data;
the service text library 31 is an enterprise's text database. The analysis system accesses the text data to be analyzed through the text data access module 11, and the service text library 31 stores the text information data of each unit, such as call ticket record information and fund transaction information. The field to which the service text library belongs is not particularly limited: it may be a call ticket record text library, a fund transaction text library or other conventional text data, that is, any library that records text data. This gives the system a wide range of application and improves its practicability. As described above, the text data information includes structured data and semi-structured data, and the storage file formats include txt documents, Excel documents, XML documents, CSV documents and the like.
The system configuration library 32 is configured to store data information generated inside the system, wherein the data information includes text access rule information, ad hoc relationship mapping information, and ad hoc task management information.
The system configuration library 32 is the core library of the text data-based ad hoc relationship analysis system and stores the data information generated by system configuration. As shown in fig. 3, it comprises four data tables: a text type table, a segmentation rule table, a text access rule table and an ad hoc task management table. These four tables provide data support while the system is running.
The relational model library 33 stores the basic relational model information, the entity attribute information and the link attribute information that are saved by the relational model configuration module 13 when a relational model is defined.
As shown in fig. 4: the relational model library 33 includes an entity attribute table, an entity category table, a relational model table, a relational attribute table, and a model category table.
As shown in fig. 7, another aspect of the present invention is to provide a textual data-based ad hoc relationship analysis method, which is preferably implemented by the textual data-based ad hoc relationship analysis system described above, and includes the following steps:
(1) S1, text data access step: configuring the data source information of the service text library 31 to be accessed, accessing the text data of the service text library 31, and formatting the data; that is, the text data in the service text library 31 is retrieved and formatted.
(2) S2, data relationship model definition step: defining a data relationship model according to the text data information accessed in step S1, wherein the data relationship model comprises two entities to be associated; at least two entities can be generated from one text, and the attribute information of the two entities to be associated originates from text data fields of the same type.
(3) S3, ad hoc relationship mapping step: mapping the entity attributes in the data relationship model to the corresponding field names of the accessed text data to obtain a mapping relationship;
(4) S4, data relationship model operation step: executing and monitoring the tasks associated with the data relationship model set in step S3, and generating processing result data information while the data relationship model runs;
(5) S5, result information display step: visually displaying the processing result data information generated in step S4, for example as a network layout, a circular analysis, a fan layout or an arch layout.
Preferably, after the data relationships have been visualized, a secondary operation S6 is performed on the visualized chart data to further analyze the deep association relationships in the data. The secondary operation includes object clustering analysis, relationship path analysis (chart link path analysis), fuzzy retrieval of chart content and the like.
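Steps S1 to S5 can be read as a small pipeline. The following Python sketch is one possible reading under stated assumptions: the function names, the dictionary-based model and mapping, and the text rendering used in place of a visual relationship graph are all hypothetical, and the sample rows are the fund transaction records of Example 1 with their timestamps normalized.

```python
def s1_access(text, field_names, separator=","):
    """S1: access the text data and format each line into a record."""
    rows = []
    for line in text.strip().splitlines():
        values = [value.strip() for value in line.split(separator)]
        rows.append(dict(zip(field_names, values)))
    return rows


def s2_define_model(name, entity_a, entity_b, link_name):
    """S2: a data relationship model names the two entities to associate."""
    return {"name": name, "entity_a": entity_a, "entity_b": entity_b, "link": link_name}


def s3_map(field_for_a, field_for_b):
    """S3: map each entity's attribute to a field name of the accessed text."""
    return {"entity_a": field_for_a, "entity_b": field_for_b}


def s4_run(model, mapping, rows):
    """S4: run the model over the records, producing entities and links."""
    entities, links = set(), []
    for row in rows:
        a, b = row[mapping["entity_a"]], row[mapping["entity_b"]]
        entities.update([a, b])
        links.append((model["link"], a, b))
    return entities, links


def s5_display(links):
    """S5: text stand-in for the visual relationship graph, one line per link."""
    return [f"{a} --{name}--> {b}" for name, a, b in links]


TEXT = """\
52189909abcd4406,2018-04-08 10:01:02,transfer out,52189909abcd8606
52189909abcd4406,2018-07-09 14:05:08,transfer in,52189909abcd6789
"""
FIELDS = ["own account", "transaction time", "transaction type", "other party's account"]

rows = s1_access(TEXT, FIELDS)
model = s2_define_model("fund transaction relationship model",
                        "own account", "other party's account",
                        "transaction relationship")
mapping = s3_map("own account", "other party's account")
entities, links = s4_run(model, mapping, rows)
lines = s5_display(links)
```

The optional step S6 would then operate on `entities` and `links` rather than on the raw text.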
In a preferred embodiment, step S2 includes the following sub-steps:
inputting relationship model name information;
configuring entity information, including entity A information and entity B information;
inputting the entity A name;
inputting the entity A type;
inputting entity A attributes, of which several can be added according to business requirements;
inputting the entity B name;
inputting the entity B type;
inputting entity B attributes, of which several can be added according to business requirements;
configuring link information;
inputting the link name;
inputting link attributes, of which several can be added according to business requirements;
and inputting the relationship model type.
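The information collected in these sub-steps can be held in a simple record structure. The following Python sketch is illustrative only; the class and field names are assumptions, and the sample values are those of the call record relationship model of Example 1.

```python
from dataclasses import dataclass, field


@dataclass
class EntityConfig:
    name: str
    entity_type: str
    attributes: list = field(default_factory=list)  # several may be added


@dataclass
class LinkConfig:
    name: str
    attributes: list = field(default_factory=list)  # several may be added, or none


@dataclass
class RelationModel:
    name: str
    model_type: str
    entity_a: EntityConfig
    entity_b: EntityConfig
    link: LinkConfig

    def validate(self):
        """Reject a model whose entities lack a name or any attribute."""
        for entity in (self.entity_a, self.entity_b):
            if not entity.name or not entity.attributes:
                raise ValueError("each entity needs a name and at least one attribute")
        return True


call_model = RelationModel(
    name="call record relationship model",
    model_type="communication relationship",
    entity_a=EntityConfig("own number", "telephone", ["own number"]),
    entity_b=EntityConfig("other party's telephone", "telephone", ["other party's number"]),
    link=LinkConfig("call relationship", ["call time", "call type"]),
)
```

The `validate` check reflects the observation that an entity needs at least one attribute to be identified; the specification itself does not prescribe such a check.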
Examples
Example 1
1. The service text library comprises a call record text library, a fund transaction text library and a personnel information text library, wherein the call record text library contains a plurality of call record texts (file 1), the fund transaction text library contains a plurality of fund transaction texts (file 2), and the personnel information text library contains a plurality of personnel information texts (file 3);
2. the service text library is connected to the text data access module 11 by import;
3. the text data access module 11 accesses the call record file, the fund transaction file and the personnel information file respectively, and transmits the information of the three data files to the relation model configuration module 13;
4. the relational model is set by the relational model configuration module 13:
specifically, the method comprises the following steps:
(1) call record relationship model:
(a) definition of the call record relationship model: the call record information is the call record file;
(b) defining a relational model name: a call record relationship model;
(c) defining a relational model class: selecting a communication relation type;
(d) define entity a name: own number;
defining entity A attributes: own number;
defining entity A type: a telephone;
(e) define entity B name: the other party's telephone;
defining entity B attributes: the number of the other party;
define entity B type: a telephone;
(f) defining the link name: a conversation relationship;
defining the link attribute: time of call, type of call
(2) Fund transaction relationship model:
(a) definition of the fund transaction relationship model: the fund transaction information is a transaction record file;
(b) defining a relational model name: a fund transaction relationship model;
(c) defining a relational model class: selecting a transaction relationship type;
(d) define entity A name: own account;
defining entity A attributes: own account;
defining entity A type: an account number;
(e) define entity B name: the account of the other party;
defining entity B attributes: the account of the other party;
define entity B type: an account number;
(f) defining the link name: a transaction relationship;
defining the link attribute: transaction time, transaction type;
(3) personnel information conversation relationship model:
(a) definition of a personnel information conversation relation model: the personnel information is a personnel information file;
(b) defining a relational model name: a personnel information conversation relationship model;
(c) defining a relational model class: selecting a personnel conversation relationship type;
(d) define entity a name: personnel information;
defining entity A attributes: name, certificate number;
defining entity A type: personnel;
(e) define entity B name: a telephone number;
defining entity B attributes: contact information;
define entity B type: a telephone;
(f) defining the link name: a communication mode;
defining the link attribute: none;
(4) Personnel information transaction relationship model:
(a) definition of a personnel information transaction relationship model: the personnel information is a personnel information file;
(b) defining a relational model name: a personnel information transaction relationship model;
(c) defining a relational model class: selecting a personnel transaction relationship type;
(d) define entity a name: personnel information;
defining entity A attributes: name, certificate number;
defining entity A type: personnel;
(e) define entity B name: a bank account number;
defining entity B attributes: a bank card number;
define entity B type: an account number;
(f) defining the link name: a funding account number;
defining the link attribute: none.
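Before the models are executed, each attribute of the four models has to correspond to a field of its source file. The following Python sketch checks that correspondence; it is an illustration only. The field names are English renderings of the headers of files 1 to 3, and the dictionary layout and the function name are assumptions.

```python
FILE_FIELDS = {
    "file 1": ["own number", "call time", "call type", "other party's number"],
    "file 2": ["own account", "transaction time", "transaction type",
               "other party's account"],
    "file 3": ["certificate number", "name", "bank card number", "contact information"],
}

# model name -> (source file, entity A attributes, entity B attributes, link attributes)
MODELS = {
    "call record relationship model":
        ("file 1", ["own number"], ["other party's number"],
         ["call time", "call type"]),
    "fund transaction relationship model":
        ("file 2", ["own account"], ["other party's account"],
         ["transaction time", "transaction type"]),
    "personnel information conversation relationship model":
        ("file 3", ["name", "certificate number"], ["contact information"], []),
    "personnel information transaction relationship model":
        ("file 3", ["name", "certificate number"], ["bank card number"], []),
}


def unmapped_attributes(models, file_fields):
    """Return, per model, the attributes that match no field of the source file."""
    problems = {}
    for model_name, (source, attrs_a, attrs_b, link_attrs) in models.items():
        available = set(file_fields[source])
        missing = [attr for attr in attrs_a + attrs_b + link_attrs
                   if attr not in available]
        if missing:
            problems[model_name] = missing
    return problems


problems = unmapped_attributes(MODELS, FILE_FIELDS)
```

An empty result means that every entity attribute and link attribute of the four models can be mapped to a field of its file.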
5. The ad hoc task management module 14 executes the data relationship models: the generated processing result data information comprises an own number entity (table 1), an other party's number entity (table 2), a call relationship link (table 3), an own account entity (table 4), an other party's account entity (table 5), a transaction relationship link (table 6), a personnel information entity (table 7), a telephone number entity (table 8), a communication mode link (table 9), a bank account entity (table 10) and a fund account link (table 11);
6. the data relationship visualization display module 21 performs visualization analysis on the data relationships, and entity clustering analysis, link path analysis and fuzzy retrieval of chart content may further be performed on the visualization result; the specific visual display is shown in fig. 8. The visualization data is formed as follows: each piece of data analyzed from a data source yields an entity A, an entity B and a link, and the link contains the identification information of entities A and B; when the analysis results of a new data source are fused into the display and an entity A or B identified in a new link already exists among the loaded data, the new link is automatically attached to the existing entity, so that the analysis data of different data sources are merged on identical entities by entity identity, and the links connecting those entities form a relationship network.
7. The file and data information is as follows:
(a) Call record file (File 1)
Own number, call time, call type, other party's number
1362634xxxx, 2018-06-06 12:01:02, caller, 1386649xxxx
1362634xxxx, 2018-08-08 16:05:08, called, 1392637xxxx
……
The original call record file (file 1) is a txt file. After each row of data is split at the commas according to the parsing rule, the four data fields of file 1 are obtained: own number, call time, call type and other party's number.
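The comma-based parsing rule for file 1 can be sketched with Python's standard `csv` module. The function name and the handling of malformed rows (collecting their line numbers so that they can be reviewed) are assumptions made for illustration; the timestamps are written in a normalized form.

```python
import csv
import io

FIELDS = ["own number", "call time", "call type", "other party's number"]

FILE_1 = """\
1362634xxxx, 2018-06-06 12:01:02, caller, 1386649xxxx
1362634xxxx, 2018-08-08 16:05:08, called, 1392637xxxx
……
"""


def parse_call_records(text):
    """Split each row at the commas; return the records and the rejected line numbers."""
    rows, rejected = [], []
    reader = csv.reader(io.StringIO(text), skipinitialspace=True)
    for line_no, cells in enumerate(reader, start=1):
        if not cells:
            continue  # blank line
        if len(cells) != len(FIELDS):
            rejected.append(line_no)  # row does not yield the four expected fields
            continue
        rows.append(dict(zip(FIELDS, (cell.strip() for cell in cells))))
    return rows, rejected


rows, rejected = parse_call_records(FILE_1)
```

The same approach applies to files 2 and 3 with their own field lists.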
(b) Transaction record file (File 2)
Own account, transaction time, transaction type, other party's account
52189909abcd4406, 2018-04-08 10:01:02, transfer out, 52189909abcd8606
52189909abcd4406, 2018-07-09 14:05:08, transfer in, 52189909abcd6789
……
The original transaction record file (file 2) is a txt file. After each row of data is split at the commas according to the parsing rule, the four data fields of file 2 are obtained: own account, transaction time, transaction type and other party's account.
(c) Personnel information file (File 3)
Certificate number, name, bank card number, contact information
11018219750120abcd, Zhang San, 52189909abcd4406, 1362634xxxx
11018219760118abcd, Li Si, 52189909abcd8606, 1386649xxxx
11018219780125abcd, Wang Wu, 52189909abcd6789, 1392637xxxx
……
The original personnel information file (file 3) is a txt file. After each row of data is split at the commas according to the parsing rule, the four data fields of file 3 are obtained: certificate number, name, bank card number and contact information.
Own number entity (Table 1)
Entity name | Entity attributes | Entity type
Own number | 1362634xxxx | Telephone
……
Other party's number entity (Table 2)
Entity name | Entity attributes | Entity type
Other party's number | 1386649xxxx | Telephone
Other party's number | 1392637xxxx | Telephone
……
Call relationship link (Table 3)
Link name | Entity A identity | Entity B identity | Link attributes
Call relationship | 1362634xxxx | 1386649xxxx | 2018-06-06 12:01:02, caller
Call relationship | 1362634xxxx | 1392637xxxx | 2018-08-08 16:05:08, called
……
Own account entity (Table 4)
Entity name | Entity attributes | Entity type
Own account | 52189909abcd4406 | Account number
……
Other party's account entity (Table 5)
Entity name | Entity attributes | Entity type
Other party's account | 52189909abcd8606 | Account number
Other party's account | 52189909abcd6789 | Account number
……
Transaction relationship link (Table 6)
Link name | Entity A identity | Entity B identity | Link attributes
Transaction relationship | 52189909abcd4406 | 52189909abcd8606 | 2018-04-08 10:01:02, transfer out
Transaction relationship | 52189909abcd4406 | 52189909abcd6789 | 2018-07-09 14:05:08, transfer in
……
Personnel information entity (Table 7)
Entity name | Entity attributes | Entity type
Personnel information | 11018219750120abcd, Zhang San | Personnel
Personnel information | 11018219760118abcd, Li Si | Personnel
Personnel information | 11018219780125abcd, Wang Wu | Personnel
……
Telephone number entity (Table 8)
Entity name | Entity attributes | Entity type
Contact information | 1362634xxxx | Telephone
Contact information | 1386649xxxx | Telephone
Contact information | 1392637xxxx | Telephone
……
Communication mode link (Table 9)
Link name | Entity A identity | Entity B identity | Link attributes
Communication mode | 11018219750120abcd | 1362634xxxx | none
Communication mode | 11018219760118abcd | 1386649xxxx | none
Communication mode | 11018219780125abcd | 1392637xxxx | none
……
Bank account entity (Table 10)
Entity name | Entity attributes | Entity type
Bank account number | 52189909abcd4406 | Account number
Bank account number | 52189909abcd8606 | Account number
Bank account number | 52189909abcd6789 | Account number
……
Fund account link (Table 11)
Link name | Entity A identity | Entity B identity | Link attributes
Fund account | 11018219750120abcd | 52189909abcd4406 | none
Fund account | 11018219760118abcd | 52189909abcd8606 | none
Fund account | 11018219780125abcd | 52189909abcd6789 | none
……
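To illustrate how the links of Tables 3, 6, 9 and 11 combine into one relationship network, the following Python sketch fuses them and searches for a shortest chain of links between two persons. Breadth-first search is used here as one convenient technique; the specification does not name a search algorithm, and the function name is hypothetical.

```python
from collections import deque

LINKS = [
    # Table 3: call relationship links
    ("1362634xxxx", "1386649xxxx"),
    ("1362634xxxx", "1392637xxxx"),
    # Table 6: transaction relationship links
    ("52189909abcd4406", "52189909abcd8606"),
    ("52189909abcd4406", "52189909abcd6789"),
    # Table 9: communication mode links
    ("11018219750120abcd", "1362634xxxx"),
    ("11018219760118abcd", "1386649xxxx"),
    ("11018219780125abcd", "1392637xxxx"),
    # Table 11: fund account links
    ("11018219750120abcd", "52189909abcd4406"),
    ("11018219760118abcd", "52189909abcd8606"),
    ("11018219780125abcd", "52189909abcd6789"),
]


def shortest_path(links, start, goal):
    """Breadth-first search over the fused network; returns None if unconnected."""
    adjacency = {}
    for a, b in links:
        adjacency.setdefault(a, []).append(b)
        adjacency.setdefault(b, []).append(a)
    parents = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for neighbor in adjacency.get(node, []):
            if neighbor not in parents:
                parents[neighbor] = node
                queue.append(neighbor)
    return None


# Certificate numbers of the first two persons in file 3.
path = shortest_path(LINKS, "11018219750120abcd", "11018219760118abcd")
```

Neither person appears in the call record file or the transaction record file by name, yet the fused network connects them through three links, both by telephone and by account; this is the kind of indirect relationship that the fusion of data sources is intended to expose.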
The present invention has been described above in connection with preferred embodiments, but these embodiments are merely exemplary and illustrative. Various substitutions and modifications may be made to the invention on this basis, and all such substitutions and modifications fall within the protection scope of the invention.

Claims (10)

1. An ad hoc relation analysis system based on text data is characterized by comprising an application system (1), a presentation system (2) and a data system (3);
wherein, the application system (1) comprises:
the text data access module (11) is used as a data source module and is linked with the text data of the service text library (31), and transmits the text data to the ad hoc relationship mapping module (12) and the relationship model configuration module (13);
the relation model configuration module (13) is used for receiving the text data transmitted by the text data access module (11), setting a data relation model based on the text data and the service requirement, and transmitting the set data relation model to the ad hoc relation mapping module (12);
the ad hoc relationship mapping module (12) receives the text data transmitted by the text data access module (11) and the data relationship model information transmitted by the relationship model configuration module (13), maps the text data and the data relationship model, and transmits the mapping relationship to the ad hoc task management module (14);
the ad hoc task management module (14) is used for controlling the data relation model to perform relation analysis on the accessed text data and transmitting analysis result data information generated in the process of operating the data relation model to the data relation visualization display module (21);
the presentation system (2) comprises:
and the data relation visualization display module (21) is used for receiving the analysis result data information transmitted by the ad hoc task management module (14) and visually displaying the analysis result data information in a relation graph mode.
2. The analysis system according to claim 1, wherein the text data information received by the text data access module (11) from the service text repository (31) comprises structured data and semi-structured data, and the storage file format comprises txt document, Excel document, XML document, CSV document.
3. Analysis system according to claim 1, characterized in that the text data access module (11) comprises:
the data source configuration submodule is used for establishing a database link for accessing the service text database (31) and configuring information of the accessed text data;
the data access sub-module accesses the text data, and the access mode comprises an importing mode; the data access sub-module can access full text data information or partial text data information;
the data analysis submodule carries out formatting processing on the accessed text data according to a set analysis rule;
the data preview sub-module is used for presenting the analyzed text data and judging whether the analyzed data format meets the requirements by business personnel;
and the data output submodule transmits the analyzed data to the corresponding data receiving module.
4. The analysis system according to claim 1, wherein the relational model configuration module (13) comprises:
a data relationship model name submodule to store names of data relationship models;
the entity configuration submodule is used for configuring two entities to be associated and storing entity information;
the link configuration submodule is used for storing data relation information of the data relation model;
and the data relation model classification submodule is used for setting the classification of the data relation model.
5. The analysis system of claim 4, wherein the entity information includes entity name, entity type and entity attribute, wherein the entity attribute can be added in plurality according to service requirement, and the entity attribute information of two entities to be associated is originated from text data field of the same type.
6. The analysis system according to claim 1, wherein the presentation system (2) further comprises a secondary data relationship analysis module (22) for performing secondary data analysis on the data information of the data relationship visualization presentation module (21) to find out deep invisible relationships between the data;
preferably, the secondary analysis mode comprises object clustering analysis, relationship path analysis and fuzzy retrieval of diagram content.
7. The analysis system of claim 1, wherein the data system (3) comprises:
a service text library (31) comprising a call ticket record text library, a fund transaction text library and other conventional text data;
a system configuration library (32) for storing data information generated inside the system, including text access rule information, ad hoc relationship mapping information and ad hoc task management information;
and a relational model library (33) for storing the basic relational model information, the entity attribute information and the link attribute information that are saved by the relational model configuration module (13) when a relational model is defined.
8. A text data-based ad hoc relationship analysis method is characterized by comprising the following steps:
step 1), accessing text data: configuring data source information of a service text library (31) needing to be accessed, and accessing text data of the service text library (31);
step 2), defining a data relation model: defining a data relation model according to the text data information accessed in the step 1), wherein the data relation model comprises two entities to be associated;
step 3), ad hoc relationship mapping: mapping the entity attributes in the data relation model to the corresponding field names of the accessed text data to obtain a mapping relation;
step 4), the data relation model operation step: executing and monitoring the associated tasks of the data relation model set in the step 3), and generating analysis result data information in the operation process of the data relation model;
step 5), displaying result information: performing visual display operation on the analysis result data information generated in the step 4); preferred presentation means include a network layout, a circle analysis, a fan layout, an arch layout.
9. The analysis method according to claim 8, further comprising performing secondary operations on the visualized chart data after the visualization presentation of the data relationship is completed, wherein the secondary operations include object clustering analysis, relationship path analysis and fuzzy retrieval of chart content.
10. The analysis method according to claim 8, wherein step 2) comprises the sub-steps of:
inputting relationship model name information;
configuring entity information, including entity A information and entity B information;
inputting the entity A name;
inputting the entity A type;
inputting entity A attributes, of which several can be added according to business requirements;
inputting the entity B name;
inputting the entity B type;
inputting entity B attributes, of which several can be added according to business requirements;
configuring link information;
inputting the link name;
inputting link attributes, of which several can be added according to business requirements;
and inputting the relationship model type.
Application Number | Priority Date | Filing Date | Title | Status | Publication
CN201811360803.2A | 2018-11-15 | 2018-11-15 | Impromptu relation analysis system and method based on text data | Active | CN111190965B (en)

Publications (2)

Publication Number | Publication Date
CN111190965A | 2020-05-22
CN111190965B | 2023-11-10

Family ID: 70707526

Country Status (1)

Country | Link
CN (1) | CN111190965B (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant