CN111190965B - Impromptu relation analysis system and method based on text data - Google Patents
- Publication number: CN111190965B (application number CN201811360803.2A)
- Authority: CN (China)
- Legal status: Active (as listed by Google Patents, which notes this is not a legal conclusion)
Abstract
The invention provides an impromptu relation analysis system and method based on text data. A text data access module defines parsing rules for various types of business text data so that diverse text data can be used flexibly. A relationship model configuration module sets a data relationship model for the accessed text data, and users can set the relationship model flexibly according to their own business requirements. An impromptu relation mapping module establishes a mapping between the accessed text data and the data relationship model, and an impromptu task management module controls the data relationship model to perform relationship analysis on the accessed text data. The analysis results are presented visually as a relationship graph, and secondary relationship analysis can be performed on the presented relationships to discover deep hidden relationships among the data and provide more valuable information for business analysis.
Description
Technical Field
The invention relates to a relationship analysis system covering rule-based processing of text data, definition of data relationship models and impromptu visual display of processed data, and in particular to an impromptu relation analysis system and method for impromptu visual display and secondary relationship analysis of text data.
Background
With the continuous progress of science and technology and the rapid development of internet and computer technology, industries and business departments have accumulated large amounts of text data of many types. This data is stored in a scattered manner and is poorly interrelated. How to integrate the text data effectively, process it according to rules, and display the processed data relationships visually, so as to discover the hidden relationships among the data and their deeper value, is an urgent problem.
To address these problems, various text data systems have appeared on the market, such as text data management systems (mainly for classifying and uploading text data), text data storage systems (mainly for classified storage and retrieval of text data) and full-text search systems (mainly for full-text indexing and keyword search). Their main principles, however, are classified storage, conditional querying and full-text search of text data. For the retrieved data, or for one or more text files, it is difficult to find the associations among the file contents, and especially among multiple files. These text data application and analysis systems therefore cannot meet actual business needs: they cannot reveal the associations among text data, and in particular cannot find hidden, deep associations across multiple text data sets.
In view of these problems, the inventors studied related technologies such as simple configuration of processing rules for existing text data and flexible definition of data relationship models, with the aim of developing an impromptu relation analysis system and method based on text data that can easily connect to various text data sources, set data relationship models flexibly, visually display the relationships in the processed data and perform secondary relationship analysis, thereby discovering deep associations in the data.
Disclosure of Invention
To overcome these problems, the inventors carried out intensive research and designed an impromptu relation analysis system and method based on text data. A text data access module defines parsing rules for various types of business text data so that various text data sources can be used or called flexibly; a relationship model configuration module establishes data relationship models, so that users can set the relationship model flexibly according to their business requirements; the accessed text data is mapped impromptu through the configured data relationship model and the data relationships are presented visually; secondary relationship analysis can then be performed on the presented relationships to find deep hidden relationships among the data and provide more valuable information for business analysis. The invention was completed on this basis.
To this end, the invention provides the following technical solutions:
(1) An impromptu relation analysis system based on text data, which comprises an application system 1, a presentation system 2 and a data system 3;
wherein the application system 1 comprises:
a text data access module 11, which serves as the data source module linked to the text data in the business text library 31 and transmits the text data to the impromptu relation mapping module 12 and the relationship model configuration module 13;
a relationship model configuration module 13 which receives the text data transmitted from the text data access module 11, sets a data relationship model based on the text data and the business requirements, and transmits the set data relationship model to the impromptu relationship mapping module 12;
an impromptu relation mapping module 12 for receiving the text data transmitted from the text data access module 11 and the data relation model information transmitted from the relation model configuration module 13, mapping the text data with the data relation model, and transmitting the mapping relation to the impromptu task management module 14;
an impromptu task management module 14, which controls the data relationship model to perform relationship analysis on the accessed text data and transmits the analysis result data generated while the data relationship model runs to the data relationship visual display module 21;
the presentation system 2 includes:
a data relationship visual display module 21, which receives the analysis result data transmitted by the impromptu task management module 14 and displays it visually as a relationship graph.
Preferably, the presentation system 2 further comprises a secondary data relationship analysis module 22, which performs secondary analysis on the data displayed by the data relationship visual display module 21 to find deep hidden relationships among the data; the secondary analysis preferably includes object cluster analysis, relationship path analysis and fuzzy search of chart content.
In a preferred embodiment, the text data access module 11 comprises:
a data source configuration sub-module for establishing a database link to the business text library 31 and configuring the information needed to access the text data;
a data access sub-module for accessing the text data, for example by import; it can access either the full text data or only part of it;
a data parsing sub-module for formatting the accessed text data according to the configured parsing rules;
a data preview sub-module for presenting the parsed text data so that business personnel can judge whether the parsed data format meets the requirements;
and a data output sub-module for transmitting the parsed text data to the corresponding receiving module.
In a preferred embodiment, the relational model configuration module 13 comprises:
a data relationship model name sub-module to store the name of the data relationship model;
an entity configuration sub-module, which configures the two entities to be associated and stores the entity information;
a link configuration sub-module to store data relationship information of the data relationship model;
and a data relationship model category sub-module for setting the category of the data relationship model.
In a preferred embodiment, the data system 3 comprises:
a business text library 31, including call record text libraries, fund transaction text libraries and other conventional text data;
a system configuration library 32 for storing data generated inside the system, including text access rule information, impromptu relation mapping information and impromptu task management information;
and a relationship model library 33 for storing the basic relationship model information, entity attribute information and link attribute information saved by the relationship model configuration module 13 when a relationship model is defined.
(2) An impromptu relation analysis method based on text data, the method comprising the steps of:
step 1), text data access step: configure the data source information of the business text library 31 to be accessed, and access the text data of the business text library 31;
step 2), data relationship model definition step: define a data relationship model according to the text data accessed in step 1), the model comprising two entities to be associated;
step 3), impromptu relation mapping step: map the entity attributes in the data relationship model to the corresponding field names of the accessed text data to obtain a mapping relation;
step 4), data relationship model operation step: execute and monitor the association task of the data relationship model configured in step 3), generating analysis result data while the data relationship model runs;
step 5), result display step: visually display the analysis result data generated in step 4); preferred presentation modes include network layout, circular analysis, fan layout and arc layout.
Preferably, the method further comprises, after the data relationship has been visually presented, performing secondary operations on the visualized chart data, including object cluster analysis, relationship path analysis and fuzzy search of chart content.
The impromptu relation analysis system and method based on text data provided by the invention have the following beneficial effects:
(1) By setting text access rules, the types of text to be accessed and the rules for processing text content can be defined flexibly, so that a wider range of text data can be accessed and processed, operation is quicker and more convenient, and the practicability of the system is improved;
(2) Entities are configured from the accessed text data and used as the medium for flexibly configuring the data relationship analysis model, so that text data can be mapped in various ways during processing, which improves the flexibility of the system and gives users more initiative;
(3) The impromptu task management module allows the execution mode of text processing tasks to be defined in several ways, and the execution state and progress of these tasks to be monitored in real time;
(4) The data relationship visual display module presents the relationships between data intuitively;
(5) The secondary data relationship analysis function can further reveal deep hidden relationships between data, making it easier for users to carry out comprehensive data analysis and data mining.
Drawings
FIG. 1 is a schematic diagram showing the structure of an impromptu relation analysis system based on text data according to a preferred embodiment of the present invention;
FIG. 2 is a diagram showing the data flow between modules in the application system and the presentation system according to a preferred embodiment of the present invention;
FIG. 3 is a schematic diagram of the data tables of the system configuration library according to a preferred embodiment of the present invention;
FIG. 4 is a schematic diagram of the data tables of the relationship model library according to a preferred embodiment of the present invention;
FIG. 5 is a flow chart showing the functional structure of an impromptu task management module according to a preferred embodiment of the present invention;
FIG. 6 shows a flow chart of visual relationship analysis in accordance with a preferred embodiment of the present invention;
FIG. 7 is a flowchart illustrating the business operations of the impromptu relation analysis system based on text data according to a preferred embodiment of the present invention;
FIG. 8 shows a visual presentation of the results in network form according to Embodiment 1 of the present invention.
Reference numerals:
1-an application system;
11-a text data access module;
12-an impromptu relation mapping module;
13-a relationship model configuration module;
14-an impromptu task management module;
2-a presentation system;
21-a data relationship visual display module;
22-a secondary data relationship analysis module;
3-a data system;
31-a business text library;
32-a system configuration library;
33-a relationship model library.
Detailed Description
The invention is further described in detail below by means of the figures and examples. The features and advantages of the present invention will become more apparent from the description.
The word "exemplary" is used herein to mean "serving as an example, embodiment, or illustration." Any embodiment described herein as "exemplary" is not necessarily to be construed as preferred or advantageous over other embodiments.
In order to make effective use of large-scale data in terms of storage, management and analysis, integrate the data effectively, and discover the implicit relationships among the data and the deep value within it and present them clearly and quantitatively, the invention provides an impromptu relation analysis system based on text data which, as shown in FIG. 1 and FIG. 2, comprises an application system 1, a presentation system 2 and a data system 3;
wherein the application system 1 comprises:
a text data access module 11, which serves as the data source module linked to the text data in the business text library 31 and transmits the text data to the impromptu relation mapping module 12 and the relationship model configuration module 13;
a relationship model configuration module 13 which receives the text data transmitted from the text data access module 11, sets a data relationship model based on the text data and the business requirements, and transmits the set data relationship model to the impromptu relationship mapping module 12;
an impromptu relation mapping module 12 for receiving the text data transmitted from the text data access module 11 and the data relation model information transmitted from the relation model configuration module 13, mapping the text data with the data relation model, and transmitting the mapping relation to the impromptu task management module 14;
an impromptu task management module 14, which controls the data relationship model to perform relationship analysis on the accessed text data and transmits the analysis result data generated while the data relationship model runs to the data relationship visual display module 21;
the presentation system 2 includes:
a data relationship visual display module 21, which receives the analysis result data transmitted by the impromptu task management module 14 and displays it visually as a relationship graph.
In the present invention, the text data access module 11 receives text data from the business text library 31, including structured and semi-structured data stored as TXT, Excel, XML, CSV and similar documents. Structured data is data with a uniform structure that can be stored in a two-dimensional table; semi-structured data lies between structured data (e.g. relational databases) and unstructured data (e.g. sound and image files), with structure and content mixed together without a clear distinction.
The text data access module 11 performs two text data access processes. The text data obtained in the first process is used by the impromptu relation mapping module 12 to map the text data to the data relationship model; mapping here means linking the field names of the text data to the entity attributes in the data relationship model. For this purpose, the first access process preferably reads only part of the text, including the field names, for example the first ten rows of an Excel document together with its field names.
The second text data access process supplies data for relationship analysis through the data relationship model; in this process, all of the data in the text is accessed.
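The two access passes can be sketched in Python as below. This is a minimal illustration only, assuming CSV-formatted text with the field names on the first line; the function name `access_text`, the `PREVIEW_ROWS` constant and the sample call records are hypothetical, and only the "first ten rows" figure comes from the text above.

```python
import csv
import io
from itertools import islice

PREVIEW_ROWS = 10  # hypothetical constant; mirrors the "first ten rows" example


def access_text(stream, preview=True):
    """Return (field_names, rows) from CSV-formatted text.

    preview=True:  first pass, field names plus at most PREVIEW_ROWS rows,
                   enough to map field names onto entity attributes.
    preview=False: second pass, every data row, for relationship analysis.
    """
    reader = csv.reader(stream)
    field_names = next(reader)  # the first line holds the field names
    rows = islice(reader, PREVIEW_ROWS) if preview else reader
    return field_names, list(rows)


# Invented sample: 25 call records with three fields each.
sample = "caller,callee,duration\n" + "".join(
    f"1380000{i:04d},1390000{i:04d},{i * 10}\n" for i in range(25)
)
fields, head = access_text(io.StringIO(sample), preview=True)
_, full = access_text(io.StringIO(sample), preview=False)
print(fields, len(head), len(full))  # ['caller', 'callee', 'duration'] 10 25
```

Reading the preview through `islice` stops after the header and ten rows, so the first pass stays cheap even for very large files.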
In a preferred embodiment, the text data access module 11 comprises:
a data source configuration sub-module for establishing a database link to the business text library 31 and configuring the information needed to access the text data;
The business text library can comprise several databases, each containing several text documents. The data source configuration sub-module first establishes and maintains access information for the database from which text data is to be extracted (such as a database description and storage path), and then establishes and maintains access information for the text documents in that database (such as document names, document notes and storage paths); the sub-module is then connected directly to the text documents from which data is to be extracted.
As shown in FIG. 3, the data source configuration sub-module manages access to text data sources by adding, modifying and deleting text data access rules. The primary key of the "text access rule table" is the "rule identifier": each newly created text data source receives a unique data source identifier, generated automatically by a preset rule that appends the system timestamp to each newly created data source batch. The "rule name" is a description entered by the user who sets up the data source; the "text storage path" is the path used to access the text data; the "text type identifier" corresponds to the primary key "type identifier" of the "text type table" and is a character string; the "division rule identifier" corresponds to the primary key "rule identifier" of the "division rule table" and is a character string.
The "text type table" is compiled in advance from common text file formats; its type names include TXT, Excel, XML and CSV documents, and the "type description" describes each of these storage formats.
The "division rule table" is compiled in advance from common data storage formats; its rule names include ",", ";" and "||". The "rule definition" selects a formatting method only for TXT files, whose fields are commonly separated by ",", ";" or "||"; no selection is needed for formats with a standard layout, such as Excel, XML and CSV files.
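A minimal sketch of the text access rule record and its lookup tables follows, assuming Python dataclasses and dictionaries stand in for the database tables of FIG. 3. The identifier format (batch name plus a `YYYYMMDDHHMMSS` timestamp), the table keys, the sample path and the helper names `new_rule_id` and `make_rule` are all assumptions; the text only states that a system timestamp is appended to each new data source batch.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

# Lookup tables compiled in advance (FIG. 3); the keys are illustrative.
TEXT_TYPES = {"txt": "TXT document", "excel": "Excel document",
              "xml": "XML document", "csv": "CSV document"}
DIVISION_RULES = {"comma": ",", "semicolon": ";", "double_bar": "||"}


@dataclass
class TextAccessRule:
    rule_id: str                               # primary key, generated automatically
    rule_name: str                             # user-entered description of the source
    text_path: str                             # storage path of the text data
    text_type_id: str                          # key into TEXT_TYPES
    division_rule_id: Optional[str] = None     # only used for TXT files
    model_mapping_rule: Optional[dict] = None  # filled in later by module 12


def new_rule_id(batch: str, now: Optional[datetime] = None) -> str:
    # Assumed format: batch name + system timestamp (YYYYMMDDHHMMSS).
    now = now or datetime.now()
    return f"{batch}_{now:%Y%m%d%H%M%S}"


def make_rule(batch, name, path, type_id, division_id=None, now=None):
    if type_id not in TEXT_TYPES:
        raise ValueError(f"unknown text type: {type_id}")
    if type_id != "txt":
        division_id = None  # Excel, XML and CSV have standard layouts
    elif division_id not in DIVISION_RULES:
        raise ValueError("TXT sources need a division rule")
    return TextAccessRule(new_rule_id(batch, now), name, path,
                          type_id, division_id)


rule = make_rule("cdr", "Call records", "/data/cdr.txt", "txt", "double_bar",
                 now=datetime(2018, 11, 15, 9, 30))
print(rule.rule_id)  # cdr_20181115093000
```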
The data access sub-module is used for accessing the text data, for example by import; it can access either the full text data or only part of it.
As shown in FIG. 3, the data access sub-module accesses text information by adding, deleting and modifying entries in the "text access rule table".
The data parsing sub-module formats the accessed text data according to the configured parsing rules, which govern how the original text data is read for each storage format:
(a) TXT documents: the text is read line by line and each line is split into fields according to the configured division rule;
(b) Excel documents: data is read cell by cell, using the Apache POI component;
(c) XML documents: data is obtained according to the structural tags of the document, using parsers such as JDOM and dom4j;
(d) CSV documents: data is processed according to the default delimiter rules of the format; the text is read line by line and each line is split into fields.
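The parsing rules (a), (c) and (d) can be sketched as a small dispatcher. This is an illustration, not the patented implementation: it uses Python's standard `csv` and `xml.etree.ElementTree` modules in place of the JDOM/dom4j parsers named above, assumes a flat XML layout of repeated record elements, and omits Excel, which would need a spreadsheet library such as Apache POI. Every parser returns the field names as its first row; the function names are hypothetical.

```python
import csv
import io
import xml.etree.ElementTree as ET


def parse_txt(text, delimiter):
    # (a) read line by line, split each line on the configured division rule
    return [line.split(delimiter) for line in text.splitlines() if line.strip()]


def parse_csv(text):
    # (d) default CSV delimiter rules; the csv module also handles quoted fields
    return [row for row in csv.reader(io.StringIO(text)) if row]


def parse_xml(text, record_tag):
    # (c) read by structural tags; ElementTree stands in for JDOM/dom4j
    records = list(ET.fromstring(text).iter(record_tag))
    if not records:
        return []
    header = [child.tag for child in records[0]]
    return [header] + [[child.text or "" for child in rec] for rec in records]


PARSERS = {"txt": parse_txt, "csv": parse_csv, "xml": parse_xml}


def parse(text, text_type, **options):
    # (b) Excel is omitted: it needs a spreadsheet library (Apache POI in Java)
    if text_type not in PARSERS:
        raise ValueError(f"no parser for {text_type!r}")
    return PARSERS[text_type](text, **options)


xml_doc = "<rows><row><a>1</a><b>2</b></row><row><a>3</a><b>4</b></row></rows>"
print(parse(xml_doc, "xml", record_tag="row"))  # [['a', 'b'], ['1', '2'], ['3', '4']]
```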
The data preview sub-module presents the parsed text data so that business personnel can judge whether the parsed data format meets the requirements; if it does not, the parsing rule is changed and the data is reformatted, and if it does, the data output sub-module is allowed to output the data;
the data output sub-module transmits the parsed text data to the corresponding receiving module (to the impromptu relation mapping module 12 and the relationship model configuration module 13 in the model-building stage, and to the impromptu relation mapping module 12 in the text data association stage).
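The preview-and-output loop can be sketched as follows, assuming TXT input and a callback that stands in for the business reviewer's judgment. The function name, the `consistent` example reviewer, the receiver labels and the stage strings are hypothetical; only the retry-on-rejection behaviour and the two routing stages come from the text.

```python
def preview_and_output(text, candidate_delimiters, approve, stage):
    """Re-parse TXT text with each candidate rule until the reviewer approves.

    approve(rows) -> bool stands in for the business reviewer (hypothetical).
    stage is "modeling" or "association"; it decides where output is routed.
    """
    for delimiter in candidate_delimiters:
        rows = [line.split(delimiter) for line in text.splitlines() if line]
        if approve(rows):  # parsed format meets the requirements
            if stage == "modeling":
                receivers = ["mapping_module_12", "model_config_module_13"]
            else:
                receivers = ["mapping_module_12"]
            return delimiter, rows, receivers
    raise ValueError("no parsing rule produced an acceptable format")


def consistent(rows):
    # Example reviewer: every row has the same number of fields, and more than one.
    return len(rows) > 0 and len({len(r) for r in rows}) == 1 and len(rows[0]) > 1


print(preview_and_output("a;b\n1;2", [",", ";"], consistent, "association"))
# (';', [['a', 'b'], ['1', '2']], ['mapping_module_12'])
```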
In the invention, the relation model configuration module 13 comprises a data relation model name sub-module, an entity configuration sub-module, a link configuration sub-module and a data relation model category sub-module;
a data relationship model name sub-module to store the name of the data relationship model;
the entity configuration sub-module configures the two entities to be associated and stores the entity information. One text can yield at least two entities, and the attribute information of the two entities that form a data relationship model comes from data fields in texts of the same type; for example, the data of both entities may come from one call record file.
The original data is packaged separately into two entities (entity A and entity B), which are connected by a link A-B; in the visual presentation of the analysis results, this shows the relationships between the data more intuitively.
A link configuration sub-module to store data relationship information of the data relationship model;
and a data relationship model category sub-module for setting the category of the data relationship model.
In the invention, the entity information comprises the name, type and attributes of entity A and the name, type and attributes of entity B. Multiple attributes can be added to each entity according to business requirements, and each attribute can be mapped to a field name in the corresponding text data for relationship analysis between different entities.
In the present invention, the data relationship information includes a link name between two entities and a link attribute between the two entities.
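The entity and link structure described above might be modelled as below. These dataclasses and the call record example are an illustrative assumption, not the patent's own data model; attributes are kept as plain lists so that several attributes can be added to each entity or link.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Entity:
    name: str
    entity_type: str  # refers to the entity category table
    attributes: List[str] = field(default_factory=list)  # one or more


@dataclass
class Link:
    name: str  # e.g. a call or transaction relationship
    attributes: List[str] = field(default_factory=list)


@dataclass
class RelationshipModel:
    name: str
    category: str  # refers to the model category table
    entity_a: Entity
    entity_b: Entity
    link: Link


# Both entities draw their attributes from fields of the same call record file.
call_model = RelationshipModel(
    name="call record relation model",
    category="communication",
    entity_a=Entity("calling party", "person", ["phone_number"]),
    entity_b=Entity("called party", "person", ["phone_number"]),
    link=Link("call", ["call_time", "duration"]),
)
```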
As shown in fig. 4, the relationship model configuration module 13 is used for adding, modifying and querying the data information of the relationship model table to achieve the purpose of configuring the relationship model; the relationship model table is stored in a relationship model library 33.
In the "relation model table", the "model name" is the name of the established model, such as a call record relation model. The "model category identifier" field corresponds to the "category identifier" field of the "model category table"; it is a character string giving the category of the model, such as person, organization, communication or address. The "entity A name" and "entity B name" fields correspond to the "entity name" field of the "entity attribute table"; they are character strings giving the names of the configured entities. The "entity A category identifier" and "entity B category identifier" fields correspond to the primary key "category identifier" of the "entity category table"; they are character strings giving the category of each configured entity. The "relationship name" corresponds to the "relationship name" field of the "relation attribute table" and denotes, for example, a call relationship or a transaction relationship.
As shown in FIG. 4, in both the "entity attribute table" and the "relation attribute table", the "attribute identifier" is a unique English code defined for an attribute and the "attribute name" is the Chinese description defined for it. These two tables are generated when a data relationship model is configured as required.
The primary key of the "entity category table" is the "category identifier", a unique English code defined for the category, and the "category name" is the Chinese description defined for it. The primary key of the "model category table" is likewise the "category identifier", and the "category description" defines the category of data relationship model. The entity category table and the model category table are compiled in advance from common business requirements.
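To make the key relationships between these tables concrete, here is a hedged SQLite sketch of FIG. 4 using Python's standard `sqlite3` module. Table and column names are English renderings chosen for illustration, the composite primary keys of the two attribute tables and the primary key of the relation model table are assumptions (the text does not name them), and the sample rows are invented.

```python
import sqlite3

# English table/column names and unnamed primary keys are assumptions.
SCHEMA = """
CREATE TABLE model_category (category_id TEXT PRIMARY KEY, description TEXT);
CREATE TABLE entity_category (category_id TEXT PRIMARY KEY, category_name TEXT);
CREATE TABLE entity_attribute (
    entity_name TEXT, attribute_id TEXT, attribute_name TEXT,
    PRIMARY KEY (entity_name, attribute_id));
CREATE TABLE relation_attribute (
    relation_name TEXT, attribute_id TEXT, attribute_name TEXT,
    PRIMARY KEY (relation_name, attribute_id));
CREATE TABLE relation_model (
    model_name TEXT PRIMARY KEY,
    model_category_id TEXT REFERENCES model_category (category_id),
    entity_a_name TEXT,
    entity_a_category_id TEXT REFERENCES entity_category (category_id),
    entity_b_name TEXT,
    entity_b_category_id TEXT REFERENCES entity_category (category_id),
    relation_name TEXT);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO model_category VALUES ('COMM', 'communication')")
conn.execute("INSERT INTO entity_category VALUES ('PERSON', 'person')")
# attribute_name would normally hold the Chinese description.
conn.executemany("INSERT INTO entity_attribute VALUES (?, ?, ?)",
                 [("caller", "phone_no", "phone number"),
                  ("callee", "phone_no", "phone number")])
conn.execute("INSERT INTO relation_attribute VALUES ('call', 'call_time', 'call time')")
conn.execute("INSERT INTO relation_model VALUES (?, ?, ?, ?, ?, ?, ?)",
             ("call record relation model", "COMM", "caller", "PERSON",
              "callee", "PERSON", "call"))

# Resolve a model's category description and link attributes via the keys.
rows = conn.execute("""
    SELECT m.model_name, c.description, r.attribute_id
    FROM relation_model AS m
    JOIN model_category AS c ON c.category_id = m.model_category_id
    JOIN relation_attribute AS r ON r.relation_name = m.relation_name
""").fetchall()
print(rows)  # [('call record relation model', 'communication', 'call_time')]
```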
In the invention, the relationship model is configured through the relationship model configuration module 13, so that data relationships can be analyzed flexibly and analysis models can be defined according to user requirements, which increases the practicability and flexibility of the system; it also widens the system's range of application to impromptu analysis of more kinds of text data.
In the present invention, the impromptu relation mapping module 12 establishes the mapping between text data and the data relationship model; as shown in FIG. 3, the mapping relation is stored in the "model mapping rule" field of the "text access rule table".
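A possible shape for applying a model mapping rule is sketched below, assuming the rule is stored as a dictionary that maps each entity or link attribute to a text field name. The `apply_mapping` function, the mapping layout and the sample record are hypothetical.

```python
def apply_mapping(field_names, rows, mapping):
    """Turn parsed rows into (entity A, entity B, link) attribute records.

    mapping is a hypothetical "model mapping rule" with keys "A", "B" and
    "link", each mapping an attribute name to a text field name.
    """
    index = {name: i for i, name in enumerate(field_names)}
    needed = {fld for part in mapping.values() for fld in part.values()}
    missing = needed - set(index)
    if missing:
        raise KeyError(f"fields not found in text: {sorted(missing)}")

    def pick(row, part):
        return {attr: row[index[fld]] for attr, fld in mapping[part].items()}

    return [(pick(r, "A"), pick(r, "B"), pick(r, "link")) for r in rows]


mapping = {
    "A": {"phone_number": "caller"},
    "B": {"phone_number": "callee"},
    "link": {"call_time": "start", "duration": "duration"},
}
records = apply_mapping(["caller", "callee", "start", "duration"],
                        [["138", "139", "2018-11-15 09:30", "60"]], mapping)
```

Validating the field names once, before any row is processed, mirrors the preview pass: a mapping that references a missing field fails early rather than midway through a full analysis.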
In the present invention, as shown in FIG. 3 and FIG. 5, the impromptu task management module 14 monitors text processing tasks by adding, modifying and deleting entries in the "impromptu task management table". The primary key of this table is the "task name"; the "execution mode" is either scheduled execution or immediate execution; the "task state" indicates whether the task is started or stopped; the "execution state" indicates whether the task is currently being processed; and the "execution progress" indicates how much of the current data has been processed. The module can thus define the execution mode of processing tasks flexibly and monitor their execution state and progress in real time.
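The task table can be sketched as a small Python class whose fields mirror the columns described above. Scheduling itself is not shown (a scheduled task would simply hand `run` to a scheduler), real-time monitoring would need the task to run in another thread, and the `analyze` callback, state strings and class names are assumptions for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class ExecutionMode(Enum):
    SCHEDULED = "scheduled"  # timed execution
    IMMEDIATE = "immediate"  # instant execution


@dataclass
class ImpromptuTask:
    task_name: str                # primary key of the task management table
    execution_mode: ExecutionMode
    task_state: str = "stopped"   # "started" or "stopped"
    executing: bool = False       # execution state: currently processing?
    progress: float = 0.0         # execution progress, 0.0 to 1.0

    def run(self, records, analyze):
        """Apply analyze() to each record, updating the monitored fields."""
        if self.task_state != "started":
            raise RuntimeError(f"task {self.task_name!r} is stopped")
        self.executing, self.progress = True, 0.0
        results = []
        try:
            for i, record in enumerate(records, start=1):
                results.append(analyze(record))
                self.progress = i / len(records)  # fraction processed so far
        finally:
            self.executing = False  # reset even if analyze() raises
        self.progress = 1.0
        return results


task = ImpromptuTask("cdr analysis", ExecutionMode.IMMEDIATE, task_state="started")
print(task.run([1, 2, 3, 4], lambda r: r * 2), task.progress)  # [2, 4, 6, 8] 1.0
```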
In the present invention, as shown in FIG. 6, the data relationship visual display module 21 presents the analysis results of impromptu tasks in different presentation modes, such as network layout, circular analysis, fan layout and arc layout.
Each record analyzed from each data source forms an entity A and an entity B, and the link carries the identification information of entities A and B (taken from the entity attribute information in the entity information). When the results of a newly analyzed data source are fused into the presentation and an entity A or B identified in a new link already exists in a loaded data source, the new link is automatically attached to the existing entity. In this way, data analyzed from different sources is merged on identical entity identifiers, and the entities are finally connected by links into a relationship network.
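The fusion of several analyzed sources into one network can be sketched as an adjacency map keyed by entity identifier. This assumes each source already yields `(id_a, id_b, link_info)` tuples; because identical identifiers share one key, a link from a new source automatically attaches to an entity loaded from an earlier source. The function name and sample data are hypothetical.

```python
from collections import defaultdict


def build_network(*sources):
    """Fuse analyzed sources into one relationship network.

    Each source is a list of (id_a, id_b, link_info) tuples whose ids come from
    an identifying entity attribute, e.g. a phone or account number.
    """
    adjacency = defaultdict(set)  # entity id -> ids of linked entities
    links = []
    for source in sources:
        for id_a, id_b, info in source:
            # Equal ids share one key, so links from a new source attach
            # to entities already loaded from an earlier source.
            adjacency[id_a].add(id_b)
            adjacency[id_b].add(id_a)
            links.append((id_a, id_b, info))
    return dict(adjacency), links


calls = [("Zhang", "Li", {"type": "call"})]
transfers = [("Li", "Wang", {"type": "transfer"})]
network, links = build_network(calls, transfers)
```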
In a preferred embodiment of the present invention, to further discover deep, hidden relationships in the data, the analysis results undergo a secondary analysis performed by the secondary data relationship analysis module 22. The secondary analysis operations include object cluster analysis, relationship path analysis and fuzzy search of chart content. These operations allow the displayed data to be analyzed from several angles, so that the user can reason inductively, uncover potential relationships in the data, find valuable supporting evidence and make sound decisions.
Object cluster analysis removes from the relationship network graph every entity that has only one link, keeps only the entities with two or more links, rearranges them, and analyzes the associations among these multiply linked entities;
relationship path analysis finds and highlights all link paths between two entities selected in the relationship network graph, so that every entity on those paths can be analyzed;
fuzzy search of chart content performs a fuzzy search over the attribute information of the entities in the analysis interface: the user enters a search keyword, and the entities whose information matches the keyword are highlighted for analysis.
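The three secondary operations can be sketched over an undirected graph stored as a dict that maps each entity identifier to the set of its neighbours. This is an illustrative sketch under stated assumptions, not the patent's implementation: the function names are hypothetical, links are counted as distinct neighbours, and "fuzzy" matching is approximated by a case-insensitive substring test.

```python
def cluster(adjacency, min_links=2):
    """Object cluster analysis: keep only entities with at least `min_links`
    linked entities, then drop edges to entities that were removed.
    Counts distinct neighbours, so repeated links between one pair count once
    (an assumption; the text does not say how duplicates are counted)."""
    kept = {node for node, nbrs in adjacency.items() if len(nbrs) >= min_links}
    return {node: adjacency[node] & kept for node in kept}


def all_paths(adjacency, start, goal):
    """Relation path analysis: every simple path from start to goal.
    Iterative depth-first search; the number of paths can grow exponentially."""
    paths, stack = [], [(start, [start])]
    while stack:
        node, path = stack.pop()
        if node == goal:
            paths.append(path)
            continue
        for nxt in adjacency.get(node, ()):
            if nxt not in path:  # simple paths never revisit an entity
                stack.append((nxt, path + [nxt]))
    return sorted(paths, key=lambda p: (len(p), p))


def fuzzy_search(attributes, keyword):
    """Chart content fuzzy search, approximated here as a case-insensitive
    substring match over each entity's attribute text."""
    kw = keyword.casefold()
    return sorted(e for e, text in attributes.items() if kw in text.casefold())


graph = {"A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B", "D"}, "D": {"C"}}
print(sorted(cluster(graph)))      # ['A', 'B', 'C']  (D has only one link)
print(all_paths(graph, "A", "D"))  # [['A', 'C', 'D'], ['A', 'B', 'C', 'D']]
print(fuzzy_search({"A": "Zhang San", "B": "Li Si"}, "zhang"))  # ['A']
```

A production system would more likely use an edit-distance or tokenized match for fuzzy search; the substring test simply keeps the sketch dependency-free.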
In the present invention, the data system 3 includes three libraries: a business text library 31, a system configuration library 32 and a relationship model library 33;
business text library 31, including a call record text library, a fund transaction text library and other conventional text data;
the business text library 31 is an enterprise's text database. The analysis system accesses the text data to be analyzed through the text data access module 11, and the business text information of each unit, such as call record information and fund transaction information, is stored in the business text library 31. The business domain of the library is not restricted: it may be a call record text library, a fund transaction text library or other conventional text data, that is, any database that records text data. Because any such library can be accessed, the system has a wide range of application and high practicality. As described above, the text data includes structured and semi-structured data, stored in formats such as txt, Excel, XML and CSV files.
A system configuration library 32 for storing data generated inside the system, including text access rule information, impromptu relation mapping information and impromptu task management information.
The system configuration library 32 is the core library of the text data based impromptu relationship analysis system and stores the data generated by system configuration. As shown in Fig. 3, it contains four data tables: a text type table, a segmentation rule table, a text access rule table and an impromptu task management table, which together supply the data the system needs at run time.
A relationship model library 33 for storing the basic relationship model information, entity attribute information and link attribute information saved by the relationship model configuration module 13 when a relationship model is defined.
As shown in Fig. 4, the relationship model library 33 includes an entity attribute table, an entity category table, a relationship model table, a relationship attribute table and a model category table.
As shown in Fig. 7, another aspect of the present invention provides a text data based impromptu relationship analysis method, preferably implemented by the text data based impromptu relationship analysis system described above, comprising the following steps:
(1) S1, text data access: configure the data source information of the business text library 31 to be accessed, access its text data, and format the data, that is, parse the text retrieved from the business text library 31 into fields according to the parsing rule.
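The formatting in step S1 can be sketched with Python's csv module, assuming (as in files 1 to 3 of Example 1) that the text is comma-delimited and that its first line holds the field names. parse_text is a hypothetical name, and a real parsing rule may specify more than a delimiter.

```python
import csv
import io


def parse_text(raw, delimiter=","):
    """Format raw delimited text into a header list and a list of field dicts.

    Assumes the first line holds the field names and that the parsing rule is
    just a delimiter; rows with the wrong number of fields are rejected.
    """
    reader = csv.reader(io.StringIO(raw), delimiter=delimiter, skipinitialspace=True)
    header = [name.strip() for name in next(reader)]
    rows = []
    for values in reader:
        if not values:  # csv.reader yields [] for a blank line
            continue
        if len(values) != len(header):
            raise ValueError(f"expected {len(header)} fields, got {len(values)}: {values}")
        rows.append(dict(zip(header, (v.strip() for v in values))))
    return header, rows


sample = (
    "Own number,Call time,Call type,Counterpart number\n"
    "1362634xxxx,2018-06-06 12:01:02,caller,1386649xxxx\n"
)
fields, records = parse_text(sample)
print(records[0]["Counterpart number"])  # 1386649xxxx
```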
(2) S2, data relationship model definition: define a data relationship model from the text data accessed in step S1. The model contains two entities to be associated; a single text can yield at least two entities, and the attribute information of both entities is derived from text data fields of the same type.
(3) S3, impromptu relationship mapping: map the entity attributes of the data relationship model to the corresponding field names of the accessed text data to obtain the mapping relationship;
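Step S3 can be sketched as a check on a user-supplied mapping rule. Following claim 1, the first access pass reads only the part of the text that contains the field names. The function names, the "A."/"B."/"link." prefixes on model attributes and the header names caller_no, start_time, direction and callee_no are all hypothetical, chosen to show a source whose field names differ from the model's attribute names.

```python
def header_fields(raw, delimiter=","):
    """First access pass: read only the header line to obtain the field names.
    Uses a plain split, so it assumes field names contain no delimiter or quotes."""
    first_line = raw.split("\n", 1)[0]
    return [name.strip() for name in first_line.split(delimiter)]


def build_mapping(model_attributes, field_names, rule):
    """Validate a mapping rule {model attribute: text field name}.

    Every attribute declared in the relationship model must be mapped, and
    every mapped field must exist in the text, otherwise the model cannot run.
    """
    unmapped = [attr for attr in model_attributes if attr not in rule]
    if unmapped:
        raise ValueError(f"model attributes without a field: {unmapped}")
    unknown = sorted({rule[attr] for attr in model_attributes} - set(field_names))
    if unknown:
        raise ValueError(f"fields not found in the text: {unknown}")
    return {attr: rule[attr] for attr in model_attributes}


# Hypothetical source whose headers differ from the model's attribute names.
raw = ("caller_no,start_time,direction,callee_no\n"
       "1362634xxxx,2018-06-06 12:01:02,caller,1386649xxxx\n")
attributes = ["A.own number", "B.counterpart number", "link.call time", "link.call type"]
rule = {
    "A.own number": "caller_no",
    "B.counterpart number": "callee_no",
    "link.call time": "start_time",
    "link.call type": "direction",
}
mapping = build_mapping(attributes, header_fields(raw), rule)
print(mapping["link.call type"])  # direction
```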
(4) S4, data relationship model execution: execute and monitor the task associated with the data relationship model mapped in step S3, producing result data while the model runs;
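Executing one mapped model in step S4 can be sketched as a single pass over the parsed rows that emits entity A rows, entity B rows and link rows shaped like Tables 1 to 3 of Example 1. The dictionary shapes of model and mapping, and the on_progress callback used to feed the "execution progress" field, are assumptions for illustration.

```python
def run_model(model, mapping, rows, on_progress=None):
    """Run one mapped relationship model over parsed rows (step S4).

    model:   names and types from step S2, as a plain dict (hypothetical shape).
    mapping: from step S3, naming the field that holds entity A's identifier,
             the field that holds entity B's, and the link attribute fields.
    Entities are de-duplicated by identifier, which is why Table 1 lists the
    own number once although it occurs in two call records.
    """
    entities_a, entities_b, links = {}, {}, []
    for done, row in enumerate(rows, start=1):
        a_id, b_id = row[mapping["a_id"]], row[mapping["b_id"]]
        entities_a.setdefault(a_id, (model["a_name"], a_id, model["a_type"]))
        entities_b.setdefault(b_id, (model["b_name"], b_id, model["b_type"]))
        link_attrs = ", ".join(row[f] for f in mapping["link_attrs"])
        links.append((model["link_name"], a_id, b_id, link_attrs))
        if on_progress:
            on_progress(done, len(rows))  # e.g. update the task's progress field
    return list(entities_a.values()), list(entities_b.values()), links


model = {"a_name": "Own number", "a_type": "Telephone",
         "b_name": "Counterpart number", "b_type": "Telephone",
         "link_name": "Call relationship"}
mapping = {"a_id": "Own number", "b_id": "Counterpart number",
           "link_attrs": ["Call time", "Call type"]}
rows = [
    {"Own number": "1362634xxxx", "Call time": "2018-06-06 12:01:02",
     "Call type": "caller", "Counterpart number": "1386649xxxx"},
    {"Own number": "1362634xxxx", "Call time": "2018-08-08 16:05:08",
     "Call type": "called", "Counterpart number": "1392637xxxx"},
]
seen = []
a_rows, b_rows, link_rows = run_model(model, mapping, rows,
                                      on_progress=lambda d, t: seen.append((d, t)))
print(len(a_rows), len(b_rows), len(link_rows))  # 1 2 2
```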
(5) S5, result presentation: visually display the result data produced in step S4, for example in a network layout, circular layout, fan layout or arc layout.
Preferably, after the data relationships have been visualized, a secondary operation S6 is performed on the chart data to further analyze the deep associations in the data. The secondary operations include object cluster analysis, relationship path analysis (chart link path analysis) and fuzzy search of chart content.
In a preferred embodiment, step S2 comprises the sub-steps of:
inputting the relationship model name;
configuring entity information, that is, the information of entity A and entity B:
inputting the name of entity A;
inputting the type of entity A;
inputting the attributes of entity A, where several attributes can be added as the business requires;
inputting the name of entity B;
inputting the type of entity B;
inputting the attributes of entity B, where several attributes can be added as the business requires;
configuring link information:
inputting the link name;
inputting the link attributes, where several can be added as the business requires;
inputting the relationship model category.
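The sub-steps above map naturally onto a small data structure. The following sketch, with hypothetical class and field names, records the inputs of step S2 and rejects a model that is missing a required item; the example instance reproduces the call record relationship model of Example 1. Link attributes may be empty because models (3) and (4) of Example 1 define none.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    name: str
    entity_type: str
    attributes: list[str] = field(default_factory=list)  # several may be added


@dataclass
class RelationshipModel:
    name: str
    entity_a: Entity
    entity_b: Entity
    link_name: str
    link_attributes: list[str] = field(default_factory=list)  # may be empty
    category: str = ""

    def validate(self):
        """Reject a model that skipped one of the required sub-steps."""
        if not (self.name and self.link_name and self.category):
            raise ValueError("model name, link name and category are required")
        for entity in (self.entity_a, self.entity_b):
            if not (entity.name and entity.entity_type and entity.attributes):
                raise ValueError(f"entity {entity.name!r} needs a name, type and attribute")
        return self


call_model = RelationshipModel(
    name="Call record relationship model",
    entity_a=Entity("Own number", "Telephone", ["own number"]),
    entity_b=Entity("Counterpart number", "Telephone", ["counterpart number"]),
    link_name="Call relationship",
    link_attributes=["call time", "call type"],
    category="Communication relationship",
).validate()
print(call_model.entity_b.entity_type)  # Telephone
```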
Examples
Example 1
1. The business text library comprises a call record text library, a fund transaction text library and a personnel information text library. The call record text library contains a number of call record texts (file 1), the fund transaction text library contains a number of fund transaction texts (file 2), and the personnel information text library contains a number of personnel information texts (file 3);
2. The text data access module 11 connects to the business text library in import mode;
3. The text data access module 11 accesses the call record file, the fund transaction file and the personnel information file, and passes the information of these three data files to the relationship model configuration module 13;
4. The relationship models are set by the relationship model configuration module 13, specifically:
(1) Call record relationship model:
(a) Data source: the call record information, i.e. the call record file;
(b) Relationship model name: call record relationship model;
(c) Relationship model category: communication relationship;
(d) Entity A name: own number;
entity A attribute: own number;
entity A type: telephone;
(e) Entity B name: counterpart number;
entity B attribute: counterpart number;
entity B type: telephone;
(f) Link name: call relationship;
link attributes: call time, call type;
(2) Fund transaction relationship model:
(a) Data source: the fund transaction information, i.e. the transaction record file;
(b) Relationship model name: fund transaction relationship model;
(c) Relationship model category: transaction relationship;
(d) Entity A name: own account;
entity A attribute: own account;
entity A type: account;
(e) Entity B name: counterpart account;
entity B attribute: counterpart account;
entity B type: account;
(f) Link name: transaction relationship;
link attributes: transaction time, transaction type;
(3) Personnel information call relationship model:
(a) Data source: the personnel information, i.e. the personnel information file;
(b) Relationship model name: personnel information call relationship model;
(c) Relationship model category: personnel call relationship;
(d) Entity A name: personnel information;
entity A attributes: name, ID number;
entity A type: person;
(e) Entity B name: telephone number;
entity B attribute: contact number;
entity B type: telephone;
(f) Link name: contact method;
link attributes: none;
(4) Personnel information transaction relationship model:
(a) Data source: the personnel information, i.e. the personnel information file;
(b) Relationship model name: personnel information transaction relationship model;
(c) Relationship model category: personnel transaction relationship;
(d) Entity A name: personnel information;
entity A attributes: name, ID number;
entity A type: person;
(e) Entity B name: bank account;
entity B attribute: bank card number;
entity B type: account;
(f) Link name: fund account;
link attributes: none.
5. The impromptu task management module 14 executes the data relationship models. The resulting data consists of own number entities (Table 1), counterpart number entities (Table 2), call relationship links (Table 3), own account entities (Table 4), counterpart account entities (Table 5), transaction relationship links (Table 6), personnel information entities (Table 7), telephone number entities (Table 8), contact method links (Table 9), bank account entities (Table 10) and fund account links (Table 11);
6. The data relationship visual display module 21 performs visual analysis of the data relationships; on the basis of the visualization, entity cluster analysis, link path analysis and fuzzy search of chart content can also be performed. The visualization is shown in Fig. 8 and is formed in the same way as described above: each record analyzed from a data source yields an entity A and an entity B, and the link carries the identifiers of entities A and B. When the results of a newly analyzed source are fused for presentation and a link's entity identifier matches an entity already present in a loaded source, the new link is automatically attached to that existing entity, so that data from different sources is merged on identical entity identifiers and the linked entities form one relationship network.
7. The files and data are as follows:
(a) Call record file (File 1)
Own number,Call time,Call type,Counterpart number
1362634xxxx,2018-06-06 12:01:02,caller,1386649xxxx
1362634xxxx,2018-08-08 16:05:08,called,1392637xxxx
……
The call record file (file 1) is originally a txt file. After each line is split on commas according to the parsing rule, file 1 yields four data fields: own number, call time, call type and counterpart number.
(b) Transaction record file (File 2)
Own account,Transaction time,Transaction type,Counterpart account
52189909abcd4406,2018-04-08 10:01:02,transfer out,52189909abcd8606
52189909abcd4406,2018-07-09 14:05:08,transfer in,52189909abcd6789
……
The transaction record file (file 2) is originally a txt file. After each line is split on commas according to the parsing rule, file 2 yields four data fields: own account, transaction time, transaction type and counterpart account.
(c) Personnel information file (File 3)
ID number,Name,Bank card number,Contact number
11018219750120abcd,Zhang San,52189909abcd4406,1362634xxxx
11018219760118abcd,Li Si,52189909abcd8606,1386649xxxx
11018219780125abcd,Wang Wu,52189909abcd6789,1392637xxxx
……
The personnel information file (file 3) is originally a txt file. After each line is split on commas according to the parsing rule, file 3 yields four data fields: ID number, name, bank card number and contact number.
Own number entity (Table 1)
Entity name | Entity attribute | Entity type |
---|---|---|
Own number | 1362634xxxx | Telephone |
…… | | |
Counterpart number entity (Table 2)
Entity name | Entity attribute | Entity type |
---|---|---|
Counterpart number | 1386649xxxx | Telephone |
Counterpart number | 1392637xxxx | Telephone |
…… | | |
Call relationship link (Table 3)
Link name | Entity A identifier | Entity B identifier | Link attributes |
---|---|---|---|
Call relationship | 1362634xxxx | 1386649xxxx | 2018-06-06 12:01:02, caller |
Call relationship | 1362634xxxx | 1392637xxxx | 2018-08-08 16:05:08, called |
…… | | | |
Own account entity (Table 4)
Entity name | Entity attribute | Entity type |
---|---|---|
Own account | 52189909abcd4406 | Account |
…… | | |
Counterpart account entity (Table 5)
Entity name | Entity attribute | Entity type |
---|---|---|
Counterpart account | 52189909abcd8606 | Account |
Counterpart account | 52189909abcd6789 | Account |
…… | | |
Transaction relationship link (Table 6)
Link name | Entity A identifier | Entity B identifier | Link attributes |
---|---|---|---|
Transaction relationship | 52189909abcd4406 | 52189909abcd8606 | 2018-04-08 10:01:02, transfer out |
Transaction relationship | 52189909abcd4406 | 52189909abcd6789 | 2018-07-09 14:05:08, transfer in |
…… | | | |
Personnel information entity (Table 7)
Entity name | Entity attributes | Entity type |
---|---|---|
Personnel information | 11018219750120abcd, Zhang San | Person |
Personnel information | 11018219760118abcd, Li Si | Person |
Personnel information | 11018219780125abcd, Wang Wu | Person |
…… | | |
Telephone number entity (Table 8)
Entity name | Entity attribute | Entity type |
---|---|---|
Contact number | 1362634xxxx | Telephone |
Contact number | 1386649xxxx | Telephone |
Contact number | 1392637xxxx | Telephone |
…… | | |
Contact method link (Table 9)
Link name | Entity A identifier | Entity B identifier | Link attributes |
---|---|---|---|
Contact method | 11018219750120abcd | 1362634xxxx | |
Contact method | 11018219760118abcd | 1386649xxxx | |
Contact method | 11018219780125abcd | 1392637xxxx | |
…… | | | |
Bank account entity (Table 10)
Entity name | Entity attribute | Entity type |
---|---|---|
Bank account | 52189909abcd4406 | Account |
Bank account | 52189909abcd8606 | Account |
Bank account | 52189909abcd6789 | Account |
…… | | |
Fund account link (Table 11)
Link name | Entity A identifier | Entity B identifier | Link attributes |
---|---|---|---|
Fund account | 11018219750120abcd | 52189909abcd4406 | |
Fund account | 11018219760118abcd | 52189909abcd8606 | |
Fund account | 11018219780125abcd | 52189909abcd6789 | |
…… | | | |
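As a worked illustration of fusion and relationship path analysis on the data above, the following sketch fuses the link rows of Tables 3, 6, 9 and 11 by identifier and enumerates every simple path between the ID numbers of Zhang San and Li Si. The function names are hypothetical, links are treated as undirected, and link attributes are omitted for brevity.

```python
# Link rows copied from Tables 3, 6, 9 and 11 as (entity A ID, entity B ID).
TABLE_LINKS = {
    "call (Table 3)": [("1362634xxxx", "1386649xxxx"), ("1362634xxxx", "1392637xxxx")],
    "transaction (Table 6)": [("52189909abcd4406", "52189909abcd8606"),
                              ("52189909abcd4406", "52189909abcd6789")],
    "contact method (Table 9)": [("11018219750120abcd", "1362634xxxx"),
                                 ("11018219760118abcd", "1386649xxxx"),
                                 ("11018219780125abcd", "1392637xxxx")],
    "fund account (Table 11)": [("11018219750120abcd", "52189909abcd4406"),
                                ("11018219760118abcd", "52189909abcd8606"),
                                ("11018219780125abcd", "52189909abcd6789")],
}


def build_network(table_links):
    """Fuse all tables: identical IDs become one node, links are undirected."""
    network = {}
    for pairs in table_links.values():
        for a, b in pairs:
            network.setdefault(a, set()).add(b)
            network.setdefault(b, set()).add(a)
    return network


def simple_paths(network, node, goal, path=None):
    """Yield every path from node to goal that visits no entity twice."""
    path = (path or []) + [node]
    if node == goal:
        yield path
        return
    for nxt in sorted(network[node]):
        if nxt not in path:
            yield from simple_paths(network, nxt, goal, path)


network = build_network(TABLE_LINKS)
zhang_san, li_si = "11018219750120abcd", "11018219760118abcd"
for p in sorted(simple_paths(network, zhang_san, li_si), key=len):
    print(len(p) - 1, "links:", " -> ".join(p))
```

On these rows the search should find four paths: two three-link chains (one through the two telephone numbers, one through the two bank accounts) and two seven-link chains that pass through Wang Wu. Each short chain joins a link from the personnel file with a call or transaction record, which is the kind of cross-source relationship the fusion step is meant to expose.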
The invention has been described above in connection with preferred embodiments, which are, however, exemplary only and for illustrative purposes. Various substitutions and improvements can be made on this basis, all of which fall within the scope of protection of the invention.
Claims (7)
1. An impromptu relation analysis system based on text data, characterized in that the system comprises an application system (1), a presentation system (2) and a data system (3);
wherein the application system (1) comprises:
a text data access module (11), serving as the data source module, which connects to the text data of the business text library (31) and transmits the text data to the impromptu relation mapping module (12) and the relation model configuration module (13);
a relationship model configuration module (13) which receives the text data transmitted from the text data access module (11), sets a data relationship model based on the text data and the business requirements, and transmits the set data relationship model to the impromptu relationship mapping module (12);
an impromptu relation mapping module (12) for receiving the text data transmitted by the text data access module (11) and the data relation model information transmitted by the relation model configuration module (13), mapping the text data and the data relation model, and transmitting the mapping relation to the impromptu task management module (14);
the impromptu task management module (14) is used for controlling the data relation model to perform relation analysis on the accessed text data and transmitting analysis result data information generated in the process of operating the data relation model to the data relation visual display module (21);
the presentation system (2) comprises:
the data relationship visual display module (21) receives the analysis result data transmitted by the impromptu task management module (14) and visually displays it as a relationship graph;
the text data access module (11) comprises two text data access processes: the first accesses partial text information that includes the field names, and the text data so accessed is mapped to the data relationship model by the impromptu relationship mapping module (12); the second accesses all the data in the text, which is then subjected to data relationship analysis by the data relationship model;
the presentation system (2) further comprises a secondary data relationship analysis module (22), which performs secondary data analysis on the data of the data relationship visual display module (21) to discover deep hidden relationships in the data.
2. The analysis system according to claim 1, characterized in that the text data received by the text data access module (11) from the business text library (31) comprises structured data and semi-structured data, and the storage file formats comprise txt, Excel, XML and CSV documents.
3. The analysis system according to claim 1, characterized in that the text data access module (11) comprises:
a data source configuration sub-module for establishing a database link for accessing the business text library (31) and configuring the information for accessing the text data;
a data access sub-module for accessing the text data, the access modes including an import mode; the data access sub-module can access all or part of the text data;
the data analysis sub-module is used for formatting the accessed text data according to the set analysis rule;
a data preview sub-module for presenting the parsed text data so that business personnel can judge whether the parsed data format meets the requirements;
and the data output sub-module is used for transmitting the parsed text data to the corresponding data receiving module.
4. The analysis system according to claim 1, wherein the relational model configuration module (13) comprises:
a data relationship model name sub-module to store the name of the data relationship model;
the entity configuration submodule configures two entities to be associated and stores entity information;
a link configuration sub-module to store data relationship information of the data relationship model;
and the data relation model category sub-module is used for setting the classification of the data relation model.
5. The analysis system according to claim 4, wherein the entity information includes an entity name, an entity type and entity attributes, wherein multiple entity attributes can be added according to business requirements, and the entity attribute information of the two entities to be associated is derived from text data fields of the same type.
6. The analysis system according to claim 1, wherein the secondary analysis comprises object cluster analysis, relationship path analysis and fuzzy search of chart content.
7. The analysis system according to claim 1, characterized in that the data system (3) comprises:
a business text library (31), including a call record text library, a fund transaction text library and other conventional text data;
a system configuration library (32) for storing data generated inside the system, including text access rule information, impromptu relation mapping information and impromptu task management information;
and a relationship model library (33) for storing relationship model basic information, entity attribute information, and link attribute information stored by the relationship model configuration module (13) when defining a relationship model.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811360803.2A CN111190965B (en) | 2018-11-15 | 2018-11-15 | Impromptu relation analysis system and method based on text data |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111190965A | 2020-05-22 |
CN111190965B | 2023-11-10 |
Family ID: 70707526
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811360803.2A Active CN111190965B (en) | 2018-11-15 | 2018-11-15 | Impromptu relation analysis system and method based on text data |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111190965B (en) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111814442A (en) * | 2020-06-29 | 2020-10-23 | 四川长虹电器股份有限公司 | Excel data processing method based on SpringBoot |
CN112115367B (en) * | 2020-09-28 | 2024-04-02 | 北京百度网讯科技有限公司 | Information recommendation method, device, equipment and medium based on fusion relation network |
CN113254436A (en) * | 2021-07-15 | 2021-08-13 | 深圳市信润富联数字科技有限公司 | Hadoop-based data management system and method |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101923549A (en) * | 2009-07-29 | 2010-12-22 | 北京航天理想科技有限公司 | User-defined visual intelligent track clue analytical system and establishing method |
CN103093322A (en) * | 2013-02-21 | 2013-05-08 | 用友软件股份有限公司 | System and method for impromptu analyzing business data |
CN104731814A (en) * | 2013-12-23 | 2015-06-24 | 北京宸瑞科技有限公司 | System and method for flexibly comparing and analyzing data |
CN106055545A (en) * | 2015-04-10 | 2016-10-26 | 穆西格马交易方案私人有限公司 | Text mining system and tool |
CN108197237A (en) * | 2017-12-29 | 2018-06-22 | 北京恒泰实达科技股份有限公司 | Visualization data, which collect, shows system |
CN108694179A (en) * | 2017-04-06 | 2018-10-23 | 北京宸瑞科技股份有限公司 | Personage's view analysis system based on attribute extraction and method |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CA2508791A1 (en) * | 2002-12-06 | 2004-06-24 | Attensity Corporation | Systems and methods for providing a mixed data integration service |
US20170017708A1 (en) * | 2015-07-17 | 2017-01-19 | Sqrrl Data, Inc. | Entity-relationship modeling with provenance linking for enhancing visual navigation of datasets |
2018-11-15: application CN201811360803.2A filed in CN (CN111190965B, active)
Also Published As
Publication number | Publication date |
---|---|
CN111190965A (en) | 2020-05-22 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication ||
SE01 | Entry into force of request for substantive examination ||
GR01 | Patent grant ||