CN116756088A - Method for analyzing character relationship in file and related equipment - Google Patents

Info

Publication number
CN116756088A
Authority
CN
China
Prior art keywords
relationship
person
data
name
archive
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311047547.2A
Other languages
Chinese (zh)
Inventor
罗华山
肖斌
雷鸣
包攀
陈雪婷
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hunan Cloud Archive Information Technology Co ltd
Original Assignee
Hunan Cloud Archive Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hunan Cloud Archive Information Technology Co ltd
Priority to CN202311047547.2A
Publication of CN116756088A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10: File systems; File servers
    • G06F16/11: File system administration, e.g. details of archiving or snapshots
    • G06F16/113: Details of archiving
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10: File systems; File servers
    • G06F16/11: File system administration, e.g. details of archiving or snapshots
    • G06F16/122: File system administration, e.g. details of archiving or snapshots using management policies
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20: Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25: Integrating or interfacing systems involving database management systems
    • G06F16/254: Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses


Abstract

The application relates to the technical field of archive management, and in particular to a method and related equipment for analyzing person relationships in archives. The method comprises: acquiring a corresponding person name extraction rule according to the archive type in the archive data; obtaining the name extraction requirement and the corresponding name extraction mode; judging whether the archive data meets the name extraction requirement; if the requirement is met, extracting the person names in the archive data according to the name extraction mode; updating the archive data according to the person names to form archive update data; extracting the association relationships among the person names according to a preset relationship extraction rule and the archive update data to form person relationships; analyzing each person relationship according to a preset relationship analysis mode and the archive update data to form person relationship data; and combining the person names and the person relationship data to generate a relationship network corresponding to the archive to be analyzed. The application improves both the efficiency and the accuracy of person relationship analysis.

Description

Method for analyzing character relationship in file and related equipment
Technical Field
The application relates to the technical field of archive management, in particular to a method and related equipment for analyzing a person relationship in an archive.
Background
Because archives are created by a very wide range of parties, the archives produced in different industries contain many kinds of person relationships. The traditional approach of analyzing these relationships through manual annotation is labor-intensive, and subjective judgment can lead to inaccurate annotations, so that the recorded relationships do not match the actual situation.
Therefore, to solve the above problems, there is an urgent need for a method and related equipment for analyzing person relationships in archives that improve both the efficiency and the accuracy of the analysis.
Disclosure of Invention
In order to achieve the effects of improving the analysis efficiency of the character relationship and improving the analysis accuracy, the application provides an analysis method and related equipment of the character relationship in the file.
In a first aspect, the present application provides a method for analyzing relationships between characters in a file, including the steps of:
acquiring file data corresponding to a file to be analyzed;
acquiring a corresponding character name extraction rule according to the file type in the file data;
analyzing the character name extraction rule to obtain name extraction requirements and corresponding name extraction modes;
judging whether the archive data meets the name extraction requirement;
if the name extraction requirement is met, extracting the name of the person in the archive data according to the name extraction mode;
updating the archive data according to the name of the person to form archive update data;
extracting the association relation among the person names to form a person relation according to a preset relation extraction rule and the archive update data;
analyzing each person relationship to form person relationship data according to a preset relationship analysis mode and the archive update data;
and combining the person names and the person relation data to generate a relation network corresponding to the file to be analyzed.
Optionally, after the determining whether the archive data meets the name extraction requirement, the method further includes:
if the name extraction requirement is not met, acquiring a preset preprocessing mode;
parsing the preset preprocessing mode to obtain the sub-processing modes and a corresponding processing order table;
acquiring a current processing order according to the processing order table;
processing the archive data according to the sub-processing mode corresponding to the current processing order until the archive data meets the corresponding sub-processing requirement;
and updating the current processing order according to the processing order table and returning to the previous step until the current processing order is the last processing order in the processing order table.
Optionally, the name extraction method is named entity recognition, and if the name extraction requirement is met, extracting the name of the person in the archive data according to the name extraction method includes:
if the name extraction requirement is met, carrying out named entity recognition on the archive data, and extracting a name recognition result belonging to the name type;
acquiring a determined name in the name recognition result according to the person naming rule;
acquiring the alternative names (aliases) of each determined name in the name recognition result to form a name replacement relation;
and acquiring the person name corresponding to the archive data according to the determined name and the name replacement relation.
Optionally, the updating the profile data according to the person name to form profile update data includes:
according to the name replacement relation, the names in the archive data are respectively replaced with the corresponding determined names to form first archive data;
acquiring each personal pronoun in the first archive data and its corresponding data position in the first archive data;
and replacing each personal pronoun with the corresponding determined name according to the context of the data position to form the archive update data.
Optionally, the extracting the association relationship between the person names to form a person relationship according to a preset relationship extraction rule and the archive update data includes:
extracting the association relation among the person names according to a preset relation extraction rule and the archive update data;
judging whether a contradiction relation exists in the association relation;
if the contradiction relation exists, the person name directly corresponding to the contradiction relation is obtained and used as a comparison person name;
acquiring corresponding behavior characteristics according to the relationship type of the contradictory relationship;
and if the behavior characteristics exist among the comparison person names, determining the person relation among the comparison person names according to the behavior characteristics.
Optionally, after the obtaining the corresponding behavior feature according to the relationship type of the contradictory relationship, the method further includes:
if the behavior characteristics do not exist between the comparison person names, acquiring corresponding attribute characteristics according to the relationship type;
and if the comparison person names respectively accord with the attribute characteristics, determining the person relation between the comparison person names according to the attribute characteristics.
Optionally, the analyzing each person relationship to form person relationship data according to a preset relationship analysis mode and the archive update data includes:
analyzing the preset relation analysis mode to obtain the relation type;
analyzing the relationship types corresponding to the person relationships according to the relationship types and the archive update data to form relationship corresponding types;
and forming the figure relationship data according to the relationship corresponding type and the figure relationship.
Optionally, the forming the person relationship data according to the relationship correspondence type and the person relationship includes:
acquiring a corresponding relationship strength index according to the relationship type;
analyzing the relationship strength of each person relationship in each corresponding relationship type according to the relationship strength index;
and taking each person relationship as a reference, and combining the corresponding relationship type and the relationship strength to form the person relationship data.
In a second aspect, the present application also provides a system for analyzing relationships between people in an archive, including:
the first acquisition module is used for acquiring file data corresponding to the file to be analyzed;
the second acquisition module is used for acquiring name extraction requirements and corresponding name extraction modes according to the file types in the file data;
the first judging module is used for judging whether the archive data accords with the name extraction requirement or not;
the first extraction module is used for extracting the name of the person in the archive data according to the name extraction mode if the name extraction requirement is met;
the first updating module is used for updating the archive data according to the name of the person to form archive updating data;
the second extraction module is used for extracting the association relationship among the person names to form a person relationship according to a preset relationship extraction rule and the archive update data;
the first analysis module is used for analyzing each person relationship to form person relationship data according to a preset relationship analysis mode and the archive update data;
and the first generation module is used for combining the person name and the person relation data to generate a relation network corresponding to the file to be analyzed.
In a third aspect, the present application also provides a computer readable storage medium storing a computer program which, when executed by a processor, implements a method for analyzing character relationships in an archive as described in any one of the above.
In summary, according to the method and the related device for analyzing the relationship of the characters in the file, the name extraction requirement and the corresponding name extraction mode corresponding to the file type are obtained, the character names in the file data conforming to the name extraction requirement are extracted according to the name extraction mode, the file data are updated according to the extracted character names, the relationship of the characters in the updated file data is extracted according to the preset relationship extraction rule, the relationship of the characters is analyzed according to the preset relationship analysis mode to form the relationship data, and the character names and the relationship data are combined to generate the relationship network. The character name, the character relation and the character relation data are obtained according to the corresponding requirements, rules, modes and the like selected by the file types, so that the data required by the relation network are obtained according to different file types, and the effects of improving the character relation analysis efficiency and the analysis accuracy are achieved.
Drawings
FIG. 1 is a flowchart illustrating steps S101 to S108 of a method for analyzing relationships among persons in a file according to an embodiment of the present application;
FIG. 2 is a flowchart illustrating steps S201 to S205 of a method for analyzing relationships among persons in a file according to an embodiment of the present application;
FIG. 3 is a flowchart illustrating steps S301 to S304 of a method for analyzing relationships among persons in a file according to an embodiment of the present application;
FIG. 4 is a flowchart illustrating steps S401 to S403 of a method for analyzing relationships among persons in a file according to an embodiment of the present application;
FIG. 5 is a flowchart illustrating steps S501 to S505 of a method for analyzing relationships among persons in a file according to an embodiment of the present application;
FIG. 6 is a flowchart illustrating steps S601 to S602 of a method for analyzing relationships among persons in a file according to an embodiment of the present application;
FIG. 7 is a flowchart illustrating steps S701 to S703 of a method for analyzing relationships among persons in a file according to an embodiment of the present application;
FIG. 8 is a flowchart illustrating steps S801 to S803 of a method for analyzing relationships among persons in a file according to an embodiment of the present application;
FIG. 9 is a block diagram of one embodiment of a system for analyzing relationships among people in a file according to an embodiment of the present application.
Detailed Description
In order to make the technical solution of the present application better understood by those skilled in the art, the technical solution of the present application will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present application, and it is apparent that the described embodiments are only some embodiments of the present application, but not all embodiments of the present application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the present application without making any inventive effort, shall fall within the scope of the present application.
In a first aspect, as shown in fig. 1, the method for analyzing character relationships in an archive provided by the present application includes the following steps:
S101, acquiring archive data corresponding to an archive to be analyzed;
S102, acquiring a corresponding name extraction requirement and a corresponding name extraction mode according to the archive type in the archive data;
S103, judging whether the archive data meets the name extraction requirement;
S104, if the name extraction requirement is met, extracting the person names in the archive data according to the name extraction mode;
S105, updating the archive data according to the person names to form archive update data;
S106, extracting the association relationships among the person names according to a preset relationship extraction rule and the archive update data to form person relationships;
S107, analyzing each person relationship according to a preset relationship analysis mode and the archive update data to form person relationship data;
S108, combining the person names and the person relationship data to generate a relationship network corresponding to the archive to be analyzed.
In step S101, the archive to be analyzed is an archive of an archive-creating unit whose person relationships are to be analyzed. The corresponding archive data includes the user type of the archive-creating unit, unit information, the archive type, the number of archives, the specific archive content, and the like.
Person relationship analysis is based on person names. Because the archives of different archive-creating units differ in archive type, the specific requirements and modes for extracting person names may also differ, so step S102 is executed: the corresponding name extraction requirement and name extraction mode are acquired according to the archive type in the archive data. The name extraction requirement is a preset requirement for extracting names, defined for each archive type; the name extraction mode is the preset specific mode of extracting names, likewise defined for each archive type.
It should be noted that the archive type is directly related to the archive-creating unit; in general, different types of archive-creating units have different archive types. In addition, if the archive type is missing from the archive data, it must be supplemented or redetermined.
As mentioned above, different file types have corresponding name extraction requirements, but the current file data does not necessarily meet the name extraction requirements corresponding to the file types, so step S103 is needed to determine whether the file data meets the name extraction requirements. For example, the name extraction requirement is that the archive data is text data, whether the current archive data is text data is judged, and if the current archive data is not text data, the name extraction requirement is not satisfied, so that the name cannot be extracted.
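Steps S102 and S103 can be sketched as a lookup followed by a check. This is only an illustrative sketch, assuming the preset rules are held in a table keyed by archive type; the `ExtractionRule` type, the `RULES` table, the archive type names and the dictionary keys are all hypothetical and do not come from the application.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class ExtractionRule:
    # Name extraction requirement: a predicate over the archive data (S103).
    requirement: Callable[[dict], bool]
    # Label of the name extraction mode to apply in S104.
    mode: str


def _is_text(archive_data: dict) -> bool:
    # Example requirement from the description: the archive data is text data.
    return isinstance(archive_data.get("content"), str)


# Hypothetical preset table; real archive types would be defined per deployment.
RULES = {
    "personnel_file": ExtractionRule(requirement=_is_text, mode="ner"),
    "meeting_minutes": ExtractionRule(requirement=_is_text, mode="ner"),
}


def get_rule(archive_data: dict) -> ExtractionRule:
    """S102: pick the rule that matches the archive type."""
    archive_type = archive_data.get("archive_type")
    if archive_type not in RULES:
        # The description says a missing type must be supplemented or redetermined.
        raise KeyError(f"no name extraction rule for archive type {archive_type!r}")
    return RULES[archive_type]


def meets_requirement(archive_data: dict) -> bool:
    """S103: judge whether the archive data meets the name extraction requirement."""
    return get_rule(archive_data).requirement(archive_data)
```

Keeping the requirement as a predicate lets each archive type carry its own check without changing the control flow of S103.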
If the name extraction requirement is not met, different follow-up operations can be selected according to the actual situation. For example, another archive type that has the highest similarity to the current archive type and reaches the similarity threshold may be taken as a comparison type. The name extraction requirement and the corresponding name extraction mode of the comparison type are acquired, steps S105 to S107 are performed directly, and a verification step is added after step S107 to determine whether the person relationship data is consistent with the conventional relationship data of the archive to be analyzed. If so, step S108 is performed, after which the archive type and the corresponding comparison type are marked in the relationship network; if not, execution stops and prompt information is generated.
Alternatively, a historical relationship network corresponding to the current archive type can be used to infer, in reverse, the corresponding name extraction mode, preset relationship extraction rule and preset relationship analysis mode. Steps S104 to S107 are then executed, a verification step is added after step S107, and the subsequent flow is similar to that of the similarity-based scheme.
If the name extraction requirement is met, the current archive data is indicated to meet the name extraction requirement, and the person name in the archive data is extracted according to the name extraction method in step S104. The specific mode of the name extraction mode can be selected according to actual needs, for example, the file data is processed by using a natural language processing technology, and the name of the person in the file data is identified; or a preset name recognition model is adopted to recognize and extract the name of the person in the archive data, and training and optimizing of the name recognition model can be performed for specific archive types and/or archive units, and specific modes are not described in detail herein.
After extracting the person names in the archive data, the archive data needs to be updated by the person names so that the same person has a unique person name, and then step S105 is executed to update the archive data according to the person names to form archive update data, and step S106 is executed to extract the association relations between the person names to form the person relations according to the preset relation extraction rules and the archive update data.
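The name unification part of step S105 can be illustrated with a short sketch. It assumes that a mapping from alternative names to determined names is already available; the function name `unify_names` and the sample names are hypothetical. Pronoun resolution, which depends on context, is deliberately left out.

```python
import re


def unify_names(text: str, alias_to_name: dict[str, str]) -> str:
    """Replace every alternative name with its determined name."""
    if not alias_to_name:
        return text
    # Try longer aliases first so that a short alias never splits a longer one.
    aliases = sorted(alias_to_name, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(alias) for alias in aliases))
    # A single pass: text that has already been replaced is not scanned again.
    return pattern.sub(lambda match: alias_to_name[match.group(0)], text)


# Hypothetical sample data.
aliases = {"Old Zhang": "Zhang Wei", "Director Zhang": "Zhang Wei"}
updated = unify_names("Old Zhang met Li Na. Director Zhang signed the form.", aliases)
print(updated)  # Zhang Wei met Li Na. Zhang Wei signed the form.
```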
The preset relationship extraction rule is a preset rule for extracting association relationships between person names, for example which types of association relationship are extracted and the specific association hierarchy of each relationship; it can be determined according to the archive type. The association relationships can be extracted with natural language processing and text analysis techniques, using methods such as rule matching, semantic role labeling and coreference resolution, or with machine learning algorithms such as conditional random fields (CRF) or deep learning models (such as recurrent neural networks), which are not described further herein.
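Of the methods listed, rule matching is the simplest to illustrate. The sketch below assumes that two person names appearing in the same sentence as a trigger phrase are related; the trigger phrases, relationship types and function name are hypothetical, and the sketch does not determine the direction of a relationship.

```python
import re
from itertools import combinations

# Hypothetical trigger phrases; real rules would depend on the archive type.
RELATION_PATTERNS = {
    "colleague": re.compile(r"\b(worked with|colleague of)\b"),
    "supervisor": re.compile(r"\b(reported to|supervised by)\b"),
}


def extract_relations(text: str, names: list[str]) -> list[tuple[str, str, str]]:
    """Return (name_a, name_b, relationship_type) for co-occurring names."""
    relations = []
    # Naive sentence split on terminal punctuation followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        present = [name for name in names if name in sentence]
        for name_a, name_b in combinations(present, 2):
            for relation_type, pattern in RELATION_PATTERNS.items():
                if pattern.search(sentence):
                    relations.append((name_a, name_b, relation_type))
    return relations
```

The sketch works best on archive update data, where each person already has a unique name, because the matching is by exact string.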
After the person relationships are obtained, further corresponding analysis and mining are required according to the person relationships, so as to obtain deeper insight and understanding from the archive data, and reveal complex interactions between the persons, social networks and the like, therefore, step S107 is performed, namely, according to a preset relationship analysis mode and archive update data, each person relationship is analyzed to form the person relationship data. The preset relationship analysis mode is a preset specific mode for analyzing the relationship of the people, and can be determined according to the file type, so as to analyze similar contents among the relationships of the people, such as the degree of intersection, dynamic change or appearance characteristics among the relationships of the people, and the like.
After the person relationship data is obtained, the person relationship data needs to be corresponding to the corresponding person name so as to generate a person relationship analysis result, so that step S108 is performed, namely, the person name and the person relationship data are combined to generate a relationship network corresponding to the file to be analyzed, which can be understood as that the person name is a node, the person relationship is an edge, and the extracted person relationship data are linked and combined to form the relationship network in the form of a directed graph or an undirected graph.
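A minimal sketch of step S108 for the undirected case, assuming the person relationship data is supplied as (name, name, attributes) triples. The function name and the attribute keys are hypothetical; the graph is a plain dictionary so that no third-party library is needed.

```python
def build_network(names, relation_data):
    """Build an undirected relationship network.

    names: iterable of person names (the nodes).
    relation_data: iterable of (name_a, name_b, attributes) triples (the edges).
    Returns {name: {neighbor: attributes}}.
    """
    graph = {name: {} for name in names}
    for name_a, name_b, attributes in relation_data:
        # Undirected: store the edge under both endpoints.
        graph.setdefault(name_a, {})[name_b] = attributes
        graph.setdefault(name_b, {})[name_a] = attributes
    return graph


# Hypothetical sample data.
network = build_network(
    ["Zhang Wei", "Li Na", "Chen Yu"],
    [("Zhang Wei", "Li Na", {"type": "colleague", "strength": 0.8})],
)
```

A person with no extracted relationship, such as "Chen Yu" here, still appears as an isolated node.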
In practical application, two kinds of algorithms can be used with the relationship network: network analysis, in which the constructed person relationship data is analyzed with graph theory and social network analysis methods, using measures such as centrality and betweenness centrality, to reveal key persons and groups; or pattern recognition, in which a machine learning algorithm identifies patterns and trends in the person relationship data, such as the formation of person groups and the evolution of relationships.
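As one example of such a measure, degree centrality can be computed directly from an adjacency mapping. This sketch assumes an undirected network stored as {name: {neighbor: attributes}}; that representation and the function name are assumptions, and betweenness centrality is not shown.

```python
def degree_centrality(graph: dict[str, dict]) -> dict[str, float]:
    """Fraction of the other nodes that each node is directly connected to."""
    node_count = len(graph)
    if node_count < 2:
        return {node: 0.0 for node in graph}
    return {
        node: len(neighbors) / (node_count - 1)
        for node, neighbors in graph.items()
    }


# Hypothetical sample network: "A" is connected to both other persons.
sample = {"A": {"B": {}, "C": {}}, "B": {"A": {}}, "C": {"A": {}}}
scores = degree_centrality(sample)
key_person = max(scores, key=scores.get)
```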
In practical use, the person relationship network can be displayed visually using a computer and a display screen. An interactive graphical interface may be created with a graphics library (e.g., D3.js or Matplotlib), presenting the relationship network as nodes and edges and arranging the nodes with a layout algorithm (e.g., a force-directed layout). Through interactions such as mouse hovering and clicking, a user can browse and analyze the person relationship network and view specific person relationship data.
According to the analysis method for the character relationship in the archive, the character names in the archive data meeting the name extraction requirements are extracted according to the name extraction mode by acquiring the name extraction requirements corresponding to the archive types and the corresponding name extraction mode, the archive data is updated according to the extracted character names, the character relationship in the updated archive data is extracted according to the preset relationship extraction rule, the character relationship is analyzed according to the preset relationship analysis mode to form character relationship data, and the character names and the character relationship data are combined to generate a relationship network. The character name, the character relation and the character relation data are obtained according to the corresponding requirements, rules, modes and the like selected by the file types, so that the data required by the relation network are obtained according to different file types, and the effects of improving the character relation analysis efficiency and the analysis accuracy are achieved.
In one implementation manner of the present embodiment, as shown in fig. 2, after determining whether the archive data meets the name extraction requirement in step S103, the method further includes:
S201, if the name extraction requirement is not met, acquiring a preset preprocessing mode;
S202, parsing the preset preprocessing mode to obtain the sub-processing modes and a corresponding processing order table;
S203, acquiring a current processing order according to the processing order table;
S204, processing the archive data according to the sub-processing mode corresponding to the current processing order until the archive data meets the corresponding sub-processing requirement;
S205, updating the current processing order according to the processing order table and returning to the previous step until the current processing order is the last processing order in the processing order table.
If the name extraction requirement is not met, step S201 is executed to acquire the preset preprocessing mode, and step S202 is executed to parse the preset preprocessing mode and obtain the sub-processing modes and the corresponding processing order table. The preset preprocessing mode is a preset mode for preprocessing the archive data, the sub-processing modes are the specific processing modes it contains, and the processing order table lists the order in which the sub-processing modes are applied.
It should be noted that the specific sub-processing modes and the corresponding processing order table in the preset preprocessing mode may differ according to the archive type. In this embodiment, the sub-processing modes, in processing order, are: text parsing, cleaning and standardization, data deduplication, data verification and data normalization. Each sub-processing mode is described below:
Text parsing: the archive text is parsed into computer-understandable structured data. This step may use natural language processing techniques such as word segmentation, part-of-speech tagging, etc., to split the text into individual words or phrases, while identifying the boundaries and structure of the sentence.
Cleaning and standardization: after text parsing, the resulting data needs to be cleaned and standardized. Cleaning removes non-critical information from the text, such as punctuation and stop words, to reduce noise and redundancy. Standardization mainly unifies the format and naming convention of the data, for example by mapping person names to a standard name or identifier.
Data deduplication: for large-scale archival data, there are often duplicate records. In the data preprocessing stage, data deduplication is required to avoid introducing redundant information in subsequent processing.
Data verification: during preprocessing, the integrity and consistency of the data also need to be checked, for example by checking whether person names are misspelled or missing and verifying that the relevant fields are complete, to ensure the accuracy and reliability of the data.
Data normalization: if the archive text is from a different source or format, there may be a large variance. In order to achieve better character relation extraction and analysis effects, normalization processing is needed to be carried out on the data, so that the data has uniform structure and semantics.
Archive data that does not meet the name extraction requirement must be processed by the sub-processing modes in the order given by the processing order table, so step S203 is executed to acquire the current processing order from the processing order table. The current processing order here can be understood as the initial processing order; it is determined by the archive type and differs between archive types.
Step S204 may be considered as a step circularly executed according to the processing order table, that is, processing the archive data according to the sub-processing mode corresponding to the current processing order until the archive data meets the corresponding sub-processing requirement, and after each time the current processing order is executed and the sub-processing requirement is met, step S205 is executed, updating the current processing order according to the processing order table and executing again, and circularly executing until the current processing order is the last processing order in the processing order table.
It should be noted that, when step S204 is executed, the specific rule for determining whether the archive data meets the sub-processing requirement may be selected according to actual needs. For example, the requirement may be considered met when a corresponding processing threshold is reached, or when a certain processing time has elapsed. If a sub-processing requirement is not met, the loop can be stopped, or the number of sub-processing modes whose requirements are not met can be counted and used to decide whether to stop the loop.
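The loop of steps S203 to S205 can be sketched as follows. This is a sketch under stated assumptions: each sub-processing mode is modeled as a (transform, check) pair, the stopping rule is a hypothetical `max_rounds` attempt limit, and the two example steps in `EXAMPLE_STEPS` are illustrative only.

```python
from typing import Callable

# A sub-processing mode: (transformation, check of its sub-processing requirement).
Step = tuple[Callable[[str], str], Callable[[str], bool]]

# Hypothetical example steps.
EXAMPLE_STEPS: dict[str, Step] = {
    # Collapse runs of whitespace; requirement: no double spaces remain.
    "cleaning": (lambda s: " ".join(s.split()), lambda s: "  " not in s),
    # Drop an honorific; requirement: the honorific no longer appears.
    "standardization": (lambda s: s.replace("Mr. ", ""), lambda s: "Mr. " not in s),
}


def preprocess(data: str, order_table: list[str], steps: dict[str, Step],
               max_rounds: int = 3) -> str:
    """Apply each sub-processing mode in table order until its requirement holds."""
    for step_name in order_table:          # S203 / S205: walk the order table
        transform, satisfied = steps[step_name]
        for _ in range(max_rounds):        # S204: repeat until the requirement is met
            data = transform(data)
            if satisfied(data):
                break
        else:
            # Assumed stopping rule: give up after max_rounds attempts.
            raise RuntimeError(f"sub-processing requirement not met: {step_name}")
    return data


cleaned = preprocess("Mr.  Zhang   Wei", ["cleaning", "standardization"], EXAMPLE_STEPS)
```

Raising an error corresponds to the "stop the loop" option; counting failed steps instead would be a small change to the `else` branch.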
In addition, in the above processing of the archival data, natural language processing tools and techniques, such as NLTK (Natural Language Toolkit), spaCy, etc., may be used to speed up processing and improve accuracy, and the detailed description thereof will not be provided herein.
According to the method for analyzing person relationships in an archive provided by this embodiment, preprocessing is performed in order and in a loop according to the sub-processing modes of the preset preprocessing mode and the corresponding processing order table, until the sub-processing requirement of each sub-processing mode is met, so that archive data that initially does not meet the name extraction requirement meets it after processing.
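As a non-limiting illustration, the ordered loop of steps S203 to S205 might be sketched as follows. The sub-processing modes `clean` and `normalize`, their requirement checks, and the `max_rounds` limit are hypothetical names and values chosen only for this sketch; the embodiment does not prescribe them.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical sub-processing mode: (processing function, requirement check).
SubMode = Tuple[Callable[[str], str], Callable[[str], bool]]


def strip_control_chars(text: str) -> str:
    """Drop non-printable characters, keeping newlines."""
    return "".join(ch for ch in text if ch.isprintable() or ch == "\n")


def collapse_whitespace(text: str) -> str:
    """Replace every run of whitespace with a single space."""
    return " ".join(text.split())


SUB_MODES: Dict[str, SubMode] = {
    "clean": (
        strip_control_chars,
        lambda t: all(ch.isprintable() or ch == "\n" for ch in t),
    ),
    "normalize": (
        collapse_whitespace,
        lambda t: "  " not in t and t == t.strip(),
    ),
}


def preprocess(text: str, order_table: List[str], max_rounds: int = 3) -> str:
    """Apply each sub-processing mode in the order given by the table."""
    for current in order_table:  # S203 / S205: take the next processing order
        process, requirement_met = SUB_MODES[current]
        for _ in range(max_rounds):  # S204: repeat until the requirement holds
            text = process(text)
            if requirement_met(text):
                break
        else:
            # Assumed stop rule: give up once the round limit is reached.
            raise ValueError(f"sub-processing requirement not met: {current}")
    return text


print(preprocess("Zhang  San\x00 met   Wang Er ", ["clean", "normalize"]))
```

Raising an error when the round limit is reached corresponds to the "stop the loop" option described above; counting failed sub-processing modes instead would be an equally valid choice.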
In one implementation of this embodiment, as shown in fig. 3, the name extraction mode is named entity recognition, and step S104, namely extracting the person names in the archive data according to the name extraction mode if the name extraction requirement is met, includes:
S301, if the name extraction requirement is met, performing named entity recognition on the archive data and extracting the name recognition results belonging to the person-name type;
S302, acquiring the determined names in the name recognition results according to a person naming rule;
S303, acquiring the other names by which each determined name is referred to in the name recognition results, to form a name substitution relation;
S304, acquiring the person names corresponding to the archive data according to the determined names and the name substitution relation.
If the name extraction requirement is met, only the name recognition results belonging to the person-name type need to be extracted, so step S301 is executed: named entity recognition is performed on the archive data and the recognition results of the person-name type are extracted. Named entity recognition (NER) is a natural language processing technique used here to process text data and recognize the person names in a document.
Among the name recognition results obtained in step S301, the same person in the archive data may be referred to by several names, such as Zhang San, Zhang Tertiary, and Wang Er. Step S302 therefore needs to be executed to obtain the determined name in the name recognition results according to the person naming rule. In this embodiment, the determined name is the name fixed for a person, and the person naming rule is a preset rule for selecting that name from among the several names.
After the determined names are obtained, step S303 is executed to obtain the other names by which each determined name is referred to in the name recognition results, forming the name substitution relation, and step S304 is executed to obtain the person names corresponding to the archive data according to the determined names and the name substitution relation. For example, if the determined name is Zhang San and the other names are Zhang Tertiary and Wang Er, the name substitution relation maps Zhang Tertiary, Wang Er, and so on to Zhang San, and the person name corresponding to the archive data is therefore Zhang San.
According to the method for analyzing person relationships in an archive provided by this embodiment, the name recognition results belonging to the person-name type are extracted from the archive data, the determined names are obtained from them using the person naming rule, the name substitution relation is formed from the other names of each determined name, and the person names corresponding to the archive data are then obtained from the determined names and the name substitution relation. Because each person then has a single determined person name in the archive data, errors or even failures in extracting or analyzing person relationships caused by the presence of other names are avoided.
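As a non-limiting illustration, steps S302 to S304 might be sketched as follows. The sketch assumes that named entity recognition (S301) has already produced the person-name mentions and that mentions of the same person have already been grouped; the "most frequent mention wins" naming rule and all function names are assumptions made for this sketch only.

```python
from collections import Counter
from typing import Dict, List


def pick_determined_name(mentions: List[str]) -> str:
    """Hypothetical naming rule: most frequent mention, ties go to the earliest."""
    counts = Counter(mentions)
    return max(counts, key=lambda name: (counts[name], -mentions.index(name)))


def build_substitution(alias_groups: List[List[str]]) -> Dict[str, str]:
    """S302-S303: map every other name in a group to the group's determined name."""
    substitution: Dict[str, str] = {}
    for group in alias_groups:
        determined = pick_determined_name(group)
        for alias in set(group):
            if alias != determined:
                substitution[alias] = determined
    return substitution


def person_names(mentions: List[str], substitution: Dict[str, str]) -> List[str]:
    """S304: resolve mentions to determined names, deduplicated in first-seen order."""
    resolved = [substitution.get(mention, mention) for mention in mentions]
    return list(dict.fromkeys(resolved))


# Mentions of one person, as they might come out of an NER step.
group = ["Zhang San", "Zhang Tertiary", "Zhang San", "Wang Er"]
substitution = build_substitution([group])
print(person_names(["Zhang Tertiary", "Wang Er", "Zhang San", "Zhang Wu"], substitution))
```

In practice the grouping of mentions would come from the preset person naming rule or from an alias dictionary; the frequency rule above is only one simple possibility.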
In one implementation of this embodiment, as shown in fig. 4, step S105, namely updating the archive data according to the person names to form archive update data, includes:
S401, replacing the other names in the archive data with the corresponding determined names according to the name substitution relation, to form first archive data;
S402, acquiring each personal pronoun in the first archive data and its corresponding data position in the first archive data;
S403, replacing each personal pronoun with the corresponding determined name according to the context around its data position, to form the archive update data.
After the person names of the archive data are determined, the other names need to be handled accordingly, and direct replacement is generally used. Step S401 is therefore executed: the other names in the archive data are replaced with the corresponding determined names according to the name substitution relation, forming the first archive data.
In practice, however, person relationship extraction is affected not only by the other names of each person but also by personal pronouns such as you, I, he, she, and we, for which the corresponding person name must be made explicit. Step S402 is therefore executed first to obtain each personal pronoun in the first archive data and its corresponding data position. Once the data positions are obtained, step S403 is executed: according to the context before and after each data position, the personal pronoun is replaced with the corresponding determined name, forming the archive update data.
According to the method for analyzing person relationships in an archive provided by this embodiment, the other names in the archive data are first replaced with the corresponding determined names, and the personal pronouns are then replaced with the corresponding determined names based on their data positions and the surrounding context, forming the archive update data. Each person then has only one unique determined person name in the archive data, so errors or even failures in extracting or analyzing person relationships caused by personal pronouns are avoided.
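As a non-limiting illustration, steps S401 to S403 might be sketched as follows. The context rule used here (replace a pronoun with the determined name mentioned most recently before it) is a deliberately naive assumption, as is the restriction to the pronouns "he" and "she"; the embodiment leaves the actual context analysis open.

```python
import re
from typing import Dict, List, Optional

PRONOUNS = ("he", "she")  # hypothetical, deliberately small pronoun list


def replace_aliases(text: str, substitution: Dict[str, str]) -> str:
    """S401: replace every other name with its determined name."""
    for alias in sorted(substitution, key=len, reverse=True):
        determined = substitution[alias]
        text = re.sub(rf"\b{re.escape(alias)}\b", lambda _m, d=determined: d, text)
    return text


def replace_pronouns(text: str, determined_names: List[str]) -> str:
    """S402-S403: replace each pronoun with the most recently mentioned name."""
    if not determined_names:
        return text
    name_pat = "|".join(
        re.escape(n) for n in sorted(determined_names, key=len, reverse=True)
    )
    pron_pat = "|".join(PRONOUNS)
    # Names are matched case-sensitively, pronouns case-insensitively.
    token = re.compile(rf"\b(?:(?P<name>{name_pat})|(?P<pron>(?i:{pron_pat})))\b")
    last_name: Optional[str] = None

    def swap(match: "re.Match[str]") -> str:
        nonlocal last_name
        if match.group("name"):
            last_name = match.group("name")
            return match.group(0)
        # A pronoun with no preceding name is left unchanged.
        return last_name if last_name else match.group(0)

    return token.sub(swap, text)


first = replace_aliases(
    "Zhang Tertiary moved to Changsha. He worked with Zhang Wu.",
    {"Zhang Tertiary": "Zhang San"},
)
print(replace_pronouns(first, ["Zhang San", "Zhang Wu"]))
# Zhang San moved to Changsha. Zhang San worked with Zhang Wu.
```

A production system would more likely use a coreference resolution model for step S403; the nearest-preceding-name rule fails, for example, when a sentence mentions two people before the pronoun.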
In one implementation of this embodiment, as shown in fig. 5, step S106, namely extracting the association relationships among the person names to form person relationships according to a preset relationship extraction rule and the archive update data, includes:
S501, extracting the association relationships among the person names according to the preset relationship extraction rule and the archive update data;
S502, judging whether a contradictory relationship exists among the association relationships;
S503, if a contradictory relationship exists, acquiring the person names directly involved in the contradictory relationship as comparison person names;
S504, acquiring the corresponding behavior features according to the relationship types of the contradictory relationship;
S505, if a behavior feature exists between the comparison person names, determining the person relationship between the comparison person names according to that behavior feature.
In practice, although the archive update data has been preprocessed by a series of sub-processing modes and problems at the data level have been eliminated, some errors that affect the person relationships may still remain. For example, within the same archive update data, Zhang San may be recorded as the father of Zhang Wu in one place and as the son of Zhang Wu in another. This is clearly a contradictory relationship that does not match reality, and it is an erroneous relationship that needs to be eliminated.
To eliminate such errors, the association relationships among all the person names must be obtained first. Step S501 is therefore executed to extract the association relationships among the person names according to the preset relationship extraction rule and the archive update data, and step S502 is then executed to judge whether a contradictory relationship exists among them.
If no contradictory relationship exists, the person relationships are formed from the association relationships among the person names.
If a contradictory relationship exists, step S503 is executed to obtain the person names directly involved in the contradictory relationship as the comparison person names. In the example above, where Zhang San is recorded in different places as both the father and the son of Zhang Wu, the comparison person names are Zhang San and Zhang Wu.
Obtaining only the comparison person names does not make it possible to determine which of the contradictory relationships is the correct person relationship, so step S504 is executed to obtain the corresponding behavior features according to the relationship types of the contradictory relationship. A relationship type is the type to which a person relationship belongs, and the behavior features are the behaviors characteristic of each relationship type. For example, for the comparison person names Zhang San and Zhang Wu, where the contradictory relationships are that Zhang San is the father of Zhang Wu and that Zhang San is the son of Zhang Wu, the behavior features may be behaviors that express seniority or kinship, such as raising a child or supporting a parent.
If a behavior feature exists between the comparison person names, the person relationship corresponding to that behavior feature is the correct person relationship between them, and step S505 is executed to determine the person relationship between the comparison person names according to the behavior feature. For example, if the archive update data contains the behavior feature of Zhang San supporting Zhang Wu as a parent, but not the behavior feature of Zhang San raising Zhang Wu as a child, then the correct person relationship between the two is that Zhang San is the son of Zhang Wu.
According to the method for analyzing person relationships in an archive provided by this embodiment, for association relationships that contain a contradictory relationship, it is judged whether the behavior features corresponding to the relationship types of the contradictory relationship exist between the directly involved comparison person names, and the person relationship between the comparison person names is then determined according to the behavior feature found. Erroneous person relationships arising from contradictory relationships can thus be corrected to a certain extent, which improves the accuracy of the analysis method.
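As a non-limiting illustration, steps S502 to S505 might be sketched as follows. The relation labels (`father_of`, `son_of`), the table of mutually exclusive relation types, and the behavior labels (`raises`, `supports`) are hypothetical names introduced for this sketch; relations are assumed to be already extracted as (subject, relation, object) triples.

```python
from typing import Dict, Optional, Set, Tuple

Relation = Tuple[str, str, str]  # (subject, relation type, object)
Behavior = Tuple[str, str, str]  # (actor, behavior, target)

# Hypothetical tables: which relation types exclude each other, and which
# behavior is taken as evidence for each relation type.
EXCLUSIVE = [frozenset({"father_of", "son_of"})]
BEHAVIOR_FEATURES = {"father_of": "raises", "son_of": "supports"}


def find_contradictions(relations) -> Dict[Tuple[str, str], Set[str]]:
    """S502-S503: return the person pairs that carry mutually exclusive relations."""
    by_pair: Dict[Tuple[str, str], Set[str]] = {}
    for subject, relation, obj in relations:
        by_pair.setdefault((subject, obj), set()).add(relation)
    return {
        pair: found
        for pair, found in by_pair.items()
        if any(exclusive <= found for exclusive in EXCLUSIVE)
    }


def resolve_by_behavior(
    pair: Tuple[str, str], candidates: Set[str], behaviors: Set[Behavior]
) -> Optional[str]:
    """S504-S505: keep the one candidate whose behavior feature is observed."""
    subject, obj = pair
    supported = [
        relation
        for relation in sorted(candidates)
        if (subject, BEHAVIOR_FEATURES.get(relation), obj) in behaviors
    ]
    # Zero or several supported candidates means the conflict stays unresolved.
    return supported[0] if len(supported) == 1 else None


relations = [
    ("Zhang San", "father_of", "Zhang Wu"),
    ("Zhang San", "son_of", "Zhang Wu"),
    ("Zhang San", "colleague_of", "Wang Er"),
]
behaviors = {("Zhang San", "supports", "Zhang Wu")}
for pair, candidates in find_contradictions(relations).items():
    print(pair, resolve_by_behavior(pair, candidates, behaviors))
```

The sketch only compares relations recorded in the same direction for a pair; a fuller version would first normalize inverse statements such as "Zhang Wu is the father of Zhang San".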
In one implementation of this embodiment, as shown in fig. 6, after step S504, that is, after the corresponding behavior features are obtained according to the relationship types of the contradictory relationship, the method further includes:
S601, if no behavior feature exists between the comparison person names, acquiring the corresponding attribute features according to the relationship types;
S602, if the comparison person names respectively conform to the attribute features, determining the person relationship between the comparison person names according to the attribute features.
If no behavior feature exists between the comparison person names, the contradictory relationship cannot be corrected according to behavior features. Step S601 is then executed to obtain the corresponding attribute features according to the relationship types, after which it is judged whether the attributes of the comparison person names match those attribute features. Attributes are the properties of each person, including gender, age, position, social relations, and the like; attribute features are the attributes that the persons in each relationship type are expected to have.
If the comparison person names respectively conform to the attribute features, step S602 is executed to determine the person relationship between the comparison person names according to the attribute features. For example, for the comparison person names Zhang San and Zhang Wu, where the contradictory relationships are that Zhang San is the father of Zhang Wu and that Zhang San is the son of Zhang Wu, the attribute feature of the father-son relationship may be that the father's age is greater than the son's age. If the age attributes obtained are 62 for Zhang San and 33 for Zhang Wu, then Zhang San matches the age attribute of the father and Zhang Wu that of the son, so the person relationship between them can be determined as Zhang San being the father of Zhang Wu.
If the comparison person names do not conform to the attribute features, the contradictory relationship can be corrected neither through behavior features nor through attribute features. In that case, prompt information containing the contradictory relationship is output so that it can be corrected later. Alternatively, the analysis method may be executed again with adjusted name extraction requirements and name extraction modes, or with adjusted preset relationship extraction rules, in an attempt to correct the contradictory relationship through parameter adjustment.
According to the method for analyzing person relationships in an archive provided by this embodiment, for contradictory relationships without behavior features, the corresponding attribute features are obtained from the relationship types, it is judged whether the comparison person names involved in the contradictory relationship conform to those attribute features, and the person relationship between the comparison person names is determined accordingly, which improves the accuracy of the analysis method.
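As a non-limiting illustration, the attribute-based fallback of steps S601 and S602 might be sketched as follows, using the age example from the description. The relation labels and the table of attribute features are hypothetical, and only the age attribute is modeled.

```python
from typing import Callable, Dict, Optional, Set, Tuple

PersonAttributes = Dict[str, int]

# Hypothetical attribute features: for each relation type, a predicate over
# the attributes of the subject and of the object.
ATTRIBUTE_FEATURES: Dict[str, Callable[[PersonAttributes, PersonAttributes], bool]] = {
    "father_of": lambda subject, obj: subject["age"] > obj["age"],
    "son_of": lambda subject, obj: subject["age"] < obj["age"],
}


def resolve_by_attributes(
    pair: Tuple[str, str],
    candidates: Set[str],
    attributes: Dict[str, PersonAttributes],
) -> Optional[str]:
    """S601-S602: keep the one candidate whose attribute feature both persons fit."""
    subject, obj = pair
    if subject not in attributes or obj not in attributes:
        return None  # attributes unknown: the conflict stays unresolved
    matching = [
        relation
        for relation in sorted(candidates)
        if relation in ATTRIBUTE_FEATURES
        and ATTRIBUTE_FEATURES[relation](attributes[subject], attributes[obj])
    ]
    return matching[0] if len(matching) == 1 else None


attributes = {"Zhang San": {"age": 62}, "Zhang Wu": {"age": 33}}
print(resolve_by_attributes(("Zhang San", "Zhang Wu"), {"father_of", "son_of"}, attributes))
# father_of
```

When the function returns `None`, the caller would output prompt information containing the contradictory relationship, as described above.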
In one implementation of this embodiment, as shown in fig. 7, step S107, namely analyzing each person relationship to form person relationship data according to a preset relationship analysis mode and the archive update data, includes:
S701, parsing the preset relationship analysis mode to obtain the relationship types;
S702, analyzing which relationship type each person relationship corresponds to, according to the relationship types and the archive update data, to form the relationship corresponding types;
S703, forming the person relationship data according to the relationship corresponding types and the person relationships.
In practice, the relationships between people are complex, so before the person relationships are analyzed to obtain person relationship data, the extracted person relationships need to be classified. Step S701 is therefore executed to parse the preset relationship analysis mode and obtain the relationship types it contains. Relationship types are the different kinds of person relationships, such as family relationships, work relationships, and friendship relationships.
After the relationship types are obtained, step S702 is executed: the relationship type corresponding to each person relationship is analyzed according to the relationship types and the archive update data, forming the relationship corresponding types. Step S703 is then executed: the relationship corresponding types are combined with the person relationships to form the person relationship data.
According to the method for analyzing person relationships in an archive provided by this embodiment, the person relationships are classified by relationship type, and the resulting relationship corresponding types are combined with the person relationships to form the person relationship data, so that subsequent analysis is more specific and targeted and the effectiveness of the analysis method is improved.
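As a non-limiting illustration, steps S701 to S703 might be sketched as follows. The table mapping concrete relations to the relationship types named in the description (family, work, friendship) is a hypothetical example, and the sketch classifies by lookup only, whereas the embodiment also consults the archive update data.

```python
from typing import Dict, List, Set, Tuple

Relation = Tuple[str, str, str]  # (subject, relation, object)

# Hypothetical content of a parsed relationship analysis mode (S701).
RELATION_TYPES: Dict[str, Set[str]] = {
    "family": {"father_of", "son_of", "spouse_of"},
    "work": {"colleague_of", "supervisor_of"},
    "friendship": {"friend_of"},
}


def classify(relation: str) -> str:
    """S702: find the relationship type a concrete relation belongs to."""
    for type_name, members in RELATION_TYPES.items():
        if relation in members:
            return type_name
    return "unclassified"


def build_relationship_data(relations: List[Relation]) -> List[dict]:
    """S703: combine each person relationship with its relationship type."""
    return [
        {"subject": s, "relation": r, "object": o, "type": classify(r)}
        for s, r, o in relations
    ]


for record in build_relationship_data(
    [("Zhang San", "father_of", "Zhang Wu"), ("Zhang San", "colleague_of", "Wang Er")]
):
    print(record)
```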
In one implementation of this embodiment, as shown in fig. 8, step S703, namely forming the person relationship data according to the relationship corresponding types and the person relationships, includes:
S801, acquiring the corresponding relationship strength indexes according to the relationship types;
S802, analyzing the relationship strength of each person relationship within its relationship corresponding type according to the relationship strength indexes;
S803, forming the person relationship data by taking each person relationship as the basis and combining it with its relationship corresponding type and relationship strength.
In practice, a person relationship within a given relationship type also has a corresponding relationship strength, which needs to be evaluated in addition to the relationship type. Step S801 is therefore executed to obtain the corresponding relationship strength indexes according to the relationship types. Relationship strength indexes are the various indexes used to analyze relationship strength, such as contact frequency, duration, and economic exchange.
Using the relationship strength indexes obtained in step S801, step S802 is executed to analyze the relationship strength of each person relationship within its relationship corresponding type, and step S803 is then executed to form the person relationship data from each person relationship together with its relationship corresponding type and relationship strength. Evaluating the relationship strength makes it possible to determine which relationships are more important and significant, as well as how strongly the persons influence one another.
According to the method for analyzing person relationships in an archive provided by this embodiment, the relationship strength of each person relationship is analyzed using the relationship strength indexes, and the resulting relationship strength is combined with the corresponding relationship type, with the person relationship as the basis, to form the person relationship data. The relationship network formed subsequently is therefore more specific and targeted, and the effectiveness of the analysis method is improved.
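As a non-limiting illustration, steps S801 to S803 might be sketched as a weighted sum of indicator values. The indexes named in the description (frequency, duration, economic exchange) are used, but the weights, the assumption that each indicator is already scaled to the range 0 to 1, and the record layout are all hypothetical choices made for this sketch.

```python
from typing import Dict

# Hypothetical strength indexes per relationship type (S801): index -> weight.
STRENGTH_INDEXES: Dict[str, Dict[str, float]] = {
    "family": {"frequency": 0.5, "duration": 0.3, "economic": 0.2},
    "work": {"frequency": 0.6, "duration": 0.4},
}


def relationship_strength(rel_type: str, indicators: Dict[str, float]) -> float:
    """S802: weighted sum of indicator values assumed to lie in [0, 1]."""
    weights = STRENGTH_INDEXES.get(rel_type, {})
    score = sum(weight * indicators.get(name, 0.0) for name, weight in weights.items())
    return round(score, 3)


def with_strength(record: dict, indicators: Dict[str, float]) -> dict:
    """S803: attach the strength to a person relationship record."""
    return {**record, "strength": relationship_strength(record["type"], indicators)}


record = {"subject": "Zhang San", "relation": "father_of", "object": "Zhang Wu", "type": "family"}
print(with_strength(record, {"frequency": 0.8, "duration": 1.0, "economic": 0.5}))
```

A relationship type without configured indexes yields a strength of 0.0 in this sketch; an implementation might instead treat that case as "not evaluated".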
The method for analyzing person relationships in an archive provided by the application goes beyond the traditional mode of archive utilization. After the archives are archived, the association relationships among the persons in the archive data are extracted through intelligent recognition and analysis, so that key persons, groups of persons, and the social networks among persons can be discovered. Full-text retrieval and extended utilization can then be carried out on the basis of the person relationship data. The relationship network contains the various relationships of each person, such as kinship, family, work, and friendship relationships, which increases the utilization value of the archives and widens their range of application.
In a second aspect, the present application provides a system for analyzing relationships between characters in an archive, as shown in fig. 9, including:
the first acquisition module 1 is used for acquiring archive data corresponding to an archive to be analyzed;
the second obtaining module 2 is used for obtaining a corresponding name extraction requirement and a corresponding name extraction mode according to the file type in the file data;
the first judging module 3 is used for judging whether the file data accords with the name extraction requirement;
the first extraction module 4 is used for extracting the name of the person in the archive data according to a name extraction mode if the name extraction requirement is met;
a first updating module 5, configured to update the archive data according to the name of the person to form archive update data;
the second extraction module 6 is used for extracting the association relationship among the names of the people to form a person relationship according to the preset relationship extraction rule and the file updating data;
the first analysis module 7 is used for analyzing each person relationship to form person relationship data according to a preset relationship analysis mode and file updating data;
the first generation module 8 is configured to combine the person name and the person relationship data, and generate a relationship network corresponding to the archive to be analyzed.
It should be noted that the data transmission relationships or logical connections between the above functional modules can be determined from the corresponding steps of the method for analyzing person relationships in an archive, and are not described again here. The data transmission relationships or logical connections between the functional modules shown in fig. 9 serve only to aid understanding of the system for analyzing person relationships in an archive, and are not specifically limiting.
According to the analysis system for the character relationship in the archive, the character names in the archive data meeting the name extraction requirements are extracted according to the name extraction mode by acquiring the name extraction requirements corresponding to the archive types and the corresponding name extraction mode, the archive data is updated according to the extracted character names, the character relationship in the updated archive data is extracted according to the preset relationship extraction rule, the character relationship is analyzed according to the preset relationship analysis mode to form character relationship data, and the character names and the character relationship data are combined to generate a relationship network. The character name, the character relation and the character relation data are obtained according to the corresponding requirements, rules, modes and the like selected by the file types, so that the data required by the relation network are obtained according to different file types, and the effects of improving the character relation analysis efficiency and the analysis accuracy are achieved.
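As a non-limiting illustration, the final step performed by the first generation module 8, combining the person names with the person relationship data into a relationship network, might be sketched as an adjacency mapping. The record keys (`subject`, `relation`, `object`, `type`, `strength`) are hypothetical and chosen only for this sketch.

```python
from typing import Dict, List, Optional, Tuple

Edge = Tuple[str, str, Optional[str], Optional[float]]  # (target, relation, type, strength)


def build_network(names: List[str], relationship_data: List[dict]) -> Dict[str, List[Edge]]:
    """Combine person names (nodes) and person relationship data (edges)."""
    network: Dict[str, List[Edge]] = {name: [] for name in names}
    for record in relationship_data:
        edge = (
            record["object"],
            record["relation"],
            record.get("type"),
            record.get("strength"),
        )
        network.setdefault(record["subject"], []).append(edge)
        network.setdefault(record["object"], [])  # make sure the target is a node
    return network


network = build_network(
    ["Zhang San", "Zhang Wu", "Wang Er"],
    [
        {"subject": "Zhang San", "relation": "father_of", "object": "Zhang Wu",
         "type": "family", "strength": 0.8},
        {"subject": "Zhang San", "relation": "colleague_of", "object": "Wang Er",
         "type": "work", "strength": 0.3},
    ],
)
for person, edges in network.items():
    print(person, edges)
```

Persons without any extracted relationship remain in the network as isolated nodes, so that every extracted person name is represented.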
Furthermore, in the analysis system for intra-archive person relationships provided in this embodiment, other functional modules may be set as needed, or the functional modules may be divided into a plurality of functional units, so as to achieve a technical effect corresponding to the analysis method for intra-archive person relationships of any one of the foregoing.
In a third aspect, the present application provides a computer-readable storage medium storing computer instructions which, when loaded and executed by a processor, implement the method for analyzing person relationships in an archive according to any one of the above, and can achieve the technical effects corresponding to that method.
It should be understood that, although the steps in the flowcharts of the figures are shown in order as indicated by the arrows, these steps are not necessarily performed in order as indicated by the arrows. Unless explicitly stated herein, the steps are not strictly limited to the order of execution, and may be executed in other orders, i.e., the order of execution may be reasonably ordered with respect to each other according to actual needs. Moreover, at least some of the steps in the flowcharts of the figures may include a plurality of sub-steps or stages that are not necessarily performed at the same time, but may be performed at different times, the order of their execution not necessarily being sequential, but may be performed in turn or alternately with other steps or at least a portion of the other steps or stages.
It should also be noted that in this specification, relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. The analysis method of the character relationship in the file is characterized by comprising the following steps:
acquiring file data corresponding to a file to be analyzed;
acquiring a corresponding character name extraction rule according to the file type in the file data;
analyzing the character name extraction rule to obtain name extraction requirements and corresponding name extraction modes;
judging whether the archive data accords with the name extraction requirement or not;
if the name extraction requirement is met, extracting the name of the person in the archive data according to the name extraction mode;
updating the archive data according to the name of the person to form archive update data;
extracting the association relation among the person names to form a person relation according to a preset relation extraction rule and the archive update data;
analyzing each person relationship to form person relationship data according to a preset relationship analysis mode and the archive update data;
and combining the person names and the person relation data to generate a relation network corresponding to the file to be analyzed.
2. The method of claim 1, further comprising, after said determining whether said profile data meets said name extraction requirement:
if the name extraction requirement is not met, a preset pretreatment mode is obtained;
analyzing the preset pretreatment mode to obtain a sub-treatment mode and a corresponding treatment sequence table;
acquiring a current processing order according to the processing order table;
processing the archive data according to the sub-processing mode corresponding to the current processing order until the archive data meets the corresponding sub-processing requirement;
and updating the current processing order according to the processing order table and returning to the previous step until the current processing order is the last processing order in the processing order table.
3. The method according to claim 1, wherein the name extraction method is named entity recognition, and the extracting the name of the person in the profile data according to the name extraction method if the name extraction requirement is met comprises:
if the name extraction requirement is met, carrying out named entity recognition on the archive data, and extracting a name recognition result belonging to the name type;
acquiring a determined name in the name recognition result according to the person naming rule;
acquiring the other names by which each determined name is referred to in the name recognition result, to form a name substitution relation;
and acquiring the person name corresponding to the archive data according to the determined name and the name substitution relation.
4. The method of claim 3, wherein updating the profile data based on the name of the person to form profile update data comprises:
according to the name substitution relation, replacing the other names in the archive data with the corresponding determined names to form first archive data;
acquiring each personal pronoun in the first archive data and its corresponding data position in the first archive data;
and replacing each personal pronoun with the corresponding determined name according to the context of the data position to form the archive update data.
5. The method according to claim 1, wherein extracting the association relationships between the individual character names to form the character relationships according to the preset relationship extraction rules and the profile update data comprises:
extracting the association relation among the person names according to a preset relation extraction rule and the archive update data;
judging whether a contradiction relation exists in the association relation;
if the contradiction relation exists, the person name directly corresponding to the contradiction relation is obtained and used as a comparison person name;
acquiring corresponding behavior characteristics according to the relationship type of the contradictory relationship;
and if the behavior characteristics exist among the comparison person names, determining the person relation among the comparison person names according to the behavior characteristics.
6. The method according to claim 5, further comprising, after the obtaining the corresponding behavior feature according to the relationship type of the contradictory relationship:
if the behavior characteristics do not exist between the names of the comparison characters, acquiring corresponding attribute characteristics according to the relationship types;
and if the comparison person names respectively accord with the attribute characteristics, determining the person relation between the comparison person names according to the attribute characteristics.
7. The method according to claim 1, wherein analyzing each of the personal relationships to form personal relationship data according to a predetermined relationship analysis method and the profile update data comprises:
analyzing the preset relation analysis mode to obtain the relation type;
analyzing the relationship types corresponding to the person relationships according to the relationship types and the archive update data to form relationship corresponding types;
and forming the figure relationship data according to the relationship corresponding type and the figure relationship.
8. The method for analyzing relationships among persons in a file according to claim 7, wherein the forming the relationship data includes:
acquiring a corresponding relationship strength index according to the relationship type;
analyzing the relationship strength of each person relationship in each corresponding relationship type according to the relationship strength index;
and taking each person relationship as a reference, and combining the corresponding relationship type and the relationship strength to form the person relationship data.
9. A system for analyzing a person relationship within a file, comprising:
the first acquisition module is used for acquiring file data corresponding to the file to be analyzed;
the second acquisition module is used for acquiring name extraction requirements and corresponding name extraction modes according to the file types in the file data;
the first judging module is used for judging whether the archive data accords with the name extraction requirement or not;
the first extraction module is used for extracting the name of the person in the archive data according to the name extraction mode if the name extraction requirement is met;
the first updating module is used for updating the archive data according to the name of the person to form archive updating data;
the second extraction module is used for extracting the association relationship among the person names to form a person relationship according to a preset relationship extraction rule and the archive update data;
the first analysis module is used for analyzing each person relationship to form person relationship data according to a preset relationship analysis mode and the archive update data;
and the first generation module is used for combining the person name and the person relation data to generate a relation network corresponding to the file to be analyzed.
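The module chain of claim 9 can be read as a single pipeline. The sketch below is only an illustration under stated assumptions and is not the patented system. It assumes that the name extraction mode is a lookup against a known-name list, that the extraction requirement is a minimum content length, and that the relationship extraction rule is co-occurrence within a sentence. `EXTRACTION_CONFIG`, `analyze_archive`, and the dictionary layout are hypothetical.

```python
import re
from itertools import combinations

# Hypothetical per-archive-type configuration (requirement + extraction mode).
EXTRACTION_CONFIG = {
    "text": {"min_length": 1,
             "known_names": ["Zhang San", "Li Si", "Wang Wu"]},
}

def analyze_archive(archive):
    """archive: dict with 'type' and 'content'. Returns a network dict or None."""
    # Second acquisition module: requirement and mode depend on archive type.
    config = EXTRACTION_CONFIG.get(archive["type"])
    # First judging module: does the archive data meet the requirement?
    if config is None or len(archive["content"]) < config["min_length"]:
        return None
    # First extraction module: assumed mode is matching against known names.
    names = [n for n in config["known_names"] if n in archive["content"]]
    # First updating module: attach the names to form archive update data.
    updated = dict(archive, names=names)
    # Second extraction module: assumed rule is sentence-level co-occurrence.
    relations = []
    for sentence in re.split(r"[.!?。]", updated["content"]):
        present = [n for n in names if n in sentence]
        relations.extend(combinations(present, 2))
    # First analysis module: count co-occurrences as the relationship data.
    edges = {}
    for pair in relations:
        edges[pair] = edges.get(pair, 0) + 1
    # First generation module: names are nodes, relationship data are edges.
    return {"nodes": names, "edges": edges}
```

Returning `None` when the requirement is not met mirrors the conditional in the first extraction module; a real system would more likely report why the archive was skipped.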
10. A computer-readable storage medium, wherein the computer-readable storage medium stores a computer program which, when executed by a processor, implements the method for analyzing person relationships in an archive according to any one of claims 1 to 8.
CN202311047547.2A 2023-08-21 2023-08-21 Method for analyzing character relationship in file and related equipment Pending CN116756088A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311047547.2A CN116756088A (en) 2023-08-21 2023-08-21 Method for analyzing character relationship in file and related equipment

Publications (1)

Publication Number Publication Date
CN116756088A true CN116756088A (en) 2023-09-15

Family

ID=87953719

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311047547.2A Pending CN116756088A (en) 2023-08-21 2023-08-21 Method for analyzing character relationship in file and related equipment

Country Status (1)

Country Link
CN (1) CN116756088A (en)

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060277465A1 (en) * 2005-06-07 2006-12-07 Textual Analytics Solutions Pvt. Ltd. System and method of textual information analytics
CN103631883A (en) * 2013-11-15 2014-03-12 宁波保税区攀峒信息科技有限公司 Data selecting method and device during genetic relationship data collision
CN103886011A (en) * 2013-12-30 2014-06-25 安徽讯飞智元信息科技有限公司 Social-relation network creation and retrieval system and method based on index files
CN109960789A (en) * 2017-12-22 2019-07-02 广州帷策智能科技有限公司 Character relation analysis method based on natural language processing
CN111813770A (en) * 2020-09-03 2020-10-23 平安国际智慧城市科技股份有限公司 Data model construction method and device and computer readable storage medium
CN113157947A (en) * 2021-05-20 2021-07-23 中国工商银行股份有限公司 Knowledge graph construction method, tool, device and server
CN113254659A (en) * 2021-02-04 2021-08-13 天津德尔塔科技有限公司 File studying and judging method and system based on knowledge graph technology
CN113961719A (en) * 2021-10-29 2022-01-21 罗普特科技集团股份有限公司 Family tree construction and query method and system based on graph database
CN114202319A (en) * 2022-02-21 2022-03-18 南京云档信息科技有限公司 Archive management system based on mixed metadata scheme
CN115760453A (en) * 2022-11-16 2023-03-07 北京合思信息技术有限公司 Method and device for creating accounting archive data association relation and electronic equipment

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060277465A1 (en) * 2005-06-07 2006-12-07 Textual Analytics Solutions Pvt. Ltd. System and method of textual information analytics
CN103631883A (en) * 2013-11-15 2014-03-12 宁波保税区攀峒信息科技有限公司 Data selecting method and device during genetic relationship data collision
CN103886011A (en) * 2013-12-30 2014-06-25 安徽讯飞智元信息科技有限公司 Social-relation network creation and retrieval system and method based on index files
CN109960789A (en) * 2017-12-22 2019-07-02 广州帷策智能科技有限公司 Character relation analysis method based on natural language processing
CN111813770A (en) * 2020-09-03 2020-10-23 平安国际智慧城市科技股份有限公司 Data model construction method and device and computer readable storage medium
CN113254659A (en) * 2021-02-04 2021-08-13 天津德尔塔科技有限公司 File studying and judging method and system based on knowledge graph technology
CN113157947A (en) * 2021-05-20 2021-07-23 中国工商银行股份有限公司 Knowledge graph construction method, tool, device and server
CN113961719A (en) * 2021-10-29 2022-01-21 罗普特科技集团股份有限公司 Family tree construction and query method and system based on graph database
CN114202319A (en) * 2022-02-21 2022-03-18 南京云档信息科技有限公司 Archive management system based on mixed metadata scheme
CN115760453A (en) * 2022-11-16 2023-03-07 北京合思信息技术有限公司 Method and device for creating accounting archive data association relation and electronic equipment

Similar Documents

Publication Publication Date Title
US9652719B2 (en) Authoring system for bayesian networks automatically extracted from text
CN106462604B (en) Identifying query intent
US7469251B2 (en) Extraction of information from documents
US8583420B2 (en) Method for the extraction of relation patterns from articles
US20160085742A1 (en) Automated collective term and phrase index
US20080021700A1 (en) System and method for automating the generation of an ontology from unstructured documents
CN108052659A (en) Searching method, device and electronic equipment based on artificial intelligence
US11769003B2 (en) Web element rediscovery system and method
US8533140B2 (en) Method and system for design check knowledge construction
US11216492B2 (en) Document annotation based on enterprise knowledge graph
EP2831770A1 (en) A method and apparatus for computer assisted innovation
CN110555205A (en) negative semantic recognition method and device, electronic equipment and storage medium
CN112925901A (en) Evaluation resource recommendation method for assisting online questionnaire evaluation and application thereof
CN112925879A (en) Information processing apparatus, storage medium, and information processing method
CN105378706A (en) Entity extraction feedback
WO2019085118A1 (en) Topic model-based associated word analysis method, and electronic apparatus and storage medium
Eyal-Salman et al. Feature-to-code traceability in legacy software variants
Tovar et al. Identification of Ontological Relations in Domain Corpus Using Formal Concept Analysis.
CN116756088A (en) Method for analyzing character relationship in file and related equipment
JP2020067987A (en) Summary creation device, summary creation method, and program
CN111625579B (en) Information processing method, device and system
CN113536182A (en) Method and device for generating long text webpage, electronic equipment and storage medium
CN113032371A (en) Database grammar analysis method and device and computer equipment
CN111476037B (en) Text processing method and device, computer equipment and storage medium
Quille et al. Detecting favorite topics in computing scientific literature via Dynamic Topic Modeling

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination