CN112463986A - Information storage method and device - Google Patents


Info

Publication number
CN112463986A
Authority
CN
China
Prior art keywords
graph
entity
attribute
map
triple
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011421906.2A
Other languages
Chinese (zh)
Inventor
荆小兵
匙朝阳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing Minglue Technology Co ltd
Original Assignee
Beijing Mininglamp Software System Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Mininglamp Software System Co ltd
Priority to CN202011421906.2A
Publication of CN112463986A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 Named entity recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Animal Behavior & Ethology (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses an information storage method and device. The method comprises the following steps: performing named entity recognition on unstructured text to obtain a plurality of entity tags, and storing each pair of associated entity tags in triple form to form a triple set; modeling the entities and/or relations of the attribute graph in the form of an ontology graph to form an attribute graph ontology graph; constructing a target rule for mapping triples to the attribute graph according to the output type of each triple in the triple set and the attribute graph ontology graph; and mapping the triples in the triple set into an attribute graph according to the target rule, storing the attribute graph in a subject map, manually checking the subject map, and, after correction, synchronizing the subject map into a knowledge base. With this scheme, incremental information is first converted into triples, the triples are then mapped into an attribute graph to form the subject map, and the subject map is stored in the knowledge base, which addresses the low accuracy and high resource consumption of storing incremental information into a knowledge base in the related art.

Description

Information storage method and device
Technical Field
The invention relates to the field of information processing, in particular to a method and a device for storing information.
Background
In the related art, the knowledge graph is a key link in realizing cognitive intelligence, and constructing one requires weighing factors such as construction efficiency, data accuracy and labor cost. The attribute graph is a widely used data model for knowledge graphs. Constructing an attribute graph from unstructured data basically comprises knowledge modeling, knowledge acquisition, knowledge fusion and knowledge storage. Knowledge acquisition obtains entity, relation and attribute data from unstructured data through recognition and extraction. Knowledge fusion comprises fusion at the ontology layer and fusion at the data layer; data-layer fusion usually merges entities and relations according to a certain strategy.
At present, in the knowledge acquisition and knowledge fusion stages, the attribute graph is mainly built incrementally from simple triples. Automatic knowledge extraction at this step is not accurate enough, so a large amount of manual extraction is needed, and incremental entry into the graph easily causes data duplication and data pollution, which lowers data quality.
Aiming at the problems of low accuracy and large resource consumption in the process of storing incremental information into a knowledge base in the related technology, an effective solution is not provided at present.
Disclosure of Invention
The invention mainly aims to provide an information storage method and an information storage device, and aims to solve the problems that in the related art, the accuracy rate is low and a large amount of resources are consumed in the process of storing incremental information into a knowledge base.
To achieve the above object, according to one aspect of the present invention, there is provided a method of information storage. The invention comprises the following steps: carrying out named entity recognition on the unstructured text to obtain a plurality of entity tags, and storing the entity tags with association between every two entity tags in a triple form to form a triple set; modeling and representing the entities and/or relations in the attribute graph in the form of an ontology graph to form an attribute graph ontology graph; constructing a target rule of mapping the triples to the attribute graph according to the output type of each triplet in the triplet set and the attribute ontology graph; mapping the triples in the triple set into an attribute graph according to the target rule, and storing the attribute graph into a subject map; receiving an input signal of a target object, correcting the theme map according to the input signal, and storing an entity on the corrected theme map into a knowledge base.
In order to achieve the above object, according to another aspect of the present invention, there is provided an apparatus for information storage. The device includes: the entity identification module is used for carrying out named entity identification on the unstructured text to obtain a plurality of entity labels, and storing the entity labels with association between every two entity labels in a triple form to form a triple set; the modeling module is used for modeling and representing the entities and/or the relations in the attribute graph in the form of the ontology graph to form an attribute graph ontology graph; the target rule building module is used for building a target rule of mapping the triples to the attribute graph according to the output type of each triplet in the triplet set and the attribute ontology graph; the first storage module is used for mapping the triples in the triple set into an attribute map according to the target rule and storing the attribute map into a subject map; and the second storage module is used for receiving an input signal of a target object, correcting the theme map according to the input signal and storing the corrected entity on the theme map into a knowledge base.
The invention adopts the following steps: carrying out named entity recognition on the unstructured text to obtain a plurality of entity tags, storing the entity tags with associations between every two entity tags in a triple form, modeling and representing the entities and/or relations in the attribute graph in an ontology graph form to form an attribute graph ontology graph; constructing a target rule of mapping the triples to the attribute graph according to the output type of each triplet in the triplet set and the attribute ontology graph; and mapping the triples in the triple set into an attribute graph according to the target rule, storing the attribute graph into a subject graph, manually checking the subject graph, and synchronously storing the subject graph into a knowledge base after correction. By adopting the scheme, the incremental information is firstly converted into the triples, then the triples are mapped into the attribute graph to form the subject map, and the subject map is stored in the knowledge base, so that the problems of low accuracy and large resource consumption in the process of storing the incremental information into the knowledge base in the related technology are solved.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate an embodiment of the invention and, together with the description, serve to explain the invention and not to limit the invention. In the drawings:
FIG. 1 is a flow diagram of a method of information storage according to an embodiment of the invention;
FIG. 2 is a data flow block diagram according to an embodiment of the present application;
FIG. 3 is a schematic diagram of an attribute map schema according to an embodiment of the application;
FIG. 4 is an example of an attribute map according to an embodiment of the present application;
FIG. 5 is a schematic diagram of an entity binding flow according to an embodiment of the present application;
FIG. 6 is a schematic diagram of an apparatus for information storage according to an embodiment of the present invention.
Detailed Description
It should be noted that the embodiments and features of the embodiments may be combined with each other without conflict. The present invention will be described in detail below with reference to the embodiments with reference to the attached drawings.
In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged under appropriate circumstances in order to facilitate the description of the embodiments of the invention herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
In the knowledge acquisition and knowledge fusion stages of the related art, the attribute graph is at present mainly built incrementally from simple triples, and this step faces two difficulties. The first is that automatic knowledge extraction is not accurate enough, so a large amount of manual extraction is needed. Machine learning algorithms for named entity recognition (NER) and relation extraction can output entity tags (Entity Mention) and triples (a knowledge representation that combines a subject, an object and a relation category) with high accuracy, because the triple structure is simple and easy for a computer to process. Entities and relations in an attribute graph, by contrast, usually carry many attributes and have a relatively complex structure; machine learning algorithms extract them with too low an accuracy, and building accurate attribute graph knowledge often requires extensive manual participation. The second difficulty is that incremental entry into the graph easily causes data duplication and data pollution, which lowers data quality. Data extracted from unstructured data often has no unique ID, so incremental entry can create duplicates; even when a unique ID exists, updating by overwriting can pollute the original data, making the quality of the data in the knowledge base hard to guarantee.
For convenience of description, some terms or expressions referring to the embodiments of the present invention are explained below:
Knowledge Graph (KG for short): a form of knowledge representation in which "nodes" are connected by "edges"; the term may also refer to a database that stores such nodes and edges.
Property Graph (attribute graph): a data model for knowledge graphs in which both nodes and edges have attributes.
Knowledge Base: a concept similar to the knowledge graph, with more emphasis on the database used for storage.
Knowledge Base Population (KBP for short): the construction, or population, of a knowledge graph.
Unstructured Data: data with no fixed format, such as text data.
According to an embodiment of the present invention, there is provided a method of information storage. FIG. 1 is a flow chart of a method of information storage according to an embodiment of the present invention. As shown in fig. 1, the present invention comprises the steps of:
step S101, conducting named entity recognition on an unstructured text to obtain a plurality of entity tags, and storing the entity tags with associations between every two entity tags in a triple form to form a triple set;
The "entity tags" obtained from the unstructured text by Named Entity Recognition (NER) contain tag class information. For example, from the unstructured text "Liu De Hua was born in Hong Kong, China on September 27, 1961", three tags are obtained: (Liu De Hua, person name), (September 27, 1961), (Hong Kong, place name).
Tags that are associated in pairs are organized into triples, each of which represents a fact. For example, the relationship type between the two tags (Liu De Hua, person name) and (Hong Kong, place name) is "place of birth", which can be expressed as the relationship triple (Liu De Hua - place of birth - Hong Kong), of type (head: person name - type: place of birth - tail: place name), where head represents the subject, type represents the triple relationship type, and tail represents the object.
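The tag and triple structures described above can be sketched as plain data classes. This is only an illustration: the class names EntityTag and Triple and the signature() helper are assumptions introduced here, not names used by the application.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EntityTag:
    text: str   # surface string found in the unstructured text
    label: str  # tag class, e.g. "person name" or "place name"


@dataclass(frozen=True)
class Triple:
    head: EntityTag  # subject
    type: str        # triple relationship type
    tail: EntityTag  # object

    def signature(self):
        """Return the triple's type pattern (head class, relation, tail class)."""
        return (self.head.label, self.type, self.tail.label)


person = EntityTag("Liu De Hua", "person name")
place = EntityTag("Hong Kong", "place name")
birth = Triple(head=person, type="place of birth", tail=place)

# A triple set: frozen dataclasses are hashable, so duplicates collapse.
triple_set = {birth, Triple(person, "place of birth", place)}
```

The signature is the kind of "triple type" that later steps can use as a key when choosing a mapping rule.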
FIG. 2 is a data flow diagram according to an embodiment of the present application. As shown in FIG. 2, ellipses represent data, boxes represent modules, and solid-line frames mark the focus of the present invention. The flow comprises, in order, four parts: knowledge modeling, knowledge acquisition, knowledge fusion and knowledge storage.
Step S102, modeling and representing the entities and/or the relations in the attribute graph in the form of an ontology graph to form an attribute graph ontology graph;
The entities and relations in the attribute graph are modeled in the form of an ontology graph. For example, JanusGraph uses a schema to define which attributes an entity can have, and Neo4j can add a Constraint to a Label to define the attributes of an entity. The schema plays a role similar to the table structure of a relational database. FIG. 3 is a schematic diagram of an attribute graph schema according to an embodiment of the application. As shown in FIG. 3, a person is a subject with attributes such as name, ID card number, gender and age; a vehicle is an object with attributes such as model, license plate number and factory time; and a place is an object with attributes such as name, zip code, and latitude and longitude. "Owning" and "provenance" represent two triple relationship types.
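As a rough illustration of such a schema, the entity types and attributes listed for FIG. 3 can be written as a small in-memory structure. The dictionary layout, the attribute key spellings and the endpoints chosen for the two relationship types are assumptions made for this sketch; the figure itself is not reproduced, and a real deployment would declare the schema in the graph database instead.

```python
# Hypothetical in-memory schema mirroring the entity types described for FIG. 3.
SCHEMA = {
    "entities": {
        "person": ["name", "id_number", "gender", "age"],
        "vehicle": ["model", "license_plate", "factory_time"],
        "place": ["name", "zip_code", "latitude", "longitude"],
    },
    # Assumed endpoints: the text names the relation types but not what they connect.
    "relations": {
        "owning": ("person", "vehicle"),
        "provenance": ("person", "place"),
    },
}


def allowed_attributes(entity_type):
    """Return the attributes the schema permits for an entity type."""
    return SCHEMA["entities"].get(entity_type, [])


def is_valid_attribute(entity_type, attribute):
    """Check an attribute against the schema, like a column check on a table."""
    return attribute in allowed_attributes(entity_type)
```

Like a table definition in a relational database, the structure only says what may exist; it holds no instance data.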
Step S103, constructing a target rule of mapping the triples to the attribute graph according to the output type of each triplet in the triplet set and the attribute ontology graph;
optionally, the target rule comprises at least one of: the triple is mapped into attribute graph relation rule, and the triple is mapped into attribute graph entity rule.
The attribute graph mapping is determined by the ontology model and the triple type. When triples are converted into an attribute graph, some triples become relationships and others become attributes. From the triple output types and the attribute graph ontology graph, mapping rules from triples to attribute graph relationships and entities can be constructed. For example, the entity mapping rules are shown in Table 1 below and the relationship mapping rules in Table 2 below:
TABLE 1
Figure BDA0002822741600000061
TABLE 2
Figure BDA0002822741600000062
Taking the following triples as input: (Liu De Hua, age, 59), (Liu De Hua, Hong Kong address, Hong Kong), (Hong Kong, postal code, 100), (Liu De Hua, sex, male), an attribute graph example can be obtained through rule mapping. FIG. 4 is an attribute graph example according to an embodiment of the present application. As shown in FIG. 4, the example comprises a person entity, a place entity and a "place of birth" relationship.
Step S104, mapping the triples in the triple set into an attribute map according to the target rule, and storing the attribute map into a subject map;
The attribute graph mapping takes triples and entity tags as input and converts them into entities and relationships in the attribute graph model as output.
From the above example, the attribute map has several features:
a) A triple may be mapped to a relationship or to an attribute, depending on the relationship class and the schema structure. In the "entity identification" example, "place of birth" is an attribute, whereas in the example above "place of birth" is a relationship that links to a "place" entity.
b) Some tags trigger entity instantiation, and the tag itself also becomes an attribute of the entity; the remaining tags cannot trigger entity instantiation and can only become attributes of an existing entity.
c) An attribute value may be null, such as the ID card number of Liu De Hua.
d) Some entities and relationships may not be extracted at all; in the example above there is no "vehicle" entity.
e) Before a relationship is established, the entity lookup does not necessarily rely on a unique ID, so noise can be introduced. For example, looking up an entity by person name as in Table 2 may return duplicates and the result may be wrong, whereas looking up an entity by license plate number gives a correct result.
From these characteristics it follows that the data obtained by mapping triples through the attribute graph mapping can suffer from wrongly connected relationships, null attributes and entities without IDs. These errors can be reduced by organizing the text by topic, and can also be edited and corrected in the topic map.
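The rule-driven mapping described in this step can be sketched as a lookup from a triple's type pattern to an action. Everything in the rule table below is hypothetical, since Tables 1 and 2 are not reproduced in this text, and the "place of birth" input triple is assumed from the description of FIG. 4. Entities are found by name only, which is exactly the source of noise noted in feature e).

```python
# Hypothetical rules. Key: (head tag class, triple type, tail tag class).
# Value: ("attribute", entity type, attribute name)
#     or ("relation", head entity type, tail entity type).
RULES = {
    ("person name", "age", "number"): ("attribute", "person", "age"),
    ("person name", "sex", "gender"): ("attribute", "person", "gender"),
    ("place name", "postal code", "number"): ("attribute", "place", "zip_code"),
    ("person name", "place of birth", "place name"): ("relation", "person", "place"),
}


def map_triples(triples, rules=RULES):
    """Map (head, head class, type, tail, tail class) tuples to a property graph."""
    entities = {}   # (entity type, name) -> attribute dict
    relations = []  # (head key, relation type, tail key)
    unmapped = []   # triples with no matching rule, left for manual review
    for triple in triples:
        head, head_cls, rel, tail, tail_cls = triple
        rule = rules.get((head_cls, rel, tail_cls))
        if rule is None:
            unmapped.append(triple)
            continue
        kind, first, second = rule
        if kind == "attribute":
            # The head tag instantiates (or finds) the entity; the tail is a value.
            node = entities.setdefault((first, head), {"name": head})
            node[second] = tail
        else:
            # Both tags instantiate entities; the triple becomes an edge.
            entities.setdefault((first, head), {"name": head})
            entities.setdefault((second, tail), {"name": tail})
            relations.append(((first, head), rel, (second, tail)))
    return entities, relations, unmapped


triples = [
    ("Liu De Hua", "person name", "age", "59", "number"),
    ("Liu De Hua", "person name", "place of birth", "Hong Kong", "place name"),
    ("Hong Kong", "place name", "postal code", "100", "number"),
    ("Liu De Hua", "person name", "sex", "male", "gender"),
]
entities, relations, unmapped = map_triples(triples)
```

Attributes with no matching triple, such as the ID card number, are simply absent from the result, which corresponds to the null values mentioned in feature c).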
Step S105, receiving an input signal of a target object, correcting the theme map according to the input signal, and storing an entity on the corrected theme map into a knowledge base.
The theme map receives a batch of information extraction results and provides an interactive interface that allows humans to take part in auditing, editing and confirming; manually confirmed data can then be output to the knowledge base.
In the subject map auditing module, unstructured data is first automatically converted into attribute graph data by preset rules and a preset model. The attribute graph output by this first step has limited accuracy and contains errors and null values, so it cannot be written directly into the knowledge graph. Instead it is stored in a "subject map", where a user can check, edit and confirm it before it enters the graph.
To make manual review easier, usually one text, or several texts on the same subject, is extracted and placed in a single subject map. This has the following advantages: 1) the amount of data is small enough for manual handling; 2) the scope of the content is limited and easy for a person to understand; 3) the content is highly related, so tags with the same name are more likely to refer to the same thing. This component is called a "topic map".
The theme map can be stored persistently, the editing state can be saved, and the theme map can be taken out at any time to be continuously edited. The storage may employ a conventional database or graph database, stored separately from the ultimately constructed knowledge-graph.
1. Auditing data against the corresponding original text
Optionally, before the input signal of the target object is received, the topic map and the unstructured text are displayed in association with each other, wherein the topic map is drawn in the form of nodes and edges.
The theme graph and the original text from which it was extracted are displayed in association, with the theme graph drawn in the form of nodes and edges, as shown in FIG. 4. A non-null tag in FIG. 4 may come from the original text and can be linked to its position in the original text by a connecting line. A reviewer can thus consult the original text while reviewing the map.
2. Perfecting a map by labeling and dragging association editing
Labeling: tags that are needed but missing can be supplemented by labeling (for example, by selecting text in the original with the mouse); selecting "11019610716" in the original text, for instance, generates a new tag.
Association: relationships and attributes that the attribute graph mapping failed to obtain can be associated manually, for example by associating the newly labeled tag "11019610716" with the "ID number" attribute of "Liu De Hua". This step may be performed interactively by dragging, selecting and so on.
3. Attribute map mapping condition restriction
Optionally, after receiving the input signal of the target object, when the input signal does not meet the target rule, an alarm signal is sent to the target object.
During editing of the theme map, the attribute graph mapping acts as a conditional restriction: each manual edit triggers the corresponding action in the attribute graph mapping table, so that the system can confirm that the tags and triples being operated on conform to the corresponding pattern. For example, under the second rule in Table 1, if the user tries to associate an age-class tag with the "gender" attribute of a "person" entity, the editing action triggers a warning to prevent human error.
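A minimal sketch of this check follows. It assumes a hypothetical table that records, for each entity attribute, the tag class it accepts; the actual contents of Table 1 are not reproduced in this text, and the function name check_association is introduced only for illustration.

```python
# Hypothetical: (entity type, attribute) -> tag class the attribute accepts.
ATTRIBUTE_RULES = {
    ("person", "age"): "age",
    ("person", "gender"): "gender",
    ("person", "id_number"): "ID number",
}


def check_association(entity_type, attribute, tag_class):
    """Return None if the edit conforms to the rules, else a warning message."""
    expected = ATTRIBUTE_RULES.get((entity_type, attribute))
    if expected is None:
        return f"warning: no rule covers attribute '{attribute}' of '{entity_type}'"
    if expected != tag_class:
        return (
            f"warning: attribute '{attribute}' of '{entity_type}' expects a "
            f"'{expected}' tag, got '{tag_class}'"
        )
    return None


# Dragging an age-class tag onto the "gender" attribute is flagged.
bad_edit = check_association("person", "gender", "age")
good_edit = check_association("person", "age", "age")
```

Returning a message instead of raising an exception leaves the editing interface free to decide how the alarm is shown to the user.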
4. Attribute map mapping recommendation ranking
Optionally, after an input signal of the target object is received, the entity of the topic map corresponding to the input signal is detected, and the entity tags associated with that entity are moved forward in the display order.
During editing of the theme map, the tags the user is most likely to use are placed at the front according to the attribute graph mapping, so that they are easier for a person to find. For example, when the user selects the "person" entity "Liu De Hua", all not-yet-associated triples whose head is the person name "Liu De Hua" are found, and their object tags "59" and "male" are sorted to the front to assist manual editing.
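The ranking described above can be sketched as a stable reordering of the candidate tags. The function name rank_tags and the tuple layout (head, type, tail) are assumptions for illustration only.

```python
def rank_tags(selected_entity, candidate_tags, triples, associated=frozenset()):
    """Move tail tags of unassociated triples headed by the selected entity to the front.

    triples: iterable of (head, type, tail) tuples.
    associated: triples the user has already attached to the graph.
    """
    preferred = {
        tail
        for head, rel, tail in triples
        if head == selected_entity and (head, rel, tail) not in associated
    }
    front = [tag for tag in candidate_tags if tag in preferred]
    rest = [tag for tag in candidate_tags if tag not in preferred]
    return front + rest  # original order is kept within each group


tags = ["Hong Kong", "100", "59", "male"]
triples = [
    ("Liu De Hua", "age", "59"),
    ("Liu De Hua", "sex", "male"),
    ("Hong Kong", "postal code", "100"),
]
ranked = rank_tags("Liu De Hua", tags, triples)
```

Once a triple has been associated, passing it in `associated` drops its tail tag back into the ordinary part of the list.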
The knowledge base or knowledge graph may be any general database that stores entities and relationships in attribute graph form. A dedicated graph database such as Neo4j or JanusGraph is recommended.
Optionally, storing the entities on the corrected topic map into the knowledge base includes at least one of the following: inserting the entity into the knowledge base when the entity is not found in the knowledge base; and, when the entity is found in the knowledge base, updating the entity in the knowledge base according to the entity on the corrected theme map, and displaying on the theme map at least one of the following items of information about the entity stored in the knowledge base: attributes, surrounding entities, relationships. This step, which may be called incremental graph entry, is similar to the second step and is performed manually on the topic map.
Fig. 5 is a schematic diagram of an entity binding process according to an embodiment of the present application, as shown in fig. 5, including the following steps:
1. entity binding
For each entity on the theme map, the same entity is looked up in the knowledge base, and association binding is executed automatically or manually; the two entities may have different numbers of attributes and different attribute values.
An entity with a unique identifier is looked up in the knowledge base directly by that identifier. If it does not exist in the knowledge base, an insert operation is performed; if it exists, an update operation is performed and the entity is bound directly.
An entity without a unique identifier is looked up in the knowledge base by a preset name field. If it does not exist in the knowledge base, an insert operation is performed; if it exists, a binding is recommended according to step 2.
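The two lookup branches above can be sketched with the knowledge base modeled as a list of attribute dictionaries. This is an assumption made to keep the example self-contained: a real system would query a graph database, the field names id_number and name are hypothetical, and overwriting non-null attributes is only one possible update policy.

```python
def bind_entity(entity, knowledge_base, id_field="id_number", name_field="name"):
    """Return (action, candidates) for one entity from the theme map.

    action is "insert", "update" or "recommend".
    """
    uid = entity.get(id_field)
    if uid is not None:
        # Branch 1: the entity has a unique identifier.
        for stored in knowledge_base:
            if stored.get(id_field) == uid:
                # Assumed policy: non-null attributes overwrite stored values.
                stored.update({k: v for k, v in entity.items() if v is not None})
                return "update", [stored]
        knowledge_base.append(dict(entity))
        return "insert", []
    # Branch 2: no unique identifier, fall back to the preset name field.
    same_name = [s for s in knowledge_base if s.get(name_field) == entity.get(name_field)]
    if not same_name:
        knowledge_base.append(dict(entity))
        return "insert", []
    # Same-name entities exist: leave the choice to the recommendation step.
    return "recommend", same_name


# "A1" is a made-up identifier used only for this illustration.
kb = [{"name": "Sun Yue", "id_number": "A1", "occupation": "singer"}]
first = bind_entity({"name": "Sun Yue", "id_number": "A1", "birthplace": "Hebei"}, kb)
second = bind_entity({"name": "Liu De Hua", "id_number": None}, kb)
third = bind_entity({"name": "Sun Yue"}, kb)
```

In the "recommend" case nothing is written, so an ambiguous name match cannot pollute existing data before a person or a confidence threshold has confirmed it.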
2. Contextual embedding of automatically bound entities, unbound entity recommendation similarity
The construction of the knowledge graph is a progressive process, and new entities and relationships are written into the knowledge base, wherein the new entities and relationships can be 'insertion operations' or 'update operations'.
For an update operation, the update policy may be determined by business requirements. In public security data, for example, the ID card number uniquely determines a person, so a record can be updated by ID card number when it is written. If the extracted original text contains only a name and no ID card number, all entities with the same name can be retrieved from the existing knowledge graph, their similarity to the current entity is compared, and candidates are recommended for manual confirmation and binding.
For example, for "Sun Yue, place of birth: Hebei", entities with the same name are searched in the knowledge graph, a recommendation confidence is calculated, and the following is returned: (Sun Yue - basketball player, 80%), (Sun Yue - singer, 10%).
The binding can be confirmed manually with a single click, or a threshold can be set, for example so that a binding with a confidence above 85% is confirmed directly.
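The text does not say how the recommendation confidence is computed. The sketch below substitutes a simple attribute-overlap score (matching attribute values divided by all attribute names seen on either side), purely as a stand-in, and applies the 85% threshold mentioned above. The function names and the candidate attribute values are illustrative assumptions.

```python
def confidence(entity, candidate):
    """Assumed stand-in score: share of attribute names whose values match."""
    keys = set(entity) | set(candidate)
    if not keys:
        return 0.0
    matches = sum(
        1
        for k in keys
        if entity.get(k) is not None and entity.get(k) == candidate.get(k)
    )
    return matches / len(keys)


def recommend(entity, candidates, threshold=0.85):
    """Rank same-name candidates; auto-bind the best one only above the threshold."""
    scored = sorted(
        ((confidence(entity, c), c) for c in candidates),
        key=lambda pair: pair[0],
        reverse=True,
    )
    auto = scored[0][1] if scored and scored[0][0] > threshold else None
    return scored, auto


# Illustrative attribute values, not data taken from a real knowledge base.
entity = {"name": "Sun Yue", "birthplace": "Hebei"}
player = {"name": "Sun Yue", "birthplace": "Hebei", "occupation": "basketball player"}
singer = {"name": "Sun Yue", "occupation": "singer"}

scored, auto = recommend(entity, [singer, player])
```

With these values the best candidate scores 2/3, below the threshold, so no binding is made automatically and the ranked list is left for manual confirmation.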
3. Further editing according to context, binding entities into graphs
For an associated and bound entity, its attributes, surrounding entities and relationships (for example, limited to its one-degree relationships) are queried from the knowledge graph and displayed on the topic graph in a form slightly different from the attributes, entities and relationships already on the topic graph (for example, in a different color), to assist the user in auditing and editing. This human-machine combination avoids the cost of purely manual work and also compensates for low model accuracy.
It should be noted that the steps illustrated in the flowcharts of the figures may be performed in a computer system, for example as a set of computer-executable instructions, and that, although a logical order is illustrated in the flowcharts, in some cases the steps illustrated or described may be performed in an order different from the one presented here.
The embodiment of the present invention further provides an information storage device. It should be noted that the device in the embodiment of the present invention may be used to execute the information storage method provided in the embodiment of the present invention. The following describes the information storage device according to an embodiment of the present invention.
Fig. 6 is a schematic diagram of an apparatus for information storage according to an embodiment of the present invention. As shown in fig. 6, the apparatus includes: the entity identification module 602 is configured to perform named entity identification on an unstructured text to obtain a plurality of entity tags, and store, in a triple form, entity tags in which associations exist between every two entity tags to form a triple set;
the modeling module 604 is used for modeling and representing the entities and/or the relations in the attribute graph in the form of an ontology graph to form an attribute graph ontology graph;
a build target rule module 606, configured to build a target rule that the triples are mapped to the attribute map according to the output type of each triplet in the triplet set and the attribute ontology map;
a first storage module 608, configured to map the triples in the triple set into an attribute map according to the target rule, and store the attribute map in the subject map;
the second storage module 610 is configured to receive an input signal of a target object, modify the theme map according to the input signal, and store an entity on the modified theme map in a knowledge base.
The invention adopts the following steps: carrying out named entity recognition on the unstructured text to obtain a plurality of entity tags, storing the entity tags with associations between every two entity tags in a triple form, modeling and representing the entities and/or relations in the attribute graph in an ontology graph form to form an attribute graph ontology graph; constructing a target rule of mapping the triples to the attribute graph according to the output type of each triplet in the triplet set and the attribute ontology graph; and mapping the triples in the triple set into an attribute graph according to the target rule, storing the attribute graph into a subject graph, manually checking the subject graph, and synchronously storing the subject graph into a knowledge base after correction. By adopting the scheme, the incremental information is firstly converted into the triples, then the triples are mapped into the attribute graph to form the subject map, and the subject map is stored in the knowledge base, so that the problems of low accuracy and large resource consumption in the process of storing the incremental information into the knowledge base in the related technology are solved.
Optionally, the target rule comprises at least one of: the triple is mapped into attribute graph relation rule, and the triple is mapped into attribute graph entity rule.
Optionally, the second storage module 610 is further configured to display the topic graph in association with the unstructured text before receiving the input signal of the target object, wherein the topic graph is drawn in the form of nodes and edges.
Optionally, the second storage module 610 is further configured to, after receiving the input signal of the target object, send an alarm signal to the target object when the input signal does not conform to the target rule.
Optionally, the second storage module 610 is further configured to, after receiving the input signal of the target object, detect the entity of the topic graph corresponding to the input signal, and move the entity labels associated with that entity forward in the display order.
Optionally, the second storage module 610 stores the entities on the corrected topic graph in the knowledge base in at least one of the following ways: inserting the entity into the knowledge base when the entity is not found in the knowledge base; when the entity is found in the knowledge base, updating the entity in the knowledge base according to the entity on the corrected topic graph, and displaying on the topic graph at least one of the following items of information about the entity stored in the knowledge base: attributes, surrounding entities, relationships.
The information storage device comprises a processor and a memory. The entity identification module 602, the modeling module 604, the target rule building module 606, the first storage module 608, the second storage module 610 and the like are stored in the memory as program units, and the processor executes the program units stored in the memory to implement the corresponding functions.
The processor comprises a kernel, and the kernel calls the corresponding program unit from the memory. One or more kernels may be provided. By adjusting the kernel parameters, the incremental information is converted into triples, the triples are mapped into an attribute graph to form the topic graph, and the topic graph is stored in the knowledge base, which solves the problems of low accuracy and high resource consumption in the related art when incremental information is stored in a knowledge base.
The memory may include, among computer-readable media, volatile memory such as random access memory (RAM) and/or non-volatile memory such as read-only memory (ROM) or flash memory (flash RAM); the memory includes at least one memory chip.
An embodiment of the present invention provides a storage medium on which a program is stored, which, when executed by a processor, implements the method of information storage.
An embodiment of the present invention provides a processor for running a program, wherein the method of information storage is executed when the program runs.
An embodiment of the present invention provides a device comprising a processor, a memory, and a program stored in the memory and executable on the processor, wherein the processor implements the following steps when executing the program:
performing named entity recognition on unstructured text to obtain a plurality of entity tags, and storing every two associated entity tags in triple form to form a triple set;
modeling the entities and/or relations of the attribute graph in the form of an ontology graph to form an attribute graph ontology graph;
constructing a target rule for mapping the triples to the attribute graph according to the output type of each triple in the triple set and the attribute graph ontology graph;
mapping the triples in the triple set into an attribute graph according to the target rule, and storing the attribute graph in a topic graph;
receiving an input signal of a target object, correcting the topic graph according to the input signal, and storing the entities on the corrected topic graph in a knowledge base.
Optionally, the target rule comprises at least one of: a rule for mapping a triple to an attribute graph relation, and a rule for mapping a triple to an attribute graph entity.
Optionally, before the input signal of the target object is received, the topic graph is displayed in association with the unstructured text, wherein the topic graph is drawn in the form of nodes and edges.
Optionally, after the input signal of the target object is received, an alarm signal is sent to the target object when the input signal does not conform to the target rule.
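The alarm on a non-conforming input can be illustrated with a small validation routine. The sketch assumes that the target rule constrains which entity types a relation may connect; the `Correction` format, the `TARGET_RULE` table, and the type names are hypothetical and are not taken from the patent.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class Correction:
    """A reviewer's proposed edge on the topic graph (hypothetical input format)."""
    head_type: str
    relation: str
    tail_type: str


# Hypothetical target rule: each relation allows one (head type, tail type) pair.
TARGET_RULE: Dict[str, Tuple[str, str]] = {
    "works_at": ("Person", "Organization"),
    "located_in": ("Organization", "Place"),
}


def check_correction(correction: Correction) -> Optional[str]:
    """Return an alarm message when the correction violates the rule, else None."""
    allowed = TARGET_RULE.get(correction.relation)
    if allowed is None:
        return f"unknown relation: {correction.relation}"
    if (correction.head_type, correction.tail_type) != allowed:
        return f"{correction.relation} expects {allowed[0]} -> {allowed[1]}"
    return None
```

A caller would show the returned message to the reviewer and reject the edit; a return value of `None` means the correction can be applied to the topic graph.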
Optionally, after the input signal of the target object is received, the entity of the topic graph corresponding to the input signal is detected, and the entity labels associated with that entity are moved forward in the display order.
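A minimal sketch of this reordering, assuming that the labels associated with the selected entity are moved to the front of the label list while all other labels keep their relative order. The function name `advance_labels` and the `related` lookup table are hypothetical.

```python
from typing import Dict, List


def advance_labels(
    label_order: List[str],
    related: Dict[str, List[str]],
    selected_entity: str,
) -> List[str]:
    """Move labels associated with the selected entity to the front.

    The relative order inside each group (associated / not associated)
    is preserved, so the reordering is stable.
    """
    preferred = set(related.get(selected_entity, []))
    front = [label for label in label_order if label in preferred]
    rest = [label for label in label_order if label not in preferred]
    return front + rest


labels = ["Place", "Person", "Event", "Organization"]
related_labels = {"Acme Corp": ["Organization", "Place"]}
reordered = advance_labels(labels, related_labels, "Acme Corp")
# reordered == ["Place", "Organization", "Person", "Event"]
```

An entity with no associated labels leaves the display order unchanged.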
Optionally, storing the entities on the corrected topic graph in the knowledge base includes at least one of: inserting the entity into the knowledge base when the entity is not found in the knowledge base; when the entity is found in the knowledge base, updating the entity in the knowledge base according to the entity on the corrected topic graph, and displaying on the topic graph at least one of the following items of information about the entity stored in the knowledge base: attributes, surrounding entities, relationships. The device herein may be a server, a PC, a PAD, a mobile phone, etc.
The invention also provides a computer program product which, when executed on a data processing device, is adapted to execute a program initialized with the following method steps:
performing named entity recognition on unstructured text to obtain a plurality of entity tags, and storing every two associated entity tags in triple form to form a triple set;
modeling the entities and/or relations of the attribute graph in the form of an ontology graph to form an attribute graph ontology graph;
constructing a target rule for mapping the triples to the attribute graph according to the output type of each triple in the triple set and the attribute graph ontology graph;
mapping the triples in the triple set into an attribute graph according to the target rule, and storing the attribute graph in a topic graph;
receiving an input signal of a target object, correcting the topic graph according to the input signal, and storing the entities on the corrected topic graph in a knowledge base.
Optionally, the target rule comprises at least one of: a rule for mapping a triple to an attribute graph relation, and a rule for mapping a triple to an attribute graph entity.
Optionally, before the input signal of the target object is received, the topic graph is displayed in association with the unstructured text, wherein the topic graph is drawn in the form of nodes and edges.
Optionally, after the input signal of the target object is received, an alarm signal is sent to the target object when the input signal does not conform to the target rule.
Optionally, after the input signal of the target object is received, the entity of the topic graph corresponding to the input signal is detected, and the entity labels associated with that entity are moved forward in the display order.
Optionally, storing the entities on the corrected topic graph in the knowledge base includes at least one of: inserting the entity into the knowledge base when the entity is not found in the knowledge base; when the entity is found in the knowledge base, updating the entity in the knowledge base according to the entity on the corrected topic graph, and displaying on the topic graph at least one of the following items of information about the entity stored in the knowledge base: attributes, surrounding entities, relationships.
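The insert-or-update behaviour of the final storage step can be sketched with an in-memory dictionary standing in for the knowledge base. The `KnowledgeBase` type alias and the `store_entity` function are assumptions for illustration; the patent does not name a particular storage engine or API.

```python
from typing import Dict

# Hypothetical in-memory knowledge base: entity name -> attribute dictionary.
KnowledgeBase = Dict[str, Dict[str, str]]


def store_entity(kb: KnowledgeBase, name: str, attributes: Dict[str, str]) -> str:
    """Insert the entity when absent; otherwise merge in the corrected attributes."""
    if name not in kb:
        kb[name] = dict(attributes)  # copy so later edits to the input do not leak in
        return "inserted"
    kb[name].update(attributes)  # corrected values overwrite stored ones
    return "updated"


kb: KnowledgeBase = {}
first = store_entity(kb, "Acme Corp", {"city": "Beijing"})
second = store_entity(kb, "Acme Corp", {"city": "Nanjing", "founded": "2014"})
```

After the update path runs, the stored attributes of the entity (here `kb["Acme Corp"]`) are what a front end could display back on the topic graph.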
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include, among computer-readable media, volatile memory such as random access memory (RAM) and/or non-volatile memory such as read-only memory (ROM) or flash memory (flash RAM). The memory is an example of a computer-readable medium.
Computer-readable media, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
The above are merely examples of the present invention, and are not intended to limit the present invention. Various modifications and alterations to this invention will become apparent to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention should be included in the scope of the claims of the present invention.

Claims (10)

1. A method of information storage, comprising:
performing named entity recognition on unstructured text to obtain a plurality of entity tags, and storing every two associated entity tags in triple form to form a triple set;
modeling the entities and/or relations of the attribute graph in the form of an ontology graph to form an attribute graph ontology graph;
constructing a target rule for mapping the triples to the attribute graph according to the output type of each triple in the triple set and the attribute graph ontology graph;
mapping the triples in the triple set into an attribute graph according to the target rule, and storing the attribute graph in a topic graph;
receiving an input signal of a target object, correcting the topic graph according to the input signal, and storing the entities on the corrected topic graph in a knowledge base.
2. The method of claim 1, wherein the target rule comprises at least one of:
a rule for mapping a triple to an attribute graph relation, and a rule for mapping a triple to an attribute graph entity.
3. The method of claim 1, wherein prior to receiving the input signal of the target object, the method further comprises:
displaying the topic graph in association with the unstructured text, wherein the topic graph is drawn in the form of nodes and edges.
4. The method of claim 1, wherein after receiving the input signal of the target object, the method further comprises:
sending an alarm signal to the target object when the input signal does not conform to the target rule.
5. The method of claim 1, wherein after receiving the input signal of the target object, the method further comprises:
detecting the entity of the topic graph corresponding to the input signal;
moving the entity labels associated with the entity forward in the display order.
6. The method of claim 1, wherein storing the entities on the corrected topic graph in the knowledge base comprises at least one of:
inserting the entity into the knowledge base when the entity is not found in the knowledge base;
when the entity is found in the knowledge base, updating the entity in the knowledge base according to the entity on the corrected topic graph, and displaying on the topic graph at least one of the following items of information about the entity stored in the knowledge base: attributes, surrounding entities, relationships.
7. An apparatus for information storage, comprising:
an entity identification module, configured to perform named entity recognition on unstructured text to obtain a plurality of entity tags, and store every two associated entity tags in triple form to form a triple set;
a modeling module, configured to model the entities and/or relations of the attribute graph in the form of an ontology graph to form an attribute graph ontology graph;
a target rule building module, configured to construct a target rule for mapping the triples to the attribute graph according to the output type of each triple in the triple set and the attribute graph ontology graph;
a first storage module, configured to map the triples in the triple set into an attribute graph according to the target rule, and store the attribute graph in a topic graph;
a second storage module, configured to receive an input signal of a target object, correct the topic graph according to the input signal, and store the entities on the corrected topic graph in a knowledge base.
8. The apparatus of claim 7, wherein the target rule comprises at least one of: a rule for mapping a triple to an attribute graph relation, and a rule for mapping a triple to an attribute graph entity.
9. A computer-readable storage medium or non-volatile storage medium, comprising a stored program, wherein the program, when executed, controls a device in which the storage medium is located to perform the method of information storage according to any one of claims 1 to 6.
10. A processor for running a program, wherein the program, when run, performs the method of information storage according to any one of claims 1 to 6.
CN202011421906.2A 2020-12-08 2020-12-08 Information storage method and device Pending CN112463986A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011421906.2A CN112463986A (en) 2020-12-08 2020-12-08 Information storage method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011421906.2A CN112463986A (en) 2020-12-08 2020-12-08 Information storage method and device

Publications (1)

Publication Number Publication Date
CN112463986A true CN112463986A (en) 2021-03-09

Family

ID=74800978

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011421906.2A Pending CN112463986A (en) 2020-12-08 2020-12-08 Information storage method and device

Country Status (1)

Country Link
CN (1) CN112463986A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2017076263A1 (en) * 2015-11-03 2017-05-11 中兴通讯股份有限公司 Method and device for integrating knowledge bases, knowledge base management system and storage medium
CN108875051A (en) * 2018-06-28 2018-11-23 中译语通科技股份有限公司 Knowledge mapping method for auto constructing and system towards magnanimity non-structured text
CN109885691A (en) * 2019-01-08 2019-06-14 平安科技(深圳)有限公司 Knowledge mapping complementing method, device, computer equipment and storage medium
CN110489561A (en) * 2019-07-12 2019-11-22 平安科技(深圳)有限公司 Knowledge mapping construction method, device, computer equipment and storage medium

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112905808A (en) * 2021-03-29 2021-06-04 北京机电工程研究所 Knowledge graph construction method and device and electronic equipment
CN113468342A (en) * 2021-07-22 2021-10-01 北京京东振世信息技术有限公司 Data model construction method, device, equipment and medium based on knowledge graph
CN113468342B (en) * 2021-07-22 2023-12-05 北京京东振世信息技术有限公司 Knowledge graph-based data model construction method, device, equipment and medium
CN113609271A (en) * 2021-08-11 2021-11-05 平安科技(深圳)有限公司 Service processing method, device and equipment based on knowledge graph and storage medium
CN113609271B (en) * 2021-08-11 2023-07-25 平安科技(深圳)有限公司 Knowledge graph-based service processing method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20230921

Address after: Room 401, 4th Floor, Building J, Yunmi City, No. 19 Ningshuang Road, Yuhuatai District, Nanjing City, Jiangsu Province, 210000

Applicant after: Nanjing Minglue Technology Co.,Ltd.

Address before: 100089 a1002, 10th floor, building 1, yard 1, Zhongguancun East Road, Haidian District, Beijing

Applicant before: MININGLAMP SOFTWARE SYSTEMS Co.,Ltd.