CN109472032A - Method, apparatus, server and storage medium for determining an entity relationship diagram - Google Patents
- Publication number
- CN109472032A (application number CN201811355514.3A)
- Authority
- CN
- China
- Prior art keywords
- entity
- entity relationship
- target
- relationship
- data
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
- G06F40/295—Named entity recognition
Abstract
The invention discloses a method, apparatus, server and storage medium for determining an entity relationship diagram. The method comprises: determining at least one entity in target data and extracting the entity relationships between the entities; determining the reliability of each entity relationship; determining target entity relationships and the corresponding target entity pairs according to the ranking of the reliabilities; and connecting the target entity pairs based on the target entity relationships to form an entity relationship diagram, which is then stored. This technical solution addresses the storage overhead and cumbersome processing incurred by existing ways of storing entity and relationship data, simplifies the determination of the entity relationship diagram, improves efficiency and saves storage space.
Description
Technical field
Embodiments of the present invention relate to the field of computer technology, and in particular to a method, apparatus, server and storage medium for determining an entity relationship diagram.
Background
Faced with an ever-growing mass of information, it is particularly important to quickly select the information that is actually needed and to classify, extract and restructure it.
Against this background, information extraction technology emerged. Broadly speaking, the objects of information extraction may be media such as text, images, speech or video; in practical applications it is usually text information that is extracted. Text information extraction is a technique for extracting entities and relationships of specified types from natural language text. It mainly involves three aspects: processing unstructured natural language text, selectively extracting the specified information from the text, and representing the extracted information as structured data.
To this end, the prior art uses information extraction to extract and store the relationships between pairs of entities, and then processes these relationships to form the final relationship diagram. This approach not only occupies considerable storage space but is also cumbersome and inefficient.
Summary of the invention
Embodiments of the present invention provide a method, apparatus, server and storage medium for determining an entity relationship diagram, so as to simplify the determination of the entity relationship diagram, improve efficiency and save storage space.
In a first aspect, an embodiment of the present invention provides a method for determining an entity relationship diagram, comprising:
determining at least one entity in target data, and extracting the entity relationships between the entities;
determining the reliability of each entity relationship;
determining target entity relationships and the corresponding target entity pairs according to the ranking of the reliabilities;
connecting the target entity pairs based on the target entity relationships to form an entity relationship diagram, and storing it.
Further, determining at least one entity in the target data comprises:
performing semantic parsing on the keywords of the target data;
determining at least one entity in the target data according to the parsing result.
Further, after determining at least one entity in the target data, the method further comprises:
performing disambiguation and merging on the entities to obtain at least one standard entity.
Further, performing disambiguation and merging on the entities to obtain at least one standard entity comprises:
disambiguating each entity according to a set disambiguation rule;
calculating the attribute similarity of the disambiguated entities;
determining the entity similarity of the disambiguated entities according to the attribute similarities;
merging the disambiguated entities according to the entity similarities to obtain at least one standard entity.
Further, extracting the entity relationships between the entities comprises:
determining the entity relationships existing between the entities according to preset rules, and extracting the entity relationships.
Further, determining the reliability of each entity relationship comprises:
determining the source coefficient of the entity according to the source of the entity;
determining the time coefficient of the entity according to the generation time of the entity;
determining the occurrence frequency coefficient of the entity according to how often the entity occurs within a preset time;
determining the reliability of each entity relationship according to the source coefficient, the time coefficient and the occurrence frequency coefficient.
Further, before determining at least one entity in the target data, the method further comprises:
crawling raw data, and analyzing and extracting the data features of the raw data;
integrating the data features to obtain the target data.
In a second aspect, an embodiment of the present invention further provides an apparatus for determining an entity relationship diagram, the apparatus comprising:
a first determining module, configured to determine at least one entity in target data and extract the entity relationships between the entities;
a second determining module, configured to determine the reliability of each entity relationship;
a third determining module, configured to determine target entity relationships and the corresponding target entity pairs according to the ranking of the reliabilities;
a constituting module, configured to connect the target entity pairs based on the target entity relationships to form an entity relationship diagram and store it.
In a third aspect, an embodiment of the present invention further provides a server, comprising:
one or more processors;
a memory, configured to store one or more programs;
wherein, when the one or more programs are executed by the one or more processors, the one or more processors implement the method for determining an entity relationship diagram according to the first aspect.
In a fourth aspect, an embodiment of the present invention further provides a storage medium storing a computer program which, when executed by a processor, implements the method for determining an entity relationship diagram according to the first aspect.
Embodiments of the present invention provide a method, apparatus, server and storage medium for determining an entity relationship diagram. By determining at least one entity in target data, extracting the entity relationships between the entities, determining the reliability of each entity relationship, determining target entity relationships and the corresponding target entity pairs according to the ranking of the reliabilities, and connecting the target entity pairs based on the target entity relationships to form and store an entity relationship diagram, the embodiments address the storage overhead and cumbersome processing incurred by existing ways of storing entity and relationship data, simplify the determination of the entity relationship diagram, improve efficiency and save storage space.
Brief description of the drawings
Fig. 1 is a flow chart of a method for determining an entity relationship diagram provided by Embodiment One of the present invention;
Fig. 2 is a flow chart of a method for determining an entity relationship diagram provided by Embodiment Two of the present invention;
Fig. 3 is a schematic diagram of a syntax parse tree provided by Embodiment Two of the present invention;
Fig. 4 is a structural diagram of an apparatus for determining an entity relationship diagram provided by Embodiment Three of the present invention;
Fig. 5 is a structural diagram of a server provided by Embodiment Four of the present invention.
Detailed description of the embodiments
The present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the present invention and not to limit it. It should also be noted that, for ease of description, the drawings show only the parts related to the present invention rather than the entire structure.
Embodiment one
Fig. 1 is a flow chart of a method for determining an entity relationship diagram provided by Embodiment One of the present invention. This embodiment is applicable to determining an entity relationship diagram from entities and entity relationships. The method may be performed by an apparatus for determining an entity relationship diagram, and the apparatus is integrated in a server. Referring to Fig. 1, the method comprises the following steps:
S110: Determine at least one entity in the target data, and extract the entity relationships between the entities.
The target data may be data obtained based on set rules, or data of a specific class selected according to actual needs. The set rules may be rules such as data cleaning, data integration and data normalization, which convert unstructured data into structured data to meet the user's needs. Structured data, also called row data, is data that is logically expressed and implemented by a two-dimensional table structure; it follows data format and length specifications and is mainly stored and managed in relational databases. Unstructured data is data whose structure is irregular or incomplete, which has no predefined data model and which is inconvenient to represent with two-dimensional logical database tables, such as office documents, text, pictures, reports of all kinds, images and audio/video information. Data of a specific class may correspond to a standard that the user selects according to actual needs, for example an industry standard.
An entity of the data is a set of things of one kind, and may be a person, place, event, object or organization. A place may be a real geographic location or a virtual address such as a network address or IP address; an event may be a specific event; an object may be a physical object or a virtual object, where a physical object may be a tree, an animal or clothing, and a virtual object may be a stock or a bill; an organization may be a specific organization or group, such as a song and dance ensemble. In practical applications, each piece of data may contain multiple entities and may embody the associations between them; the association between entities is denoted as an entity relationship. For example, in "Xiao Ming has an apple", "Xiao Ming" denotes a person entity, "apple" denotes a physical-object entity, and the entity relationship between the two entities is "owns". It should be noted that one or more entity relationships may exist between different entities, and different entity relationships may also exist between the same entities. For example, the entity relationship between a person and an object may be "owns" or "uses", the entity relationship between a person and an organization may be "belongs to", and the entity relationship between two persons may be "friend" or "same person".
Specifically, the method of determining the entities of the target data may be chosen according to actual needs. For example, a manual approach may be used, in which the target data is analyzed, summarized and generalized to determine its entities; or an automated approach may be used, in which the target data is input into a preset entity determination model and the entities are output by the model. The entity determination model may be a Chinese named entity recognition model based on CRF (Conditional Random Field). An entity relationship diagram comprises entities and the relationships between them, so after the entities of the target data are determined, the relationships between the entities need to be extracted. It is understood that multiple entity relationships may exist between two entities; for example, the entity relationship between a person and an object may be "owns" or "uses". For this reason, the entity relationship between entities can be determined by parsing the semantics of the target data and then extracted, providing a basis for determining the entity relationship diagram.
S120: Determine the reliability of each entity relationship.
Reliability is an index reflecting whether an entity relationship is true. If the reliability is greater than or equal to a preset value, the extracted entity relationship truly reflects the relationship between the two entities in the target data; conversely, if the reliability is less than the preset value, the extracted entity relationship is not credible and cannot be used to determine the entity relationship diagram. The preset value may be set according to actual needs and is not limited in this embodiment. Specifically, a method for calculating the reliability may be determined according to the features of the entities, and the reliability of each entity relationship is then calculated with that method; the features of the entities may be chosen according to actual needs.
S130: Determine target entity relationships and the corresponding target entity pairs according to the ranking of the reliabilities.
A target entity relationship is an entity relationship whose reliability meets the preset value, and a target entity pair is the entity pair corresponding to a target entity relationship. For example, if the reliability of the entity relationship "owns" meets the preset value and the entities corresponding to "owns" are "Xiao Ming" and "apple", then "owns" is a target entity relationship and "Xiao Ming" and "apple" form the corresponding target entity pair. Specifically, the calculated reliabilities are sorted from high to low, and for each reliability greater than or equal to the preset value, the target entity relationship and its corresponding target entity pair are determined.
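The ranking and thresholding in S130 can be sketched as follows. This is a minimal illustration rather than the patented implementation: the `Relation` record, the `select_targets` helper, the reliability scores and the threshold of 0.6 are all assumptions introduced for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relation:
    """Hypothetical record for one extracted entity relationship."""
    head: str
    label: str
    tail: str
    reliability: float


def select_targets(relations, threshold):
    """Sort by reliability (high to low) and keep those meeting the preset value."""
    ranked = sorted(relations, key=lambda r: r.reliability, reverse=True)
    return [r for r in ranked if r.reliability >= threshold]


# Illustrative scores only; the text does not give concrete values.
candidates = [
    Relation("Xiao Ming", "owns", "apple", 0.81),
    Relation("Xiao Ming", "uses", "apple", 0.30),
    Relation("Xiao Ming", "belongs to", "song and dance ensemble", 0.92),
]
targets = select_targets(candidates, threshold=0.6)
target_pairs = [(r.head, r.tail) for r in targets]
```

Here the "uses" relationship falls below the assumed preset value and is discarded, while the remaining relationships and their entity pairs are kept in ranked order.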
S140: Connect the target entity pairs based on the target entity relationships to form an entity relationship diagram, and store it.
An entity relationship diagram, also called an Entity-Relationship diagram, is a structural diagram used in database design that comprises entities and the entity relationships between them. With an entity relationship diagram, the relationships between entities can be seen at a glance and the required information can be obtained quickly. Specifically, connecting each corresponding target entity pair according to each target entity relationship forms the entity relationship diagram. Once generated, the entity relationship diagram may be stored in a graph database. A graph database is a non-relational database that uses graph theory to store the entity relationships between entities; for example, the entity relationship diagram may be stored in a Neo4j graph database. The benefit of storing it this way is that, compared with a relational database, a graph database can present results to the user more clearly and intuitively, so that the user can pick out the needed information and save time.
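As a rough sketch of S140, the target entity pairs can be connected into a labeled directed graph. The example below uses a plain in-memory adjacency map as a stand-in for a graph database; `build_graph` and the triple layout are assumptions for illustration, and persisting the result to a store such as Neo4j is deliberately left out.

```python
from collections import defaultdict


def build_graph(target_relations):
    """Connect each (head, label, tail) triple into a labeled adjacency map."""
    nodes = set()
    edges = defaultdict(list)
    for head, label, tail in target_relations:
        nodes.update((head, tail))          # every entity becomes a node
        edges[head].append((label, tail))   # the relationship becomes a labeled edge
    return nodes, dict(edges)


nodes, edges = build_graph([
    ("Xiao Ming", "owns", "apple"),
    ("Xiao Ming", "belongs to", "song and dance ensemble"),
])
```

Storing only the selected target relationships as edges, rather than every extracted pairwise relationship, is what keeps the stored structure small in this sketch.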
Embodiment One of the present invention provides a method for determining an entity relationship diagram. By determining at least one entity in target data, extracting the entity relationships between the entities, determining the reliability of each entity relationship, determining target entity relationships and the corresponding target entity pairs according to the ranking of the reliabilities, and connecting the target entity pairs based on the target entity relationships to form and store an entity relationship diagram, the method addresses the storage overhead and cumbersome processing incurred by existing ways of storing entity and relationship data, simplifies the determination of the entity relationship diagram, improves efficiency and saves storage space.
It is understood that different entities may have identical or very similar meanings; for example, Zhang San and Zhang Asan may denote the same person. The same entity may also have multiple meanings; for example, "bank" may denote a financial institution or a riverbank. Such cases affect the accuracy of the entity relationship diagram, so after the entities of the target data are determined, they need to be disambiguated and merged to improve the accuracy of the entity relationship diagram.
Specifically, on the basis of the above embodiment, after determining at least one entity in the target data, the method further comprises:
performing disambiguation and merging on the entities to obtain at least one standard entity.
Disambiguation eliminates the ambiguity of an entity so as to determine the entity's meaning in the data. Specifically, dictionary-based disambiguation, that is, disambiguation based on semantic definitions, may be used. If the i-th definition of a word Word in the dictionary contains the vocabulary Other_i, and a sentence that contains Word also contains Other_i, then the sense of Word in that sentence is taken to be the i-th definition in the dictionary. Merging is the combination of entities, that is, essentially identical entities are merged into one entity, for example by unifying Zhang San and Zhang Dasan as Zhang San.
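The dictionary-based rule above can be sketched in a few lines. The toy dictionary `SENSE_CUES`, its cue words and the `disambiguate` helper are assumptions made up for this example; a real system would draw the definition vocabulary from an actual dictionary and use a proper tokenizer.

```python
# Hypothetical dictionary: for each word, each definition maps to the
# vocabulary (the Other_i of the text) that appears in that definition.
SENSE_CUES = {
    "bank": {
        "financial institution": {"money", "loan", "deposit"},
        "riverbank": {"river", "water", "shore"},
    }
}


def disambiguate(word, sentence, sense_cues=SENSE_CUES):
    """Return the first definition whose vocabulary co-occurs with the word."""
    tokens = set(sentence.lower().split())
    for sense, cues in sense_cues.get(word, {}).items():
        if cues & tokens:  # some Other_i word is present in the sentence
            return sense
    return None  # no cue found: the sense stays undetermined


sense_a = disambiguate("bank", "She took a loan from the bank")
sense_b = disambiguate("bank", "He walked along the river to the bank")
```

When no cue word is present the sketch returns `None` instead of guessing, which leaves room for a fallback rule.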
To further illustrate the disambiguation and merging process, "performing disambiguation and merging on the entities to obtain at least one standard entity" may be embodied as:
disambiguating each entity according to a set disambiguation rule;
calculating the attribute similarity of the disambiguated entities;
determining the entity similarity of the disambiguated entities according to the attribute similarities;
merging the disambiguated entities according to the entity similarities to obtain at least one standard entity.
Specifically, the disambiguation rule may be set according to actual needs; for example, it may be a semantics-based disambiguation rule, whose process has been described above and is not repeated here. The attributes of an entity are its characteristics; for example, a student (entity) has attributes such as student number, name, age and gender. Attribute similarity is the similarity of the same attribute across entities of the same class. When the similarity of two attributes is greater than the preset attribute similarity, the entities represented by the two attributes are the same entity. In practical applications, the attribute similarity may be used as the entity similarity of the entities corresponding to the attributes. For example, if attribute 1 corresponds to entity 1 and attribute 2 corresponds to entity 2, and the attribute similarity of attribute 1 and attribute 2 is 70%, then the entity similarity of entity 1 and entity 2 is also taken to be 70%.
Optionally, semantic analysis is performed on the disambiguated entities, and semantically similar entities are placed in one group; for example, Zhang San, Zhang Dasan and Zhang San 1990 form one group. To further determine whether the three denote the same person, the attribute similarity of Zhang Dasan and Zhang San 1990 and the attribute similarity of Zhang San and Zhang Dasan can be calculated. Assume the preset attribute similarity value is 90%. If the attribute similarity of Zhang San and Zhang Dasan is greater than or equal to 90%, Zhang San and Zhang Dasan are determined to be the same person; if the attribute similarity of Zhang Dasan and Zhang San 1990 is less than 90%, Zhang San and Zhang San 1990 are determined to be two different people. Further, since the attribute similarity value shows that Zhang San and Zhang Dasan denote the same entity, the two can be unified; that is, essentially identical entities are merged into one entity to obtain a standard entity, which improves the reliability of the data and saves data storage space.
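The merging step can be sketched as below. The text does not specify how attribute similarity is computed, so the exact-match fraction used in `attribute_similarity`, the greedy strategy in `merge_entities`, and the attribute values are all assumptions chosen only to reproduce the 90% example.

```python
def attribute_similarity(a, b):
    """Assumed metric: fraction of shared attributes whose values are equal."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    return sum(a[key] == b[key] for key in shared) / len(shared)


def merge_entities(entities, threshold=0.9):
    """Greedily fold each entity into the first standard entity it matches."""
    standard = []
    for name, attrs in entities:
        for canon in standard:
            if attribute_similarity(attrs, canon["attrs"]) >= threshold:
                canon["aliases"].append(name)  # same entity under another name
                break
        else:
            standard.append({"name": name, "attrs": attrs, "aliases": []})
    return standard


# Illustrative attributes; the names follow the example in the text.
entities = [
    ("Zhang San", {"id": "001", "gender": "M", "born": 1985}),
    ("Zhang Dasan", {"id": "001", "gender": "M", "born": 1985}),
    ("Zhang San 1990", {"id": "002", "gender": "M", "born": 1990}),
]
merged = merge_entities(entities)
```

With these values the first two names collapse into one standard entity, while the third shares only one of three attribute values and remains separate.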
Embodiment two
Fig. 2 is a flow chart of a method for determining an entity relationship diagram provided by Embodiment Two of the present invention, which is optimized on the basis of the above embodiment. Specifically, the method comprises the following steps:
S210: Crawl raw data, and analyze and extract the data features of the raw data.
The raw data may be unstructured log data, which can be captured from mail or the network by means such as script crawlers or data aggregation. A crawler is a way of automatically capturing Internet information; for example, a Python crawler may be used to crawl log data or to extract data from mail. Data features are the features or characteristics of the data and reflect the information the data represents, for example a mobile phone number, an ID card number or a WeChat ID. Optionally, after the raw data is obtained, it is preprocessed. Preprocessing includes but is not limited to data cleaning and data transformation: data cleaning may be data deduplication and noise reduction, which improves data quality; data transformation mainly standardizes the data, converting data of different formats into a preset format. After the raw data is preprocessed, its data features are extracted to generate structured data.
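A minimal sketch of the preprocessing and feature extraction in S210 is given below. The whitespace normalization, the `PHONE` pattern (which assumes 11-digit mobile numbers starting with 1) and the record layout are illustrative assumptions, not details given in the text.

```python
import re

# Assumption: an 11-digit mobile number starting with 1, not embedded in a longer digit run.
PHONE = re.compile(r"(?<!\d)1\d{10}(?!\d)")


def preprocess(lines):
    """Clean raw log lines: normalize whitespace, drop empty and duplicate lines."""
    seen, cleaned = set(), []
    for line in lines:
        text = " ".join(line.split())  # transform to one preset format
        if not text or text in seen:   # noise reduction and deduplication
            continue
        seen.add(text)
        cleaned.append(text)
    return cleaned


def extract_features(lines):
    """Turn cleaned lines into structured records carrying the extracted feature."""
    return [{"raw": text, "phones": PHONE.findall(text)} for text in preprocess(lines)]


logs = ["user  13800138000 logged in", "user 13800138000 logged in", "", "heartbeat"]
records = extract_features(logs)
```

The two log lines that differ only in spacing collapse into one record after normalization, which is the kind of deduplication the preprocessing step describes.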
S220: Integrate the data features to obtain the target data.
After the data features are extracted, they may be arranged in a certain order to obtain the target data, and the generated target data is stored in a Kafka cache. Kafka is a distributed publish-subscribe messaging system, used here to store the data.
S230: Perform semantic parsing on the keywords of the target data.
Specifically, keywords reflect the subject of the target data. The target data integrated from the data features contains multiple keywords, and the keywords need to be semantically parsed to further determine their meaning.
S240: Determine at least one entity in the target data according to the parsing result.
For example, if keyword A is parsed as denoting a geographic location, keyword A can be determined to be a place entity according to the parsing result. Optionally, after an entity is determined, a label may be assigned to it, and the type of the current entity and the meaning it denotes can be determined from the label.
S250: Determine the entity relationships existing between the entities according to preset rules, and extract the entity relationships.
The preset rules are the basis for determining the entity relationships existing between entities. For example, the entity relationships that may exist between entity A and entity B are determined first, and the entity relationship between the two entities is then determined from those candidates in combination with the context. For example, given "Zhang San and Li Si are good friends", it is first determined according to the preset rules that Zhang San and Li Si may be the same person or two different people; in combination with the context, it can be determined that they are two people and that the entity relationship between them is "friend". It should be noted that the entities here may be the entities determined from the target data, or the standard entities obtained after disambiguation and merging. Optionally, after the entities of the target data are determined, the entity relationships between them may be obtained manually according to the preset rules, or in an automated manner such as machine learning: the preset rules are first input into the machine learning model, the target data is then input, and the model outputs the entity relationships between the entities. The manual and automated approaches were introduced in the above embodiment and are not repeated here. Further, after the entity relationships are extracted, they are applied between the labeled entities. To determine the reliability of the extracted entity relationships, they are imported into a knowledge base for learning after extraction, and the entity relationships are adjusted according to the learning result.
Optionally, the relationships between entities may also be determined by constructing a syntax parse tree. A syntax parse tree reflects the syntactic, semantic and logical relationships between words and between phrases in the target data. For example, take "I play basketball": "I" is a noun, denoted NN; "play" is a verb, denoted Vt; "basketball" is a noun, denoted NN. According to the principle of the syntax parse tree, the derivation probability of "I" is 0.5 and its tree path is "I"; the derivation probability of "play" is 1.0 and its tree path is "play"; the derivation probability of "basketball" is 0.5 and its tree path is "basketball". Combining "play" and "basketball" satisfies the VP rule, where VP denotes a verb phrase, with a derivation probability of 0.5. Combining NN and VP satisfies the S rule, where S denotes a sentence, with a derivation probability of 0.25, and the tree path is "I play basketball". The resulting syntax parse tree is shown in Fig. 3. It should be noted that, within the same sentence, the derivation probabilities of the same part of speech sum to 1.
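One way to read the figures above is as probabilities multiplied up the tree, as in a probabilistic context-free grammar. The sketch below follows that reading; the tuple layout, the `subtree_probability` helper, and in particular the rule probability of 1.0 assigned to the VP and S rules are assumptions chosen so that the example reproduces the 0.5 and 0.25 quoted in the text.

```python
def subtree_probability(node):
    """Multiply a node's own rule probability by those of all its children."""
    _label, prob, children = node
    for child in children:
        prob *= subtree_probability(child)
    return prob


# Leaves carry the word-level derivation probabilities given in the text.
subject = ("NN -> I", 0.5, [])
verb = ("Vt -> play", 1.0, [])
obj = ("NN -> basketball", 0.5, [])

# Assumption: the phrase-level rules themselves have probability 1.0.
verb_phrase = ("VP -> Vt NN", 1.0, [verb, obj])
sentence = ("S -> NN VP", 1.0, [subject, verb_phrase])

vp_probability = subtree_probability(verb_phrase)
s_probability = subtree_probability(sentence)
```

Under these assumptions the verb phrase evaluates to 1.0 × 0.5 = 0.5 and the whole sentence to 0.5 × 0.5 = 0.25, matching the values in the example.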
After the syntax parse tree is obtained, machine learning is applied starting from the word vectors at the top of the syntax parse tree: the grammar of the sentence is iteratively merged to finally obtain a vector representation of the sentence, and the entity relationships between the entities are determined from that vector representation. The machine learning model may be a recurrent neural network.
S260, the reliability for determining each entity relationship.
In order to further clarify the determination process of reliability, " reliability for determining each entity relationship " is carried out below
It embodies, specifically, the reliability of each entity relationship of the determination, comprising:
According to the source of the entity, the source coefficient of the entity is determined;
According to the generation time of the entity, the time coefficient of the entity is determined;
According to the frequency that entity described in preset time occurs, the frequency of occurrences coefficient of the entity is determined;
According to the source coefficient, the time coefficient and the frequency of occurrences coefficient determine each entity relationship can
By degree.
Specifically, the source of entity is the source of entity, i.e. the source of data corresponding to current entity is true according to source
The source coefficient of entity is determined, for example, if the corresponding data source of entity is reliable, such as selected from Baidupedia or certain mark
Standard, then corresponding source coefficient is higher, whereas if the corresponding data source of entity is unreliable, then corresponding source coefficient compared with
Low, specifically, setting, which data source is reliable, which data source is unreliable to can be set according to actual needs.
The time of the corresponding data of generation time, that is, entity of entity, e.g. 2018 data either 2017
Data, embodiment setting, the data of the selection corresponding time is closer with current time, and corresponding time coefficient is higher, such as
The time coefficient of data corresponding entity of the time coefficient of the corresponding entity of data in 2018 higher than 2001.
The frequency of occurrence of an entity is the frequency with which the entity occurs within a preset time; the higher this frequency, the higher the frequency-of-occurrence coefficient. The preset time can be configured according to actual needs. Specifically, the reliability can be calculated according to a set formula, for example: reliability = source coefficient × time coefficient × frequency-of-occurrence coefficient.
It should be noted that the source coefficient, the time coefficient and the frequency-of-occurrence coefficient are all percentages.
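The product formula above can be sketched in a few lines of Python. This is only an illustration: the class name `EntityCoefficients`, the function name `reliability` and the sample coefficient values are assumptions, and the percentages are represented as fractions between 0 and 1.

```python
from dataclasses import dataclass


@dataclass
class EntityCoefficients:
    """Coefficients for one entity, each a percentage expressed as a fraction in [0, 1]."""
    source: float     # higher when the data source is considered reliable
    time: float       # higher when the data is closer to the current time
    frequency: float  # higher when the entity occurs more often in the preset time


def reliability(c: EntityCoefficients) -> float:
    """reliability = source coefficient * time coefficient * frequency-of-occurrence coefficient."""
    for name, value in (("source", c.source), ("time", c.time), ("frequency", c.frequency)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} coefficient must be within [0, 1], got {value}")
    return c.source * c.time * c.frequency


# Illustrative values only (not taken from the patent).
recent_trusted = reliability(EntityCoefficients(source=0.9, time=0.8, frequency=0.5))
old_untrusted = reliability(EntityCoefficients(source=0.4, time=0.2, frequency=0.5))
```

Because the three coefficients are multiplied, a low value in any one of them pulls the overall reliability down, so a frequently occurring entity from an unreliable or outdated source still scores low.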
S270, determining the target entity relationships and the corresponding target entity pairs according to the ordering of the reliability.
S280, connecting each target entity pair based on each target entity relationship to form an entity relationship diagram, and storing it.
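Steps S270 and S280 can be sketched as follows. The text does not say how many relationships are kept after ordering, so the `top_k` cut-off is an assumption, as are the function name `build_entity_relationship_graph`, the sample triples, and the choice of an adjacency list serialized to JSON as the stored form.

```python
import json
from collections import defaultdict

# Hypothetical extracted relationships: (head entity, relationship, tail entity, reliability).
scored_relations = [
    ("Company A", "founded_by", "Person B", 0.36),
    ("Company A", "located_in", "City C", 0.72),
    ("Person B", "born_in", "City C", 0.04),
]


def build_entity_relationship_graph(relations, top_k):
    """Order relationships by reliability, keep the top_k, and connect their entity pairs."""
    ranked = sorted(relations, key=lambda r: r[3], reverse=True)
    graph = defaultdict(list)  # adjacency list: head -> [(relationship, tail, reliability)]
    for head, relation, tail, score in ranked[:top_k]:
        graph[head].append((relation, tail, score))
    return dict(graph)


graph = build_entity_relationship_graph(scored_relations, top_k=2)

# "Storing" is shown here as JSON serialization; a graph database would also fit.
serialized = json.dumps(graph)
```

Only the kept relationships and their entity pairs end up in the stored graph, which matches the stated aim of reducing storage overhead.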
Embodiment two of the present invention provides a method for determining an entity relationship diagram. On the basis of the above embodiment, it refines the steps "determine at least one entity in the target data, and extract the entity relationships between the entities" and "determine the reliability of each entity relationship": the raw data is preprocessed and integrated according to the extracted data features to obtain the target data, and the reliability of each entity relationship is determined according to the source coefficient, time coefficient and frequency-of-occurrence coefficient of the entity. This improves data quality and increases the reliability of the entity relationship diagram.
Embodiment three
Fig. 4 is a structural diagram of a device for determining an entity relationship diagram provided by Embodiment three of the present invention. The device can execute the method for determining an entity relationship diagram described in the above embodiments. Specifically, the device includes:
A first determining module 410, configured to determine at least one entity in target data and extract the entity relationships between the entities;
A second determining module 420, configured to determine the reliability of each entity relationship;
A third determining module 430, configured to determine the target entity relationships and the corresponding target entity pairs according to the ordering of the reliability;
A constructing module 440, configured to connect each target entity pair based on each target entity relationship to form an entity relationship diagram, and store it.
Embodiment three of the present invention provides a device for determining an entity relationship diagram. The device determines at least one entity in target data and extracts the entity relationships between the entities, determines the reliability of each entity relationship, determines the target entity relationships and the corresponding target entity pairs according to the ordering of the reliability, and connects each target entity pair based on each target entity relationship to form and store an entity relationship diagram. This solves problems of existing approaches, such as the storage overhead and cumbersome process involved in storing entity and relationship data, simplifies the process of determining the entity relationship diagram, improves efficiency, and saves storage space.
On the basis of the above embodiments, the first determining module 410 includes:
A parsing unit, configured to perform semantic parsing on the keywords of the target data;
An entity determining unit, configured to determine at least one entity in the target data according to the parsing result.
On the basis of the above embodiments, the device further includes:
A processing module, configured to perform disambiguation and merging on each entity after at least one entity in the target data has been determined, to obtain at least one standard entity.
On the basis of the above embodiments, the processing module includes:
A disambiguation processing unit, configured to perform disambiguation on each entity according to set disambiguation rules;
A computing unit, configured to calculate the attribute similarity of each disambiguated entity;
A determining unit, configured to determine the entity similarity of each disambiguated entity according to each attribute similarity;
A merging processing unit, configured to merge the disambiguated entities according to each entity similarity, to obtain at least one standard entity.
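The similarity and merging units can be illustrated with a small Python sketch. The text does not specify the disambiguation rules or the similarity measure, so everything here is an assumption made for illustration: attribute similarity is taken as token-level Jaccard overlap, entity similarity as the mean over shared attributes, and merging as a greedy pass with a hypothetical threshold of 0.6.

```python
def attribute_similarity(a: str, b: str) -> float:
    """Similarity of two attribute values: Jaccard overlap of lowercase tokens (assumed measure)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def entity_similarity(e1: dict, e2: dict) -> float:
    """Entity similarity: mean attribute similarity over the attributes both entities have."""
    shared = [key for key in e1 if key in e2]
    if not shared:
        return 0.0
    return sum(attribute_similarity(e1[k], e2[k]) for k in shared) / len(shared)


def merge_entities(entities: list, threshold: float = 0.6) -> list:
    """Greedily fold each entity into the first standard entity it is similar enough to."""
    standard = []
    for entity in entities:
        for kept in standard:
            if entity_similarity(entity, kept) >= threshold:
                # Fill in attributes the kept entity lacks; existing values are preserved.
                for key, value in entity.items():
                    kept.setdefault(key, value)
                break
        else:
            # No sufficiently similar standard entity exists, so this one becomes a new one.
            standard.append(dict(entity))
    return standard


# Hypothetical disambiguated entities.
entities = [
    {"name": "Apple Inc", "industry": "consumer electronics"},
    {"name": "Apple Inc", "industry": "consumer electronics", "city": "Cupertino"},
    {"name": "apple", "industry": "fruit"},
]
standard_entities = merge_entities(entities)
```

In this example the first two records merge into one standard entity that gains the `city` attribute, while the fruit record stays separate because its attributes barely overlap.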
On the basis of the above embodiments, the first determining module 410 further includes:
An extracting unit, configured to determine the entity relationships that exist between the entities according to preset rules, and to extract each entity relationship.
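One way to picture rule-based extraction is with pattern rules applied to text, as in the Python sketch below. The text does not describe the form of the preset rules, so the regular-expression rules in `PRESET_RULES`, the function name `extract_relations` and the sample sentence are all hypothetical.

```python
import re

# Hypothetical preset rules: each maps a relationship name to a sentence pattern.
PRESET_RULES = {
    "founded_by": re.compile(r"(?P<head>[A-Z]\w+) was founded by (?P<tail>[A-Z]\w+)"),
    "located_in": re.compile(r"(?P<head>[A-Z]\w+) is located in (?P<tail>[A-Z]\w+)"),
}


def extract_relations(text: str, entities: set) -> list:
    """Return (head, relationship, tail) triples whose two endpoints are both known entities."""
    triples = []
    for relation, pattern in PRESET_RULES.items():
        for match in pattern.finditer(text):
            head, tail = match.group("head"), match.group("tail")
            if head in entities and tail in entities:
                triples.append((head, relation, tail))
    return triples


text = "Acme was founded by Alice. Acme is located in Springfield. Bob is located in Nowhere."
triples = extract_relations(text, {"Acme", "Alice", "Springfield"})
```

The last sentence matches a rule but yields no triple, because neither endpoint was previously determined to be an entity; relationships are only extracted between entities already identified.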
On the basis of the above embodiments, the second determining module 420 includes:
A first determining unit, configured to determine the source coefficient of the entity according to the source of the entity;
A second determining unit, configured to determine the time coefficient of the entity according to the generation time of the entity;
A third determining unit, configured to determine the frequency-of-occurrence coefficient of the entity according to the frequency with which the entity occurs within a preset time;
A fourth determining unit, configured to determine the reliability of each entity relationship according to the source coefficient, the time coefficient and the frequency-of-occurrence coefficient.
On the basis of the above embodiments, the device further includes:
A crawling module, configured to crawl raw data, and to analyze the raw data and extract its data features;
An integrating module, configured to integrate the data according to the data features to obtain the target data.
Embodiment four
Fig. 5 is a structural diagram of a server provided by Embodiment four of the present invention. Referring to Fig. 5, the server includes a processor 510, a memory 520, an input device 530 and an output device 540. The server may contain one or more processors 510; Fig. 5 takes one processor 510 as an example. The processor 510, the memory 520, the input device 530 and the output device 540 in the server may be connected by a bus or in other ways; Fig. 5 takes connection by a bus as an example.
As a computer-readable storage medium, the memory 520 can be used to store software programs, computer-executable programs and modules, such as the program instructions/modules corresponding to the method for determining an entity relationship diagram in the embodiments of the present invention. By running the software programs, instructions and modules stored in the memory, the processor 510 executes the various functional applications and data processing of the terminal, that is, it implements the method for determining an entity relationship diagram of the above embodiments.
The memory 520 mainly includes a program storage area and a data storage area. The program storage area can store an operating system and the application programs required by at least one function; the data storage area can store data created according to the use of the terminal, and the like. In addition, the memory 520 may include high-speed random access memory and may also include non-volatile memory, for example at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. In some examples, the memory 520 may further include memory located remotely from the processor, and such remote memory may be connected to the terminal through a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks and combinations thereof.
The input device 530 can be used to receive input numeric or character information, and to generate key signal inputs related to user settings and function control. The output device 540 may include display equipment such as a display screen, and audio equipment such as a loudspeaker and a buzzer.
The server provided by Embodiment four of the present invention and the method for determining an entity relationship diagram provided by the above embodiments belong to the same inventive concept. For technical details not described in detail in this embodiment, refer to the above embodiments; this embodiment has the same beneficial effects as executing the method for determining an entity relationship diagram.
Embodiment five
Embodiment five of the present invention further provides a storage medium on which a computer program is stored; when executed by a processor, the program implements the method for determining an entity relationship diagram described in any embodiment of the present invention.
Of course, for the storage medium provided by the embodiment of the present invention, the computer-executable instructions are not limited to the operations of the method for determining an entity relationship diagram described above; they can also perform related operations in the method for determining an entity relationship diagram provided by any embodiment of the present invention, with the corresponding functions and beneficial effects.
From the above description of the embodiments, those skilled in the art can clearly understand that the present invention can be implemented by software together with the necessary general-purpose hardware, and of course can also be implemented by hardware alone, although in many cases the former is the better implementation. Based on this understanding, the technical solution of the present invention, in essence or in the part that contributes to the prior art, can be embodied in the form of a software product. The computer software product can be stored in a computer-readable storage medium, such as a computer floppy disk, read-only memory (Read-Only Memory, ROM), random access memory (Random Access Memory, RAM), flash memory (FLASH), hard disk or optical disc, and includes a number of instructions for causing a computer device (which may be a robot, a personal computer, a server, a network device, or the like) to execute the method for determining an entity relationship diagram described in the embodiments of the present invention.
Note that the above are only preferred embodiments of the present invention and the technical principles applied. Those skilled in the art will understand that the present invention is not limited to the specific embodiments described herein, and that various obvious changes, readjustments and substitutions can be made by those skilled in the art without departing from the protection scope of the present invention. Therefore, although the present invention has been described in further detail through the above embodiments, it is not limited to them and may include further equivalent embodiments without departing from the inventive concept; the scope of the present invention is determined by the scope of the appended claims.
Claims (10)
1. A method for determining an entity relationship diagram, characterized by comprising:
Determining at least one entity in target data, and extracting the entity relationships between the entities;
Determining the reliability of each entity relationship;
Determining target entity relationships and corresponding target entity pairs according to the ordering of the reliability;
Connecting each target entity pair based on each target entity relationship to form an entity relationship diagram, and storing it.
2. The method according to claim 1, characterized in that determining at least one entity in the target data comprises:
Performing semantic parsing on the keywords of the target data;
Determining at least one entity in the target data according to the parsing result.
3. The method according to claim 2, characterized in that, after determining at least one entity in the target data, the method further comprises:
Performing disambiguation and merging on each entity to obtain at least one standard entity.
4. The method according to claim 3, characterized in that performing disambiguation and merging on each entity to obtain at least one standard entity comprises:
Performing disambiguation on each entity according to set disambiguation rules;
Calculating the attribute similarity of each disambiguated entity;
Determining the entity similarity of each disambiguated entity according to each attribute similarity;
Merging the disambiguated entities according to each entity similarity to obtain at least one standard entity.
5. The method according to claim 1, characterized in that extracting the entity relationships between the entities comprises:
Determining the entity relationships that exist between the entities according to preset rules, and extracting each entity relationship.
6. The method according to claim 1, characterized in that determining the reliability of each entity relationship comprises:
Determining the source coefficient of the entity according to the source of the entity;
Determining the time coefficient of the entity according to the generation time of the entity;
Determining the frequency-of-occurrence coefficient of the entity according to the frequency with which the entity occurs within a preset time;
Determining the reliability of each entity relationship according to the source coefficient, the time coefficient and the frequency-of-occurrence coefficient.
7. The method according to claim 1, characterized in that, before determining at least one entity in the target data, the method further comprises:
Crawling raw data, and analyzing the raw data to extract its data features;
Integrating the data according to the data features to obtain the target data.
8. A device for determining an entity relationship diagram, characterized by comprising:
A first determining module, configured to determine at least one entity in target data and extract the entity relationships between the entities;
A second determining module, configured to determine the reliability of each entity relationship;
A third determining module, configured to determine target entity relationships and corresponding target entity pairs according to the ordering of the reliability;
A constructing module, configured to connect each target entity pair based on each target entity relationship to form an entity relationship diagram, and store it.
9. A server, characterized by comprising:
One or more processors;
A memory, configured to store one or more programs;
When the one or more programs are executed by the one or more processors, the one or more processors implement the method for determining an entity relationship diagram according to any one of claims 1-7.
10. A storage medium on which a computer program is stored, characterized in that, when the program is executed by a processor, it implements the method for determining an entity relationship diagram according to any one of claims 1-7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811355514.3A CN109472032A (en) | 2018-11-14 | 2018-11-14 | A kind of determination method, apparatus, server and the storage medium of entity relationship diagram |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109472032A (en) | 2019-03-15 |
Family
ID=65672962
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811355514.3A Pending CN109472032A (en) | 2018-11-14 | 2018-11-14 | A kind of determination method, apparatus, server and the storage medium of entity relationship diagram |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109472032A (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110427623A (en) * | 2019-07-24 | 2019-11-08 | 深圳追一科技有限公司 | Semi-structured document Knowledge Extraction Method, device, electronic equipment and storage medium |
CN110674304A (en) * | 2019-10-09 | 2020-01-10 | 北京明略软件系统有限公司 | Entity disambiguation method and device, readable storage medium and electronic equipment |
CN110851586A (en) * | 2019-10-22 | 2020-02-28 | 陈华 | Bank operation data processing system and method, equipment and storage medium |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101882259A (en) * | 2009-05-06 | 2010-11-10 | 日电(中国)有限公司 | Method and equipment for filtering entity relationship instance |
CN104933164A (en) * | 2015-06-26 | 2015-09-23 | 华南理工大学 | Method for extracting relations among named entities in Internet massive data and system thereof |
US20160092549A1 (en) * | 2014-09-26 | 2016-03-31 | International Business Machines Corporation | Information Handling System and Computer Program Product for Deducing Entity Relationships Across Corpora Using Cluster Based Dictionary Vocabulary Lexicon |
CN106294744A (en) * | 2016-08-11 | 2017-01-04 | 上海动云信息科技有限公司 | Interest recognition methods and system |
US20170277856A1 (en) * | 2016-03-24 | 2017-09-28 | Fujitsu Limited | Healthcare risk extraction system and method |
CN107992480A (en) * | 2017-12-25 | 2018-05-04 | 东软集团股份有限公司 | A kind of method, apparatus for realizing entity disambiguation and storage medium, program product |
CN108363816A (en) * | 2018-03-21 | 2018-08-03 | 北京理工大学 | Open entity relation extraction method based on sentence justice structural model |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110427623A (en) * | 2019-07-24 | 2019-11-08 | 深圳追一科技有限公司 | Semi-structured document Knowledge Extraction Method, device, electronic equipment and storage medium |
CN110427623B (en) * | 2019-07-24 | 2021-09-21 | 深圳追一科技有限公司 | Semi-structured document knowledge extraction method and device, electronic equipment and storage medium |
CN110674304A (en) * | 2019-10-09 | 2020-01-10 | 北京明略软件系统有限公司 | Entity disambiguation method and device, readable storage medium and electronic equipment |
CN110851586A (en) * | 2019-10-22 | 2020-02-28 | 陈华 | Bank operation data processing system and method, equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | PB01 | Publication | |
 | SE01 | Entry into force of request for substantive examination | |
 | RJ01 | Rejection of invention patent application after publication | Application publication date: 20190315 |