CN108763192A - Entity relation extraction method and device for text-processing - Google Patents
- Publication number
- CN108763192A (application CN201810348221.6A)
- Authority
- CN
- China
- Prior art keywords
- entity
- similarity
- threshold value
- predetermined threshold
- module
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
Abstract
This application discloses an entity relation extraction method and device for text processing. The method includes: inputting a text to be processed; identifying the entities in the text to be processed, where the text contains multiple entities; screening the entities according to preset samples to obtain the contextual features of an input instance; using the contextual features to calculate the context similarity between the input instance and each seed sample in a seed sample library; judging whether the context similarity exceeds a first preset threshold; if it does, counting the seed samples whose similarity exceeds the first preset threshold; judging whether that count exceeds a second preset threshold; and, if it does, taking the input instance as an entity relation instance obtained by the text processing. The application addresses the technical problem that rule-based methods have high precision but low recall.
Description
Technical field
This application relates to the technical field of text processing, and in particular to an entity relation extraction method and device for text processing.
Background
With the rapid development of the Internet, it has become the main channel through which people obtain information, and the volume of text data on the Internet has grown explosively. This text data contains a wealth of information and plays a very important role in building knowledge bases and knowledge graphs. Extracting the relevant knowledge manually, however, is an enormous amount of work, so extracting useful information automatically by computer would be of great significance. Yet almost all text data on the Internet exists as natural language, which is unstructured data that a computer cannot process directly.
To solve this problem, information extraction technology emerged. It extracts structured data, including entities, relations between entities, and events, from unstructured text data. Relation extraction is a key technology in the field of information extraction: the entities in a text are usually identified first by named entity recognition, and the relations between entity pairs are then identified by relation extraction. Common relation extraction methods include rule-based methods, unsupervised methods, supervised methods, and semi-supervised methods. Rule-based methods have obvious shortcomings: they require a large number of hand-written rules, which is labor-intensive and hard to maintain, and they do not extend well to other domains. Unsupervised methods, which cluster the text, often perform poorly, suffer from low recall and precision, and need considerable manual intervention.
Relation classification based on traditional machine learning algorithms requires a large manually annotated training corpus, which is labor-intensive, and it cannot solve the problems of domain portability and of handling new relations. Semi-supervised methods mainly use a small number of annotated instances as an initial seed set and then, through continued iteration, extract similar instances from unstructured data to expand the seed set. No effective solution to the above problems has yet been proposed.
Summary of the invention
The main purpose of this application is to provide an entity relation extraction method and device for text processing, in order to solve the problem that rule-based methods have high precision but low recall.
To achieve this purpose, according to one aspect of this application, an entity relation extraction method for text processing is provided.
The entity relation extraction method for text processing according to this application includes: inputting a text to be processed; identifying the entities in the text to be processed, where the text contains multiple entities; screening the entities according to preset samples to obtain the contextual features of an input instance; using the contextual features to calculate the context similarity between the input instance and each seed sample; judging whether the context similarity exceeds a first preset threshold; if it does, counting the seed samples whose similarity exceeds the first preset threshold; judging whether that count exceeds a second preset threshold; and, if it does, taking the input instance as an entity relation instance obtained by the text processing.
Further, before the entity relation extraction method starts, the method includes training a word vector model, specifically: training on a background corpus with the gensim tool to obtain the word vector model.
Further, identifying the entities in the text to be processed includes: obtaining the entities in the text to be processed using a named entity recognition method.
Further, screening the entities according to preset samples to obtain the contextual features of the input instance includes: segmenting the text to be processed into words; performing part-of-speech tagging on the word segmentation result; filtering the part-of-speech tagging result to obtain candidate words; obtaining the target words among the candidate words using a context window; and forming the contextual features of the input instance from the target words.
Further, calculating the context similarity between the input instance and a seed sample from the contextual features includes: substituting the contextual features into a preset formula to obtain the context similarity. The preset formula is: [formula image not reproduced in this text], where similarity denotes the context similarity.
To achieve the above purpose, according to another aspect of this application, an entity relation extraction device for text processing is provided.
The entity relation extraction device for text processing according to this application includes: an input module, which inputs a text to be processed; an identification module, which identifies the entities in the text to be processed, where the text contains multiple entities, and constructs an input instance (entity 1, entity 2, input text); a screening module, which screens the entities according to preset samples to obtain the contextual features of the input instance; a computing module, which uses the contextual features to calculate the context similarity between the input instance and each seed sample; a first judgment module, which judges whether the context similarity exceeds a first preset threshold; a statistics module, which, if the similarity exceeds the first preset threshold, counts the seed samples whose similarity exceeds the first preset threshold; a second judgment module, which judges whether that count exceeds a second preset threshold; and a termination module, which, if the count exceeds the second preset threshold, takes the input instance as an entity relation instance obtained by the text processing.
Further, the entity relation extraction device also includes a training module for training a word vector model, specifically: training on a background corpus with the gensim tool to obtain the word vector model.
Further, the identification module includes an entity acquisition module, which obtains the entities in the text to be processed using a named entity recognition method.
Further, the screening module includes: a word segmentation module for segmenting the text to be processed into words; a tagging module, which performs part-of-speech tagging on the word segmentation result; a filtering module, which filters the part-of-speech tagging result to obtain candidate words; a target word acquisition module for obtaining the target words among the candidate words using a context window; and a contextual feature generation module for forming the contextual features of the input instance from the target words.
Further, the computing module includes a substitution module for substituting the contextual features into the preset formula to obtain the context similarity.
In the embodiments of this application, a word vector model is combined with context similarity: the similarity between the input instance and the seed samples is calculated and compared with preset thresholds to obtain the samples that meet the target. This achieves entity relation extraction, improves the recall of relation extraction, and thereby solves the technical problem that rule-based methods have high precision but low recall.
Brief description of the drawings
The accompanying drawings, which form part of this application, provide a further understanding of the application so that its other features, objects, and advantages become more apparent. The drawings of the illustrative embodiments and their descriptions are used to explain the application and do not unduly limit it. In the drawings:
Fig. 1 is a schematic diagram of the entity relation extraction method for text processing according to an embodiment of this application;
Fig. 2 is a schematic diagram of contextual feature generation according to an embodiment of this application;
Fig. 3 is a schematic diagram of the entity relation extraction device for text processing according to an embodiment of this application;
Fig. 4 is a schematic diagram of the screening module according to an embodiment of this application; and
Fig. 5 is an operational flowchart of the method according to an embodiment of this application.
Detailed description
To help those skilled in the art better understand the solution of this application, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. The described embodiments are evidently only some of the embodiments of this application, not all of them. All other embodiments obtained by a person of ordinary skill in the art on the basis of these embodiments without creative effort shall fall within the scope of protection of this application.
It should be noted that the terms "comprise" and "have" and any variants of them in the description, the claims, and the above drawings are intended to cover non-exclusive inclusion. For example, a process, method, system, product, or device that contains a series of steps or units is not necessarily limited to the steps or units expressly listed, and may include other steps or units that are not expressly listed or that are inherent to the process, method, product, or device.
It should be noted that, where no conflict arises, the embodiments of this application and the features in them may be combined with one another. The application is described in detail below with reference to the drawings and in conjunction with the embodiments.
As shown in Fig. 1, this application relates to an entity relation extraction method for text processing. The method includes the following steps S101 to S108:
Step S101: input the text to be processed.
The text to be processed may contain structured data that needs to be extracted from unstructured text data, including but not limited to entities, relations between entities, and events.
Step S102: identify the entities in the text to be processed, where the text contains multiple entities.
The entities in the text to be processed are obtained using a named entity recognition method.
Step S103: screen the entities according to preset samples to obtain the contextual features of the input instance.
Preferably in this embodiment, as shown in Fig. 2, step S103 includes the following steps S201 to S205:
Step S201: segment the text to be processed into words.
Step S202: perform part-of-speech tagging on the word segmentation result.
Preferably, the segmented words are tagged as nouns, verbs, adverbs, and so on.
Step S203: filter the part-of-speech tagging result to obtain candidate words.
Preferably, only verbs and nouns are retained as candidate words.
Step S204: obtain the target words among the candidate words using a context window.
Preferably, the context [left1, right1, left2, right2] is obtained according to the context window (a, b, c, d), where left1, right1, left2, and right2 are, respectively, the a words to the left of entity 1, the b words to its right, the c words to the left of entity 2, and the d words to its right. If fewer words are available than the window size, all of them are taken.
Step S205: form the contextual features of the input instance from the target words.
Step S104: use the contextual features to calculate the context similarity between the input instance and each seed sample.
Preferably, the contextual features are substituted into a preset formula to obtain the context similarity. The preset formula is: [formula image not reproduced in this text], where similarity denotes the context similarity.
Step S105: judge whether the context similarity exceeds the first preset threshold.
Step S106: if the similarity exceeds the first preset threshold, count the seed samples whose similarity exceeds the first preset threshold.
Step S107: judge whether the number of seed samples whose similarity exceeds the first preset threshold exceeds the second preset threshold.
Step S108: if that number exceeds the second preset threshold, take the input instance as an entity relation instance obtained by the text processing.
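The two-threshold decision in steps S105 to S108 can be illustrated with a minimal Python sketch. The function name `accept_candidate`, its parameter names, and the example scores are assumptions made for illustration; the text does not prescribe an implementation or specific threshold values.

```python
from typing import Sequence


def accept_candidate(similarities: Sequence[float],
                     sim_threshold: float,
                     count_threshold: int) -> bool:
    """Return True when enough seed samples are similar to the input instance.

    similarities    -- one context similarity per seed sample (step S104)
    sim_threshold   -- the first preset threshold (steps S105 and S106)
    count_threshold -- the second preset threshold (steps S107 and S108)
    """
    # Steps S105-S106: count seed samples whose similarity exceeds the first threshold.
    hits = sum(1 for s in similarities if s > sim_threshold)
    # Steps S107-S108: accept only if that count exceeds the second threshold.
    return hits > count_threshold


# Illustrative similarities against five seed samples (made-up values).
scores = [0.91, 0.40, 0.85, 0.77, 0.10]
accepted = accept_candidate(scores, sim_threshold=0.75, count_threshold=2)
rejected = accept_candidate(scores, sim_threshold=0.75, count_threshold=3)
```

Three of the five scores exceed 0.75, so the instance is accepted when the second threshold is 2 and rejected when it is 3. Both comparisons are strict, following the "more than" wording of the steps.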
It should be noted that the steps shown in the flowcharts of the drawings can be executed in a computer system, for example as a set of computer-executable instructions. Although a logical order is shown in the flowcharts, in some cases the steps shown or described may be executed in a different order.
According to an embodiment of this application, a device for implementing the above method is also provided. As shown in Fig. 3, the device includes: an input module 10, which inputs a text to be processed; an identification module 20, which identifies the entities in the text to be processed, where the text contains multiple entities; a screening module 30, which screens the entities according to preset samples to obtain the contextual features of the input instance; a computing module 40, which uses the contextual features to calculate the context similarity between the input instance and each seed sample; a first judgment module 50, which judges whether the context similarity exceeds the first preset threshold; a statistics module 60, which, if the similarity exceeds the first preset threshold, counts the seed samples whose similarity exceeds the first preset threshold; a second judgment module 70, which judges whether that count exceeds the second preset threshold; and a termination module 80, which, if the count exceeds the second preset threshold, takes the input instance as an entity relation instance obtained by the text processing.
As shown in Fig. 4, the screening module 30 includes: a word segmentation module 301 for segmenting the text to be processed into words; a tagging module 302, which performs part-of-speech tagging on the word segmentation result; a filtering module 303, which filters the part-of-speech tagging result to obtain candidate words; a target word acquisition module 304 for obtaining the target words among the candidate words using a context window; and a contextual feature generation module 305 for forming the contextual features of the input instance from the target words.
As shown in Fig. 5, the operating flow of the method of the invention is as follows:
Seed sample generation. Rule templates are written according to domain knowledge to identify the specified entity relations. The rule templates should be as strict as possible to ensure high accuracy, and they should also cover as many ways of expressing the relation as possible. After the rules have identified candidate seed samples, the candidates are filtered manually to remove incorrect samples, yielding the final seed samples.
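As a rough illustration of seed sample generation, the sketch below applies one strict rule template and then a manual filter. The regular expression, the relation it targets, the example sentences, and the names `EMPLOYMENT_RULE`, `candidate_seeds`, and `rejected_by_reviewer` are all hypothetical; the text does not give any concrete rule template.

```python
import re
from typing import List, Set, Tuple

Sample = Tuple[str, str, str]  # (entity 1, entity 2, text content)

# Hypothetical strict rule template for one relation; not taken from the patent.
EMPLOYMENT_RULE = re.compile(
    r"(?P<person>[A-Z][a-z]+ [A-Z][a-z]+) is the CEO of (?P<org>[A-Z][A-Za-z]+)"
)


def candidate_seeds(sentences: List[str]) -> List[Sample]:
    """Apply the rule template and collect candidate seed samples."""
    seeds: List[Sample] = []
    for sentence in sentences:
        match = EMPLOYMENT_RULE.search(sentence)
        if match:
            seeds.append((match.group("person"), match.group("org"), sentence))
    return seeds


sentences = [
    "Jane Doe is the CEO of Acme.",
    "Acme hired Jane Doe last year.",
    "John Roe is the CEO of Globex.",
]
candidates = candidate_seeds(sentences)

# Manual filtering step: a reviewer marks wrong candidates by index (assumed input).
rejected_by_reviewer: Set[int] = {1}
final_seeds = [s for i, s in enumerate(candidates) if i not in rejected_by_reviewer]
```

A single narrow pattern like this favors precision over coverage, which matches the stated goal for seed rules; covering more ways of expressing the relation would mean adding further templates.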
Word vector model training. The word vector approach was proposed by Hinton in 1986. It represents a word as a low-dimensional real-valued vector of the form [0.179, -0.157, -0.117, 0.909, -0.532, ...]; this is the word vector. In the word vector space, two points whose vectors form a small angle represent words that are semantically similar or related. Word vectors produced by a good training algorithm reflect the semantic similarity between words well.
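The similarity similarity(X, Y) of word X and word Y is calculated with the cosine distance, that is, the cosine of the angle between their vectors:
$$\mathrm{similarity}(X, Y) = \cos\theta = \frac{X \cdot Y}{\lVert X \rVert \, \lVert Y \rVert}$$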
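A minimal Python sketch of the cosine measure between two word vectors follows. The function name `cosine_similarity` and the convention of returning 0.0 for a zero vector are assumptions; the text only states that the cosine distance is used.

```python
import math
from typing import Sequence


def cosine_similarity(x: Sequence[float], y: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    # Assumed convention: a zero vector is treated as similar to nothing.
    if norm_x == 0.0 or norm_y == 0.0:
        return 0.0
    return dot / (norm_x * norm_y)


parallel = cosine_similarity([1.0, 2.0], [2.0, 4.0])    # same direction
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])  # right angle
```

Vectors pointing in the same direction score close to 1 regardless of length, and orthogonal vectors score 0, which is why a small angle is read as semantic closeness.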
This embodiment trains the word vectors with the gensim tool. The corpus used is a news corpus covering all domains, and the vector dimension is 128.
Sample contextual feature generation. A sample is a triple (entity 1, entity 2, text content). For a given sample, word segmentation, part-of-speech tagging, and named entity recognition are applied to the text content, giving a result of the form [w0/tag0, w1/tag1, ..., wi-1/tagi-1, entity 1, wi+1/tagi+1, ..., wj-1/tagj-1, entity 2, wj+1/tagj+1, ..., wk/tagk]. Part-of-speech filtering retains only verbs and nouns. The context [left1, right1, left2, right2] is then obtained according to the context window (a, b, c, d), where left1, right1, left2, and right2 are, respectively, the a words to the left of entity 1, the b words to its right, the c words to the left of entity 2, and the d words to its right. If fewer words are available than the window size, all of them are taken. Finally, the trained word vector model gives the vector representation of the contextual features: [[vj-a, ..., vj-1], [vj+1, ..., vj+b], [vk-c, ..., vk-1], [vk+1, ..., vk+d]].
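The window extraction described above can be sketched in Python as follows. The tag names in `KEEP_TAGS`, the function name `context_windows`, and the example tokens are hypothetical. The sketch also assumes that entity 1 precedes entity 2 and that right1 and left2 are both drawn from the words between the two entities, a detail the text leaves open.

```python
from typing import List, Tuple

Token = Tuple[str, str]  # (word, part-of-speech tag)

KEEP_TAGS = {"n", "v"}  # hypothetical tag names for nouns and verbs


def context_windows(tokens: List[Token], e1_idx: int, e2_idx: int,
                    window: Tuple[int, int, int, int]) -> List[List[str]]:
    """Return [left1, right1, left2, right2] for the window (a, b, c, d)."""
    a, b, c, d = window

    def keep(span: List[Token]) -> List[str]:
        # Part-of-speech filtering: retain only nouns and verbs.
        return [word for word, tag in span if tag in KEEP_TAGS]

    before_e1 = keep(tokens[:e1_idx])
    between = keep(tokens[e1_idx + 1:e2_idx])
    after_e2 = keep(tokens[e2_idx + 1:])

    # Slicing takes every available word when fewer exist than the window size.
    left1 = before_e1[-a:] if a > 0 else []
    right1 = between[:b]
    left2 = between[-c:] if c > 0 else []
    right2 = after_e2[:d]
    return [left1, right1, left2, right2]


tokens = [("yesterday", "t"), ("company", "n"), ("ACME", "ENT"),
          ("announced", "v"), ("quickly", "d"), ("acquisition", "n"),
          ("of", "p"), ("Globex", "ENT"), ("shares", "n"), ("rose", "v")]
features = context_windows(tokens, e1_idx=2, e2_idx=7, window=(2, 1, 1, 2))
```

Here `features` holds words rather than vectors; mapping each word through the trained word vector model would give the vector form of the contextual features.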
Sample similarity calculation. Contextual features are generated for the candidate sample, and its similarity to each seed sample is calculated in turn. For the input candidate sample features [[wj-a, ..., wj-1], [wj+1, ..., wj+b], [wk-c, ..., wk-1], [wk+1, ..., wk+d]], the seed sample features [[vj-a, ..., vj-1], [vj+1, ..., vj+b], [vk-c, ..., vk-1], [vk+1, ..., vk+d]], and the weight vectors [[f1, ..., fa], [fa+1, ..., fa+b], [fa+b+1, ..., fa+b+c], [fa+b+c+1, ..., fa+b+c+d]], the similarity is calculated by the following formula: [formula image not reproduced in this text].
The actual lengths of the two feature vector windows are not necessarily the same. The part common to both windows is used when calculating the numerator, and the actual size of the seed sample's feature vector window is used when calculating the denominator.
It should be pointed out that this similarity is not symmetric: it is the similarity of the candidate sample relative to the seed sample.
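Because the similarity formula itself is not reproduced in this text, the following Python sketch shows only one plausible reading of the description, not the patent's actual formula. It assumes the numerator is the weighted sum of cosine similarities over the positions common to both windows, the denominator is the sum of the weights over the seed sample's actual window size, and positions within each window are aligned by index. The names `context_similarity` and `cosine` are hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]
Windows = List[List[Vector]]  # four windows, each a list of word vectors


def cosine(x: Vector, y: Vector) -> float:
    dot = sum(p * q for p, q in zip(x, y))
    nx = math.sqrt(sum(p * p for p in x))
    ny = math.sqrt(sum(q * q for q in y))
    return dot / (nx * ny) if nx and ny else 0.0


def context_similarity(candidate: Windows, seed: Windows,
                       weights: List[List[float]]) -> float:
    """Similarity of the candidate relative to the seed (assumed form)."""
    numerator = 0.0
    denominator = 0.0
    for cand_win, seed_win, w_win in zip(candidate, seed, weights):
        # Numerator: only the positions present in both windows.
        common = min(len(cand_win), len(seed_win))
        numerator += sum(w_win[i] * cosine(cand_win[i], seed_win[i])
                         for i in range(common))
        # Denominator: the actual size of the seed sample's window.
        denominator += sum(w_win[i] for i in range(len(seed_win)))
    return numerator / denominator if denominator else 0.0


short = [[[1.0, 0.0]], [], [], []]
longer = [[[1.0, 0.0], [0.0, 1.0]], [], [], []]
weights = [[1.0, 1.0], [], [], []]

sim_short_vs_longer = context_similarity(short, longer, weights)
sim_longer_vs_short = context_similarity(longer, short, weights)
```

Under these assumptions the two directions give different values whenever the window lengths differ, which is consistent with the statement that the similarity is not symmetric.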
Seed sample expansion. For the input corpus, each document is traversed and split into sentences at major punctuation marks (full stop, question mark, and so on).
For each sentence, named entity recognition is performed first. If the sentence contains two entities of the specified types, a candidate sample (entity 1, entity 2, text content) is constructed; otherwise processing moves on to the next sentence.
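A small Python sketch of sentence splitting and candidate construction is given below. The dictionary `GAZETTEER` stands in for a real named entity recognizer and is purely hypothetical, as are the function names and the choice to use the first two entities found in a sentence.

```python
import re
from typing import Dict, List, Tuple

Sample = Tuple[str, str, str]  # (entity 1, entity 2, text content)

# Hypothetical stand-in for a named entity recognizer: word -> entity type.
GAZETTEER: Dict[str, str] = {"Acme": "ORG", "Globex": "ORG", "Initech": "ORG"}

# A run of non-terminal characters followed by an optional major punctuation mark.
SENTENCE = re.compile(r"[^。？！.?!]+[。？！.?!]?")


def split_sentences(document: str) -> List[str]:
    """Split a document at full stops, question marks, and exclamation marks."""
    parts = (s.strip() for s in SENTENCE.findall(document))
    return [s for s in parts if s]


def find_entities(sentence: str) -> List[Tuple[str, str]]:
    """Return (entity, type) pairs in order of appearance."""
    return [(tok, GAZETTEER[tok]) for tok in re.findall(r"\w+", sentence)
            if tok in GAZETTEER]


def build_candidates(document: str,
                     wanted: Tuple[str, str] = ("ORG", "ORG")) -> List[Sample]:
    candidates: List[Sample] = []
    for sentence in split_sentences(document):
        entities = find_entities(sentence)
        # Assumption: use the first two entities when their types match.
        if len(entities) >= 2 and (entities[0][1], entities[1][1]) == wanted:
            candidates.append((entities[0][0], entities[1][0], sentence))
        # Otherwise move on to the next sentence.
    return candidates


doc = "Acme bought Globex. Who knew? Initech stayed quiet."
sentences = split_sentences(doc)
candidates = build_candidates(doc)
```

Splitting on every full stop is deliberately naive (it would also split decimals and abbreviations); a production pipeline would likely use a proper sentence segmenter.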
The contextual features of the candidate sample are constructed, the similarity between the candidate sample and each sample in the seed sample library is calculated, and the number of samples whose similarity exceeds a given threshold is counted. If that number exceeds a given threshold (for example, 10% of the current number of seed samples), the candidate sample is added to the sample library; otherwise processing moves on to the next candidate.
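The expansion loop can be sketched in Python as below. The Jaccard overlap used as the similarity function is a simple stand-in chosen so the example runs without a word vector model; it is not the asymmetric context similarity of the method. The function names, the 0.5 similarity threshold, and the sample data are assumptions; only the 10% ratio comes from the text.

```python
from typing import Callable, FrozenSet, List

Features = FrozenSet[str]  # simplified contextual features: a set of context words


def jaccard(candidate: Features, seed: Features) -> float:
    """Stand-in similarity: word overlap between two context sets."""
    union = candidate | seed
    return len(candidate & seed) / len(union) if union else 0.0


def expand_seed_library(seeds: List[Features], candidates: List[Features],
                        similarity: Callable[[Features, Features], float],
                        sim_threshold: float, ratio: float = 0.10) -> List[Features]:
    """Add each candidate that is similar to enough current seed samples."""
    library = list(seeds)
    for cand in candidates:
        hits = sum(1 for seed in library if similarity(cand, seed) > sim_threshold)
        # Second threshold expressed as a share of the current library size.
        if hits > ratio * len(library):
            library.append(cand)
    return library


seeds = [frozenset({"acquire", "company"}),
         frozenset({"buy", "company"}),
         frozenset({"merge", "firm"})]
candidates = [frozenset({"acquire", "company", "stock"}),
              frozenset({"weather", "rain"})]

library = expand_seed_library(seeds, candidates, jaccard, sim_threshold=0.5)
```

Because the second threshold is tied to the current library size, the bar for admission rises as the library grows; whether newly added samples should count as seeds within the same pass is a design choice this sketch makes by appending in place.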
As can be seen from the above description, this application achieves the following technical effects. Because entity pairs with the same entity relation have similar contexts, expanding the sample library on the basis of sample context similarity can effectively improve the recall of relation extraction. The word vector model is trained on a large-scale general corpus, and calculating context similarity on the basis of word vectors can markedly improve generalization ability.
Those skilled in the art will understand that the modules or steps of this application described above can be implemented with general-purpose computing devices. They can be concentrated on a single computing device or distributed over a network formed by multiple computing devices. Optionally, they can be implemented as program code executable by a computing device, so that they can be stored in a storage device and executed by a computing device; alternatively, they can each be made into an individual integrated circuit module, or several of the modules or steps can be made into a single integrated circuit module. The application is therefore not limited to any specific combination of hardware and software.
The above are only preferred embodiments of this application and are not intended to limit it. For those skilled in the art, the application may have various modifications and variations. Any modification, equivalent replacement, or improvement made within the spirit and principles of this application shall fall within its scope of protection.
Claims (10)
1. An entity relation extraction method for text processing, characterized by comprising:
inputting a text to be processed;
identifying the entities in the text to be processed, wherein the text to be processed contains multiple entities;
screening the entities according to preset samples to obtain the contextual features of an input instance;
calculating, from the contextual features, the context similarity between the input instance and each seed sample in a seed sample library;
judging whether the context similarity exceeds a first preset threshold;
if the similarity exceeds the first preset threshold, counting the seed samples whose similarity exceeds the first preset threshold;
judging whether the number of seed samples whose similarity exceeds the first preset threshold exceeds a second preset threshold;
if that number exceeds the second preset threshold, taking the input instance as an entity relation instance obtained by the text processing.
2. The entity relation extraction method according to claim 1, characterized in that, before the entity relation extraction method starts, the method comprises:
training a word vector model, specifically comprising: training on a background corpus with the gensim tool to obtain the word vector model.
3. The entity relation extraction method according to claim 1, characterized in that identifying the entities in the text to be processed comprises:
obtaining the entities in the text to be processed using a named entity recognition method.
4. The entity relation extraction method according to claim 1, characterized in that screening the entities according to preset samples to obtain the contextual features of the input instance comprises:
segmenting the text to be processed into words;
performing part-of-speech tagging on the word segmentation result;
filtering the part-of-speech tagging result to obtain candidate words;
obtaining the target words among the candidate words using a context window;
forming the contextual features of the input instance from the target words.
5. The entity relation extraction method according to claim 1, characterized in that calculating, from the contextual features, the context similarity between the input instance and each seed sample comprises:
substituting the contextual features into a preset formula to obtain the context similarity;
the preset formula being: [formula image not reproduced in this text],
wherein similarity denotes the context similarity.
6. An entity relation extraction device for text processing, characterized by comprising:
an input module, which inputs a text to be processed;
an identification module, which identifies the entities in the text to be processed, wherein the text to be processed contains multiple entities;
a screening module, which screens the entities according to preset samples to obtain the contextual features of an input instance;
a computing module, which calculates, from the contextual features, the context similarity between the input instance and each seed sample;
a first judgment module, which judges whether the context similarity exceeds a first preset threshold;
a statistics module, which, if the similarity exceeds the first preset threshold, counts the seed samples whose similarity exceeds the first preset threshold;
a second judgment module for judging whether the number of seed samples whose similarity exceeds the first preset threshold exceeds a second preset threshold;
a termination module, which, if that number exceeds the second preset threshold, takes the input instance as an entity relation instance obtained by the text processing.
7. The entity relation extraction device according to claim 6, characterized in that the entity relation extraction device further comprises: a training module for training a word vector model, specifically comprising: training on a background corpus with the gensim tool to obtain the word vector model.
8. The entity relation extraction device according to claim 6, characterized in that the identification module comprises:
an entity acquisition module, which obtains the entities in the text to be processed using a named entity recognition method.
9. The entity relation extraction device according to claim 6, characterized in that the screening module comprises:
a word segmentation module for segmenting the text to be processed into words;
a tagging module, which performs part-of-speech tagging on the word segmentation result;
a filtering module, which filters the part-of-speech tagging result to obtain candidate words;
a target word acquisition module for obtaining the target words among the candidate words using a context window;
a contextual feature generation module for forming the contextual features of the input instance from the target words.
10. The entity relation extraction device according to claim 6, characterized in that the computing module comprises:
a substitution module for substituting the contextual features into a preset formula to obtain the context similarity.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810348221.6A CN108763192B (en) | 2018-04-18 | 2018-04-18 | Entity relation extraction method and device for text processing |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108763192A (en) | 2018-11-06 |
CN108763192B (en) | 2022-04-19 |
Family
ID=64011106
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810348221.6A Active CN108763192B (en) | 2018-04-18 | 2018-04-18 | Entity relation extraction method and device for text processing |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108763192B (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109522399A (en) * | 2018-11-20 | 2019-03-26 | 北京京东尚科信息技术有限公司 | Method and apparatus for generating information |
CN110909116A (en) * | 2019-11-28 | 2020-03-24 | 中国人民解放军军事科学院军事科学信息研究中心 | Entity set expansion method and system for social media |
CN111488467A (en) * | 2020-04-30 | 2020-08-04 | 北京建筑大学 | Construction method and device of geographical knowledge graph, storage medium and computer equipment |
CN113538075A (en) * | 2020-04-14 | 2021-10-22 | 阿里巴巴集团控股有限公司 | Data processing method, model training method, device and equipment |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20170032781A1 (en) * | 2015-07-28 | 2017-02-02 | Google Inc. | Collaborative language model biasing |
CN107203511A (en) * | 2017-05-27 | 2017-09-26 | 中国矿业大学 | Network text named entity recognition method based on neural network probability disambiguation |
CN107463607A (en) * | 2017-06-23 | 2017-12-12 | 昆明理工大学 | Method for acquiring and organizing domain entity hypernym-hyponym relations combining word vectors and bootstrapping learning |
CN107784125A (en) * | 2017-11-24 | 2018-03-09 | 中国银行股份有限公司 | Entity relation extraction method and device |
CN107861939A (en) * | 2017-09-30 | 2018-03-30 | 昆明理工大学 | Domain entity disambiguation method fusing word vectors and a topic model |
US10394886B2 (en) * | 2015-12-04 | 2019-08-27 | Sony Corporation | Electronic device, computer-implemented method and computer program |
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20170032781A1 (en) * | 2015-07-28 | 2017-02-02 | Google Inc. | Collaborative language model biasing |
US10394886B2 (en) * | 2015-12-04 | 2019-08-27 | Sony Corporation | Electronic device, computer-implemented method and computer program |
CN107203511A (en) * | 2017-05-27 | 2017-09-26 | 中国矿业大学 | Network text named entity recognition method based on neural network probability disambiguation |
CN107463607A (en) * | 2017-06-23 | 2017-12-12 | 昆明理工大学 | Method for acquiring and organizing domain entity hypernym-hyponym relations combining word vectors and bootstrapping learning |
CN107861939A (en) * | 2017-09-30 | 2018-03-30 | 昆明理工大学 | Domain entity disambiguation method fusing word vectors and a topic model |
CN107784125A (en) * | 2017-11-24 | 2018-03-09 | 中国银行股份有限公司 | Entity relation extraction method and device |
Non-Patent Citations (2)
Title |
---|
LISHUANG LI et al.: "A distributed meta-learning system for Chinese entity relation extraction", 《NEUROCOMPUTING》 * |
黄勋 et al.: "关系抽取技术研究综述" (A survey of relation extraction techniques), 《现代图书情报技术》 * |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109522399A (en) * | 2018-11-20 | 2019-03-26 | 北京京东尚科信息技术有限公司 | Method and apparatus for generating information |
CN109522399B (en) * | 2018-11-20 | 2022-08-12 | 北京京东尚科信息技术有限公司 | Method and apparatus for generating information |
CN110909116A (en) * | 2019-11-28 | 2020-03-24 | 中国人民解放军军事科学院军事科学信息研究中心 | Entity set expansion method and system for social media |
CN110909116B (en) * | 2019-11-28 | 2022-12-23 | 中国人民解放军军事科学院军事科学信息研究中心 | Entity set expansion method and system for social media |
CN113538075A (en) * | 2020-04-14 | 2021-10-22 | 阿里巴巴集团控股有限公司 | Data processing method, model training method, device and equipment |
CN111488467A (en) * | 2020-04-30 | 2020-08-04 | 北京建筑大学 | Construction method and device of geographical knowledge graph, storage medium and computer equipment |
CN111488467B (en) * | 2020-04-30 | 2022-04-05 | 北京建筑大学 | Construction method and device of geographical knowledge graph, storage medium and computer equipment |
Also Published As
Publication number | Publication date |
---|---|
CN108763192B (en) | 2022-04-19 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN104199972B (en) | Named entity relation extraction and construction method based on deep learning | |
CN108763192A (en) | Entity relation extraction method and device for text processing | |
CN108334495A (en) | Short text similarity calculation method and system | |
CN104881458B (en) | Method and device for labeling web page topics | |
CN108182976A (en) | Clinical medicine information extraction method based on neural network | |
CN104035975B (en) | Method for distantly supervised person relation extraction using Chinese online resources | |
CN106909537B (en) | Polysemous word analysis method based on topic model and vector space | |
CN109871955A (en) | Aviation safety accident causality extraction method | |
CN110175246A (en) | Method for extracting notional words from video captions | |
EP3483747A1 (en) | Preserving and processing ambiguity in natural language | |
CN112101031B (en) | Entity identification method, terminal equipment and storage medium | |
CN110929520B (en) | Unnamed entity object extraction method and device, electronic equipment and storage medium | |
CN111475622A (en) | Text classification method, device, terminal and storage medium | |
CN112395395A (en) | Text keyword extraction method, device, equipment and storage medium | |
Ren et al. | Detecting the scope of negation and speculation in biomedical texts by using recursive neural network | |
CN113051914A (en) | Enterprise hidden label extraction method and device based on multi-feature dynamic portrait | |
CN105760524A (en) | Multi-level and multi-class classification method for science news headlines | |
CN104537280B (en) | Protein interaction relation recognition method based on text relation similarity | |
CN111177375A (en) | Electronic document classification method and device | |
CN115600605A (en) | Method, system, equipment and storage medium for jointly extracting Chinese entity relationship | |
CN111428502A (en) | Named entity labeling method for military corpus | |
CN111161861A (en) | Short text data processing method and device for hospital logistics operation and maintenance | |
CN111368532B (en) | Topic word embedding disambiguation method and system based on LDA | |
CN113076391A (en) | Remote supervision relation extraction method based on multi-layer attention mechanism | |
CN113191118A (en) | Text relation extraction method based on sequence labeling |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
CP03 | Change of name, title or address | Address after: Room 501, 502, 503, No. 66 Boxia Road, China (Shanghai) Pilot Free Trade Zone, Pudong New Area, Shanghai 201203. Patentee after: Daguan Data Co., Ltd. Address before: Room 515, Building Y1, No. 112 Liangxiu Road, Pudong New Area, Shanghai 201203. Patentee before: DATAGRAND INFORMATION TECHNOLOGY (SHANGHAI) Co., Ltd. |