CN109582933A - Method and related apparatus for determining text novelty
- Publication number: CN109582933A
- Application number: CN201811348626.6A
- Authority
- CN
- China
- Prior art keywords
- candidate
- entity
- target
- text
- relationship
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/10—Text processing
- G06F40/194—Calculation of difference between files
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The embodiments of the present application provide a method and related apparatus for determining text novelty. The method comprises: determining a target text; extracting multiple target entities from the target text to obtain a target entity set; obtaining a candidate entity set for each candidate text in a candidate text set; determining a first entity intersection of the target entity set and the candidate entity set, the first entity intersection being the entities that match between the target entity set and the candidate entity set; and determining the novelty of the target text relative to the candidate text according to a difference parameter between the first entity intersection and the target entity set. The embodiments of the present application improve the accuracy of the novelty calculation.
Description
Technical field
The present invention relates to the field of data processing, and in particular to a method and related apparatus for determining text novelty.
Background art
With the arrival of the era of technological explosion, information is becoming ever more important and data volumes keep growing, so information retrieval is particularly important.
A user typically needs to search a database according to a target text and find the candidate texts in the database that are similar to the target text. Most current search methods, however, are based on text retrieval, which focuses on matching text characters. For example, the user determines keywords in the target text and enters them; the retrieval system then matches the keywords against the candidate texts in the database, and the more keywords a candidate text matches, the lower the novelty of the target text relative to that candidate text.
In the current approach the user has to determine the keywords. The choice of keywords strongly affects the search results, is subjective, and does not necessarily reflect an understanding of the actual content of the target text. As a result, the novelty determined between the target text and a candidate text has low accuracy.
Summary of the invention
In view of this, the embodiments of the present invention provide a method and related apparatus for determining text novelty. All target entities in the target text and all candidate entities in each candidate text are determined, and the novelty of the target text relative to the candidate text is determined according to a difference parameter between the first entity intersection and the target entity set. In the prior art, novelty is determined by keyword matching using only keywords chosen subjectively by the user, so the result is affected by the user's subjective understanding. The method provided by the embodiments of the present application is more objective and reflects the actual content of the target text and the candidate text, so the novelty calculation is more accurate.
In a first aspect, an embodiment of the present application provides a method for determining text novelty, comprising:
determining a target text;
extracting multiple target entities from the target text to obtain a target entity set;
obtaining a candidate entity set for each candidate text in a candidate text set;
determining a first entity intersection of the target entity set and the candidate entity set, the first entity intersection being the entities that match between the target entity set and the candidate entity set; and
determining the novelty of the target text relative to the candidate text according to a difference parameter between the first entity intersection and the target entity set.
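The steps of the first aspect can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the claimed implementation: entities are matched by exact string equality, and the difference parameter is assumed to be the share of target entities that are missing from the intersection. The function names are hypothetical.

```python
def entity_novelty(target_entities, candidate_entities):
    """Share of target entities that have no match in the candidate text."""
    target = set(target_entities)
    if not target:
        return 0.0  # assumption: an empty target set carries no novelty
    intersection = target & set(candidate_entities)
    # Assumed difference parameter: (|target| - |intersection|) / |target|
    return (len(target) - len(intersection)) / len(target)


def novelty_per_candidate(target_entities, candidate_entity_sets):
    """Map each candidate text id to the novelty of the target against it."""
    return {
        text_id: entity_novelty(target_entities, entities)
        for text_id, entities in candidate_entity_sets.items()
    }
```

Under this assumed definition a score of 0.0 means every target entity also appears in the candidate text, and 1.0 means none does.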
In a possible implementation, the method further comprises:
extracting multiple binary relations from the target text to obtain a target binary relation set, a binary relation comprising two entities and the relationship between them;
obtaining a candidate binary relation set comprising multiple binary relations in the candidate text;
determining a first binary relation intersection of the target binary relation set and the candidate binary relation set, the first binary relation intersection comprising the binary relations that match between the target binary relation set and the candidate binary relation set;
the determining the novelty of the target text relative to the candidate text according to the difference parameter between the first entity intersection and the target entity set comprises:
determining a first entity novelty according to the difference parameter between the first entity intersection and the target entity set;
determining a first binary relation novelty according to a difference parameter between the first binary relation intersection and the target binary relation set; and
determining the novelty of the target text relative to the candidate text according to the first entity novelty and the first binary relation novelty.
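As a rough sketch of this implementation, a binary relation can be modelled as a (head, relation, tail) triple and scored with the same set logic as entities. The combination rule is an assumption: the text only says that the two scores are combined, so the weighted average and the `entity_weight` parameter below are illustrative, as are all names.

```python
from typing import NamedTuple


class BinaryRelation(NamedTuple):
    """Two entities and the relationship between them."""
    head: str
    relation: str
    tail: str


def set_novelty(target_items, candidate_items):
    """Assumed difference parameter: share of target items without a match."""
    target = set(target_items)
    if not target:
        return 0.0
    return len(target - set(candidate_items)) / len(target)


def combined_novelty(entity_score, relation_score, entity_weight=0.5):
    """Assumed combination: weighted average of the two novelty scores."""
    return entity_weight * entity_score + (1.0 - entity_weight) * relation_score
```

For example, a target text with two binary relations of which one also occurs in the candidate text has a binary relation novelty of 0.5 under this definition.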
In a possible implementation, the method further comprises:
extracting a target ternary relation set from the target text, the target ternary relation set comprising multiple ternary relations, a ternary relation comprising two binary relations that share a same entity;
obtaining a candidate ternary relation set comprising multiple ternary relations in the candidate text;
determining a first ternary relation intersection of the target ternary relation set and the candidate ternary relation set, the first ternary relation intersection comprising the ternary relations that match between the target ternary relation set and the candidate ternary relation set;
the determining the novelty of the target text relative to the candidate text according to the first entity novelty and the first binary relation novelty comprises:
determining a first ternary relation novelty according to a difference parameter between the first ternary relation intersection and the target ternary relation set; and
determining the novelty of the target text relative to the candidate text according to the first entity novelty, the first binary relation novelty and the first ternary relation novelty.
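Since a ternary relation is defined as two binary relations that share an entity, the ternary relation set can be derived from the binary relation set. The sketch below assumes binary relations are (head, relation, tail) tuples of strings and represents each ternary relation as an ordered pair of binary relations; the function name is hypothetical.

```python
from itertools import combinations


def ternary_relations(binary_relations):
    """Pair up binary relations (head, relation, tail) that share an entity."""
    result = set()
    # Sorting makes the order inside each pair deterministic.
    for first, second in combinations(sorted(set(binary_relations)), 2):
        shared = {first[0], first[2]} & {second[0], second[2]}
        if shared:
            result.add((first, second))
    return result
```

The resulting set can then be intersected with the candidate ternary relation set in the same way as the entity and binary relation sets.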
In a possible implementation, the extracting multiple target entities from the target text comprises:
inputting the target text into an entity extraction model, and identifying the multiple target entities in the target text by means of the entity extraction model.
In a possible implementation, the extracting multiple binary relations from the target text comprises:
inputting the target text in which the target entities have been identified into a relationship extraction model, and extracting the binary relations between the target entities by means of the relationship extraction model.
In a possible implementation, the method further comprises:
giving the target text a structured representation according to the relationships between the target entities, to generate a target structure.
In a possible implementation, the target structure comprises nodes and edges, the nodes representing the target entities and the edges representing the relationships between the target entities.
In a possible implementation, each candidate text in the candidate text set is a structured candidate structure and the target text is a target structure, and the method further comprises:
extracting a candidate entity set of a candidate map, the candidate map comprising at least one candidate structure;
the determining the entity intersection of the target entity set and the candidate entity set comprises:
determining a second entity intersection of the target entity set and the candidate entity set of the candidate map;
the method further comprises:
determining the novelty of the target text relative to the candidate map according to a difference parameter between the second entity intersection and the target entity set.
In a possible implementation, when the candidate map comprises at least two candidate structures, the at least two candidate structures being a first candidate structure and a second candidate structure:
determining an associated entity of the first candidate structure and the second candidate structure; and
associating the first candidate structure with the second candidate structure through the associated entity to obtain the candidate map.
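One way to picture the candidate map is as a graph obtained by merging candidate structures on the entities they share. In the sketch below each structure is assumed to be an adjacency map from an entity name to the set of its neighbouring entities, and an associated entity is assumed to be an entity name that occurs in both structures; neither representation is prescribed by the text.

```python
def merge_structures(first, second):
    """Merge two adjacency maps; shared entity names link the two structures.

    Returns the merged adjacency map and the set of associated entities.
    """
    associated = set(first) & set(second)
    merged = {}
    for structure in (first, second):
        for entity, neighbours in structure.items():
            merged.setdefault(entity, set()).update(neighbours)
    return merged, associated
```

If `associated` comes back empty, the two structures stay disconnected inside the merged map.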
In a possible implementation, the method further comprises:
extracting multiple binary relations from the target structure to obtain a target binary relation set;
locating the two target entities contained in each target binary relation in the target binary relation set at the two corresponding entity positions in the candidate map;
calculating the distance between the two entity positions corresponding to each target binary relation;
determining, according to the distance, a second binary relation novelty of each target binary relation relative to the candidate map;
the determining the novelty of the target text relative to the candidate map according to the difference parameter between the second entity intersection and the target entity set comprises:
determining a second entity novelty according to the difference parameter between the second entity intersection and the target entity set; and
determining the novelty of the target structure relative to the candidate map according to the second entity novelty and the second binary relation novelty.
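The distance step can be illustrated with a breadth-first search over the candidate map. The sketch assumes the candidate map is an adjacency map from entity name to neighbouring entity names and that distance means the number of edges on the shortest path. The mapping from distance to novelty (directly connected entities score 0, unreachable or missing entities score 1, capped at `max_distance`) is purely an assumption made for the example.

```python
from collections import deque


def shortest_distance(graph, start, goal):
    """Number of edges on the shortest path, or None if unreachable."""
    if start not in graph or goal not in graph:
        return None
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == goal:
            return dist
        for neighbour in graph.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None


def distance_novelty(graph, head, tail, max_distance=4):
    """Assumed scoring: the farther apart the two entities, the more novel."""
    dist = shortest_distance(graph, head, tail)
    if dist is None:
        return 1.0
    return min(max(dist - 1, 0), max_distance) / max_distance
```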
In a possible implementation, the method further comprises:
obtaining a candidate binary relation set comprising multiple binary relations in the candidate map;
determining a second binary relation intersection of the target binary relation set and the candidate binary relation set;
determining a first binary relation novelty according to a difference parameter between the second binary relation intersection and the target binary relation set;
determining a binary relation novelty according to the first binary relation novelty, the second binary relation novelty and their corresponding weights;
the determining the novelty of the target structure relative to the candidate map according to the second entity novelty and the second binary relation novelty comprises:
determining the novelty of the target structure relative to the candidate map according to the second entity novelty and the binary relation novelty.
In a possible implementation, the method further comprises:
extracting multiple ternary relations from the target structure to obtain a target ternary relation set;
locating the target entities contained in each target ternary relation in the target ternary relation set at the three corresponding entity positions in the candidate map;
calculating the distance between any two of the three entity positions;
determining, according to the distance, a second ternary relation novelty of each target ternary relation relative to the candidate map;
the determining the novelty of the target structure relative to the candidate map according to the second entity novelty and the second binary relation novelty comprises:
determining the novelty of the target structure relative to the candidate map according to the second entity novelty, the second binary relation novelty and the second ternary relation novelty.
In a possible implementation, the method further comprises:
obtaining a candidate ternary relation set comprising multiple ternary relations in the candidate map;
determining a second ternary relation intersection of the target ternary relation set and the candidate ternary relation set;
determining a first ternary relation novelty according to a difference parameter between the second ternary relation intersection and the target ternary relation set;
determining a ternary relation novelty according to the first ternary relation novelty, the second ternary relation novelty and their corresponding weights;
the determining the novelty of the target structure relative to the candidate map according to the second entity novelty and the binary relation novelty comprises:
determining the novelty of the target structure relative to the candidate map according to the second entity novelty, the binary relation novelty and the ternary relation novelty.
In a possible implementation, the extracting multiple binary relations from the target text comprises:
obtaining an entity relation data set, the entity relation data set being obtained from the entities in a text set and the relationships between those entities; the entity relation data set (entity relation matrix) comprises N entities and the relationships between the N entities, N being greater than or equal to 2;
querying the entity relation data set to obtain M second entities that have a relationship with a first entity, M being less than or equal to N;
searching for the second entities within a preset range in the target text; and
if at least one target second entity among the M second entities is found, establishing a relationship between the first entity and the target second entity.
In a possible implementation, before the searching for the second entities within the preset range in the target text, the method further comprises:
creating an entity matching window; and
determining the preset range in the target text according to the size of the entity matching window.
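The window-based lookup can be sketched as follows. The sketch assumes the target text has already been split into tokens with each entity mention as a single token, that the entity relation data set is a dictionary from (first entity, second entity) to a relation label, and that the entity matching window is measured in tokens on either side of the first entity. All names and the default window size are hypothetical.

```python
def relations_in_window(tokens, known_relations, window=5):
    """Link entity mentions that co-occur within `window` tokens.

    known_relations maps (first_entity, second_entity) to a relation label.
    Returns a set of (first_entity, label, second_entity) triples.
    """
    related = {}
    for (first, second), label in known_relations.items():
        related.setdefault(first, {})[second] = label
    found = set()
    for i, token in enumerate(tokens):
        if token not in related:
            continue  # not a first entity in the data set
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i and tokens[j] in related[token]:
                found.add((token, related[token][tokens[j]], tokens[j]))
    return found
```

A larger window finds more relations but may also link entities that merely appear in the same passage.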
In a second aspect, an embodiment of the present application provides an apparatus for determining text novelty, comprising:
a first determining module, configured to determine a target text;
an extraction module, configured to extract multiple target entities from the target text determined by the first determining module, to obtain a target entity set;
an obtaining module, configured to obtain a candidate entity set for each candidate text in a candidate text set;
a second determining module, configured to determine a first entity intersection of the target entity set extracted by the extraction module and the candidate entity set obtained by the obtaining module, the first entity intersection being the entities that match between the target entity set and the candidate entity set; and
a novelty determining module, configured to determine the novelty of the target text relative to the candidate text according to a difference parameter between the first entity intersection determined by the second determining module and the target entity set extracted by the extraction module.
In a third aspect, an embodiment of the present application provides an electronic device, comprising:
a memory and a processor;
the memory and the processor being communicatively connected to each other, the memory storing computer instructions, and the processor executing the computer instructions so as to perform the method of the first aspect.
In a fourth aspect, an embodiment of the present application provides a computer storage medium, the computer-readable storage medium storing computer instructions, the computer instructions being used to cause a computer to perform the method of the first aspect.
In this embodiment, the target text whose novelty is to be determined is determined first; the target text may be a patent. Multiple target entities are then extracted from the target text to obtain a target entity set, and a candidate entity set is obtained for each candidate text in the candidate text set. Each candidate text is traversed, and the first entity intersection of the target entity set and the candidate entity set of each candidate text is determined, the first entity intersection being the entities that match between the target entity set and the candidate entity set. Finally, the novelty of the target text relative to the candidate text is determined according to the difference parameter between the first entity intersection and the target entity set. This embodiment takes into account all target entities in the target text and all candidate entities in each candidate text. In the prior art, novelty is determined by keyword matching using only keywords chosen subjectively by the user, so the result is affected by the user's subjective understanding. The method provided by the embodiments of the present application is more objective and reflects the actual content of the target text and the candidate text, so the novelty calculation is more accurate.
Brief description of the drawings
The features and advantages of the present invention will be understood more clearly with reference to the accompanying drawings, which are schematic and should not be construed as limiting the invention in any way. In the drawings:
Fig. 1 is a schematic flowchart of the steps of an embodiment of a method for training a structural model in the embodiments of the present application;
Fig. 2 is a schematic flowchart of the steps of an embodiment of a text structuring method in the embodiments of the present application;
Fig. 3 is a schematic diagram of the target structure in the embodiments of the present application;
Fig. 4 is a schematic diagram of the image structure in the embodiments of the present application;
Fig. 5 is a schematic flowchart of the steps of an embodiment of a method for determining text similarity in the embodiments of the present application;
Fig. 6 is a schematic diagram of the Word2vec model training process in the embodiments of the present application;
Fig. 7 is a schematic flowchart of the steps of an embodiment of a method for determining text novelty in the embodiments of the present application;
Fig. 8 is a schematic diagram of the candidate map in the embodiments of the present application;
Fig. 9 is a schematic flowchart of the steps of an embodiment of a method for obtaining image information in the embodiments of the present application;
Fig. 10 is a schematic diagram of the description of the drawings and the drawings in a candidate text in the embodiments of the present application;
Fig. 11 is a schematic topology diagram of the first candidate image and the second candidate image in the embodiments of the present application;
Fig. 12 is a schematic flowchart of the steps of an embodiment of a method for obtaining entity information in the embodiments of the present application;
Fig. 13 is a schematic structural diagram of an embodiment of an apparatus for determining text novelty in the embodiments of the present application;
Fig. 14 is a schematic structural diagram of another embodiment of an apparatus for determining text novelty in the embodiments of the present application;
Fig. 15 is a schematic structural diagram of an embodiment of an electronic device in the embodiments of the present application.
Detailed description
To make the objectives, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some of the embodiments of the present invention, not all of them.
The embodiments of the present application provide a text structuring method. The text in the embodiments includes, but is not limited to, technical literature, patent documents and academic papers. After a text is given a structured representation, the resulting structured information (for example a structure diagram) helps the user understand the content of the text. Alternatively, the structured information can be used as a search query for retrieving information. Taking patent documents as an example, most current patent search methods are based on text retrieval, which focuses on matching text characters, lacks an understanding of user needs and of patent content, and does not search on the basis of content understanding. With the method provided in the embodiments of the present application, a patent text is represented in structured form, so that retrieval can be performed on the basis of an understanding of the patent content, which improves retrieval accuracy.
The text structuring method provided in the embodiments of the present application is applied to an electronic device, which may be a server or a terminal device; terminal devices include, but are not limited to, computers, mobile phones and palmtop computers. The electronic device obtains a target text to be structured (for example, a patent) and inputs it into a trained entity extraction model, which identifies the entities in the target text. The target text with its entities identified is then input into a trained relationship extraction model, which extracts the relationships between the entities. According to the entities and the relationships between them, the target text is given a structured representation, generating structured information (also called a structured text representation), for example a structure diagram or a flowchart. In the embodiments of the present application, the entities in the target text are extracted by the trained entity extraction model, the relationships between the entities are extracted by the trained relationship extraction model, and the structured text representation is generated automatically from the entities and their relationships. This aids understanding of the text content, converts quickly, and saves labor costs.
To facilitate understanding of the text structuring method provided in the embodiments of the present application, the terms used in the embodiments are first explained:
Entity: a word that expresses a feature in a text (such as a patent or a paper). In technical literature such as patents and papers, an entity is a word that expresses a technical feature. Entities include components, attributes and attribute values.
Component: a constituent part described in the text, such as a charging device or a memory.
Attribute: an attribute of a component, such as the "voltage" of a charging device.
Attribute value: the value of an attribute of a component; for example, the voltage of a charging device is "240 V".
Relationship between entities: the relationship between technical features, specifically the relationship between components, the relationship between a component and an attribute, or the relationship between an attribute and an attribute value.
The relationships are as follows:
1) Types of relationship between components include, but are not limited to:
Inclusion relationship; for example, a charging pile includes a control unit.
Connection relationship; for example, a humidity control apparatus is connected to a cooling fan.
2) Relationship between a component and an attribute:
A component has a certain attribute; for example, a charging device has a voltage attribute.
3) Relationship between an attribute of a component and an attribute value:
An attribute has a specific attribute value; for example, the voltage "is" 240 V.
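The three entity types and the three kinds of relationship defined above can be written down as a small data model. The enum and the table of allowed type pairs below are an illustrative encoding of those definitions; the names are hypothetical and do not appear in the source.

```python
from enum import Enum


class EntityType(Enum):
    COMPONENT = "component"
    ATTRIBUTE = "attribute"
    ATTRIBUTE_VALUE = "attribute value"


# Relationship kinds listed in the definitions, keyed by (head type, tail type).
ALLOWED_RELATIONS = {
    (EntityType.COMPONENT, EntityType.COMPONENT),        # includes / connects
    (EntityType.COMPONENT, EntityType.ATTRIBUTE),        # component has attribute
    (EntityType.ATTRIBUTE, EntityType.ATTRIBUTE_VALUE),  # attribute has value
}


def is_valid_relation(head_type, tail_type):
    """True if the definitions allow a relationship between these two types."""
    return (head_type, tail_type) in ALLOWED_RELATIONS
```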
Embodiment 1
With reference to Fig. 1, the text structuring method provided in the embodiments of the present application is described in detail below. The method mainly comprises two parts: the first part is training the structural model, and the second part is giving the text a structured representation.
First, training the structural model:
The structural model includes an entity extraction model for extracting entities and a relationship extraction model for extracting the relationships between the entities. The training method comprises the following steps:
Step 101: obtain an annotated first corpus set, the first corpus set being obtained by annotating each text in a first text set with entity labels according to a first preset rule.
The first text set includes, but is not limited to, technical literature, patents and academic papers; in the embodiments of the present application the first text set is illustrated using patents as an example. For example, the first text set may contain 10,000 patents. It should be noted that the number of patents in the first text set is merely an example and not a limitation.
The first corpus set is obtained by annotating each text in the first text set with entity labels according to the first preset rule. The first preset rule is: label differently the first vocabulary, which represents entities, and the second vocabulary, which represents non-entities.
Specifically, take part of a patent in the first text set as an example.
The text is: "A high-mounted automobile brake light, characterized in that it comprises a rectangular installation seat plate (1), the installation seat plate (1) is provided with a matching shell frame (2), and multiple partitions (3) are provided in the shell frame (2)". This text is annotated into the following format:
"a high-mounted automobile brake light, comprising a rectangular/pre peace/start dress/in seat/in plate/end (/after 1), the/pre peace/start dress/in seat/in plate/end (/after 1) is provided with a matching/pre outer/start shell/in frame/end (/after 2), the/pre outer/start shell/in frame/end (/after 2) is provided inside with multiple/pre every/start plate/end (3), each/pre every/start plate/end has one/pre axis/entity"
In the original Chinese text each character of an entity carries its own label, and the labeled tokens above are literal glosses of single characters. For example, peace/start dress/in seat/in plate/end are the four characters of "installation seat plate", outer/start shell/in frame/end are the three characters of "shell frame", and every/start plate/end are the two characters of "partition".
The first preset rule is specifically as follows. The first identifier (for example /start) marks the first character of an entity, the second identifier (for example /end) marks the last character of an entity, and the third identifier (for example /in) marks the characters of the entity located between the first identifier /start and the second identifier /end. The fourth identifier (for example /entity) marks an entity that consists of only one character. The fifth identifier (for example /pre) marks the character before the first identifier /start. The sixth identifier (for example /after) marks the character after the second identifier /end. All other characters that are not part of an entity name are given the unified seventh identifier (for example /w).
For example, "comprising a rectangular installation seat plate (1)" is labeled character by character as: packet/w includes/w square/w shape/w of/pre peace/start dress/in seat/in plate/end (/after 1/w )/w.
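The labeling rule can be sketched as a function that turns known entity spans into a label sequence. The patent labels individual Chinese characters; the sketch uses English word tokens for readability. It also assumes that entity labels take precedence when a token is both part of one entity and adjacent to another, which the text does not specify. The function name and the span format are hypothetical.

```python
def label_tokens(tokens, entity_spans):
    """Label tokens with start/in/end/entity/pre/after/w.

    entity_spans is a list of (start, end) index pairs, end inclusive.
    """
    labels = ["w"] * len(tokens)
    # First pass: context labels around each entity.
    for start, end in entity_spans:
        if start > 0:
            labels[start - 1] = "pre"
        if end + 1 < len(tokens):
            labels[end + 1] = "after"
    # Second pass: entity labels, which override context labels (assumption).
    for start, end in entity_spans:
        if start == end:
            labels[start] = "entity"
            continue
        labels[start] = "start"
        labels[end] = "end"
        for i in range(start + 1, end):
            labels[i] = "in"
    return labels
```

With the tokens `comprising a rectangular installation seat plate ( 1 )` and the span (3, 5), the result follows the same pattern as the example above: w, w, pre, start, in, end, after, w, w.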
It should be noted that the identifiers used for corpus annotation in the embodiments of the present application are merely illustrative and do not limit the embodiments of the present application.
Step 102: train on the first corpus set to obtain the entity extraction model.
The first corpus set is trained using a conditional random field (CRF) model to obtain model parameters, and the entity extraction model is constructed from the model parameters.
A CRF can label individual Chinese characters, that is, build words from characters. It takes into account not only the frequency with which words occur in the text but also the context, and it has good learning ability, so it performs well in recognizing both ambiguous words and out-of-vocabulary words.
Step 103: use a second text set as the input of the entity extraction model, and identify the entity information in the second text set by means of the entity extraction model.
The second text set is also a set of patents.
For example, part of a patent in the second text set reads:
"A battery monitoring and management apparatus, comprising a battery pack (1), a monitoring module (2), a CPU processor (3) and a display (4)". Parsing this text with the entity extraction model gives the following character-by-character result (each token is the literal gloss of one Chinese character):
one/w kind/w electricity/w pond/w prison/w survey/w pipe/w reason/w dress/w sets/w ,/w packet/w includes/w electricity/start pond/in group/end (/w 1/w )/w ,/w prison/start survey/in mould/in block/end (/w 2/w )/w ,/w CPU/start at/in reason/in device/end (/w 3/w )/w and/w obvious/start shows/in device/end (/w 4/w )/w
Four component names are extracted from the above example text: battery pack, monitoring module, CPU processor and display.
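Reading the entity names back out of such a labeled sequence is a simple scan over the labels. The sketch below assumes the model output is available as (token, label) pairs using the labels start, in, end, entity and w, and joins tokens with a space because the example uses English tokens (for Chinese characters the joiner would be an empty string). The function name is hypothetical.

```python
def decode_entities(tagged, joiner=" "):
    """Collect entity names from (token, label) pairs, in order of appearance."""
    entities, current = [], []
    for token, label in tagged:
        if label == "entity":          # single-token entity
            entities.append(token)
            current = []
        elif label == "start":         # first token of an entity
            current = [token]
        elif label == "in" and current:
            current.append(token)
        elif label == "end" and current:
            current.append(token)
            entities.append(joiner.join(current))
            current = []
        else:                          # any other label ends an open span
            current = []
    return entities
```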
Step 104: obtain an annotated second corpus set, the second corpus set being obtained by annotating each text in the second text set with relationship labels and entity labels according to a second preset rule.
After the entity extraction model has extracted the components, relationship annotation is performed, and the result is converted into the corpus format of the CRF model for training.
The second preset rule is: label differently the first vocabulary, which represents entities, the third vocabulary, which represents relationships, and the vocabulary that represents neither entities nor relationships.
Specifically, an example is as follows:
Example: the installation seat plate (1) is provided with a matching shell frame (2)
Relationship annotation is applied to this text, giving:
"institute/w states/w installation seat plate/e (/w 1/w )/w upper/w sets/r_start has/r_end and/w it/w phase/w match/w with/w of/w shell frame/e (/w 2/w )/w".
As before, the tokens other than the entity names are literal glosses of single Chinese characters; sets/r_start has/r_end are the two characters of the relationship word "is provided with".
Here, the seventh identifier (for example /w) marks an ordinary character, the eighth identifier (for example /e) marks a component identified by the entity extraction model, the ninth identifier (for example /r_start) marks the first character of a relationship, and the tenth identifier (for example /r_end) marks the last character of a relationship.
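A labeled sequence of this kind can be turned into a binary relation by collecting the two /e tokens and the span from /r_start to /r_end. The sketch assumes (token, label) pairs with the labels e, r_start, r_end and w, exactly two entities per sequence, and that any tokens lying between r_start and r_end belong to the relationship; the text does not define a label for such middle tokens. English tokens are used for readability and the function name is hypothetical.

```python
def decode_relation(tagged, joiner=" "):
    """Return (head, relation, tail) from (token, label) pairs, or None."""
    entities = [token for token, label in tagged if label == "e"]
    relation, inside = [], False
    for token, label in tagged:
        if label == "r_start":
            relation, inside = [token], True
        elif label == "r_end" and inside:
            relation.append(token)
            inside = False
        elif inside:
            relation.append(token)  # assumed: middle tokens are part of the relation
    if len(entities) != 2 or not relation or inside:
        return None
    return entities[0], joiner.join(relation), entities[1]
```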
It should be noted that the relationships identified in the embodiments of the present application are relationships between the entities recognized by the entity extraction model. In the examples of the embodiments, the entity extraction model identifying components is merely illustrative; the entity extraction model may also identify attributes and attribute values, which are simply not illustrated one by one in the embodiments. The examples shown in the embodiments of the present application therefore do not limit the present application.
Step 105: train on the second corpus set to obtain the relationship extraction model.
The second corpus set is trained using a CRF model to obtain model parameters, and the relationship extraction model is built from these model parameters. The model parameters include: a regularization parameter a, set to L2, which gives a better fit than L1; a hyper-parameter c, which may be set to 3 so that the training data is fitted as closely as possible; and a threshold f for the features that take part in training, set to 3, meaning that a word occurring fewer than f times does not take part in training.
For example, the relationship between entities extracted from the above text is: "installation seat plate" is provided with "shell frame".
In the embodiments of the present application, the annotated first corpus set is obtained; the first corpus set is obtained by performing entity corpus annotation on every text in the first text set according to the first preset rule. The first corpus set is then trained to obtain the entity extraction model, which is used to extract the entities in a text. Next, the second text set is used as the input of the entity extraction model, and the entity information in the second text set is identified by the entity extraction model. The annotated second corpus set is obtained and trained to obtain the relationship extraction model, which is used to extract the relationships between entities. The entities and the relationships between them are used to represent a text in structured form.
On the basis of the above embodiments, the entity extraction model in the embodiments of the present application includes at least two entity extraction submodels, namely a first entity extraction submodel and a second entity extraction submodel. Training the first corpus set to obtain the entity extraction model may specifically include:
Training the first corpus set to obtain the first entity extraction submodel;
Using a third text set as the input of the first entity extraction submodel, and identifying the target entity set in the third text set by means of the first entity extraction submodel;
Training the target entity set to obtain the second entity extraction submodel.
In the embodiments of the present application, no entity dictionary needs to be prepared in advance. Initially, only a certain amount of corpus (such as the first corpus set) is annotated to train the first entity extraction submodel. The first entity extraction submodel then identifies the target entity set in the third text set, and this target entity set serves as newly annotated corpus on which the second entity extraction submodel is trained. The second entity extraction submodel can cover more entities, and an entity dictionary is generated from them. Through recognition by multiple entity extraction submodels, the entity dictionary comes to include more and more entities. For example, the entity vocabulary extracted from all patents is pooled to form the entity dictionary, which may include two columns: entity and frequency. The frequency is the number of patents that contain the component, for example: mounting seat, 3; shell frame, 4. By annotating a certain amount of entity corpus and continually training entity extraction submodels, the embodiments of the present application cover more entities through multiple entity extraction submodels, which greatly improves the accuracy of identifying entities in a text.
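The two-column entity dictionary (entity plus frequency, where the frequency is the number of patents containing the entity) can be sketched as below. The function name and the sample data are illustrative assumptions; the entity lists stand in for the output of the entity extraction submodels.

```python
from collections import Counter


def build_entity_dictionary(entities_per_patent):
    """Count, for each entity, how many patents contain it at least once."""
    frequency = Counter()
    for entities in entities_per_patent:
        # set() ensures a patent is counted once even if it repeats an entity.
        frequency.update(set(entities))
    return frequency


# Hypothetical extraction results for three patents.
extracted = [
    ["mounting seat", "shell frame", "shell frame"],
    ["mounting seat", "shell frame"],
    ["shell frame", "partition"],
]
dictionary = build_entity_dictionary(extracted)
for entity, count in dictionary.most_common():
    print(f"{entity}, {count}")
```

The same counting applies to the relationship dictionary (relationship plus frequency) described for the relationship extraction submodels.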
Similarly, the relationship extraction model in the embodiments of the present application includes at least two relationship extraction submodels, namely a first relationship extraction submodel and a second relationship extraction submodel. Training the second corpus set to obtain the relationship extraction model may specifically include:
Training the second corpus set to obtain the first relationship extraction submodel;
Using a fourth text set as the input of the first relationship extraction submodel, and identifying the target relationship set in the fourth text set by means of the first relationship extraction submodel;
Training the target relationship set to obtain the second relationship extraction submodel.
In the embodiments of the present application, no entity relationship dictionary needs to be prepared in advance. Initially, only a certain amount of relationship corpus (such as the second corpus set) is annotated to train the first relationship extraction submodel. The first relationship extraction submodel then identifies the target relationship set in the fourth text set, and this target relationship set serves as newly annotated relationship corpus on which the second relationship extraction submodel is trained. The second relationship extraction submodel can cover more relationships, and a relationship dictionary is generated from them. Through recognition by multiple relationship extraction submodels, the relationship dictionary comes to include more and more relationships. For example, the relationship vocabulary extracted from all patents is pooled to form the relationship dictionary, which may include two columns: relationship and frequency. The frequency is the number of patents that contain the relationship, for example: includes, 10; is provided with, 20. By annotating a certain amount of relationship corpus and iteratively training relationship extraction submodels, the embodiments of the present application cover more relationships through multiple relationship extraction submodels, which greatly improves the accuracy of identifying relationships in a text.
Next, the text is represented in structured form.
After steps 101 to 105 in the above example have been performed to obtain the entity extraction model and the relationship extraction model, a target text can be represented in structured form by means of these two models. Referring to Fig. 2, an embodiment of the present application provides a text structuring method, which may include the following steps:
Step 201: obtain the target text to be structured.
The target text to be structured is obtained; for example, the target text may be a patent.
Step 202: input the target text into the entity extraction model, and identify the target entity set in the target text by means of the entity extraction model. The entity extraction model is obtained by training the first corpus set, and the first corpus set is obtained by performing entity corpus annotation on every text in the first text set.
First, the target text is input into the entity extraction model, which identifies the target entity set in the target text. For example, the target text includes the following content: "A high-mounted automobile brake lamp, characterized by comprising a rectangular installation seat plate (1), the installation seat plate (1) being provided with a matching shell frame (2), and a plurality of partitions (3) being provided in the shell frame (2)". The entity extraction model outputs the target entity set of this target text as "installation seat plate, shell frame, partition".
Step 203: input the target text in which the target entity set has been identified into the relationship extraction model, and extract the relationships between the target entities by means of the relationship extraction model.
The target text in which the target entities have been identified is input into the relationship extraction model, which outputs the relationships between the target entities. For example, the relationships between the entities are: the installation seat plate is provided with the shell frame; the shell frame is provided with the partition.
Step 204: according to the entities and the relationships between the target entities, represent the target text in structured form and generate a target structure.
Referring to Fig. 3, which is a schematic diagram of the target structure: the generated target structure includes nodes and edges. A node represents an entity, and an entity may be a component, an attribute or an attribute value. An edge represents a relationship between entities; the relationships between entities include relationships between components, relationships between a component and an attribute, and relationships between an attribute and an attribute value.
For example, the entities and relationships extracted from a patent are as follows:
Brake lamp includes installation seat plate
Brake lamp includes grating plate
Brake lamp includes LED light
Installation seat plate is provided with shell frame
Shell frame is provided with partition
Shell frame is provided with installation cavity
Merging the result of entity extraction from the target text with the result of entity relationship extraction gives the structure chart of the entire target text (the target structure shown in Fig. 3).
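Merging the extracted entities and relationships into a node-and-edge structure can be sketched as follows. The triple layout (head, relationship, tail) and the function name are illustrative assumptions; the triples themselves are the ones listed in the example.

```python
from collections import defaultdict

# (head entity, relationship, tail entity) triples from the example patent.
triples = [
    ("brake lamp", "includes", "installation seat plate"),
    ("brake lamp", "includes", "grating plate"),
    ("brake lamp", "includes", "LED light"),
    ("installation seat plate", "is provided with", "shell frame"),
    ("shell frame", "is provided with", "partition"),
    ("shell frame", "is provided with", "installation cavity"),
]


def build_structure(triples):
    """Return (nodes, edges): nodes are entities, edges are labelled links."""
    nodes = set()
    edges = defaultdict(list)
    for head, relationship, tail in triples:
        nodes.update((head, tail))
        edges[head].append((relationship, tail))
    return nodes, dict(edges)


nodes, edges = build_structure(triples)
for head, links in edges.items():
    for relationship, tail in links:
        print(f"{head} --{relationship}--> {tail}")
```

A plain adjacency mapping is used here only to keep the sketch dependency-free; any graph representation with labelled edges would serve.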
In the embodiments of the present application, the entities in the target text are extracted by the trained entity extraction model, the relationships between those entities are extracted by the trained relationship extraction model, and a structured text representation is generated automatically from the entities and the relationships between them. Whether a text is a target text or a candidate text, it is made up of entities and the relationships between them. Extracting the entities and relationships from the text content helps in understanding that content, converts quickly, and saves labor cost.
In one application scenario, a user finds a target text (such as a patent). The patent is long or logically dense, and the user would need a great deal of time to understand its content unaided. The user can instead have an electronic device (such as a mobile phone) convert the patent into a structure chart. The mobile phone receives the patent and inputs it into the entity extraction model, which identifies the target entity set in the patent. The patent in which the target entity set has been identified is then input into the relationship extraction model, which extracts the relationships between the target entities. According to the target entities and the relationships between them, the target text is represented in structured form, a target structure is generated, and the terminal displays the target structure. Alternatively, the user can send the patent to a server through a terminal (such as a mobile phone); the server converts the patent into a target structure and sends it to the terminal, and the terminal displays the target structure. Converting the target text into a target structure in the embodiments of the present application helps the user understand the content of the target text and greatly reduces labor cost.
On the basis of the above embodiments, the present application also provides another embodiment. When the relationship extraction model extracts relationships between entities, the following situation may arise: the target entities appear in two different sentences, so that the relationship extraction model may fail to identify the relationship. For example, the text to be identified is "The battery pack is connected to the monitoring module; it is also connected to the CPU processor and the display." The relationship extraction model can identify "battery pack connected to monitoring module", that is, the relationship between the battery pack and the monitoring module. Because the CPU processor and the display are in another sentence, however, their relationship may not be identified.
To address this situation, in which related entities are located in different sentences and the relationship extraction model may fail to identify the relationship, the present application provides another embodiment:
The target text includes a first entity. After step 203 and before step 204, the method may further include the following steps:
Obtain an entity relationship data set, which is obtained from the relationships between entities extracted from a text set; the entity relationship matrix includes N entities and the relationships among the N entities, N being greater than or equal to 2;
Query the entity relationship data set to obtain M second entities that have a relationship with the first entity, M being less than or equal to N;
Search for the second entities within a preset range in the target text;
If at least one target second entity among the M second entities is found, establish a relationship between the first entity and the target second entity.
Specifically, the entity relationship data set is obtained first. The entity relationship data set is obtained from the relationships between entities extracted from a text set; the entity relationship matrix includes N entities and the relationships among the N entities, N being greater than or equal to 2.
The entity relationship data set may be obtained as follows:
The text set is input into the entity extraction model, which identifies the entity information in the text set. The text set can be understood as a set containing multiple texts; for example, the text set is a set of one hundred thousand patents. It should be noted that the number of texts in the text set is given by way of example and does not limit the embodiments of the present application.
The text set in which the entity information has been identified is input into the relationship extraction model, which extracts the relationships between entities in every text of the text set. The entity relationship data set includes the relationships between entities in every text of the text set.
The entity relationship data set is shown in matrix A below:
             | Brake lamp | Pedestal         | …… | LED light | …… | Lamp housing
Brake lamp   | 0          | Is provided with |    | 0         |    | 0
Pedestal     | 0          | 0                |    | Includes  |    | Connected
……           |            |                  |    |           |    |
LED light    | 0          | 0                |    |           |    | Connected
……           |            |                  |    |           |    |
Lamp housing | 0          | Connected        |    | Connected |    | 0
Then, the entity relationship data set is queried to obtain M second entities that have a relationship with the first entity, M being less than or equal to N.
For example, the first entity is "pedestal", and in the target text "pedestal" has no relationship with any other component. One likely explanation is that "pedestal" and the components related to it are in different sentences. It is then necessary to determine from the entity relationship data set which entities have a relationship with the first entity, and hence which entities the first entity may also be related to in the target text.
For example, the first entity is "pedestal". The second entities related to "pedestal" may be looked up in matrix A as follows:
Locate the "pedestal" row in matrix A to obtain the set S_a of all components related to "pedestal"; S_a contains: LED light, lamp housing. Locate the "pedestal" column in matrix A to obtain the set S_b of all components related to "pedestal"; S_b contains: brake lamp, lamp housing.
Set S = S_a + S_b, and S contains (s_0, s_1, s_2, …, s_k, …, s_n).
In the above example, set S contains (LED light, lamp housing, brake lamp).
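The row-and-column lookup above can be sketched as follows. The sparse dictionary representation and the function name are assumptions made for illustration; only the cells of matrix A that matter for the "pedestal" lookup are listed, and a missing cell stands for 0 (no relationship).

```python
# matrix[row_entity][column_entity] = relationship name; absent means 0.
matrix_a = {
    "brake lamp": {"pedestal": "is provided with"},
    "pedestal": {"LED light": "includes", "lamp housing": "connected"},
    "lamp housing": {"pedestal": "connected"},
}


def related_entities(matrix, entity):
    """Return (S_a, S_b, S): row hits, column hits, and their union."""
    s_a = [other for other, rel in matrix.get(entity, {}).items() if rel]
    s_b = [other for other, rels in matrix.items() if rels.get(entity)]
    # dict.fromkeys removes duplicates while keeping first-seen order.
    s = list(dict.fromkeys(s_a + s_b))
    return s_a, s_b, s


s_a, s_b, s = related_entities(matrix_a, "pedestal")
print(s_a)
print(s_b)
print(s)
```

Reading both the row and the column matters because matrix A is not symmetric: "pedestal" can be the head of one relationship and the tail of another.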
Further, the second entities are searched for within the preset range in the target text.
The preset range may be determined by the size of an entity matching window: the preset range in the target text is determined according to the size of the entity matching window, and the size of the entity matching window may be set in advance.
Starting from the position where the component occurs, the target second entities are searched for within g positions before and g positions after it. For example, the entity matching window starts from the position of "pedestal", and second entities are searched for within 10 characters before and 10 characters after it.
Finally, if at least one target second entity among the M second entities is found, a relationship is established between the first entity and the target second entity.
For example, suppose 3 entities are found in the range: brake lamp, folded piece and LED light. Two of these three entities match "LED light, brake lamp" in set S, so "LED light" and "brake lamp" are the target second entities. A relationship is then established between "pedestal" and each target second entity, and the type of this relationship is "has relationship".
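The window search and the resulting "has relationship" links can be sketched as below. This is a minimal sketch under stated assumptions: the window spans g characters before the first occurrence of the first entity and g characters after its end, matching is by plain substring search, and the function names and sample sentence are hypothetical.

```python
def search_window(text, first_entity, related, g=10):
    """Return the entities from `related` found within g characters of
    the first occurrence of `first_entity` in `text`."""
    pos = text.find(first_entity)
    if pos < 0:
        return []
    start = max(0, pos - g)
    end = min(len(text), pos + len(first_entity) + g)
    window = text[start:end]
    return [e for e in related if e != first_entity and e in window]


def establish_relationships(first_entity, targets):
    """Link the first entity to every target second entity."""
    return [(first_entity, "has relationship", t) for t in targets]


text = "The brake lamp is fixed to the pedestal; the LED light is mounted inside."
s = ["LED light", "lamp housing", "brake lamp"]  # set S from the matrix lookup

found = search_window(text, "pedestal", s, g=40)
links = establish_relationships("pedestal", found)
print(links)
```

The window size is the main tuning choice: the value of 10 in the text refers to characters of the original Chinese text, so a larger g is used for this English sample sentence.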
In this embodiment, the entity relationship data set is obtained and queried to obtain M second entities that have a relationship with the first entity, M being less than or equal to N. The second entities are then searched for within the preset range in the target text. If at least one target second entity among the M second entities is found, a relationship is established between the first entity and the target second entity. This resolves the situation in which the first entity and its related second entities are located in different sentences and the relationship extraction model may fail to identify the relationship.
Optionally, the target structure in the embodiments of the present application may be a text structure or an image structure. An image structure may be generated as follows:
First, target image information representing the entities is obtained.
Specifically, an image set may be obtained from internet data (such as related forums, patent databases and paper databases) and from local databases.
The text in each image of the image set is identified. If a target entity matches the text in the image set, the image information representing that target entity is selected from the image set. For example, the text in each image of the image set is identified. If the text in a first image (such as "engine") matches the text of a first target entity (such as "engine"), the text in a second image (such as "connecting rod") matches the text of a second target entity (such as "connecting rod"), and the text in a third image (such as "press mechanism") matches the text of a third target entity (such as "press mechanism"), then the first image, the second image and the third image are selected as the image information representing the first, second and third target entities.
Then, according to the target entities and the relationships between them, a target structure represented with image information is generated.
Referring to Fig. 4, which is a schematic diagram of an image structure: for example, the relationships among "engine", "connecting rod" and "press mechanism" are: "engine" is connected to "connecting rod", and "engine" is connected to "press mechanism". The image structure shown in Fig. 4 is generated from "engine", "connecting rod" and "press mechanism" and the connections between them. In this example, image information representing the target entities is obtained, an image structure is generated according to the target entities and the relationships between them, and the image structure is displayed. This presents the entities in the text and the relationships between them more vividly and makes the text content easier for the user to understand.
The method for training the entity extraction model and the relationship extraction model has been described in detail above; the entity extraction model and the relationship extraction model are applied below to represent text in structured form.
It should be noted that steps 101 to 105 and steps 201 to 204 may be executed by the same electronic device or by different electronic devices. Steps 101 to 105 precede step 201; once training of the entity extraction model and the relationship extraction model is complete, steps 101 to 105 need not be executed again and step 201 can be executed directly.
Embodiment 2
Referring to Fig. 5, an embodiment of the present application also provides a method for determining text similarity. The method in this example is applied to an electronic device, which may be a server or a terminal. The method may include the following steps:
Step 301: obtain a target text and a candidate data set. The candidate data set includes multiple arrays, each of which represents the semantic vector of one entity; the entities are contained in candidate texts.
The server may receive the target text sent by a terminal; for example, the target text may be a patent.
The server may obtain the candidate data set in at least the following two ways:
In the first possible implementation:
First, a text set is obtained. The text set includes n candidate texts, n being an integer greater than or equal to 2. The text set may be all the patents of one technical field in a patent database, or a subset of all the patents of one technical field in a patent database; for example, n may be one hundred thousand or one million.
Then, the entities in every one of the n candidate texts are extracted to obtain m entities. In this step, the entities in every candidate text may be extracted with the entity extraction model of Embodiment 1: every candidate text is input into the entity extraction model, which outputs the entities in that candidate text, giving m entities in total. Here m is an integer greater than or equal to 2; for example, m may be 10 million, 20 million, and so on.
A target matrix is determined according to the n candidate texts and the entities that every candidate text contains. For example, the target matrix B is as follows:
         | Entity 1 | …… | Entity j | …… | Entity m
Patent 1 | 1        |    | 0        |    | 0
Patent 2 | 0        |    | 3        |    | 4
……       |          |    |          |    |
Patent i | 0        |    | 1        |    | 1
……       | 0        |    | 0        |    |
Patent n | 6        |    | 1        |    | 1
Matrix B includes n rows and m columns. Each of the n rows represents one candidate text, and each of the m columns represents one entity, where B[i][j] = the number of times entity j occurs in patent i. For example, entity j occurs 3 times in patent 2, entity m occurs once in patent i, and so on.
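Building the count matrix B from per-text entity mentions can be sketched as follows. The function name, the small vocabulary and the sample mentions are illustrative assumptions; the two rows reproduce the "Patent 1" and "Patent 2" rows of the example matrix.

```python
def build_count_matrix(mentions_per_text, vocabulary):
    """Return B where B[i][j] is how often entity j occurs in text i."""
    column = {entity: j for j, entity in enumerate(vocabulary)}
    matrix = [[0] * len(vocabulary) for _ in mentions_per_text]
    for i, mentions in enumerate(mentions_per_text):
        for entity in mentions:
            matrix[i][column[entity]] += 1
    return matrix


# Hypothetical entity vocabulary (columns) and extracted mentions (rows).
vocabulary = ["seat plate", "LED light", "connector"]
mentions_per_text = [
    ["seat plate"],
    ["LED light"] * 3 + ["connector"] * 4,
]
B = build_count_matrix(mentions_per_text, vocabulary)
print(B)
```

With n in the hundreds of thousands and m in the millions, a real matrix of this kind would be stored sparsely; nested lists are used here only for readability.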
Finally, singular value decomposition is performed on the target matrix B to obtain the candidate data set.
Specifically, singular value decomposition of the target matrix B gives:
B = U Σ V^T
Matrix U is a matrix of n rows and k columns, in which each row represents the vector of one text (such as a patent).
Matrix Σ is the diagonal matrix of the singular values of matrix B, with k rows and k columns, where k is a specified value; for example, k may be 300.
Matrix V has k rows and m columns, in which each column represents the vector of one entity. In this example, the candidate data set is the matrix V, which may also be called the "candidate matrix".
An example of the matrix V is as follows:
Each column of matrix V represents the k-dimensional vector of one component, where each value V[i][j] is the projection of entity j onto the i-th dimension.
It should be noted that, in this example, the target matrix B and the matrix V are illustrative representations given only for convenience of description and do not limit the present application.
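As a reading aid, the dimensions of the truncated decomposition can be written out explicitly. In standard notation the k × m factor is the transpose of V; the text above refers to this k × m factor as "matrix V". The dimension subscripts and the symbols σ below are notation added here for illustration.

```latex
B_{n \times m} \;\approx\; U_{n \times k}\,\Sigma_{k \times k}\,V^{T}_{k \times m},
\qquad
\Sigma = \operatorname{diag}(\sigma_1, \dots, \sigma_k),
\quad
\sigma_1 \ge \sigma_2 \ge \dots \ge \sigma_k \ge 0
```

With k = 300, entity j is represented by the j-th column of the k × m factor, that is, by a 300-dimensional vector. The relation is approximate because only the k largest singular values are kept.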
In the second possible implementation:
The candidate data set may be obtained from a trained Word2vec model. The candidate data set includes the vectors of multiple entities, and the Word2vec model is trained on an entity corpus set. The entity corpus set may be obtained by the method described in step 101 of Embodiment 1, or by using the entity extraction model to extract entities from each text in the text set. The words in the entity corpus set are numbered in order from 1 to W, where W is an integer greater than 1. The entity corpus set is input into the Word2vec model. The maximum distance l between the current word and a predicted word within one sentence can be set; for example, l may be 5, 10, and so on, and l = 5 is used for illustration in this example. Referring to Fig. 6, which is a schematic diagram of the Word2vec model training process:
The Word2vec model includes an input layer, a middle layer and an output layer.
The input layer has d nodes, corresponding to d entities.
The middle layer has 300 nodes, and each input-layer node is connected by an edge to all 300 middle-layer nodes.
The output layer has d nodes, corresponding to d entities.
Each entity t in the entity corpus set is traversed: the serial number i of t is obtained, input-layer node [i] is set to 1, and the remaining input-layer nodes are set to 0.
The other words within distance 5 of t are obtained, together with their serial numbers a1, a2, a3, a4, a5. In the output layer, positions a1, a2, a3, a4 and a5 are set to 1, and the remaining positions are set to 0.
A gradient descent algorithm is called to calculate the weight of each edge.
After model training is complete, the list of weights on the 300 edges from any input-layer node i to the middle-layer nodes is the vector representing the i-th entity. The vectors of these entities constitute the candidate data set.
The candidate data set in this example includes the vectors of multiple entities. The entities extracted from each candidate text are input into the Word2vec model, the Word2vec model outputs the vector of each entity, and all the entity vectors obtained form the candidate data set.
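The construction of the input-layer and output-layer values for each entity t can be sketched as follows. This covers only the preparation of training samples, not the gradient descent step; the function name and sample sequence are illustrative assumptions, and indices are 0-based here whereas the text numbers words from 1.

```python
def training_samples(sequence, vocabulary, max_distance=5):
    """For each position, build (input, output) vectors of length d:
    the input is one-hot for the current entity, the output marks every
    entity that occurs within max_distance positions of it."""
    index = {word: i for i, word in enumerate(vocabulary)}
    d = len(vocabulary)
    samples = []
    for pos, word in enumerate(sequence):
        inputs = [0] * d
        inputs[index[word]] = 1
        outputs = [0] * d
        lo = max(0, pos - max_distance)
        hi = min(len(sequence), pos + max_distance + 1)
        for other in range(lo, hi):
            if other != pos:
                outputs[index[sequence[other]]] = 1
        samples.append((inputs, outputs))
    return samples


# Hypothetical entity sequence from one sentence; d = 4 entities.
sequence = ["pedestal", "LED light", "lamp housing", "brake lamp"]
samples = training_samples(sequence, vocabulary=sequence, max_distance=1)
for inputs, outputs in samples:
    print(inputs, outputs)
```

A distance of 1 is used in the usage example only to keep the printed vectors short; the text uses l = 5.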
Step 302: extract the target entity set in the target text. The entity set represented by the multiple arrays of the candidate data set includes the target entity set.
The embodiments of the present application are illustrated with the candidate data set obtained by the first implementation. Referring to the example of matrix V, each column of the matrix V represents one array, each array includes multiple elements, and each element is the projection of the entity onto one dimension.
The target entity set in the target text is extracted by the entity extraction model described in Embodiment 1 above. The target entity set contains all the target entities in the target text; for example, the target entities in the target text are entity 1 (such as seat plate) and entity j (such as LED light). The entity set represented by the multiple arrays of the candidate data set includes the target entity set; for example, the entity set represented by the vectors in matrix V (seat plate, ……, LED light, ……, connector) contains entity 1 and entity j of the target text. It should be noted that the entities and their number in the target text, and the entities and their number in the candidate data set, are examples given for convenience of description and do not limit the present application.
Step 303: according to the candidate data set, determine the angle value between the vector of each target entity in the target entity set and the vector of each entity in every candidate text, to obtain the entity similarity.
According to the entity vectors in the candidate data set, the angle value between the vector of each target entity and the vector of each entity in every candidate text is calculated. For example, the target entities in the target text are entity 1 and entity j, and the entities in a candidate text c are entity 2 and entity x. For the candidate text c, the following are calculated separately: the similarity of entity 1 and entity 2, the similarity of entity 1 and entity x, the similarity of entity j and entity 2, and the similarity of entity j and entity x.
Calculating the similarity of entity 1 and entity 2 is taken as an example:
In the first possible implementation:
The entity similarity (Rela) is the cosine of the angle between the two entity vectors.
For example, Rela(entity 1, entity 2) = the cosine of the angle between the vector of entity 1 (V1) and the vector of entity 2 (V2).
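Written out, the cosine of the angle between the two entity vectors is the standard formula below, where k is the vector dimension and V_{1,i} denotes the i-th component of V1 (the component notation is added here for illustration).

```latex
\mathrm{Rela}(\text{entity 1}, \text{entity 2})
  = \cos\theta
  = \frac{V_1 \cdot V_2}{\lVert V_1 \rVert \, \lVert V_2 \rVert}
  = \frac{\sum_{i=1}^{k} V_{1,i}\, V_{2,i}}
         {\sqrt{\sum_{i=1}^{k} V_{1,i}^{2}}\;\sqrt{\sum_{i=1}^{k} V_{2,i}^{2}}}
```

The value lies between -1 and 1, and equals 1 when the two vectors point in the same direction.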
In the second possible implementation: the target distance between the end point of the vector of each target entity and the end point of the vector of each entity in every candidate text is determined.
The entity similarity is obtained from two quantities: the cosine of the angle (denoted "Distance1") between the semantic vector of each target entity in the target entity set and the semantic vector of each entity in every candidate text, determined according to the candidate data set, and the target distance (denoted "Distance2").
Distance1 = the cosine of the angle between V1 and V2.
The similarity of entity 1 and entity 2 is Rela(entity 1, entity 2) = Distance1 * weight1 + Distance2 * weight2.
Here, weight1 is the weight of Distance1 and weight2 is the weight of Distance2. The default value of weight1 and weight2 may be 0.5, or the user may specify them according to the actual usage scenario, for example weight1 = 0.6 and weight2 = 0.4.
In this example, the similarity between any two entities is obtained from the cosine of the angle between the two vectors and the target distance between the end points of the two vectors. Both the angle between the two vectors and the positions of their end points are taken into account, and the user can set the weights of the cosine value and the target distance according to the actual application scenario, which improves the accuracy of calculating the similarity between entities.
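The weighted combination can be sketched as below. The text does not state how Distance2 is computed, so the Euclidean distance between the two end points is used here purely as an assumption; the function names are also hypothetical. The formula is applied literally as written, Distance1 * weight1 + Distance2 * weight2.

```python
import math


def cosine(v1, v2):
    """Cosine of the angle between two vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(v1, v2))
    norm1 = math.sqrt(sum(a * a for a in v1))
    norm2 = math.sqrt(sum(b * b for b in v2))
    if norm1 == 0 or norm2 == 0:
        return 0.0
    return dot / (norm1 * norm2)


def rela(v1, v2, weight1=0.5, weight2=0.5):
    """Rela = Distance1 * weight1 + Distance2 * weight2."""
    distance1 = cosine(v1, v2)
    # Assumption: end-point distance taken as the Euclidean distance.
    distance2 = math.dist(v1, v2)
    return distance1 * weight1 + distance2 * weight2


print(rela([1.0, 0.0], [1.0, 0.0]))
print(rela([1.0, 0.0], [0.0, 1.0], weight1=0.6, weight2=0.4))
```

Because a raw distance grows as vectors move apart while a cosine shrinks, how Distance2 is scaled before weighting has a large effect on the result; that scaling is not specified in the text and is left open here.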
Step 304, according to the entity similarity, determine that the target text is similar to each candidate target of text
Degree.
In the first implementation, for each candidate text, by each target entity in the target text
Entity similarity add up, obtain the first cumulative similarity;
According to the described first cumulative similarity, the target similarity of the target text with each candidate text is determined.
For example, in the example above, the target entities are entity 1 and entity j, and the entities in a candidate text c are entity 2 and entity x. For candidate text c, the following are calculated: the similarity of entity 1 and entity 2 (denoted "Re 1"), the similarity of entity 1 and entity x (denoted "Re 2"), the similarity of entity j and entity 2 (denoted "Re 3"), and the similarity of entity j and entity x (denoted "Re 4"). For that candidate text, the computed similarities ("Re 1", "Re 2", "Re 3" and "Re 4") are then accumulated to obtain the first cumulative similarity. Optionally, during the calculation, any similarity score below 50% (exclusive) is recorded as 0. In one implementation, the first cumulative similarity can be used as the similarity between the target text and the candidate text.
Optionally, for each candidate text, the similarity Sim1 between the entities in the target text and the candidate text is calculated:
Sim1 = first cumulative similarity / (total number of entities in the target text ∪ total number of entities in the candidate text). Sim1 can be used as the target similarity between the target text and the candidate text.
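A minimal sketch of the accumulation and normalisation described above. Two points are assumptions rather than statements from the text: the pairwise scores are hypothetical values, and the "union" of the two totals is taken as the larger of the two counts, which is how the document's later numeric examples for relations behave (12 and 14 give 14; 10 and 8 give 10).

```python
def first_cumulative_similarity(pair_scores, threshold=0.5):
    """Sum pairwise entity similarities; scores below the threshold count as 0."""
    return sum(score for score in pair_scores.values() if score >= threshold)


def sim1(pair_scores, n_target, n_candidate, threshold=0.5):
    """First cumulative similarity divided by the normaliser."""
    # Assumption: the normaliser is the larger of the two entity totals.
    denom = max(n_target, n_candidate)
    if denom == 0:
        return 0.0
    return first_cumulative_similarity(pair_scores, threshold) / denom


# Hypothetical values for Re 1 .. Re 4 (target entities 1 and j,
# candidate entities 2 and x).
scores = {
    ("entity 1", "entity 2"): 0.5,
    ("entity 1", "entity x"): 0.3,
    ("entity j", "entity 2"): 0.2,
    ("entity j", "entity x"): 0.75,
}
total = first_cumulative_similarity(scores)
value = sim1(scores, n_target=2, n_candidate=2)
```

A score of exactly 50% is kept, because the text excludes only values strictly below 50%.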
In this embodiment, the electronic device obtains a target text and a candidate data set. The candidate data set contains multiple arrays, each of which represents the semantic vector of one entity contained in a candidate text. The target entity set is then extracted from the target text; the set of entities represented by the arrays of the candidate data set includes the target entity set. Using the candidate data set, the cosine of the angle between the semantic vector of each target entity in the target entity set and the semantic vector of each candidate entity in each candidate text is determined, which gives the entity similarity. In this way the similarity between every target entity and every candidate entity in a candidate text can be calculated, and the target similarity between the target text and each candidate text is determined according to the entity similarity. Because this determination takes every entity in the target text and in the candidate text into account, the similarity better reflects how close the content of the target text is to the content of the candidate text. In the prior art, by contrast, the user subjectively chooses keywords and the similarity is determined by keyword matching, so the result depends on the user's subjective understanding. The method provided by the embodiments of the present application is more objective and expresses the actual content of the target text and the candidate text, so the similarity calculation is more accurate.
On the basis of the above example, before step 304, the method further includes the following steps:
Extract the relations between target entities in the target text;
Obtain the candidate relation set of each candidate text, and determine, according to the entity similarity, the relation similarity between each relation in the target relation set and each candidate relation in the candidate relation set;
In step 304, the target similarity between the target text and each candidate text is then determined according to the entity similarity and the relation similarity.
In the embodiments of the present application, the relations include binary relations, or binary relations through X-ary relations, where X is an integer greater than or equal to 3. A binary relation includes two entities and the relation between them. An X-ary relation includes X entities and at least (X-1) binary relations; each of the (X-1) binary relations includes an associated entity, and the (X-1) binary relations are connected through the associated entities.
For example, when X equals 3, the relations include binary relations and ternary relations; when X equals 4, the relations include binary, ternary and quaternary relations. For convenience of description, the embodiments of the present application use binary relations and ternary relations as examples.
Binary relations and ternary relations are illustrated below:
Binary relation: two entities and the relation between them, i.e. entity 1 + entity 2 + the relation between entity 1 and entity 2. For example: Brake lamp (entity 1) includes (relation) pedestal (entity 2).
Ternary relation: two binary relations, for example binary relation 1 and binary relation 2, that share one identical entity; the shared entity is the associated entity and connects the two binary relations. An example of a ternary relation is (Brake lamp-installation seat plate, installation seat plate-shell frame), where installation seat plate is the associated entity.
The methods for determining the similarity of binary relations and the similarity of ternary relations are described below:
Optionally, the binary relation between every two target entities in the target text is extracted to obtain the target binary relation set of the target text. For example, the target binary relation set is: (Brake lamp-installation seat plate, installation seat plate-shell frame, shell frame-partition, shell frame-installation cavity, Brake lamp-grating version, Brake lamp-LED light).
The candidate binary relation set of each candidate text is obtained. For example, the candidate binary relation set is: (Brake lamp-pedestal, pedestal-shell frame, shell frame-dust-proof plated film, Brake lamp-grating version, Brake lamp-LED light, LED light-lamp housing).
According to the entity similarity, the binary relation similarity between each binary relation in the target binary relation set and each candidate binary relation in the candidate binary relation set is determined. The binary relation similarity is the sum of three terms: the similarity between the first target entity of the target binary relation and the first candidate entity of the candidate binary relation, the similarity between the second target entity of the target binary relation and the second candidate entity of the candidate binary relation, and the similarity between the relation in the target binary relation and the relation in the candidate binary relation. As a formula:
Rela2(target binary relation, candidate binary relation) = Rela1(target entity 1, candidate entity 1) + Rela1(target entity 2, candidate entity 2) + R(target relation, candidate relation);
where R(relation 1, relation 2) = 1 if relation 1 equals relation 2, and R(relation 1, relation 2) = 0 if relation 1 does not equal relation 2.
For example, if the target binary relation is Brake lamp-installation seat plate and the candidate binary relation is Brake lamp-pedestal, then Rela2(Brake lamp-installation seat plate, Brake lamp-pedestal) = Rela1(Brake lamp, Brake lamp) + Rela1(installation seat plate, pedestal) + R(connection, connection).
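The Rela2 formula can be sketched as below. The representation of a binary relation as a (first entity, relation label, second entity) triple, the callable `rela1`, and the toy similarity value of 0.75 are all assumptions made for illustration; the document does not prescribe a data structure.

```python
def relation_match(r1, r2):
    """R(relation 1, relation 2): 1 if the relation labels are equal, else 0."""
    return 1.0 if r1 == r2 else 0.0


def rela2(target, candidate, rela1):
    """Binary relation similarity for (entity, relation, entity) triples.

    rela1 is any callable that returns the entity similarity of two names.
    """
    t_first, t_rel, t_second = target
    c_first, c_rel, c_second = candidate
    return (rela1(t_first, c_first)
            + rela1(t_second, c_second)
            + relation_match(t_rel, c_rel))


# Hypothetical entity similarities; identical names score 1.0.
TOY_SCORES = {("installation seat plate", "pedestal"): 0.75}


def toy_rela1(a, b):
    return 1.0 if a == b else TOY_SCORES.get((a, b), 0.0)


score = rela2(("Brake lamp", "connection", "installation seat plate"),
              ("Brake lamp", "connection", "pedestal"),
              toy_rela1)
```

Passing `rela1` in as a callable keeps the sketch independent of how entity similarity is actually computed.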
Further, the binary relation similarities of each binary relation in the target text are accumulated to obtain a second cumulative similarity. That is, each binary relation in the target text is compared once against every binary relation in the candidate text to calculate the similarity Rela2; entity similarity scores below 50% (exclusive) may be recorded as 0; and all the similarities are added together.
Further, for each candidate text, the similarity Sim2 between the binary relations of the target text and the candidate structure is calculated. Specifically, the union of the total number of binary relations in the target text and the total number of binary relations in the candidate text is calculated. For example, if the target text has 12 binary relations and the candidate text has 14, the union is 14. Sim2 is the ratio of the second cumulative similarity to this union, that is:
Sim2 = second cumulative similarity / (total number of binary relations in the target text ∪ total number of binary relations in the candidate text).
Further, on the basis of the above embodiments, the method may also include the following steps:
A target ternary relation set is determined according to the target binary relation set. The target ternary relation set contains multiple ternary relations; each ternary relation includes two binary relations that share an identical entity. For example, the target ternary relation set is: (Brake lamp-installation seat plate, installation seat plate-shell frame), (installation seat plate-shell frame, shell frame-partition), (installation seat plate-shell frame, shell frame-installation cavity).
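One way to derive the ternary relations from the binary relation set is sketched below. It assumes that the shared (associated) entity is the second entity of the first binary relation and the first entity of the second one; this chaining rule is an assumption, chosen because it reproduces the three ternary relations listed in the example. The function name `ternary_relations` is hypothetical.

```python
def ternary_relations(binary_relations):
    """Chain binary relations (a, b) and (b, c) through the shared entity b."""
    result = []
    for first in binary_relations:
        for second in binary_relations:
            # Assumption: only "tail of first == head of second" counts as shared.
            if first is not second and first[1] == second[0]:
                result.append((first, second))
    return result


target_binary = [
    ("Brake lamp", "installation seat plate"),
    ("installation seat plate", "shell frame"),
    ("shell frame", "partition"),
    ("shell frame", "installation cavity"),
    ("Brake lamp", "grating version"),
    ("Brake lamp", "LED light"),
]
target_ternary = ternary_relations(target_binary)
```

Under a looser rule (any shared entity in any position), pairs such as two relations that both start at Brake lamp would also qualify, so the chaining rule should be adapted to whatever the extraction model actually produces.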
The candidate ternary relation set of each candidate text is obtained. For example, the candidate ternary relation set is: (Brake lamp-pedestal, pedestal-shell frame), (pedestal-shell frame, shell frame-dust-proof plated film).
According to the binary relation similarity, the ternary relation similarity between each ternary relation in the target ternary relation set and each candidate ternary relation in the candidate ternary relation set is determined. The ternary relation similarity is the sum of two terms: the binary relation similarity between the first target binary relation in the target ternary relation and the first candidate binary relation in the candidate ternary relation, and the binary relation similarity between the second target binary relation in the target ternary relation and the second candidate binary relation in the candidate ternary relation. It can be expressed as follows:
Rela3(target ternary relation, candidate ternary relation) = Rela2(first target binary relation, first candidate binary relation) + Rela2(second target binary relation, second candidate binary relation).
For example, the target ternary relation is: (Brake lamp-installation seat plate, installation seat plate-shell frame);
the candidate ternary relation is: (Brake lamp-pedestal, pedestal-shell frame);
Rela3[(Brake lamp-installation seat plate, installation seat plate-shell frame), (Brake lamp-pedestal, pedestal-shell frame)]
= Rela2(Brake lamp-installation seat plate, Brake lamp-pedestal) + Rela2(installation seat plate-shell frame, pedestal-shell frame)
= Rela1(Brake lamp, Brake lamp) + Rela1(installation seat plate, pedestal) + R(connection, connection) + Rela1(installation seat plate, pedestal) + Rela1(shell frame, shell frame) + R(connection, connection).
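The six-term expansion above can be checked numerically with a short sketch. As before, the (entity, relation label, entity) triples, the helper names and the entity similarity value of 0.75 for (installation seat plate, pedestal) are hypothetical and serve only to illustrate the formula.

```python
def rela2(t, c, rela1):
    """Rela2 on (entity, relation, entity) triples: two entity terms plus R."""
    return rela1(t[0], c[0]) + rela1(t[2], c[2]) + (1.0 if t[1] == c[1] else 0.0)


def rela3(target, candidate, rela1):
    """Rela3: Rela2 of the first pair plus Rela2 of the second pair."""
    return (rela2(target[0], candidate[0], rela1)
            + rela2(target[1], candidate[1], rela1))


def toy_rela1(a, b):
    # Hypothetical entity similarity: 1.0 for identical names,
    # 0.75 for one known pair, 0.0 otherwise.
    if a == b:
        return 1.0
    return 0.75 if (a, b) == ("installation seat plate", "pedestal") else 0.0


target = (("Brake lamp", "connection", "installation seat plate"),
          ("installation seat plate", "connection", "shell frame"))
candidate = (("Brake lamp", "connection", "pedestal"),
             ("pedestal", "connection", "shell frame"))
score = rela3(target, candidate, toy_rela1)
```

With these toy values each Rela2 term contributes 1.0 + 0.75 + 1.0, so the entity pair (installation seat plate, pedestal) is counted twice, once per binary relation, exactly as in the expansion.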
The ternary relation similarities of each ternary relation in the target text are accumulated to obtain a third cumulative similarity. That is, each ternary relation in the target text is compared once against every candidate ternary relation in the candidate text to calculate the similarity Rela3; entity similarity scores below 50% (exclusive) are recorded as 0; and all the similarities are added together.
The similarity Sim3 between the ternary relations of the target text and the candidate text is then calculated. Specifically, the union of the total number of ternary relations in the target text and the total number of ternary relations in the candidate text is calculated. For example, if the target text has 10 ternary relations and the candidate text has 8, the union is 10. Sim3 is the ratio of the third cumulative similarity to this union, that is:
Sim3 = third cumulative similarity / (total number of ternary relations in the target text ∪ total number of ternary relations in the candidate text).
Further, the target entities include a special entity, and the method further includes:
Determining the entity similarity of the special entity. The special entity may be an entity specified by the user, and the number of special entities is not limited. For example, the special entity may be "Brake lamp", or the special entities may be "Brake lamp" and "installation seat plate". A special entity may be an entity that is important in the actual solution; in this example, "Brake lamp" is used as the special entity. Suppose the candidate entities in a candidate text are "Brake lamp", "pedestal" and "lamp housing". For this candidate text, the entity similarities of the special entity are: the similarity of "Brake lamp" and "Brake lamp" (denoted "R11"), the similarity of "Brake lamp" and "pedestal" (denoted "R12"), and the similarity of "Brake lamp" and "lamp housing" (denoted "R13").
For each candidate text, the entity similarities of the special entity are accumulated to obtain a fourth cumulative similarity, which in this example is R11 + R12 + R13.
In step 304 above, the similarity SIM between the target text and the candidate text is calculated from the first, second, third and fourth cumulative similarities and their corresponding weights.
Formula 1: SIM = sim1*weight1 + sim2*weight2 + sim3*weight3 + sim4*weight4, where weight1 is the weight of the entity similarity, weight2 is the weight of the binary relation similarity, weight3 is the weight of the ternary relation similarity, and weight4 is the weight of the special entity similarity.
weight1, weight2, weight3 and weight4 can be configured for the specific application scenario. For example, if the user considers the special entity similarity and the binary relation similarity more important, weight2 and weight4 can be set to higher values, such as weight4 = 0.4, weight2 = 0.3, weight1 = 0.2 and weight3 = 0.1. Under normal conditions, weight1, weight2, weight3 and weight4 can each be set to 0.25.
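Formula 1 is a plain weighted sum, sketched below. The sim1 to sim4 inputs are hypothetical values; the weight tuples are the default (0.25 each) and the example configuration (0.2, 0.3, 0.1, 0.4) given in the text. The function name `combined_similarity` is an assumption.

```python
def combined_similarity(sims, weights=(0.25, 0.25, 0.25, 0.25)):
    """Formula 1: SIM = sim1*weight1 + sim2*weight2 + sim3*weight3 + sim4*weight4."""
    if len(sims) != 4 or len(weights) != 4:
        raise ValueError("expected four similarities and four weights")
    return sum(s * w for s, w in zip(sims, weights))


sims = (0.8, 0.6, 0.4, 1.0)  # hypothetical sim1 .. sim4

default_sim = combined_similarity(sims)
custom_sim = combined_similarity(sims, (0.2, 0.3, 0.1, 0.4))

# Setting a weight to 0 drops that component, e.g. entity and binary
# relation similarity only (weight3 = 0 and weight4 = 0).
entity_and_binary_only = combined_similarity(sims, (0.5, 0.5, 0.0, 0.0))
```

Zeroing individual weights is how the reduced variants of the method can be expressed with the same function.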
From Formula 1, in a first possible implementation, the similarity between the target text and the candidate text is determined according to the first cumulative similarity and the second cumulative similarity, i.e. the case where weight3 is 0 and weight4 is 0.
In a second possible implementation, the similarity between the target text and the candidate text is determined according to the first cumulative similarity and the third cumulative similarity, i.e. the case where weight2 is 0 and weight4 is 0.
In a third possible implementation, the similarity between the target text and the candidate text is determined according to the first cumulative similarity and the fourth cumulative similarity, i.e. the case where weight2 is 0 and weight3 is 0.
In a fourth possible implementation, the similarity between the target text and the candidate text is determined according to the first, second and third cumulative similarities, i.e. the case where weight4 is 0.
In a fifth possible implementation, the similarity between the target text and the candidate text is determined according to the first, second and fourth cumulative similarities, i.e. the case where weight3 is 0.
Further, in the embodiments of the present application, the candidate texts in the candidate text set can be sorted by their similarity SIM to the target text, in descending or ascending order of similarity, and a preset number of candidate texts is displayed according to that order, for example 3 candidate texts.
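The ranking step can be sketched as follows. The candidate identifiers and SIM values are hypothetical, and `top_candidates` is an illustrative name rather than anything defined in the document.

```python
def top_candidates(sim_by_candidate, count=3, descending=True):
    """Sort candidate ids by SIM and keep the first `count` entries."""
    ranked = sorted(sim_by_candidate.items(),
                    key=lambda item: item[1],
                    reverse=descending)
    return ranked[:count]


sims = {"c1": 0.42, "c2": 0.91, "c3": 0.15, "c4": 0.67}  # hypothetical SIM values
shown = top_candidates(sims)                              # most similar first
least_similar = top_candidates(sims, count=1, descending=False)
```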
In this embodiment, the similarity between the target text and a candidate text is determined from the similarity between each target entity in the target text and the candidate entities in the candidate text, and from the relations between target entities in the target text and the relations between candidate entities in the candidate text. Both the similarity of entities and the similarity of relations are considered; entities together with the relations between them better express the actual content of a text. Further, the relations may range from binary relations to N-ary relations. For example, the relations may include binary relations and ternary relations: a binary relation includes two entities and the relation between them, and a ternary relation includes two binary relations that are connected through an associated entity. In the embodiments of the present application, a ternary relation involves three entities and the relations among them, so calculating the similarity between target and candidate binary relations and the similarity between target and candidate ternary relations better expresses the actual content of the text. Further, the similarity of specific target entities can also be determined, so that the similarity between the target text and the candidate text can be determined according to the user's specific application scenario, which better matches the user's actual needs.
Optionally, the novelty degree of the target text relative to each candidate text is determined according to the target similarity; the novelty degree is inversely related to the target similarity. The higher the similarity between the target text and a candidate text, the lower the novelty of the target text relative to that candidate text. For example, if the target similarity is 70%, the novelty degree can be 1 - 70% = 30%; alternatively, the novelty degree can be 1 - k*70%, where k is a correction coefficient. This embodiment does not limit the specific method of determining novelty, provided that the novelty degree is inversely related to the target similarity.
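As a worked version of the example above, the following sketch evaluates 1 - k*similarity. The value k = 0.5 is a hypothetical correction coefficient chosen only to show its effect; the document does not specify one.

```python
def novelty_degree(target_similarity, k=1.0):
    """Novelty degree as 1 - k * similarity; k is the correction coefficient."""
    return 1.0 - k * target_similarity


plain = novelty_degree(0.70)             # the 1 - 70% case from the text
corrected = novelty_degree(0.70, k=0.5)  # k = 0.5 is a hypothetical value
```

Any other decreasing function of the target similarity would equally satisfy the inverse relationship the text requires.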
Optionally, on the basis of the above embodiments, the target text in this embodiment may be a target structure and the candidate text may be a candidate structure. That is, using the method described in Embodiment 1, the target text is converted into the target structure by the entity extraction model and the relation extraction model, and the candidate text is converted into the candidate structure by the entity extraction model and the relation extraction model.
Specifically, the target text is a structured text, and the step of obtaining the target text in step 201 may include the following steps: obtaining the target text;
inputting the target text into the entity extraction model, which identifies the entities in the target text;
inputting the target text with the identified entities into the relation extraction model, which extracts the relations between the entities;
representing the target text in structured form according to the entities and the relations between the entities, generating a structured text.
Optionally, in step 202 above, extracting the target entity set from the target text may specifically include the following steps:
The target text is used as the input of the entity extraction model, which extracts the target entity set from the target text. The entity extraction model is obtained by training on a first corpus set, and the first corpus set is obtained by annotating entities in every text of a first text set.
Optionally, the step of extracting the binary relation between every two entities in the target text may specifically include the following steps:
The target text with the identified target entity set is input into the relation extraction model, which extracts the relations between the target entities. The relation extraction model is obtained by training on a second corpus set, and the second corpus set is obtained by annotating relations and entities in every text of a second text set.
Embodiment 3
Referring to Fig. 7, an embodiment of the present application further provides a method for determining the novelty degree of a text. The method is applied to an electronic device, which may be a server or a terminal; in this embodiment, the electronic device is described using a terminal as an example. The method specifically includes the following steps:
Step 401: Determine the target text.
For example, the target text may be a patent or a paper; in this embodiment, a patent is used as the example.
Step 402: Extract multiple target entities from the target text to obtain a target entity set.
In this example, the multiple target entities in the target text are extracted by the entity extraction model of Embodiment 1. Specifically, the target text is input into the entity extraction model, which identifies multiple target entities in the target text; these target entities form the target entity set.
Step 403: Obtain the candidate entity set of each candidate text in the candidate text set.
The candidate text set may be a patent set that contains multiple candidate texts (such as patents). The server obtains the candidate text set from a patent database and extracts the candidate entities of each candidate text in the candidate text set offline in advance to obtain the candidate entity sets. Alternatively, the server may extract the candidate entities of each candidate text in the candidate text set online to obtain the candidate entity sets. Specifically, the candidate entities of each candidate text in the candidate text set can be extracted by the entity extraction model described in Embodiment 1 to obtain the candidate entity set.
Step 404: Determine the first entity intersection of the target entity set and the candidate entity set; the first entity intersection consists of the entities that match between the target entity set and the candidate entity set.
For example, if the target entity set is (Brake lamp, pedestal, lamp housing) and the candidate entity set is (Brake lamp, installation seat plate, shell frame), the first entity intersection is (Brake lamp).
Step 405: Determine the novelty degree of the target text relative to the candidate text according to the difference parameter between the first entity intersection and the target entity set.
The first entity novelty degree is determined according to the difference parameter between the first entity intersection and the target entity set. That is:
First entity novelty degree = [target entity set - intersection(target entity set, candidate entity set)] / target entity set = 1 - first entity intersection / target entity set.
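Using the entity sets from the step 404 example, the formula can be sketched as below. Two assumptions are made: each set in the formula stands for its number of elements, and two entities "match" when their names are identical (the document may well use a softer matching criterion). The function name `set_novelty` is hypothetical.

```python
def set_novelty(target, candidate):
    """1 - |target ∩ candidate| / |target|, using exact name matching."""
    target, candidate = set(target), set(candidate)
    if not target:
        raise ValueError("target set must not be empty")
    return 1.0 - len(target & candidate) / len(target)


target_entities = {"Brake lamp", "pedestal", "lamp housing"}
candidate_entities = {"Brake lamp", "installation seat plate", "shell frame"}

intersection = target_entities & candidate_entities
r1_1 = set_novelty(target_entities, candidate_entities)
```

Here one of the three target entities is matched, so two thirds of the target entity set counts as novel relative to this candidate text.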
In this embodiment, the difference parameter between the first entity intersection and the target entity set is the ratio of the first entity intersection to the target entity set. Alternatively, the difference parameter may be that ratio multiplied by a coefficient. Other variants of the difference parameter are possible and are not described here.
In this embodiment, the target text whose novelty degree is to be determined is identified first; the target text may be a patent. Multiple target entities are then extracted from the target text to obtain a target entity set, and the candidate entity set of each candidate text in the candidate text set is obtained. Each candidate text is traversed to determine the first entity intersection of the target entity set and that candidate text's candidate entity set, where the first entity intersection consists of the entities that match between the two sets. Finally, the novelty degree of the target text relative to the candidate text is determined according to the difference parameter between the first entity intersection and the target entity set. This embodiment takes into account all target entities in the target text and all candidate entities in each candidate text. In the prior art, by contrast, the novelty degree is determined by matching keywords that the user chooses subjectively, so the result depends on the user's subjective understanding. The method provided by the embodiments of the present application is more objective and expresses the actual content of the target text and the candidate text, so the novelty degree calculation is more accurate.
Optionally, on the basis of the above embodiments, the embodiments of the present application may further include the following steps before step 405:
Extract multiple binary relations from the target text to obtain a target binary relation set, where a binary relation includes two entities and the relation between them;
Obtain the candidate binary relation set of the candidate text, which includes multiple binary relations;
Determine the first binary relation intersection of the target binary relation set and the candidate binary relation set; the first binary relation intersection includes the binary relations that match between the target binary relation set and the candidate binary relation set;
Then, in step 405, determining the novelty degree of the target text relative to the candidate text according to the difference parameter between the first entity intersection and the target entity set may specifically include:
Determining the first entity novelty degree according to the difference parameter between the first entity intersection and the target entity set. That is: first entity novelty degree (R1_1) = [target entity set - intersection(target entity set, candidate entity set)] / target entity set = 1 - first entity intersection / target entity set.
Determining the first binary relation novelty degree according to the difference parameter between the first binary relation intersection and the target binary relation set:
R2_1 = [target binary relation set - intersection(target binary relation set, candidate binary relation set)] / target binary relation set = 1 - first binary relation intersection / target binary relation set.
The difference parameter between the first binary relation intersection and the target binary relation set may be the ratio of the first binary relation intersection to the target binary relation set, or that ratio multiplied by a coefficient, or another variant; this is not specifically limited.
Optionally, in another possible implementation, the novelty degree of the target text relative to the candidate text is determined according to the first entity novelty degree, the first binary relation novelty degree and their respective weights. In this implementation, the novelty degree between the target binary relations in the target text and the candidate binary relations in the candidate text is additionally calculated. When the novelty degree of the target text relative to the candidate text is determined, the novelty degree between entities is considered and is further combined with the novelty degree between binary relations, which improves the accuracy of the novelty degree.
On the basis of the above embodiments, the method may also include the following steps:
Extract the target ternary relation set from the target text; the target ternary relation set includes multiple ternary relations, and each ternary relation includes two binary relations that share an identical entity;
Obtain the candidate ternary relation set of the candidate text, which includes multiple ternary relations;
Determine the first ternary relation intersection of the target ternary relation set and the candidate ternary relation set; the first ternary relation intersection includes the ternary relations that match between the target ternary relation set and the candidate ternary relation set;
Determining the novelty degree of the target text relative to the candidate text according to the first entity novelty degree and the first binary relation novelty degree may then specifically also include:
Determining the first ternary relation novelty degree according to the difference parameter between the first ternary relation intersection and the target ternary relation set. That is, R3_1 = [target ternary relation set - intersection(target ternary relation set, candidate ternary relation set)] / target ternary relation set = 1 - first ternary relation intersection / target ternary relation set. The difference parameter between the first ternary relation intersection and the target ternary relation set may be the ratio of the first ternary relation intersection to the target ternary relation set, or that ratio multiplied by a coefficient, or another variant; this is not specifically limited.
Determining the novelty degree of the target text relative to the candidate text according to the first entity novelty degree, the first binary relation novelty degree, the first ternary relation novelty degree and their corresponding weights.
Novelty degree = R1_1*weight1 + R2_1*weight2 + R3_1*weight3, where, in this example, weight1 is the weight of the first entity novelty degree, weight2 is the weight of the first binary relation novelty degree, and weight3 is the weight of the first ternary relation novelty degree. In this implementation, the novelty degree between the target ternary relations in the target text and the candidate ternary relations in the candidate text is additionally calculated. When the novelty degree of the target text relative to the candidate text is determined, the novelty degree between entities is considered and is further combined with the novelty degree between binary relations and the novelty degree between ternary relations, which improves the accuracy of the novelty degree.
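The weighted novelty degree can be sketched end to end with the example sets that appear in this document. Several things here are assumptions: relations are compared by exact equality of entity-name tuples, each set stands for its size, the entity sets (from step 404) and the relation sets (from the earlier similarity example) are combined purely for illustration, and the weights 0.5, 0.3 and 0.2 are hypothetical.

```python
def set_novelty(target, candidate):
    """1 - |target ∩ candidate| / |target| with exact matching."""
    target, candidate = set(target), set(candidate)
    return 1.0 - len(target & candidate) / len(target)


def weighted_novelty(r1, r2, r3, weight1, weight2, weight3):
    """Novelty degree = R1_1*weight1 + R2_1*weight2 + R3_1*weight3."""
    return r1 * weight1 + r2 * weight2 + r3 * weight3


t_ent = {"Brake lamp", "pedestal", "lamp housing"}
c_ent = {"Brake lamp", "installation seat plate", "shell frame"}

t_bin = {("Brake lamp", "installation seat plate"),
         ("installation seat plate", "shell frame"),
         ("shell frame", "partition"), ("shell frame", "installation cavity"),
         ("Brake lamp", "grating version"), ("Brake lamp", "LED light")}
c_bin = {("Brake lamp", "pedestal"), ("pedestal", "shell frame"),
         ("shell frame", "dust-proof plated film"),
         ("Brake lamp", "grating version"), ("Brake lamp", "LED light"),
         ("LED light", "lamp housing")}

t_ter = {(("Brake lamp", "installation seat plate"), ("installation seat plate", "shell frame")),
         (("installation seat plate", "shell frame"), ("shell frame", "partition")),
         (("installation seat plate", "shell frame"), ("shell frame", "installation cavity"))}
c_ter = {(("Brake lamp", "pedestal"), ("pedestal", "shell frame")),
         (("pedestal", "shell frame"), ("shell frame", "dust-proof plated film"))}

r1_1 = set_novelty(t_ent, c_ent)   # entities
r2_1 = set_novelty(t_bin, c_bin)   # binary relations
r3_1 = set_novelty(t_ter, c_ter)   # ternary relations
novelty = weighted_novelty(r1_1, r2_1, r3_1, 0.5, 0.3, 0.2)  # hypothetical weights
```

With exact matching, two of the six target binary relations and none of the target ternary relations are found in the candidate sets, so the ternary component contributes full novelty.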
It should be noted that, in the embodiments of the present application, the relations may also include 4-ary relations, 5-ary relations and so on. This embodiment uses only binary relations and ternary relations as examples, which does not limit the present application.
Optionally, in this embodiment, the target text is a structured text, namely the target structure, and every candidate text in the candidate text set is a structured candidate structure. In this example, a candidate map can be obtained from the candidate structures. It can be understood that a candidate map may include at least one candidate structure; when the candidate map includes one candidate structure, the candidate map is identical to that candidate structure. When the candidate map includes two or more candidate structures, refer to Fig. 8, which is a schematic structural diagram of a candidate map. The method of determining the candidate map may further include the following steps:
Determine the associated entity of the first candidate structure and the second candidate structure. For example, the first candidate structure includes the entities pedestal, lamp housing and lampshade, and the relations between its entities are: pedestal-lampshade, pedestal-lamp housing. The second candidate structure includes the entities lamp housing, wick and electric switch, and the relations between its entities are: lamp housing-wick, lamp housing-electric switch. The associated entity of the first candidate structure and the second candidate structure is then "lamp housing".
The first candidate structure and the second candidate structure are associated through the associated entity to obtain the candidate map, as illustrated in Fig. 8.
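Joining two candidate structures through their associated entity can be sketched as a union of entities and relations. The dictionary layout and function names are hypothetical, and the first structure's relations are read here as pedestal-lampshade and pedestal-lamp housing, which is an interpretation of the example.

```python
def associated_entities(first, second):
    """Entities that appear in both candidate structures."""
    return set(first["entities"]) & set(second["entities"])


def merge_structures(first, second):
    """Join two candidate structures into one candidate map via shared entities."""
    if not associated_entities(first, second):
        raise ValueError("no associated entity connects the two structures")
    return {
        "entities": set(first["entities"]) | set(second["entities"]),
        "relations": set(first["relations"]) | set(second["relations"]),
    }


first = {
    "entities": {"pedestal", "lamp housing", "lampshade"},
    "relations": {("pedestal", "lampshade"), ("pedestal", "lamp housing")},
}
second = {
    "entities": {"lamp housing", "wick", "electric switch"},
    "relations": {("lamp housing", "wick"), ("lamp housing", "electric switch")},
}

shared = associated_entities(first, second)
candidate_map = merge_structures(first, second)
```

Because "lamp housing" appears once in the merged entity set, the relations of both structures become connected through that node.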
On the basis of the above embodiments, optionally, when the target text and the candidate texts are structured texts, the novelty degree can be calculated between the target structure and the candidate map. The number of candidate structures contained in the candidate map is not limited: the candidate map may include, for example, 3 candidate structures, 4 candidate structures, or all of the candidate structures. Candidate structures that have related entities can be connected to one another through the associated entities. In practical applications the number of candidate structures included in the candidate map is not limited; for convenience of description, this embodiment uses a candidate map that contains 2 candidate texts as an example. The method in this embodiment may further include the following steps:
Extract the candidate entity set of the candidate map. In the candidate map, each node represents an entity and each edge represents a relationship. For the relationship sets, still taking binary relations and ternary relations as examples, the binary relation set is the set of relationships of all pairs of adjacent nodes in the candidate map, and the ternary relation set is the set of relationships of all groups of three adjacent nodes in the candidate map.
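A minimal sketch of how these sets could be read off a candidate map is given below. It assumes the map is stored as an adjacency dict and interprets "three adjacent nodes" as a path a-b-c whose middle node is adjacent to both ends; that interpretation, and the function names, are assumptions made for illustration.

```python
from itertools import combinations

# Candidate map of the lamp example, stored as entity -> neighbouring entities.
candidate_map = {
    "pedestal": {"lampshade", "lamp housing"},
    "lampshade": {"pedestal"},
    "lamp housing": {"pedestal", "wick", "electric switch"},
    "wick": {"lamp housing"},
    "electric switch": {"lamp housing"},
}

def entity_set(graph):
    """Every node of the map is one candidate entity."""
    return set(graph)

def binary_relations(graph):
    """All unordered pairs of adjacent nodes."""
    return {frozenset((a, b)) for a, nbrs in graph.items() for b in nbrs}

def ternary_relations(graph):
    """All paths end-middle-end, with the two ends in sorted order."""
    return {
        (a, middle, c)
        for middle, nbrs in graph.items()
        for a, c in combinations(sorted(nbrs), 2)
    }

pairs = binary_relations(candidate_map)
triples = ternary_relations(candidate_map)
```

For this map there are four binary relations (one per edge) and four ternary relations, for example ("pedestal", "lamp housing", "wick").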
Determine the second entity intersection of the target entity set and the candidate entity set of the candidate map; this step can be understood in conjunction with step 404 in this embodiment.
Determine the second entity novelty degree according to the difference parameter of the second entity intersection and the target entity set, that is: second entity novelty degree R1_2 = [target entity set - intersection(target entity set, candidate entity set)] / target entity set = 1 - second entity intersection / target entity set. This step can be understood in conjunction with step 405 in this embodiment.
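The formula above can be illustrated with a short sketch. It assumes that each set name in the formula stands for the number of elements in that set; the function name `set_novelty` and the target entities "battery" and "reflector" are hypothetical and used only for the example.

```python
def set_novelty(target, candidate):
    """Share of target elements that do not appear in the candidate set.

    Implements 1 - |target ∩ candidate| / |target|, reading each set name
    in the formula as the size of that set (an assumption).
    """
    target, candidate = set(target), set(candidate)
    if not target:
        raise ValueError("target set must not be empty")
    return 1 - len(target & candidate) / len(target)

# Hypothetical target entity set; the candidate set is the lamp example's map.
target_entities = {"lamp housing", "wick", "battery", "reflector"}
candidate_entities = {"pedestal", "lampshade", "lamp housing", "wick", "electric switch"}

r1_2 = set_novelty(target_entities, candidate_entities)
```

Two of the four target entities are found in the candidate map, so `r1_2` is 0.5. The same function applies unchanged to sets of binary or ternary relations.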
Optionally, this method can also include the following steps:
Extract multiple binary relations from the target structure to obtain a target binary relation set; for example, one target binary relation included in the target binary relation set is "lamp housing-wick".
Locate the two target entities included in each target binary relation in the target binary relation set at the corresponding two entity positions in the candidate map; for example, to locate the target binary relation "lamp housing-wick" in the candidate map, find the two nodes "lamp housing" and "wick" in the candidate map.
Calculate the distance between the two entity positions corresponding to each target binary relation, here the distance from "lamp housing" to "wick" in the candidate map. It should be noted that the interval between any two adjacent nodes in the candidate map is equal (denoted a). The distance between two nodes can be understood as the length of the path from the first entity position (such as "lamp housing") to the second entity position (such as "wick"). Taking Fig. 8 as an example, the distance from "lamp housing" to "wick" is a. The path from "pedestal" to "wick" runs from "pedestal" to "lamp housing" and then from "lamp housing" to "wick"; the distance from "pedestal" to "lamp housing" is a and the distance from "lamp housing" to "wick" is also a, so the distance L from "pedestal" to "wick" is 2a.
Determine, according to the distance, the second binary relation novelty degree of each target binary relation relative to the candidate map. The novelty score R2_2 of a second binary relation is directly proportional to L: the shorter L is, the lower the novelty degree, and the longer L is, the higher the novelty degree.
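The distance calculation can be sketched with a breadth-first search over the candidate map. The text only states that R2_2 grows with L; the linear scaling with a cap in `distance_novelty`, the treatment of missing or unreachable entities as maximally novel, and all function names are assumptions made for this sketch.

```python
from collections import deque

candidate_map = {
    "pedestal": {"lampshade", "lamp housing"},
    "lampshade": {"pedestal"},
    "lamp housing": {"pedestal", "wick", "electric switch"},
    "wick": {"lamp housing"},
    "electric switch": {"lamp housing"},
}

def path_distance(graph, start, goal, a=1.0):
    """Shortest path length between two nodes, with every edge of length a.

    Returns None when either node is missing or no path exists.
    """
    if start not in graph or goal not in graph:
        return None
    seen, queue = {start}, deque([(start, 0)])
    while queue:
        node, hops = queue.popleft()
        if node == goal:
            return hops * a
        for nxt in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, hops + 1))
    return None

def distance_novelty(distance, max_distance=4.0):
    """Assumed scaling: linear in the distance, capped at 1.0."""
    if distance is None:
        return 1.0  # assumption: an entity pair that cannot be located is fully novel
    return min(distance, max_distance) / max_distance

L_adjacent = path_distance(candidate_map, "lamp housing", "wick")
L_two_steps = path_distance(candidate_map, "pedestal", "wick")
```

With a = 1, `L_adjacent` is 1 and `L_two_steps` is 2, matching the a and 2a of the Fig. 8 example.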
In a first possible implementation, the novelty degree of the target structure and the candidate map can be determined according to the second entity novelty degree R1_2, the second binary relation novelty degree R2_2, and their corresponding weights. In this implementation, after the second entity novelty degree is determined, the second binary relation novelty degree between the target binary relations in the target text and the candidate map is further calculated. Determining the novelty degree of the target structure and the candidate structure therefore considers both the novelty degree between entities and the novelty degree between binary relations, which improves the accuracy of the novelty degree.
In a second possible implementation, first, obtain a candidate binary relation set including the multiple binary relations in the candidate map; determine the second binary relation intersection of the target binary relation set and the candidate binary relation set; and determine the first binary relation novelty degree according to the difference parameter of the second binary relation intersection and the target binary relation set.
Next, the binary relation novelty degree can be calculated according to the first binary relation novelty degree, the second binary relation novelty degree, and their corresponding weights, that is: binary relation novelty degree R2 = R2_1*weight1 + R2_2*weight2, where R2_1 is the first binary relation novelty degree, weight1 is the weight of R2_1, and weight2 is the weight of R2_2. The weights can be set differently for different application scenarios.
Then, determine the novelty degree of the target structure and the candidate map according to the second entity novelty degree R1_2, the binary relation novelty degree R2, and their corresponding weights. In this implementation, the binary relation novelty degree is determined jointly by the first binary relation novelty degree, the second binary relation novelty degree, and their respective weights, which widens the scenarios in which the binary relation novelty degree can be determined.
On the basis of the above embodiments, optionally, this method can also include the following steps:
Extract multiple ternary relations from the target map to obtain a target ternary relation set; for example, one target ternary relation included in the target ternary relation set is "lamp housing-wick-electric switch".
Locate the three target entities included in each target ternary relation in the target ternary relation set at the corresponding three entity positions in the candidate map; for example, locate "lamp housing-wick-electric switch" at the positions of "lamp housing", "wick" and "electric switch" in the candidate map.
Calculate the shortest distance between any two of the three entity positions; that is, for each pair of adjacent nodes in the relation, calculate the shortest distance L1 between "lamp housing" and "wick" in the candidate map and the shortest distance L2 between "wick" and "electric switch" in the candidate map. Calculate the sum of the two shortest distances. The novelty score R3_2 of the second ternary relation is directly proportional to L1+L2: the shorter L1+L2 is, the lower the novelty degree, and the longer L1+L2 is, the higher the novelty degree.
In a third possible implementation, the novelty degree of the target structure and the candidate map can be determined according to the second entity novelty degree R1_2, the second binary relation novelty degree R2_2, the second ternary relation novelty degree R3_2, and their corresponding weights.
For example, novelty degree = R1_2*weight1 + R2_2*weight2 + R3_2*weight3, where, in this implementation, weight1 is the weight of the second entity novelty degree, weight2 is the weight of the second binary relation novelty degree, and weight3 is the weight of the second ternary relation novelty degree.
Further, in the embodiment of the present application, the novelty degrees of the target text relative to each candidate text in the candidate text collection can be sorted by size, either from large to small or from small to large, and a preset number of candidate texts can be displayed in the sorted order, for example, 3 candidate texts.
In this implementation, the second ternary relation novelty degree between the target ternary relations in the target structure and the candidate map is further calculated. Determining the novelty degree of the target structure and the candidate structure therefore considers the novelty degree between entities together with the novelty degree between binary relations and the novelty degree between ternary relations, which improves the accuracy of the novelty degree.
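The sorting and display step can be sketched as below. The candidate text names and novelty values are invented for illustration, and `top_candidates` is a hypothetical helper name.

```python
def top_candidates(novelty_by_text, k=3, descending=True):
    """Sort candidate texts by novelty degree and keep the first k.

    novelty_by_text maps a candidate text identifier to the novelty degree
    of the target text relative to that candidate text.
    """
    ranked = sorted(novelty_by_text.items(), key=lambda item: item[1], reverse=descending)
    return ranked[:k]

# Invented values for four candidate texts.
novelty_by_text = {"text A": 0.53, "text B": 0.91, "text C": 0.12, "text D": 0.75}

most_novel_first = top_candidates(novelty_by_text, k=3)
least_novel_first = top_candidates(novelty_by_text, k=3, descending=False)
```

Sorting from small to large surfaces the candidate texts closest to the target text first, which is likely the more useful order when looking for prior material; sorting from large to small does the opposite.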
Further, on the basis of the third implementation above, a fourth possible implementation is also provided. The method can also include the following steps:
Determine the second ternary relation intersection of the target ternary relation set of the target structure and the candidate ternary relation set of the candidate map;
Determine the first ternary relation novelty degree according to the difference parameter of the second ternary relation intersection and the target ternary relation set, that is: R3_1 = [target ternary relation set - intersection(target ternary relation set, candidate ternary relation set)] / target ternary relation set.
In the fourth possible implementation, first, determine the ternary relation novelty degree according to the first ternary relation novelty degree, the second ternary relation novelty degree, and their corresponding weights, that is: ternary relation novelty degree R3 = R3_1*weight1 + R3_2*weight2, where, in this implementation, weight1 is the weight of R3_1 and weight2 is the weight of R3_2.
Then, determine the novelty degree of the target structure and the candidate map according to the second entity novelty degree R1_2, the binary relation novelty degree R2, the ternary relation novelty degree R3, and their corresponding weights. In this implementation, the ternary relation novelty degree is determined jointly by the first ternary relation novelty degree, the second ternary relation novelty degree, and their respective weights, which widens the scenarios in which the ternary relation novelty degree can be determined.
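The weighted combinations used in these implementations can be sketched as follows. All scores and weights are invented, the application does not state that the weights must sum to 1 (the sketch merely chooses values that do), and `weighted` is a hypothetical helper.

```python
def weighted(parts):
    """Weighted sum of (score, weight) pairs."""
    return sum(score * weight for score, weight in parts)

# Invented component scores, each assumed to lie in [0, 1].
r1_2 = 0.5              # second entity novelty degree
r2_1, r2_2 = 0.5, 0.25  # set-difference and distance-based binary relation scores
r3_1, r3_2 = 1.0, 0.5   # set-difference and distance-based ternary relation scores

# R2 = R2_1*weight1 + R2_2*weight2 and R3 = R3_1*weight1 + R3_2*weight2
r2 = weighted([(r2_1, 0.5), (r2_2, 0.5)])
r3 = weighted([(r3_1, 0.5), (r3_2, 0.5)])

# Overall novelty degree from R1_2, R2 and R3 with their own weights.
novelty = weighted([(r1_2, 0.5), (r2, 0.25), (r3, 0.25)])
```

With these values R2 is 0.375, R3 is 0.75 and the overall novelty degree is 0.53125. Changing the weights shifts the emphasis between entities, binary relations and ternary relations, which is how different application scenarios could be accommodated.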
It should be noted that, in the embodiments of the present application, interrelated content in Embodiment 1, Embodiment 2 and Embodiment 3 can be cross-referenced. For example, the step of extracting multiple binary relations from the target text can also include the following steps:
Obtain an entity relationship data set, which is obtained according to the entities in a text collection and the relationships between those entities; the entity relationship matrix includes N entities and the relationships between the N entities, where N is greater than or equal to 2;
Query the entity relationship data set to obtain M second entities that have a relationship with the first entity, where M is less than or equal to N;
Search for the second entities within a preset range in the target text;
If at least one target second entity among the M second entities is found, establish the relationship between the first entity and the target second entity.
Before the step of searching for the second entities within the preset range in the target text, the method can also include the following steps:
Create an entity matching window;
Determine the preset range in the target text according to the size of the entity matching window.
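The window-based search can be sketched as below. The sketch assumes the target text has already been split into tokens in which each entity is a single token, that the window size counts tokens on each side of the first entity, and that the M related second entities have already been looked up; these choices and the function name `find_relations` are assumptions.

```python
def find_relations(tokens, first_entity, related_entities, window=5):
    """Pair first_entity with related second entities found near it.

    tokens: the target text as a list of tokens.
    related_entities: the second entities known, from the entity relationship
        data set, to have a relationship with first_entity.
    window: number of tokens searched on each side of first_entity.
    """
    found = set()
    for i, token in enumerate(tokens):
        if token != first_entity:
            continue
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for other in tokens[lo:hi]:
            if other in related_entities:
                found.add((first_entity, other))
    return found

tokens = ["the", "wick", "sits", "inside", "the", "housing", "near",
          "the", "base", "and", "far", "from", "the", "switch"]
related = {"wick", "switch"}  # assumed lookup result for "housing"

narrow = find_relations(tokens, "housing", related, window=3)
wide = find_relations(tokens, "housing", related, window=4)
```

With a window of 3 nothing is found, while a window of 4 reaches "wick" but still not "switch", which shows how the window size controls which relationships are established.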
The step of extracting multiple target entities from the target text can specifically include the following step:
Input the target text into an entity extraction model, and identify the multiple target entities in the target text through the entity extraction model.
The step of extracting multiple binary relations from the target text can specifically include the following steps:
Input the target text in which the target entities have been identified into a relationship extraction model, and extract the binary relations between the target entities through the relationship extraction model.
According to the relationships between the target entities, represent the target text in structured form to generate the target structure. The target structure includes nodes and edges; the nodes represent the target entities and the edges represent the relationships between the target entities.
Embodiment 4
Referring to Fig. 9, the embodiment of the present application provides a method for obtaining image information. The method is applied to an electronic device, which can be a server or a terminal; the executing subject is not specifically limited in the embodiment of the present application. The method can include the following steps:
Step 501: receive target text information to be matched, where the target text information includes a target entity.
If the executing subject is a terminal, the terminal receives the target text information to be matched that is input by a user. If the executing subject is a server, the server receives the target text information to be matched that is sent by a terminal; for example, the target text information is "engine". In one application scenario, taking the server as the executing subject, a user wants to search for image information corresponding to "engine": the terminal receives "engine" input by the user and sends the target entity to the server, and the server receives the target text information. It should be noted that the number of target entities is not limited in the embodiment of the present application; the target entity "engine" in this example is only illustrative and does not limit the application.
Step 502: match the target entity against the candidate entity associated with each candidate image in the image data set.
The server matches the target entity against the candidate entity associated with each candidate image in the image data set. The image data set can be stored inside the server or obtained from another device, which is not specifically limited. The image data set includes a large number of candidate images, and each candidate image has an associated candidate entity; for example, candidate image 1 is associated with "connecting rod", candidate image 2 is associated with "engine", and so on.
Step 503: if the target entity matches the candidate entity associated with a first candidate image in the image data set, determine that the first candidate image is a candidate image matching the target entity.
For example, if the target entity (such as "engine") matches the candidate entity (such as "engine") associated with a first candidate image in the image data set, the first candidate image is determined to be a candidate image matching the target entity.
Specifically, the target entity can be matched against the candidate entity associated with the first candidate image in the image data set as follows:
First, obtain the semantic vector of the target entity and the semantic vector of the candidate entity associated with each candidate image. In one possible implementation, the semantic vector of the target entity and the semantic vector of the candidate entity can be obtained through the "candidate matrices" in step 301 of Embodiment 2; for the specific implementation, refer to step 301 of Embodiment 2, which is not repeated here. In a second possible implementation, the semantic vector of the target entity and the semantic vector of the candidate entity can be obtained through a trained word2vec model as in step 301 of Embodiment 2; for the specific implementation, refer to step 301 of Embodiment 2, which is not repeated here.
Then, calculate the cosine of the angle between the semantic vector of the target entity and the semantic vector of the candidate entity.
Obtain the similarity between the target entity and the candidate entity from this cosine value; the higher the similarity, the higher the matching degree between the target entity and the candidate entity.
In descending order of matching degree, determine U candidate entities associated with the target entity, where U is an integer greater than or equal to 1, and determine that the candidate images associated with these U candidate entities are first candidate images; the number of first candidate images is not limited.
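The cosine-based matching can be sketched as below. The two-dimensional vectors are toy values standing in for real semantic vectors (for example from a trained word2vec model), and the function names are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def top_u_entities(target_vector, candidate_vectors, u=1):
    """Return the u candidate entities most similar to the target, best first."""
    scored = [(name, cosine(target_vector, vec)) for name, vec in candidate_vectors.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:u]

# Toy semantic vectors, invented for illustration.
target_vector = [1.0, 0.0]
candidate_vectors = {
    "connecting rod": [1.0, 1.0],
    "engine": [1.0, 0.0],
    "lampshade": [0.0, 1.0],
}

best_two = top_u_entities(target_vector, candidate_vectors, u=2)
```

Here "engine" ranks first with a cosine of 1.0 and "connecting rod" second; the candidate images associated with the returned entities would then be treated as first candidate images.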
Step 504: output the first candidate image.
If the executing subject is a terminal, the terminal displays the first candidate image. If the executing subject is a server, the server sends the first candidate image to the terminal so that the terminal displays the first candidate image.
In one application scenario, a user inputs "engine"; the terminal receives "engine" and sends it to the server, and the server matches "engine" against each candidate entity in the image data set. The server finds that the similarity between the target entity "engine" and one candidate entity "engine" is higher than a threshold, and that the similarity between the target entity "engine" and another candidate entity "engine" is also higher than the threshold. It therefore determines that candidate image Aa associated with the former candidate entity and candidate image Ab associated with the latter candidate entity are first candidate images. The server sends candidate image Aa and candidate image Ab to the terminal, and the terminal displays candidate image Aa and candidate image Ab.
In the embodiment of the present application, target text information to be matched is first received, where the target text information includes a target entity; the target entity is then matched against the candidate entity associated with each candidate image in the image data set; if the target entity matches the candidate entity associated with a first candidate image in the image data set, the first candidate image is determined to be a candidate image matching the target entity; and the first candidate image is output. The first candidate image that is output matches the target entity in the target text information and can represent the target entity more vividly. Unlike the prior art, the method for obtaining image information in the embodiment of the present application does not require manually going through the drawings in texts one by one to select the image matching the target entity, which greatly reduces labor cost.
On the basis of the above embodiments, the image data set can be established in advance; how to establish the image data set is described in detail below. In step 503, the image data set includes a first image data set, and before the target entity is matched against the text information associated with each candidate image in the image data set, the method can also include the following steps:
In a first possible implementation, the image data set includes the first image data set.
Obtain a candidate text collection. The candidate text collection can be a patent text collection; it includes multiple candidate texts, and each candidate text includes candidate entities. If the executing subject is a terminal, the terminal can obtain the candidate text collection from a server; if the executing subject is a server, the candidate text collection can be stored inside the server or obtained by the server from another device, which is not specifically limited. In the embodiment of the present application, the description takes the server as the executing subject.
Count the frequency with which each candidate entity occurs in the candidate text collection. For example, in the candidate text collection, "engine" occurs 10000 times, "connecting rod" occurs 9900 times, "press mechanism" occurs 9800 times, and so on. The candidate entities and frequencies in this example are only illustrative and do not limit the embodiment of the present application.
Determine high-frequency entities according to the frequency. High-frequency entities include entities whose frequency of occurrence in the candidate text collection is higher than a threshold, for example entities with a frequency higher than 9000. Alternatively, high-frequency entities include the entities ranked before a preset position after sorting by frequency; for example, all entities occurring in the candidate texts are sorted by frequency from high to low, and the entities ranked in the top 10000 are selected as high-frequency entities.
Associate each high-frequency entity with at least one corresponding candidate image to obtain the first image data set. The high-frequency entities in the first image data set are the entities that occur with higher frequency.
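Both ways of selecting high-frequency entities can be sketched with a frequency counter. The sketch assumes each candidate text has already been reduced to a list of extracted entities; the tiny corpus and the function name `high_frequency_entities` are invented for illustration.

```python
from collections import Counter

def high_frequency_entities(entities_per_text, min_count=None, top_n=None):
    """Select high-frequency entities by threshold or by rank.

    entities_per_text: one list of extracted entities per candidate text.
    min_count: if given, keep entities that occur more than min_count times.
    top_n: otherwise, keep the top_n most frequent entities (all if None).
    """
    counts = Counter(e for entities in entities_per_text for e in entities)
    if min_count is not None:
        return {entity for entity, count in counts.items() if count > min_count}
    return {entity for entity, _ in counts.most_common(top_n)}

# Invented corpus of three candidate texts.
corpus = [
    ["engine", "connecting rod"],
    ["engine"],
    ["engine", "connecting rod", "press mechanism"],
]

by_threshold = high_frequency_entities(corpus, min_count=1)
by_rank = high_frequency_entities(corpus, top_n=1)
```

In this corpus "engine" occurs three times and "connecting rod" twice, so the threshold variant keeps both, while the rank variant with `top_n=1` keeps only "engine".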
Optionally, the image data set further includes a second image data set. Before the target entity is matched against the text information associated with each candidate image in the image data set, the method can also include the following steps:
Obtain a candidate text collection, where each candidate text in the candidate text collection includes a description of the drawings and drawings; the description of the drawings includes candidate entities and the marks of the candidate entities, and the drawings include candidate images and marks. Each candidate text in the candidate text collection is, for example, a patent, and each patent includes a description of the drawings and drawings. Please refer to Fig. 10, which is a schematic diagram of a description of the drawings and a drawing. In Fig. 10, the description of the drawings contains multiple candidate entities and the number that corresponds to each candidate entity in the drawing. For example, "soy bean milk making machine body" corresponds to number "1", and the candidate image at number "1" in the drawing is the candidate image of "soy bean milk making machine body"; "head" corresponds to number "2", and the candidate image at number "2" in the drawing is the candidate image of "head".
Establish the association between candidate entities and candidate images according to the marks to obtain the second image data set. Specifically, identify the marks (such as numbers) in the drawing, match the numbers in the description of the drawings with the numbers in the drawing, and associate the candidate entity and the candidate image that share the same number to obtain the second image data set.
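Joining entities and images on their shared marks can be sketched as below. The sketch assumes the marks have already been read from the description of the drawings and recognized in the drawing; the file names and the function name are hypothetical.

```python
def build_second_image_data_set(entity_by_mark, image_by_mark):
    """Associate each candidate entity with the candidate image sharing its mark.

    entity_by_mark: mark -> candidate entity, from the description of the drawings.
    image_by_mark: mark -> candidate image, from marks recognized in the drawing.
    Marks that appear on only one side are ignored.
    """
    shared = entity_by_mark.keys() & image_by_mark.keys()
    return {entity_by_mark[mark]: image_by_mark[mark] for mark in shared}

entity_by_mark = {"1": "soy bean milk making machine body", "2": "head"}
image_by_mark = {
    "1": "fig10_region_1.png",  # hypothetical cropped image regions
    "2": "fig10_region_2.png",
    "3": "fig10_region_3.png",
}

second_image_data_set = build_second_image_data_set(entity_by_mark, image_by_mark)
```

Mark "3" has no entry in the description of the drawings, so it contributes no association.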
Optionally, the image data set further includes a third image data set. Before the target entity is matched against the text information associated with each candidate image in the image data set, the method can also include the following steps:
Obtain a candidate text collection, where each candidate text in the candidate text collection includes a title and an abstract drawing. Still taking a patent as the candidate text, each patent includes a title and an abstract drawing, and the abstract drawing is the main drawing that represents the patent. For example, the title of the patent is "a kind of soy bean milk making machine".
Extract the abstract drawing from the candidate text.
Identify the candidate entity in the title; for example, the candidate entity extracted from "a kind of soy bean milk making machine" by the entity extraction model is "soy bean milk making machine".
Establish the association between the candidate entity and the abstract drawing to obtain the third image data set; here, establish the association between "soy bean milk making machine" and the abstract drawing.
It should be noted that the image data set can include at least one of the first image data set, the second image data set and the third image data set. The embodiment of the present application is described taking an image data set that includes the first image data set, the second image data set and the third image data set as an example.
Optionally, in step 502 above, the step of matching the target entity against the candidate entity associated with each candidate image in the image data set can specifically include the following steps:
First, match the target entity against the candidate entity associated with each candidate image in the first image data set. The candidate entities in the first image data set are entities that occur with higher frequency, so the target entity can be matched against the high-frequency entities first to improve the matching rate.
If the target entity does not match any candidate entity in the first image data set, match the target entity against the candidate entity associated with each candidate image in the image data sets other than the first image data set, that is, in the second image data set and/or the third image data set. If the target entity matches a candidate entity in the first image data set, the candidate image associated with that candidate entity is sent directly to the terminal so that the terminal displays it.
In the embodiment of the present application, the target entity is matched against the first image data set first, which improves the matching rate.
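The tiered lookup order can be sketched as below. For brevity the sketch uses exact name equality as a stand-in for the similarity-based matching described earlier; that simplification, the data, and the function name `match_tiered` are assumptions.

```python
def match_tiered(target_entity, first_data_set, other_data_sets):
    """Look up the target entity in the first image data set, then in the others.

    Each data set maps a candidate entity to its associated candidate images.
    Exact name equality stands in for similarity matching (an assumption).
    Returns the matched candidate images, or an empty list if nothing matches.
    """
    if target_entity in first_data_set:
        return first_data_set[target_entity]
    for data_set in other_data_sets:
        if target_entity in data_set:
            return data_set[target_entity]
    return []

# Invented data sets with hypothetical image file names.
first_data_set = {"engine": ["engine.png"]}
second_data_set = {"head": ["head.png"]}
third_data_set = {"soy bean milk making machine": ["abstract_drawing.png"]}

hit_first = match_tiered("engine", first_data_set, [second_data_set, third_data_set])
hit_later = match_tiered("head", first_data_set, [second_data_set, third_data_set])
miss = match_tiered("gear", first_data_set, [second_data_set, third_data_set])
```

Because frequent entities are likely to be queried often, checking the first image data set first means most lookups end without touching the other data sets.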
Optionally, on the basis of the above embodiments, the image data set further includes candidate image relationships. A candidate image relationship includes at least two candidate images and the relationship between the at least two candidate images, for example (candidate image 1 connects candidate image 2), such as the candidate image relationship (soy bean milk making machine body image connects head image). The candidate image relationship is obtained according to the relationship between candidate entities: for example, if the relationship extraction model identifies that the relationship between candidate entities is "soy bean milk making machine body" connects "head", the relationship between the images associated with those candidate entities is determined from the relationship between the candidate entities, which yields the candidate image relationship.
Optionally, on the basis of the above embodiments, when the first candidate image is contained in a target candidate image relationship (for example, in the image data set the target candidate image relationship is (soy bean milk making machine body image connects head image), and the first candidate image, such as the soy bean milk making machine body image, is contained in that target candidate image relationship), the method can also include the following steps:
First, determine the second candidate image included in the target candidate image relationship, where the second candidate image has a relationship with the first candidate image; for example, determine the second candidate image (such as the head image) included in the target candidate image relationship.
Then, output the first candidate image and the second candidate image.
In one application scenario, the target entity input by a user is "soy bean milk making machine", and the user wants to understand the structure of the "soy bean milk making machine" more vividly through image information. The terminal sends the target entity to the server, and the server matches the target entity (soy bean milk making machine) against the candidate entity associated with each candidate image in the image data set; the candidate entity matched here is "soy bean milk making machine body". Further, the first candidate image associated with the soy bean milk making machine body (the soy bean milk making machine body image) has a connection relationship with the second candidate image (the head image), so the first candidate image (the soy bean milk making machine body image) and the second candidate image (the head image) are both output. It should be noted that neither the number of first candidate images nor the number of second candidate images is limited in the embodiment of the present application. For example, the number of first candidate images is 2, each first candidate image has two second candidate images with an association relationship, and the number of images finally output is 4. The output first candidate images and second candidate images can form a topological structure, as shown in Fig. 11, which is a topological schematic diagram of the first candidate image and the second candidate image. The terminal displays not only the image information of "soy bean milk making machine" but also other image information related to "soy bean milk making machine". In this embodiment, the second candidate image that has a relationship with the first candidate image can be output according to the candidate image relationship, without manual analysis and retrieval of other images related to the first candidate image, which saves labor cost and widens the application scenarios.
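Expanding the output with related images can be sketched as below. It assumes candidate image relationships are stored as (image, relation, image) triples and that a relationship can be followed from either side; the image names and the function name are hypothetical.

```python
def expand_with_related(first_images, image_relations):
    """Return the first candidate images followed by their related images.

    image_relations: (image, relation, image) triples. A relationship is
    followed from whichever side is a first candidate image (an assumption).
    """
    output = list(first_images)
    for left, _relation, right in image_relations:
        if left in first_images and right not in output:
            output.append(right)
        elif right in first_images and left not in output:
            output.append(left)
    return output

# Invented candidate image relationships.
image_relations = [
    ("body_image", "connects", "head_image"),
    ("rod_image", "connects", "engine_image"),
]

images_to_output = expand_with_related(["body_image"], image_relations)
```

Only the relationship that contains "body_image" is followed, so the output is the body image followed by the head image.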
On the basis of the above embodiments, optionally, the target entity includes at least a first target entity and a second target entity, and the target text information further includes a first relationship between the first target entity and the second target entity. The method can also specifically include the following steps:
If the first target entity matches a first candidate entity associated with a first candidate image in the image data set, and the second target entity matches a second candidate entity associated with a second candidate image in the image data set, match the first relationship between the first target entity and the second target entity against a second relationship between the first candidate entity and the second candidate entity;
If the first relationship matches the second relationship, the method further includes:
Output the second candidate image.
In one application scenario, the user inputs "soy bean milk making machine" as the first target entity and "head" as the second target entity, and the first relationship between them is "connection". If the first target entity (soy bean milk making machine) matches the first candidate entity (soy bean milk making machine body) associated with the first candidate image in the image data set, and the second target entity (head) matches the second candidate entity associated with the second candidate image (head image) in the image data set, the relationships are matched next: the first relationship is "connection" and the second relationship between the first candidate entity and the second candidate entity is "connection", so the first relationship matches the second relationship and the second candidate image is output.
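The relationship check can be sketched as below. The sketch starts after entity matching, so it takes the already matched first and second candidate entities as input, and it compares relationships by exact string equality; both simplifications, the data, and the function name are assumptions.

```python
def second_image_if_relation_matches(first_relation, first_candidate, second_candidate,
                                     candidate_relations, image_by_entity):
    """Return the second candidate image when the two relationships match.

    first_relation: relationship between the two target entities.
    candidate_relations: (first candidate, second candidate) -> second relationship.
    image_by_entity: candidate entity -> associated candidate image.
    Returns None when the relationships differ or no relationship is stored.
    """
    second_relation = candidate_relations.get((first_candidate, second_candidate))
    if second_relation is not None and second_relation == first_relation:
        return image_by_entity.get(second_candidate)
    return None

# Invented data for the soy bean milk making machine example.
candidate_relations = {("soy bean milk making machine body", "head"): "connection"}
image_by_entity = {
    "soy bean milk making machine body": "body_image",
    "head": "head_image",
}

matched = second_image_if_relation_matches(
    "connection", "soy bean milk making machine body", "head",
    candidate_relations, image_by_entity)
unmatched = second_image_if_relation_matches(
    "contains", "soy bean milk making machine body", "head",
    candidate_relations, image_by_entity)
```

`matched` is the head image, while `unmatched` is None because the stored second relationship is "connection" rather than "contains".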
Optionally, the relationship established between candidate image is specifically as follows:
Extract the relationship between the candidate entity and candidate entity in candidate text;
According to the relationship between candidate image associated by the candidate entity of relationship foundation between candidate entity.Such as, it extracts
Relationship between candidate entity " soy bean milk making machine ontology " and " head " is " connection ", establishes candidate entity " soy bean milk making machine ontology " and candidate
Relationship between entity " head " is connection relationship.
Optionally, extracting the candidate entities in the candidate text and the relationships between the candidate entities may specifically include the following steps:
Input the candidate text into an entity extraction model, and identify the candidate entities in the candidate text by means of the entity extraction model;
Input the candidate text in which the candidate entities have been identified into a relationship extraction model, and output the relationships between the candidate entities by means of the relationship extraction model. For details of extracting candidate entities with the entity extraction model and extracting the relationships between candidate entities with the relationship extraction model, refer to step 202 and step 203 in Embodiment 1, which are not repeated here.
Optionally, the target text information is a target structure in a structured representation.
Embodiment 5
Referring to Figure 12, the embodiments of the present application further provide a method for obtaining entity information. The method is applied to an electronic device, which may be a server or a terminal; the execution subject is not specifically limited in the embodiments of the present application. For a better understanding of this embodiment, the terms used in it are explained first:
It should be noted that the "association relationship" between entities in this embodiment has the same meaning as the "relationship" between entities in Embodiments 1 to 4 above. The explanation of the association relationship in this embodiment also applies to the "relationship" in Embodiments 1 to 4.
The attributes of an association relationship include a relationship type. Relationship types include, but are not limited to, conceptual relationship, belonging relationship, positional relationship, ordinal relationship and logical relationship.
Conceptual relationship: the relationship between the general and the specific, that is, a hypernym-hyponym relationship. For example, "vehicle" is a superordinate concept relative to "automobile", and "automobile" is a superordinate concept relative to "bus".
The conceptual relationship can be identified by a relationship extraction model, which is the relationship extraction model of the above embodiments. Optionally, the relationship extraction model in this embodiment is obtained by further learning from and training on the claims of a large number of patent texts. Claims contain a large number of superordinate and subordinate concepts. For example, in "the connection component includes a screw and a nut", the connection component is the superordinate concept, and the screw and the nut are subordinate concepts. By learning from a large number of claims, the relationship extraction model can identify the hypernym-hyponym relationships between entities in a text.
Belonging relationship: includes, but is not limited to, inclusion relationship, connection relationship and parallel relationship.
1) Inclusion relationship: defined by inclusion, where an upper-level entity includes a lower-level entity (for example, an upper-level component includes a lower-level component). If an automobile includes wheels, the automobile and the wheels are in an upper-lower level relationship.
2) Connection relationship: the entities are connected to each other. For example, if a "base" is connected to an "LED lamp", the relationship between the base and the LED lamp is a connection relationship.
3) Parallel relationship: the entities are parallel to each other. For example, a "soybean milk machine" includes an "upper cover" and a "lower cover"; there is neither an inclusion relationship nor a connection relationship between the "upper cover" and the "lower cover", and the two are side by side, so the relationship between the "upper cover" and the "lower cover" is a parallel relationship.
Ordinal relationship: the entities have a sequential order. For example: step 1, receive a first signal; step 2, process the signal to obtain a second signal. The first signal and the second signal follow the order of the steps, that is, the first signal comes first and the second signal comes after; they therefore have a temporal ordinal relationship, and "first signal" and "second signal" are in an ordinal relationship.
Positional relationship: a spatial relationship, such as inside, outside, left, right, front and back. For example, if the "LED lamp" is arranged on the "base", the "LED lamp" and the "base" have a positional relationship.
Logical relationship: in a natural-language statement, one entity is taken as the base position and at least one other entity is searched for within a preset range of that entity; the entity at the base position and the at least one entity within the preset range are in a logical relationship. For example, consider the natural-language statement: a soybean milk machine with a double-layer lower cover, comprising a cup body and a head, the head being located on the cup body, the head comprising an upper cover and a lower cover that fits together with the upper cover, a motor and a control circuit being fixedly arranged on the head, the motor shaft extending downward into the cup body below the motor chamber, and a crushing cutter being mounted at the end of the motor shaft. In this text, "motor" is taken as the base position and the search extends g characters forward or backward, where g is, for example, 10. Taking "motor" as the base position, another entity, "head", is found within 10 characters forward, and "control circuit" and "motor shaft" are found within 10 characters backward; "head", "control circuit" and "motor shaft" are therefore each in a logical relationship with "motor".
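The window search described for the logical relationship can be sketched as follows. This is only an illustrative sketch: the function name `logical_neighbors`, the rule that an entity must lie entirely inside the window, and the use of the first occurrence of the base entity are assumptions, not details given in the text.

```python
from typing import List


def logical_neighbors(text: str, base: str, entities: List[str], g: int) -> List[str]:
    """Return the entities found within g characters before or after `base`.

    Assumption: only the first occurrence of `base` is used, and an entity
    counts only if it lies entirely inside the g-character window.
    """
    start = text.find(base)
    if start < 0:
        return []  # base entity does not occur in the text
    end = start + len(base)
    before = text[max(0, start - g):start]       # g characters in front of the base
    after = text[end:min(len(text), end + g)]    # g characters behind the base
    return [e for e in entities if e != base and (e in before or e in after)]
```

With the sentence "the head holds a motor and a control circuit", base entity "motor" and g = 25, the sketch returns "head" and "control circuit"; with g = 5 it returns nothing, because no other entity fits inside the window.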
Referring to Figure 12, the method for obtaining entity information provided in the embodiments of the present application may include the following steps:
Step 601: receive target text information, where the target text information includes a first target entity.
If the execution subject is a terminal, the terminal receives the target text information input by the user. If the execution subject is a server, the server receives the target text information sent by the terminal. For example, the target text information is "engine". In this embodiment, the execution subject is described by taking a server as an example. In an application scenario, the terminal receives "engine" input by the user and sends the target text information to the server, and the server receives it. It should be noted that the number of first target entities is not limited in the embodiments of the present application; the target entity "engine" in this example is merely illustrative and does not limit the present application.
Step 602: retrieve, in a data set, a first candidate entity that matches the first target entity, where the data set contains candidate entities and the relationships between the candidate entities, and the candidate entities include at least the first candidate entity and a second candidate entity that has an association relationship with the first candidate entity.
The data set may be established in advance and then stored, or it may be obtained from another device. How the data set is established is described below:
Obtain a candidate text collection, which may be a collection of patent texts. The candidate text collection includes multiple candidate texts, and each candidate text includes candidate entities. The candidate entities in each candidate text are extracted by the entity extraction model, and the relationships in the candidate text are then extracted by the relationship extraction model, yielding the candidate entities and the relationships between them. The data set is obtained from the candidate entities and the association relationships between them.
If the first target entity is "soybean milk machine", the first candidate entity matching the first target entity in the data set is, for example, "soybean milk machine body"; in the data set, the second candidate entity "upper cover" has an association relationship with the first candidate entity "soybean milk machine body". It should be noted that the association relationships in the embodiments of the present application include the above-mentioned belonging relationship, conceptual relationship, ordinal relationship and logical relationship.
For example, the second candidate entity may be "upper cover", in which case the first candidate entity and the second candidate entity are in a belonging relationship (inclusion relationship). The second candidate entity may also be a candidate entity that has a conceptual, ordinal or logical relationship with the first candidate entity; these cases are not illustrated one by one here.
It should be noted that the specific process of matching the first target entity with the first candidate entity in this step can be understood with reference to step 503 in Embodiment 4 above, and is not repeated here.
Step 603: select, in the data set, the second candidate entity that has an association relationship with the first candidate entity.
The second candidate entity that has an association relationship with the first candidate entity is selected in the data set. For example, "upper cover" is in an inclusion relationship with the first candidate entity, "motor" is in a logical relationship with the first candidate entity, "cover assembly" is in a conceptual relationship with the first candidate entity, and so on; these are not illustrated one by one here.
Step 604: output the second candidate entity.
The server sends the second candidate entity to the terminal, and the terminal displays it. In this embodiment, neither the number of second candidate entities nor the association relationship between the second candidate entity and the first candidate entity is limited.
In an application scenario, when a user needs to improve a structure related to a soybean milk machine, the user can input "soybean milk machine". The terminal receives the input and sends "soybean milk machine" to the server. The server matches "soybean milk machine" against the candidate entities in the data set; it matches the candidate entity "soybean milk machine body", and the second candidate entities that have an association relationship with "soybean milk machine body" are determined. The server sends the second candidate entities to the terminal, and the terminal displays them, for example in the form of a list.
In the embodiments of the present application, target text information that includes a first target entity is received; a first candidate entity that matches the first target entity is retrieved in the data set, where the candidate entities include at least the first candidate entity and a second candidate entity that has an association relationship with the first candidate entity; the second candidate entity is then selected in the data set and output. In this embodiment, second candidate entities associated with the first target entity can be recommended automatically, so the user does not have to run searches and analyze texts one by one in order to pick out second candidate entities, which greatly reduces labor cost.
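Steps 601 to 604 can be sketched end to end as below. The data set layout (a mapping from candidate entity to its related entities) and the matching rule (exact match, then substring containment) are placeholder assumptions; the actual matching process is the one referenced in step 503 of Embodiment 4, which is not reproduced here.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical layout: candidate entity -> [(relationship type, related entity)]
DataSet = Dict[str, List[Tuple[str, str]]]


def match_candidate(target: str, data_set: DataSet) -> Optional[str]:
    """Step 602 (placeholder rule): exact match first, then substring containment."""
    if target in data_set:
        return target
    for candidate in data_set:
        if target in candidate:
            return candidate
    return None


def recommend(target_text: str, data_set: DataSet) -> List[str]:
    """Steps 601-604: receive a target entity and return the associated entities."""
    first_candidate = match_candidate(target_text, data_set)   # step 602
    if first_candidate is None:
        return []
    # Step 603: every entity related to the first candidate entity qualifies.
    second_candidates = [entity for _, entity in data_set[first_candidate]]
    return second_candidates                                   # step 604: output
```

For a data set in which "soybean milk machine body" is related to "upper cover" and "motor", the input "soybean milk machine" yields those two entities as a list, which mirrors the list display described for the terminal.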
Optionally, on the basis of the above embodiments, the attributes of the association relationship include a relationship type, and the target text information further includes a target relationship condition, which indicates the relationship type between the target entity and the candidate entity to be obtained. The target relationship condition may be expressed with specific words, such as "contains", "connects" or "bottom". "Contains" indicates that the relationship type between the target entity and the candidate entity to be obtained is a belonging relationship; "connects" indicates that the relationship type is a belonging relationship; and "bottom" indicates that the relationship type is a conceptual relationship. Optionally, the target relationship condition may also be expressed with an identifier, for example "bh" for "contains" and "lj" for "connects".
In step 603 above, the specific step of selecting, in the data set, the second candidate entity that has an association relationship with the first candidate entity may be:
According to the first candidate entity, select in the data set the second candidate entity whose relationship type satisfies the target relationship condition.
For example, if the target text information includes the first target entity "soybean milk machine" and the target relationship condition "contains", second candidate entities that satisfy the "contains" relationship are selected in the data set according to the first candidate entity "soybean milk machine body"; such second candidate entities may be "motor", "upper cover", "lower cover" and so on.
In this embodiment, the target text information may also include a target relationship condition, so that the second candidate entity whose relationship type satisfies the target relationship condition can be selected in the data set according to the first candidate entity, which broadens the applicable scenarios.
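Filtering by a target relationship condition can be sketched as follows. The triple format and the function name `select_by_condition` are assumptions for illustration; the identifier table only contains the two identifiers mentioned in the text ("bh" and "lj").

```python
from typing import List, Tuple

# Identifiers named in the text; any others would have to be added by the implementer.
MARKS = {"bh": "contains", "lj": "connects"}


def select_by_condition(
    relations: List[Tuple[str, str, str]], first_candidate: str, condition: str
) -> List[str]:
    """Return the entities related to `first_candidate` by the requested relationship.

    `relations` holds (first entity, relationship, second entity) triples;
    `condition` is either a relationship word or one of the identifiers above.
    """
    wanted = MARKS.get(condition, condition)  # resolve an identifier to its word
    return [
        tail
        for head, relationship, tail in relations
        if head == first_candidate and relationship == wanted
    ]
```

Passing "contains" (or the identifier "bh") for "soybean milk machine body" would return only the entities stored with a "contains" relationship, such as "motor" and "upper cover" in the example above.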
Optionally, selecting, in the data set, the second candidate entity that has an association relationship with the first candidate entity may further specifically include:
According to the first candidate entity, select in the data set multiple second candidate entities that have an association relationship with the first candidate entity;
Select a target second candidate entity from the multiple second candidate entities according to a preset rule, and use the target second candidate entity as the second candidate entity.
In one possible implementation, the frequency with which each of the multiple second candidate entities occurs in the data set is determined. For example, the multiple second candidate entities are "motor", "upper cover", "lower cover" and so on, where the frequency with which "motor" occurs in the data set is greater than a threshold, or ranks first among the frequencies of all the second candidate entities.
The target second candidate entity is selected from the multiple second candidate entities according to the frequency and used as the second candidate entity. For example, "motor" can be selected as the target second candidate entity.
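The frequency rule can be sketched in a few lines. The function name `select_by_frequency` and the representation of the data set as a flat list of entity occurrences are assumptions made for the example.

```python
from collections import Counter
from typing import List, Optional


def select_by_frequency(
    candidates: List[str], occurrences: List[str], threshold: Optional[int] = None
) -> List[str]:
    """Pick target second candidate entities by how often they occur.

    With a threshold, keep every candidate whose count exceeds it;
    without one, keep only the candidate that ranks first.
    """
    counts = Counter(occurrences)
    if threshold is not None:
        return [c for c in candidates if counts[c] > threshold]
    if not candidates:
        return []
    return [max(candidates, key=lambda c: counts[c])]
```

Both variants named in the text (above a threshold, or ranked first) are covered by the optional `threshold` argument.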
In another implementation, the relevant date of the candidate text to which each of the multiple second candidate entities belongs can be determined, where relevant dates include, but are not limited to, the filing date, the submission date and the publication date, and the multiple second candidate entities belong to different texts;
The target second candidate entity is selected from the multiple second candidate entities according to the relevant date and used as the second candidate entity. Taking the publication date as the relevant date, the target second candidate entity is selected from the multiple second candidate entities in order of publication date from nearest to farthest from the current date. For example, if the publication date of the patent text to which "motor" belongs is 2018.6.3, that of the patent text to which "upper cover" belongs is 2017.5.4, and that of the patent text to which "lower cover" belongs is 2017.1.4, the second candidate entity whose publication date is nearest to the current date can be selected as the target second candidate entity. It should be noted that the multiple second candidate entities in this embodiment are given only for ease of explanation and as examples, and do not limit the present application.
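The date rule amounts to sorting by distance from the current date. The sketch below uses the three publication dates from the example; the function name `rank_by_recency` and the explicit `today` parameter are assumptions.

```python
from datetime import date
from typing import Dict, List


def rank_by_recency(pub_dates: Dict[str, date], today: date) -> List[str]:
    """Order entities by how close their text's publication date is to `today`."""
    return sorted(pub_dates, key=lambda entity: abs((today - pub_dates[entity]).days))


publication_dates = {
    "motor": date(2018, 6, 3),
    "upper cover": date(2017, 5, 4),
    "lower cover": date(2017, 1, 4),
}
ranking = rank_by_recency(publication_dates, today=date(2018, 11, 13))
target_second_candidate = ranking[0]  # entity whose text was published most recently
```

The first element of the ranking is the target second candidate entity; here that is "motor", whose text was published on 2018.6.3.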
Optionally, on the basis of the above embodiments, the attributes of the association relationship further include a relationship dimension. The relationship dimension includes binary relationships, or binary through X-ary relationships, where X is an integer greater than or equal to 3. A binary relationship includes two entities and the relationship between them; an X-ary relationship includes X entities and at least (X-1) binary relationships, and the (X-1) binary relationships are connected through shared entities.
Optionally, on the basis of the above embodiments, there are multiple second candidate entities, and the target text information further includes a second target entity and a target relationship condition. Selecting, in the data set, the second candidate entity that has an association relationship with the first candidate entity may further specifically include:
Retrieve, in the data set, multiple second candidate entities that match the second target entity;
Select, from the multiple second candidate entities, the target second candidate entities that satisfy the target relationship condition;
Output an R-ary relationship group, where R is an integer greater than or equal to 2 and less than or equal to N. The R-ary relationship group includes multiple R-ary relationships, and each R-ary relationship includes the first candidate entity, a target second candidate entity and the relationship between the first candidate entity and the target second candidate entity.
For example, the first target entity is "engine", the second target entity is "connecting rod", and the target relationship condition is "connects". The first candidate entities are "engine" and the like. Multiple second candidate entities that match the second target entity are retrieved in the data set; these may be "upper connecting rod", "lower connecting rod", "connecting rod assembly" and so on. The R-ary relationship group may be a binary relationship group and/or a ternary relationship group. This embodiment takes a binary relationship group as an example, in which the group includes: binary relationship 1 (engine connects upper connecting rod), binary relationship 2 (engine connects lower connecting rod), binary relationship 3 (engine connects connecting rod assembly), and so on. In this embodiment, the R-ary relationship group can be retrieved and output automatically according to the first target entity, the second target entity and the relationship between them.
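For the binary case (R = 2), assembling the relationship group can be sketched as a filtered cross product. The function name `binary_relation_group` and the triple representation are assumptions; higher-order groups are not covered by this sketch.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]  # (first entity, relationship, second entity)


def binary_relation_group(
    first_candidates: List[str],
    second_candidates: List[str],
    condition: str,
    relations: List[Triple],
) -> List[Triple]:
    """Keep the (first, condition, second) triples that actually exist in the data set."""
    known = set(relations)
    return [
        (first, condition, second)
        for first in first_candidates
        for second in second_candidates
        if (first, condition, second) in known
    ]
```

Applied to the engine example, the result would be the three binary relationships listed above, provided each of them is stored in the data set.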
Optionally, an entity includes a component, and/or an attribute, and/or an attribute value.
The target entity includes a target component, a target attribute, and/or a target attribute value; the candidate entity includes a candidate component, a candidate attribute, and/or a candidate attribute value. Each candidate entity is associated with the candidate text to which it belongs. For example, if the candidate texts are patent texts and every patent text has a patent number, a candidate entity can be associated with the candidate text to which it belongs through the patent number. The method may further include:
Match the target component against each candidate component, the target attribute against each candidate attribute, and/or the target attribute value against each candidate attribute value. For example, the target component is "motor", the target attribute is "voltage", and the target attribute value is "220V".
Determine the target candidate component that matches the target component, the target candidate attribute that matches the target attribute, and/or the target candidate attribute value that matches the target attribute value;
Obtain the first candidate texts associated with the target candidate component, the second candidate texts associated with the target candidate attribute, and/or the third candidate texts associated with the target candidate attribute value. The numbers of first, second and third candidate texts are not limited. For example, there are 100 first candidate texts that include "motor", 80 second candidate texts that include "voltage", and 80 third candidate texts that include "220V". The 100 first candidate texts, 80 second candidate texts and 80 third candidate texts may contain the same candidate text; for example, candidate text XX may include "motor", "voltage" and "220V". That is, the first, second and third candidate texts may be the same or different. It should be noted that the numbers of first, second and third candidate texts are given only for ease of explanation and as examples, and do not limit the present application.
Output the first candidate texts, the second candidate texts, and/or the third candidate texts.
Specifically, the first candidate texts, the second candidate texts, and/or the third candidate texts are output in the form of a list. The user can view the candidate texts that contain "motor", "voltage" and "220V", and so examine in detail the content of the candidate texts to which the target component, target attribute and/or target attribute value belong.
Optionally, on the basis of the above embodiments, the data set includes candidate relationships, each of which includes at least two candidate entities and the relationship between the at least two candidate entities. The target text information includes a target relationship, which includes at least two target entities and the relationship between the target entities; the two target entities include the first target entity and a second target entity;
The step of selecting, in the data set, the second candidate entity that has an association relationship with the first candidate entity may further specifically include:
Retrieve, in the data set, a target candidate entity that matches the second target entity, where the target candidate entity has an association relationship with the first candidate entity. For example, the target relationship includes the first target entity "lid", the second target entity "upper cover", and the relationship between the first target entity ("lid") and the second target entity ("upper cover"), namely a "contains" relationship. Target candidate entities that match the second target entity ("upper cover") are retrieved in the data set (such as "upper cover", "upper end cover" or "upper cover body"; the specific number is not limited), and each target candidate entity has an association relationship (such as an inclusion relationship) with the first target entity (such as the lid).
Search, according to the candidate relationships, for a first candidate relationship that contains the target candidate entity, where the first candidate relationship further includes a third candidate entity and the relationship between the target candidate entity and the third candidate entity. The data set includes a large number of candidate relationships, each of which contains at least two candidate entities and the relationship between them. From these candidate relationships, the first candidate relationship that contains the target candidate entity (such as "upper cover", "upper end cover" or "upper cover body") is found. For brevity, the target candidate entity is described by taking "upper end cover" as an example. The first candidate relationship includes the target candidate entity and a third candidate entity (for example, a button or a display screen); for instance, the first candidate relationship may be (upper end cover is provided with button) or (upper end cover is provided with display screen). It should be noted that the association relationship between the target candidate entity and the third candidate entity in the first candidate relationship is not limited; it may be, for example, "is provided with", "connects" or "contains".
Further, in a first implementation, the first candidate relationship is output as the second candidate entity. For example, (upper end cover is provided with button) is output: the server sends the first candidate relationship to the terminal, and the terminal displays it, that is, displays (upper end cover is provided with button). In an application scenario, if a technician inputs (lid contains upper cover), the server can automatically recommend components associated with the target relationship, namely that a "button" or a "display screen" can be provided on the "upper cover", which is of great reference value to the technician for technical improvement. In a second possible implementation, the third candidate entity can be output instead, that is, the third candidate entity (button or display screen) is output directly.
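The first two implementations can be sketched together: find the candidate relationships that contain a target candidate entity, then either output those relationships or only the entity on the other side. The function names and the triple format are assumptions for illustration.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]  # (first entity, relationship, second entity)


def first_candidate_relations(
    candidate_relations: List[Triple], target_candidates: List[str]
) -> List[Triple]:
    """First implementation: relationships that contain a target candidate entity."""
    targets = set(target_candidates)
    return [r for r in candidate_relations if r[0] in targets or r[2] in targets]


def third_candidate_entities(
    candidate_relations: List[Triple], target_candidates: List[str]
) -> List[str]:
    """Second implementation: only the entity on the other side of each relationship."""
    targets = set(target_candidates)
    found = first_candidate_relations(candidate_relations, target_candidates)
    return [tail if head in targets else head for head, _, tail in found]
```

For the target candidate entity "upper end cover", the first function would return relationships such as (upper end cover is provided with button), and the second would return "button" and "display screen".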
In a third possible implementation, a candidate entity similar to the third candidate entity can also be searched for. The similarity between two entities is determined from their semantic vectors as described in step 303 of Embodiment 1, which is not repeated here. A candidate entity whose similarity to the third candidate entity is greater than a threshold is selected. For example, if the candidate entity similar to the third candidate entity is "key", the candidate entity "key" similar to the third candidate entity is output directly.
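The similarity search can be illustrated as below. Cosine similarity is used here purely as an assumption, since the measure defined in step 303 of Embodiment 1 is not reproduced in this section; the two-dimensional vectors are made-up toy values, not real semantic vectors.

```python
import math
from typing import Dict, List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def similar_entities(
    target: str, vectors: Dict[str, Sequence[float]], threshold: float
) -> List[str]:
    """Candidate entities whose similarity to `target` exceeds the threshold."""
    return [
        name
        for name, vec in vectors.items()
        if name != target and cosine(vectors[target], vec) > threshold
    ]


# Toy vectors chosen so that "key" lies close to "button".
toy_vectors = {"button": [1.0, 0.0], "key": [0.9, 0.1], "display screen": [0.0, 1.0]}
similar = similar_entities("button", toy_vectors, threshold=0.9)
```

With these toy values only "key" exceeds the 0.9 threshold, matching the example in the text.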
Optionally, in a fourth possible implementation, the third candidate entity can also be matched, according to the candidate relationships, against the candidate entities contained in each candidate relationship, to determine a fourth candidate entity that matches the third candidate entity. For example, if the third candidate entity is "button", the fourth candidate entity that matches it is, for example, "key".
The second candidate relationship that contains the fourth candidate entity is used as the second candidate entity. The second candidate relationship that contains the fourth candidate entity may be (key is arranged on operation panel). When the second candidate relationship is output, the content that can be displayed on the terminal is: lid contains upper cover, upper cover is provided with button, key is arranged on operation panel. Optionally, the displayed content may be structured text or a structured image. In an application scenario, if a technician inputs (lid contains upper cover), the server can automatically recommend components associated with the target relationship, namely that a "button" or a "key" can be provided on the "upper cover" and that the "key" is arranged on the "operation panel". The server's recommendation of entities is of great reference value to the technician for technical improvement.
Optionally, in a fifth possible implementation, the target text information includes a target relationship, which includes at least two target entities and the relationship between the target entities; the two target entities include the first target entity and a second target entity. Selecting, in the data set, the second candidate entity that has an association relationship with the first candidate entity may further specifically include:
Retrieve, in the data set, a target candidate entity that matches the second target entity, where the target candidate entity has an association relationship with the first candidate entity. For example, the target relationship includes the first target entity "lid", the second target entity "upper cover", and the relationship between the first target entity ("lid") and the second target entity ("upper cover"), namely a "contains" relationship. Target candidate entities that match the second target entity ("upper cover") are retrieved in the data set (such as "upper cover", "upper end cover" or "upper cover body"; the specific number is not limited), and each target candidate entity has an association relationship (such as an inclusion relationship) with the first target entity (such as the lid).
Optionally, in the fifth possible implementation, a fifth candidate entity that has an association relationship with the target candidate entity is searched for according to the candidate relationships. The fifth candidate entity is contained in a third candidate relationship, where the third candidate relationship includes the fifth candidate entity, a sixth candidate entity and the relationship between the fifth candidate entity and the sixth candidate entity. For example, the fifth candidate entity (soybean milk machine body) that has an association relationship with the target candidate entity (upper end cover) is found according to the candidate relationships; the third candidate relationship that contains it may be (upper end cover connects soybean milk machine body), or it may be (soybean milk machine body contains lower cover). The sixth candidate entity may be the same as the target candidate entity or different from it.
Further, the third candidate relationship is output as the second candidate entity. In an application scenario, if a technician inputs (lid contains upper cover), the server can automatically recommend candidate relationships associated with the target relationship. For example, the content that the terminal can display is: lid contains upper cover, upper end cover connects soybean milk machine body, soybean milk machine body contains lower end cover; or: lid contains upper cover, soybean milk machine body connects base, and the upper cover and the soybean milk machine body have a connection relationship. In this example, the server can recommend, according to the target relationship, relationships that have an association relationship with it, which broadens the applicable scenarios; the server's recommendation of relationships is of great reference value for technical improvement.
Optionally, in a sixth possible implementation, a fourth candidate relationship that contains the third candidate relationship is determined according to the candidate relationships. For example, the fourth candidate relationship is (upper end cover connects soybean milk machine body, soybean milk machine body connects base). Further, the fourth candidate relationship is output as the second candidate entity. In an application scenario, if a technician inputs (lid contains upper cover), the server can automatically recommend candidate relationships associated with the target relationship. For example, the content that the terminal can display is: lid contains upper cover, upper end cover connects soybean milk machine body, soybean milk machine body contains lower cover. In this example, the server can recommend, according to the target relationship, relationships that have an association relationship with it, which broadens the applicable scenarios; the server's recommendation of relationships is of great reference value for technical improvement.
It should be noted that the candidate relationships, target relationships and candidate entities in this embodiment are all illustrative and do not limit the present application.
Optionally, on the basis of the above embodiments, the data set further includes an image data set. The image data set includes multiple candidate images, and each candidate image has an associated candidate entity. After the second candidate entity that is associated with the first candidate entity is selected from the data set, the method further includes:
Searching the image data set according to the second candidate entity, determining the candidate image associated with the second candidate entity, and using the candidate image of the second candidate entity as the second candidate entity.
For example, in one application scenario, the second candidate entities are "upper connecting rod" and "lower connecting rod". The image data set is searched according to the second candidate entities, the candidate images associated with "upper connecting rod" and "lower connecting rod" are determined, and the image of "upper connecting rod" and the image of "lower connecting rod" are output as the second candidate entities.
In this embodiment, the candidate image of the second candidate entity can be obtained and output directly, which presents the second candidate entity more vividly. Image information makes it easier for the user to understand the second candidate entity.
Optionally, on the basis of the above embodiments, the following describes how the image data set is established:
In a first implementation, the image data set includes a first image data set. Before the image data set is searched according to the second candidate entity and the candidate image associated with the second candidate entity is determined, the method further includes:
Obtaining a candidate text collection, where the candidate text collection includes multiple candidate texts and each candidate text includes candidate entities;
Counting the frequency with which each candidate entity occurs in the candidate text collection;
Determining high-frequency entities according to the frequency, where a high-frequency entity is an entity whose frequency of occurrence is higher than a threshold, or a high-frequency entity is an entity ranked before a preset position after the entities are sorted by frequency;
Associating each high-frequency entity with at least one corresponding candidate image to obtain the first image data set.
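The steps of the first implementation can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names `high_frequency_entities` and `build_first_image_dataset`, the representation of a candidate text as a list of entity strings, and the `image_index` mapping from entity to image file names are all assumptions made for the example.

```python
from collections import Counter


def high_frequency_entities(candidate_texts, threshold=None, top_k=None):
    """Return the high-frequency entities of a candidate text collection.

    candidate_texts: list of candidate texts, each given as a list of
    entity strings (assumed representation).
    threshold: keep entities whose frequency is strictly above this value.
    top_k: alternatively, keep the top_k entities after sorting by frequency.
    """
    # Count how often each candidate entity occurs in the collection.
    counts = Counter(e for text in candidate_texts for e in text)
    if threshold is not None:
        return {e for e, c in counts.items() if c > threshold}
    if top_k is not None:
        return {e for e, _ in counts.most_common(top_k)}
    raise ValueError("either threshold or top_k must be given")


def build_first_image_dataset(high_freq, image_index):
    """Associate each high-frequency entity with its candidate images.

    image_index: hypothetical mapping entity -> list of image file names.
    Entities without any image are skipped.
    """
    return {e: list(image_index[e]) for e in high_freq if image_index.get(e)}
```

Either criterion from the text can be selected: `threshold` corresponds to "frequency higher than a threshold", and `top_k` corresponds to "ranked before a preset position".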
In a second implementation, the image data set includes a second image data set. Before the image data set is searched according to the second candidate entity and the candidate image associated with the second candidate entity is determined, the method further includes:
Obtaining a candidate text collection, where each candidate text in the candidate text collection includes a description of the drawings and the drawings; the description of the drawings contains candidate entities and the reference marks of the candidate entities, and the drawings contain candidate images and reference marks;
Establishing the association between candidate entities and candidate images according to the reference marks to obtain the second image data set.
In a third implementation, the image data set includes a third image data set. Before the image data set is searched according to the second candidate entity and the candidate image associated with the second candidate entity is determined, the method further includes:
Obtaining a candidate text collection, where each candidate text in the candidate text collection includes a title and an abstract drawing;
Extracting the abstract drawing from the candidate text;
Identifying the candidate entity in the title;
Establishing the association between the candidate entity and the abstract drawing to obtain the third image data set.
In this embodiment, the image data set includes the first image data set, the second image data set and/or the third image data set. For the specific methods of establishing the first image data set, the second image data set and the third image data set, refer to the methods of establishing the image data set described in embodiment 4 above.
Optionally, the following describes how the image data set is searched:
The image data set includes the first image data set. The first image data set includes the candidate images of high-frequency entities, where a high-frequency entity is a candidate entity whose frequency of use is higher than a threshold;
The first image data set is searched first according to the second candidate entity;
If no candidate image associated with the second candidate entity is found in the first image data set, the other image data sets besides the first image data set (such as the second image data set and/or the third image data set) are searched according to the second candidate entity.
First, the target entity is matched against the candidate entities associated with the candidate images in the first image data set. Because the candidate entities in the first image data set are the entities that occur most frequently, the target entity can be matched against the high-frequency entities first to improve the matching rate.
If the target entity does not match any candidate entity in the first image data set, the target entity is matched against the candidate entities associated with the candidate images in the other image data sets, that is, the second image data set and/or the third image data set. If the target entity matches a candidate entity in the first image data set, the candidate image associated with that candidate entity is sent directly to the terminal, so that the terminal displays the candidate entity.
In this embodiment of the present application, the target entity is matched against the first image data set first, which improves the matching rate.
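The lookup order described above can be sketched as follows. This is only an illustration under stated assumptions: each image data set is modelled as a dictionary from entity name to a list of image names, matching is exact string equality, and the function name `find_candidate_images` is hypothetical.

```python
def find_candidate_images(entity, first_dataset, other_datasets):
    """Look up candidate images for an entity.

    The first image data set (high-frequency entities) is searched first;
    the other data sets are consulted only if it yields no match.
    Returns an empty list if no data set contains the entity.
    """
    images = first_dataset.get(entity)
    if images:
        return images
    # Fall back to the second and/or third image data sets, in order.
    for dataset in other_datasets:
        images = dataset.get(entity)
        if images:
            return images
    return []
```

Because most queries are expected to hit the small high-frequency data set, the fallback data sets are touched only for rarer entities.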
Embodiment 6: Referring to Figure 13, an embodiment of the present application provides an embodiment of a device for determining text novelty degree. The device is used to perform the method steps actually performed by the electronic device in embodiment 3 above. The device 1300 includes:
A text determining module 1301, configured to determine a target text;
An entity extraction module 1302, configured to extract multiple target entities from the target text determined by the text determining module 1301 to obtain a target entity set;
An entity obtaining module 1303, configured to obtain the candidate entity set of each candidate text in a candidate text collection;
An entity intersection determining module 1304, configured to determine a first instance intersection of the target entity set extracted by the entity extraction module 1302 and the candidate entity set obtained by the entity obtaining module 1303, where the first instance intersection consists of the entities that match between the target entity set and the candidate entity set;
A novelty degree determining module 1305, configured to determine the novelty degree of the target text relative to the candidate text according to a difference parameter between the first instance intersection determined by the entity intersection determining module 1304 and the target entity set extracted by the entity extraction module 1302.
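The core computation performed by modules 1304 and 1305 can be sketched as below. This section does not define the difference parameter, so the sketch assumes one plausible choice: the fraction of target entities that are missing from the intersection. Exact string matching and the function name `entity_novelty` are also assumptions.

```python
def entity_novelty(target_entities, candidate_entities):
    """Novelty of a target text relative to one candidate text.

    Assumed difference parameter: share of target entities that do not
    appear in the intersection with the candidate entity set.
    Returns a value in [0, 1]; 0 means every target entity is matched.
    """
    target = set(target_entities)
    if not target:
        return 0.0
    # Intersection: entities that match in both sets.
    intersection = target & set(candidate_entities)
    return len(target - intersection) / len(target)
```

For a candidate text collection, the function would be called once per candidate text, giving one novelty value per target-candidate pair.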
Referring to Figure 14, on the basis of the embodiment corresponding to Figure 13, an embodiment of the present application provides another embodiment of a device 1400 for determining text novelty degree. The device further includes a relationship extraction module 1306, a relationship obtaining module 1307 and a relationship intersection determining module 1308;
The relationship extraction module 1306 is configured to extract multiple binary relations from the target text to obtain a target binary relation set, where a binary relation includes two entities and the relationship between them;
The relationship obtaining module 1307 is configured to obtain a candidate binary relation set that includes multiple binary relations in the candidate text;
The relationship intersection determining module 1308 is configured to determine a first binary relation intersection of the target binary relation set extracted by the relationship extraction module 1306 and the candidate binary relation set obtained by the relationship obtaining module 1307, where the first binary relation intersection includes the binary relations that match between the target binary relation set and the candidate binary relation set;
The novelty degree determining module 1305 is further specifically configured to:
Determine a first instance novelty degree according to the difference parameter between the first instance intersection and the target entity set;
Determine a first binary relation novelty degree according to the difference parameter between the first binary relation intersection and the target binary relation set;
Determine the novelty degree of the target text relative to the candidate text according to the first instance novelty degree and the first binary relation novelty degree.
Optionally, the relationship extraction module 1306 is further configured to extract a target ternary relation set from the target text, where the target ternary relation set includes multiple ternary relations, a ternary relation includes two binary relations, and the two binary relations share a same entity;
The relationship obtaining module 1307 is further configured to obtain a candidate ternary relation set that includes multiple ternary relations in the candidate text;
The relationship intersection determining module 1308 is further configured to determine a first ternary relation intersection of the target ternary relation set extracted by the relationship extraction module 1306 and the candidate ternary relation set obtained by the relationship obtaining module 1307, where the first ternary relation intersection includes the ternary relations that match between the target ternary relation set and the candidate ternary relation set;
The novelty degree determining module 1305 is further specifically configured to:
Determine a first ternary relation novelty degree according to the difference parameter between the first ternary relation intersection and the target ternary relation set;
Determine the novelty degree of the target text relative to the candidate text according to the first instance novelty degree, the first binary relation novelty degree and the first ternary relation novelty degree.
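The entity, binary relation and ternary relation levels can be combined as in the sketch below. Several details are assumptions for illustration only: a binary relation is a `(head, relation, tail)` tuple, a ternary relation is an ordered pair of binary relations that share an entity, the difference parameter is the unmatched fraction of the target set, and the three levels are combined by a weighted average with equal default weights.

```python
from itertools import combinations


def ternary_relations(binary_relations):
    """Pair up binary relations (head, relation, tail) that share an entity."""
    result = set()
    # Sorting gives each pair a canonical order so that sets are comparable.
    for a, b in combinations(sorted(set(binary_relations)), 2):
        if {a[0], a[2]} & {b[0], b[2]}:
            result.add((a, b))
    return result


def set_novelty(target, candidate):
    """Assumed difference parameter: unmatched share of the target set."""
    target = set(target)
    if not target:
        return 0.0
    return len(target - set(candidate)) / len(target)


def combined_novelty(scores, weights=None):
    """Weighted average of per-level novelty scores (weights are assumed)."""
    if weights is None:
        weights = [1.0] * len(scores)
    return sum(s * w for s, w in zip(scores, weights)) / sum(weights)


def entities_of(binary_relations):
    """Collect the entities that appear in a list of binary relations."""
    return {e for head, _, tail in binary_relations for e in (head, tail)}
```

A caller would compute `set_novelty` once for entities, once for binary relations and once for the output of `ternary_relations`, and then pass the three scores to `combined_novelty`.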
Optionally, the entity extraction module 1302 is further configured to input the target text into an entity extraction model and to identify multiple target entities in the target text by means of the entity extraction model.
Optionally, the relationship extraction module 1306 is further configured to input the target text in which the target entities have been identified into a relationship extraction model and to extract the binary relations between the target entities by means of the relationship extraction model.
Optionally, the device further includes a generation module 1309;
The generation module 1309 is configured to generate a structured representation of the target text, according to the target entities extracted by the entity extraction module 1302 and the relationships between the target entities extracted by the relationship extraction module 1306, to obtain a target structure.
Optionally, the target structure includes nodes and edges, where the nodes represent the target entities and the edges represent the relationships between the target entities.
Optionally, each candidate text in the candidate text collection is a structured candidate structure, and the target text is a target structure. The entity extraction module 1302 is further configured to extract the candidate entity set of a candidate map, where the candidate map includes at least one candidate structure;
The entity intersection determining module 1304 is further configured to determine a second instance intersection of the target entity set extracted by the entity extraction module 1302 and the candidate entity set of the candidate map obtained by the entity obtaining module 1303;
The novelty degree determining module 1305 is further configured to determine the novelty degree of the target text relative to the candidate map according to the difference parameter between the second instance intersection and the target entity set.
Optionally, when the candidate map includes at least two candidate structures, the at least two candidate structures are a first candidate structure and a second candidate structure. The device further includes an associated entity determining module 1310 and an association module 1311;
The associated entity determining module 1310 is configured to determine the associated entity of the first candidate structure and the second candidate structure;
The association module 1311 is configured to associate the first candidate structure with the second candidate structure through the associated entity determined by the associated entity determining module 1310 to obtain the candidate map.
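Joining candidate structures through an associated entity can be sketched as below. The sketch assumes that a candidate structure is a list of `(head, relation, tail)` triples, that the associated entity is an entity name shared by both structures, and that the candidate map is stored as an undirected adjacency dictionary; all function names are hypothetical.

```python
from collections import defaultdict


def structure_entities(structure):
    """Collect the entities (nodes) of one candidate structure."""
    return {e for head, _, tail in structure for e in (head, tail)}


def associated_entities(first, second):
    """Entities shared by two candidate structures (assumed: same name)."""
    return structure_entities(first) & structure_entities(second)


def build_candidate_map(structures):
    """Merge candidate structures into one undirected adjacency mapping.

    Structures that share an entity become connected through that node.
    """
    graph = defaultdict(set)
    for structure in structures:
        for head, _, tail in structure:
            graph[head].add(tail)
            graph[tail].add(head)
    return dict(graph)
```

Because shared entities collapse into a single node, two structures become connected in the resulting map exactly when `associated_entities` is non-empty.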
Optionally, the relationship extraction module 1306 is further configured to extract multiple binary relations from the target structure to obtain a target binary relation set;
The novelty degree determining module 1305 is further specifically configured to:
Locate the two target entities included in each target binary relation in the target binary relation set at the two corresponding entity positions in the candidate map;
Calculate the distance between the two entity positions corresponding to each target binary relation;
Determine, according to the distance, a second binary relation novelty degree of each target binary relation relative to the candidate map;
Determine a second instance novelty degree according to the difference parameter between the second instance intersection and the target entity set;
Determine the novelty degree of the target structure relative to the candidate map according to the second instance novelty degree and the second binary relation novelty degree.
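The distance-based step can be sketched as follows. This section does not state how the distance is measured or mapped to a novelty value, so the sketch assumes the distance is the number of edges on the shortest path in the candidate map (found by breadth-first search) and uses the hypothetical mapping `1 - 1/d`, with a novelty of 1.0 when an entity is missing or unreachable.

```python
from collections import deque


def graph_distance(graph, start, goal):
    """Shortest path length in edges, or None if missing or unreachable.

    graph: adjacency mapping entity -> set of neighbouring entities.
    """
    if start not in graph or goal not in graph:
        return None
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbour in graph[node]:
            if neighbour == goal:
                return dist + 1
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None


def relation_novelty_by_distance(graph, head, tail):
    """Assumed mapping: adjacent entities give 0, farther ones approach 1."""
    dist = graph_distance(graph, head, tail)
    if dist is None:
        return 1.0
    if dist <= 1:
        return 0.0
    return 1.0 - 1.0 / dist
```

Under this mapping, a target binary relation whose two entities are directly connected in the candidate map contributes no novelty, while entities that are far apart or absent contribute close to the maximum.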
Optionally, the relationship obtaining module 1307 is further configured to obtain a candidate binary relation set that includes multiple binary relations in the candidate map;
The relationship intersection determining module 1308 is further configured to determine a second binary relation intersection of the target binary relation set and the candidate binary relation set;
The novelty degree determining module 1305 is further specifically configured to:
Determine a first binary relation novelty degree according to the difference parameter between the second binary relation intersection and the target binary relation set;
Determine a binary relation novelty degree according to the first binary relation novelty degree, the second binary relation novelty degree and their corresponding weights;
Determine the novelty degree of the target structure relative to the candidate map according to the second instance novelty degree and the binary relation novelty degree.
Optionally, the relationship extraction module 1306 is further configured to extract multiple ternary relations from the target structure to obtain a target ternary relation set;
The novelty degree determining module 1305 is further specifically configured to:
Locate the target entities included in each target ternary relation in the target ternary relation set at the three corresponding entity positions in the candidate map;
Calculate the distance between any two of the three entity positions;
Determine, according to the distance, a second ternary relation novelty degree of each target ternary relation relative to the candidate map;
Determine the novelty degree of the target structure relative to the candidate map according to the second instance novelty degree, the second binary relation novelty degree and the second ternary relation novelty degree.
Optionally, the relationship obtaining module 1307 is further configured to obtain a candidate ternary relation set that includes multiple ternary relations in the candidate map;
The relationship intersection determining module 1308 is further configured to determine a second ternary relation intersection of the target ternary relation set and the candidate ternary relation set;
The novelty degree determining module 1305 is further specifically configured to:
Determine a first ternary relation novelty degree according to the difference parameter between the second ternary relation intersection and the target ternary relation set;
Determine a ternary relation novelty degree according to the first ternary relation novelty degree, the second ternary relation novelty degree and their corresponding weights;
Determine the novelty degree of the target structure relative to the candidate map according to the second instance novelty degree, the binary relation novelty degree and the ternary relation novelty degree.
Referring to Figure 15, an embodiment of the present application further provides an electronic device 70. The electronic device 70 includes a memory 710, a transceiver 720 and a processor 730. Those skilled in the art will understand that the electronic device may also include other components, such as the various components commonly found in a computer. The memory 710, the transceiver 720 and the processor 730 communicate with one another. The memory 710 is used to store computer instructions, and the transceiver 720 is used to communicate with other devices. When the processor 730 executes the computer instructions, the electronic device 70 performs the methods described in the above method embodiments.
An embodiment of the present application further provides a computer storage medium for storing computer software instructions, including instructions for performing the method performed by the electronic device in the method embodiments.
Those skilled in the art will understand that all or part of the processes in the methods of the above embodiments can be completed by a computer program instructing the relevant hardware. The program can be stored in a computer-readable storage medium and, when executed, may include the processes of the above method embodiments. The storage medium may be a magnetic disk, an optical disc, a read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), a flash memory (Flash Memory), a hard disk drive (Hard Disk Drive, HDD), a solid-state drive (Solid-State Drive, SSD), or the like. The storage medium may also include a combination of the above types of memory.
Although the embodiments of the present invention have been described with reference to the accompanying drawings, those skilled in the art can make various modifications and variations without departing from the spirit and scope of the present invention, and such modifications and variations all fall within the scope defined by the appended claims.
Claims (16)
1. A method for determining text novelty degree, characterized by comprising:
Determining a target text;
Extracting multiple target entities from the target text to obtain a target entity set;
Obtaining the candidate entity set of each candidate text in a candidate text collection;
Determining a first instance intersection of the target entity set and the candidate entity set, the first instance intersection being the entities that match between the target entity set and the candidate entity set;
Determining the novelty degree of the target text and the candidate text according to a difference parameter between the first instance intersection and the target entity set.
2. The method according to claim 1, characterized in that the method further comprises:
Extracting multiple binary relations from the target text to obtain a target binary relation set, a binary relation including two entities and the relationship between them;
Obtaining a candidate binary relation set that includes multiple binary relations in the candidate text;
Determining a first binary relation intersection of the target binary relation set and the candidate binary relation set, the first binary relation intersection including the binary relations that match between the target binary relation set and the candidate binary relation set;
The determining of the novelty degree of the target text and the candidate text according to the difference parameter between the first instance intersection and the target entity set comprises:
Determining a first instance novelty degree according to the difference parameter between the first instance intersection and the target entity set;
Determining a first binary relation novelty degree according to the difference parameter between the first binary relation intersection and the target binary relation set;
Determining the novelty degree of the target text and the candidate text according to the first instance novelty degree and the first binary relation novelty degree.
3. The method according to claim 2, characterized in that the method further comprises:
Extracting a target ternary relation set from the target text, the target ternary relation set including multiple ternary relations, a ternary relation including two binary relations, and the two binary relations sharing a same entity;
Obtaining a candidate ternary relation set that includes multiple ternary relations in the candidate text;
Determining a first ternary relation intersection of the target ternary relation set and the candidate ternary relation set, the first ternary relation intersection including the ternary relations that match between the target ternary relation set and the candidate ternary relation set;
The determining of the novelty degree of the target text and the candidate text according to the first instance novelty degree and the first binary relation novelty degree comprises:
Determining a first ternary relation novelty degree according to the difference parameter between the first ternary relation intersection and the target ternary relation set;
Determining the novelty degree of the target text and the candidate text according to the first instance novelty degree, the first binary relation novelty degree and the first ternary relation novelty degree.
4. The method according to claim 1, characterized in that the extracting of multiple target entities from the target text comprises:
Inputting the target text into an entity extraction model, and identifying multiple target entities in the target text by means of the entity extraction model.
5. The method according to claim 2, characterized in that the extracting of multiple binary relations from the target text comprises:
Inputting the target text in which the target entities have been identified into a relationship extraction model, and extracting the binary relations between the target entities by means of the relationship extraction model.
6. The method according to claim 5, characterized by comprising:
Generating a structured representation of the target text according to the relationships between the target entities to obtain a target structure.
7. The method according to claim 6, characterized in that the target structure includes nodes and edges, the nodes being used to represent the target entities and the edges being used to represent the relationships between the target entities.
8. The method according to claim 1, characterized in that each candidate text in the candidate text collection is a structured candidate structure and the target text is a target structure, and the method further comprises:
Extracting the candidate entity set of a candidate map, the candidate map including at least one candidate structure;
The determining of the entity intersection of the target entity set and the candidate entity set comprises:
Determining a second instance intersection of the target entity set and the candidate entity set of the candidate map;
The method further comprises:
Determining the novelty degree of the target text and the candidate map according to the difference parameter between the second instance intersection and the target entity set.
9. The method according to claim 8, characterized in that when the candidate map includes at least two candidate structures, the at least two candidate structures are a first candidate structure and a second candidate structure;
Determining the associated entity of the first candidate structure and the second candidate structure;
Associating the first candidate structure with the second candidate structure through the associated entity to obtain the candidate map.
10. The method according to claim 8, characterized in that the method further comprises:
Extracting multiple binary relations from the target structure to obtain a target binary relation set;
Locating the two target entities included in each target binary relation in the target binary relation set at the two corresponding entity positions in the candidate map;
Calculating the distance between the two entity positions corresponding to each target binary relation;
Determining, according to the distance, a second binary relation novelty degree of each target binary relation relative to the candidate map;
The determining of the novelty degree of the target text and the candidate text according to the difference parameter between the second instance intersection and the target entity set comprises:
Determining a second instance novelty degree according to the difference parameter between the second instance intersection and the target entity set;
Determining the novelty degree of the target structure and the candidate map according to the second instance novelty degree and the second binary relation novelty degree.
11. The method according to claim 10, characterized in that the method further comprises:
Obtaining a candidate binary relation set that includes multiple binary relations in the candidate map;
Determining a second binary relation intersection of the target binary relation set and the candidate binary relation set;
Determining a first binary relation novelty degree according to the difference parameter between the second binary relation intersection and the target binary relation set;
Determining a binary relation novelty degree according to the first binary relation novelty degree, the second binary relation novelty degree and their corresponding weights;
The determining of the novelty degree of the target structure and the candidate structure according to the second instance novelty degree and the second binary relation novelty degree comprises:
Determining the novelty degree of the target structure and the candidate map according to the second instance novelty degree and the binary relation novelty degree.
12. The method according to claim 11, characterized in that the method further comprises:
Extracting multiple ternary relations from the target structure to obtain a target ternary relation set;
Locating the target entities included in each target ternary relation in the target ternary relation set at the three corresponding entity positions in the candidate map;
Calculating the distance between any two of the three entity positions;
Determining, according to the distance, a second ternary relation novelty degree of each target ternary relation relative to the candidate map;
The determining of the novelty degree of the target structure and the candidate structure according to the second instance novelty degree and the second binary relation novelty degree comprises:
Determining the novelty degree of the target structure and the candidate map according to the second instance novelty degree, the second binary relation novelty degree and the second ternary relation novelty degree.
13. The method according to claim 12, characterized in that the method further comprises:
Obtaining a candidate ternary relation set that includes multiple ternary relations in the candidate map;
Determining a second ternary relation intersection of the target ternary relation set and the candidate ternary relation set;
Determining a first ternary relation novelty degree according to the difference parameter between the second ternary relation intersection and the target ternary relation set;
Determining a ternary relation novelty degree according to the first ternary relation novelty degree, the second ternary relation novelty degree and their corresponding weights;
The determining of the novelty degree of the target structure and the candidate map according to the second instance novelty degree and the binary relation novelty degree comprises:
Determining the novelty degree of the target structure and the candidate map according to the second instance novelty degree, the binary relation novelty degree and the ternary relation novelty degree.
14. A device for determining text novelty degree, characterized by comprising:
A first determining module, configured to determine a target text;
An extraction module, configured to extract multiple target entities from the target text determined by the first determining module to obtain a target entity set;
An obtaining module, configured to obtain the candidate entity set of each candidate text in a candidate text collection;
A second determining module, configured to determine a first instance intersection of the target entity set identified by the extraction module and the candidate entity set obtained by the obtaining module, the first instance intersection being the entities that match between the target entity set and the candidate entity set;
A novelty determining module, configured to determine the novelty degree of the target text and the candidate text according to the difference parameter between the first instance intersection determined by the second determining module and the target entity set extracted by the extraction module.
15. An electronic device, characterized by comprising:
A memory and a processor;
The memory and the processor are communicatively connected to each other, computer instructions are stored in the memory, and the processor executes the computer instructions so as to perform the method of any one of claims 1-13.
16. A computer storage medium, characterized in that the computer storage medium stores computer instructions, the computer instructions being used to cause a computer to perform the method of any one of claims 1-13.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811348626.6A CN109582933B (en) | 2018-11-13 | 2018-11-13 | Method and related device for determining text novelty |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109582933A true CN109582933A (en) | 2019-04-05 |
CN109582933B CN109582933B (en) | 2021-09-03 |
Family
ID=65922365
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811348626.6A Active CN109582933B (en) | 2018-11-13 | 2018-11-13 | Method and related device for determining text novelty |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109582933B (en) |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111144709A (en) * | 2019-12-06 | 2020-05-12 | 北京邮电大学 | Method and device for determining novelty of machine-generated text |
CN111708873A (en) * | 2020-06-15 | 2020-09-25 | 腾讯科技(深圳)有限公司 | Intelligent question answering method and device, computer equipment and storage medium |
CN111930898A (en) * | 2020-09-18 | 2020-11-13 | 北京合享智慧科技有限公司 | Text evaluation method and device, electronic equipment and storage medium |
CN112052835A (en) * | 2020-09-29 | 2020-12-08 | 北京百度网讯科技有限公司 | Information processing method, information processing apparatus, electronic device, and storage medium |
CN113743087A (en) * | 2021-09-07 | 2021-12-03 | 珍岛信息技术(上海)股份有限公司 | Text generation method and system based on neural network vocabulary extension paragraphs |
CN115879441A (en) * | 2022-11-10 | 2023-03-31 | 中国科学技术信息研究所 | Text novelty detection method and device, electronic equipment and readable storage medium |
Citations (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20110202545A1 (en) * | 2008-01-07 | 2011-08-18 | Takao Kawai | Information extraction device and information extraction system |
US20130232160A1 (en) * | 2012-03-02 | 2013-09-05 | Semmle Limited | Finding duplicate passages of text in a collection of text |
CN104636325A (en) * | 2015-02-06 | 2015-05-20 | 中南大学 | Document similarity determining method based on maximum likelihood estimation |
CN105653706A (en) * | 2015-12-31 | 2016-06-08 | 北京理工大学 | Multilayer citation recommendation method based on a knowledge graph of literature content |
CN106815293A (en) * | 2016-12-08 | 2017-06-09 | 中国电子科技集团公司第三十二研究所 | System and method for constructing knowledge graph for information analysis |
JP2017123168A (en) * | 2016-01-05 | 2017-07-13 | 富士通株式会社 | Method and device for associating entity mentions in short text with entities in a semantic knowledge base |
CN107015961A (en) * | 2016-01-27 | 2017-08-04 | 中文在线数字出版集团股份有限公司 | Text similarity comparison method |
CN107665252A (en) * | 2017-09-27 | 2018-02-06 | 深圳证券信息有限公司 | Method and device for creating a knowledge graph |
WO2018153295A1 (en) * | 2017-02-27 | 2018-08-30 | 腾讯科技(深圳)有限公司 | Text entity extraction method, device, apparatus, and storage media |
CN108763566A (en) * | 2018-06-05 | 2018-11-06 | 北京玄科技有限公司 | Text similarity computing method and device, intelligent robot |
CN108763569A (en) * | 2018-06-05 | 2018-11-06 | 北京玄科技有限公司 | Text similarity computing method and device, intelligent robot |
2018-11-13: CN application CN201811348626.6A, patent CN109582933B, status Active
Non-Patent Citations (2)
Title |
---|
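Aliya Nugumanova et al.: "A new text representation model enriched with semantic relations", 2015 15th International Conference on Control, Automation and Systems (ICCAS) * |
Zhao Yiping et al.: "Application of linked data to similar-document discovery in academic resource networks", New Technology of Library and Information Service (《现代图书情报技术》) * |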
Cited By (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111144709A (en) * | 2019-12-06 | 2020-05-12 | 北京邮电大学 | Method and device for determining novelty of machine-generated text |
CN111144709B (en) * | 2019-12-06 | 2023-04-18 | 北京邮电大学 | Method and device for determining novelty of machine-generated text |
CN111708873A (en) * | 2020-06-15 | 2020-09-25 | 腾讯科技(深圳)有限公司 | Intelligent question answering method and device, computer equipment and storage medium |
CN111708873B (en) * | 2020-06-15 | 2023-11-24 | 腾讯科技(深圳)有限公司 | Intelligent question-answering method, intelligent question-answering device, computer equipment and storage medium |
CN111930898A (en) * | 2020-09-18 | 2020-11-13 | 北京合享智慧科技有限公司 | Text evaluation method and device, electronic equipment and storage medium |
CN112052835A (en) * | 2020-09-29 | 2020-12-08 | 北京百度网讯科技有限公司 | Information processing method, information processing apparatus, electronic device, and storage medium |
US11908219B2 (en) | 2020-09-29 | 2024-02-20 | Beijing Baidu Netcom Science And Technology Co., Ltd. | Method and device for processing information, electronic device, and storage medium |
CN113743087A (en) * | 2021-09-07 | 2021-12-03 | 珍岛信息技术(上海)股份有限公司 | Text generation method and system based on neural network vocabulary extension paragraphs |
CN113743087B (en) * | 2021-09-07 | 2024-04-26 | 珍岛信息技术(上海)股份有限公司 | Text generation method and system based on neural network vocabulary extension paragraph |
CN115879441A (en) * | 2022-11-10 | 2023-03-31 | 中国科学技术信息研究所 | Text novelty detection method and device, electronic equipment and readable storage medium |
CN115879441B (en) * | 2022-11-10 | 2024-04-12 | 中国科学技术信息研究所 | Text novelty detection method and device, electronic equipment and readable storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN109582933B (en) | 2021-09-03 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109582800A (en) | Method and related device for training a structuring model and structuring text | |
CN109597878A (en) | Method and related device for determining text similarity | |
CN109582933A (en) | Method and related device for determining text novelty | |
CN111061946B (en) | Method, device, electronic equipment and storage medium for recommending scenario-based content | |
CN112100529B (en) | Search content ordering method and device, storage medium and electronic equipment | |
KR20180041200A (en) | Information processing method and apparatus | |
KR20170001550A (en) | Human-computer intelligence chatting method and device based on artificial intelligence | |
CN113505204B (en) | Recall model training method, search recall device and computer equipment | |
CN110019650B (en) | Method and device for providing search association word, storage medium and electronic equipment | |
CN109635277A (en) | Method and related device for obtaining entity information | |
CN113254711B (en) | Interactive image display method and device, computer equipment and storage medium | |
CN110134885A (en) | Point-of-interest recommendation method, device, equipment and computer storage medium | |
CN109145083B (en) | Candidate answer selecting method based on deep learning | |
CN114691831A (en) | Task-type intelligent automobile fault question-answering system based on knowledge graph | |
CN103927339B (en) | Knowledge reorganization system and knowledge reorganization method | |
CN115129883B (en) | Entity linking method and device, storage medium and electronic equipment | |
CN116662495A (en) | Question-answering processing method, and method and device for training question-answering processing model | |
CN117271818B (en) | Visual question-answering method, system, electronic equipment and storage medium | |
CN117786068A (en) | Knowledge question-answering method, device, equipment and readable storage medium | |
CN109635139A (en) | Method and related device for obtaining image information | |
CN117009599A (en) | Data retrieval method and device, processor and electronic equipment | |
CN116561339A (en) | Knowledge graph entity linking method, knowledge graph entity linking device, computer equipment and storage medium | |
CN115269961A (en) | Content search method and related device | |
CN114637855A (en) | Knowledge graph-based searching method and device, computer equipment and storage medium | |
CN114443916A (en) | Supply and demand matching method and system for test data |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||